US11526370B2 - Cloud resource management using machine learning - Google Patents
- Publication number: US11526370B2 (application US16/297,694)
- Authority: US (United States)
- Prior art keywords: virtual machine, time, usage, time series, data
- Legal status: Active, expires (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F 9/5027: Allocation of resources (e.g., of the CPU) to service a request, the resource being a machine (e.g., CPUs, servers, terminals)
- G06F 9/45558: Hypervisor-specific management and integration aspects
- G06F 17/18: Complex mathematical operations for evaluating statistical data (e.g., average values, frequency distributions, probability functions, regression analysis)
- G06F 9/5072: Grid computing (partitioning or combining of resources)
- G06N 20/20: Ensemble learning
- G06F 2009/45575: Starting, stopping, suspending or resuming virtual machine instances
- G06F 2009/45591: Monitoring or debugging support
Definitions
- a cloud computing service provides shared computing resources to users or customers.
- the computing resources may include hardware and software resources.
- the hardware resources may include processor elements (e.g., cores of a central processing unit, graphics processing units), memory, storage, networks, etc.
- the software resources may include operating systems, database systems, applications, libraries, programs, etc.
- a cloud computing service may include multiple data centers located in various geographical locations with each data center having multiple servers.
- the cloud computing service may offer different types of service plans or subscriptions for the computing resources.
- the cloud computing service may provide a service plan that offers software as a service (SaaS), platform as a service (PaaS), or infrastructure as a service (IaaS), as well as provide resources to support application development.
- software may be licensed on a subscription plan.
- a PaaS subscription plan may offer access to computing resources as a platform that enables customers to develop, execute, and manage applications.
- An IaaS subscription plan offers resources that enable a customer to create and deploy virtual machines.
- the subscriptions may have different payment options that are billed based on the type of computing resources needed and/or on the usage of these resources.
- the customer may be billed for the actual usage or the customer may have a flat rate for the use of its computing resources.
- a user may provision a virtual machine with an intended amount of computing resources.
- the virtual machine may be operational for a period of time and then kept idle without using any of the resources provisioned to it. However, the customer is billed for the idle time of the virtual machine even though the virtual machine is not performing any useful work.
- a cloud resource management system utilizes a machine learning technique to forecast when a virtual machine hosted by a cloud computing service may become idle at a future time.
- Several machine learning models are trained on historical metric data of a virtual machine over a continuous time period.
- the metric data may include CPU usage, disk I/O usage and/or network I/O usage.
- the models are tested and at least one model is selected for use in a production run. The selected model is then used to forecast a time in an immediately succeeding time period when any one or combination of metrics of the virtual machine falls below an idle threshold.
- FIG. 1 illustrates an exemplary cloud resource management system and an exemplary cloud computing service.
- FIG. 2 is a flow diagram illustrating an exemplary method used to train, through ensemble learning, several different machine learning models and to select at least one model for use in a production run.
- FIG. 3 is a flow diagram illustrating an exemplary method that uses at least one model to forecast when a virtual machine may become idle at a future time.
- FIG. 4 is a block diagram illustrating an exemplary operating environment.
- the subject matter disclosed utilizes machine learning techniques to forecast when a virtual machine hosted by a cloud computing service may become idle at a future time.
- Several machine learning models are trained on historical metric data of a virtual machine over a continuous time period. The models are tested and at least one model is selected for use in a production run. The selected model is then used to predict the CPU usage for a succeeding future time period and to automatically turn off the virtual machine when the predicted CPU usage is below an idle threshold.
- a virtual machine is a software implementation of a computer that executes programs like a physical machine or computer.
- a hypervisor or virtual machine monitor provisions or creates the virtual machine and then runs the virtual machine.
- the hypervisor provides the virtual machine with a guest operating system and manages the execution of the guest operating system.
- a customer i.e., user, developer, client
- a cloud computing service may configure the virtual machine to utilize a certain amount and type of computing resources.
- Metric data representing the resource consumption of a virtual machine at equally-spaced time intervals is collected over the course of a training period.
- Usage data representing the type of resources consumed by the virtual machine and other configuration information of the virtual machine is also collected to forecast the savings in shutting down the virtual machine while idle.
- the metric data may represent CPU usage, disk I/O usage, and network I/O usage.
- CPU usage is the CPU time as a percentage of the CPU's capacity.
- the CPU time is the amount of time that the CPU uses.
- CPU time differs from elapsed time which includes when the CPU is idle or waiting for the completion of an operation, such as an input/output (I/O) operation.
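The relationship between CPU time, elapsed time, and capacity described above can be sketched in a few lines (the helper name and numbers are illustrative, not from the patent):

```python
def cpu_usage_percent(cpu_time_s, elapsed_s, cores=1):
    """CPU usage as a percentage of capacity over an interval.

    Capacity is elapsed wall-clock time times the number of cores;
    idle and I/O-wait time count toward elapsed time, not CPU time.
    """
    return 100.0 * cpu_time_s / (elapsed_s * cores)

# A VM that used 30 s of CPU time over a 5-minute (300 s) interval
# on a single core is 10% busy; the other 90% was idle or waiting.
```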
- the disk I/O usage is a measurement of the active disk I/O time.
- the active disk I/O time is the amount of time that read and write I/O operations are performed to a logical disk.
- the network I/O usage is the amount of time taken to complete network I/O operations.
- the metric data is used in an ensemble learning methodology to train multiple time series forecasting models to predict when the CPU usage of the virtual machine may fall below an idle threshold. Multiple time series forecasting models are trained since the behavior of a time series of a virtual machine is unknown.
- the models are trained using different machine learning techniques such as: autoregressive integrated moving average (ARIMA); error, trend, seasonality (ETS); Trigonometric Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS); and a decomposable time series technique, such as Prophet.
- the model selected for a target virtual machine is then used in a subsequent production run to forecast when the target virtual machine will be idle.
- the virtual machine may be shut down temporarily at the forecasted idle time and restarted thereafter.
- the model predicts when the virtual machine will be idle in an upcoming time period based on a predicted time when the CPU usage of the virtual machine will fall below an idle threshold. In this manner, the customer of the cloud computing system saves on the cost of operating the virtual machine during the idle time.
- FIG. 1 illustrates a block diagram of an exemplary system 100 in which various aspects of the invention may be practiced.
- system 100 includes a cloud computing service 102 communicatively coupled to a cloud resource management system 104 .
- the cloud computing service 102 includes a number of computing resources that are made available to one or more customers through a network 105 .
- Examples of a cloud computing service include, without limitation, MICROSOFT AZURE® and GOOGLE CLOUD®.
- the cloud computing service 102 is composed of one or more data centers 106 .
- a data center 106 may be located in a particular geographic location.
- a data center 106 has one or more servers 108 .
- a server 108 includes a memory 110 , one or more network interfaces 112 , one or more CPUs 114 , and multiple storage devices 116 .
- the memory 110 may include one or more virtual machines 118 coupled to at least one hypervisor 120 .
- the cloud resource management system 104 is communicatively coupled to the cloud computing service 102 .
- the cloud resource management system 104 includes a VM monitor engine 122 , a machine learning engine 124 , and a forecast engine 126 .
- the VM monitor engine 122 monitors the virtual machines 118 operating on the cloud computing service 102 continuously over a time period.
- training usage data 138 and training metric data 140 are generated from the operation of a virtual machine 118 and sent to the VM monitor engine 122 (block 128 ).
- the training usage data 138 includes the type of operating system used by the virtual machine (e.g., Linux, Windows, PaaS), the size of the virtual machine, the location of the data center in which the virtual machine resides, the type of virtual machine, and the type of cloud computing service.
- a cloud computing service 102 may provision a platform service with different sizes that are based on the resources needed to provision a virtual machine. For example, in the MICROSOFT AZURE® cloud computing service, there are several sizes offered to provision a virtual machine based on the number of CPU cores, the size of the memory, the amount of temporary storage, the maximum number of network interface cards (NIC) and network bandwidth.
- An extra-small size of a virtual machine consists of a single CPU core, 0.768 gigabytes of memory, 20 gigabytes of temporary storage (e.g., disk storage), and a single NIC with low network bandwidth.
- a small size may include a single CPU core, 1.75 gigabytes of memory, 225 gigabytes of temporary storage, and a single NIC with moderate network bandwidth.
- a medium size may include two CPU cores, 3.5 gigabytes of memory, 490 gigabytes of temporary storage, and a single NIC with moderate network bandwidth.
- a large size may include four CPU cores, 7 gigabytes of memory, 1000 gigabytes of temporary storage, and two NICs with high network bandwidth.
- An extra-large size may include eight CPU cores, 14 gigabytes of memory, 2040 gigabytes of temporary storage, and four NICs with high network bandwidth. Other sizes are available having predefined amounts of CPU cores, memory, temporary storage, NICs, and network bandwidth.
- the training metric data 140 is generated for each equally-spaced time interval (e.g., five-minute interval) during a training period (e.g., twenty consecutive days).
- the training metric data 140 may include for each time interval, the CPU usage during each time interval, the amount of disk usage used during each time interval, and the amount of network usage during each time interval.
- Multiple metrics are used to train the ensemble of models since a single metric may not accurately capture the behavior of the virtual machine. For instance, a user might be in browser mode while the virtual machine is executing read/write operations to disk storage. In this situation, the CPU usage is low and the disk I/O usage is high. Relying solely on CPU usage as the only metric would therefore misrepresent the resource usage of the virtual machine.
- the machine learning engine 124 uses ensemble learning to train multiple models on the training usage and metric data of a virtual machine during a training period (block 130 ).
- parallel ensemble learning is used where different models are trained in parallel in order to exploit independence between the models.
- Various models are also trained since the behavior of the time series representing a virtual machine is not known before the virtual machine is monitored. The best model to represent the time series of a virtual machine will be selected based on training performance metrics for further prediction.
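A minimal sketch of parallel ensemble training, assuming stand-in forecasters (a mean, a naive, and a drift model) in place of the patent's ARIMA/ETS/TBATS/Prophet models:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in "fitters" for ARIMA, ETS, TBATS, and Prophet (illustrative only).
# Each takes a series and returns a forecast function: horizon -> predictions.
def fit_mean(series):
    mu = sum(series) / len(series)
    return lambda h: [mu] * h

def fit_naive(series):
    last = series[-1]
    return lambda h: [last] * h

def fit_drift(series):
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return lambda h: [series[-1] + slope * i for i in range(1, h + 1)]

def train_ensemble(series, fitters):
    """Fit independent models in parallel (parallel ensemble learning)."""
    with ThreadPoolExecutor(max_workers=len(fitters)) as pool:
        futures = {name: pool.submit(fit, series) for name, fit in fitters.items()}
        return {name: fut.result() for name, fut in futures.items()}

models = train_ensemble(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    {"mean": fit_mean, "naive": fit_naive, "drift": fit_drift},
)
```

Because the models are independent, a thread or process pool exploits that independence directly, as the parallel ensemble learning described above suggests.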
- the models are trained on time series forecasting techniques.
- the machine learning engine may perform ensemble learning using the following time series forecasting techniques: autoregressive integrated moving average (ARIMA); error, trend, seasonality (ETS); Trigonometric Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS); and Prophet.
- Forecasting is about predicting the future as accurately as possible using historical data and knowledge of future events. Forecasting situations differ enormously in the types of data patterns that occur over time, the time horizon, and the factors that affect future events.
- Time series forecasting is a technique that predicts a sequence of events from a time series. A time series is an ordered sequence of data points occurring at successive equally-spaced points in time. The time series is analyzed to identify patterns with the assumption that these patterns will exist in the future.
- a time series is defined by the following factors: level; trend; seasonality; and noise.
- the level is the baseline or average value in the time series.
- a trend exists when there is an increase or decrease in the data.
- a seasonal pattern or seasonality occurs when a time series is affected by seasonal factors such as the time of the day, the day of the week, the hour of the day, etc. Noise exists when there is variability in the data that cannot be explained by the model.
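The four factors can be made concrete by generating a synthetic series as their sum; the parameter values below are illustrative (a daily cycle sampled at 5-minute intervals gives a period of 288):

```python
import math
import random

def synthetic_series(n, level=50.0, trend=0.1, season_amp=10.0,
                     period=288, noise_sd=1.0, seed=0):
    """Build a time series as level + trend + seasonality + noise.

    period=288 models a daily cycle sampled at 5-minute intervals.
    """
    rng = random.Random(seed)
    return [
        level
        + trend * t
        + season_amp * math.sin(2 * math.pi * t / period)
        + rng.gauss(0, noise_sd)
        for t in range(n)
    ]
```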
- the ARIMA model uses a weighted sum of recent past observations, where an exponentially decreasing weight is used for the past observations. In this manner, ARIMA accounts for the growth or decline in the time series data, the rate of change of that growth or decline, and the noise between consecutive time points. ARIMA is typically used for non-stationary data. A time series is stationary if its statistical properties, such as the mean and variance, are constant over time. A time series is non-stationary when it has a variable variance and a changing mean. ARIMA uses differencing to transform a non-stationary time series into a stationary time series before identifying the pattern.
- ARIMA stands for AutoRegressive Integrated Moving Average and is typically represented as ARIMA(p, d, q).
- An autoregressive model uses the dependent relationship between an observation and p lagged observations (i.e., previous values).
- An integrated model uses the differencing technique to make the time series stationary by subtracting an observation from the one at the previous time step, and d is the number of times the raw observations are differenced.
- the moving average model specifies that an observation depends linearly on current or past residual errors and q is the order of the moving average model.
- the techniques used to generate an ARIMA model are described in Wei, W. W. S. (1979), “Some consequences of temporal aggregation in seasonal time series models”, https://www.census.gov/ts/papers/Conference1978/Wei1978.pdf, which is hereby incorporated by reference.
- the Akaike Information Criterion (AIC) is used to fit the training data to obtain the estimated parameters p, d, and q for the ARIMA model.
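Two of the mechanical pieces just described, the differencing step (the d in ARIMA(p, d, q)) and an AIC score computed from model residuals, can be sketched as follows; these are simplified illustrations, not the patent's implementation:

```python
import math

def difference(series, d=1):
    """Apply d rounds of first-order differencing (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def gaussian_aic(residuals, n_params):
    """AIC under Gaussian errors: n * ln(RSS / n) + 2k (lower is better).

    Used to compare candidate (p, d, q) orders when fitting ARIMA.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params
```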
- ETS is an acronym for Error, Trend and Seasonality.
- ETS is an exponential smoothing method to explicitly model error, trend and seasonality in a time series. Exponential smoothing uses an exponential window function to smooth a time series. This method computes a weighted average on past observations with the weights decaying exponentially as the observations get older. ETS is preferable for virtual machines that have strong seasonal patterns and is used to quickly capture the day of the week and hour of the day seasonal effects. ETS is described more fully in Hyndman, R. J., Koehler, A. B., Snyder, R. D., Grose, S., “A state space framework for automatic forecasting using exponential smoothing methods”, International Journal of Forecasting, 18, 439-454 (2002), which is hereby incorporated by reference.
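The exponentially decaying weighted average at the core of ETS can be illustrated with simple (non-seasonal) exponential smoothing, where alpha is the smoothing factor; this is a minimal sketch, not the full ETS state-space model:

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Smooth a series; each value is a weighted average of past
    observations whose weights decay exponentially with age."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```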
- TBATS is an acronym for Trigonometric Seasonal, Box-Cox Transformation, ARIMA residuals, Trend and Seasonality.
- TBATS can model multiple seasonal effects, high-frequency seasonality and non-integer seasonality.
- a virtual machine may have multiple seasonal patterns, such as different monthly, weekly and daily seasonality, which would be more readily captured by TBATS.
- TBATS is further described in De Livera, et al., "Forecasting time series with complex seasonal patterns using exponential smoothing", Journal of the American Statistical Association, 106(496), 1513-1527 (2011), which is hereby incorporated by reference.
- Prophet is a time series decomposition technique that models a time series as a combination of trend, seasonality, and noise components.
- the technique does not require much prior knowledge and can automatically discover seasonal trends and other periodic usage patterns. This model is more interpretable and can better capture the predicted trends of various scales of the time series data.
- Prophet uses a Bayesian-based curve fitting method to predict time series data. A Fourier series is used to represent multi-period seasonality, and Stan's maximum a posteriori (MAP) estimation is used to fit the model parameters.
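The Fourier-series representation of seasonality that Prophet relies on can be sketched as a feature generator (a hypothetical helper; Prophet's actual implementation differs):

```python
import math

def fourier_features(t, period, order):
    """Sin/cos pairs up to `order`: the Fourier terms used to represent
    seasonality with the given period at time t."""
    feats = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats
```

A regression of the series on these features (plus a trend term) recovers a smooth multi-period seasonal component.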
- a portion of the training data is reserved to test each of the trained models (block 130 ). At least one of the models is selected for use in a production run for a target virtual machine (block 130 ). The model having the closest forecasted results to the actual results is selected for the production run (block 130 ).
- the selected model is used by the forecast engine 126 with production usage data 142 and production metric data 144 to forecast the time when the CPU usage will be below the idle threshold (block 132 ).
- the forecast engine 126 may utilize the usage data to produce cost estimates of the savings in shutting down a virtual machine during a forecasted idle time (block 132 ).
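A back-of-the-envelope version of such a cost estimate, assuming a flat hourly billing rate (the rate and helper name are hypothetical):

```python
def estimated_savings(idle_minutes, hourly_rate):
    """Cost avoided by shutting the VM down during forecasted idle time,
    assuming a flat hourly billing rate (hypothetical pricing model)."""
    return (idle_minutes / 60.0) * hourly_rate
```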
- This forecast may be used to automatically shut down the virtual machine at the forecasted time and to turn the virtual machine back on thereafter (block 134).
- the forecast may be provided to the user of the virtual machine along with the estimated savings in order for the user to decide whether or not to shut down the virtual machine (block 134).
- the user may direct the cloud resource management system 104 to take an appropriate action, such as shutting down the virtual machine for a limited time span, increasing usage of the virtual machine, ignoring the forecast, and/or reducing the amount of resources consumed by the virtual machine (block 134).
- Although the system 100 as shown in FIG. 1 has a limited number of elements in a certain topology, it may be appreciated that the system 100 may include more or fewer elements in alternate topologies as desired for a given implementation.
- the cloud resource management system 104 may be incorporated into a data center 106 or part of the cloud computing service 102 .
- the machine learning engine 124 and the model 146 may be incorporated into the forecast engine 126 .
- a virtual machine is monitored during a training period to obtain metric data representing a behavior of the virtual machine (block 202 ).
- the training period is a consecutive time period in which the metric data is generated at equally-spaced time intervals.
- the training period may be twenty-eight consecutive days and the metric data is generated at every five-minute interval during the twenty-eight consecutive days.
- the metric data may include the CPU usage, disk I/O usage and network I/O usage.
- the usage data may include the sizes of the resources provisioned to the virtual machine (e.g., memory, CPUs, storage devices, NIC, etc.), the type of virtual machine, the location of the data center hosting the virtual machine and the class of the virtual machine.
- the usage and metric data that is collected is then split between training data and testing data (block 204 ).
- the split may be 50% training data and 50% testing data.
- the portions of the split may vary to suit an intended purpose.
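A chronological split, as opposed to a random one, keeps the test data strictly in the future of the training data; a minimal sketch:

```python
def time_series_split(series, train_frac=0.5):
    """Chronological split: earlier points train, later points test,
    so models are evaluated on genuinely unseen future intervals."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]
```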
- the training data is used to train each of the time series forecasting models in parallel (block 206).
- the testing data is used to test each of the time series forecasting models (block 206 ).
- the time series forecasting models may include an ARIMA model, a TBATS model, an ETS model, and a decomposable time series model, such as Prophet.
- each of the models is tested with the test data to forecast when the CPU usage will be below an idle threshold (block 208 ).
- the forecasts from each of the models is compared with actual CPU usage results and the model having the most accurate result is selected (block 208 ).
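The selection step can be sketched as picking the model with the lowest forecast error on the held-out data; mean absolute error is used here for illustration (the patent does not name a specific accuracy measure):

```python
def mean_absolute_error(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def select_best_model(forecasts, actual):
    """Pick the model whose forecast is closest to the held-out actuals."""
    return min(forecasts, key=lambda name: mean_absolute_error(forecasts[name], actual))

# Hypothetical held-out CPU-usage actuals and per-model forecasts:
actual = [5.0, 6.0, 7.0]
forecasts = {"ets": [5.5, 6.5, 7.5], "arima": [5.1, 6.1, 6.9]}
best = select_best_model(forecasts, actual)
```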
- FIG. 3 illustrates an exemplary method for predicting the idle time of a virtual machine using a time series forecasting model.
- the cloud resource management system 104 monitors a target virtual machine during a first time period to collect production usage data and production metric data (block 302 ).
- the production usage data includes the type of operating system used by the virtual machine (e.g., Linux, Windows, PaaS), the size of the virtual machine, the location of the data center in which the virtual machine resides, the type of virtual machine, and the type of cloud computing service.
- the production metric data includes the CPU usage, disk I/O usage, and network I/O usage and is collected at equally-spaced time intervals (block 304 ).
- the production metric data is a time series that is then input to the time series forecasting model to forecast when the CPU usage of the virtual machine will be below an idle threshold, such as below 5% of the CPU usage for a future time period.
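Scanning a forecast for intervals below the idle threshold can be sketched as follows (the 5% threshold and 5-minute interval follow the text; the helper name is hypothetical):

```python
def forecast_idle_times(cpu_forecast, threshold=5.0, interval_minutes=5):
    """Minute offsets (from the start of the forecast window) at which
    predicted CPU usage falls below the idle threshold."""
    return [
        i * interval_minutes
        for i, usage in enumerate(cpu_forecast)
        if usage < threshold
    ]

# Predicted CPU usage (%) for six 5-minute intervals:
idle = forecast_idle_times([12.0, 8.0, 4.0, 3.0, 6.0, 2.0])
```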
- the cloud resource management system 104 may take one of several actions (block 306 ). If the user of the virtual machine has configured the virtual machine for an automatic shutdown, the system may initiate actions to automatically shut down the virtual machine for a predetermined length of time. The virtual machine may be restarted after the forecasted idle time. The system may signal the hypervisor to shut down the virtual machine for the intended time period. Alternatively, the user may be informed of the idle time and provided with a cost estimate of the savings in shutting down the virtual machine. The user may initiate actions to shut down the virtual machine, reduce resources provisioned to the virtual machine, ignore the idle time forecast, or take any other action.
- a training period is set to a 28-day consecutive time period and the testing period is set to one day immediately following the training period.
- Metric data from a virtual machine having been operational for at least 21 days during the training period is collected and used to train each of the time series forecasting models.
- One model is selected and then used to forecast when during the next day the virtual machine will be idle by predicting the next day's CPU usage.
- the time when the CPU usage is predicted to be below the idle threshold of the virtual machine, such as below 5% of the virtual machine's CPU usage, is then the forecasted idle time.
- This forecasted idle time may be used to shut down the virtual machine for a predetermined length of time, such as 15 minutes.
- a time series model may be trained for a virtual machine with training data collected during a training period that may span one day to seven days.
- the time series model may be used to predict when in the next 24 hours the CPU usage, disk I/O usage, and/or network usage may fall below an idle threshold. This forecasted time may be used to shut down the virtual machine for the forecasted idle time and restart it thereafter.
- FIG. 4 illustrates an exemplary operating environment 400 that includes at least one computing device of the cloud computing service 402 and at least one computing device of the cloud resource management system 404 communicatively coupled through a network 406.
- the computing devices 402 , 404 may be any type of electronic device, such as, without limitation, a mobile device, a personal digital assistant, a mobile computing device, a smart phone, a cellular telephone, a handheld computer, a server, a server array or server farm, a web server, a network server, a blade server, an Internet server, a work station, a mini-computer, a mainframe computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, multiprocessor systems, or combination thereof.
- the operating environment 400 may be configured in a network environment, a distributed environment, a multi-processor environment, or a stand-alone computing device having access to remote or local storage devices.
- the computing device of the cloud computing service 402 may include one or more processors 408 , a communication interface 410 , one or more storage devices 412 , one or more input and output devices 414 , and a memory 416 .
- a processor 408 may be any commercially available or customized processor and may include dual microprocessors and multi-processor architectures.
- the communication interface 410 facilitates wired or wireless communications between the client machines 402 and other devices.
- a storage device 412 may be a computer-readable medium that does not contain propagating signals, such as modulated data signals transmitted through a carrier wave.
- Examples of a storage device 412 include without limitation RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, all of which do not contain propagating signals, such as modulated data signals transmitted through a carrier wave.
- the input devices 414 may include a keyboard, mouse, pen, voice input device, touch input device, etc., and any combination thereof.
- the output devices 414 may include a display, speakers, printers, etc., and any combination thereof.
- the memory 416 may be any non-transitory computer-readable storage media that may store executable procedures, applications, and data.
- the computer-readable storage media does not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave. It may be any type of non-transitory memory device (e.g., random access memory, read-only memory, etc.), magnetic storage, volatile storage, non-volatile storage, optical storage, DVD, CD, floppy disk drive, etc. that does not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave.
- the memory 416 may also include one or more external storage devices or remotely located storage devices that do not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave.
- the memory 416 may contain instructions, components, and data.
- a component is a software program that performs a specific function and is otherwise known as a module, program, engine, and/or application.
- the memory 416 may include an operating system 418, one or more hypervisors 420, one or more guest operating systems 422, one or more virtual machines 424, and other applications and data 426.
- a computing device of the cloud resource management system 404 may include one or more processors 434 , a communication interface 438 , one or more storage devices 440 , one or more input and output devices 442 , and a memory 444 .
- a processor 434 may be any commercially available or customized processor and may include dual microprocessors and multi-processor architectures.
- the communication interface 438 facilitates wired or wireless communications between the server machine 404 and other devices.
- a storage device 440 may be a computer-readable medium that does not contain propagating signals, such as modulated data signals transmitted through a carrier wave.
- Examples of a storage device 440 include without limitation RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, all of which do not contain propagating signals, such as modulated data signals transmitted through a carrier wave.
- the input devices 442 may include a keyboard, mouse, pen, voice input device, touch input device, etc., and any combination thereof.
- the output devices 442 may include a display, speakers, printers, etc., and any combination thereof.
- the memory 444 may be any non-transitory computer-readable storage media that may store executable procedures, applications, and data.
- the computer-readable storage media does not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave. It may be any type of non-transitory memory device (e.g., random access memory, read-only memory, etc.), magnetic storage, volatile storage, non-volatile storage, optical storage, DVD, CD, floppy disk drive, etc. that does not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave.
- the memory 444 may also include one or more external storage devices or remotely located storage devices that do not pertain to propagated signals, such as modulated data signals transmitted through a carrier wave.
- the memory 444 may contain instructions, components, and data.
- a component is a software program that performs a specific function and is otherwise known as a module, program, and/or application.
- the memory 444 may include an operating system 446, a machine learning engine 448, a forecast engine 450, training data 452, test data 454, production data 456, an ARIMA model 458, an ETS model 460, a TBATS model 462, a Prophet model 464, and a forecast 466.
- the network 406 may employ a variety of wired and/or wireless communication protocols and/or technologies.
- Various generations of different communication protocols and/or technologies that may be employed by a network may include, without limitation, Global System for Mobile Communication (GSM), General Packet Radio Services (GPRS), Enhanced Data GSM Environment (EDGE), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access 2000, (CDMA-2000), High Speed Downlink Packet Access (HSDPA), Long Term Evolution (LTE), Universal Mobile Telecommunications System (UMTS), Evolution-Data Optimized (Ev-DO), Worldwide Interoperability for Microwave Access (WiMax), Time Division Multiple Access (TDMA), Orthogonal Frequency Division Multiplexing (OFDM), Ultra Wide Band (UWB), Wireless Application Protocol (WAP), User Datagram Protocol (UDP), Transmission Control Protocol/Internet Protocol (TCP/IP), any portion of the Open Systems Interconnection (OSI) model protocols, Session Initiated Protocol/Real-Time
- the subject matter described herein is not limited to the configuration of components shown in FIG. 4 .
- the components of the computing device of the cloud computing service and the cloud resource management system may be incorporated into one computing device or cloud service.
- a system having at least one processor and a memory coupled to the at least one processor.
- the at least one processor is configured to: receive metric data of a virtual machine, the metric data including CPU usage of the virtual machine at equally-spaced time points over a first time period; train at least one time series forecasting model on the metric data for the first time period; apply the time series forecasting model to determine the CPU usage of the virtual machine at a time interval succeeding the first time period; and when the forecasted CPU usage is below a threshold, initiate actions to reduce resource consumption of the virtual machine.
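The claimed forecast-and-act loop can be sketched as follows. Simple exponential smoothing stands in here for the patent's ARIMA/ETS/TBATS/Prophet models, and the 5% idle threshold, the helper names, and the sample data are illustrative assumptions, not details from the patent.

```python
# Minimal sketch: forecast the next interval's CPU usage from an
# equally-spaced history, then act when it falls below a threshold.
# Simple exponential smoothing is a stand-in for the claimed models.

def ses_forecast(series, alpha=0.3):
    """One-step-ahead simple exponential smoothing forecast."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def plan_action(cpu_usage_pct, idle_threshold_pct=5.0):
    """Decide what to do with the VM for the interval after the history."""
    forecast = ses_forecast(cpu_usage_pct)
    if forecast < idle_threshold_pct:
        return "reduce_resources", forecast  # e.g. deallocate or shut down
    return "keep_running", forecast

# CPU percentages sampled at equally-spaced time points.
history = [4.0, 3.5, 3.0, 2.5, 2.0, 2.0]
action, forecast = plan_action(history)  # action == "reduce_resources"
```

In a production system the stand-in forecaster would be replaced by a fitted statistical model, but the thresholded decision at the end is the same.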
- the metric data includes one or more of disk I/O usage and network I/O usage.
- the plurality of time series forecasting models includes ARIMA, ETS, TBATS, and Prophet.
- the reduction of the resource consumption of the virtual machine comprises shutting down the virtual machine.
- the at least one processor is further configured to: apply ensemble learning to train a plurality of time series forecasting models on the metric data; select one of the plurality of time series forecasting models to forecast an idle time of the virtual machine based on the CPU usage of the virtual machine; and/or train the at least one time series forecasting model with usage data of the virtual machine, the usage data including sizes of resources used to provision the virtual machine.
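The ensemble step above amounts to training several candidate forecasters on one window of the series and keeping whichever scores best on a held-out tail. In this sketch, toy "naive" and "mean" forecasters stand in for ARIMA, ETS, TBATS, and Prophet; the MAE criterion and the 3-point holdout are assumptions of the example.

```python
# Train each candidate on the head of the series, score on the tail,
# and return the name of the best-scoring model.

def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon

def mean_model(train, horizon):
    """Repeat the long-run average."""
    return [sum(train) / len(train)] * horizon

def select_model(series, holdout=3):
    """Return the name of the candidate with the lowest holdout MAE."""
    models = {"naive": naive, "mean": mean_model}
    train, test = series[:-holdout], series[-holdout:]

    def mae(name):
        preds = models[name](train, len(test))
        return sum(abs(p - a) for p, a in zip(preds, test)) / len(test)

    return min(models, key=mae)

# A steadily falling CPU trace: the last-value model tracks the drop far
# better than the long-run mean, so it should win the selection.
cpu = [50, 45, 40, 35, 30, 10, 9, 8]
best = select_model(cpu)  # "naive"
```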
- a method comprises: obtaining a time series forecasting model trained to predict a future idle time of a virtual machine; receiving metric data during a production run of the virtual machine during a first time period; applying the time series forecasting model to determine the future idle time of the virtual machine; and initiating measures to shut down the virtual machine during the idle time.
- the method further comprises: determining the future idle time of the virtual machine based on monitoring CPU usage of the virtual machine at a time period immediately preceding the idle time.
- the future idle time of the virtual machine is based on monitoring disk I/O usage and network I/O usage.
- the initiation of the measures to shut down the virtual machine includes requesting permission from a user of the virtual machine to shut down the virtual machine.
- the time series forecasting model is at least one of ARIMA, TBATS, ETS, or a decomposable time series model.
- the future idle time is based on CPU usage forecasted to be below a threshold.
- the method further comprises: monitoring the virtual machine over a time period to obtain usage data and metric data; and training the time series forecasting model with the usage data and the metric data.
- the usage data includes sizes of resources used to provision the virtual machine.
- the metric data includes CPU usage, network I/O usage, and disk I/O usage obtained at equally-spaced time intervals. Prior to initiating measures to shut down the virtual machine during the idle time, a user of the virtual machine is informed of the forecasted idle time.
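Raw telemetry rarely arrives at equally-spaced points, so producing the series the claims call for typically involves bucketing samples into fixed intervals. The following sketch assumes Unix-second timestamps, a 300-second bucket width, per-bucket averaging, and forward-filling of empty buckets, none of which are specified by the patent.

```python
# Turn irregular (timestamp, value) samples into an equally-spaced series.

def resample(samples, interval_s=300):
    """samples: list of (unix_ts, value) pairs, in any order.
    Returns one value per fixed-width bucket, earliest bucket first."""
    if not samples:
        return []
    start = min(ts for ts, _ in samples)
    buckets = {}
    for ts, value in samples:
        buckets.setdefault((ts - start) // interval_s, []).append(value)
    series, prev = [], None
    for i in range(max(buckets) + 1):
        if i in buckets:
            prev = sum(buckets[i]) / len(buckets[i])
        # Empty buckets carry the previous value forward so spacing stays equal.
        series.append(prev)
    return series

raw = [(0, 40.0), (100, 44.0), (310, 20.0), (930, 5.0)]
series = resample(raw)  # [42.0, 20.0, 20.0, 5.0]
```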
- a device having at least one processor and a memory coupled to the at least one processor.
- the memory includes instructions that, when executed on the at least one processor, perform actions that: forecast a future idle time of a virtual machine executing on a computing device, the forecast achieved through use of a time series forecasting model trained on historical metric data and usage data of the virtual machine, the historical metric data including a time series of equally-spaced data points representing a CPU usage of the virtual machine, the historical usage data including physical dimensions of resources consumed by the virtual machine, the forecast being below an idle threshold for the virtual machine; and automatically shut down the virtual machine at the future idle time.
- the memory includes further instructions that, when executed on the at least one processor, perform additional actions that: apply ensemble learning to train a plurality of time series forecasting models to predict when the CPU usage of the virtual machine will be below the idle threshold.
- the plurality of time series forecasting models includes a decomposable time series model, ARIMA, TBATS, and ETS.
- the historical metric data further includes disk I/O usage and network I/O usage. Automatic shutdown of the virtual machine is performed upon concurrence of a user of the virtual machine.
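Locating the forecasted idle time can be sketched as a scan of the forecast for the first contiguous run of points below the idle threshold. The threshold value, the minimum run length of two points, and the index-based window representation are assumptions of this example.

```python
# Find the first contiguous run of forecast points below the idle threshold.

def find_idle_window(forecast, threshold=5.0, min_len=2):
    """Return (start, end) indices of the first idle run, or None."""
    run_start = None
    for i, value in enumerate(forecast):
        if value < threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_len:
                end = i
                # Extend the window to the end of the below-threshold run.
                while end + 1 < len(forecast) and forecast[end + 1] < threshold:
                    end += 1
                return (run_start, end)
        else:
            run_start = None
    return None

fc = [30.0, 12.0, 4.0, 3.5, 2.0, 9.0]
window = find_idle_window(fc)  # (2, 4): points 2..4 are below the threshold
```

In a deployment, the returned window would be mapped back to wall-clock times using the forecast's interval width, and the shutdown (with user concurrence, per the claim) scheduled inside it.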
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
Claims (19)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/297,694 US11526370B2 (en) | 2019-03-10 | 2019-03-10 | Cloud resource management using machine learning |
| PCT/US2020/016681 WO2020185329A1 (en) | 2019-03-10 | 2020-02-04 | Cloud resource management using machine learning |
| EP20709064.8A EP3938901A1 (en) | 2019-03-10 | 2020-02-04 | Cloud resource management using machine learning |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/297,694 US11526370B2 (en) | 2019-03-10 | 2019-03-10 | Cloud resource management using machine learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200285503A1 (en) | 2020-09-10 |
| US11526370B2 (en) | 2022-12-13 |
Family
ID=69740851
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/297,694 Active 2039-09-27 US11526370B2 (en) | 2019-03-10 | 2019-03-10 | Cloud resource management using machine learning |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US11526370B2 (en) |
| EP (1) | EP3938901A1 (en) |
| WO (1) | WO2020185329A1 (en) |
Families Citing this family (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10896069B2 (en) * | 2018-03-16 | 2021-01-19 | Citrix Systems, Inc. | Dynamically provisioning virtual machines from remote, multi-tier pool |
| US11663493B2 (en) | 2019-01-30 | 2023-05-30 | Intuit Inc. | Method and system of dynamic model selection for time series forecasting |
| US11620150B2 (en) * | 2019-07-31 | 2023-04-04 | Okestro Co., Ltd. | Virtual machine management method using virtual machine deployment simulation |
| US11755372B2 (en) | 2019-08-30 | 2023-09-12 | Microstrategy Incorporated | Environment monitoring and management |
| US11714658B2 (en) * | 2019-08-30 | 2023-08-01 | Microstrategy Incorporated | Automated idle environment shutdown |
| US12423162B2 (en) * | 2019-09-16 | 2025-09-23 | Oracle International Corporation | Anomaly detection using forecasting computational workloads |
| US12423155B2 (en) * | 2019-09-16 | 2025-09-23 | Oracle International Corporation | Multi-layer forecasting of computational workloads |
| US11657302B2 (en) * | 2019-11-19 | 2023-05-23 | Intuit Inc. | Model selection in a forecasting pipeline to optimize tradeoff between forecast accuracy and computational cost |
| US11423250B2 (en) | 2019-11-19 | 2022-08-23 | Intuit Inc. | Hierarchical deep neural network forecasting of cashflows with linear algebraic constraints |
| US11494697B2 (en) * | 2020-02-27 | 2022-11-08 | Hitachi, Ltd. | Method of selecting a machine learning model for performance prediction based on versioning information |
| US11556791B2 (en) * | 2020-04-01 | 2023-01-17 | Sas Institute Inc. | Predicting and managing requests for computing resources or other resources |
| US11436000B2 (en) * | 2020-10-19 | 2022-09-06 | Oracle International Corporation | Prioritized non-active memory device update |
| US11609794B2 (en) * | 2020-11-10 | 2023-03-21 | Oracle International Corporation | Techniques for modifying cluster computing environments |
| CN112433819B (en) * | 2020-11-30 | 2024-04-19 | 中国科学院深圳先进技术研究院 | Simulation method and device for heterogeneous cluster scheduling, computer equipment and storage medium |
| CN112734492A (en) * | 2021-01-18 | 2021-04-30 | 广州虎牙科技有限公司 | Prediction model construction method, data prediction method, device, electronic equipment and readable storage medium |
| US20220229685A1 (en) * | 2021-01-21 | 2022-07-21 | Capital One Services, Llc | Application execution on a virtual server based on a key assigned to a virtual network interface |
| US11237813B1 (en) * | 2021-01-29 | 2022-02-01 | Splunk Inc. | Model driven state machine transitions to configure an installation of a software program |
| LU102509B1 (en) * | 2021-02-12 | 2022-08-17 | Microsoft Technology Licensing Llc | Multi-layered data center capacity forecasting system |
| US20220343187A1 (en) * | 2021-04-23 | 2022-10-27 | Samya.Ai Technologies Private Limited | System and method for estimating metric forecasts associated with related entities with more accuracy by using a metric forecast entity relationship machine learning model |
| US12149417B2 (en) * | 2021-07-14 | 2024-11-19 | Hughes Network Systems, Llc | Efficient maintenance for communication devices |
| CN113687867B (en) * | 2021-08-24 | 2023-12-29 | 济南浪潮数据技术有限公司 | Shutdown method, system, equipment and storage medium of cloud platform cluster |
| CN113688929B (en) * | 2021-09-01 | 2024-02-23 | 睿云奇智(重庆)科技有限公司 | Predictive model determination method, device, electronic equipment and computer storage medium |
| CN116028201A (en) * | 2021-10-26 | 2023-04-28 | 中国移动通信集团贵州有限公司 | Service system capacity prediction method and device |
| CN114518959A (en) * | 2022-02-21 | 2022-05-20 | 中国联合网络通信集团有限公司 | Distributed node resource load balancing method and device and electronic equipment |
| US12340244B2 (en) * | 2022-03-15 | 2025-06-24 | Dell Products L.P. | Device management based on degradation and workload |
| US20240004685A1 (en) * | 2022-07-01 | 2024-01-04 | Citrix Systems, Inc. | Virtual Machine Managing System Using Snapshot |
| US20240012667A1 (en) * | 2022-07-11 | 2024-01-11 | Dell Products L.P. | Resource prediction for microservices |
| US20240103925A1 (en) * | 2022-09-28 | 2024-03-28 | Oracle International Corporation | Framework for effective stress testing and application parameter prediction |
| CN115934319A (en) * | 2022-11-22 | 2023-04-07 | 上海联蔚盘云科技有限公司 | Method and device for cloud resource optimization |
| CN116415687B (en) * | 2022-12-29 | 2023-11-21 | 江苏东蓝信息技术有限公司 | Artificial intelligent network optimization training system and method based on deep learning |
| US20250004858A1 (en) * | 2023-06-30 | 2025-01-02 | International Business Machines Corporation | Reinforcement learning policy serving and training framework in production cloud systems |
| CN117812081B (en) * | 2023-12-11 | 2025-11-04 | 天翼云科技有限公司 | An automatic prediction method and cloud resource adjustment system based on attention mechanism |
| CN117648173B (en) * | 2024-01-26 | 2024-05-14 | 杭州阿里云飞天信息技术有限公司 | Resource scheduling method and device |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180349168A1 (en) * | 2017-05-30 | 2018-12-06 | Magalix Corporation | Systems and methods for managing a cloud computing environment |
| US20200104189A1 (en) * | 2018-10-01 | 2020-04-02 | Vmware, Inc. | Workload placement with forecast |
| US20200125419A1 (en) * | 2018-10-23 | 2020-04-23 | Vmware, Inc. | Anticipating future resource consumption based on user sessions |
| US20200209946A1 (en) * | 2018-12-31 | 2020-07-02 | Bmc Software, Inc. | Power management for virtual machines |
| US20200311617A1 (en) * | 2017-11-22 | 2020-10-01 | Amazon Technologies, Inc. | Packaging and deploying algorithms for flexible machine learning |
Application events
- 2019-03-10: US application US16/297,694 granted as US11526370B2 (active)
- 2020-02-04: PCT application PCT/US2020/016681 published as WO2020185329A1 (ceased)
- 2020-02-04: EP application EP20709064.8A published as EP3938901A1 (pending)
Non-Patent Citations (11)
| Title |
|---|
| "International Search Report and Written Opinion issued in PCT Application No. PCT/US20/016681", dated Apr. 29, 2020, 13 Pages. |
| Damra, Bassel, "Auto-stop your VM based on CPU utilization (Azure Automation)", Retrieved From: https://www.linkedin.com/pulse/auto-stop-your-vm-based-cpu-utilization-azure-automation-bassel-damra, Nov. 26, 2018, 8 Pages. |
| Livera, et al., "Forecasting Time Series With Complex Seasonal Patterns Using Exponential Smoothing", In Working Paper 15/09, Department of Econometrics and Business Statistics, Monash University, Oct. 28, 2010, 40 Pages. |
| Ngo, et al., "The Box-Jenkins Methodology for Time Series Models", In Proceedings of the SAS Global Forum Conference, vol. 6, Paper 454-2013, Apr. 28, 2013, pp. 1-11. |
| Nikos, "Improving your Forecasts using Multiple Temporal Aggregation", Retrieved From: https://web.archive.org/web/20190208114549/https://kourentzes.com/forecasting/2014/05/26/improving-forecasting-via-multiple-temporal-aggregation/, May 26, 2014, 11 Pages. |
| Paulraj, "A combined forecast-based virtual machine migration in cloud data centers", Elsevier Ltd (Year: 2018). * |
| Renyu Yang, "Intelligent Resource Scheduling at Scale: a Machine Learning Perspective", IEEE https://doi.org/10.1109/SOSE.2018.00025 (Year: 2018). * |
| Sharkh, "An evergreen cloud: Optimizing energy efficiency in heterogeneous cloud computing architectures", 2017 , Elsevier (Year: 2017). * |
| Sharma, Girish, "Azure Alerts—How to create alert rules in Azure monitoring", Retrieved From: https://www.youtube.com/watch?v=2NMj9EIR?los, Sep. 3, 2018, 7 Pages. |
| Taylor, et al., "Forecasting at Scale", Published by PeerJ Preprints, Sep. 27, 2017, pp. 1-25. |
| Wei, et al., "Some Consequences of Temporal Aggregation in Seasonal Time Series Models", In NBER Book Seasonal Analysis of Economic Time Series, Jan. 1978, pp. 433-448. |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2020185329A1 (en) | 2020-09-17 |
| US20200285503A1 (en) | 2020-09-10 |
| EP3938901A1 (en) | 2022-01-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11526370B2 (en) | Cloud resource management using machine learning | |
| US11645293B2 (en) | Anomaly detection in big data time series analysis | |
| US10452983B2 (en) | Determining an anomalous state of a system at a future point in time | |
| US11341097B2 (en) | Prefetching based on historical use and real-time signals | |
| EP4066118B1 (en) | Computer network with time series seasonality-based performance alerts | |
| US20200265119A1 (en) | Site-specific anomaly detection | |
| US20190311297A1 (en) | Anomaly detection and processing for seasonal data | |
| EP3089034A1 (en) | System and method for optimizing energy consumption by processors | |
| US9609074B2 (en) | Performing predictive analysis on usage analytics | |
| US20160019271A1 (en) | Generating synthetic data | |
| US20190220345A1 (en) | Forecasting workload transaction response time | |
| US11966775B2 (en) | Cloud native adaptive job scheduler framework for dynamic workloads | |
| CN113806122B (en) | Robust anomaly and change detection using sparse decomposition | |
| WO2017045472A1 (en) | Resource prediction method and system, and capacity management apparatus | |
| US20210240459A1 (en) | Selection of deployment environments for applications | |
| CN103971170A (en) | Method and device for forecasting changes of feature information | |
| WO2017040435A1 (en) | Predicting service issues by detecting anomalies in event signal | |
| RU2640637C2 (en) | Method and server for conducting controlled experiment using prediction of future user behavior | |
| US20160004566A1 (en) | Execution time estimation device and execution time estimation method | |
| CN107608781A (en) | A kind of load predicting method, device and network element | |
| US20160203038A1 (en) | Predicting a likelihood of a critical storage problem | |
| CN114911627A (en) | Method and device for adjusting number of micro-service instances | |
| JP2009193205A (en) | Automatic tuning system, automatic tuning device, and automatic tuning method | |
| CN117370065B (en) | An abnormal task determination method, electronic device and storage medium | |
| CN117193980A (en) | Task remaining duration calculation method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC., WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KAMATH, TANMAYEE PRAKASH;RAMANATHAN CHANDRASEKHAR, ARUN;SUBRAMANIAN, BALAN;AND OTHERS;SIGNING DATES FROM 20190305 TO 20190315;REEL/FRAME:048617/0854 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |