WO2016040699A1 - Computing instance launch time - Google Patents

Computing instance launch time

Info

Publication number
WO2016040699A1
Authority
WO
WIPO (PCT)
Prior art keywords
launch
computing
computing instance
features
instance
Application number
PCT/US2015/049521
Other languages
French (fr)
Inventor
Anton André EICHER
Matthew James EDDEY
Richard Alan HAMMAN
Original Assignee
Amazon Technologies, Inc.
Priority claimed from US14/482,812 external-priority patent/US9971971B2/en
Priority claimed from US14/482,789 external-priority patent/US10402746B2/en
Priority claimed from US14/482,841 external-priority patent/US9591094B2/en
Application filed by Amazon Technologies, Inc. filed Critical Amazon Technologies, Inc.
Priority to EP15772087.1A priority Critical patent/EP3191948A1/en
Priority to CN201580048245.4A priority patent/CN107077385B/en
Priority to JP2017511699A priority patent/JP6564023B2/en
Publication of WO2016040699A1 publication Critical patent/WO2016040699A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/62 Establishing a time schedule for servicing the requests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5019 Workload prediction

Abstract

A technology is described for predicting a launch time for a computing instance. An example method may include receiving a request for a predicted launch time to launch a computing instance on a physical host within a computing service environment. Data associated with launch features of a computing instance may then be obtained, where the launch features may be determined to have an impact on a launch time of the computing instance on a physical host within a computing service environment. The launch features of the computing instance may then be input to a machine learning model that outputs the predicted launch time for launching the computing instance within the computing service environment.

Description

COMPUTING INSTANCE LAUNCH TIME
BACKGROUND
[0001] The advent of virtualization technologies for computing resources has provided benefits with respect to managing large-scale computing resources for many customers with diverse needs and has allowed various computing resources or computing services to be efficiently and securely shared by multiple customers. For example, virtualization technologies may allow a single physical computing machine to be shared among multiple customers by providing each customer with one or more computing instances hosted by the single physical computing machine using a hypervisor. Each computing instance may be a guest machine acting as a distinct logical computing system that provides a customer with the perception that the customer is the sole operator and administrator of a given virtualized hardware computing resource.
[0002] Launching one or more computing instances on a single physical computing machine may entail identifying available computing resources (e.g., a physical host) on which a computing instance may be loaded and executed. A time to load and launch a computing instance on a host server may vary due to various aspects of the computing environment containing a physical host and aspects of the computing instance being launched. As a result, a launch time for a computing instance may range from a few minutes to several minutes.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] FIG. 1 is a block diagram illustrating an example system for predicting a launch time of a computing instance within a computing service environment.
[0004] FIG. 2 is a block diagram that illustrates various example components included in a system for predicting a computing instance launch time.
[0005] FIG. 3 is a block diagram that illustrates an example computing service environment that includes a predicted launch time service.
[0006] FIG. 4 is a diagram illustrating an example method for configuring and training a machine learning model used to generate a predicted launch time.
[0007] FIG. 5 is a flow diagram illustrating an example method for predicting a breach of an SLA (Service Level Agreement) launch time using a predicted launch time.
[0008] FIG. 6 is a flow diagram that illustrates an example method for predicting a launch time for a computing instance.
[0009] FIG. 7 is a block diagram illustrating an example of a computing device that may be used to execute a method for predicting a launch time for a computing instance.
[0010] FIG. 8 is a block diagram that illustrates various components included in a system for placing a computing instance on a physical host in a computing service environment using estimated launch times according to an example of the present technology.
[0011] FIG. 9 illustrates a system and related operations for placing a computing instance on a physical host in a computing service environment using estimated launch times according to an example of the present technology.
[0012] FIG. 10 illustrates a system and related operations for using an estimated attachment time to determine placement for a computing instance in a computing service environment according to an example of the present technology.
[0013] FIG. 11 illustrates a system and related operations for placement of a computing instance in a computing service environment using estimated launch times according to an example of the present technology.
[0014] FIG. 12 is a block diagram that illustrates generating a launch time prediction model for predicting launch times for computing instances launched in a computing service environment according to an example of the present technology.
[0015] FIG. 13 is a flowchart of an example method for determining computing instance placement using estimated launch times within a computing service environment.
[0016] FIG. 14 is a flowchart of another example method for determining computing instance placement using estimated launch times within a computing service environment.
[0017] FIG. 15 illustrates a system and related operations for using launch time predictions to organize caching of machine images in order to reduce computing instance launch times in a computing service environment according to an example of the present technology.
[0018] FIG. 16 is a block diagram that illustrates various components included in a system for using launch time predictions to organize caching of machine images in order to reduce computing instance launch times in a computing service environment according to an example of the present technology.
[0019] FIG. 17 illustrates a system and related operations for using launch time predictions to organize caching of machine images in order to reduce computing instance launch times in a computing service environment according to an example of the present technology.
[0020] FIG. 18 illustrates a system and related operations for identifying a physical host in a computing service environment for caching a machine image in order to achieve a desired launch time for launching a computing instance according to an example of the present technology.
[0021] FIG. 19 illustrates a system and related operations for caching machine images in a computing service environment in order to comply with a service level agreement (SLA) for the computing service environment according to an example of the present technology.
[0022] FIG. 20 is a block diagram that illustrates generating a launch time prediction model for predicting launch times for computing instances launched in a computing service environment according to an example of the present technology.
[0023] FIG. 21 is a flowchart of an example method for reducing computing instance launch times.
[0024] FIG. 22 is a flowchart of another example method for reducing computing instance launch times.
DETAILED DESCRIPTION
[0025] A technology is described for determining a predicted launch time for a computing instance within a computing service. In one example of the technology, in response to a request for a predicted launch time (e.g., a time to launch a computing instance on a physical host within a computing service), launch features associated with launching the computing instance on a physical host may be input into a machine learning model that outputs a predicted launch time. The launch features used as input to the machine learning model may be launch features that have been determined to have an impact on an amount of time in which the computing instance is launched on the physical host. As referred to in this disclosure, a computing instance may be a virtual machine (e.g., an instance of a software implementation of a computer) that executes applications like a physical machine. A computing service may be a network accessible service that provides customers with network accessible computing instances.
[0026] The machine learning model used to generate a predicted launch time may be trained using features that represent launch metrics from previous computing instance launches. The features used to train the machine learning model may be features determined to have an impact on the amount of time in which to launch a computing instance. In one example configuration, training of the machine learning model may be performed offline (e.g., in a non-production environment) using features extracted from historical launch metrics (e.g., weekly using the previous week's data). In another example configuration, the machine learning model may be trained while online (e.g., in a production environment) using features extracted from recent launch metrics.
[0027] A launch time for a computing instance, in one example, may include executing service calls to set up computing instance resources (e.g., storage and network interfaces), selecting a physical host for the computing instance and creating the computing instance on the physical host. The launch time may vary based on a launch configuration for the computing instance. Therefore, a computing service provider may have difficulty in providing an expected time frame for when a particular computing instance will be available for use. As a result of this technology, a computing service provider may obtain a predicted launch time that may then be used for a number of purposes. For example, the computing service provider may provide a customer with an estimate of when a computing instance may be available for use, determine whether an SLA (Service Level Agreement) time may be met, or suggest launch configurations that may result in a faster launch time, among other purposes.
[0028] FIG. 1 is a diagram illustrating a high level example of a system 100 that may be used to predict a launch time for a computing instance 112 within a computing service environment 108. The system 100 may include a number of physical hosts 106 that execute computing instances 112 via an instance manager 110 (e.g., a hypervisor) and a server 114 that executes a machine learning model 116. In one example configuration, the server 114 may be in communication with a number of data sources from which data for launch features 102 may be obtained (e.g., training data and data associated with a launch request). The machine learning model 116 may be trained using historical training data, after which the machine learning model 116 may generate predicted launch times for computing instances 112 by using launch features 102 for a computing instance launch to determine a predicted launch time for the computing instance launch.
[0029] As an illustration, a server 114 executing a machine learning model 116 (e.g., a random forest regression model) that has been previously trained may receive a request for a predicted launch time. A predicted launch time may be the time between receiving a launch request (e.g., a computing instance state is "pending") and the start of a computing instance boot (e.g., the computing instance state is "running"). The request for the predicted launch time may reference a launch configuration used to identify launch features 102 associated with a launch request. The launch features 102 identified may be used by the machine learning model 116 to determine a predicted launch time for a computing instance 112. As an illustration, a launch request (1) may be sent to a control plane 104 of a computing service requesting that a computing instance 112 be launched. Upon receiving the launch request, a launch request configuration may be generated that specifies various parameters for launching the computing instance 112. For example, a launch request configuration may specify a computing instance type for the computing instance 112 (e.g., micro, small, medium, large, etc. and general purpose, memory intensive, etc.), a machine image for the computing instance 112, a network type to associate with the computing instance 112, a storage volume to attach to the computing instance 112, a physical host 106 selected to host the computing instance 112, as well as other specifications.
[0030] The control plane 104 may then make a request (2) to the server 114 hosting the machine learning model 116 for a predicted launch time. Using information included in the launch request configuration, launch features identified as impacting a launch time may be collected and the launch features may then be provided to the machine learning model 116. As an illustration, a launch request configuration may be referenced to obtain information about a computing instance 112 to be launched, accessories to be attached to the computing instance 112 and information about a physical host 106 that will host the computing instance 112. The information from the launch request configuration may then be used to identify launch features, such as a machine image and a kernel image used to create the computing instance 112, an operating system and networking type, a geographic region where the physical host 106 is to be located, a maximum number of computing instances 112 that the physical host 106 is capable of executing, etc. The launch features 102 identified using information from the launch request configuration may then be provided as input to the machine learning model 116, which may then output (3) a predicted launch time for the computing instance 112.
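The feature-collection step above can be sketched as flattening a launch request configuration into the launch features handed to the machine learning model. This is a minimal illustration only; every field name below is an assumption for the example, not an identifier from the patent.

```python
# Hypothetical sketch: flattening a launch request configuration into the
# launch features provided to the machine learning model. All field names
# are illustrative assumptions.

def extract_launch_features(launch_config):
    host = launch_config["physical_host"]
    return {
        "instance_type": launch_config["instance_type"],  # e.g. micro, small, medium, large
        "machine_image": launch_config["machine_image"],  # image used to create the instance
        "kernel_image": launch_config["kernel_image"],
        "network_type": launch_config["network_type"],
        "region": host["region"],                         # geographic region of the physical host
        "host_max_instances": host["max_instances"],      # host's computing instance capacity
    }

launch_config = {
    "instance_type": "small",
    "machine_image": "image-1234",
    "kernel_image": "kernel-5678",
    "network_type": "virtual-private",
    "physical_host": {"region": "us-east", "max_instances": 32},
}
features = extract_launch_features(launch_config)
```

The resulting dictionary (or a vector derived from it) would then be provided as input to the machine learning model.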
[0031] The predicted launch times generated by the machine learning model 116 may be used for any purpose. For example, a predicted launch time may be a factor used in analysis to improve computing instance launch times, a predicted launch time may be used to determine the physical host 106 placement of a computing instance 112, a predicted launch time may be used to determine SLA (Service Level Agreement) launch times provided to customers, or a predicted launch time may be a factor in advising customers of computing instance configurations that result in faster launch times. As an illustration of utilizing a predicted launch time, an SLA launch time (e.g., an agreement between a computing service provider and a customer of a time in which to launch a computing instance 112) may be compared with a predicted launch time for a computing instance 112 to determine whether the SLA launch time is likely to be met. As such, a computing service provider and/or a customer may be notified that an SLA launch time for a computing instance 112 is likely to be breached, which may allow the computing service provider and/or the customer to take action in response to the notification.
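The SLA comparison described above might be implemented as a simple threshold check. The safety margin parameter below is an assumption added for illustration; the patent does not specify how the comparison is made.

```python
def sla_breach_likely(predicted_seconds, sla_seconds, margin=0.1):
    """Flag a likely SLA breach when the predicted launch time exceeds the
    SLA launch time less a safety margin, so that the computing service
    provider and/or the customer can be notified."""
    return predicted_seconds > sla_seconds * (1.0 - margin)
```

With a 90-second SLA launch time and the default 10% margin, any predicted launch time above 81 seconds would trigger a notification.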
[0032] Prior to placing the machine learning model 116 in a production environment where the machine learning model 116 receives requests for predicted launch times, the machine learning model may be trained in order to predict launch times for various computing instance launch configurations. In one example configuration, the machine learning model 116 may be trained using features that have been determined to have an impact on a launch time of a computing instance 112 within a computing service environment 108. In making a determination of which features have an impact on a launch time of a computing instance 112, analysis of computing instance launches may be performed to identify features correlated with or associated with launching the computing instance 112. As an illustration, a launch of a computing instance 112 may involve the steps of executing service calls to set up computing instance resources (e.g., storage and networking interfaces) for the computing instance 112, selecting a physical host 106 for the computing instance 112 (e.g., location) and creating the computing instance 112 on the physical host.
[0033] Analyzing the steps of the computing instance launch may identify features associated with launching a computing instance 112, for example, features associated with setting up the computing instance resources, features associated with selecting a physical host 106 and features associated with the configuration of the computing instance 112 (e.g., a machine image used to create the computing instance 112). Those features identified may then be sorted or ranked according to an impact that the features have on the launch time. For example, the features may be ranked according to features that have the greatest impact on the launch time, and those features having the greatest impact on the launch time may receive a higher ranking as compared to those features having little impact on the launch time. Those features having higher rankings may be selected and used when determining a predicted launch time.
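The ranking step above might be sketched as scoring each candidate feature by the strength of its association with observed launch times. Plain Pearson correlation stands in here for whatever analysis a production system would actually use; the feature names and data are invented for the example.

```python
from statistics import mean

def _pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features_by_impact(feature_rows, launch_times):
    """Rank numeric launch features so those most strongly associated with
    the observed launch times come first."""
    names = feature_rows[0].keys()
    impact = {n: abs(_pearson([r[n] for r in feature_rows], launch_times))
              for n in names}
    return sorted(impact, key=impact.get, reverse=True)

rows = [
    {"image_size_gb": 1, "host_load": 5},
    {"image_size_gb": 2, "host_load": 1},
    {"image_size_gb": 3, "host_load": 4},
    {"image_size_gb": 4, "host_load": 2},
]
# image_size_gb tracks the launch times exactly; host_load only weakly.
ranking = rank_features_by_impact(rows, [10, 20, 30, 40])
```

Features at the head of the ranking would be the ones selected as model inputs.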
[0034] In another example, features determined to have an impact on a launch time of a computing instance 112 may be selected from a number of feature categories. Illustratively, feature categories may include machine image features, physical host features and customer configuration features (e.g., features of a launch configuration that are within a customer's control to modify). Features from these categories may be selected and used in determining a predicted launch time.
[0035] Feature data for those features selected as having an impact on the launch time of a computing instance 112 may be retrieved from a respective data source (e.g., active training data or historical training data) and used to train the machine learning model 116. The feature data may be, for example, launch metrics from previous computing instance launches within the computing service 108. The feature data may in some examples be transformed into a reduced representation set of features (e.g., a features vector) when feature data is redundant or large. Further, feature data may be normalized prior to training the machine learning model 116.
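The normalization mentioned above could be as simple as min-max scaling each feature column to [0, 1] before training. This is one common choice, assumed here for illustration; the patent does not specify a scheme.

```python
def min_max_normalize(rows):
    """Scale each numeric feature column to the range [0, 1]; a constant
    column is mapped to 0.0 to avoid division by zero."""
    names = rows[0].keys()
    lo = {n: min(r[n] for r in rows) for n in names}
    hi = {n: max(r[n] for r in rows) for n in names}
    return [
        {n: (r[n] - lo[n]) / (hi[n] - lo[n]) if hi[n] > lo[n] else 0.0
         for n in names}
        for r in rows
    ]

normalized = min_max_normalize([
    {"image_size_gb": 1, "host_load": 0.2},
    {"image_size_gb": 9, "host_load": 0.8},
])
```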
[0036] In one example configuration, the machine learning model 116 may be trained offline (e.g., prior to placing the machine learning model 116 into production) using historical training data (e.g., archived data associated with launching computing instances 112 within the computing service 108). After training the machine learning model 116 using the historical training data, the machine learning model 116 may be placed online (e.g., in a production environment) where the machine learning model 116 may process requests for predicted launch times. In some examples, periodically, the machine learning model 116 may be taken offline and retrained using historical training data that has accumulated since the last time the machine learning model 116 was trained.
[0037] In another example configuration, the machine learning model 116 may be initially trained using historical training data and placed in production where the machine learning model 116 may process requests for predicted launch times. Subsequently, the machine learning model 116 may be retrained while in production using active training features (e.g., recent feature data associated with launching computing instances 112 within the computing service environment 108). For example, feature data accumulated over the past number of minutes, hours or days may be used to retrain or further refine the training of the machine learning model 116. A feature data set accumulated within a relatively short time span may be small enough that the machine learning model 116 may be retrained within a short time period (e.g., minutes) without having to take the machine learning model 116 out of production.
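As a toy stand-in for the in-production retraining described above, the sketch below keeps a bounded sliding window of recent launch records and refreshes its estimates from that window. A real deployment would refit the regression model from the windowed feature data rather than taking a per-type mean; this class only shows the windowing pattern.

```python
from collections import deque

class OnlineLaunchEstimator:
    """Illustrative only: predicts the mean launch time recently observed
    for an instance type, over a sliding window of recent launches. The
    deque's maxlen silently discards the oldest record as new ones arrive."""

    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)

    def record(self, instance_type, launch_seconds):
        self.window.append((instance_type, launch_seconds))

    def predict(self, instance_type):
        times = [t for itype, t in self.window if itype == instance_type]
        return sum(times) / len(times) if times else None

est = OnlineLaunchEstimator(window_size=3)
for itype, secs in [("small", 30), ("small", 50), ("large", 90), ("small", 40)]:
    est.record(itype, secs)
# The first ("small", 30) record has aged out of the 3-entry window.
```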
[0038] FIG. 2 illustrates components of an example system 200 on which the present technology may be executed. The system 200 may include a computing service environment 202 that may be accessible to a number of customer devices 228 via a network 226. The computing service 202 may provide customers with network accessible services, such as computing instances that execute on physical hosts 236. Included in the computing service environment 202 may be a server 204 that hosts a launch time prediction module 218 that may be used to generate predicted launch times for computing instances launched on the physical hosts 236. In addition to the launch time prediction module 218, the server 204 may contain a training module 222, a launch feature module 220 and one or more data stores 206 having data that may be accessible to the modules contained on the server 204.
[0039] In one example configuration, the launch time prediction module 218 may be configured to generate predicted launch times using a machine learning model. The launch time prediction module 218 may provide predicted launch times for computing instances placed on physical hosts 236 located within a particular portion of the computing service environment 202. For example, as illustrated in FIG. 2, the launch time prediction module 218 may be executed within the computing service environment 202 and may provide predicted launch times for computing instances launched in the computing service environment 202. In another example configuration, the launch time prediction module 218 may be external to any computing services and may receive requests for predicted launch times from any number of computing services by way of a network.
[0040] Examples of machine learning models that may be used by the launch time prediction module 218 to predict a launch time may include regression models, such as a random forest model, extremely randomized trees model, an AdaBoost model, a stochastic gradient descent model, a support vector machine model, as well as other types of machine learning models not specifically mentioned here.
[0041] A training module 222 may be configured to obtain features from various data sources that are then used to train a machine learning model used by the launch time prediction module 218. In one example, feature and training data may be retrieved from a data warehouse 224. The feature data may be launch metrics from previous computing instance launches within the computing service 202 that have been stored to the data warehouse 224. Illustratively, an information management service 238 may push (e.g., upload) launch related data to the data warehouse 224 making the data accessible to the training module 222. Data retrieved from the data warehouse 224 may be recent data (e.g., seconds, minutes or hours old) or historical data (e.g., days, weeks or months old) associated with computing instance launches.
[0042] Feature data retrieved from the data warehouse 224 may align with launch features 208 determined to have an impact on a launch time of a computing instance. Illustratively, analysis may be performed to determine which launch features 208 impact a launch time and a query may then be constructed that selects feature data for the launch features 208 from the data warehouse 224. In some examples, feature data for the launch features 208 may be processed and summarized when the feature data may be large or redundant. For example, feature data may be processed into a reduced representation set of launch features (e.g., features vector). Having obtained the launch features 208, the machine learning model may then be trained using the launch features 208.
[0043] As described earlier, the machine learning model may be initially trained using historical data and then placed in production where the machine learning model may provide a predicted launch time according to an on demand basis. The training module 222 may be configured to obtain historical data for launch features 208 from a data warehouse 224 and provide the historical data to the machine learning model. The historical data may be used to initially train the machine learning model. Subsequent training of the machine learning model may be performed by taking the machine learning model out of production (e.g., offline) and training the machine learning model using historical data (e.g., data from the previous day, week, month, etc.). Alternatively, subsequent training may be performed while the machine learning model is in production (e.g., online) using recent data (e.g., data from the previous minutes, hours, day, etc.).
[0044] The launch feature module 220 may be configured to obtain launch features 208 associated with a request for a predicted launch time. The launch features 208 obtained may then be provided as input to the machine learning model. As an illustration, a request to launch a computing instance (e.g., via a customer device 228) may be received by a control plane 240 for a computing service 202. A launch request may be for a single computing instance or any number of computing instances (e.g., tens, hundreds or thousands of computing instances). Upon receiving the launch request, a launch configuration may be determined for the computing instance that specifies, among other things, machine image features, physical host features and customer configuration features (e.g., storage devices, network types, geographic region, etc.). The launch configuration (or a reference to the launch configuration) may then be included in a request for a predicted launch time.
[0045] Upon the server 204 receiving a request for a predicted launch time, the launch configuration may be provided to the launch feature module 220, whereupon the launch configuration may be evaluated and data for launch features 208 corresponding to the launch configuration may be collected. Based on the specifications of the launch configuration, data for launch features 208 may then be obtained.
[0046] Data collected for the launch features 208 may be provided to the launch time prediction module 218 and input to a machine learning model. The launch time prediction module 218 may then generate a predicted launch time by evaluating the launch features 208 provided to it. As one example, a machine learning model used by the launch time prediction module 218 may comprise a number of decision trees, where launch features 208 are input into the decision trees and, using regression, a predicted launch time is calculated from the output of the decision trees. The predicted launch time generated by the machine learning model may then be used for various purposes associated with a computing service 202 as described earlier.
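The ensemble regression just described can be illustrated in miniature: each "tree" maps the launch features to a launch-time estimate, and the prediction is the average of the trees' outputs. A trained model learns its trees from launch metrics; the hand-written stumps and thresholds below are invented purely to show the shape of the computation.

```python
# Illustrative stand-in for a forest of learned decision trees.

def tree_image(f):        # larger machine images take longer to transfer
    return 60.0 if f["image_size_gb"] > 4 else 30.0

def tree_compression(f):  # compressed images add decompression time
    return 70.0 if f["image_compressed"] else 35.0

def tree_host(f):         # heavily loaded hosts launch instances more slowly
    return 50.0 if f["host_load"] > 0.8 else 25.0

def predict_launch_time(features, trees=(tree_image, tree_compression, tree_host)):
    """Average the per-tree launch-time estimates, as in forest regression."""
    return sum(tree(features) for tree in trees) / len(trees)

prediction = predict_launch_time(
    {"image_size_gb": 8, "image_compressed": True, "host_load": 0.5}
)
```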
[0047] A physical host 236 included in the system 200 may be a server computer configured to execute an instance manager (i.e., a hypervisor, virtual machine monitor (VMM), or another type of program) that manages multiple computing instances on a single physical host 236. The physical hosts 236 may be located in data centers within various geographical regions 210. As a result, launch times for computing instances may be influenced based on the geographical region 210 of a physical host 236 selected to host a computing instance. Also, a launch time may be influenced by other attributes of a physical host 236, such as architecture, brand, etc.
[0048] A machine image 216 may be a pre-configured virtual machine image (e.g., a virtual appliance) that may be executed by an instance manager. A machine image 216 may include a machine executable package for a computing instance that may include an operating system, an application server and various applications, any of which may influence a launch time of a computing instance. Further, the machine image 216 may include mappings to storage volumes that attach to a corresponding computing instance when the computing instance is launched.
[0049] Illustratively, machine images 216 may be stored in block level storage volumes or in a network file storage service. The storage location of a machine image 216 may influence a launch time of a computing instance. For example, when storing a machine image 216 in a network file storage service, the machine image 216 may be compressed in order to facilitate transferring of the machine image 216 over a network. As a result, after transferring the machine image 216 to a physical host 236 selected to host a computing instance, the further operation of decompressing the machine image 216 may increase a launch time of the computing instance.
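The storage-location effect described above can be sketched as a back-of-the-envelope cost model: transferring the image over the network, plus a decompression pass when the image was stored compressed. The rates and the function itself are illustrative assumptions, not figures from the patent.

```python
def image_fetch_seconds(image_size_gb, network_gbps, compressed,
                        decompress_gb_per_s=1.0):
    """Rough cost of making a machine image available on a physical host:
    network transfer time plus, for compressed images, decompression time."""
    transfer = image_size_gb * 8.0 / network_gbps  # GB -> gigabits -> seconds
    decompress = image_size_gb / decompress_gb_per_s if compressed else 0.0
    return transfer + decompress
```

For example, a 4 GB image over a 10 Gb/s link takes roughly 3.2 seconds uncompressed, and roughly 7.2 seconds when a 1 GB/s decompression pass is added, which is the kind of launch-time increase the paragraph above describes.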
[0050] The various processes and/or other functionality contained within the system 200 may be executed on one or more processors 230 that are in communication with one or more memory modules 232. The system 200 may include a number of computing devices (e.g., physical hosts 236 and servers 204) that are arranged, for example, in one or more server banks or computer banks or other arrangements.
[0051] The term "data store" may refer to any device or combination of devices capable of storing, accessing, organizing and/or retrieving data, which may include any combination and number of data servers, relational databases, object oriented databases, cluster storage systems, data storage devices, data warehouses, flat files and data storage configuration in any centralized, distributed, or clustered environment. The storage system components of the data store may include storage systems such as a SAN (Storage Area Network), cloud storage network, volatile or non-volatile RAM, optical media, or hard-drive type media. The data store may be representative of a plurality of data stores as can be appreciated.
[0052] In some examples, a customer may utilize a customer device 228 to request a launch of a computing instance and thereafter access the computing instance. A customer device 228 may include any device capable of sending and receiving data over a network 226. A customer device 228 may comprise, for example, a processor-based system such as a computing device.
[0053] The network 226 may include any useful computing network, including an intranet, the Internet, a local area network, a wide area network, a wireless data network, or any other such network or combination thereof. Components utilized for such a system may depend at least in part upon the type of network and/or environment selected. Communication over the network may be enabled by wired or wireless connections and combinations thereof.
[0054] FIG. 2 illustrates that certain processing modules may be discussed in connection with this technology and these processing modules may be implemented as computing services. In one example configuration, a module may be considered a service with one or more processes executing on a server or other computer hardware. Such services may be centrally hosted functionality or a service application that may receive requests and provide output to other services or consumer devices. For example, modules providing services may be considered on-demand computing services that are hosted on a server, in a virtualized service environment, or in a grid or cluster computing system. An API may be provided for each module to enable a second module to send requests to and receive output from the first module. Such APIs may also allow third parties to interface with the module and make requests and receive output from the modules. While FIG. 2 illustrates an example of a system that may implement the techniques above, many other similar or different environments are possible. The example environments discussed and illustrated above are merely representative and not limiting.
[0055] FIG. 3 is a block diagram illustrating an example computing service environment 300 that may be used to execute and manage a number of computing instances 304a-d. In particular, the computing service environment 300 depicted illustrates one environment in which the technology described herein may be used. The computing service environment 300 may be one type of environment that includes various virtualized service resources that may be used, for instance, to host computing instances 304a-d.
[0056] The computing service environment 300 may be capable of delivery of computing, storage and networking capacity as a software service to a community of end recipients. In one example, the computing service environment 300 may be established for an organization by or on behalf of the organization. That is, the computing service environment 300 may offer a "private cloud environment." In another example, the computing service environment 300 may support a multi-tenant environment, wherein a plurality of customers may operate independently (i.e., a public cloud environment). Generally speaking, the computing service environment 300 may provide the following models: Infrastructure as a Service ("IaaS"), Platform as a Service ("PaaS"), and/or Software as a Service ("SaaS"). Other models may be provided. For the IaaS model, the computing service environment 300 may offer computers as physical or virtual machines and other resources. The virtual machines may be run as guests by a hypervisor, as described further below. The PaaS model delivers a computing platform that may include an operating system, programming language execution environment, database, and web server.
[0057] Application developers may develop and run their software solutions on the computing service platform without incurring the cost of buying and managing the underlying hardware and software. The SaaS model allows installation and operation of application software in the computing service environment 300. End customers may access the computing service environment 300 using networked client devices, such as desktop computers, laptops, tablets, smartphones, etc. running web browsers or other lightweight client applications, for example. Those familiar with the art will recognize that the computing service environment 300 may be described as a "cloud" environment.
[0058] The particularly illustrated computing service environment 300 may include a plurality of physical hosts 302a-d. While four physical hosts are shown, any number may be used, and large data centers may include thousands of physical hosts 302a-d. The computing service environment 300 may provide computing resources for executing computing instances 304a-d. Computing instances 304a-d may, for example, be virtual machines. A virtual machine may be an instance of a software implementation of a machine (i.e. a computer) that executes applications like a physical machine. In the example of a virtual machine, each of the physical hosts 302a-d may be configured to execute an instance manager 308a-d capable of executing the instances. The instance manager 308a-d may be a hypervisor, virtual machine monitor (VMM), or another type of program configured to enable the execution of multiple computing instances 304a-d on a single physical host. Additionally, each of the computing instances 304a-d may be configured to execute one or more applications.

[0059] One or more server computers 314 and 316 may be reserved to execute software components for managing the operation of the computing service environment 300 and the computing instances 304a-d. For example, a server computer 314 may execute a predicted launch time service that may respond to requests for a predicted launch time for a computing instance launched on a physical host 302a-d.
[0060] A server computer 316 may execute a management component 318. A customer may access the management component 318 to configure various aspects of the operation of the computing instances 304a-d purchased by a customer. For example, the customer may setup computing instances 304a-d and make changes to the configuration of the computing instances 304a-d.
[0061] A deployment component 322 may be used to assist customers in the deployment of computing instances 304a-d. The deployment component 322 may have access to account information associated with the computing instances 304a-d, such as the name of an owner of the account, credit card information, country of the owner, etc. The deployment component 322 may receive a configuration from a customer that includes data describing how computing instances 304a-d may be configured. For example, the configuration may include an operating system, provide one or more applications to be installed in computing instances 304a-d, provide scripts and/or other types of code to be executed for configuring computing instances 304a-d, provide cache logic specifying how an application cache should be prepared, and other types of information. The deployment component 322 may utilize the customer-provided configuration and cache logic to configure, prime, and launch computing instances 304a- d. The configuration, cache logic, and other information may be specified by a customer accessing the management component 318 or by providing this information directly to the deployment component 322.
[0062] Customer account information 324 may include any desired information associated with a customer of the multi-tenant environment. For example, the customer account information may include a unique identifier for a customer, a customer address, billing information, licensing information, customization parameters for launching instances, scheduling information, etc. As described above, the customer account information 324 may also include security information used in encryption of asynchronous responses to API requests. By "asynchronous" it is meant that the API response may be made at any time after the initial request and with a different network connection.
[0063] A network 310 may be utilized to interconnect the computing service environment 300, the physical hosts 302a-d and the server computers 316. The network 310 may be a local area network (LAN) and may be connected to a Wide Area Network (WAN) 312 or the Internet, so that end customers may access the computing service environment 300. The network topology illustrated in FIG. 3 has been simplified; many more networks and networking devices may be utilized to interconnect the various computing systems disclosed herein.
[0064] Moving now to FIG. 4, a diagram illustrates an example method 400 for configuring and training a machine learning model 416 used to generate a predicted launch time. As in block 406, launch feature selection may be performed by analyzing various computing instance launches to determine launch features that have an impact on a computing instance launch time. For example, various features of launching a computing instance on a physical host within a computing service environment, where the features are capable of being observed, may be identified.
[0065] Examples of launch features may include, but are not limited to: a number of contending computing instances on a physical host, a number of running computing instances on a physical host, a data store type containing a machine image used to create a computing instance, a kernel image used by a computing instance, an architecture of a physical host, a virtualization type of a computing instance, a maximum number of computing instances that a physical host is capable of hosting, a percentage of occupancy of a physical host by computing instances at a start of a computing instance launch, a geographical region where a physical host is located, a hardware type of a physical host, a hardware vendor of a physical host, and an operating system, networking type, data store and size of a computing instance.
[0066] Launch features determined to have an impact on a launch time of a computing instance may be categorized. For example, categories of launch features may be based on various aspects of a computing instance launch. As an illustration, launch features may be categorized into machine image launch features, physical host launch features and customer configuration launch features.
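The categorization described above can be illustrated with a minimal sketch. All field names and values below are hypothetical examples chosen for illustration; they are not taken from the specification:

```python
# A hypothetical launch-feature record, grouped into the three categories
# described above: machine image, physical host, and customer configuration.
launch_features = {
    "machine_image": {
        "data_store_type": "block",   # where the machine image is stored
        "image_size_gb": 8,
        "compressed": True,
    },
    "physical_host": {
        "region": "us-east",
        "architecture": "x86_64",
        "occupancy_pct": 50,
        "contending_launches": 6,
    },
    "customer_configuration": {
        "operating_system": "linux",
        "networking_type": "vpc",
        "instances_requested": 1,
    },
}

def flatten(record):
    """Flatten categorized launch features into a single feature mapping,
    prefixing each field with its category name."""
    return {f"{cat}.{k}": v
            for cat, fields in record.items()
            for k, v in fields.items()}
```

A flattened mapping of this kind could then serve as one input row for a model, with each `category.field` key acting as a feature name.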
[0067] In one example, identified launch features may be sorted or ranked according to an impact of a launch feature on a computing instance launch time, and those launch features having the greatest impact on launch time may be selected as features to be used for predicting launch times. For example, launch features may be analyzed to determine a percentage of contribution that an individual launch feature has on a launch time. Launch features identified as having the greatest contribution to a launch time may be selected as input to a machine learning model. It should be noted that any number of launch features may be selected and the selection of the launch features may not be limited to just those launch features having the greatest impact on a launch time.
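The ranking-and-selection step above might be sketched as follows. The feature names and contribution percentages are invented solely for illustration:

```python
# Hypothetical per-feature contribution fractions to overall launch time,
# e.g. as estimated from an analysis of historical launches.
contributions = {
    "contending_launches": 0.35,
    "data_store_type": 0.25,
    "region": 0.15,
    "occupancy_pct": 0.10,
    "networking_type": 0.08,
    "hardware_vendor": 0.07,
}

def select_top_features(contributions, k):
    """Rank launch features by contribution to launch time and keep the
    k features with the greatest impact."""
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return ranked[:k]
```

As the paragraph notes, the cutoff `k` is a free choice; a system is not limited to only the highest-impact features.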
[0068] Having identified the launch features, launch feature data 402 for the launch features may then be obtained from data sources containing data associated with the launch features. As illustrated, launch feature data 402 may be obtained from a data store containing, for example, computing service management data, inventory data (e.g., physical host information), as well as other data associated with a computing service. The launch feature data 402 may be normalized enabling launch feature data 402 obtained from different data sources to be input into the machine learning model 416. The launch feature data 402 may be divided into training data 410, cross validation data 412 and test data 414. For example, a percentage of the launch feature data 402 may be randomly selected as test data 414 and cross validation data 412, and the remaining launch feature data 402 may be used as training data 410 to train the machine learning model 416.
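The described division into training, cross-validation and test data could look like the following minimal sketch. The 60/20/20 split fractions are an assumption for illustration, not taken from the source:

```python
import random

def split_launch_data(records, test_frac=0.2, cv_frac=0.2, seed=0):
    """Randomly partition launch-feature records into training,
    cross-validation and test sets, as described above."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_cv = int(n * cv_frac)
    test = shuffled[:n_test]
    cv = shuffled[n_test:n_test + n_cv]
    train = shuffled[n_test + n_cv:]
    return train, cv, test
```

The remaining records (here 60%) form the training set used to fit the machine learning model, matching the description above.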
[0069] The machine learning model 416 may be selected from among any available machine learning algorithm. In one example, a number of regression machine learning models may be tested to determine a machine learning model that provides an acceptable approximation of a launch time. One aspect of generating machine learning models may be, as in block 408, performing a parameter value search for machine learning parameters that result in a goodness-of-fit of the machine learning model 416 to the launch features. Machine learning parameters (i.e., parameters used to configure a machine learning model 416, such as setting a depth of a decision tree) may affect how a machine learning model 416 fits the training data 410. In one example, grid search or a gradient descent algorithm may be used to perform a parameter value search. In another example, an evolutionary algorithm (e.g., a distributed genetic algorithm), a swarm algorithm (e.g., particle swarm optimization), simulated annealing or like algorithms may be used when a parameter space of a machine learning model 416 may be too large to perform a thorough parameter value search.

[0070] After selecting a machine learning model 416, the machine learning model 416 may be trained using the training data 410. The cross validation data 412 and the test data 414 may then be run through the machine learning model 416 to test whether the outputs of the machine learning model are representative of additional historical cases. Thereafter, as in block 418, data analysis may be performed to determine how well the machine learning model 416 was able to predict a launch time compared to an actual launch time. After testing two or more machine learning models 416, as in block 420, the results of the machine learning models 416 may be compared to identify the better performing machine learning model 416, which may then be selected and placed in a production environment.
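A thorough parameter value search over a small grid, as described above, might be sketched as follows. The parameter names (tree depth, number of trees) and the toy error surface are illustrative stand-ins for a real cross-validation error computed against the cross validation data:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination in the grid and
    return the combination with the lowest validation error."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy error surface standing in for the cross-validation error of a
# regression model; a real system would train and score the model here.
def toy_cv_error(params):
    return abs(params["max_depth"] - 8) + abs(params["n_trees"] - 100) / 50

grid = {"max_depth": [4, 8, 16], "n_trees": [50, 100, 200]}
```

When the grid is too large for exhaustive search, the paragraph notes that gradient descent, evolutionary, swarm or annealing methods can replace the exhaustive loop.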
[0071] FIG. 5 is a flow diagram illustrating one example of a method 500 in which a predicted launch time may be used. The example method 500 illustrated is for predicting a likely breach of an SLA launch time using a predicted launch time. An SLA launch time may be, in one example, a launch time for a computing instance that a computing service provider has agreed to provide as part of a service contract. As such, a computing service provider may wish to be notified that an SLA will likely be breached prior to an actual breach of the SLA launch time, allowing the computing service provider to act accordingly.
[0072] Beginning in block 502, a launch request may be received requesting that one or more computing instances be launched within a computing service. For example, the request may be made by a customer wishing to launch one computing instance or a group of computing instances within the computing service environment. Upon receiving the launch request, a launch service may identify a launch configuration for the one or more computing instances to be launched.
[0073] As in block 504, an SLA associated with the customer making the launch request may be identified. The SLA may, among other things, specify an SLA launch time for a computing instance. Illustratively, the SLA launch time may be a time from when a computing service receives a launch request to when the computing instance is running (e.g., starting the boot up process). Thus, a customer making a launch request may expect that a computing instance will be ready within the SLA launch time.
[0074] After receiving the launch request and identifying the launch configuration and SLA launch time, as in block 506, a predicted launch time for the computing instance may be obtained. For example, a request may be made to a predicted launch time service that generates a predicted launch time as described earlier. As an illustration, the request for the predicted launch time may include a launch configuration or a reference to a launch configuration for one or more computing instances. The predicted launch time service may then generate a predicted launch time for the one or more computing instances by identifying launch features based at least in part on the launch configuration and inputting the launch features into a machine learning model that outputs a predicted launch time.
[0075] As in block 508, the predicted launch time may then be compared with the SLA launch time to determine whether, as in block 510, the predicted launch time is greater than the SLA launch time. The comparison of the predicted launch time and the SLA launch time may provide an indication of whether the SLA launch time is likely to be achieved or breached.
[0076] In a case where the predicted launch time is not greater than the SLA launch time, as in block 514, the one or more computing instances may be launched. In a case where the predicted launch time is greater than the SLA launch time, as in block 512, a predetermined action may be performed in response to a potential SLA launch time breach. One example of a predetermined action may include notifying a computing service operator and/or a customer that the SLA launch time is likely not going to be achieved. As such, the computing service operator and/or the customer may attempt to mitigate or prevent the likely SLA launch time breach by performing actions that may reduce the launch time. For example, a computing service provider may remove a physical host that is causing increased launch times from a group of physical hosts providing computing capacity. Alternatively, or in addition, a customer may be advised to modify those aspects of a launch configuration for a computing instance that may be within the customer's control, as well as other actions that a computing service operator and/or a customer may perform that are not specifically described here.
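The comparison and predetermined action described for blocks 508-514 can be sketched as a simple check. The 60-second SLA value and the notification-style action are illustrative assumptions:

```python
SLA_LAUNCH_TIME = 60.0  # seconds; illustrative SLA value, not from the source

def check_sla(predicted_launch_time, sla_launch_time=SLA_LAUNCH_TIME,
              notify=print):
    """Compare a predicted launch time against the SLA launch time and
    perform a predetermined action (here, a notification) when a breach
    appears likely. Returns True when it is safe to launch."""
    if predicted_launch_time > sla_launch_time:
        notify("likely SLA breach: predicted %.0fs exceeds SLA %.0fs"
               % (predicted_launch_time, sla_launch_time))
        return False  # caller may attempt mitigation before launching
    return True
```

In a real service the `notify` callback could alert an operator or customer, or trigger one of the mitigation actions described in the following paragraphs.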
[0077] In one example configuration, upon a determination that an SLA launch time will likely be breached, a computing process may analyze the state of a computing service environment in which the computing instance is to be launched to determine whether an action preventing a breach of the SLA launch time can be performed. As one example of an action that may be performed, available computing capacity may be analyzed to determine whether adding additional physical hosts that increase computing capacity may reduce launch times. For instance, a group of physical hosts may provide available computing capacity to host a plurality of computing instances. The group of physical hosts may be analyzed to determine how many computing instances the group of physical hosts may be capable of hosting and to determine how many computing instances the group of physical hosts is currently hosting (e.g., running computing instances). Based on the results of the analysis, additional physical hosts may be added to the group of physical hosts to increase available computing capacity.
[0078] As another example of an action that may be performed in response to a likely SLA launch time breach, individual physical hosts included in a group of physical hosts providing computing capacity may be analyzed to determine whether a physical host may be negatively affecting launch times. As a specific example, an overloaded physical host included in a group of physical hosts may affect launch times due to a number of computing instances being launched simultaneously on the overloaded physical host. For example, the overloaded physical host may appear to have available computing capacity to host a computing instance, but due to the number of computing instance launches that the overloaded physical host is processing, a launch time for a computing instance on the overloaded physical host may exceed an SLA launch time. As such, the overloaded physical host may be removed from the group of physical hosts that are considered available to host a computing instance. Specifically, prior to generating a second predicted launch time for a computing instance (e.g., because the first predicted launch time included the overloaded physical host), the overloaded physical host may be removed from the available computing capacity. The second predicted launch time may then be generated, which may result in prediction of a faster launch time as compared to the first predicted launch time based on available computing capacity that included the overloaded physical host.
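Removing an overloaded physical host from the available capacity before generating a second predicted launch time might be sketched as follows. The threshold value and host records are hypothetical:

```python
MAX_CONCURRENT_LAUNCHES = 20  # illustrative overload threshold

def exclude_overloaded(hosts, threshold=MAX_CONCURRENT_LAUNCHES):
    """Drop hosts whose number of in-flight computing instance launches
    exceeds the threshold, so a second launch time prediction is made
    against only the remaining available capacity."""
    return {name: info for name, info in hosts.items()
            if info["launching"] <= threshold}

hosts = {
    "host-1": {"launching": 5,  "occupancy_pct": 40},
    "host-2": {"launching": 45, "occupancy_pct": 60},  # overloaded
}
```

As the paragraph notes, a host can look available by occupancy alone, so the in-flight launch count is the signal used to exclude it.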
[0079] As yet another example of an action that may be performed in response to a likely SLA launch time breach, a launch configuration for a computing instance may be analyzed to determine whether changes to the launch configuration may result in an improved launch time. As an illustration, a launch configuration may specify parameters and computing resources that are used to launch a computing instance. These parameters and computing resources may affect a predicted launch time for the computing instance. As such, the launch configuration may be analyzed to determine whether changes to the launch configuration may result in a predicted launch time that does not breach the SLA launch time. As a specific example, a launch configuration may specify a geographic region in which to launch a computing instance. Analysis may be performed to determine whether launching the computing instance in a different geographic region would result in a better predicted launch time. In a case where analysis determines that a different geographic region may result in a better predicted launch time, the launch configuration may be modified to include the different geographic region.
[0080] Alternatively, or in addition to the operations described above, a feature (e.g., an SLA breach feature) representing the SLA launch time breach may be provided as input to a machine learning classification model that outputs a classification indicating whether a computing instance launch may breach an SLA launch time. For example, the SLA breach feature may be considered along with other features provided to the machine learning classification model. Using an algorithm (e.g., a classifier), input feature data provided to the machine learning model may be mapped to a category. Thus, in an example where a predicted launch time feature may be greater than an SLA launch time feature, the machine learning classification model may output a classification indicating that the launch time for a computing instance will likely breach the SLA launch time.
[0081] FIG. 6 is a flow diagram illustrating an example method 600 for predicting a launch time for a computing instance. Beginning in block 610, a request may be received for a predicted launch time associated with launching a computing instance on a physical host within a computing service environment. The predicted launch time may be a time from when a computing instance is in a pending state (i.e., executing service calls to set up computing instance resources, identifying a physical host to host the computing instance and creating the computing instance on the physical host) to when the computing instance is in an executing state (i.e., the start of booting the computing instance). In some examples, a time in which a customer receives a usable computing instance (e.g., a booted computing instance) may be included in a predicted launch time by including a boot time for the computing instance, which may be affected by an internal configuration of the computing instance.
[0082] As in block 620, data may be obtained that is associated with launch features of a computing instance determined to have an impact on a launch time of the computing instance on a physical host within a computing service environment. For example, launch features that may be determined to have an impact on launch time may include, but are not limited to: machine image launch features (e.g., features for a machine image used to create the computing instance), physical host launch features (e.g., features of a physical host selected to host the computing instance) and launch configuration features that may be controlled by a customer (e.g., machine image configuration, geographic region, number of computing instances launched simultaneously, etc.). In one example, after obtaining the data associated with the launch features, the data may then be normalized.
[0083] As in block 630, the launch features (i.e., the data for the launch features) may be input to a machine learning model that outputs a predicted launch time for launching the computing instance on a selected physical host within the computing service environment. In one example, the machine learning model may be a regression model (e.g., a random forest regression model).
[0084] The machine learning model may be trained using historical data and then placed in a production environment where the machine learning model receives active requests for predicted launch times. In one example configuration, the machine learning model may be periodically trained using historical data (e.g., prior day, week or month launch features). In another example configuration, the machine learning model may be trained by extracting launch features from active data (e.g., prior seconds, minutes, hours) and retraining the machine learning model while the machine learning model is in a production environment, thereby enabling the machine learning model to be adaptive to changes occurring within the computing service.
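Retraining on recent launch data while in production might be sketched with a sliding window, as follows. The window size and the mean-based placeholder "model" are illustrative; a production system would retrain the regression model (e.g., a random forest) on the windowed records instead:

```python
from collections import deque

class AdaptiveLaunchTimeModel:
    """Sketch of a model retrained on a sliding window of recent launches,
    so predictions adapt to changes within the computing service."""

    def __init__(self, window=1000):
        # Only the most recent `window` launches are retained for training.
        self.recent = deque(maxlen=window)
        self.mean_time = None

    def record_launch(self, features, actual_launch_time):
        """Capture the launch features and observed launch time of a
        completed launch as fresh training data."""
        self.recent.append((features, actual_launch_time))

    def retrain(self):
        """Refit the placeholder model: predict the mean launch time of
        the windowed launches."""
        times = [t for _, t in self.recent]
        self.mean_time = sum(times) / len(times)
        return self.mean_time
```

Calling `retrain` periodically (e.g., on recent seconds, minutes or hours of data, as described above) keeps the model adaptive without taking it out of production.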
[0085] The predicted launch time generated by the machine learning model may then be provided in response to the request. As one example, the predicted launch time may be provided to various services within the computing service, such as a computing instance placement service that selects a physical host for a computing instance. As another example, the predicted launch time may be provided to a customer, thereby informing the customer of the predicted launch time, or advising the customer of whether an SLA launch time is likely to be achieved. As yet another example, the predicted launch time may be provided to a computing service operator allowing the computing service operator to analyze and modify a computing service environment according to the predicted launch time. As will be appreciated, a predicted launch time may be utilized for any purpose and is not therefore limited to the examples disclosed herein.

Technology is described for determining the placement of a computing instance within physical hosts in a computing service environment using estimated launch times. During placement of the computing instance, a physical host having available computing slots (e.g., computing resources used to launch the computing instance) or a group of physical hosts providing reduced launch times may be identified in the computing service environment using the estimated launch times and other placement criteria. The computing instance may be placed onto the physical host, also known as a server computer, and the computing instance may be launched or executed on the physical host within the computing service environment.
[0086] In one example, a request to launch a computing instance in the computing service environment may be received. The request may be received from a customer who desires computing services from the computing service environment. A determination as to which physical host in the computing service environment is to provide placement for the computing instance may be performed upon receiving the request to launch the computing instance from the customer. For instance, the physical host offering a reduced launch time for the computing instance, as compared to other physical hosts in the computing service environment, may be identified, and the computing instance may be launched on that physical host. Thus, the computing instance may be placed on the physical host offering the reduced launch time for the computing instance. The term "launch time" may generally refer to a period of time between 1) receiving the request to launch the computing instance and 2) booting the computing instance on the physical host that is selected to launch the computing instance.
[0087] In one configuration, instance features associated with the computing instance involved in the request may be identified when determining placement for the computing instance or a machine image from which the computing instance may be generated. The instance features may describe or characterize the computing instance. For example, the instance features may include, but are not limited to, a size of the computing instance, a computing instance image type used by the computing instance (e.g., a machine image or a kernel image), an architecture type of the computing instance (e.g., a 32-bit architecture or a 64-bit architecture), a virtualization type of the computing instance (e.g., paravirtualization or a hardware virtual machine), and a type of data store used by the computing instance. The instance features may include user-controlled features, such as a type of operating system (OS) and a networking type (e.g., virtual private network) used for launching the computing instance.
[0088] In one configuration, physical host features associated with the physical hosts in the computing service environment may be identified when determining a placement for the computing instance. The physical host features may describe or characterize aspects of the physical hosts in the computing service environment at a given time (e.g., when the computing instance is to be launched). Alternatively, the physical host features may describe a defined group of physical hosts in the computing service environment. The physical host features may include, but are not limited to, a maximum number of computing instances that can be hosted at the physical host, a hardware type associated with the physical host, a hardware vendor associated with the physical host, a percentage occupancy for computing instances at the physical host when the computing instance is to be launched and a zone in which the physical host is located. In addition, the physical host features may include a number of computing instances that are currently pending and/or running on the physical host.
[0089] The instance features associated with the computing instance and the physical host features associated with the physical hosts in the computing environment may be provided to a launch time prediction model. The launch time prediction model may predict an estimated launch time for launching the computing instance on a physical host in the computing service environment using the instance features and the physical host features. More specifically, the launch time prediction model may predict the estimated launch time for launching the computing instance on Physical Host A, the estimated launch time for launching the computing instance on Physical Host B, the estimated launch time for launching the computing instance on Physical Host C, and so on. The launch time prediction model may be a machine learning model (e.g., a regression model) that has been trained using historical launch time information and features for a plurality of previously launched computing instances in order to predict the estimated launch time for the computing instance to be launched in the computing service environment.
[0090] As a non-limiting example of instance features, a computing instance to be launched in the computing service environment may: be relatively small in size, have a 32-bit architecture, use a hardware virtual machine (HVM), and/or use a defined type of data store. The computing instance may be launched on Physical Host A, Physical Host B, or Physical Host C. Physical Host A may be 80% occupied (i.e., 80% of the computing resources for Physical Host A are currently being used) and may currently be launching ten other computing instances. Physical Host B may be 50% occupied and may currently be launching six other computing instances. Physical Host C may be 20% occupied and may currently be launching two other computing instances. The launch time prediction model may receive the identified features and determine that the estimated launch time for launching the computing instance on Physical Host A is 70 seconds, the estimated launch time for launching the computing instance on Physical Host B is 40 seconds, and the estimated launch time for launching the computing instance on Physical Host C is 15 seconds. Therefore, the estimated launch times may be considered when determining which physical host is to provide placement for the computing instance.
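As a non-limiting illustration of this prediction step, the sketch below stands in for a trained launch time prediction model. The feature names and coefficient values are assumptions chosen only to mirror the ordering in the example above (Physical Host C fastest, Physical Host A slowest); they are not values defined by the disclosure.

```python
def predict_launch_time(instance_features, host_features):
    """Toy stand-in for the launch time prediction model."""
    # Assumed base cost by instance size; a real model would be trained
    # on historical launch metrics rather than hand-picked coefficients.
    base = 5.0 if instance_features["size"] == "small" else 20.0
    # Higher occupancy and more simultaneous launches slow the launch.
    return (base
            + 0.5 * host_features["percent_occupied"]
            + 3.0 * host_features["pending_launches"])

# Instance features from the example: small, 32-bit, HVM.
instance = {"size": "small", "architecture": "32-bit",
            "virtualization": "hvm"}

# Physical host features from the example (occupancy, in-flight launches).
hosts = {
    "A": {"percent_occupied": 80, "pending_launches": 10},
    "B": {"percent_occupied": 50, "pending_launches": 6},
    "C": {"percent_occupied": 20, "pending_launches": 2},
}

estimates = {name: predict_launch_time(instance, features)
             for name, features in hosts.items()}
```

Under these assumed coefficients, Physical Host C yields the lowest estimate, consistent with the example.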
[0091] In the example above, the physical host that can offer a reduced launch time, as compared to the other physical hosts, may be selected for placement of the computing instance. Accordingly, Physical Host C may be selected to launch the computing instance because Physical Host C may provide a reduced launch time as compared to Physical Host A and Physical Host B.
[0092] In an alternative configuration, the physical host selected for placement of the computing instance may have the fewest number of computing instances that are simultaneously being launched as compared to other physical hosts in the computing service environment, because the number of simultaneous launches on the physical host may increase the launch time for launching the computing instance. As a non-limiting example, at the time when the placement decision is being performed, Physical Host A may be launching ten computing instances, Physical Host B may be launching two computing instances, and Physical Host C may be launching a hundred computing instances. Therefore, Physical Host B may be selected to launch the computing instance because Physical Host B may be inferred to provide the lowest launch time as compared to Physical Host A and Physical Host C.
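This alternative heuristic reduces to selecting the minimum over the per-host launch counts. A minimal sketch, using the counts from the non-limiting example above:

```python
# Number of computing instances each physical host is currently
# launching, taken from the example above.
concurrent_launches = {"A": 10, "B": 2, "C": 100}

# Select the physical host with the fewest simultaneous launches,
# which is inferred to provide the lowest launch time.
selected = min(concurrent_launches, key=concurrent_launches.get)
```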
[0093] The estimated launch time for the computing instance may be one of numerous placement factors used when determining placement for the computing instance. For example, other factors related to placement of the computing instance may include physical host utilization, licensing costs, disaster impact, etc. The placement factors, including the estimated launch time, may each be assigned a weighting value related to an importance level of the placement factor. For example, the estimated launch time may account for 50% of the placement decision, the physical host utilization may account for 20% of the placement decision, the licensing costs may account for 15% of the placement decision, and the disaster impact may account for 15% of the placement decision.

[0094] FIG. 7 illustrates a computing device 710 on which a high level example of the modules of this technology may execute. The computing device 710 may include one or more processors 712 that are in communication with a plurality of memory devices 720. The computing device 710 may include a local communication interface 718 for the components in the computing device. For example, the local communication interface 718 may be a local data bus and/or any related address or control busses as may be desired.
[0095] A memory device 720 may contain modules 724 that are executable by the processor(s) 712 and data for the modules 724. For example, a memory device 720 may contain a training module and a launch feature module. The modules 724 may execute the functions described earlier. A data store 722 may also be located in the memory device 720 for storing data related to the modules 724 and other applications along with an operating system that is executable by the processor(s) 712.
[0096] Other applications may also be stored in the memory device 720 and may be executable by the processor(s) 712. Components or modules discussed in this description may be implemented in the form of software using high-level programming languages that are compiled, interpreted, or executed using a hybrid of these methods.
[0097] The computing device may also have access to I/O (input/output) devices 714 that are usable by the computing device. Networking devices 716 and similar communication devices may be included in the computing device. The networking devices 716 may be wired or wireless networking devices that connect to the internet, a LAN, WAN, or other computing network.
[0098] The components or modules that are shown as being stored in the memory device 720 may be executed by the processor(s) 712. The term "executable" may mean a program file that is in a form that may be executed by a processor 712. For example, a program in a higher level language may be compiled into machine code in a format that may be loaded into a random access portion of the memory device 720 and executed by the processor 712, or source code may be loaded by another executable program and interpreted to generate instructions in a random access portion of the memory to be executed by a processor. The executable program may be stored in any portion or component of the memory device 720. For example, the memory device 720 may be random access memory (RAM), read only memory (ROM), flash memory, a solid state drive, memory card, a hard drive, optical disk, floppy disk, magnetic tape, or any other memory components.
[0099] The processor 712 may represent multiple processors and the memory 720 may represent multiple memory units that operate in parallel with the processing circuits. This may provide parallel processing channels for the processes and data in the system. The local interface 718 may be used as a network to facilitate communication between any of the multiple processors and multiple memories. The local interface 718 may use additional systems designed for coordinating communication, such as load balancing, bulk data transfer and similar systems.
[00100] FIG. 8 illustrates components of an example computing service environment 800 according to one example of the present technology. The computing service environment 800 may include a server computer 810 in communication with a number of client devices 860 via a network 850, and the server computer 810 may be part of the control plane for the computing service environment 800. The server computer 810 may contain a data store 830 and a number of modules used to determine a placement of a computing instance. In addition, the computing service environment 800 may include a number of server computers 840a-c executing a plurality of computing instances.
[00101] The server computers 840a-c may have available computing slots 842a-c (e.g., idle computing resources) that may be used to execute a computing instance. The available computing slots 842a-c may be allocated to customers who may then utilize an available computing slot 842a-c to execute a computing instance. Examples of computing instances may include on-demand computing instances, reserved computing instances and interruptible computing instances. An on-demand computing instance may be a computing instance that a customer may purchase and execute upon request. A reserved computing instance may be a reservation for a computing instance that a customer may purchase for a defined period of time, making the computing instance available when the customer requests the computing instance. An interruptible computing instance may be a computing instance that may be executed in a computing slot 842a-c not being used by another computing instance type, unless the customer's bid price falls below the current price for the interruptible computing instance.
[00102] The data stored in the data store 830 may include instance features 832.
The instance features 832 may be associated with the computing instance to be launched in the computing service environment 800. In addition, the instance features 832 may be associated with the computing instance image from which the computing instance is launched in the computing service environment 800. The instance features 832 may describe or characterize the computing instance that is to be launched in the computing service environment 800. For example, the instance features may have numerical values or other scalar values.
[00103] The data stored in the data store 830 may include physical host features
834. The physical host features 834 may be associated with the plurality of physical hosts in the computing service environment 800. The physical host features 834 may describe or characterize the physical hosts in the computing service environment 800 that potentially may launch the computing instance. For example, the physical host features 834 may have numerical values or other scalar values.
[00104] The data stored in the data store 830 may include estimated launch times 836. The estimated launch times 836 may be for a plurality of computing instances to be launched in the computing service environment 800. The estimated launch times 836 may indicate, for a given computing instance, an estimated launch time to launch the computing instance on each physical host (or server computer) in a plurality of physical hosts in the computing service environment 800. The estimated launch times 836 may be stored in the data store 830 for quality control purposes, record keeping or other uses. The estimated launch times 836 may be determined using a launch time prediction model. In one example, the launch time prediction model may use the instance features 832 associated with the computing instance and the physical host features 834 associated with the plurality of physical hosts when determining the estimated launch times 836. As a non-limiting example, the estimated launch times 836 for launching a computing instance on three distinct physical hosts may be 10 seconds, 50 seconds, and two minutes, respectively.
[00105] The server computer 810 may include a computing instance request module 822, an estimated launch time prediction module 824, a physical host selection module 826, and other applications, services, processes, systems, engines, or functionality not discussed in detail herein. The computing instance request module 822 may be configured to receive a request to launch one or more computing instances in the computing service environment 800. The request may be received from a customer who desires computing services from the computing service environment 800. The request may include a number of computing instances to be launched and a type or size of the computing instances to be launched. In one example, the request may specify a particular geographical region or zone to launch the computing instances.
[00106] The estimated launch time prediction module 824 may be configured to receive or identify instance features associated with the computing instances in the request and physical host features associated with the plurality of physical hosts in the computing service environment 800. The instance features may include a size of the computing instance, a machine image used to launch the computing instance, an architecture type of the computing instance, a virtualization type of the computing instance, a type of data store used by the computing instance, etc. In addition, the physical host features may include, for each physical host in the computing service environment 800 or for a defined group of physical hosts in the computing service environment 800, a maximum number of computing instances that can be hosted on the physical host, a hardware type, a hardware vendor, a percentage of occupancy, a geographical zone in which the physical host is located, a number of currently pending or running instances on the physical host, etc.
[00107] The estimated launch time prediction module 824 may identify the estimated launch times for launching the computing instance in the computing service environment 800. The estimated launch time prediction module 824 may identify the estimated launch times using a machine learning model. The machine learning model may predict the estimated launch times given the instance features and the physical host features. The instance features and the physical host features may affect the estimated launch times for the computing instance. For example, certain instance features and/or physical host features (e.g., a number of simultaneous computing instance launches on the physical host, a size of the computing instance) may increase the estimated launch times for the computing instance, whereas other instance features and/or physical host features may decrease the estimated launch times for the computing instance. In one example, the machine learning model may be a regression model that uses historical launch time information for a plurality of previously launched computing instances to predict the estimated launch time for the computing instance to be launched in the computing service environment 800.
[00108] The physical host selection module 826 may be configured to select a physical host from a group of physical hosts in the computing service environment 800 that can provide placement of the computing instance. The physical host selection module 826 may select the physical host based on the estimated launch times for the computing instance. In one example, the physical host selection module 826 may select the physical host that can provide placement of the computing instance at a reduced estimated launch time or a lowest estimated launch time as compared to other physical hosts in the group of physical hosts. In addition, the physical host selection module 826 may use additional placement factors when determining placement for the computing instance. The additional factors may include, but are not limited to, physical host utilization, licensing costs, and a disaster impact. The estimated launch times and the additional placement factors may each be assigned a weighting value that correlates to an importance level of the placement factor when determining placement for the computing instance. The computing instance may be loaded and executed on the physical host upon placement in order to provide computing services to the customer.
[00109] FIG. 9 illustrates an exemplary system and related operations for placing a computing instance on a physical host in a computing service environment 900. The computing instance may be launched in order to provide computing services upon placement onto the physical host. A computing instance request 910 to launch the computing instance may be received at the computing service environment 900. For example, a customer may perform the computing instance request 910 in order to obtain the computing services from the computing service environment 900. The physical host on which the computing instance is placed may be selected according to a predetermined objective. The predetermined objective may be defined by the customer and/or the computing service environment 900. In one example, the predetermined objective may include placing the computing instance on a physical host that can provide a fastest launch time as compared to other physical hosts in the computing service environment 900.
[00110] Instance features 915 associated with the soon-to-be-launched computing instance included in the computing instance request 910 and physical host features 920 for a plurality of physical hosts in the computing service environment 900 may be identified. For example, the physical hosts 950-960 may be directly queried for data corresponding to the physical host features 920. The instance features 915 and the physical host features 920 may describe the computing instance and the physical hosts 950-960 in the computing service environment 900, respectively. The identification of the instance features 915 and the physical host features 920 may enable placement of the computing instance onto the physical host. More specifically, the physical host that is selected for placement may depend on the instance features 915 and the physical host features 920. As previously described, the instance features 915 may include a size of the computing instance, a machine image used by the computing instance, an architecture type of the computing instance, a virtualization type of the computing instance, a type of data store used by the computing instance, etc. In addition, the physical host features 920 may include, for each physical host in the computing service environment 900 or for a defined group of physical hosts in the computing service environment 900, a maximum number of computing instances that can be hosted on the physical host, a hardware type, a hardware vendor, a percentage of occupancy, a geographical zone in which the physical host is located, a number of currently pending or running instances on the physical host, etc.
[00111] The instance features 915 and the physical host features 920 may be provided to a machine learning model 930. The machine learning model 930 may be a regression model that predicts an estimated launch time for launching the computing instance on a given physical host based on the instance features 915 and the physical host features 920. The machine learning model 930 may have been trained using historical information for previously launched computing instances (e.g., types of computing instances that were previously launched, launch times for the computing instances, the number of simultaneous launches on the physical hosts that launched the computing instances, etc.) in order to predict the estimated launch time for the computing instance.
[00112] In one example, the machine learning model 930 may predict the estimated launch times for launching the computing instance on each of the available physical hosts in the computing service environment 900. Alternatively, the machine learning model 930 may predict the estimated launch times for launching the computing instance on individual physical hosts in a defined group of physical hosts. For example, the machine learning model 930 may predict the estimated launch time for launching the computing instance on a physical host 950, as well as the estimated launch time for launching the computing instance on a physical host 960.
[00113] The machine learning model 930 may provide the estimated launch times to a placement module 940. The placement module 940 may use the estimated launch times when performing a placement decision for the computing instance (i.e., which physical host is to host or launch the computing instance). In addition, the placement module 940 may use additional placement factors 935 when performing the placement decision. The additional placement factors 935 may include, but are not limited to, a physical host utilization placement factor, a licensing cost placement factor and a disaster impact placement factor. The physical host utilization placement factor may represent a predetermined objective of maximizing physical host utilization among the physical hosts included in the computing service environment 900. The licensing cost placement factor may represent a predetermined objective of minimizing software licensing costs associated with the placement of computing instances on the physical hosts included in the computing service environment 900. The disaster impact placement factor may represent a predetermined objective of minimizing an impact of computing service failure (e.g., physical host failure, rack failure, availability zone failure or hardware failure) on a customer's executing computing instances.
[00114] The estimated launch times determined by the machine learning model 930, as well as the additional placement factors 935, may each be assigned a weighting value indicating a respective importance of the placement factor when determining placement of the computing instance. In other words, the weighting value may be assigned to each of the placement factors according to how the computing service environment 900 may be affected by the placement of a computing instance. For example, where maintaining high physical host utilization is desired, the physical host utilization placement factor may receive a relatively high weighting value. In a case where optimizing the software licensing costs is of lesser value to the computing service environment, the licensing cost placement factor may receive a lower weighting value as compared to the weighting value assigned to the utilization placement factor. In a case where placement of the computing instance in the overall computing service environment 900 may currently have a negative effect on a number of customers impacted due to a system failure, a weighting value assigned to the disaster impact placement factor may be a relatively high value. As a non-limiting example, the estimated launch times for the computing instance may be weighted 50% when the placement module 940 performs the placement decision, the physical host utilization placement factor may be weighted 20%, the licensing cost placement factor may be weighted 15%, and the disaster impact placement factor may be weighted 15%.
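The weighted combination described above may be sketched as follows. The per-factor scores below are hypothetical normalized values (lower is better) invented for illustration; the weights follow the 50/20/15/15 split of the non-limiting example.

```python
# Illustrative weighting values; assumed to sum to one.
WEIGHTS = {"launch_time": 0.50, "utilization": 0.20,
           "licensing": 0.15, "disaster": 0.15}

def placement_score(factor_scores):
    """Combine normalized factor scores (lower is better) into one score."""
    return sum(WEIGHTS[factor] * score
               for factor, score in factor_scores.items())

# Hypothetical normalized scores for two candidate physical hosts.
host_950 = {"launch_time": 0.28, "utilization": 0.40,
            "licensing": 0.20, "disaster": 0.10}
host_960 = {"launch_time": 0.30, "utilization": 0.20,
            "licensing": 0.20, "disaster": 0.10}

scores = {"950": placement_score(host_950),
          "960": placement_score(host_960)}
chosen = min(scores, key=scores.get)
```

Note that although host 950 has the better launch-time score in this sketch, the utilization factor shifts the overall decision to host 960, illustrating that the estimated launch time is only one weighted factor among several.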
[00115] The placement module 940 may receive the estimated launch times for launching the computing instance from the machine learning model 930, as well as the additional placement factors 935. The placement module 940 may determine which physical host is to receive the computing instance in order to comply with the predetermined objectives of the computing service environment 900. In one example, the placement module 940 may select the physical host for placement that provides a reduced launch time or a lowest launch time for the computing instance.
[00116] As a non-limiting example, the placement module 940 may determine that the physical host 950 can launch the computing instance in 28 seconds. In addition, the placement module 940 may determine that the physical host 960 can launch the computing instance in 30 seconds. The placement module 940 may select the physical host 950 for placement of the computing instance because the physical host 950 provides a lower launch time.
[00117] In one configuration, the physical host to provide placement of the computing instance may be selected using placement constraints included in the computing instance request 910. In one example, the customer requesting the computing instance to be launched may provide the placement constraints. The placement constraints may indicate whether or not the computing instance request 910 is a plan for launching a cluster of computing instances. The placement constraints may indicate specific types of hardware, operating systems or networking types to be used when launching the computing instance. In addition, the placement constraints may indicate whether the computing instances are to be launched in a group of physical hosts that are relatively close to each other, as opposed to a group of physical hosts that are relatively spread apart.
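Applying such placement constraints can be sketched as a filter over the candidate hosts before the launch-time scoring is applied. The host records and constraint keys below are hypothetical illustrations, not fields defined by the disclosure.

```python
def satisfies(host, constraints):
    """A host is eligible when it matches every requested constraint."""
    return all(host.get(key) == value
               for key, value in constraints.items())

# Hypothetical candidate hosts and their attributes.
hosts = [
    {"name": "host-950", "hardware": "type-1", "network": "10g"},
    {"name": "host-960", "hardware": "type-2", "network": "10g"},
]

# Example customer constraint: a specific hardware type.
constraints = {"hardware": "type-1"}

eligible = [h["name"] for h in hosts if satisfies(h, constraints)]
```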
[00118] FIG. 10 illustrates an exemplary system and related operations for using an estimated attachment time when determining placement for a computing instance in a computing service environment 1000. A request to launch the computing instance may be received (e.g., from a customer). In one example, the request may include an attachment request 1010. The attachment request 1010 may be to attach a network interface and/or network storage to the computing instance when the computing instance is launched. Since a number of attachments and/or the size of the attachments used when launching the computing instance may affect a launch time of the computing instance, attachment times may be considered when performing a placement decision for the computing instance.

[00119] In one example, a machine learning model 1030 may predict estimated attachment times, i.e., an amount of time to perform the attachment. The machine learning model 1030 may use the attachment request 1010, as well as attachment features 1020 associated with the attachment request 1010, when predicting the estimated attachment times. The attachment features may include, but are not limited to, a number of attachments included in the attachment request 1010, a size of the attachments, whether the attachment relates to data storage or a network interface, etc. In one example, the machine learning model 1030 may be a regression model that predicts the estimated attachment times using historical information relating to past attachment requests.
[00120] The machine learning model 1030 may provide the estimated attachment times to a placement module 1040. The placement module 1040 may select a physical host for placement of the computing instance based on the estimated attachment times. For example, the placement module 1040 may place the computing instance on one of a physical host 1050, a physical host 1060, or a physical host 1070 depending on the estimated attachment times. In one example, the placement module 1040 may select the physical host that can provide placement of the computing instance with a reduced estimated attachment time.
[00121] In another configuration, the customer may request an ad hoc attachment
(i.e., a request for additional storage after the computing instance has been launched). The machine learning model 1030 may predict an estimated amount of time to obtain the additional storage based on characteristics of the ad hoc attachment request, such as a size of the additional storage in the request, etc. In other words, the machine learning model 1030 may determine the estimated amount of time to obtain the additional storage based on past additional storage requests. In one example, the estimated amount of time to provide the additional storage may be provided to the customer via a user interface.
[00122] FIG. 11 illustrates an exemplary system and related operations for placing a computing instance on a physical host (e.g., server) in a physical or geographical area that is selected from at least one of a plurality of topology layers 1150 in a computing service environment 1100. A computing instance request 1110 to launch the computing instance may be received from a customer. Features 1120 associated with the computing instance and physical hosts in the computing service environment 1100 may be provided to a machine learning model 1130. The machine learning model 1130 may determine estimated launch times for launching the computing instance on physical hosts in individual areas within varying topology layers 1150, such as a physical host in a particular geographical region, zone, data center, data rack, physical host, computing slot, etc. In one example, the geographical region may include a plurality of zones, each zone may include a plurality of data centers, each data center may include a plurality of data racks, each data rack may include a plurality of physical hosts, and each physical host may include a plurality of computing slots. The machine learning model 1130 may determine whether placing the computing instance on a physical host in particular topology layers 1150 could result in an improved launch time. For example, the machine learning model 1130 may indicate that placing the computing instance on a physical host in a first data center in a particular zone may result in a faster launch time as compared to placing the computing instance in a second data center in the particular zone. The machine learning model 1130 may communicate the estimated launch times for the topology layers 1150 to a placement module 1140. The placement module 1140 may use the estimated launch times when determining placement for the computing instance (i.e., choosing which topology layer 1150 is to host the computing instance).
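Walking the topology layers for the lowest estimated launch time can be sketched as a search over a nested mapping. The zone and data center names and the launch times below are assumptions for illustration only.

```python
# Hypothetical topology: zones containing data centers, each mapped to
# an estimated launch time in seconds for the requested instance.
topology = {
    "zone-1": {"data-center-1": 25.0, "data-center-2": 40.0},
    "zone-2": {"data-center-3": 30.0},
}

# Flatten the layers and pick the area with the lowest estimate.
best_zone, best_dc, best_time = min(
    ((zone, dc, t)
     for zone, centers in topology.items()
     for dc, t in centers.items()),
    key=lambda entry: entry[2],
)
```

A real system would extend the nesting through data racks, physical hosts, and computing slots in the same manner.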
[00123] FIG. 12 is an exemplary block diagram 1200 that illustrates generating a machine learning model 1250 to predict launch times for computing instances that are launched in a computing service environment. The machine learning model 1250 may be created using actual launch time input data 1210. The actual launch time input data 1210 may include information (e.g., launch metrics) for a plurality of computing instances that have been previously launched in the computing service environment. Thus, the actual launch time input data 1210 may include historical information relating to previously launched computing instances in the computing service environment. In addition, the actual launch time input data 1210 may include historical information for a plurality of physical hosts in the computing service environment. The actual launch time input data 1210 may be transformed to be used to train the machine learning model 1250, as discussed later.
[00124] As a non-limiting example, the actual launch time input data 1210 may indicate that Computing Instance A took 60 seconds to launch, while Computing Instance A was relatively large in size, used a first type of data store, used a 32-bit architecture, and launched on a physical host that was simultaneously launching five other computing instances. As another non-limiting example, the actual launch time input data 1210 may indicate that Computing Instance B took 15 seconds to launch, while Computing Instance B was relatively small in size, used a second type of data store, used a 64-bit architecture, and launched on a physical host that was not simultaneously launching any other computing instances.
[00125] The actual launch time input data 1210 may be provided to a feature selection and normalization module 1220. The feature selection and normalization module 1220 may convert the actual launch time input data 1210 into model features. In other words, the model features may relate to characteristics of the computing instances that were previously launched and characteristics of the physical hosts upon which the computing instances were previously launched. The model features may be classified as instance features and physical host features.
[00126] The instance features may include, but are not limited to, sizes of the computing instances, machine images used by the computing instances (e.g., machine images or kernel images), architecture types of the computing instances (e.g., 32-bit architectures or 64-bit architectures), virtualization types of the computing instances (e.g., paravirtualization or hardware virtual machines), and types of data stores used by the computing instances. The instance features may include user-controlled features, such as types of operating systems (OS) and networking types (e.g., virtual private cloud) used for launching the computing instances.
[00127] The physical host features may include, but are not limited to, a maximum number of computing instances the physical host can host, a hardware type associated with the physical host, a hardware vendor associated with the physical host, a percentage occupancy at the physical host when the computing instance is to be launched, and a zone in which the physical host is located. The physical host features may include an average, minimum and maximum number of pending computing instances and/or running computing instances on the physical host that launches the computing instance (i.e., a target physical host). In addition, the physical host features may include a number of computing instances that are currently in a pending state and/or a running state on the physical host that launches the computing instance (i.e., the target physical host).
[00128] The feature selection and normalization module 1220 may normalize the model features (i.e., adjust values measured on different scales to a notionally common scale) in order to create launch time prediction training data 1230. The launch time prediction training data 1230 may represent aggregated features for the plurality of computing instances identified in the actual launch time input data 1210. The launch time prediction training data 1230 may be provided to a machine learning selection module 1240. The machine learning selection module 1240 may use the launch time prediction training data 1230 to train various machine learning models 1242. For example, regression models may be trained. The regression models 1242 may include, but are not limited to, support vector machines, stochastic gradient descent, adaptive boosting, extra trees, and random forests. The various regression models 1242 may fit the launch time prediction training data 1230 with varying levels of success. In one example, a random forest regressor may provide a relatively high rate of accuracy with respect to the launch time prediction training data 1230, and therefore, the machine learning selection module 1240 may use the random forest regressor when estimating launch times for computing instances.
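The normalization step described above (adjusting values measured on different scales to a notionally common scale) can be sketched as min-max scaling applied per feature column; this is a minimal illustration, not the module's actual implementation:

```python
def normalize_columns(rows):
    """Min-max scale each feature column to [0, 1] so that features
    measured on different scales (counts, percentages, sizes) become
    comparable before model training."""
    cols = list(zip(*rows))              # transpose rows -> columns
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = hi - lo or 1.0            # avoid divide-by-zero for constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled_cols)]  # transpose back
```

A column of occupancy percentages and a column of instance counts, for example, would both land in [0, 1] after scaling.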
[00129] The machine learning model 1250 may receive a request to launch a computing instance, and based on the instance features and the physical host features associated with the computing instance, the machine learning model 1250 may predict the launch time for the computing instance. In one example, the machine learning model 1250 may determine that the number of simultaneous computing instance launches on the same physical host, the type of data store used by the computing instance, the architecture type used by the computing instance and the computing instance image associated with the computing instance may have a greater effect on the launch time for the computing instance as compared to the other model features.
[00130] In some cases, the predicted launch time for the computing instance may diverge from an actual launch time of the computing instance. The instance features and the physical host features associated with the computing instance, as well as the actual launch time to launch the computing instance, may be used to further train the machine learning model 1250 in order to improve future launch time predictions.
[00131] FIG. 13 is a flow diagram illustrating an example method for determining computing instance placement within a computing service environment. A request to launch a computing instance in a computing service environment may be received, as in block 1310. The request to launch the computing instance may be received from a customer who desires computing services from the computing service environment.
[00132] Instance features associated with the computing instance and physical host features associated with a group of physical hosts in the computing service environment may be provided to a machine learning model, as in block 1320. The instance features may describe or characterize the computing instance to be launched according to the request. The physical host features may describe or characterize each physical host in the computing service environment at a given time (i.e., when the computing instance is to be launched in accordance with the request).
[00133] Estimated launch times for launching the computing instance on individual physical hosts in the computing service environment may be determined using the machine learning model, as in block 1330. The machine learning model may predict the estimated launch times given the instance features and the physical host features. In one example, the machine learning model may be a regression model that uses historical launch time information for a plurality of previously launched computing instances to predict the estimated launch time for the computing instance to be launched in the computing service environment.
[00134] A physical host from the group of physical hosts may be selected to provide placement of the computing instance according to a lower estimated launch time as compared to other physical hosts in the group of physical hosts, as in block 1340. In addition, the physical host to provide placement of the computing instance may be selected using placement constraints included in the request to launch the computing instance. The estimated launch times may be one of a plurality of placement factors used when selecting the physical host for placement of the computing instance. In one example, a weighting value may be assigned to the estimated launch times for the computing instance when the placement is determined using the plurality of placement factors, and the physical host that is to provide the placement of the computing instance may be selected based in part on the weighting value assigned to the estimated launch times.
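The weighting of estimated launch times among a plurality of placement factors, as described above, might be sketched as a weighted score minimized across candidate hosts; the factor names and weight values below are hypothetical:

```python
def placement_score(host, weights):
    """Combine placement factors into a single score; lower is better.
    The estimated launch time is one factor among several, with its
    influence controlled by an assigned weighting value."""
    return (
        weights["launch_time"] * host["estimated_launch_time"]
        + weights["utilization"] * host["utilization"]
        + weights["licensing"] * host["licensing_cost"]
    )

def select_host(hosts, weights):
    """Select the physical host with the lowest combined placement score."""
    return min(hosts, key=lambda h: placement_score(h, weights))
```

Raising `weights["launch_time"]` makes the estimated launch time dominate the placement decision; setting it to zero removes launch time from consideration entirely.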
[00135] FIG. 14 is a flow diagram illustrating another example method for determining computing instance placement within a computing service environment. A request to launch a computing instance in a computing service environment may be received, as in block 1410. The request to launch the computing instance may be received from a customer requesting computing services from the computing service environment.
[00136] Estimated launch times for launching the computing instance on physical hosts in a group of physical hosts may be identified, as in block 1420. The estimated launch times may include a period of time between receiving a computing instance launch request from the customer and booting the computing instance on the physical host. The estimated launch times for launching the computing instance may be identified using a regression model that predicts launch times for computing instances. The estimated launch times for launching the computing instance may be identified based on instance features associated with the computing instance and physical host features associated with the physical hosts in the group of physical hosts, wherein the instance features associated with the computing instance include user-selected features.
[00137] A physical host in the group of physical hosts may be selected to provide placement of the computing instance based in part on the estimated launch times for the computing instance and may optionally include additional factors related to placement of the computing instance, as in block 1430. The additional factors related to placement of the computing instance may include a physical host utilization placement factor, a licensing cost placement factor and a disaster impact placement factor. The computing instance may be loaded on the physical host in order to provide a computing service.
[00138] In one configuration, the estimated launch times for the computing instance on each physical host may be compared with the estimated launch times for the other physical hosts in the group of physical hosts. The physical host in the group of physical hosts that can provide placement of the computing instance with a lower estimated launch time as compared with the other physical hosts in the group of physical hosts may be selected. Alternatively, the physical host that includes a lower number of computing instances being simultaneously launched as compared to other physical hosts in the group of physical hosts may be selected. In one example, the physical host selected to execute the computing instance may be verified to not be simultaneously executing a number of computing instances that exceeds a predetermined threshold.
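The alternative selection and threshold verification described above might be sketched as filtering out hosts whose simultaneous launches exceed a predetermined threshold and then preferring hosts with fewer simultaneous launches; the threshold value and field names are hypothetical:

```python
MAX_SIMULTANEOUS_LAUNCHES = 10  # illustrative predetermined threshold

def eligible_hosts(hosts):
    """Drop hosts already launching more instances than the threshold,
    then rank the remainder by how few simultaneous launches they have."""
    ok = [h for h in hosts
          if h["simultaneous_launches"] <= MAX_SIMULTANEOUS_LAUNCHES]
    return sorted(ok, key=lambda h: h["simultaneous_launches"])
```

The first entry of the returned list is the host with the lowest number of simultaneous launches that also passes the threshold check.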
[00139] In another configuration, a region or a zone may be selected for placement of the computing instance based in part on the estimated launch times for computing instances in the region or zone. In one example, a weighting value may be assigned to the estimated launch times for the computing instance when the placement of the computing instance is determined using a plurality of placement factors, and the physical host for placement of the computing instance may be selected based in part on the weighting value assigned to the estimated launch times. In addition, an estimated amount of time to launch an attachment associated with the computing instance may be identified, and the physical host that can provide placement of the computing instance with a lower estimated attachment time as compared to other physical hosts in the group of physical hosts may be selected.
[00140] Technology is described for using launch time predictions to organize caching of machine images in a computing service environment. A machine image may provide information for launching a computing instance in the computing service environment (i.e., the computing instance may be launched from the machine image). For example, the machine image may indicate a type of data store used for launching the computing instance, launch permissions, etc. The machine image may be cached or stored on a physical host, also known as a server computer, in the computing service
environment in order to reduce a launch time for launching the computing instance. In other words, caching the machine image local to the physical host that is launching the associated computing instance may provide a relatively faster launch time for the computing instance as compared to retrieving the machine image from a data store over a network. The term "launch time" generally refers to a period of time between receiving a request to launch the computing instance and booting the machine image associated with the computing instance onto a physical host that is selected to launch the computing instance.
[00141] In one configuration, expected traffic patterns for the computing service environment may be identified. The expected traffic patterns may indicate particular computing instances that are likely to be launched in the computing service environment during a defined time period (and possibly a defined geography). For example, the expected traffic patterns may indicate that computing instance A is likely to be launched at 8:30 AM on Tuesday. In one example, the expected traffic patterns may be identified using heuristic rules relating to past traffic patterns in the computing service environment. In another example, the expected traffic patterns may be identified using a machine learning model that uses historical traffic information for the computing service environment in order to predict expected traffic patterns for the computing service environment.
[00142] The expected traffic patterns for the computing service environment (e.g., features for the computing instances expected to be launched during the defined time period) may be provided to a launch time prediction model. The launch time prediction model may determine whether preemptively caching the machine images associated with the computing instances at predefined locations in the computing service environment may result in reduced estimated launch times for launching the computing instances. In other words, the launch time prediction model may determine whether caching the machine images at particular locations may reduce estimated launch times as compared to not caching the machine images or caching the machine images at other locations that do not improve the estimated launch times. The predefined locations may include particular physical hosts, groups of physical hosts or localized storage locations (e.g., localized network attached storage) in the computing service environment. As an example, the launch time prediction model may be a regression model that uses historical launch time information, including historical computing instance caching information, for a plurality of previously launched computing instances for determining the estimated launch times for the computing instances expected to be launched in the computing service
environment.
[00143] As a non-limiting example, computing instance A may be likely to be launched in the computing service environment according to the expected traffic patterns. Physical host X and physical host Y may be identified as being available to cache the machine image associated with computing instance A. The launch time prediction model may determine that caching the machine image on physical host X may result in a predicted launch time of 60 seconds for computing instance A. In addition, the launch time prediction model may determine that caching the machine image on physical host Y may result in a predicted launch time of 30 seconds for computing instance A. The machine image may be cached in physical host Y with the expectation that the computing instance A may be requested by a customer in the future and that the machine image is likely to launch the fastest when the machine image is cached on physical host Y.
[00144] Thus, the physical host that is predicted to provide a reduced launch time for the computing instance when the machine image associated with the computing instance is cached on the physical host, as determined using the launch time prediction model, may be selected to cache the machine image. The physical host that is selected to cache the machine image may be included in a cache layout. The physical host that is included in the cache layout may be available and/or capable of caching the machine image. In one example, the cache layout may identify a single physical host that is available to cache the machine image. Alternatively, the cache layout may identify a group of physical hosts that are available to cache the machine image. The physical host included in the cache layout may have available computing slots (e.g., computing resources) used to execute the computing instance. In addition, the available computing slots may support a type or size of the machine image.
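The cache placement decision in the example above (physical host X at 60 seconds versus physical host Y at 30 seconds) reduces to choosing the candidate host with the lowest predicted launch time when the image is cached there. A sketch, assuming a hypothetical `predict_launch_time` callable wrapping the trained launch time prediction model:

```python
def choose_cache_host(image_id, candidate_hosts, predict_launch_time):
    """Pick the candidate host whose predicted launch time, with the
    machine image cached locally, is lowest. candidate_hosts is the
    cache layout: hosts available and capable of caching the image."""
    return min(
        candidate_hosts,
        key=lambda host: predict_launch_time(image_id, host, cached=True),
    )
```

With predictions of 60 seconds for host X and 30 seconds for host Y, host Y is selected, matching the example.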
[00145] FIG. 15 is a diagram that illustrates caching machine images to reduce computing instance launch times in a computing service environment 1500. Expected traffic patterns 1510 for launching computing instances may be identified for the computing service environment 1500. The expected traffic patterns 1510 may indicate a computing instance that is likely to be launched in the computing service environment 1500 during a defined time period (and possibly a defined geography). The computing instance may be associated with machine image 1512. In one example, the features of the computing instance and/or the machine image 1512 may be provided to a launch time prediction model 1530. The features may include whether the machine image 1512 is cached, a size of the computing instance, where the machine image may be cached, etc. The launch time prediction model 1530 may determine, based on the features of the computing instance and/or the machine image 1512, that caching the machine image 1512 at predefined locations in the computing service environment 1500 may reduce a launch time for the computing instance. The predefined locations may include certain physical hosts or localized storage locations connected via a high speed network connection, such as network attached storage (NAS) on a server rack or in buildings with servers.
[00146] A cache layout module 1540 may use the launch time prediction model
1530 to determine a cache layout for caching the machine image 1512 in the computing service environment 1500. The cache layout may include a physical host (or physical hosts) in the computing service environment 1500 having a caching slot that is available and/or capable of caching the machine image 1512. In addition, the physical host selected for use in the cache layout may provide a reduced launch time for launching the computing instance when the machine image 1512 associated with the computing instance is cached on the physical host.
[00147] As a non-limiting example, the computing service environment 1500 may include multiple physical hosts 1550, 1560 and 1570. The cache layout module 1540 may use the launch time prediction model 1530 to determine that caching the machine image 1512 on the physical host 1550 may result in an estimated launch time of 180 seconds for the computing instance. In addition, the cache layout module 1540 may determine that caching the machine image 1512 on the physical host 1560 or on the physical host 1570 may result in estimated launch times of 165 seconds and 190 seconds, respectively, for the computing instance. Therefore, the cache layout module 1540 may select the physical host 1560 for caching the machine image 1512 (i.e., the cache layout includes the physical host 1560) in order to achieve a reduced launch time for launching the computing instance as compared to caching the machine image 1512 on the physical host 1550 or the physical host 1570.
[00148] FIG. 16 illustrates components of an example computing service environment 1600 according to one example of the present technology. The computing service environment 1600 may include a server computer 1610 in communication with a number of client devices 1660 via a network 1650, and the server computer may be part of the control plane for the computing service environment 1600. The server computer 1610 may contain a data store 1630 and a number of modules used to determine a cache placement of a machine image. In addition, the computing service environment 1600 may include a number of server computers 1640a-b executing a plurality of computing instances.
[00149] The server computers 1640a-b may have available computing slots
1642a-b (e.g., idle computing resources) that may be used to execute a computing instance. The available computing slots 1642a-b may be allocated to customers who may then utilize an available computing slot 1642a-b to execute a computing instance. In addition, the server computers 1640a-b may have available caching slots 1644a-b that may be used to cache a machine image associated with a computing instance to be executed. Examples of computing instances may include on-demand computing instances, reserved computing instances and interruptible computing instances. An on-demand computing instance may be a computing instance that a customer may purchase and execute upon request. A reserved computing instance may be a reservation for a computing instance that a customer may purchase for a defined period of time, making the computing instance available when the customer requests the computing instance. An interruptible computing instance may be a computing instance that may be executed in a computing slot 1642a-b not being used by another computing instance type unless the price being paid for the interruptible computing instance falls below a current market price.
[00150] The data stored in the data store 1630 may include expected traffic patterns 1632 that are based on historical data. The expected traffic patterns 1632 may identify computing instances that are expected to be launched in the computing service environment 1600 during a defined time period. For example, the expected traffic patterns 1632 may indicate that computing instance Z is likely to be launched on Saturday at 7 PM. In one example, the expected traffic patterns 1632 may be determined based on historical traffic information for the computing service environment 1600. For example, the expected traffic patterns 1632 may indicate that computing instance Z is likely to be launched on Saturday at 7 PM because computing instance Z has been launched at similar times for the past two months. In one configuration, the expected traffic patterns 1632 may be identified using heuristic rules relating to past traffic patterns in the computing service environment 1600. In yet another configuration, a machine learning model may be trained to determine the expected traffic patterns 1632 using historical traffic patterns for the computing service environment 1600.
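The heuristic-rule approach described above (e.g., expecting computing instance Z on Saturday at 7 PM because it has launched at similar times for the past two months) might be sketched as counting recurring (weekday, hour) launch slots in the historical traffic information; the occurrence threshold below is illustrative:

```python
from collections import Counter

def expected_launch_hours(history, min_occurrences=8):
    """Heuristic rule: if launches of an instance recurred at the same
    (weekday, hour) slot at least min_occurrences times (roughly two
    months of weekly launches), expect a launch at that slot again."""
    counts = Counter((weekday, hour) for weekday, hour in history)
    return {slot for slot, n in counts.items() if n >= min_occurrences}
```

A trained machine learning model, as the patent also mentions, could replace this rule while consuming the same historical traffic information.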
[00151] The data stored in the data store 1630 may include cache layouts 1634 for computing instances to be launched in the computing service environment 1600. The cache layouts 1634 may identify physical hosts in the computing service environment 1600 that are selected for caching of machine images because the physical hosts are available and/or capable of caching machine images associated with the computing instances. The cache layouts 1634 may identify a single physical host that is available to cache the machine images or a group of physical hosts that are available to cache the machine images. The physical hosts included in the cache layouts 1634 may have available computing slots (e.g., computing resources used to execute the computing instance) that support defined types or sizes of the machine images.
[00152] In one example, the cache layouts 1634 may identify physical hosts in a particular region or zone for caching the machine images. In addition, the cache layouts 1634 may be modified or updated in response to changes in the computing service environment 1600. For example, when physical hosts that were previously identified in the cache layouts 1634 for caching the machine images become overloaded or full (e.g., resulting in increased launch times), the cache layouts 1634 may be updated to include other physical hosts for caching the machine images that result in reduced launch times for the computing instances. Thus, the cache layouts 1634 may be periodically updated.
[00153] The server computer 1610 may include an expected traffic patterns identification module 1622, an estimated launch time prediction module 1624, a cache layout module 1626, a cache setup module 1628, and other applications, services, processes, systems, engines, or functionality not discussed in detail herein. The expected traffic patterns identification module 1622 may be configured to identify expected traffic patterns in the computing service environment 1600. The expected traffic patterns may indicate a machine image associated with a computing instance that is likely to be launched in the computing service environment during a defined time period. The expected traffic patterns identification module 1622 may identify the expected traffic patterns using a machine learning model that predicts the expected traffic patterns using historical traffic information for the computing service environment 1600. In one example, the expected traffic patterns identification module 1622 may use heuristic rules to identify the expected traffic patterns in the computing service environment 1600.
[00154] The estimated launch time prediction module 1624 may determine that caching the machine image in a predefined location in the computing service environment 1600 may reduce a launch time for the computing instance as compared to not caching the machine image. The predefined location may include a particular physical host, a group of physical hosts or a local storage component relative to the physical hosts in the computing service environment 1600. In one example, the estimated launch time prediction module 1624 may use a launch time prediction model to determine whether or not to cache the machine image at the predefined location. In other words, the launch time prediction model may provide an estimated launch time for launching the computing instance when the machine image associated with the computing instance is cached on a particular physical host.
[00155] The cache layout module 1626 may be configured to determine a cache layout to enable caching of the machine image in the computing service environment 1600. The cache layout may identify physical hosts at the predefined location that are available to cache the machine image in order to reduce the launch time for the computing instance associated with the machine image. The physical hosts indicated in the cache layout may have sufficient resources and capabilities to cache the machine image. In one example, the cache layout module 1626 may select the physical hosts to be included in the cache layout using the launch time prediction model. In other words, a physical host in the cache layout may have been identified by the launch time prediction model as being likely to provide a reduced launch time for the computing instance when the machine image is cached on the physical host. In addition, the cache layout module 1626 may identify various characteristics of the physical hosts (e.g., hardware types, addressing) when selecting the physical hosts to be included in the cache layout. As another example, the cache layout module 1626 may use a genetic technique or particle swarm optimization when selecting the physical hosts to be included in the cache layout. Thus, the cache layout module 1626 may use the genetic technique or particle swarm optimization to select physical hosts that provide reduced launch times when machine images are cached on the physical hosts.
[00156] The cache setup module 1628 may be configured to store the machine image on at least one physical host in the computing service environment 1600 according to the cache layout. In one example, the cache setup module 1628 may store the machine image on a physical host associated with certain topology layers (e.g., a particular region, zone, server rack, and physical host). Accordingly, there may be localized storage that is associated with a specific topology layer. For instance, a caching device (e.g., a NAS) may be provided at the region, zone, or server rack level. By preemptively caching the machine image, the launch time for launching the computing instance may be reduced. In one example, the cache setup module 1628 may send the machine image to the caching location prior to a defined time period when the computing instance is expected to be launched and the machine image may reside in the cache location for a defined time period. When the defined time period has ended, the cache setup module 1628 may clear the machine image from the cache. As a non-limiting example, when a particular computing instance is expected to be launched on Saturday at 8 AM - 9 AM, the machine image associated with the computing instance may be cached on the physical host on Saturday between 6 AM and 10 AM, and thereafter removed from the physical host.
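The caching window behavior described above (e.g., caching from 6 AM to 10 AM for a launch expected between 8 AM and 9 AM) might be sketched as follows; the lead and lag margins are hypothetical values matching the example:

```python
from datetime import datetime, timedelta

def cache_window(expected_start, expected_end,
                 lead=timedelta(hours=2), lag=timedelta(hours=1)):
    """Cache the machine image ahead of the expected launch window and
    keep it for a margin afterwards, then clear it from the cache."""
    return expected_start - lead, expected_end + lag

def should_be_cached(now, window):
    """True while the current time falls inside the caching window."""
    start, end = window
    return start <= now <= end
```

A periodic task could evaluate `should_be_cached` for each expected instance and trigger the cache setup or cache clearing accordingly.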
[00157] FIG. 17 illustrates a system and related operations for caching machine images in a computing service environment 1700 in order to reduce computing instance launch times. A machine image may provide information for launching a computing instance in the computing service environment (i.e., the computing instance may be launched from the machine image). For example, the machine image may indicate a type of data store used for launching the computing instance, launch permissions, etc. The machine image may be stored on at least one physical host in the computing service environment 1700 in order to reduce the computing instance launch time. In other words, locally caching the machine image may provide for a relatively faster computing instance launch time as compared to fetching the machine image from a separate data store over a network.
[00158] Expected traffic patterns 1710 for the computing service environment
1700 may be identified. The expected traffic patterns 1710 may indicate that the computing instance is expected to be launched at a certain time period and/or in a certain geographic location. As a non-limiting example, the expected traffic patterns 1710 may indicate that a computing instance is likely to be launched on Monday at 8 AM. In one configuration, the expected traffic patterns 1710 may be identified using heuristic rules 1712. The heuristic rules 1712 may relate to past traffic patterns for the computing service environment 1700. As an example, the heuristic rules 1712 may be used to infer that the computing instance is likely to be launched on Monday at 8 AM if the computing instance has been launched at similar times in the past. In another configuration, the expected traffic patterns 1710 may be identified using a machine learning model 1714. The machine learning model 1714 may use historical traffic information from the computing service environment 1700 in order to predict the expected traffic patterns 1710 in the computing service environment 1700.
[00159] In one example, instance features related to the computing instance that is expected to be launched during the certain time period may be provided to a launch time prediction model 1730. The instance features may be related to a machine image 1722 used by the computing instance and/or the computing instance to be launched in the computing service environment 1700. For example, the instance features may include a size of the computing instance, an architecture type of the computing instance (e.g., a 32-bit architecture or a 64-bit architecture), a virtualization type of the computing instance (e.g., paravirtualization or hardware virtual machine) and/or a type of data store used by the computing instance.
[00160] The launch time prediction model 1730 may determine, based on the instance features, whether preemptively caching the machine image 1722 at a predefined location in the computing service environment 1700 may reduce the launch time for the computing instance. In addition, the launch time prediction model 1730 may determine that caching the machine image 1722 at a particular location (e.g., a particular physical host) may reduce the launch time as compared to caching the machine image 1722 at another location in the computing service environment 1700. The predefined location may include a particular physical host or a group of physical hosts in the computing service environment 1700.
[00161] The launch time prediction model 1730 may be a machine learning model that is trained by historical launch time information for a plurality of previously launched computing instances and the launch time prediction model 1730 may be used for determining the estimated launch times for the computing instances expected to be launched in the computing service environment 1700. So, the launch time prediction model 1730 may use historical launch time information related to previous launch times when the machine image was cached, previous launch times when the computing instance was not cached, previous launch times when the machine image was cached at particular locations, etc. in training to predict the estimated launch times for the computing instances expected to be launched. In one example, the launch time prediction model 1730 may be a regression model used for predicting the estimated launch times.
[00162] As a non-limiting example, the computing service environment may include physical hosts 1750, 1760. The launch time prediction model 1730 may provide information to help determine whether to cache the machine image 1722 on one of the physical hosts 1750, 1760 in order to reduce the launch time of the computing instance. The launch time prediction model 1730 may determine that caching the machine image 1722 on the physical host 1750 may result in an estimated launch time of 30 seconds for the computing instance. In addition, the launch time prediction model 1730 may determine that caching the machine image 1722 on the physical host 1760 may result in an estimated launch time of 25 seconds. The launch time prediction model 1730 may use various types of information, in addition to whether the machine image 1722 is cached, when predicting the estimated launch times, such as a size of the computing instance, a number of simultaneous launches at the physical hosts, a percentage of occupancy at the physical hosts, etc.
[00163] A cache layout module 1740 may use the prediction information from the launch time prediction model 1730 to determine a cache layout for caching the machine image 1722 in the computing service environment 1700. The cache layout may include at least one physical host that is selected to cache the machine image 1722 in order to reduce the launch time for the computing instance. In order to determine the cache layout, the cache layout module 1740 may identify at least one physical host at the predefined location in the computing service environment 1700 that is available and/or capable of caching the machine image 1722. The cache layout module 1740 may, via the launch time prediction model 1730, compare the estimated launch times for launching the machine image 1722 on each of the available physical hosts to determine on which of the respective physical hosts the machine image 1722 is to be cached. The cache layout module 1740 may compare the estimated launch times for launching the computing instance on the physical hosts and select the physical host that can provide the reduced launch time for the computing instance. In other words, the physical host that can provide the reduced launch time may be included in the cache layout. In one example, dozens or hundreds of copies of the same machine image 1722 may be cached in a group of physical hosts in order to provide the reduced launch time.
[00164] In one configuration, the cache layout module 1740 may use a genetic technique or particle swarm optimization to identify the physical host or the group of physical hosts in the computing service environment 1700 that can cache the machine image 1722 in order to provide the reduced launch time for the computing instance. The machine image 1722 may be stored on the selected physical host and loaded from the physical host when the computing instance is launched in the computing service environment 1700.
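A genetic search over candidate cache layouts, as described above, might be sketched as follows. The fitness function, timing constants, and one-image-per-host encoding are all assumptions for illustration; a production optimizer would use the launch time prediction model's estimates as the fitness signal:

```python
import random

def predicted_launch_time(host, cached):
    """Toy estimate: contention slows the launch; an uncached image must be fetched."""
    base = 20 + 2 * host["simultaneous_launches"]
    return base if cached else base + host["fetch_seconds"]

def layout_fitness(mask, hosts, max_copies):
    """Mean predicted launch time across hosts; infeasible layouts score infinity."""
    if not (1 <= sum(mask) <= max_copies):
        return float("inf")
    times = [predicted_launch_time(h, c) for h, c in zip(hosts, mask)]
    return sum(times) / len(times)

def evolve_cache_layout(hosts, max_copies, generations=30, pop_size=20, seed=0):
    """Tiny genetic search over which hosts should cache the machine image."""
    rng = random.Random(seed)
    n = len(hosts)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: layout_fitness(m, hosts, max_copies))
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)       # single-point crossover
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # occasional bit-flip mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda m: layout_fitness(m, hosts, max_copies))
```

The returned bit mask marks which hosts cache a copy; a particle swarm variant would explore the same search space with velocity updates instead of crossover and mutation.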
[00165] As a non-limiting example, the cache layout module 1740 may determine that the physical hosts 1750, 1760 are available to cache the machine image 1722. The cache layout module 1740 may determine, via the launch time prediction model 1730, that caching the machine image 1722 on the physical hosts 1750, 1760 may result in estimated launch times of 30 seconds and 25 seconds, respectively.
Therefore, the cache layout module 1740 may select the physical host 1760 to be included in the cache layout because caching the machine image 1722 on the physical host 1760 may result in a lower launch time as compared to caching the machine image 1722 on the physical host 1750. Alternatively, the cache layout module 1740 may determine to cache the machine image 1722 on a network attached storage (NAS) device 1770 included in the computing service environment 1700 in order to lower the launch time.
[00166] In one example, the cache layout module 1740 may store the machine image 1722 in an available caching slot of the physical host that is capable of supporting the size of the machine image 1722. For example, the physical host 1750 may include available caching slots that are of a first type. The physical host 1760 may include available caching slots that are of a second type and a third type. The machine image 1722 may be configured in a number of sizes and types. The cache layout module 1740 may verify that the type of caching slot available in the physical host is capable of storing the type of machine image 1722.
[00167] In another example, the machine image 1722 may be stored on physical hosts or physical devices associated with individual areas for varying topology layers. The topology layers may include a particular region, zone, data center, server rack, physical host, caching slot, etc. As an example, the cache layout may include a particular zone or a particular data center in the zone that is to store the machine image 1722, such as the NAS 1770. The topology layers may provide a cache which in turn may provide the reduced launch time for the computing instance.
[00168] In one configuration, a request to launch the computing instance may be received in the computing service environment 1700. For example, the request may be received from a customer who desires computing services from the computing service environment 1700. The machine image 1722 associated with the computing instance may be identified as being in a paused state. In other words, a domain creation process for the machine image 1722 may be complete, but the domain has not been started for launching the computing instance. The computing instance may be launched by loading the machine image 1722 and then switching the machine image 1722 from the paused state to a running state, thereby minimizing the launch time for the computing instance. In one example, the most popular machine images or the most recently used machine images may be stored in the paused state in the computing service environment 1700. Therefore, launching these machine images may be performed with minimal launch times.
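The paused-state fast path described above can be sketched as a toy scheduler. The resume and cold-boot timings, and the image identifiers, are assumed for illustration:

```python
COLD_BOOT_SECONDS = 40   # assumed: full domain creation plus image load
RESUME_SECONDS = 3       # assumed: switching a pre-built domain from paused to running

class ImageCache:
    def __init__(self, paused_images):
        # machine images whose domains were pre-created and left paused
        self.paused = set(paused_images)

    def launch(self, image_id):
        """Return the simulated launch time for the requested image."""
        if image_id in self.paused:
            self.paused.remove(image_id)   # the domain is now running
            return RESUME_SECONDS
        return COLD_BOOT_SECONDS

cache = ImageCache(paused_images={"ami-popular"})
fast = cache.launch("ami-popular")   # resumes the pre-created paused domain
slow = cache.launch("ami-rare")      # falls back to a cold boot
```

Keeping the most popular or most recently used images in this paused pool is what lets those launches skip domain creation entirely.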
[00169] FIG. 18 illustrates an exemplary system and related operations for caching a machine image 1814 in a computing service environment 1800 in order to achieve a desired launch time 1812 for launching a computing instance associated with the machine image 1814. In one example, a customer may provide a launch request 1810 that specifies the desired launch time 1812 for launching the computing instance. For example, the customer may request that the computing instance be launched in less than 45 seconds.
[00170] The launch request 1810 may be provided to a cache layout module 1840.
The cache layout module 1840 may determine a cache layout for caching the machine image 1814 so that the launch time for the computing instance substantially meets the customer's requirements (e.g., 45 seconds). In other words, the cache layout module 1840 may identify physical hosts in the computing service environment 1800 that can cache the machine image 1814 so that the desired launch time 1812 can be achieved.
[00171] In one example, the cache layout module 1840 may use a launch time prediction model 1830 to select which physical host is to cache the machine image 1814. The cache layout module 1840 may determine whether to cache the machine image 1814 on physical host 1850, 1860 or 1870. The cache layout module 1840 may predict an estimated launch time for launching the computing instance when the machine image 1814 is cached on the physical host 1850. The cache layout module 1840 may similarly predict the estimated launch times when the machine image 1814 is cached on the physical host 1860 or cached on the physical host 1870. The cache layout module 1840, based on information from the launch time prediction model 1830, may determine to cache the machine image 1814 on the physical host that can provide a launch time that corresponds to the desired launch time 1812 specified by the customer. As a non-limiting example, the cache layout module 1840 may determine that caching the machine image 1814 on the physical host 1850 may correspond to the desired launch time 1812 (e.g., 45 seconds) for launching the computing instance.
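Host selection against a customer's desired launch time reduces to a filter-then-minimize step, sketched here with hypothetical host identifiers and predicted times:

```python
def pick_host_for_target(hosts, predicted_seconds, target_seconds):
    """Pick the host whose predicted launch time meets the customer's target.

    hosts: list of host ids; predicted_seconds: host id -> model estimate.
    Returns None when no host can meet the target.
    """
    meeting = [h for h in hosts if predicted_seconds[h] <= target_seconds]
    if not meeting:
        return None
    # among qualifying hosts, prefer the fastest predicted launch
    return min(meeting, key=lambda h: predicted_seconds[h])
```

Returning None gives the cache layout module a signal that the layout itself must change (e.g., caching the image on additional hosts) before the target can be met.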
[00172] In another example, the desired launch time 1812 for launching the computing instance may be received from the customer and, in response, the cache layout module 1840 may identify a physical host that has cached the machine image associated with the computing instance to be launched. The cache layout module 1840 may use information from the launch time prediction model 1830 to determine whether launching the computing instance on the physical host that has cached the machine image complies with the desired launch time 1812, and if a predicted launch time complies with the desired launch time 1812, the computing instance may be launched on the physical host.
[00173] FIG. 19 illustrates an exemplary system and related operations for caching machine images in a computing service environment 1900 in accordance with a service level agreement (SLA). A computing instance actual launch time 1910 may be determined for a computing instance that is launched in the computing service environment 1900. The actual launch time for the computing instance may be determined after the computing instance has been successfully launched. The actual launch time for the computing instance may be provided to an SLA comparison module 1920. The SLA comparison module 1920 may compare the actual launch time for the computing instance with the SLA for the computing service environment 1900. In one example, the SLA comparison module 1920 may determine that the actual launch time for the computing instance is in accordance with the SLA for the computing service environment 1900.
[00174] Alternatively, the SLA comparison module 1920 may compare the actual launch time for the computing instance with the SLA for the computing service environment 1900 and determine that the actual launch time for the computing instance is not in accordance with the SLA for the computing service environment 1900. For example, the SLA may specify that the launch time for the computing instance is to be less than 10 minutes. However, the SLA comparison module may determine that the actual launch time was greater than 10 minutes. When the actual launch time is not in accordance with the SLA, the SLA comparison module 1920 may notify a cache layout module 1940. The cache layout module 1940 may determine a cache layout for caching a machine image 1912 associated with the computing instance, such that the launch time for the computing instance aligns with the SLA (e.g., the launch time is reduced). In one example, the cache layout module 1940 may modify an existing cache layout by storing the machine image 1912 on additional physical hosts in the computing service environment 1900. In the example shown in FIG. 19, the cache layout module 1940 may store the machine image 1912 on a physical host 1950 and a physical host 1970, but not on a physical host 1960, in order to reduce the launch time and comply with the SLA for the computing service environment 1900.
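One possible shape for the SLA comparison and re-layout logic, under an assumed one-replica-per-breach policy (the threshold, host names, and policy are illustrative, not from the disclosure):

```python
SLA_LAUNCH_SECONDS = 600   # e.g., an SLA promising launches under 10 minutes

def reconcile_with_sla(actual_seconds, cache_layout, spare_hosts):
    """Widen the cache layout when an observed launch breached the SLA.

    cache_layout: set of hosts currently caching the image;
    spare_hosts: hosts that could additionally cache it.
    Returns the (possibly updated) layout and whether a breach occurred.
    """
    breached = actual_seconds > SLA_LAUNCH_SECONDS
    if breached and spare_hosts:
        # assumed policy: add one more cached replica per breach
        cache_layout = cache_layout | {spare_hosts[0]}
    return cache_layout, breached
```

Feeding each actual launch time through this check is what lets the cache layout converge toward a replica count that keeps launches inside the SLA.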
[00175] FIG. 20 is an exemplary block diagram 2000 that illustrates generating a launch time prediction model 2050 to predict launch times for computing instances that are launched in a computing service environment. The launch time prediction model 2050 may be a machine learning model that is created using actual launch time input data 2010. The actual launch time input data 2010 may include information (e.g., launch metrics) for a plurality of computing instances that have been previously launched in the computing service environment. So, the actual launch time input data 2010 may include historical information relating to previously launched computing instances in the computing service environment. In addition, the actual launch time input data 2010 may include historical information for a plurality of physical hosts in the computing service environment. The actual launch time input data 2010 may be transformed to be used to train the launch time prediction model 2050, as discussed later.
[00176] As a non-limiting example, the actual launch time input data 2010 may indicate that computing instance A took 60 seconds to launch, while computing instance A was relatively large in size, used a first type of data store, used a 32-bit architecture, and launched on a physical host that was simultaneously launching five other computing instances. As another non-limiting example, the actual launch time input data 2010 may indicate that computing instance B took 15 seconds to launch, while computing instance B was relatively small in size, used a second type of data store, used a 64-bit architecture, and launched on a physical host that was not simultaneously launching other computing instances.
[00177] The actual launch time input data 2010 may be provided to a feature selection and normalization module 2020. The feature selection and normalization module 2020 may convert the actual launch time input data 2010 into model features. In other words, the model features may relate to characteristics of the computing instances that were previously launched and characteristics of the physical hosts upon which the computing instances were previously launched. The model features may be classified as instance features and physical host features.
[00178] The instance features may include, but are not limited to, sizes of the computing instances, machine images used by the computing instances (e.g., machine images or kernel images), whether the machine images are cached or not cached in a physical host when the computing instances are launched, architecture types of the computing instances (e.g., 32-bit architectures or 64-bit architectures), virtualization types of the computing instances (e.g., paravirtualization or hardware virtual machines), and types of data stores used by the computing instances. The instance features may include user-controlled features, such as types of operating systems (OS) and networking types (e.g., virtual private cloud) used for launching the computing instances.
[00179] The physical host features may include, but are not limited to, a maximum number of computing instances the physical host can host, a hardware type associated with the physical host, a hardware vendor associated with the physical host, a percentage occupancy at the physical host when the computing instance is to be launched, and a zone in which the physical host is located. The physical host features may include an average, minimum and maximum number of pending computing instances and/or running computing instances on the physical host that launches the computing instance (i.e., a target physical host). In addition, the physical host features may include a number of computing instances that are currently in a pending state and/or a running state on the physical host that launches the computing instance (i.e., the target physical host).
[00180] The feature selection and normalization module 2020 may normalize the model features (i.e., adjust values measured on different scales to a notionally common scale) in order to create launch time prediction training data 2030. The launch time prediction training data 2030 may represent aggregated features for the plurality of computing instances identified in the actual launch time input data 2010. The launch time prediction training data 2030 may be provided to a machine learning selection module 2040. The machine learning selection module 2040 may use the launch time prediction training data 2030 to train various machine learning models 2042. For example, regression models may be trained. The regression models 2042 may include, but are not limited to, support vector machines, stochastic gradient descent, adaptive boosting, extra trees, and random forests. The various regression models 2042 may fit the launch time prediction training data 2030 with varying levels of success. In one example, a random forest regressor may provide a relatively high rate of accuracy with respect to the launch time prediction training data 2030, and therefore, the machine learning selection module 2040 may use the random forest regressor when estimating launch times for computing instances.
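The normalization and model-selection steps can be sketched in miniature. A production system would train library regressors (e.g., a random forest) rather than the toy baseline and nearest-neighbor models used here purely to show the selection loop:

```python
def min_max_normalize(rows):
    """Scale each numeric feature to [0, 1] across the training rows."""
    keys = rows[0].keys()
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: 0.0 if hi[k] == lo[k] else (r[k] - lo[k]) / (hi[k] - lo[k]) for k in keys}
        for r in rows
    ]

def mean_model(train):
    """Baseline: always predict the average observed launch time."""
    avg = sum(t for _, t in train) / len(train)
    return lambda features: avg

def nearest_neighbor_model(train):
    """Predict the launch time of the most similar past launch."""
    def distance(a, b):
        return sum((a[k] - b[k]) ** 2 for k in a)
    def predict(features):
        _, time = min(train, key=lambda ft: distance(ft[0], features))
        return time
    return predict

def select_model(train, holdout):
    """Pick the candidate with the lowest mean absolute error on held-out launches."""
    candidates = {
        "mean": mean_model(train),
        "nearest-neighbor": nearest_neighbor_model(train),
    }
    def mae(model):
        return sum(abs(model(f) - t) for f, t in holdout) / len(holdout)
    return min(candidates, key=lambda name: mae(candidates[name]))
```

The same compare-on-held-out-data loop is how the machine learning selection module 2040 could conclude that, say, a random forest regressor fits the training data best.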
[00181] The launch time prediction model 2050 may receive a request to launch a computing instance, and based on the instance features and the physical host features associated with the computing instance, the launch time prediction model 2050 may predict the launch time for the computing instance. In one example, the launch time prediction model 2050 may determine that the number of simultaneous computing instance launches on the same physical host, the type of data store used by the computing instance, the architecture type used by the computing instance and the machine image associated with the computing instance may have a greater effect on the launch time for the computing instance as compared to the other model features.
[00182] In some cases, the predicted launch time for the computing instance may diverge from an actual launch time of the computing instance. The instance features and the physical host features associated with the computing instance, as well as the actual launch time to launch the computing instance, may be used to further train the launch time prediction model 2050 in order to improve future launch time predictions.
[00183] FIG. 21 is a flow diagram illustrating an example method for reducing computing instance launch times. Expected traffic patterns in a computing service environment may be identified, as in block 2110. The expected traffic patterns may indicate machine images associated with computing instances that are expected to be launched in the computing service environment during a defined time period. In one example, heuristic rules may be used to identify the expected traffic patterns for the computing service environment. The heuristic rules may relate to historical traffic patterns for the computing service environment.

[00184] The machine images may be determined to be cached in a predefined location in the computing service environment in order to reduce launch times for the computing instances as compared to not caching the machine images, as in block 2120. In other words, caching the machine image local to a physical host that is launching the computing instance may provide for a relatively faster launch time as compared to fetching the machine image from a data store over a network. In one example, a launch time prediction model may be used for determining to cache the machine image at the predefined location.
[00185] A cache layout to enable caching of the machine images in the computing service environment may be determined, as in block 2130. The cache layout may identify physical hosts at the predefined location in the computing service environment that are available to cache the machine image. The cache layout may identify a physical host that is available to cache the machine image or a group of physical hosts that are available to each cache a copy of the machine image.
[00186] The machine images may be stored on at least one physical host in the computing environment according to the cache layout, as in block 2140. Storing the machine image on the physical host may reduce the launch time for launching the computing instance. In one example, the machine image may be stored on multiple physical hosts in order to reduce the launch time for multiple computing instances based on the same machine image.
[00187] In one example, a desired launch time for launching the computing instance may be received, and the cache layout may be determined to enable caching of the machine image so an actual launch time for launching the computing instance is substantially similar to the desired launch time. In yet another example, various types of machine images that are to be cached in the computing service environment may be identified, and the cache layout may be determined to include one or more caching slots on the physical host that are capable of caching the various types of machine images.
[00188] FIG. 22 is a flow diagram illustrating another example method for reducing computing instance launch times. A computing instance that is expected to be launched in a computing service environment during a defined time period may be identified, as in block 2210. In one example, heuristic rules may be used to identify the computing instance that is expected to be launched in the computing service environment. The heuristic rules may relate to historical traffic patterns for the computing service environment. In one example, the computing instance that is expected to be launched in the computing service environment during the defined time period may be identified using a machine learning model that predicts expected traffic patterns for the computing service environment.
[00189] A determination may be made to cache the machine image in the computing service environment in order to reduce a launch time for launching the computing instance as compared to not caching the machine image, as in block 2220. In one example, a launch time prediction model may be used for determining to cache the machine image. The launch time prediction model may be a regression model that predicts the launch time for the computing instance based in part on instance features associated with the computing instance and physical host features associated with a group of physical hosts in the computing service environment.
[00190] At least one physical host in the computing service environment that is available to cache the machine image for the computing instance may be selected using the launch time prediction model, as in block 2230. In addition, the physical host may be capable of caching the machine image (e.g., the physical host includes available storage slots that are of sufficient capacity to cache the machine image).
[00191] The machine image may be stored in the physical host, as in block 2240.
The machine image may be stored on the physical host in order to reduce the launch time for launching the computing instance. In one example, the machine image may be stored on physical hosts or physical storage devices in the computing service environment according to a selected topology layer. In another example, the machine image may be stored according to a cache layout. The cache layout may indicate a plurality of physical hosts in the computing service environment that are available to cache the machine image in order to reduce the launch time for the computing instance(s). In yet another example, the machine image may be stored on the physical host so that the launch time for the computing instance is in accordance with a service level agreement (SLA) for the computing service environment.
[00192] In one configuration, a request to launch the computing instance may be received. The machine image associated with the computing instance that is cached on the physical host in the computing service environment may be identified. The machine image may be loaded on the physical host in order to provide a computing service. In another configuration, a size of the machine image to be cached in the computing service environment may be identified. The physical host in the computing service environment having a slot that is capable of caching machine images of that size may be selected. The machine image may be cached at a predefined location in the computing service environment in order to reduce the launch time associated with the machine image.
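Matching an image to a large-enough caching slot can be sketched as a best-fit search; the host names and slot sizes below are hypothetical:

```python
def place_image(image_size_gb, hosts):
    """Choose a host with a free caching slot large enough for the image.

    hosts: host id -> list of free slot sizes in GB.  Uses the smallest
    slot that fits (best fit) to keep larger slots free for larger images.
    """
    best = None
    for host, slots in hosts.items():
        for slot in slots:
            if slot >= image_size_gb and (best is None or slot < best[1]):
                best = (host, slot)
    return best  # (host, slot size) or None when no slot fits
```

A None result corresponds to the case where no physical host in the environment has a slot capable of caching an image of that size.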
[00193] In one example, a desired launch time for launching the computing instance may be identified. The desired launch time may be received from a customer accessing the computing service environment. The physical host in the computing service environment that is available to store the machine image associated with the computing instance may be selected. The physical host may be verified as being able to launch the computing instance in accordance with the desired launch time using the launch time prediction model. The machine image may be stored in the physical host in the computing environment so an actual launch time for launching the computing instance is
substantially similar to the desired launch time.
[00194] In another example, the launch time for the computing instance may be determined to not be in accordance with a service level agreement (SLA) for the computing service environment. The machine image associated with the computing instance may be stored on additional physical hosts in the computing service environment to further reduce the launch time for the computing instance in order to comply with the SLA. The storage of the machine image on additional physical hosts may reduce the launch time when multiple computing instances are simultaneously launched. In addition, storing the machine image on additional physical hosts may allow a physical host providing a relatively lowest launch time to be selected for launching the computing instance. In yet another example, a request to launch the computing instance in the computing service environment may be received. The machine image associated with the computing instance may be identified as being in a paused state. The computing instance may be launched in the computing service environment by switching the machine image from the paused state to a running state, thereby minimizing the launch time for launching the computing instance in the computing service environment. Embodiments of the disclosure can be described in view of the following clauses:
1. A non-transitory machine readable storage medium having instructions embodied thereon, the instructions when executed by a processor:
obtain training data representing launch features for a plurality of previous computing instance launches; train a random forest regression model using the training data;
receive a request for a predicted launch time to launch a computing instance on a physical host within a computing service environment;
identify launch features associated with launching the computing instance within the computing service environment that have been determined to have an impact on a launch time of the computing instance; and
input the launch features associated with launching the computing instance into a machine learning regression model that outputs a predicted launch time for launching the computing instance within the computing service environment.
2. A non-transitory machine readable storage medium as in clause 1, wherein the instructions, when executed by the processor, further select launch features determined to have an impact on a launch time of a computing instance from launch features including machine image launch features, physical host launch features and customer configuration launch features.

3. A non-transitory machine readable storage medium as in clause 1, wherein the instructions, when executed by the processor, further determine whether an SLA (Service Level Agreement) launch time is likely to be met by comparing the predicted launch time with the SLA launch time.
4. A computer implemented method, comprising:
under control of one or more computer systems configured with executable instructions, receiving a request for a predicted launch time to launch a computing instance on a physical host within a computing service environment;
obtaining data associated with launch features of a computing instance that are determined to have an impact on a launch time of the computing instance on a physical host within a computing service environment, using a processor; and
inputting the launch features of the computing instance to a machine learning model that outputs a predicted launch time for launching a computing instance within the computing service environment, using the processor.

5. A method as in clause 4, wherein obtaining the data associated with launch features further comprises obtaining data associated with machine image launch features, physical host launch features and customer configuration launch features.
6. A method as in clause 4, further comprising normalizing the launch features prior to inputting the launch features to the machine learning model.
7. A method as in clause 4, further comprising performing a parameter value search for machine learning parameters that result in a goodness-of-fit of the machine learning model to the launch features.
8. A method as in clause 7, wherein performing the parameter value search further comprises, performing a parameter value search for machine learning parameters using a distributed genetic algorithm.
9. A method as in clause 4, wherein obtaining data associated with launch features further comprises, obtaining active training data representing launch features for a plurality of previous computing instance launches;
extracting features from the active training data associated with the launch features; and
training the machine learning model using the features from the active training data.
10. A method as in clause 4, wherein inputting the launch features to a machine learning model further comprises inputting the launch features to a machine learning model selected from at least one of: a random forest model, an extremely randomized trees model, an AdaBoost model, a stochastic gradient descent model or a support vector machine model.
11. A method as in clause 4, wherein inputting the launch features to a machine learning model further comprises, inputting the launch features of the computing instance to a machine learning regression model.

12. A method as in clause 4, further comprising, receiving a launch request from a customer to launch the computing instance;
identifying an SLA launch time associated with a computing instance type for the computing instance; and
determining whether the SLA launch time is likely to be met by comparing the predicted launch time for the computing instance with the SLA launch time.
13. A method as in clause 12, further comprising notifying a computing service provider that the SLA launch time is likely not going to be achieved when a
determination is made that the predicted launch time is greater than the SLA launch time.
14. A method as in clause 12, further comprising constructing an SLA breach feature for the predicted launch time being greater than the SLA launch time and including the SLA breach feature with other features inputted to a machine learning classification model.
15. A method as in clause 4, further comprising:
identifying an SLA launch time for the computing instance;
determining whether the SLA launch time is likely to be met by comparing the predicted launch time for the computing instance with the SLA launch time; and
analyzing a state of a computing service environment in which the computing instance is to be launched to determine whether an action may be performed that is likely to prevent a breach of the SLA launch time when a determination has been made that the SLA launch time will likely be breached.
16. A method as in clause 15, wherein an action performed further comprises, removing a physical host from a group of physical hosts available to host the computing instance.
17. A method as in clause 15, wherein an action performed further comprises, adding at least one physical host to a group of physical hosts available to host the computing instance.

18. A system comprising:
a processor;
a memory device including instructions that, when executed by the processor, cause the system to:
identify launch features contained in a launch configuration, the launch features having been determined to have an impact on a launch time of a computing instance within a computing service environment;
obtain data for the launch features from a data source;
input the launch features to a machine learning model that outputs a predicted launch time for launching a computing instance within the computing service
environment; and
determine whether an SLA launch time is likely to be met by comparing the predicted launch time with the SLA launch time.
19. A system as in clause 18, wherein the memory device includes instructions that, when executed by the processor, cause the system to input the launch features to a machine learning regression model.
20. A system as in clause 18, wherein the memory device includes instructions that, when executed by the processor, cause the system to obtain: features of a machine image used to launch the computing instance, features of physical host servers capable of hosting the computing instance and features of computing instance attachments that are to be attached to the computing instance when launched.
21. A non-transitory machine readable storage medium comprising instructions embodied thereon, the instructions upon execution by a processor cause a system to: receive a request to launch a computing instance in a computing service environment; provide instance features associated with the computing instance and physical host features associated with a group of physical hosts in the computing service environment to a machine learning model;
determine estimated launch times for launching the computing instance on individual physical hosts in the computing service environment using the machine learning model; and select a physical host from the group of physical hosts to provide placement of the computing instance according to a lower estimated launch time as compared to other physical hosts in the group of physical hosts.
22. The non-transitory machine readable storage medium as in clause 21, further comprising instructions that upon execution further cause the system to:
assign a weighting value to the estimated launch times for the computing instance; and select the physical host for placement of the computing instance based in part on the weighting value assigned to the estimated launch times.
23. The non-transitory machine readable storage medium as in clause 21, further comprising instructions that upon execution further cause the system to: select the physical host from the group of physical hosts to provide placement of the computing instance using placement constraints included in the request to launch the computing instance.
24. The non-transitory machine readable storage medium as in clause 21, wherein the estimated launch times are one of a plurality of factors used when selecting the physical host for placement of the computing instance, the plurality of factors including at least one of a physical host utilization placement factor, a licensing cost placement factor and a disaster impact factor.
25. A computer implemented method, comprising:
under control of one or more computer systems configured with executable instructions: receiving a request to launch a computing instance in a computing service environment, using one or more processors of the computer systems;
identifying estimated launch times for the computing instance to launch on physical hosts in a group of physical hosts, using the one or more processors of the computer systems; and
selecting a physical host in the group of physical hosts to place the computing instance based in part on the estimated launch times for the computing instance, using the one or more processors of the computer systems.
26. The method of clause 25, further comprising causing the computing instance on the physical host to be launched.
27. The method of clause 25, further comprising:
comparing the estimated launch times for the computing instance on each physical host with other physical hosts in the group of physical hosts; and
selecting the physical host in the group of physical hosts with a lower estimated launch time as compared with the other physical hosts in the group of physical hosts.
28. The method of clause 25, further comprising identifying the estimated launch times for launching the computing instance based in part on instance features associated with the computing instance and physical host features associated with the physical hosts in the group of physical hosts, wherein the instance features associated with the computing instance include user selected features.
29. The method of clause 25, further comprising identifying the estimated launch times for launching the computing instance using a regression model.
30. The method of clause 25, further comprising:
assigning a weighting value to the estimated launch times for the computing instance; and selecting the physical host for placement of the computing instance based in part on the weighting value assigned to the estimated launch times.
31. The method of clause 25, further comprising selecting the physical host in the group of physical hosts to place the computing instance using additional placement factors, the additional placement factors including at least one of: a physical host utilization placement factor, a licensing cost placement factor or a disaster impact placement factor.
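Clauses 30 and 31 describe weighting the launch-time estimate and mixing in additional placement factors. A minimal sketch, assuming each factor has been pre-normalized to [0, 1], that lower values are better for every factor, and hypothetical weights:

```python
# Sketch of clauses 30-31: combine a weighted launch-time estimate with
# other placement factors (utilization, licensing cost, disaster impact)
# into a single placement score. Weights are illustrative assumptions.

WEIGHTS = {"launch_time": 0.5, "utilization": 0.3,
           "licensing_cost": 0.1, "disaster_impact": 0.1}

def placement_score(factors):
    # Weighted sum of pre-normalized factors; lower score is better.
    return sum(WEIGHTS[name] * value for name, value in factors.items())

hosts = {
    "host-a": {"launch_time": 0.9, "utilization": 0.2,
               "licensing_cost": 0.5, "disaster_impact": 0.1},
    "host-b": {"launch_time": 0.3, "utilization": 0.6,
               "licensing_cost": 0.5, "disaster_impact": 0.1},
}
best = min(hosts, key=lambda h: placement_score(hosts[h]))
```

Here host-b wins despite higher utilization because the launch-time weight dominates, which is exactly the trade-off the weighting value of clause 30 controls.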
32. The method of clause 25, further comprising selecting the physical host that includes a lower number of computing instances being simultaneously launched as compared to other physical hosts in the group of physical hosts.
33. The method of clause 25, further comprising selecting a region or a zone for placement of the computing instance based in part on the estimated launch times for computing instances in the region or zone.
34. The method of clause 25, further comprising verifying that the physical host selected to execute the computing instance is not simultaneously executing a number of computing instances that exceeds a predetermined threshold.
35. The method of clause 25, wherein the estimated launch times include a period of time between receiving a computing instance launch request from a customer and booting the computing instance on the physical host.
36. The method of clause 25, further comprising:
identifying an estimated amount of time to launch an attachment associated with the computing instance; and
selecting the physical host that can provide placement of the computing instance with a lower estimated attachment time as compared to other physical hosts in the group of physical hosts.
37. A system for determining computing instance placement, comprising:
a processor;
a memory device including instructions that, when executed by the processor, cause the system to:
receive a request to launch a computing instance on a computing service provider;
identify estimated launch times for launching the computing instance on a physical host based in part on instance features associated with the computing instance and physical host features associated with physical hosts configured to host computing instances; and select a physical host to place the computing instance based in part on the estimated launch times for the computing instance.
38. The system of clause 37, wherein the memory device includes instructions that, when executed by the processor, cause the system to select the physical host having a lower number of computing instances that are simultaneously being launched as compared to other physical hosts.
39. The system of clause 37, wherein the memory device includes instructions that, when executed by the processor, cause the system to verify that the physical host selected to execute the computing instance is not simultaneously executing a number of computing instances that exceeds a predetermined threshold.
40. The system of clause 37, wherein the memory device includes instructions that, when executed by the processor, cause the system to identify the estimated launch times for the computing instance using a regression model that predicts launch times for computing instances based in part on the instance features and the physical host features.
41. A non-transitory machine readable storage medium having instructions embodied thereon, the instructions being executed by a processor to improve computing instance launch times, comprising:
identifying expected traffic patterns indicating machine images associated with computing instances that are expected to be launched in a computing service environment during a defined time period;
determining, using a launch time prediction model, that caching the machine images at a predefined location in the computing service environment will reduce launch times for the computing instances as compared to not caching the machine images;
determining a cache layout to enable caching of the machine images in the computing service environment, the cache layout identifying physical hosts at the predefined location in the computing service environment that are available to cache the machine images; and storing the machine images on at least one physical host in the computing environment according to the cache layout.
42. A non-transitory machine readable storage medium as in claim 41, further comprising using heuristic rules to identify the expected traffic patterns for the computing service environment, the heuristic rules relating to historical traffic patterns for the computing service environment.
43. A non-transitory machine readable storage medium as in claim 41, further comprising:
receiving a desired launch time for launching a computing instance; and
determining the cache layout to enable caching of a machine image associated with the computing instance so an actual launch time for launching the computing instance is substantially similar to the desired launch time.
44. A non-transitory machine readable storage medium as in claim 41, further comprising:
identifying various types of machine images that are to be cached in the computing service environment; and
determining the cache layout to include one or more slots on the physical host that are capable of caching the various types of machine images.
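The slot-based cache layout of clause 44 can be sketched as a first-fit assignment of machine images to host slots that are large enough to hold them; the hosts, slot sizes, and image identifiers below are illustrative, not from the claims:

```python
# Sketch of clause 44: build a cache layout by assigning each machine
# image to the first free slot that can hold it, largest images first.

def plan_cache_layout(images, host_slots):
    # images: {image_id: size_gb}; host_slots: {host_id: [slot_gb, ...]}
    layout = {}
    free = {host: sorted(slots) for host, slots in host_slots.items()}
    for image_id, size in sorted(images.items(), key=lambda kv: -kv[1]):
        for host, slots in free.items():
            fit = next((s for s in slots if s >= size), None)
            if fit is not None:
                slots.remove(fit)   # slot is now occupied by this image
                layout[image_id] = host
                break
    return layout

images = {"img-db": 20, "img-web": 4}
host_slots = {"host-a": [8], "host-b": [32, 8]}
layout = plan_cache_layout(images, host_slots)
```

A real layout engine would also fold in the expected traffic patterns of clause 41 when deciding which images are worth a slot at all.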
45. A computer implemented method, comprising:
under control of one or more computer systems configured with executable instructions: identifying a computing instance that is expected to be launched in a computing service environment during a defined time period, using one or more processors of the computer systems;
determining, via a launch time prediction model, that caching a machine image for the computing instance in the computing service environment will reduce a launch time for launching the computing instance as compared to not caching the machine image, using the one or more processors of the computer systems;
selecting at least one physical host in the computing service environment that is available to cache the machine image to lower the launch time of the computing instance as predicted by the launch time prediction model, using the one or more processors of the computer systems; and
storing the machine image in the physical host, using the one or more processors of the computer systems.
46. The method of claim 45, further comprising:
receiving a request to launch the computing instance; identifying the machine image associated with the computing instance that is cached on the physical host in the computing service environment; and
loading the machine image on the physical host in order to provide a computing service.
47. The method of claim 45, further comprising using heuristic rules to identify expected traffic patterns for the computing service environment, the heuristic rules relating to historical traffic patterns for the computing service environment.
48. The method of claim 45, further comprising identifying the computing instance that is expected to be launched in the computing service environment during the defined time period and a defined geographical location using a machine learning model that predicts expected traffic patterns for the computing service environment.
49. The method of claim 45, further comprising:
identifying a size of the machine image to be cached in the computing service environment;
selecting the physical host in the computing service environment having a slot that is capable of caching machine images of that size, the machine image being cached at a predefined location in the computing service environment in order to reduce the launch time associated with the machine image.
50. The method of claim 45, further comprising storing the machine image on the computing service environment according to a selected topology layer.
51. The method of claim 45, further comprising storing the machine image according to a cache layout, the cache layout indicating a plurality of physical hosts in the computing service environment that are available to cache the machine image in order to reduce the launch time.
52. The method of claim 45, further comprising:
identifying a desired launch time for launching the computing instance;
selecting the physical host in the computing service environment that is available to store the machine image associated with the computing instance; and storing the machine image in the physical host in the computing environment so an actual launch time for launching the computing instance is substantially similar to the desired launch time.
53. The method of claim 45, further comprising storing the machine image on the physical host so that the launch time is in accordance with a service level agreement (SLA) for the computing service environment.
54. The method of claim 45, further comprising:
determining that the launch time for the computing instance is not in accordance with a service level agreement (SLA) for the computing service environment; and
storing the machine image associated with the computing instance on additional physical hosts in the computing service environment to further reduce the launch time for computing instances associated with the machine image in order to comply with the SLA.
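Clause 54 adds cached copies of the image until the SLA is met. One way to model this, under the assumption (not in the claims) that each additional replica cuts the expected launch time by a fixed factor:

```python
# Sketch of clause 54: when the modeled launch time misses the SLA,
# cache the machine image on additional hosts until the modeled time
# complies. The 0.8 per-replica speedup factor is an assumption.

def replicas_needed(base_seconds, sla_seconds, speedup=0.8, max_replicas=16):
    t, replicas = base_seconds, 1
    while t > sla_seconds and replicas < max_replicas:
        replicas += 1
        t *= speedup  # each extra cached copy shortens the expected fetch
    return replicas, t

replicas, modeled = replicas_needed(base_seconds=120, sla_seconds=60)
```

In practice the launch time prediction model of clause 45, not a fixed factor, would supply the per-replica estimate.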
55. The method of claim 45, further comprising:
receiving a request to launch the computing instance in the computing service environment;
identifying that the machine image associated with the computing instance is in a paused state; and
launching the computing instance in the computing service environment by switching the machine image from the paused state to a running state, thereby minimizing the launch time for launching the computing instance in the computing service environment.
56. The method of claim 45, wherein the launch time prediction model is a regression model that predicts the launch time for the computing instance based in part on instance features associated with the computing instance and physical host features associated with a group of physical hosts in the computing service environment.
57. A system for reducing computing instance launch times, comprising:
a processor;
a memory device including instructions that, when executed by the processor, cause the system to: identify expected traffic patterns in a computing service environment, the expected traffic patterns indicating a machine image associated with a computing instance that is expected to be launched in the computing service environment during a defined time period;
determine, using a launch time prediction model, that caching the machine image in a predefined location in the computing service environment will reduce a launch time for the computing instance as compared to not caching the machine image;
determine a cache layout to enable caching of the machine image in the computing service environment, the cache layout identifying physical hosts at the predefined location in the computing service environment that are available to cache the machine image; and store the machine image on at least one physical host in the computing environment according to the cache layout.
58. The system of claim 57, wherein the memory device includes instructions that, when executed by the processor, cause the system to store the machine image on the physical host so that the launch time is in accordance with a service level agreement (SLA) for the computing service environment.
59. The system of claim 57, wherein the memory device includes instructions that, when executed by the processor, cause the system to store the machine image on the computing service environment according to a selected topology layer.
60. The system of claim 57, wherein the memory device includes instructions that, when executed by the processor, cause the system to:
receive a request to launch the computing instance;
identify the machine image associated with the computing instance that is cached on the physical host in the computing service environment; and
load the machine image in order to provide a computing service.
[00195] While the flowcharts presented for this technology may imply a specific order of execution, the order of execution may differ from what is illustrated. For example, the order of two or more blocks may be rearranged relative to the order shown. Further, two or more blocks shown in succession may be executed in parallel or with partial parallelization. In some configurations, one or more blocks shown in the flowchart may be omitted or skipped. Any number of counters, state variables, warning semaphores, or messages might be added to the logical flow for purposes of enhanced utility, accounting, performance, measurement, troubleshooting or for similar reasons.
[00196] Some of the functional units described in this specification have been labeled as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom VLSI circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be
implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
[00197] Modules may also be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more blocks of computer instructions, which may be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which comprise the module and achieve the stated purpose for the module when joined logically together.
[00198] Indeed, a module of executable code may be a single instruction, or many instructions and may even be distributed over several different code segments, among different programs and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices. The modules may be passive or active, including agents operable to perform desired functions.
[00199] The technology described here may also be stored on a computer readable storage medium that includes volatile and non-volatile, removable and non-removable media implemented with any technology for the storage of information such as computer readable instructions, data structures, program modules, or other data. Computer readable storage media include, but are not limited to, non-transitory media such as RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other computer storage medium which may be used to store the desired information and described technology.
[00200] The devices described herein may also contain communication connections or networking apparatus and networking connections that allow the devices to communicate with other devices. Communication connections are an example of communication media. Communication media typically embodies computer readable instructions, data structures, program modules and other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. A "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example and not limitation, communication media includes wired media such as a wired network or direct-wired connection and wireless media such as acoustic, radio frequency, infrared and other wireless media. The term computer readable media as used herein includes communication media.
[00201] Reference was made to the examples illustrated in the drawings and specific language was used herein to describe the same. It will nevertheless be understood that no limitation of the scope of the technology is thereby intended.
Alterations and further modifications of the features illustrated herein and additional applications of the examples as illustrated herein are to be considered within the scope of the description.
[00202] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more examples. In the preceding description, numerous specific details were provided, such as examples of various configurations to provide a thorough understanding of examples of the described technology. It will be recognized, however, that the technology may be practiced without one or more of the specific details, or with other methods, components, devices, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the technology.
[00203] Although the subject matter has been described in language specific to structural features and/or operations, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features and operations described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. Numerous modifications and alternative arrangements may be devised without departing from the spirit and scope of the described technology.

Claims

WHAT IS CLAIMED IS:
1. A computer implemented method, comprising:
under control of one or more computer systems configured with executable instructions, receiving a request for a predicted launch time to launch a computing instance on a physical host within a computing service environment;
obtaining data associated with launch features of a computing instance that are determined to have an impact on a launch time of the computing instance on a physical host within a computing service environment, using a processor; and
inputting the launch features of the computing instance to a machine learning model that outputs a predicted launch time for launching a computing instance within the computing service environment, using the processor.
2. A method as in claim 1, wherein obtaining the data associated with launch features further comprises obtaining data associated with machine image launch features, physical host launch features and customer configuration launch features.
3. A method as in claim 1, further comprising normalizing the launch features prior to inputting the launch features to the machine learning model.
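Claim 3's normalization step can be as simple as min-max scaling each launch feature across the training rows before the model sees them; the feature names here are hypothetical:

```python
# Sketch of claim 3: min-max normalize each launch feature column to
# [0, 1] before inputting the feature vectors to the model.

def normalize(rows):
    keys = rows[0].keys()
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in keys}
        for r in rows
    ]

rows = [
    {"image_size_gb": 2, "host_load": 10},
    {"image_size_gb": 10, "host_load": 50},
    {"image_size_gb": 6, "host_load": 30},
]
normed = normalize(rows)
```

Normalization keeps large-magnitude features (for example, image size in bytes) from dominating small ones in distance-based or gradient-based learners.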
4. A method as in claim 1, further comprising performing a parameter value search for machine learning parameters that result in a goodness-of-fit of the machine learning model to the launch features.
5. A method as in claim 4, wherein performing the parameter value search further comprises performing a parameter value search for machine learning parameters using a distributed genetic algorithm.
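Claims 4 and 5 recite a parameter value search via a distributed genetic algorithm. A single-process toy version, using a made-up quadratic loss in place of the real goodness-of-fit measure, shows the select-and-mutate loop:

```python
import random

# Toy genetic-algorithm parameter search (claims 4-5): evolve a single
# model parameter toward the value minimizing a goodness-of-fit loss.
# The loss and target are illustrative; a real search would evaluate the
# machine learning model on held-out launch data, possibly across workers.

random.seed(0)
TARGET = 3.7  # pretend this parameter value gives the best fit

def loss(param):
    return (param - TARGET) ** 2

def evolve(generations=60, pop_size=20):
    population = [random.uniform(0.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=loss)
        survivors = population[: pop_size // 2]   # keep the fittest half
        # refill the population with mutated copies of the survivors
        population = survivors + [
            p + random.gauss(0.0, 0.3) for p in survivors
        ]
    return min(population, key=loss)

best = evolve()
```

The "distributed" part of claim 5 would shard the fitness evaluations (the expensive model fits) across hosts, while the selection step stays as above.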
6. A method as in claim 1, wherein obtaining data associated with launch features further comprises obtaining active training data representing launch features for a plurality of previous computing instance launches;
extracting features from the active training data associated with the launch features; and
training the machine learning model using the features from the active training data.
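The training flow of claim 6 (extract features from historical launches, then fit a model) can be illustrated with a one-variable least-squares regression; the launch records and the single feature, machine image size, are invented for the example:

```python
# Sketch of claim 6: extract one launch feature (machine image size)
# from historical launch records and fit a least-squares line that
# predicts launch time from that feature.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: (image_size_gb, observed_launch_seconds)
history = [(2, 40), (4, 50), (8, 70), (16, 110)]
sizes = [h[0] for h in history]
times = [h[1] for h in history]
slope, intercept = fit_line(sizes, times)

def predict(size_gb):
    return slope * size_gb + intercept
```

Claim 7's candidate models (random forests, extremely randomized trees, AdaBoost, stochastic gradient descent, support vector machines) replace this single line with learners that handle many features and non-linear interactions.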
7. A method as in claim 1, wherein inputting the launch features to a machine learning model further comprises inputting the launch features to a machine learning model selected from at least one of: a random forest model, an extremely randomized trees model, an AdaBoost model, a stochastic gradient descent model or a support vector machine model.
8. A method as in claim 1, wherein inputting the launch features to a machine learning model further comprises inputting the launch features of the computing instance to a machine learning regression model.
9. A method as in claim 1, further comprising: receiving a launch request from a customer to launch the computing instance;
identifying an SLA launch time associated with a computing instance type for the computing instance; and
determining whether the SLA launch time is likely to be met by comparing the predicted launch time for the computing instance with the SLA launch time.
10. A method as in claim 9, further comprising notifying a computing service provider that the SLA launch time is not likely to be achieved when a determination is made that the predicted launch time is greater than the SLA launch time.
11. A method as in claim 9, further comprising constructing an SLA breach feature for the predicted launch time being greater than the SLA launch time and including the SLA breach feature with other features inputted to a machine learning classification model.
12. A method as in claim 1, further comprising:
identifying an SLA launch time for the computing instance;
determining whether the SLA launch time is likely to be met by comparing the predicted launch time for the computing instance with the SLA launch time; and
analyzing a state of a computing service environment in which the computing instance is to be launched to determine whether an action may be performed that is likely to prevent a breach of the SLA launch time when a determination has been made that the SLA launch time will likely be breached.
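The SLA check of claims 9 through 12 reduces to comparing the model's prediction against a per-instance-type SLA launch time. A sketch with a hypothetical SLA table:

```python
# Sketch of claims 9-12: compare a predicted launch time against the
# SLA launch time for the requested instance type and flag a likely
# breach. The SLA table and prediction values are hypothetical.

SLA_LAUNCH_SECONDS = {"small": 60, "large": 180}

def check_sla(instance_type, predicted_seconds):
    sla = SLA_LAUNCH_SECONDS[instance_type]
    return {
        "sla_seconds": sla,
        "predicted_seconds": predicted_seconds,
        "likely_breach": predicted_seconds > sla,
    }

ok = check_sla("small", 45)
breach = check_sla("small", 90)
```

The `likely_breach` flag corresponds to the SLA breach feature of claim 11, which could then be fed alongside the other features into a classification model.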
13. A system comprising:
a processor;
a memory device including instructions that, when executed by the processor, cause the system to:
identify launch features contained in a launch configuration, the launch features having been determined to have an impact on a launch time of a computing instance within a computing service environment;
obtain data for the launch features from a data source;
input the launch features to a machine learning model that outputs a predicted launch time for launching a computing instance within the computing service environment; and
determine whether an SLA launch time is likely to be met by comparing the predicted launch time with the SLA launch time.
14. A system as in claim 13, wherein the memory device includes instructions that, when executed by the processor, cause the system to input the launch features to a machine learning regression model.
15. A system as in claim 13, wherein the memory device includes instructions that, when executed by the processor, cause the system to obtain: features of a machine image used to launch the computing instance, features of physical host servers capable of hosting the computing instance and features of computing instance attachments that are to be attached to the computing instance when launched.
PCT/US2015/049521 2014-09-10 2015-09-10 Computing instance launch time WO2016040699A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP15772087.1A EP3191948A1 (en) 2014-09-10 2015-09-10 Computing instance launch time
CN201580048245.4A CN107077385B (en) 2014-09-10 2015-09-10 For reducing system, method and the storage medium of calculated examples starting time
JP2017511699A JP6564023B2 (en) 2014-09-10 2015-09-10 Compute instance startup time

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US14/482,841 2014-09-10
US14/482,812 US9971971B2 (en) 2014-09-10 2014-09-10 Computing instance placement using estimated launch times
US14/482,789 US10402746B2 (en) 2014-09-10 2014-09-10 Computing instance launch time
US14/482,789 2014-09-10
US14/482,812 2014-09-10
US14/482,841 US9591094B2 (en) 2014-09-10 2014-09-10 Caching of machine images

Publications (1)

Publication Number Publication Date
WO2016040699A1 true WO2016040699A1 (en) 2016-03-17

Family

ID=54238544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/049521 WO2016040699A1 (en) 2014-09-10 2015-09-10 Computing instance launch time

Country Status (4)

Country Link
EP (1) EP3191948A1 (en)
JP (1) JP6564023B2 (en)
CN (1) CN107077385B (en)
WO (1) WO2016040699A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053717A1 (en) 2016-09-21 2018-03-29 Accenture Global Solutions Limited Dynamic resource allocation for application containers
WO2020055492A1 (en) * 2018-09-14 2020-03-19 Microsoft Technology Licensing, Llc Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
US20190156244A1 (en) * 2017-11-22 2019-05-23 Amazon Technologies, Inc. Network-accessible machine learning model training and hosting system
JP7219227B2 (en) * 2017-11-30 2023-02-07 テルモ株式会社 SUPPORT SYSTEMS, SUPPORT METHODS AND PROGRAMS
CN108182115B (en) * 2017-12-28 2021-08-31 福州大学 Virtual machine load balancing method in cloud environment
US11960935B2 (en) 2018-06-27 2024-04-16 Amazon Technologies, Inc. Fault-tolerant accelerator based inference service
CN108985367A (en) * 2018-07-06 2018-12-11 中国科学院计算技术研究所 Computing engines selection method and more computing engines platforms based on this method
CN113167495B (en) * 2018-12-12 2022-11-15 三菱电机株式会社 Air conditioning control device and air conditioning control method
JP7244280B2 (en) * 2019-01-15 2023-03-22 キヤノンメディカルシステムズ株式会社 MEDICAL IMAGE DIAGNOSTIC APPARATUS AND MEDICAL IMAGE DIAGNOSTIC METHOD
CN111314175B (en) * 2020-02-14 2022-05-31 西安广和通无线通信有限公司 Communication device performance testing device, system and method
WO2021260924A1 (en) * 2020-06-26 2021-12-30 楽天グループ株式会社 Prediction system, prediction method, and program
CN112559191B (en) * 2020-12-23 2023-04-25 平安银行股份有限公司 Method and device for dynamically deploying GPU resources and computer equipment
CN115061702A (en) * 2022-08-19 2022-09-16 荣耀终端有限公司 IDE management method and electronic equipment

Citations (1)

Publication number Priority date Publication date Assignee Title
US6202190B1 (en) * 1997-08-14 2001-03-13 Bull, S.A. Process for determining the start- up time of a data processing system

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP2006155449A (en) * 2004-12-01 2006-06-15 Matsushita Electric Ind Co Ltd Optimization processing method using distributed genetic algorithm
JP2010097369A (en) * 2008-10-15 2010-04-30 Sharp Corp Optimum parameter extraction device and extraction method, and mask data, mask and production method for semiconductor device using the method
US9235251B2 (en) * 2010-01-11 2016-01-12 Qualcomm Incorporated Dynamic low power mode implementation for computing devices
JP2012108651A (en) * 2010-11-16 2012-06-07 Hitachi Ltd Cluster system
JP5342615B2 (en) * 2011-08-15 2013-11-13 株式会社日立システムズ Virtual server control system and program
US9152405B2 (en) * 2011-08-22 2015-10-06 International Business Machines Corporation Rapid provisioning of virtual machines based on multi-dimensional user request patterns in a cloud
CN103034453B (en) * 2011-09-30 2015-11-25 国际商业机器公司 The method and apparatus of the persistant data of pre-installation application in managing virtual machines example
JP5728357B2 (en) * 2011-10-18 2015-06-03 日本電信電話株式会社 System parameter optimization apparatus, method, and program
RU2605473C2 (en) * 2012-09-20 2016-12-20 Амазон Текнолоджис, Инк. Automated resource usage profiling


Non-Patent Citations (2)

Title
GEORGE KOUSIOURIS ET AL: "The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks", JOURNAL OF SYSTEMS & SOFTWARE, ELSEVIER NORTH HOLLAND, NEW YORK, NY, US, vol. 84, no. 8, 6 April 2011 (2011-04-06), pages 1270 - 1291, XP028091498, ISSN: 0164-1212, [retrieved on 20110414], DOI: 10.1016/J.JSS.2011.04.013 *
See also references of EP3191948A1 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO2018053717A1 (en) 2016-09-21 2018-03-29 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN109791504A (en) * 2016-09-21 2019-05-21 埃森哲环球解决方案有限公司 For the dynamic BTS configuration of application container
EP3516516A4 (en) * 2016-09-21 2020-02-05 Accenture Global Solutions Limited Dynamic resource allocation for application containers
US10942776B2 (en) 2016-09-21 2021-03-09 Accenture Global Solutions Limited Dynamic resource allocation for application containers
CN109791504B (en) * 2016-09-21 2023-04-18 埃森哲环球解决方案有限公司 Dynamic resource configuration for application containers
WO2020055492A1 (en) * 2018-09-14 2020-03-19 Microsoft Technology Licensing, Llc Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system
US11423326B2 (en) 2018-09-14 2022-08-23 Microsoft Technology Licensing, Llc Using machine-learning methods to facilitate experimental evaluation of modifications to a computational environment within a distributed system

Also Published As

Publication number Publication date
CN107077385B (en) 2019-10-25
EP3191948A1 (en) 2017-07-19
JP6564023B2 (en) 2019-08-21
JP2017527037A (en) 2017-09-14
CN107077385A (en) 2017-08-18

Similar Documents

Publication Publication Date Title
US10402746B2 (en) Computing instance launch time
WO2016040699A1 (en) Computing instance launch time
US9591094B2 (en) Caching of machine images
CN110301128B (en) Learning-based resource management data center cloud architecture implementation method
CN107924323B (en) Dependency-based container deployment
US11146497B2 (en) Resource prediction for cloud computing
US9971971B2 (en) Computing instance placement using estimated launch times
US11861405B2 (en) Multi-cluster container orchestration
US9679029B2 (en) Optimizing storage cloud environments through adaptive statistical modeling
US9571415B2 (en) System and method for performing customized resource allocation analyses for distributed computer systems
US9559914B1 (en) Computing instance placement
US10476742B1 (en) Classification of auto scaling events impacting computing resources
CA2836342A1 (en) System and method for distributed computing using automated provisioning of heterogeneous computing resources
US9959157B1 (en) Computing instance migration
EP3798930A2 (en) Machine learning training resource management
US10983873B1 (en) Prioritizing electronic backup
US11048577B2 (en) Automatic correcting of computing cluster execution failure
US20220345518A1 (en) Machine learning based application deployment
US10469329B1 (en) Computing service capacity management
CN114207589A (en) Computing platform optimization over the lifetime of workloads in a distributed computing environment
US20210263718A1 (en) Generating predictive metrics for virtualized deployments
US10599544B2 (en) Determining reboot times of computing nodes
US11714615B2 (en) Application migration using cost-aware code dependency graph
US20220051116A1 (en) Parallelized scoring for ensemble model
JP6722345B2 (en) Sign detection device and sign detection method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15772087; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2017511699; Country of ref document: JP; Kind code of ref document: A)
REEP Request for entry into the european phase (Ref document number: 2015772087; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2015772087; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)