WO2017046635A1 - High-availability multi-component cloud application placement using stochastic availability models - Google Patents

High-availability multi-component cloud application placement using stochastic availability models

Info

Publication number
WO2017046635A1
Authority
WO
WIPO (PCT)
Application number
PCT/IB2015/059043
Other languages
French (fr)
Inventor
Ali Kanso
Parisa HEIDARI
Manar JAMMAL
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2017046635A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Definitions

  • Embodiments of the invention generally relate to the field of cloud computing, and more specifically, to high-availability multi-component cloud application placement using stochastic availability models.
  • PaaS and IaaS provide the required web applications and computational resources in the form of virtual machines (VMs).
  • a method in a computing device for performing High-Availability (HA) aware scheduling of a multi-component cloud application includes generating a stochastic model based upon a model definition that defines a plurality of components of the multi-component cloud application.
  • the stochastic model can be a Stochastic Colored Petri Nets model.
  • the stochastic model includes representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices.
  • the method also includes evaluating, by the computing device using the stochastic model, a plurality of deployment possibilities of the multi-component application to yield a plurality of service availabilities of the multi-component application corresponding to the plurality of deployment possibilities.
  • the method also includes selecting, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities. This selecting of the first deployment possibility can occur in accordance with a selection scheme.
  • generating the stochastic model includes generating a dependency graph based upon the definitions of the plurality of components from the model definition; identifying a number of tiers of the multi-component cloud application and an ordering of the tiers; and creating, for each of the tiers, a representation of: a load balancer model for the tier, a component model for each of the plurality of components belonging to the tier, a virtual machine model for each of the one or more virtual machines belonging to the tier, and a server model for each of the one or more server computing devices belonging to the tier.
  • generating the stochastic model further includes creating one or more data center models.
  • the evaluating comprises utilizing a simulator tool to quantify the plurality of service availabilities of the multi-component cloud application.
  • the simulator tool quantifies the plurality of service availabilities by, for each of the plurality of deployment possibilities: (a) simulating a processing of a plurality of requests by the stochastic model under the deployment possibility, wherein the simulating includes stochastically introducing one or more failures into the stochastic model; and (b) determining the service availability by determining a percentage of the plurality of requests that were successfully processed through the stochastic model.
  • At least a first of the plurality of deployment possibilities that is evaluated includes placing all of the plurality of components within a same data center; and at least a second of the plurality of deployment possibilities that is evaluated includes placing at least a first of the plurality of components in a first data center and at least a second of the plurality of components in a second data center.
  • the model definition includes, for at least one of the components: (a) an arrival rate attribute value indicating an incoming workload distribution; (b) a processing time attribute value indicating a time duration required to process a request; (c) a buffer size attribute value indicating a number of requests the component can process in parallel; (d) a queue size attribute value indicating a maximum capacity of the requests that can wait to be processed; and (e) a number-of-replicas attribute value indicating a number of redundant replicas considered for the component.
  • the model definition further includes, for the at least one of the components: (f) a redundancy model attribute value indicating a redundancy type that the component is capable to accept, wherein the redundancy type is at least one of active, standby, and spare.
  • the method further includes causing the multi-component cloud application to be scheduled according to the selected first deployment possibility.
  • the method further includes providing, to a client, information describing the plurality of service availabilities; and receiving, from the client, a selection of the first deployment possibility to be used for deploying the multi-component cloud application.
  • the stochastic model comprises a Stochastic Colored Petri Nets model.
  • selecting the first deployment possibility from the plurality of deployment possibilities comprises identifying a plurality of candidate placement solutions from the plurality of deployment possibilities, determining a distance score for each of the plurality of candidate placement solutions, and selecting a first of the plurality of candidate placement solutions that has a highest determined distance score among the determined distance scores.
  • determining the distance score for each of the plurality of candidate placement solutions includes determining a data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions.
  • determining the data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions includes: (a) calculating an average utilization of all of the others of the plurality of data centers utilized in the plurality of candidate placement solutions; (b) determining an allowed load value for the data center based upon the average utilization and an overload factor of the data center; and (c) determining the data center distance score based upon the allowed load value and a current load of the data center. In some embodiments, determining the distance score for each of the plurality of candidate placement solutions further includes averaging, for each of the plurality of candidate placement solutions, the data center distance scores of those data centers utilized by the candidate placement solution.
  • a non-transitory computer-readable storage medium can store instructions which, when executed by a processor of a computing device such as a server computing device, cause the computing device to perform the above method.
  • a computing device is described that comprises a processor and the above non-transitory computer-readable storage medium.
  • a computing device is described that comprises one or more interfaces, and one or more processors operationally connected to the one or more interfaces and to a memory, which contains instructions that, when executed, cause the one or more processors to perform the above method.
  • Figure 1 is a block diagram illustrating an exemplary multi-component cloud application.
  • Figure 2 is a flow diagram illustrating exemplary operations for performing high-availability multi-component cloud application placement using stochastic availability models according to some embodiments.
  • Figure 3 illustrates an exemplary Unified Modeling Language (UML) model for a cloud deployment according to some embodiments.
  • Figure 4 is a flow diagram illustrating exemplary operations for transforming a cloud model to yield a Stochastic Petri Net model according to some embodiments.
  • Figure 5A illustrates an exemplary data center sub-model according to some embodiments.
  • Figure 5B illustrates an exemplary server sub-model according to some embodiments.
  • Figure 5C illustrates an exemplary virtual machine sub-model according to some embodiments.
  • Figure 6 illustrates an exemplary load balancer sub-model according to some embodiments.
  • Figure 7 illustrates an exemplary component sub-model according to some embodiments.
  • Figures 8A, 8B, and 8C collectively illustrate an exemplary Stochastic Colored Petri Net (SCPN) model of a three-tier application running in a cloud environment according to some embodiments.
  • Figure 9A illustrates service availability of different deployments and different Mean Time To Repair (MTTR) values where the data centers have similar Mean Time To Failure (MTTF) values according to some embodiments.
  • Figure 9B illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values according to some embodiments.
  • Figure 10A illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values than those of Figure 9B according to some embodiments.
  • Figure 10B illustrates served request amounts of different deployments when the served requests have different processing times according to some embodiments.
  • Figure 11 is a block diagram illustrating an exemplary data processing system that can be used in some embodiments.
  • Figure 12 illustrates a non-limiting example functional block diagram of a server computing device in accordance with some embodiments.
  • Figure 13 is a flow diagram illustrating exemplary operations for selecting a placement scenario, which may be optimal, for a multi-component cloud application from multiple placement scenarios according to some embodiments.
  • references in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
  • Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • Such a model should satisfy requirements including (1) the ability to capture the stochastic nature of failures according to different probability distribution functions, (2) the ability to capture the cloud elements (data centers, servers, and VMs) and the correlation aspect of their failures, (3) the ability to capture the functional workflow between the components of multi-component and/or multi-tiered applications (queuing and request forwarding) as well as the high availability mechanisms they employ (load balancing and redundancy schemes), (4) the ability to capture different deployments of the application components in the cloud (inter-Data Center (DC) vs. intra-DC deployment), and (5) the ability to assess and quantify the expected availability of the application according to its cloud deployment.
  • Embodiments described herein provide such a model with these abilities.
  • Cloud services experience various stochastic failures and consequently become unavailable. With the trend of such systems needing to be "always on" and "always available," inoperative services halt business continuity. It is not enough to provide a High-Availability (HA) solution that can mitigate failures and maintain a certain availability baseline; it is also necessary to assess the solution and its resiliency to the various failure modes.
  • Embodiments disclosed herein implement an availability analysis approach that can consider the effects of hardware and/or software failures of a variety of types, recovery durations, load balancing delays, and/or request processing times to thereby assess whether a given cloud deployment would be able to satisfy a service level agreement (SLA) for a deployment of a cloud application.
  • the embodiments utilize a stochastic model (e.g., a Stochastic Petri Net (SPN) model) to evaluate the availability of cloud services and their deployment in geographically distributed data centers.
  • the model captures the characteristics of the cloud provider and user, and represents them as elements of an availability model. These elements can then be synchronized according to the dependencies between them in order to form a stochastic availability model. Thus, some embodiments evaluate whether a given cloud deployment can satisfy the HA requirements of a cloud-deployed application.
  • embodiments utilize stochastic models that account for the different types of hardware and software failures as well as for the scheduling challenges of cloud applications, and thereby provide beneficial aspects in availability planning. Accordingly, embodiments can utilize a stochastic model that evaluates the availability of a cloud's services and their deployments in inter- or intra-data centers. This model can account for different failure types, functionality constraints, redundancy models, and/or interdependencies between different components' applications. Consequently, different decisions can be extracted from this model that aid in designing the best HA solution for an existing cloud model.
  • a cloud model typically includes multiple data centers (DCs), each having a set of server computers and a set of applications with multiple components.
  • the applications are set up to be hosted on those servers that best fit the application requirements using Virtual Machine (VM) (or containers) mapping. Consequently, any DC/server's failure mode can bring the hosted application down whether it is a planned or unplanned outage.
  • Unplanned downtime can be defined as the time where a system enters a failure mode and becomes unavailable. Such downtime is a result of unexpected failure events, and consequently, neither the cloud provider nor the users are notified of it in advance.
  • There are different forms of failures. One form is hardware/infrastructure failures that happen at the data center and server layers. They can be the result of faulty server, storage, and network elements, such as faults in memory chips, disk drives/arrays, switches, routers, or cabling. Such failures can be captured by the failure rates of the servers as well as of the entire DC.
  • Application failures are defects that occur at the application and VM levels.
  • Application failures might be generated from a hypervisor malfunctioning, unresponsiveness of the operating system, file corruption, or viruses and/or software bugs such as Heisenbugs, Bohrbugs, Schroedinbugs, or Mandelbugs. Such failures are captured by the failure rate of the components and VMs.
  • a third form of failure is a force majeure failure: events that affect both the cloud provider infrastructure and the cloud applications. They can be generated from power loss, storms, fires, earthquakes, floods, or other natural disasters. Due to their scale, such failures can be captured by the failure rate of the DC.
  • a fourth form of failure is known as a Cascading failure, which is the result of an accumulated impact of hardware or software failures.
  • For example, a failure of a dynamic host configuration protocol (DHCP) server can trigger such a cascading failure in the components that depend upon it.
  • Each of the previous failure states is associated with a failure rate or mean time to failure (MTTF) and a mean time to repair or recover (MTTR) determined by the repair or recovery policy used.
  • the exponential failure distribution can be used in failure analysis and availability analysis, and in some embodiments the exponential failure distribution is used to reflect the failure rate or MTTF of DCs, servers, and VMs.
  • HA approaches can be adopted to mitigate the impact of outages.
  • Some scheduling solutions are associated with load balancing mechanisms for HA purposes, while other schedulers incorporate replication or failover techniques to maintain a certain HA baseline.
  • the challenge lies in selecting the best deployment model while analyzing the impact of the adopted HA mechanism, different failure types, functionality constraints, and the redundancy and interdependency models between different components.
  • Figure 1 is a block diagram illustrating an exemplary multi-component cloud application 100.
  • one or more clients 102A-102N interact with the multi- component cloud application 100 deployed in a cloud 110, which can include one data center or multiple data centers 165.
  • the cloud 110 includes multiple data centers 165, and some or all of the multiple data centers 165 can be at separate geographic locations.
  • Typical multi-component applications include three tiers: a "first" front-end tier 150 (e.g., multiple web servers 104A-104L), a "second" business logic application tier 155 (of one or more application instances 106A-106M) in the middle, and a "third" backend tier 160 (e.g., one or more databases (DBs) 108A-108P) storing the system state at the back end, as illustrated in Figure 1.
  • the web servers 104A-104L depend upon the application instances 106A-106M, which in turn depend upon the databases 108A-108P.
  • Each tier or component type may include a primary component (e.g., 104A, 106A, 108A) and multiple active and/or potentially standby replicas (e.g., 104B-104L, 106B-106M, 108B-108P) as shown in Figure 1.
  • Each type of component can be associated with certain failure types.
  • multiple options can be considered on whether inter-DC or intra-DC deployment should be selected. For example, it is not always the case that a maximum inter-DC distribution (i.e., a distribution where the components are distributed over different DCs and/or server computing devices as much as possible) is optimal, because this decision depends on many factors such as the failure distributions, recovery behaviors, and the utilized HA mechanisms as will be detailed further herein.
  • each layer or "tier” can include a primary component that can be backed up with multiple active/standby/spare components depending on the used redundancy model.
  • Load balancers 110A, 110B, and 110C distribute the incoming requests among the components of each tier. When a component fails, the associated load balancer can remove the faulty component from the load balancing group and redirect the request to a healthy component in the group.
  • Petri Nets can be used to model the behavior of different Discrete Event Systems (DES). They are graphically represented as directed graphs with two types of nodes: places and transitions. Different extensions of Petri Nets have been developed to make them more expressive.
  • Deterministic Stochastic Petri Nets (DSPN) are one type of Petri Net extension for modeling systems with stochastic and deterministic behavior. Three transition types are defined in DSPN: (1) immediate transitions that model the actions that happen without any delay under a condition, (2) timed transitions that model the actions that happen after a deterministic delay, and (3) stochastic transitions that model the actions that happen after an exponentially distributed delay.
  • DSPN can be formally represented as a tuple (P, T, I, O, H, G, M₀, τ, W, Π), where P and T are the non-empty disjoint finite sets of places and transitions, respectively.
  • I and O are the forward and backward incidence functions such that I, O : (P × T) ∪ (T × P) → ℕ, where ℕ is the set of non-negative integers; H describes the inhibition conditions; G is an enabling function that, given a transition and a model state, determines whether the transition is enabled; M₀ is the initial marking; the function τ associates timed transitions with a non-negative rational number (τ : T → ℚ⁺, where ℚ⁺ stands for the set of non-negative rational numbers); the function W associates an immediate transition with a weight (a relative firing probability); and finally, Π associates an immediate transition with a priority to determine precedence among simultaneously fireable immediate transitions.
  • the received/generated model can then be simulated and analyzed using a simulator, such as the TimeNet simulator.
  • DSPN imposes the restriction of only one enabled deterministic transition in each marking.
  • the TimeNet simulator provides transient and stationary analysis of SCPN without any restriction on the number of concurrent enabled transitions.
  • Figure 2 is a flow diagram 200 illustrating exemplary operations for performing high-availability multi-component cloud application placement using stochastic availability models according to some embodiments.
  • these operations can be performed by a High-Availability Cloud Application Placement Module executed by a server computing device (e.g., 1100 of Figure 11), which can be implemented by executing high-availability cloud application placement code (e.g., 1130 of Figure 11).
  • the operations in this and other flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.
  • the flow 200 optionally includes at block 205, generating a model definition describing the application.
  • the model definition optionally can be a Unified Modeling Language (UML) object model (block 210) or another type or format of model definition.
  • the application optionally can be a multi-tier application (block 215) and/or multi-component application (block 217).
  • the application could be a three-tier, multi-component cloud application 100 as illustrated in Figure 1.
  • the flow 200 could include receiving the model definition, as illustrated by block 219.
  • a typical cloud deployment includes some software components running on an execution environment (e.g., a VM or container).
  • the VM can be hosted on a server computer, which in turn can be hosted in a data center.
  • Figure 3 illustrates a simplified (exemplary) UML model definition that captures such a cloud deployment according to some embodiments.
  • Each software component (represented as component object 305 in Figure 3) has attributes (and corresponding attribute values) 310 to capture certain features - for example an incoming workload distribution (arrivalRate), a time duration required to process a request (processingTime), a number of requests the component can process in parallel (bufferSize), a maximum capacity of the requests that can wait to be processed (queueSize), a number of redundant replicas considered for each component (numberOfReplicas), and/or the redundancy schema of the component (redundancyModel) to indicate a redundancy type (active, standby, spare, etc.) the component is capable to accept.
  • an execution environment (represented by execution environment object 320), which can be a VM or container, for example (represented by VM object 330 or container object 325) can fail due to different failure types (represented by failure type object 315), as can a server computer (represented by server object 335) and/or data center (represented by data center object 345).
  • Each failure type can have a defined failure rate (or mean time to failure (MTTF)), a recommended recovery action, and/or a recovery duration (or mean time to recovery (MTTR)) based on the recommended recovery.
  • a stochastic behavioral model is utilized in some embodiments in order to capture the stochastic nature of different failures in a system.
  • while the UML model 300 can describe service availability features, UML as a semi-formal model cannot simulate the behavior of the system or measure the availability of a service while different stochastic failures are happening.
  • stochastic models such as stochastic Petri nets are suitable to model and simulate the behavior of such systems with stochastic behaviors.
  • some embodiments include mapping (or translating/converting) an instance of the UML model 300 definition describing a given deployment of the application in the cloud to a corresponding stochastic model (e.g., a Stochastic Colored Petri Nets (SCPN) model), and thereafter analyzing this model (e.g., using a simulator tool) to quantify the expected availability of the application.
  • the flow 200 includes generating a stochastic model based upon transforming the generated/received model.
  • the stochastic model can be, for example, a Stochastic Colored Petri Net model (at block 225), though in other embodiments it can be a different type of stochastic model.
  • the model can include (at block 230) "building blocks" (data structures or other representations of entities) comprising one or more of: a software component of the application, a load balancer, a virtual machine, a server computer, a data center, etc.
  • Figure 4 is a flow diagram 400 illustrating exemplary operations for transforming (block 220) a cloud model definition to yield a Stochastic Petri Net model according to some embodiments.
  • SCPN model building blocks: this section explains the SCPN model used to evaluate various HA application deployments in a cloud environment according to some embodiments.
  • Various building blocks of SCPN are defined herein which, when combined together for a complete SCPN model, can be analyzed to assess the expected availability according to some embodiments. For example, five different building blocks are defined which can be used in the model transformation phase.
  • each of the software components is running on a virtual machine, and the VM is hosted on a server computing device. The server computing device in turn is hosted at a DC.
  • Each VM, server computing device, and DC can have its own failure rate (MTTF) and recovery time (MTTR).
  • Figures 5A-5C, 6, 7, and 8A-8C show immediate transitions as black bars, while deterministic and exponential timed transitions are shown as thick white-filled bars. Note that this representation is slightly different from a "standard" DSPN presentation, where immediate transitions are modeled with narrow bars, timed transitions are modeled with thick black-filled bars, and exponential transitions are modeled with thick white-filled bars.
  • Figure 5A illustrates an exemplary data center sub-model 500 according to some embodiments.
  • a data center has two states: a healthy state 502 (the place DCi) and a failed state 504 (the place DCi_fail). Failure is modeled using an exponential timed transition 506 (Ti_DCfail), whereas the recovery is a deterministic one 508 (Ti_DCup).
  • the transitions, types, and time functions of the data center sub-model 500 can be: Ti_DCfail (exponential, with a rate based upon the MTTF of the DC) and Ti_DCup (deterministic, with a delay based upon the MTTR of the DC).
  • Figure 5B illustrates an exemplary server sub-model 530 according to some embodiments.
  • the server also has two states - healthy 532 (Si) and failed 534 (Si_fail).
  • the server can fail and the failure is an exponential transition 536 (Ti_sfail); it can also fail immediately 538 due to the failure of its hosting datacenter (Ti_sDCfail).
  • the datacenter hosting Si is represented with S(i)DC
  • a place name is used in the formulas to denote the number of tokens available in that place.
  • the immediate transition Ti_sDCfail is guarded with a condition that is enabled when the fail place of the hosting data center S(i)DC is marked.
  • the recovery of the server is modeled with a deterministic transition 540 (Ti_sUP).
  • Figure 5C illustrates an exemplary virtual machine sub-model 560 according to some embodiments.
  • a VM can fail through an exponential transition 566 (Ti_fail) or can fail immediately 568 due to the failure of its hosting server or data center (Ti_Hfail).
  • the server and DC hosting the VM are referred to as VM(i)Server and VM(i)DC, respectively, and Ti_Hfail is guarded with a condition that is enabled when the fail place of VM(i)Server or of VM(i)DC is marked.
  • the recovery of the VM is modeled with a deterministic delay 570 (Ti_up).
  • the following table provides the information of the timed transitions of the VM submodel 560.
  • Figure 6 illustrates an exemplary load balancer sub-model 600 (e.g., a load generator and round robin load balancer sub-model to be used within an overall model) according to some embodiments.
  • the place LoadDistributor 602 has a fixed number of tokens 604, and the load balancer transitions 606 (T_LBi and T_LBo) distribute the workload among the active replicas of the same component.
  • Each component has a queue 608 (Ci_queue) to represent the number of requests it can queue for processing and a flushing place 610 (Ci_flushing).
  • the flushing place 610 can be a place holder for the load balancing mechanism to ensure a round robin distribution, and thus, it is not used to capture a specific component behavior.
  • the transitions T_LBi and Ti_flush 612 are guarded such that they model a round robin policy.
  • Once a component Ci receives a token in its queue, its flushing place 610 is marked and the component will not receive another token until its flushing place 610 is unmarked.
  • Let the round robin order be C1, C2, C3, ..., CM, where M is the number of replicas; the same order then repeats.
  • the transition T_LB1 606 is the first one that becomes enabled, and its clock starts elapsing. Once it fires, one token is produced in C1_queue 608 and one token is produced in C1_flushing 610; while C1_flushing 610 is marked, C1 cannot receive another token.
  • T1_flush 612 cannot fire until all of the other components have received their share.
  • next, the transition T_LB2 606 becomes enabled, and its clock starts elapsing. When T_LB2 606 fires, C2_queue 608 and C2_flushing 610 receive a token.
  • this continues until CM receives a token, at which point T1_flush 612 is enabled and C1_flushing 610 is unmarked; T2_flush, T3_flush, ... TM_flush also fire in turn.
  • T_LBi can have a different distribution (e.g., deterministic, exponential, etc.).
  • M is set to be the number of replicas (numberOfReplicas) and L the maximum capacity of a component queue (queueSize), and the server and DC hosting VMi are referred to as VM(i)Server and VM(i)DC, respectively.
  • VSDHi and VSDFi are variables (e.g., "VSD" refers to [VM, Server, Data center], "H" refers to "healthy," and "F" refers to "faulty") used to capture the combined status of the VM, its hosting server, and its hosting DC: VSDHi holds when the VM, its hosting server, and its hosting DC are all in their healthy states, and VSDFi holds when at least one of them has failed.
  • each T_LBi is guarded with a guard G_T_LBi, defined for i ∈ 1..M, that enforces the round robin order among the healthy replicas as described above.
  • An alternative solution used in some embodiments to model the load distribution is to use loop-back arcs from T_LBi and T_LBo to the place LoadDistributor to continuously re-enable the load balancer transitions and regenerate the workload infinitely. Note that with this alternative approach of load distribution, the model can be over-flooded with tokens if the generation rate of the tokens (representing the arrival rate of requests) is faster than the consumption rate of the tokens (representing the processing rate of the requests). To avoid this issue, embodiments fix the number of tokens in the place LoadDistributor and do not consider the feedback input arcs. The transitions and their guards remain the same to model the round robin policy.
  • Figure 7 illustrates an exemplary component sub-model 700 according to some embodiments. This figure illustrates the model of a component, including partially the load balancer delivering the workload to the component.
  • Each component has a queue (Ci_queue) 702 to model the maximum capacity of the requests waiting to be processed and also a buffer 704 to model the maximum number of requests a component can process in parallel (Ci_processing), e.g., with multi-threaded components.
  • the requests stored in the queue 702 can enter the buffer 704 only if the component and its corresponding server and VM are healthy and the number of tokens already in the buffer 704 is below the maximum.
  • the transition 708 Ti_Lost_in_Processing is guarded with:
  • ¬(VM(k) == 1 ∧ VM(k)Server == 1 ∧ VM(k)DC == 1); i.e., the transition is enabled when the hosting VM, its hosting server, or its hosting DC is in a failed state.
  • the tokens successfully processed are stored in the place 716 Cmid. Note that in a multi-tier system, the tokens successfully processed in one tier are carried to the next tier, where they are load balanced among the replicas of the next tier. The tokens successfully processed in all of the tiers are stored in a final place, and only the tokens that reach this final place count toward the availability of the system.
  • the following table presents the list of timed transitions and their information for the component sub-model 700:
  • this transition can have other time functions.
  • this approach is based on transforming an instance of the UML model (e.g., an object model) into a solvable stochastic (e.g., SCPN) model.
  • the overall transformation algorithm utilized in some embodiments is described in the flowchart in Figure 4.
  • the flow 400 can include, after receiving or obtaining an instance of the model definition at block 405, building a dependency graph based on the component types' dependencies at block 410.
  • the block 410 can include identifying the number of tiers (e.g., 3 tiers of the application) and their order at block 415.
  • the flow 400 includes creating the common places and transitions that are common in stochastic models (at block 420), such as the LoadDistributor and LostReq places used in SCPN models. This can conclude the initialization phase 470, and the tier-iteration phase 475 begins.
  • in the tier-iteration phase 475, the sub-models for each tier are created. This creation can be based on utilizing the building blocks defined above. For instance, if the model definition includes five VMs, the VM building block can be replicated five times. However, the transitions and guards of each building block may differ, and it is in the annotation phase 480 that the DCs are created, the transitions are annotated with the proper rates, and the guards are annotated with the proper conditions (at block 445). It is the annotation phase 480 that glues the model together, reflecting the actual deployment and the failure cascading effects. The flow 400 then ends at block 450.
  • the flow 200 includes analyzing the stochastic model (block 235). This analysis can include block 240 and simulating different deployment scenarios for the application. In some embodiments, this includes block 245 and, for each deployment scenario, adjusting one or more variables/settings. These can include, for example, adjusting data center MTTR values (block 250), data center MTTF values (block 255), data center failure rate values (block 260), load processing time values (block 265), etc.
  • For explanatory purposes, an example of a cloud deployment modeled by SCPN is provided, and then the model is used to evaluate different deployments from a HA perspective.
  • the exemplary system is a three-tier application, e.g., a "Big Data" analysis application.
  • Filters receive unstructured data and remove redundant/useless data.
  • Analysis Engines analyze the data and generate structured data form.
  • Databases store the structured data produced by the Analysis Engine.
  • for each tier, it is assumed that the software component is running on a virtual machine, and the VM is hosted on a server computing device.
  • the server computing device is hosted on a DC.
  • Each tier is replicated three times with a multi-active redundancy model.
  • the data centers are geographically distributed.
  • a load balancer distributes the workload among the replicas based on a round robin policy.
  • each VMi is hosted on a server Si.
  • the data center hosting Si is not fixed and, depending on the deployment, the server can be hosted on any of the available DCs.
  • Figures 8A, 8B, and 8C collectively illustrate an exemplary SCPN model 800A-800C of a three-tier application running in a cloud environment according to some embodiments.
  • the depicted three-tier application could be a "Big Data" analysis application as described above, and could include requests being received at a frontend (e.g., a first tier) and be processed through the other tiers (e.g., a second and third tier) where eventually the processing stops at the backend (e.g., data is persisted in a database).
  • a three-tier application could be configured in a "round-trip" manner, e.g., receive requests at a frontend that are processed and ultimately passed to the backend, where after processing, data is passed back through the tiers to the frontend to be sent back to the client (or to another destination).
  • this "round- trip" nature e.g., flowing through each of the tiers twice
  • six tiers e.g., three tiers for the trip from the frontend to the backend, and another three tiers for the way back.
  • these disclosed techniques are applicable to any number of tiers of an application, and, applications exhibiting such "round-trip" flows can be modeled with double the number of tiers.
  • analyzing the service availability can be done either by (1) quantifying the percentage of time a given service is in a healthy state, or (2) by analyzing the percentage of served requests in comparison to the total number of received requests.
  • the number of tokens in the LoadDistributor place can be fixed. Note that, when the model is created using the building blocks described above, some places may overlap. For example, the place 'Lost_in_phasei' is shared in each tier among the replicas, whereas the place 'LostReq' is unique per model.
  • served requests are stored in a place which serves as the load generator of the next tier (e.g., the Cmid and Cmidi places in Figure 8).
  • the tokens successfully processed in all of the three tiers are stored in the place ServedReq in the 3rd tier.
  • the percentage of the requests that are successfully processed through the three tiers (ServedReq) indicates the service availability of the cloud application.
  • the first deployment maximizes the distribution among the DCs, such that in each tier at least one of the replicas is on DC1, one is on DC2, and one is on DC3 (named Dep.1-2-3).
  • each of the three deployments is evaluated to determine which deployment should be chosen to maximize the overall availability of the application. For example, in a scenario where DC3 is the most reliable one, this simulation can determine whether it is better to choose the third deployment and put all of the replicas on the most reliable DC, or whether it may be better to maximize the distribution among the DCs, etc.
  • the flow can continue in a variety of ways.
  • the flow can optionally include providing the results of the analysis to a user, and the results may include for the multiple deployment scenarios the service availabilities thereof.
  • the providing of the results could occur using electronic messaging, and could include providing the results as part of a web page.
  • the results can include text (e.g., alphanumeric characters), graphics (e.g., charts), etc., and can include user input elements allowing the user to provide a user input to select one of the deployment scenarios.
  • the flow can include block 275, where a user input is received from the user that indicates a request to allocate resources for the application according to an identified/selected one of the multiple deployment scenarios.
  • the flow can include causing resources to be allocated for the application according to the selected deployment scenario (e.g., at block 285).
  • the flow 200 includes automatically (i.e., according to programmed and executed logic) identifying/selecting a "chosen" or first deployment scenario from the multiple deployment scenarios.
  • This selection of a first deployment scenario can be based upon using a selection scheme and the results of the analysis (from block 235).
  • the selection scheme can be based upon the expected system availabilities, and can include selecting the deployment scenario having the highest determined system availability value, for example.
  • other selection schemes can be developed by those of ordinary skill in the art to flexibly select particular deployment scenarios according to the desires of the involved user(s). Further detail regarding an exemplary selection scheme is provided later herein with regard to Figure 13.
  • the flow can include causing resources to be allocated for the application according to the selected/identified one of the multiple deployment scenarios.
  • An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals).
  • an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
  • an electronic device may include non-volatile memory containing the code, since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed); while the electronic device is turned on, the part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device.
  • Some electronic devices also include a set of one or more physical network interface(s) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
  • Figure 11 is a block diagram illustrating an exemplary data processing system 1100 that can be used in some embodiments.
  • Data processing system 1100 includes one or more microprocessors 1105 (or processing circuits) and connected system components (e.g., multiple connected chips).
  • the data processing system 1100 can be a system on a chip.
  • One or more such data processing systems 1100 may be utilized to implement the functionality of a server computing device executing a High-Availability Cloud Application Placement Module as described herein.
  • the data processing system 1100 can be used to perform the method 200 of Figure 2.
  • the illustrated data processing system 1100 includes memory 1110, which is coupled to one or more microprocessor(s) 1105.
  • the memory 1110 can be used for storing data, metadata, and/or programs for execution by the one or more microprocessor(s) 1105.
  • the depicted memory 1110 may store high-availability cloud application placement code 1130 that, when executed by the microprocessor(s) 1105, causes the data processing system 1100 to perform high-availability multi-component cloud application placement using stochastic availability models and to perform other operations as described herein, such as the operations illustrated in Figure 2.
  • the memory 1110 may include one or more of volatile and non-volatile memories, such as Random Access Memory (“RAM”), Read Only Memory (“ROM”), a solid state disk (“SSD”), Flash, Phase Change Memory (“PCM”), or other types of data storage.
  • the memory 1110 may be internal or distributed memory.
  • the data processing system 1100 also includes an audio input/output (I/O) subsystem 1125 which may include a microphone and/or a speaker for, for example, playing back music or other audio, receiving voice instructions to be executed by the microprocessor(s) 1105, playing audio notifications, etc.
  • a display controller and display device 1120 provides a visual user interface for the user, e.g., graphical user interface (GUI) elements or windows.
  • the data processing system 1100 also includes one or more input or output (“I/O") devices and interfaces 1115, which are provided to allow a user to provide input to, receive output from, and otherwise transfer data to and from the system 1100.
  • I/O devices 1115 may include a mouse, keypad, keyboard, a touch panel or a multi-touch input panel, camera, optical scanner, network interface, modem, other known I/O devices, or a combination of such I/O devices.
  • the touch input panel can be a single touch input panel that is activated with a stylus or a finger, or a multi-touch input panel that is activated by one finger or a stylus or multiple fingers.
  • the touch input panel can be capable of distinguishing between one or two or three or more touches, and can be capable of providing inputs derived from those differentiated touches to other components of the processing system 1100.
  • the I/O devices and interfaces 1115 can also include a connector for a dock or a connector for a USB interface, FireWire, Thunderbolt, Ethernet, etc., to connect the system 1100 with another device, external component, or network.
  • Exemplary I/O devices and interfaces 1115 can also include wireless transceivers, such as an IEEE 802.11 transceiver, an infrared transceiver, a Bluetooth transceiver, a wireless cellular telephony transceiver (e.g., 2G, 3G, 4G), or another wireless protocol to connect the data processing system 1100 with another device, external component, or network, and receive stored instructions, data, tokens, etc.
  • one or more buses may be used to interconnect the various components shown in Figure 11. It will also be appreciated that additional components, not shown, may also be part of the system 1100, and, in certain embodiments, fewer components than those shown in Figure 11 may also be used in a data processing system 1100.
  • Figure 12 illustrates a non-limiting example functional block diagram of a server computing device in accordance with some embodiments. It is not strictly necessary that each module be implemented as a physically separate unit. Some or all modules may be combined in a physical unit. Also, the modules need not be implemented strictly in hardware. In some embodiments, the units may be implemented through a combination of hardware and software.
  • the server computing device 1200 may include one or more central processing units executing program instructions stored in a non-transitory storage medium or in firmware to perform the functions of the modules.
  • the server device 1200 can include a model transformation module 1210 and a model evaluation module 1215.
  • the server device 1200 can also optionally include one or more of: a model generation/reception module 1205, a model providing module 1220, a user input reception module 1225, a deployment selection module 1230, and/or a scheduler module 1235.
  • the model generation/reception module 1205 can be adapted for receiving or generating a model definition defining a multi-component cloud application, wherein the model definition defines a plurality of components of the multi-component cloud application.
  • the model transformation module 1210 can be adapted for generating, based upon the model definition, a Stochastic Colored Petri Net (SCPN) model.
  • the SCPN model includes representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices.
  • the model evaluation module 1215 can be adapted for evaluating, using the SCPN model, a plurality of deployment possibilities of the multi-component application to yield a plurality of service availabilities of the multi-component application corresponding to the plurality of deployment possibilities.
  • the deployment selection module 1230 can be adapted for selecting, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities. This selecting of the first deployment possibility can occur according to a selection scheme.
  • the model providing module 1220 can be adapted for providing, to a client, information describing the plurality of service availabilities.
  • the user input reception module 1225 can be adapted for receiving, from the client, a selection of the first deployment possibility as the preferred deployment.
  • the scheduler module 1235 can be adapted for causing resources to be allocated for the application according to the selected/identified one of the multiple deployment scenarios.
  • Figure 13 is a flow diagram illustrating exemplary operations for selecting a first placement that may be an optimal placement scenario for a multi-component cloud application from multiple placement scenarios according to some embodiments.
  • a result of performing model analysis can potentially lead to multiple placement scenarios being identified that could be chosen.
  • an application may have an associated high-availability requirement (e.g., as part of an SLA) and multiple deployment scenarios can be identified as being suitable (i.e., satisfying the HA requirement).
  • one of these multiple deployment scenarios can be automatically selected to be utilized, which may occur according to a selection scheme 1300, such as the one detailed here.
  • the cloud datacenter information 1305 is accessed, which can include the results of the model analysis (see, e.g., block 235 of Figure 2).
  • multiple deployment scenarios may be determined.
  • a scenario is considered in which the data underlying the bar graph of Figure 10A is the cloud datacenter information 1305, and a high-availability requirement for the cloud application requires "90%" HA.
  • the scheme 1300 includes identifying a set of candidate placement solutions (e.g., those satisfying the HA requirements, other requirements/preferences, etc.).
  • the set of candidate placement solutions includes "Dep. 1-2-3” (i.e., using DCs 1, 2, 3) and "Dep. 2-3” (i.e., using DCs 2 and 3).
  • the scheme 1300 includes determining a distance score for each candidate placement solution.
  • This block can include, at block 1320, determining a distance score for each data center utilized in the candidate placement solutions. For example, data centers 1, 2, and 3 are included in the candidate placement solutions.
  • the scheme 1300 can include performing the following:
  • the utilization can be a system utilization of a data center, indicating an amount of "usage" of that data center in any number of ways known to those of skill in the art, such as a percentage of the components or VMs allowed to be executed at the data center.
  • as an example, it is assumed that DC1 has a utilization of 42%, DC2 has a utilization of 41%, and DC3 has a utilization of 40%.
  • an overload factor can be a value configured by an administrator to represent an amount that each particular DC can be used in excess of a baseline amount, for example.
  • DC1 may be located in a geographic area having lower electricity costs or labor costs, etc., and thus a lower cost of operation, and it may have a configured overload factor of 1.2.
  • DC2 has an overload factor of 1.1, due to it having more modern equipment and thus a lower carbon footprint, for example.
  • DC3 has an overload factor of 1.0, meaning that it is the baseline and/or has 0% added preference. Accordingly, with the "average utilization of all other DCs" value of 40.5 and the overload factor of DC1 being 1.2, the allowed load is given by multiplying the two values, 40.5 and 1.2, to yield the value 48.6.
  • (C) At block 1335, subtracting the current load of the DC from the determined allowed load of the DC (from block 1330).
  • the current load of DC1 can be assumed to be 42, so the distance score of DC1 is 48.6 - 42 = 6.6.
  • the distance score for each candidate placement solution can be determined by, as illustrated at block 1337, averaging the distance scores of each solution-utilized DC (i.e., computing a mean of the distance scores for all of the DCs involved in providing one or more resources for a particular candidate placement solution) for each of the candidate placement solutions.
  • block 1337 can include calculating the average of 6.6, 4.1, and 1.5 - which is approximately 4.07.
  • block 1337 can include calculating the average of 4.1 and 1.5 - which is 2.8.
  • the selection scheme 1300 includes block 1340 for selecting the placement solution having the maximum distance score.
  • the "Dep. 1-2-3" deployment has a distance score of 4.06, and
  • the "Dep. 2-3" deployment has a distance score of 2.8.
  • the "Dep. 1-2-3" deployment can be selected as the optimal placement solution, and in some embodiments, the system can provide (at block 270 of Figure 2) these results to a user, and/or continue to block 285 of Figure 2 and cause resources to be allocated for the application according to the "Dep. 1-2-3" deployment scenario.
  • the operations of block 1315 may be represented as the following, where NumDC is the total number of available DCs, Dep is the set of DCs used for each deployment, DepN is the number of elements in set Dep, DCi.relative_aveUtil is the average utilization of other DCs except DCi, DCi.OL is the permitted overload for DCi compared to its peers' current load, DCi.allowed_load is the allowed load for DCi (taking into account the current loads of its peers), and DCi.current_load is the actual workload of DCi:
  • the overload factor for each DC can be defined as OL. Then, the maximum allowed workload is obtained from:
  • ∀ i ∈ 1:NumDC: DCi.allowed_load = DCi.relative_aveUtil × DCi.OL
  • the distance score referred to as deployment.dist can be calculated as explained below.
  • Dep is the set of DCs used in a deployment and DepN stands for the number of elements in the set Dep.
  • the distance score is given by: deployment.dist = ( Σ DCi ∈ Dep (DCi.allowed_load - DCi.current_load) ) / DepN
  • the eligible deployment having the maximum deployment distance or distance score will be chosen (e.g., at block 1340); a code sketch of this computation follows this list.
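For illustration only, the arithmetic of blocks 1315 through 1340 can be sketched in Python as follows. The data center loads and overload factors are the hypothetical values from the example above (utilization and current load coincide in that example), and every function and variable name here is illustrative rather than part of any described embodiment.

```python
# Illustrative sketch of selection scheme 1300 (blocks 1310-1340).

def relative_ave_util(dc, all_dcs):
    # Block 1325: average utilization of all DCs other than `dc`.
    others = [d for d in all_dcs if d is not dc]
    return sum(d["current_load"] for d in others) / len(others)

def allowed_load(dc, all_dcs):
    # Block 1330: DCi.allowed_load = DCi.relative_aveUtil x DCi.OL
    return relative_ave_util(dc, all_dcs) * dc["OL"]

def dc_distance(dc, all_dcs):
    # Block 1335: allowed load minus current load.
    return allowed_load(dc, all_dcs) - dc["current_load"]

def deployment_dist(dep, by_name, all_dcs):
    # Block 1337: mean of the distance scores of the DCs used by the deployment.
    scores = [dc_distance(by_name[n], all_dcs) for n in dep]
    return sum(scores) / len(scores)

dcs = [{"name": "DC1", "current_load": 42, "OL": 1.2},
       {"name": "DC2", "current_load": 41, "OL": 1.1},
       {"name": "DC3", "current_load": 40, "OL": 1.0}]
by_name = {d["name"]: d for d in dcs}
candidates = {"Dep. 1-2-3": ("DC1", "DC2", "DC3"), "Dep. 2-3": ("DC2", "DC3")}

# Block 1340: choose the candidate with the maximum distance score.
best = max(candidates, key=lambda c: deployment_dist(candidates[c], by_name, dcs))
print(best)  # "Dep. 1-2-3" wins: (6.6 + 4.1 + 1.5)/3 beats (4.1 + 1.5)/2
```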


Abstract

Techniques are disclosed for providing high-availability multi-component cloud application placement using stochastic availability models. A model definition defining a plurality of components of a multi-component cloud application can be transformed to a stochastic model (such as a Stochastic Colored Petri Nets model) that includes representations of the plurality of components, one or more virtual machines (VMs) that execute the plurality of components, one or more server computing devices that execute the one or more VMs, and one or more data centers hosting the one or more server computing devices. A plurality of deployment possibilities of the multi-component application can be evaluated using the stochastic model to yield a plurality of service availabilities of the multi-component application corresponding to the plurality of deployment possibilities. The service availabilities can be used to determine a deployment possibility that can be used to allocate resources for the application.

Description

HIGH-AVAILABILITY MULTI-COMPONENT CLOUD APPLICATION PLACEMENT USING STOCHASTIC AVAILABILITY MODELS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from U.S. Provisional Application No. 62/218,510, filed on September 14, 2015, the content of which is incorporated by reference.
FIELD
[0002] Embodiments of the invention generally relate to the field of cloud computing, and more specifically, to high-availability multi-component cloud application placement using stochastic availability models.
BACKGROUND
[0003] With the cloud computing era, many business applications are offered as cloud services where they can be accessed anytime and anywhere. Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) are essential forms of cloud services provided to many enterprises through offerings such as Microsoft Azure and Amazon Elastic Compute Cloud (EC2). Depending on the cloud user's needs, PaaS and IaaS provide the required web applications and computational resources in the form of virtual machines (VMs). With the widespread deployment of and reliance upon on-demand cloud services/VMs, availability has become a paramount aspect for cloud providers and users. However, these services can encounter a variety of different types of hardware and software failures and consequently become unavailable. Accordingly, there is a need for robust high-availability (HA) solutions that can mitigate any encountered downtime and recover any data loss.
SUMMARY
[0004] In some embodiments, a method in a computing device for performing High- Availability (HA) aware scheduling of a multi-component cloud application includes generating, based upon a model definition, which defines a plurality of components of the multi-component cloud application, a stochastic model. The stochastic model can be a Stochastic Colored Petri Nets model. The stochastic model includes representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices. The method also includes evaluating, by the computing device using the stochastic model, a plurality of deployment possibilities of the multi-component application to yield a plurality of service availabilities of the multi-component application corresponding to the plurality of deployment possibilities. The method also includes selecting, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities. This selecting of the first deployment possibility can occur in accordance with a selection scheme.
[0005] In some embodiments, generating the stochastic model includes generating a dependency graph based upon the definitions of the plurality of components from the model definition; identifying a number of tiers of the multi-component cloud application and an ordering of the tiers; and creating, for each of the tiers, a representation of: a load balancer model for the tier, a component model for each of the plurality of components belonging to the tier, a virtual machine model for each of the one or more virtual machines belonging to the tier, and a server model for each of the one or more server computing devices belonging to the tier. In some embodiments, generating the stochastic model further includes creating one or more data center models.
[0006] In some embodiments, the evaluating comprises utilizing a simulator tool to quantify the plurality of service availabilities of the multi-component cloud application. In some embodiments, the simulator tool quantifies the plurality of service availabilities by, for each of the plurality of deployment possibilities: (a) simulating a processing of a plurality of requests by the stochastic model under the deployment possibility, wherein the simulating includes stochastically introducing one or more failures into the stochastic model; and (b) determining the service availability by determining a percentage of the plurality of requests that were successfully processed through the stochastic model.
[0007] In some embodiments, at least a first of the plurality of deployment possibilities that is evaluated includes placing all of the plurality of components within a same data center; and at least a second of the plurality of deployment possibilities that is evaluated includes placing at least a first of the plurality of components in a first data center and at least a second of the plurality of components in a second data center.
[0008] In some embodiments, the model definition includes, for at least one of the
components: (a) an arrival rate attribute value indicating an incoming number of requests for the component; (b) a processing time attribute value indicating a time duration for the component to process each of the requests; (c) a buffer size attribute value indicating a number of the requests that the component can process in parallel; (d) a queue size attribute value indicating a maximum capacity of the requests that can wait in a queue to be processed; and (e) a replica number attribute value indicating a number of redundant replicas considered for the component. In some embodiments, the model definition further includes, for the at least one of the components: (f) a redundancy model attribute value indicating a redundancy type that the component is capable to accept, wherein the redundancy type is at least one of active, standby, and spare.
[0009] In some embodiments, the method further includes causing the multi-component cloud application to be scheduled according to the selected first deployment possibility.
[0010] In some embodiments, the method further includes providing, to a client, information describing the plurality of service availabilities; and receiving, from the client, a selection of the first deployment possibility to be used for deploying the multi-component cloud application.
[0011] In some embodiments, the stochastic model comprises a Stochastic Colored Petri Nets model.
[0012] In some embodiments, selecting the first deployment possibility from the plurality of deployment possibilities comprises identifying a plurality of candidate placement solutions from the plurality of deployment possibilities, determining a distance score for each of the plurality of candidate placement solutions, and selecting a first of the plurality of candidate placement solutions that has a highest determined distance score among the determined distance scores. In some embodiments, determining the distance score for each of the plurality of candidate placement solutions includes determining a data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions. In some embodiments, determining the data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions includes: (a) calculating an average utilization of all of the others of the plurality of data centers utilized in the plurality of candidate placement solutions; (b) determining an allowed load value for the data center based upon the average utilization and an overload factor of the data center; and (c) determining the data center distance score based upon the allowed load value and a current load of the data center. In some embodiments, determining the distance score for each of the plurality of candidate placement solutions further includes averaging, for each of the plurality of candidate placement solutions, the data center distance scores of those data centers utilized by the candidate placement solution.
[0013] According to some embodiments, a non-transitory computer-readable storage medium can store instructions which, when executed by a processor of a computing device such as a server computing device, cause the computing device to perform the above method. According to some embodiments, a computing device is described that comprises a processor and the above non-transitory computer-readable storage medium. According to some embodiments, a computing device is described that comprises one or more interfaces, and one or more processors operationally connected to the one or more interfaces and to a memory, which contains instructions that, when executed, cause the one or more processors to perform the above method.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
[0015] Figure 1 is a block diagram illustrating an exemplary multi-component cloud application.
[0016] Figure 2 is a flow diagram illustrating exemplary operations for performing high- availability multi-component cloud application placement using stochastic availability models according to some embodiments.
[0017] Figure 3 illustrates an exemplary Unified Modeling Language (UML) model for a cloud deployment according to some embodiments.
[0018] Figure 4 is a flow diagram illustrating exemplary operations for transforming a cloud model to yield a Stochastic Petri Net model according to some embodiments.
[0019] Figure 5A illustrates an exemplary data center sub-model according to some embodiments.
[0020] Figure 5B illustrates an exemplary server sub-model according to some embodiments.
[0021] Figure 5C illustrates an exemplary virtual machine sub-model according to some embodiments.
[0022] Figure 6 illustrates an exemplary load balancer sub-model according to some embodiments.
[0023] Figure 7 illustrates an exemplary component sub-model according to some
embodiments.
[0024] Figures 8A, 8B, and 8C collectively illustrate an exemplary Stochastic Colored Petri Net (SCPN) model of a three-tier application running in a cloud environment according to some embodiments.
[0025] Figure 9A illustrates service availability of different deployments and different Mean Time To Repair (MTTRs) where the data centers have similar Mean Time To Failure (MTTF) values according to some embodiments.
[0026] Figure 9B illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values according to some embodiments.
[0027] Figure 10A illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values than those of Figure 9B according to some embodiments.
[0028] Figure 10B illustrates served request amounts of different deployments when the served requests have different processing times according to some embodiments.
[0029] Figure 11 is a block diagram illustrating an exemplary data processing system that can be used in some embodiments.
[0030] Figure 12 illustrates a non-limiting example functional block diagram of a server computing device in accordance with some embodiments.
[0031] Figure 13 is a flow diagram illustrating exemplary operations for selecting a placement scenario, that may be optimal, for a multi-component cloud application from multiple placement scenarios according to some embodiments.
DESCRIPTION OF EMBODIMENTS
[0032] In the following description, numerous specific details such as logic implementations, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of embodiments of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits, and full software instruction sequences have not been shown in detail in order not to obscure aspects of the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation. Accordingly, the figures and description provided herein are not intended to be restrictive.
[0033] References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0034] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot- dash, and dots) may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
[0035] In the following description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.
[0036] Analytical models such as stochastic Petri Nets (SPNs) and Markov chains can be used to analyze the reliability/availability of many complicated information technology (IT) systems. However, the complicated nature of cloud infrastructure configurations and dynamic state changes requires a comprehensive, analytical, availability-centric model. Such a model should satisfy requirements including (1) the ability to capture the stochastic nature of failures according to different probability distribution functions, (2) the ability to capture the cloud elements (data centers, servers, and VMs) and the correlation aspect of their failures, (3) the ability to capture the functional workflow between the components of multi-component and/or multi-tiered applications (queuing and request forwarding) as well as the high availability mechanisms they employ (load balancing and redundancy schemes), (4) the ability to capture different deployments of the application components in the cloud (inter-Data Center (DC) vs. intra-DC deployment), and (5) the ability to assess and quantify the expected availability of the application according to its cloud deployment. Embodiments described herein provide such a model with these abilities.
[0037] Cloud services experience various stochastic failures and consequently become unavailable. With the trend of such systems needing to be "always on" and "always available," inoperative services halt business continuity. It is not enough only to provide a High-Availability (HA) solution that can mitigate failures and maintain a certain availability baseline; it is also necessary to assess the solution and its resiliency to any failure modes.
[0038] Embodiments disclosed herein implement an availability analysis approach that can consider the effects of hardware and/or software failures of a variety of types, recovery durations, load balancing delays, and/or request processing times to thereby assess whether a given cloud deployment would be able to satisfy a service level agreement (SLA) for a deployment of a cloud application. The embodiments utilize a stochastic model (e.g., a
Stochastic Petri Net model, or "SPN" model) to evaluate the availability of cloud services and their deployment in geographically distributed data centers. In some embodiments, the model captures the characteristics of the cloud provider and user, and represents them as elements of an availability model. These elements can then be synchronized according to the dependencies between them in order to form a stochastic availability model. Thus, some embodiments evaluate whether a given cloud deployment can satisfy the HA requirements of a cloud-deployed application.
[0039] As disclosed herein, some embodiments utilize stochastic models that account for the different types of hardware and software failure as well as the scheduling challenges of cloud applications, and thereby provide beneficial aspects in availability planning. Accordingly, embodiments can utilize a stochastic model that evaluates the availability of a cloud's services and their deployments in inter- or intra-data centers. This model can account for different failure types, functionality constraints, redundancy models, and/or interdependencies between different application components. Consequently, different decisions can be extracted from this model that aid in designing the best HA solution for an existing cloud model.
[0040] Although some HA models have been proposed to mitigate the software or hardware failures of a virtualized system, these approaches do not associate their HA-aware solution with an availability assessment model to evaluate the impact of the above requirements on that solution. For example, different types of failures affect the cloud infrastructure and applications, which are not accounted for in existing approaches. Additionally, further challenges are raised when choosing a best deployment for a cloud application while satisfying the HA and functionality requirements. Therefore, it is necessary to understand the various failure forms affecting the cloud model, the HA-aware scheduling challenges, and the need for a stochastic model to handle them.
[0041] Additional background including different failure types affecting the cloud model, challenges of applications' scheduling, and the need for stochastic models is detailed below.
[0042] Failure types and Distributions
[0043] A cloud model typically includes multiple data centers (DCs), each having a set of server computers and a set of applications with multiple components. Using an appropriate scheduling solution, the applications are set up to be hosted on those servers that best fit the application requirements using Virtual Machine (VM) (or containers) mapping. Consequently, any DC/server's failure mode can bring the hosted application down whether it is a planned or unplanned outage. Unplanned downtime can be defined as the time where a system enters a failure mode and becomes unavailable. Such downtime is a result of unexpected failure events, and consequently, neither the cloud provider nor the users are notified of it in advance.
Therefore, it is necessary to have a model that takes into account the actual effect of failures on the system's availability. There are different forms of failures:
[0044] One form of failure is Hardware/Infrastructure failures, which happen at the data center and server layers. They can be the result of faulty server, storage, and network elements, such as faults in memory chips, disk drives/arrays, switches, routers, or cabling. Such failures can be captured by the failure rates of the servers as well as the entire DC.
[0045] Another form of failures includes Application failures, which are defects that occur at the application and VM levels. Application failures might be generated from a hypervisor malfunctioning, unresponsiveness of the operating system, file corruption, or viruses and/or software bugs such as Heisenbugs, Bohrbugs, Schroedinbugs, or Mandelbugs. Such failures are captured by the failure rate of the components and VMs.
[0046] A third form of failure is a Force majeure failure, which are events that affect both the cloud provider infrastructure and the cloud applications. They can be generated from power loss, storms, fires, earthquakes, floods, or other natural disasters. Due to their scale, such failures can be captured by the failure rate of the DC.
[0047] A fourth form of failure is known as a Cascading failure, which is the result of an accumulated impact of hardware or software failures. For example, a dynamic host
configuration protocol (DHCP) server malfunctioning can flood the network with DHCP requests causing a DC failure. Consequently, its corresponding servers and their hosted applications/VMs may become inaccessible. The functionality of the corresponding VM or application is ceased, which associates its recovery with the repair or recovery policy of its host. Due to their propagation impact, such failures can be captured by the failure rate of the DC.
[0048] Each of the previous failure states is associated with a failure rate or mean time to failure (MTTF) and mean time to repair or recover (MTTR) determined by the used repair or recovery policy. Due to the stochastic nature of the corresponding failure events, embodiments of the present invention assume that they are generated using certain probabilistic distribution functions. However, there is no restriction or specific constraint on the distribution type of every failure event, and it can follow exponential, Weibull, normal, or any other stochastic model. Regarding the recovery or repair policy, it is assumed to have a deterministic or a stochastic nature depending on the utilized recovery behavior. The exponential failure distribution can be used in failure analysis and availability analysis, and in some embodiments, the exponential failure distribution is used to reflect the failure rate or MTTF of DCs, servers, and applications/VMs. Such a distribution can be applied on all the stochastic failure transitions of the disclosed Petri Net model. As for the repair/recovery timed transitions, a deterministic distribution can be applied on them to trigger any repair or recovery behavior for the DCs, servers, and VMs/applications. It should be noted that some embodiments also support other failure rates, and do not depend on a specific probability distribution.
[0049] Challenges in Modeling and Placing Multi-component/Multi-tiered Applications
[0050] When it comes to HA-aware scheduling of applications in a cloud environment, various HA approaches can be adopted to mitigate the impact of outages. Some scheduling solutions are associated with a load balancing mechanism for HA purposes, while other schedulers incorporate their approach with replication or failover techniques to maintain a certain HA baseline. The challenge here lies in selecting the best deployment model while analyzing the impact of the adopted HA mechanism, different failure types, functionality constraints, and the redundancy and interdependency models between different components.
[0051] For example, Figure 1 is a block diagram illustrating an exemplary multi-component cloud application 100. In this diagram, one or more clients 102A-102N interact with the multi- component cloud application 100 deployed in a cloud 110, which can include one data center or multiple data centers 165. In some embodiments, the cloud 110 includes multiple data centers 165, and some or all of the multiple data centers 165 can be at separate geographic locations.
[0052] Typical multi-component applications include three tiers, including a "first" front-end tier 150 (e.g., multiple web servers 104A-104L), a "second" business logic application tier 155 (of one or more application instances 106A-106M) on the middle tier, and a "third" backend tier 160 (e.g., one or more databases (DBs) 108A-108P) storing the system state at the back-end, as illustrated in Figure 1.
[0053] In this illustrated embodiment, the web servers 104A-104L depend on the
application(s) 106A-106M that, in turn, are sponsored by the one or more databases 108A-108P. Each tier or component type (web or "Hyper Text Transfer Protocol (HTTP)" servers 104A-104L, App(s) 106A-106M, and DB(s) 108A-108P) may include a primary component (e.g., 104A, 106A, 108A) and multiple active and/or potentially standby replicas (e.g., 104B-104L, 106B-106M, 108B-108P) as shown in Figure 1.
[0054] Each type of component can be associated with certain failure types. When it comes to deploying such an application 100 in a single cloud 110 with geographically distributed DCs 165, multiple options can be considered on whether inter-DC or intra-DC deployment should be selected. For example, it is not always the case that a maximum inter-DC distribution (i.e., a distribution where the components are distributed over different DCs and/or server computing devices as much as possible) is optimal, because this decision depends on many factors such as the failure distributions, recovery behaviors, and the utilized HA mechanisms as will be detailed further herein.
[0055] Multi-component applications use redundancy models and load balancing to maintain certain HA baselines. As detailed above, in some embodiments, each layer or "tier" can include a primary component that can be backed up with multiple active/standby/spare components depending on the used redundancy model. Upon the arrival of requests from the client(s) 102A-102N, load balancers (110A, 110B, 110C) distribute the requests between different components. Through constant monitoring, the load balancers (110A, 110B, 110C) can ensure that these requests are served by healthy components (or VMs implementing the components). Upon detecting a failure of a component in a tier, the associated load balancer can remove the faulty component from the load balancing group and redirect the request to a healthy component in the group.
[0056] Modeling systems with Petri Nets
[0057] Petri Nets can be used to model the behavior of different Discrete Event Systems (DES). They are graphically represented as directed graphs with two types of nodes: places and transitions. Different extensions of Petri Nets have been developed to make them more expressive. Deterministic Stochastic Petri Nets (DSPN) are one type of Petri Net extension for modeling systems with stochastic and deterministic behavior. Three transition types are defined in DSPN: (1) immediate transitions that model the actions that happen without any delay under a condition, (2) timed transitions that model the actions that happen after a deterministic delay, and (3) stochastic transitions that model the actions that happen after an exponentially distributed delay.
[0058] DSPN can be formally represented as a tuple (P, T, I, O, H, G, M0, τ, W, Π), where P and T are the non-empty disjoint finite sets of places and transitions, respectively. I and O are the forward and backward incidence functions such that I, O : (P × T) ∪ (T × P) → N, where N is the set of non-negative integers; H describes the inhibition conditions; G is an enabling function that, given a transition and a model state, determines whether the transition is enabled; M0 is the initial marking; the function τ associates timed transitions with a non-negative rational number (τ : T → Q+, where Q+ stands for the set of non-negative rational numbers); the function W associates an immediate transition with a weight (relative firing probability); and finally, Π associates an immediate transition with a priority to determine a precedence among some simultaneously fire-able immediate transitions.
[0059] Note that the priority of timed transitions (either deterministic or stochastic) against immediate ones is zero. To model the behavior of an application running in a cloud with stochastic failures, some embodiments disclosed herein use Stochastic Colored Petri Nets (SCPN), which is a class of DSPN models where the tokens can have different colors/types. In some embodiments, the received/generated model can then be simulated and analyzed using a simulator, such as the TimeNet simulator. Although DSPN imposes the restriction of only one enabled deterministic transition in each marking, the TimeNet simulator provides transient and stationary analysis of SCPN without any restriction on the number of concurrent enabled transitions. In the following section, the SCPN model utilized in some embodiments for a multi-component application deployed in the cloud is explained. However, it should be noted that stochastic models other than Petri Nets could be used, as will be appreciated by skilled persons in the art. Therefore, embodiments of the present invention are not limited to the Petri Net models.
[0060] APPROACH
[0061] Figure 2 is a flow diagram 200 illustrating exemplary operations for performing high-availability multi-component cloud application placement using stochastic availability models according to some embodiments. In some embodiments, these operations can be performed by a High-Availability Cloud Application Placement Module executed by a server computing device (e.g., 1100 of Figure 11), which can be implemented by executing high-availability cloud application placement code (e.g., 1130 of Figure 11). The operations in this and other flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.
[0062] The flow 200 optionally includes at block 205, generating a model definition describing the application. The model definition optionally can be a Unified Modeling Language (UML) object model (block 210) or another type or format of model definition, and the application optionally can be a multi-tier application (block 215) and/or multi-component application (block 217). For example, the application could be a three-tier, multi-component cloud application 100 as illustrated in Figure 1. Alternately, the flow 200 could include receiving the model definition, as illustrated by block 219.
[0063] A typical cloud deployment includes some software components running on an execution environment (e.g., a VM or container). The VM can be hosted on a server computer, which in turn can be hosted in a data center. Figure 3 illustrates a simplified (exemplary) UML model definition that captures such a cloud deployment according to some embodiments.
[0064] Each software component (represented as component object 305 in Figure 3) has attributes (and corresponding attribute values) 310 to capture certain features - for example an incoming workload distribution (arrivalRate), a time duration required to process a request (processingTime), a number of requests the component can process in parallel (bufferSize), a maximum capacity of the requests that can wait to be processed (queueSize), a number of redundant replicas considered for each component (numberOfReplicas), and/or the redundancy schema of the component (redundancyModel) to indicate a redundancy type (active, standby, spare, etc.) the component is capable to accept.
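By way of a hedged illustration only, such a per-component model definition could be carried in a structure like the following Python sketch; the class name and the example values are hypothetical, while the field names mirror the attribute names of Figure 3.

```python
from dataclasses import dataclass

@dataclass
class ComponentDefinition:
    """Illustrative carrier for the component object 305 attributes of Figure 3."""
    name: str
    arrivalRate: float      # incoming workload distribution (requests per unit time)
    processingTime: float   # time duration to process one request
    bufferSize: int         # number of requests processed in parallel
    queueSize: int          # maximum capacity of requests waiting to be processed
    numberOfReplicas: int   # number of redundant replicas of the component
    redundancyModel: str    # "active", "standby", or "spare"

# Hypothetical front-end component of a three-tier application:
web = ComponentDefinition("HTTP server", arrivalRate=100.0, processingTime=0.05,
                          bufferSize=8, queueSize=50, numberOfReplicas=3,
                          redundancyModel="active")
```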
[0065] As illustrated in Figure 3, an execution environment (represented by execution environment object 320), which can be a VM or container, for example (represented by VM object 330 or container object 325) can fail due to different failure types (represented by failure type object 315), as can a server computer (represented by server object 335) and/or data center (represented by data center object 345). Each failure type can have a defined failure rate (or mean time to failure (MTTF)), a recommended recovery action, and/or a recovery duration (or mean time to recovery (MTTR)) based on the recommended recovery.
[0066] However, to address the challenges of HA-aware scheduling, a stochastic behavioral model is utilized in some embodiments in order to capture the stochastic nature of different failures in a system. Although the UML model 300 can describe service availability features, UML as a semi-formal model cannot simulate the behavior of the system or measure the availability of a service while different stochastic failures are happening.
[0067] In contrast, stochastic models such as stochastic Petri nets are suitable to model and simulate the behavior of such systems with stochastic behaviors. Thus, some embodiments include mapping (or translating/converting) an instance of the UML model 300 definition describing a given deployment of the application in the cloud to a corresponding stochastic model (e.g., a Stochastic Colored Petri Nets (SCPN) model), and thereafter analyzing this model (e.g., using a simulator tool) to quantify the expected availability of the application.
[0068] Accordingly, turning back to Figure 2, at block 220, the flow 200 includes generating a stochastic model based upon transforming the generated/received model. The stochastic model can be, for example, a Stochastic Colored Petri Net model (at block 225), though in other embodiments it can be a different type of stochastic model. The model can include (at block 230) "building blocks" (data structures or other representations of entities) comprising one or more of: a software component of the application, a load balancer, a virtual machine, a server computer, a data center, etc.
[0069] As one example, Figure 4 is a flow diagram 400 illustrating exemplary operations for transforming (block 220) a cloud model definition to yield a Stochastic Petri Net model according to some embodiments.
[0070] Before detailing these operations, a one-to-one mapping to achieve the transformation from the UML model definition of a cloud system to a corresponding SCPN model is explained. Thus, certain "building blocks" to be included in the SCPN model are described.
[0071] SCPN model building blocks
[0072] This section explains the SCPN model used to evaluate various HA application deployments in a cloud environment according to some embodiments. Various building blocks of SCPN are defined herein which, when combined together into a complete SCPN model, can be analyzed to assess the expected availability according to some embodiments. For example, five different building blocks are defined which can be used in the model transformation phase. In this described model, each of the software components is running on a virtual machine, and the VM is hosted on a server computing device. The server computing device in turn is hosted at a DC. Each VM, server computing device, and DC can have its own failure rate (MTTF) and recovery time (MTTR). The Figures provided herein to illustrate the sub-components (e.g., Figures 5A-5C, 6, 7, 8) show immediate transitions as black bars, while deterministic and exponential timed transitions are shown as thick white-filled bars. Note should be taken that this representation is slightly different from a "standard" DSPN presentation where immediate transitions are modeled with narrow bars, timed transitions are modeled with thick black-filled bars, and exponential transitions are modeled with thick white-filled bars.
[0073] Figure 5A illustrates an exemplary data center sub-model 500 according to some embodiments. A data center has two states: a healthy state 502 (the place DCi) and a failed state 504 (the place DCi_fail). Failure is modeled using an exponential timed transition 506 (Ti_DCfail), whereas the recovery is a deterministic one 508 (Ti_DCup). The transitions, types, and time functions of the data center sub-model 500 can be:
Transition | Type | Time function
Ti_DCfail | stochastic (exponential) | MTTF of DCi
Ti_DCup | deterministic | MTTR of DCi
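As a minimal illustration of the two-state behavior just described (and not of any particular simulator), the long-run availability implied by an exponentially distributed time-to-failure alternating with a deterministic repair can be estimated as follows; all names and values are hypothetical.

```python
import random

def simulate_dc_availability(mttf, mttr, horizon, seed=0):
    """Sketch of the data center sub-model 500: exponential failures (Ti_DCfail)
    alternating with a deterministic repair (Ti_DCup)."""
    rng = random.Random(seed)
    t, up_time = 0.0, 0.0
    while t < horizon:
        ttf = rng.expovariate(1.0 / mttf)   # time spent in the healthy place DCi
        up_time += min(ttf, horizon - t)
        t += ttf + mttr                     # deterministic time in DCi_fail
    return up_time / horizon

# With MTTF >> MTTR the estimate approaches MTTF / (MTTF + MTTR) ~= 0.992 here.
print(simulate_dc_availability(mttf=1000.0, mttr=8.0, horizon=1e7))
```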
[0074] Figure 5B illustrates an exemplary server sub-model 530 according to some embodiments. The server also has two states - healthy 532 (Si) and failed 534 (Si_fail). The server can fail, and the failure is an exponential transition 536 (Ti_sfail); it can also fail immediately 538 due to the failure of its hosting datacenter (Ti_sDCfail). The datacenter hosting Si is represented with S(i)DC.
[0075] In the following, the place name is used in the formulas to show the number of the tokens available in that place. The immediate transition Ti_sDCfail is guarded with:
G_Ti_sDCfail = (S(i)DC == 0)
[0076] The recovery occurs according to a deterministic transition 540 (Ti_sUP). A server cannot be recovered unless its host data center is healthy. Thus, Ti_sUP is guarded with: G_Ti_sUP = (S(i)DC == 1)
[0077] The following table provides the information about the timed transitions of server submodel 530:
Transition | Type | Time function
Ti_sfail | stochastic (exponential) | MTTF of Si
Ti_sUP | deterministic | MTTR of Si
[0078] Figure 5C illustrates an exemplary virtual machine sub-model 560 according to some embodiments. A VM can fail through an exponential transition 566 (Ti_fail) or can fail immediately 568 due to the failure of its hosting server or data center (Ti_Hfail). The server and DC hosting the VM are referred to as VM(i)Server and VM(i)DC, respectively, and Ti_Hfail is guarded with:
G_Ti_Hfail = (VM(i)DC == 0 ∨ VM(i)Server == 0)
[0079] The recovery happens after a deterministic delay 570 (Ti_up). Note that, in this case also a VM cannot be recovered unless its host data center and server are healthy. Thus, Ti_UP is guarded with:
G_Ti_UP = (VM(i)DC == 1 ∧ VM(i)Server == 1)
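To make the guard notation concrete, the two VM sub-model guards can be read as boolean predicates over the current marking; the following Python rendering is purely illustrative, assuming a marking that maps place names to token counts (1 = healthy, 0 = failed).

```python
# Hypothetical rendering of the VM sub-model guards as predicates over a marking.

def g_ti_hfail(marking, i):
    # Immediate transition Ti_Hfail: the VM fails at once if its hosting
    # server or hosting data center has failed.
    return marking[f"VM({i})DC"] == 0 or marking[f"VM({i})Server"] == 0

def g_ti_up(marking, i):
    # Deterministic recovery Ti_UP is enabled only while both hosts are healthy.
    return marking[f"VM({i})DC"] == 1 and marking[f"VM({i})Server"] == 1

marking = {"VM(1)DC": 1, "VM(1)Server": 0}
print(g_ti_hfail(marking, 1), g_ti_up(marking, 1))  # True False
```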
[0080] The following table provides the information of the timed transitions of the VM submodel 560.
Transition | Type | Time function
Ti_fail | stochastic (exponential) | MTTF of VMi
Ti_up | deterministic | MTTR of VMi
[0081] Figure 6 illustrates an exemplary load balancer sub-model 600 (e.g., a load generator and round robin load balancer sub-model to be used within an overall model) according to some embodiments. The place LoadDistributor 602 has a fixed number of tokens 604, and the load balancer transitions 606 (T_LBi and T_LBo) distribute the workload among the active replicas of the same component. Each component has a queue 608 (Ci_queue) to represent the number of requests it can queue for processing and a flushing place 610 (Ci_flushing). Note: the flushing place 610 can be a place holder for the load balancing mechanism to ensure a round robin distribution, and thus, it is not used to capture a specific component behavior.
[0082] The transitions T_LBi and Ti_flush 612 are guarded such that they model a round robin policy. When a component Ci receives a token in its queue, its flushing place 610 is marked, and the component will not receive another token until its flushing place 610 is unmarked. Let the round robin order be C1, C2, C3, ..., CM, where M is the number of replicas, and then the same order repeats. The transition T_LB1 606 is the first one that becomes enabled, and its clock starts elapsing. Once it is fired, one token is produced in C1_queue 608 and one token is produced in C1_flushing 610; while C1_flushing is marked, C1 cannot receive another token. On the other hand, T1_flush 612 cannot be fired until all other components have received their share. As soon as C1 receives a token, the transition T_LB2 606 becomes enabled, and its clock starts elapsing. Then T_LB2 606 fires, and C2_queue 608 and C2_flushing 610 each receive a token. In the same way, the other components receive their share until CM receives a token. At this time, T1_flush 612 is enabled, and C1_flushing 610 is unmarked. Subsequently, T2_flush, T3_flush, ..., TM_flush also fire. According to the nature of the workload arrival of the system, T_LBi can have a different distribution (e.g., deterministic, exponential, etc.).
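The round robin behavior of paragraph [0082] can be sketched, purely as an illustration of the policy (not of the Petri net itself), as follows; replica health here stands in for the VSDH condition defined further below, and all names are hypothetical.

```python
# Sketch of the round robin policy: each replica's "flushing" marker blocks it
# from receiving again until all healthy peers have had their turn.

def round_robin(requests, replicas, queue_limit):
    flushing = [False] * len(replicas)
    queues = [0] * len(replicas)
    denied = 0
    for _ in range(requests):
        eligible = [i for i, r in enumerate(replicas)
                    if r["healthy"] and not flushing[i] and queues[i] < queue_limit]
        if not eligible:
            denied += 1                  # T_LBo: request sent to DeniedService
            continue
        i = eligible[0]                  # lowest-index replica still owed a turn
        queues[i] += 1
        flushing[i] = True               # Ci_flushing marked
        if all(flushing[j] or not replicas[j]["healthy"]
               for j in range(len(replicas))):
            flushing = [False] * len(replicas)   # the Ti_flush transitions fire
    return queues, denied

replicas = [{"healthy": True}, {"healthy": False}, {"healthy": True}]
print(round_robin(9, replicas, queue_limit=10))  # ([5, 0, 4], 0): faulty C2 skipped
```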
[0083] The following table lists different timed transitions of the load balancer sub-model, their type and time function:
[Table (rendered as an image in the original): the timed transitions of the load balancer sub-model, their types, and their time functions.]
* Depending on the nature of the workload arrival, these transitions can have other time functions.
[0084] If a component is not available due to a full queue or a component failure, VM failure, server failure, or data center failure, it should give its turn to the next available component. Thus, M is set to be the number of replicas (numberOfReplicas) and L the maximum capacity of a component queue (queueSize), and the server and DC hosting VM(i) are referred to as VM(i)Server and VM(i)DC respectively. Thus, VSDH(i) and VSDF(i) are variables (e.g., "VSD" refers to [VM, Server, Data center], "H" refers to "healthy," and "F" refers to "faulty") used to capture the combined status of the VM, its hosting server, and its hosting DC, and can be defined as follows:
VSDH(i) = [VM(i) == 1 ∧ VM(i)Server == 1 ∧ VM(i)DC == 1]
VSDF(i) = [VM(i) == 0 ∨ VM(i)Server == 0 ∨ VM(i)DC == 0]
[0085] T_LBi is guarded with G_T_LBi:
∀ i ∈ 1:M: G_T_LBi = (Ci_flushing == 0 ∧ VSDH(i) ∧ Ci_queue < L) ∧ ⋀ k=1:i-1 (Ck_flushing == 1 ∨ VSDF(k))
[0086] And Ti_flush is guarded with G_Ti_flush:
∀ i ∈ 1:M: G_Ti_flush = ⋀ k=i+1:M (Ck_flushing == 1 ∨ VSDF(k))
[0087] If all of the components are failed or their queue is full, the requests are dropped and sent to the place DeniedService. Transition T_LBo is guarded with:
G_T_LBo = ⋀ i=1:M (VM(i) == 0 ∨ VM(i)Server == 0 ∨ VM(i)DC == 0 ∨ Ci_queue ≥ L)
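Read together, paragraphs [0084] and [0087] amount to the following predicates. This Python transcription is only a sketch, under the assumption of binary (0/1) token counts; the function names are illustrative.

```python
# Hypothetical transcription of the VSDH/VSDF status variables and the T_LBo guard.
# state[i] holds binary token counts for replica i's VM, hosting server, hosting DC,
# plus the replica's current queue length; M replicas, queue capacity L.

def vsdh(state, i):
    s = state[i]
    return s["vm"] == 1 and s["server"] == 1 and s["dc"] == 1

def vsdf(state, i):
    s = state[i]
    return s["vm"] == 0 or s["server"] == 0 or s["dc"] == 0

def g_t_lbo(state, M, L):
    # Requests are dropped to DeniedService only when every replica is either
    # faulty or has a full queue.
    return all(vsdf(state, i) or state[i]["queue"] >= L for i in range(M))

state = [{"vm": 1, "server": 1, "dc": 1, "queue": 5},
         {"vm": 0, "server": 1, "dc": 1, "queue": 0}]
print(g_t_lbo(state, M=2, L=5))  # True: replica 0's queue is full, replica 1 is faulty
```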
[0088] An alternative solution used in some embodiments to model the load distribution is to use loop-back arcs from T_LBi and T_LBo to the place LoadDistributor to continuously re-enable the load balancer transitions and regenerate the workload infinitely. Note should be taken that with this alternative approach of load distribution, the model can be over-flooded with tokens if the generation rate of the tokens (representing the arrival rate of requests) is faster than the consumption rate of the tokens (representing the processing rate of the requests). To avoid this issue, embodiments fix the number of tokens in the place LoadDistributor and do not consider the feedback input arcs. The transitions and their guards remain the same to model the round robin policy. Accordingly, either of these techniques can be selected based upon the one that best fits the needs of the simulator.
[0089] Figure 7 illustrates an exemplary component sub-model 700 according to some embodiments. This figure illustrates the model of a component, including partially the load balancer delivering the workload to the component. Each component has a queue (Ci_queue) 702 to model the maximum capacity of the requests waiting to be processed and also a buffer 704 to model the maximum number of requests a component can process in parallel (Ci_processing), e.g., with multi-threaded components. The requests stored in the queue 702 can enter the buffer 704 only if the component and its corresponding server and VM are healthy and the number of tokens already in the buffer 704 is below the maximum. When a component fails, all the requests in its buffer are lost and transferred to the place 706 Lost_in_phasei, where 'i' is the tier number. The transition 708 Ti_Lost_in_Processing is guarded with:
G_Ti_Lost_in_Processing = (VM(i) == 0 ∨ VM(i)Server == 0 ∨ VM(i)DC == 0)
[0090] In addition, in each tier if all of the replicas fail at the same time, all of the tokens stored in the component queue are transferred to the place 710 LostReq. The transition 712 Ti_Lost is guarded with:
G_Ti_Lost = ⋀ i=1:M (VM(i) == 0 ∨ VM(i)Server == 0 ∨ VM(i)DC == 0)
[0091] When a component fails, the requests already stored in its queue are transferred again to the load distributor to be failed over to the other healthy components. This behavior simulates a multi-active stateful redundancy, where each component is equally backed up by the other components. The transition 714 T_failover_Ci_to_LB is guarded with:
∀ i ∈ 1:M: G_T_failover_Ci_to_LB = (VM(i) == 0 ∨ VM(i)Server == 0 ∨ VM(i)DC == 0) ∧ ⋁ k=1:M, k≠i (VM(k) == 1 ∧ VM(k)Server == 1 ∧ VM(k)DC == 1)
[0092] The tokens successfully processed are stored in the place 716 Cmid. Note that in a multi-tier system, the tokens successfully processed in one tier are carried to the next tier, where they are load balanced among the replicas of the next tier. The tokens successfully processed in all of the tiers are stored in a final place, and only the tokens that reach this final place indicate the availability of the system. The following table presents the list of timed transitions and their information for the component sub-model 700:
[Table (rendered as an image in the original): the timed transitions of the component sub-model 700, their types, and their time functions.]
* Depending on the nature of the workload arrival, this transition can have other time functions.
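Similarly, the guards of paragraphs [0090] and [0091] can be transcribed as predicates; again, this is only an illustrative sketch assuming binary token counts, with hypothetical function names.

```python
# Hypothetical transcription of the component sub-model guards of [0090]-[0091];
# state[i] gives binary token counts (1 healthy / 0 failed) for replica i.

def healthy(state, i):   # plays the role of VSDH(i)
    s = state[i]
    return s["vm"] == 1 and s["server"] == 1 and s["dc"] == 1

def faulty(state, i):    # plays the role of VSDF(i) for binary counts
    return not healthy(state, i)

def g_ti_lost(state, M):
    # All replicas of the tier are down: queued tokens move to the place LostReq.
    return all(faulty(state, i) for i in range(M))

def g_t_failover(state, i, M):
    # Replica i is down but a healthy peer exists: its queued requests return to
    # the load distributor and are failed over (multi-active stateful redundancy).
    return faulty(state, i) and any(healthy(state, k) for k in range(M) if k != i)

state = [{"vm": 0, "server": 1, "dc": 1}, {"vm": 1, "server": 1, "dc": 1}]
print(g_t_failover(state, 0, 2), g_ti_lost(state, 2))  # True False
```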
[0093] Transformation from UML object diagram to SCPN
[0094] Having described the building blocks of an exemplary SCPN model, we return to Figure 4 and the generation of a stochastic model based upon transforming the generated/received model definition.
[0095] In some embodiments, this approach is based on transforming an instance of the UML model (e.g., an object model) into a solvable stochastic (e.g., SCPN) model. The overall transformation algorithm utilized in some embodiments is described in the flowchart in Figure 4. The flow 400 can include, after receiving or obtaining an instance of the model definition at block 405, building a dependency graph based on the component types' dependencies at block 410. At this stage, the block 410 can include identifying the number of tiers (e.g., 3 tiers of the application) and their order at block 415.
[0096] Next, the flow 400 includes creating the common places and transitions that are common in stochastic models (at block 420), such as the LoadDistributor and LostReq places used in SCPN models. This can conclude the initialization phase 470, and the tier-iteration phase 475 begins.
[0097] Thus, the flow iterates over each tier creating the load balancer, all the component replicas, their VMs, and their respective servers at blocks 425, 430, 435, and 440.
[0098] This creation can be based on utilizing the building blocks defined above. For instance, if the model definition includes five VMs, the VM building block can be replicated five times. However, the transitions and guards of each building block may be different, and it is in the annotation phase 480 that the DCs are created, the transitions are annotated with the proper rates, and the guards are annotated with the proper conditions (at block 445). It is the annotation phase 480 that glues the model together, reflecting the actual deployment and the failure cascading effects. The flow 400 then ends at block 450.
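Structurally, the flow 400 can be summarized by the following illustrative Python sketch; the SCPN container, the naming scheme, and the example tiers (anticipating the three-tier example described below) are all hypothetical rather than part of any described implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SCPN:
    """Hypothetical container for the places/transitions created by flow 400."""
    places: list = field(default_factory=list)
    transitions: list = field(default_factory=list)

    def add_place(self, name): self.places.append(name)
    def add_transition(self, name): self.transitions.append(name)

def transform_to_scpn(tiers, num_dcs):
    """tiers: ordered list of (tier_name, replica_count), as produced by the
    dependency graph of blocks 410-415."""
    scpn = SCPN()
    # Initialization phase 470 (block 420): common places.
    scpn.add_place("LoadDistributor")
    scpn.add_place("LostReq")
    # Tier-iteration phase 475 (blocks 425-440): per-tier building blocks.
    for tier, replicas in tiers:
        scpn.add_transition(f"T_LB_{tier}")          # load balancer sub-model
        for i in range(1, replicas + 1):
            scpn.add_place(f"{tier}_C{i}_queue")     # component sub-model
            scpn.add_place(f"{tier}_VM{i}")          # VM sub-model
            scpn.add_place(f"{tier}_S{i}")           # server sub-model
    # Annotation phase 480 (block 445): DCs created; rates/guards attached here.
    for d in range(1, num_dcs + 1):
        scpn.add_place(f"DC{d}")
    return scpn

model = transform_to_scpn([("Filter", 3), ("AnalysisEngine", 3), ("DB", 3)], num_dcs=3)
print(len(model.places), len(model.transitions))  # 32 3
```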
[0099] Turning back to Figure 2, the flow 200 continues with performing model analysis to yield results at block 235. This analysis can include block 240 and simulating different deployment scenarios for the application. In some embodiments, this includes block 245 and, for each deployment scenario, adjusting one or more variables/settings. These can include, for example, adjusting data center MTTR values (block 250), data center MTTF values (block 255), data center failure rate values (block 260), load processing time values (block 265), etc.
[00100] For explanatory purposes, an example of a cloud deployment modeled by SCPN is provided, and then the model is used to evaluate different deployments from a HA perspective. The exemplary system is stated to be a three-tier application, e.g., a "Big Data" analysis application. At the front end, Filters receive unstructured data and remove redundant/useless data. In the middle, Analysis Engines analyze the data and generate structured data form. At the back end, Databases store the structured data produced by the Analysis Engine.
[00101] In each tier, it is assumed that the software component is running on a virtual machine, and the VM is hosted on a server computing device. The server computing device, in turn, is hosted on a DC. Each tier is replicated three times with a multi-active redundancy model. The data centers are geographically distributed. In each tier, a load balancer distributes the workload among the replicas based on a round robin policy.
[00102] Using this example, it is assumed that the operator is particularly interested in comparing inter- and intra-DC scheduling, and thus the data center hosting the servers and VMs will be changed alternatively. Further, it is assumed that each VMi is hosted on a server Si. The data center hosting Si is not fixed and, depending on the deployment, the server can be hosted on any of the available DCs. We refer to the data center hosting VMi and Si using VM(i)DC.
[00103] Figures 8A, 8B, and 8C collectively illustrate an exemplary SCPN model 800A-800C of a three-tier application running in a cloud environment according to some embodiments. For example, the depicted three-tier application could be a "Big Data" analysis application as described above, and could include requests being received at a frontend (e.g., a first tier) and be processed through the other tiers (e.g., a second and third tier) where eventually the processing stops at the backend (e.g., data is persisted in a database). This configuration could be thought of as being "one-way." However, in some embodiments, a three-tier application could be configured in a "round-trip" manner, e.g., receive requests at a frontend that are processed and ultimately passed to the backend, where after processing, data is passed back through the tiers to the frontend to be sent back to the client (or to another destination). Accordingly, this "round- trip" nature (e.g., flowing through each of the tiers twice) of such embodiments can instead be modeled using six tiers (e.g., three tiers for the trip from the frontend to the backend, and another three tiers for the way back). Moreover, these disclosed techniques are applicable to any number of tiers of an application, and, applications exhibiting such "round-trip" flows can be modeled with double the number of tiers.
[00104] Turning back to Figures 8A, 8B, and 8C, analyzing the service availability can be done either by (1) quantifying the percentage of time a given service is in a healthy state, or (2) by analyzing the percentage of served requests in comparison to the total number of received requests.
[00105] For example, using the latter technique, the number of tokens in the initial LoadDistributor place can be fixed. Note that, when the model is created using the building blocks described above, some places may overlap. For example, the place 'Lost_in_phasei' is shared in each tier among the replicas, whereas the place 'LostReq' is unique per model. In each tier, served requests are stored in a place which serves as the load generator of the next tier (e.g., the Cmid and Cmid2 places in Figure 8). The tokens successfully processed in all of the three tiers are stored in the place ServedReq in the 3rd tier. The percentage of the requests that are successfully processed through the three tiers (ServedReq) indicates the service availability of the cloud application. If all of the components are failed or their queue is full, the requests are dropped and sent to the place DeniedService. When a component fails, the requests already stored in its queue are resent to the load distributor to be failed over to the other healthy components. Lost_in_phase1, Lost_in_phase2, and Lost_in_phase3 collect in each phase the lost requests from the component buffers. If all of the replicas of a tier fail at the same time, all of the tokens waiting in the component queues are transferred to the place LostReq.
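The served-request metric of paragraphs [00104]-[00105] reduces to a simple ratio; the following trivial computation, with hypothetical counts, is included only to make the metric concrete.

```python
def service_availability(served_req, total_generated):
    """Fraction of the fixed LoadDistributor token population that reached
    ServedReq; the remainder ended in DeniedService, LostReq, or a
    Lost_in_phase place."""
    return served_req / total_generated

# E.g., 9,420 of 10,000 generated requests reach ServedReq (hypothetical counts):
print(service_availability(served_req=9420, total_generated=10000))  # 0.942
```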
[00106] Evaluation and results
[00107] To investigate different DC scheduling, multiple deployment scenarios can be considered and experiments can be conducted with the SCPN model. With a focus upon the effect of DC failures on HA, the VMs and servers can fail due to DC failure through the immediate transitions Ti_sDCfail and Ti_Hfail. VM and server failure rates (used in Ti_fail and Ti_sfail) are fixed throughout the experiments. DCs can have similar or different failure rates. As a baseline, they all have the same MTTF (x; x; x). Then, the failure rate of the DCs is modified assuming that DC1 is failing more frequently, DC3 is the most reliable, and DC2 has a failure rate between the two others. Then a different MTTR is considered for each variation of the MTTF. However, recovery time is the same among the DCs. For example, the following table shows the different parameters/variables altered in these experiments:

Parameter | Values
MTTF (DC1; DC2; DC3) | x*; x; x | x; 1.5x; 2x | x; 2x; 3x
MTTR | x/3 | x/10 | x/30
Load Processing Time | a* | 5a | 10a
[00108] In this example, three deployments are evaluated: the first deployment maximizes the distribution among the DCs, such that in each tier at least one of the replicas is on DC1, one is on DC2, and one is on DC3 (named Dep.1-2-3).
[00109] In the second deployment, one replica of each tier is placed on DC2, and two other replicas of each tier on DC3 (named Dep. 2-3).
[00110] In the third deployment, all the replicas are hosted by the most reliable DC, which is DC3 (Dep.3 afterwards).
[00111] Thus, each of the three deployments is evaluated to determine which deployment should be chosen to maximize the overall availability of the application. For example, in a scenario where DC3 is the most reliable one, this simulation can determine whether it is better to choose the third deployment and put all of the replicas on the most reliable DC, or whether it may be better to maximize the distribution among the DCs, etc.
[00112] First, the case is evaluated where all of the DCs have the same MTTF (x; x; x), and the MTTR among DCs is varied as presented in the table above. The results of the simulation are presented in Figure 9A, which illustrates service availability of different deployments and different MTTRs where the data centers have similar MTTF values according to some embodiments. These results 900 indicate that when the DCs have the same failure rates, it may be optimal to utilize a maximum distribution (i.e., between data centers) as it reduces the probability of the service outage due to multi-DC failures.
[00113] In a second evaluation, the failure rates of DC1, DC2, and DC3 are changed to x, 1.5x, and 2x, respectively, and the recovery time is changed as listed in the table above. The results 950 are presented in Figure 9B, which illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values according to some embodiments.
[00114] Finally, the case where the DCs have different MTTFs of x, 2x, and 3x, respectively, is evaluated. Again, the MTTR is varied according to the table above. The results 1000 are presented in Figure 10A, which illustrates service availability of different deployments and different MTTRs where the data centers have different MTTF values than those of Figure 9B according to some embodiments.
[00115] Based upon the results of Fig. 9B and Fig. 10A, it can be determined that when the reliability of the DCs differs, the most reliable DCs should be utilized instead of a maximum distribution, and that a single-DC deployment is still not the optimal choice.

[00116] For the last set of experiments, the impact of changing the load processing time is investigated. It is assumed that DC1, DC2, and DC3 have different MTTFs of (x; 2x; 3x), respectively. Three load processing times of 'A', '5A', and '10A' are used for the experiments, where A represents the request arrival rate. The results 1050 are presented in Figure 10B, which illustrates served request amounts of different deployments when the served requests have different processing times according to some embodiments. As the processing time impacts the length of the processing queue, an increased processing time should reduce the system availability due to requests failing in processing. On the other hand, by decreasing the processing time, the impact of failures is reduced, and therefore the difference in HA between the intra- and inter-DC deployments is reduced.
[00117] Turning back to Figure 2, after performing the model analysis at block 235, the flow can continue in a variety of ways. For example, at block 270 the flow can optionally include providing the results of the analysis to a user, and the results may include, for the multiple deployment scenarios, the corresponding service availabilities. The providing of the results could occur using electronic messaging, and could include providing the results as part of a web page. The results can include text (e.g., alphanumeric characters), graphics (e.g., charts), etc., and can include user input elements allowing the user to provide a user input to select one of the deployment scenarios. Thus, the flow can include block 275, where a user input is received from the user that indicates a request to allocate resources for the application according to an identified/selected one of the multiple deployment scenarios. As a result, at block 285, the flow can include causing resources to be allocated for the application according to the selected/identified one of the multiple deployment scenarios.
[00118] In some embodiments, the flow 200 includes automatically (i.e., according to programmed and executed logic) identifying/selecting a "chosen" or first deployment scenario from the multiple deployment scenarios. This selection of a first deployment scenario can be based upon using a selection scheme and the results of the analysis (from block 235). The selection scheme can be based upon the expected system availabilities, and can include selecting the deployment scenario having the highest determined system availability value, for example. Of course, other selection schemes can be developed by those of ordinary skill in the art to flexibly select particular deployment scenarios according to the desires of the involved user(s). Further detail regarding an exemplary selection scheme is provided later herein with regard to Figure 13. However, after an identification of a "chosen" or "optimal" scheme, at block 285 the flow can include causing resources to be allocated for the application according to the selected/identified one of the multiple deployment scenarios.

[00119] The techniques and operations disclosed herein can be implemented, in whole or in part, using one or more electronic devices. An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals). Thus, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device. Some electronic devices also include a set of one or more physical network interface(s) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
[00120] Figure 11 is a block diagram illustrating an exemplary data processing system 1100 that can be used in some embodiments. Data processing system 1100 includes one or more microprocessors 1105 (or processing circuits) and connected system components (e.g., multiple connected chips). Alternatively, the data processing system 1100 can be a system on a chip. One or more such data processing systems 1100 may be utilized to implement the functionality of a server computing device(s) executing a High-Availability Cloud Application Placement Module as described herein. For example, the data processing system 1100 can be used to perform the method 200 of Figure 2.
[00121] The illustrated data processing system 1100 includes memory 1110, which is coupled to one or more microprocessor(s) 1105. The memory 1110 can be used for storing data, metadata, and/or programs for execution by the one or more microprocessor(s) 1105. For example, the depicted memory 1110 may store high-availability cloud application placement code 1130 that, when executed by the microprocessor(s) 1105, causes the data processing system 1100 to perform high-availability multi-component cloud application placement using stochastic availability models and to perform other operations as described herein, such as the operations illustrated in Figure 2. The memory 1110 may include one or more of volatile and non-volatile memories, such as Random Access Memory ("RAM"), Read Only Memory ("ROM"), a solid state disk ("SSD"), Flash, Phase Change Memory ("PCM"), or other types of data storage. The memory 1110 may be internal or distributed memory.
[00122] The data processing system 1100 also includes an audio input/output (I/O) subsystem 1125 which may include a microphone and/or a speaker for, for example, playing back music or other audio, receiving voice instructions to be executed by the microprocessor(s) 1105, playing audio notifications, etc. A display controller and display device 1120 provides a visual user interface for the user, e.g., graphical user interface (GUI) elements or windows.
[00123] The data processing system 1100 also includes one or more input or output ("I/O") devices and interfaces 1115, which are provided to allow a user to provide input to, receive output from, and otherwise transfer data to and from the system 1100. These I/O devices 1115 may include a mouse, keypad, keyboard, a touch panel or a multi-touch input panel, camera, optical scanner, network interface, modem, other known I/O devices, or a combination of such I/O devices. The touch input panel can be a single touch input panel that is activated with a stylus or a finger, or a multi-touch input panel that is activated by one finger or a stylus or multiple fingers. The touch input panel can be capable of distinguishing between one or two or three or more touches, and can be capable of providing inputs derived from those differentiated touches to other components of the processing system 1100.
[00124] The I/O devices and interfaces 1115 can also include a connector for a dock or a connector for a USB interface, FireWire, Thunderbolt, Ethernet, etc., to connect the system 1100 with another device, external component, or network. Exemplary I/O devices and interfaces 1115 can also include wireless transceivers, such as an IEEE 802.11 transceiver, an infrared transceiver, a Bluetooth transceiver, a wireless cellular telephony transceiver (e.g., 2G, 3G, 4G), or another wireless protocol to connect the data processing system 1100 with another device, external component, or network, and receive stored instructions, data, tokens, etc. It will be appreciated that one or more buses may be used to interconnect the various components shown in Figure 11. It will also be appreciated that additional components, not shown, may also be part of the system 1100, and, in certain embodiments, fewer components than those shown in Figure 11 may also be used in a data processing system 1100.
[00125] Figure 12 illustrates a non-limiting example functional block diagram of a server computing device in accordance with some embodiments. It is not strictly necessary that each module be implemented as a physically separate unit. Some or all modules may be combined in a physical unit. Also, the modules need not be implemented strictly in hardware. In some embodiments, the units may be implemented through a combination of hardware and software. For example, the server computing device 1200 may include one or more central processing units executing program instructions stored in a non-transitory storage medium or in firmware to perform the functions of the modules.
[00126] The server device 1200 can include a model transformation module 1210 and a model evaluation module 1215. In various embodiments, the server device 1200 can also optionally include one or more of: a model generation/reception module 1205, a model providing module 1220, a user input reception module 1225, a deployment selection module 1230, and/or a scheduler module 1235.
[00127] The model generation/reception module 1205 can be adapted for receiving or generating a model definition defining a multi-component cloud application, wherein the model definition defines a plurality of components of the multi-component cloud application.
[00128] The model transformation module 1210 can be adapted for generating, based upon the model definition, a Stochastic Colored Petri Net (SCPN) model. The SCPN model includes representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices.
[00129] The model evaluation module 1215 can be adapted for evaluating, using the SCPN model, a plurality of deployment possibilities of the multi-component application to yield a plurality of service availabilities of the multi-component application corresponding to the plurality of deployment possibilities.
[00130] The deployment selection module 1230 can be adapted for selecting, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities. This selecting of the first deployment possibility can occur according to a selection scheme.
[00131] In some embodiments, the model providing module 1220 can be adapted for providing, to a client, information describing the plurality of service availabilities.
[00132] The user input reception module 1225 can be adapted for receiving, from the client, a selection of the first deployment possibility as the preferred deployment.
[00133] The scheduler module 1235 can be adapted for causing resources to be allocated for the application according to the selected/identified one of the multiple deployment scenarios.
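As a rough sketch only, the modules of Figure 12 could be stubbed out as below; the class and method names are invented for illustration and do not appear in the original.

```python
class HighAvailabilityPlacementServer:
    """Hypothetical skeleton mirroring the functional modules of Figure 12."""

    def receive_or_generate_model(self, definition): ...    # model generation/reception module 1205
    def transform_to_scpn(self, model_definition): ...      # model transformation module 1210
    def evaluate_deployments(self, scpn, deployments): ...  # model evaluation module 1215
    def provide_availabilities(self, client, results): ...  # model providing module 1220
    def receive_selection(self, client): ...                # user input reception module 1225
    def select_deployment(self, availabilities): ...        # deployment selection module 1230
    def allocate_resources(self, deployment): ...           # scheduler module 1235
```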
[00134] Cloud Placement Selection
[00135] Figure 13 is a flow diagram illustrating exemplary operations for selecting a first placement that may be an optimal placement scenario for a multi-component cloud application from multiple placement scenarios according to some embodiments. As described above, a result of performing model analysis (see, e.g., block 235) can potentially lead to multiple placement scenarios being identified that could be chosen. For example, in some embodiments an application may have an associated high-availability requirement (e.g., as part of an SLA) and multiple deployment scenarios can be identified as being suitable (i.e., satisfying the HA requirement). In some embodiments, one of these multiple deployment scenarios can be automatically selected to be utilized, which may occur according to a selection scheme 1300, such as the one detailed here.
[00136] At block 1305, the cloud datacenter information 1305 is accessed, which can include the results of the model analysis (see, e.g., block 235 of Figure 2). In some cases, multiple deployment scenarios may be determined. For the purposes of this example, a scenario is considered in which the data underlying the bar graph of Figure 10A is the cloud datacenter information 1305, and a high-availability requirement for the cloud application requires "90%" HA.
[00137] At block 1310, the scheme 1300 includes identifying a set of candidate placement solutions (e.g., those satisfying the HA requirements, other requirements/preferences, etc.). Turning to Figure 10A, assuming the 90% HA requirement, it can be seen that "Deployment 1-2-3" and "Deployment 2-3" satisfy the HA requirement, assuming an MTTR of x/10 or x/30, because their determined service availability scores (92.07 and 92.21 for Dep. 1-2-3; and 94.64 and 93.91 for Dep. 2-3) are greater than 90. Thus, the set of candidate placement solutions includes "Dep. 1-2-3" (i.e., using DCs 1, 2, 3) and "Dep. 2-3" (i.e., using DCs 2 and 3).
[00138] At block 1315, the scheme 1300 includes determining a distance score for each candidate placement solution. This block can include, at block 1320, determining a distance score for each data center utilized in the candidate placement solutions. For example, data centers 1, 2, and 3 are included in the candidate placement solutions.
[00139] For each of these data centers (i.e., 1, 2, and 3), the scheme 1300 can include performing the following:
[00140] (A) At block 1325, calculating an average utilization of all the "other" data centers. The utilization can be a system utilization of a data center, indicating an amount of "usage" of that data center in any number of ways known to those of skill in the art, such as a percentage of "allowed" components or VMs allowed to be executed at the data center. Of course, many other possibilities exist for determining a utilization, and thus this is to be viewed as illustrative and non-limiting. For this example, it is assumed that DC1 has a utilization of 42%, DC2 has a utilization of 41%, and DC3 has a utilization of 40%. Thus, to calculate the average utilization of all the "other" data centers with respect to DC1 (i.e., the average utilization of DC2 and DC3), the following is performed: average(DC2, DC3) = (41 + 40) / 2 = 40.5.

[00141] (B) At block 1330, determining an allowed load of the DC (e.g., DC1) by multiplying the average utilization of all other DCs (from block 1325) by an "overload factor" of the DC (i.e., DC1). In some embodiments, an overload factor can be a value configured by an administrator to represent an amount that each particular DC can be used in excess of a baseline amount, for example. For example, it is assumed that DC1 may be located in a geographic area having lower electricity costs or labor costs, etc., and thus results in a lower cost of operation, and it may have a configured overload factor of 1.2. This value, 1.2, indicates that DC1 can be utilized 20% more (1.2 - 1 = 0.2) than some "baseline" DC in the system (e.g., an average utilization, etc.). It is also assumed that DC2 has an overload factor of 1.1, due to it having more modern equipment and thus a lower carbon footprint, for example. Further, it can be assumed that DC3 has an overload factor of 1.0, meaning that it is the baseline and/or has 0% added preference. Accordingly, with the "average utilization of all other DCs" value of 40.5, and the "overload factor" of DC1 being 1.2, the allowed load is given by multiplying the two values, 40.5 and 1.2, to yield the value 48.6.
[00142] (C) At block 1335, subtracting the current load of the DC from the determined allowed load of the DC (from block 1330). For example, the current load of DC1 can be assumed to be 42. Thus, the "distance score" for DC1 is the difference between the determined allowed load (48.6) and the current load (42): i.e., 48.6 - 42 = 6.6.
[00143] This process can continue for each of the other two DCs - it is assumed that the distance score for DC2 is 4.1, and the distance score for DC3 is 1.5.
[00144] Once all distance scores for each data center are determined (block 1320), then the distance score for each candidate placement solution (block 1315) can be determined by, as illustrated at block 1337, averaging the distance scores of each solution-utilized DC (i.e., computing a mean of the distance scores for all of the DCs involved in providing one or more resources for a particular candidate placement solution) for each of the candidate placement solutions. For example, for "Dep. 1-2-3", block 1337 can include calculating the average of 6.6, 4.1, and 1.5 - which is 4.06. Additionally, for "Dep. 2-3", block 1337 can include calculating the average of 4.1 and 1.5 - which is 2.8.
[00145] Next, the selection scheme 1300 includes block 1340 for selecting the placement solution having the maximum distance score. Continuing the above example, "Dep. 1-2-3" has a distance score of 4.06, and "Dep. 2-3" has a distance score of 2.8. Accordingly, "Dep. 1-2-3" can be selected as the optimal placement solution, and in some embodiments, the system can provide (at block 270 of Figure 2) these results to a user, and/or continue to block 285 of Figure 2 and cause resources to be allocated for the application according to the "Dep. 1-2-3" deployment scenario.
[00146] For added detail, the operations of block 1315 may be represented as the following, where NumDC is the total number of available DCs, Dep is the set of DCs used for each deployment, DepN is the number of elements in set Dep, DCi.relative_aveUtil is the average utilization of other DCs except DCi, DCi.OL is the permitted overload for DCi compared to its peers' current load, DCi.allowed_load is the allowed load for DCi (taking into account the current loads of its peers), and DCi.current_load is the actual workload of DCi:
[00147] The average utilization of the other DCs, except for DCi, is given by:

$$\forall i \in 1{:}NumDC:\quad DC_i.relative\_aveUtil = \Big(\sum_{j \in 1:NumDC,\; j \neq i} DC_j.current\_load\Big) \Big/ (NumDC - 1)$$
[00148] The overload factor for each DC can be defined as OL. Then, the maximum allowed workload is obtained from:
$$\forall i \in 1{:}NumDC:\quad DC_i.allowed\_load = DC_i.relative\_aveUtil \times DC_i.OL$$
[00149] The distance for each DC is obtained from:
$$\forall i \in 1{:}NumDC:\quad DC_i.dist = DC_i.allowed\_load - DC_i.current\_load$$
[00150] Then, for every eligible deployment, the distance score, referred to as deployment.dist, can be calculated as explained below. Suppose that Dep is the set of DCs used in a deployment and that DepN stands for the number of elements in the set Dep. Then, for each eligible deployment solution, the distance score is given by:
$$Deployment.dist = \Big(\sum_{i \in Dep} DC_i.dist\Big) \Big/ DepN$$
[00151] The eligible deployment having the maximum deployment distance or distance score will be chosen (e.g., at block 1340).
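As an illustration of these formulas and of the worked example above, the following Python sketch is one possible implementation of blocks 1310-1340; the function and variable names are invented here, and the HA filtering step is assumed to have already produced the two candidate placements.

```python
def dc_distance_scores(current_load, overload_factor):
    """DCi.dist = DCi.relative_aveUtil * DCi.OL - DCi.current_load, per the formulas above."""
    n = len(current_load)
    total = sum(current_load.values())
    scores = {}
    for dc, load in current_load.items():
        relative_ave_util = (total - load) / (n - 1)  # average utilization of the other DCs
        allowed_load = relative_ave_util * overload_factor[dc]
        scores[dc] = allowed_load - load
    return scores

# Values from the worked example above (utilizations 42/41/40, overload factors 1.2/1.1/1.0).
current_load = {"DC1": 42, "DC2": 41, "DC3": 40}
overload_factor = {"DC1": 1.2, "DC2": 1.1, "DC3": 1.0}
dc_dist = dc_distance_scores(current_load, overload_factor)  # DC1: 6.6, DC2: 4.1, DC3: 1.5

# Candidate placements that met the 90% HA requirement in the example.
candidates = {"Dep.1-2-3": ["DC1", "DC2", "DC3"], "Dep.2-3": ["DC2", "DC3"]}
deployment_dist = {
    name: sum(dc_dist[dc] for dc in dcs) / len(dcs)  # Deployment.dist
    for name, dcs in candidates.items()
}
best = max(deployment_dist, key=deployment_dist.get)  # -> "Dep.1-2-3"
print(best, deployment_dist)
```

Running this reproduces the per-DC distance scores (6.6, 4.1, 1.5) and the deployment scores (about 4.06 and 2.8) quoted in the example, and selects "Dep. 1-2-3" as at block 1340.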
[00152] It should be appreciated by a person skilled in the art that other performance metrics can be used in the above example. For instance, metrics other than the distance score, average utilization, and overload factors could be used.
[00153] Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of transactions on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of transactions leading to a desired result. The transactions are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[00154] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" and the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[00155] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method transactions. The required structure for a variety of these systems will appear from the description above. In addition, embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the invention as described herein.
[00156] In the foregoing specification, embodiments of the invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader scope of the invention as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
[00157] Throughout the description, embodiments of the present invention have been presented through flow diagrams. It will be appreciated that the order of transactions and transactions described in these flow diagrams are only intended for illustrative purposes and not intended as a limitation of the present invention. For example, although the flow diagrams illustrated in the figures show a particular order of operations performed by certain
embodiments, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.). Accordingly, one having ordinary skill in the art would recognize that variations can be made to the flow diagrams without departing from the broader scope of the invention as set forth in the following claims. Various modifications and equivalents are within the scope of the following claims.

Claims

What is claimed is:
1. A method in a computing device for performing High-Availability (HA) aware scheduling of a multi-component cloud application, the method comprising:
generating, based upon a model definition defining a plurality of components of the multi-component cloud application, a stochastic model including representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices;
evaluating, by the computing device using the stochastic model, a plurality of
deployment possibilities of the multi-component cloud application to yield a plurality of service availabilities of the multi-component cloud application corresponding to the plurality of deployment possibilities; and
selecting, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities.
2. The method of claim 1, wherein generating the stochastic model includes:
generating a dependency graph based upon the definitions of the plurality of components from the model definition;
identifying a number of tiers of the multi-component cloud application and an ordering of the tiers; and
creating, for each of the tiers, a representation of:
a load balancer sub-model for the tier,
a component sub-model for each of the plurality of components belonging to the tier,
a virtual machine sub-model for each of the one or more virtual machines
belonging to the tier, and
a server sub-model for each of the one or more server computing devices
belonging to the tier.
3. The method of claim 2, wherein generating the stochastic model further includes creating one or more data center models.
4. The method of any one of claims 1-3, wherein the evaluating comprises utilizing, by the computing device, a simulator tool to quantify the plurality of service availabilities of the multi-component cloud application.
5. The method of claim 4, wherein the simulator tool quantifies the plurality of service availabilities by, for each of the plurality of deployment possibilities:
simulating a processing of a plurality of requests by the stochastic model under the
deployment possibility, wherein the simulating includes stochastically introducing one or more failures into the stochastic model; and
determining a percentage of the plurality of requests that were successfully processed through the stochastic model.
6. The method of any one of claims 1-5, wherein:
at least a first of the plurality of deployment possibilities that is evaluated includes
placing all of the plurality of components within a same data center; and at least a second of the plurality of deployment possibilities that is evaluated includes placing at least a first of the plurality of components in a first data center and at least a second of the plurality of components in a second data center.
7. The method of any one of claims 1-6, wherein the model definition includes, for at least one of the components:
an arrival rate attribute value indicating an incoming number of requests for the
component;
a processing time attribute value indicating a time duration for the component to process each of the requests;
a buffer size attribute value indicating a number of the requests that the component can process in parallel;
a queue size attribute value indicating a maximum capacity of the requests that can wait in a queue to be processed; and
a replica number attribute value indicating a number of redundant replicas considered for the component.
8. The method of claim 7, wherein the model definition further includes, for the at least one of the components: a redundancy model attribute value indicating a redundancy type that the component is capable of accepting, wherein the redundancy type is at least one of active, standby, and spare.
9. The method of any one of claims 1-8, further comprising:
causing the multi-component cloud application to be scheduled according to the selected first deployment possibility.
10. The method of any one of claims 1-9, further comprising:
providing, to a client, information describing the plurality of service availabilities; and receiving, from the client, a selection of the first deployment possibility to be used for deploying the multi-component cloud application.
11. The method of any one of claims 1-10, wherein the stochastic model comprises a Stochastic Colored Petri Nets model.
12. The method of any one of claims 1-11, wherein selecting the first deployment possibility from the plurality of deployment possibilities comprises:
identifying a plurality of candidate placement solutions from the plurality of deployment possibilities;
determining a distance score for each of the plurality of candidate placement solutions; and
selecting a first of the plurality of candidate placement solutions which has a highest determined distance score among the determined distance scores.
13. The method of claim 12, wherein determining the distance score for each of the plurality of candidate placement solutions includes:
determining a data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions.
14. The method of claim 13, wherein determining the data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions includes: calculating an average utilization of all of the others of the plurality of data centers utilized in the plurality of candidate placement solutions;
determining an allowed load value for the data center based upon the average utilization and an overload factor of the data center; and determining the data center distance score based upon the allowed load value and a current load of the data center.
15. The method of claim 14, wherein determining the distance score for each of the plurality of candidate placement solutions further includes:
averaging, for each of the plurality of candidate placement solutions, the data center distance scores of those data centers utilized by the candidate placement solution.
16. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a computing device, cause the computing device to perform high- availability multi-component cloud application placement using stochastic models by performing the method of any one of claims 1-15.
17. A computing device, comprising:
one or more processors; and
the non-transitory computer-readable storage medium of claim 16.
18. A computing device configured to perform High-Availability (HA) aware scheduling of multi-component cloud applications, the computing device comprising:
a model transformation module for generating, based upon a model definition defining a plurality of components of a multi-component cloud application, a stochastic model including representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices; a model evaluation module for evaluating, using the stochastic model, a plurality of deployment possibilities of the multi-component application to yield a plurality of service availabilities of the multi-component cloud application corresponding to the plurality of deployment possibilities; and
a deployment selection module for selecting, based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities.
19. A computing device configured to perform High-Availability (HA) aware scheduling of multi-component cloud applications, the computing device comprising:
one or more interfaces; and one or more processors, operationally connected to the one or more interfaces and to a memory that contains instructions which, when executed, cause the one or more processors to:
generate, based upon a model definition defining a plurality of components of a multi-component cloud application, a stochastic model including representations of the plurality of components, one or more virtual machines that execute the plurality of components, one or more server computing devices that execute the one or more virtual machines, and one or more data centers hosting the one or more server computing devices; evaluate, by the computing device using the stochastic model, a plurality of deployment possibilities of the multi-component cloud application to yield a plurality of service availabilities of the multi-component cloud application corresponding to the plurality of deployment possibilities; and select, by the computing device based upon at least some of the plurality of service availabilities, a first deployment possibility from the plurality of deployment possibilities.
20. The computing device of claim 19, wherein the one or more processors are configured to select the first deployment possibility from the plurality of deployment possibilities by:
identifying a plurality of candidate placement solutions from the plurality of deployment possibilities;
determining a distance score for each of the plurality of candidate placement solutions; and
selecting a first of the plurality of candidate placement solutions which has a highest determined distance score among the determined distance scores.
21. The computing device of claim 20, wherein the one or more processors are configured to determine the distance score for each of the plurality of candidate placement solutions by: determining a data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions.
22. The computing device of claim 21, wherein the one or more processors are configured to determine the data center distance score for each of the plurality of data centers utilized in the plurality of candidate placement solutions by:
calculating an average utilization of all of the others of the plurality of data centers utilized in the plurality of candidate placement solutions; determining an allowed load value for the data center based upon the average utilization and an overload factor of the data center; and
determining the data center distance score based upon the allowed load value and a
current load of the data center.
23. The computing device of claim 21, wherein the one or more processors are configured to determine the distance score for each of the plurality of candidate placement solutions by:
averaging, for each of the plurality of candidate placement solutions, the data center distance scores of those data centers utilized by the candidate placement solution.
PCT/IB2015/059043 2015-09-14 2015-11-23 High-availability multi-component cloud application placement using stochastic availability models WO2017046635A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562218510P 2015-09-14 2015-09-14
US62/218,510 2015-09-14

Publications (1)

Publication Number Publication Date
WO2017046635A1 true WO2017046635A1 (en) 2017-03-23

Family

ID=54754710

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2015/059043 WO2017046635A1 (en) 2015-09-14 2015-11-23 High-availability multi-component cloud application placement using stochastic availability models

Country Status (1)

Country Link
WO (1) WO2017046635A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10778785B2 (en) * 2017-11-28 2020-09-15 International Business Machines Corporation Cognitive method for detecting service availability in a cloud environment
CN112905442A (en) * 2019-12-04 2021-06-04 阿里巴巴集团控股有限公司 Generation method, device and equipment of random model
EP3662618A4 (en) * 2017-08-02 2021-08-25 Datacoral, Inc. Systems and methods for generating, deploying, and managing data infrastructure stacks

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060004548A1 (en) * 2004-06-30 2006-01-05 Santos Jose R Method and apparatus for enhanced design of multi-tier systems
US20080306798A1 (en) * 2007-06-05 2008-12-11 Juergen Anke Deployment planning of components in heterogeneous environments
US20120005342A1 (en) * 2010-07-01 2012-01-05 International Business Machines Corporation Cloud Service Cost-Optimal Data Center Assignment
US20140053158A1 (en) * 2012-08-15 2014-02-20 Telefonaktiebolaget L M Ericsson (Publ) Comparing redundancy models for determination of an availability management framework (amf) configuration and runtime assignment of a high availability system
WO2014145777A1 (en) * 2013-03-15 2014-09-18 Servicemesh, Inc. Systems and methods for providing ranked deployment options


Similar Documents

Publication Publication Date Title
JP7138126B2 (en) Timeliness resource migration to optimize resource placement
US10797965B2 (en) Dynamically selecting or creating a policy to throttle a portion of telemetry data
US11379341B2 (en) Machine learning system for workload failover in a converged infrastructure
Saikia et al. Fault tolerance techniques and algorithms in cloud computing
Zheng et al. A distributed replication strategy evaluation and selection framework for fault tolerant web services
Mohammed et al. Failover strategy for fault tolerance in cloud computing environment
US20140280441A1 (en) Data integration on retargetable engines in a networked environment
US10643150B2 (en) Parameter version vectors used for deterministic replay of distributed execution of workload computations
Caton et al. Towards autonomic management for cloud services based upon volunteered resources
Jammal et al. CHASE: Component high availability-aware scheduler in cloud computing environment
Jammal et al. Availability analysis of cloud deployed applications
Jammal et al. Evaluating high availability-aware deployments using stochastic petri net model and cloud scoring selection tool
US10122602B1 (en) Distributed system infrastructure testing
Meroufel et al. Optimization of checkpointing/recovery strategy in cloud computing with adaptive storage management
WO2017046635A1 (en) High-availability multi-component cloud application placement using stochastic availability models
Gesvindr et al. Performance challenges, current bad practices, and hints in paas cloud application design
Sato et al. Experiment and availability analytical model of cloud computing system based on backup resource sharing and probabilistic protection guarantee
Jammal et al. ACE: Availability-aware CloudSim extension
Jammal et al. A formal model for the availability analysis of cloud deployed multi-tiered applications
Sun et al. Modelling and evaluating a high serviceability fault tolerance strategy in cloud computing environments
Lee et al. A resource manager for optimal resource selection and fault tolerance service in grids
Amoon Design of a fault-tolerant scheduling system for grid computing
Handaoui et al. Salamander: a holistic scheduling of mapreduce jobs on ephemeral cloud resources
US20160006635A1 (en) Monitoring method and monitoring system
Jing et al. Reliability-aware DAG scheduling with primary-backup in cloud computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15802208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15802208

Country of ref document: EP

Kind code of ref document: A1