WO2016020731A1

WO2016020731A1 - Component high availability scheduler

Info

Publication number: WO2016020731A1
Application number: PCT/IB2014/066021
Authority: WO
Inventors: Ali Kanso; Manar JAMMAL; Abdallah SHAMI
Original assignee: Telefonaktiebolaget L M Ericsson (Publ)
Priority date: 2014-08-05
Filing date: 2014-11-13
Publication date: 2016-02-11

Abstract

Cloud computing is continuously growing as a business model for hosting information and communications technology applications. While the on-demand resource consumption and faster deployment time make this model appealing for the enterprise, other concerns arise regarding the quality of service offered by the cloud. The placement strategy of the virtual machines hosting the applications has a tremendous effect on the High Availability of the services provided by these applications hosted in the cloud. Systems and methods for virtual machine scheduling that takes into consideration the interdependencies between the components of the applications and other constraints such as the communication delay tolerance and resource utilization are provided.

Description

COMPONENT HTGH AVATLABTLTTY SCHEDULER

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to previously filed US Provisional Patent Application Number 62/033,469 entitled "COMPONENT HIGH AVAILABILITY SCHEDULER" and filed on August 5, 2014, the contents of which are incorporated herein by reference.

TECHNICAL FIELD

This disclosure relates generally to systems and methods for placing virtual machines on servers in a cloud computing environment. BACKGROUND

Recently, the cloud has become the lifeblood of many telecommunication network services and information technology (IT) software applications. With the development of the cloud market, cloud computing can be seen as an opportunity for information and communications technology (ICT) companies to deliver communication and IT services over any fixed or mobile network, high performance and secure end-to-end quality of service (QoS) for end users. Although cloud computing provides benefits to different players in its ecosystem and makes services available anytime, anywhere and in any context, other concerns arise regarding the performance and the quality of services offered by the cloud.

One area of concern is the High Availability (HA) of the applications hosted in the cloud. Since these applications are hosted by virtual machines (VMs) residing on physical servers, their availability depends on that of the hosting servers. When a hosting server fails, its VMs, as well as their applications become inoperative. The absence of applications protection planning can have a tremendous effect on the business continuity and IT enterprises. According to Aberdeen Group, "Why Mid-Sized Enterprises Should Consider Using Disaster Recovery-as-a-Service," http://www.aberdeen.com/Aberdeen-Library/7873/AI-disaster-recovery-downtime.aspx, April 2012, the cost of one hour of downtime is $74,000 for small organizations and $1.1 million for larger ones. This is excluding the reputation damage that can be significantly greater in the longer term. Another Ponemon Institute study, "2013 Study on Data Center Outages"http://www.emersonnetworkpower.com/documentation/en- us/brands/liebert/documents/white%20papers/2013_emerson_data_center_outages_sl- 24679.pdf, Sep. 2013, shows that in the years 2011 and 2012, 91% of data centers endured unplanned outages.

A solution to these failures is to develop a highly available system that protects services, avoids downtime and maintains the business continuity. Since failure are bound to occur, the software applications must be deployed in a highly available manner, according to redundancy models, which can ensure that when one component of the application fails, another standby replica is capable of resuming the functionality of the faulty one. The HA of the applications would then be a factor of the redundancy model of the application, its recovery time, failure rate, and the reliability of its hosting server, corresponding rack and data center (DC) as well.

Therefore, it would be desirable to provide a system and method that obviate or mitigate the above described problems.

SUMMARY

It is an object of the present invention to obviate or mitigate at least one disadvantage of the prior art.

In a first aspect of the present invention, there is provided a method for determining placement of an application comprising a plurality of components onto one or more host servers. A criticality value is calculated for each component in the plurality indicating the relative impact of a failure of the component on the application. A component having the highest criticality value is selected for placement. A list of candidate host servers is modified to remove servers that do not satisfy a functionality requirement associated with the selected component. A server is identified in the modified list of candidate host servers that maximizes the availability of the application. The selected component is instantiated on the identified server.

In another aspect of the present invention, there is provided a cloud manager comprising a processor and a memory. The memory contains instructions executable by the processor whereby the cloud manager is operative to calculate a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application. The cloud manager is operative to select a component having the highest criticality value for placement. The cloud manager is operative to modify a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component. The cloud manager is operative to identify a server in the modified list of candidate host servers that maximizes the availability of the application. The cloud manager is operative to instantiate the selected component on the identified server.

In another aspect of the present invention, there is provided a cloud manager comprising a number of modules. The cloud manager includes a criticality module for calculating a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application. The cloud manager includes a selection module for selecting a component having the highest criticality value for placement. The cloud manager includes a candidate server module for modifying a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component. The cloud manager includes an identification module for identifying a server in the modified list of candidate host servers that maximizes the availability of the application. The cloud manager includes a placement module for instantiating the selected component on the identified server.

In some embodiments, the criticality value can be calculated in accordance with one or more parameters, alternatively or in combination. The criticality value can be calculated in accordance with a recovery time associated with the component. The criticality value can be calculated in accordance with a failure rate associated with the component. The criticality value can be calculated in accordance with comparing a recovery time of the component to an outage tolerance of a dependent component. The criticality value can be calculated in accordance with determining a minimum outage tolerance of a plurality of dependent components. The criticality value can be calculated in accordance with a number of active instances of a component type associated with the component.

Some embodiments can further comprise ranking the plurality of components in descending order in accordance with their respective criticality value.

In some embodiments, the functional requirement can be at least one of a capacity requirement and/or a delay requirement associated with the component.

In some embodiments, the server can be identified in accordance with at least one of a mean time to failure parameter and/or a mean time to recovery parameter associated with the server.

Some embodiments can further comprise modifying the list of candidate host servers in response to determining that the selected component must be co-located with a second component in the plurality. Some embodiments can further comprise modifying the list of candidate host servers in response to determining that the selected component cannot be co-located with a second component in the plurality.

There various aspects and embodiments described herein can be combined alternatively, optionally and/or in addition to one another.

Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:

Figure 1 illustrates an example Application deployment in the cloud;

Figure 2 illustrates an example Capacity Algorithm;

Figure 3 illustrates an example Delay Tolerance Algorithm Figure 4 illustrates an example Availability Algorithm;

Figure 5 is a flow chart illustrating a method for placing virtual machines on servers;

Figure 6 illustrates an example cloud management system architecture;

Figure 7 is a flow chart illustrating a method for placing an application;

Figure 8 is a block diagram of an example network node; and

Figure 9 is a block diagram of an example cloud manager.

DETAILED DESCRIPTION

Reference may be made below to specific elements, numbered in accordance with the attached figures. The discussion below should be taken to be exemplary in nature, and not as limiting of the scope of the present invention. The scope of the present invention is defined in the claims, and should not be considered as limited by the implementation details described below, which as one skilled in the art will appreciate, can be modified by replacing elements with equivalent functional elements.

Embodiments of the present disclosure demonstrate the effect of the placement strategy of applications on the high availability (HA) of the services provided by virtualized cloud to its end users. In order to attain the objectives, the cloud and the applications can be captured as unified modeling language (UML) models.

It is noted that the terminology of application/component/virtual machine "scheduling" and "placement" are well understood in the art as being synonymous with one another. This terminology will be used interchangeably herein as it relates to selecting a server to host a virtual machine.

Embodiments of the present disclosure are directed towards a HA-aware scheduling technique that takes into consideration capacity constraints, network delay demands, interdependencies and redundancies between the applications' components. The HA-aware placement can also be modeled as mixed integer linear programming (MILP) problem. The optimization model and the HA-aware scheduler can evaluate the availability of the components in terms of their mean time to fail mean time to repair and recovery time.

Some embodiments disclosed herein are directed towards capturing a number of the constraints that affect the application placement starting from capacity constraints, to network delay and availability constraints. Some embodiments disclosed herein reflect the availability constraints not only by failure rates of applications' components and the scheduled servers, but also by the functionality requirements, which generate co-location and anti -location constraints. Some embodiments disclosed herein consider different interdependencies and redundancies relations between applications' components. Some embodiments disclosed herein examine multiple failure scopes that might affect the component itself, its execution environment, and its dependent components. Some embodiments disclosed herein introduce the application's component "criticality" concept to the proposed approach. The criticality based analysis that ranks components of an application according to their criticality, can be used to ensure that most critical components are given higher scheduling priorities.

At the infrastructure-as-a-service (IaaS) level, a cloud provider or operator may provide a certain level of availability of the VMs assigned to the tenant(s). However, this may not necessarily guarantee the HA of the applications deployed in those VMs. In fact, the tenants would have to deploy their applications in an HA manner whereby redundant standby components can take over the workload when a VM or a server fails. Such a virtualized application can be comprised of a number of components having interdependencies.

To illustrate this point, consider the example of a multi-tier HA web-server application consisting of three component types: (1) the front end HTTP servers, (2) the Application servers, and (3) the databases. The HTTP servers handle static user requests and forward the dynamic ones to the App servers that dynamically generate HTML content. The users' information is stored at the back end databases.

Figure 1 illustrates an exemplary HA-aware deployment of the example webserver application 100. At the front end, there are two active (stateless) HTTP servers deployed on VM1 102 and VM2 104, sharing the load of requests, where if one fails the other would serve its workload. Most likely this will incur performance degradation. The (stateful) Application server has a 2+1 redundancy model with one standby (on VM5 110) backing up the two active Application servers (on VM3 106 and VM4 108). At the back end, there is one active database (on VM6 112) serving all of the requests that is backed up by one standby database (on VM7 1 14). Functional dependency clearly exists amongst the different component types.

The notion of a "computational path" (or data path) is defined as the path that a user request must follow through a chain of dependent components until its successful completion. For instance, in order for dynamic request to be processed, at least one active HTTP server, App server, and database must be healthy. Such an example of a computational path 1 16 is shown in Figure 1 as traversing VM1 102 -> VM3 106 -> VM6 112.

The components deployed in a redundant manner form a redundancy group. For example, for the Application server component type, redundancy group 1 18 is illustrated. Each component can have a different "impact" on the overall application depending on how many active replica(s) it has. For instance, as there is only one active instance of the database (VM6 1 12), its failure would impact all incoming requests. This would give the database a higher impact than the Application server, for example.

Cloud schedulers that are agnostic of the intricacies of a tenant's application may result in sub-optimal placements, where redundant components may be placed too close to each other, rendering their existence obsolete as a single failure can affect them all. Or further, the delay constraints can be violated and hinder the overall functionality of the application. HA-aware scheduling in the cloud can consider both the details of the applications as well as the details of the cloud infrastructure. To this end, the cloud and the application can be modelled using a unified modelling language (UML) class diagram.

The exemplary cloud architecture can be captured in such a UML class diagram where, at the root level, the cloud consists of data centers distributed across different geographical areas. Each data center consists of multiple racks communicating using aggregated switches. Each rack has a set of shelves embodying large number of servers of different capacities and failure rates. Servers residing on the same rack are connected with each other using the same network device (e.g. top of the rack switch). Finally, the VMs are hosted on the servers. This tree structure can determine the network delay constraints and consequently can determine the delay between the communicating applications. This architecture divides the cloud into five different latency zones that will be further discussed herein.

In the exemplary tree structure, each node has its own failure rate (λ) and mean time to recover (MTTR). The series reliability system is used to capture the availability of the cloud model. Therefore, the mean time to fail (MTTF) can be calculated as follows:

MTTF = =i— (1)

Further, each data center has its own MTTF, MTTR, and recovery time. In each Data Center, there exists a set of servers residing on different racks. Also, each server {S} has its own MTTF, MTTR, recovery time and available resources such as CPU and memory.

This architecture can divide the inter-Data Centers into latency zones and the intra-Data Centers into latency and capacity (CPU and memory) zones. The inter latency zone (D₄) can place the requested applications in any physical server in the cloud if the other constraints are satisfied. The intra latency zone can place the applications either within a data center (D₃), within a rack (D₂), within a server (Dj) or within a VM (D₀).

Each zone can select the highly available server as follows:

MTTFZ

Availabilty_server -

IVll 1 'aggregated ^aggregated

(2)

^Server ⁺ ^SourceRack ⁺ ^Si wceDC

and MTTR aggregat .ed , = MTTR server +MTTR f lCK , +MTTR server

Each application is composed of at least one component, which can be configured in at most one application. An application can combine the functionalities of its components to provide a higher level service. In order to maintain availability requirements, each component can have one or more associated redundant components. The primary component and its redundant ones are grouped into a dynamic redundancy group. In that group, each component is assigned specific number of active and standby redundant components. As shown in the UML model, each redundancy group is assigned to at most one application, which consists of at least one redundancy group.

As for the component, it belongs to at most one component type, which consists of at least one component. A component type is a software deployment. From this perspective, the component represents a running instance of the component type. Components of the same type have the attributes that are defined in the component type class such as the computational resources (CPU and memory) attributes.

Each component can be configured to depend on other components. The dependency relationship between component types can be configured using the delay tolerance, outage tolerance and/or communication bandwidth attributes. The delay tolerance determines the required latency to maintain a communication between sponsor and dependent components. As for the outage tolerance or tolerance time, it is the time where the dependent component can tolerate without the sponsor component. The same association is used to describe the relation between redundant components that need to synchronize their states.

Finally, each component type is associated with at least one failure type. The list of failure types determines the failure scope of each component type, its MTTF, and recommended recovery.

As discussed, the exemplary HA-aware scheduler searches for the optimum physical server to host the requested component. Whenever the server is scheduled, a VM is mapped to the corresponding component and to the chosen server. Therefore, each component can reside on at most one VM. Also, each VM can be hosed on at most one server. A failover group can be defined as the set of interdependent VMs (different VMs hosting dependent components). It defines a set of VMs that must failover together in case of unforeseen failure events. The VM placement method should generate mappings between the VMs on which the tenants' application are hosted and the cloud network physical servers while satisfying different constraints. Embodiments of the HA-aware scheduler will be described that can provide an efficient and highly available allocation by satisfying at least the following constraints: 1) capacity requirements, 2) network delay requirements, and 3) high availability requirements.

Capacity Requirements: These constraints can be used to generate a list of servers that satisfy the resource demands required by each application in order to meet the service level agreement (SLA). In the exemplary HA-aware scheduler, the computational resources consist of the CPU and memory requirements.

Network Delay Requirements: These constraints can be used to generate another list of servers that satisfy latency requirements in order to avoid service degradation between the communicating applications. The delay requirements will be divided into five delay types (i.e. latency zones) Do - D4 in this example as follows. Do Type requires that all the communicating components should be hosted on the same VM, and consequently, on the same server. D] Type requires that all the communicating components should be hosted on the same server. D₂ Type requires that all the communicating components should be hosted on the same rack. D₃ Type requires that all the communicating components should be hosted in the same DC. D₄ Type requires that all the communicating components can be hosted across the data centers but within the same cloud. As discussed, these delay types divide the cloud architecture into different latency zones to facilitate the scheduling problem.

High Availability Requirements: Using the list of candidate servers, HA requirements are used to select the server that maximizes the availability of an application. In order to attain this objective, Availability Constraints, Co-location Constraints, and Anti-location Constraints can be considered.

Availability Constraint: The server that maximizes the availability of a component is selected. This is attained by finding the server with the highest MTTF and lowest MTTR in a given server list. Co-location Constraint: This constraint is applied on dependent components that cannot tolerate the recovery time of its sponsor. Since the MTTF of a component is inversely proportional to its failure rate, the dependent and its sponsor components should be placed in the same server. It is assumed here that the failure rate of the hosting server is independent from the type of the hosted component.

Anti-location Constraint: This constraint ensures that components should be placed on different servers. It is applied on redundant components and dependent ones that can tolerate the absence of their sponsors. This is valid whenever the tolerance time of the dependent component is greater than the recovery time of its sponsor. By considering this case, the MTTF of the application will be maximized since its failure rate is minimized.

The MTTF of the component can be calculated as follows:

, τρ component _ 1

aggregated ^~ . . server '

component aggregated

These constraints can be used to prune the candidate servers generated by the capacity and delay constraints to select the server that will maintain a high level of the application availability while satisfying the functionality requirements.

With these constraints, a MILP model can be developed that maximizes the availability of the application while finding the best physical server to host it. An example MILP model will be discussed to illustrate solving the HA application placement problem.

Different parameters can be used in solving the placement problem and developing the MILP model. First, a virtual machine is denoted as J⁷ and a server as S. Each VM consist of an application (A), which consists of specific number of components (C), which are of component types (CT). Therefore, each application is a set of C and CT, and it is denoted as A ={C, CT}. This notation ensures that whenever a set of components C of types CT are scheduled, their corresponding application is hosted. As to the computational resources, _cr and L denote the resources, which can be memory or CPU of components and servers respectively. Table 1 shows the different notations for parameters used in the exemplary MILP model.

TABLE 1 Variable Notations

As to the decision variables, X_cs is denoted as a binary decision variable such that X_cs = 1 if component C is hosted on server S. Variable z_c is denoted as a binary decision variable in the delay constraints such that z_c = 1 when the delay requirement is satisfied.

The objective function of the formulated MILP model is to minimize the downtime of the requested components and consequently their applications. The objective function and its constraints are formulated as follows:

Objective Function:

min ∑∑ (Downtime _c + Downtime _s ) x X_cs

c s

Subject to:

Capacity Constraints:

∑(X_csx L_cr ) < L^T _sr Vs e S r e R (4)

∑X_CS = 1 Vc e C (5)

r_a e {0, l} (6) Network Delay Constraints:

(X_cVxDEL_ss,-Dr_c)< x

X_CJ-l<Mx(l-z_c,) (8)

Vc,c'e , Vs, s'eS^*

E{0,1} (9)

Availability Constraints:

X_cs + X_c,_s≤l (10)

Vc,c'≡C,\/s≡S,V RED_cc,

X_cs + X_c,_s<2 (11)

Vc,c'eC,VseS,V OT ,≤RT_c,V DEP ,

X_cs + X_c,_s≤l (12)

V c, c' e Com/?, Vs e S, V OT ,≥RT_C,V DEP_c

Downtime_c,Downtime_s≥0 Vc,s (13)

As previously discussed, the HA-aware placement of the application can be affected by capacity, delay, and availability constraints. Regarding the capacity constraints, constraint (4) ensures that the requested resources of VMs must not exceed available resources of the selected destination server. Constraint (5) determines that the VM can be placed on at most one physical server. Constraint (6) ensures that the decision variable (X_cs) is a binary integer. As for the delay constraints, constraint (7), (8) and (9) ensure that communicating components should be placed on the server that satisfies the required latency. These constraints are applied on the dependency and redundancy communication relation between the scheduled components. As for the availability constraints, constraint (10) reflects the anti -location constraints between a component and its redundant components. Using constraints (11), the dependent components should share the same server in case their outage tolerance is smaller than the recovery time of their sponsor component. Conversely, the anti-location constraint between dependent and sponsor components is active in the contrary case. The boundary constraint (13) specifies real positive values for downtimes of C and S.

The exemplary formulation can be feasible for small Data Center networks consisting of 20 components and 50 servers distributed across the network. In this small network, the number of variables generated in the optimization solver is around 4000 variables. Therefore, the exemplary component HA-aware scheduler in cloud environment is an approximation solution to the MILP model. It is based on a combination between greedy and pruning algorithms that aims at produce local optimization results.

The heuristic methodology iterates around all the applications and for each application, performs a criticality analysis of its components and then ranks them accordingly. Next, for each component, the method filters out the servers that do not satisfy the delay tolerance constraints and the ones that do not have enough capacity to host the component's VM. Next, the method selects among the remaining servers the one on which by placing the component, we would maximize the availability of its application.

Initially, the heuristic requires a reference point to start with the placement procedure because of the dependency and the redundancy communication relations between different components. Therefore, the concept of "criticality analysis" is introduced. This concept indicates that any component is considered a "critical" component when its failure causes an outage of the entire application or service. Each component has its own MTTF and MTTR, and therefore its failure can cause either an outage of the application or degradation of the service. The criticality value escalates when the failure scope of the component affects not only itself but its execution environment and its dependent environment as well.

An exemplary method for calculating the criticality of a component will now be discussed. Each component failure may have a different impact on the service availability; the most critical components are the ones that cause the most impact. The "impact" can be defined as a function of: 1) the service outage caused by the component failure; 2) the service degradation caused by the component failure; and 3) the portion size of the service being affected. The portion size can refer to be the number of users affected, the percentage of traffic affected, or any other metric representing the share of the service that would be affected by the component's failure.

It will be appreciated by those skilled in the art that the calculation of a criticality value is not an absolute value, but a relative value reflecting the impact a component's failure would have on the application.

Table 2 shows the different notations for parameters used in the exemplary criticality calculation.

TABLE 2 Criticality Terms

In order to be able to compare the criticalities of different component, a common unit of measurement will be defined.

1 unit of outage = F unit of degradation (14)

where 'i ' is a Factor or weighting, a positive input variable that allows the user of our method to determine the proportion between failure and degradation. For instance in some services an outage and a degradation may give the same quality of experience. In such a case, F = 1. In other cases the user may specify a different value for F, e.g. if F=2 this implies that an outage causes twice the impact of the degradation.

Equation (15) illustrates the service outage (out) that a single failure of a given component causes. If the recovery time of the component is less or equal to the outage tolerance time of its dependents, then there is no outage (however there is degradation). If the component is a front end component, (i.e. it has no dependents) the outage is equal to the recovery time. recT if the component is a front end

n

out = { 0 if (recT≤ min( / depOTi)) (15)

recT - min(V depOTi) otherwise

Equation (16) illustrates the service outage (out_I) caused by the failure of the component over the period of time in which the Failure Rate was defined (can be over the period of one year, for example). This service outage equal the outage caused by a single failure multiplied by the failure rate of the component. out I = FRx out (16)

Equation (17) illustrates the service degradation (deg) caused by the failure of a given component. In the case where the recovery time of the component is less the tolerance time of all its dependents, then the degradation is equal to the recovery time. Otherwise, the degradation time is equal to the minimum outage tolerance of its dependents. deg = J^{mi V}o ^deP°^Tl) if (recT > min( V depOTi) ( (17)

recT otherwise

Equation (18) illustrates the degradation (deg l) caused by the failure of the component over the given period of time. deg_I = FRx deg (18)

Finally, the criticality of the component is equal to the impact of its failure on the service being provided by the application. The impact is shown in equation (19) where it is a factor of the outage, the degradation, and the number of active replicas (or instances) of the same component type as the faulty one. This number includes the faulty component.

„ . . .. (Fx out I + deg I) .. ..

Cnticahty = = ( 19)

numAct Following the calculation of criticality values for each component in the application, a heuristic model (for example greedy and pruning algorithms) can be used to sort all components in decreasing order of their criticality. The sorting procedure allows the algorithm to start with the highly critical component.

The proposed heuristic can be divided into different sub-algorithms. Each sub- algorithm can deal with a specific constraint(s) such as the capacity, delay and the availability constraints.

Figure 2 illustrates an example Capacity Algorithm 200. Once the current component to be placed is selected, the heuristic executes the capacity sub-algorithm. This algorithm traverses the cloud and finds the servers that satisfy the computation resources needed by the requested components.

Figure 3 illustrates an example Delay Tolerance Algorithm 300. The set of candidate servers satisfying the capacity constraints are inputted to the delay sub- algorithm. In this algorithm, a pruning procedure is executed to discard the servers that violate the delay constraint. Because the scheduler deals with the case where the minimum delay of the application is the same as its maximum delay, the delay and availability sub-algorithms are applied to each delay type.

Figure 4 illustrates an example Availability Algorithm 400. After delay pruning, the baseline communication performance between the various components is maintained. At this point, an availability baseline can be achieved. In this sub-algorithm, the candidate server list undergoes another stage of pruning that to maximize the availability of each component while finding the locally optimal deployment. Before searching for the server with the highest availability, this algorithm executes the co- location and anti-location algorithms depending on the relation between the tolerance time of a dependent component and the recovery time of its sponsor.

If the co-location constraint is valid, then the capacity algorithm must be executed again in order to find servers that satisfy the computational demands of the group of components that must be co-located. After generating the candidate servers, MaxAvailability algorithm is executed to select the server with the highest availability. If the anti-location algorithm is required, the MaxAvailability algorithm is executed to select the server with the highest availability.

If the capacity, delay and availability algorithms indicate that all the components including the dependent and the redundant ones can be placed on servers satisfying all of the mentioned constraints, then a redundancy algorithm can be executed to generate placements for the redundant components based on the anti-location constraints.

However, if any of the sub-algorithms indicate that the candidate server cannot host the whole package of the components, then computational paths analysis is generated. In this analysis, and if the redundancy model allows, a fail-over procedure is executed in case of a failure in the primary components or any of its execution environment. The computational path analysis identifies the different paths in the given components dependency relation. Simply, a path consists of the components that are needed to maintain the delivery of the service to the user while all the performance and QoS baselines are maintained. The paths are divided into a primary path including all the primary components and a block of paths including the redundant components. Once the computation paths are designated, the earlier sub-algorithms are executed to deal with each computational path on its own.

Although the described algorithms attempt to place all the requested components, sometimes a server is assigned to a critical component while leaving a less-critical component with no candidate servers satisfying all the required constraints. Therefore, an elimination algorithm can be used to handle this case. Whenever a component has a limited number of servers that can satisfy it with its redundant or protection group, the elimination algorithm can be executed to discard these servers from the candidates of other components. The number of redundant components, number of candidate servers, and number of common servers between any two component types are the attributes that trigger this algorithm.

At this stage, the heuristic determines a host for each component. Yet a mapping should be obtained among the elected server, the component and a VM. The heuristic executes a mapping algorithm that creates VMs for the scheduled components according to the delay constraints and then maps them to the chosen hosts.

Figure 5 is a flow chart illustrating an example scheduling method according to embodiments of the present disclosure. The method illustrated in Figure 5 can be used to place virtual machines associated with one or more applications on host servers in a cloud environment. Each application can comprise one or more components. Each component can run on a virtual machine. The cloud network can comprise a plurality of data centers. A data center can comprise multiple hierarchical levels of racks, servers, and blades. A set of available "candidate" servers can be considered for hosting the application(s).

The method of Figure 5 begins by determining if there is at least one application to be scheduled or placed (block 502). If yes, the next application is selected for processing (block 504). For each component in the selected application, a criticality analysis is performed (block 506). The criticality analysis can take, as an input, the various criteria, constraints and inter-dependencies between the components as have been described herein. A criticality value can be calculated for each component. The components of the selected application can then be ranked, or placed in an ordered list, based on their criticality. The components can be ranked in accordance with the relative impact their failure would have on the service(s) provided by the application. The component whose potential failure is deemed to have the highest impact on the overall application can have the highest rank.

The method continues by determining if there is at least one component to be scheduled (block 510). If yes, the highest ranked component in the list is for scheduling (block 512).

The delay tolerance of the selected component is compared to the delay tolerances of the candidate servers (block 514). All servers that do not satisfy the delay tolerance of the component are removed or filtered out of the list of candidate servers. The capacity constraints (e.g. the CPU, memory, storage, bandwidth, etc. requirements) of the selected component are then compared to the available capacity of the remaining servers on the list of candidate servers (block 516). All servers that cannot satisfy the capacity constraints of the component are removed or filtered out of the list of candidate servers. The modified list of candidate servers now only includes servers that meet both the delay tolerance and capacity requirements of the component.

The selected component is then scheduled, or placed, on the server that will maximize the availability of the application that the component belongs to (block 518). Maximizing the availability can include minimizing the potential downtime by selecting a candidate server that will minimize the impact and/or frequency of a component's failure on the application. The step of scheduling can optionally include transmitting instructions to the selected host server to instantiate a virtual machine and to launch the component.

The method then returns to determining if there are any components remaining to be scheduled for the first selected application (block 510). The method will iteratively process the next-highest ranked component, filter out the servers that do not satisfy the delay tolerance and capacity constraints, and place the next-highest component on the server that maximizes the availability of the component's associated applications. This continues until it is determined (in block 510) that all components of the application have been placed on host servers.

The method then returns to determining if there are any applications remaining to be scheduled (block 502). The process continues until all applications have their associated components scheduled.

Optionally, the method of Figure 5 can also include a step of ranking the applications to be scheduled. Applications can be ranked using one or more of the criteria and factors discussed herein. Alternatively, the applications can be ranked based on user preferences.

In some embodiments, the order of the steps performed in the method of Figure 5 can be optionally modified or re-arranged. For example, in some embodiments, the list of candidate servers can be modified to remove servers that do not meet the capacity requirements of the given component (block 516) prior to modifying the server list to remove servers that do not meet the delay tolerance (block 514).

Figure 6 illustrates an example cloud management system architecture. The cloud management system 600 as described herein is designed to perform scheduling in a real cloud setting. The Input/Output (I/O) module 602 is configured for information exchange where it can communicate with the graphical user interface (GUI) 614 to collect application information specified by the user. I/O module 602 can include a cloud model serializer/deserializer 604 to read a model from a file (deserialize) and save the model to file (serialize). I/O module 602 also communicates with the OpenStack module 616, which includes Nova 618 (the compute module of OpenStack) and its database 620, which can be extended to support the notions of datacenters and racks. The database 620 can also be extended for the hosts to include the failure and recovery information. The I/O module 602 also interfaces with the scheduler module 606, and can collect the scheduling results and applies them using Nova CLI commands.

Scheduler module 606 can include the various filters and algorithms as have been discussed herein, including Capacity Filter 608, Delay Filter 610 and HA Filter 612. The scheduler 606 communicates with the OpenStack module 616 to make use of capabilities of any existing filters/algorithms of the OpenStack module 616 and complement them with other filters.

The GUI 614 can contain multiple panels that provide different views of the application's components and the cloud infrastructure. The user can specify the applications, their redundancy groups, the components as well as the component types and the failure types. The user can initiate scheduling of an application via the GUI 614. This triggers the scheduler 606 to define the VM placement, and thereafter the I/O module 602 to update the OpenStack module 616. The GUI 614 may display a view of where exactly the components were scheduled and the expected availability of each component.

Figure 7 is a flow chart illustrating a method for determining placement of an application comprising a plurality of components onto one or more host servers. The method of Figure 7 can be implemented by a cloud manager or scheduler as have been described herein. The set of components that make up the application can encompass a number of different component types. Dependencies between the components in the application can also be defined.

The method begins by calculating a criticality value for each component in the plurality (block 700). The criticality value indicates the relative impact that a failure of the component would have on the overall application. A component's criticality value can be calculated in accordance with a recovery time associated with the component and/or a failure rate associated with the component. The criticality value can also be based on comparing the recovery time of the component with an outage tolerance of a second component in the application that has a dependency on the given component. The criticality value can be calculated in accordance with a degradation value that can be based on determining the minimum outage tolerance of all the components that depend on the given components. In some embodiments, the criticality value can be calculated in accordance with the number of active instances of a component type associated with the component that exist in the application.

Optionally, the plurality of components in the application can be ranked in accordance with their respective criticality values (block 710). In some embodiments, the components are ranked in descending order of criticality.

The component having the highest calculated criticality value is selected for placement (block 720). In some embodiments, a component will be removed from the ranked list of components once it has been placed.

A list of candidate servers for hosting the application components can be compiled and maintained. The list of candidate servers is modified to remove any servers that do not satisfy a functional requirement of the selected component (block 730). The functional requirement can include at least one of a capacity requirement and/or a delay requirement associated with the selected component.

A server is identified and selected (block 740), from the modified list of candidate servers, to host the selected component that will maximize the availability of the application. The server identification can be determined in accordance with a mean time to failure (MTTF) parameter and/or a mean time to recovery (MTTR) parameter associated with the server. In some embodiments, the server with the highest MTTF on the list of candidates can be selected. In some embodiments, the server with lowest MTTR on the list of candidates can be selected. In other embodiments, both MTTF, MTTR, and other parameters can be used to identify a server in the list of candidate servers. A host can be considered to maximize the availability of the application if it minimizes the impact that its potential failure (e.g. failure of the hosted selected component) will have on the application.

In some embodiments, the list of candidate servers can be further modified prior to identify the server to host the selected component in block 740. Optionally, the list of candidate host servers can be modified in response to determining that the selected component must be co-located with a second component in the plurality. In such a case, the list of candidate servers can be modified to include only servers capable of hosting both the selected component and the second component. Optionally, the list of candidate host servers can be modified in response to determining that the selected component cannot be co-located with a second component in the plurality. In this case, a server can be removed from the candidate list if it hosts such a second component. This can include a redundancy relationship between the selected component and the second component indicating that the components cannot be co-located on the same host server.

In some embodiments, other factors can also be considered in maximizing the availability of the application. Resource utilization can be maximized by favoring servers that are already hosting other virtual machines (associated with the same application or other applications). Servers can also be selected based on their relative costs, e.g. one data center site may be powered by a less expensive source of energy or may by more energy efficient than another site.

The selected component is then instantiated on the identified server (block 750). This step can include sending instructions for the component to be instantiated on the identified server. The instructions can be sent to the identified server or a hypervisor/virtualization manager associated with the identified server. The component can be instantiated in response to such instructions. In some optional embodiments, steps 720 through 750 can be repeated iteratively until all components of the application have been placed on host servers. The component with the next highest criticality value can be selected for placement. The list of candidate servers can be redefined for each iteration.

Figure 8 is a block diagram illustrating an example network node or element 800 according to embodiments of the present invention. Network element 800 can be a cloud manager or cloud scheduler device as have been described herein. The cloud manager 800 includes a processor 802, a memory or instruction repository 804, and a communication interface 806. The communication interface 806 can include at least one input port and at least one output port. The memory 804 contains instructions executable by the processor 802 whereby the cloud manager 800 is operable to perform the various embodiments as described herein. In some embodiments, the cloud manager 800 can be a virtualized application hosted by the underlying physical hardware.

Cloud manager 800 is operative to calculate a criticality value for each component in the plurality; select a component having the highest criticality value for placement; modify a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component; identify a server in the modified list of candidate host servers that maximizes the availability of the application; and instantiate the selected component on the identified server.

Figure 9 is a block diagram of an example cloud manager node 900 that can include a number of modules. Cloud manager node 900 can include a criticality module 902, a selection module 904, a candidate server module 906, an identification module 908, and a placement module 910. Criticality module 902 is configured to calculate a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application. Selection module 904 is configured to select a component having the highest criticality value for placement. Candidate server module 906 is configured to modify a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component. Identification module 908 is configured to identify a server in the modified list of candidate host servers that maximizes the availability of the application. Placement module 910 is configured to instantiate the selected component on the identified server.

The unexpected outage of cloud services has a great impact on business continuity and IT enterprises. One key to achieving these requirements is to develop an approach that is immune to failure while considering real-time interdependencies and redundancies between applications. Attaining always-on and always-available is an objective of the described HA-aware scheduler by generating a highly-available optimal placement for the requested applications. Those skilled in the art will appreciate that the proposed systems and methods can be extended to include multiple objectives, such as maximizing the HA of applications' components and maximizing resource utilization of the used infrastructure.

Embodiments of the invention may be represented as a software product stored in a machine-readable medium (also referred to as a computer-readable medium, a processor-readable medium, or a computer usable medium having a computer readable program code embodied therein). The non-transitory machine-readable medium may be any suitable tangible medium including a magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), digital versatile disc read only memory (DVD-ROM) memory device (volatile or non-volatile), or similar storage mechanism. The machine-readable medium may contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the invention. Those of ordinary skill in the art will appreciate that other instructions and operations necessary to implement the described invention may also be stored on the machine -readable medium. Software running from the machine-readable medium may interface with circuitry to perform the described tasks.

The above-described embodiments of the present invention are intended to be examples only. Alterations, modifications and variations may be effected to the particular embodiments by those of skill in the art without departing from the scope of the invention, which is defined solely by the claims appended hereto.

Claims

What is claimed is:

1. A method for determining placement of an application comprising a plurality of components onto one or more host servers, the method comprising:

calculating a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application; selecting a component having the highest criticality value for placement;

modifying a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component;

identifying a server in the modified list of candidate host servers that maximizes the availability of the application; and

instantiating the selected component on the identified server.

2. The method of claim 1, wherein the criticality value is calculated in accordance with a recovery time associated with the component.

3. The method of any one of claims 1 to 2, wherein the criticality value is calculated in accordance with a failure rate associated with the component.

4. The method of any one of claims 1 to 3, wherein the criticality value is calculated in accordance with comparing a recovery time of the component to an outage tolerance of a dependent component.

5. The method of any one of claims 1 to 4, wherein the criticality value is calculated in accordance with determining a minimum outage tolerance of a plurality of dependent components.

6. The method of any one of claims 1 to 5, wherein the criticality value is calculated in accordance with a number of active instances of a component type associated with the component.

7. The method of any one of claims 1 to 6, further comprising, ranking the plurality of components in descending order in accordance with their respective criticality value.

8. The method of any one of claims 1 to 7, wherein the functional requirement is a capacity requirement.

9. The method of any one of claims 1 to 8, wherein the functional requirement is a delay requirement.

10. The method of any one of claims 1 to 9, wherein the server is identified in accordance with a mean time to failure parameter associated with the server.

11. The method of any one of claims 1 to 10, wherein the server is identified in accordance with a mean time to recovery parameter associated with the server.

12. The method of any one of claims 1 to 11, further comprising, further modifying the list of candidate host servers in response to determining that the selected component must be co-located with a second component in the plurality.

13. The method of any one of claims 1 to 12, further comprising, further modifying the list of candidate host servers in response to determining that the selected component cannot be co-located with a second component in the plurality.

14. A cloud manager comprising a processor and a memory, the memory containing instructions executable by the processor whereby the cloud manager is operative to: calculate a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application; select a component having the highest criticality value for placement;

modify a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component;

identify a server in the modified list of candidate host servers that maximizes the availability of the application; and instantiate the selected component on the identified server.

15. The cloud manager of claim 14, wherein the criticality value is calculated in accordance with a recovery time associated with the component.

16. The cloud manager of any one of claims 14 to 15, wherein the criticality value is calculated in accordance with a failure rate associated with the component.

17. The cloud manager of any one of claims 14 to 16, wherein the criticality value is calculated in accordance with comparing a recovery time of the component to an outage tolerance of a dependent component.

18. The cloud manager of any one of claims 14 to 17, wherein the criticality value is calculated in accordance with determining a minimum outage tolerance of a plurality of dependent components.

19. The cloud manager of any one of claims 14 to 18, wherein the criticality value is calculated in accordance with a number of active instances of a component type associated with the component.

20. The cloud manager of any one of claims 14 to 19, further comprising, ranking the plurality of components in descending order in accordance with their respective criticality value.

21. The cloud manager of any one of claims 14 to 20, wherein the functional requirement is a capacity requirement.

22. The cloud manager of any one of claims 14 to 21, wherein the functional requirement is a delay requirement.

23. The cloud manager of any one of claims 14 to 22, wherein the server is identified in accordance with a mean time to failure parameter associated with the server.

24. The cloud manager of any one of claims 14 to 23, wherein the server is identified in accordance with a mean time to recovery parameter associated with the server.

25. The cloud manager of any one of claims 14 to 24, further comprising, further modifying the list of candidate host servers in response to determining that the selected component must be co-located with a second component in the plurality.

26. The cloud manager of any one of claims 14 to 25, further comprising, further modifying the list of candidate host servers in response to determining that the selected component cannot be co-located with a second component in the plurality.

27. A cloud manager comprising:

a criticality module for calculating a criticality value for each component in the plurality, the criticality value indicating the relative impact of a failure of the component on the application;

a selection module for selecting a component having the highest criticality value for placement;

a candidate server module for modifying a list of candidate host servers to remove servers that do not satisfy a functionality requirement associated with the selected component;

an identification module for identifying a server in the modified list of candidate host servers that maximizes the availability of the application; and

a placement module for instantiating the selected component on the identified server.