WO2020139072A1

WO2020139072A1 - A method of migrating virtual machines

Info

Publication number: WO2020139072A1
Application number: PCT/MY2019/050127
Authority: WO
Inventors: Shahrol Hisham Bin BAHAROM; Sharipah Binti Setapa; Swee Leong LOW; Jing Yuan Luke; Hong Hoe ONG
Original assignee: Mimos Berhad
Priority date: 2018-12-26
Filing date: 2019-12-24
Publication date: 2020-07-02
Also published as: MY202027A

Abstract

A method of migrating virtual machines (300, 400) is provided, the method (300, 400) including the steps of collecting input/output data of a plurality of hosts over a period of time (301), storing the collected data into a time series database for a predetermined period, building a forecast model using the stored data, predicting a subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, tracing the anomaly back to a relevant host, wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) using the steps above and a host selection sequence determined by closest nodes to a source host.

Description

A METHOD OF MIGRATING VIRTUAL MACHINES

Field of Invention

The invention relates to a method of migrating virtual machines.

Background

Typically, a virtual computing system consists of host machines and guest virtual machines (VMs). Cloud orchestration platforms such as Open Nebula install agents in the host machines so that these agents continuously send messages to the orchestration platform. If a particular host becomes unresponsive, its agent stops sending these messages to the platform. This indicates to the cloud orchestration platform that the host is down and the platform then begins virtual machine migration to other hosts.

However, in most cases the VM migration begins after the host is down and this causes the VM migration to fail as the original or source host is no longer available.

US 2017/0315838 A1 describes a system for migrating VMs from a source server to a destination server based on suitability of the destination server to host the VM that is being migrated. However, this solution is only implemented when the source server is close to failure at a certain threshold of load based on performance metrics. This may mean that the VMs may not be efficiently migrated to the new servers or hosts before the source server or host fails. Therefore, there is a need for efficient migration of VMs to a destination host which overcomes the above issues.

Summary of Invention

In an aspect of the invention, a method of migrating virtual machines is provided, the method including the steps of collecting input/output data of a plurality of hosts over a period of time, storing the collected data into a time series database for a predetermined period, characterized by, building a forecast model using the stored data, predicting a subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, tracing the anomaly back to a relevant host, wherein virtual machines located in hosts with anomalous data are advantageously migrated out to other hosts based on analysis of collected data using the steps above and a host selection sequence determined by closest nodes to a source host.

Typically, the anomalies are checked for a predetermined period of time and further, anomalous data of hosts are continuously detected until anomalous behaviour returns to normal behaviour in a particular host.

Typically, the VMs are migrated back to the source host once the anomalous behaviour ends.

In one embodiment, the method is used to build a decision tree based on the VM migration and anomalous behaviour detection over time which is used to construct a forecast model.

Advantageously, a destination host is selected based on the host selection sequence and nearest neighbouring host if two hosts are within a same number of hops.

Typically, the host selection sequence is determined by transformation if network nodes have failed, undergone any change, or are redesigned.

Advantageously, the host selection sequence is dynamically revised based on changes in network nodes where a migration table is used for identifying potential hosts for migrating the VMs.

In a further aspect of the invention, a virtual machine migration system is provided to be used for migrating virtual machines to a stable host computing system when hosts become unstable.

Typically, the system includes a plurality of VMs hosted by multiple host computing systems in a failover cluster and a management server in communication with the multiple host computing systems through a management network via a plurality of virtual switches. Advantageously, the management network provides migration networking if a migration process of the VMs is initiated. In one embodiment, the management server includes a machine learning module within a management virtual machine. Typically, the multiple host computing systems and the plurality of VMs include a machine learning agent in each host computing system and further a VM network in communication with at least one client device. Advantageously, the machine learning agents collect all input/output data such as CPU usage data, memory data and networking data and send this data to the machine learning module wherein the data is then saved in a time series database.

Typically, the machine learning agent builds a forecast model using data in the time series database after a predetermined period. Advantageously, the model is used to predict a subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, this is registered as an anomaly. In one embodiment, a machine learning cluster, which includes a plurality of VMs hosted by multiple host computing systems, is in connection to a management network through a plurality of virtual switches, wherein the machine learning cluster advantageously provides for the machine learning module to detect the above mentioned anomaly in input/output data from the machine learning agent. Typically, this detection initiates migration of a virtual machine from a source host to a destination host.

Brief Description of Drawings

It will be convenient to further describe the present invention with respect to the accompanying drawings that illustrate possible arrangements of the invention. Other arrangements of the invention are possible, and consequently the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description of the invention.

Figure 1 illustrates a block diagram of a virtual machine migration system (100) where virtual machines are hosted by multiple host computing systems in a failover cluster.

Figure 2 illustrates a machine learning cluster in the virtual machine migration system to be used for migrating virtual machines.

Figure 3 illustrates a general flow chart of a method of migrating virtual machines.

Figure 4 illustrates a detailed flow chart of the method of migrating virtual machines.

Figure 5 illustrates a flow chart showing a method of determining potential failure of hosts by creating a cluster anomaly based on data collection.

Figure 6 illustrates a flow chart showing details of a sequence for selection of a suitable host once any hosts have been identified as a potential failure.

Figure 7 illustrates a diagram showing an example of cluster anomaly of normal and abnormal behaviour.

Detailed Description

Figure 1 shows a virtual machine migration system (100) to be used for migrating virtual machines (VMs) to a stable host or server computing system when hosts or servers become unstable. The system (100) includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) in a failover cluster (133). The system (100) further includes a management server (102) in communication with the multiple host computing systems (130A-N) through a management network (106) via a plurality of virtual switches (110A-N). The management network (106) provides migration networking if a migration process of the VMs is initiated. The management server (102) further includes a machine learning module (104) within a management virtual machine (103). The multiple host computing systems (130A-N) and the plurality of VMs (VM1-N) include a machine learning agent (122A-N) in each host computing system (130A-N). Figure 1 also shows a VM network (107) in communication with at least one client device (105).

The machine learning agents (122A-N) collect all input/output data such as CPU usage data, memory data and networking data. This data is then sent to the machine learning module (104) wherein the data is saved in a time series database. After a predetermined period, the machine learning agent (122A-N) builds a forecast model using data in the time series database. The model is then used to predict a subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, this is registered as an anomaly.
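By way of a non-limiting illustration, the comparison of actual data against a forecast and its margin of error may be sketched as follows. The sketch assumes a moving-average forecast as a simple stand-in for the forecast model and a margin of three standard deviations of past one-step residuals. The function names, window size, multiplier and sample CPU values are hypothetical and are not taken from this specification.

```python
from statistics import mean, stdev


def forecast_next(history, window=5):
    """Predict the next sample as the mean of the last `window` samples.

    Assumption: a moving average stands in for the forecast model.
    """
    return mean(history[-window:])


def margin_of_error(history, window=5, k=3.0):
    """Return k standard deviations of past one-step forecast residuals."""
    residuals = [
        history[i] - mean(history[i - window:i])
        for i in range(window, len(history))
    ]
    return k * stdev(residuals)


def is_anomaly(history, actual, window=5, k=3.0):
    """Register an anomaly when the actual value falls outside the margin."""
    predicted = forecast_next(history, window)
    return abs(actual - predicted) > margin_of_error(history, window, k)


# Hypothetical CPU usage samples (percent) for one host.
cpu_history = [40, 42, 41, 39, 40, 41, 43, 40, 39, 41, 42, 40]

print(is_anomaly(cpu_history, 41))  # False: within the margin of error
print(is_anomaly(cpu_history, 95))  # True: registered as an anomaly
```

In this sketch the margin widens automatically for hosts whose input/output data is naturally noisy, so a fixed load threshold is not required.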

Figure 2 shows a machine learning cluster (134) which includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) in connection to a management network (106) through a plurality of virtual switches (110A-N). For example, the machine learning cluster (134) provides for the machine learning module (104) to detect the above mentioned anomaly in input/output data from the machine learning agent (122B). In this example, this detection initiates migration of VM1 and VM2 from host 130A to host 130B as depicted in Figure 2. It is to be appreciated that usage of the term machine learning cluster (134) is in reference to the failover cluster (133), hosts (130A-N) and VMs (VM1-N) from the previous paragraph, which have now incorporated use of the machine learning module (104) to detect the anomalies and initiate migrations of the VMs. Figure 2 shows host 130B which now includes VM1 to VM4 after migration, in comparison with the failover cluster (133) where VM1 and VM2 were placed in host 130A.

Figure 3 shows a general flow diagram of a method of migrating virtual machines (300). The method (300) aims to predict host availability when a host, VM or failover cluster (133) becomes busy or unstable in a virtual datacentre. The method (300) includes the steps of collection of host information data (301) where the data is collected and sent to the machine learning module (104) to be stored in a time series database. The collected data is analyzed (302) through an algorithm. Output of the data analysis in 302 is used to decide on VM migration to a low risk host (303). A suitable host for migration is determined by the analyzed data gathered from the machine learning agents (122A-N) installed in each host.
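As an illustration only, the collection step (301) may be pictured as appending timestamped samples per host and metric to a store that keeps a bounded history. The sketch below uses an in-memory structure in place of a real time series database and reads the "predetermined period" as a retention window, which is one possible interpretation. The class name, method names and sample values are hypothetical.

```python
from collections import defaultdict, deque


class TimeSeriesStore:
    """In-memory stand-in for a time series database (assumption)."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        # (host, metric) -> deque of (timestamp, value), oldest first
        self._series = defaultdict(deque)

    def append(self, host, metric, timestamp, value):
        """Store one sample and drop samples older than the retention window."""
        series = self._series[(host, metric)]
        series.append((timestamp, value))
        cutoff = timestamp - self.retention
        while series and series[0][0] < cutoff:
            series.popleft()

    def values(self, host, metric):
        """Return the retained values for one host and metric, oldest first."""
        return [value for _, value in self._series[(host, metric)]]


# Hypothetical agent reports: one CPU sample per minute from host 130A.
store = TimeSeriesStore(retention_seconds=120)
for minute, cpu in enumerate([40, 42, 41, 95]):
    store.append("130A", "cpu", minute * 60, cpu)

print(store.values("130A", "cpu"))  # [42, 41, 95]
```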

Figure 4 shows a detailed flow diagram of the method of migrating virtual machines (400). Data that is collected from the machine learning agents (122A-N) is analysed (401). The analysis is performed using a reinforcement learning algorithm and forecasting is done using a Kalman filter technique. Mathematical transformation of the data is performed (402) to build a forecast model such as an ARIMA model. The model is then used to detect anomalies in the collected data by comparing it with actual data from the time series database of the hosts to determine whether the actual data falls within the model and margin of error. Any anomalies that are detected (403) are traced back to a relevant host. This unusual behaviour of the host is then tagged (404) wherein a decision is made to migrate all the VMs on this tagged host to a new host (405) when the condition of anomalies is fulfilled. These anomalies are checked for a predetermined period of time (406) wherein further abnormal data of hosts are continuously detected (407). VMs on these hosts are then migrated out of source hosts based on the anomalous pattern to ensure that the VMs are no longer hosted on them (408). This loop is continuously run until anomalous behaviour returns to normal behaviour and the VMs are migrated back to the source host once the anomalous period ends (410). This method is run continuously to check for anomalies even after the anomalous period ends. The method (400) is used to build a decision tree based on the VM migration and anomalous pattern detection over time which is used to construct the forecast model for forecasting failure of the hosts as previously explained.
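The tagging, migration and return steps (404 to 410) may be sketched, for illustration only, as a small per-host state machine driven by a stream of anomaly flags. It is assumed here that the "condition of anomalies" is a run of consecutive anomalous intervals and that the anomalous period ends after a run of consecutive normal intervals. The thresholds, class name and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    name: str
    vms: list
    anomalous_streak: int = 0
    normal_streak: int = 0
    tagged: bool = False
    evacuated: list = field(default_factory=list)


def step(host, anomaly, destination, confirm_after=3, recover_after=3):
    """Advance one monitoring interval and return the action taken.

    Assumption: migration is triggered after `confirm_after` consecutive
    anomalous intervals and reversed after `recover_after` normal ones.
    """
    if anomaly:
        host.anomalous_streak += 1
        host.normal_streak = 0
    else:
        host.normal_streak += 1
        host.anomalous_streak = 0

    if not host.tagged and host.anomalous_streak >= confirm_after:
        # Tag the host and move all of its VMs to the destination host.
        host.tagged = True
        host.evacuated, host.vms = host.vms, []
        destination.vms.extend(host.evacuated)
        return "migrate_out"

    if host.tagged and host.normal_streak >= recover_after:
        # Behaviour is back to normal: return the VMs to the source host.
        host.tagged = False
        for vm in host.evacuated:
            destination.vms.remove(vm)
        host.vms, host.evacuated = host.evacuated, []
        return "migrate_back"

    return "monitor"


source = HostState("130A", ["VM1", "VM2"])
target = HostState("130B", ["VM3", "VM4"])
flags = [False, True, True, True, True, False, False, False]
actions = [step(source, flag, target) for flag in flags]
print(actions)
```

With these hypothetical flags, the VMs leave host 130A on the third consecutive anomalous interval and return after three consecutive normal intervals, while monitoring continues throughout.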

Figure 5 shows further details of the method of migrating virtual machines by determining potential failure of hosts. The method creates a cluster anomaly based on data collection (501) from the input/output data of hosts. The collected data is differentiated or categorized by normal and abnormal behaviour (502) based on the forecast model built from the earlier data collection. Hosts in both normal and abnormal clusters are checked (503) in order to find suitable hosts with different abilities when considered in varying views, environment and host capabilities. This establishes an early detection of failure (504) where a host selection sequence can be determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned. This transformation is represented by a migration table which is described below.

Figure 6 shows details of a sequence for selection of a suitable host once any hosts have been identified as a potential failure. A host selection sequence is picked and executed (601). If any changes are made to the network nodes, the selection sequence is revised in order to reflect this change (602). A migration table is drawn up and potential hosts for migrating the VMs are selected based on this table (603). Further details on the migration table are described later. If the closest suitable host must be selected from two host destinations with the same number of hops from source host to destination host (604), the method selects the nearest neighbour out of the two host destinations to migrate the VM to (605). If there are no host destinations with the same number of hops from the source, the sequence ends as the hosts are selected according to the selection sequence based on the migration table. An example of this is described further in the following paragraphs.

Figure 7 shows an example of a 5 cluster anomaly of hosts that are determined using the method described above. Using the method described above, clusters 1, 4 and 5 have been determined to be normal clusters based on the collected input/output data but clusters 2 and 3 behave abnormally and contain vague information when comparing neighbourhood clusters with different views, environment and host capabilities. Hosts 1, 2 and 3 fall under normal clusters 1, 4 and 5; however, hosts 4 and 5 fall within abnormal clusters 2 and 3. Therefore, the method provides an awareness of a possibility of failure of hosts 4 and 5 and necessary steps must be prepared for early VM migration before failure occurs.
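For illustration only, the separation of data into normal and abnormal clusters and the tracing of abnormal data back to hosts may be sketched as below. The sketch assumes each sample already carries its predicted value and margin of error, and flags a host when at least half of its samples are abnormal. The field names, the `min_share` parameter and all numeric values are hypothetical.

```python
from collections import defaultdict


def cluster_samples(samples):
    """Split samples into normal and abnormal clusters by forecast margin."""
    clusters = {"normal": [], "abnormal": []}
    for sample in samples:
        deviation = abs(sample["actual"] - sample["predicted"])
        label = "normal" if deviation <= sample["margin"] else "abnormal"
        clusters[label].append(sample)
    return clusters


def hosts_at_risk(clusters, min_share=0.5):
    """Trace abnormal samples back to hosts (assumed rule: share >= min_share)."""
    counts = defaultdict(lambda: [0, 0])  # host -> [abnormal, total]
    for label, members in clusters.items():
        for sample in members:
            counts[sample["host"]][1] += 1
            if label == "abnormal":
                counts[sample["host"]][0] += 1
    return sorted(
        host for host, (bad, total) in counts.items() if bad / total >= min_share
    )


# Hypothetical samples arranged to mirror the Figure 7 example.
samples = [
    {"host": "host1", "actual": 41.0, "predicted": 40.0, "margin": 5.0},
    {"host": "host2", "actual": 38.0, "predicted": 40.0, "margin": 5.0},
    {"host": "host3", "actual": 44.0, "predicted": 40.0, "margin": 5.0},
    {"host": "host4", "actual": 92.0, "predicted": 40.0, "margin": 5.0},
    {"host": "host5", "actual": 3.0, "predicted": 40.0, "margin": 5.0},
]

clusters = cluster_samples(samples)
print(hosts_at_risk(clusters))  # ['host4', 'host5']
```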

An example of an early VM migration is explained herein. Table 1 shows a distance of path or hop destination for migrating VMs wherein VM1, VM2 and VM3 are located in host 1, host 2 and host 3 respectively. If host 4 and host 5 have failed, preparation of a path to migrate VMs is done based on Table 1. From Table 1, host 4 and 5 are the closest nodes as it takes the minimum number of hops to get to the destination. Table 2 shows the path of migration based on nearest host (determined by smallest number of hops taken to destination). Host 4 and 5 would be omitted or replaced with new nodes or devices in future predictions if early detection shows abnormal activity of behaviour. This information is updated constantly as the path keeps changing based on network node changes.

A tracert or traceroute command is used to find the number of hops between the source host and the destination host in the event that a hop path has changed due to an updated device in a node or if a particular node is down. The method then searches for a nearest host based on the number of hops taken to the destination. For example, in the migration of VM3 as seen in Table 2, although the number of hops taken to reach host 2 and host 5 is the same, i.e. 6 hops, the method selects host 2 as it is physically closer to VM3 (in this example).
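The selection described above may be sketched, for illustration only, as counting hops from traceroute-style text and then choosing the host with the fewest hops, with ties broken by a physical proximity measure. The sample trace text, the hop counts other than the 6 hops to host 2 and host 5, and the proximity ranks are hypothetical, as are the function names. The sketch parses captured text rather than invoking the command itself.

```python
def count_hops(traceroute_output):
    """Count the numbered hop lines in traceroute-style text output."""
    hops = 0
    for line in traceroute_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            hops += 1
    return hops


def select_destination(hops, proximity, excluded=()):
    """Pick the host with the fewest hops; break ties by physical proximity.

    Assumption: `proximity` is a rank where a lower value means closer.
    """
    candidates = [host for host in hops if host not in excluded]
    if not candidates:
        raise ValueError("no candidate destination host available")
    return min(candidates, key=lambda host: (hops[host], proximity[host]))


# Hypothetical captured trace with three hops.
SAMPLE_OUTPUT = """traceroute to host2 (10.0.0.2), 30 hops max, 60 byte packets
 1  10.0.1.1  0.4 ms
 2  10.0.2.1  0.9 ms
 3  10.0.0.2  1.3 ms
"""
print(count_hops(SAMPLE_OUTPUT))  # 3

# Hypothetical migration table row for VM3 on host 3.
hops_from_host3 = {"host1": 8, "host2": 6, "host4": 7, "host5": 6}
proximity_to_host3 = {"host1": 3, "host2": 1, "host4": 4, "host5": 2}

print(select_destination(hops_from_host3, proximity_to_host3))  # host2
```

Passing hosts tagged as anomalous through the `excluded` argument would keep them out of the candidate list, in line with the omission of abnormal hosts from future selections.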

It is to be appreciated that the method may be implemented for early failure detection in a data centre before a host becomes unstable. The method is used to migrate VMs out of potentially affected hosts in a failover cluster to a stable host computing system.

Claims

1. A method of migrating virtual machines (300, 400), the method (300, 400) including the steps of:

collecting input/output data of a plurality of hosts over a period of time (301); and

storing the collected data into a time series database for a predetermined period,

characterized by,

building a forecast model using the stored data;

predicting a subsequent input/output data within a margin of error using the forecast model;

comparing actual input/output data with the predicted data;

registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters; and

tracing the anomaly back to a relevant host,

wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) and a host selection sequence determined by closest nodes to a source host.

2. The method (300, 400) as claimed in claim 1, wherein the anomalies are checked for a predetermined period of time (406) and further, anomalous data of hosts are continuously detected (407) until anomalous behaviour returns to normal behaviour in a particular host.

3. The method (300, 400) as claimed in claim 2, wherein the virtual machines are migrated back to the source host once the anomalous behaviour ends (410).

4. The method (300, 400) as claimed in claim 2, wherein the method (300, 400) is used to build a decision tree based on the virtual machine migration and anomalous behaviour detection over time which is used to construct the forecast model.

5. The method (300, 400) as claimed in claim 1, wherein a destination host is selected based on the host selection sequence and nearest neighbouring host if two hosts are within a same number of hops.

6. The method (300, 400) as claimed in claim 1, wherein the host selection sequence is determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned.

7. The method (300, 400) as claimed in claim 6, wherein the host selection sequence is dynamically revised based on changes in network nodes (602) where a migration table is used for identifying potential hosts for migrating the virtual machines.