WO2020139072A1 - A method of migrating virtual machines - Google Patents


Info

Publication number
WO2020139072A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
host
hosts
virtual machines
anomalous
Application number
PCT/MY2019/050127
Other languages
French (fr)
Inventor
Shahrol Hisham Bin BAHAROM
Sharipah Binti Setapa
Swee Leong LOW
Jing Yuan Luke
Hong Hoe ONG
Original Assignee
Mimos Berhad
Priority date
2018-12-26
Filing date
2019-12-24
Publication date
2020-07-02
Application filed by Mimos Berhad
Publication of WO2020139072A1


Classifications

    • G06F11/3447 Performance evaluation by modeling
    • G06F11/3476 Data logging
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06N20/00 Machine learning
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2201/815 Virtual
    • G06F2209/5019 Workload prediction

Abstract

A method of migrating virtual machines (300, 400) is provided, the method (300, 400) including the steps of collecting input/output data of a plurality of hosts over a period of time (301), storing the collected data into a time series database for a predetermined period, building a forecast model using the stored data, predicting subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, and tracing the anomaly back to a relevant host, wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) using the steps above and a host selection sequence determined by the closest nodes to a source host.

Description

A METHOD OF MIGRATING VIRTUAL MACHINES
Field of Invention
The invention relates to a method of migrating virtual machines.
Background
Typically, a virtual computing system consists of host machines and guest virtual machines (VMs). Cloud orchestration platforms such as OpenNebula install agents in the host machines, and these agents continuously send messages to the orchestration platform. If a particular host becomes unresponsive, its agent stops sending these messages to the platform. This indicates to the cloud orchestration platform that the host is down, and the platform then begins migrating the virtual machines to other hosts.
However, in most cases VM migration begins only after the host is down, which causes the migration to fail because the original (source) host is no longer available.
US 2017/0315838 A1 describes a system for migrating VMs from a source server to a destination server based on the suitability of the destination server to host the VM being migrated. However, this solution is only applied when the source server is close to failure, as indicated by its load reaching a certain threshold based on performance metrics. As a result, the VMs may not be migrated to the new servers or hosts before the source server or host fails. Therefore, there is a need for efficient migration of VMs to a destination host that overcomes the above issues.
Summary of Invention
In an aspect of the invention, a method of migrating virtual machines is provided, the method including the steps of collecting input/output data of a plurality of hosts over a period of time, storing the collected data into a time series database for a predetermined period, characterized by building a forecast model using the stored data, predicting subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, and tracing the anomaly back to a relevant host, wherein virtual machines located in hosts with anomalous data are advantageously migrated out to other hosts based on analysis of the collected data using the steps above and a host selection sequence determined by the closest nodes to a source host.
Typically, the anomalies are checked for a predetermined period of time and further, anomalous data of hosts are continuously detected until anomalous behaviour returns to normal behaviour in a particular host.
Typically, the VMs are migrated back to the source host once the anomalous behaviour ends. In one embodiment, the method is used to build a decision tree, based on the VM migration and anomalous behaviour detection over time, which is used to construct a forecast model. Advantageously, a destination host is selected based on the host selection sequence, and the nearest neighbouring host is chosen if two hosts are within the same number of hops.
Typically, the host selection sequence is determined by transformation if network nodes have failed, undergone any change, or are redesigned.
Advantageously, the host selection sequence is dynamically revised based on changes in network nodes, where a migration table is used for identifying potential hosts for migrating the VMs. In a further aspect of the invention, a virtual machine migration system is provided for migrating virtual machines to a stable host computing system when hosts become unstable.
Typically, the system includes a plurality of VMs hosted by multiple host computing systems in a failover cluster, and a management server in communication with the multiple host computing systems through a management network via a plurality of virtual switches. Advantageously, the management network provides migration networking if a migration process of the VMs is initiated. In one embodiment, the management server includes a machine learning module within a management virtual machine. Typically, the multiple host computing systems and the plurality of VMs include a machine learning agent in each host computing system, and further a VM network in communication with at least one client device. Advantageously, the machine learning agents collect all input/output data, such as CPU usage data, memory data and networking data, and send this data to the machine learning module, where it is saved in a time series database.
Typically, the machine learning agent builds a forecast model using data in the time series database after a predetermined period. Advantageously, the model is used to predict subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, this is registered as an anomaly. In one embodiment, a machine learning cluster, which includes a plurality of VMs hosted by multiple host computing systems, is connected to a management network through a plurality of virtual switches, wherein the machine learning cluster advantageously enables the machine learning module to detect the above-mentioned anomaly in input/output data from the machine learning agent. Typically, this detection initiates migration of a virtual machine from a source host to a destination host.
Brief Description of Drawings
It will be convenient to further describe the present invention with respect to the accompanying drawings that illustrate possible arrangements of the invention. Other arrangements of the invention are possible, and consequently the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description of the invention.
Figure 1 illustrates a block diagram of a virtual machine migration system (100) where virtual machines are hosted by multiple host computing systems in a failover cluster.
Figure 2 illustrates a machine learning cluster in the virtual machine migration system to be used for migrating virtual machines.
Figure 3 illustrates a general flow chart of a method of migrating virtual machines.
Figure 4 illustrates a detailed flow chart of the method of migrating virtual machines.
Figure 5 illustrates a flow chart showing a method of determining potential failure of hosts by creating a cluster anomaly based on data collection.
Figure 6 illustrates a flow chart showing details of a sequence for selection of a suitable host once any hosts have been identified as a potential failure.
Figure 7 illustrates a diagram showing an example of cluster anomaly of normal and abnormal behaviour.
Detailed Description
Figure 1 shows a virtual machine migration system (100) to be used for migrating virtual machines (VMs) to a stable host or server computing system when hosts or servers become unstable. The system (100) includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) in a failover cluster (133). The system (100) further includes a management server (102) in communication with the multiple host computing systems (130A-N) through a management network (106) via a plurality of virtual switches (110A-N). The management network (106) provides migration networking if a migration process of the VMs is initiated. The management server (102) further includes a machine learning module (104) within a management virtual machine (103). The multiple host computing systems (130A-N) and the plurality of VMs (VM1-N) include a machine learning agent (122A-N) in each host computing system (130A-N). Figure 1 also shows a VM network (107) in communication with at least one client device (105).
The machine learning agents (122A-N) collect all input/output data, such as CPU usage data, memory data and networking data. This data is then sent to the machine learning module (104), where it is saved in a time series database. After a predetermined period, the machine learning agent (122A-N) builds a forecast model using data in the time series database. The model is then used to predict subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, it is registered as an anomaly.
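The forecast-and-compare step can be illustrated with a small sketch. The patent names a Kalman filter for forecasting (Figure 4) but gives no state model, noise parameters or definition of the margin of error, so the following assumes a one-dimensional random-walk ("local level") model for a single metric such as CPU usage, and takes the margin of error as `k` standard deviations of the one-step forecast error. The class and function names and the values of `q`, `r`, `k` and `warmup` are hypothetical choices for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalLevelKalman:
    """1-D Kalman filter for a random-walk ("local level") signal.

    q: process-noise variance (how far the true level may drift per step)
    r: measurement-noise variance of each sample
    """
    q: float
    r: float
    level: float = 0.0
    var: float = 1e6  # large initial uncertainty, so the first sample dominates

    def predict(self):
        """Time update; returns (forecast, innovation variance)."""
        self.var += self.q
        return self.level, self.var + self.r

    def update(self, observed):
        """Measurement update with one observed sample."""
        gain = self.var / (self.var + self.r)
        self.level += gain * (observed - self.level)
        self.var *= 1.0 - gain


def detect_anomalies(samples, q=0.01, r=1.0, k=3.0, warmup=5):
    """Indices of samples outside forecast +/- k standard deviations."""
    kf = LocalLevelKalman(q=q, r=r)
    anomalies = []
    for i, x in enumerate(samples):
        forecast, innovation_var = kf.predict()
        margin = k * math.sqrt(innovation_var)
        if i >= warmup and abs(x - forecast) > margin:
            anomalies.append(i)
            continue  # keep anomalous samples from dragging the model
        kf.update(x)
    return anomalies
```

For example, `detect_anomalies([20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 95, 20, 21])` flags only index 10, the CPU spike. Anomalous samples are deliberately not fed back into the filter, so a burst of abnormal readings does not silently become the new "normal".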
Figure 2 shows a machine learning cluster (134), which includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) connected to a management network (106) through a plurality of virtual switches (110A-N). For example, the machine learning cluster (134) enables the machine learning module (104) to detect the above-mentioned anomaly in input/output data from the machine learning agent (122B). In this example, this detection initiates migration of VM1 and VM2 from host 130A to host 130B, as depicted in Figure 2. It is to be appreciated that the term machine learning cluster (134) refers to the failover cluster (133), hosts (130A-N) and VMs (VM1-N) of the previous paragraph, which now use the machine learning module (104) to detect the anomalies and initiate migrations of the VMs. Figure 2 shows host 130B, which now contains VM1 to VM4 after migration, in contrast with the failover cluster (133), where VM1 and VM2 were placed in host 130A.
Figure 3 shows a general flow diagram of a method of migrating virtual machines (300). The method (300) aims to predict host availability when a host, VM or failover cluster (133) becomes busy or unstable in a virtual datacentre. The method (300) includes the step of collecting host information data (301), where the data is collected and sent to the machine learning module (104) to be stored in a time series database. The collected data is analyzed (302) through an algorithm. The output of the data analysis in step 302 is used to decide on VM migration to a low-risk host (303). A suitable host for migration is determined from the analyzed data gathered from the machine learning agents (122A-N) installed in each host.
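The patent does not name a particular time series database for step 301. The sketch below stands in for it with a hypothetical in-memory, per-host store that keeps only the most recent `retention` seconds of samples, which is one way to realise "storing the collected data for a predetermined period". The `Sample` fields mirror the CPU, memory and networking data mentioned above; the units, names and the assumption that samples arrive in timestamp order are illustrative only.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds since epoch
    host: str
    cpu: float        # percent
    memory: float     # percent
    network: float    # e.g. bytes per second


class TimeSeriesStore:
    """Per-host series that retains only the last `retention` seconds.

    Assumes samples for each host arrive in timestamp order.
    """

    def __init__(self, retention: float):
        self.retention = retention
        self._series = defaultdict(deque)

    def append(self, sample: Sample) -> None:
        series = self._series[sample.host]
        series.append(sample)
        cutoff = sample.timestamp - self.retention
        # Evict samples older than the retention window.
        while series and series[0].timestamp < cutoff:
            series.popleft()

    def window(self, host: str) -> list:
        """Samples currently retained for `host` (oldest first)."""
        return list(self._series.get(host, ()))
```

A real deployment would more likely use a dedicated time series database with a retention policy; the in-memory version only shows the data shape and the sliding window that the forecasting step reads from.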
Figure 4 shows a detailed flow diagram of the method of migrating virtual machines (400). Data that is collected from the machine learning agents (122A-N) is analysed (401). The analysis is performed using a reinforcement learning algorithm, and forecasting is done using a Kalman filter technique. Mathematical transformation of the data is performed (402) to build a forecast model, such as an ARIMA model. The model is then used to detect anomalies by comparing its predictions with actual data from the hosts' time series database to determine whether the actual data falls within the model and margin of error. Any unusual anomalies that are detected (403) are traced back to a relevant host. This unusual behaviour of the host is then tagged (404), and a decision is made to migrate all the VMs on this tagged host to a new host (405) when the anomaly conditions are fulfilled. These anomalies are checked for a predetermined period of time (406), during which further abnormal data of hosts is continuously detected (407). VMs on these hosts are then migrated out of the source hosts based on the anomalous pattern to ensure that the VMs are no longer hosted on them (408). This loop runs continuously until the anomalous behaviour returns to normal, and the VMs are migrated back to the source host once the anomalous period ends (410). The method continues to run to check for anomalies even after the anomalous period ends. The method (400) is used to build a decision tree, based on the VM migration and anomalous pattern detection over time, which is used to construct the forecast model for forecasting failure of the hosts as previously explained.
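The migrate-out and migrate-back loop (steps 405 to 410) can be sketched as a small per-host state machine. This is an illustrative reading, not the patent's implementation: the "predetermined period" of step 406 is modelled as a hypothetical `confirm_after` count of consecutive anomalous intervals, the end of the anomalous period as `recover_after` consecutive normal intervals, and `choose_destination` is a hypothetical callable standing in for the host selection sequence.

```python
def plan_migrations(host, vms, anomaly_flags, choose_destination,
                    confirm_after=3, recover_after=3):
    """Replay per-interval anomaly flags for one host and plan VM moves.

    host: source host name
    vms: VMs currently on the host
    anomaly_flags: one bool per monitoring interval (True = anomalous)
    choose_destination: hypothetical callable (host, vm) -> destination host
    confirm_after: consecutive anomalous intervals before evacuating (cf. 406)
    recover_after: consecutive normal intervals before migrating back (cf. 410)

    Returns a list of (interval, direction, vm, from_host, to_host).
    """
    events = []
    bad_streak = good_streak = 0
    placed = {}  # vm -> temporary destination while the host is evacuated
    for t, anomalous in enumerate(anomaly_flags):
        if anomalous:
            bad_streak, good_streak = bad_streak + 1, 0
        else:
            bad_streak, good_streak = 0, good_streak + 1
        if not placed and bad_streak >= confirm_after:
            # Anomaly persisted long enough: migrate every VM out (405, 408).
            for vm in vms:
                dst = choose_destination(host, vm)
                placed[vm] = dst
                events.append((t, "out", vm, host, dst))
        elif placed and good_streak >= recover_after:
            # Behaviour is normal again: bring the VMs back (410).
            for vm, dst in placed.items():
                events.append((t, "back", vm, dst, host))
            placed.clear()
    return events
```

Mirroring the Figure 2 example, `plan_migrations("130A", ["VM1", "VM2"], flags, lambda h, vm: "130B")` with four anomalous intervals followed by four normal ones moves both VMs to 130B at interval 3 and back to 130A at interval 7. Requiring a streak rather than a single anomalous reading avoids migrating on one-off spikes.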
Figure 5 shows further details of the method of migrating virtual machines by determining potential failure of hosts. The method creates a cluster anomaly based on data collection (501) from the input/output data of hosts. The collected data is differentiated or categorized into normal and abnormal behaviour (502) based on the forecast model built from the earlier data collection. Hosts in both normal and abnormal clusters are checked (503) in order to find suitable hosts with different abilities when considered in varying views, environments and host capabilities. This establishes an early detection of failure (504), where a host selection sequence can be determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned. This transformation is represented by a migration table, which is described below.
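The patent does not name the clustering algorithm used in step 502 to separate normal from abnormal behaviour. As a stand-in, the sketch below runs a one-dimensional k-means with k = 2 on a hypothetical per-host feature, the fraction of recent samples flagged as anomalous, and labels the cluster with the higher centroid as abnormal. The feature, the function names and the `min_gap` guard are all assumptions made for illustration.

```python
def two_means(values, max_iter=100):
    """1-D k-means with k=2; returns (low_centroid, high_centroid)."""
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return lo, hi


def split_hosts(anomaly_rate, min_gap=0.2):
    """Split hosts into (normal, abnormal) lists by their anomaly rate."""
    lo, hi = two_means(list(anomaly_rate.values()))
    if hi - lo < min_gap:
        # No clear second group: treat every host as normal
        # (a fleet-wide problem would need a separate absolute threshold).
        return sorted(anomaly_rate), []
    normal = sorted(h for h, v in anomaly_rate.items() if abs(v - lo) <= abs(v - hi))
    abnormal = sorted(h for h, v in anomaly_rate.items() if abs(v - lo) > abs(v - hi))
    return normal, abnormal
```

With made-up anomaly rates of 0.02, 0.05, 0.0, 0.6 and 0.45 for hosts 1 to 5, `split_hosts` returns hosts 1, 2 and 3 as normal and hosts 4 and 5 as abnormal, mirroring the outcome of the Figure 7 example. A multi-feature clustering (CPU, memory and network residuals together) would be closer to the "varying views" the patent mentions.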
Figure 6 shows details of a sequence for selecting a suitable host once any hosts have been identified as a potential failure. A host selection sequence is picked and executed (601). If any changes are made to the network nodes, the selection sequence is revised to reflect this change (602). A migration table is drawn up, and potential hosts for migrating the VMs are selected based on this table (603). Further details on the migration table are described below. If the closest suitable host must be selected from two host destinations with the same number of hops from the source host (604), the method selects the nearest neighbour of the two host destinations to migrate the VM to (605). If no host destinations have the same number of hops from the source, the sequence ends, as the hosts are selected according to the selection sequence based on the migration table. An example of this is described in the following paragraphs.
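The selection sequence of Figure 6 can be sketched as building, for each source host, an ordered list of candidate destinations: fewest hops first, with the nearest neighbour breaking ties (604, 605). The sketch assumes hop counts are already known (for example from traceroute) and uses a hypothetical `distance` table as the physical-proximity measure, since the patent does not define one. Rebuilding the table whenever hop counts change corresponds to revising the sequence in step 602.

```python
def build_migration_table(hops, distance, excluded=frozenset()):
    """Order candidate destination hosts for every source host.

    hops[src][dst]: hop count from src to dst
    distance[src][dst]: hypothetical physical-proximity metric, used only
        to break ties between equal hop counts
    excluded: hosts that should not receive VMs (e.g. flagged abnormal)
    """
    table = {}
    for src, row in hops.items():
        candidates = [dst for dst in row if dst != src and dst not in excluded]
        # Fewest hops first; the nearest neighbour wins when hop counts tie.
        table[src] = sorted(candidates, key=lambda dst: (row[dst], distance[src][dst]))
    return table


def select_destination(table, src):
    """First host in the source's selection sequence, or None if empty."""
    sequence = table.get(src, [])
    return sequence[0] if sequence else None
```

With illustrative (not patent-supplied) values where host 3 is 6 hops from both host 2 and host 5 but physically closer to host 2, `select_destination` returns host 2, which matches the tie-break described for VM3.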
Figure 7 shows an example of a five-cluster anomaly of hosts determined using the method described above. Clusters 1, 4 and 5 have been determined to be normal clusters based on the collected input/output data, but clusters 2 and 3 behave abnormally and contain vague information when neighbourhood clusters are compared with different views, environments and host capabilities. Hosts 1, 2 and 3 fall under normal clusters 1, 4 and 5; however, hosts 4 and 5 fall within abnormal clusters 2 and 3. Therefore, the method provides awareness of a possible failure of hosts 4 and 5, and the necessary steps must be prepared for early VM migration before failure occurs.
An example of an early VM migration is explained herein. Table 1 shows the path distance, in hops, to each destination for migrating VMs, where VM1, VM2 and VM3 are located in host 1, host 2 and host 3 respectively. If host 4 and host 5 have failed, a path to migrate the VMs is prepared based on Table 1. From Table 1, hosts 4 and 5 are the closest nodes, as they take the minimum number of hops to reach. Table 2 shows the migration path based on the nearest host (determined by the smallest number of hops taken to the destination). Hosts 4 and 5 would be omitted or replaced with new nodes or devices in future predictions if early detection shows abnormal activity or behaviour. This information is updated constantly, as the path keeps changing based on network node changes.
A tracert or traceroute command is used to find the number of hops between the source host and the destination host whenever a hop path has changed due to an updated device in a node or because a particular node is down. The method then searches for the nearest host based on the number of hops taken to the destination. For example, in the migration of VM3 shown in Table 2, although the number of hops taken to reach host 2 and host 5 is the same (6 hops), the method selects host 2 because it is physically closer to VM3 (in this example).
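Obtaining the hop count from traceroute or tracert output can be sketched as follows. This is an assumption about how the output might be consumed, not the patent's implementation: both tools print one line per hop that begins with the hop index, so the sketch takes the highest index seen. `count_hops` and `hop_count_to` are hypothetical names; `-n` (Linux traceroute) and `-d` (Windows tracert) only disable reverse DNS lookups.

```python
import platform
import subprocess


def count_hops(trace_output: str):
    """Return the highest hop number in traceroute/tracert output, or None.

    Hop lines start with the hop index; header and footer lines
    ("traceroute to ...", "Tracing route to ...", "Trace complete.") do not.
    Caveat: if the destination never answers, the trailing "* * *" lines
    make this the probe limit rather than a real path length.
    """
    hop = None
    for line in trace_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            hop = max(hop or 0, int(fields[0]))
    return hop


def hop_count_to(host: str):
    """Run the platform's trace command without DNS lookups and count hops."""
    if platform.system() == "Windows":
        cmd = ["tracert", "-d", host]
    else:
        cmd = ["traceroute", "-n", host]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return count_hops(result.stdout)
```

A production version would also confirm that the last hop line actually contains the destination address before trusting the count, and would feed the results into the migration table whenever a node changes.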
It is to be appreciated that the method may be implemented for early failure detection in a data centre before a host becomes unstable. The method is used to migrate VMs out of potentially affected hosts in a failover cluster to a stable host computing system.

Claims

1. A method of migrating virtual machines (300, 400), the method (300, 400) including the steps of:
collecting input/output data of a plurality of hosts over a period of time (301); and
storing the collected data into a time series database for a predetermined period,
characterized by,
building a forecast model using the stored data;
predicting a subsequent input/output data within a margin of error using the forecast model;
comparing actual input/output data with the predicted data;
registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters; and
tracing the anomaly back to a relevant host,
wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) and a host selection sequence determined by closest nodes to a source host.
2. The method (300, 400) as claimed in claim 1, wherein the anomalies are checked for a predetermined period of time (406) and further, anomalous data of hosts are continuously detected (407) until anomalous behaviour returns to normal behaviour in a particular host.
3. The method (300, 400) as claimed in claim 2, wherein the virtual machines are migrated back to the source host once the anomalous behaviour ends (410).
4. The method (300, 400) as claimed in claim 2, wherein the method (300, 400) is used to build a decision tree based on the virtual machine migration and anomalous behaviour detection over time which is used to construct the forecast model.
5. The method (300, 400) as claimed in claim 1, wherein a destination host is selected based on the host selection sequence and the nearest neighbouring host if two hosts are within the same number of hops.
6. The method (300, 400) as claimed in claim 1, wherein the host selection sequence is determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned.
7. The method (300, 400) as claimed in claim 6, wherein the host selection sequence is dynamically revised based on changes in network nodes (602) where a migration table is used for identifying potential hosts for migrating the virtual machines.
PCT/MY2019/050127 2018-12-26 2019-12-24 A method of migrating virtual machines WO2020139072A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2018002920 2018-12-26

Publications (1)

Publication Number Publication Date
WO2020139072A1 true WO2020139072A1 (en) 2020-07-02

Family

ID=71128318

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2019/050127 WO2020139072A1 (en) 2018-12-26 2019-12-24 A method of migrating virtual machines

Country Status (1)

Country Link
WO (1) WO2020139072A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115800272A (en) * 2023-02-06 2023-03-14 国网山东省电力公司东营供电公司 Power grid fault analysis method, system, terminal and medium based on topology identification

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080250265A1 (en) * 2007-04-05 2008-10-09 Shu-Ping Chang Systems and methods for predictive failure management
US20120137293A1 (en) * 2004-05-08 2012-05-31 Bozek James J Dynamic migration of virtual machine computer programs upon satisfaction of conditions
US20130305093A1 (en) * 2012-05-14 2013-11-14 International Business Machines Corporation Problem Determination and Diagnosis in Shared Dynamic Clouds
US20140068608A1 (en) * 2012-09-05 2014-03-06 Cisco Technology, Inc. Dynamic Virtual Machine Consolidation
US20150074023A1 (en) * 2013-09-09 2015-03-12 North Carolina State University Unsupervised behavior learning system and method for predicting performance anomalies in distributed computing infrastructures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19903320

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19903320

Country of ref document: EP

Kind code of ref document: A1