WO2020139072A1 - A method of migrating virtual machines - Google Patents
A method of migrating virtual machines
- Publication number
- WO2020139072A1 (PCT/MY2019/050127)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- host
- hosts
- virtual machines
- anomalous
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/4557—Distribution of virtual machine instances; Migration and load balancing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/815—Virtual
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5019—Workload prediction
Abstract
A method of migrating virtual machines (300, 400) is provided, the method (300, 400) including the steps of collecting input/output data of a plurality of hosts over a period of time (301), storing the collected data into a time series database for a predetermined period, building a forecast model using the stored data, predicting a subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, tracing the anomaly back to a relevant host, wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) using the steps above and a host selection sequence determined by closest nodes to a source host.
Description
A METHOD OF MIGRATING VIRTUAL MACHINES
Field of Invention
The invention relates to a method of migrating virtual machines.
Background
Typically, a virtual computing system consists of host machines and guest virtual machines (VMs). Cloud orchestration platforms such as OpenNebula install agents in the host machines so that these agents continuously send messages to the orchestration platform. If a particular host becomes unresponsive, the agents stop sending these messages to the platform. This indicates to the cloud orchestration platform that the host is down, and the platform then begins virtual machine migration to other hosts.
However, in most cases the VM migration begins after the host is down and this causes the VM migration to fail as the original or source host is no longer available.
US 2017/0315838 A1 describes a system for migrating VMs from a source server to a destination server based on the suitability of the destination server to host the VM that is being migrated. However, this solution is only implemented when the source server is close to failure at a certain threshold of load based on performance metrics. This may mean that the VMs are not efficiently migrated to the new servers or hosts before the source server or host fails.
Therefore, there is a need for efficient migration of VMs to a destination host which overcomes the above issues.
Summary of Invention
In an aspect of the invention, a method of migrating virtual machines is provided, the method including the steps of collecting input/output data of a plurality of hosts over a period of time, storing the collected data into a time series database for a predetermined period, characterized by, building a forecast model using the stored data, predicting a subsequent input/output data within a margin of error using the forecast model, comparing actual input/output data with the predicted data, registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters, tracing the anomaly back to a relevant host, wherein virtual machines located in hosts with anomalous data are advantageously migrated out to other hosts based on analysis of collected data using the steps above and a host selection sequence determined by closest nodes to a source host.
Typically, the anomalies are checked for a predetermined period of time and further, anomalous data of hosts are continuously detected until anomalous behaviour returns to normal behaviour in a particular host.
Typically, the VMs are migrated back to the source host once the anomalous behaviour ends.
In one embodiment, the method is used to build a decision tree based on the VM migration and anomalous behaviour detection over time, which is used to construct a forecast model. Advantageously, a destination host is selected based on the host selection sequence and the nearest neighbouring host if two hosts are within a same number of hops.
Typically, the host selection sequence is determined by transformation if network nodes have failed, undergone any change, or are redesigned.
Advantageously, the host selection sequence is dynamically revised based on changes in network nodes where a migration table is used for identifying potential hosts for migrating the VMs.
In a further aspect of the invention, a virtual machine migration system is provided to be used for migrating virtual machines to a stable host computing system when hosts become unstable.
Typically, the system includes a plurality of VMs hosted by multiple host computing systems in a failover cluster and a management server in communication with the multiple host computing systems through a management network via a plurality of virtual switches.
Advantageously, the management network provides migration networking if a migration process of the VMs is initiated. In one embodiment, the management server includes a machine learning module within a management virtual machine. Typically, the multiple host computing systems and the plurality of VMs include a machine learning agent in each host computing system and further a VM network in communication with at least one client device. Advantageously, the machine learning agents collect all input/output data such as CPU usage data, memory data and networking data and send this data to the machine learning module, wherein the data is then saved in a time series database.
Typically, the machine learning agent builds a forecast model using data in the time series database after a predetermined period. Advantageously, the model is used to predict a subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, this is registered as an anomaly. In one embodiment, a machine learning cluster, which includes a plurality of VMs hosted by multiple host computing systems, is in connection to a management network through a plurality of virtual switches, wherein the machine learning cluster advantageously provides for the machine learning module to detect the above mentioned anomaly in input/output data from the machine learning agent. Typically, this detection initiates migration of a virtual machine from a source host to a destination host.
Brief Description of Drawings
It will be convenient to further describe the present invention with respect to the accompanying drawings that illustrate possible arrangements of the invention. Other arrangements of the invention are possible, and consequently the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description of the invention.
Figure 1 illustrates a block diagram of a virtual machine migration system (100) where virtual machines are hosted by multiple host computing systems in a failover cluster.
Figure 2 illustrates a machine learning cluster in the virtual machine migration system to be used for migrating virtual machines.
Figure 3 illustrates a general flow chart of a method of migrating virtual machines.
Figure 4 illustrates a detailed flow chart of the method of migrating virtual machines.
Figure 5 illustrates a flow chart showing a method of determining potential failure of hosts by creating a cluster anomaly based on data collection.
Figure 6 illustrates a flow chart showing details of a sequence for selection of a suitable host once any hosts have been identified as a potential failure.
Figure 7 illustrates a diagram showing an example of cluster anomaly of normal and abnormal behaviour.
Detailed Description
Figure 1 shows a virtual machine migration system (100) to be used for migrating virtual machines (VMs) to a stable host or server computing system when hosts or servers become unstable. The system (100) includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) in a failover cluster (133). The system (100) further includes a management server (102) in communication with the multiple host computing systems (130A-N) through a management network (106) via a plurality of virtual switches (110A-N). The management network (106) provides migration networking if a migration process of the VMs is initiated. The management server (102) further includes a machine learning module (104) within a management virtual machine (103). The multiple host computing systems (130A-N) and the plurality of VMs (VM1-N) include a machine learning agent (122A-N) in each host computing system (130A-N). Figure 1 also shows a VM network (107) in communication with at least one client device (105).
The machine learning agents (122A-N) collect all input/output data such as CPU usage data, memory data and networking data. This data is then sent to the machine learning module (104), wherein the data is saved in a time series database. After a predetermined period, the machine learning agent (122A-N) builds a forecast model using data in the time series database. The model is then used to predict a subsequent input/output data within a margin of error. Actual time series data is then compared with the model. If the actual data does not fall within the model and margin of error, this is registered as an anomaly.
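The forecast-and-compare step can be sketched as follows. This is a minimal illustration only: the description does not fix the forecasting technique at this point, so a moving average stands in for the forecast model, and the margin of error is assumed to be a multiple of the standard deviation of recent samples. The names `forecast_next`, `is_anomaly`, `window` and `k`, and the sample values, are hypothetical.

```python
from statistics import mean, stdev

def forecast_next(history, window=5):
    """Predict the next sample and a spread from recent history.

    Assumption: a moving average stands in for the forecast model.
    """
    recent = history[-window:]
    return mean(recent), stdev(recent)

def is_anomaly(actual, predicted, spread, k=3.0):
    """Register a sample as an anomaly if it falls outside predicted +/- k * spread."""
    return abs(actual - predicted) > k * spread

# Hypothetical CPU usage samples (percent) reported by one host's agent.
cpu_history = [40.0, 42.0, 41.0, 39.0, 43.0]
predicted, spread = forecast_next(cpu_history)
print(is_anomaly(44.0, predicted, spread))  # sample inside the margin of error
print(is_anomaly(95.0, predicted, spread))  # sample outside the margin of error
```

A wider `k` tolerates more variation before a sample is registered as an anomaly; the appropriate value would depend on the workload.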
Figure 2 shows a machine learning cluster (134) which includes a plurality of VMs (VM1-N) hosted by multiple host computing systems (130A-N) in connection to a management network (106) through a plurality of virtual switches (110A-N). For example, the machine learning cluster (134) provides for the machine learning module (104) to detect the above mentioned anomaly in input/output data from the machine learning agent (122B). In this example, this detection initiates migration of VM1 and VM2 from host 130A to host 130B as depicted in Figure 2. It is to be appreciated that the term machine learning cluster (134) refers to the failover cluster (133), hosts (130A-N) and VMs (VM1-N) from the previous paragraph, which have now incorporated use of the machine learning module (104) to detect the anomalies and initiate migrations of the VMs. Figure 2 shows host 130B, which now includes VM1 to VM4 after migration, in comparison with the failover cluster (133) where VM1 and VM2 were placed in host 130A.
Figure 3 shows a general flow diagram of a method of migrating virtual machines (300). The method (300) aims to predict host availability when a host, VM or failover cluster (133) becomes busy or unstable in a virtual datacentre. The method (300) includes the steps of collection of host information data (301), where the data is collected and sent to the machine learning module (104) to be stored in a time series database. The collected data is analyzed (302) through an algorithm. The output of the data analysis (302) is used to decide on VM migration to a low risk host (303). A suitable host for migration is determined by the analyzed data gathered from the machine learning agents (122A-N) installed in each host.
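As an illustration of the collection step (301), the sketch below keeps per-host samples for a fixed retention period. It is an in-memory stand-in for the time series database, not the storage the method prescribes; the class name `HostMetricStore`, the `retention_seconds` parameter and the sample values are assumptions made for this example.

```python
from collections import deque

class HostMetricStore:
    """In-memory stand-in for the time series database (an assumption)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._samples = {}  # host id -> deque of (timestamp, metrics dict)

    def add(self, host, timestamp, metrics):
        """Store one sample sent by a host's agent and expire old samples."""
        series = self._samples.setdefault(host, deque())
        series.append((timestamp, metrics))
        # Drop samples older than the retention period.
        while series and series[0][0] < timestamp - self.retention_seconds:
            series.popleft()

    def series(self, host, key):
        """Return the retained values of one metric for one host, oldest first."""
        return [metrics[key] for _, metrics in self._samples.get(host, ())]

store = HostMetricStore(retention_seconds=60)
store.add("130A", 0, {"cpu": 35.0})
store.add("130A", 30, {"cpu": 40.0})
store.add("130A", 90, {"cpu": 42.0})  # the sample at t=0 is now out of range
print(store.series("130A", "cpu"))
```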
Figure 4 shows a detailed flow diagram of the method of migrating virtual machines (400). Data that is collected from the machine learning agents (122A-N) is analysed (401). The analysis is performed using a reinforcement learning algorithm and forecasting is done using a Kalman filter technique. Mathematical transformation of the data is performed (402) to build a forecast model such as an ARIMA model. The model is then used to detect anomalies in the collected data by comparing it with actual data from a time series database from the hosts, to determine whether the actual data falls within the model and margin of error. Any unusual anomalies that are detected (403) are traced back to a relevant host. This unusual behaviour of the host is then tagged (404), wherein a decision is made to migrate all the VMs on this tagged host to a new host (405) when the anomaly conditions are fulfilled. These anomalies are checked for a predetermined period of time (406), wherein further abnormal data of hosts is continuously detected (407). VMs on these hosts are then migrated out of the source hosts based on the anomalous pattern to ensure that the VMs are no longer hosted on them (408). This loop runs continuously until anomalous behaviour returns to normal behaviour, and the VMs are migrated back to the source host once the anomalous period ends (410). This method is run continuously to check for anomalies even after the anomalous period ends. The method (400) is used to build a decision tree based on the VM migration and anomalous pattern detection over time, which is used to construct the forecast model for forecasting failure of the hosts as previously explained.
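One possible reading of the Kalman filter forecasting and anomaly tagging steps (401 to 404) is sketched below. It assumes a scalar random-walk model per metric, with hypothetical noise parameters `process_var` and `measurement_var` and a margin of `k` standard deviations of the forecast error. The function names and readings are illustrative and are not taken from the description.

```python
def kalman_anomalies(samples, process_var=1.0, measurement_var=4.0, k=3.0):
    """Return indices of samples that fall outside the one-step forecast band.

    Assumption: a scalar random-walk Kalman filter; k sets the margin of error
    in standard deviations of the forecast error.
    """
    estimate, variance = samples[0], measurement_var
    flagged = []
    for index, actual in enumerate(samples[1:], start=1):
        predicted_var = variance + process_var      # predict step
        forecast_error = actual - estimate          # actual vs predicted data
        band = k * (predicted_var + measurement_var) ** 0.5
        if abs(forecast_error) > band:
            flagged.append(index)                   # register an anomaly
            variance = predicted_var                # skip the update step
            continue
        gain = predicted_var / (predicted_var + measurement_var)
        estimate += gain * forecast_error           # update step
        variance = (1.0 - gain) * predicted_var
    return flagged

def tag_hosts(series_by_host):
    """Trace anomalies back to hosts: return hosts with at least one flagged sample."""
    return sorted(h for h, s in series_by_host.items() if kalman_anomalies(s))

# Hypothetical CPU usage readings (percent) for two hosts.
readings = {
    "130A": [50.0, 51.0, 49.0, 50.0, 90.0, 50.0],
    "130B": [30.0, 31.0, 30.0, 29.0, 30.0, 31.0],
}
print(kalman_anomalies(readings["130A"]))
print(tag_hosts(readings))
```

Skipping the update step for a flagged sample is a design choice in this sketch, so that a single outlier does not pull the estimate towards the anomalous value.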
Figure 5 shows further details of the method of migrating virtual machines by determining potential failure of hosts. The method creates a cluster anomaly based on data collection (501) from the input/output data of hosts. The collected data is differentiated or categorized by normal and abnormal behaviour (502) based on the forecast model built from the earlier data collection. Hosts in both normal and abnormal clusters are checked (503) in order to find suitable hosts with different abilities when considered in varying views, environment and host capabilities. This establishes an early detection of failure (504), where a host selection sequence can be determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned. This transformation is represented by a migration table which is described below.
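The categorization step (502) can be illustrated with a simple grouping of hosts by how often their samples were registered as anomalies. A fixed threshold stands in here for whatever clustering technique is intended, which the description does not specify; `cluster_hosts`, `max_anomaly_ratio` and the flag values are hypothetical.

```python
def cluster_hosts(anomaly_flags, max_anomaly_ratio=0.2):
    """Group hosts into 'normal' and 'abnormal' by their share of anomalous samples.

    Assumption: a fixed ratio threshold stands in for the clustering step.
    """
    clusters = {"normal": [], "abnormal": []}
    for host, flags in sorted(anomaly_flags.items()):
        ratio = sum(flags) / len(flags) if flags else 0.0
        label = "abnormal" if ratio > max_anomaly_ratio else "normal"
        clusters[label].append(host)
    return clusters

# Hypothetical per-sample anomaly flags for five hosts (True = anomaly).
flags = {
    "host1": [False, False, False, False, False],
    "host2": [False, True, False, False, False],
    "host3": [False, False, False, False, False],
    "host4": [True, True, False, True, False],
    "host5": [False, True, True, False, False],
}
print(cluster_hosts(flags))
```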
Figure 6 shows details of a sequence for selection of a suitable host once any hosts have been identified as a potential failure. A host selection sequence is picked and executed (601). If any changes are made to the network nodes, the selection sequence is revised in order to reflect this change (602). A migration table is drawn up and potential hosts for migrating the VMs are selected based on this table (603). Further details on the migration table are described later. If the closest suitable host must be selected from two host destinations with the same number of hops from source host to destination host (604), the method selects the nearest neighbour out of the two host destinations to migrate the VM to (605). If there are no host destinations with the same number of hops from the source, the sequence ends, as the hosts are selected using the selection sequence based on the migration table. An example of this is described further in the following paragraphs.
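The selection sequence (603 to 605) can be sketched as a lookup in a migration table of hop counts, with a tie-break between destinations that are the same number of hops away. The measure used for "nearest neighbour" is assumed here to be a physical distance; the table layout, the function `select_destination` and all values are hypothetical.

```python
def select_destination(source, hop_table, distance_table, excluded=()):
    """Pick the destination with the fewest hops from the source host.

    Ties on hop count are broken by the smaller distance (assumed measure of
    the nearest neighbour). Hosts in `excluded` (e.g. anomalous hosts) are skipped.
    Returns None if no candidate remains.
    """
    candidates = {
        host: hops
        for host, hops in hop_table[source].items()
        if host not in excluded
    }
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda host: (candidates[host], distance_table[source][host]),
    )

# Hypothetical migration table: hops from host3 to each candidate destination.
hop_table = {"host3": {"host1": 8, "host2": 6, "host5": 6}}
# Hypothetical physical distances, used only when hop counts are equal.
distance_table = {"host3": {"host1": 30.0, "host2": 5.0, "host5": 12.0}}

print(select_destination("host3", hop_table, distance_table))
print(select_destination("host3", hop_table, distance_table, excluded={"host2"}))
```

Revising the sequence after a network change (602) would amount to rebuilding `hop_table` and calling the function again.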
Figure 7 shows an example of a 5-cluster anomaly of hosts determined using the method described above. Clusters 1, 4 and 5 have been determined to be normal clusters based on the collected input/output data, but clusters 2 and 3 behave abnormally and contain vague information when comparing neighbourhood clusters with different views, environment and host capabilities. Hosts 1, 2 and 3 fall under the normal clusters 1, 4 and 5; however, hosts 4 and 5 fall within the abnormal clusters 2 and 3. Therefore, the method provides an awareness of a possibility of failure of hosts 4 and 5, and necessary steps must be prepared for early VM migration before failure occurs.
An example of an early VM migration is explained herein. Table 1 shows a distance of path or hop destination for migrating VMs, wherein VM1, VM2 and VM3 are located in host 1, host 2 and host 3 respectively. If host 4 and host 5 have failed, preparation of a path to migrate VMs is done based on Table 1. From Table 1, hosts 4 and 5 are the closest nodes as it takes the minimum number of hops to get to the destination. Table 2 shows the path of migration based on the nearest host (determined by the smallest number of hops taken to the destination). Hosts 4 and 5 would be omitted or replaced with new nodes or devices in future predictions if early detection shows abnormal activity or behaviour. This information is updated constantly as the path keeps changing based on network node changes.
A tracert or traceroute command is used to find the number of hops between the source host and the destination host, in the event that a hop path has changed due to an updated device in a node or if a particular node is down. The method then searches for the nearest host based on the number of hops taken to the destination. For example, in the migration of VM3 as seen in Table 2, although the number of hops taken to reach hosts 2 and 5 is the same, i.e. 6 hops, the method selects host 2 as it is physically closer to VM3 (in this example).
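Counting hops from the text output of such a command can be sketched as follows. The parser only assumes that each hop is printed on a line beginning with its hop number, as traceroute and tracert commonly do; the function `count_hops` and the sample output, including its addresses and timings, are hypothetical, and invoking the command itself is left out.

```python
import re

# A hop line starts with optional spaces, the hop number, then more text.
HOP_LINE = re.compile(r"^\s*(\d+)\s+\S")

def count_hops(traceroute_output):
    """Return the highest hop number found in traceroute-style output."""
    hops = 0
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops = max(hops, int(match.group(1)))
    return hops

# Hypothetical output for a destination three hops away.
sample_output = """\
traceroute to 10.0.2.15 (10.0.2.15), 30 hops max, 60 byte packets
 1  10.0.0.1 (10.0.0.1)  0.412 ms  0.398 ms  0.385 ms
 2  10.0.1.1 (10.0.1.1)  0.911 ms  0.887 ms  0.874 ms
 3  10.0.2.15 (10.0.2.15)  1.203 ms  1.187 ms  1.175 ms
"""
print(count_hops(sample_output))
```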
It is to be appreciated that the method may be implemented for early failure detection in a data centre before a host becomes unstable. The method is used to migrate VMs out of potentially affected hosts in a failover cluster to a stable host computing system.
Claims
1. A method of migrating virtual machines (300, 400), the method (300, 400) including the steps of:
collecting input/output data of a plurality of hosts over a period of time (301); and
storing the collected data into a time series database for a predetermined period,
characterized by,
building a forecast model using the stored data;
predicting a subsequent input/output data within a margin of error using the forecast model;
comparing actual input/output data with the predicted data;
registering data that falls outside the predicted data as an anomaly and clustering data into normal and abnormal data clusters; and
tracing the anomaly back to a relevant host,
wherein the virtual machines located in hosts with anomalous data are migrated out to other hosts (303) based on analysis of collected data (302) and a host selection sequence determined by closest nodes to a source host.
2. The method (300, 400) as claimed in claim 1, wherein the anomalies are checked for a predetermined period of time (406) and further, anomalous data of hosts are continuously detected (407) until anomalous behaviour returns to normal behaviour in a particular host.
3. The method (300, 400) as claimed in claim 2, wherein the virtual machines are migrated back to the source host once the anomalous behaviour ends (410).
4. The method (300, 400) as claimed in claim 2, wherein the method (300, 400) is used to build a decision tree based on the virtual machine migration and anomalous behaviour detection over time which is used to construct the forecast model.
5. The method (300, 400) as claimed in claim 1, wherein a destination host is selected based on the host selection sequence and nearest neighbouring host if two hosts are within a same number of hops.
6. The method (300, 400) as claimed in claim 1, wherein the host selection sequence is determined by transformation (505) if network nodes have failed, undergone any change, or are redesigned.
7. The method (300, 400) as claimed in claim 6, wherein the host selection sequence is dynamically revised based on changes in network nodes (602) where a migration table is used for identifying potential hosts for migrating the virtual machines.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MYPI2018002920 | 2018-12-26 | ||
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020139072A1 (en) | 2020-07-02 |
Family
ID=71128318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/MY2019/050127 WO2020139072A1 (en) | 2018-12-26 | 2019-12-24 | A method of migrating virtual machines |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020139072A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115800272A (en) * | 2023-02-06 | 2023-03-14 | 国网山东省电力公司东营供电公司 | Power grid fault analysis method, system, terminal and medium based on topology identification |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080250265A1 (en) * | 2007-04-05 | 2008-10-09 | Shu-Ping Chang | Systems and methods for predictive failure management |
US20120137293A1 (en) * | 2004-05-08 | 2012-05-31 | Bozek James J | Dynamic migration of virtual machine computer programs upon satisfaction of conditions |
US20130305093A1 (en) * | 2012-05-14 | 2013-11-14 | International Business Machines Corporation | Problem Determination and Diagnosis in Shared Dynamic Clouds |
US20140068608A1 (en) * | 2012-09-05 | 2014-03-06 | Cisco Technology, Inc. | Dynamic Virtual Machine Consolidation |
US20150074023A1 (en) * | 2013-09-09 | 2015-03-12 | North Carolina State University | Unsupervised behavior learning system and method for predicting performance anomalies in distributed computing infrastructures |
-
2019
- 2019-12-24 WO PCT/MY2019/050127 patent/WO2020139072A1/en active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10375169B1 (en) | System and method for automatically triggering the live migration of cloud services and automatically performing the triggered migration | |
US10048996B1 (en) | Predicting infrastructure failures in a data center for hosted service mitigation actions | |
US8862744B2 (en) | Optimizing traffic load in a communications network | |
US11165690B2 (en) | Request routing based on server software versions | |
US10797938B2 (en) | Automatic monitoring, correlation, and resolution of network alarm conditions | |
US9400731B1 (en) | Forecasting server behavior | |
EP3956771B1 (en) | Timeout mode for storage devices | |
JP2018207241A (en) | Management device, management method, and management program | |
AU2021218159B2 (en) | Utilizing machine learning models to determine customer care actions for telecommunications network providers | |
US20230124166A1 (en) | Application programming interface anomaly detection | |
CN108347339A (en) | A kind of service restoration method and device | |
Diallo et al. | AutoMigrate: a framework for developing intelligent, self-managing cloud services with maximum availability | |
CN105635285B (en) | A kind of VM migration scheduling method based on state aware | |
WO2020139072A1 (en) | A method of migrating virtual machines | |
Tuli et al. | Carol: Confidence-aware resilience model for edge federations | |
CN109818785A (en) | A kind of data processing method, server cluster and storage medium | |
US20230094964A1 (en) | Dynamic management of locations of modules of a platform hosted by a distributed system | |
De Grande et al. | Dynamic partitioning of distributed virtual simulations for reducing communication load | |
CN113596146B (en) | Resource scheduling method and device based on big data | |
WO2023154051A1 (en) | Determining root causes of anomalies in services | |
Inoue et al. | Noise-induced VNE method for software-defined infrastructure with uncertain delay behaviors | |
US11315693B2 (en) | Method and system for managing operation associated with an object on IoT enabled devices | |
CN104883273A (en) | Method and system for processing service influence model in virtualized service management platform | |
KR102467522B1 (en) | High Availability System of Global Sharing Virtualization Resource for Cloud Infrastructure | |
CN113038488B (en) | Link planning method and device for network slice, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19903320; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 19903320; Country of ref document: EP; Kind code of ref document: A1 |