CN103440160A - Virtual machine recovering method and virtual machine migration method, device and system

Info

Publication number: CN103440160A
Application number: CN2013103566539A
Authority: CN
Inventors: 刘力力; 于璠
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2013-08-15
Filing date: 2013-08-15
Publication date: 2013-12-11
Anticipated expiration: 2033-08-15
Also published as: CN103440160B

Abstract

The invention discloses a virtual machine recovery method applied to a virtualized cluster system. In the method, a management node predicts whether the computing nodes, and the virtual machines running on them, are about to fail. If the management node predicts that at least one virtual machine on a first computing node is about to fail, it sends a first snapshot command to the first computing node, so that the first computing node takes a snapshot of each virtual machine that is about to fail. If the management node predicts that a second computing node is about to fail, it sends a second snapshot command to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on it. Embodiments of the invention further provide a corresponding device and system, as well as a virtual machine migration method, device and system. The virtual machine recovery method can significantly shorten virtual machine recovery time and helps ensure service continuity.

Description

Virtual machine recovery method, virtual machine migration method, device and system

Technical field

The present invention relates to the field of communication technologies, and in particular to a virtual machine recovery method, a virtual machine migration method, and corresponding devices and systems.

Background

With the rapid development of computer technology, more and more companies and research institutions have begun to focus on how to reduce energy consumption and improve resource utilization, and cloud computing is a key computing model for doing so. Cloud computing abstracts all computers into computing resources and offers those resources to users, rather than directly providing one or more computers as before. The greatest benefit of the cloud computing model is that users can request resources according to their own needs, which avoids unnecessary waste and improves resource utilization.

Server virtualization is a key technology at the infrastructure layer of cloud computing. By virtualizing a physical server (hereinafter referred to as a server), it allows multiple virtual machines to be deployed on a single physical node, which improves the resource utilization of the server and reduces cost.

As demand for computing power keeps growing, the computing power that a single virtualized server can provide no longer meets customers' needs. Virtualized clusters emerged in response. A virtualized cluster combines multiple virtualized servers into one organic whole, thereby increasing the computing power of the virtualization system as a whole. A virtualized cluster manages multiple servers in a unified manner, uses virtualization technology to abstract physical resources into resource pools of storage, computing, network and other resources, and provides virtual machines to users on demand.

To improve the service continuity of a virtualized cluster system, so that when a virtual machine running a service fails, the service can be restored in time and the interruption kept as short as possible, a virtualized cluster usually supports High Availability (HA) technology. HA technology periodically monitors the running status of virtual machines and, when a virtual machine fails, recovers the failed virtual machine in time to ensure the continuity of the services it runs.

A virtualized cluster with HA enabled is called an HA cluster. In an HA cluster, every server is configured with an HA agent, and the HA agent continuously checks the state of the other servers. The HA agent periodically sends heartbeat signals to the other servers; if a server fails to respond to three consecutive heartbeat signals, the HA agent considers that the server may have failed and reports the corresponding fault. The HA agent then restarts all virtual machines of the failed server on other servers in the HA cluster to restore their services, which makes the virtual machine failure transparent to the customer and thereby ensures service continuity.
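The heartbeat rule described above can be illustrated with a minimal sketch. The class name `HeartbeatMonitor`, the method `record` and the constant `MAX_MISSED` are hypothetical names introduced only for illustration; the text specifies only that three consecutive unanswered heartbeats mark a server as possibly failed.

```python
from dataclasses import dataclass, field
from typing import Dict

# The text's "three consecutive" unanswered heartbeats.
MAX_MISSED = 3


@dataclass
class HeartbeatMonitor:
    """Hypothetical HA-agent helper that counts consecutive missed heartbeats."""

    missed: Dict[str, int] = field(default_factory=dict)

    def record(self, server: str, responded: bool) -> bool:
        """Record one heartbeat result; return True if the server is now suspected failed."""
        if responded:
            # Any response resets the run of consecutive misses.
            self.missed[server] = 0
            return False
        self.missed[server] = self.missed.get(server, 0) + 1
        return self.missed[server] >= MAX_MISSED
```

In this sketch a single successful response clears the counter, so only an unbroken run of misses triggers the fault report.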

In researching and practicing the prior art, the inventors found that recovering the virtual machines of a failed server by restarting them on other servers requires operations such as initializing the operating system, and therefore generally needs a long recovery time. Moreover, after recovery, most of the information of the services that the virtual machine was running has been lost, so those services must be re-executed, and service continuity cannot be effectively guaranteed.

Summary of the invention

Embodiments of the present invention provide a virtual machine recovery method, device and system, which address, at least to some extent, the problems that existing virtual machine recovery methods need a long recovery time and cannot guarantee service continuity.

Embodiments of the present invention also provide a virtual machine migration method, device and system, to shorten the time needed to migrate and restore a virtual machine.

A first aspect of the present invention provides a virtual machine recovery method applied to a virtualized cluster system. The virtualized cluster system comprises a management node and at least one computing node; each computing node comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine running on the Host. The method comprises: the management node predicts whether each computing node, and each virtual machine running on each computing node, is about to fail; if it predicts that at least one virtual machine on a first computing node among the at least one computing node is about to fail, it sends a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that is about to fail; if it predicts that a second computing node among the at least one computing node is about to fail, it sends a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on the second computing node.

In a first possible implementation, before the management node predicts whether each computing node and each virtual machine running on each computing node is about to fail, the method further comprises: obtaining the historical fault information of each computing node and of each virtual machine running on each computing node; and obtaining the state and the input/output (IO) performance data of each virtual machine and the ambient temperature of each computing node.

With reference to the first possible implementation of the first aspect, in a second possible implementation, the management node predicting whether each computing node and each virtual machine running on each computing node is about to fail comprises: determining, for each virtual machine, a first multi-tuple representing a suspected fault, whose elements comprise the state of the virtual machine, the IO performance data of the virtual machine, and a failure rate calculated from the historical fault information of the virtual machine; determining, for each virtual machine, a second multi-tuple representing a fault, which comprises thresholds corresponding respectively to the elements of the first multi-tuple; and selecting a virtual machine and judging whether the elements of its first multi-tuple fall within the threshold ranges given by its second multi-tuple, and if so, predicting that the selected virtual machine is about to fail.
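The tuple comparison in this implementation can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the state is encoded, how a threshold defines a range, or whether every element must fall within its range. The sketch assumes numeric elements, an inclusive `(low, high)` fault range per element, and that all elements must fall within range; the function name `predicts_fault` and all values are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed encoding: each threshold is an inclusive (low, high) fault range.
Range = Tuple[float, float]


def predicts_fault(observed: Sequence[float], fault_ranges: Sequence[Range]) -> bool:
    """Return True if every observed element lies inside its fault range.

    observed     -- the "first multi-tuple" (suspected fault), e.g.
                    (state code, IO performance figure, failure rate)
    fault_ranges -- the "second multi-tuple" (fault thresholds), one range per element
    """
    if len(observed) != len(fault_ranges):
        raise ValueError("both tuples must have the same number of elements")
    return all(low <= value <= high
               for value, (low, high) in zip(observed, fault_ranges))


# Hypothetical values: state code 1.0 = degraded, IO figure in MB/s, failure rate in [0, 1].
FAULT_RANGES = ((1.0, 1.0), (0.0, 5.0), (0.3, 1.0))
suspect_vm = (1.0, 4.0, 0.35)
healthy_vm = (0.0, 80.0, 0.02)
```

With these hypothetical numbers, `predicts_fault(suspect_vm, FAULT_RANGES)` is true and `predicts_fault(healthy_vm, FAULT_RANGES)` is false. The same comparison would apply to the computing-node tuples, with ambient temperature and node failure rate as elements.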

With reference to the first possible implementation of the first aspect, in a third possible implementation, the management node predicting whether each computing node and each virtual machine running on each computing node is about to fail comprises: determining, for each computing node, a third multi-tuple representing a suspected fault, whose elements comprise the ambient temperature of the computing node and a failure rate calculated from the historical fault information of the computing node; determining, for each computing node, a fourth multi-tuple representing a fault, which comprises thresholds corresponding respectively to the elements of the third multi-tuple; and selecting a computing node and judging whether the elements of its third multi-tuple fall within the threshold ranges given by its fourth multi-tuple, and if so, predicting that the selected computing node is about to fail.

A second aspect of the present invention provides a virtual machine recovery method applied to a virtualized cluster system. The virtualized cluster system comprises a management node and at least one computing node; each computing node comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host. The method comprises: the computing node receives a first or second snapshot indication sent by the management node, where the first snapshot indication indicates that at least one virtual machine running on the computing node is about to fail, and the second snapshot indication indicates that the computing node is about to fail; according to the first snapshot indication, the computing node takes a snapshot of the virtual machine that is about to fail, or, according to the second snapshot indication, takes a snapshot of all virtual machines running on the computing node, and saves the corresponding snapshot file; the computing node detects whether any virtual machine on it has failed, and detects whether any other computing node has failed; when it detects that a virtual machine on the computing node has actually failed, it obtains the snapshot file corresponding to the failed virtual machine and recovers the failed virtual machine; when it detects that another computing node has actually failed, it obtains the snapshot files corresponding to the failed computing node and recovers all virtual machines of the failed computing node.

In a first possible implementation, the method further comprises: obtaining the fault information of the failed virtual machine, or obtaining the fault information of the other, failed computing node, and reporting it to the management node.

A third aspect of the present invention provides a management node applied to a virtualized cluster system, where the virtualized cluster system comprises the management node and at least one computing node. A virtual machine fault prediction module is deployed on the management node. The virtual machine fault prediction module comprises: a fault prediction unit, configured to predict whether each computing node of the virtualized cluster system, and each virtual machine running on each computing node, is about to fail; and an indication sending unit, configured to: if the fault prediction unit predicts that at least one virtual machine on a first computing node among the at least one computing node is about to fail, send a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that is about to fail; and if the fault prediction unit predicts that a second computing node among the at least one computing node is about to fail, send a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on the second computing node.

In a first possible implementation, the virtual machine fault prediction module further comprises: an obtaining unit, configured to obtain the historical fault information of each computing node and of each virtual machine running on each computing node, and to obtain the state and the input/output (IO) performance data of each virtual machine and the ambient temperature of each computing node.

With reference to the first possible implementation of the third aspect, in a second possible implementation, the fault prediction unit comprises: a first construction unit, configured to determine, for each virtual machine, a first multi-tuple representing a suspected fault, whose elements comprise the state of the virtual machine, the IO performance data of the virtual machine, and a failure rate calculated from the historical fault information of the virtual machine; a second construction unit, configured to determine, for each virtual machine, a second multi-tuple representing a fault, which comprises thresholds corresponding respectively to the elements of the first multi-tuple; and a first prediction unit, configured to select a virtual machine and judge whether the elements of its first multi-tuple fall within the threshold ranges given by its second multi-tuple, and if so, predict that the selected virtual machine is about to fail.

With reference to the first possible implementation of the third aspect, in a third possible implementation, the fault prediction unit comprises: a third construction unit, configured to determine, for each computing node, a third multi-tuple representing a suspected fault, whose elements comprise the ambient temperature of the computing node and a failure rate calculated from the historical fault information of the computing node; a fourth construction unit, configured to determine, for each computing node, a fourth multi-tuple representing a fault, which comprises thresholds corresponding respectively to the elements of the third multi-tuple; and a second prediction unit, configured to select a computing node and judge whether the elements of its third multi-tuple fall within the threshold ranges given by its fourth multi-tuple, and if so, predict that the selected computing node is about to fail.

A fourth aspect of the present invention provides a computing node applied to a virtualized cluster system. The computing node comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host. A virtual machine dynamic snapshot module, a virtual machine fault detection module and a virtual machine fault recovery module are further deployed on the Host. The virtual machine dynamic snapshot module is configured to receive a first or second snapshot indication sent by the management node, where the first snapshot indication indicates that at least one virtual machine running on the computing node is about to fail, and the second snapshot indication indicates that the computing node is about to fail; and, according to the first snapshot indication, to take a snapshot of the virtual machine that is about to fail, or, according to the second snapshot indication, to take a snapshot of all virtual machines running on the computing node, and to save the corresponding snapshot file. The virtual machine fault detection module is configured to detect whether any virtual machine on the computing node has failed, and to detect whether any other computing node has failed. The virtual machine fault recovery module is configured to: when the virtual machine fault detection module detects that a virtual machine on the computing node has actually failed, obtain the snapshot file corresponding to the failed virtual machine and recover the failed virtual machine; and when another computing node has actually failed, obtain the snapshot files of the failed computing node and recover all virtual machines of the failed computing node.

In a first possible implementation, the virtual machine fault detection module is further configured to obtain the fault information of the failed virtual machine, or the fault information of the other, failed computing node, and report it to the management node.

A fifth aspect of the present invention provides a virtualized cluster system comprising a management node and at least one computing node, where the management node is the management node according to the third aspect of the present invention, and the computing node is the computing node according to the fourth aspect of the present invention.

A sixth aspect of the present invention provides a virtual machine migration method applied to a virtualized cluster system. The virtualized cluster system comprises at least two computing nodes, which include a first computing node and a second computing node; each computing node comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host. The method comprises: the first computing node takes a snapshot of a first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file; after the snapshot is complete, the first computing node suspends the first virtual machine; the first computing node sends to the second computing node the first startup information needed to start the first virtual machine, so that the second computing node restores the first virtual machine on the second computing node according to the first startup information and the snapshot file, and the migrated first virtual machine is then hosted on the second computing node.
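The three migration steps (snapshot, suspend, send startup information) can be illustrated with a small in-memory sketch. The classes `VM` and `Node`, the dictionary standing in for snapshot storage that both nodes can read, and the fields of `startup_info` are all assumptions made for illustration; the text does not define the contents of the startup information or how the snapshot file reaches the second node.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VM:
    name: str
    memory: dict                 # stand-in for the VM's memory and disk state
    state: str = "running"


@dataclass
class Node:
    name: str
    snapshots: Dict[str, dict]   # assumed: snapshot storage readable by both nodes
    vms: Dict[str, VM] = field(default_factory=dict)

    def migrate_out(self, vm_name: str, target: "Node") -> None:
        vm = self.vms[vm_name]
        # Step 1: snapshot the VM to be migrated and save the snapshot file.
        self.snapshots[vm_name] = dict(vm.memory)
        # Step 2: suspend the VM only after the snapshot is complete.
        vm.state = "suspended"
        # Step 3: send the startup information to the target node (fields are hypothetical).
        startup_info = {"vm": vm_name, "snapshot_key": vm_name}
        target.restore(startup_info)

    def restore(self, startup_info: dict) -> None:
        # The target restores the VM from the startup information plus the snapshot file.
        snapshot = self.snapshots[startup_info["snapshot_key"]]
        self.vms[startup_info["vm"]] = VM(startup_info["vm"], dict(snapshot))


storage: Dict[str, dict] = {}
first = Node("node-1", storage, {"vm1": VM("vm1", {"counter": 7})})
second = Node("node-2", storage)
first.migrate_out("vm1", second)
```

After `migrate_out`, `second.vms["vm1"]` is running with the snapshotted state, while the original on `first` remains suspended; what happens to the suspended original is not covered by this sketch.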

In a first possible implementation, the at least two computing nodes further include a third computing node, and the method further comprises: the first computing node receives second startup information sent by the third computing node, where the second startup information is the information needed to start a second virtual machine to be migrated, which is hosted on the third computing node; and the first computing node restores the second virtual machine according to the second startup information and the corresponding snapshot file, so that the migrated second virtual machine is hosted on the first computing node.

With reference to the sixth aspect or the first possible implementation of the sixth aspect, in a second possible implementation, the virtualized cluster system further comprises a DRS, and before the first computing node takes the snapshot of the first virtual machine to be migrated on the first computing node, the method further comprises: the first computing node receives a migration command sent by the DRS for migrating the first virtual machine from the first computing node to the second computing node. The first computing node taking a snapshot of the first virtual machine to be migrated on the first computing node and saving the snapshot file then comprises: in response to the migration command, the first computing node takes a snapshot of the first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file.

A seventh aspect of the present invention provides a computing node applied to a virtualized cluster system. The computing node comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host. A virtual machine dynamic snapshot module and a virtual machine information processing module are further deployed on the Host. The virtual machine dynamic snapshot module is configured to take a snapshot of a first virtual machine to be migrated, which is hosted on the computing node, and to save the corresponding snapshot file. The virtual machine information processing module is configured to suspend the first virtual machine after the snapshot is complete, and to send to a second computing node the startup information needed to start the first virtual machine, so that the second computing node restores the first virtual machine on the second computing node according to the startup information and the snapshot file, and the migrated first virtual machine is then hosted on the second computing node.

In a first possible implementation, a virtual machine restart recovery module is also deployed on the Host. The virtual machine information processing module is further configured to receive second startup information sent by a third computing node, where the second startup information is the information needed to start a second virtual machine to be migrated, which is hosted on the third computing node. The virtual machine restart recovery module is configured to restore the second virtual machine according to the second startup information and the corresponding snapshot file, so that the migrated second virtual machine is hosted on the first computing node.

An eighth aspect of the present invention provides a virtualized cluster system comprising at least two computing nodes, at least one of which is the computing node according to the seventh aspect of the present invention.

In the virtual machine recovery method provided by the embodiments of the present invention, the management node performs fault prediction on the computing nodes and on the virtual machines running on them; if a virtual machine or a computing node is predicted to be about to fail, the management node instructs the computing node to take a snapshot of the virtual machine that is about to fail, or of all virtual machines on the computing node that is about to fail. As a result, the virtualized cluster system can decide on its own when to take a snapshot of a virtual machine, without manual intervention, and the snapshot is taken when the virtual machine is about to fail but has not yet failed. When the virtual machine actually fails, the computing node can therefore recover it quickly from the snapshot file. Compared with the conventional approach of restarting the virtual machine, the method avoids operations such as initializing the operating system and can significantly shorten virtual machine recovery time; and because the service is restored directly to its state just before the failure and need not be re-executed, service continuity can be guaranteed.

The virtual machine migration method provided by the embodiments of the present invention uses snapshot technology to migrate virtual machines. It can shorten the time needed to migrate and restore a virtual machine, and can also reduce the computing and network resources consumed during migration, which avoids increasing the load on the computing nodes and in turn avoids degrading the working environment of the whole virtualized cluster system.

Brief description of the drawings

Fig. 1 is a schematic diagram of a virtualized cluster system provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a virtual machine recovery method provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the four modules involved in the method of an embodiment of the present invention;

Fig. 4 is a flowchart of a virtual machine recovery method provided by another embodiment of the present invention;

Fig. 5 is a schematic diagram of a management node provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of a computing node provided by an embodiment of the present invention;

Fig. 7 is a flowchart of a virtual machine migration method provided by an embodiment of the present invention;

Fig. 8 is a schematic diagram of a computing node provided by another embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention provide a virtual machine recovery method, device and system, which address, at least to some extent, the problems that existing virtual machine recovery methods need a long recovery time and cannot guarantee service continuity. Embodiments of the present invention also provide a virtual machine migration method, device and system, to shorten the time needed to migrate and restore a virtual machine.

To help a person skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

To facilitate understanding of the embodiments of the present invention, several elements that appear in the description of the embodiments are introduced first.

Virtual machine (Virtual Machine, VM):

Virtual machine software can simulate one or more virtual computers on a single physical computer. These virtual machines work just like real computers: an operating system and application programs can be installed on a virtual machine, and a virtual machine can also access network resources. To an application program running in a virtual machine, the virtual machine appears to be a real computer.

Hardware layer:

The hardware platform on which the virtualized environment runs. The hardware layer may comprise various kinds of hardware. For example, the hardware layer of a computing node may comprise a processor (for example, a CPU) and memory (for example, RAM), and may also comprise high-speed or low-speed input/output (I/O) devices such as network cards and storage devices, as well as other devices with specific processing functions, such as an input/output memory management unit (IOMMU), which can be used to translate between virtual machine physical addresses and Host physical addresses.

Host machine (Host):

As the management layer, the Host manages and allocates hardware resources, presents a virtual hardware platform to the virtual machines, and implements scheduling and isolation of the virtual machines. The Host may be a virtual machine monitor (VMM); alternatively, the VMM sometimes cooperates with one privileged virtual machine, and the two together form the Host. The virtual hardware platform provides various hardware resources to each virtual machine running on it, such as virtual CPUs, memory, virtual disks and virtual network cards. A virtual disk may correspond to a file or a logical block device of the Host. A virtual machine runs on the virtual hardware platform that the Host prepares for it, and one or more virtual machines run on the Host.

Embodiment 1

This embodiment of the present invention provides a virtual machine recovery method. The method is applied to a virtualized cluster system.

Referring to Fig. 1, the virtualized cluster system comprises a management node 210 and at least one computing node 220. Each computing node 220 comprises a hardware layer, a host machine (Host) running on the hardware layer, and at least one virtual machine running on the Host. Preferably or optionally, the at least one computing node 220 in the virtualized cluster system may use shared storage, and the shared storage is used to store snapshot files.

Referring to Fig. 2, the method comprises:

101. The management node predicts whether each computing node, and each virtual machine running on each computing node, is about to fail.

102. If it predicts that at least one virtual machine on a first computing node among the at least one computing node is about to fail, the management node sends a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that is about to fail.

103. If it predicts that a second computing node among the at least one computing node is about to fail, the management node sends a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on the second computing node.
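Steps 102 and 103 amount to turning prediction results into per-node snapshot indications, which can be sketched as below. The names `SnapshotIndication` and `plan_indications` are hypothetical. The sketch also assumes that when a whole node is predicted to fail, only the second indication is sent for that node, since it already covers every virtual machine on it; the text does not state this precedence.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SnapshotIndication:
    node: str
    kind: str                 # "first": specific VMs at risk; "second": whole node at risk
    vms: Tuple[str, ...]      # VMs the receiving node should snapshot


def plan_indications(vms_by_node: Dict[str, List[str]],
                     failing_vms: Set[str],
                     failing_nodes: Set[str]) -> List[SnapshotIndication]:
    """Map prediction results (step 101) to snapshot indications (steps 102 and 103)."""
    plan = []
    for node, vms in vms_by_node.items():
        if node in failing_nodes:
            # Step 103: the node itself is at risk, so all of its VMs are snapshotted.
            plan.append(SnapshotIndication(node, "second", tuple(vms)))
            continue
        at_risk = tuple(vm for vm in vms if vm in failing_vms)
        if at_risk:
            # Step 102: only the VMs predicted to fail are snapshotted.
            plan.append(SnapshotIndication(node, "first", at_risk))
    return plan


cluster = {"n1": ["a", "b"], "n2": ["c", "d"], "n3": ["e"]}
plan = plan_indications(cluster, failing_vms={"a"}, failing_nodes={"n2"})
```

Here `plan` holds a first indication for node `n1` covering VM `a` and a second indication for node `n2` covering VMs `c` and `d`; node `n3` receives nothing.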

Optionally, before step 101, the method may further comprise: obtaining the historical fault information of each computing node and of each virtual machine running on each computing node; and obtaining the state and the input/output (IO) performance data of each virtual machine and the ambient temperature of each computing node. In that case, in step 101, fault prediction can be performed according to the historical fault information, the state and IO performance data of each virtual machine, and the ambient temperature of each computing node.

In one implementation, a virtual machine fault prediction module may be deployed on the management node 210; specifically, this module may be deployed in the management system running on the hardware layer of the management node 210. A virtual machine fault detection module, a virtual machine dynamic snapshot module and a virtual machine fault recovery module may be deployed on each computing node 220; specifically, these three modules may be deployed on the Host running on the computing node. The steps of the method of this embodiment may be carried out by the virtual machine fault prediction module, the virtual machine fault detection module, the virtual machine dynamic snapshot module and the virtual machine fault recovery module. Fig. 3 shows the relationships among the virtual machine fault prediction module 110, the virtual machine dynamic snapshot module 120, the virtual machine fault detection module 130 and the virtual machine fault recovery module 140. These four modules are described in detail below.

The virtual machine fault prediction module 110 is deployed on the management node of the virtualized cluster system. Steps 101, 102, and 103 above can all be performed by this module. The module 110 can obtain, for example by periodic detection, the state and IO performance data of each virtual machine in the cluster and the ambient temperature of each computing node. The IO performance data specifically includes the read rate and the write rate. The module 110 can receive the fault information of virtual machines and of computing nodes reported by the computing nodes, and save the received fault information as historical fault information. The module 110 can predict whether a virtual machine will fail on the basis of the state and IO performance data of the virtual machine and its historical fault information, and can predict whether a computing node will fail on the basis of the ambient temperature of the computing node and its historical fault information. The prediction method is predefined; different prediction methods can be preset for different application scenarios, as described in detail below.

The virtual machine dynamic snapshot module 120 is deployed on a computing node of the virtualized cluster system. The module 120 can obtain the snapshot indication, which carries the fault prediction result, sent by the virtual machine fault prediction module 110. The snapshot indication is either a first snapshot indication, which indicates that at least one virtual machine running on the computing node will fail, or a second snapshot indication, which indicates that the computing node itself will fail. When one or more virtual machines on a computing node are predicted to fail, the module 120 on that computing node can take a snapshot of those virtual machines within a predefined short period. When the computing node itself is predicted to fail, the module 120 on that computing node can take a snapshot of all virtual machines on the node within a predefined short period. The snapshot file generated by a snapshot stores all memory and disk information of the virtual machine; in other words, it contains the current running information of the virtual machine. The snapshot files generated by each computing node are kept in shared storage, which every computing node can access freely, so each computing node can easily obtain the snapshot files generated by any other computing node.

Virtual machine dynamic snapshots are of two kinds: full snapshots and incremental snapshots. A full snapshot saves all dynamic information of the virtual machine but takes a relatively long time; an incremental snapshot saves only the changes relative to the previous full snapshot and is therefore fast. When the virtual machine dynamic snapshot module receives the first snapshot indication, it first takes one full snapshot of the virtual machine that is predicted to fail. If incremental snapshots have been configured in advance, or the management node judges them to be necessary, then after the full snapshot an incremental snapshot is taken at regular intervals, for example every 30 s, so that the running information of the virtual machine is saved as close as possible to the moment of failure and the virtual machine can later be recovered to the state nearest the fault. Once a preset period has elapsed since the full snapshot, the high-risk period can be considered over, and snapshots can be stopped. If a computing node is predicted to fail, all virtual machines on that node must be snapshotted; because the snapshot workload is large, it is sufficient to take a single full snapshot of all virtual machines on the node, although this embodiment does not exclude subsequent incremental snapshots.

The virtual machine fault detection module 130 is deployed on a computing node of the virtualized cluster system. On the one hand, the module 130 periodically checks the state of all virtual machines on its own computing node to determine whether any virtual machine has failed. When it confirms that a virtual machine has actually failed, it sends the fault information of that virtual machine to the virtual machine fault prediction module 110 as historical fault information and, at the same time, instructs the virtual machine fault recovery module 140 to recover that virtual machine. On the other hand, the module 130 can periodically check all other computing nodes to determine whether any of them has failed. When it confirms that another computing node has actually failed, it sends the fault information of that computing node to the virtual machine fault prediction module 110 as historical fault information and, at the same time, instructs the virtual machine fault recovery module 140 to recover, on its own computing node, all virtual machines of the failed computing node.

The virtual machine fault recovery module 140 is deployed on a computing node of the virtualized cluster system. After receiving a recovery indication from the virtual machine fault detection module 130, the module 140 recovers the failed virtual machine, or all virtual machines on the failed computing node. A virtual machine is recovered as follows: first, obtain the snapshot file that the virtual machine dynamic snapshot module 120 generated by snapshotting the virtual machine before the fault occurred; then quickly recover the virtual machine from the running information stored in the snapshot file. If a virtual machine on this computing node has failed, only that virtual machine is recovered; if another computing node has failed, all virtual machines on the failed computing node are recovered.
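As a non-limiting illustration, the selection of snapshot files for recovery might be sketched as follows. The description does not state how incremental snapshots are combined during recovery; the sketch assumes that the most recent full snapshot is restored and that every later incremental snapshot is then applied in time order. The function name restore_chain and the (timestamp, kind) records are hypothetical.

```python
def restore_chain(snapshots):
    """Return the snapshot records needed to restore the latest saved state.

    snapshots: iterable of (timestamp_seconds, kind) for ONE virtual machine,
    where kind is "full" or "incremental" (hypothetical record layout).
    Assumption: restore the newest full snapshot, then apply every
    incremental snapshot taken after it, oldest first.
    """
    ordered = sorted(snapshots)  # sort by timestamp
    full_positions = [i for i, (_, kind) in enumerate(ordered) if kind == "full"]
    if not full_positions:
        raise ValueError("no full snapshot available for this virtual machine")
    return ordered[full_positions[-1]:]
```

Under this assumption, a virtual machine with a full snapshot at 0 s and incremental snapshots at 30 s and 60 s would be restored from all three files, in that order.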

In practice, the virtual machine fault prediction module 110 can perform fault prediction on virtual machines and computing nodes according to a predetermined prediction algorithm. To make the prediction algorithm more targeted, the results more accurate, and the prediction faster, the embodiment of the present invention divides virtual machine faults into two broad classes and defines a different prediction algorithm for each. The first class is the fault of a single virtual machine, which is usually caused by a failed access to critical data of the virtual machine while it is running, and may also be called a fault of the virtual machine itself. The second class is a fault of the computing node hosting the virtual machine, which is usually caused by a hardware problem after the computing node has run under high load for a long time, for example an excessively high central processing unit (CPU) temperature. The prediction algorithms for the two classes are described below:

For the fault of a single virtual machine:

Because faults of the virtual machine itself are usually related to modifications of the virtual machine and to the virtual machine's access to its image, in the embodiment of the present invention this class of fault can be predicted by monitoring the input/output (IO) performance data of the virtual machine, combined with the state information and historical fault information of the virtual machine.

First, a set of tuples representing all suspected faults can be constructed in the virtual machine fault prediction module; that is, a first tuple representing a suspected fault is determined for each virtual machine, and the elements of the first tuple include the state of the virtual machine, its IO performance data, and a failure rate calculated from the historical fault information of the virtual machine. For example, the set of first tuples can be expressed as S_vm = {<Status, <DiskIO, NetWorkIO>, Rate>}, where Status is the state of the virtual machine, DiskIO is the disk IO performance of the virtual machine, NetWorkIO is the network IO performance of the virtual machine, and Rate is the failure rate, that is, the probability, obtained from the historical fault information detected by the virtual machine fault detection module, that the virtual machine fails in this state.

Second, from the set S_vm a set F_vm of fault tuples can be defined; that is, a second tuple is determined for each virtual machine, and the second tuple contains the thresholds corresponding to the elements of the first tuple. Because performance data itself fluctuates, the present invention adds a variation range ΔP to the performance data when judging whether it falls within a fault tuple. For example, the set of second tuples can be expressed as F_vm = {<Status, <DiskIO ± ΔP, NetWorkIO ± ΔP>> | r > RATEFail}, where r is the Rate value associated with <Status, <DiskIO, NetWorkIO>> in S_vm. RATEFail is a user-defined threshold: a virtual machine whose failure probability r exceeds it is considered about to fail. Each element of F_vm is therefore a range; when the state and IO performance data of a virtual machine, together with the failure rate under the current state, fall within F_vm, the virtual machine is considered likely to fail.
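As a non-limiting illustration, the construction of F_vm from S_vm and the range test with ΔP might be sketched as follows. The class and function names, the field names, and the numeric values are hypothetical; the sketch also assumes that a single ΔP applies to both the disk and the network IO values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuspectTuple:
    """One element of S_vm: <Status, <DiskIO, NetWorkIO>, Rate>."""
    status: str
    disk_io: float
    network_io: float
    rate: float


def build_fault_set(s_vm, rate_fail):
    """F_vm: the tuples of S_vm whose Rate exceeds RATEFail."""
    return [s for s in s_vm if s.rate > rate_fail]


def in_fault_set(status, disk_io, network_io, f_vm, delta_p):
    """True if the observation has the same Status as some fault tuple and
    both IO values lie within +/- delta_p of that tuple's values."""
    return any(
        status == f.status
        and abs(disk_io - f.disk_io) <= delta_p
        and abs(network_io - f.network_io) <= delta_p
        for f in f_vm
    )


# Illustrative values only.
s_vm = [
    SuspectTuple("running", 120.0, 40.0, 0.7),
    SuspectTuple("running", 300.0, 90.0, 0.1),
]
f_vm = build_fault_set(s_vm, rate_fail=0.5)
print(in_fault_set("running", 118.0, 43.0, f_vm, delta_p=5.0))  # True
```

Storing F_vm as exact tuples and applying ΔP at lookup time keeps the set small; expanding each tuple into an explicit range would be an equivalent design.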

The fault prediction performed by the virtual machine fault prediction module specifically includes: selecting a virtual machine and judging whether the elements of its first tuple fall within the threshold range of its second tuple; if so, predicting that the selected virtual machine will fail; then selecting the next virtual machine and predicting it, until all virtual machines have been predicted. The virtual machines can be predicted one after another, or several can be predicted simultaneously. For example, the state and IO performance data of each virtual machine in the cluster can be checked periodically; if the first tuple of a virtual machine is found in the set F_vm, the virtual machine may fail, and a snapshot indication is sent to the virtual machine dynamic snapshot module, which takes a snapshot of the virtual machine as protection ahead of the fault. However, some virtual machine states are mutually exclusive with snapshot operations: a virtual machine in a transitional state cannot be snapshotted. Therefore, before changing the state of a virtual machine, the management node of the virtualized cluster system can call the virtual machine fault prediction module and pass in the new state of the virtual machine. If the new state together with the IO performance data of the virtual machine is in the set F_vm, a snapshot is needed to back up the data; the management node can then decide, according to service priority, whether to take the snapshot immediately to guard against a virtual machine fault.

After the virtual machine dynamic snapshot module receives, from the virtual machine fault prediction module, a snapshot indication that a virtual machine will fail, it can take a dynamic snapshot of the virtual machine to save its running state information. Dynamic snapshots are of two kinds, full and incremental. For the fault of a single virtual machine, a full snapshot is in principle taken first; optionally, an incremental snapshot is then taken at regular intervals, for example every 30 s, so that the virtual machine can be recovered to the state nearest the fault. Once some time has elapsed since the full snapshot, the high-risk period can be considered over, and snapshots can be stopped. Whether incremental snapshots are taken can be decided by the management node or set in advance.

The snapshot policy can differ according to the state of the virtual machine:

For a state that is not mutually exclusive with snapshot operations, such as the running state, a full snapshot is taken first; then, during the risk period t_risk, an incremental snapshot is taken at every interval Δt_interval, so that the virtual machine state can later be recovered as quickly as possible;

For a state that is mutually exclusive with snapshot operations, such as a transitional state, only one full snapshot is taken if a snapshot is needed.
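As a non-limiting illustration, the two snapshot policies above might be expressed as a schedule of offsets measured from the full snapshot. The function name snapshot_plan and its parameters are hypothetical, and the sketch assumes that the last incremental snapshot may fall exactly at the end of the risk period t_risk.

```python
def snapshot_plan(state_excludes_snapshot, t_risk, dt_interval, incremental=True):
    """Return [(offset_seconds, kind), ...] for one virtual machine predicted to fail.

    state_excludes_snapshot: True for states such as a transitional state,
    where only a single full snapshot is taken.
    incremental: whether incremental snapshots are enabled (preset, or
    decided by the management node).
    """
    if dt_interval <= 0:
        raise ValueError("dt_interval must be positive")
    plan = [(0, "full")]  # a full snapshot is always taken first
    if state_excludes_snapshot or not incremental:
        return plan
    offset = dt_interval
    while offset <= t_risk:  # assumption: the end of the risk period is included
        plan.append((offset, "incremental"))
        offset += dt_interval
    return plan
```

With t_risk = 120 s and Δt_interval = 30 s, this plan contains one full snapshot followed by four incremental snapshots; for a virtual machine in a transitional state it contains the full snapshot only.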

When the virtual machine fault detection module detects a virtual machine fault, it reports the IO performance data and state of the virtual machine at that time to the virtual machine fault prediction module as historical fault information. The virtual machine fault prediction module creates or updates the corresponding tuple according to this historical fault information, and updates the Rate value according to the situation: if the fault had previously been predicted by the virtual machine fault prediction module, Rate is increased by 2; if the fault had not been predicted, Rate is increased by 1.
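As a non-limiting illustration, the Rate update rule might be sketched as follows. The description does not state how the incremented Rate value relates to the probability compared against RATEFail, so the sketch simply stores a raw score per tuple; the dictionary layout and the function name update_rate are hypothetical.

```python
def update_rate(history, key, was_predicted):
    """Create or update the entry for `key` and return its new Rate score.

    history: dict mapping (status, disk_io, network_io) to a Rate score
    (hypothetical layout). The score rises by 2 if the fault had been
    predicted beforehand, and by 1 otherwise.
    """
    history[key] = history.get(key, 0) + (2 if was_predicted else 1)
    return history[key]


history = {}
key = ("running", 120.0, 40.0)
update_rate(history, key, was_predicted=False)  # entry created with score 1
update_rate(history, key, was_predicted=True)   # score becomes 3
```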

For the fault of the computing node hosting the virtual machine:

Because a virtual machine runs on a specific computing node, for example a server, when the computing node hosting virtual machines fails, all virtual machines on that node fail as well. All of them then need to be recovered, which easily leads to problems such as service interruption.

Computing node faults are directly related to the temperature around the computing node: the higher the temperature, the greater the probability that the computing node fails. The embodiment of the present invention therefore performs fault prediction by monitoring the ambient temperature of the computing node, combined with the historical fault information of the computing node.

First, a set of tuples representing all suspected faults of computing nodes is constructed in the virtual machine fault prediction module; that is, a third tuple representing a suspected fault is determined for each computing node, and the elements of the third tuple include the ambient temperature of the computing node and a failure rate calculated from the historical fault information of the computing node. For example, the set of third tuples can be expressed as S_host = {<T, t, Rate> | T > T_fail}, where T is the ambient temperature of the server (monitoring of the computing node's state starts only when the temperature exceeds the preset dangerous temperature T_fail), t is the time for which the computing node has remained at temperature T, and Rate is the failure rate, that is, the probability, obtained from the historical fault information detected by the virtual machine fault detection module, that the computing node fails in this state. Because temperature is a continuous variable, for ease of monitoring the present invention divides temperature into graded bands of width ΔT: 0 to ΔT (excluding ΔT) is one band, ΔT to 2ΔT is the next, and so on.
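As a non-limiting illustration, the grading of a continuous temperature into bands of width ΔT might be sketched as follows. The function name temperature_band and the example values are hypothetical.

```python
import math


def temperature_band(temp, delta_t):
    """Return the index k of the band [k * delta_t, (k + 1) * delta_t) containing temp.

    Band 0 covers 0 up to but not including delta_t, band 1 covers
    delta_t up to but not including 2 * delta_t, and so on.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    return math.floor(temp / delta_t)
```

For example, with ΔT = 5, a reading of 4.5 falls in band 0 and a reading of exactly 5 falls in band 1, which matches the rule that each band excludes its upper bound.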

Second, from S_host a fourth tuple representing the fault of a computing node can be constructed for each computing node; the fourth tuple contains the thresholds corresponding to the elements of the third tuple. For example, the set of fourth tuples can be expressed as F_host = {<T, t> | r > RATEFail}, where r is the Rate value associated with <T, t> in S_host. RATEFail is a user-defined threshold: a computing node whose failure probability r exceeds it is considered about to fail.

The fault prediction performed by the virtual machine fault prediction module specifically includes: selecting a computing node and judging whether the elements of its third tuple fall within the threshold range of its fourth tuple; if so, determining that the selected computing node will fail; then selecting the next computing node and predicting it, until all computing nodes have been predicted. The computing nodes can be predicted one after another, or several can be predicted simultaneously. For example, the ambient temperature of the computing nodes is monitored periodically; when the ambient temperature of a computing node exceeds T_fail, the <T, t> tuples of the set S_host are compiled from the temperature records. When <T, t + Δt> of a computing node is found in the set F_host, the computing node is considered likely to fail after a time Δt, and the virtual machine dynamic snapshot module on that node is notified to take a snapshot of all virtual machines on the node, saving their running state. The purpose of Δt here is mainly to reserve time for the virtual machine snapshots, so that the information of the virtual machines can be saved effectively.
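As a non-limiting illustration, the look-ahead test on <T, t + Δt> might be sketched as follows. The sketch assumes that T is stored as a temperature band index and that t is counted in whole monitoring periods, so that tuples can be compared exactly; the function names, the dictionary layout, and the numeric values are hypothetical.

```python
def build_host_fault_set(s_host, rate_fail):
    """F_host: the <T, t> pairs of S_host whose Rate exceeds RATEFail.

    s_host: dict mapping (temperature_band, stable_periods) to Rate
    (hypothetical layout).
    """
    return {key for key, rate in s_host.items() if rate > rate_fail}


def node_will_fail(band, stable_periods, reserve_periods, f_host):
    """True if <T, t + delta_t> is in F_host, i.e. a fault is expected once
    the reserved time delta_t (reserve_periods) has elapsed."""
    return (band, stable_periods + reserve_periods) in f_host


# Illustrative values only.
s_host = {(14, 10): 0.8, (14, 4): 0.2}
f_host = build_host_fault_set(s_host, rate_fail=0.5)
print(node_will_fail(14, 8, 2, f_host))  # True: <14, 10> is a fault tuple
```

Testing t + Δt rather than t means the snapshot indication is sent Δt before the node reaches a state recorded as faulty, which leaves time for the snapshots of all virtual machines on the node to complete.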

After the virtual machine dynamic snapshot module receives, from the virtual machine fault prediction module, a snapshot indication predicting a computing node fault, it can take a snapshot of all virtual machines on the computing node to save their state. Because many virtual machines may need to be snapshotted, at this point only one full snapshot of each virtual machine is taken, without subsequent incremental snapshots, to avoid increasing the load on the computing node.

In addition, when the virtual machine fault detection module detects a computing node fault, it can obtain the fault information and report it to the virtual machine fault prediction module as historical fault information. The virtual machine fault prediction module can add the current <T, t> tuple to the set S_host and update the corresponding Rate value, increasing Rate by 1.

Because virtual machine faults and computing node faults are both low-probability events, for the sets S_vm, F_vm, S_host, and F_host the embodiment of the present invention preferably collects a certain amount of fault information in advance through offline testing, as the initial state information for running the real system.

In summary, the embodiment of the present invention provides a virtual machine recovery method in which the management node performs fault prediction on the computing nodes and the virtual machines running on them; if a virtual machine or a computing node is predicted to fail, the management node instructs the computing node to take a snapshot of the virtual machine that will fail, or of all virtual machines on the computing node that will fail. With this scheme, the virtualized cluster system can decide by itself when to snapshot a virtual machine, without manual intervention, and the snapshot is taken when the virtual machine is about to fail but has not yet failed. Consequently, when the virtual machine actually fails, the computing node can quickly recover it from the snapshot file to its state before the fault. Compared with the conventional technique of restarting the virtual machine, operations such as initializing the operating system are avoided, so the virtual machine recovery time is significantly shortened; and because the service is restored directly to its state just before the fault and need not be re-executed, service continuity is ensured.

Embodiment 2

Referring to Fig. 4, the embodiment of the present invention provides a virtual machine recovery method applied to a virtualized cluster system. The virtualized cluster system includes a management node and at least one computing node; each computing node includes a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine running on the Host. Optionally and preferably, the at least one computing node in the virtualized cluster system can use shared storage, which is used to store snapshot files.

The method of this embodiment is performed by a computing node and includes:

401. The computing node receives a first or second snapshot indication sent by the management node, where the first snapshot indication indicates that at least one virtual machine running on the computing node will fail, and the second snapshot indication indicates that the computing node will fail;

402. According to the first snapshot indication, take a snapshot of the virtual machine that will fail, or, according to the second snapshot indication, take a snapshot of all virtual machines running on the computing node, and save the snapshot file;

403. Detect whether any virtual machine on the computing node has failed, and detect whether any other computing node has failed;

404. When it is detected that a virtual machine on the computing node has actually failed, obtain the snapshot file corresponding to the failed virtual machine and recover the failed virtual machine;

405. When it is detected that another computing node has actually failed, obtain the snapshot files corresponding to the failed computing node and recover all virtual machines of the failed computing node.

Optionally, the method further includes:

Obtaining the fault information of the failed virtual machine, or the fault information of the failed other computing node, and reporting it to the management node. This step is performed after step 403, and may be performed before, after, or at the same time as steps 404 and 405.
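As a non-limiting illustration, the selection logic of steps 402, 404, and 405 on a computing node might be sketched as follows. The message layouts (dictionaries with a "kind" field) and the function names are hypothetical; the actual snapshot and recovery operations are outside the scope of the sketch.

```python
def vms_to_snapshot(indication, running_vms):
    """Step 402: choose which virtual machines to snapshot.

    indication: {"kind": "first", "vms": {...}} for virtual machines predicted
    to fail, or {"kind": "second"} for a predicted computing node fault.
    """
    if indication["kind"] == "first":
        return [vm for vm in running_vms if vm in indication["vms"]]
    if indication["kind"] == "second":
        return list(running_vms)
    raise ValueError("unknown snapshot indication")


def vms_to_recover(event, vms_by_node):
    """Steps 404 and 405: choose which virtual machines to recover.

    event: {"kind": "vm_failed", "vm": name} for a local virtual machine fault,
    or {"kind": "node_failed", "node": name} for a fault of another node.
    vms_by_node: dict mapping a node name to the virtual machines it hosted.
    """
    if event["kind"] == "vm_failed":
        return [event["vm"]]
    if event["kind"] == "node_failed":
        return list(vms_by_node[event["node"]])
    raise ValueError("unknown fault event")
```

A first snapshot indication thus selects only the named virtual machines, while a second snapshot indication selects every virtual machine running on the node; recovery follows the same split between a single failed virtual machine and a failed computing node.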

For a more detailed description of the method of this embodiment, refer to the description in Embodiment 1.

Embodiment 3

Referring to Fig. 5, the embodiment of the present invention provides a management node 210 for a virtualized cluster system, where the virtualized cluster system includes the management node and at least one computing node.

A virtual machine fault prediction module 110 is deployed on the management node 210;

The virtual machine fault prediction module 110 includes:

A fault prediction unit 1101, configured to predict whether each computing node of the virtualized cluster system, and each virtual machine running on each computing node, will fail;

An indication sending unit 1102, configured to: if the fault prediction unit predicts that at least one virtual machine on a first computing node among the at least one computing node will fail, send a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that will fail; and if the fault prediction unit predicts that a second computing node among the at least one computing node will fail, send a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on it.

Optionally, the virtual machine fault prediction module may further include:

An obtaining unit, configured to obtain the historical fault information of each computing node and of each virtual machine running on each computing node, and to obtain the state and input/output (IO) performance data of each virtual machine and the ambient temperature of each computing node.

In one implementation, the fault prediction unit 1101 includes:

A first construction unit, configured to determine, for each virtual machine, a first tuple representing a suspected fault, where the elements of the first tuple include the state of the virtual machine, the IO performance data of the virtual machine, and a failure rate calculated from the historical fault information of the virtual machine;

A second construction unit, configured to determine, for each virtual machine, a second tuple representing a fault, where the second tuple contains the thresholds corresponding to the elements of the first tuple;

A first prediction unit, configured to select a virtual machine, judge whether the elements of the first tuple of the selected virtual machine fall within the threshold range of its second tuple, and if so, predict that the selected virtual machine will fail.

In another implementation, the fault prediction unit 1101 includes:

A third construction unit, configured to determine, for each computing node, a third tuple representing a suspected fault, where the elements of the third tuple include the ambient temperature of the computing node and a failure rate calculated from the historical fault information of the computing node;

A fourth construction unit, configured to determine, for each computing node, a fourth tuple representing a fault, where the fourth tuple contains the thresholds corresponding to the elements of the third tuple;

A second prediction unit, configured to select a computing node, judge whether the elements of the third tuple of the selected computing node fall within the threshold range of its fourth tuple, and if so, predict that the selected computing node will fail.

In summary, the embodiment of the present invention provides a management node; for a more detailed description of the management node, refer to Embodiment 1. The management node can predict whether a computing node, and the virtual machines running on it, will fail, and, before a virtual machine or computing node fails, instruct the computing node to take a snapshot of the virtual machines, so that when the virtual machine or computing node actually fails, the computing node can quickly recover the virtual machines from the snapshot files. As a result, the virtualized cluster system can decide by itself when to snapshot a virtual machine, without manual intervention, and the computing node can quickly recover a virtual machine from the snapshot file to its state before the fault. Compared with the conventional technique of restarting the virtual machine, operations such as initializing the operating system are avoided, so the virtual machine recovery time is significantly shortened; and because the service need not be re-executed, service continuity is ensured.

Embodiment 4

Referring to Fig. 6, the embodiment of the present invention provides a computing node 220 for a virtualized cluster system.

The computing node 220 includes a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine running on the Host; a virtual machine fault detection module 130, a virtual machine dynamic snapshot module 120, and a virtual machine fault recovery module 140 are further deployed on the Host;

The virtual machine dynamic snapshot module 120 is configured to receive a first or second snapshot indication sent by the management node, where the first snapshot indication indicates that at least one virtual machine running on the computing node will fail, and the second snapshot indication indicates that the computing node will fail; and, according to the first snapshot indication, take a snapshot of the virtual machine that will fail, or, according to the second snapshot indication, take a snapshot of all virtual machines running on the computing node, and save the corresponding snapshot files;

The virtual machine fault detection module 130 is configured to detect whether any virtual machine on the computing node has failed, and whether any other computing node has failed;

The virtual machine fault recovery module 140 is configured to: when the virtual machine fault detection module detects that a virtual machine on the computing node has actually failed, obtain the snapshot file corresponding to the failed virtual machine and recover the failed virtual machine; and when the virtual machine fault detection module detects that another computing node has actually failed, obtain the snapshot files of the failed computing node and recover all virtual machines of the failed computing node.

Optionally, the virtual machine fault detection module 130 is further configured to obtain the fault information of the failed virtual machine, or of the failed other computing node, and report it to the management node.

In summary, the embodiment of the present invention discloses a computing node; for a more detailed description of the computing node, refer to Embodiments 1 and 2. According to the indication of the management node, the computing node can take a snapshot of virtual machines before a virtual machine running on the computing node, or the computing node itself, fails; when the virtual machine or computing node actually fails, the virtual machine is quickly recovered from the snapshot file to its state before the fault. Compared with the conventional technique of restarting the virtual machine, operations such as initializing the operating system are avoided, so the virtual machine recovery time is significantly shortened; and because the service need not be re-executed, service continuity is ensured.

Embodiment 5

Referring to Fig. 1, the embodiment of the present invention provides a virtualized cluster system.

The virtualized cluster system includes a management node 210 and at least one computing node 220. Optionally and preferably, the at least one computing node 220 in the virtualized cluster system can use shared storage, which is used to store snapshot files;

The management node is the management node described in Embodiment 3, and the computing node is the computing node described in Embodiment 4.

For a more detailed description of the virtualized cluster system, refer to Embodiments 1 to 4.

Above, the embodiment of the present invention provides a kind of system of virtual cluster, employing is carried out failure prediction by management node to computing node and operation virtual machine thereon, if having predicted virtual machine or computing node will break down, the index gauge operator node carries out snapshot to virtual machine, when virtual machine or computing node break down really, computing node can carry out according to done snapshot the technical scheme of fast quick-recovery to virtual machine, this scheme makes: which kind of opportunity system of virtual cluster can be decided in its sole discretion on is carried out snapshot to virtual machine, and needn't adopt artificial mode, and, the choose opportunities of carrying out snapshot virtual machine will but while also not breaking down, thereby, when virtual machine breaks down really, computing node can quickly recover to the state before fault by virtual machine according to snapshot document, with respect to traditional technology of restarting virtual machine, because having reduced the operations such as initialization operation system, can significantly shorten virtual machine release time, guarantee the continuity of business because re-executing business.
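
How the management node might turn its predictions into snapshot indications can be sketched as follows. The function name `build_snapshot_indications` and the data layout are hypothetical. The sketch also assumes that every computing node appears in `vm_predictions` and that a node-level prediction takes precedence over per-virtual-machine predictions on the same node; the embodiment does not state either point.

```python
def build_snapshot_indications(vm_predictions, node_predictions):
    """Turn failure predictions into snapshot indications.

    vm_predictions:   {node_id: {vm_id: bool}}, True = VM predicted to fail
    node_predictions: {node_id: bool}, True = node predicted to fail
    Returns a list of (node_id, kind, vm_ids) tuples.
    """
    indications = []
    for node_id in sorted(vm_predictions):
        vms = vm_predictions[node_id]
        if node_predictions.get(node_id, False):
            # Second snapshot indication: snapshot every VM on the node.
            indications.append((node_id, "second", sorted(vms)))
            continue
        at_risk = sorted(vm for vm, will_fail in vms.items() if will_fail)
        if at_risk:
            # First snapshot indication: snapshot only the VMs at risk.
            indications.append((node_id, "first", at_risk))
    return indications
```

A node with no predicted failure produces no indication, so no snapshot work is done for healthy nodes.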

Embodiment six,

An embodiment of the present invention provides a virtual machine migration method, applied to a virtualized cluster system.

In the prior art, when the load on a computing node is too high, the distributed resource scheduling entity (Distributed Resource Schedule, DRS) of the virtualized cluster system uses live migration to move virtual machines from that computing node to other computing nodes with lower load, thereby reducing the load on the heavily loaded computing node and making the whole system more stable. When the overall load of the virtualized cluster system is low, the DRS can migrate all virtual machines away from lightly loaded servers and shut those computing nodes down, saving energy across the cluster. However, all of these migrations are carried out with virtual machine live migration technology. Although live migration keeps the service interruption time during migration very short, the live migration process consumes additional computing and network resources, which may further raise the load on the computing node and degrade the service provided by the virtual machines. The virtual machine migration method provided by this embodiment is proposed to address this problem; its purpose is to reduce the consumption of computing and network resources during migration and to avoid increasing the load on the computing node.

The virtualized cluster system comprises at least two computing nodes, including a first computing node and a second computing node. Each computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine running on the Host. Preferably or optionally, the at least two computing nodes in the virtualized cluster system may use shared storage, which is used to store snapshot files.

Referring to Fig. 7, the virtual machine migration method provided by this embodiment comprises:

701. The first computing node takes a snapshot of the first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file;

702. After the snapshot is complete, the first virtual machine is suspended;

703. The first startup information needed to start the first virtual machine is sent to the second computing node, so that the second computing node recovers the first virtual machine on the second computing node according to the first startup information and the snapshot file, whereby the migrated first virtual machine is hosted on the second computing node.

The first startup information comprises the main configuration information of the first virtual machine, such as the number of virtual cores, the memory size, IO device information, and virtual disk information.
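
Steps 701 to 703 on the first computing node can be sketched as follows. This is an illustrative sketch only: `StartupInfo`, `SourceNode`, and the `send` callback are hypothetical names, the shared storage is modelled as a dictionary, and the snapshot is represented by an opaque byte string rather than a real memory image.

```python
from dataclasses import dataclass


@dataclass
class StartupInfo:
    """Main configuration needed to start the VM on the target node."""
    vcpus: int
    memory_mb: int
    io_devices: list
    virtual_disks: list


class SourceNode:
    def __init__(self, shared_storage):
        self.shared_storage = shared_storage  # dict: vm_id -> snapshot content
        self.vm_state = {}                    # vm_id -> "suspended"
        self.log = []                         # order of performed steps

    def migrate(self, vm_id, memory_image, info, send):
        # Step 701: snapshot the VM and save the snapshot file.
        self.shared_storage[vm_id] = memory_image
        self.log.append("snapshot")
        # Step 702: suspend the VM once the snapshot is complete.
        self.vm_state[vm_id] = "suspended"
        self.log.append("suspend")
        # Step 703: send the startup information to the second computing node.
        send(vm_id, info)
        self.log.append("send")
```

Only the small startup information record crosses the network in step 703; with shared storage, the snapshot file itself does not need to be copied between nodes.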

Optionally, the virtualized cluster system further comprises a DRS. Before step 701, the method may further comprise: the first computing node receives a migration command, sent by the DRS, for migrating the first virtual machine from the first computing node to the second computing node. Step 701 may then comprise: in response to the migration command, the first computing node takes a snapshot of the first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file.

Optionally, the at least two computing nodes further comprise a third computing node, and the method further comprises: the first computing node receives second startup information sent by the third computing node, the second startup information being the information needed to start a second virtual machine to be migrated that is hosted on the third computing node; and the first computing node recovers the second virtual machine according to the second startup information and the corresponding snapshot file, whereby the migrated second virtual machine is hosted on the first computing node.

The second startup information comprises the main configuration information of the second virtual machine, such as the number of virtual cores, the memory size, IO device information, and virtual disk information.

In the above, an embodiment of the present invention provides a virtual machine migration method that uses snapshot technology to migrate virtual machines. It can shorten the time needed to recover a migrated virtual machine and reduce the consumption of computing and network resources during migration, which avoids increasing the load on the computing node and in turn avoids degrading the working environment of the whole virtualized cluster system.
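
The receiving side can be sketched in the same spirit. The function `restore_migrated_vm`, the dictionary keys, and the returned record are hypothetical; a real computing node would hand the configuration and the snapshot file to its virtualization layer instead of building a dictionary.

```python
def restore_migrated_vm(vm_id, startup_info, shared_storage):
    """Recreate the VM from its startup information and snapshot file."""
    required = ("vcpus", "memory_mb", "io_devices", "virtual_disks")
    missing = [key for key in required if key not in startup_info]
    if missing:
        raise ValueError(f"startup information is missing: {missing}")
    if vm_id not in shared_storage:
        raise KeyError(f"no snapshot file for {vm_id}")
    return {
        "vm_id": vm_id,
        "config": dict(startup_info),
        "state": shared_storage[vm_id],  # state saved before suspension
        "status": "running",
    }
```

The two checks reflect that recovery needs both inputs named in the method: the startup information and the corresponding snapshot file.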

Embodiment seven,

Referring to Fig. 8, an embodiment of the present invention provides a computing node.

This computing node is applied to a virtualized cluster system and comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine running on the Host;

A virtual machine dynamic snapshot module 810 and a virtual machine information processing module 820 are further deployed on the Host;

The virtual machine dynamic snapshot module 810 is configured to take a snapshot of the first virtual machine to be migrated, which is hosted on the computing node, and to save the corresponding snapshot file;

The virtual machine information processing module 820 is configured to suspend the first virtual machine after the snapshot is complete, and to send the startup information needed to start the first virtual machine to the second computing node, so that the second computing node recovers the first virtual machine on the second computing node according to the startup information and the snapshot file, whereby the migrated first virtual machine is hosted on the second computing node.

Optionally, a virtual machine restart recovery module 830 is also deployed on the Host. The virtual machine information processing module 820 is further configured to receive second startup information sent by a third computing node, the second startup information being the information needed to start a second virtual machine to be migrated that is hosted on the third computing node. The virtual machine restart recovery module 830 is configured to recover the second virtual machine according to the second startup information and the corresponding snapshot file, whereby the migrated second virtual machine is hosted on the first computing node.

In the above, an embodiment of the present invention provides a computing node; for a more detailed description of this computing node, refer to Embodiment six. The computing node uses snapshot technology to migrate virtual machines. It can shorten the time needed to recover a migrated virtual machine and reduce the consumption of computing and network resources during migration, which avoids increasing the load on the computing node and in turn avoids degrading the working environment of the whole virtualized cluster system.

Embodiment eight,

An embodiment of the present invention provides a virtualized cluster system. The virtualized cluster system comprises at least two computing nodes, one of which is the computing node as claimed in claim 22 or 23. Preferably or optionally, the at least two computing nodes in the virtualized cluster system may use shared storage, which is used to store snapshot files.

Optionally, the virtualized cluster system further comprises a distributed resource scheduling entity (DRS). The DRS is configured to send, to the first computing node, a migration command for migrating the first virtual machine from the first computing node to the second computing node.

In the above, an embodiment of the present invention discloses a virtualized cluster system; for a more detailed description of this system, refer to Embodiment six.

The virtualized cluster system provided by this embodiment uses snapshot technology to migrate virtual machines. It can shorten the time needed to recover a migrated virtual machine and reduce the consumption of computing and network resources during migration, which avoids increasing the load on the servers and in turn avoids degrading the working environment of the whole virtualized cluster system.

A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be performed by hardware, or by hardware (for example a processor) under the control of program instructions. The program may be stored in a computer-readable storage medium, which may include read-only memory, random-access memory, a magnetic disk, an optical disc, or the like.

The virtual machine recovery method, device and system and the virtual machine migration method, device and system provided by the embodiments of the present invention have been described in detail above. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea, and should not be construed as limiting the present invention. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims

1. A virtual machine recovery method, characterized in that it is applied to a virtualized cluster system;

The virtualized cluster system comprises a management node and at least one computing node, and the computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine running on the Host;

The method comprises:

The management node predicts whether each computing node, and each virtual machine running on each computing node, will fail;

If it is predicted that at least one virtual machine on a first computing node among the at least one computing node will fail, sending a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that will fail;

If it is predicted that a second computing node among the at least one computing node will fail, sending a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on the second computing node.

2. The method according to claim 1, characterized in that, before the step in which the management node predicts whether each computing node and each virtual machine running on each computing node will fail, the method further comprises:

Obtaining the historical failure information of each computing node and the historical failure information of each virtual machine running on each computing node;

Obtaining the state and the input/output (IO) performance data of each virtual machine, and the ambient temperature of each computing node.

3. The method according to claim 2, characterized in that the management node predicting whether each computing node and each virtual machine running on each computing node will fail comprises:

Determining, for each virtual machine, a first multi-tuple representing a suspected fault, the elements of the first multi-tuple comprising: the state of the virtual machine, the IO performance data of the virtual machine, and a failure rate calculated from the historical failure information of the virtual machine;

Determining, for each virtual machine, a second multi-tuple representing a fault, the second multi-tuple comprising a threshold corresponding to each element of the first multi-tuple;

Selecting a virtual machine and determining whether the elements of the first multi-tuple of the selected virtual machine fall within the threshold ranges corresponding to the second multi-tuple of the selected virtual machine; if so, predicting that the selected virtual machine will fail.

4. The method according to claim 2, characterized in that the management node predicting whether each computing node and each virtual machine running on each computing node will fail comprises:

Determining, for each computing node, a third multi-tuple representing a suspected fault, the elements of the third multi-tuple comprising: the ambient temperature of the computing node and a failure rate calculated from the historical failure information of the computing node;

Determining, for each computing node, a fourth multi-tuple representing a fault, the fourth multi-tuple comprising a threshold corresponding to each element of the third multi-tuple;

Selecting a computing node and determining whether the elements of the third multi-tuple of the selected computing node fall within the threshold ranges corresponding to the fourth multi-tuple of the selected computing node; if so, predicting that the selected computing node will fail.

5. A virtual machine recovery method, characterized in that it is applied to a virtualized cluster system;

The virtualized cluster system comprises a management node and at least one computing node; the computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host;

The method comprises:

The computing node receives a first or second snapshot indication sent by the management node, wherein the first snapshot indication indicates that at least one virtual machine running on the computing node will fail, and the second snapshot indication indicates that the computing node will fail;

According to the first snapshot indication, taking a snapshot of the virtual machine that will fail, or, according to the second snapshot indication, taking a snapshot of all virtual machines running on the computing node, and saving the corresponding snapshot files;

Detecting whether any virtual machine on the computing node has failed, and detecting whether any other computing node has failed;

When it is detected that a virtual machine on the computing node has actually failed, obtaining the snapshot file corresponding to the failed virtual machine and recovering the failed virtual machine;

When it is detected that another computing node has actually failed, obtaining the snapshot files corresponding to the failed computing node and recovering all virtual machines on the failed computing node.

6. The method according to claim 5, characterized in that it further comprises:

Obtaining failure information of the failed virtual machine, or failure information of the other failed computing node, and reporting it to the management node.

7. A management node, characterized in that it is applied to a virtualized cluster system, the virtualized cluster system comprising the management node and at least one computing node; a virtual machine failure prediction module is deployed on the management node; the virtual machine failure prediction module comprises:

A failure prediction unit, configured to predict whether each computing node of the virtualized cluster system, and each virtual machine running on each computing node, will fail;

An indication sending unit, configured to: if the failure prediction unit predicts that at least one virtual machine on a first computing node among the at least one computing node will fail, send a first snapshot indication to the first computing node, so that the first computing node takes a snapshot of the at least one virtual machine that will fail; and if the failure prediction unit predicts that a second computing node among the at least one computing node will fail, send a second snapshot indication to the second computing node, so that the second computing node takes a snapshot of all virtual machines running on the second computing node.

8. The management node according to claim 7, characterized in that:

The virtual machine failure prediction module further comprises:

9. The management node according to claim 8, characterized in that:

The failure prediction unit comprises:

10. The management node according to claim 8, characterized in that:

The failure prediction unit comprises:

11. A computing node, characterized in that it is applied to a virtualized cluster system; the computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host; a virtual machine dynamic snapshot module, a virtual machine failure detection module, and a virtual machine failure recovery module are further deployed on the Host;

The virtual machine dynamic snapshot module is configured to receive a first or second snapshot indication sent by the management node, wherein the first snapshot indication indicates that at least one virtual machine running on the computing node will fail, and the second snapshot indication indicates that the computing node will fail; and, according to the first snapshot indication, to take a snapshot of the virtual machine that will fail, or, according to the second snapshot indication, to take a snapshot of all virtual machines running on the computing node, and to save the corresponding snapshot files;

The virtual machine failure detection module is configured to detect whether any virtual machine on the computing node has failed, and to detect whether any other computing node has failed;

The virtual machine failure recovery module is configured to: when the virtual machine failure detection module detects that a virtual machine on the computing node has actually failed, obtain the snapshot file corresponding to the failed virtual machine and recover the failed virtual machine; and when another computing node has actually failed, obtain the snapshot files of the failed computing node and recover all virtual machines on the failed computing node.

12. The computing node according to claim 11, characterized in that:

The virtual machine failure detection module is further configured to obtain failure information of the failed virtual machine, or failure information of the other failed computing node, and report it to the management node.

13. A virtualized cluster system, characterized in that:

The virtualized cluster system comprises a management node and at least one computing node;

The management node is the management node according to any one of claims 7 to 10, and the computing node is the computing node according to claim 11 or 12.

14. A virtual machine migration method, characterized in that it is applied to a virtualized cluster system;

The virtualized cluster system comprises at least two computing nodes, including a first computing node and a second computing node; each computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host;

The method comprises:

The first computing node takes a snapshot of the first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file;

After the snapshot is complete, the first virtual machine is suspended;

The first startup information needed to start the first virtual machine is sent to the second computing node, so that the second computing node recovers the first virtual machine on the second computing node according to the first startup information and the snapshot file, whereby the migrated first virtual machine is hosted on the second computing node.

15. The method according to claim 14, characterized in that the at least two computing nodes further comprise a third computing node, and the method further comprises:

The first computing node receives second startup information sent by the third computing node, the second startup information being the information needed to start a second virtual machine to be migrated that is hosted on the third computing node;

The first computing node recovers the second virtual machine according to the second startup information and the corresponding snapshot file, whereby the migrated second virtual machine is hosted on the first computing node.

16. The method according to claim 14, characterized in that the virtualized cluster system further comprises a distributed resource scheduling entity (DRS), and before the step in which the first computing node takes a snapshot of the first virtual machine to be migrated on the first computing node, the method further comprises:

The first computing node receives a migration command, sent by the DRS, for migrating the first virtual machine from the first computing node to the second computing node;

The first computing node taking a snapshot of the first virtual machine to be migrated on the first computing node and saving the snapshot file comprises:

In response to the migration command, the first computing node takes a snapshot of the first virtual machine to be migrated, which is hosted on the first computing node, and saves the corresponding snapshot file.

17. A computing node, characterized in that it is applied to a virtualized cluster system;

The computing node comprises a hardware layer, a host (Host) running on the hardware layer, and at least one virtual machine (VM) running on the Host;

A virtual machine dynamic snapshot module and a virtual machine information processing module are further deployed on the Host;

The virtual machine dynamic snapshot module is configured to take a snapshot of the first virtual machine to be migrated, which is hosted on the computing node, and to save the corresponding snapshot file;

The virtual machine information processing module is configured to suspend the first virtual machine after the snapshot is complete, and to send the startup information needed to start the first virtual machine to the second computing node, so that the second computing node recovers the first virtual machine on the second computing node according to the startup information and the snapshot file, whereby the migrated first virtual machine is hosted on the second computing node.

18. The computing node according to claim 17, characterized in that a virtual machine restart recovery module is also deployed on the Host;

The virtual machine information processing module is further configured to receive second startup information sent by a third computing node, the second startup information being the information needed to start a second virtual machine to be migrated that is hosted on the third computing node;

The virtual machine restart recovery module is configured to recover the second virtual machine according to the second startup information and the corresponding snapshot file, whereby the migrated second virtual machine is hosted on the first computing node.

19. A virtualized cluster system, characterized in that:

The virtualized cluster system comprises at least two computing nodes, one of which is the computing node according to claim 17 or 18.