CN107656845A

CN107656845A - Virtual machine high availability method

Info

Publication number: CN107656845A
Application number: CN201710843325.XA
Authority: CN
Inventors: 韩飞; 邓玉芳; 季统凯
Original assignee: G Cloud Technology Co Ltd
Current assignee: G Cloud Technology Co Ltd
Priority date: 2017-09-18
Filing date: 2017-09-18
Publication date: 2018-02-02

Abstract

The present invention relates to the field of virtual machine technology, and in particular to a virtual machine high availability method. In the invention, the virtual machine monitor starts a monitoring procedure. While the monitoring procedure observes that the virtual machine's storage system is normal, it writes a timestamp mark to the virtual machine. When the monitoring module on the host detects that the timestamp is not being updated normally, it alerts the control module on the control machine. After the control module confirms the failure, it changes the owner of the virtual machine and performs fault recovery. The invention provides a highly reliable, easy-to-use high availability solution for virtual machines on cloud computing platforms, and can be used in virtual machine management.

Description

Virtual machine high availability method

Technical field

The present invention relates to the field of virtual machine technology, and in particular to a virtual machine high availability method.

Background art

On a cloud computing platform, a virtual machine may stop providing service because of faults with many causes, such as the network, the storage system, or hardware and software failures. To address this, many cloud computing platforms provide a high availability mechanism that automatically and quickly restores a virtual machine's operation after it fails. These mechanisms fall into two broad classes: controller-based and controllerless. Typical implementations of both often have the following problems:

First, in controllerless schemes, guaranteeing virtual machine high availability requires deploying a high-availability cluster for the virtual machines in advance, with some distributed algorithm monitoring virtual machine state and coordinating failover. VMware and Azure both use this kind of implementation. Its drawbacks are operational complexity, wasted resources in operation and maintenance, poor controllability, low reliability, and a complex mechanism. Another approach builds the scheme from off-the-shelf open-source tools such as keepalived or Heartbeat, but this is also complex to operate and has low reliability.

Second, controller-based schemes do not rely on a distributed coordination algorithm to drive the automatic recovery of a virtual machine; instead, a control module coordinates and controls the whole recovery process, which gives higher controllability. However, mainstream implementations generally suffer from insufficient reliability, for example: dependence on an agent, failures that go undetected, misjudged failures, and corruption of disk data.

Summary of the invention

The technical problem solved by the present invention is to provide a virtual machine high availability method that is easy to use, controllable and highly reliable, meeting the need for automatic virtual machine fault recovery in a cloud computing platform environment.

The technical solution by which the present invention solves the above problem is as follows:

In the method, the virtual machine monitor starts a monitoring procedure. While the monitoring procedure observes that the virtual machine's storage system is normal, it writes a timestamp mark to the virtual machine. When the monitoring module on the host detects that the timestamp is not being updated normally, it alerts the control module on the control machine. After the control module confirms the failure, it changes the owner of the virtual machine and performs fault recovery.

The method specifically comprises the following steps:

Step 1: Start an independent monitoring procedure in the virtual machine monitor. When the virtual machine starts, check whether the disk belongs to this virtual machine. If it does not, raise an alert and exit; if it does, run normally and periodically check the virtual machine's disk;

Step 2: If the virtual machine's storage system is normal, the monitoring procedure writes a timestamp mark to the virtual machine's disk; if the storage system is abnormal, the timestamp cannot be updated;

Step 3: The monitoring module on the host where the virtual machine runs periodically checks the timestamp updated by the virtual machine. If the timestamp has not been updated normally, it sends an alarm to the control module on the control host;

Step 4: After receiving the alarm, the control module re-checks the virtual machine's state through the disk of the suspected faulty virtual machine. If the fault is confirmed, step 5 is performed; if there is no fault, or the fault has recovered, step 6 is performed. If the control module stops receiving the monitoring module's heartbeat signal, then at least the path between the monitoring module and the control module has a problem, and the control module also issues a warning.

Step 5: Change the owner of the virtual machine, wait for a safety period, and then recover the virtual machine on another host;

Step 6: Take no recovery action, report to the administrator, and end the procedure.
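The branching in Step 4 can be summarised as a small decision function. This is a minimal sketch, not the patent's implementation: the names `Action` and `decide` are hypothetical, and it assumes the control module's disk re-check has already been reduced to one boolean, namely whether the timestamp advanced during one update period.

```python
from enum import Enum
from typing import List


class Action(Enum):
    FAILOVER = "step 5: change owner, wait, recover on another host"
    REPORT_ONLY = "step 6: no recovery, report to administrator"
    WARN_LINK = "warning: monitor-to-controller path problem"


def decide(alarm_received: bool,
           stamp_advanced_on_recheck: bool,
           heartbeat_ok: bool) -> List[Action]:
    """Hypothetical control-module decision for Step 4.

    stamp_advanced_on_recheck is True if the control module, reading the
    suspect VM's disk itself, saw the timestamp change within one update
    period (no fault, or the fault has already recovered).
    """
    actions: List[Action] = []
    if not heartbeat_ok:
        # Missing heartbeat: at least the monitor-to-controller path is broken.
        actions.append(Action.WARN_LINK)
    if alarm_received:
        if stamp_advanced_on_recheck:
            actions.append(Action.REPORT_ONLY)  # false alarm or already recovered
        else:
            actions.append(Action.FAILOVER)     # fault confirmed by independent re-check
    return actions
```

In this reading, a missing heartbeat on its own only produces a warning; failover still requires the control module's independent disk re-check to confirm the fault.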

The virtual machine monitor is the hypervisor, the system on the host that actually runs, manages and controls the virtual machines, for example Xen or qemu-kvm;

The host is the physical server that actually runs the virtual machine; the monitoring module runs on the host independently of the virtual machine monitor;

The control host is the server in the cloud computing cluster that is responsible for running and providing the control service;

The owner indicates which virtual machine, on which host, holds the right to use the virtual disk.
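The owner definition and the start-up check in Step 1 can be sketched as follows. The patent does not say where the owner record is stored (disk metadata, a cluster database, etc.); the `Owner` record and `check_owner_at_start` helper below are hypothetical and only show the comparison itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Owner:
    """Hypothetical owner record: which VM, on which host, may use the disk."""
    vm_id: str
    host_id: str


def check_owner_at_start(recorded: Owner, vm_id: str, host_id: str,
                         alert: Callable[[str], None]) -> bool:
    """Step 1 check: proceed only if this VM on this host owns the disk."""
    if recorded != Owner(vm_id, host_id):
        # Not the owner: alert and refuse to run (the caller then exits).
        alert(f"disk owned by {recorded.vm_id}@{recorded.host_id}, "
              f"not {vm_id}@{host_id}; exiting")
        return False
    return True
```

After a failover that moves `vm1` to `host-b`, the record becomes `Owner("vm1", "host-b")`, so a stale copy of `vm1` starting on `host-a` fails the check instead of opening the disk a second time.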

The monitoring procedure is a thread started before the virtual machine is started. The thread is dedicated to periodically checking that the virtual machine's disk is readable and writable, and it updates the timestamp to show that the hypervisor and the virtual machine are accessing the storage system normally.
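A minimal sketch of such a monitoring thread is shown below. It assumes, for illustration only, that the timestamp lives in an ordinary file (`path`) standing in for a reserved area of the virtual disk, and that ownership is re-checked through a caller-supplied `still_owner` callable; both names are hypothetical. If the write or read-back fails, the stamp is simply left stale, which is exactly the signal the host monitoring module looks for.

```python
import os
import threading
import time
from typing import Callable


def write_and_verify_stamp(path: str) -> bool:
    """Write the current time to the stamp file and read it back.

    Returns False if the storage is not writable or readable; the stamp
    is then left stale (Step 2).
    """
    tmp = path + ".tmp"
    try:
        with open(tmp, "w") as f:
            f.write(repr(time.time()))
            f.flush()
            os.fsync(f.fileno())   # push the write through to storage
        os.replace(tmp, path)      # atomic swap: readers never see half a stamp
        with open(path) as f:
            float(f.read())        # readability check
        return True
    except (OSError, ValueError):
        return False


def monitor_loop(path: str, period: float,
                 still_owner: Callable[[], bool],
                 stop: threading.Event) -> str:
    """Body of the hypervisor-side monitoring thread (sketch)."""
    while not stop.is_set():
        if not still_owner():
            # Ownership moved (e.g. after failover): the VM must exit on its own.
            return "owner_lost"
        write_and_verify_stamp(path)
        stop.wait(period)          # sleep one period, waking early if stopped
    return "stopped"
```

In use, `monitor_loop` would be the target of a `threading.Thread` started before the guest boots, with `period` equal to the timestamp update period that the checkers expect.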

The server on which the control module runs must be able to access the virtual disk of the suspected faulty virtual machine. The control module only needs to check whether the timestamp has changed within the virtual machine's timestamp update period; it does not require accurate time.
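The reason accurate time is unnecessary is that a checker only compares each reading with the previous one; it never compares the stamp with its own clock. The sketch below (hypothetical `StampWatcher` class, hypothetical `max_missed` threshold) would serve either the host monitoring module in Step 3 or the control module's re-check in Step 4, polling at an interval no shorter than the VM's update period.

```python
from typing import Optional


class StampWatcher:
    """Flags a VM as suspect when its stamp stops changing (sketch).

    The stamp is treated as an opaque value and compared only with the
    previous reading, so the clocks of the VM host and the checker do not
    need to agree.
    """

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed    # consecutive unchanged readings tolerated
        self.last: Optional[str] = None
        self.missed = 0

    def observe(self, stamp: Optional[str]) -> bool:
        """Feed one reading (None = unreadable); return True if suspect."""
        if stamp is not None and stamp != self.last:
            self.last = stamp
            self.missed = 0
            return False
        self.missed += 1
        return self.missed >= self.max_missed
```

Requiring several consecutive misses, rather than one, is a design choice that tolerates a single slow write without raising an alarm.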

The control module selects a suitable host on which to resume the faulty virtual machine. After changing the owner of the virtual disk, it waits for a safety period and then starts the virtual machine to be recovered on the selected host;

Waiting for the safety period prevents a false alarm caused merely by a temporary network failure from wrongly triggering failover, which could interrupt service or even corrupt data. It guarantees that the original virtual machine exits on its own within the safety period, preventing split-brain.
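The ordering of the failover in Step 5 can be sketched as below: change the owner first, wait, and only then start the new copy. The `failover` function and its callables are hypothetical, and the rule "two monitoring periods plus an exit margin" for the safety period is an assumption, not taken from the patent; the only requirement stated in the text is that the old copy has time to notice the owner change and exit.

```python
import time
from typing import Callable


def failover(vm_id: str, new_host: str,
             set_owner: Callable[[str, str], None],
             start_vm: Callable[[str, str], None],
             monitor_period: float, exit_margin: float,
             sleep: Callable[[float], None] = time.sleep) -> float:
    """Step 5 sketch: change owner, wait out the safety period, restart.

    Assumed rule: two monitoring periods (covering one delayed check
    cycle on the old host) plus a margin for the old VM to shut down.
    """
    safety_period = 2 * monitor_period + exit_margin
    set_owner(vm_id, new_host)   # old copy is no longer the owner from here on
    sleep(safety_period)         # old copy's monitor sees the change and exits
    start_vm(vm_id, new_host)    # only now can the new copy safely open the disk
    return safety_period
```

Passing `sleep` as a parameter keeps the sequence testable without real waiting; in production the default `time.sleep` applies.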

The beneficial effects of the present invention are as follows:

(1) Through a virtual machine high availability scheme based on the virtual machine monitor, the invention achieves automated, simple and reliable virtual machine fault recovery when a virtual machine stops running and its service is interrupted because of network faults, storage system faults, device faults, software faults or similar causes, ensuring the high availability of virtual machines.

(2) By implementing the fault detection mechanism at the virtual machine monitor layer, the invention achieves simple and reliable virtual machine fault detection and recovery; by working together with a controller, it provides a flexible, controllable, easy-to-use, policy-based virtual machine fault recovery mechanism.

For these reasons, providing underlying mechanism support for a virtual machine high availability mechanism that is easy to deploy, operate and maintain, controllable, resource-efficient and highly reliable requires a controller-based virtual machine high availability scheme at the hypervisor level.

Brief description of the drawings

The present invention is further described below with reference to the accompanying drawings:

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 is a module deployment topology diagram of the invention.

Embodiment

As shown in Figs. 1 and 2, the basic flow of the invention is:

In the above flow, the detection flow at the hypervisor level is the core and focus of this scheme and provides the underlying support mechanism for high availability. This detection flow is implemented inside the hypervisor: it runs and performs detection before the virtual machine starts, and it continues to run and detect potential faults while the virtual machine is running. The pseudocode of the basic hypervisor detection flow is as follows:

The basic flow of the monitoring module on the host is as follows:

The basic flow of the control module on the control machine is as follows:

Claims

1. A virtual machine high availability method, characterized in that: in the method, the virtual machine monitor starts a monitoring procedure; while the monitoring procedure observes that the virtual machine's storage system is normal, it writes a timestamp mark to the virtual machine; when the monitoring module on the host detects that the timestamp is not being updated normally, it alerts the control module on the control machine; after the control module confirms the failure, it changes the owner of the virtual machine and performs fault recovery.
2. The method according to claim 1, characterized in that it specifically comprises the following steps:

Step 1: Start an independent monitoring procedure in the virtual machine monitor. When the virtual machine starts, check whether the disk belongs to this virtual machine. If it does not, raise an alert and exit; if it does, run normally and periodically check the virtual machine's disk;

Step 2: If the virtual machine's storage system is normal, the monitoring procedure writes a timestamp mark to the virtual machine's disk; if the storage system is abnormal, the timestamp cannot be updated;

Step 3: The monitoring module on the host where the virtual machine runs periodically checks the timestamp updated by the virtual machine. If the timestamp has not been updated normally, it sends an alarm to the control module on the control host;

Step 4: After receiving the alarm, the control module re-checks the virtual machine's state through the disk of the suspected faulty virtual machine. If the fault is confirmed, step 5 is performed; if there is no fault, or the fault has recovered, step 6 is performed. If the control module stops receiving the monitoring module's heartbeat signal, then at least the path between the monitoring module and the control module has a problem, and the control module also issues a warning.

Step 5: Change the owner of the virtual machine, wait for a safety period, and then recover the virtual machine on another host;

Step 6: Take no recovery action, report to the administrator, and end the procedure.
3. The method according to claim 2, characterized in that: the virtual machine monitor is the hypervisor, the system on the host that actually runs, manages and controls the virtual machines, for example Xen or qemu-kvm;

The host is the physical server that actually runs the virtual machine; the monitoring module runs on the host independently of the virtual machine monitor;

The control host is the server in the cloud computing cluster that is responsible for running and providing the control service;

The owner indicates which virtual machine, on which host, holds the right to use the virtual disk.
4. The method according to claim 2, characterized in that: the monitoring procedure is a thread started before the virtual machine is started; the thread is dedicated to periodically checking that the virtual machine's disk is readable and writable, and it updates the timestamp to show that the hypervisor and the virtual machine are accessing the storage system normally.
5. The method according to claim 3, characterized in that: the monitoring procedure is a thread started before the virtual machine is started; the thread is dedicated to periodically checking that the virtual machine's disk is readable and writable, and it updates the timestamp to show that the hypervisor and the virtual machine are accessing the storage system normally.
6. The method according to any one of claims 2 to 5, characterized in that: the server on which the control module runs must be able to access the virtual disk of the suspected faulty virtual machine; the control module only needs to check whether the timestamp has changed within the virtual machine's timestamp update period, and does not require accurate time.
7. The method according to any one of claims 2 to 5, characterized in that: the control module selects a suitable host on which to resume the faulty virtual machine; after changing the owner of the virtual disk, it waits for a safety period and then starts the virtual machine to be recovered on the selected host;

Waiting for the safety period prevents a false alarm caused merely by a temporary network failure from wrongly triggering failover, which could interrupt service or even corrupt data; it guarantees that the original virtual machine exits on its own within the safety period, preventing split-brain.
8. The method according to claim 6, characterized in that: the control module selects a suitable host on which to resume the faulty virtual machine; after changing the owner of the virtual disk, it waits for a safety period and then starts the virtual machine to be recovered on the selected host;

Waiting for the safety period prevents a false alarm caused merely by a temporary network failure from wrongly triggering failover, which could interrupt service or even corrupt data; it guarantees that the original virtual machine exits on its own within the safety period, preventing split-brain.