CN112256477A

CN112256477A - Virtualization fault-tolerant method and device

Info

Publication number: CN112256477A
Application number: CN202011073260.3A
Authority: CN
Inventors: 蒋迪; 尤永康; 梅磊
Original assignee: Shanghai Zstack Information Technology Co ltd
Current assignee: Shanghai Zstack Information Technology Co ltd
Priority date: 2020-10-09
Filing date: 2020-10-09
Publication date: 2021-01-22

Abstract

A virtualization fault-tolerant method and device are disclosed. A virtual machine manager creates at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creates at least one standby virtual machine on a second physical machine, and copies the virtual hard disk of the primary virtual machine to the second physical machine. The primary virtual machine is started and, after it is in a fault-tolerant mode, the standby virtual machine is started; the external network interface of the primary virtual machine is opened while the external network interface of the standby virtual machine is blocked. The state information of the primary virtual machine is synchronized to the standby virtual machine. If a fault is detected in the first physical machine and/or the primary virtual machine, operation switches from the primary virtual machine to the standby virtual machine, the external network interface of the standby virtual machine is opened, and the standby virtual machine runs with the most recently synchronized state information. Fault tolerance is thereby applied to virtualization, hardware differences are shielded, and applications deployed on each server are highly available while performance is maintained.

Description

Virtualization fault-tolerant method and device

Technical Field

The present application relates to the field of computers, and in particular, to a virtualization fault tolerance method and apparatus.

Background

Improving application availability has long been an important goal when building information infrastructure. In practice, the architectural limitations of an application often mean that additional components must be developed to improve its reliability before its availability can meet a given requirement. Traditional high-availability techniques for applications include primary/standby, active-active, and clustering, but all of them require the application to be modified. For many stand-alone applications, such as industrial field applications and embedded applications, an enterprise that needs to guarantee availability in order to avoid losses to production business must purchase separate application cluster software, database cluster software, and the like; deployment and maintenance are very expensive, and the high-availability requirement is still difficult to meet. If fault-tolerance technology is used in these scenarios, the application on another machine comes online immediately after one server goes down, achieving seamless switchover and keeping the service online at all times.

Among existing fault-tolerant schemes, the hardware fault-tolerant product FT Server uses techniques such as Central Processing Unit (CPU) lockstep and hard disk RAID to keep the CPUs, memories, hard disks, and so on of two servers synchronized; when one machine crashes, the other is immediately brought online, so the application remains online. Because hardware fault tolerance relies on CPU lockstep to keep the running states of the two servers consistent, it has drawbacks in compatibility, performance, and other respects. First, CPU lockstep is equivalent to turning on a debugging switch so that the running state of the CPU can be copied to an external CPU, but the state copying consumes part of the CPU's computing capacity, so CPU and memory performance is lost. Second, CPU lockstep is supported only on specific CPU models and motherboards; it places requirements on the hardware, and users cannot deploy it on their own servers. Third, only one operating system can be deployed on a hardware fault-tolerant product, and the application running on it can rarely use all of the hardware's performance, so resources are wasted. Fourth, hardware fault-tolerant products are usually X86 servers, so they cannot be used in the many scenarios that adopt ARM servers, and the availability of applications on those ARM servers cannot meet requirements.

Disclosure of Invention

An object of the present application is to provide a virtualization fault-tolerant method and device, which apply a fault-tolerant technique to virtualization, shield hardware differences, and enable applications deployed in each server to have extremely high availability while ensuring performance.

According to one aspect of the present application, a virtualization fault tolerance method is provided, which is applied to a virtual machine manager, wherein the method includes:

creating at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creating at least one standby virtual machine on a second physical machine, and copying the virtual hard disk of the primary virtual machine to the second physical machine;

starting the primary virtual machine and, after the primary virtual machine is in a fault-tolerant mode, starting the standby virtual machine, opening an external network interface of the primary virtual machine, and simultaneously blocking the external network interface of the standby virtual machine;

synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine;

if a fault is detected in the first physical machine and/or the primary virtual machine, switching from the primary virtual machine to the standby virtual machine, and opening the external network interface of the standby virtual machine so that the standby virtual machine runs with the most recently synchronized state information.

Further, in the above method, a simulator is included in each of the first physical machine and the second physical machine;

wherein the synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine includes:

synchronizing, by the simulator, state information of the primary virtual machine to the standby virtual machine on the second physical machine via an internal network between the first physical machine and the second physical machine.

Further, in the foregoing method, the synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine includes:

acquiring a network state and a memory state of the primary virtual machine in real time;

determining a synchronization transfer frequency based on the network state and the memory state of the primary virtual machine;

synchronizing state information of the primary virtual machine to the standby virtual machine on the second physical machine based on the synchronization transfer frequency.

Further, in the above method, the method further includes:

performing heartbeat detection on the first physical machine and the primary virtual machine through an internal network between the first physical machine and the second physical machine, so as to detect whether the first physical machine or the primary virtual machine has failed.

Further, in the above method, the virtual hard disk of the primary virtual machine and the virtual hard disk of the standby virtual machine are independent of each other;

wherein the method further comprises:

writing the related files of the primary virtual machine into the virtual hard disk of the primary virtual machine and, at the same time, into the virtual hard disk of the standby virtual machine.

Further, in the above method, the state information includes at least any one of:

central processing unit, memory, hard disk and network.

According to another aspect of the present application, there is also provided a non-volatile storage medium having computer readable instructions stored thereon, which, when executed by a processor, cause the processor to implement the virtualization fault tolerance method as described above.

According to another aspect of the present application, there is also provided an apparatus for virtualizing fault tolerance, wherein the apparatus includes:

one or more processors;

a computer-readable medium for storing one or more computer-readable instructions,

which, when executed by the one or more processors, cause the one or more processors to implement the virtualization fault tolerance method as described above.

Compared with the prior art, in the present application a virtual machine manager creates at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creates at least one standby virtual machine on a second physical machine, and copies the virtual hard disk of the primary virtual machine to the second physical machine; starts the primary virtual machine and, after the primary virtual machine is in a fault-tolerant mode, starts the standby virtual machine, opens the external network interface of the primary virtual machine, and blocks the external network interface of the standby virtual machine; synchronizes the state information of the primary virtual machine to the standby virtual machine on the second physical machine; and, if a fault is detected in the first physical machine and/or the primary virtual machine, switches from the primary virtual machine to the standby virtual machine, opens the external network interface of the standby virtual machine, and runs the standby virtual machine with the most recently synchronized state information. Fault tolerance is thereby applied to virtualization, the running efficiency of the virtual machine is improved, hardware differences are shielded, the problem that stand-alone stateful applications cannot be made highly available is solved, and applications deployed on each server are highly available while performance is maintained.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 illustrates a flow diagram of a virtualization fault tolerance method in accordance with an aspect of the subject application;

FIG. 2 illustrates a virtual machine fault-tolerant architecture in a virtualization fault-tolerant method according to an aspect of the subject application;

FIG. 3 illustrates a flow diagram of fault-tolerant virtual machine creation in a virtualization fault-tolerant method according to an aspect of the subject application;

FIG. 4 illustrates a virtualization fault tolerance diagram in a virtualization fault tolerance method according to an aspect of the subject application.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

The present application is described in further detail below with reference to the attached figures.

In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Fig. 1 is a flow chart of a virtualization fault tolerance method according to an aspect of the present application. The method applies a fault-tolerance function to virtualization on Advanced RISC Machines (ARM) servers and is performed on the virtual machine manager side, where the virtual machine manager manages all physical machines and virtual machines. The method includes steps S11, S12, S13 and S14, as follows:

Step S11, the virtual machine manager creates at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creates at least one standby virtual machine on a second physical machine, and copies the virtual hard disk of the primary virtual machine to the second physical machine. Here, the first physical machine and the second physical machine are independent of each other. One or more primary virtual machines may be created on the first physical machine, and the same number of standby virtual machines is automatically created on the second physical machine, so that each primary virtual machine on the first physical machine has a corresponding standby virtual machine on the second physical machine. In a preferred embodiment of the present application, two primary virtual machines are created on the first physical machine, as shown in fig. 2. That is, primary virtual machine 1 (fault-tolerant virtual machine A) and primary virtual machine 2 (fault-tolerant virtual machine B) are created on physical machine 1, and the high availability level of both is set to fault-tolerant mode, so that both primary virtual machines created on physical machine 1 are in the fault-tolerant mode. Thereafter, standby virtual machine 1 (fault-tolerant virtual machine A'), corresponding to fault-tolerant virtual machine A, and standby virtual machine 2 (fault-tolerant virtual machine B'), corresponding to fault-tolerant virtual machine B, are created on physical machine 2; fault-tolerant virtual machines A' and B' are copies of fault-tolerant virtual machines A and B, respectively. Creating two primary virtual machines on the first physical machine is, of course, only a preferred embodiment.

Step S12, the virtual machine manager starts the primary virtual machine and, after the primary virtual machine is in the fault-tolerant mode, starts the standby virtual machine, opens the external network interface of the primary virtual machine, and blocks the external network interface of the standby virtual machine;

Step S13, the virtual machine manager synchronizes the state information of the primary virtual machine to the standby virtual machine on the second physical machine. Here, the state information of the primary virtual machine includes, but is not limited to, at least any one of: Central Processing Unit (CPU), memory, hard disk, and network. Fault-tolerant virtual machine A and fault-tolerant virtual machine B thus each copy and synchronize their network, CPU, memory, hard disk, and other state information in real time, through the internal network of physical machine 1, to standby virtual machine 1 (fault-tolerant virtual machine A') and standby virtual machine 2 (fault-tolerant virtual machine B') on physical machine 2, so that the state information of the primary virtual machines is monitored and synchronized in real time through the virtual machine manager.

Step S14, if the virtual machine manager detects a fault in the first physical machine and/or the primary virtual machine, it switches from the primary virtual machine to the standby virtual machine, opens the external network interface of the standby virtual machine, and runs the standby virtual machine with the most recently synchronized state information. The virtual machine manager thus monitors both the synchronization process and the state information of the primary virtual machine, so that operation can switch to the standby virtual machine in time when the physical machine and/or the primary virtual machine fails, and the whole system remains in a working state even when a fault occurs. Here, a fault of the first physical machine and/or the primary virtual machine includes, but is not limited to, a crash or an unexpected power failure.

Through the steps S11 to S14, the fault tolerance technology is applied to virtualization, the running efficiency of the virtual machine is improved, the hardware difference is shielded, the problem that a single-machine stateful application program cannot be highly available is solved, and the application deployed in each server has extremely high availability while the performance is ensured.
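By way of non-limiting illustration, steps S11 to S14 can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the implementation of the present application: the names `VirtualMachine`, `FaultTolerantPair`, `external_nic_open`, and the host labels are hypothetical, and the state information is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """Hypothetical stand-in for a virtual machine record kept by the manager."""
    name: str
    host: str
    disk: bytes = b""
    running: bool = False
    external_nic_open: bool = False
    state: dict = field(default_factory=dict)


class FaultTolerantPair:
    """Hypothetical model of one primary/standby pair (steps S11 to S14)."""

    def __init__(self, name: str, primary_host: str, standby_host: str, disk_image: bytes):
        # S11: create the primary and its disk, create the standby, copy the disk.
        self.primary = VirtualMachine(name, primary_host, disk=disk_image)
        self.standby = VirtualMachine(name + "'", standby_host, disk=bytes(disk_image))
        self.active = self.primary

    def start(self) -> None:
        # S12: both virtual machines run, but only the primary is reachable externally.
        self.primary.running = True
        self.primary.external_nic_open = True
        self.standby.running = True
        self.standby.external_nic_open = False

    def synchronize(self) -> None:
        # S13: copy the primary's latest state information to the standby.
        self.standby.state = dict(self.primary.state)

    def on_failure_detected(self) -> VirtualMachine:
        # S14: switch to the standby and open its external network interface.
        self.primary.running = False
        self.primary.external_nic_open = False
        self.standby.external_nic_open = True
        self.active = self.standby
        return self.active
```

In this sketch the standby never serves traffic until `on_failure_detected` is called, which mirrors the requirement that only one of the two virtual machines has an open external network interface at any time.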

For example, as shown in fig. 3, the virtual machine manager first creates a primary virtual machine on physical machine 1 and sets it to start in the fault-tolerant (FT) mode. It then creates the virtual hard disk used by the primary virtual machine, which may be an empty disk or a disk from a system template. Meanwhile, a standby virtual machine corresponding to the primary virtual machine is created on physical machine 2, and once the virtual hard disk of the primary virtual machine has been created on physical machine 1, it is copied to physical machine 2 to serve as the virtual hard disk of the standby virtual machine. The primary virtual machine is started on physical machine 1, and a standby virtual machine with the same configuration is started on physical machine 2. Only the external network interface of the primary virtual machine is opened to provide services to the outside; although the external network interface of the standby virtual machine is also brought up, the virtual machine manager blocks it so that the standby virtual machine cannot communicate with the outside. After both virtual machines are running, the virtual machine manager synchronizes the state information of the primary virtual machine on physical machine 1 to the standby virtual machine on physical machine 2 in real time, so that operation can later be switched to the standby virtual machine on physical machine 2 if physical machine 1 and/or the primary virtual machine fails. As the state of the applications running in the primary virtual machine changes, the state information of the primary virtual machine itself changes continuously; the application-related state includes, but is not limited to, the state of the application, the state of its database, memory data, and network connections with the outside. Because the state information of the primary virtual machine on physical machine 1 is continuously synchronized to the standby virtual machine on physical machine 2, the application-related state in the standby virtual machine is consistent with that of the primary virtual machine at all times. If the virtual machine manager detects that the physical machine and/or the primary virtual machine has failed, it immediately switches the standby virtual machine on physical machine 2 to the working state, that is, it immediately opens the external network interface of the standby virtual machine, so that the application continues to run from the latest state before the fault and service continuity is ensured.

Following the above embodiments of the present application, the first physical machine and the second physical machine each include a simulator;

wherein step S13 of synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine specifically includes:

synchronizing, by the simulator, the state information of the primary virtual machine to the standby virtual machine on the second physical machine via an internal network between the first physical machine and the second physical machine.

For example, the first physical machine includes a simulator corresponding to the primary virtual machine, and the second physical machine includes a simulator corresponding to the standby virtual machine. The simulator simulates the hardware configuration and runs an operating system on the simulated hardware to obtain the results and data produced by that run. As shown in fig. 4, the virtual machine manager uses the simulator (qemu-system-aarch64) to transfer the CPU, memory, hard disk, network, and other state information of the primary virtual machine on physical machine 1 to the standby virtual machine on physical machine 2 through the internal network between physical machine 1 and physical machine 2, thereby keeping the state information of the standby virtual machine on physical machine 2 synchronized in real time.
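As a non-limiting illustration of transferring state information over the internal network, the following Python sketch frames one state snapshot for transmission and verifies it on the receiving side. The present application does not specify a wire format; the length-prefixed JSON frame with a SHA-256 digest and the function names `encode_state` and `decode_state` are assumptions made only for this example, and they are not the format used by the simulator itself.

```python
import hashlib
import json
import struct


def encode_state(state: dict) -> bytes:
    """Frame a snapshot as: 4-byte length, 32-byte SHA-256 digest, JSON payload."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return struct.pack("!I", len(payload)) + digest + payload


def decode_state(frame: bytes) -> dict:
    """Verify and unpack a frame produced by encode_state (hypothetical format)."""
    (length,) = struct.unpack("!I", frame[:4])
    digest = frame[4:36]
    payload = frame[36:36 + length]
    # Reject truncated or corrupted snapshots so the standby never applies bad state.
    if len(payload) != length or hashlib.sha256(payload).digest() != digest:
        raise ValueError("corrupted or truncated state frame")
    return json.loads(payload.decode("utf-8"))
```

The integrity check reflects one possible design choice: the standby keeps its previous consistent state when a snapshot arrives damaged, rather than applying a partial update.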

Following the foregoing embodiment of the present application, step S13 of synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine specifically includes:

acquiring a network state and a memory state of the primary virtual machine in real time;

determining a synchronization transfer frequency based on the network state and the memory state of the primary virtual machine;

synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine based on the synchronization transfer frequency.

For example, to synchronize the state information well in step S13, the network state and the memory state of the primary virtual machine are obtained in real time, and a synchronization transfer frequency is then determined from them; that is, the synchronization transfer frequency is determined mainly by changes in the network state and the memory state of the primary virtual machine. For example, the state information of the primary virtual machine may be synchronized to the standby virtual machine on the second physical machine with a micro-checkpoint as the transfer unit; using a micro-checkpoint as the transfer unit is only a preferred embodiment. The state information of the primary virtual machine is then synchronized to the standby virtual machine on the second physical machine at the determined synchronization transfer frequency, so that the state information is transferred at different frequencies depending on how the relevant state of the primary virtual machine changes.
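One way the synchronization transfer frequency could be derived from the network state and the memory state is sketched below in Python. The application does not disclose a formula, so everything here is an assumption for illustration: the function name `sync_interval_ms`, the use of dirty memory pages and buffered outbound packets as the observed quantities, the linear scaling, and the default bounds are all hypothetical.

```python
def sync_interval_ms(dirty_pages: int, buffered_packets: int,
                     min_ms: float = 5.0, max_ms: float = 200.0,
                     page_budget: int = 4096) -> float:
    """Return the delay before the next checkpoint (hypothetical policy).

    More changed memory means a shorter interval; any outbound packet waiting
    to be released forces the shortest interval.
    """
    if dirty_pages < 0 or buffered_packets < 0:
        raise ValueError("counters must be non-negative")
    if buffered_packets > 0:
        # Assumed rule: network activity triggers synchronization as soon as possible.
        return min_ms
    # Scale linearly between max_ms (idle) and min_ms (page budget reached).
    load = min(dirty_pages / page_budget, 1.0)
    return max_ms - load * (max_ms - min_ms)
```

Under these assumptions an idle virtual machine is synchronized every 200 ms, while one that has dirtied its whole page budget or has traffic waiting is synchronized every 5 ms.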

Further, the virtualization fault tolerance method provided by an aspect of the present application further includes:

performing heartbeat detection on the first physical machine and the primary virtual machine through an internal network between the first physical machine and the second physical machine, so as to detect whether the first physical machine or the primary virtual machine has failed.

For example, as shown in fig. 4, to facilitate fault detection of physical machine 1 and the primary virtual machine on it, the virtual machine managers on the two physical machines perform heartbeat detection on the first physical machine and the primary virtual machine through the internal network between physical machine 1 and physical machine 2, and determine from the heartbeat whether the first physical machine or the primary virtual machine has failed. When the primary virtual machine or physical machine 1 can no longer be contacted, the virtual machine manager that manages the physical machines and virtual machines immediately opens the external network interface of the standby virtual machine on physical machine 2. The communication channel of the standby virtual machine is thereby opened, the standby virtual machine can communicate directly with the outside, and the application continues to run.
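The heartbeat detection described here can be illustrated with a simple timeout-based detector. The Python sketch below is an assumption-laden example rather than the disclosed mechanism: the class name `HeartbeatMonitor`, the heartbeat interval, and the rule of declaring a fault after three missed intervals are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector for hosts and virtual machines."""

    def __init__(self, interval_s: float = 1.0, missed_limit: int = 3):
        self.interval_s = interval_s
        self.missed_limit = missed_limit
        self.last_seen = {}

    def beat(self, source: str, now: float) -> None:
        # Record the arrival time of a heartbeat received over the internal network.
        self.last_seen[source] = now

    def failed(self, source: str, now: float) -> bool:
        last = self.last_seen.get(source)
        if last is None:
            # No heartbeat was ever received from this source.
            return True
        # A fault is declared once more than missed_limit intervals have elapsed.
        return now - last > self.interval_s * self.missed_limit
```

A manager could poll `failed("physical-machine-1", now)` and `failed("primary-vm", now)` and open the standby virtual machine's external network interface as soon as either returns true.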

Following the foregoing embodiment of the present application, the virtual hard disk of the primary virtual machine and the virtual hard disk of the standby virtual machine are independent of each other;

wherein the method further comprises:

writing the related files of the primary virtual machine into the virtual hard disk of the primary virtual machine and, at the same time, into the virtual hard disk of the standby virtual machine.

For example, as shown in fig. 4, the virtual hard disk of the primary virtual machine on the first physical machine and the virtual hard disk of the standby virtual machine on the second physical machine are independent of each other, and each is stored as its own file (in qcow2 format). Before system startup, the contents of virtual hard disk 1 and virtual hard disk 2 are identical. After both the primary virtual machine and the standby virtual machine have started, the virtual machine manager writes the related files of the primary virtual machine into virtual hard disk 1 of the primary virtual machine and, at the same time, into virtual hard disk 2 of the standby virtual machine, so that files and other contents of virtual hard disk 1 are synchronized to virtual hard disk 2 in real time.
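The simultaneous write to two independent virtual hard disks can be illustrated as follows. This Python sketch is a simplified model under stated assumptions: `MirroredDisk` is a hypothetical name, and in-memory `io.BytesIO` buffers stand in for the two qcow2 files, whose actual format is not handled here.

```python
import io


class MirroredDisk:
    """Hypothetical writer that applies every write to both disk replicas."""

    def __init__(self, primary_disk, standby_disk):
        # Both arguments are seekable binary file-like objects.
        self.replicas = (primary_disk, standby_disk)

    def write(self, offset: int, data: bytes) -> None:
        # Apply the same write at the same offset on each independent disk.
        for replica in self.replicas:
            replica.seek(offset)
            replica.write(data)


# Both disks start with identical contents, as before system startup.
disk_1 = io.BytesIO(bytes(8))
disk_2 = io.BytesIO(bytes(8))
MirroredDisk(disk_1, disk_2).write(2, b"ab")
```

Because each write is applied to both replicas, the two disks stay identical after startup without sharing any storage.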

In fig. 4, network traffic of the primary virtual machine on physical machine 1 passes through the virtual machine manager before reaching the outside. Because the standby virtual machine has the same IP address, MAC address, and other network information as the primary virtual machine, the virtual machine manager on physical machine 2 must intercept the standby virtual machine's outbound traffic; that is, it blocks the external network interface of the standby virtual machine on physical machine 2 to prevent network conflicts with the primary virtual machine. The virtualization fault-tolerance technique is thereby applied on ARM servers.

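According to another aspect of the present application, there is also provided an apparatus for virtualization fault tolerance, wherein the apparatus includes:

one or more processors;

a computer-readable medium for storing one or more computer-readable instructions,

which, when executed by the one or more processors, cause the one or more processors to implement the virtualization fault tolerance method as described above.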

Here, the details of each embodiment of the device for virtualization fault tolerance may specifically refer to the corresponding part of the embodiment of the virtualization fault tolerance method at the virtual machine manager end, and are not described herein again.

In summary, in the present application a virtual machine manager creates at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creates at least one standby virtual machine on a second physical machine, and copies the virtual hard disk of the primary virtual machine to the second physical machine; starts the primary virtual machine and, after the primary virtual machine is in a fault-tolerant mode, starts the standby virtual machine, opens the external network interface of the primary virtual machine, and blocks the external network interface of the standby virtual machine; synchronizes the state information of the primary virtual machine to the standby virtual machine on the second physical machine; and, if a fault is detected in the first physical machine and/or the primary virtual machine, switches from the primary virtual machine to the standby virtual machine, opens the external network interface of the standby virtual machine, and runs the standby virtual machine with the most recently synchronized state information. Fault tolerance is thereby applied to virtualization, hardware differences are shielded, and applications deployed on each server are highly available while performance is maintained.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

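1. A virtualization fault tolerance method, applied to a virtual machine manager, wherein the method comprises the following steps:

creating at least one primary virtual machine and a virtual hard disk thereof on a first physical machine, correspondingly creating at least one standby virtual machine on a second physical machine, and copying the virtual hard disk of the primary virtual machine to the second physical machine;

starting the primary virtual machine and, after the primary virtual machine is in a fault-tolerant mode, starting the standby virtual machine, opening an external network interface of the primary virtual machine, and simultaneously blocking the external network interface of the standby virtual machine;

synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine;

if a fault is detected in the first physical machine and/or the primary virtual machine, switching from the primary virtual machine to the standby virtual machine, and opening the external network interface of the standby virtual machine so that the standby virtual machine runs with the most recently synchronized state information.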

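2. The method of claim 1, wherein the first physical machine and the second physical machine each include a simulator thereon;

wherein the synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine comprises:

synchronizing, by the simulator, the state information of the primary virtual machine to the standby virtual machine on the second physical machine via an internal network between the first physical machine and the second physical machine.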

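3. The method of claim 2, wherein the synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine comprises:

acquiring a network state and a memory state of the primary virtual machine in real time;

determining a synchronization transfer frequency based on the network state and the memory state of the primary virtual machine;

synchronizing the state information of the primary virtual machine to the standby virtual machine on the second physical machine based on the synchronization transfer frequency.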

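4. The method of claim 1, wherein the method further comprises:

performing heartbeat detection on the first physical machine and the primary virtual machine through an internal network between the first physical machine and the second physical machine, so as to detect whether the first physical machine or the primary virtual machine has failed.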

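5. The method according to any one of claims 1 to 4, wherein the virtual hard disk of the primary virtual machine and the virtual hard disk of the standby virtual machine are independent of each other;

wherein the method further comprises:

writing the related files of the primary virtual machine into the virtual hard disk of the primary virtual machine and, at the same time, into the virtual hard disk of the standby virtual machine.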

6. The method of any one of claims 1 to 4, wherein the state information comprises at least any one of:

central processing unit, memory, hard disk and network.

7. A non-transitory storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to implement the method of any one of claims 1 to 6.

8. An apparatus for virtualization fault tolerance, wherein the apparatus comprises:

one or more processors;

a computer-readable medium for storing one or more computer-readable instructions,

which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 6.