CN109308232B - Method, device and system for rollback after virtual machine live migration fault - Google Patents


Info

Publication number
CN109308232B
Authority
CN
China
Prior art keywords
virtual machine
live migration
original
rollback
machine
Legal status
Active
Application number
CN201710630142.XA
Other languages
Chinese (zh)
Other versions
CN109308232A (en)
Inventor
张超 (Zhang Chao)
Current Assignee
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201710630142.XA
Publication of CN109308232A
Application granted
Publication of CN109308232B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F11/1484 Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method, device, and system for rollback after a virtual machine live migration fault. The method comprises: determining the stage at which the live migration fault occurred and, if it occurred in the stage in which operation of the virtual machine is suspended, performing the following steps: saving the running-state data of the original virtual machine on the source physical machine side; cleaning up the original virtual machine and its corresponding back-end resources; creating a new virtual machine and its corresponding back-end resources according to the saved running-state data of the original virtual machine; and starting the new virtual machine. The method solves the problem that rollback after a live migration fault fails because the suspend operation of some existing operating systems leaves the front-end and back-end resources in inconsistent states.

Description

Method, device and system for rollback after virtual machine live migration fault
Technical Field
The present application relates to the field of virtual machines, and in particular to a method, device, and system for rollback after a virtual machine live migration fault. The present application further relates to a data processing method.
Background
Virtual machine live migration is a key technology in cloud computing operations. By moving a running virtual machine from one physical machine to another, live migration enables dynamic scheduling, load balancing, proactive fault tolerance against physical-machine faults, and online maintenance of computing resources. This improves system reliability and allows the virtual machine to obtain more computing power, more memory, and faster communication.
At present, the mainstream live migration method is memory pre-copy migration, which works as follows. While the virtual machine continues to run normally on the source physical host, its memory is copied iteratively from the source host to the destination host. Because the virtual machine keeps running throughout the copy, pages that have already been transferred may be rewritten, so the memory can only be brought up to date at the destination over multiple rounds of copying. Once most of the memory has been copied, the virtual machine monitor suspends the virtual machine and transfers the remaining dirty memory pages, together with the CPU state and the virtual device state, to the destination, completing the migration.
When live migration fails, the virtual machine must continue running on the source physical machine; this is called rollback after a live migration fault. The existing rollback method simply resumes the virtual machine on the source physical machine. Its rationale is that, when live migration fails, the virtual machine is still located at the source, its context has not changed, and the dirty data on the source physical machine can be reused, so returning the virtual machine directly to the running state achieves a fast rollback.
However, the above existing method for rollback after a virtual machine live migration failure has certain drawbacks.
In memory pre-copy migration, a fault may occur at any stage of the live migration, and each stage imposes different rollback requirements. If the fault occurs during memory pre-copy, the live migration must be terminated and the resources reserved at the destination released, after which the virtual machine can resume running on the source physical machine. If the fault occurs during the stop-and-copy stage, resuming the virtual machine requires that its front-end and back-end resources remain consistent, and the existing rollback method cannot guarantee this for every guest operating system. Different operating systems handle the suspend operation of the stop-and-copy stage differently, which makes the connection state between front-end and back-end resources uncertain: some operating systems break the front-end/back-end connection (for example, some Linux systems disconnect the network), while others keep it (for example, the network card driver of Windows systems).
If the front-end and back-end resources of the migrating virtual machine are disconnected during stop-and-copy, but the back-end state is assumed to be unchanged and the back end is not reinitialized, the connection cannot be re-established when the virtual machine is resumed, because the front-end and back-end states are inconsistent. Conversely, if the front-end and back-end resources remain connected during stop-and-copy and the virtual machine reconnects when it resumes, the front-end and back-end states again diverge. In either case, rollback of the live migration fails.
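The two failure modes above can be reproduced with a toy model. The following is a minimal Python sketch under simplifying assumptions of our own: each end of a split device is reduced to a single CONNECTED/DISCONNECTED flag, a direct resume succeeds only when the two flags agree, and the names `Link`, `direct_resume_ok`, and `rebuild_ok` are hypothetical. It illustrates why no single fixed back-end policy suits both kinds of guest, whereas tearing down and recreating both ends together does.

```python
from enum import Enum


class Link(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"


def frontend_after_suspend(disconnects_on_suspend: bool) -> Link:
    # Some guests (e.g. certain Linux systems) drop the link when suspended;
    # others (e.g. a Windows network card driver) keep it up.
    return Link.DISCONNECTED if disconnects_on_suspend else Link.CONNECTED


def direct_resume_ok(disconnects_on_suspend: bool, reinit_backend: bool) -> bool:
    """Prior-art rollback: resume in place with one fixed back-end policy."""
    front = frontend_after_suspend(disconnects_on_suspend)
    # Reinitialising resets the back end; otherwise it keeps its
    # pre-suspend CONNECTED state.
    back = Link.DISCONNECTED if reinit_backend else Link.CONNECTED
    return front == back


def rebuild_ok(disconnects_on_suspend: bool) -> bool:
    """Rebuild-style rollback: both ends are torn down and recreated together."""
    # The argument is deliberately unused: the guest's suspend behaviour
    # no longer matters because both ends start from the same fresh state.
    front = back = Link.DISCONNECTED
    return front == back
```

With `reinit_backend=True`, a guest that keeps its link up fails to resume; with `reinit_backend=False`, a guest that drops its link fails. Only `rebuild_ok` returns `True` for both kinds of guest.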
Disclosure of Invention
The present application provides a method for rollback after a virtual machine live migration fault, to solve the prior-art problem that rollback fails because the suspend operation of some operating systems leaves the front-end and back-end resources inconsistent. The present application further provides a device and a system for rollback after a virtual machine live migration fault, as well as a data processing method.
The present application provides a method for rollback after a virtual machine live migration fault, comprising:
determining the stage at which the live migration fault occurred and, if it occurred in the stage in which operation of the virtual machine is suspended, performing the following steps:
saving the running-state data of the original virtual machine on the source physical machine side;
cleaning up the original virtual machine and its corresponding back-end resources;
creating a new virtual machine and its corresponding back-end resources according to the saved running-state data of the original virtual machine; and
starting the new virtual machine.
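The claimed sequence can be sketched as a single dispatch function. This is a minimal, hypothetical Python model, not the patented implementation: the `Stage`, `VM`, and `rollback_after_fault` names are illustrative, the running state is a plain dict, and back-end resources are represented by device names stored under an assumed `"devices"` key.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    PRE_COPY = auto()       # VM still running on the source
    STOP_AND_COPY = auto()  # VM operation suspended


@dataclass
class VM:
    state: dict                                   # running-state data
    backends: list = field(default_factory=list)  # back-end resources
    running: bool = False


def rollback_after_fault(vm: VM, stage: Stage) -> tuple[VM, list[str]]:
    """Return the VM that should run on the source, plus the steps taken."""
    steps: list[str] = []
    if stage is not Stage.STOP_AND_COPY:
        # Outside the suspended stage the prior-art rollback applies.
        steps.append("prior-art pre-copy rollback")
        return vm, steps
    saved = dict(vm.state)                        # 1. save running-state data
    steps.append("save state")
    vm.backends.clear()                           # 2. clean up VM and back ends
    steps.append("clean up")
    new_vm = VM(state=dict(saved), backends=list(saved["devices"]))
    steps.append("create")                        # 3. rebuild from saved data
    new_vm.running = True                         # 4. start the new VM
    steps.append("start")
    return new_vm, steps
```

For a fault in the suspended stage the function returns a fresh `VM` object whose back ends were created from the saved state, rather than resuming the old object in place.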
Optionally, cleaning up the original virtual machine and its corresponding back-end resources comprises:
releasing the process of the original virtual machine and cleaning up the back-end resources of the original virtual machine on the source physical machine.
Optionally, cleaning up the back-end resources of the original virtual machine on the source physical machine comprises:
disconnecting the storage and network connections; and
cleaning up, on the back-end device, the storage and network connection drivers corresponding to the original virtual machine.
Optionally, creating a new virtual machine and its corresponding back-end resources according to the saved running-state data of the original virtual machine comprises:
loading the original virtual machine process from the saved running-state data of the original virtual machine; and
creating, on the source physical machine and according to the saved running-state data, back-end resources that meet the parameter requirements of the original virtual machine.
Optionally, the back-end resources comprise:
the storage and network drivers corresponding to the virtual machine.
Optionally, after the new virtual machine and its corresponding back-end resources are created and before the new virtual machine is started, the method further comprises:
modifying a state identifier in the control structure of the virtual machine to indicate that the virtual machine needs to trigger power management.
Optionally, starting the new virtual machine comprises:
resuming operation of the virtual processors;
enabling timers and interrupts;
triggering power management of the virtual machine devices; and
scanning the devices and triggering the front-end/back-end connection.
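The start-up ordering, together with the optional control-structure flag set beforehand, might be modelled as follows. This is a hypothetical Python sketch: `ControlStructure.needs_power_management` stands in for the unspecified state identifier, and each step is recorded as a string event rather than performed on real virtual hardware.

```python
from dataclasses import dataclass, field


@dataclass
class ControlStructure:
    # Hypothetical stand-in for the state identifier in the VM control structure.
    needs_power_management: bool = False


@dataclass
class NewVM:
    devices: list
    ctrl: ControlStructure = field(default_factory=ControlStructure)
    events: list = field(default_factory=list)


def mark_for_power_management(vm: NewVM) -> None:
    """Run after the VM and its back ends are created, before start-up."""
    vm.ctrl.needs_power_management = True


def start_new_vm(vm: NewVM) -> None:
    vm.events.append("resume vCPUs")
    vm.events.append("enable timers and interrupts")
    if vm.ctrl.needs_power_management:
        vm.events.append("trigger device power management")
        vm.ctrl.needs_power_management = False  # handled once
    for dev in vm.devices:  # scan devices and trigger front/back-end connection
        vm.events.append(f"connect {dev}")
```

Setting the flag before start-up is what makes the guest run its device power-management path, so the front ends reconnect to the freshly created back ends instead of assuming an existing connection.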
Optionally, the method is applicable to rollback after a live migration fault of the virtual machine in the Xen virtualization architecture.
The present application further provides a device for rollback after a virtual machine live migration fault, the device comprising:
a fault stage determination unit, configured to determine the stage at which the live migration fault occurred and, if it occurred in the stage in which operation of the virtual machine is suspended, to trigger the following units;
an original virtual machine running-state data saving unit, configured to save the running-state data of the original virtual machine on the source physical machine side;
an original virtual machine and back-end resource cleanup unit, configured to clean up the original virtual machine and its corresponding back-end resources;
a new virtual machine and back-end resource creation unit, configured to create a new virtual machine and its corresponding back-end resources according to the saved running-state data of the original virtual machine; and
a new virtual machine startup unit, configured to start the new virtual machine.
The present application further provides a system, comprising:
a processor;
a system control unit;
a system memory;
a non-volatile memory or storage device;
a network interface;
an input/output (I/O) device;
a program of instructions;
the system memory and the non-volatile memory or storage device store a temporary copy and a persistent copy of the program of instructions, respectively; when the program of instructions is executed by the processor, the system performs the method described above.
The present application further provides a data processing method, comprising the following steps:
determining that live migration of a first virtual machine has failed, the failure occurring in the stage in which operation of the first virtual machine is suspended;
cleaning up the first virtual machine and its corresponding back-end resources, and saving the running-state data of the first virtual machine; and
creating a second virtual machine and its corresponding back-end resources according to the running-state data of the first virtual machine.
Optionally, the back-end resources comprise a driver that, during virtualization, calls the local device driver according to request information from the virtual machine, thereby accessing the physical hardware.
Compared with the prior art, the present application has the following advantages:
When a virtual machine live migration fault occurs in the stop-and-copy stage, the virtual machine and its corresponding back-end resources are cleaned up directly, a new virtual machine is rebuilt from the running-state data of the original virtual machine, and the new virtual machine is started. The new virtual machine thus reaches the same running state as the original one, completing the resumption of the original virtual machine on the source physical machine and avoiding failures to resume that arise when the front-end and back-end resources cannot be brought into agreement.
Drawings
Fig. 1 is a flowchart of a method for rollback after a virtual machine live migration fault according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of a fault occurring during virtual machine live migration according to the first embodiment of the present application;
Fig. 3 is an architecture diagram of the operation of the method for rollback after a virtual machine live migration fault according to the present application;
Fig. 4 is a schematic diagram of virtual machine live migration provided by the present application;
Fig. 5 is a complete schematic diagram of rollback after a virtual machine live migration fault provided by the present application;
Fig. 6 is a block diagram of the units of a device for rollback after a virtual machine live migration fault according to a second embodiment of the present application;
Fig. 7 is a schematic diagram of a system according to a third embodiment of the present application;
Fig. 8 is a flowchart of a data processing method according to a fourth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many forms other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention should therefore not be construed as limited to the embodiments set forth herein.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. However, it should be understood by those skilled in the art that the purpose of the present description is not to limit the technical solution of the present application to the specific embodiments disclosed in the present description, but to cover all modifications, equivalents, and alternative embodiments consistent with the technical solution of the present application.
References in the specification to "an embodiment," "this embodiment," or "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Embodiments of the present application may be implemented in software, hardware, firmware, a combination thereof, or otherwise. Embodiments of the application may also be implemented as instructions stored on a transitory or non-transitory machine-readable medium (e.g., a computer-readable medium) that may be read and executed by one or more processors. A machine-readable medium includes any storage device, mechanism, or other physical structure that stores or transmits information in a form readable by a machine. For example, a machine-readable medium may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and others.
In the drawings provided in this specification, some structural or methodical features are presented in a particular arrangement and/or order. It is to be understood that such specific arrangements and/or orders are not required. In some embodiments, the features may be organized in a different arrangement and/or order than shown in the figures. Furthermore, the inclusion of a feature in a structure or method in a drawing does not imply that the feature is included in all embodiments; in some embodiments the feature may be omitted or combined with other features.
The present application provides a method for rollback after a virtual machine live migration fault, a corresponding device, and a system, each of which is described in detail in the following embodiments.
The basic idea of the rollback method provided by the present application is as follows: when a live migration fault occurs in the stop-and-copy stage, the virtual machine and its corresponding back-end resources are cleaned up directly, a new virtual machine and its back-end resources are rebuilt from the running-state data of the original virtual machine, and the new virtual machine is started, so that the virtual machine whose live migration failed resumes running on the source physical machine. The method is illustrated in Fig. 5, a complete schematic diagram of rollback after a virtual machine live migration fault provided by the present application.
The rollback method runs on the virtual machine management platform as part of the virtual machine management program. For example, in the Xen virtualization architecture, the Xen hypervisor sits between the operating systems and the hardware. It provides virtualized hardware resources to the operating system kernels running above it, manages and allocates those resources, and keeps the upper-layer virtual machines (called domains) isolated from one another. Domain 0 is a privileged domain created by the hypervisor; it runs on top of Xen, acts as the administrator, has the privilege to access hardware directly and to manage the other, unprivileged domains (Domain U), and also provides virtual resource services. In this embodiment, the rollback method likewise runs in Domain 0 as a management module of the virtual machine runtime environment. Its operating architecture is shown in Fig. 3, an architecture diagram of the rollback method in the Xen virtualization architecture provided by this embodiment.
A first embodiment of the present application provides a method for rollback after a virtual machine live migration fault. Fig. 1 is a flowchart of this method; the method is described below with reference to Fig. 1.
Block S101: Determine the stage at which the live migration fault occurred; if it occurred in the stage in which operation of the virtual machine is suspended, perform the subsequent steps of the method.
Virtual machine live migration, also called dynamic or online migration, refers to moving a running virtual machine from a source physical host to a destination physical host while ensuring that the applications in the virtual machine keep running normally during the migration.
For example, Xen virtual machines use the mainstream memory pre-copy migration method, which works as follows: the virtual machine manager marks all memory pages as dirty and copies them to the destination host; it then copies memory iteratively, with each round transferring the pages that were dirtied during the previous round. When the amount of memory still to be transferred falls below a threshold, or the number of iterations exceeds a configured maximum, iterative copying stops, and the remaining dirty pages are copied to the destination physical host together with the CPU state and the virtual device state.
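The pre-copy loop just described can be simulated in a few lines. This Python sketch is illustrative only: the page-dirtying behaviour is supplied by a caller-provided function `dirtied_in_round` (a hypothetical parameter), and the loop stops once the dirty set is below `threshold` or `max_rounds` rounds have run. It is not Xen's actual migration code.

```python
from typing import Callable, Iterable


def pre_copy(num_pages: int,
             dirtied_in_round: Callable[[int], Iterable[int]],
             threshold: int,
             max_rounds: int) -> tuple[int, list[int], int]:
    """Simulate iterative memory pre-copy.

    Returns (rounds, pages sent per round, dirty pages left for stop-and-copy).
    """
    dirty = set(range(num_pages))  # first round: every page counts as dirty
    sent: list[int] = []
    rounds = 0
    while True:
        rounds += 1
        sent.append(len(dirty))                # transfer this round's dirty set
        dirty = set(dirtied_in_round(rounds))  # pages the guest rewrote meanwhile
        if len(dirty) < threshold or rounds >= max_rounds:
            # Remaining pages go with CPU and device state in stop-and-copy.
            return rounds, sent, len(dirty)
```

For example, `pre_copy(1000, lambda r: range(200 // r), 50, 10)` converges after five rounds and leaves 40 pages for the stop-and-copy stage, while a guest that keeps dirtying 500 pages per round is cut off by `max_rounds` instead.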
In memory pre-copy migration, the live migration process can be divided into two stages, a first, memory pre-copy stage and a second, stop-and-copy stage, as shown in Fig. 2, a schematic diagram of faults occurring during virtual machine live migration. The first stage comprises:
201 start migration → 202 reserve resources at the destination → 203 iteratively copy the virtual machine memory → 204 stop the virtual machine.
The second stage comprises:
204 stop the virtual machine → 205 copy the virtual machine state and device information → 206 activate the virtual machine at the destination.
If a fault occurs during live migration in the second, stop-and-copy stage, the rollback method of the present application is executed.
If the fault occurs in the first, memory pre-copy stage, rollback is handled by prior-art methods, which are not described here.
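The stage check of Block S101 amounts to a lookup from the failed step to a rollback strategy. In this hypothetical Python sketch the step numbers mirror Fig. 2; treating step 204 as part of the stop-and-copy stage is our assumption, on the grounds that the virtual machine is already suspended at that point. The strategy strings are illustrative labels, not real commands.

```python
PRE_COPY_STEPS = {
    201: "start migration",
    202: "reserve destination resources",
    203: "iteratively copy memory",
}
STOP_AND_COPY_STEPS = {
    204: "stop the virtual machine",
    205: "copy VM state and device information",
    206: "activate the destination VM",
}


def rollback_strategy(failed_step: int) -> str:
    if failed_step in PRE_COPY_STEPS:
        # Prior art: abort, release the destination reservation, keep running.
        return "abort migration and release destination resources"
    if failed_step in STOP_AND_COPY_STEPS:
        # VM already suspended: save state, clean up, rebuild, and start.
        return "save state, clean up, rebuild and start"
    raise ValueError(f"unknown migration step: {failed_step}")
```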
Block S102: Save the running-state data of the original virtual machine on the source physical machine side.
Saving a virtual machine's running state is a common virtualization capability that takes many forms; the widely used virtual machine snapshot and clone techniques, for example, rely on this state-saving logic.
The running-state data of a virtual machine comprises its CPU state (general-purpose and control registers), its complete memory image, and its device state (network card, graphics card, interrupt controller, etc.), and includes information such as the virtual machine's creation parameters and resource configuration. The purpose of saving this data on the source physical machine side is to preserve all of the original virtual machine's state, including its creation, configuration, and runtime information, as an archive of the virtual machine.
This embodiment uses the saving of a complete virtual machine in Xen as an example. The process is similar to the save phase of the Xen virtual machine "save/restore" function. Because the virtual machine in this embodiment is already stopped, saving is simple: the entire virtual machine memory, the state of the emulated devices, the VM-related state, and the state of the virtual machine control structure (VMCS) are written directly in a predetermined order.
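Writing the state "in a predetermined order" means the reader can rely on that order when loading it back. The Python sketch below illustrates the idea with a simple tag-length-value layout built on the standard `struct` module; the section names, their order, and the 8-byte header format are our own assumptions and do not reproduce Xen's actual save-file format.

```python
import struct

# Assumed fixed order of the saved sections.
SECTION_ORDER = ("memory", "emulated_devices", "vm_state", "vmcs")
HEADER = "<II"  # little-endian (tag, payload length)


def save_state(sections: dict) -> bytes:
    out = bytearray()
    for tag, name in enumerate(SECTION_ORDER):
        payload = sections[name]
        out += struct.pack(HEADER, tag, len(payload))
        out += payload
    return bytes(out)


def load_state(blob: bytes) -> dict:
    sections, offset = {}, 0
    for tag, name in enumerate(SECTION_ORDER):
        got_tag, length = struct.unpack_from(HEADER, blob, offset)
        if got_tag != tag:  # enforce the predetermined order
            raise ValueError(f"expected section {name}, found tag {got_tag}")
        offset += struct.calcsize(HEADER)
        sections[name] = blob[offset:offset + length]
        offset += length
    return sections
```

Because the VM is already stopped, no section can change while it is being written, so a single sequential pass is sufficient.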
Block S103: Clean up the original virtual machine and its corresponding back-end resources.
After the running-state data has been saved, the original virtual machine and its corresponding back-end resources are cleaned up.
Back-end resources are the drivers that, during virtualization, call the local device drivers according to requests from the virtual machine in order to access the physical machine's real hardware; the most important back-end resources are storage and network resources. In the Xen virtualization architecture, for example, back-end devices are created in the privileged Domain 0 and front-end devices in the user domains (Domain U). At run time, each user-domain operating system sends requests to its front-end device, which forwards the request together with the user domain's identity to the back-end device in the privileged domain; the back-end device then uses the device driver to access the hardware according to the request. All real hardware access is therefore initiated by the privileged domain's back-end devices calling local device drivers, while the front-end devices only forward requests. In this arrangement, Domain U is the virtualized guest and Domain 0 is the virtual machine administrator, which contains the storage and network device drivers. Correspondingly, the front end of each paravirtualized guest contains drivers matching those in the administrator for operating the network and disk, providing the guest's storage and network connections, while each fully virtualized guest has a corresponding daemon in the virtual machine administrator that performs network and disk access on its behalf.
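The request path of this split-driver model can be sketched with plain Python objects. All class names are hypothetical, and direct method calls stand in for the shared-memory rings and event channels that real Xen front and back ends use; the sketch shows only who forwards and who touches hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    domid: int  # identity of the requesting user domain
    op: str
    sector: int


class LocalDiskDriver:
    """Stand-in for the real device driver in the privileged domain."""

    def __init__(self) -> None:
        self.accesses: list[tuple[int, str, int]] = []

    def access(self, req: Request) -> str:
        self.accesses.append((req.domid, req.op, req.sector))
        return f"{req.op} sector {req.sector}: ok"


class Backend:
    """Lives in Domain 0; the only component that reaches the hardware."""

    def __init__(self, driver: LocalDiskDriver) -> None:
        self.driver = driver

    def handle(self, req: Request) -> str:
        return self.driver.access(req)


class Frontend:
    """Lives in a Domain U guest; it only forwards requests."""

    def __init__(self, domid: int, backend: Backend) -> None:
        self.domid = domid
        self.backend = backend

    def submit(self, op: str, sector: int) -> str:
        # Forward the request tagged with this guest's identity.
        return self.backend.handle(Request(self.domid, op, sector))
```

Because all hardware access goes through the back end, the back end holds per-guest connection state, which is exactly what must be torn down and recreated during rollback.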
Clearing the back-end resources of the original virtual machine essentially means removing the back-end drivers corresponding to that virtual machine, that is, the drivers that receive I/O and network requests from the virtualized client and then call the local device drivers to perform real hardware access. The process is to first disconnect the network and storage connections of the virtual machine and then remove the disconnected back-end drivers.
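The ordering matters: connections are disconnected before the drivers are removed, so no new requests reach a driver that is being torn down. A minimal sketch, assuming a hypothetical in-memory table `backends` that maps a domain ID to its back-end driver records; real Xen tooling manages back ends through its own interfaces, which are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BackendDriver:
    """Hypothetical record of one back-end driver instance serving a VM."""
    kind: str               # "storage" or "network"
    connected: bool = True


def clean_backends(backends: dict[int, list[BackendDriver]], domid: int) -> list[str]:
    """Remove one VM's back-end drivers: disconnect everything first, then delete."""
    steps: list[str] = []
    drivers = backends.get(domid, [])
    # Step 1: disconnect storage and network so no new I/O reaches the drivers.
    for drv in drivers:
        drv.connected = False
        steps.append(f"disconnect {drv.kind}")
    # Step 2: drop the now-disconnected drivers; other VMs' entries are untouched.
    backends.pop(domid, None)
    steps.append("remove back-end drivers")
    return steps
```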
Cleaning up the original virtual machine means releasing the virtual machine's process. When the virtual machine is created by emulation with a virtual machine emulator, the virtual machine process is the emulator's process. While the virtual machine process is being released, the memory of the virtual machine is left unchanged. For example, when the emulator QEMU is used as the virtual machine creation tool, each virtual machine corresponds to one QEMU process on the physical host, and cleaning up the virtual machine only requires cleaning up that QEMU process.
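The point of this step is that only the emulator process is released while guest memory survives. The sketch below models that under stated assumptions: `VirtualMachine`, `GuestMemory`, and `release_vm_process` are hypothetical names, and removing the PID from a `running` set stands in for actually terminating the QEMU process.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GuestMemory:
    """Guest RAM, assumed here to be held separately from the emulator process."""
    pages: bytearray


@dataclass
class VirtualMachine:
    """Hypothetical VM record: memory plus the PID of its emulator (e.g. QEMU) process."""
    domid: int
    memory: GuestMemory
    emulator_pid: Optional[int]


def release_vm_process(vm: VirtualMachine, running: set[int]) -> None:
    """Release the VM's emulator process while leaving its memory intact."""
    if vm.emulator_pid is not None:
        # Stand-in for terminating the process; a real tool would signal it.
        running.discard(vm.emulator_pid)
        vm.emulator_pid = None
    # vm.memory is deliberately not freed, so the saved state stays consistent.
```

Calling the function twice is harmless, which suits a rollback path that may be retried.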
Block S104: create a new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine.
After the original virtual machine and its corresponding back-end resources have been cleaned up, a new virtual machine and its corresponding back-end resources are created according to the saved running state data of the original virtual machine. The virtual machine and its back-end resources are always cleaned up or created together, so that the states of the virtual machine's front-end and back-end resources remain consistent.
Creating the new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine means re-creating, from the saved information of the original virtual machine (such as its CPU state, complete memory image, and device state), a virtual machine identical to the saved configuration, and newly creating, from the saved state information, back-end resources that meet the parameter requirements of the original virtual machine.
In this embodiment, a Xen fully virtualized virtual machine is rebuilt in the same way as in the restore phase of the Xen virtual machine "save/restore" function: a new virtual machine is created according to the saved configuration information, and the memory of the original virtual machine, the related virtual machine state, and the virtual machine control structure (VMCS) are restored onto it. When the emulator QEMU is used as the virtual machine creation tool, the virtual machine is created by loading a QEMU process. Memory recovery first restores the memory layout so that it matches the original virtual machine, and then restores the memory data.
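The two-phase memory recovery (layout first, then data) can be sketched as follows. The `SavedState` fields, the `(start_page, page_count)` layout encoding, and the 4 KiB page size are illustrative assumptions, not the Xen save-image format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class SavedState:
    """Hypothetical snapshot of the original VM taken before cleanup."""
    cpu: dict[str, int]                 # saved register values
    layout: list[tuple[int, int]]       # memory regions as (start_page, page_count)
    pages: dict[int, bytes]             # page number -> saved contents
    devices: dict[str, dict] = field(default_factory=dict)


def restore_memory(state: SavedState) -> dict[int, bytearray]:
    """Rebuild guest memory in two phases: reproduce the layout, then copy data."""
    memory: dict[int, bytearray] = {}
    # Phase 1: recreate exactly the regions the original VM had, zero-filled.
    for start, count in state.layout:
        for pfn in range(start, start + count):
            memory[pfn] = bytearray(PAGE_SIZE)
    # Phase 2: copy each saved page into the recreated layout.
    for pfn, data in state.pages.items():
        if pfn not in memory:
            raise ValueError(f"page {pfn} lies outside the saved layout")
        if len(data) > PAGE_SIZE:
            raise ValueError(f"page {pfn} is larger than {PAGE_SIZE} bytes")
        memory[pfn][: len(data)] = data
    return memory
```

Restoring the layout before the data lets the copy phase reject any page that does not belong to the original virtual machine.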
Meeting the parameter requirements of the original virtual machine means satisfying the virtual machine's access requirements for the storage and network resources of the source physical machine, so that its storage and network connections can be established. Mirroring the removal of back-end resources, the back-end resources of the new virtual machine are created by creating, in the virtual machine manager, the drivers that correspond to the front-end resources of the original virtual machine and provide its storage and network connections.
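Creating back-end resources is the mirror of removing them: one back-end driver per saved front-end device, carrying the same parameters. A hypothetical sketch; `FrontendSpec`, `create_backends`, and the dictionary fields are assumptions, and the `vbd`/`vif` labels merely borrow Xen's names for virtual block and network devices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FrontendSpec:
    """Saved description of one front-end device of the original VM (hypothetical)."""
    kind: str    # "vbd" (virtual block device) or "vif" (virtual network interface)
    target: str  # e.g. a disk image path or a bridge name


def create_backends(domid: int, frontends: list[FrontendSpec]) -> list[dict]:
    """Create one back-end driver record per saved front end, with matching parameters."""
    backends = []
    for fe in frontends:
        if fe.kind not in ("vbd", "vif"):
            raise ValueError(f"unsupported device kind: {fe.kind}")
        # The new back end is bound to the new domain but keeps the original parameters.
        backends.append({"domid": domid, "kind": fe.kind, "target": fe.target})
    return backends
```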
Block S105: start the new virtual machine.
Once the new virtual machine has been created, it is started so that it resumes running on the source physical machine.
Because the virtual machine is newly created and power-on management must be triggered again when it starts, the state identifier held in the EAX register within the virtual machine control structure (VMCS) must be modified to indicate that the current operating system needs to go through a new power-on boot process. After the virtual machine starts, its operating system reads this identifier and triggers the corresponding follow-up operations for starting the new virtual machine.
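A minimal model of this identifier handshake follows. The source states only that an identifier held in EAX within the VMCS is modified and later read by the guest operating system; the numeric values `RESUME_EXISTING` and `POWER_ON_BOOT`, and the `ControlStructure` class, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical encoding of the identifier; the source does not specify values.
RESUME_EXISTING = 0
POWER_ON_BOOT = 1


@dataclass
class ControlStructure:
    """Minimal stand-in for the saved guest registers in the VM control structure."""
    regs: dict[str, int] = field(default_factory=lambda: {"eax": RESUME_EXISTING})


def mark_for_power_on(vmcs: ControlStructure) -> None:
    """Set the identifier so the guest OS runs a fresh power-on flow on start-up."""
    vmcs.regs["eax"] = POWER_ON_BOOT


def guest_on_start(vmcs: ControlStructure) -> str:
    """Simplified view of what the guest OS decides after reading the identifier."""
    return "run power-on process" if vmcs.regs["eax"] == POWER_ON_BOOT else "continue"
```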
In this embodiment, the virtual machine is started by the standard power-on start process, whose steps are: restoring the virtual processors to the running state; setting up the virtual machine's interrupt and timer handling and re-enabling the timer and interrupts; triggering power management of the virtual machine devices; scanning all devices of the virtual machine and triggering the connection of the front-end and back-end drivers; and resuming operation of the virtual machine. Because this start-up process is a standard flow in the prior art, it is not described in further detail here.
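The start sequence can be written as an ordered series of state changes. This is an illustrative sketch with a hypothetical `VmState` record; it captures only the order of the five steps, not how a hypervisor actually performs them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VmState:
    """Hypothetical record of what has been switched on during start-up."""
    vcpus_running: bool = False
    timers_enabled: bool = False
    interrupts_enabled: bool = False
    devices_powered: bool = False
    connected_devices: list[str] = field(default_factory=list)
    running: bool = False


def start_new_vm(vm: VmState, devices: list[str]) -> VmState:
    """Apply the five start-up steps in order."""
    # 1. Restore the virtual processors to the running state.
    vm.vcpus_running = True
    # 2. Set up interrupt and timer handling, then re-enable the timer and interrupts.
    vm.timers_enabled = True
    vm.interrupts_enabled = True
    # 3. Trigger power management for the virtual machine's devices.
    vm.devices_powered = True
    # 4. Scan every device and connect its front end to its back end.
    for dev in devices:
        vm.connected_devices.append(dev)
    # 5. Let the virtual machine resume running.
    vm.running = True
    return vm
```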
In this method, rollback after a live migration failure is turned into re-creating the virtual machine and its back-end resources and resuming its operation, which allows virtual machines running different types of operating systems to be rolled back successfully in any failure state. In this embodiment, the method is implemented by the corresponding virtual machine management program in Domain 0, the virtual machine manager of the Xen virtualization platform; on other virtualization platforms, the method can be implemented by the corresponding management program of that platform.
A second embodiment of the present application provides a device for rollback after a virtual machine live migration failure; please refer to fig. 6, which is a block diagram of its units. Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
The device for rollback after a virtual machine live migration failure comprises:
a virtual machine live migration failure stage determining unit 601, configured to determine the stage in which the virtual machine live migration failure occurred and, if the failure occurred in the virtual machine suspended-operation stage, to trigger the following units;
an original virtual machine running state data saving unit 602, configured to save the running state data of the original virtual machine on the source physical machine side;
an original virtual machine and back-end resource cleaning unit 603, configured to clean up the original virtual machine and its corresponding back-end resources;
a new virtual machine and back-end resource creating unit 604, configured to create a new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine;
a new virtual machine starting unit 605, configured to start the new virtual machine.
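How the five units fit together can be sketched as a single function in which unit 601 acts as a gate. The `Stage` values other than `SUSPENDED` are assumed names for other phases of a live migration; the source does not describe this rollback for them, so the sketch simply declines to act.

```python
from __future__ import annotations

from enum import Enum, auto


class Stage(Enum):
    """Coarse live-migration phases; names other than SUSPENDED are assumptions."""
    PRE_COPY = auto()         # VM still running on the source while memory is copied
    SUSPENDED = auto()        # VM paused for the final copy (the case the source covers)
    POST_SWITCHOVER = auto()  # VM already running on the destination


def rollback_after_failure(stage: Stage, actions: list[str]) -> bool:
    """Chain units 601-605; return True if the rollback path was taken."""
    # Unit 601: only a failure in the suspended stage triggers the following units.
    if stage is not Stage.SUSPENDED:
        return False
    actions.append("save running state of original VM")        # unit 602
    actions.append("clean original VM and back-end resources")  # unit 603
    actions.append("create new VM and back-end resources")      # unit 604
    actions.append("start new VM")                              # unit 605
    return True
```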
Optionally, cleaning up the original virtual machine and its corresponding back-end resources includes:
releasing the process of the original virtual machine and cleaning up the back-end resources of the original virtual machine on the source physical machine.
Optionally, cleaning up the back-end resources of the original virtual machine on the source physical machine includes:
disconnecting the storage and network connections; and
removing, on the back-end device, the storage and network connection drivers corresponding to the original virtual machine.
Optionally, creating a new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine includes:
loading the original virtual machine process from the saved running state data of the original virtual machine; and
creating on the source physical machine, according to the saved running state data of the original virtual machine, back-end resources that meet the parameter requirements of the original virtual machine.
Optionally, the back-end resources include:
storage and network drivers corresponding to the virtual machine.
Optionally, after the new virtual machine and its corresponding back-end resources are created and before the new virtual machine is started, the following is further performed:
modifying the state identifier in the control structure of the virtual machine to indicate that the virtual machine needs to trigger power-on management.
Optionally, starting the new virtual machine includes:
resuming operation of the virtual processor;
enabling the timer and interrupts;
triggering power management of the virtual machine devices; and
scanning the devices to trigger connection of the front ends and back ends.
Optionally, the device is applicable to rollback after a virtual machine live migration failure in the Xen virtualization architecture.
A third embodiment of the present application provides a system; please refer to fig. 7, which is a schematic diagram of the system embodiment. The system 700 includes: a processor 701; a system control unit 702 coupled to the processor; a system memory 703 coupled to the system control unit; a non-volatile memory (NVM) or storage device 704 coupled to the system control unit; a network interface 705 coupled to the system control unit; an input/output (I/O) device 706; and a program of instructions 707. The system memory 703 and the non-volatile memory or storage device 704 may store a temporary copy and a persistent copy, respectively, of the program of instructions 707. When the program of instructions 707 is executed by at least one of the processors 701, the system 700 performs the method for rollback after a virtual machine live migration failure provided in the first embodiment of the present application.
The processor 701 may include at least one processor, each of which may be a single core processor or a multi-core processor. The processor 701 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In particular implementation, the processor 701 may be configured to implement the method for rollback after a live migration failure of a virtual machine according to the first embodiment of the present application in different embodiments.
The system control unit 702 may include any corresponding interface controller to provide an interface for at least one of the processors 701, and/or any device or component in communication with the system control unit 702.
The system control unit 702 may include at least one memory controller that provides an interface to the system memory 703. The system memory 703 may be used to load and store data and/or instructions. The system memory 703 may include any volatile memory, such as Dynamic Random Access Memory (DRAM).
The non-volatile memory or storage device 704 may include at least one tangible, non-transitory computer-readable medium for storing data and/or instructions. The non-volatile memory or storage device 704 may include any form of non-volatile memory, such as flash memory, and/or any non-volatile storage device, such as at least one Hard Disk Drive (HDD), at least one optical disk drive, and/or at least one Digital Versatile Disk (DVD) drive.
The network interface 705 may include a transceiver that provides a wireless interface for the system 700, through which the system 700 may communicate across a network and/or with other devices. The network interface 705 may include any hardware and/or firmware. The network interface 705 may include multiple antennas that provide multiple-input, multiple-output wireless interfaces. In particular implementations, the network interface 705 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In particular implementations, at least one of the processors 701 may be packaged together with the control logic of at least one controller in the system control unit 702 to form a System in Package (SiP). Likewise, at least one of the processors 701 may be integrated on the same chip as the control logic of at least one controller in the system control unit 702 to form a System on Chip (SoC).
The input/output devices 706 may include a user interface for user interaction with the system 700 and/or a peripheral component interface for peripheral component interaction with the system 700.
In various embodiments, the user interface may include, but is not limited to: a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, at least one camera device (e.g., a camera, and/or a camcorder), a flash, and a keyboard.
In various embodiments, the peripheral component interface may include, but is not limited to: a non-volatile memory port, an audio jack, and a power interface.
In various embodiments, the system 700 may be deployed on an electronic device such as a personal computer, a mobile computing device, and the like, which may include but is not limited to: a laptop, a tablet, a mobile phone, and/or other smart devices, etc. In different embodiments, the system 700 may include more or fewer components, and/or different architectures.
The fourth embodiment of the present application provides a data processing method. Please refer to fig. 8, which is a flowchart illustrating a method of data processing according to a fourth embodiment of the present application, and the embodiment is described below with reference to fig. 8.
Block S801: determine that the live migration of a first virtual machine has failed and that the failure occurred in the suspended-operation stage of the first virtual machine.
Block S802: clean up the first virtual machine and its corresponding back-end resources, and save the running state data of the first virtual machine, where the back-end resources are drivers that, during virtualization, call local device drivers according to the request information of the virtual machine in order to access the physical hardware.
Block S803: create a second virtual machine and its corresponding back-end resources according to the running state data of the first virtual machine.
The data processing method provided in this embodiment is substantially the same as the method for rollback after a virtual machine live migration failure provided in the first embodiment, with only minor adjustments to the steps and wording: the first virtual machine corresponds to the original virtual machine of the first embodiment, and the second virtual machine corresponds to the new virtual machine.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that the invention is not limited thereto, and that various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (9)

1. A method for rollback after a virtual machine live migration fault, characterized by comprising the following steps:
determining the stage in which the virtual machine live migration fault occurs, and if the fault occurs in the virtual machine suspended-operation stage, performing the following steps:
saving running state data of an original virtual machine on the source physical machine side;
cleaning up the original virtual machine and its corresponding back-end resources, wherein the back-end resources are drivers that, during virtualized operation, can call a local device driver according to request information of the virtual machine so as to access real hardware of the physical machine;
creating a new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine; and
starting the new virtual machine;
wherein, after the new virtual machine and its corresponding back-end resources are created and before the new virtual machine is started, the method comprises:
modifying the state identifier of the control structure of the virtual machine to indicate that the virtual machine needs to trigger power-on management.
2. The method according to claim 1, wherein cleaning up the original virtual machine and its corresponding back-end resources comprises:
releasing the process of the original virtual machine and cleaning up the back-end resources of the original virtual machine on the source physical machine.
3. The method according to claim 2, wherein cleaning up the back-end resources of the original virtual machine on the source physical machine comprises:
disconnecting the storage and network connections; and
removing, on the back-end device, the storage and network connection drivers corresponding to the original virtual machine.
4. The method according to claim 1, wherein creating a new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine comprises:
loading the original virtual machine process from the saved running state data of the original virtual machine; and
creating on the source physical machine, according to the saved running state data of the original virtual machine, back-end resources that meet the parameter requirements of the original virtual machine.
5. The method for rollback after a virtual machine live migration fault according to claim 1, wherein the back-end resources comprise:
storage and network drivers corresponding to the virtual machine.
6. The method according to claim 1, wherein starting the new virtual machine comprises:
resuming operation of the virtual processor;
enabling a timer and interrupts;
triggering power management of the virtual machine devices; and
scanning the devices to trigger connection of the front ends and back ends.
7. The method for rollback after a virtual machine live migration fault according to claim 1, wherein the method is applied to rollback after a virtual machine live migration fault in a Xen virtualization architecture.
8. An apparatus for rollback after a virtual machine live migration fault, comprising:
a virtual machine live migration fault stage determining unit, configured to determine the stage in which the virtual machine live migration fault occurs and, if the fault occurs in the virtual machine suspended-operation stage, to trigger the following units;
an original virtual machine running state data saving unit, configured to save the running state data of the original virtual machine on the source physical machine side;
an original virtual machine and back-end resource cleaning unit, configured to clean up the original virtual machine and its corresponding back-end resources, wherein the back-end resources are drivers that, during virtualized operation, can call a local device driver according to request information of the virtual machine so as to access real hardware of the physical machine;
a new virtual machine and back-end resource creating unit, configured to create the new virtual machine and its corresponding back-end resources according to the saved running state data of the original virtual machine; and
a new virtual machine starting unit, configured to start the new virtual machine;
wherein, after the new virtual machine and its corresponding back-end resources are created and before the new virtual machine is started, the following is performed:
modifying the state identifier of the control structure of the virtual machine to indicate that the virtual machine needs to trigger power-on management.
9. A system for virtual machine rollback after a live migration failure, comprising:
a processor;
a system control unit;
a system memory;
non-volatile memory or storage;
a network interface;
an input/output (I/O) device;
a program of instructions;
the system memory and the non-volatile memory or storage device store a temporary copy and a persistent copy of the program of instructions, respectively; the system performs the method of any of claims 1-7 when the program of instructions is executed by the processor.
CN201710630142.XA 2017-07-28 2017-07-28 Method, device and system for rollback after virtual machine live migration fault Active CN109308232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710630142.XA CN109308232B (en) 2017-07-28 2017-07-28 Method, device and system for rollback after virtual machine live migration fault

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710630142.XA CN109308232B (en) 2017-07-28 2017-07-28 Method, device and system for rollback after virtual machine live migration fault

Publications (2)

Publication Number Publication Date
CN109308232A CN109308232A (en) 2019-02-05
CN109308232B true CN109308232B (en) 2022-09-06

Family

ID=65204922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710630142.XA Active CN109308232B (en) 2017-07-28 2017-07-28 Method, device and system for rollback after virtual machine live migration fault

Country Status (1)

Country Link
CN (1) CN109308232B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231064A (en) * 2020-10-23 2021-01-15 Suzhou Inspur Intelligent Technology Co., Ltd. Dynamic fault tolerance method, system, device and storage medium for virtual machine migration
CN114884836A (en) * 2022-04-28 2022-08-09 Jinan Inspur Data Technology Co., Ltd. High-availability method, device and medium for virtual machine

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569881A (en) * 2015-10-09 2017-04-19 China Petroleum & Chemical Corporation Data migration method and system based on KVM (Kernel-based Virtual Machine)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7743126B2 (en) * 2001-06-28 2010-06-22 Hewlett-Packard Development Company, L.P. Migrating recovery modules in a distributed computing environment
CN102193813A (en) * 2010-03-09 2011-09-21 Shanghai Baihan Network Technology Co., Ltd. Embedded virtualization quick start method and system
CN104424015B (en) * 2013-09-11 2018-10-09 Huawei Technologies Co., Ltd. Virtual machine management method and device
WO2016121830A1 (en) * 2015-01-28 2016-08-04 NEC Corporation Virtual network function management device, system, healing method, and program
CN106549783A (en) * 2015-09-18 2017-03-29 ZTE Corporation Virtual machine fault handling method and apparatus
CN106533769B (en) * 2016-11-24 2019-12-13 Huawei Technologies Co., Ltd. Fault recovery method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569881A (en) * 2015-10-09 2017-04-19 China Petroleum & Chemical Corporation Data migration method and system based on KVM (Kernel-based Virtual Machine)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
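A Trusted Virtual Machine in an Untrusted Management Environment; Chunxiao Li et al.; IEEE Transactions on Services Computing; 2011-06-23; Vol. 5, No. 4; pp. 472-483 *
Energy consumption management of virtualized cloud computing platforms; Ye Kejiang et al.; Chinese Journal of Computers; 2012-06-15; Vol. 35, No. 6; pp. 1262-1285 *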

Also Published As

Publication number Publication date
CN109308232A (en) 2019-02-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230530

Address after: Room 1-2-A06, Yungu Park, No. 1008 Dengcai Street, Sandun Town, Xihu District, Hangzhou City, Zhejiang Province

Patentee after: Aliyun Computing Co.,Ltd.

Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK

Patentee before: ALIBABA GROUP HOLDING Ltd.
