CN108255576B - Virtual machine live migration exception handling method and device and storage medium - Google Patents

Publication number
CN108255576B
Authority
CN
China
Legal status
Active
Application number
CN201711292113.3A
Other languages
Chinese (zh)
Other versions
CN108255576A (en)
Inventor
任苗健
吴开剑
Current Assignee
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The invention relates to a virtual machine live migration exception handling method and apparatus, a computer device, and a storage medium. The method comprises: when an exception occurs during live migration of a virtual machine, modifying the running state of the virtual machine to an active state; initiating a virtual machine restart instruction, which is used to restart the virtual machine in the active state; when the virtual machine fails to restart and its running state is an abnormal state, repairing according to the corresponding exception log and then initiating the virtual machine restart instruction again; and when the virtual machine restarts successfully, continuing the live migration of the virtual machine. The scheme makes it possible to recover from exceptions during live migration of a virtual machine and reduces the time for which the virtual machine is unavailable.

Description

Virtual machine live migration exception handling method and device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a virtual machine live migration exception handling method and apparatus, a computer device, and a storage medium.
Background
Live migration refers to migrating a virtual machine between different physical hosts while keeping the virtual machine running normally. A successful migration is one in which both the overall migration time and the unavailable time are short, and the performance of the virtual machine on the host it has been migrated to is little affected. Because live migration involves copying disks while the virtual machine continues to read and write frequently, exceptions can occur during live migration.
The traditional approach relies mainly on prevention to avoid live migration exceptions: system backup and recovery technology backs up the state of the operating system and applications on the source virtual machine in real time, after which the storage medium is attached to the target host and the virtual machine is rebuilt there. However, this approach lengthens the time for which the virtual machine is unavailable during migration, which can lead to loss of virtual machine data.
Disclosure of Invention
It is therefore necessary to provide a virtual machine live migration exception handling method and apparatus, a computer device, and a storage medium that address the problem that existing approaches, which aim to prevent live migration exceptions, can lead to a long virtual machine unavailable time.
A virtual machine live migration exception handling method, the method comprising:
when an exception occurs during live migration of the virtual machine, modifying the running state of the virtual machine to an active state;
initiating a virtual machine restart instruction, the virtual machine restart instruction being used to restart the virtual machine in the active state;
when the virtual machine fails to restart and its running state is an abnormal state, repairing according to the corresponding exception log and then initiating the virtual machine restart instruction again;
and when the virtual machine restarts successfully, continuing the live migration of the virtual machine.
A virtual machine live migration exception handling apparatus, the apparatus comprising:
a modification module, configured to modify the running state of the virtual machine to an active state when an exception occurs during live migration of the virtual machine;
a restart module, configured to initiate a virtual machine restart instruction, the virtual machine restart instruction being used to restart the virtual machine in the active state, and, when the virtual machine fails to restart and its running state is an abnormal state, to repair according to the corresponding exception log and then initiate the virtual machine restart instruction again;
and a live migration module, configured to continue the live migration of the virtual machine when the virtual machine restarts successfully.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the virtual machine live migration exception handling method provided in any embodiment of the present invention when executing the computer program.
One or more storage media storing a computer program, which when executed by a processor, causes the processor to perform the steps of the virtual machine live migration exception handling method provided in any embodiment of the present invention.
According to the above virtual machine live migration exception handling method, apparatus, computer device and storage medium, when an exception occurs during live migration of a virtual machine, the running state of the virtual machine is modified to the active state, after which a restart instruction can be issued to restart the virtual machine in the active state. When the restart fails and the virtual machine reports an error, the fault can be repaired according to the corresponding exception log and the restart instruction initiated again, so that the virtual machine can be restarted successfully. A successful restart verifies that the state of the virtual machine is correct, so the live migration can continue. This shortens the time for which the virtual machine is unavailable and allows database anomalies to be recovered quickly.
Drawings
FIG. 1 is a diagram of an application environment of a virtual machine live migration exception handling method in one embodiment;
FIG. 2 is a flowchart illustrating a virtual machine live migration exception handling method according to an embodiment;
FIG. 3 is a flowchart illustrating a virtual machine live migration exception handling method according to another embodiment;
FIG. 4 is a flowchart illustrating a virtual machine live migration exception handling method according to yet another embodiment;
FIG. 5 is a flowchart illustrating a virtual machine live migration exception handling method according to an embodiment;
FIG. 6 is a block diagram of an apparatus for handling a virtual machine live migration exception in one embodiment;
FIG. 7 is a block diagram showing an example of the structure of a virtual machine live migration exception handling apparatus according to another embodiment;
FIG. 8 is a block diagram showing a configuration of a virtual machine live migration exception handling apparatus according to still another embodiment;
FIG. 9 is a block diagram that illustrates a live migration module in the virtual machine live migration exception handling apparatus, according to an embodiment;
FIG. 10 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a diagram of an application environment of a virtual machine live migration exception handling method in an embodiment. Referring to fig. 1, the virtual machine live migration exception handling method is applied to a virtual machine live migration exception handling system. The virtual machine live migration exception handling system comprises a control node 110, a source computing node 120 and a target computing node 130 in an OpenStack environment, wherein the control node 110 is connected with the source computing node 120 and the target computing node 130 through a network. The source computing node 120 is connected to the target computing node 130 over a network. The control node 110 is used to schedule the OpenStack live migration process. OpenStack live migration is the migration of a virtual machine located on a source compute node 120 to a target compute node 130. The source computing node 120 and the target computing node 130 are each located on different physical hosts. For example, the IP corresponding to the physical host where the control node 110 is located is 192.168.3.10, the IP corresponding to the physical host where the source computing node is located is 192.168.3.12, and the IP corresponding to the physical host where the target computing node is located is 192.168.3.14. Each node can exchange data with another node through the IP address.
As shown in fig. 2, in an embodiment, a virtual machine live migration exception handling method is provided, which is exemplified by applying the method to the control node 110 in fig. 1, and the method specifically includes the following steps:
S202, when an exception occurs during live migration of the virtual machine, modifying the running state of the virtual machine to an active state.
The virtual machine is a virtual machine in an OpenStack environment. Live migration of a virtual machine is the process of migrating it from a source computing node to a target computing node. For example, when the live migration is abnormal, the virtual machine may remain in the migrating state (MIGRATING) while only part of its disk files on the source computing node has been migrated to the target computing node; that is, the live migration is stuck for a long time.
The running state of the virtual machine during live migration includes ACTIVE, MIGRATING, ERROR and the like. ACTIVE, the active state, indicates that the virtual machine is operating normally during live migration. MIGRATING indicates that the virtual machine is being migrated. ERROR indicates that the virtual machine is in an abnormal state because an error occurred during migration. While the virtual machine is in the migrating state, the control node cannot restart it or shut it down.
Specifically, the running state of the virtual machine in the OpenStack environment is stored in an instance table of a Nova database, and when the control node finds that the live migration process of the virtual machine is abnormal, the running state of the virtual machine in the database is modified into an active state, so that the virtual machine can be restarted.
In one embodiment, the control node may list the tenants of the source computing node, find the virtual machine to be migrated and its virtual machine identifier, modify the running state of that virtual machine to the active state according to the identifier, and then check the states of all tenants again to confirm that the running state of the virtual machine to be migrated has been modified.
For example, the control node lists the tenants of all source computing nodes by executing nova list --all-tenants; when the virtual machine to be migrated and the corresponding virtual machine identifier are 0001-.
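The state modification described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: an in-memory SQLite table stands in for the instance table of the Nova database, and the table layout, the column names `vm_id` and `state`, and the identifier `vm-demo` are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the instance table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (vm_id TEXT PRIMARY KEY, state TEXT)")
    return conn


def set_active(conn, vm_id):
    """Overwrite the stored running state with ACTIVE; True if one row changed."""
    cur = conn.execute(
        "UPDATE instances SET state = 'ACTIVE' WHERE vm_id = ?", (vm_id,)
    )
    conn.commit()
    return cur.rowcount == 1


def get_state(conn, vm_id):
    """Read the state back to confirm that the modification took effect."""
    row = conn.execute(
        "SELECT state FROM instances WHERE vm_id = ?", (vm_id,)
    ).fetchone()
    return row[0] if row else None
```

Reading the state back after the update mirrors the re-check of tenant states described in the text.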
S204, initiating a virtual machine restart instruction; the virtual machine restart instruction is used for restarting the virtual machine in the activated state.
The virtual machine restart instruction restarts the virtual machine in the active state after first attempting a normal shutdown; that is, the restart here is a "soft restart". Specifically, when the control node finds that the live migration of the virtual machine is abnormal, it first checks whether the virtual machine to be migrated is in the MIGRATING, PENDING, PAUSING, SUSPENDING or a similar state; if so, it first modifies the running state of the virtual machine in the database to the active state, so that a restart instruction can be executed on the virtual machine. For example, the control node may specifically restart the virtual machine identified by the virtual machine 0001-.
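The pre-restart check can be sketched as a small decision function. The state names are normalized here to MIGRATING and SUSPENDING, which is an assumption about the intended spelling, and the action labels returned are hypothetical placeholders rather than real commands.

```python
# States in which, per the description, the state must be reset before a restart.
# The exact spellings are an assumption about the intended state names.
TRANSITIONAL_STATES = {"MIGRATING", "PENDING", "PAUSING", "SUSPENDING"}


def plan_restart(state):
    """Return the ordered actions the control node would take for a VM in `state`."""
    actions = []
    if state.upper() in TRANSITIONAL_STATES:
        # A VM in a transitional state cannot be restarted directly.
        actions.append("reset_state_to_active")
    actions.append("soft_reboot")
    return actions
```

For a virtual machine that is already ACTIVE, the plan contains only the soft restart.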
S206, when the virtual machine fails to restart and its running state is an abnormal state, repairing according to the corresponding exception log and then initiating the virtual machine restart instruction again.
When the virtual machine fails to be restarted and the running state of the virtual machine is an abnormal state, the running state of the virtual machine inquired from the database by the control node is ERROR. The exception log is log information corresponding to the virtual machine being in the ERROR state.
Specifically, when the control node finds that the virtual machine has failed to restart, it can locate the cause of the exception by finding the log corresponding to the abnormal state of the virtual machine; after the located exception is repaired in the OpenStack database, the control node initiates the virtual machine restart instruction again. The control node can specifically obtain the exception log corresponding to the abnormal state of the virtual machine by executing /var/lib/nova/nova-api.
In one embodiment, the control node may locate the cause of the abnormal running state of the virtual machine according to keywords in the exception log. OpenStack logs follow a uniform format:
<timestamp> <log level> <code module> <request ID> <log content> <source code location>
The log levels include INFO, WARNING, ERROR, DEBUG and so on. The control node can locate exception log entries by matching preset keywords against the log level; once an entry is located, the cause of the exception is determined from its log content, so that the control node can repair the state information of the virtual machine in the database in a targeted manner.
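Locating exception entries by log level can be sketched as below. The concrete line layout (space-separated fields, request ID in square brackets, source location in trailing parentheses) is an assumption chosen to match the six fields listed above; real OpenStack log lines may differ, and the sample lines in the usage note are invented for illustration.

```python
import re

# Hypothetical concrete rendering of:
# <timestamp> <log level> <code module> <request ID> <log content> <source location>
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>\S+) "
    r"\[(?P<request_id>[^\]]*)\] "
    r"(?P<content>.*?) "
    r"\((?P<location>[^()]*)\)$"
)


def find_exception_entries(lines, keywords=("ERROR",)):
    """Return parsed entries whose log level matches one of the preset keywords."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in keywords:
            entries.append(match.groupdict())
    return entries
```

A caller would pass the lines of the log and then inspect the `content` field of each returned entry to decide which state information needs repair.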
S208, when the virtual machine restarts successfully, continuing the live migration of the virtual machine.
Specifically, if the virtual machine restarts successfully after the restart instruction is executed and the control node does not receive any error log, the state information of the virtual machine in the OpenStack database is verified as correct, and the control node can continue the live migration of the virtual machine. In particular, the control node may continue the live migration by executing nova live-migration.
According to the above virtual machine live migration exception handling method, when an exception occurs during live migration of a virtual machine, the running state of the virtual machine is modified to the active state, after which a restart instruction can be issued to restart the virtual machine in the active state. When the restart fails and the virtual machine reports an error, the fault can be repaired according to the corresponding exception log and the restart instruction initiated again, so that the virtual machine can be restarted successfully. A successful restart verifies that the state of the virtual machine is correct, so the live migration can continue. This shortens the time for which the virtual machine is unavailable and allows database anomalies to be recovered quickly.
In one embodiment, as shown in fig. 3, the virtual machine live migration exception handling method further includes the following steps:
S302, acquiring the live migration progress state information generated during live migration of the virtual machine.
The live migration progress state information includes the total amount of data being processed, the total amount of remaining data, the total amount of processed memory, the total amount of remaining memory, and the like during live migration. Specifically, when the control node finds that the live migration of the virtual machine has been stuck for a long time, that is, the live migration progress state information has stagnated, it may query the corresponding progress state information by executing a live migration progress check instruction. Specifically, the control node may check the virtual machine live migration progress state information corresponding to the virtual machine identifier 0001-.
S304, judging whether the live migration is abnormal according to the live migration progress state information.
Specifically, the control node judges whether the live migration is abnormal according to the observed live migration progress state information of the virtual machine. For example, when the total amount of data being processed and the total amount of remaining data do not change within a preset time period, it may be determined that an exception has occurred in the live migration. Alternatively, when the control node finds that the total amount of remaining data is zero but the target node has still not completely received the corresponding data, it may be determined that an exception has occurred in the live migration.
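The two example criteria above can be sketched as follows, assuming the progress information is available as periodic samples. The sample tuple layout, the byte counters, and the function names are hypothetical and only illustrate the checks.

```python
def is_stalled(samples, window_seconds):
    """Detect unchanged progress counters over at least `window_seconds`.

    samples: list of (time_seconds, processed_bytes, remaining_bytes), oldest first.
    """
    if not samples:
        return False
    latest_time, latest_processed, latest_remaining = samples[-1]
    unchanged_since = latest_time
    # Walk backwards for as long as the counters equal the latest sample.
    for time_s, processed, remaining in reversed(samples):
        if (processed, remaining) != (latest_processed, latest_remaining):
            break
        unchanged_since = time_s
    return latest_time - unchanged_since >= window_seconds


def is_incomplete_transfer(remaining_bytes, received_bytes, expected_bytes):
    """Source reports nothing remaining, yet the target has not received everything."""
    return remaining_bytes == 0 and received_bytes < expected_bytes
```

Either function returning True would correspond to judging that the live migration is abnormal.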
In this embodiment, the control node can determine from the live migration progress state information whether the live migration of the virtual machine is abnormal. After determining that an exception has occurred, it modifies the running state and restarts the virtual machine, obtains the corresponding exception log, repairs the state information in the database according to that log, and continues the live migration after a successful restart.
In one embodiment, the virtual machine live migration exception handling method further includes the following step: after the virtual machine restart instruction is initiated again, when the virtual machine fails to restart and its running state is an abnormal state, returning to the step of modifying the running state of the virtual machine to the active state, until the virtual machine restarts successfully.
Specifically, when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, it indicates that there is still wrong state information in the database, and the error information needs to be located and then repaired. And when the control node receives the prompt that the running state of the virtual machine is the abnormal state, returning to the step of modifying the running state of the virtual machine into the activated state, relocating the abnormal log, and repairing the wrong state information in the database until the virtual machine can be restarted successfully.
In this embodiment, the control node may continuously modify the running state of the virtual machine in the database to restart the virtual machine, and locate the reason of the abnormality according to the abnormal log after the restart, and repair the abnormality until the virtual machine can be restarted successfully.
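The repeat-until-success loop can be sketched with the individual operations injected as callables. All parameter names are hypothetical, and the `max_attempts` bound is an added safeguard for the sketch; the description itself repeats until the restart succeeds.

```python
def recover_and_resume(set_active, restart, read_state, repair_from_log,
                       resume_migration, max_attempts=5):
    """Reset state, restart, repair on failure, and resume migration on success.

    Returns the number of attempts used. `max_attempts` is an assumption of this
    sketch and is not part of the described method.
    """
    for attempt in range(1, max_attempts + 1):
        set_active()                 # modify the running state to active
        if restart():                # soft restart; True means success
            resume_migration()       # continue the live migration
            return attempt
        if read_state() == "ERROR":  # abnormal state: repair from the exception log
            repair_from_log()
    raise RuntimeError("virtual machine could not be restarted")
```

Passing the operations in as functions keeps the control flow separate from how the state is stored or how the restart is issued.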
In one embodiment, as shown in fig. 4, before step S202, the virtual machine live migration exception handling method further includes the following steps:
S402, sending a first backup instruction to the source computing node where the virtual machine is located; the first backup instruction is used for instructing the source computing node to back up the files of the virtual machine on the source computing node and copy them to the target computing node to which the virtual machine is migrated.
The first backup instruction is an instruction used for instructing the source computing node to backup the file of the virtual machine at the source computing node. The file of the virtual machine at the source computing node comprises log information of the virtual machine when the virtual machine is started, a disk file of the virtual machine, a configuration file of the virtual machine and the like.
Specifically, in order to avoid the loss of the data of the virtual machine due to an unrecoverable error occurring in the abnormal processing process of the virtual machine in the live migration process, before the running state of the virtual machine in the database is modified to the activated state, the control node sends a first backup instruction to the source computing node where the virtual machine is located, so as to backup the corresponding file of the virtual machine on the source computing node, and after the backup, the control node copies the file of the virtual machine to the target computing node. The control node may copy the files of the virtual machine to the target compute node by executing the SCP instructions.
S404, sending a second backup instruction to the target computing node to which the virtual machine is migrated; the second backup instruction is used for instructing the target computing node to backup the file of the virtual machine at the target computing node.
The second backup instruction is used for instructing the target computing node to back up the files of the virtual machine on the target computing node. Files of an OpenStack virtual machine are stored in the /var/lib/nova/instances/ directory of a computing node, and when the computing node goes down, the virtual machine suffers an unrecoverable error. To avoid this situation, the control node sends a second backup instruction to the target computing node where the virtual machine is located, so as to back up the files of the virtual machine on the target computing node.
In this embodiment, the running state of the virtual machine is modified in the database after the file of the virtual machine on the computing node is backed up, so that it can be further ensured that the data of the virtual machine is not lost.
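A node-local backup of a virtual machine's files can be sketched as a directory copy. This covers only the local backup step; copying to the other node (the SCP step mentioned above) is omitted. The function name and the `.bak` suffix are hypothetical.

```python
import shutil
from pathlib import Path


def backup_vm_files(instance_dir, backup_root):
    """Copy a VM's file directory (disk, config, logs) into `backup_root`.

    Returns the path of the backup copy. Raises FileExistsError if a backup with
    the same name already exists, so an earlier backup is never overwritten.
    """
    source = Path(instance_dir)
    destination = Path(backup_root) / (source.name + ".bak")
    shutil.copytree(source, destination)
    return destination
```

Running the backup before the state is modified in the database matches the ordering described in this embodiment.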
In one embodiment, the virtual machine live migration process in the virtual machine live migration exception handling method includes: querying currently available computing nodes; determining a target computing node from the queried computing nodes; acquiring available storage resources of a target computing node; dividing a corresponding storage area from available storage resources according to a storage area dividing mode of the storage resources on a source computing node where a virtual machine is located; and respectively migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node.
The available computing nodes are computing nodes with load amounts within a preset threshold range. For example, a computing node with a load within 50% is an available computing node.
In one embodiment, the control node may obtain currently available computing nodes by querying load information of each computing node, and then detect a matching degree between a source computing node and the available computing nodes through a node resource detection function, and use the available computing nodes capable of carrying the running of the virtual machine to be migrated as target computing nodes.
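Selecting a target node can be sketched as a filter followed by a choice. The `ComputeNode` fields, the resource check, and the tie-break of preferring the least-loaded candidate are assumptions of this sketch; the description only requires that the load be within a preset threshold and that the node be able to carry the virtual machine.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    load: float            # fraction of capacity in use, 0.0 to 1.0
    free_memory_mb: int
    free_disk_gb: int


def available_nodes(nodes, max_load=0.5):
    """Nodes whose load is within the preset threshold (50% in the example)."""
    return [node for node in nodes if node.load <= max_load]


def choose_target(nodes, need_memory_mb, need_disk_gb, max_load=0.5):
    """Pick an available node able to host the VM, or None if there is none."""
    candidates = [
        node for node in available_nodes(nodes, max_load)
        if node.free_memory_mb >= need_memory_mb and node.free_disk_gb >= need_disk_gb
    ]
    if not candidates:
        return None
    # Assumption: prefer the least-loaded candidate.
    return min(candidates, key=lambda node: node.load)
```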
In one embodiment, the control node may obtain the available storage resources of the target computing node through the node resource detection function, and partition corresponding storage areas from those resources according to the way the storage resources are partitioned on the source computing node where the virtual machine is located. For example, if the source computing node has separate storage areas for the system files and the data files of the virtual machine, a target system storage area and a target data storage area are partitioned on the target computing node. Finally, the control node executes a live migration instruction on the virtual machine to migrate the system files and the data files on the source computing node to the target system storage area and the target data storage area on the target computing node.
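Mirroring the source partitioning on the target can be sketched as below. Representing each storage area by a name and a size in gigabytes, and laying the target areas out contiguously, are simplifying assumptions made only for illustration.

```python
def partition_target(source_areas, target_free_gb):
    """Carve target areas matching the source layout out of the free space.

    source_areas: dict mapping area name to size in GB (insertion order is kept).
    Returns a dict mapping "target_<name>" to (offset_gb, size_gb).
    """
    needed = sum(source_areas.values())
    if needed > target_free_gb:
        raise ValueError("target node does not have enough free storage")
    layout = {}
    offset = 0
    for name, size in source_areas.items():
        layout["target_" + name] = (offset, size)
        offset += size
    return layout
```

With a source layout of a system area and a data area, the result contains a target system area and a target data area of the same sizes.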
In the above embodiment, determining the target computing node and partitioning the corresponding storage areas from its available storage before migrating the data effectively reduces the probability of an exception occurring during live migration of the virtual machine.
In one embodiment, in the process of live migration of a virtual machine, the step of migrating data in each storage area on a source computing node where the virtual machine is located to a corresponding storage area on a target computing node respectively specifically includes: respectively traversing the storage blocks in the storage areas on the source computing node where the virtual machine is located; when the currently traversed storage block does not have the data of the virtual machine, skipping the currently traversed storage block to continue traversing; and when the data of the virtual machine exists in the currently traversed storage block, migrating the data in the currently traversed storage block to a corresponding storage area on the target computing node.
During live migration of a virtual machine, the data in each storage block of the storage areas on the source computing node where the virtual machine is located needs to be migrated. When the virtual machine created for a user is newly created, the amount of data in its storage areas is small, and many storage blocks in those areas do not store any user data.
In one embodiment, the control node may traverse the storage blocks of each storage area on the source computing node where the virtual machine is located; when the currently traversed storage block holds no data of the virtual machine, the control node does not migrate that block and skips it.
In one embodiment, when the control node traverses to a storage block that holds data of the virtual machine, it migrates the data in that block to the corresponding storage area on the target computing node.
In this embodiment, traversing the storage blocks of the storage areas on the source computing node and skipping empty blocks avoids lengthening the unavailable time of the virtual machine by migrating storage blocks that hold no user data.
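The traversal with skipping can be sketched as follows. Modelling a storage area as a list of blocks, where an empty block is `None` or empty bytes, and the target area as a dictionary keyed by block index, is an assumption of this sketch.

```python
def migrate_area(source_blocks, target_area):
    """Copy only the blocks that hold VM data; return how many were migrated.

    source_blocks: list whose items are bytes, or None/b"" for empty blocks.
    target_area: dict mapping block index to the migrated bytes.
    """
    migrated = 0
    for index, block in enumerate(source_blocks):
        if not block:
            continue  # no VM data in this block: skip it and keep traversing
        target_area[index] = block
        migrated += 1
    return migrated
```

For a newly created virtual machine most blocks are empty, so the number of blocks actually copied is small.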
In a specific embodiment, as shown in fig. 5, the virtual machine live migration exception handling method specifically includes the following steps:
S501, acquiring the live migration progress state information generated during live migration of the virtual machine.
S502, judging whether the live migration is abnormal according to the live migration progress state information.
S503, sending a first backup instruction to the source computing node where the virtual machine is located; the first backup instruction is used for instructing the source computing node to back up the files of the virtual machine on the source computing node and copy them to the target computing node to which the virtual machine is migrated.
S504, sending a second backup instruction to the target computing node to which the virtual machine is migrated; the second backup instruction is used for instructing the target computing node to back up the files of the virtual machine on the target computing node.
S505, when an exception occurs during live migration of the virtual machine, modifying the running state of the virtual machine to an active state.
S506, initiating a virtual machine restart instruction; the virtual machine restart instruction is used for restarting the virtual machine in the active state.
S507, when the virtual machine fails to restart and its running state is an abnormal state, repairing according to the corresponding exception log and then initiating the virtual machine restart instruction again.
S508, when the virtual machine restarts successfully, continuing the live migration of the virtual machine.
S509, after the virtual machine restart instruction is initiated again, when the virtual machine fails to restart and its running state is an abnormal state, returning to the step of modifying the running state of the virtual machine to the active state, until the virtual machine restarts successfully.
With the above virtual machine live migration exception handling method, when the live migration process of the virtual machine is abnormal, the running state of the virtual machine is modified into the activated state, so that a restart instruction can be sent to restart the virtual machine in the activated state. When the restart fails and the virtual machine reports an error, the virtual machine can be repaired according to the corresponding exception log, after which the virtual machine restart instruction is initiated again. Once the virtual machine is restarted successfully, the correctness of its state is verified and the live migration process can continue, which shortens the unavailable time of the virtual machine and allows the database exception to be recovered quickly.
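The recovery part of this flow (S505 to S509) can be sketched as a bounded retry loop. The sketch below is a simplified model under stated assumptions: the `VirtualMachine` class, the `repair_from_log` helper, the state names, and the `max_rounds` limit are hypothetical and introduced only for illustration (the method itself repeats until the restart succeeds and does not specify a limit).

```python
from typing import Callable, List

ACTIVE = "active"
ERROR = "error"


class VirtualMachine:
    """Hypothetical stand-in for the virtual machine record held by the control node."""

    def __init__(self, faults: List[str]) -> None:
        self.state = ERROR               # state left behind by the failed migration
        self.faults = list(faults)       # outstanding faults that block a restart
        self.exception_log: List[str] = []

    def restart(self) -> bool:
        """A restart succeeds only when no fault remains."""
        if self.faults:
            self.state = ERROR
            self.exception_log.append(self.faults[0])
            return False
        self.state = ACTIVE
        return True


def repair_from_log(vm: VirtualMachine) -> None:
    """Remove the fault named by the most recent log entry (assumed repair step)."""
    if vm.exception_log and vm.exception_log[-1] in vm.faults:
        vm.faults.remove(vm.exception_log[-1])


def recover_and_resume(vm: VirtualMachine,
                       resume_migration: Callable[[], None],
                       max_rounds: int = 5) -> int:
    """Run S505-S509 and return the number of rounds that were needed."""
    for round_number in range(1, max_rounds + 1):
        vm.state = ACTIVE                    # S505: set the running state to active
        restarted = vm.restart()             # S506: initiate the restart instruction
        if not restarted and vm.state == ERROR:
            repair_from_log(vm)              # S507: repair according to the log
            restarted = vm.restart()         # S507: re-initiate the restart
        if restarted:
            resume_migration()               # S508: continue the live migration
            return round_number
        # S509: the re-initiated restart failed, so go back to S505.
    raise RuntimeError("virtual machine could not be restarted")


vm = VirtualMachine(["missing disk path", "stale lock"])
rounds = recover_and_resume(vm, lambda: print("live migration resumed"))
print(rounds, vm.state)
```

The upper bound on rounds is a design choice added for the sketch so that the loop cannot run forever when a fault cannot be repaired automatically.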
Fig. 5 is a flowchart illustrating a virtual machine live migration exception handling method according to an embodiment. It should be understood that, although the steps in the flowchart of fig. 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in FIG. 6, in one embodiment, a virtual machine live migration exception handling apparatus 600 is provided. Referring to fig. 6, the virtual machine live migration exception handling apparatus 600 includes a modification module 602, a restart module 604, and a live migration module 606.
The modification module 602 is configured to modify the running state of the virtual machine into an activated state when an exception occurs in the live migration process of the virtual machine.
The restart module 604 is configured to initiate a virtual machine restart instruction; the virtual machine restart instruction is used for restarting the virtual machine in the activated state. The restart module 604 is further configured to re-initiate the virtual machine restart instruction after repairing according to the corresponding exception log when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state.
The live migration module 606 is configured to continue the live migration process of the virtual machine when the virtual machine is restarted successfully.
With the above virtual machine live migration exception handling apparatus, when the live migration process of the virtual machine is abnormal, the running state of the virtual machine is modified into the activated state, so that a restart instruction can be sent to restart the virtual machine in the activated state. When the restart fails and the virtual machine reports an error, the virtual machine can be repaired according to the corresponding exception log, after which the virtual machine restart instruction is initiated again. Once the virtual machine is restarted successfully, the correctness of its state is verified and the live migration process can continue, which shortens the unavailable time of the virtual machine and allows the database exception to be recovered quickly.
As shown in fig. 7, in one embodiment, the virtual machine live migration exception handling apparatus 600 further includes an acquisition module 702 and a determination module 704.
The acquisition module 702 is configured to acquire live migration progress state information generated in the live migration process of the virtual machine.
The determination module 704 is configured to determine whether the live migration process is abnormal according to the live migration progress state information.
In this embodiment, the control node may determine, from the live migration progress state information, whether the live migration process of the virtual machine is abnormal. After determining that an exception has occurred, it modifies the running state and restarts the virtual machine, obtains the corresponding exception log, modifies the state information in the database according to that log, and continues the live migration process after a successful restart.
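The text does not spell out how the progress state information is evaluated, so the following is only one possible reading, sketched under explicit assumptions: progress is reported as a percentage, samples carry a timestamp in seconds, and the migration is treated as abnormal when the percentage has not advanced over the most recent preset time period. The function name `is_migration_abnormal` and the parameter `window_seconds` are hypothetical.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, progress percentage), oldest sample first
ProgressSample = Tuple[float, int]


def is_migration_abnormal(samples: List[ProgressSample],
                          window_seconds: float) -> bool:
    """Assumed rule: abnormal if progress did not advance during the last window."""
    if not samples:
        return False
    latest_time, latest_progress = samples[-1]
    if latest_progress >= 100:
        return False  # the migration has finished, so a flat reading is expected
    window_start = latest_time - window_seconds
    # Baseline: the last progress value reported at or before the window start.
    baseline: Optional[int] = None
    for timestamp, progress in samples:
        if timestamp <= window_start:
            baseline = progress
    if baseline is None:
        return False  # not enough history to cover a full window yet
    return latest_progress <= baseline


stalled = [(0, 10), (30, 40), (60, 40), (90, 40)]
advancing = [(0, 10), (30, 40), (60, 55), (90, 70)]
print(is_migration_abnormal(stalled, 60), is_migration_abnormal(advancing, 60))
```

Other criteria, such as an explicit error status returned with the progress information, would fit the same interface.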
In one embodiment, the restart module 604 in the virtual machine live migration exception handling apparatus 600 is further configured to, when the virtual machine still fails to restart and its running state is an abnormal state after the virtual machine restart instruction is re-initiated, return to the step of modifying the running state of the virtual machine into an activated state, until the virtual machine is restarted successfully.
In this embodiment, the control node may repeatedly modify the running state of the virtual machine in the database in order to restart the virtual machine, locate the cause of the exception according to the exception log after each restart, and repair the exception, until the virtual machine can be restarted successfully.
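Locating the cause of an exception from the log can be sketched as a keyword lookup, in line with the keyword-based locating recited in claim 1. The keyword table, the cause descriptions, and the sample log lines below are invented for illustration; real keywords depend on the virtualization platform and its log format.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table mapping a log keyword to a likely cause.
KEYWORD_TO_CAUSE: Dict[str, str] = {
    "no such file": "missing disk file",
    "permission denied": "wrong file ownership",
    "timed out": "unreachable target node",
}


def locate_cause(log_lines: List[str]) -> Optional[str]:
    """Return the cause for the most recent log line that contains a known keyword."""
    for line in reversed(log_lines):
        lowered = line.lower()
        for keyword, cause in KEYWORD_TO_CAUSE.items():
            if keyword in lowered:
                return cause
    return None  # no known keyword: the cause has to be located manually


# Made-up log lines, used only to exercise the lookup.
sample_log = [
    "INFO migration started",
    "ERROR cannot open disk: No such file or directory",
    "WARN restart attempt failed",
]
print(locate_cause(sample_log))
```

Scanning from the newest line backwards favours the error closest to the failed restart, which is a design choice of this sketch rather than something the text prescribes.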
As shown in FIG. 8, in one embodiment, virtual machine live migration exception handling apparatus 600 further comprises a first backup module 802, a copy module 804, and a second backup module 806.
The first backup module 802 is configured to send a first backup instruction to the source computing node where the virtual machine is located; the first backup instruction is used for instructing the source computing node to back up the files of the virtual machine on the source computing node.
The copy module 804 is configured to copy the files to the target computing node to which the virtual machine is migrated.
The second backup module 806 is configured to send a second backup instruction to the target computing node to which the virtual machine is migrated; the second backup instruction is used for instructing the target computing node to back up the files of the virtual machine on the target computing node.
In this embodiment, the running state of the virtual machine is modified in the database only after the files of the virtual machine on the computing nodes have been backed up, which further ensures that the data of the virtual machine is not lost.
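The order of operations (back up on the source, copy to the target, back up on the target, and only then touch the running state) can be sketched with local directories standing in for the two computing nodes. This is an assumption made so that the example is runnable: in the described system the control node sends instructions to remote nodes, and the function names `backup_files` and `prepare_for_state_change` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup_files(node_dir: Path) -> Path:
    """Copy the virtual machine files in node_dir into a 'backup' subdirectory."""
    backup_dir = node_dir / "backup"
    backup_dir.mkdir(exist_ok=True)
    for item in node_dir.iterdir():
        if item.is_file():
            shutil.copy2(item, backup_dir / item.name)
    return backup_dir


def prepare_for_state_change(source_dir: Path, target_dir: Path) -> None:
    """Run the backups that must finish before the running state is modified."""
    backup_files(source_dir)                     # first backup instruction
    for item in source_dir.iterdir():            # copy the files to the target node
        if item.is_file():
            shutil.copy2(item, target_dir / item.name)
    backup_files(target_dir)                     # second backup instruction


with tempfile.TemporaryDirectory() as root:
    source = Path(root) / "source"
    target = Path(root) / "target"
    source.mkdir()
    target.mkdir()
    (source / "disk.img").write_bytes(b"vm-data")
    prepare_for_state_change(source, target)
    source_backup_ok = (source / "backup" / "disk.img").exists()
    target_backup = (target / "backup" / "disk.img").read_bytes()
    print(source_backup_ok, target_backup)
```

After these calls a copy of the files exists on both nodes, so a failed repair or restart cannot destroy the only copy of the virtual machine data.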
As shown in fig. 9, in one embodiment, the live migration module 606 of the virtual machine live migration exception handling apparatus 600 specifically includes a query module 902, a target computing node determining module 904, a storage area partitioning module 906, and a migration module 908.
The query module 902 is configured to query currently available computing nodes.
The target computing node determining module 904 is configured to determine a target computing node from the queried computing nodes.
The storage area partitioning module 906 is configured to obtain available storage resources of the target computing node, and to divide corresponding storage areas from the available storage resources according to the storage area division mode of the storage resources on the source computing node where the virtual machine is located.
The migration module 908 is configured to migrate the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node.
In this embodiment, during live migration of the virtual machine, the target computing node is determined first and corresponding storage areas are divided from its available storage resources before data is migrated, which effectively reduces the probability of an exception occurring in the live migration process.
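Dividing storage areas on the target according to the division used on the source can be sketched as follows. The example assumes that each storage area is described by a name and a size in GiB and that areas are laid out back to back on the target; these units, the layout policy, and the name `partition_target` are illustrative assumptions, not details taken from the text.

```python
from typing import Dict, List, Tuple


def partition_target(source_areas: List[Tuple[str, int]],
                     available_gib: int) -> Dict[str, Tuple[int, int]]:
    """Reserve one target area per source area; return name -> (offset, size)."""
    layout: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, size in source_areas:
        if offset + size > available_gib:
            # Failing here, before any data moves, avoids a mid-migration error.
            raise ValueError(f"not enough storage on target for area {name!r}")
        layout[name] = (offset, size)
        offset += size
    return layout


source_layout = [("system", 40), ("data", 100)]
target_layout = partition_target(source_layout, available_gib=200)
print(target_layout)
```

Checking capacity while the areas are being reserved means that a target without enough storage is rejected before migration starts, which matches the stated aim of reducing exceptions during the live migration process.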
In one embodiment, the migration module 908 is further configured to traverse the storage blocks in each storage area on the source computing node where the virtual machine is located; when the currently traversed storage block holds no data of the virtual machine, skip that storage block and continue traversing; and when the currently traversed storage block holds data of the virtual machine, migrate the data in that storage block to the corresponding storage area on the target computing node.
In this embodiment, traversing the storage blocks of the storage areas on the source computing node where the virtual machine is located avoids migrating storage blocks that hold no user data, which would otherwise prolong the unavailable time of the virtual machine.
FIG. 10 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be the control node 110 in fig. 1. As shown in fig. 10, the computer device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities that support the operation of the entire control node 110. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for network communication with the source computing node and the target computing node. When executed by the processor, the computer program causes the processor to implement the virtual machine live migration exception handling method that is provided in the embodiments and applicable to the control node 110.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or arrange the components differently.
In one embodiment, the virtual machine live migration exception handling apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 10. The memory of the computer device may store various program modules constituting the virtual machine live migration exception handling apparatus, such as the modification module 602, the restart module 604, and the live migration module 606 shown in fig. 6. The computer program constituted by the program modules causes the processor to execute the steps in the virtual machine live migration exception handling method according to the embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 10 may execute, through the modification module 602 in the virtual machine live migration exception handling apparatus shown in fig. 6, the step of modifying the running state of the virtual machine into the activated state when an exception occurs in the live migration process of the virtual machine. The computer device may execute, through the restart module 604, the step of initiating a virtual machine restart instruction, where the virtual machine restart instruction is used for restarting the virtual machine in the activated state, and the step of re-initiating the virtual machine restart instruction after repairing according to the corresponding exception log when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state. The computer device may execute, through the live migration module 606, the step of continuing the live migration process of the virtual machine when the virtual machine is restarted successfully.
An embodiment of the invention provides a computer device, which includes a series of computer programs stored on a memory; when the computer programs are executed by a processor, the virtual machine live migration exception handling method provided by the embodiments of the invention can be implemented. In some embodiments, based on the particular operations implemented by the various parts of the computer program.
In one embodiment, a computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the computer program, the processor implements the following steps: when the live migration process of the virtual machine is abnormal, modifying the running state of the virtual machine into an activated state; initiating a virtual machine restart instruction, the virtual machine restart instruction being used for restarting the virtual machine in the activated state; when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, re-initiating the virtual machine restart instruction after repairing according to the corresponding exception log; and when the virtual machine is restarted successfully, continuing the live migration process of the virtual machine.
In one embodiment, the computer program causes the processor to further perform the steps of: acquiring live migration progress state information generated in the live migration process of the virtual machine; and judging whether the live migration process is abnormal according to the live migration progress state information.
In one embodiment, the computer program causes the processor to further perform the steps of: when the virtual machine still fails to restart and its running state is an abnormal state after the virtual machine restart instruction is re-initiated, returning to the step of modifying the running state of the virtual machine into an activated state, until the virtual machine is restarted successfully.
In one embodiment, the computer program causes the processor to, prior to performing the step of modifying the running state of the virtual machine into an activated state, further perform the steps of: sending a first backup instruction to the source computing node where the virtual machine is located, the first backup instruction being used for instructing the source computing node to back up the files of the virtual machine on the source computing node and copy the files to the target computing node to which the virtual machine is migrated; and sending a second backup instruction to the target computing node to which the virtual machine is migrated, the second backup instruction being used for instructing the target computing node to back up the files of the virtual machine on the target computing node.
In one embodiment, the live migration process of the virtual machine includes: querying currently available computing nodes; determining a target computing node from the queried computing nodes; acquiring available storage resources of the target computing node; dividing corresponding storage areas from the available storage resources according to the storage area division mode of the storage resources on the source computing node where the virtual machine is located; and migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node.
In one embodiment, when performing the step of migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node, the computer program causes the processor to specifically perform the following steps: traversing the storage blocks in each storage area on the source computing node where the virtual machine is located; when the currently traversed storage block holds no data of the virtual machine, skipping that storage block and continuing to traverse; and when the currently traversed storage block holds data of the virtual machine, migrating the data in that storage block to the corresponding storage area on the target computing node.
With the above computer device, when the live migration process of the virtual machine is abnormal, the running state of the virtual machine is modified into the activated state, so that a restart instruction can be sent to restart the virtual machine in the activated state. When the restart fails and the virtual machine reports an error, the virtual machine can be repaired according to the corresponding exception log, after which the virtual machine restart instruction is initiated again. Once the virtual machine is restarted successfully, the correctness of its state is verified and the live migration process can continue, which shortens the unavailable time of the virtual machine and allows the database exception to be recovered quickly.
In one embodiment, one or more storage media store a computer program that, when executed by a processor, causes the processor to perform the steps of: when the live migration process of the virtual machine is abnormal, modifying the running state of the virtual machine into an activated state; initiating a virtual machine restart instruction, the virtual machine restart instruction being used for restarting the virtual machine in the activated state; when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, re-initiating the virtual machine restart instruction after repairing according to the corresponding exception log; and when the virtual machine is restarted successfully, continuing the live migration process of the virtual machine.
In one embodiment, the computer program causes the processor to further perform the steps of: acquiring live migration progress state information generated in the live migration process of the virtual machine; and judging whether the live migration process is abnormal according to the live migration progress state information.
In one embodiment, the computer program causes the processor to further perform the steps of: when the virtual machine still fails to restart and its running state is an abnormal state after the virtual machine restart instruction is re-initiated, returning to the step of modifying the running state of the virtual machine into an activated state, until the virtual machine is restarted successfully.
In one embodiment, the computer program causes the processor to, prior to performing the step of modifying the running state of the virtual machine into an activated state, further perform the steps of: sending a first backup instruction to the source computing node where the virtual machine is located, the first backup instruction being used for instructing the source computing node to back up the files of the virtual machine on the source computing node and copy the files to the target computing node to which the virtual machine is migrated; and sending a second backup instruction to the target computing node to which the virtual machine is migrated, the second backup instruction being used for instructing the target computing node to back up the files of the virtual machine on the target computing node.
In one embodiment, the live migration process of the virtual machine includes: querying currently available computing nodes; determining a target computing node from the queried computing nodes; acquiring available storage resources of the target computing node; dividing corresponding storage areas from the available storage resources according to the storage area division mode of the storage resources on the source computing node where the virtual machine is located; and migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node.
In one embodiment, when performing the step of migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node, the computer program causes the processor to specifically perform the following steps: traversing the storage blocks in each storage area on the source computing node where the virtual machine is located; when the currently traversed storage block holds no data of the virtual machine, skipping that storage block and continuing to traverse; and when the currently traversed storage block holds data of the virtual machine, migrating the data in that storage block to the corresponding storage area on the target computing node.
With the above computer storage medium, when the live migration process of the virtual machine is abnormal, the running state of the virtual machine is modified into the activated state, so that a restart instruction can be sent to restart the virtual machine in the activated state. When the restart fails and the virtual machine reports an error, the virtual machine can be repaired according to the corresponding exception log, after which the virtual machine restart instruction is initiated again. Once the virtual machine is restarted successfully, the correctness of its state is verified and the live migration process can continue, which shortens the unavailable time of the virtual machine and allows the database exception to be recovered quickly.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples show only some embodiments of the present invention. They are described in relative detail, but this should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of protection of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A virtual machine live migration exception handling method, comprising:
acquiring live migration progress state information generated in the live migration process of a virtual machine;
judging, according to the live migration progress state information within a preset time period, whether the live migration process is abnormal;
when the live migration process of the virtual machine is abnormal, modifying the running state of the virtual machine into an activated state;
initiating a virtual machine restart instruction; the virtual machine restart instruction is used for restarting the virtual machine in the activated state; when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, locating the reason why the running state of the virtual machine is abnormal according to keywords in an exception log, repairing the abnormality, and then restarting the virtual machine;
and when the virtual machine is restarted successfully, continuing the live migration process of the virtual machine.
2. The method of claim 1, further comprising:
after the virtual machine restart instruction is re-initiated, when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, returning to the step of modifying the running state of the virtual machine into an activated state, until the virtual machine is restarted successfully.
3. The method of claim 1, wherein prior to modifying the operating state of the virtual machine to an active state, the method further comprises:
sending a first backup instruction to a source computing node where the virtual machine is located; the first backup instruction is used for instructing the source computing node to backup the file of the virtual machine on the source computing node and copy the file to a target computing node to which the virtual machine is migrated;
sending a second backup instruction to the target computing node to which the virtual machine is migrated; the second backup instruction is used for instructing the target computing node to backup the file of the virtual machine at the target computing node.
4. The method of claim 1, wherein the live migration process of the virtual machine comprises:
querying currently available computing nodes;
determining a target computing node from the queried computing nodes;
obtaining available storage resources of the target computing node;
dividing corresponding storage areas from the available storage resources according to the storage area division mode of the storage resources on the source computing node where the virtual machine is located;
and respectively migrating the data in each storage area on the source computing node where the virtual machine is located to the corresponding storage area on the target computing node.
5. The method according to claim 4, wherein the migrating the data in the storage areas of the source computing node where the virtual machine is located to the corresponding storage areas of the target computing node respectively comprises:
respectively traversing the storage blocks in the storage areas on the source computing node where the virtual machine is located;
when the data of the virtual machine does not exist in the currently traversed storage block, skipping the currently traversed storage block to continue traversing;
and when the data of the virtual machine exists in the currently traversed storage block, migrating the data in the currently traversed storage block to a corresponding storage area on the target computing node.
6. An apparatus for processing a virtual machine live migration exception, the apparatus comprising:
an acquisition module, configured to acquire live migration progress state information generated in the live migration process of a virtual machine;
a judging module, configured to judge, according to the live migration progress state information within a preset time period, whether the live migration process is abnormal;
a modification module, configured to modify the running state of the virtual machine into an activated state when the live migration process of the virtual machine is abnormal;
a restart module, configured to initiate a virtual machine restart instruction; the virtual machine restart instruction is used for restarting the virtual machine in the activated state; when the virtual machine fails to restart and the running state of the virtual machine is an abnormal state, locating the reason why the running state of the virtual machine is abnormal according to keywords in an exception log, repairing the abnormality, and then restarting the virtual machine;
and a live migration module, configured to continue the live migration process of the virtual machine when the virtual machine is restarted successfully.
7. The apparatus according to claim 6, wherein after the restart module re-initiates the virtual machine restart instruction, when the virtual machine restart fails and the running state of the virtual machine is an abnormal state, the modification module is further configured to modify the running state of the virtual machine to an active state until the virtual machine restart succeeds.
8. The apparatus of claim 6, wherein the live migration module further comprises a migration module configured to:
respectively traversing the storage blocks in the storage areas on the source computing node where the virtual machine is located; when the data of the virtual machine does not exist in the currently traversed storage block, skipping the currently traversed storage block to continue traversing; and when the data of the virtual machine exists in the currently traversed storage block, migrating the data in the currently traversed storage block to a corresponding storage area on the target computing node.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. One or more storage media storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 5.
CN201711292113.3A 2017-12-08 2017-12-08 Virtual machine live migration exception handling method and device and storage medium Active CN108255576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711292113.3A CN108255576B (en) 2017-12-08 2017-12-08 Virtual machine live migration exception handling method and device and storage medium

Publications (2)

Publication Number Publication Date
CN108255576A CN108255576A (en) 2018-07-06
CN108255576B true CN108255576B (en) 2021-02-26

Family

ID=62722393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711292113.3A Active CN108255576B (en) 2017-12-08 2017-12-08 Virtual machine live migration exception handling method and device and storage medium

Country Status (1)

Country Link
CN (1) CN108255576B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110858851B (en) * 2018-08-24 2022-06-14 阿里巴巴集团控股有限公司 Data connection restarting method, data processing method and device in live broadcast service
CN110795211A (en) * 2019-10-25 2020-02-14 北京金山云网络技术有限公司 Method and device for upgrading configuration of virtual machine, electronic equipment and readable storage medium
CN111698131B (en) * 2020-06-10 2021-10-08 中国工商银行股份有限公司 Information processing method, information processing apparatus, electronic device, and medium
CN111708613B (en) * 2020-08-18 2020-12-11 广东睿江云计算股份有限公司 Method and system for repairing boot failure card task of VM virtual machine
CN114968469A (en) * 2021-02-23 2022-08-30 澜起电子科技(昆山)有限公司 Method and device for live migration of virtual machine

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103677967A (en) * 2012-09-03 2014-03-26 阿里巴巴集团控股有限公司 Remote data service system of data base and task scheduling method
CN106598594A (en) * 2016-12-14 2017-04-26 捷开通讯(深圳)有限公司 Test system and method for quickly restoring test program
CN106959885A (en) * 2017-03-31 2017-07-18 山东超越数控电子有限公司 A kind of virtual machine High Availability realizes system and its implementation
CN107196803A (en) * 2017-05-31 2017-09-22 中国人民解放军信息工程大学 The dynamic generation and maintaining method of isomery cloud main frame
CN107209709A (en) * 2015-01-27 2017-09-26 日本电气株式会社 Network function virtual management and layout device, system, management method and program

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP6044131B2 (en) * 2012-06-25 2016-12-14 富士通株式会社 Program, management server, and virtual machine migration control method
US9356945B2 (en) * 2014-07-17 2016-05-31 Check Point Advanced Threat Prevention Ltd Automatic content inspection system for exploit detection

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN103677967A (en) * 2012-09-03 2014-03-26 阿里巴巴集团控股有限公司 Remote data service system of data base and task scheduling method
CN107209709A (en) * 2015-01-27 2017-09-26 日本电气株式会社 Network function virtual management and layout device, system, management method and program
CN106598594A (en) * 2016-12-14 2017-04-26 捷开通讯(深圳)有限公司 Test system and method for quickly restoring test program
CN106959885A (en) * 2017-03-31 2017-07-18 山东超越数控电子有限公司 A kind of virtual machine High Availability realizes system and its implementation
CN107196803A (en) * 2017-05-31 2017-09-22 中国人民解放军信息工程大学 The dynamic generation and maintaining method of isomery cloud main frame

Non-Patent Citations (1)

Title
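An improved scheme for OpenStack virtual machine live migration based on hybrid migration; He Meijun et al.; Systems Engineering - Theory & Practice; 2014-06-30; pp. 216-220 *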

Also Published As

Publication number Publication date
CN108255576A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
CN108255576B (en) Virtual machine live migration exception handling method and device and storage medium
CN109614276B (en) Fault processing method and device, distributed storage system and storage medium
CN109656895B (en) Distributed storage system, data writing method, device and storage medium
CN109656896B (en) Fault repairing method and device, distributed storage system and storage medium
WO2021226905A1 (en) Data storage method and system, and storage medium
CN109669822B (en) Electronic device, method for creating backup storage pool, and computer-readable storage medium
CN108920301B (en) Data backup method and system, and data recovery method and system
CN110727698A (en) Database access method and device, computer equipment and storage medium
CN109558209B (en) Monitoring method for virtual machine
CN105550071A (en) System file upgrading and detecting method and communication device
CN114138192A (en) Storage node online upgrading method, device, system and storage medium
CN113986450A (en) Virtual machine backup method and device
CN107943615B (en) Data processing method and system based on distributed cluster
CN110555017A (en) block chain data cleaning method and device, computer equipment and storage medium
CN108804239B (en) Platform integration method and device, computer equipment and storage medium
CN113312309B (en) Snapshot chain management method, device and storage medium
CN111427718B (en) File backup method, file recovery method and file recovery device
CN114416689A (en) Data migration method and device, computer equipment and storage medium
CN109254870B (en) Data backup method and device
US10489239B2 (en) Multiplexing system, multiplexing method, and computer program product
CN114328374A (en) Snapshot method, device, related equipment and database system
CN110990052A (en) Configuration saving method and device
CN112306962A (en) File copying method and device in computer cluster system and storage medium
CN110737546B (en) Consistency snapshot checking method, device, equipment and storage medium
CN112711382B (en) Data storage method and device based on distributed system and storage node

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200201

Address after: 200120 15th floor, 1333 Lujiazui Ring Road, Pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Weikun (Shanghai) Technology Service Co., Ltd

Address before: 200120 13th floor, 1333 Lujiazui Road, Pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Shanghai Lujiazui International Financial Asset Exchange Co., Ltd.

GR01 Patent grant