CN107544839B - Virtual machine migration system, method and device

Publication number: CN107544839B (granted publication); earlier published as CN107544839A
Application number: CN201610481831.4A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active (granted)
Prior art keywords: computing node, monitoring data, virtual machine, network, fault
Inventors: 莫衍, 潘晓东
Original assignee / applicant: Tencent Technology Shenzhen Co Ltd
Current assignees: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Classification (Landscapes): Debugging And Monitoring

Abstract

The embodiments of the present application provide a virtual machine migration system, method and device. Each computing node reports its own monitoring data; the collector cluster performs fault detection on each computing node according to that node's monitoring data, determines the computing node that has failed, and reports the failed computing node to the cloud controller; the cloud controller determines a target computing node and migrates the virtual machines of the failed computing node to the target computing node. As a result, the virtual machines of the failed computing node run normally on the target computing node, and the key services and key applications of an enterprise can continue to operate.

Description

Virtual machine migration system, method and device
Technical Field
The embodiment of the application relates to the technical field of cloud platforms, in particular to a virtual machine migration system, a virtual machine migration method and a virtual machine migration device.
Background
With the continuous development of cloud computing technology, the complexity of cloud platforms has gradually increased. A cloud platform comprises a cloud controller, a cluster controller, compute node controllers and computing nodes. The cloud controller is used to manage cluster information; the cluster controller is used to manage network resource information, computing node information and virtual cluster information; a computing node is a physical server that provides physical resources such as hard disk, memory and CPU, and may contain one or more virtual machines; the compute node controller is used to manage the virtual machines in the computing nodes.
With the continuous development of cloud computing technology, key services and key applications of enterprises are gradually being migrated to virtual machines on computing nodes in cloud platforms. When a computing node fails, its virtual machines cannot run, so the key services and key applications of the enterprise cannot run.
Disclosure of Invention
In view of this, the present invention provides a virtual machine migration system, method and device, so as to overcome the problem in the prior art that, when a computing node fails, its virtual machines cannot run and the key services and key applications of an enterprise therefore cannot run.
To achieve this purpose, the present invention provides the following technical solutions:
A virtual machine migration system, comprising: a collector cluster, a cloud controller, a storage cluster and a plurality of computing nodes, wherein:
the collector cluster is used for receiving the monitoring data reported by each computing node, respectively carrying out fault detection on each computing node according to the monitoring data of each computing node, determining a fault computing node with a fault, and reporting the information of the fault computing node to the cloud controller;
the storage cluster is used for storing the configuration file of the virtual machine;
the cloud controller is used for receiving the information of the failed computing node, determining a destination computing node from the computing nodes among the plurality of computing nodes that have not failed, sending a virtual machine configuration file acquisition instruction to the destination computing node, and adding the virtual machine information corresponding to the failed computing node to the recorded virtual machine information corresponding to the destination computing node;
and the destination computing node is used for receiving the virtual machine configuration file acquisition instruction sent by the cloud controller, acquiring the virtual machine configuration file from the storage cluster and performing configuration according to the virtual machine configuration file.
A virtual machine migration method is applied to a collector cluster, and comprises the following steps:
receiving monitoring data respectively reported by each computing node;
respectively carrying out fault detection on each computing node according to the monitoring data of each computing node, and determining the faulty computing node;
reporting the information of the fault computing node to the cloud controller; the information of the fault computing node is a condition for triggering the cloud controller to determine a target computing node and sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
A virtual machine migration method is applied to a cloud controller and comprises the following steps:
receiving information of a failed computing node reported by a collector cluster, wherein the information of the failed computing node is determined by the collector cluster according to the monitoring data reported by the failed computing node;
determining a target computing node from the computing nodes which do not have faults;
sending a virtual machine configuration file acquisition instruction to the target computing node, wherein the virtual machine configuration file acquisition instruction is a basis for the target computing node to acquire a virtual machine configuration file from a storage cluster;
and adding the virtual machine information corresponding to the fault computing node into the recorded virtual machine information corresponding to the target computing node.
A virtual machine migration method is applied to a computing node and comprises the following steps:
collecting monitoring data;
reporting the monitoring data to a collector cluster, so that the collector cluster performs fault detection on the computing node according to the monitoring data and reports the information of the failed computing node to a cloud controller when the computing node fails;
and when the computing node has not failed, if a virtual machine configuration file acquisition instruction sent by the cloud controller is received, acquiring the virtual machine configuration file from a storage cluster and performing configuration.
A virtual machine migration device is applied to a collector cluster, and comprises:
the receiving module is used for receiving the monitoring data reported by each computing node;
the determining module is used for respectively carrying out fault detection on each computing node according to the monitoring data of each computing node and determining a fault computing node with a fault;
the sending module is used for reporting the information of the fault computing node to the cloud controller; the information of the fault computing node is a condition for triggering the cloud controller to determine a target computing node and sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
A virtual machine migration device applied to a cloud controller comprises:
the receiving module is used for receiving information of a fault computing node reported by a collector cluster, wherein the information of the fault computing node is determined by the collector cluster according to the monitoring data of the fault computing node;
the determining module is used for determining a target computing node from the computing nodes that have not failed;
and the sending module is used for sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
A virtual machine migration device applied to a computing node, the virtual machine migration device comprising:
the acquisition module is used for acquiring monitoring data;
the sending module is used for reporting the monitoring data to the collector cluster, so that the collector cluster performs fault detection on the computing node according to the monitoring data and reports the information of the failed computing node to the cloud controller when the computing node fails;
and the configuration module is used for, when the computing node has not failed, acquiring the virtual machine configuration file from the storage cluster and performing configuration if a virtual machine configuration file acquisition instruction sent by the cloud controller is received.
As can be seen from the foregoing technical solutions, compared with the prior art, in the virtual machine migration system provided in the embodiment of the present application, each computing node reports its own monitoring data; the collector cluster performs fault detection on each computing node according to that node's monitoring data, determines the computing node that has failed, and reports the failed computing node to the cloud controller; the cloud controller determines a target computing node and migrates the virtual machines of the failed computing node to the target computing node. As a result, the virtual machines of the failed computing node run normally on the target computing node, and key services and key applications of an enterprise can continue to operate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic frame diagram of a virtual machine migration system according to an embodiment of the present application;
fig. 2 is a signaling flowchart of a virtual machine migration method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a connection relationship between each computing node and each collector in a collector cluster according to an embodiment of the present application;
fig. 4 is a detailed framework diagram of a virtual machine migration system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a virtual machine migration apparatus applied to a collector cluster, a virtual machine migration apparatus applied to a cloud controller, and a virtual machine migration apparatus applied to a compute node according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The virtual machine migration system provided by the embodiment of the application comprises a collector cluster 11, a cloud controller 12, a storage cluster 13 and a plurality of computing nodes 14. The specific framework is shown in fig. 1.
Wherein the plurality of compute nodes 14 may be a plurality of physical servers 14. Each physical server may include one or more virtual machines. One or more virtual servers, i.e. virtual machines, can be simulated on one physical server through virtual machine software, and the virtual machines can work exactly like real physical servers, such as installing an operating system, installing application programs, accessing network resources, and the like.
Collector cluster 11 may comprise a cluster of multiple servers. Collector cluster 11 may monitor each compute node.
The cloud controller 12 may be a cluster of multiple servers 12.
The storage cluster 13 may be a cluster formed by a plurality of servers 13, and may store a virtual machine configuration file.
Collector cluster 11, cloud controller 12, storage cluster 13, and multiple computing nodes may be connected by wireless or wired means.
Clustering refers to associating a group of servers so that, viewed from the outside, they appear in many respects as a single server. The physical servers within a cluster are typically connected by a local area network.
Based on the above architecture, a virtual machine migration method is described, as shown in fig. 2, the virtual machine migration method includes:
step S201: and each computing node acquires respective monitoring data and reports the monitoring data to the collector cluster.
The monitoring data can be used to reflect whether the network environment in which the computing node is located is healthy, whether the virtual machines in the computing node can run normally, and whether the computing node will perform poorly because of an excessive load.
Each compute node contains a physical monitoring agent that can collect the compute node's monitoring data.
Each of the computing nodes belongs to a virtual cluster, and a virtual cluster may include one or more physical clusters; of course, the computing nodes may also all belong to the same physical cluster. That is, in the virtual machine migration method provided in the embodiment of the present application, a virtual machine may be migrated between computing nodes in the physical cluster where it is located, or between computing nodes in different physical clusters that belong to the same virtual cluster.
Step S202: and the collector cluster respectively carries out fault detection on each computing node according to the monitoring data reported by each computing node, and determines the faulty computing node.
When the network environment in which a computing node is located fails, the virtual machines in the computing node cannot run normally, and the computing node can be determined to be a failed computing node; when the monitoring data shows that a virtual machine in the computing node cannot run normally, the computing node can be determined to be a failed computing node; when the load on a computing node is too heavy, the computing node can be determined to be a failed computing node.
Step S203: and the collector cluster reports the information of the fault computing node to the cloud controller.
The information of the failed compute node may include an identification of the failed compute node and an identification of a virtual machine contained in the failed compute node.
The failed compute node information may also include an identification of the physical cluster to which it belongs.
When all the computing nodes in the virtual machine migration method provided by the embodiment of the present application belong to the same physical cluster, the failed computing node information does not need to include the identifier of the physical cluster to which the node belongs. When the computing nodes belong to the same virtual cluster, the virtual cluster may include a plurality of physical clusters, so the failed computing node information needs to include the identifier of the physical cluster to which the failed computing node belongs, so that the cloud controller can later update the correspondence among virtual machines, computing nodes and clusters after the virtual machines in the failed computing node have been migrated, or after the configuration file acquisition instruction has been sent to the target computing node.
Step S204: and the cloud controller receives the information of the failed computing node and determines a target computing node from the computing nodes which do not have the failure in the plurality of computing nodes.
The cloud controller can determine which computing nodes in the plurality of computing nodes do not have faults through the identification of the faulty computing node, and determine a target computing node from the computing nodes which do not have faults.
Step S205: and the cloud controller sends an instruction for acquiring the configuration file of the virtual machine to the target computing node.
Step S206: and the target computing node receives a virtual machine configuration file acquisition instruction sent by the cloud controller, acquires the virtual machine configuration file from the storage cluster and configures the virtual machine configuration file.
The virtual machine configuration file is used to configure the hardware information of the virtual machine. For example, if the virtual machine configuration file specifies two CPU cores, 4 GB of memory and an 80 GB disk, then when the destination computing node performs configuration according to the virtual machine configuration file, it allocates a corresponding CPU, memory and disk.
The instruction for obtaining the virtual machine configuration file may include path information of the destination computing node accessing the storage cluster.
The virtual machine profiles in the computing nodes in the same virtual cluster may be uniform.
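For illustration only, the hardware information carried by such a virtual machine configuration file might be represented as in the following sketch; the field and method names are assumptions introduced here, not the actual format used by the storage cluster.

```python
# Hypothetical representation of the hardware information in a virtual
# machine configuration file (field names are illustrative assumptions).
vm_profile = {
    "vcpus": 2,        # two CPU cores
    "memory_gb": 4,    # 4 GB of memory
    "disk_gb": 80,     # 80 GB disk
}

def configure_from_profile(node, profile):
    """Sketch: the destination computing node allocates resources that
    match the profile fetched from the storage cluster."""
    node.allocate_cpu(profile["vcpus"])
    node.allocate_memory(profile["memory_gb"])
    node.allocate_disk(profile["disk_gb"])
```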
Step S207: and the cloud controller adds the virtual machine information corresponding to the fault computing node into the recorded virtual machine information corresponding to the target computing node.
The cloud controller may record the correspondence among the cluster identifier, the computing node identifier and the virtual machine identifier; when a virtual machine in a failed computing node is migrated to a destination computing node, this correspondence needs to be updated.
The computing nodes may include the failed computing node, normally operating computing nodes, and the destination computing node, where the destination computing node is determined from the normally operating computing nodes. For clarity in drawing the signaling diagram of fig. 2, the computing nodes are divided into the failed computing node, the destination computing node, and the normally operating computing nodes.
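Taken together, steps S201 to S207 can be sketched as follows; this is a minimal illustration in which all class, attribute and method names are assumptions, not the actual interfaces of the system.

```python
class CloudController:
    """Sketch of steps S204-S207, assuming a simple in-memory record of the
    compute node -> virtual machine correspondence (vm_records)."""

    def __init__(self, vm_records, storage_path):
        self.vm_records = vm_records      # {node_id: [vm_id, ...]}
        self.storage_path = storage_path  # path information for the storage cluster

    def on_fault_report(self, fault_info, nodes):
        """fault_info is the failed-node information reported by the collector
        cluster in step S203: the node identifier and its virtual machine ids."""
        failed_id = fault_info["node_id"]
        # S204: determine a destination node among the nodes that have not failed.
        healthy = [n for n in nodes if n.node_id != failed_id and not n.failed]
        destination = max(healthy, key=lambda n: n.free_resources)
        # S205/S206: instruct the destination node to fetch the virtual machine
        # configuration files from the storage cluster and configure itself.
        destination.fetch_and_configure(self.storage_path, fault_info["vm_ids"])
        # S207: add the virtual machine information of the failed node to the
        # recorded virtual machine information of the destination node.
        self.vm_records.setdefault(destination.node_id, []).extend(
            self.vm_records.pop(failed_id, []))
        return destination
```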
In the virtual machine migration method provided by the embodiment of the application, each computing node respectively reports its own monitoring data; the collector cluster respectively carries out fault detection on each computing node according to the monitoring data of each computing node, determines a fault computing node with a fault, and reports the fault computing node to the cloud controller; the cloud controller determines a target computing node and migrates virtual machines in the fault computing node to the target computing node respectively. Therefore, the virtual machine of the fault computing node normally operates in the target computing node, and key services and key applications of enterprises can continue to operate.
When collecting monitoring data, the computing node shown in fig. 1, which has the functions of the computing node described in fig. 2, is specifically configured to:
collecting heartbeat monitoring data of a network environment where the computing nodes are located; collecting monitoring data of a virtual machine control process in the computing node; and collecting load data in the computing nodes.
Acquiring heartbeat monitoring data of a network environment where the computing nodes are located comprises: acquiring heartbeat monitoring data of a management network through a management network port of a management network card; collecting heartbeat monitoring data of a data network through a data network port of a data network card; and acquiring heartbeat monitoring data of the storage network through a storage network port of the storage network card.
The heartbeat monitoring data of the management network may be continuous ping probes over the management network; the heartbeat monitoring data of the data network may be continuous ping probes over the data network; the heartbeat monitoring data of the storage network may be continuous ping probes over the storage network.
Ping (Packet Internet Groper) is a command available under Windows, Unix and Linux systems; it is also part of the TCP/IP protocol suite. With the ping command, it is possible to check whether the network is reachable, which helps analyze and locate network faults.
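As a minimal sketch, the monitoring agent might collect the three kinds of heartbeat data by issuing ping probes through the corresponding network ports; the addresses below and the use of the system ping command are assumptions for illustration only.

```python
import subprocess

# Hypothetical peer addresses reachable through the management, data and
# storage network ports respectively (assumptions for illustration).
NETWORKS = {
    "management": "192.168.10.1",
    "data": "192.168.20.1",
    "storage": "192.168.30.1",
}

def collect_heartbeats():
    """Return {network_name: True/False} by pinging each network once."""
    heartbeats = {}
    for name, address in NETWORKS.items():
        # "-c 1": send a single ICMP echo request; "-W 2": wait up to 2 s (Linux ping).
        result = subprocess.run(["ping", "-c", "1", "-W", "2", address],
                                capture_output=True)
        heartbeats[name] = (result.returncode == 0)
    return heartbeats
```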
The virtual machine migration system or method provided by the embodiment of the application can be applied to a cloud platform, and the networking of the cloud platform comprises three network layers: a management network 31, a data network 32 and a storage network 33, each computing node can determine whether its network environment is good by monitoring the management network, the data network and the storage network.
Fig. 3 is a schematic diagram of a connection relationship between each computing node and a collector cluster.
It can be seen from fig. 3 that each computing node 14 includes three network ports, namely a management network port 141, a data network port 142 and a storage network port 143, which may be the network ports of network cards in the computing node. The computing node can obtain the heartbeat monitoring data of the management network through the management network port of the management network card, the heartbeat monitoring data of the data network through the data network port of the data network card, and the heartbeat monitoring data of the storage network through the storage network port of the storage network card.
Collector cluster 11 includes a plurality of collectors 111. Each collector 111 includes a management network port 1111, a data network port 1112 and a storage network port 1113, which may also be network ports of network cards.
The number of the collectors 111 in the collector cluster 11 may be the same as or different from the number of the computing nodes, and the number of the collectors 111 may be smaller than the number of the computing nodes, that is, one collector 111 may collect monitoring data of a plurality of computing nodes.
The computing node and the collector are connected to the management network through their respective management network ports, to the data network through their respective data network ports, and to the storage network through their respective storage network ports.
The management network port in the computing node is used to receive commands, such as a login command, sent to the computing node by the compute node controller or the cluster controller over the management network.
The data network port in the computing node may also be referred to as a service network port, and is used for the virtual machine in the computing node to communicate with the outside through a data network, and for the computing node to communicate with the outside.
And the storage network port in the computing node is used for the virtual machine to communicate with the storage cluster through a storage network and storing the configuration file and the disk data of the virtual machine to the storage cluster.
The computing node can send the collected monitoring data to the data network port of the collector through the data network port.
The virtual machine control process monitoring data may include monitoring data of the virtualization management process, such as libvirt in OpenStack, and of the compute daemon, such as nova in OpenStack.
The virtualization management process encapsulates the virtualization technology. If the virtualization management process has a problem, the compute node controller cannot send control instructions to the virtual machine, that is, it cannot change the current running state of the virtual machine; for example, if a virtual machine needs to be suspended, it cannot be suspended because the process has a problem.
The compute daemon is used to synchronize the state of the computing node and the state of the virtual machines to the compute node controller; if the compute daemon has a problem, the computing node is shown as unavailable in the compute node controller. The compute daemon also receives control instructions sent by the compute node controller and forwards them to the virtualization management process.
The load data may include CPU occupancy, memory utilization, and/or disk utilization.
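A sketch of collecting such load data follows; it uses the third-party psutil library as one possible way to read these figures, which is an assumption and not necessarily what the system described here uses.

```python
import psutil  # third-party library; one possible way to read load figures

def collect_load_data():
    """Sketch: CPU occupancy, memory utilization and disk utilization, in percent."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }
```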
The computing node in fig. 3 may collect multiple types of monitoring data. It can be understood that, in different application scenarios, the collector cluster shown in fig. 1, which has the function of the collector cluster described in fig. 2, uses different monitoring data for fault detection on the computing nodes. For example, in application scenario A, a computing node contains many virtual machines, its CPU occupancy stays around 90% and its memory utilization around 92%, yet the operator still regards it as a non-failed computing node.
In order to apply the virtual machine migration method provided in the embodiment of the present application more conveniently to a variety of application scenarios, a monitoring data selection interface can be displayed to the operator. The interface displays the types of data collected by the computing nodes, and the operator can select which types of monitoring data are to be used for fault detection on the computing nodes. Continuing the example above, the operator may leave the load data unselected and select only the heartbeat monitoring data of the network environment and the virtual machine control process monitoring data. Optionally, the monitoring data selection interface may also display the detailed monitoring data types, such as heartbeat monitoring data of the management network, heartbeat monitoring data of the data network, heartbeat monitoring data of the storage network, the virtualization management process, the compute daemon, CPU occupancy, memory usage, and so on.
The collector cluster can respectively detect the faults of the computing nodes according to the data corresponding to the data types in the monitoring data of the computing nodes. For example, if the operator does not select the load data, the collector cluster does not perform fault detection on the computing node according to the load data when performing fault detection on the computing node.
In the embodiment of the application, the type of monitoring data used by the collector cluster for fault detection on the computing nodes may also be fixed rather than selected by the operator. Regardless of whether or not the operator can select the type of monitoring data, when performing fault detection on each computing node according to that node's monitoring data and determining the failed computing node, the collector cluster shown in fig. 1, which has the function of the collector cluster described in fig. 2, may specifically be configured to:
when the heartbeat monitoring data of the management network are not detected within first preset time and the heartbeat monitoring data of the storage network are not detected within second preset time, determining the computing node as a fault computing node; or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the heartbeat monitoring data of the data network is not detected within the third preset time, determining the computing node as a fault computing node; or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the virtual machine control process is in a stop running state within a fourth preset time, determining the computing node as a fault computing node; or when detecting that the load data of the computing node is greater than or equal to a preset threshold value, determining the computing node as a fault computing node.
Determining the computing node as a failed computing node when its load data is detected to be greater than or equal to the preset threshold may include: determining the computing node as a failed computing node when its CPU utilization is detected to be greater than or equal to a first preset threshold, or when its memory utilization is detected to be greater than or equal to a second preset threshold.
The first preset time, the second preset time, the third preset time and the fourth preset time may be the same or different, and may be determined according to actual conditions.
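The four conditions above can be expressed as a simple rule evaluation, sketched below; the field names, threshold values and preset times are illustrative assumptions only.

```python
import time

# Illustrative values only; the actual preset times and load thresholds
# are deployment-specific.
T1 = T2 = T3 = T4 = 30         # seconds without a heartbeat / running process
CPU_LIMIT, MEM_LIMIT = 90, 90  # percent

def is_failed(node_state, now=None):
    """node_state holds, for one compute node, the timestamps of the last
    heartbeat seen on each network, the last time the virtual machine
    control process was seen running, and the latest load figures."""
    now = now or time.time()
    mgmt_lost = now - node_state["last_mgmt_heartbeat"] > T1
    storage_lost = now - node_state["last_storage_heartbeat"] > T2
    data_lost = now - node_state["last_data_heartbeat"] > T3
    vm_process_stopped = now - node_state["last_vm_process_alive"] > T4
    overloaded = (node_state["cpu_percent"] >= CPU_LIMIT
                  or node_state["memory_percent"] >= MEM_LIMIT)
    return ((mgmt_lost and storage_lost)
            or (mgmt_lost and data_lost)
            or (mgmt_lost and vm_process_stopped)
            or overloaded)
```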
From the description of the network ports in fig. 3, it can be seen that when a problem occurs in the management network, the computing node cannot receive commands sent by the compute node controller or the cluster controller, in which case the computing node may be determined to be a failed computing node. In scenarios where it is acceptable that the computing node does not always need to receive commands from the compute node controller or the cluster controller, it may instead be regarded as a non-failed computing node.
When the data network has a problem, the computing node and its virtual machines cannot interact with the outside, and in that case the computing node needs to be determined as failed; in scenarios where it is acceptable that the computing node temporarily does not interact with the outside, it may be regarded as a non-failed node.
When the storage network has a problem, the virtual machines cannot communicate with the storage cluster, and depending on the application scenario the operator may regard the computing node as either a failed or a non-failed computing node.
That is, when any one of the above networks has a problem, the operator may consider the computing node failed in some application scenarios and non-failed in others. If a computing node were confirmed as failed whenever only one network has a problem, nodes that some operators regard as non-failed would be determined as failed.
Through continuous research, the applicant has found that when the management network and the storage network both have problems, or the management network and the data network both have problems, or the management network has a problem and the virtual machine control process has a problem, or the load data is too high, the computing node is a failed computing node in any application scenario; the fault detection method for computing nodes described above was developed on this basis.
In other application scenarios, a computing node is determined to be failed if any one of the above conditions occurs. In this case, when performing fault detection on each computing node according to that node's monitoring data and determining the failed computing node, the collector cluster having the function of the collector cluster described above may specifically be configured to:
determine the computing node as a failed computing node when one or more of the following conditions are met: the heartbeat monitoring data of the management network is not detected within a first preset time; the heartbeat monitoring data of the storage network is not detected within a second preset time; the heartbeat monitoring data of the data network is not detected within a third preset time; the virtual machine control process is in a stopped state within a fourth preset time; the load data of the computing node is detected to be greater than or equal to a preset threshold.
It can be understood that, when the number of computing nodes is small, the collector cluster shown in fig. 1, which has the function of the collector cluster described in fig. 2, may store the monitoring data reported by the computing nodes in the database in the collector cluster in real time. When the number of computing nodes is large, storing the monitoring data in the database in real time would make interaction with the database too frequent. For this reason, optionally, the collector cluster may further be configured to: cache the received monitoring data of each computing node, and when the cached monitoring data reaches a preset number, store that preset number of monitoring data records into the database in the collector cluster.
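A minimal sketch of this optional batching behaviour is given below; the database interface (insert_many) and the batch size are assumptions for illustration.

```python
class MonitoringBuffer:
    """Sketch: cache incoming monitoring data and flush it to the database
    only once a preset count has accumulated."""

    def __init__(self, database, batch_size=100):
        self.database = database
        self.batch_size = batch_size
        self.cache = []

    def on_monitoring_data(self, record):
        self.cache.append(record)
        if len(self.cache) >= self.batch_size:
            self.database.insert_many(self.cache)  # one bulk write instead of many
            self.cache = []
```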
Optionally, as shown in fig. 1, after receiving the information of the faulty computing node reported by the collector cluster, the cloud controller having the function of the cloud controller shown in fig. 2 is further configured to:
sending, to the failed computing node, information for confirming whether a fault has occurred; and, when the confirmation information fed back by the failed computing node is received, triggering the execution of determining a destination computing node from the computing nodes that have not failed and sending the virtual machine configuration file acquisition instruction to the destination computing node. In other words, the cloud controller and the failed computing node carry out a fault synchronization confirmation mechanism.
Optionally, when the number of failed computing nodes is smaller than a preset fault number, the cloud controller provided in the embodiment of the present application may carry out the fault synchronization confirmation mechanism with each failed computing node, so as to avoid the collector cluster mistakenly determining a non-failed computing node as a failed one. When the number of failed computing nodes is greater than or equal to the preset fault number, performing the fault synchronization confirmation mechanism with every failed computing node would slow down the whole migration process. Therefore, the embodiment of the present application may provide a fault synchronization confirmation mechanism selection interface: the mechanism is executed when the operator selects it and skipped otherwise. That is, when the mechanism is not selected, the cloud controller directly determines the destination computing node and performs the migration upon receiving the information of the failed computing node, which speeds up the migration.
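The confirmation behaviour described above might be sketched as follows; the function names, the preset fault number and the controller interface are assumptions for illustration.

```python
def handle_fault_reports(cloud_controller, failed_nodes,
                         operator_selected_confirmation,
                         preset_fault_count=3):
    """Sketch of the fault synchronization confirmation mechanism."""
    # Below the preset fault number the controller always double-checks with
    # each failed node; at or above it, confirmation is performed only if the
    # operator selected it in the selection interface.
    confirm = (len(failed_nodes) < preset_fault_count
               or operator_selected_confirmation)
    for node in failed_nodes:
        if confirm and not cloud_controller.confirm_fault(node):
            continue  # the node reports it is actually healthy; skip migration
        cloud_controller.migrate_virtual_machines(node)
```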
In one implementation, when determining a destination computing node from the computing nodes that have not failed, the cloud controller shown in fig. 1, which has the function of the cloud controller described in fig. 2, is specifically configured to:
monitoring scheduling parameters of each computing node in real time, wherein the scheduling parameters comprise: the remaining amount of resources or the energy consumption or the time sequence of joining the cloud platform; and determining the computing node with the scheduling parameter meeting the scheduling strategy as a target computing node.
When the scheduling parameter is the resource residual amount, the scheduling strategy can be that the computing node with the largest resource residual amount is used as a target computing node or the computing node with the smallest resource residual amount is used as a target computing node; when the scheduling parameter is energy consumption, the scheduling strategy can be that the computing node with the minimum energy consumption is used as a target computing node; when the scheduling parameter is the time sequence of joining the cloud platform, the scheduling policy may be to use the computing node that has the longest time to join the cloud platform as the destination computing node (i.e., preferentially use the old computing node), or to use the computing node that has the shortest time to join the cloud platform as the destination computing node (i.e., preferentially use the new computing node).
The remaining amount of resources may be computed as a comprehensive proportion of CPU, memory and disk, or taken as 1 minus the CPU occupancy, 1 minus the memory usage, or 1 minus the disk occupancy.
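The scheduling policies listed above might be expressed as follows in a minimal sketch; the policy names and node attributes are assumptions for illustration.

```python
def pick_destination(healthy_nodes, policy):
    """Sketch: choose a destination compute node according to one of the
    scheduling policies described above."""
    if policy == "most_free_resources":
        return max(healthy_nodes, key=lambda n: n.free_resources)
    if policy == "least_free_resources":
        return min(healthy_nodes, key=lambda n: n.free_resources)
    if policy == "lowest_energy":
        return min(healthy_nodes, key=lambda n: n.energy_consumption)
    if policy == "oldest_node":   # joined the cloud platform earliest
        return min(healthy_nodes, key=lambda n: n.join_time)
    if policy == "newest_node":   # joined the cloud platform latest
        return max(healthy_nodes, key=lambda n: n.join_time)
    raise ValueError("unknown scheduling policy: %s" % policy)
```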
The scheduling parameters are various, and the destination computing node can be determined according to different scheduling parameters in different application scenarios. When the computing nodes described in fig. 1 belong to the same virtual cluster but belong to different physical clusters, the scheduling parameters may be different when selecting the target computing node in different physical clusters. Based on this, in the virtual machine migration method or system provided by the present application, when monitoring the scheduling parameter of each computing node in real time, the cloud controller is specifically configured to: determining a selected target scheduling strategy in a target computing node scheduling strategy selection interface; and monitoring the scheduling parameters corresponding to the target scheduling strategy in each computing node in real time.
An operator can select a current application scene or scheduling strategies required by different physical clusters in a scheduling strategy selection interface of a destination computing node, and the scheduling strategy selected by the operator is called a destination scheduling strategy. The cloud controller can monitor scheduling parameters corresponding to the target scheduling strategies in each computing node in real time according to different target scheduling strategies.
As shown in fig. 1, for each virtual machine in the faulty computing node, when the cloud controller having the function of the cloud controller shown in fig. 2 determines, as a destination computing node, a computing node whose scheduling parameter satisfies a scheduling policy, the cloud controller is specifically configured to: and taking the computing node of which the scheduling parameter meets the scheduling policy in the computing nodes which do not have faults at present as the target computing node of the virtual machine.
That is, a destination computing node whose scheduling parameter satisfies the scheduling policy is selected from all the non-failed computing nodes, regardless of whether the destination computing node and the failed computing node belong to the same physical cluster.
Preferably, a destination computing node whose scheduling parameter satisfies the scheduling policy is first searched for in the physical cluster to which the failed computing node belongs, and only if none is found there is the search extended to other physical clusters. For each virtual machine in the failed computing node, when determining the computing node whose scheduling parameter satisfies the scheduling policy as the destination computing node, the cloud controller is specifically configured to:
and when the computing node with the scheduling parameter meeting the scheduling policy exists in the cluster to which the fault computing node belongs, taking the computing node with the scheduling parameter meeting the scheduling policy as a target computing node of the virtual machine.
And when the scheduling parameter of the computing node in the cluster to which the fault computing node belongs does not meet the scheduling strategy, taking the computing node of which the scheduling parameter meets the scheduling strategy in other clusters as a target computing node of the virtual machine.
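A sketch of this cluster-first preference is given below; `satisfies_policy` and `choose` stand in for the policy evaluation shown earlier, and all names are assumptions for illustration.

```python
def pick_destination_same_cluster_first(healthy_nodes, failed_node, policy,
                                        satisfies_policy, choose):
    """Sketch: look for a destination node inside the physical cluster of the
    failed node first, and fall back to the other clusters only if no node
    there satisfies the scheduling policy."""
    same_cluster = [n for n in healthy_nodes
                    if n.cluster_id == failed_node.cluster_id
                    and satisfies_policy(n, policy)]
    if same_cluster:
        return choose(same_cluster, policy)
    other_clusters = [n for n in healthy_nodes
                      if n.cluster_id != failed_node.cluster_id
                      and satisfies_policy(n, policy)]
    return choose(other_clusters, policy) if other_clusters else None
```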
In summary, the method for the cloud controller to determine the destination computing node from the non-failed computing nodes may be applied to a single cluster, that is, each computing node mentioned in fig. 1 belongs to the same physical cluster. The present invention may also be applied to cross-cluster, that is, each of the computing nodes mentioned in fig. 1 belongs to the same virtual cluster, and the virtual cluster may include a plurality of physical clusters, where cross-cluster refers to cross-physical cluster.
An embodiment of the present application further provides a virtual machine migration system, as shown in fig. 1, including a collector cluster 11, a cloud controller 12, a storage cluster 13, and a plurality of computing nodes 14, where:
the collector cluster 11 has the function of the collector cluster shown in fig. 2; the cloud controller 12 has the functions of the cloud controller described in fig. 2; the storage cluster 13 stores a virtual machine configuration file; each of the plurality of compute nodes 14 has the functionality of the compute node described in fig. 2.
As shown in fig. 4, a detailed framework of the virtual machine migration system may include, in the collector cluster 11: a database 112 (e.g., Mysql database), a memory 113, a caching system 114 (e.g., redis) for caching monitoring data, an analytics server 115, a plurality of collectors 111.
Collector 111 may be a server.
When the cache system 114 caches the monitoring data to a predetermined amount, the predetermined amount of monitoring data is stored in the database 112. The cache system 114 may be a database.
The memory 113 contains a queue. It can be understood that the amount of monitoring data reported to the collector cluster by the computing nodes may be very large, so a queue is set up in the collector cluster, in which the monitoring data can be sorted according to the time at which each computing node reported it.
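As a small sketch, such a queue could order the monitoring data records by report time; the record fields are assumptions for illustration.

```python
import heapq
import itertools

class MonitoringQueue:
    """Sketch: a queue in which monitoring data records are ordered by the
    time at which the compute node reported them."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal report times

    def push(self, record):
        # record is assumed to carry a "report_time" field set by the node.
        heapq.heappush(self._heap,
                       (record["report_time"], next(self._counter), record))

    def pop_earliest(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```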
And the analysis server 115 is used for acquiring the monitoring data of each computing node from the database 112 and determining a failed computing node.
The computing node 14 includes a monitoring agent, a compute daemon and a virtualization management process, where the monitoring agent includes a reporting process and an acquisition process; the monitoring agent collects monitoring data through the acquisition process and reports it to the collector through the reporting process.
The cloud controller 12 includes an election process and a scheduling process, and determines a destination computing node through the election process, and sends an instruction for obtaining a virtual machine configuration file to the destination computing node through the scheduling process.
The virtual machine migration system in the embodiment of the application can enable the cloud platform to have the automatic fault switching capability, and when a computing node fails, a virtual machine in the failed computing node can be migrated, so that the reliability of the virtual machine is guaranteed.
The following describes the virtual machine migration apparatus provided in the embodiment of the present application, and the virtual machine migration apparatus described below and the virtual machine migration method described above may be referred to correspondingly.
The embodiment of the application also provides a virtual machine migration device applied to the collector cluster, a virtual machine migration device applied to the cloud controller and a virtual machine migration device applied to the computing node. Fig. 5 is a schematic diagram showing a connection relationship between modules in the three apparatuses.
The virtual machine migration apparatus 51 applied to the collector cluster includes: a receiving module 511, a determining module 512 and a sending module 513; the virtual machine migration apparatus 52 applied to the cloud controller includes: a receiving module 521, a determining module 522 and a transmitting module 523; the virtual machine migration apparatus 53 applied to the computing node includes: an acquisition module 531, a sending module 532, and a configuration module 533, wherein:
and the acquisition module 531 is used for acquiring monitoring data.
A sending module 532, configured to report the monitoring data to the receiving module 511.
A receiving module 511, configured to receive the monitoring data reported by the sending module 532.
The determining module 512 is configured to perform fault detection on each computing node according to the monitoring data of each computing node, and determine a faulty computing node that has a fault.
A sending module 513, configured to report the information of the faulty computing node to a receiving module 521.
A receiving module 521, configured to receive the information of the faulty computing node reported by the sending module 513.
A determining module 522, configured to determine a destination computing node from the non-failed computing nodes.
A sending module 523, configured to send an instruction to obtain a virtual machine configuration file to the configuration module 533 of the destination computing node.
The configuration module 533 is configured to, when the computing node has not failed, acquire the virtual machine configuration file from the storage cluster and perform configuration if a virtual machine configuration file acquisition instruction sent by the cloud controller is received.
In the virtual machine migration device applied to the compute node, the virtual migration device applied to the collector cluster, and the virtual migration device applied to the cloud controller provided in the embodiment of the present application, each sending module 532 reports the monitoring data collected by the collecting module 531; the determining module 512 performs fault detection on each computing node according to the monitoring data of each computing node, determines a faulty computing node, and the sending module 513 reports the faulty computing node to the receiving module 521; the determining module 522 determines a destination computing node, the sending module 523 sends an instruction of obtaining a virtual machine configuration file to the configuration module 533 of the destination computing node, and the configuration module 533 obtains the virtual machine configuration file from the storage cluster and configures the virtual machine configuration file. Therefore, the virtual machines in the fault computing nodes are respectively migrated to the target computing nodes. Therefore, the virtual machine of the fault computing node normally operates in the target computing node, and key services and key applications of enterprises can continue to operate.
The embodiment of the application provides an optional structure of an acquisition module in a virtual machine migration device applied to a computing node, which is specifically as follows: the acquisition module may include:
and the first acquisition unit is used for acquiring heartbeat monitoring data of a network environment where the computing node is located.
And the second acquisition unit is used for acquiring the monitoring data of the control process of the virtual machine in the computing node.
And the third acquisition unit is used for acquiring load data in the computing node.
The embodiment of the application further provides an optional structure applied to the first acquisition unit, which is as follows: the first acquisition unit includes:
and the first acquisition subunit is used for acquiring heartbeat monitoring data of the management network through the management network port of the management network card.
And the second acquisition subunit is used for acquiring heartbeat monitoring data of the data network through the data network port of the data network card.
And the third acquisition subunit is used for acquiring heartbeat monitoring data of the storage network through the storage network port of the storage network card.
The embodiment of the present application further provides an optional structure of a determination module in a virtual machine migration apparatus applied to a collector cluster, which is specifically as follows: the determining module comprises:
the receiving unit is used for receiving and displaying the selected data type in the monitoring data selection interface;
and the detection unit is used for respectively carrying out fault detection on each computing node according to the data corresponding to the data type in the monitoring data of each computing node.
The embodiment of the present application further provides another optional structure of a determination module in a virtual machine migration apparatus applied to a collector cluster, which is specifically as follows:
the first determining unit is used for determining the computing node as a fault computing node when the heartbeat monitoring data of the management network is not detected within first preset time and the heartbeat monitoring data of the storage network is not detected within second preset time;
or, the second determining unit is configured to determine the computing node as a faulty computing node when the heartbeat monitoring data of the management network is not detected within the first preset time and the heartbeat monitoring data of the data network is not detected within a third preset time;
or, a third determining unit, configured to determine, when the heartbeat monitoring data of the management network is not detected within the first preset time and the virtual machine control process is in a stop operation state within a fourth preset time, the computing node as a faulty computing node;
or, the fourth determining unit is configured to determine the computing node as a faulty computing node when detecting that the operating load of the computing node is greater than or equal to a preset threshold.
The embodiment of the present application further provides a virtual machine migration apparatus applied to a collector cluster, which may further include the following structure, specifically as follows:
the cache module is used for caching the received monitoring data of each computing node;
and the data sending module is used for storing the monitoring data of the preset number into a database in the collector cluster when the cached monitoring data reach the preset number.
The embodiment of the present application further provides a virtual machine migration apparatus applied to a cloud controller, which may further include the following structure, specifically:
a message sending confirmation module used for sending information for confirming whether a fault occurs to the fault calculation node;
and the first triggering module is used for triggering the determining module in the cloud controller when receiving the confirmation information fed back by the fault computing node.
The embodiment of the present application further provides a virtual machine migration apparatus applied to a cloud controller, which may further include the following structure, specifically:
and the second triggering module is used for triggering the confirmation information sending module when the fault synchronization confirmation mechanism is selected in the received fault synchronization confirmation mechanism selection interface.
The embodiment of the present application further provides an optional structure of a determination module applied to a virtual machine migration apparatus of a cloud controller, which is specifically as follows: the determining module comprises:
the monitoring unit is used for monitoring scheduling parameters of each computing node in real time, and the scheduling parameters comprise: the remaining amount of resources or the energy consumption or the time sequence of joining the cloud platform;
and the determining unit is used for determining the computing node with the scheduling parameter meeting the scheduling strategy as a target computing node.
The embodiment of the present application further provides an optional structure of a monitoring unit in a determination module in a virtual machine migration apparatus applied to a cloud controller, which is specifically as follows: the monitoring unit includes:
the first determining subunit is used for determining a selected target scheduling policy in a target computing node scheduling policy selection interface;
and the monitoring subunit is used for monitoring the scheduling parameters corresponding to the target scheduling strategy in each computing node in real time.
The embodiment of the present application further provides an optional structure of a determination unit in a determination module applied to a virtual machine migration apparatus of a cloud controller, which is specifically as follows: the determination unit includes:
and the second determining subunit is used for taking the computing node of which the scheduling parameter meets the scheduling policy in the computing nodes which are not failed currently as the target computing node of the virtual machine.
The embodiment of the present application further provides an optional structure of a determination unit in a determination module applied to a virtual machine migration apparatus of a cloud controller, which is specifically as follows: the determination unit includes:
a third determining subunit, configured to, when a computing node whose scheduling parameter meets the scheduling policy exists in the cluster to which the faulty computing node belongs, take the computing node whose scheduling parameter meets the scheduling policy as a destination computing node of the virtual machine;
and the fourth determining subunit is configured to, when the scheduling parameter of the computing node in the cluster to which the faulty computing node belongs does not satisfy the scheduling policy, use the computing node whose scheduling parameter satisfies the scheduling policy in another cluster as the destination computing node of the virtual machine.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (25)

1. A virtual machine migration system, comprising: a collector cluster, a cloud controller, a storage cluster, and a plurality of computing nodes, wherein:
the collector cluster is used for receiving monitoring data reported by each computing node, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located, which is collected by the computing node, and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network; respectively carrying out fault detection on each computing node according to the monitoring data of each computing node, determining a fault computing node with a fault, and reporting the information of the fault computing node to the cloud controller, wherein the fault computing node with the fault at least comprises a computing node with a fault in the network environment;
when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the computing node is specifically configured to: acquiring heartbeat monitoring data of the management network through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
the storage cluster is used for storing the configuration file of the virtual machine;
the cloud controller is used for receiving the information of the failed computing node, determining a target computing node from computing nodes, among the plurality of computing nodes, in which no fault has occurred, sending a virtual machine configuration file acquisition instruction to the target computing node, and adding the virtual machine information corresponding to the failed computing node to the recorded virtual machine information corresponding to the target computing node;
and the target computing node is used for receiving the virtual machine configuration file acquisition instruction sent by the cloud controller, acquiring the virtual machine configuration file from the storage cluster, and performing configuration according to the virtual machine configuration file.
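By way of illustration only (not forming part of the claims), the per-network heartbeat collection and reporting of claim 1 could be realized along the lines of the Python sketch below; the IP addresses, ports, payload format, and field names are assumptions introduced for the example.

```python
# Illustrative only: addresses, ports, and the payload format are assumptions,
# not part of the patent; a real agent would read them from its configuration.
import json
import socket
import time

NETWORKS = {
    # network name -> (local address of that network card, heartbeat peer)
    "management": ("192.0.2.10",    ("192.0.2.1", 9001)),
    "data":       ("198.51.100.10", ("198.51.100.1", 9001)),
    "storage":    ("203.0.113.10",  ("203.0.113.1", 9001)),
}
COLLECTOR = ("192.0.2.100", 9100)   # assumed collector cluster endpoint

def send_heartbeats(node_id):
    """Send one heartbeat through each network port and record the result."""
    report = {"node": node_id, "ts": time.time(), "heartbeats": {}}
    for name, (local_ip, peer) in NETWORKS.items():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((local_ip, 0))           # leave through this network card
            sock.sendto(b"heartbeat", peer)    # heartbeat on this network
            report["heartbeats"][name] = True
        except OSError:
            report["heartbeats"][name] = False
        finally:
            sock.close()
    return report

def report_to_collector(report):
    """Upload the monitoring record to the collector cluster."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(json.dumps(report).encode("utf-8"), COLLECTOR)
    finally:
        sock.close()

if __name__ == "__main__":
    report_to_collector(send_heartbeats("compute-node-01"))
```

Because each heartbeat leaves through the network port of the corresponding network card, the collector cluster can tell which of the management, data, and storage networks has stopped delivering heartbeats.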
2. The virtual machine migration system according to claim 1, wherein the cloud controller, after receiving the information of the failed computing node, is further configured to:
sending, to the fault computing node, information for confirming whether a fault has occurred;
when receiving confirmation information fed back by the failed computing node, triggering execution of the determining a target computing node from the computing nodes, among the plurality of computing nodes, in which no fault has occurred, and the sending a virtual machine configuration file acquisition instruction to the target computing node;
and the fault computing node is further used for feeding back the confirmation information to the cloud controller when receiving the information, sent by the cloud controller, for confirming whether a fault has occurred.
3. The virtual machine migration system of claim 2, wherein the cloud controller is further configured to:
and when receiving a message indicating that a fault synchronization confirmation mechanism is selected in a fault synchronization confirmation mechanism selection interface, triggering execution of the sending, to the fault computing node, of the information for confirming whether a fault has occurred.
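As a purely illustrative reading of the fault synchronization confirmation in claims 2 and 3, the cloud controller might double-check a suspected node before migrating, roughly as sketched below; the request/reply messages, the address, and the timeout are assumptions.

```python
# Illustrative only: the message contents and the timeout are assumptions;
# the claims only require a confirmation before migration is triggered.
import socket

def confirm_fault(node_addr, timeout=5.0):
    """Ask the suspected fault computing node to confirm the fault; the node
    is assumed to reply b"FAULT" when its own self-check also fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(b"CONFIRM_FAULT?", node_addr)
        reply, _ = sock.recvfrom(64)
        return reply == b"FAULT"
    except socket.timeout:
        return False   # no reply: not treated as a confirmation in this sketch
    finally:
        sock.close()
```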
4. The virtual machine migration system according to claim 1, wherein, when determining the target computing node from among the computing nodes that have not failed, the cloud controller is specifically configured to:
monitoring scheduling parameters of each computing node in real time, wherein the scheduling parameters comprise: the remaining amount of resources, the energy consumption, or the order in which the computing node joined the cloud platform;
and determining a computing node whose scheduling parameter satisfies the scheduling policy as the target computing node.
5. The virtual machine migration system according to claim 4, wherein the cloud controller, when monitoring the scheduling parameter of each computing node in real time, is specifically configured to:
determining a target scheduling policy selected in a target computing node scheduling policy selection interface;
and monitoring, in real time, the scheduling parameter corresponding to the target scheduling policy in each computing node.
6. The virtual machine migration system according to claim 4 or 5, wherein, when determining, as the target computing node, the computing node whose scheduling parameter satisfies the scheduling policy, the cloud controller is specifically configured to:
when a computing node whose scheduling parameter satisfies the scheduling policy exists in the cluster to which the fault computing node belongs, taking that computing node as the target computing node of the virtual machine;
and when no computing node in the cluster to which the fault computing node belongs has a scheduling parameter that satisfies the scheduling policy, taking a computing node in another cluster whose scheduling parameter satisfies the scheduling policy as the target computing node of the virtual machine.
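A minimal sketch of the destination scheduling in claims 4 to 6 (and of the determining module described above) is given below; the node record fields, the VM demand figures, and the exact reading of the three scheduling parameters are assumptions made for the example.

```python
# Illustrative only: the node record fields, VM demand figures, and the exact
# reading of the three policies are assumptions made for this sketch.
from typing import Iterable, Optional

def satisfies(node: dict, policy: str, vm_demand: dict) -> bool:
    """One possible reading of the three scheduling parameters in the claims."""
    if policy == "remaining_resources":
        return (node["free_mem"] >= vm_demand["mem"]
                and node["free_cpu"] >= vm_demand["cpu"])
    if policy == "energy":
        return node["energy"] <= vm_demand.get("max_energy", float("inf"))
    if policy == "join_order":
        return True                      # every healthy node qualifies
    raise ValueError(f"unknown policy {policy!r}")

def pick_destination(nodes: Iterable[dict], failed_cluster: str,
                     policy: str, vm_demand: dict) -> Optional[dict]:
    """Prefer a qualifying node in the fault computing node's own cluster,
    then fall back to qualifying nodes in other clusters (claim 6)."""
    healthy = [n for n in nodes
               if n["healthy"] and satisfies(n, policy, vm_demand)]
    pool = [n for n in healthy if n["cluster"] == failed_cluster] or healthy
    if not pool:
        return None
    if policy == "join_order":
        return min(pool, key=lambda n: n["joined_at"])   # earliest joiner
    if policy == "energy":
        return min(pool, key=lambda n: n["energy"])      # lowest consumption
    return max(pool, key=lambda n: (n["free_mem"], n["free_cpu"]))
```

In this sketch, nodes in the fault computing node's own cluster are considered first; only when none of them satisfies the selected policy does the search fall back to other clusters.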
7. The virtual machine migration system according to claim 1, wherein when reporting the monitoring data to the collector cluster, the computing node is further specifically configured to:
and collecting load data of the computing node.
8. The virtual machine migration system according to claim 7, wherein the collector cluster, when performing fault detection on each computing node according to the monitoring data of each computing node and determining a faulty computing node, is specifically configured to, for each computing node:
determining the computing node as a fault computing node when the heartbeat monitoring data of the management network is not detected within a first preset time and the heartbeat monitoring data of the storage network is not detected within a second preset time;
or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the heartbeat monitoring data of the data network is not detected within a third preset time, determining the computing node as a fault computing node;
or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the virtual machine control process is in a stop running state within a fourth preset time, determining the computing node as a fault computing node;
or when the running load of the computing node is detected to be greater than or equal to a preset threshold value, determining the computing node as a fault computing node.
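For illustration, the four detection rules of claim 8 can be expressed as a small predicate over one node's latest monitoring record; the record fields, the concrete values of the four preset times, and the load threshold are assumptions, since the claims leave them open.

```python
# Illustrative only: the record fields, the four preset times, and the load
# threshold are assumptions; the claims do not fix their concrete values.
import time

FIRST = SECOND = THIRD = FOURTH = 30.0   # assumed preset times, in seconds
LOAD_THRESHOLD = 0.9                     # assumed running-load threshold

def is_faulty(node, now=None):
    """Apply the four detection rules of claim 8 to one node's latest record.

    `node` is a hypothetical record such as:
    {"mgmt_hb": <last mgmt heartbeat ts>, "data_hb": ..., "storage_hb": ...,
     "vm_process_stopped_since": <ts or None>, "load": 0.42}
    """
    now = now if now is not None else time.time()
    mgmt_lost = now - node["mgmt_hb"] > FIRST
    storage_lost = now - node["storage_hb"] > SECOND
    data_lost = now - node["data_hb"] > THIRD
    vm_proc_stopped = (node["vm_process_stopped_since"] is not None
                       and now - node["vm_process_stopped_since"] > FOURTH)
    overloaded = node["load"] >= LOAD_THRESHOLD

    return ((mgmt_lost and storage_lost)
            or (mgmt_lost and data_lost)
            or (mgmt_lost and vm_proc_stopped)
            or overloaded)
```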
9. The virtual machine migration system according to claim 1, wherein the collector cluster is specifically configured to, when performing fault detection on each computing node according to the monitoring data of each computing node:
receiving and displaying the selected data type in the monitoring data selection interface;
and respectively carrying out fault detection on each computing node according to the data corresponding to the data type in the monitoring data of each computing node.
10. A virtual machine migration method, applied to a collector cluster, the method comprising:
receiving monitoring data respectively reported by each computing node, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located, which is acquired by the computing node, and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network;
when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the heartbeat monitoring data of the management network are collected through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
respectively carrying out fault detection on each computing node according to the monitoring data of each computing node, and determining fault computing nodes with faults, wherein the fault computing nodes with faults at least comprise computing nodes with faults in the network environment;
reporting the information of the fault computing node to a cloud controller; the information of the fault computing node is a condition for triggering the cloud controller to determine a target computing node and sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
11. The virtual machine migration method according to claim 10, wherein the performing fault detection on each computing node according to the monitoring data of each computing node respectively comprises:
receiving and displaying the selected data type in the monitoring data selection interface;
and respectively carrying out fault detection on each computing node according to the data corresponding to the data type in the monitoring data of each computing node.
12. The virtual machine migration method according to claim 10 or 11, wherein the monitoring data comprises heartbeat monitoring data of the management network, heartbeat monitoring data of the data network, heartbeat monitoring data of the storage network, virtual machine control process monitoring data in the computing node, and load data of the computing node, and the performing fault detection on each computing node according to the monitoring data of each computing node and determining a fault computing node comprises, for each computing node:
determining the computing node as a fault computing node when the heartbeat monitoring data of the management network is not detected within a first preset time and the heartbeat monitoring data of the storage network is not detected within a second preset time;
or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the heartbeat monitoring data of the data network is not detected within a third preset time, determining the computing node as a fault computing node;
or, when the heartbeat monitoring data of the management network is not detected within the first preset time and the virtual machine control process is in a stop running state within a fourth preset time, determining the computing node as a fault computing node;
or when the running load of the computing node is detected to be greater than or equal to a preset threshold value, determining the computing node as a fault computing node.
13. The virtual machine migration method according to any one of claims 10 to 12, further comprising:
caching the received monitoring data of each computing node;
and when the cached monitoring data reach a preset number, storing the preset number of monitoring data into a database in the collector cluster.
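Claim 13's cache-then-store behaviour could look like the sketch below, in which SQLite stands in for the collector cluster's database; the table schema, field names, and batch size are assumptions made for the example.

```python
# Illustrative only: SQLite stands in for the collector cluster's database;
# the table schema, field names, and batch size are assumptions.
import json
import sqlite3
import threading

class MonitoringBuffer:
    """Cache incoming monitoring records and flush them to the database once
    a preset number of records has accumulated."""

    def __init__(self, db_path="monitoring.db", batch_size=100):
        self.batch_size = batch_size
        self._buf = []
        self._lock = threading.Lock()
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute("CREATE TABLE IF NOT EXISTS monitoring "
                         "(node TEXT, ts REAL, payload TEXT)")

    def add(self, record):
        """Cache one record; write the whole batch when the threshold is hit."""
        row = (record["node"], record["ts"], json.dumps(record))
        with self._lock:
            self._buf.append(row)
            if len(self._buf) >= self.batch_size:
                self._flush()

    def _flush(self):
        self._db.executemany(
            "INSERT INTO monitoring (node, ts, payload) VALUES (?, ?, ?)",
            self._buf)
        self._db.commit()
        self._buf.clear()
```

Batching the writes keeps the collector's database from being hit once per heartbeat while still persisting every record.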
14. A virtual machine migration method, applied to a cloud controller, the method comprising:
receiving information of a fault computing node reported by a collector cluster, wherein the information of the fault computing node is determined by the collector cluster according to monitoring data reported by the fault computing node, and the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located and collected by the computing node and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network; the failed computing nodes at least comprise the computing nodes with failures in the network environment;
when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the computing node is specifically configured to: acquiring heartbeat monitoring data of the management network through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
determining a target computing node from the computing nodes which do not have faults;
sending a virtual machine configuration file acquisition instruction to the target computing node, wherein the virtual machine configuration file acquisition instruction is a basis for the target computing node to acquire a virtual machine configuration file from a storage cluster;
and adding the virtual machine information corresponding to the fault computing node into the recorded virtual machine information corresponding to the target computing node.
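The controller-side flow of claim 14 is sketched below in simplified form; the in-memory vm_table, the choose_destination callback, and send_fetch_instruction are hypothetical stand-ins for the cloud controller's records, its scheduler, and the message carrying the virtual machine configuration file acquisition instruction.

```python
# Illustrative only: vm_table, choose_destination and send_fetch_instruction
# are hypothetical stand-ins, not the patent's actual interfaces.
from typing import Callable, Optional

def send_fetch_instruction(dest_id: str, vm_id: str) -> None:
    """Stand-in for the controller-to-node instruction."""
    print(f"node {dest_id}: fetch the configuration file of {vm_id} "
          f"from the storage cluster")

def handle_failed_node(failed_node_id: str,
                       vm_table: dict,
                       choose_destination: Callable[[], Optional[str]]) -> None:
    """For each VM on the fault computing node: pick a destination, instruct
    it to fetch the VM's configuration file, then move the VM's record."""
    for vm_id in list(vm_table.get(failed_node_id, [])):
        dest_id = choose_destination()
        if dest_id is None:
            continue   # no node currently satisfies the scheduling policy
        send_fetch_instruction(dest_id, vm_id)
        vm_table.setdefault(dest_id, []).append(vm_id)
        vm_table[failed_node_id].remove(vm_id)

# Example: both VMs recorded on cn-3 are re-homed to cn-7.
vms = {"cn-3": ["vm-101", "vm-102"], "cn-7": []}
handle_failed_node("cn-3", vms, choose_destination=lambda: "cn-7")
```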
15. The virtual machine migration method according to claim 14, wherein after receiving the information of the failed computing node reported by the collector cluster, the method further comprises:
sending, to the fault computing node, information for confirming whether a fault has occurred;
and when receiving confirmation information fed back by the fault computing node, triggering execution of the determining a target computing node from the computing nodes which do not have faults, and the sending a virtual machine configuration file acquisition instruction to the target computing node.
16. The virtual machine migration method according to claim 15, further comprising:
and when receiving a message indicating that a fault synchronization confirmation mechanism is selected in a fault synchronization confirmation mechanism selection interface, triggering execution of the sending, to the fault computing node, of the information for confirming whether a fault has occurred.
17. The virtual machine migration method according to claim 14, wherein the determining a target computing node from the computing nodes which do not have faults comprises:
monitoring scheduling parameters of each computing node in real time, wherein the scheduling parameters comprise: the remaining amount of resources, the energy consumption, or the order in which the computing node joined the cloud platform;
and determining the computing node with the scheduling parameter meeting the scheduling policy as a target computing node.
18. The virtual machine migration method according to claim 17, wherein the monitoring the scheduling parameter of each computing node in real time comprises:
determining a target scheduling policy selected in a target computing node scheduling policy selection interface;
and monitoring, in real time, the scheduling parameter corresponding to the target scheduling policy in each computing node.
19. The virtual machine migration method according to claim 17 or 18, wherein, for each virtual machine in the failed computing node, the determining, as the target computing node, a computing node whose scheduling parameter satisfies the scheduling policy comprises:
and taking, from among the computing nodes that have not currently failed, a computing node whose scheduling parameter satisfies the scheduling policy as the target computing node of the virtual machine.
20. The virtual machine migration method according to claim 17 or 18, wherein, for each virtual machine in the failed computing node, the determining, as the target computing node, a computing node whose scheduling parameter satisfies the scheduling policy comprises:
when a computing node whose scheduling parameter satisfies the scheduling policy exists in the cluster to which the fault computing node belongs, taking that computing node as the target computing node of the virtual machine;
and when no computing node in the cluster to which the fault computing node belongs has a scheduling parameter that satisfies the scheduling policy, taking a computing node in another cluster whose scheduling parameter satisfies the scheduling policy as the target computing node of the virtual machine.
21. A virtual machine migration method, applied to a computing node, the method comprising:
collecting monitoring data, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network;
when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the heartbeat monitoring data of the management network are collected through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
reporting the monitoring data to a collector cluster, so that the collector cluster carries out fault detection on the computing node according to the monitoring data and reports the monitoring data to a cloud controller when the computing node has a fault; the fault of the computing node at least comprises a fault in the network environment where the computing node is located;
and when the computing node has not failed, if a virtual machine configuration file acquisition instruction sent by the cloud controller is received, acquiring the virtual machine configuration file from a storage cluster and performing configuration according to the virtual machine configuration file.
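On the destination side (claim 21, and the target computing node of claim 1), acquiring the configuration file from the storage cluster and configuring the virtual machine might look like the sketch below, assuming the storage cluster is mounted as a shared filesystem and the virtual machines are libvirt/KVM domains; neither assumption appears in the claims.

```python
# Illustrative only: assumes the storage cluster is visible as a shared
# filesystem mount and that the virtual machines are libvirt/KVM domains;
# neither assumption is stated in the claims.
from pathlib import Path
import libvirt   # libvirt-python

SHARED_CONFIG_DIR = Path("/mnt/storage-cluster/vm-configs")   # assumed mount

def configure_vm_from_storage(vm_id: str) -> None:
    """Fetch the VM's configuration file from the storage cluster, then
    define and start the domain on this (target) computing node."""
    xml = (SHARED_CONFIG_DIR / f"{vm_id}.xml").read_text()
    conn = libvirt.open("qemu:///system")
    try:
        dom = conn.defineXML(xml)   # register the configuration locally
        dom.create()                # start the migrated virtual machine
    finally:
        conn.close()
```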
22. The virtual machine migration method according to claim 21, wherein the collecting monitoring data further comprises:
and collecting load data of the computing node.
23. A virtual machine migration apparatus, applied to a collector cluster, comprising:
the receiving module is used for receiving monitoring data reported by each computing node, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located, which is acquired by the computing node, and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network; when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the computing node is specifically configured to: acquiring heartbeat monitoring data of the management network through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
the determining module is used for respectively carrying out fault detection on each computing node according to the monitoring data of each computing node and determining a fault computing node which has a fault, wherein the fault computing node which has the fault at least comprises a computing node which has the fault in the network environment;
the sending module is used for reporting the information of the fault computing node to the cloud controller; the information of the fault computing node is a condition for triggering the cloud controller to determine a target computing node and sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
24. A virtual machine migration apparatus, applied to a cloud controller, comprising:
the receiving module is used for receiving information of a fault computing node reported by a collector cluster, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located and acquired by the computing node, and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network; the failed computing nodes at least comprise the computing nodes with failures in the network environment; when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the computing node is specifically configured to: acquiring heartbeat monitoring data of the management network through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
the determining module is used for determining a target computing node from the computing nodes which do not have faults;
and the sending module is used for sending a virtual machine configuration file obtaining instruction to the target computing node, wherein the virtual machine configuration file obtaining instruction is a basis for the target computing node to obtain a virtual machine configuration file from a storage cluster.
25. A virtual machine migration apparatus applied to a compute node, the virtual machine migration apparatus comprising:
the acquisition module is used for acquiring monitoring data, wherein the monitoring data at least comprises heartbeat monitoring data of a network environment where the computing node is located and virtual machine control process monitoring data in the computing node; the network environment comprises a management network, a data network and a storage network;
when the computing node collects heartbeat monitoring data of a network environment where the computing node is located, the computing node is specifically configured to: acquiring heartbeat monitoring data of the management network through a management network port of a management network card of the computing node; collecting heartbeat monitoring data of the data network through a data network port of a data network card of the computing node; acquiring heartbeat monitoring data of the storage network through a storage network port of a storage network card of the computing node;
the sending module is used for reporting the monitoring data to the collector cluster so that the collector cluster can carry out fault detection on the computing node according to the monitoring data, and when the computing node has a fault, the monitoring data is reported to the cloud controller; the failure of the computing node at least comprises the failure of the network environment where the computing node is located;
and the configuration module is used for, when the computing node has not failed, acquiring the virtual machine configuration file from the storage cluster and performing configuration according to the virtual machine configuration file if a virtual machine configuration file acquisition instruction sent by the cloud controller is received.
CN201610481831.4A 2016-06-27 2016-06-27 Virtual machine migration system, method and device Active CN107544839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610481831.4A CN107544839B (en) 2016-06-27 2016-06-27 Virtual machine migration system, method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610481831.4A CN107544839B (en) 2016-06-27 2016-06-27 Virtual machine migration system, method and device

Publications (2)

Publication Number Publication Date
CN107544839A CN107544839A (en) 2018-01-05
CN107544839B true CN107544839B (en) 2021-05-25

Family

ID=60962526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610481831.4A Active CN107544839B (en) 2016-06-27 2016-06-27 Virtual machine migration system, method and device

Country Status (1)

Country Link
CN (1) CN107544839B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132829A (en) * 2018-01-11 2018-06-08 郑州云海信息技术有限公司 A kind of high available virtual machine realization method and system based on OpenStack
CN108334402A (en) * 2018-03-07 2018-07-27 山东超越数控电子股份有限公司 A kind of the virtual management system and its resource regulating method of non-stop layer framework
CN108762993A (en) * 2018-06-06 2018-11-06 山东超越数控电子股份有限公司 A kind of virtual-machine fail moving method and device based on artificial intelligence
CN110837451B (en) * 2018-08-16 2023-08-15 中国移动通信集团重庆有限公司 Processing method, device, equipment and medium for high availability of virtual machine
CN109151045B (en) * 2018-09-07 2020-05-19 北京邮电大学 Distributed cloud system and monitoring method
CN109818785B (en) * 2019-01-15 2020-04-03 无锡华云数据技术服务有限公司 Data processing method, server cluster and storage medium
CN110445662B (en) * 2019-08-29 2022-07-12 上海仪电(集团)有限公司中央研究院 Method and device for adaptively switching OpenStack control node into computing node
CN110659109B (en) * 2019-09-26 2023-07-04 上海仪电(集团)有限公司中央研究院 System and method for monitoring openstack virtual machine
CN112073518B (en) * 2020-09-09 2023-06-02 杭州海康威视系统技术有限公司 Cloud storage system, cloud storage system management method and central management node
CN114760313B (en) * 2020-12-29 2023-11-24 中国联合网络通信集团有限公司 Service scheduling method and service scheduling device
CN112994977A (en) * 2021-02-24 2021-06-18 紫光云技术有限公司 Method for high availability of server host
CN113407301A (en) * 2021-05-22 2021-09-17 济南浪潮数据技术有限公司 Virtual machine monitoring method, system, storage medium and equipment
CN114090184B (en) * 2021-11-26 2022-11-29 中电信数智科技有限公司 Method and equipment for realizing high availability of virtualization cluster
CN114064217B (en) * 2021-11-29 2024-04-19 建信金融科技有限责任公司 OpenStack-based node virtual machine migration method and device
CN114217905A (en) * 2021-12-17 2022-03-22 北京志凌海纳科技有限公司 High-availability recovery processing method and system for virtual machine
CN115766405B (en) * 2023-01-09 2023-04-28 苏州浪潮智能科技有限公司 Fault processing method, device, equipment and storage medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102170474A (en) * 2011-04-22 2011-08-31 广州杰赛科技股份有限公司 Method and system for dynamic scheduling of virtual resources in cloud computing network
US9176766B2 (en) * 2011-07-06 2015-11-03 Microsoft Technology Licensing, Llc Configurable planned virtual machines
CN103064733A (en) * 2011-10-20 2013-04-24 电子科技大学 Cloud computing virtual machine live migration technology
CN102708000B (en) * 2012-04-19 2014-10-29 北京华胜天成科技股份有限公司 System and method for realizing energy consumption control through virtual machine migration
CN102819465B (en) * 2012-06-29 2014-09-24 华中科技大学 Failure recovery method in virtualization environment
CN103677993A (en) * 2012-08-31 2014-03-26 鸿富锦精密工业(深圳)有限公司 Virtual machine resource load balancing system and method
CN104572274A (en) * 2013-10-18 2015-04-29 宇宙互联有限公司 Cross-cloud-node migration system and cross-cloud-node migration method
CN103729280A (en) * 2013-12-23 2014-04-16 国云科技股份有限公司 High availability mechanism for virtual machine
US9558082B2 (en) * 2014-03-28 2017-01-31 Vmware, Inc. VM availability during migration and VM network failures in host computing systems
CN105095001B (en) * 2014-05-08 2018-01-30 中国银联股份有限公司 Virtual machine abnormal restoring method under distributed environment
CN104113596A (en) * 2014-07-15 2014-10-22 华侨大学 Cloud monitoring system and method for private cloud
CN104253860B (en) * 2014-09-11 2017-08-08 武汉噢易云计算股份有限公司 A kind of virtual machine high availability implementation method based on shared storage message queue
CN104301389A (en) * 2014-09-19 2015-01-21 华侨大学 Energy efficiency monitoring and managing method and system of cloud computing system
US9489274B2 (en) * 2014-12-17 2016-11-08 American Megatrends, Inc. System and method for performing efficient failover and virtual machine (VM) migration in virtual desktop infrastructure (VDI)
US9286104B1 (en) * 2015-01-05 2016-03-15 International Business Machines Corporation Selecting virtual machines to be relocated based on memory volatility
CN104660690B (en) * 2015-02-06 2018-06-08 中国农业大学 cloud video service monitoring system

Also Published As

Publication number Publication date
CN107544839A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
CN107544839B (en) Virtual machine migration system, method and device
CN108039964B (en) Fault processing method, device and system based on network function virtualization
JP6835444B2 (en) Software-defined data center and service cluster scheduling method and traffic monitoring method for that purpose
CN105187249B (en) A kind of fault recovery method and device
US9183033B2 (en) Method and system for analyzing root causes of relating performance issues among virtual machines to physical machines
CN106664216B (en) VNF switching method and device
US10771323B2 (en) Alarm information processing method, related device, and system
CN106856489A (en) A kind of service node switching method and apparatus of distributed memory system
US20220052923A1 (en) Data processing method and device, storage medium and electronic device
CN108347339B (en) Service recovery method and device
CN108633311A (en) A kind of method, apparatus and control node of the con current control based on call chain
CN110177020A (en) A kind of High-Performance Computing Cluster management method based on Slurm
CN108737574A (en) A kind of node off-line judgment method, device, equipment and readable storage medium storing program for executing
CN109286529A (en) A kind of method and system for restoring RabbitMQ network partition
CN113872997B (en) Container group POD reconstruction method based on container cluster service and related equipment
CN110580198B (en) Method and device for adaptively switching OpenStack computing node into control node
CN107426012B (en) Fault recovery method and device based on super-fusion architecture
CN108243205B (en) Method, equipment and system for controlling resource allocation of cloud platform
CN114422386B (en) Monitoring method and device for micro-service gateway
Lee et al. Fault localization in NFV framework
CN115712521A (en) Cluster node fault processing method, system and medium
CN115549751A (en) Remote sensing satellite ground station monitoring system and method
Lee et al. A fault management system for nfv
Perez-Espinoza et al. A distributed architecture for monitoring private clouds
CN105591780B (en) Cluster monitoring method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20230918
Address after: 518000 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 Floors
Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.
Patentee after: TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd.
Address before: 2, 518000, East 403 room, SEG science and Technology Park, Zhenxing Road, Shenzhen, Guangdong, Futian District
Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.