WO2016180005A1 - Method and computer system for processing a virtual machine cluster - Google Patents

Method and computer system for processing a virtual machine cluster

Publication number
WO2016180005A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
virtual
information
machines
virtual machines
Prior art date
Application number
PCT/CN2015/095654
Other languages
English (en)
French (fr)
Inventor
伍湘平
李龙
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP15891701.3A (patent EP3291487B1)
Publication of WO2016180005A1
Priority to US15/812,747 (patent US10728099B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/12 Arrangements for remote connection or disconnection of substations or of equipment thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F11/1484 Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/203 Failover techniques using migration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/815 Virtual

Definitions

  • Embodiments of the present invention relate to the field of information technology, and, more particularly, to a method and computer system for processing a virtual machine cluster.
  • A cluster is usually a parallel or distributed system of interconnected nodes (such as computers or virtual machines). These nodes work together and run a common set of applications while presenting a single system image to users and applications.
  • A server cluster connects multiple servers through communication links. From the outside, these servers work like a single server; internally, external load is dynamically distributed among the servers through certain mechanisms, achieving the high performance and high availability otherwise available only in super servers.
  • A virtual machine (VM) is software running on a host that creates an environment between the computer platform and the end user; the end user operates within the environment created by the software.
  • a virtual machine cluster is a parallel or distributed system in which multiple virtual machines are connected to each other.
  • Virtual machine clusters typically use a master/slave architecture.
  • The Master host is responsible for monitoring all Slave hosts and for restarting the virtual machines of a Slave host when that Slave host goes down.
  • The Slave nodes also receive the heartbeat messages sent by the Master node to confirm whether the Master node is alive. If the Master host goes down, the Slaves in the cluster elect a new Master.
  • the master host is responsible for monitoring and managing all the slave hosts in the cluster.
  • The performance of the Master host will not be sufficient to support maintaining a large number of Slave hosts, which makes the Master the bottleneck of the entire cluster and reduces the overall performance of the virtual machine cluster.
  • If the Master host fails, the Slave hosts elect a new Master. This election takes a certain amount of time, which delays the cluster's failure recovery and reduces the fault tolerance of the virtual machine cluster.
  • Some Slave hosts may lose contact with the Master host. These Slave hosts then elect their own Master host, which results in two separate partitions within one cluster.
  • Using a primary node as the management center of a virtual machine cluster therefore impairs the fault tolerance and performance of the virtual machine cluster.
  • the method and the computer system for processing a virtual machine cluster provided by the embodiments of the present invention can improve the fault tolerance and performance of the virtual machine cluster.
  • In a first aspect, a method for processing a virtual machine cluster is provided. The virtual machine cluster includes N virtual machines, and each of the N virtual machines saves a virtual machine list and works as a first virtual machine in a peer-to-peer manner.
  • The virtual machine list includes information of the N virtual machines.
  • The method of the first aspect includes: the first virtual machine sends a first heartbeat message to at least two neighbor virtual machines among the N virtual machines, so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine establishes a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and the first virtual machine, as a neighbor virtual machine of a second virtual machine among the at least two neighbor virtual machines, detects a second heartbeat message sent by the second virtual machine, where the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine are used to determine the state of the second virtual machine.
  • The method further includes: the first virtual machine sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists; the first virtual machine receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine; and the first virtual machine updates its saved virtual machine list according to the second synchronization information.
  • the at least two neighboring virtual machines including the N virtual machines, are directly connected to the first virtual machine Two to six virtual machines that are capable of interacting with information.
  • The method further includes: when the first virtual machine determines that the state of the second virtual machine is faulty, the first virtual machine triggers the second virtual machine to restart, or triggers the second virtual machine to migrate from its source host to a target host, where the source host is a faulty host and the target host is a normal host.
  • The first feedback information further includes configuration information of the second virtual machine, and the method further includes: when the first virtual machine determines that the state of the second virtual machine is faulty and that the second virtual machine cannot be restarted, the first virtual machine triggers the second virtual machine to migrate from the source host where it is located to a target host, where the source host is a faulty host and the target host is a normal host.
  • The method further includes: when the first virtual machine determines that the state of the second virtual machine is leaving, the first virtual machine triggers deletion of the second virtual machine from the source host where it is located.
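  • The handling rules above can be summarized as a small decision function. This is only a minimal sketch: the state names, the `can_restart` flag, and the action names are hypothetical labels chosen for illustration, and preferring a restart over a migration whenever a restart is possible is an assumption rather than something the text prescribes.

```python
from enum import Enum


class VmState(Enum):
    NORMAL = "normal"
    FAULTY = "faulty"
    LEAVING = "leaving"


class Action(Enum):
    NONE = "none"
    RESTART = "restart"
    MIGRATE = "migrate"  # move from the faulty source host to a normal target host
    DELETE = "delete"    # remove the VM from its source host


def choose_action(state: VmState, can_restart: bool = True) -> Action:
    """Pick the action the first VM triggers for a neighbor (the second VM)."""
    if state is VmState.FAULTY:
        # Assumed preference: restart when possible, otherwise migrate.
        return Action.RESTART if can_restart else Action.MIGRATE
    if state is VmState.LEAVING:
        return Action.DELETE
    return Action.NONE
```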
  • The method of the first aspect further includes: the first virtual machine sends the detection result of the second heartbeat message to an upper node of the N virtual machines, so that the upper node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine; and the first virtual machine receives an indication message sent by the upper node, where the indication message indicates the state of the second virtual machine.
  • The method of the first aspect further includes: the first virtual machine receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine; and the first virtual machine determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine.
  • The method of the first aspect further includes: the first virtual machine sends information of the first virtual machine to the other virtual machines of the N virtual machines; and the first virtual machine receives the respective information sent by the other virtual machines of the N virtual machines to generate the virtual machine list.
  • When the first virtual machine joins the virtual machine cluster, the first virtual machine sends registration information to an index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides registration services for the virtual machines in the virtual machine cluster; the first virtual machine receives information of the other virtual machines of the N virtual machines sent by the index server to generate the virtual machine list.
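  • The registration flow can be sketched with an in-memory stand-in for the index server. The class name `IndexServer`, the method `register`, the helper `join_cluster`, and the dictionary layout are all hypothetical; the text does not specify an interface or a wire format.

```python
class IndexServer:
    """In-memory stand-in for the cluster's registration center (hypothetical API)."""

    def __init__(self):
        self._registry = {}  # vm_id -> information of that virtual machine

    def register(self, vm_id, info):
        """Store the new VM's information and return information of the other VMs."""
        others = {k: dict(v) for k, v in self._registry.items()}
        self._registry[vm_id] = dict(info)
        return others


def join_cluster(server, vm_id, info):
    """Register with the index server and build the local virtual machine list."""
    vm_list = server.register(vm_id, info)
    vm_list[vm_id] = dict(info)  # the list also holds the VM's own entry
    return vm_list
```

  • In this sketch a virtual machine that joined earlier learns about later joiners only through subsequent synchronization between virtual machines, not from the index server.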
  • The method further includes: the first virtual machine uses a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
  • The information of the N virtual machines includes a combination of any one or more of: state information of each of the N virtual machines, neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, configuration information of each of the N virtual machines, and configuration information of the virtual machine cluster.
  • In a second aspect, a computer system is provided. The computer system includes a physical hardware layer of at least one computer node, and a virtual machine cluster runs on the physical hardware layer of the at least one computer node. The virtual machine cluster includes N virtual machines, and each of the N virtual machines saves a virtual machine list and works as a first virtual machine in a peer-to-peer manner.
  • The virtual machine list includes information of the N virtual machines, and the first virtual machine includes a sending module.
  • The sending module further sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists. The receiving module further receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine.
  • the first virtual machine further includes an update module, configured to update the virtual machine list saved by the first virtual machine according to the second synchronization information.
  • The at least two neighbor virtual machines include two to six virtual machines, among the N virtual machines, that are capable of directly exchanging information with the first virtual machine.
  • The computer system of the second aspect further includes a triggering module, configured to: when the state of the second virtual machine is determined to be faulty, trigger the second virtual machine to restart, or trigger the second virtual machine to migrate from its source host to a target host, where the source host is a faulty host and the target host is a normal host.
  • The computer system of the second aspect further includes a triggering module, configured to: when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, trigger the second virtual machine to migrate from the source host where it is located to a target host, where the source host is a faulty host and the target host is a normal host.
  • The computer system of the second aspect further includes a triggering module, configured to: when the state of the second virtual machine is determined to be leaving, trigger deletion of the second virtual machine from the source host where it is located.
  • The sending module further sends the detection result of the second heartbeat message to an upper node of the N virtual machines, so that the upper node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine. The receiving module further receives an indication message sent by the upper node, where the indication message indicates the state of the second virtual machine.
  • The receiving module further receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine.
  • The receiving module further determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine.
  • When the first virtual machine joins the virtual machine cluster, the sending module further sends information of the first virtual machine to the other virtual machines of the N virtual machines, and the receiving module further receives the respective information sent by the other virtual machines of the N virtual machines to generate the virtual machine list.
  • When the first virtual machine joins the virtual machine cluster, the sending module sends registration information to an index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides registration services for the virtual machines in the virtual machine cluster. The receiving module further receives information of the other virtual machines of the N virtual machines sent by the index server to generate the virtual machine list.
  • The computer system of the second aspect further includes a selecting module, configured to use a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
  • The information of the N virtual machines includes a combination of any one or more of: state information of each of the N virtual machines, neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, configuration information of each of the N virtual machines, and configuration information of the virtual machine cluster.
  • Each virtual machine in the virtual machine cluster may determine at least two neighbor virtual machines according to its saved virtual machine list, and may send heartbeat messages to the at least two neighbor virtual machines, so that the state of the virtual machine is determined based on the detection results of the at least two neighbor virtual machines. Because the state of each virtual machine can be determined from the results of its neighbor virtual machines detecting the heartbeat messages it sends, the problem of the Master in a Master/Slave architecture becoming the bottleneck of the entire cluster is avoided. And because no Master host ever needs to be re-elected, a virtual machine cluster that uses this scheme for state determination does not suffer the resulting delay in recovery time or contention for resources. The fault tolerance and performance of the virtual machine cluster are therefore improved.
  • FIG. 1 is a schematic diagram of an architecture of a virtual machine cluster according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method for processing a virtual machine cluster according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a neighbor relationship of a virtual machine cluster according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of a process of establishing a virtual machine cluster in accordance with an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a heartbeat detection mechanism in accordance with an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of inter-virtual machine synchronization in accordance with an embodiment of the present invention.
  • FIG. 7 is a block diagram showing the structure of a computer system in accordance with an embodiment of the present invention.
  • FIG. 8 is a block diagram of a computer system 800 in accordance with an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an architecture of a virtual machine cluster 100 according to an embodiment of the present invention.
  • The virtual machine cluster 100 has a distributed architecture; seen from the user side, the multiple virtual machines in the cluster together provide services to user equipment as if they were a single virtual machine.
  • In contrast to the Master/Slave virtual machine cluster architecture of the conventional technology, the relationship between the virtual machines in the virtual machine cluster 100 is peer-to-peer.
  • the computer system in which the virtual machine cluster 100 is located may include, for example, physical hosts ESXi-1, ESXi-2, and ESXi-3.
  • Virtual machines VM101 and VM102 run on host ESXi-1.
  • Virtual machines VM103 and VM104 run on host ESXi-2.
  • Virtual machines VM105 and VM106 run on host ESXi-3.
  • Each virtual machine in the virtual machine cluster 100 can maintain a virtual machine list for storing information of virtual machines in the virtual machine cluster 100, including status information, configuration information, and/or management information.
  • Information synchronization between virtual machines can be implemented using the Peer to Peer (P2P) protocol.
  • the status information is used to indicate the working status of the virtual machine, for example, the usage of the CPU or the usage of the memory.
  • the configuration information is used to indicate information about configuring a virtual machine, for example, an IP address assigned to a virtual machine.
  • the management information is used to indicate information of the management virtual machine, for example, the heartbeat value of the virtual machine, the number of startups of the virtual machine, the failure of the virtual machine, the restart or migration, and the like.
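  • One way to picture an entry of the virtual machine list is as a record that groups the three kinds of information described above. The sketch below is illustrative only; the class name `VmEntry` and all field names are hypothetical, and the embodiments do not fix a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass
class VmEntry:
    vm_id: str
    # Status information: working state such as CPU or memory usage.
    cpu_usage: float = 0.0
    memory_usage: float = 0.0
    # Configuration information: e.g. the IP address assigned to the VM.
    ip_address: str = ""
    # Management information: heartbeat value, number of startups, and so on.
    heartbeat_value: int = 0
    start_count: int = 0
    # Neighbor relationship information, which may also be kept in the list.
    neighbors: list = field(default_factory=list)


def make_vm_list(entries):
    """Index entries by VM id; every VM keeps its own copy of this list."""
    return {e.vm_id: e for e in entries}
```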
  • Virtual machine VM101 establishes a neighbor relationship with virtual machines VM102, VM103, VM105, and VM106.
  • Virtual machine VM102 establishes a neighbor relationship with virtual machines VM101, VM104, and VM105.
  • Virtual machine VM103 establishes a neighbor relationship with virtual machines VM101, VM106, and VM104.
  • Virtual machine VM104 establishes a neighbor relationship with virtual machines VM101, VM102, and VM106.
  • Virtual machine VM105 establishes a neighbor relationship with virtual machines VM101, VM102, and VM103.
  • Virtual machine VM106 establishes a neighbor relationship with virtual machines VM101, VM104, and VM105.
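  • The neighbor relationships listed above for FIG. 1 can be written down as a simple map. The sketch copies the lists exactly as stated; the helper names `heartbeat_targets` and `listed_by` are hypothetical and exist only to query the map.

```python
# Neighbor relationships exactly as listed in the text for FIG. 1.
NEIGHBORS = {
    "VM101": ["VM102", "VM103", "VM105", "VM106"],
    "VM102": ["VM101", "VM104", "VM105"],
    "VM103": ["VM101", "VM106", "VM104"],
    "VM104": ["VM101", "VM102", "VM106"],
    "VM105": ["VM101", "VM102", "VM103"],
    "VM106": ["VM101", "VM104", "VM105"],
}


def heartbeat_targets(vm_id):
    """VMs that vm_id sends its heartbeat to (its own neighbors)."""
    return list(NEIGHBORS[vm_id])


def listed_by(vm_id):
    """VMs whose own neighbor entry names vm_id."""
    return sorted(v for v, ns in NEIGHBORS.items() if vm_id in ns)
```

  • Every virtual machine in this example has three or four neighbors, which falls within the range of two to six neighbor virtual machines.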
  • The neighbor relationship of a virtual machine can also be recorded in the virtual machine list. For example, the neighbor virtual machines of each virtual machine can also be listed in the virtual machine list. Each virtual machine can determine its neighbor virtual machines using a specific neighbor relationship algorithm.
  • the above virtual machine cluster may be located on one physical host or on multiple physical hosts.
  • the neighbor relationship between virtual machines is independent of the physical host they are in. From the perspective of distributed clusters, only virtual machines are considered, regardless of which physical host they are actually located on.
  • a virtual machine can use a virtual machine located on the same physical host as a neighbor, or a virtual machine located on another physical host as a neighbor.
  • the foregoing neighbor relationship may refer to a physical neighbor relationship or a logical neighbor relationship.
  • FIG. 2 is a schematic flowchart of a method for processing a virtual machine cluster according to an embodiment of the present invention.
  • the method shown in FIG. 2 is performed by each virtual machine in the virtual machine cluster 100 of FIG. 1.
  • The virtual machine cluster includes N virtual machines; each of the N virtual machines saves a virtual machine list and works as a first virtual machine in a peer-to-peer manner, and the virtual machine list includes information of the N virtual machines.
  • the method of Figure 2 includes the following.
  • 210: The first virtual machine sends a first heartbeat message to at least two neighbor virtual machines among the N virtual machines, so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine establishes a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines.
  • 220: The first virtual machine, as a neighbor virtual machine of a second virtual machine among the at least two neighbor virtual machines, detects a second heartbeat message sent by the second virtual machine, where the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting heartbeat messages sent by the second virtual machine are used to determine the state of the second virtual machine.
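  • The sending and detecting sides of the heartbeat exchange might look like the following minimal sketch. The timeout-based notion of a "detection result", the class name `HeartbeatMonitor`, and the function `send_heartbeat` are assumptions made for illustration; the text does not state how a missed heartbeat is recognized.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each monitored neighbor."""

    def __init__(self, timeout):
        self.timeout = timeout  # assumed: seconds of silence before a miss is reported
        self.last_seen = {}     # sender id -> time of the last heartbeat

    def on_heartbeat(self, sender, now):
        """Record a heartbeat received from sender at time now."""
        self.last_seen[sender] = now

    def detect(self, sender, now):
        """Detection result: True if a heartbeat arrived within the timeout."""
        seen = self.last_seen.get(sender)
        return seen is not None and (now - seen) <= self.timeout


def send_heartbeat(sender, neighbor_monitors, now):
    """Sending side: deliver one heartbeat to every neighbor's monitor."""
    for monitor in neighbor_monitors:
        monitor.on_heartbeat(sender, now)
```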
  • The first virtual machine can be any virtual machine in the virtual machine cluster. It sends heartbeat messages to its neighbor virtual machines so that they can monitor whether the first virtual machine is alive.
  • As a neighbor virtual machine of other virtual machines, the first virtual machine can also receive the heartbeat messages they send in order to monitor whether those virtual machines are alive.
  • Each virtual machine in the virtual machine cluster can perform the foregoing 210 and 220 in a peer-to-peer manner. That is, each virtual machine can send heartbeat messages to its neighbor virtual machines so that they monitor its state, for example whether it is alive. As a neighbor virtual machine of other virtual machines, each virtual machine can also receive the heartbeat messages sent by those virtual machines to monitor their states, so that the state of each virtual machine can be determined comprehensively from the heartbeat detection results of its multiple neighbor virtual machines.
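  • The text leaves open how the detection results of several neighbors are combined. As one possible illustration, the sketch below applies a simple majority rule; the rule itself, the function name `determine_state`, and the returned state strings are assumptions, not the method of the embodiments.

```python
def determine_state(detection_results):
    """Combine per-neighbor detection results (True = heartbeat detected).

    Assumed rule: the VM is considered faulty only when more than half of
    the reporting neighbors missed its heartbeat.
    """
    if not detection_results:
        return "unknown"
    missed = sum(1 for ok in detection_results.values() if not ok)
    return "faulty" if missed * 2 > len(detection_results) else "normal"
```

  • A majority rule of this kind keeps a single neighbor with a broken link from declaring a healthy virtual machine faulty, which is one motivation for requiring at least two neighbors.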
  • each virtual machine in the virtual machine cluster can monitor its neighbor virtual machines and be monitored by its neighbor virtual machines. Therefore, the virtual machine cluster does not need a primary node for monitoring heartbeat messages for all virtual machines.
  • Each virtual machine in the virtual machine cluster may determine at least two neighbor virtual machines according to its saved virtual machine list, and may send heartbeat messages to the at least two neighbor virtual machines, so that the state of the virtual machine is determined based on the detection results of the at least two neighbor virtual machines. Because the state of each virtual machine can be determined from the results of its neighbor virtual machines detecting the heartbeat messages it sends, the problem of the Master in a Master/Slave architecture becoming the bottleneck of the entire cluster is avoided. And because no Master host ever needs to be re-elected, a virtual machine cluster that uses this scheme for state determination does not suffer the resulting delay in recovery time or contention for resources. The fault tolerance and performance of the virtual machine cluster are therefore improved.
  • The virtual machine cluster of the embodiment of the present invention does not have the problems of a Master/Slave virtual machine cluster. Because no primary node is required, there is no problem of primary node performance being insufficient to maintain a large number of slave nodes, and there are no problems caused by re-electing a primary node after a primary node failure (for example, long recovery time or cluster split-brain).
  • The method of FIG. 2 further includes: the first virtual machine may use a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
  • the at least two neighbor virtual machines may include two to six virtual machines of the N virtual machines having the capability of directly interacting with the first virtual machine.
  • the neighbor relation algorithm described above may include criteria or policies for determining a neighbor relationship of a virtual machine.
  • According to an embodiment of the present invention, a neighbor discovery algorithm from P2P technology can be used to determine the neighbor relationships. For example, each virtual machine can select, as its neighbor virtual machines, the 2 to 6 virtual machines that are physically closest to it and do not belong to the same host.
  • the embodiment of the present invention does not limit the neighbor relationship algorithm.
  • each virtual machine may also randomly select multiple virtual machines from the virtual machine list as its neighbor virtual machines.
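  • Since the neighbor relationship algorithm is not limited, the following is just one hypothetical policy that mixes the two examples above: it picks neighbors at random but prefers virtual machines on other hosts. Physical distance is not modeled, and the function name `select_neighbors`, the parameter `k`, and the simplified list (VM id mapped to host name) are assumptions.

```python
import random


def select_neighbors(vm_id, vm_list, k=3, rng=None):
    """Pick k neighbors for vm_id, preferring VMs on other hosts.

    vm_list maps VM id -> host name (a simplified virtual machine list).
    """
    if not 2 <= k <= 6:
        raise ValueError("k must be between 2 and 6")
    if rng is None:
        rng = random.Random()
    own_host = vm_list[vm_id]
    others = [v for v in vm_list if v != vm_id]
    other_host = [v for v in others if vm_list[v] != own_host]
    same_host = [v for v in others if vm_list[v] == own_host]
    rng.shuffle(other_host)
    rng.shuffle(same_host)
    # Fall back to same-host VMs only if other hosts cannot supply k neighbors.
    return (other_host + same_host)[:k]
```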
  • the information of the N virtual machines includes: state information of each of the N virtual machines, neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, and the heartbeat value of each of the N virtual machines.
  • the configuration information includes: configuration information of each virtual machine in the N virtual machines and configuration information of the virtual machine cluster.
  • the virtual machine list may include a combination of any one or more of the above management information and configuration information.
  • the number of starts of a virtual machine may refer to the number of times the virtual machine has been started since it joined the virtual machine cluster, and is used to determine whether to restart the virtual machine after it fails. For example, once the number of starts exceeds a preset threshold, the virtual machine is no longer restarted and a new virtual machine is added instead, to ensure the stability of the entire system.
  • the heartbeat value of a virtual machine refers to the total number of heartbeat messages sent by the virtual machine since it was last started. It is used to determine how long the virtual machine has been running normally since it was last started.
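  • a minimal sketch of how these two management fields might be used is given below. The class name `ManagementInfo`, the threshold value `MAX_STARTS`, and the assumption that heartbeats are sent at a fixed period are illustrative choices, not part of the embodiment.

```python
from dataclasses import dataclass

@dataclass
class ManagementInfo:
    start_count: int      # starts since the VM joined the cluster
    heartbeat_value: int  # heartbeat messages sent since the last start

MAX_STARTS = 3  # hypothetical preset threshold

def recovery_action(info: ManagementInfo) -> str:
    """Restart a faulty VM unless its start count already exceeds the threshold."""
    return "restart" if info.start_count <= MAX_STARTS else "replace"

def uptime_seconds(info: ManagementInfo, heartbeat_period_s: float) -> float:
    """Estimate normal running time since the last start (fixed period assumed)."""
    return info.heartbeat_value * heartbeat_period_s
```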
  • the configuration information of the virtual machine may include the configuration information of the virtual machine cluster to which the virtual machine belongs (Cluster Configuration, ClusterConf for short) and the node information of the virtual machine (Node Information, NodeInf for short).
  • the virtual machine can send the information saved in its virtual machine list to other virtual machines. After receiving the information, each of the other virtual machines maintains or updates its own saved virtual machine information accordingly.
  • the above information may also be saved in other forms.
  • the above information may be saved in the form of an array.
  • the above information may also contain other information that may be used to determine neighbor relationships, such as neighbor relationship information for other virtual machines.
  • the method of FIG. 2 further includes: the first virtual machine sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists; the first virtual machine receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine; and the first virtual machine updates its saved virtual machine list according to the second synchronization information.
  • the virtual machine may send the information of all N virtual machines it has saved to its neighbor virtual machines through the synchronization information, or it may send only the information of the updated virtual machines, or it may send only an indication or index of the updated virtual machine information, so that its neighbor virtual machines update the information of the virtual machines according to the synchronization information.
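  • the three forms of synchronization information described above (the full list, only the updated entries, or only an index of the updated entries) could be sketched as follows. The mode names `full`, `delta`, and `index` and the message layout are hypothetical and chosen only for this illustration.

```python
def build_sync(vm_list, changed_ids, mode):
    """Build synchronization information in one of three forms."""
    if mode == "full":    # every entry in the local virtual machine list
        return {"entries": dict(vm_list)}
    if mode == "delta":   # only the entries that were updated
        return {"entries": {vm_id: vm_list[vm_id] for vm_id in changed_ids}}
    if mode == "index":   # only the identifiers of the updated entries
        return {"changed": sorted(changed_ids)}
    raise ValueError(f"unknown mode: {mode}")

vm_list = {"VM1": {"heartbeat": 8}, "VM2": {"heartbeat": 3}}
delta = build_sync(vm_list, {"VM2"}, "delta")
```

  • the full form is the simplest but the largest; the index form is the smallest but presumably requires the receiver to fetch the referenced entries afterwards.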
  • each virtual machine can obtain the management information and configuration information of the virtual machines saved by each of its neighbor virtual machines, and this management information and configuration information is up to date. It can be understood that if each virtual machine in the virtual machine cluster can obtain the management information and configuration information saved by its neighbor virtual machines, and can send the management information and configuration information it has saved to each of its neighbor virtual machines, then each virtual machine can hold the management information and configuration information of all virtual machines in the entire virtual machine cluster. In this way, when any virtual machine in the virtual machine cluster fails, a virtual machine that stores the management information of the failed virtual machine can use that information to recover it. For example, the failed virtual machine can be rebuilt on another physical host. It can be understood that, when a virtual machine fails, a restart can be attempted first; if the restart is unsuccessful, the virtual machine can be rebuilt on another physical host.
  • the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is faulty, the first virtual machine triggers the second virtual machine to restart, or triggers the second virtual machine to be migrated from the source host of the second virtual machine to a target host, where the source host is a faulty host and the target host is a normal host.
  • each virtual machine in the virtual machine cluster may, upon determining that another virtual machine is faulty, instruct the physical host where the failed virtual machine is located to restart or migrate it.
  • a virtual machine can prompt a manual restart or migration of the failed virtual machine by raising an alarm, or can run dedicated migration software to perform the restart or migration. A restart may make the failed virtual machine work again. Migration copies the configuration files and disk files of the failed virtual machine from the source host to the target host, enabling the failed virtual machine to work again on the target host.
  • the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is faulty and that the second virtual machine cannot be restarted, the first virtual machine triggers the second virtual machine to migrate from the source host where the second virtual machine is located to a target host, where the source host is a faulty host and the target host is a normal host.
  • the first virtual machine receives the information sent by the migrated second virtual machine and uses it to update its saved information of the second virtual machine. In this case, the first virtual machine sending the first synchronization information to the at least two neighbor virtual machines includes: the first virtual machine sends, to the at least two neighbor virtual machines, first synchronization information that includes the information of the migrated second virtual machine, so that the at least two neighbor virtual machines of the first virtual machine update their information of the second virtual machine after the migration.
  • each virtual machine in the virtual machine cluster may instruct the physical host where the failed virtual machine is located to restart the virtual machine.
  • the virtual machine may determine that the faulty virtual machine cannot be restarted if, for example, no heartbeat message from the faulty virtual machine is received within a preset time after the restart command is issued (the preset time may be, for example, greater than the restart time of the virtual machine).
  • in that case, the virtual machine issues a migration indication.
  • the failed virtual machine can be migrated to another physical host, that is, the failed virtual machine is restarted on another physical host.
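  • the restart-then-migrate decision described above could be sketched as follows. The function name `next_recovery_step` and its return values are hypothetical; the sketch only assumes that the monitoring virtual machine knows how long ago it issued the restart command and whether a heartbeat has since been received.

```python
def next_recovery_step(seconds_since_restart_cmd, heartbeat_seen, restart_timeout_s):
    """Return the next recovery step for a VM that was told to restart."""
    if heartbeat_seen:
        return "recovered"   # the restart succeeded
    if seconds_since_restart_cmd < restart_timeout_s:
        return "wait"        # the restart may still be in progress
    return "migrate"         # preset time elapsed without a heartbeat
```

  • choosing `restart_timeout_s` larger than the normal restart time of the virtual machine avoids migrating a virtual machine that is merely slow to come back.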
  • the preset time is chosen so that, if the restart succeeds, the restarted virtual machine keeps its neighbor relationships in the virtual machine cluster unchanged, and the neighbor relationships do not need to be re-determined.
  • the failed virtual machine may still retain the original neighbor relationship or re-determine the neighbor relationship after restarting.
  • the information can be synchronized by the synchronization information.
  • the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is leaving, the first virtual machine triggers the second virtual machine to be deleted from the source host where the second virtual machine is located.
  • each virtual machine in the virtual machine cluster may instruct that another virtual machine be deleted if it determines that the state of that virtual machine is leaving.
  • the virtual machine can prompt manual deletion of the virtual machine by raising an alarm, or can run dedicated migration software to perform the deletion. For example, deletion can remove the configuration files and disk files of that virtual machine from the source host.
  • the first virtual machine may delete the information of the second virtual machine saved in the virtual machine list. In this case, the first virtual machine sending the first synchronization information to the at least two neighbor virtual machines includes: the first virtual machine sends, to the at least two neighbor virtual machines of the first virtual machine, first synchronization information that includes indication information indicating deletion of the second virtual machine, so that the at least two neighbor virtual machines of the first virtual machine delete the information of the second virtual machine.
  • for example, if a virtual machine is actively stopped, the virtual machine is no longer in use and can be deleted from the virtual machine cluster. In this case, once the neighbor virtual machines of the leaving virtual machine learn that it has left, they can delete its information from their saved virtual machine lists, thereby triggering the other virtual machines to delete the information of the leaving virtual machine from their saved virtual machine lists as well.
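  • the way a deletion spreads from one virtual machine list to the next could be sketched as follows. The sketch simulates all lists in one process; the names `apply_delete` and `propagate_delete`, and the rule that a virtual machine forwards the indication only when its own list actually changed, are assumptions made for illustration.

```python
def apply_delete(vm_list, vm_id):
    """Remove vm_id from a local list; return True if it was present."""
    if vm_id in vm_list:
        del vm_list[vm_id]
        return True
    return False

def propagate_delete(lists, neighbors, origin, vm_id):
    """Forward the delete indication along neighbor links until no list changes."""
    pending = [origin]
    while pending:
        node = pending.pop()
        if apply_delete(lists[node], vm_id):
            pending.extend(neighbors[node])

# Three remaining VMs, each still listing the departed VM "X".
lists = {n: {"A": {}, "B": {}, "C": {}, "X": {}} for n in ("A", "B", "C")}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
propagate_delete(lists, neighbors, "A", "X")
```

  • because a virtual machine that has already removed the entry does not forward the indication again, the propagation terminates once every reachable list has been updated.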
  • the method of FIG. 2 further includes: the first virtual machine sends the detection result of the second heartbeat message to an upper node of the N virtual machines, so that the upper node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine in detecting the heartbeat messages sent by the second virtual machine; the first virtual machine receives an indication message sent by the upper node, where the indication message indicates the state of the second virtual machine.
  • the neighbor virtual machines of each virtual machine can report their heartbeat detection results to an upper node (for example, a management node), which combines the detection results of the multiple neighbor virtual machines to determine the state of the virtual machine more accurately.
  • the neighboring virtual machine of each virtual machine detects a heartbeat message sent by the virtual machine, and reports the detection result to the upper node.
  • if the upper node determines that none of the neighbor virtual machines of the virtual machine detected the heartbeat message within a preset time, the upper node may determine that the virtual machine is faulty or has left, and notify each neighbor virtual machine of the virtual machine of the failure or departure.
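  • a minimal sketch of the decision made by the upper node is shown below. The report format (a mapping from neighbor identifier to whether that neighbor detected a heartbeat within the preset time) and the state names are assumptions; treating a virtual machine as normal when at least one neighbor saw its heartbeat is likewise an assumption of this sketch.

```python
def judge_state(reports):
    """Combine neighbor reports: True means the neighbor saw a heartbeat in time."""
    if not reports:
        return "unknown"            # nothing reported yet
    if any(reports.values()):
        return "normal"             # at least one neighbor saw a heartbeat
    return "faulty_or_left"         # every neighbor missed the heartbeat
```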
  • the advantage of this is that whether the virtual machine is faulty or has left can be determined by detecting a single heartbeat message sent by the virtual machine, so that a faulty or leaving virtual machine is discovered accurately and promptly.
  • in a master/slave scheme, by contrast, the master node needs multiple heartbeat messages from the slave node to accurately identify a faulty or leaving node, so the faulty or leaving node cannot be found in time.
  • the method of FIG. 2 further includes: the first virtual machine receives the detection results obtained by the other neighbor virtual machines of the second virtual machine in detecting the heartbeat messages sent by the second virtual machine; the first virtual machine determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine.
  • each neighbor virtual machine of a virtual machine can determine the state of the virtual machine (for example, failure or departure) according to the heartbeat messages sent by the virtual machine, receive from the virtual machine's other neighbor virtual machines the state they have detected, and determine the state of the virtual machine on the basis of all these detections. For example, if a neighbor virtual machine detects that the virtual machine has failed and also receives messages from the other neighbor virtual machines reporting the failure, it may determine that the virtual machine is faulty.
  • each neighbor virtual machine of the virtual machine may actively send its detection result to the other neighbor virtual machines when it detects no heartbeat message within a predetermined time, or may send its detection result to them periodically. The embodiment of the present invention is not limited thereto; for example, a neighbor virtual machine that detects no heartbeat message within a preset time may instead actively request the other neighbor virtual machines to send their detection results.
  • each node needs to know which virtual machines are in the virtual machine cluster, together with related information about those virtual machines, and on this basis determine which virtual machines can become its neighbors.
  • the method of FIG. 2 further includes: when the first virtual machine joins the virtual machine cluster, the first virtual machine sends the information of the first virtual machine to the other virtual machines of the N virtual machines; the first virtual machine receives the respective information sent by the other virtual machines of the N virtual machines.
  • a P2P protocol, for example the Gossip protocol, can be used for communication between virtual machines.
  • the Gossip process is a timer-driven program.
  • the embodiment of the present invention can use this process to randomly select, once every 1 s, a certain number (for example, three) of other virtual machines from the locally maintained virtual machine list and exchange information with them.
  • the advantage of this approach is that the entire cluster is completely self-organizing, reducing the number of configurations.
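  • the peer selection performed in each Gossip round could be sketched as follows. The function name `pick_gossip_peers` and the `fanout` parameter are hypothetical; the sketch shows only the random selection, not the timer or the message exchange.

```python
import random

def pick_gossip_peers(self_id, vm_ids, fanout=3, rng=random):
    """Randomly choose up to `fanout` other VMs from the local list."""
    others = [vm_id for vm_id in vm_ids if vm_id != self_id]
    return rng.sample(others, min(fanout, len(others)))

# A seeded generator makes the selection reproducible in tests.
peers = pick_gossip_peers(
    "VM1", ["VM1", "VM2", "VM3", "VM4", "VM5"], rng=random.Random(0)
)
```

  • in a running system this function would be called from a periodic timer, and the virtual machine would then exchange its list with each selected peer.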
  • the method of FIG. 2 further includes: when the first virtual machine joins the virtual machine cluster, the first virtual machine sends registration information to the index server, where the registration information includes information of the first virtual machine, and the index server is a registration center of the virtual machine cluster that provides a registration service for the virtual machines in the cluster; the first virtual machine receives the information of the other virtual machines of the N virtual machines sent by the index server.
  • all VM nodes in a VM cluster register their own information to the index server when they start and join the cluster, and then obtain information about other VMs in the current cluster from the server.
  • the index server can be a single node or a virtual machine in the cluster, such as the first virtual machine to join the cluster.
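  • a minimal in-memory sketch of such a registration center is given below. The class name `IndexServer` and the choice to return the information of the other virtual machines directly from `register` are assumptions made for illustration; a real index server would be reached over the network.

```python
class IndexServer:
    """In-memory registration center for the VMs of one cluster."""

    def __init__(self):
        self._registry = {}

    def register(self, vm_id, info):
        """Store the joining VM's info and return what is known about the others."""
        others = {k: v for k, v in self._registry.items() if k != vm_id}
        self._registry[vm_id] = info
        return others

server = IndexServer()
seen_by_first = server.register("VM1", {"ip": "10.0.0.1"})   # first VM to join
seen_by_second = server.register("VM2", {"ip": "10.0.0.2"})  # learns about VM1
```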
  • information of all virtual machines may be configured on each virtual machine by an administrator through a configuration tool, and each virtual machine then calculates its own neighbor virtual machines according to a specific neighbor relationship algorithm and establishes adjacency with them.
  • the advantage of this method is that the cluster node information can be quickly learned, and the disadvantages of broadcasting hello messages are avoided.
  • FIG. 3 is a schematic diagram of a neighbor relationship of a virtual machine cluster 300 in accordance with an embodiment of the present invention.
  • the first virtual machine in the embodiment of FIG. 2 may be any one of the clusters 300.
  • the first virtual machine in this embodiment is the VM 301.
  • VM 301 establishes a neighbor relationship with VM 302 and VM 304. Therefore, the neighbor virtual machines of VM 301 are VM 302 and VM 304.
  • the second virtual machine can be any of the VM 302 and the VM 304. It is assumed that the second virtual machine in this embodiment is VM 302.
  • the first virtual machine VM 301 holds management information of the VM 301, management information of the VM 304, and management information of the VM 307.
  • in the first node list, the management information of the VM 301 records a number of starts of 2 and a heartbeat value of 8; the management information of the VM 304 records a number of starts of 3 and a heartbeat value of 3; and the management information of the VM 307 records a number of starts of 1 and a heartbeat value of 9.
  • the second virtual machine VM 302 holds management information of the VM 302, management information of the VM 301, and management information of the VM 303.
  • in the second node list, the management information of the VM 301 records a number of starts of 2 and a heartbeat value of 2; the management information of the VM 302 records a number of starts of 3 and a heartbeat value of 3; and the management information of the VM 303 records a number of starts of 1 and a heartbeat value of 5.
  • the management information of the virtual machine saved by the first virtual machine and the second virtual machine further includes an identifier of the virtual machine.
  • the identifier of each virtual machine in this embodiment is the corresponding number of the virtual machine in the cluster 300, for example, the identifier of the VM 301 is "VM 301".
  • the configuration information of the first virtual machine may include configuration information (ClusterConf) of the cluster to which the first virtual machine belongs and node information (NodeInf) of the first virtual machine.
  • the configuration information of the second virtual machine includes configuration information of the cluster to which the second virtual machine belongs and node information of the second virtual machine.
  • the ClusterConf in the configuration information of the first virtual machine may be 0, and the NodeInf may be 1.
  • the value of the NodeConf is 0. A larger ClusterConf value indicates that the ClusterConf is newer.
  • likewise, a larger NodeInf value indicates that the NodeInf is newer.
  • the first virtual machine may send the management information of the three virtual machines saved in the first node list and the configuration information of the first virtual machine to the second virtual machine.
  • after receiving the information, the second virtual machine maintains the management information of the virtual machines stored in the second virtual machine according to the management information of the three virtual machines.
  • the second virtual machine can compare the received management information of the three virtual machines with the management information it has saved, to determine whether any of its saved management information needs to be updated. In the received management information of the VM 301, the number of starts is 2 and the heartbeat value is 8, whereas in the management information of the VM 301 held by the second virtual machine, the number of starts is 2 and the heartbeat value is 2. The second virtual machine can therefore determine that the received VM 301 management information is newer than the management information it holds.
  • the second virtual machine may also determine that the management information of the VM 304 and the management information of the VM 307 are absent from the management information it has saved.
  • having determined that the received VM 301 management information is newer than the management information it holds, the second virtual machine replaces its saved management information of the VM 301 with the received management information of the VM 301.
  • the second virtual machine can save management information of the VM 304 and the VM 307.
  • the second virtual machine may further determine that the management information it received does not include the management information of the VM 302 and the management information of the VM 303, both of which it holds.
  • the second virtual machine may send the management information of the VM 302 and the management information of the VM 303 to the first virtual machine, so that the first virtual machine saves the management information of the VM 302 and the management information of the VM 303.
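  • the exchange in this example could be sketched as follows, using the values given for cluster 300. Ordering two records by the pair (number of starts, heartbeat value) is an assumption of this sketch, chosen because it reproduces the outcome described in the example; the field names `starts` and `heartbeat` and the function names are hypothetical.

```python
def is_newer(a, b):
    """Compare (starts, heartbeat) pairs; Python tuples order lexicographically."""
    return (a["starts"], a["heartbeat"]) > (b["starts"], b["heartbeat"])

def merge(local, received):
    """Update `local` in place; return entries the sender lacks or holds stale."""
    for vm_id, info in received.items():
        if vm_id not in local or is_newer(info, local[vm_id]):
            local[vm_id] = dict(info)
    return {k: v for k, v in local.items()
            if k not in received or is_newer(v, received[k])}

first = {"VM 301": {"starts": 2, "heartbeat": 8},
         "VM 304": {"starts": 3, "heartbeat": 3},
         "VM 307": {"starts": 1, "heartbeat": 9}}
second = {"VM 301": {"starts": 2, "heartbeat": 2},
          "VM 302": {"starts": 3, "heartbeat": 3},
          "VM 303": {"starts": 1, "heartbeat": 5}}
reply = merge(second, first)  # second VM processes the first VM's message
```

  • after the call, the second list holds the newer VM 301 record plus the VM 304 and VM 307 records, and `reply` contains the VM 302 and VM 303 records to be sent back to the first virtual machine.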
  • the second virtual machine can maintain the configuration information of the second virtual machine according to the configuration information of the first virtual machine.
  • the second virtual machine determines that the ClusterConf in the configuration information of the first virtual machine is smaller than the ClusterConf in the configuration information of the second virtual machine, so the second virtual machine may determine that the ClusterConf in its own configuration information is newer.
  • the second virtual machine determines that the NodeInf in the configuration information of the first virtual machine is greater than the NodeInf in the configuration information of the second virtual machine, so the second virtual machine may determine that the NodeInf in the configuration information of the first virtual machine is newer.
  • the second virtual machine may keep the ClusterConf in its configuration information unchanged, and update the NodeInf in its configuration information to the NodeInf in the configuration information of the first virtual machine.
  • the second virtual machine may send the configuration information of the second virtual machine to the first virtual machine.
  • the first virtual machine may update the configuration information of the first virtual machine according to the configuration information of the second virtual machine.
  • the process of updating the configuration information by the first virtual machine is similar to the process of updating the configuration information by the second virtual machine, and details are not described herein.
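  • the per-field comparison of configuration information could be sketched as follows. The sketch represents each field only by its version value, and the ClusterConf value of 1 for the second virtual machine is an assumed value chosen to be larger than that of the first virtual machine; in practice the configuration content would travel together with its version.

```python
def merge_config(local, remote):
    """Keep, for each field, whichever side has the larger (newer) value."""
    return {key: max(local[key], remote[key]) for key in local}

first_conf = {"ClusterConf": 0, "NodeInf": 1}
second_conf = {"ClusterConf": 1, "NodeInf": 0}  # ClusterConf value is assumed
merged = merge_config(second_conf, first_conf)
```

  • applying the same function on both sides leaves the first and second virtual machines with identical configuration information.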
  • each virtual machine in cluster 300 can perform the above process with the corresponding virtual machine.
  • each virtual machine in the cluster 300 stores the management information of the other virtual machines, and the configuration information of all the virtual machines is the same. Therefore, the long recovery time and cluster split-brain caused by downtime of a master host can be avoided.
  • a method of processing a virtual machine cluster in accordance with an embodiment of the present invention is described in detail above.
  • the process of initial establishment, monitoring, and management of a virtual machine cluster according to an embodiment of the present invention will be separately described below with reference to specific examples.
  • each virtual machine needs to know which virtual machines are in the virtual machine cluster, together with related information about those virtual machines, and on this basis determine which virtual machines can become its neighbors.
  • the initially created virtual machine list may include status information (for example, usage of the central processing unit (CPU) and usage of memory), management information (for example, heartbeat value and number of starts), and configuration information (for example, the virtual machine's IP address).
  • the embodiment of the present invention can initially establish a virtual machine cluster by using a broadcast mode, a manual configuration mode, or an index server mode.
  • each virtual machine can broadcast a Hello message. After receiving the Hello message, the other virtual machines return a confirmation message to the sending virtual machine, so that the two virtual machines exchange status information, management information, and/or configuration information.
  • FIG. 4 is a schematic flow chart of a process of establishing a virtual machine cluster in accordance with an embodiment of the present invention.
  • the virtual machine 1 broadcasts a HelloMessage (hello message), and the HelloMessage contains information about the virtual machine 1, such as status information, management information, and/or configuration information.
  • the virtual machine 2 receives the HelloMessage sent by the virtual machine 1, reads the information of the virtual machine 1 from the HelloMessage, and adds the virtual machine 1 to the local virtual machine list, that is, records in the local virtual machine list the ID of the virtual machine 1 and the status information, management information, and/or configuration information of the virtual machine 1.
  • the virtual machine 2 returns an AckMessage (confirmation message) to the virtual machine 1, and the AckMessage includes the information of the virtual machines saved by the virtual machine 2.
  • the virtual machine 1 receives the AckMessage from the virtual machine 2, extracts from the AckMessage the information of the virtual machines saved by the virtual machine 2, and adds it to the locally saved virtual machine list.
  • the virtual machine 1 returns an Ack2Message (confirmation 2 message) to the virtual machine 2, and the Ack2Message includes the information of the virtual machines added locally, to confirm message synchronization with the virtual machine 2.
  • the virtual machine 1 can calculate its neighbors from the virtual machine list by using a specific neighbor relationship algorithm and establish adjacency with them.
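  • the three-message exchange above could be simulated in one process as follows. The function name `handshake`, the dictionary layout of the lists, and the `ip` field are assumptions made for illustration; real messages would be sent over the network.

```python
def handshake(list1, id1, list2):
    """Simulate HelloMessage / AckMessage / Ack2Message between two VMs."""
    hello = {id1: list1[id1]}   # VM 1 broadcasts its own information
    list2.update(hello)         # VM 2 records VM 1 in its local list
    ack = dict(list2)           # VM 2 replies with everything it holds
    added = {k: v for k, v in ack.items() if k not in list1}
    list1.update(added)         # VM 1 stores the entries it did not have
    ack2 = added                # VM 1 confirms what it added locally
    return hello, ack, ack2

list1 = {"VM1": {"ip": "10.0.0.1"}}
list2 = {"VM2": {"ip": "10.0.0.2"}, "VM3": {"ip": "10.0.0.3"}}
hello, ack, ack2 = handshake(list1, "VM1", list2)
```

  • after the exchange both lists contain the same three entries, which is the precondition for running the neighbor relationship algorithm.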
  • each virtual machine can generate a virtual machine list locally when a virtual machine cluster is established. Each virtual machine can then calculate its neighbors from the virtual machine list by using a specific neighbor relationship algorithm and establish adjacency with them. For example, each virtual machine can further add information about its neighbor virtual machines to the initially created virtual machine list. Each virtual machine may calculate its neighbor virtual machines from the virtual machine list in Table 1 by using a specific neighbor relationship algorithm. Each virtual machine may notify the other virtual machines of the neighbor virtual machines it has determined, or every virtual machine may use the same algorithm to directly calculate the neighbor virtual machines of all virtual machines from the virtual machine list. For example, after the neighbor relationships are determined, the virtual machine list is as shown in Table 2.
  • each virtual machine in the virtual machine cluster can learn of the other virtual machines in the cluster.
  • the advantage of this approach is that the entire cluster is completely self-organizing, reducing the number of configurations.
  • the Gossip protocol among the P2P protocols can be adopted. With the Gossip protocol, each P2P node may know all other nodes or only a few neighbor nodes. As long as these nodes can communicate through the network, their states eventually become consistent.
  • the state information, management information, and/or configuration information of all the virtual machines may be configured on the virtual machine by the administrator through the configuration tool.
  • the advantage of this approach is that each virtual machine in the virtual machine cluster can quickly learn the state information, management information and/or configuration information of the virtual machine cluster.
  • when the virtual machine cluster is established in the index server mode, the index server is equivalent to a registration center: every virtual machine in the virtual machine cluster registers its own state information, management information, and/or configuration information with the index server when it starts and first joins the cluster, and obtains the state information, management information, and/or configuration information of the other virtual machines in the virtual machine cluster from the index server.
  • the index server can be a single physical node or host, or it can be a virtual machine in a virtual machine cluster, such as the first virtual machine that joins the cluster.
  • the neighbor relationship of each virtual machine may change dynamically. For example, the second virtual machine to join can take the first virtual machine to join as its neighbor, the third virtual machine to join can take the second virtual machine to join as its neighbor, and, as more virtual machines join, the third virtual machine may select other virtual machines as neighbors.
  • FIG. 5 is a schematic diagram of a heartbeat detection mechanism in accordance with an embodiment of the present invention.
  • each virtual machine establishes its own neighbor relationship.
  • one virtual machine has two to six neighbor virtual machines, and the virtual machines monitor each other through a fast heartbeat mechanism.
  • The advantage of multi-point detection is that it shortens fault detection time by trading space for time. For example, referring to the figure, if the neighbor virtual machines 302, 304, 306, and 308 of VM 305 do not detect a heartbeat message from VM 305 within a predetermined time (e.g., longer than one heartbeat period and shorter than two heartbeat periods), VM 305 is considered to have failed or left the cluster.
  • To improve detection accuracy and avoid false alarms, a traditional heartbeat detection mechanism requires the detecting node to confirm several times before it decides that the detected node is faulty. For example, the detected node is considered faulty only after the detecting node has missed three consecutive heartbeats. With the multi-point simultaneous detection of this embodiment, which trades space for time, a fault can be determined after a single missed heartbeat. Because several nodes detect at the same time, false alarms are still effectively avoided, so detection time is shortened while accuracy remains high.
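The multi-point detection idea can be sketched as follows. The text does not say how the neighbors' results are combined, so this sketch assumes the strictest rule: a virtual machine is declared failed only if every neighbor missed one heartbeat. The function names and the timeout factor of 1.5 periods (a value between one and two periods) are illustrative assumptions.

```python
def missed_heartbeat(last_seen, now, period, factor=1.5):
    """True if no heartbeat arrived within `factor` heartbeat periods.

    The text only requires a timeout between one and two periods;
    factor=1.5 is an assumed value inside that range.
    """
    return (now - last_seen) > factor * period


def is_failed(reports):
    """Combine single-miss reports from several neighbors.

    Assumption: at least two neighbors report, and all of them missed.
    """
    return len(reports) >= 2 and all(reports.values())


period = 1.0
now = 10.0

# Time at which each neighbor last received a heartbeat from VM 305.
last_seen = {"VM302": 8.0, "VM304": 8.1, "VM306": 7.9, "VM308": 8.2}
reports = {n: missed_heartbeat(t, now, period) for n, t in last_seen.items()}
vm305_failed = is_failed(reports)

# Only one neighbor (e.g. behind a flaky link) missed the heartbeat:
# the other neighbors prevent a false alarm.
last_seen_flaky = {"VM302": 9.6, "VM304": 9.7, "VM306": 7.9, "VM308": 9.8}
reports_flaky = {
    n: missed_heartbeat(t, now, period) for n, t in last_seen_flaky.items()
}
vm305_failed_flaky = is_failed(reports_flaky)
```

Under these assumptions a fault is confirmed after roughly 1.5 heartbeat periods, compared with three periods for a single detector that waits for three consecutive misses.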
  • FIG. 6 is a schematic flow chart of inter-virtual machine synchronization in accordance with an embodiment of the present invention.
  • The virtual machine list and the related state information, management information, and/or configuration information of the virtual machine cluster are maintained by every virtual machine in the cluster, and data consistency is achieved by the virtual machines synchronizing with each other. The state information, management information, and/or configuration information can be synchronized between virtual machines in two ways.
  • the virtual machines in the virtual machine cluster synchronize state information, management information, and/or configuration information at regular intervals or cycles.
  • the virtual machine in the virtual machine cluster triggers synchronization of status information, management information, and/or configuration information only when the state information, management information, and/or configuration information of the virtual machine changes.
  • the message synchronization process initiated by the virtual machine 1 with the virtual machine 2 includes the following.
  • The virtual machine 1 sends a synchronization message to the virtual machine 2.
  • The synchronization message may be a SynMessage.
  • The SynMessage may carry information about the updated content of the virtual machine list saved in the virtual machine 1.
  • The updated content of the virtual machine list may itself be sent to the virtual machine 2 in the SynMessage, or only an index corresponding to the updated content may be sent to the virtual machine 2 in the SynMessage.
  • the virtual machine 2 may be a neighbor virtual machine of the virtual machine 1. Virtual Machine 1 can send SynMessage to all of its neighbor virtual machines.
  • the virtual machine 2 updates the local virtual machine list according to the synchronization message.
  • After receiving the synchronization message (for example, a SynMessage), the virtual machine 2 compares the updated state information, management information, and/or configuration information contained in the message with its locally saved state information, management information, and/or configuration information, to determine which virtual machine holds the newer information. If the information saved by the virtual machine 1 is newer, the virtual machine 2 updates its locally saved virtual machine list accordingly.
  • The virtual machine 2 returns an acknowledgment message to the virtual machine 1.
  • The acknowledgment message may be an AckMessage.
  • The state information, management information, and/or configuration information of each virtual machine in the virtual machine lists of both virtual machines may also be updated.
  • The virtual machine 1 updates its local virtual machine list according to the acknowledgment message.
  • The local virtual machine list is updated with the state information, management information, and/or configuration information saved by the virtual machine 2 and carried in the AckMessage.
  • The virtual machine 1 sends an acknowledgment message to the virtual machine 2.
  • After receiving the synchronization message, the virtual machine 2 compares the received information about the updates (e.g., an index of the updated state information, management information, and/or configuration information) with its locally saved state information, management information, and/or configuration information, to determine which virtual machine holds the newer state information, management information, and/or configuration information.
  • The virtual machine 2 then sends an acknowledgment message to the virtual machine 1. The acknowledgment message includes the updated state information, management information, and/or configuration information saved on the virtual machine 2, together with indication information that requests the virtual machine 1 to send the updated state information, management information, and/or configuration information saved on the virtual machine 1.
  • The virtual machine 1 replies with a further acknowledgment message, which may be an Ack2Message. After receiving the Ack2Message, the virtual machine 2 can update its local copy with the updated state information, management information, and/or configuration information saved on the virtual machine 1.
  • the message synchronization function can be implemented by a Gossip process.
  • two virtual machines can interact with state information, management information, and/or configuration information via GossipSynMessage, GossipAckMessage, and GossipAck2Message.
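The three-message exchange (SynMessage, AckMessage, Ack2Message) can be sketched as below. This is a simplified illustration, not the patent's wire format: it assumes each list entry carries an integer version number so that "newer" can be decided, and all function names are hypothetical.

```python
def make_syn(local):
    """SynMessage: a digest (index) mapping each VM name to its version."""
    return {name: version for name, (version, _) in local.items()}


def make_ack(local, syn):
    """AckMessage: entries where the receiver is newer, plus names it wants."""
    newer = {n: e for n, e in local.items() if e[0] > syn.get(n, -1)}
    wanted = [n for n, v in syn.items() if v > local.get(n, (-1, None))[0]]
    return newer, wanted


def merge(local, entries):
    """Apply received entries that are newer than the local copy."""
    for name, entry in entries.items():
        if entry[0] > local.get(name, (-1, None))[0]:
            local[name] = entry


def make_ack2(local, wanted):
    """Ack2Message: the entries the peer asked for."""
    return {n: local[n] for n in wanted}


def synchronize(vm1, vm2):
    """Run one Syn / Ack / Ack2 round initiated by vm1."""
    syn = make_syn(vm1)                 # virtual machine 1 -> virtual machine 2
    newer, wanted = make_ack(vm2, syn)  # virtual machine 2 -> virtual machine 1
    merge(vm1, newer)
    ack2 = make_ack2(vm1, wanted)       # virtual machine 1 -> virtual machine 2
    merge(vm2, ack2)


# Each list maps VM name -> (version, state); the version is an assumption.
vm1_list = {"vm1": (5, "normal"), "vm2": (1, "normal"), "vm3": (2, "fault")}
vm2_list = {"vm1": (3, "normal"), "vm2": (4, "normal")}
synchronize(vm1_list, vm2_list)
```

After one round both lists hold the newest entry for every virtual machine. Sending only the digest in the first message keeps the SynMessage small, which corresponds to the option of sending an index rather than the updated content itself.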
  • the management of a virtual machine cluster includes initial establishment of a virtual machine cluster, joining of a new virtual machine, failure or restart of a virtual machine, migration of a virtual machine, leaving of a virtual machine, and the like.
  • the process of joining a new virtual machine is similar to the initial establishment process of a virtual machine cluster.
  • The newly added virtual machine can likewise obtain the global information of the virtual machine cluster, that is, the state information, management information, and/or configuration information of all virtual machines, through one of three methods: broadcast, manual configuration, or an index server. It then finds its neighbors using its internal neighbor relationship algorithm and joins the virtual machine cluster.
  • the virtual machines in the virtual machine cluster may be classified into three types: a seed virtual machine, a normal virtual machine, and an unreachable virtual machine.
  • the role of the seed virtual machine is to provide an initial list of virtual machines for virtual machines that are newly joined to the virtual machine cluster.
  • Unreachable virtual machines refer to those virtual machines that are temporarily unreachable through the detection mechanism between virtual machines, including virtual machine failures and restarts.
  • a normal virtual machine is a virtual machine in a virtual machine cluster other than a seed virtual machine and an unreachable virtual machine.
  • The communication mode can be changed accordingly: the virtual machines to communicate with are selected at random (three by default), one from each of the three types of virtual machines.
  • The advantage of this is that the liveness of the seed virtual machines is ensured, so a newly added virtual machine can always obtain an initial virtual machine list. At the same time, virtual machines that are temporarily unreachable for any reason continue to be monitored, so that the other virtual machines learn promptly when they recover. For example, a virtual machine that has newly joined the cluster can first obtain a virtual machine list from a seed virtual machine, and can then synchronize state information, management information, and/or configuration information with other virtual machines according to that list.
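The random selection of communication targets can be sketched as follows. The sketch reads the default of three as one target per type (seed, normal, unreachable); that reading, the function name `pick_targets`, and the type labels are assumptions made for illustration.

```python
import random


def pick_targets(vm_list, self_name, rng=random):
    """Pick up to one seed, one normal, and one unreachable VM at random."""
    targets = []
    for kind in ("seed", "normal", "unreachable"):
        candidates = sorted(
            name for name, k in vm_list.items()
            if k == kind and name != self_name
        )
        if candidates:
            targets.append(rng.choice(candidates))
    return targets


# Hypothetical virtual machine list: VM name -> type.
vm_list = {
    "vm1": "seed", "vm2": "seed",
    "vm3": "normal", "vm4": "normal", "vm5": "normal",
    "vm6": "unreachable",
}
targets = pick_targets(vm_list, "vm3", random.Random(0))
```

Always including an unreachable virtual machine among the targets is what lets the cluster notice a recovery quickly, and always including a seed keeps the seeds' information current.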
  • the virtual machine newly joined to the cluster can re-establish its own local node list by sending a broadcast message to obtain node information from other virtual machines.
  • The other virtual machines in the virtual machine cluster can return their own state information, management information, and/or configuration information to the newly added virtual machine, and update their local state information, management information, and/or configuration information.
  • FIG. 7 is a block diagram of a computer system 700 in accordance with an embodiment of the present invention.
  • The computer system includes a physical hardware layer of at least one computer node, and a virtual machine cluster runs on the physical hardware layer of the at least one computer node. The virtual machine cluster includes N virtual machines. Each of the N virtual machines saves a virtual machine list and works in a peer-to-peer manner as a first virtual machine; the virtual machine list includes information of the N virtual machines. The first virtual machine includes:
  • a sending module 710, configured to send a first heartbeat message to at least two neighbor virtual machines of the N virtual machines, so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine has established a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and
  • a receiving module 720, configured to detect a second heartbeat message sent by a second virtual machine of the at least two virtual machines, where the first virtual machine is a neighbor virtual machine of the second virtual machine, and the detection result of the second heartbeat message, together with the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine, is used to determine the state of the second virtual machine.
  • Each virtual machine in the virtual machine cluster may determine at least two neighbor virtual machines according to its saved virtual machine list, and may send heartbeat messages to those neighbor virtual machines so that its state is determined from their detection results. Because the state of each virtual machine can be determined from the results of its neighbor virtual machines detecting the heartbeat messages it sends, the problem of the Master becoming the bottleneck of the entire cluster in a master/slave structure is avoided. And because no Master host ever needs to be re-elected, a virtual machine cluster that determines state in this way suffers neither delayed fault recovery nor contention for resources. The fault tolerance and performance of the virtual machine cluster are therefore improved.
  • The sending module 710 further sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists. The receiving module 720 further receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine. The first virtual machine further includes an update module 730, configured to update the virtual machine list saved by the first virtual machine according to the second synchronization information.
  • the at least two neighbor virtual machines include two to six virtual machines of the N virtual machines having the capability of directly interacting with the first virtual machine.
  • The computer system 700 further includes a triggering module 740, configured to, when the state of the second virtual machine is determined to be faulty, trigger the second virtual machine to restart or trigger migration of the second virtual machine from the source host of the second virtual machine to a target host, where the source host is a faulty host and the target host is a normal host.
  • The computer system 700 further includes a triggering module 740, configured to, when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, trigger migration of the second virtual machine from the source host on which it is located to a target host, where the source host is a faulty host and the target host is a normal host.
  • The computer system 700 further includes a triggering module 740, configured to, when the state of the second virtual machine is determined to be leaving, trigger deletion of the second virtual machine from the source host on which it is located.
  • The sending module 710 further sends the detection result of the second heartbeat message to an upper-layer node of the N virtual machines, so that the upper-layer node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results of the heartbeat messages that the second virtual machine sent to its other neighbor virtual machines. The receiving module 720 further receives an indication message sent by the upper-layer node, where the indication message indicates the state of the second virtual machine.
  • The receiving module 720 further receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine, and determines the state of the second virtual machine according to the detection result of the second heartbeat message and those detection results.
  • When the first virtual machine joins the virtual machine cluster, the sending module 710 further sends information of the first virtual machine to the other virtual machines of the N virtual machines, and the receiving module 720 further receives the respective information sent by the other virtual machines of the N virtual machines, to generate the virtual machine list.
  • The sending module 710 sends registration information to the index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides a registration service for the virtual machines in the cluster. The receiving module 720 further receives the information of the other virtual machines of the N virtual machines sent by the index server, to generate the virtual machine list.
  • the selection module 750 adopts a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
  • The information of the N virtual machines includes any one or a combination of: the state information of each of the N virtual machines, the neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, the configuration information of each of the N virtual machines, and the configuration information of the virtual machine cluster.
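The kinds of information listed above suggest a simple record layout for the virtual machine list. The sketch below is one possible shape, with hypothetical class and field names. The rule used to decide which of two entries is newer, comparing the number of starts first and the heartbeat value second, is an assumption borrowed from common gossip designs and is not stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class VMEntry:
    name: str
    state: str = "normal"                          # state information
    neighbors: list = field(default_factory=list)  # neighbor relationship info
    start_count: int = 0                           # number of starts
    heartbeat: int = 0                             # heartbeat value
    config: dict = field(default_factory=dict)     # per-VM configuration


@dataclass
class VMList:
    cluster_config: dict = field(default_factory=dict)  # cluster configuration
    entries: dict = field(default_factory=dict)         # VM name -> VMEntry

    def upsert(self, entry):
        """Insert the entry, or replace the old one if the new one is fresher.

        Assumed freshness order: (start_count, heartbeat), compared as a tuple.
        """
        old = self.entries.get(entry.name)
        new_key = (entry.start_count, entry.heartbeat)
        if old is None or new_key > (old.start_count, old.heartbeat):
            self.entries[entry.name] = entry
            return True
        return False


vm_list = VMList(cluster_config={"name": "cluster-a"})
added = vm_list.upsert(VMEntry("vm1", heartbeat=10))
stale = vm_list.upsert(VMEntry("vm1", heartbeat=7))                    # ignored
restarted = vm_list.upsert(VMEntry("vm1", start_count=1, heartbeat=0)) # accepted
```

Comparing the number of starts before the heartbeat value means that an entry from a restarted virtual machine replaces an older entry even though its heartbeat counter has started again from zero.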
  • FIG. 8 is a block diagram of a computer system 800 in accordance with an embodiment of the present invention.
  • The computer system includes a physical hardware layer of at least one computer node, and a virtual machine cluster runs on the physical hardware layer of the at least one computer node. The virtual machine cluster includes N virtual machines. Each of the N virtual machines saves a virtual machine list and works in a peer-to-peer manner as a first virtual machine; the virtual machine list includes information of the N virtual machines. The first virtual machine includes: a processor 810, which calls and executes code stored in a memory 850 through a bus 840;
  • a transmitter 820, configured to send a first heartbeat message to at least two neighbor virtual machines of the N virtual machines, so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine has established a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and
  • a receiver 830, configured to detect a second heartbeat message sent by a second virtual machine, where the first virtual machine is a neighbor virtual machine of the second virtual machine, the detection result of the second heartbeat message, together with the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine, is used to determine the state of the second virtual machine, and the second virtual machine is one of the N virtual machines.
  • Each virtual machine in the virtual machine cluster may determine at least two neighbor virtual machines according to its saved virtual machine list, and may send heartbeat messages to those neighbor virtual machines so that its state is determined from their detection results. Because the state of each virtual machine can be determined from the results of its neighbor virtual machines detecting the heartbeat messages it sends, the problem of the Master becoming the bottleneck of the entire cluster in a master/slave structure is avoided. And because no Master host ever needs to be re-elected, a virtual machine cluster that determines state in this way suffers neither delayed fault recovery nor contention for resources, which improves the fault tolerance and performance of the virtual machine cluster.
  • The transmitter 820 further sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists. The receiver 830 further receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine. The processor 810 further updates the virtual machine list saved by the first virtual machine according to the second synchronization information.
  • the at least two neighbor virtual machines include two to six virtual machines of the N virtual machines having the capability of directly interacting with the first virtual machine.
  • The processor 810 is further configured to, when the state of the second virtual machine is determined to be faulty, trigger the second virtual machine to restart or trigger migration of the second virtual machine from the source host of the second virtual machine to a target host, where the source host is a faulty host and the target host is a normal host.
  • The processor 810 is further configured to, when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, trigger migration of the second virtual machine from the source host on which it is located to a target host, where the source host is a faulty host and the target host is a normal host.
  • The processor 810 is further configured to, when the state of the second virtual machine is determined to be leaving, trigger deletion of the second virtual machine from the source host on which it is located.
  • The transmitter 820 further sends the detection result of the second heartbeat message to an upper-layer node of the N virtual machines, so that the upper-layer node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results of the heartbeat messages that the second virtual machine sent to its other neighbor virtual machines. The receiver 830 further receives an indication message sent by the upper-layer node, where the indication message indicates the state of the second virtual machine.
  • The receiver 830 further receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine, and the receiver 830 further determines the state of the second virtual machine according to the detection result of the second heartbeat message and those detection results.
  • When the first virtual machine joins the virtual machine cluster, the transmitter 820 further sends information of the first virtual machine to the other virtual machines of the N virtual machines, and the receiver 830 further receives the respective information sent by the other virtual machines of the N virtual machines, to generate the virtual machine list.
  • When the first virtual machine joins the virtual machine cluster, the transmitter 820 sends registration information to the index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides a registration service for the virtual machines in the cluster. The receiver 830 further receives the information of the other virtual machines of the N virtual machines sent by the index server, to generate the virtual machine list.
  • the processor 810 uses a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
  • The information of the N virtual machines includes any one or a combination of: the state information of each of the N virtual machines, the neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, the configuration information of each of the N virtual machines, and the configuration information of the virtual machine cluster.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a division by logical function.
  • In an actual implementation, other divisions are possible. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Abstract

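The present invention provides a method and a computer system for managing a virtual machine cluster. The virtual machine cluster includes N virtual machines. Each of the N virtual machines saves state information of the N virtual machines and works in a peer-to-peer manner as a first virtual machine, and a virtual machine list includes information of the N virtual machines. The method includes: the first virtual machine sends a first heartbeat message to at least two neighbor virtual machines among the N virtual machines so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine has established a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and the first virtual machine, as a neighbor virtual machine of a second virtual machine among the at least two virtual machines, detects a second heartbeat message sent by the second virtual machine. The technical solutions of the embodiments of the present invention can improve the fault tolerance and performance of a virtual machine cluster.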

Description

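Method and Computer System for Processing a Virtual Machine Cluster
This application claims priority to Chinese Patent Application No. 201510244239.8, filed with the Chinese Patent Office on May 14, 2015 and entitled "Method and Computer System for Processing a Virtual Machine Cluster", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of information technologies, and more specifically, to a method and a computer system for processing a virtual machine cluster.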
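Background
A cluster is usually a parallel or distributed system made up of interconnected nodes (for example, computers or virtual machines). These nodes work together and run a common set of applications while presenting a single system image to users and applications. For a computer cluster, for example, the cluster appears from the outside as one system that provides a unified service; internally, the computers in the cluster are physically connected by cables and logically connected by cluster software. A server cluster connects multiple servers through communication links. From the outside these servers appear to work as a single server, while internally the incoming load is dynamically distributed among the servers by some mechanism, achieving the high performance and high availability otherwise available only from a super server.
A virtual machine (VM) is software that runs on a host. It creates an environment between the computer platform and the end user, and the end user operates within the environment created by this software. A virtual machine cluster is a parallel or distributed system formed by multiple interconnected virtual machines.
Virtual machine clusters usually use a master/slave architecture. The Master host monitors all Slave hosts and restarts the virtual machines on a Slave host when that Slave host goes down. The slave nodes also receive heartbeat messages sent by the master node to confirm whether the master node is alive. If the Master host goes down, the Slave hosts in the cluster elect a new Master host.
As the management center of the cluster, the Master host is responsible for monitoring and managing all Slave hosts in the cluster. When the cluster contains too many Slave hosts, the performance of the Master host is insufficient to maintain them, so the Master becomes the bottleneck of the entire cluster and reduces the overall performance of the virtual machine cluster. In addition, when the Master host fails, the Slave hosts must elect a new Master host. This process takes time, which delays fault recovery of the cluster and reduces the fault tolerance of the virtual machine cluster. Furthermore, some Slave hosts may lose contact with the Master host. These Slave hosts then elect a Master host of their own, so that two independent partitions appear within one cluster. The Master host of each partition mistakenly assumes that the other has failed and contends for resources, causing resource shortages and data corruption and reducing the performance of the virtual machine cluster. Using a master node as the management center of a virtual machine cluster therefore harms the fault tolerance and performance of the cluster.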
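Summary
The method and computer system for processing a virtual machine cluster provided in the embodiments of the present invention can improve the fault tolerance and performance of a virtual machine cluster.
According to a first aspect, a method for processing a virtual machine cluster is provided. The virtual machine cluster includes N virtual machines. Each of the N virtual machines saves a virtual machine list and works in a peer-to-peer manner as a first virtual machine, and the virtual machine list includes information of the N virtual machines. The method of the first aspect includes: the first virtual machine sends a first heartbeat message to at least two neighbor virtual machines among the N virtual machines so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine has established a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and the first virtual machine, as a neighbor virtual machine of a second virtual machine among the at least two virtual machines, detects a second heartbeat message sent by the second virtual machine, where the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine are used to determine the state of the second virtual machine.
With reference to the first aspect, in a first possible implementation of the first aspect, the method further includes: the first virtual machine sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists; the first virtual machine receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine; and the first virtual machine updates the virtual machine list it saves according to the second synchronization information.
With reference to the first aspect or the first possible implementation, in a second possible implementation, the at least two neighbor virtual machines include two to six virtual machines among the N virtual machines that are capable of exchanging information directly with the first virtual machine.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a third possible implementation, the method further includes: when the first virtual machine determines that the state of the second virtual machine is faulty, the first virtual machine triggers the second virtual machine to restart, or triggers migration of the second virtual machine from the source host of the second virtual machine to a target host, where the source host is a faulty host and the target host is a normal host.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a fourth possible implementation, the first feedback information further includes configuration information of the second virtual machine, and the method further includes: when the first virtual machine determines that the state of the second virtual machine is faulty and that the second virtual machine cannot be restarted, the first virtual machine triggers migration of the second virtual machine from the source host on which the second virtual machine is located to a target host, where the source host is a faulty host and the target host is a normal host.
With reference to any of the foregoing possible implementations of the first aspect, in a fifth possible implementation, the method further includes: when the first virtual machine determines that the state of the second virtual machine is leaving, the first virtual machine triggers deletion of the second virtual machine from the source host on which the second virtual machine is located.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a sixth possible implementation, the method of the first aspect further includes: the first virtual machine sends the detection result of the second heartbeat message to an upper-layer node of the N virtual machines, so that the upper-layer node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results of the heartbeat messages that the second virtual machine sent to its other neighbor virtual machines; and the first virtual machine receives an indication message sent by the upper-layer node, where the indication message indicates the state of the second virtual machine.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a seventh possible implementation, the method of the first aspect further includes: the first virtual machine receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine; and the first virtual machine determines the state of the second virtual machine according to the detection result of the second heartbeat message and those detection results.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in an eighth possible implementation, when the first virtual machine joins the virtual machine cluster, the method of the first aspect further includes: the first virtual machine sends information of the first virtual machine to the other virtual machines among the N virtual machines; and the first virtual machine receives the respective information sent by the other virtual machines among the N virtual machines, to generate the virtual machine list.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a ninth possible implementation, when the first virtual machine joins the virtual machine cluster, the first virtual machine sends registration information to an index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides a registration service for the virtual machines in the cluster; and the first virtual machine receives the information of the other virtual machines among the N virtual machines sent by the index server, to generate the virtual machine list.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a tenth possible implementation, the method further includes: the first virtual machine uses a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in an eleventh possible implementation, the information of the N virtual machines includes any one or a combination of: the state information of each of the N virtual machines, the neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, the configuration information of each of the N virtual machines, and the configuration information of the virtual machine cluster.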
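According to a second aspect, a computer system is provided. The computer system includes a physical hardware layer of at least one computer node, and a virtual machine cluster runs on the physical hardware layer of the at least one computer node. The virtual machine cluster includes N virtual machines. Each of the N virtual machines saves a virtual machine list and works in a peer-to-peer manner as a first virtual machine, and the virtual machine list includes information of the N virtual machines. The first virtual machine includes: a sending module, configured to send a first heartbeat message to at least two neighbor virtual machines among the N virtual machines so that the at least two neighbor virtual machines detect the first heartbeat message, where the detection result of the first heartbeat message is used to determine the state of the first virtual machine, and the first virtual machine has established a neighbor relationship with the at least two neighbor virtual machines according to the information of the N virtual machines; and a receiving module, configured to detect a second heartbeat message sent by a second virtual machine among the at least two virtual machines, where the first virtual machine is a neighbor virtual machine of the second virtual machine, and the detection result of the second heartbeat message and the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine are used to determine the state of the second virtual machine.
With reference to the second aspect, in a first possible implementation, the sending module further sends first synchronization information to the at least two neighbor virtual machines, where the first synchronization information indicates an update of the virtual machine list saved in the first virtual machine, so that the at least two neighbor virtual machines update their respective saved virtual machine lists; the receiving module further receives second synchronization information sent by the second virtual machine, where the second synchronization information indicates an update of the virtual machine list saved in the second virtual machine; and the first virtual machine further includes an update module, configured to update the virtual machine list saved by the first virtual machine according to the second synchronization information.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the at least two neighbor virtual machines include two to six virtual machines among the N virtual machines that are capable of exchanging information directly with the first virtual machine.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a third possible implementation, the computer system of the second aspect further includes a triggering module, configured to, when the state of the second virtual machine is determined to be faulty, trigger the second virtual machine to restart or trigger migration of the second virtual machine from the source host of the second virtual machine to a target host, where the source host is a faulty host and the target host is a normal host.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a fourth possible implementation, the computer system of the second aspect further includes a triggering module, configured to, when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, trigger migration of the second virtual machine from the source host on which the second virtual machine is located to a target host, where the source host is a faulty host and the target host is a normal host.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a fifth possible implementation, the computer system of the second aspect further includes a triggering module, configured to, when the state of the second virtual machine is determined to be leaving, trigger deletion of the second virtual machine from the source host on which the second virtual machine is located.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a sixth possible implementation, the sending module further sends the detection result of the second heartbeat message to an upper-layer node of the N virtual machines, so that the upper-layer node determines the state of the second virtual machine according to the detection result of the second heartbeat message and the detection results of the heartbeat messages that the second virtual machine sent to its other neighbor virtual machines; and the receiving module further receives an indication message sent by the upper-layer node, where the indication message indicates the state of the second virtual machine.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a seventh possible implementation, the receiving module further receives the detection results obtained by the other neighbor virtual machines of the second virtual machine detecting the heartbeat messages sent by the second virtual machine, and further determines the state of the second virtual machine according to the detection result of the second heartbeat message and those detection results.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in an eighth possible implementation, when the first virtual machine joins the virtual machine cluster, the sending module further sends information of the first virtual machine to the other virtual machines among the N virtual machines, and the receiving module further receives the respective information sent by the other virtual machines among the N virtual machines, to generate the virtual machine list.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a ninth possible implementation, when the first virtual machine joins the virtual machine cluster, the sending module further sends registration information to an index server, where the registration information includes information of the first virtual machine, and the index server is the registration center of the virtual machine cluster and provides a registration service for the virtual machines in the cluster; and the receiving module further receives the information of the other virtual machines among the N virtual machines sent by the index server, to generate the virtual machine list.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a tenth possible implementation, the computer system of the second aspect further includes a selection module, configured to use a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines according to the information of the N virtual machines saved in the virtual machine list.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in an eleventh possible implementation, the information of the N virtual machines includes any one or a combination of: the state information of each of the N virtual machines, the neighbor relationship information of each of the N virtual machines, the number of starts of each of the N virtual machines, the heartbeat value of each of the N virtual machines, the configuration information of each of the N virtual machines, and the configuration information of the virtual machine cluster.
In the foregoing technical solutions, according to the embodiments of the present invention, each virtual machine in the virtual machine cluster may determine at least two neighbor virtual machines according to its saved virtual machine list, and may send heartbeat messages to those neighbor virtual machines so that its state is determined from their detection results. Because the state of each virtual machine can be determined from the results of its neighbor virtual machines detecting the heartbeat messages it sends, the problem of the Master becoming the bottleneck of the entire cluster in a master/slave structure is avoided. And because no Master host ever needs to be re-elected, a virtual machine cluster that determines state in this way suffers neither delayed fault recovery nor contention for resources. The fault tolerance and performance of the virtual machine cluster are therefore improved.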
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对本发明实施例中所需要使用的附图作简单地介绍,显而易见地,下面所描述的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是根据本发明实施例提供的虚拟机集群的架构的示意图。
图2是根据本发明实施例的一种处理虚拟机集群的方法的示意性流程图。
图3是本发明的实施例的虚拟机集群的邻居关系的示意图。
图4是根据本发明的实施例的建立虚拟机集群的过程的示意性流程图。
图5是根据本发明的实施例的心跳检测机制的示意图。
图6是根据本发明的实施例的虚拟机间同步的示意性流程图。
图7是根据本发明的实施例的一种计算机系统的结构示意图。
图8是根据本发明的实施例的一种计算机系统800的结构示意图。
具体实施方式
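The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.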
图1是根据本发明实施例提供的虚拟机集群100的架构的示意图。
如图1所示,虚拟机集群100为分布式架构,虚拟机集群中的多个虚拟机从用户端来看整体上作为单一虚拟机为用户设备提供业务。与常规技术中Master/Slave虚拟机集群架构相比,虚拟机集群100中的虚拟机之间的关系是对等的。虚拟机集群100所在的计算机系统例如可以包括物理主机ESXi-1、ESXi-2和ESXi-3。例如,在主机ESXi-1上运行虚拟机VM101和VM102,在主机ESXi-2上运行虚拟机VM103和VM104,在主机ESXi-3上运行虚拟机VM105和VM106。虚拟机集群100中的每个虚拟机可以维护一个虚拟机列表,用于保存虚拟机集群100中的虚拟机的信息,包括状态信息、配置信息和/或管理信息。虚拟机之间的信息同步可以采用对等(Peer to Peer,P2P)协议来实现。其中状态信息用于指示虚拟机的工作状态,例如,CPU的使用量或内存的使用量等信息。配置信息用于指示配置虚拟机的相关信息,例如,分配给虚拟机的IP地址等信息。管理信息用于指示管理虚拟机的信息,例如,虚拟机的心跳值、虚拟机的启动次数、虚拟机的故障、重启或迁移等信息。
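Virtual machine VM101 has established neighbor relationships with VM102, VM103, VM105, and VM106; VM102 with VM101, VM104, and VM105; VM103 with VM101, VM106, and VM104; VM104 with VM101, VM102, and VM106; VM105 with VM101, VM102, and VM103; and VM106 with VM101, VM104, and VM105. The neighbor relationships may also be recorded in the virtual machine list; for example, the list may additionally record the neighbor virtual machines of each virtual machine. Each virtual machine may use a specific neighbor relationship algorithm to determine its own neighbors.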
应理解,上述虚拟机集群可以位于一个物理主机上,也可以位于多个物理主机上。虚拟机之间的邻居关系与其所处的物理主机无关,即从分布式集群的角度,只考虑虚拟机,而不考虑其实际位于哪一台物理主机。虚拟机可以将位于同一物理主机上的虚拟机作为邻居,也可以将位于其它物理主机上的虚拟机作为邻居。
还应理解,上述邻居关系可以是指物理上的邻居关系,也可以是指逻辑上的邻居关系。
还应理解,上述物理主机和每个物理主机上虚拟机的数目仅仅是举例说明。本发明的实施例对每个物理主机上的虚拟机的数目和虚拟机集群所在的物理主机的数目不作限定。
图2是根据本发明实施例的一种处理虚拟机集群的方法的示意性流程图。图2所示的方法由图1的虚拟机集群100中的每个虚拟机执行,虚拟机集群包括N个虚拟机,N个虚拟机中的每个虚拟机保存虚拟机列表,并以对等方式作为第一虚拟机进行工作,虚拟机列表包括N个虚拟机的信息。图2的方法包括如下内容。
210,第一虚拟机向N个虚拟机中的至少两个邻居虚拟机发送第一心跳消息,以便至少两个邻居虚拟机检测第一心跳消息,其中第一心跳消息的检测结果用于确定第一虚拟机的状态,所述第一虚拟机根据所述N个虚拟机的信息与所述至少两个邻居虚拟机建立了邻居关系。
220,第一虚拟机作为上述至少两个虚拟机中的第二虚拟机的邻居虚拟机,检测第二虚拟机发送的第二心跳消息,其中第二心跳消息的检测结果和第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果用于确定第二虚拟机的状态。
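According to an embodiment of the present invention, the first virtual machine, which may be any virtual machine in the cluster, sends heartbeat messages to its neighbor virtual machines so that they can monitor whether the first virtual machine is alive. At the same time, as a neighbor of other virtual machines, the first virtual machine receives the heartbeat messages they send in order to monitor whether they are alive. Specifically, every virtual machine in the cluster can perform 210 and 220 in a peer-to-peer manner: each virtual machine sends heartbeat messages to its neighbors so that they can monitor its state (for example, whether it is alive), and each virtual machine, as a neighbor of other virtual machines, receives their heartbeat messages in order to monitor their state. In this way, the state of each virtual machine can be determined jointly from the results that its multiple neighbors obtain for a single heartbeat detection. In other words, every virtual machine in the cluster monitors its neighbors and is monitored by them, so the cluster does not need a master node that monitors the heartbeat messages of all virtual machines.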
根据本发明的实施例,虚拟机集群中的每个虚拟机可以根据其保存虚拟机列表确定至少两个邻居虚拟机,虚拟机集群中的每个虚拟机可以向其至少两个邻居虚拟机发送心跳消息以便根据至少两个邻居虚拟机的检测结果确定该虚拟机的状态。由于每个虚拟机的状态均可以由其邻居虚拟机检测该虚拟机发送的心跳信息的结果来确定,因此避免了主/从结构存在的Master成为整个集群的瓶颈问题,同时由于不会出现重新选举Master主机的情况,因此,使得采用这种方案进行状态确定的虚拟机集群不会造成故障恢复时间的延迟和争抢资源的情况,因此,提高了虚拟机集群的容错能力和性能。
具体地,由于该虚拟机集群中的每个虚拟机与其他虚拟机的地位都是相同的或者对等的,因此该虚拟机集群不会出现主/从架构的虚拟机集群存在的问题。由于本发明的实施例的虚拟机集群中不需要主节点,因此也就不会出现由于主节点性能不足以支持维护大量从节点导致的问题,也不会出现由于主节点发生故障导致的重新选取主节点带来的问题(例如,恢复时间较长、出现集群脑裂等)。
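In 220, the method of FIG. 2 further includes: the first virtual machine may use a neighbor relationship algorithm to select at least two of the N virtual machines as neighbor virtual machines, based on the information about the N virtual machines stored in the virtual machine list. The at least two neighbor virtual machines may include two to six of the N virtual machines that are capable of exchanging information directly with the first virtual machine.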
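For example, the neighbor relationship algorithm may contain criteria or policies for determining the neighbor relationships of a virtual machine. According to an embodiment of the present invention, a neighbor discovery algorithm from P2P technology may be used to determine the neighbor relationships; for example, each virtual machine may select as its neighbors the two to six virtual machines that are physically closest to it and do not reside on the same host. The embodiments of the present invention do not limit the neighbor relationship algorithm; for example, each virtual machine may instead randomly select several virtual machines from the virtual machine list as its neighbors.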
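The "nearest, but not on the same host" rule described above can be sketched in Python as follows. This is only an illustrative sketch and not the algorithm of the embodiments: the names `VmInfo` and `select_neighbors`, and the one-dimensional `position` field standing in for physical location, are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmInfo:
    vm_id: str
    host: str
    position: float  # hypothetical 1-D stand-in for "physical location"


def select_neighbors(me: VmInfo, vm_list: list[VmInfo], k: int = 3) -> list[str]:
    """Pick up to k neighbors (2 <= k <= 6), preferring the nearest VMs on other hosts."""
    if not 2 <= k <= 6:
        raise ValueError("the text suggests two to six neighbors")
    others = [vm for vm in vm_list if vm.vm_id != me.vm_id]
    # Sort key: other-host VMs first, then by distance, then by id for determinism.
    ranked = sorted(
        others,
        key=lambda vm: (vm.host == me.host, abs(vm.position - me.position), vm.vm_id),
    )
    return [vm.vm_id for vm in ranked[:k]]


# Hosts and VM names follow FIG. 1; the positions are made up for the example.
cluster = [
    VmInfo("VM101", "ESXi-1", 0.0),
    VmInfo("VM102", "ESXi-1", 0.1),
    VmInfo("VM103", "ESXi-2", 1.0),
    VmInfo("VM104", "ESXi-2", 1.1),
    VmInfo("VM105", "ESXi-3", 2.0),
    VmInfo("VM106", "ESXi-3", 2.1),
]
neighbors = select_neighbors(cluster[0], cluster, k=3)
```

Because every virtual machine holds the same list, each one could run the same deterministic rule locally; a random choice, which the text also allows, would replace the sort with a random sample.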
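According to an embodiment of the present invention, the information about the N virtual machines includes management information and configuration information. The management information may include the state information of each of the N virtual machines, the neighbor relationship information of each of the N virtual machines, the boot count of each of the N virtual machines, and the heartbeat value of each of the N virtual machines. The configuration information includes the configuration information of each of the N virtual machines and the configuration information of the virtual machine cluster. The virtual machine list may include any one or a combination of the management information and configuration information above.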
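For example, the boot count of a virtual machine may be the number of times the virtual machine has been started since it joined the cluster. It is used to decide whether to restart the virtual machine after a fault; for example, once the boot count exceeds a preset threshold, the virtual machine is no longer restarted and a new virtual machine is added to keep the whole system stable. The heartbeat value of a virtual machine is the total number of heartbeat messages it has sent since its last boot, and is used to determine how long the virtual machine has been running normally since that boot. The configuration information of a virtual machine may include the configuration information of the cluster to which it belongs (Cluster Configuration, ClusterConf for short) and the node information of the virtual machine (Node Information, NodeInf for short). A larger ClusterConf value indicates a newer ClusterConf, and a larger NodeInf value indicates a newer NodeInf. The virtual machine may send the information stored in its virtual machine list to other virtual machines; after receiving it, the other virtual machines maintain or update the virtual machine information they store accordingly.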
应理解,上述信息也可以采用其它形式进行保存,例如,上述信息可以采用数组的形式来保存。还应理解,上述信息还可以包含其它可以用于确定邻居关系的信息,例如其它虚拟机的邻居关系信息。
可选地,作为另一实施例,图2的方法还包括:第一虚拟机向至少两个邻居虚拟机发送第一同步信息,第一同步信息用于指示第一虚拟机中保存的虚拟机列表的更新,以便至少两个邻居虚拟机更新各自保存的虚拟机列表;第一虚拟机接收第二虚拟机发送的第二同步信息,第二同步信息用于指示第二虚拟机中保存的虚拟机列表的更新;第一虚拟机根据第二同步信息更新第一虚拟机保存的虚拟机列表。
例如,当虚拟机集群中的每个虚拟机中保存的状态被更新时,该虚拟机 可以通过同步信息将该虚拟机中保存的N个虚拟机的信息发给其邻居虚拟机,或者,该虚拟机可以通过同步信息仅将更新的虚拟机的信息发送给邻居虚拟机,或者该虚拟机可以通过同步信息仅将更新的虚拟机的信息的指示或索引发送给邻居虚拟机,以便其邻居虚拟机根据上述同步信息更新虚拟机的信息。
通过同步信息的交互,每个虚拟机都可以获取该虚拟机的每个邻居虚拟机保存的虚拟机的管理信息和配置信息,并且这些管理信息和配置信息都是最新的。可以理解的是,如果该虚拟机集群中的每个虚拟机都能获取邻居虚拟机的所保存的虚拟机的管理信息和配置信息,并且能够将自己保存的虚拟机的管理信息和配置信息发送给自己的邻居虚拟机,那么每个虚拟机都能够保存有整个虚拟机集群中的所有虚拟机的管理信息和配置信息。这样,当该虚拟机集群中的任一个虚拟机发生故障时,保存有发生故障的虚拟机的管理信息的虚拟机可以利用保存的管理信息进行对该发生故障的虚拟机进行恢复。例如,可以在其他物理主机上重建该发生故障的虚拟机。可以理解的是,在虚拟机发生故障时,首先可以对发生故障的虚拟机进行重启,如果重启不成功,则可以在其他物理主机上重建该虚拟机。
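Optionally, as another embodiment, the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is faulty, the first virtual machine triggers a restart of the second virtual machine or triggers migration of the second virtual machine from its source host to a target host, where the source host is a faulty host and the target host is a normal host.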
例如,虚拟机集群中的每个虚拟机在确定另一虚拟机的状态为故障的情况下,可以指示故障虚拟机所在的物理主机重新启动或迁移该虚拟机。例如,虚拟机可以通过报警方式指示人工完成虚拟机的重启或迁移,或者运行专用的迁移软件来执行重启或迁移。通过重启,有可能能够使故障虚拟机重新正常工作。通过迁移,可以将故障虚拟机的配置文件和磁盘文件从源主机拷贝至目标主机,从而使得故障虚拟机能够在目标主机上重新工作。
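Optionally, as another embodiment, the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is faulty and that the second virtual machine cannot be restarted, the first virtual machine triggers migration of the second virtual machine from the source host on which it is located to a target host, where the source host is a faulty host and the target host is a normal host.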
可选地,第一虚拟机接收迁移后的第二虚拟机发送的信息,以更新迁移后的第二虚拟机的信息,其中,第一虚拟机向至少两个邻居虚拟机发送第一同步信息,包括:第一虚拟机向至少两个邻居虚拟机发送第一同步信息,第一同步信息包括迁移后的第二虚拟机的信息,以便第一虚拟机的至少两个邻居虚拟机更新迁移后的第二虚拟机的信息。
例如,如果虚拟机集群中的每个虚拟机在确定另一虚拟机的状态为故障的情况下,可以指示故障虚拟机所在的物理主机重新启动该虚拟机。如果该虚拟机确定故障虚拟机无法重启,例如,在发出重启命令预设时间(例如,该预设时间可以大于虚拟机的重启时间)后仍未收到故障虚拟机发送的心跳消息,则认为该故障虚拟机无法重启,在这种情况下,该虚拟机发出迁移指示,例如,可以将故障虚拟机迁移至另一物理主机,即在另一物理主机上重启该故障虚拟机。上述预设时间的设置使得故障虚拟机在能够重启成功的情况下保持虚拟机集群的邻居关系不变,从而无需再重新确定邻居关系。
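The restart-then-migrate decision described above can be sketched as a small Python function. This is a hedged illustration only: the function name, its parameters, and the returned labels are assumptions made here, and a real system would issue restart and migration commands through its own management interface.

```python
def recovery_action(seconds_since_restart_cmd: float,
                    heartbeat_seen: bool,
                    restart_timeout: float) -> str:
    """Decide what a monitoring neighbor does after it has requested a restart.

    restart_timeout is the preset time from the text; it is assumed to be
    chosen longer than the time the virtual machine needs to restart.
    """
    if heartbeat_seen:
        # The restarted VM is sending heartbeats again; neighbor relations stay as they were.
        return "recovered"
    if seconds_since_restart_cmd < restart_timeout:
        # Still inside the grace period: keep monitoring, change nothing.
        return "wait"
    # No heartbeat after the preset time: treat the VM as unable to restart.
    return "migrate"
```

For instance, with a 60 second timeout, `recovery_action(30, False, 60)` keeps waiting, while `recovery_action(61, False, 60)` requests migration to another host.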
应理解,故障虚拟机在重启后可以仍然保留原来的邻居关系,或者重新确定邻居关系。在重新确定邻居关系的情况下,可以通过同步信息进行信息的同步。
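Optionally, as another embodiment, the method of FIG. 2 further includes: when the first virtual machine determines that the state of the second virtual machine is departed, the first virtual machine triggers deletion of the second virtual machine from the source host on which the second virtual machine is located.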
例如,虚拟机集群中的每个虚拟机在确定另一虚拟机的状态为离开的情况下,可以指示删除该虚拟机。例如,虚拟机可以通过报警方式指示人工完成虚拟机的删除,或者运行专用的迁移软件来执行删除。例如,通过删除,可以将故障虚拟机的配置文件和磁盘文件从源主机上删除。
可选地,第一虚拟机可以删除虚拟机列表中保存的第二虚拟机的信息,其中,第一虚拟机向至少两个邻居虚拟机发送第一同步信息,包括:第一虚拟机向第一虚拟机的至少两个邻居虚拟机发送第一同步信息,第一同步信息包括用于指示删除第二虚拟机的指示信息,以便第一虚拟机的至少两个邻居虚拟机删除第二虚拟机的信息。
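For example, if a virtual machine is deliberately stopped, it is no longer in use and can be deleted from the virtual machine cluster. In this case, after the neighbors of the departed virtual machine learn that it has left, they can delete its information from the virtual machine lists they store, which in turn triggers the other virtual machines to delete its information from their stored virtual machine lists.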
可选地,作为另一实施例,图2的方法还包括:第一虚拟机向N个虚拟机的上层节点发送第二心跳消息的检测结果,以便上层节点根据第二心跳消息的检测结果和第二虚拟机发送给第二虚拟机的其它邻居虚拟机的心跳消息的检测结果确定第二虚拟机的状态;第一虚拟机接收上层节点发送的指示消息,指示消息用于指示第二虚拟机的状态。
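According to an embodiment of the present invention, the multiple neighbors of each virtual machine can report their heartbeat detection results to an upper-layer node (for example, a management node), and the state of the virtual machine is determined from the combined results so that it can be determined more accurately. For example, every neighbor of a virtual machine checks for one heartbeat message sent by that virtual machine and reports the result to the upper-layer node. When the upper-layer node determines that none of the neighbors detected the heartbeat message within the preset time, it can determine that the virtual machine is faulty or has departed, and notify each of the virtual machine's neighbors accordingly. The benefit is that a faulty or departed virtual machine can be identified by checking for a single heartbeat message, so it is discovered both accurately and promptly. In contrast, in conventional Master/Slave technology the Master node needs several heartbeat messages from a Slave node to identify a faulty or departed node accurately, and therefore cannot discover it promptly.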
可选地,作为另一实施例,图2的方法还包括:第一虚拟机接收第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果;第一虚拟机根据第二心跳消息的检测结果和第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果确定第二虚拟机的状态。
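For example, each neighbor of a virtual machine can determine that virtual machine's state (for example, faulty or departed) from the heartbeat messages it sends. Each neighbor can also receive from the other neighbors the state they have detected, and determine the virtual machine's state from all of these detections. For example, if a neighbor detects that the virtual machine is faulty and also receives messages from the other neighbors reporting the same fault, it can conclude that the virtual machine is faulty. A neighbor may proactively send its detection result to the other neighbors when it has not detected a heartbeat message within a predetermined time, or it may send detection results periodically. The embodiments of the present invention do not limit this; for example, a neighbor may instead request detection results from the other neighbors when it has not detected a heartbeat message within the preset time.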
在根据本发明的实施例的虚拟机集群的初始建立过程中,每个节点需要获知虚拟机集群中有哪些虚拟机,以及这些虚拟机的相关信息,然后在此基础上确定哪些虚拟机可以生成为自己的邻居。
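Optionally, as another embodiment, the method of FIG. 2 further includes: when the first virtual machine joins the virtual machine cluster, the first virtual machine sends information about itself to the other virtual machines among the N virtual machines; and the first virtual machine receives the respective information sent by the other virtual machines among the N virtual machines.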
具体而言,虚拟机之间可以采用P2P协议,例如,Gossip协议进行通信。Gossip进程是一个定时程序,本发明的实施例可以利用该进程,每隔1s从本地维护的虚拟机列表中随机选取一定数量(例如,3个)的其它虚拟机进行通信,以交换各自的信息。这种方式的优点是整个集群的构建过程完全是自组织的,减少了配置的环节。
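One round of the periodic exchange described above (pick a few random peers from the local list, then swap information with them) might look like the following Python sketch. The function name `pick_gossip_peers` and the `rng` parameter are assumptions introduced for illustration; the one-second timer and the actual message exchange are left out.

```python
import random


def pick_gossip_peers(vm_list, me, fanout=3, rng=random):
    """Choose up to `fanout` distinct peers, never including the caller itself."""
    candidates = [vm_id for vm_id in vm_list if vm_id != me]
    return rng.sample(candidates, min(fanout, len(candidates)))


local_list = ["VM101", "VM102", "VM103", "VM104", "VM105", "VM106"]
# A seeded generator is used here only to make the example repeatable.
peers = pick_gossip_peers(local_list, "VM101", fanout=3, rng=random.Random(0))
```

A timer that fires once per second would call `pick_gossip_peers` and then exchange virtual machine information with each returned peer.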
可选地,作为另一实施例,图2的方法还包括:在第一虚拟机加入虚拟机集群时,第一虚拟机向索引服务器发送注册信息,注册信息包括第一虚拟机的信息,其中所述索引服务器为所述虚拟机集群的注册中心,用于为所述虚拟机集群中的虚拟机提供注册服务;第一虚拟机接收索引服务器发送的N个虚拟机中的其它虚拟机的信息。
例如,虚拟机集群中所有虚拟机节点在启动和初始加入集群时,都将自身信息注册到该索引服务器中,再从该服务器获取当前集群的其他虚拟机的信息。应理解,索引服务器可以是一个单独的节点,也可以是集群中的一个虚拟机,如第一台加入集群的虚拟机。
应理解,根据本发明的实施例也可以由管理员通过配置工具在每个虚拟机上配置所有虚拟机的信息,然后各虚拟机根据特定的邻居关系算法计算自己的邻居虚拟机并与之建立邻接关系。这种方式的优点是,可以快速获知集群节点信息,避免了广播hello报文所存在的诸多弊端。
为了帮助本领域技术人员更好地理解本发明,下面将结合具体实施例对本发明进行进一步描述。可以理解的是,该具体实施例仅是为了帮助更好地理解本发明的技术方案,而并非对本发明的技术方案的限制。
图3是本发明的实施例的虚拟机集群300的邻居关系的示意图。以图3的虚拟机集群300为例,图2的实施例中的第一虚拟机可以是集群300中的任一个虚拟机。
假设本具体实施例中的第一虚拟机为VM 301。可以看出,VM 301与VM 302和VM 304建立了邻居关系。因此,VM 301的邻居虚拟机为VM 302和VM 304。第二虚拟机可以是VM 302和VM 304中的任一个虚拟机。假设本具体实施例中的第二虚拟机为VM 302。
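Assume that the first virtual machine, VM 301, stores the management information of VM 301, VM 304, and VM 307 in its virtual machine list (the first node list). In the first node list, the management information of VM 301 records a boot count of 2 and a heartbeat value of 8; the management information of VM 304 records a boot count of 3 and a heartbeat value of 3; and the management information of VM 307 records a boot count of 1 and a heartbeat value of 9.
Assume that the second virtual machine, VM 302, stores the management information of VM 302, VM 301, and VM 303 in its virtual machine list (the second node list). In the second node list, the management information of VM 301 records a boot count of 2 and a heartbeat value of 2; the management information of VM 302 records a boot count of 3 and a heartbeat value of 3; and the management information of VM 303 records a boot count of 1 and a heartbeat value of 5.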
此外,第一虚拟机和第二虚拟机保存的虚拟机的管理信息中还包括虚拟机的标识符。为方便描述,假设在本实施例中每个虚拟机的标识符就是该虚拟机在集群300中对应的编号,例如VM 301的标识符就是“VM 301”。
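Further, the configuration information of the first virtual machine may include the configuration information of the cluster to which the first virtual machine belongs (ClusterConf) and the node information of the first virtual machine (NodeInf). Likewise, the configuration information of the second virtual machine includes the configuration information of the cluster to which the second virtual machine belongs and the node information of the second virtual machine. In this embodiment, ClusterConf in the configuration information of the first virtual machine may be 0 and NodeInf may be 1; ClusterConf in the configuration information of the second virtual machine may be 1 and NodeInf may be 0. A larger ClusterConf value indicates a newer ClusterConf, and a larger NodeInf value indicates a newer NodeInf.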
该第一虚拟机可以将该第一节点列表中保存的3个虚拟机的管理信息以及该第一虚拟机的配置信息发送给该第二虚拟机。该第二虚拟机在接收到该信息后,根据3个虚拟机的管理信息,对该第二虚拟机保存的虚拟机的管理信息进行维护。
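Specifically, the second virtual machine can compare the received management information of the three virtual machines with the management information it stores, and determine whether any stored entry needs to be updated. In the received management information, VM 301 has a boot count of 2 and a heartbeat value of 8, whereas in the management information stored by the second virtual machine VM 301 has a boot count of 2 and a heartbeat value of 2. The second virtual machine can therefore determine that the received management information of VM 301 is newer than what it stores. In addition, the second virtual machine can determine that it has received the management information of VM 304 and VM 307, which is absent from the management information it stores.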
该第二虚拟机在确定出VM 301管理信息比在该第二虚拟机保存的管理信息更新的情况下,将该保存的VM 301的管理信息更新为VM 301的管理信息。同时,该第二虚拟机可以保存VM 304和VM 307的管理信息。
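The comparison in this example can be reproduced with a short Python sketch. The values are the boot counts and heartbeat values given for VM 301 and VM 302 above; the rule "compare boot count first, then heartbeat value" is an assumption made here, since the text only shows a case where the boot counts are equal.

```python
def is_newer(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Return True if record a = (boot_count, heartbeat_value) is fresher than b."""
    # Assumed ordering: a higher boot count wins; on a tie, the higher heartbeat value wins.
    return a > b


def merge(local: dict, received: dict) -> dict:
    """Return the local list updated with every received record that is new or fresher."""
    merged = dict(local)
    for vm_id, record in received.items():
        if vm_id not in merged or is_newer(record, merged[vm_id]):
            merged[vm_id] = record
    return merged


# Management information as (boot count, heartbeat value), taken from the example.
first_node_list = {"VM 301": (2, 8), "VM 304": (3, 3), "VM 307": (1, 9)}
second_node_list = {"VM 301": (2, 2), "VM 302": (3, 3), "VM 303": (1, 5)}

second_after = merge(second_node_list, first_node_list)
first_after = merge(first_node_list, second_node_list)
```

After both merges the two lists hold the same five entries, with VM 301 recorded as (2, 8) on both sides.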
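Further, the second virtual machine can also determine that it stores the management information of VM 302 and VM 303, and that this information is not included in what it received from the first virtual machine. The second virtual machine can then send the management information of VM 302 and VM 303 to the first virtual machine so that the first virtual machine can store it.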
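The second virtual machine can maintain its own configuration information according to the configuration information of the first virtual machine. In this specific embodiment, the second virtual machine determines that ClusterConf in the configuration information of the first virtual machine is smaller than ClusterConf in the configuration information of the second virtual machine, so it can conclude that its own ClusterConf is newer. The second virtual machine determines that NodeInf in the configuration information of the first virtual machine is larger than NodeInf in its own configuration information, so it can conclude that the first virtual machine's NodeInf is newer. In this case, the second virtual machine keeps its ClusterConf unchanged and updates its NodeInf to the NodeInf in the configuration information of the first virtual machine.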
该第二虚拟机在确定该第二虚拟机的配置信息比该第一虚拟机的配置信息更新的情况下,可以将该第二虚拟机的配置信息发送给该第一虚拟机。该第一虚拟机在接收到该第二虚拟机的配置信息后,可以根据该第二虚拟机的配置信息对该第一虚拟机的配置信息进行更新。该第一虚拟机更新配置信息的过程与该第二虚拟机更新配置信息的过程类似,在此就不必赘述。
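The "larger value is newer" rule for ClusterConf and NodeInf can be illustrated with the values from this embodiment. The sketch below tracks only the version-like numbers; `merge_config` is a hypothetical helper name, and a real implementation would carry the configuration content along with each value.

```python
def merge_config(local: dict[str, int], remote: dict[str, int]) -> dict[str, int]:
    """Keep, for each field, whichever side holds the larger (newer) value."""
    merged = dict(local)
    for field, value in remote.items():
        if field not in merged or value > merged[field]:
            merged[field] = value
    return merged


# Values taken from the embodiment above.
first_vm_conf = {"ClusterConf": 0, "NodeInf": 1}
second_vm_conf = {"ClusterConf": 1, "NodeInf": 0}

second_vm_after = merge_config(second_vm_conf, first_vm_conf)
first_vm_after = merge_config(first_vm_conf, second_vm_conf)
```

Both sides end with ClusterConf 1 and NodeInf 1, which matches the outcome described in the text: the second virtual machine keeps its ClusterConf and adopts the first virtual machine's NodeInf, and the first virtual machine does the reverse.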
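After this process, the management information maintained by the first virtual machine is identical to that maintained by the second virtual machine, and the configuration information of the first virtual machine is identical to that of the second virtual machine. Similarly, every virtual machine in cluster 300 can carry out this process with its corresponding virtual machines. As a result, without any master node, every virtual machine in cluster 300 stores the management information of the other virtual machines, and all virtual machines hold the same configuration information. The long recovery time and cluster split-brain problems caused by a master host going down can therefore be avoided.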
上面详细描述了根据本发明的实施例的处理虚拟机集群的方法。下面结合具体的例子分别描述根据本发明的实施例的虚拟机集群的初始建立、监测和管理的过程。
在根据本发明的实施例的虚拟机集群的初始建立过程中,每个虚拟机需要获知虚拟机集群中有哪些虚拟机,以及这些虚拟机的相关信息,然后在此基础上确定哪些虚拟机可以成为自己的邻居。
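In this embodiment, the initially established virtual machine list may include state information (for example, central processing unit (CPU) usage and memory usage), management information (such as the heartbeat value and boot count), and configuration information (for example, the IP address of the virtual machine). For example, after the virtual machine cluster is initially established, the virtual machine list is as shown in Table 1.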
Table 1
[Table 1 is provided as an image in the original publication: PCTCN2015095654-appb-000001]
本发明的实施例可以通过广播方式、人工配置方式以及索引服务器方式初始建立虚拟机集群。
在采用广播方式建立虚拟机集群时,每个虚拟机可以广播Hello消息,其它虚拟机接收到Hello消息后向该虚拟机返回Hello报文的确认消息,以便在两个虚拟机之间交互状态信息、管理信息和/或配置信息。
图4是根据本发明的实施例的建立虚拟机集群的过程的示意性流程图。
参见图4,采用广播方式建立虚拟机集群的具体过程如下。
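410. Virtual machine 1 broadcasts a HelloMessage that contains information about virtual machine 1, for example, its state information, management information, and/or configuration information.
420. Virtual machine 2 receives the HelloMessage sent by virtual machine 1, reads the information about virtual machine 1 from it, and adds virtual machine 1 to its local virtual machine list; that is, it records the ID of virtual machine 1 together with its state information, management information, and/or configuration information.
430. Virtual machine 2 returns an acknowledgment message, AckMessage, to virtual machine 1. The AckMessage contains the virtual machine information stored by virtual machine 2.
440. Virtual machine 1 receives the AckMessage from virtual machine 2, extracts the virtual machine information stored by virtual machine 2, and adds it to its locally stored virtual machine list.
450. Virtual machine 1 returns an Ack2Message to virtual machine 2. The Ack2Message contains the information about the virtual machines that were added locally, to confirm that the messages have been synchronized with virtual machine 2.
460. Virtual machine 1 can use a specific neighbor relationship algorithm to compute its neighbors from the virtual machine list and establish adjacency with them.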
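Steps 410 to 450 can be sketched as a three-message handshake in Python. The class `Vm`, the message dictionaries, and the example IP addresses are assumptions introduced here for illustration; only the message names HelloMessage, AckMessage, and Ack2Message come from the text, and the broadcast transport is replaced by direct method calls.

```python
class Vm:
    def __init__(self, vm_id: str, info: dict):
        self.vm_id = vm_id
        self.vm_list = {vm_id: info}  # local virtual machine list, keyed by VM id

    def hello(self) -> dict:
        # Step 410: announce this VM's own information.
        return {"type": "HelloMessage", "sender": self.vm_id,
                "info": self.vm_list[self.vm_id]}

    def on_hello(self, msg: dict) -> dict:
        # Steps 420-430: record the sender, then reply with what was known before.
        known = dict(self.vm_list)
        self.vm_list[msg["sender"]] = msg["info"]
        return {"type": "AckMessage", "sender": self.vm_id, "entries": known}

    def on_ack(self, msg: dict) -> dict:
        # Steps 440-450: add unknown entries and confirm which ones were added.
        added = []
        for vm_id, info in msg["entries"].items():
            if vm_id not in self.vm_list:
                self.vm_list[vm_id] = info
                added.append(vm_id)
        return {"type": "Ack2Message", "sender": self.vm_id, "added": added}


vm1 = Vm("VM1", {"ip": "10.0.0.1"})  # made-up address
vm2 = Vm("VM2", {"ip": "10.0.0.2"})  # made-up address
ack = vm2.on_hello(vm1.hello())
ack2 = vm1.on_ack(ack)
```

After the exchange both virtual machines list each other, and step 460 (computing neighbors from the list) can run on either side.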
各个虚拟机可以在建立虚拟机集群时在本地生成虚拟机列表。然后,各个虚拟机可以根据该虚拟机列表,采用特定的邻居关系算法计算出自己的邻居并与之建立邻接关系。例如,各个虚拟机可以在初始建立的虚拟机列表中进一步添加其邻居虚拟机的信息。各个虚拟机可以根据表1的虚拟机列表,并采用特定的邻居关系算法计算邻居虚拟机,可以是每个虚拟机将自己确定的邻居虚拟机通知给其它虚拟机,也可以是每个虚拟机采用相同的算法根据虚拟机列表直接计算出所有虚拟机的邻居虚拟机。例如,确定邻居关系之后,虚拟机列表如表2所示。
Table 2
[Table 2 is provided as an image in the original publication: PCTCN2015095654-appb-000002]
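Through the above process, every virtual machine in the cluster learns of the existence of the other virtual machines and of their state information, management information, and/or configuration information, so that each virtual machine holds a local list of global information about the cluster. The advantage of this approach is that the cluster is built in a fully self-organizing way, which reduces the amount of configuration required.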
应理解,当采用P2P协议构建虚拟机集群时,可以采用P2P协议中的Gossip协议。通过Gossip协议进行交互,每个P2P节点可以知道所有其他节点,也可能仅知道几个邻居节点,只要这些节点可以通过网络连通,最终他们的状态都是一致的。
可替代地,作为另一实施例,在采用人工配置方式建立虚拟机集群时,可以由管理员通过配置工具在每个虚拟机上配置所有虚拟机的状态信息、管理信息和/或配置信息。这种方式的优点是,虚拟机集群中的每个虚拟机可以快速获知虚拟机集群的状态信息、管理信息和/或配置信息。
可替代地,作为另一实施例,在采用索引服务器方式建立虚拟机集群时,索引服务器相当于一个注册中心,虚拟机集群中所有虚拟机在启动和初始加入集群时,都将自身的状态信息、管理信息和/或配置信息注册到该索引服务器中,再从该索引服务器获取虚拟机集群中的其他虚拟机的状态信息、管理信息和/或配置信息。应理解,索引服务器可以是一个单独的物理节点或主机,也可以是虚拟机集群中的一个虚拟机,如第一台加入集群的虚拟机。
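The index-server approach amounts to a small registry: a joining virtual machine registers its own information and receives the information of the virtual machines registered so far. The Python sketch below is an assumption-laden illustration; `IndexServer`, `register`, and the example addresses are hypothetical names and values, not an interface defined by the text.

```python
class IndexServer:
    """In-memory stand-in for the registration center."""

    def __init__(self):
        self._registry = {}

    def register(self, vm_id: str, info: dict) -> dict:
        # Return the other VMs' information, then record the newcomer.
        others = {k: v for k, v in self._registry.items() if k != vm_id}
        self._registry[vm_id] = info
        return others


index = IndexServer()
seen_by_vm1 = index.register("VM1", {"ip": "10.0.0.1"})  # made-up address
seen_by_vm2 = index.register("VM2", {"ip": "10.0.0.2"})  # made-up address
```

The first virtual machine to register receives an empty result and would have to learn about later arrivals through the synchronization between virtual machines.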
应理解,在建立虚拟机集群的过程中,在虚拟机集群中的每个虚拟机逐个加入虚拟机集群的情况下,每个虚拟机的邻居关系可以是动态变化的,例如,第二个加入的虚拟机可以将第一个加入的虚拟机作为邻居,而第三个加入虚拟机可以将第二个加入的虚拟机作为邻居,而随着加入虚拟机的增多,第三个加入的虚拟机可能选择其它虚拟机作为邻居。
图5是根据本发明的实施例的心跳检测机制的示意图。
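After the cluster is built with the method of the foregoing embodiments, each virtual machine has established its neighbor relationships. A virtual machine usually has two to six neighbors, and virtual machines monitor one another through a fast heartbeat mechanism. Every virtual machine is therefore monitored by several neighbors at once, and when it fails, those neighbors detect the failure simultaneously; this is the so-called multi-point detection mechanism. Its benefit is that it trades space for time and shortens fault detection. For example, referring to FIG. 5, if the neighbors VM302, VM304, VM306, and VM308 of VM305 all fail to detect a heartbeat message from VM305 within a predetermined time (which may, for example, be longer than one heartbeat period and shorter than two heartbeat periods), VM305 is considered faulty or departed.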
为了提高检测的准确性,避免误报,传统的心跳检测机制,检测节点需要多次确认机制才能判定被检测节点是否故障,例如,检测节点连续丢失3个心跳才认为被检测节点故障。而根据本发明的实施例的采用多点同时检测和空间换时间的机制,则可以仅通过一次心跳的丢失即可判定故障。由于是多点同时检测,所以可以有效避免误报,缩短检测时间的同时还可以确保高的准确性。
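The multi-point rule (one missed heartbeat, but confirmed by every neighbor) can be sketched in Python as follows. The neighbor names follow the FIG. 5 example; the function names, the 1.5-period window, and the timestamps are assumptions chosen for illustration.

```python
def missed_heartbeat(last_seen: float, now: float, period: float,
                     factor: float = 1.5) -> bool:
    """True if no heartbeat arrived within factor * period seconds."""
    if not 1.0 < factor < 2.0:
        raise ValueError("window should lie between one and two heartbeat periods")
    return (now - last_seen) > factor * period


def judge_state(misses: dict[str, bool]) -> str:
    """Combine per-neighbor results: a fault is declared only if all neighbors agree."""
    if len(misses) < 2:
        raise ValueError("at least two neighbor reports are expected")
    return "faulty_or_departed" if all(misses.values()) else "alive"


period, now = 1.0, 10.0
# When each neighbor of VM305 last saw a heartbeat (made-up timestamps).
last_seen_by_neighbor = {"VM302": 8.4, "VM304": 8.3, "VM306": 8.4, "VM308": 8.2}
reports = {n: missed_heartbeat(t, now, period) for n, t in last_seen_by_neighbor.items()}
state = judge_state(reports)
```

If even one neighbor had seen a recent heartbeat, `judge_state` would return "alive", which is how the scheme avoids false alarms without waiting for several consecutive misses.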
图6是根据本发明的实施例的虚拟机间同步的示意性流程图。
在本发明实施例的虚拟机集群中,没有中心节点的概念,虚拟机集群中的虚拟机列表及其它有关状态信息、管理信息和/或配置信息需要虚拟机集群中的每台虚拟机来维护,并通过虚拟机相互间的同步来达到数据的一致性。而虚拟机间的状态信息、管理信息和/或配置信息的同步可以有两种方式。
1)周期性定时同步
虚拟机集群中的虚拟机按照规定的时间间隔或周期进行状态信息、管理信息和/或配置信息的同步。
2)事件触发式同步
虚拟机集群中的虚拟机只有在虚拟机的状态信息、管理信息和/或配置信息有变化时才触发状态信息、管理信息和/或配置信息的同步。
例如,虚拟机1发起的与虚拟机2的消息同步过程包括如下内容。
610. Virtual machine 1 sends a synchronization message to virtual machine 2.
具体而言,该同步消息可以为SynMessage(同步报文)。SynMessage中可以包含虚拟机1中保存的虚拟机列表的更新内容的信息,例如,可以将虚拟机列表中更新内容通过SynMessage发送给虚拟机2,也可以将更新内容对应的索引通过SynMessage发送给虚拟机2。在本发明的实施例中,虚拟机2可以为虚拟机1的邻居虚拟机。虚拟机1可以向其所有邻居虚拟机发送SynMessage。
620,虚拟机2根据同步消息更新本地的虚拟机列表。
例如,如果同步消息中包含的是更新的状态信息、管理信息和/或配置信息,则虚拟机2在接收到同步消息后,将同步消息中包含的更新的状态信息、管理信息和/或配置信息与本地保存的状态信息、管理信息和/或配置信息进行比较,确定哪个虚拟机上保存的状态信息、管理信息和/或配置信息更新, 如果确定虚拟机1保存的状态信息、管理信息和/或配置信息更新,则根据该更新的状态信息、管理信息和/或配置信息更新本地保存的虚拟机列表。该同步消息可以为SynMessage。
630. Virtual machine 2 returns an acknowledgment message to virtual machine 1.
如果本地保存的状态信息、管理信息和/或配置信息更新,则将本地保存的状态信息、管理信息和/或配置信息再通过确认消息反馈给虚拟机1。该确认消息可以为AckMessage。
应理解,也可以是两个虚拟机的虚拟机列表中各有一部分虚拟机的状态信息、管理信息和/或配置信息更新。
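640. Virtual machine 1 updates its local virtual machine list according to the acknowledgment message.
After receiving the AckMessage, virtual machine 1 can update its local virtual machine list according to the state information, management information, and/or configuration information stored by virtual machine 2 that is contained in the AckMessage.
650. Virtual machine 1 sends an acknowledgment message to virtual machine 2.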
可替代地,如果同步消息中包含的是更新的状态信息、管理信息和/或配置信息的指示信息(例如,更新的状态信息、管理信息和/或配置信息的索引),则虚拟机2在接收到同步消息后,会将接收到的更新的状态信息、管理信息和/或配置信息的指示信息与本地保存的状态信息、管理信息和/或配置信息的指示信息进行比较,确定哪个虚拟机上保存的状态信息、管理信息和/或配置信息更新,如果确定虚拟机1保存的状态信息、管理信息和/或配置信息更新,则向虚拟机1发送确认消息,该确认消息包含虚拟机2请求的虚拟机1上更新的状态信息、管理信息和/或配置信息的指示信息,用于请求虚拟机1向虚拟机2发送更新的状态信息、管理信息和/或配置信息。该确认消息是Ack2Message,虚拟机2接收到Ack2Message后,可以将虚拟机1上保存的更新的状态信息、管理信息和/或配置信息在本地更新。
应理解,当本发明的实施例采用P2P协议时,可以由一个Gossip进程来实现消息同步功能。例如,两个虚拟机可以通过GossipSynMessage、GossipAckMessage和GossipAck2Message交互状态信息、管理信息和/或配置信息。
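The index-based variant of the exchange can be sketched in Python as a three-step synchronization. This is a loose illustration rather than the message format of the embodiments: each record carries a hypothetical `version` counter as its freshness indicator, and the function names are invented here. The Syn step sends only a digest, the Ack step returns what the receiver wants plus what it holds that is newer, and the Ack2 step delivers the requested records.

```python
def apply_records(local: dict, records: dict) -> None:
    """Store every record that is unknown locally or has a higher version."""
    for vm_id, rec in records.items():
        if vm_id not in local or local[vm_id]["version"] < rec["version"]:
            local[vm_id] = rec


def make_syn(local: dict) -> dict:
    # Syn: a digest mapping VM id -> version, without the full records.
    return {vm_id: rec["version"] for vm_id, rec in local.items()}


def handle_syn(local: dict, digest: dict):
    # Ack: ids the receiver needs, plus records the receiver holds that are newer.
    wanted = [vm_id for vm_id, ver in digest.items()
              if vm_id not in local or local[vm_id]["version"] < ver]
    pushed = {vm_id: rec for vm_id, rec in local.items()
              if vm_id not in digest or rec["version"] > digest[vm_id]}
    return wanted, pushed


def handle_ack(local: dict, wanted: list, pushed: dict) -> dict:
    # Ack2: apply what was pushed, then return the records that were requested.
    apply_records(local, pushed)
    return {vm_id: local[vm_id] for vm_id in wanted if vm_id in local}


vm1 = {"VM-A": {"version": 2, "state": "running"},
       "VM-B": {"version": 1, "state": "running"}}
vm2 = {"VM-A": {"version": 1, "state": "booting"},
       "VM-B": {"version": 3, "state": "faulty"},
       "VM-C": {"version": 1, "state": "running"}}

wanted, pushed = handle_syn(vm2, make_syn(vm1))  # Syn then Ack
ack2 = handle_ack(vm1, wanted, pushed)           # Ack2 built by virtual machine 1
apply_records(vm2, ack2)                         # virtual machine 2 applies Ack2
```

After the three steps both lists are identical, even though each side started with some records that were newer than the other's.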
上面的例子详细描述了根据本发明的实施例的虚拟机集群的初始建立和监测的过程,下面详细描述根据本发明的实施例的虚拟机集群的其它管理的过程。
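Management of the virtual machine cluster includes the initial establishment of the cluster, the joining of new virtual machines, the failure or restart of virtual machines, the migration of virtual machines, the departure of virtual machines, and so on.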
新虚拟机的加入过程与虚拟机集群初始建立过程类似,新加入的虚拟机也可以通过广播、人工配置和索引服务器三种方式获得虚拟机集群的全局信息,即全部虚拟机的状态信息、管理信息和/或配置信息,然后通过新加入虚拟机内部的邻居关系算法找到自己的邻居,并将自己加入到虚拟机集群中。
当某个虚拟机故障时,可以首先重启该虚拟机,如果重启不成功,则可以在其它物理主机上重建该虚拟机,即迁移该虚拟机。
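Therefore, when a virtual machine fault is detected, to prevent the oscillation of the cluster structure that would result from the virtual machine rejoining after a successful restart, the cluster waits for a period of time (for example, the virtual machine's restart time). If the faulty virtual machine restarts successfully, it returns to its original position in the cluster, so the cluster as a whole needs no reconfiguration or synchronization. That is, a virtual machine is not deleted from the virtual machine list when it fails or restarts. During this time, the fault detection mechanism between virtual machines (such as heartbeats) continues to monitor the virtual machine to determine whether it has recovered. Reconfiguration or synchronization of the whole cluster is triggered only if the restart fails.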
当某个虚拟机离开虚拟机集群时,其它虚拟机通过同步过程可以获知该虚拟机离开,从而更新各自的本地虚拟机列表。
进一步,为了便于虚拟机管理和提高虚拟机的通讯效率,可以考虑将虚拟机集群中的虚拟机进行分类,通常可以分为下面的三类:种子虚拟机、普通虚拟机和不可达虚拟机。
种子虚拟机的作用主要是为新加入虚拟机集群的虚拟机提供一个初始的虚拟机列表。不可达虚拟机是指通过虚拟机间的检测机制发现临时不可达的那些虚拟机,包括虚拟机故障、重启中等。普通虚拟机为虚拟机集群中除种子虚拟机和不可达虚拟机之外的虚拟机。
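If virtual machines are classified in this way, the communication pattern can change accordingly: instead of randomly selecting virtual machines from the whole list (three by default), a virtual machine randomly selects one from each of the three categories. This helps ensure the liveness of the seed virtual machines so that newly joined virtual machines can obtain an initial virtual machine list, and it keeps temporarily unreachable virtual machines under monitoring so that other virtual machines learn promptly when they recover. For example, a virtual machine that newly joins the cluster can first obtain a virtual machine list from a seed virtual machine, and can then synchronize state information, management information, and/or configuration information with other virtual machines according to that list.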
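Selecting one communication target from each category can be sketched in a few lines of Python. The function name `pick_targets`, the `rng` parameter, and the example VM names are assumptions used only for illustration.

```python
import random


def pick_targets(seeds, normal, unreachable, rng=random):
    """Pick one random VM from each non-empty category (seed, normal, unreachable)."""
    targets = []
    for group in (seeds, normal, unreachable):
        if group:  # a category may be empty, e.g. when no VM is currently unreachable
            targets.append(rng.choice(group))
    return targets


rng = random.Random(1)  # seeded only to make the example repeatable
targets = pick_targets(["VM-S1", "VM-S2"], ["VM-N1", "VM-N2", "VM-N3"], ["VM-U1"], rng)
```

Contacting an unreachable virtual machine in every round is what lets the cluster notice its recovery quickly; when that category is empty, the round simply has two targets.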
在不对虚拟机集群中的虚拟机进行上述分类的情况下,新加入集群的虚拟机可以通过发送一个广播消息从其它虚拟机获取节点信息,重新建立自己本地的节点列表。虚拟机集群中其它的虚拟机收到此广播消息后,可以向新加入的虚拟机返回自己的状态信息、管理信息和/或配置信息,同时更新本地的状态信息、管理信息和/或配置信息。
图7是根据本发明的实施例的一种计算机系统700的结构示意图。计算机系统包括至少一个计算机节点的物理硬件层,在至少一个计算机节点的物理硬件层之上运行虚拟机集群,虚拟机集群包括N个虚拟机,所述N个虚拟机中的每个虚拟机保存虚拟机列表,并以对等方式作为第一虚拟机进行工作,所述虚拟机列表包括所述N个虚拟机的信息,所述第一虚拟机包括:
发送模块710,用于向所述N个虚拟机中的至少两个邻居虚拟机发送第一心跳消息,以便所述至少两个邻居虚拟机检测所述第一心跳消息,其中所述第一心跳消息的检测结果用于确定所述第一虚拟机的状态,所述第一虚拟机根据所述N个虚拟机的信息与所述至少两个邻居虚拟机建立了邻居关系;
接收模块720,用于检测所述至少两个虚拟机中的第二虚拟机发送的第二心跳消息,其中所述第一虚拟机为所述第二虚拟机的邻居虚拟机,所述第二心跳消息的检测结果和所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果用于确定所述第二虚拟机的状态。
根据本发明的实施例,虚拟机集群中的每个虚拟机可以根据其保存虚拟机列表确定至少两个邻居虚拟机,虚拟机集群中的每个虚拟机可以向其至少两个邻居虚拟机发送心跳消息以便根据至少两个邻居虚拟机的检测结果确定该虚拟机的状态。由于每个虚拟机的状态均可以由其邻居虚拟机检测该虚拟机发送的心跳信息的结果来确定,因此避免了主/从结构存在的Master成为整个集群的瓶颈问题,同时由于不会出现重新选举Master主机的情况,因此,使得采用这种方案进行状态确定的虚拟机集群不会造成故障恢复时间的延迟和争抢资源的情况,因此,提高了虚拟机集群的容错能力和性能。
可选地,作为另一实施例,发送模块710还向至少两个邻居虚拟机发送第一同步信息,第一同步信息用于指示第一虚拟机中保存的虚拟机列表的更新,以便至少两个邻居虚拟机更新各自保存的虚拟机列表,接收模块720还 接收第二虚拟机发送的第二同步信息,第二同步信息用于指示第二虚拟机中保存的虚拟机列表的更新,第一虚拟机还包括更新模块730,用于根据第二同步信息更新第一虚拟机保存的虚拟机列表。
可选地,作为另一实施例,至少两个邻居虚拟机包括N个虚拟机中具备与第一虚拟机直接交互信息的能力的二至六个虚拟机。
可选地,作为另一实施例,计算机系统700包括:触发模块740,用于在确定所述第二虚拟机的状态为故障的情况下,触发所述第二虚拟机重启或触发所述第二虚拟机从所述第二虚拟机的源主机迁移至目标主机,其中所述源主机为故障主机,所述目标主机为正常主机。
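Optionally, as another embodiment, the computer system 700 includes: a triggering module 740, configured to trigger migration of the second virtual machine from the source host on which it is located to a target host when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, where the source host is a faulty host and the target host is a normal host.
Optionally, as another embodiment, the computer system 700 includes: a triggering module 740, configured to trigger deletion of the second virtual machine from the source host on which the second virtual machine is located when the state of the second virtual machine is determined to be departed.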
可选地,作为另一实施例,发送模块710还向N个虚拟机的上层节点发送第二心跳消息的检测结果,以便上层节点根据第二心跳消息的检测结果和第二虚拟机发送给第二虚拟机的其它邻居虚拟机的心跳消息的检测结果确定第二虚拟机的状态,接收模块720还接收上层节点发送的指示消息,指示消息用于指示第二虚拟机的状态。
可选地,作为另一实施例,接收模块720还接收第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果,接收模块720还根据第二心跳消息的检测结果和第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果确定第二虚拟机的状态。
可选地,作为另一实施例,发送模块710还在第一虚拟机加入虚拟机集群时,向N个虚拟机中的其它虚拟机发送第一虚拟机的信息,接收模块720还接收N虚拟机中其它虚拟机发送的各自的信息,以生成虚拟机列表。
可选地,作为另一实施例,发送模块710还在第一虚拟机加入虚拟机集群时,向索引服务器发送注册信息,注册信息包括第一虚拟机的信息,其中所述索引服务器为所述虚拟机集群的注册中心,用于为所述虚拟机集群中的 虚拟机提供注册服务,接收模块720还接收索引服务器发送的N个虚拟机中的其它虚拟机的信息,以生成虚拟机列表。
根据本发明的实施例,选择模块750采用邻居关系算法,根据虚拟机列表中保存的N个虚拟机的信息,从N个虚拟机中选择至少两个作为邻居虚拟机。
根据本发明的实施例,N个虚拟机的信息包括:N个虚拟机中的每个虚拟机的状态信息、N个虚拟机中的每个虚拟机的邻居关系信息、N个虚拟机中的每个虚拟机的启动次数、N个虚拟机中的每个虚拟机的心跳值、N个虚拟机中的每个虚拟机的配置信息以及虚拟机集群的配置信息中的任意一个或多个的组合。
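For the operations and functions of the parts of the computer system 700, refer to the method of FIG. 2 described above; to avoid repetition, details are not described here again.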
图8是根据本发明的实施例的一种计算机系统800的结构示意图。计算机系统包括至少一个计算机节点的物理硬件层,在至少一个计算机节点的物理硬件层之上运行虚拟机集群,虚拟机集群包括N个虚拟机,N个虚拟机中的每个虚拟机保存虚拟机列表,并以对等方式作为第一虚拟机进行工作,虚拟机列表包括N个虚拟机的信息,第一虚拟机包括:处理器810,通过总线840调用和执行存储在存储器850中的代码;发送器820,用于向N个虚拟机中的至少两个邻居虚拟机发送第一心跳消息,以便至少两个邻居虚拟机检测第一心跳消息,其中第一心跳消息的检测结果用于确定第一虚拟机的状态,第一虚拟机根据N个虚拟机的信息与至少两个邻居虚拟机建立了邻居关系;接收器830,用于检测至少两个虚拟机中的第二虚拟机发送的第二心跳消息,其中第一虚拟机为第二虚拟机的邻居虚拟机,第二心跳消息的检测结果和第二虚拟机的其它邻居虚拟机检测第二虚拟机发送的心跳消息得到的检测结果用于确定第二虚拟机的状态,第二虚拟机为N个虚拟机之一。
根据本发明的实施例,虚拟机集群中的每个虚拟机可以根据其保存虚拟机列表确定至少两个邻居虚拟机,虚拟机集群中的每个虚拟机可以向其至少两个邻居虚拟机发送心跳消息以便根据至少两个邻居虚拟机的检测结果确定该虚拟机的状态。由于每个虚拟机的状态均可以由其邻居虚拟机检测该虚拟机发送的心跳信息的结果来确定,因此避免了主/从结构存在的Master成为整个集群的瓶颈问题,同时由于不会出现重新选举Master主机的情况,因 此,使得采用这种方案进行状态确定的虚拟机集群不会造成故障恢复时间的延迟和争抢资源的情况,因此,提高了虚拟机集群的容错能力和性能。
可选地,作为另一实施例,发送器820还向至少两个邻居虚拟机发送第一同步信息,第一同步信息用于指示第一虚拟机中保存的虚拟机列表的更新,以便至少两个邻居虚拟机更新各自保存的虚拟机列表,接收器830还接收第二虚拟机发送的第二同步信息,第二同步信息用于指示第二虚拟机中保存的虚拟机列表的更新,处理器810还根据第二同步信息更新第一虚拟机保存的虚拟机列表。
可选地,作为另一实施例,至少两个邻居虚拟机包括N个虚拟机中具备与第一虚拟机直接交互信息的能力的二至六个虚拟机。
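Optionally, as another embodiment, the processor 810 is further configured to, when the state of the second virtual machine is determined to be faulty, trigger a restart of the second virtual machine or trigger migration of the second virtual machine from its source host to a target host, where the source host is a faulty host and the target host is a normal host.
Optionally, as another embodiment, the processor 810 is further configured to, when the state of the second virtual machine is determined to be faulty and the second virtual machine cannot be restarted, trigger migration of the second virtual machine from the source host on which it is located to a target host, where the source host is a faulty host and the target host is a normal host.
Optionally, as another embodiment, the processor 810 is further configured to, when the state of the second virtual machine is determined to be departed, trigger deletion of the second virtual machine from the source host on which the second virtual machine is located.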
可选地,作为另一实施例,发送器820还向N个虚拟机的上层节点发送第二心跳消息的检测结果,以便上层节点根据第二心跳消息的检测结果和第二虚拟机发送给第二虚拟机的其它邻居虚拟机的心跳消息的检测结果确定第二虚拟机的状态,接收器830还接收上层节点发送的指示消息,指示消息用于指示第二虚拟机的状态。
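Optionally, as another embodiment, the receiver 830 further receives the detection results obtained by the other neighbors of the second virtual machine when they check the heartbeat messages sent by the second virtual machine, and the receiver 830 further determines the state of the second virtual machine from the detection result of the second heartbeat message and from those detection results of the other neighbors.
Optionally, as another embodiment, when the first virtual machine joins the virtual machine cluster, the transmitter 820 further sends information about the first virtual machine to the other virtual machines among the N virtual machines, and the receiver 830 further receives the respective information sent by the other virtual machines among the N virtual machines, to generate the virtual machine list.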
可选地,作为另一实施例,发送器820在第一虚拟机加入虚拟机集群时,向索引服务器发送注册信息,注册信息包括第一虚拟机的信息,其中所述索引服务器为所述虚拟机集群的注册中心,用于为所述虚拟机集群中的虚拟机提供注册服务,接收器830还接收索引服务器发送的N个虚拟机中的其它虚拟机的信息,以生成虚拟机列表。
根据本发明的实施例,处理器810采用邻居关系算法,根据虚拟机列表中保存的N个虚拟机的信息,从N个虚拟机中选择至少两个作为邻居虚拟机。
根据本发明的实施例,N个虚拟机的信息包括:N个虚拟机中的每个虚拟机的状态信息、N个虚拟机中的每个虚拟机的邻居关系信息、N个虚拟机中的每个虚拟机的启动次数、N个虚拟机中的每个虚拟机的心跳值、N个虚拟机中的每个虚拟机的配置信息以及虚拟机集群的配置信息中的任意一个或多个的组合。
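For the operations and functions of the parts of the computer system 800, refer to the method of FIG. 2 described above; to avoid repetition, details are not described here again.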
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合 或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内,因此本发明的保护范围应以权利要求的保护范围为准。

Claims (24)

  1. 一种处理虚拟机集群的方法,其特征在于,所述虚拟机集群包括N个虚拟机,所述N个虚拟机中的每个虚拟机保存虚拟机列表,并以对等方式作为第一虚拟机进行工作,所述虚拟机列表包括所述N个虚拟机的信息,所述方法包括:
    所述第一虚拟机向所述N个虚拟机中的至少两个邻居虚拟机发送第一心跳消息,以便所述至少两个邻居虚拟机检测所述第一心跳消息,其中所述第一心跳消息的检测结果用于确定所述第一虚拟机的状态,所述第一虚拟机根据所述N个虚拟机的信息与所述至少两个邻居虚拟机建立了邻居关系;
    所述第一虚拟机作为所述至少两个虚拟机中的第二虚拟机的邻居虚拟机,检测所述第二虚拟机发送的第二心跳消息,其中所述第二心跳消息的检测结果和所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果用于确定所述第二虚拟机的状态。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    所述第一虚拟机向所述至少两个邻居虚拟机发送第一同步信息,所述第一同步信息用于指示所述第一虚拟机中保存的所述虚拟机列表的更新,以便所述至少两个邻居虚拟机更新各自保存的虚拟机列表;
    所述第一虚拟机接收所述第二虚拟机发送的第二同步信息,所述第二同步信息用于指示所述第二虚拟机中保存的虚拟机列表的更新;
    所述第一虚拟机根据所述第二同步信息更新所述第一虚拟机保存的所述虚拟机列表。
  3. 如权利要求1所述的方法,其特征在于,所述至少两个邻居虚拟机包括所述N个虚拟机中具备与所述第一虚拟机直接交互信息的能力的二至六个虚拟机。
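  4. The method according to any one of claims 1 to 3, further comprising:
    when the first virtual machine determines that the state of the second virtual machine is faulty, triggering, by the first virtual machine, a restart of the second virtual machine or migration of the second virtual machine from a source host of the second virtual machine to a target host, wherein the source host is a faulty host and the target host is a normal host.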
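  5. The method according to any one of claims 1 to 3, further comprising:
    when the first virtual machine determines that the state of the second virtual machine is faulty and that the second virtual machine cannot be restarted, triggering, by the first virtual machine, migration of the second virtual machine from the source host on which the second virtual machine is located to a target host, wherein the source host is a faulty host and the target host is a normal host.
  6. The method according to any one of claims 2 to 5, further comprising:
    when the first virtual machine determines that the state of the second virtual machine is departed, triggering, by the first virtual machine, deletion of the second virtual machine from the source host on which the second virtual machine is located.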
  7. 根据权利要求1至6中的任一项所述的方法,其特征在于,所述方法还包括:
    所述第一虚拟机向所述N个虚拟机的上层节点发送所述第二心跳消息的检测结果,以便所述上层节点根据所述第二心跳消息的检测结果和所述第二虚拟机发送给所述第二虚拟机的其它邻居虚拟机的心跳消息的检测结果确定所述第二虚拟机的状态;
    所述第一虚拟机接收所述上层节点发送的指示消息,所述指示消息用于指示第二虚拟机的状态。
  8. 根据权利要求1至6中的任一项所述的方法,其特征在于,所述方法还包括:
    所述第一虚拟机接收所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果;
    所述第一虚拟机根据所述第二心跳消息的检测结果和所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果确定所述第二虚拟机的状态。
  9. 如权利要求1至8中任一项所述的方法,其特征在于,在所述第一虚拟机加入所述虚拟机集群时,所述方法还包括:
    所述第一虚拟机向所述N个虚拟机中的其它虚拟机发送所述第一虚拟机的信息;
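    receiving, by the first virtual machine, the respective information sent by the other virtual machines among the N virtual machines, to generate the virtual machine list.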
  10. 如权利要求1至8中的任一项所述的方法,其特征在于,在所述第一虚拟机加入所述虚拟机集群时,所述方法还包括:
    所述第一虚拟机向索引服务器发送注册信息,所述注册信息包括所述第一虚拟机的信息,其中所述索引服务器为所述虚拟机集群的注册中心,用于为所述虚拟机集群中的虚拟机提供注册服务;
    所述第一虚拟机接收所述索引服务器发送的所述N个虚拟机中的其它虚拟机的信息,以生成所述虚拟机列表。
  11. 如权利要求1至10中的任一项所述的方法,其特征在于,还包括:
    所述第一虚拟机采用邻居关系算法,根据所述虚拟机列表中保存的所述N个虚拟机的信息,从所述N个虚拟机中选择至少两个作为邻居虚拟机。
  12. 如权利要求1至11中的任一项所述的方法,其特征在于,所述N个虚拟机的信息包括:所述N个虚拟机中的每个虚拟机的状态信息、所述N个虚拟机中的每个虚拟机的邻居关系信息、所述N个虚拟机中的每个虚拟机的启动次数、所述N个虚拟机中的每个虚拟机的心跳值、所述N个虚拟机中的每个虚拟机的配置信息以及所述虚拟机集群的配置信息中的任意一个或多个的组合。
  13. 一种计算机系统,其特征在于,所述计算机系统包括至少一个计算机节点的物理硬件层,在所述至少一个计算机节点的物理硬件层之上运行虚拟机集群,所述虚拟机集群包括N个虚拟机,所述N个虚拟机中的每个虚拟机保存虚拟机列表,并以对等方式作为第一虚拟机进行工作,所述虚拟机列表包括所述N个虚拟机的信息,所述第一虚拟机包括:
    发送模块,用于向所述N个虚拟机中的至少两个邻居虚拟机发送第一心跳消息,以便所述至少两个邻居虚拟机检测所述第一心跳消息,其中所述第一心跳消息的检测结果用于确定所述第一虚拟机的状态,所述第一虚拟机根据所述N个虚拟机的信息与所述至少两个邻居虚拟机建立了邻居关系;
    接收模块,用于检测所述至少两个虚拟机中的第二虚拟机发送的第二心跳消息,其中所述第一虚拟机为所述第二虚拟机的邻居虚拟机,所述第二心跳消息的检测结果和所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果用于确定所述第二虚拟机的状态。
  14. 如权利要求13所述的计算机系统,其特征在于,所述发送模块还向所述至少两个邻居虚拟机发送第一同步信息,所述第一同步信息用于指示所述第一虚拟机中保存的所述虚拟机列表的更新,以便所述至少两个邻居虚拟机更新各自保存的虚拟机列表,所述接收模块还接收所述第二虚拟机发送的第二同步信息,所述第二同步信息用于指示所述第二虚拟机中保存的虚拟机列表的更新,所述第一虚拟机还包括更新模块,用于根据所述第二同步信息更新所述第一虚拟机保存的所述虚拟机列表。
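  15. The computer system according to claim 13 or 14, wherein the at least two neighbor virtual machines comprise two to six of the N virtual machines that are capable of exchanging information directly with the first virtual machine.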
  16. 根据权利要求13至15中的任一项所述计算机系统,其特征在于,还包括:
    触发模块,用于在确定所述第二虚拟机的状态为故障的情况下,触发所述第二虚拟机重启或触发所述第二虚拟机从所述第二虚拟机的源主机迁移至目标主机,其中所述源主机为故障主机,所述目标主机为正常主机。
  17. 根据权利要求13至15中的任一项所述计算机系统,其特征在于,还包括:
    触发模块,用于在确定所述第二虚拟机的状态为故障且无法重启的情况下,触发所述第二虚拟机从所述第二虚拟机所在的源主机迁移至目标主机,其中所述源主机为故障主机,所述目标主机为正常主机。
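  18. The computer system according to any one of claims 13 to 17, further comprising:
    a triggering module, configured to trigger deletion of the second virtual machine from the source host on which the second virtual machine is located when the state of the second virtual machine is determined to be departed.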
  19. 根据权利要求13至18中的任一项所述的计算机系统,其特征在于,所述发送模块还向所述N个虚拟机的上层节点发送所述第二心跳消息的检测结果,以便所述上层节点根据所述第二心跳消息的检测结果和所述第二虚拟机发送给所述第二虚拟机的其它邻居虚拟机的心跳消息的检测结果确定所述第二虚拟机的状态,所述接收模块还接收所述上层节点发送的指示消息,所述指示消息用于指示第二虚拟机的状态。
  20. 根据权利要求13至18中的任一项所述的计算机系统,其特征在于,所述接收模块还接收所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果,所述接收模块还根据所述第二心跳消息的检测结果和所述第二虚拟机的其它邻居虚拟机检测所述第二虚拟机发送的心跳消息得到的检测结果确定所述第二虚拟机的状态。
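  21. The computer system according to any one of claims 13 to 20, wherein when the first virtual machine joins the virtual machine cluster, the sending module further sends information about the first virtual machine to the other virtual machines among the N virtual machines, and the receiving module further receives the respective information sent by the other virtual machines among the N virtual machines, to generate the virtual machine list.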
  22. 如权利要求13至20中的任一项所述的计算机系统,其特征在于,所述发送模块还在所述第一虚拟机加入所述虚拟机集群时,向索引服务器发送注册信息,所述注册信息包括所述第一虚拟机的信息,其中所述索引服务器为所述虚拟机集群的注册中心,用于为所述虚拟机集群中的虚拟机提供注册服务,所述接收模块还接收所述索引服务器发送的所述N个虚拟机中的其它虚拟机的信息,以生成所述虚拟机列表。
  23. 如权利要求13至22中的任一项所述的计算机系统,其特征在于,还包括:
    选择模块,用于采用邻居关系算法,根据所述虚拟机列表中保存的所述N个虚拟机的信息,从所述N个虚拟机中选择至少两个作为邻居虚拟机。
  24. 如权利要求13至23中的任一项所述的计算机系统,其特征在于,所述N个虚拟机的信息包括:所述N个虚拟机中的每个虚拟机的状态信息、所述N个虚拟机中的每个虚拟机的邻居关系信息、所述N个虚拟机中的每个虚拟机的启动次数、所述N个虚拟机中的每个虚拟机的心跳值、所述N个虚拟机中的每个虚拟机的配置信息以及所述虚拟机集群的配置信息中的任意一个或多个的组合。
PCT/CN2015/095654 2015-05-14 2015-11-26 处理虚拟机集群的方法和计算机系统 WO2016180005A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP15891701.3A EP3291487B1 (en) 2015-05-14 2015-11-26 Method for processing virtual machine cluster and computer system
US15/812,747 US10728099B2 (en) 2015-05-14 2017-11-14 Method for processing virtual machine cluster and computer system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510244239.8 2015-05-14
CN201510244239.8A CN106302569B (zh) 2015-05-14 2015-05-14 处理虚拟机集群的方法和计算机系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/812,747 Continuation US10728099B2 (en) 2015-05-14 2017-11-14 Method for processing virtual machine cluster and computer system

Publications (1)

Publication Number Publication Date
WO2016180005A1 true WO2016180005A1 (zh) 2016-11-17

Family

ID=57247781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095654 WO2016180005A1 (zh) 2015-05-14 2015-11-26 处理虚拟机集群的方法和计算机系统

Country Status (4)

Country Link
US (1) US10728099B2 (zh)
EP (1) EP3291487B1 (zh)
CN (1) CN106302569B (zh)
WO (1) WO2016180005A1 (zh)

Also Published As

Publication number Publication date
EP3291487B1 (en) 2019-10-02
US10728099B2 (en) 2020-07-28
US20180097701A1 (en) 2018-04-05
CN106302569A (zh) 2017-01-04
EP3291487A1 (en) 2018-03-07
CN106302569B (zh) 2019-06-18
EP3291487A4 (en) 2018-04-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15891701

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2015891701

Country of ref document: EP