CN102355369B - Virtual clustered system as well as processing method and processing device thereof - Google Patents


Info

Publication number
CN102355369B
Authority
CN
China
Prior art keywords
node
host node
virtual machine
slave node
common host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110301796.0A
Other languages
Chinese (zh)
Other versions
CN102355369A (en)
Inventor
江滢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201110301796.0A
Publication of CN102355369A
Priority to PCT/CN2012/082196 (WO2013044828A1)
Application granted
Publication of CN102355369B
Expired - Fee Related

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08: Configuration management of networks or network elements
    • H04L41/0803: Configuration setting
    • H04L41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0836: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability, to enhance reliability, e.g. reduce downtime

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a virtual cluster system, and a processing method and a processing device thereof. The system comprises at least two partitions (subregions), each comprising one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes in different partitions; a star architecture is used between the host node and the slave nodes within each partition; the host nodes comprise one management host node and at least one common host node, wherein the management host node is configured to reselect a new common host node or slave node in the partition of the failed common host node or slave node when a common host node or slave node fails, or to restart the virtual machine when a virtual machine on a common host node or slave node is faulty. According to the embodiments of the invention, the scalability and availability of the system can be improved.

Description

Virtual cluster system, and processing method and device thereof
Technical field
The present invention relates to network communications technology, and in particular to a virtual cluster system and a processing method and device thereof.
Background
Cluster systems offer powerful overall computing, storage, and management performance, present a single-system-image service form, and provide availability guarantees and fault tolerance that are transparent to users; they have therefore become the mainstream infrastructure architecture of data centers. The application of virtualization technology gives cluster development a better and more promising direction. Virtualization technology allows one platform to run multiple operating systems simultaneously, with applications running in separate spaces without affecting one another, which significantly improves the working efficiency of the computer. Running multiple virtual machines makes full use of the computing potential of a physical server and gives the data center a fast response capability.
After virtualization technology is introduced, scalability and high availability are the greatest challenges that a cluster system faces.
Summary of the invention
The embodiments of the present invention provide a virtual cluster system, and a processing method and device thereof, which improve the scalability and availability of a virtual machine cluster system.
An embodiment of the present invention provides a processing method for a virtual cluster system, comprising:
A node determines whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty;
After determining that a common host node has failed, the node reselects a new common host node; after determining that a slave node has failed, the node reselects a new slave node; or, after determining that a virtual machine is faulty, the node restarts the virtual machine;
Wherein the common host nodes and slave nodes are divided into at least two subregions, and each subregion comprises one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes in different subregions; a star architecture is used between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
An embodiment of the present invention provides a processing device for a virtual cluster system, comprising:
A judging unit, configured to determine whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty;
A processing unit, configured to reselect a new common host node after it is determined that a common host node has failed; to reselect a new slave node after it is determined that a slave node has failed; or to restart the virtual machine after it is determined that a virtual machine is faulty;
Wherein the common host nodes and slave nodes are divided into at least two subregions, and each subregion comprises one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes in different subregions; a star architecture is used between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
An embodiment of the present invention provides a virtual cluster system, comprising:
At least two subregions, each comprising one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node;
A peer-to-peer architecture is used between the host nodes in different subregions;
A star architecture is used between the host node and the slave nodes within each subregion;
The host nodes comprise one management host node and at least one common host node; the management host node is configured to, after a common host node or slave node fails, reselect a new common host node or slave node in the subregion where the failed common host node or slave node is located, or, when a virtual machine on a common host node or slave node is faulty, restart the virtual machine.
As can be seen from the above technical solutions, the virtual cluster system of the embodiments of the present invention is divided into subregions, so the system can be expanded by adding subregions; a peer-to-peer structure is used between the host nodes of the subregions, which can eliminate the bottleneck problem and improve reliability; and reselecting a new host node or slave node, or restarting a virtual machine, can further improve reliability.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system structure of the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method of the first embodiment of the present invention;
Fig. 3 is a schematic diagram of the device structure of the first embodiment of the present invention;
Fig. 4 is a schematic flowchart of the method of the second embodiment of the present invention;
Fig. 5 is a schematic diagram of the system structure of the second embodiment of the present invention;
Fig. 6 is a schematic flowchart of the method of the third embodiment of the present invention;
Fig. 7 is a schematic diagram of the system structure of the third embodiment of the present invention;
Fig. 8 is a schematic flowchart of the method of the fourth embodiment of the present invention;
Fig. 9 is a schematic diagram of the system structure of the fourth embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the system structure of the first embodiment of the present invention. Referring to Fig. 1, the system comprises at least two subregions 1, and each subregion comprises one host node (master) 11 and at least one slave node (slave) 12; at least one virtual machine (Virtual Machine, VM) 13 is deployed on each host node 11 and on each slave node 12.
For example, referring to Fig. 1, the host nodes include host node A, host node B, host node C, and so on. The slave nodes of the subregion of host node A include slave node a1, slave node a2, and so on; the slave nodes of the subregion of host node B include slave node b1, slave node b2, and so on; and the slave nodes of the subregion of host node C include slave node c1, slave node c2, and so on.
A peer-to-peer architecture is used between the host nodes 11 in different subregions: a host node can send resource state information to any other host node, and can also receive the resource state information sent by any other host node. A star architecture is used between the host node 11 and the slave nodes 12 within each subregion; that is, a slave node can send resource state information to the host node, but the host node does not send resource state information to the slave nodes. The resource state information can indicate whether the corresponding node is normal or has failed.
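The reporting rule described above can be sketched as a small predicate. This is only an illustrative sketch, not part of the patent: the function name `may_send_resource_state` and the `host_of` mapping (node ID to the ID of the host node of its subregion) are assumptions introduced for the example.

```python
def may_send_resource_state(sender: str, receiver: str, host_of: dict[str, str]) -> bool:
    """Return True if `sender` may report resource state to `receiver`.

    `host_of` (hypothetical) maps every node ID to the ID of the host node
    of its subregion; a host node maps to itself.
    """
    if sender == receiver:
        return False
    sender_is_host = host_of[sender] == sender
    receiver_is_host = host_of[receiver] == receiver
    if sender_is_host and receiver_is_host:
        # Peer-to-peer architecture: any host node may report to any other.
        return True
    if not sender_is_host and receiver_is_host:
        # Star architecture: a slave node reports only to its own host node.
        return host_of[sender] == receiver
    # Host nodes do not report to slave nodes, and slave nodes do not
    # report to one another.
    return False


# Example topology with two subregions (node names follow Fig. 1).
host_of = {"A": "A", "B": "B", "a1": "A", "a2": "A", "b1": "B"}
```

With this topology, `may_send_resource_state("a1", "A", host_of)` is allowed, while the reverse direction from A to a1 is not.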
The host nodes comprise one management host node (master leader) and at least one common host node. The management host node is configured to, after a common host node or slave node fails, reselect a new common host node or slave node in the subregion where the failed common host node or slave node is located, or, when a virtual machine on a common host node or slave node is faulty, restart the virtual machine.
One of the host nodes can be set in advance as the management host node, and the remaining host nodes are common host nodes. The management host node can store information about every host node and slave node and about the virtual machines on those nodes, manage all the nodes in the subregions in a unified way, and handle faults centrally when they occur. For example, referring to Fig. 1, host node C can be set as the management host node, and host node A, host node B, and so on are common host nodes.
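The structural constraints of the first embodiment can be written down as a small data model. The sketch below is illustrative only: the class names `Node`, `Subregion`, and `Cluster`, the field names, and the VM names are assumptions, and the model checks nothing beyond the constraints stated above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    vms: list[str] = field(default_factory=list)  # VMs deployed on this node


@dataclass
class Subregion:
    host: Node          # exactly one host node per subregion
    slaves: list[Node]  # at least one slave node per subregion


@dataclass
class Cluster:
    subregions: list[Subregion]
    management_host_id: str  # the host node preset as management host node

    def validate(self) -> None:
        """Raise ValueError if a structural constraint is violated."""
        if len(self.subregions) < 2:
            raise ValueError("at least two subregions are required")
        for sub in self.subregions:
            if not sub.slaves:
                raise ValueError(f"subregion of {sub.host.node_id} has no slave node")
            for node in [sub.host, *sub.slaves]:
                if not node.vms:
                    raise ValueError(f"node {node.node_id} has no virtual machine")
        host_ids = {sub.host.node_id for sub in self.subregions}
        if self.management_host_id not in host_ids:
            raise ValueError("the management host node must be one of the host nodes")

    def common_host_ids(self) -> list[str]:
        """All host nodes other than the management host node."""
        return [
            sub.host.node_id
            for sub in self.subregions
            if sub.host.node_id != self.management_host_id
        ]


# The example of Fig. 1: host node C is the management host node.
cluster = Cluster(
    subregions=[
        Subregion(Node("A", ["vm-A"]), [Node("a1", ["vm-a1"]), Node("a2", ["vm-a2"])]),
        Subregion(Node("B", ["vm-B"]), [Node("b1", ["vm-b1"]), Node("b2", ["vm-b2"])]),
        Subregion(Node("C", ["vm-C"]), [Node("c1", ["vm-c1"]), Node("c2", ["vm-c2"])]),
    ],
    management_host_id="C",
)
cluster.validate()
```

Because every subregion has exactly one host node and only one host node is the management host node, requiring at least two subregions also guarantees at least one common host node.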
Corresponding to the above system, the flow between the devices may be as follows.
Fig. 2 is a schematic flowchart of the method of the first embodiment of the present invention, comprising:
Step 21: A node determines whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty;
Step 22: After determining that a common host node has failed, the node reselects a new common host node; after determining that a slave node has failed, the node reselects a new slave node; or, after determining that a virtual machine is faulty, the node restarts the virtual machine;
Wherein the common host nodes and slave nodes are divided into at least two subregions, and each subregion comprises one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes in different subregions; a star architecture is used between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
The above node may specifically be a common host node, the management host node, or a slave node. Depending on the node and on the scenario, the above flow can have different implementations; for details, refer to the subsequent embodiments.
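Steps 21 and 22 amount to mapping each detected condition to one recovery action. A minimal sketch of that mapping follows; the enum `Failure`, the function `choose_action`, and the action strings are names invented for illustration.

```python
from enum import Enum


class Failure(Enum):
    COMMON_HOST_NODE = "common host node failed"
    SLAVE_NODE = "slave node failed"
    VIRTUAL_MACHINE = "virtual machine faulty"


def choose_action(failure: Failure) -> str:
    """Map the condition found in Step 21 to the action taken in Step 22."""
    actions = {
        Failure.COMMON_HOST_NODE: "reselect common host node",
        Failure.SLAVE_NODE: "reselect slave node",
        Failure.VIRTUAL_MACHINE: "restart virtual machine",
    }
    return actions[failure]
```

The second, third, and fourth embodiments each elaborate one of these three branches.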
Accordingly, the device corresponding to this method may be as described below.
Fig. 3 is a schematic diagram of the device structure of the first embodiment of the present invention, comprising a judging unit 31 and a processing unit 32. The judging unit 31 is configured to determine whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty. The processing unit 32 is configured to reselect a new common host node after it is determined that a common host node has failed; to reselect a new slave node after it is determined that a slave node has failed; or to restart the virtual machine after it is determined that a virtual machine is faulty;
Wherein the common host nodes and slave nodes are divided into at least two subregions, and each subregion comprises one host node and at least one slave node; at least one virtual machine is deployed on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes in different subregions; a star architecture is used between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
Corresponding to the above method flow, the above device may be a common host node, the management host node, or a slave node; the specific functions of the above units differ with the node and the scenario. For details, refer to the following embodiments.
The virtual cluster system of the embodiments of the present invention is divided into subregions, so the system can be expanded by adding subregions; a peer-to-peer structure is used between the host nodes of the subregions, which can eliminate the bottleneck problem and improve reliability; and reselecting a new host node or slave node, or restarting a virtual machine, can further improve reliability.
Fig. 4 is a schematic flowchart of the method of the second embodiment of the present invention, and Fig. 5 is a schematic diagram of the system structure of the second embodiment. The present embodiment takes the failure of a common host node as an example.
Referring to Fig. 4, the present embodiment comprises:
Step 41: While the cluster is running normally, the common host nodes of the subregions detect one another's heartbeats through their heartbeat detection modules (heartbeatsync).
For example, the heartbeat detection module of common host node A sends heartbeat messages to the heartbeat detection module of common host node B.
Step 42: If the heartbeat detection module of common host node B detects that the heartbeat of common host node A has stopped, it multicasts a failure message that carries the identification information of common host node A, to indicate that common host node A has failed.
Common host node B determines that the heartbeat of common host node A has stopped when it does not receive a heartbeat message from common host node A within a certain period of time.
The identification information is used to distinguish the nodes; for example, it may be the ID or the address of common host node A.
All the other common host nodes and the management host node can receive this failure message.
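The timeout rule of Step 42 can be illustrated with a minimal monitor. This is a sketch under assumptions: the class `HeartbeatMonitor`, the `timeout_s` parameter, and the dictionary layout of the failure message are invented for the example; the patent only states that the heartbeat is considered stopped when no heartbeat message arrives "within a certain period of time". Timestamps are passed in explicitly so that the logic does not depend on a real clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def record(self, node_id: str, now: float) -> None:
        """Record that a heartbeat message from `node_id` arrived at time `now`."""
        self.last_seen[node_id] = now

    def stopped(self, now: float) -> list[str]:
        """Return the IDs of nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


def failure_message(failed_node_id: str) -> dict[str, str]:
    """Build the failure message to multicast; it carries the failed node's ID."""
    return {"type": "failure", "failed_node_id": failed_node_id}


# Host node B watches host nodes A and C with a 3-second timeout.
monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("A", now=0.0)
monitor.record("C", now=2.0)
messages = [failure_message(node_id) for node_id in monitor.stopped(now=4.0)]
```

At time 4.0 only A has been silent for longer than the timeout, so `messages` holds a single failure message for A.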
Step 43: After the heartbeat detection module of the management host node receives the failure message, it reports a host node failure message to the high availability (High Availability, HA) module of the management host node; the host node failure message carries the identification information of common host node A.
Step 44: The HA module of the management host node reselects one slave node in the subregion where common host node A is located as the new common host node of that subregion.
For example, according to the ID priority and the dynamic load of each slave node, slave node a1 in the subregion of A is elected as the new common host node.
Step 45: The HA module of the management host node sends a virtual machine migration request to the resource management module (ResourceMgmt) of the management host node; the request carries the identification information of the new common host node a1 and the identification information of common host node A.
Step 46: The resource management module of the management host node migrates the virtual machines on common host node A to the new common host node a1.
For example, the configuration information of the virtual machines on common host node A is sent to the new common host node a1, and a1 is instructed to run this configuration information again to restart the corresponding virtual machines. The configuration information of a virtual machine is the information from which the virtual machine can be started, for example the virtual machine software; executing that software starts the virtual machine.
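Steps 44 to 46 can be sketched as two small functions. The patent says only that reselection considers ID priority and dynamic load; the concrete ordering used here (lowest load first, ties broken by the smallest ID) is an assumption, as are the function names `reselect_host` and `migrate_vms` and the dictionary layout of the VM configuration information.

```python
def reselect_host(slave_loads: dict[str, float]) -> str:
    """Pick the new common host node among the slave nodes of the subregion.

    Assumed policy: lowest dynamic load wins; equal loads are resolved by
    the smallest node ID.
    """
    if not slave_loads:
        raise ValueError("no slave node available for reselection")
    return min(slave_loads, key=lambda node_id: (slave_loads[node_id], node_id))


def migrate_vms(
    vm_configs: dict[str, list[dict]], failed_id: str, new_id: str
) -> list[dict]:
    """Move the VM configuration entries of the failed node to the new node.

    Returns the moved entries, which the new node would then run again to
    restart the corresponding virtual machines.
    """
    moved = vm_configs.pop(failed_id, [])
    vm_configs.setdefault(new_id, []).extend(moved)
    return moved


# Host node A has failed; its subregion has slave nodes a1 and a2.
loads = {"a1": 0.3, "a2": 0.6}
configs = {"A": [{"vm": "vm-A-1"}], "a1": [{"vm": "vm-a1-1"}]}
new_host = reselect_host(loads)
moved = migrate_vms(configs, failed_id="A", new_id=new_host)
```

Here a1 is selected, and the configuration entry that belonged to A ends up in the list held for a1.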
Further, after the new common host node joins, the host nodes also update their member relations:
Step 47: The new common host node multicasts a join request to the other common host nodes. After the heartbeat detection module of each of the other common host nodes detects this join request, it sends a member relation update request to the corresponding member management module (MembershipMgmt); the member relation update request carries the identification information of the new common host node and the identification information of the failed common host node.
For example, after common host node B receives the join request multicast by the new common host node a1, the heartbeat detection module of common host node B sends a member relation update request to the member management module of common host node B; this message carries the identification information of A and the identification information of a1.
Step 48: The member management module updates the member relation list.
For example, the identification information of the new common host node a1 is added to the member list, and the identification information of the failed common host node A is deleted.
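The list update of Step 48 is small enough to show in full. This is an illustrative sketch; the function name `update_membership` and the use of a plain list of node IDs are assumptions.

```python
def update_membership(members: list[str], new_id: str, failed_id: str) -> list[str]:
    """Return a member list with `failed_id` removed and `new_id` added.

    The input list is left unchanged; adding an ID that is already present
    has no effect, so a repeated update request is harmless.
    """
    updated = [member for member in members if member != failed_id]
    if new_id not in updated:
        updated.append(new_id)
    return updated


# Host node B replaces failed host node A with the new host node a1.
members_on_b = update_membership(["A", "B", "C"], new_id="a1", failed_id="A")
```

After the call, `members_on_b` lists B, C, and a1.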
With reference to the above flow, the corresponding modules may be as follows:
Referring to Fig. 5, the present embodiment involves a common host node 51 and a management host node 52. For the common host node, the judging unit is specifically a first heartbeat detection module (Heartbeat Sync) 511, and the processing unit is specifically a first member management module (MembershipMgmt) 512. For the management host node, the judging unit is specifically a second heartbeat detection module 521, and the processing unit specifically comprises a first high availability (HA) module 522 and a first resource management module (ResourceMgmt) 523.
The first heartbeat detection module 511 is configured to, after detecting that the heartbeat of any other common host node has stopped, determine that a failed common host node exists and that the common host node whose heartbeat has stopped is the failed common host node;
The first member management module 512 is configured to receive a first member relation request message that carries the identification information of the new common host node and the identification information of the failed common host node, add the identification information of the new common host node to a first member relation list, and delete the identification information of the failed common host node from the first member relation list;
Wherein the new common host node is obtained by the management host node, after it receives a first failure message, by reselecting among the slave nodes in the subregion of the failed common host node; the first failure message is sent by the common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node.
The second heartbeat detection module 521 is configured to determine, after receiving the first failure message, that a failed common host node exists; the first failure message is sent by a common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node;
The first high availability module 522 is configured to receive a host node failure message that carries the identification information of the failed common host node, reselect a new common host node from the slave nodes in the subregion of the failed common host node, and send a first virtual machine migration request that carries the identification information of the new common host node and the identification information of the failed common host node; the host node failure message is sent after the first failure message is received;
The first resource management module 523 is configured to, according to the first virtual machine migration request, send the identification information of the virtual machines on the failed common host node to the new common host node and restart the virtual machines.
The present embodiment achieves scalability of the cluster system through subregions. By using a peer-to-peer architecture between the host nodes, the failure of a host node can be learned promptly and a new host node can be reselected, which improves availability.
Fig. 6 is a schematic flowchart of the method of the third embodiment of the present invention, and Fig. 7 is a schematic diagram of the system structure of the third embodiment. The present embodiment takes the failure of a slave node as an example.
Referring to Fig. 6, the present embodiment comprises:
Step 601: While the cluster is running normally, the slave nodes of each subregion send heartbeat messages, through their heartbeat detection modules, to the heartbeat detection module of the common host node of their subregion.
For example, the heartbeat detection module of slave node a1 sends heartbeat messages to the heartbeat detection module of common host node A of its subregion.
Step 602: If the heartbeat detection module of common host node A detects that the heartbeat of slave node a1 has stopped, it sends a heartbeat detection message to another slave node in the subregion.
For example, if common host node A does not detect a heartbeat message from slave node a1 within a set time, common host node A determines that the heartbeat of slave node a1 has stopped and sends a heartbeat detection message to another slave node a2 of its subregion; the heartbeat detection message carries the identification information of slave node a1.
Step 603: Slave node a2 checks the heartbeat of slave node a1.
For example, slave node a2 sends a ping message to slave node a1; if it does not receive a response message from slave node a1, the heartbeat of slave node a1 has stopped.
Step 604: Slave node a2 sends to common host node A the heartbeat detection result for slave node a1.
Step 605: If the heartbeat detection result also shows that the heartbeat of slave node a1 has stopped, common host node A multicasts a failure message that carries the identification information of slave node a1.
All the other common host nodes and the management host node can receive this failure message.
Step 606: After the heartbeat detection module of the management host node receives this failure message, it sends a slave node failure message to the HA module of the management host node; the slave node failure message carries the identification information of the failed slave node a1.
Step 607: The HA module of the management host node elects, in the subregion where slave node a1 is located, another slave node as the slave node to which the virtual machines are migrated.
The other slave node can likewise be selected according to priority, load, and so on.
Step 608: The HA module of the management host node sends a virtual machine migration request to the resource management module of the management host node; the request carries the identification information of the new slave node and the identification information of the failed slave node.
For example, if the reselected slave node is a2, the virtual machine migration request carries the identification information of a1 and the identification information of a2.
Step 609: The resource management module of the management host node migrates the virtual machines on slave node a1 to slave node a2.
For example, the configuration information of the virtual machines on slave node a1 is sent to slave node a2, and a2 is instructed to run this configuration information again to restart the corresponding virtual machines. The configuration information of a virtual machine is the information from which the virtual machine can be started, for example the virtual machine software; executing that software starts the virtual machine.
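Steps 602 to 605 above describe a two-stage check: the host node declares a slave node failed only after a second slave node has also failed to reach it. A minimal sketch of that decision follows; the function name `confirm_slave_failure` and the two callables standing in for the heartbeat check and the peer ping are assumptions made for illustration.

```python
from typing import Callable


def confirm_slave_failure(
    suspect_id: str,
    heartbeat_seen: Callable[[str], bool],
    peer_ping_ok: Callable[[str], bool],
) -> bool:
    """Return True only if both checks agree that the suspect has failed.

    heartbeat_seen: the host node's own view (heartbeat received in time?).
    peer_ping_ok:   the result of another slave node pinging the suspect.
    """
    if heartbeat_seen(suspect_id):
        return False  # heartbeat still arriving, nothing to confirm
    # The host node missed the heartbeat; ask another slave node to ping.
    return not peer_ping_ok(suspect_id)


# a1 is silent towards host node A and does not answer a2's ping.
a1_failed = confirm_slave_failure(
    "a1",
    heartbeat_seen=lambda node_id: False,
    peer_ping_ok=lambda node_id: False,
)
```

If the peer's ping had been answered, the function would return False and, following Step 605, no failure message would be multicast.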
Further, the failed slave node can perform the following actions:
Step 610: After slave node a1 finds that its own heartbeat messages are being lost, it pings the gateway, that is, it sends a ping message to its own gateway.
Step 611: If the ping fails, that is, no response message corresponding to the ping message is received, slave node a1 powers itself off.
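Steps 610 and 611 describe a self-check on the failed slave node. The sketch below shows the decision only; the function name `self_check` and the returned strings are illustrative, and the gateway ping is passed in as a callable rather than performed over the network. The patent does not say what the node does when the ping succeeds; returning "keep running" in that case is an assumption.

```python
from typing import Callable


def self_check(heartbeat_lost: bool, ping_gateway: Callable[[], bool]) -> str:
    """Decide what a slave node does after noticing its heartbeats are lost."""
    if not heartbeat_lost:
        return "keep running"
    if ping_gateway():
        # Assumption: the gateway is reachable, so the node stays up.
        return "keep running"
    # Step 611: no response to the ping, so the node powers itself off.
    return "power off"


# The node has lost its heartbeats and cannot reach its gateway.
decision = self_check(heartbeat_lost=True, ping_gateway=lambda: False)
```

Powering off an isolated node in this way keeps it from running virtual machines that are being restarted on another slave node.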
With reference to the above flow, the corresponding modules may be as follows:
Referring to Fig. 7, the present embodiment involves a common host node 71, a management host node 72, and a slave node 73. For the common host node, the judging unit and the processing unit are the same module, specifically a third heartbeat detection module 711. For the management host node, the judging unit is specifically a fourth heartbeat detection module 721, and the processing unit specifically comprises a second high availability module 722 and a second resource management module (ResourceMgmt) 723. For the slave node, the judging unit and the processing unit are the same module, specifically a fifth heartbeat detection module 731.
The third heartbeat detection module 711 is configured to, after detecting that the heartbeat of any slave node in the subregion of the common host node has stopped, determine that a failed slave node exists and that the slave node whose heartbeat has stopped is the failed slave node;
The fourth heartbeat detection module 721 is configured to determine, after receiving a second failure message, that a failed slave node exists; the second failure message is sent by the common host node after it determines that a failed slave node exists, and carries the identification information of the failed slave node;
The second high availability module 722 is configured to receive a slave node failure message that carries the identification information of the failed slave node, reselect a new slave node in the subregion of the failed slave node, and send a second virtual machine migration request that carries the identification information of the new slave node and the identification information of the failed slave node; the slave node failure message is sent after the second failure message is received;
The second resource management module 723 is configured to, according to the second virtual machine migration request, send the identification information of the virtual machines on the failed slave node to the new slave node and restart the virtual machines.
The fifth heartbeat detection module 731 is configured to send heartbeat messages while the slave node has not failed and to send none once it has failed, so that the common host node of the subregion can determine from the heartbeat messages whether the slave node has failed, and to power the node off when the slave node itself is the failed one. Alternatively, when the slave node itself has not failed and receives a detection request, the module detects whether the corresponding slave node has failed and notifies the common host node of the detection result, so that the common host node can proceed with reselecting a slave node. The detection request is sent by the common host node after it has not received a heartbeat message from some slave node within a certain period of time, and carries the identification information of the slave node whose heartbeat has stopped.
The present embodiment achieves scalability of the cluster system through subregions. By using a star architecture between the slave nodes and the host node, the host node can, after a slave node fails, promptly migrate the virtual machines on the failed slave node, which improves availability.
Fig. 8 is a schematic flowchart of the method of the fourth embodiment of the present invention, and Fig. 9 is a schematic structural diagram of the system of the fourth embodiment. This embodiment takes a virtual machine failure as an example.
Referring to Fig. 8, this embodiment comprises:
Step 81: During normal operation of the cluster, the virtual machine proxy module on each node sends heartbeat messages to the heartbeat detection module of the node on which it resides.
For example, the virtual machine proxy module of a slave node sends heartbeat messages to the heartbeat detection module of that slave node.
Step 82: If the heartbeat detection module of the slave node detects that the heartbeat of a virtual machine has stopped, it sends a failure message to the common host node of its partition.
For example, if the heartbeat detection module on the slave node does not receive, within a certain period, a heartbeat message from the virtual machine proxy module on that node, it determines that the heartbeat of the corresponding virtual machine has stopped.
Step 83: After receiving the failure message, the common host node multicasts a failure message that carries the identification information of the failed virtual machine.
The above takes a virtual machine failure on a slave node as an example. When a virtual machine on a host node fails, the heartbeat detection module on that host node, after receiving no heartbeat message from the virtual machine proxy module within a certain period, determines that the virtual machine on the host node has failed and multicasts the failure message.
The failure message can be received by the other common host nodes and by the management host node.
Step 84: After receiving the failure message, the heartbeat detection module of the management host node sends a virtual machine failure message to the HA module of the management host node; the virtual machine failure message carries the identification information of the failed virtual machine.
Step 85: The HA module of the management host node sends a restart-virtual-machine request to the resource management module of the management host node; the request carries the identification information of the failed virtual machine.
Step 86: The resource management module of the management host node restarts the virtual machine.
For example, the configuration information of the failed virtual machine is reissued to the node where the virtual machine resides, and that node is instructed to rerun the configuration information to restart the virtual machine. Alternatively, the management host node reselects a node as the target node according to priority, load, and the like, then sends the configuration information of the failed virtual machine to the target node and instructs it to rerun the configuration information to restart the virtual machine. Specifically, it may be the resource management module of the target node that reruns the configuration information.
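The choice of restart target in step 86 can be illustrated with a short sketch. The text names two alternatives (reissue the configuration to the node where the virtual machine resides, or reselect a node by priority, load, and the like) without fixing a policy, so the combination below is an assumption: prefer the current node when it is alive and not overloaded, otherwise pick the best remaining node. The `Node` fields, the `max_load` threshold, and the convention that a lower `priority` value is preferred are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int       # assumption: a lower value means a preferred node
    load: float         # assumption: fraction of capacity in use, 0.0 to 1.0
    alive: bool = True


def choose_restart_target(nodes: List[Node], current: str,
                          max_load: float = 0.9) -> Optional[str]:
    """Pick the node on which the failed virtual machine is restarted."""
    by_name = {n.name: n for n in nodes}
    home = by_name.get(current)
    # First alternative: restart in place on the node hosting the VM.
    if home is not None and home.alive and home.load <= max_load:
        return home.name
    # Second alternative: reselect a target by priority, then by load.
    candidates = [n for n in nodes
                  if n.alive and n.name != current and n.load <= max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (n.priority, n.load)).name
```

The management host node would then send the configuration information of the failed virtual machine to the returned node and instruct it to rerun that configuration.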
With reference to the above flow, the corresponding modules may be as follows:
Referring to Fig. 9, this embodiment involves a common host node 91, a management host node 92, and a slave node 93. For the common host node, the judging unit is specifically the sixth heartbeat detection module 911, and the processing unit is specifically the fourth resource management module 912. For the management host node, the judging unit is specifically the seventh heartbeat detection module 921, and the processing unit specifically comprises the third high-availability (HA) module 922 and the third resource management module 923. For the slave node, the judging unit specifically comprises the virtual machine proxy module 931 and the eighth heartbeat detection module 932, and the processing unit is specifically the fifth resource management module 933.
The sixth heartbeat detection module 911 is configured to determine that a failed virtual machine exists upon receiving a virtual machine failure message sent by any slave node in the partition where the common host node is located, or upon detecting that the heartbeat of one of its own virtual machines has stopped, and to determine the virtual machine indicated by the virtual machine failure message, or the virtual machine whose heartbeat has stopped, to be the failed virtual machine;
The fourth resource management module 912 is configured to, when a virtual machine of the common host node itself fails, receive the configuration information of the failed virtual machine sent by the management host node and rerun the configuration information to restart the failed virtual machine. The configuration information is sent by the management host node after it receives the third failure message; the third failure message is sent by the common host node after it determines that a failed virtual machine exists, and carries the identification information of the failed virtual machine.
The seventh heartbeat detection module 921 is configured to determine, after receiving the third failure message, that a failed virtual machine exists; the third failure message carries the identification information of the failed virtual machine;
The third HA module 922 is configured to receive a virtual machine failure message and send a restart-virtual-machine request; the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the failed virtual machine;
The third resource management module 923 is configured to send the configuration information of the failed virtual machine to the node where the failed virtual machine resides, and to instruct that node to rerun the configuration information to restart the failed virtual machine.
The virtual machine proxy module 931 is configured to send heartbeat messages while the corresponding virtual machine is normal, and to send none when the virtual machine has failed;
The eighth heartbeat detection module 932 is configured to determine, according to whether the heartbeat messages are being sent, that a failed virtual machine exists once it detects that the heartbeat of a virtual machine on the slave node has stopped, and to determine the virtual machine whose heartbeat has stopped to be the failed virtual machine;
The fifth resource management module 933 is configured to receive the configuration information of the failed virtual machine sent by the management host node and rerun the configuration information to restart the failed virtual machine. The configuration information is sent by the management host node after it receives the third failure message; the third failure message is sent by the common host node after it receives a virtual machine failure message, and carries the identification information of the failed virtual machine; the virtual machine failure message is sent by the slave node after it detects that the heartbeat of a virtual machine on the slave node has stopped, and also carries the identification information of the failed virtual machine.
By partitioning, this embodiment achieves scalability of the cluster system. By adopting a peer-to-peer architecture among host nodes and a star architecture between the slave nodes and the host node, this embodiment can learn of a virtual machine failure promptly and restart the virtual machine, which improves availability.
To sum up, the embodiments of the present invention set up partitions, so that the cluster can be scaled out by adding partitions; multiple host nodes are managed as peers, which eliminates the HA bottleneck; resource state information is synchronized among host nodes while resource utilization information is not, which keeps both the communication overhead of fault monitoring and the overhead of state synchronization small; after the heartbeat of a slave node stops, the host node of that partition selects another slave node in the partition to arbitrate, which reduces misjudgment and improves availability; the peer-to-peer architecture among host nodes further strengthens host node reliability compared with a star architecture; and by making effective use of slave nodes through virtual machine migration, resource waste and management overhead can be reduced.
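The two-level layout summarized above (peer-to-peer among host nodes, star inside each partition) can be made concrete by listing the links it implies. The sketch below is illustrative only; the function name `cluster_links` and the dictionary representation of partitions are assumptions, not part of the described system.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def cluster_links(partitions: Dict[str, List[str]]
                  ) -> Tuple[Set[FrozenSet[str]], Set[Tuple[str, str]]]:
    """Return (peer links between host nodes, star links host -> slave).

    `partitions` maps each partition's host node to its slave nodes.
    """
    # Peer-to-peer: every pair of host nodes from different partitions.
    peer = {frozenset(pair) for pair in combinations(sorted(partitions), 2)}
    # Star: each slave node is linked only to the host of its own partition.
    star = {(host, slave)
            for host, slaves in partitions.items()
            for slave in slaves}
    return peer, star
```

Adding a partition adds one peer link per existing host node plus the new partition's own star links; slave nodes in other partitions gain no new links, which is one way to read the claim that scale is extended by adding partitions.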
It can be understood that related features of the above methods and devices may be cross-referenced. In addition, "first", "second", and so on in the above embodiments are used to distinguish the embodiments and do not indicate that one embodiment is better or worse than another.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A processing method for a virtual cluster system, characterized by comprising:
A node judging whether at least one of the following occurs: a failed common host node exists, a failed slave node exists, or a failed virtual machine exists;
The node, after determining that a failed common host node exists, bringing a new common host node into effect; after determining that a failed slave node exists, bringing a new slave node into effect; or, after determining that a failed virtual machine exists, restarting the virtual machine;
Wherein the common host nodes and slave nodes are divided into at least two partitions, each partition comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-to-peer architecture is adopted among the host nodes of different partitions; a star architecture is adopted between the host node and the slave nodes in each partition; and the host nodes comprise one management host node and at least one common host node.
2. The method according to claim 1, characterized in that, if the node is a common host node,
Judging that a failed common host node exists comprises: after detecting that the heartbeat of any other common host node has stopped, determining that a failed common host node exists, and determining the common host node whose heartbeat has stopped to be the failed common host node;
Bringing a new common host node into effect after determining that a failed common host node exists comprises:
Receiving a first membership request message carrying the identification information of the new common host node and of the failed common host node, adding the identification information of the new common host node to a first membership list, and deleting the identification information of the failed common host node from the first membership list;
Wherein the new common host node is obtained by the management host node, after receiving a first failure message, by reselection from among the slave nodes in the partition where the failed common host node is located; the first failure message is sent by the common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node.
3. method according to claim 1 and 2, is characterized in that, when described node is common host node,
The slave node that the judgement existence was lost efficacy comprises: after the heartbeat of the arbitrary slave node in described common host node place subregion being detected stops, determining and have the slave node lost efficacy, and the slave node that definite heartbeat stops is the slave node lost efficacy;
Described after determining the slave node that has inefficacy, the new slave node of the effect of living again comprises:
Receive the second member relation request message, carry the identification information of the slave node of the identification information of new slave node and inefficacy in described the second member relation request message, the identification information of described new slave node is added in the second member relation list, and delete the identification information of the slave node of the described inefficacy in described the second member relation list;
Wherein, for the management host node receives after the second failure message, the slave node in the slave node place subregion of described inefficacy, gravity treatment obtains described new slave node, described the second failure message is that described common host node sends after determining the slave node that has inefficacy, carries the identification information of the slave node of described inefficacy in described the second failure message.
4. method according to claim 1 and 2, is characterized in that, when described node is common host node,
Judgement exists the virtual machine of fault to comprise: the virtual-machine fail message that the arbitrary slave node in receiving described common host node place subregion sends, perhaps, after the heartbeat that the virtual machine of self detected stops, determine the virtual machine there is fault, and the virtual machine that the virtual machine of described virtual-machine fail message indication or heartbeat stop is defined as to the virtual machine of fault;
Describedly after determining and having the virtual machine of fault, restart virtual machine, comprising:
When self virtual-machine fail, the configuration information of the virtual machine of the fault that the receiving management host node sends, and rerun described configuration information to restart the virtual machine of described fault, the configuration information of the virtual machine of described fault is that described management host node sends after receiving the 3rd failure message, described the 3rd failure message is that described common host node sends after determining the virtual machine that has inefficacy, carries the identification information of the virtual machine of described fault in described the 3rd failure message.
5. The method according to claim 1, characterized in that, when the node is the management host node,
Judging that a failed common host node exists comprises: after receiving a first failure message, determining that a failed common host node exists; the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;
Bringing a new common host node into effect after determining that a failed common host node exists comprises:
Receiving a host node failure message carrying the identification information of the failed common host node, reselecting a new common host node from among the slave nodes in the partition where the failed common host node is located, and sending a first migrate-virtual-machine request carrying the identification information of the new common host node and of the failed common host node; the host node failure message is sent after the first failure message is received;
According to the first migrate-virtual-machine request, sending the identification information of the virtual machines on the failed common host node to the new common host node and restarting the virtual machines.
6. The method according to claim 1 or 5, characterized in that, when the node is the management host node,
Judging that a failed slave node exists comprises: after receiving a second failure message, determining that a failed slave node exists; the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;
Bringing a new slave node into effect after determining that a failed slave node exists comprises:
Receiving a slave node failure message carrying the identification information of the failed slave node, reselecting a new slave node in the partition where the failed slave node is located, and sending a second migrate-virtual-machine request carrying the identification information of the new slave node and of the failed slave node; the slave node failure message is sent after the second failure message is received;
According to the second migrate-virtual-machine request, sending the identification information of the virtual machines on the failed slave node to the new slave node and restarting the virtual machines.
7. The method according to claim 1 or 5, characterized in that, when the node is the management host node,
Judging that a failed virtual machine exists comprises: after receiving a third failure message, determining that a failed virtual machine exists; the third failure message carries the identification information of the failed virtual machine;
Restarting the virtual machine after determining that a failed virtual machine exists comprises:
Receiving a virtual machine failure message and sending a restart-virtual-machine request; the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the failed virtual machine;
Sending the configuration information of the failed virtual machine to the node where the failed virtual machine resides, and instructing that node to rerun the configuration information to restart the failed virtual machine.
8. The method according to claim 1, characterized in that, when the node is a slave node,
Judging that a failed slave node exists and, after determining that a failed slave node exists, bringing a new slave node into effect comprises:
Sending heartbeat messages while the slave node has not failed and sending none once it has failed, so that the common host node of the partition where the slave node is located can determine from the heartbeat messages whether the slave node has failed, and performing power-off processing when the slave node itself is the failed slave node; or, when the slave node itself has not failed and receives a detection request, detecting whether the slave node indicated by the request has failed and notifying the common host node of the detection result, so that the common host node can carry out the processing of bringing a new slave node into effect; the detection request is sent by the common host node after it has received no heartbeat message from some slave node within a certain period, and carries the identification information of the slave node whose heartbeat has stopped.
9. The method according to claim 1 or 8, characterized in that, when the node is a slave node,
Judging that a failed virtual machine exists comprises:
Sending heartbeat messages while the corresponding virtual machine is normal, and sending none when the virtual machine has failed;
Determining, according to whether the heartbeat messages are being sent, that a failed virtual machine exists once the heartbeat of a virtual machine on the slave node is detected to have stopped, and determining the virtual machine whose heartbeat has stopped to be the failed virtual machine;
Restarting the virtual machine after determining that a failed virtual machine exists comprises:
Receiving the configuration information of the failed virtual machine sent by the management host node, and rerunning the configuration information to restart the failed virtual machine; the configuration information is sent by the management host node after receiving a third failure message; the third failure message is sent by the common host node after receiving a virtual machine failure message, and carries the identification information of the failed virtual machine; the virtual machine failure message is sent by the slave node after detecting that the heartbeat of a virtual machine on the slave node has stopped, and also carries the identification information of the failed virtual machine.
10. A processing device for a virtual cluster system, characterized by comprising:
A judging unit, configured to judge whether at least one of the following occurs: a failed common host node exists, a failed slave node exists, or a failed virtual machine exists;
A processing unit, configured to bring a new common host node into effect after it is determined that a failed common host node exists; to bring a new slave node into effect after it is determined that a failed slave node exists; or to restart the virtual machine after it is determined that a failed virtual machine exists;
Wherein the common host nodes and slave nodes are divided into at least two partitions, each partition comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-to-peer architecture is adopted among the host nodes of different partitions; a star architecture is adopted between the host node and the slave nodes in each partition; and the host nodes comprise one management host node and at least one common host node.
11. The device according to claim 10, characterized in that, when the device is a common host node,
The judging unit comprises:
A first heartbeat detection module, configured to determine, after detecting that the heartbeat of any other common host node has stopped, that a failed common host node exists, and to determine the common host node whose heartbeat has stopped to be the failed common host node;
The processing unit comprises:
A first membership management module, configured to receive a first membership request message carrying the identification information of the new common host node and of the failed common host node, add the identification information of the new common host node to a first membership list, and delete the identification information of the failed common host node from the first membership list;
Wherein the new common host node is obtained by the management host node, after receiving a first failure message, by reselection from among the slave nodes in the partition where the failed common host node is located; the first failure message is sent by the common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node.
12. The device according to claim 10, characterized in that, when the device is the management host node,
The judging unit comprises:
A second heartbeat detection module, configured to determine, after receiving a first failure message, that a failed common host node exists; the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;
The processing unit comprises:
A first high-availability (HA) module, configured to receive a host node failure message carrying the identification information of the failed common host node, reselect a new common host node from among the slave nodes in the partition where the failed common host node is located, and send a first migrate-virtual-machine request carrying the identification information of the new common host node and of the failed common host node; the host node failure message is sent after the first failure message is received;
A first resource management module, configured to, according to the first migrate-virtual-machine request, send the configuration information of the virtual machines on the failed common host node to the new common host node and restart the virtual machines.
13. The device according to claim 10 or 11, characterized in that, when the device is a common host node,
The judging unit and the processing unit are arranged in a third heartbeat detection module, which is configured to determine, after detecting that the heartbeat of any slave node in the partition where the common host node is located has stopped, that a failed slave node exists, and to determine the slave node whose heartbeat has stopped to be the failed slave node;
Wherein the new slave node is obtained by the management host node, after receiving a second failure message, by reselection from among the slave nodes in the partition where the failed slave node is located; the second failure message is sent by the common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node.
14. The device according to claim 10 or 12, characterized in that, when the device is the management host node,
The judging unit comprises:
A fourth heartbeat detection module, configured to determine, after receiving a second failure message, that a failed slave node exists; the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;
The processing unit comprises:
A second high-availability (HA) module, configured to receive a slave node failure message carrying the identification information of the failed slave node, reselect a new slave node in the partition where the failed slave node is located, and send a second migrate-virtual-machine request carrying the identification information of the new slave node and of the failed slave node; the slave node failure message is sent after the second failure message is received;
A second resource management module, configured to, according to the second migrate-virtual-machine request, send the identification information of the virtual machines on the failed slave node to the new slave node and restart the virtual machines.
15. The device according to claim 10, characterized in that, when the device is a slave node,
The judging unit and the processing unit form a fifth heartbeat detection module, which is configured to send heartbeat messages while the slave node has not failed and to send none once it has failed, so that the common host node of the partition where the slave node is located can determine from the heartbeat messages whether the slave node has failed, and to perform power-off processing when the slave node itself is the failed slave node; or, when the slave node itself has not failed and receives a detection request, to detect whether the slave node indicated by the request has failed and to notify the common host node of the detection result, so that the common host node can carry out the processing of bringing a new slave node into effect; the detection request is sent by the common host node after it has received no heartbeat message from some slave node within a certain period, and carries the identification information of the slave node whose heartbeat has stopped.
16. The device according to claim 10 or 11, characterized in that, when the device is a common host node,
The judging unit comprises:
A sixth heartbeat detection module, configured to determine that a failed virtual machine exists upon receiving a virtual machine failure message sent by any slave node in the partition where the common host node is located, or upon detecting that the heartbeat of one of its own virtual machines has stopped, and to determine the virtual machine indicated by the virtual machine failure message, or the virtual machine whose heartbeat has stopped, to be the failed virtual machine;
The processing unit comprises:
A fourth resource management module, configured to, when a virtual machine of the common host node itself fails, receive the configuration information of the failed virtual machine sent by the management host node and rerun the configuration information to restart the failed virtual machine; the configuration information is sent by the management host node after receiving a third failure message, and the third failure message is sent by the common host node after determining that a failed virtual machine exists and carries the identification information of the failed virtual machine.
17. The device according to claim 10 or 12, characterized in that, when the device is the management host node,
The judging unit comprises:
A seventh heartbeat detection module, configured to determine, after receiving a third failure message, that a failed virtual machine exists; the third failure message carries the identification information of the failed virtual machine;
The processing unit comprises:
A third high-availability (HA) module, configured to receive a virtual machine failure message and send a restart-virtual-machine request; the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the failed virtual machine;
A third resource management module, configured to send the configuration information of the failed virtual machine to the node where the failed virtual machine resides, and to instruct that node to rerun the configuration information to restart the failed virtual machine.
18. The device according to claim 10 or 15, characterized in that, when the device is a slave node,
The judging unit comprises:
A virtual machine agent module, configured to send heartbeat messages while the corresponding virtual machine is normal, and to send no heartbeat messages while it is faulty;
An eighth heartbeat detection module, configured to determine, according to whether the heartbeat messages are being sent, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the slave node has stopped, and to identify the virtual machine whose heartbeat has stopped as the faulty virtual machine;
The processing unit comprises:
A fifth resource management module, configured to receive the configuration information of the faulty virtual machine sent by the management host node, and to re-run the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management host node after it receives a third failure message; the third failure message is sent by the common host node after it receives a virtual machine failure message, and carries the identification information of the faulty virtual machine; and the virtual machine failure message is sent by the slave node after it detects that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.
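The message chain recited in claims 16 to 18 (the slave node detects a stopped heartbeat, the common host node forwards a third failure message, and the management host node returns the configuration information so that the virtual machine is restarted) can be sketched as follows. This is a minimal illustration only, not the patented implementation: the class and function names, the dictionary message format, the timeout value, and the `placement` map that tells the management host node where each virtual machine runs are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SlaveNode:
    """Hypothetical slave node: VM agents send heartbeats, the node watches them."""
    name: str
    timeout: float                                    # assumed timeout in seconds
    last_beat: Dict[str, float] = field(default_factory=dict)
    restarted: List[str] = field(default_factory=list)

    def heartbeat(self, vm_id: str, now: float) -> None:
        # Called by the VM agent module only while its virtual machine is normal.
        self.last_beat[vm_id] = now

    def detect(self, now: float) -> List[dict]:
        # Heartbeat detection: every VM that has been silent for longer than
        # the timeout produces a VM failure message carrying its identifier.
        return [{"type": "vm_failure", "vm_id": vm}
                for vm, seen in self.last_beat.items()
                if now - seen > self.timeout]

    def rerun(self, vm_id: str, config: dict, now: float) -> None:
        # Resource management: re-run the configuration to restart the VM.
        self.restarted.append(vm_id)
        self.last_beat[vm_id] = now


def common_host_forward(vm_failure: dict) -> dict:
    # Common host node: turn a VM failure message into a third failure message.
    return {"type": "third_failure", "vm_id": vm_failure["vm_id"]}


def management_host_handle(third_failure: dict,
                           configs: Dict[str, dict],
                           placement: Dict[str, SlaveNode],
                           now: float) -> None:
    # Management host node: look up the stored configuration and send it to
    # the node that hosts the faulty VM, which re-runs it.
    vm_id = third_failure["vm_id"]
    placement[vm_id].rerun(vm_id, configs[vm_id], now)
```

In this sketch the messages carry only the identifier of the faulty virtual machine, as the claims require; how the management host node locates the hosting node is not specified in these claims, so the lookup table is purely illustrative.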
19. A virtual cluster system, characterized by comprising:
At least two subregions, each subregion comprising one host node and at least one slave node, with at least one virtual machine provided on each host node and on each slave node;
The host nodes in different subregions are connected in a peer-to-peer architecture;
The host node and the slave nodes within each subregion are connected in a star architecture;
The host nodes comprise one management host node and at least one common host node, and the management host node is configured to reselect, after a common host node or slave node fails, a new common host node or slave node in the subregion where the failed common host node or slave node is located, or to restart a virtual machine when that virtual machine fails on a common host node or slave node.
20. The system according to claim 19, characterized in that
The common host node is the device according to claim 11, and the management host node is the device according to claim 12;
Or,
The common host node is the device according to claim 13, the management host node is the device according to claim 14, and the slave node is the device according to claim 15;
Or,
The common host node is the device according to claim 16, the management host node is the device according to claim 17, and the slave node is the device according to claim 18.
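The topology of claim 19 (a peer-to-peer arrangement among the host nodes of different subregions and a star arrangement inside each subregion) can be sketched as a small data model. This is a hedged illustration: the `Subregion` type, the node names, and in particular the reselection policy (promote the first slave node and bring in a spare node) are assumptions for the example, since the claim only states that a new node is reselected within the same subregion.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Subregion:
    host: str            # the single host node of the subregion
    slaves: List[str]    # at least one slave node


def peer_links(subregions: List[Subregion]) -> List[Tuple[str, str]]:
    # Host nodes of different subregions are peers: one link per pair.
    return list(combinations([s.host for s in subregions], 2))


def star_links(subregion: Subregion) -> List[Tuple[str, str]]:
    # Inside a subregion the host node is the hub of a star.
    return [(subregion.host, slave) for slave in subregion.slaves]


def reselect_host(subregion: Subregion, spare: str) -> Subregion:
    # Assumed policy, not taken from the claims: after the common host node
    # fails, promote the first slave node and add a spare node as a slave,
    # so the subregion still has one host node and at least one slave node.
    new_host, *remaining = subregion.slaves
    return Subregion(host=new_host, slaves=remaining + [spare])
```

With three subregions this model yields three peer links among the host nodes, while each subregion contributes one star link per slave node.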
CN201110301796.0A 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof Expired - Fee Related CN102355369B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110301796.0A CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof
PCT/CN2012/082196 WO2013044828A1 (en) 2011-09-27 2012-09-27 Virtual cluster system, processing method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110301796.0A CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof

Publications (2)

Publication Number Publication Date
CN102355369A (en) 2012-02-15
CN102355369B (en) 2014-01-08

Family

ID=45578866

Family Applications (1)

Application Number Title Priority Date Filing Date Status
CN201110301796.0A Virtual clustered system as well as processing method and processing device thereof 2011-09-27 2011-09-27 Expired - Fee Related (CN102355369B)

Country Status (2)

Country Link
CN (1) CN102355369B (en)
WO (1) WO2013044828A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355369B (en) * 2011-09-27 2014-01-08 华为技术有限公司 Virtual clustered system as well as processing method and processing device thereof
CN103294494B (en) * 2012-02-29 2018-07-03 中兴通讯股份有限公司 A kind of method and system of virtual system automatically dispose
CN102664763A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for rapidly detecting connection states and making virtual machine HA
CN103229463B (en) * 2012-12-18 2015-11-25 华为技术有限公司 A kind of method, the network equipment and Virtual Cluster determining management domain
CN103607296B (en) * 2013-11-01 2017-08-22 新华三技术有限公司 A kind of virtual-machine fail processing method and equipment
CN103729234B (en) * 2013-12-20 2017-06-27 中电长城网际系统应用有限公司 A kind of cluster virtual machine management method and device
CN105591780B (en) * 2014-10-24 2019-01-29 新华三技术有限公司 Cluster monitoring method and equipment
CN106062717B (en) * 2014-11-06 2019-05-03 华为技术有限公司 A kind of distributed storage dubbing system and method
CN106302569B (en) 2015-05-14 2019-06-18 华为技术有限公司 Handle the method and computer system of cluster virtual machine
CN106612314A (en) * 2015-10-26 2017-05-03 上海宝信软件股份有限公司 System for realizing software-defined storage based on virtual machine
CN105357038B (en) * 2015-10-26 2019-05-07 北京百度网讯科技有限公司 Monitor the method and system of cluster virtual machine
CN108108255A (en) * 2016-11-25 2018-06-01 中兴通讯股份有限公司 The detection of virtual-machine fail and restoration methods and device
CN106789350A (en) * 2017-01-23 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of back-level server virtualization system host node High Availabitity
CN107315663B (en) * 2017-03-10 2020-06-09 秦皇岛市第一医院 Dual-machine cluster architecture
CN107018041B (en) * 2017-03-31 2019-05-17 杭州数梦工场科技有限公司 Data migration method and device in cluster
CN108111337B (en) * 2017-12-06 2021-04-06 北京天融信网络安全技术有限公司 Method and equipment for arbitrating main nodes in distributed system
WO2019178714A1 (en) * 2018-03-19 2019-09-26 华为技术有限公司 Fault detection method, apparatus, and system
CN110661599B (en) * 2018-06-28 2022-04-29 中兴通讯股份有限公司 HA implementation method, device and storage medium between main node and standby node
CN109361777B (en) * 2018-12-18 2021-08-10 广东浪潮大数据研究有限公司 Synchronization method, synchronization system and related device for distributed cluster node states
CN113742417B (en) * 2020-05-29 2024-06-07 同方威视技术股份有限公司 Multistage distributed consensus method and system, electronic equipment and computer readable medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102110071A (en) * 2011-03-04 2011-06-29 浪潮(北京)电子信息产业有限公司 Virtual machine cluster system and implementation method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155912A1 (en) * 2005-01-12 2006-07-13 Dell Products L.P. Server cluster having a virtual server
US20080189700A1 (en) * 2007-02-02 2008-08-07 Vmware, Inc. Admission Control for Virtual Machine Cluster
CN102355369B (en) * 2011-09-27 2014-01-08 华为技术有限公司 Virtual clustered system as well as processing method and processing device thereof

Also Published As

Publication number Publication date
WO2013044828A1 (en) 2013-04-04
CN102355369A (en) 2012-02-15

Similar Documents

Publication Publication Date Title
CN102355369B (en) Virtual clustered system as well as processing method and processing device thereof
EP3620905B1 (en) Method and device for identifying osd sub-health, and data storage system
US8615676B2 (en) Providing first field data capture in a virtual input/output server (VIOS) cluster environment with cluster-aware vioses
CN108200124B (en) High-availability application program architecture and construction method
US8949828B2 (en) Single point, scalable data synchronization for management of a virtual input/output server cluster
US10243780B2 (en) Dynamic heartbeating mechanism
US8583773B2 (en) Autonomous primary node election within a virtual input/output server cluster
US8473692B2 (en) Operating system image management
US9639437B2 (en) Techniques to manage non-disruptive SAN availability in a partitioned cluster
US8726274B2 (en) Registration and initialization of cluster-aware virtual input/output server nodes
CN108023967B (en) Data balancing method and device and management equipment in distributed storage system
US8856585B2 (en) Hardware failure mitigation
US20120303594A1 (en) Multiple Node/Virtual Input/Output (I/O) Server (VIOS) Failure Recovery in Clustered Partition Mobility
CN104408071A (en) Distributive database high-availability method and system based on cluster manager
JP2014501424A (en) Integrated software and hardware system that enables automated provisioning and configuration based on the physical location of the blade
CN105159798A (en) Dual-machine hot-standby method for virtual machines, dual-machine hot-standby management server and system
US10860375B1 (en) Singleton coordination in an actor-based system
US20120151095A1 (en) Enforcing logical unit (lu) persistent reservations upon a shared virtual storage device
CN102394914A (en) Cluster brain-split processing method and device
CN112052230B (en) Multi-machine room data synchronization method, computing device and storage medium
CN104158707A (en) Method and device of detecting and processing brain split in cluster
CN112887367B (en) Method, system and computer readable medium for realizing high availability of distributed cluster
CN103810038A (en) Method and device for transferring virtual machine storage files in HA cluster
CN104052799B (en) A kind of method that High Availabitity storage is realized using resource ring
CN116010111A (en) Cross-cluster resource scheduling method, system and terminal equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140108

Termination date: 20190927
