CN102355369B

CN102355369B - Virtual cluster system, and processing method and processing device thereof

Info

Publication number: CN102355369B
Application number: CN201110301796.0A
Authority: CN
Inventors: 江滢
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2011-09-27
Filing date: 2011-09-27
Publication date: 2014-01-08
Anticipated expiration: 2031-09-27
Also published as: WO2013044828A1; CN102355369A

Abstract

The invention discloses a virtual cluster system and a processing method and processing device thereof. The system comprises at least two partitions, each comprising one main node and at least one spare node; at least one virtual machine is deployed on each main node and each spare node; a peer-to-peer architecture is used between the main nodes of different partitions; a star architecture is used between the main node and the spare nodes within each partition; the main nodes comprise one management main node and at least one normal main node, wherein the management main node is used for reselecting a new normal main node or spare node in the partition of the failed normal main node or spare node when a normal main node or spare node fails, or for restarting the virtual machine when a virtual machine on a normal main node or spare node fails. According to the embodiments of the invention, the scalability and availability of the system can be improved.

Description

Virtual cluster system, and processing method and device thereof

Technical field

The present invention relates to network communication technology, and in particular to a virtual cluster system and a processing method and device thereof.

Background

A cluster system offers strong overall computing, storage, and management performance, presents a single-system image as its service form, and provides availability guarantees and fault tolerance that are transparent to users; it has therefore become the mainstream infrastructure architecture of data centers. The application of virtualization technology offers a better and more promising direction for cluster development. Virtualization technology allows one platform to run multiple operating systems at the same time, with applications running in separate spaces without affecting one another, which significantly improves the working efficiency of the computer. Running multiple virtual machines makes full use of the computing potential of a physical server and gives the data center a fast response capability.

After virtualization technology is introduced, scalability and high availability are the greatest challenges that a cluster system faces.

Summary of the invention

The embodiments of the present invention provide a virtual cluster system and a processing method and device thereof, to improve the scalability and availability of a virtual machine cluster system.

An embodiment of the present invention provides a processing method for a virtual cluster system, comprising:

a node judging whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty;

the node, after determining that a common host node has failed, reselecting a new common host node; after determining that a slave node has failed, reselecting a new slave node; or, after determining that a virtual machine is faulty, restarting the virtual machine;

wherein the common host nodes and slave nodes are divided into at least two partitions, each partition comprising one host node and at least one slave node; at least one virtual machine is deployed on each host node and each slave node; a peer-to-peer architecture is adopted between the host nodes of different partitions; a star architecture is adopted between the host node and the slave nodes within each partition; and the host nodes comprise one management host node and at least one common host node.

An embodiment of the present invention provides a processing device for a virtual cluster system, comprising:

a judging unit, configured to judge whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty;

a processing unit, configured to reselect a new common host node after determining that a common host node has failed; to reselect a new slave node after determining that a slave node has failed; or to restart the virtual machine after determining that a virtual machine is faulty.

An embodiment of the present invention provides a virtual cluster system, comprising:

at least two partitions, each partition comprising one host node and at least one slave node, with at least one virtual machine deployed on each host node and each slave node;

wherein a peer-to-peer architecture is adopted between the host nodes of different partitions;

a star architecture is adopted between the host node and the slave nodes within each partition;

and the host nodes comprise one management host node and at least one common host node, the management host node being configured to reselect, after a common host node or slave node fails, a new common host node or slave node in the partition where the failed common host node or slave node is located, or to restart the virtual machine when a virtual machine on a common host node or slave node fails.

As can be seen from the above technical solution, the virtual cluster system of the embodiments of the present invention is divided into partitions and can therefore be expanded by adding partitions; the peer-to-peer structure between the host nodes of the partitions can eliminate the bottleneck problem and improve reliability; and reselecting a new host node or slave node, or restarting a virtual machine, can further improve reliability.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of the system according to the first embodiment of the present invention;

Fig. 2 is a schematic flowchart of the method according to the first embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the device according to the first embodiment of the present invention;

Fig. 4 is a schematic flowchart of the method according to the second embodiment of the present invention;

Fig. 5 is a schematic structural diagram of the system according to the second embodiment of the present invention;

Fig. 6 is a schematic flowchart of the method according to the third embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the system according to the third embodiment of the present invention;

Fig. 8 is a schematic flowchart of the method according to the fourth embodiment of the present invention;

Fig. 9 is a schematic structural diagram of the system according to the fourth embodiment of the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a schematic structural diagram of the system according to the first embodiment of the present invention. Referring to Fig. 1, the system comprises at least two partitions 1, each of which comprises one host node (master) 11 and at least one slave node (slave) 12. At least one virtual machine (Virtual Machine, VM) 13 is deployed on each host node 11 and each slave node 12.

For example, referring to Fig. 1, the host nodes include host node A, host node B, host node C, and so on. The slave nodes in the partition of host node A include slave nodes a1, a2, and so on; those in the partition of host node B include slave nodes b1, b2, and so on; and those in the partition of host node C include slave nodes c1, c2, and so on.

A peer-to-peer architecture is adopted between the host nodes 11 of different partitions: a host node can send resource state information to any other host node and can receive resource state information sent by any other host node. A star architecture is adopted between the host node 11 and the slave nodes 12 within each partition: a slave node can send resource state information to the host node, but the host node does not send resource state information to the slave nodes. The resource state information indicates whether the corresponding node is normal or has failed.

The host nodes comprise one management host node (master leader) and at least one common host node. After a common host node or a slave node fails, the management host node reselects a new common host node or slave node in the partition where the failed node is located; or, when a virtual machine on a common host node or a slave node fails, the management host node restarts the virtual machine.

One of the host nodes can be preset as the management host node, and the remaining host nodes are common host nodes. The management host node can store information about every host node and slave node and about the virtual machines on those nodes, so that it can manage all nodes in the partitions in a unified way and handle faults centrally when they occur. For example, referring to Fig. 1, host node C can be set as the management host node, and host node A, host node B, and so on are common host nodes.
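The topology of Fig. 1 can be pictured with a small data model. The following Python sketch is illustrative only: the class and field names (`Node`, `Partition`, `Cluster`, `leader_id`) are assumptions made for this example and do not appear in the patent. It only encodes the structural constraints stated above (at least two partitions, one host node and at least one slave node per partition, at least one virtual machine per node, one management host node among the host nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    vms: list[str] = field(default_factory=list)  # VMs deployed on this node


@dataclass
class Partition:
    master: Node          # exactly one host node (master) per partition
    slaves: list[Node]    # at least one slave node per partition

    def __post_init__(self):
        if not self.slaves:
            raise ValueError("a partition needs at least one slave node")
        for node in [self.master, *self.slaves]:
            if not node.vms:
                raise ValueError(f"node {node.node_id} needs at least one VM")


@dataclass
class Cluster:
    partitions: list[Partition]
    leader_id: str        # ID of the management host node (master leader)

    def __post_init__(self):
        if len(self.partitions) < 2:
            raise ValueError("the cluster needs at least two partitions")
        if self.leader_id not in self.master_ids():
            raise ValueError("the management host node must be a host node")

    def master_ids(self):
        return [p.master.node_id for p in self.partitions]

    def common_masters(self):
        """All host nodes except the management host node."""
        return [m for m in self.master_ids() if m != self.leader_id]
```

With the example of Fig. 1 (host nodes A, B, C, with C as the management host node), `common_masters()` would return A and B.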

For the above system, the flow between the devices can be as follows.

Fig. 2 is a schematic flowchart of the method according to the first embodiment of the present invention, comprising:

Step 21: A node judges whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty.

Step 22: After determining that a common host node has failed, the node reselects a new common host node; after determining that a slave node has failed, it reselects a new slave node; or, after determining that a virtual machine is faulty, it restarts the virtual machine.

The node may be a common host node, the management host node, or a slave node. Depending on the node and the scenario, the above flow can be implemented in different ways; see the subsequent embodiments for details.

Correspondingly, the device for this method can be as described below.

Fig. 3 is a schematic structural diagram of the device according to the first embodiment of the present invention, comprising a judging unit 31 and a processing unit 32. The judging unit 31 is configured to judge whether at least one of the following has occurred: a common host node has failed, a slave node has failed, or a virtual machine is faulty. The processing unit 32 is configured to reselect a new common host node after determining that a common host node has failed; to reselect a new slave node after determining that a slave node has failed; or to restart the virtual machine after determining that a virtual machine is faulty.

Of course, corresponding to the above method flow, the device may be a common host node, the management host node, or a slave node, and the specific functions of the units differ for different nodes and scenarios; see the following embodiments for details.

The virtual cluster system of the embodiments of the present invention is divided into partitions and can therefore be expanded by adding partitions. The peer-to-peer structure between the host nodes of the partitions can eliminate the bottleneck problem and improve reliability. Reselecting a new host node or slave node, or restarting a virtual machine, can further improve reliability.

Fig. 4 is a schematic flowchart of the method according to the second embodiment of the present invention, and Fig. 5 is a schematic structural diagram of the system according to the second embodiment. This embodiment takes the failure of a common host node as an example.

Referring to Fig. 4, this embodiment comprises:

Step 41: While the cluster is running normally, the common host nodes of the partitions detect one another's heartbeats through their heartbeat detection modules (heartbeatsync).

For example, the heartbeat detection module of common host node A sends heartbeat messages to the heartbeat detection module of common host node B.

Step 42: If the heartbeat detection module of common host node B detects that the heartbeat of common host node A has stopped, it multicasts a failure message that carries the identification information of common host node A, to indicate that common host node A has failed.

Common host node B determines that the heartbeat of common host node A has stopped when it has not received a heartbeat message from common host node A within a certain period of time.

The identification information is used to distinguish the nodes; for example, it may be the ID or the address of common host node A.

The remaining common host nodes and the management host node can all receive this failure message.
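The timeout rule of Step 42 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `HeartbeatMonitor`, the helper `failure_message`, the dictionary layout of the message, and the use of plain numeric timestamps are all assumptions made for the example. The patent only states that a heartbeat is considered stopped when no heartbeat message arrives within a certain period of time, and that the failure message carries the identification information of the failed node.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer host node."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # peer ID -> timestamp of the last heartbeat

    def on_heartbeat(self, peer_id, now):
        # Record the arrival time of a heartbeat message from a peer.
        self.last_seen[peer_id] = now

    def stopped_peers(self, now):
        """Peers whose heartbeat has not arrived within the timeout."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


def failure_message(failed_id):
    # Hypothetical message layout: it only needs to carry the
    # identification information of the failed node.
    return {"type": "failure", "failed_node": failed_id}
```

In this sketch, common host node B would call `stopped_peers` periodically and multicast `failure_message(peer)` for every peer returned.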

Step 43: After the heartbeat detection module of the management host node receives the failure message, it reports a host node failure message to the high availability (High Availability, HA) module of the management host node; the host node failure message carries the identification information of common host node A.

Step 44: The HA module of the management host node reselects one slave node in the partition of common host node A as the new common host node of that partition.

For example, according to the ID priority and the dynamic load of each slave node, slave node a1 in the partition of A is elected as the new common host node.
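The reselection in Step 44 can be sketched as a simple ranking. The patent names ID priority and dynamic load as criteria but does not say how they are combined; the sketch below assumes (purely for illustration) that the least-loaded slave node wins and that ties are broken by the smaller ID. The function name `elect_new_master` and the `(node_id, load)` tuple format are likewise hypothetical.

```python
def elect_new_master(slaves):
    """Pick a slave node to promote.

    slaves: list of (node_id, load) tuples for the failed master's partition.
    Assumed rule: lowest load first, then smallest ID as the tie-breaker.
    """
    if not slaves:
        raise ValueError("no slave node available in the partition")
    node_id, _load = min(slaves, key=lambda s: (s[1], s[0]))
    return node_id
```

Any other deterministic ordering over the same two criteria would fit the description equally well.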

Step 45: The HA module of the management host node sends a migration virtual machine request to the resource management module (ResourceMgmt) of the management host node; the request carries the identification information of the new common host node a1 and of common host node A.

Step 46: The resource management module of the management host node migrates the virtual machines on common host node A to the new common host node a1.

For example, the configuration information of the virtual machines on common host node A is sent to the new common host node a1, and a1 is instructed to rerun this configuration information to restart the corresponding virtual machines. The configuration information of a virtual machine is the information from which the virtual machine can be started, for example virtual machine software that starts the virtual machine when executed.

Further, after the new common host node joins, the host nodes also update the member relationship:

Step 47: The new common host node multicasts a join request to the remaining common host nodes. After the heartbeat detection module of each remaining common host node detects the join request, it sends a member relationship update request to the corresponding member relationship management module (MembershipMgmt); the update request carries the identification information of the new common host node and of the failed common host node.

For example, after common host node B receives the join request multicast by the new common host node a1, the heartbeat detection module of common host node B sends a member relationship update request to the member relationship management module of common host node B; the request carries the identification information of A and of a1.

Step 48: The member relationship management module updates the member relationship list.

For example, the identification information of the new common host node a1 is added to the member list, and the identification information of the failed common host node A is deleted.
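The list update in Step 48 amounts to one removal and one addition. A minimal sketch follows; the function name `update_membership` and the representation of the member relationship list as a Python list of IDs are assumptions for illustration.

```python
def update_membership(members, new_id, failed_id):
    """Return a new member list with failed_id removed and new_id added."""
    updated = [m for m in members if m != failed_id]
    if new_id not in updated:  # avoid duplicates if the request is replayed
        updated.append(new_id)
    return updated
```

For the example above, applying the update for failed node A and new node a1 to a list containing A, B, and C leaves B, C, and a1.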

Corresponding to the above flow, the modules can be as follows.

Referring to Fig. 5, this embodiment involves a common host node 51 and a management host node 52. For the common host node, the judging unit is specifically a first heartbeat detection module (Heartbeat Sync) 511, and the processing unit is specifically a first member relationship management module (MembershipMgmt) 512. For the management host node, the judging unit is specifically a second heartbeat detection module 521, and the processing unit specifically comprises a first high availability (HA) module 522 and a first resource management module (ResourceMgmt) 523.

The first heartbeat detection module 511 is configured to determine, after detecting that the heartbeat of any other common host node has stopped, that a failed common host node exists, and to determine that the common host node whose heartbeat has stopped is the failed common host node.

The first member relationship management module 512 is configured to receive a first member relationship request message, which carries the identification information of the new common host node and of the failed common host node, to add the identification information of the new common host node to a first member relationship list, and to delete the identification information of the failed common host node from that list.

The new common host node is obtained by the management host node, after it receives a first failure message, by reselecting among the slave nodes in the partition of the failed common host node. The first failure message is sent by the common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node.

The second heartbeat detection module 521 is configured to determine, after receiving the first failure message, that a failed common host node exists. The first failure message is sent by a common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node.

The first high availability module 522 is configured to receive a host node failure message carrying the identification information of the failed common host node, to reselect a new common host node from the slave nodes in the partition of the failed common host node, and to send a first migration virtual machine request carrying the identification information of the new common host node and of the failed common host node. The host node failure message is sent after the first failure message is received.

The first resource management module 523 is configured to, according to the first migration virtual machine request, send the identification information of the virtual machines on the failed common host node to the new common host node and restart those virtual machines.

This embodiment achieves scalability of the cluster system through partitioning. By adopting a peer-to-peer architecture between host nodes, the failure of a host node can be learned in time and a new host node reselected, which improves availability.

Fig. 6 is a schematic flowchart of the method according to the third embodiment of the present invention, and Fig. 7 is a schematic structural diagram of the system according to the third embodiment. This embodiment takes the failure of a slave node as an example.

Referring to Fig. 6, this embodiment comprises:

Step 601: While the cluster is running normally, the slave nodes of each partition send heartbeat messages through their heartbeat detection modules to the heartbeat detection module of the common host node of their partition.

For example, the heartbeat detection module of slave node a1 sends heartbeat messages to the heartbeat detection module of common host node A of its partition.

Step 602: If the heartbeat detection module of common host node A detects that the heartbeat of slave node a1 has stopped, it sends a heartbeat detection message to another slave node in its partition.

For example, if common host node A does not detect a heartbeat message from slave node a1 within a set time, it determines that the heartbeat of slave node a1 has stopped and sends a heartbeat detection message to another slave node a2 in its partition; the heartbeat detection message carries the identification information of slave node a1.

Step 603: Slave node a2 checks the heartbeat of slave node a1.

For example, slave node a2 sends a ping message to slave node a1; if it does not receive a response message from slave node a1, the heartbeat of slave node a1 has stopped.

Step 604: Slave node a2 sends a heartbeat detection result to common host node A; the result describes the heartbeat of slave node a1.

Step 605: If the heartbeat detection result also shows that the heartbeat of slave node a1 has stopped, common host node A multicasts a failure message that carries the identification information of slave node a1.

The remaining common host nodes and the management host node can all receive the failure message.
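Steps 602 to 605 describe a two-stage check: the host node only declares the slave node failed once a second slave node confirms that it cannot reach the suspect. A minimal sketch under stated assumptions: `confirm_slave_failure` and the `witness_probe` callback are hypothetical names, the callback stands in for the ping that slave node a2 sends to a1, and the message layout is invented for the example.

```python
def confirm_slave_failure(suspect_id, witness_probe):
    """Second-opinion check run by the host node after a missed heartbeat.

    witness_probe(suspect_id) models Steps 603-604: it returns True if the
    suspect answered the witness's ping, False otherwise.
    Returns the failure message to multicast, or None if no failure is declared.
    """
    if witness_probe(suspect_id):
        return None  # the witness still reaches the suspect
    # Step 605: both the host node and the witness saw no heartbeat.
    return {"type": "failure", "failed_node": suspect_id}
```

The second check reduces the chance that a transient problem on the link between the host node and one slave node is mistaken for a node failure.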

Step 606: After the heartbeat detection module of the management host node receives this failure message, it sends a slave node failure message to the HA module of the management host node; the slave node failure message carries the identification information of the failed slave node a1.

Step 607: The HA module of the management host node elects another slave node in the partition of slave node a1 as the slave node to which the virtual machines are migrated.

The other slave node can likewise be selected according to priority, load, and so on.

Step 608: The HA module of the management host node sends a migration virtual machine request to the resource management module of the management host node; the request carries the identification information of the new slave node and of the failed slave node.

For example, if the reselected slave node is a2, the migration virtual machine request carries the identification information of a1 and of a2.

Step 609: The resource management module of the management host node migrates the virtual machines on slave node a1 to slave node a2.

For example, the configuration information of the virtual machines on slave node a1 is sent to slave node a2, and a2 is instructed to rerun this configuration information to restart the corresponding virtual machines. The configuration information of a virtual machine is the information from which the virtual machine can be started, for example virtual machine software that starts the virtual machine when executed.

Further, the failed slave node can perform the following actions:

Step 610: After slave node a1 finds that its own heartbeat messages are being lost, it pings its gateway, that is, it sends a ping message to its own gateway.

Step 611: If the ping fails, that is, no response message to the ping message is received, the slave node powers itself down.
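Steps 610 and 611 describe a self-check by the suspected slave node. The sketch below is illustrative: `self_check` and the `ping_gateway` callback are hypothetical, and the patent does not say what the node does when the gateway does answer, so the sketch simply keeps the node running in that case (an assumption).

```python
def self_check(heartbeat_lost, ping_gateway):
    """Decide what a slave node does after noticing lost heartbeats.

    ping_gateway() returns True if the node's own gateway answered the ping.
    Returns "power_off" or "keep_running".
    """
    if not heartbeat_lost:
        return "keep_running"
    if ping_gateway():
        # Assumption: the patent leaves this branch unspecified.
        return "keep_running"
    # Step 611: the gateway is unreachable, so the node powers itself down.
    return "power_off"
```

Powering down an isolated node prevents its virtual machines from running in two places once they have been restarted on the reselected slave node.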

Referring to Fig. 7, this embodiment involves a common host node 71, a management host node 72, and a slave node 73. For the common host node, the judging unit and the processing unit are the same module, specifically a third heartbeat detection module 711. For the management host node, the judging unit is specifically a fourth heartbeat detection module 721, and the processing unit specifically comprises a second high availability module 722 and a second resource management module (ResourceMgmt) 723. For the slave node, the judging unit and the processing unit are the same module, specifically a fifth heartbeat detection module 731.

The third heartbeat detection module 711 is configured to determine, after detecting that the heartbeat of any slave node in the partition of the common host node has stopped, that a failed slave node exists, and to determine that the slave node whose heartbeat has stopped is the failed slave node.

The fourth heartbeat detection module 721 is configured to determine, after receiving a second failure message, that a failed slave node exists. The second failure message is sent by the common host node after it determines that a failed slave node exists, and carries the identification information of the failed slave node.

The second high availability module 722 is configured to receive a slave node failure message carrying the identification information of the failed slave node, to reselect a new slave node in the partition of the failed slave node, and to send a second migration virtual machine request carrying the identification information of the new slave node and of the failed slave node. The slave node failure message is sent after the second failure message is received.

The second resource management module 723 is configured to, according to the second migration virtual machine request, send the identification information of the virtual machines on the failed slave node to the new slave node and restart those virtual machines.

The fifth heartbeat detection module 731 is configured to send heartbeat messages while the slave node has not failed and to send none once it has failed, so that the common host node of the partition can determine from the heartbeat messages whether the slave node has failed; and to power the slave node down when the slave node itself is the failed one. Alternatively, when the slave node itself has not failed and receives a detection request, the module detects whether the slave node named in the request has failed and notifies the common host node of the result, so that the common host node can proceed with reselecting a slave node. The detection request is sent by the common host node when it has not received a heartbeat message from some slave node within a certain period of time, and carries the identification information of the slave node whose heartbeat has stopped.

This embodiment achieves scalability of the cluster system through partitioning. By adopting a star architecture between the slave nodes and the host node, the host node learns of a slave node failure in time and the virtual machines on the failed slave node are migrated, which improves availability.

Fig. 8 is a schematic flowchart of the method according to the fourth embodiment of the present invention, and Fig. 9 is a schematic structural diagram of the system according to the fourth embodiment. This embodiment takes a virtual machine fault as an example.

Referring to Fig. 8, this embodiment comprises:

Step 81: While the cluster is running normally, the virtual machine proxy module on each node sends heartbeat messages to the heartbeat detection module of the node on which it runs.

For example, the virtual machine proxy module of a slave node sends heartbeat messages to the heartbeat detection module of that slave node.

Step 82: If the heartbeat detection module of the slave node detects that the heartbeat of a virtual machine has stopped, it sends a failure message to the common host node of its partition.

For example, if the heartbeat detection module on the slave node does not receive, within a certain period of time, the heartbeat message sent by the virtual machine proxy module on that node, it determines that the heartbeat of the corresponding virtual machine has stopped.

Step 83: After the common host node receives the failure message, it multicasts a failure message that carries the identification information of the faulty virtual machine.

The above takes a virtual machine fault on a slave node as an example. When a virtual machine on a host node fails, the heartbeat detection module on the host node, having received no heartbeat message from the virtual machine proxy module within a certain period of time, determines that the virtual machine on the host node is faulty and multicasts a failure message.

The failure message can be received by the remaining common host nodes and by the management host node.

Step 84: After the heartbeat detection module of the management host node receives the failure message, it sends a virtual machine failure message to the HA module of the management host node; the virtual machine failure message carries the identification information of the faulty virtual machine.

Step 85: The HA module of the management host node sends a restart virtual machine request to the resource management module of the management host node; the request carries the identification information of the faulty virtual machine.

Step 86: The resource management module of the management host node restarts the virtual machine.

For example, the configuration information of the faulty virtual machine is delivered again to the node where the virtual machine is located, and that node is instructed to rerun the configuration information to restart the virtual machine. Alternatively, the management host node reselects a node as the target node according to priority, load, and so on, then sends the configuration information of the faulty virtual machine to the target node and instructs the target node to rerun the configuration information to restart the virtual machine. Specifically, the resource management module of the target node may rerun the configuration information.
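The two restart options of Step 86 (restart in place, or restart on a reselected target node) can be sketched as a single planning function. All names here are hypothetical: `plan_vm_restart`, the `vm_configs` mapping, and the `(node_id, load)` candidate tuples are assumptions for illustration, and the ranking by load and then ID is one possible reading of "priority, load, and so on".

```python
def plan_vm_restart(vm_id, home_node, vm_configs, candidates=None):
    """Return (target_node, config) for restarting a faulty VM.

    vm_configs: mapping of VM ID -> configuration information kept by the
                management host node.
    candidates: optional list of (node_id, load) tuples; if given, a target
                node is reselected, otherwise the VM restarts on its home node.
    """
    config = vm_configs[vm_id]
    if candidates:
        # Assumed ranking: lowest load first, smallest ID as tie-breaker.
        target = min(candidates, key=lambda c: (c[1], c[0]))[0]
    else:
        target = home_node
    return target, config
```

The returned pair corresponds to what the resource management module sends out: the configuration information and the node that is instructed to rerun it.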

Referring to Fig. 9, in the present embodiment, relate to common host node 91, management host node 92 and slave node 93.Further, for common host node, its judging unit is specially the 6th heartbeat module detection module 911, and processing unit is specially the 4th resource management module 912.For the management host node, its judging unit is specially the 7th heartbeat detection module 921, and processing unit specifically comprises third high available modules 922 and information resources administration module 923.For slave node, its judging unit specifically comprises virtual machine proxy module 931 and the 8th heartbeat module detection module 932, and processing unit is specially the 5th resource management module 933.

The sixth heartbeat detection module 911 is configured to determine that a faulty virtual machine exists after receiving a virtual-machine failure message sent by any slave node in the subregion where the common host node resides, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine the virtual machine indicated by the virtual-machine failure message, or the virtual machine whose heartbeat has stopped, as the faulty virtual machine;

The fourth resource management module 912 is configured, when a virtual machine of the common host node itself is faulty, to receive the configuration information of the faulty virtual machine sent by the management host node and to rerun the configuration information to restart the faulty virtual machine. The configuration information of the faulty virtual machine is sent by the management host node after receiving a third failure message; the third failure message is sent by the common host node after determining that a faulty virtual machine exists, and carries the identification information of the faulty virtual machine.

The seventh heartbeat detection module 921 is configured to determine that a faulty virtual machine exists after receiving the third failure message, which carries the identification information of the faulty virtual machine;

The third high-availability module 922 is configured to receive a virtual-machine failure message and send a restart-virtual-machine request. The virtual-machine failure message is sent after the third failure message is received; both the virtual-machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;

The third resource management module 923 is configured to send the configuration information of the faulty virtual machine to the node where the faulty virtual machine resides, and to instruct that node to rerun the configuration information to restart the faulty virtual machine.

The virtual machine agent module 931 is configured to send heartbeat messages while the corresponding virtual machine is normal, and not to send heartbeat messages when the virtual machine is faulty;

The eighth heartbeat detection module 932 is configured to determine, according to how the heartbeat messages are being sent, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the slave node has stopped, and to determine the virtual machine whose heartbeat has stopped as the faulty virtual machine;

The fifth resource management module 933 is configured to receive the configuration information of the faulty virtual machine sent by the management host node, and to rerun the configuration information to restart the faulty virtual machine. The configuration information of the faulty virtual machine is sent by the management host node after receiving the third failure message. The third failure message is sent by the common host node after receiving a virtual-machine failure message, and carries the identification information of the faulty virtual machine. The virtual-machine failure message is sent by the slave node after detecting that the heartbeat of a virtual machine on the slave node has stopped, and also carries the identification information of the faulty virtual machine.
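The detection and reporting chain described for Fig. 9 can be sketched as follows. This is an illustrative sketch only: the class `HeartbeatDetector`, the function `report_chain`, the 3-second timeout, the injectable `clock`, and the dictionary message layout are assumptions introduced here. The source states only that a stopped heartbeat marks a virtual machine as faulty and that each message in the chain carries the identification information of the faulty virtual machine.

```python
import time


class HeartbeatDetector:
    """Tracks the last heartbeat per VM and reports VMs whose heartbeat stopped."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumption: fixed timeout, not from the source
        self.clock = clock           # injectable so the logic can be tested
        self.last_seen = {}          # vm_id -> time of last heartbeat

    def on_heartbeat(self, vm_id):
        """Called whenever the VM agent module sends a heartbeat message."""
        self.last_seen[vm_id] = self.clock()

    def faulty_vms(self):
        """VMs whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(vm for vm, t in self.last_seen.items()
                      if now - t > self.timeout_s)


def report_chain(vm_id, slave_id):
    """Build the messages that propagate one VM failure up to a restart request."""
    # Slave node -> common host node of the same subregion.
    vm_failure = {"type": "vm_failure", "vm_id": vm_id, "from": slave_id}
    # Common host node -> management host node.
    third_failure = {"type": "third_failure", "vm_id": vm_failure["vm_id"]}
    # HA module -> resource management module, both on the management host node.
    restart_request = {"type": "restart_vm", "vm_id": third_failure["vm_id"]}
    return [vm_failure, third_failure, restart_request]
```

Every message carries the same `vm_id`, so the resource management module at the end of the chain can look up the configuration information of exactly the virtual machine that the slave node reported.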

The present embodiment achieves scalability of the cluster system through subregions. By adopting a peer-to-peer architecture between host nodes and a star architecture between each host node and its slave nodes, the present embodiment can learn of a virtual-machine failure promptly and restart the virtual machine, which improves availability.

In summary, the embodiments of the present invention set up subregions, so the cluster can be scaled out by adding subregions; multiple host nodes are managed as peers, which eliminates the HA bottleneck; resource state information is synchronized between host nodes while resource utilization information is not, so the communication overhead of fault monitoring and the overhead of state synchronization are both small; after the heartbeat of a slave node stops, the host node of that subregion selects another slave node in the subregion to arbitrate, which reduces misjudgment and improves availability; the peer-to-peer architecture between host nodes further strengthens host node reliability compared with a star architecture; and by making effective use of slave nodes through virtual machine migration, resource waste and management overhead are reduced.
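The arbitration step mentioned above (asking another slave node in the subregion to check a slave node whose heartbeat has stopped) can be sketched as follows. This is a hedged illustration, not the method of the embodiment: the function `arbitrate`, the `probe` callback, the choice of the first other slave node as arbiter, and the fallback when no arbiter exists are all assumptions.

```python
def arbitrate(suspect_id, slave_ids, probe):
    """Return True if the suspect slave node should be treated as failed.

    probe(arbiter_id, suspect_id) is a hypothetical callback that asks the
    arbiter to check the suspect and returns True if the suspect responds.
    """
    arbiters = [s for s in slave_ids if s != suspect_id]
    if not arbiters:
        # Assumption: with no other slave node to ask, fall back to the
        # host node's own heartbeat observation.
        return True
    arbiter = arbiters[0]  # assumption: any other slave node may arbitrate
    reachable = probe(arbiter, suspect_id)
    # The suspect is declared failed only if the arbiter also cannot reach it.
    return not reachable
```

Requiring a second observer before declaring failure is what reduces misjudgment: a lost heartbeat caused by a transient problem on the link between the host node and the suspect does not by itself trigger failover.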

It can be understood that related features of the above methods and devices may be cross-referenced. In addition, "first", "second", and so on in the above embodiments are used to distinguish the embodiments and do not indicate that one embodiment is superior to another.

A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A processing method for a virtual cluster system, characterized by comprising:

2. The method according to claim 1, characterized in that, if the node is a common host node,

Determining that a failed common host node exists comprises: after detecting that the heartbeat of any other common host node has stopped, determining that a failed common host node exists, and determining the common host node whose heartbeat has stopped as the failed common host node;

Bringing a new common host node into effect after determining that a failed common host node exists comprises:

Receiving a first member relation request message, which carries the identification information of the new common host node and the identification information of the failed common host node, adding the identification information of the new common host node to a first member relation list, and deleting the identification information of the failed common host node from the first member relation list;

3. method according to claim 1 and 2, is characterized in that, when described node is common host node,

The slave node that the judgement existence was lost efficacy comprises: after the heartbeat of the arbitrary slave node in described common host node place subregion being detected stops, determining and have the slave node lost efficacy, and the slave node that definite heartbeat stops is the slave node lost efficacy;

Described after determining the slave node that has inefficacy, the new slave node of the effect of living again comprises:

Receive the second member relation request message, carry the identification information of the slave node of the identification information of new slave node and inefficacy in described the second member relation request message, the identification information of described new slave node is added in the second member relation list, and delete the identification information of the slave node of the described inefficacy in described the second member relation list;

Wherein, for the management host node receives after the second failure message, the slave node in the slave node place subregion of described inefficacy, gravity treatment obtains described new slave node, described the second failure message is that described common host node sends after determining the slave node that has inefficacy, carries the identification information of the slave node of described inefficacy in described the second failure message.

4. method according to claim 1 and 2, is characterized in that, when described node is common host node,

Judgement exists the virtual machine of fault to comprise: the virtual-machine fail message that the arbitrary slave node in receiving described common host node place subregion sends, perhaps, after the heartbeat that the virtual machine of self detected stops, determine the virtual machine there is fault, and the virtual machine that the virtual machine of described virtual-machine fail message indication or heartbeat stop is defined as to the virtual machine of fault;

Describedly after determining and having the virtual machine of fault, restart virtual machine, comprising:

When self virtual-machine fail, the configuration information of the virtual machine of the fault that the receiving management host node sends, and rerun described configuration information to restart the virtual machine of described fault, the configuration information of the virtual machine of described fault is that described management host node sends after receiving the 3rd failure message, described the 3rd failure message is that described common host node sends after determining the virtual machine that has inefficacy, carries the identification information of the virtual machine of described fault in described the 3rd failure message.

5. The method according to claim 1, characterized in that, when the node is the management host node,

Determining that a failed common host node exists comprises: after receiving a first failure message, determining that a failed common host node exists, wherein the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;

Receiving a host node failure message, which carries the identification information of the failed common host node, reselecting a new common host node from among the slave nodes in the subregion where the failed common host node resides, and sending a first migrate-virtual-machine request carrying the identification information of the new common host node and the identification information of the failed common host node, wherein the host node failure message is sent after the first failure message is received;

According to the first migrate-virtual-machine request, sending the identification information of the virtual machines on the failed common host node to the new common host node and restarting the virtual machines.

6. The method according to claim 1 or 5, characterized in that, when the node is the management host node,

Determining that a failed slave node exists comprises: after receiving a second failure message, determining that a failed slave node exists, wherein the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;

Receiving a slave node failure message, which carries the identification information of the failed slave node, reselecting a new slave node in the subregion where the failed slave node resides, and sending a second migrate-virtual-machine request carrying the identification information of the new slave node and the identification information of the failed slave node, wherein the slave node failure message is sent after the second failure message is received;

According to the second migrate-virtual-machine request, sending the identification information of the virtual machines on the failed slave node to the new slave node and restarting the virtual machines.

7. The method according to claim 1 or 5, characterized in that, when the node is the management host node,

Determining that a faulty virtual machine exists comprises: after receiving a third failure message, determining that a faulty virtual machine exists, wherein the third failure message carries the identification information of the faulty virtual machine;

Receiving a virtual-machine failure message and sending a restart-virtual-machine request, wherein the virtual-machine failure message is sent after the third failure message is received, and both the virtual-machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;

Sending the configuration information of the faulty virtual machine to the node where the faulty virtual machine resides, and instructing that node to rerun the configuration information to restart the faulty virtual machine.

8. The method according to claim 1, characterized in that, when the node is a slave node,

Determining that a failed slave node exists, and bringing a new slave node into effect after determining that a failed slave node exists, comprises:

Sending heartbeat messages when the slave node has not failed and not sending heartbeat messages when it has failed, so that the common host node of the subregion where the slave node resides determines, according to the heartbeat messages, whether the slave node has failed; and performing power-off processing when the slave node itself is the failed slave node; or, when the slave node itself is not the failed slave node, after receiving a detection request, detecting whether the corresponding slave node is a failed slave node and notifying the common host node of the detection result, so that the common host node performs the processing of bringing a new slave node into effect; wherein the detection request is sent by the common host node after it has not received a heartbeat message from any one slave node within a certain time, and carries the identification information of the slave node whose heartbeat has stopped.

9. The method according to claim 1 or 8, characterized in that, when the node is a slave node,

Determining that a faulty virtual machine exists comprises:

Sending heartbeat messages while the corresponding virtual machine is normal, and not sending heartbeat messages when the virtual machine is faulty;

According to how the heartbeat messages are being sent, after detecting that the heartbeat of a virtual machine on the slave node has stopped, determining that a faulty virtual machine exists, and determining the virtual machine whose heartbeat has stopped as the faulty virtual machine;

Receiving the configuration information of the faulty virtual machine sent by the management host node, and rerunning the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management host node after receiving a third failure message; the third failure message is sent by the common host node after receiving a virtual-machine failure message, and carries the identification information of the faulty virtual machine; and the virtual-machine failure message is sent by the slave node after detecting that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.

10. A processing device for a virtual cluster system, characterized by comprising:

11. The device according to claim 10, characterized in that, when the device is a common host node,

The judging unit comprises:

A first heartbeat detection module, configured to determine, after detecting that the heartbeat of any other common host node has stopped, that a failed common host node exists, and to determine the common host node whose heartbeat has stopped as the failed common host node;

The processing unit comprises:

A first member relation management module, configured to receive a first member relation request message, which carries the identification information of the new common host node and the identification information of the failed common host node, to add the identification information of the new common host node to a first member relation list, and to delete the identification information of the failed common host node from the first member relation list;

12. The device according to claim 10, characterized in that, when the device is the management host node,

The judging unit comprises:

A second heartbeat detection module, configured to determine, after receiving a first failure message, that a failed common host node exists, wherein the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;

The processing unit comprises:

A first high-availability module, configured to receive a host node failure message, which carries the identification information of the failed common host node, to reselect a new common host node from among the slave nodes in the subregion where the failed common host node resides, and to send a first migrate-virtual-machine request carrying the identification information of the new common host node and the identification information of the failed common host node, wherein the host node failure message is sent after the first failure message is received;

A first resource management module, configured to send, according to the first migrate-virtual-machine request, the configuration information of the virtual machines on the failed common host node to the new common host node and to restart the virtual machines.

13. The device according to claim 10 or 11, characterized in that, when the device is a common host node,

The judging unit and the processing unit are provided in a third heartbeat detection module, which is configured to determine, after detecting that the heartbeat of any slave node in the subregion where the common host node resides has stopped, that a failed slave node exists, and to determine the slave node whose heartbeat has stopped as the failed slave node;

14. The device according to claim 10 or 12, characterized in that, when the device is the management host node,

The judging unit comprises:

A fourth heartbeat detection module, configured to determine, after receiving a second failure message, that a failed slave node exists, wherein the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;

The processing unit comprises:

A second high-availability module, configured to receive a slave node failure message, which carries the identification information of the failed slave node, to reselect a new slave node in the subregion where the failed slave node resides, and to send a second migrate-virtual-machine request carrying the identification information of the new slave node and the identification information of the failed slave node, wherein the slave node failure message is sent after the second failure message is received;

A second resource management module, configured to send, according to the second migrate-virtual-machine request, the identification information of the virtual machines on the failed slave node to the new slave node and to restart the virtual machines.

15. The device according to claim 10, characterized in that, when the device is a slave node,

The judging unit and the processing unit form a fifth heartbeat detection module, which is configured to send heartbeat messages when the slave node has not failed and not to send heartbeat messages when it has failed, so that the common host node of the subregion where the slave node resides determines, according to the heartbeat messages, whether the slave node has failed; and to perform power-off processing when the slave node itself is the failed slave node; or, when the slave node itself is not the failed slave node, after receiving a detection request, to detect whether the corresponding slave node is a failed slave node and to notify the common host node of the detection result, so that the common host node performs the processing of bringing a new slave node into effect; wherein the detection request is sent by the common host node after it has not received a heartbeat message from any one slave node within a certain time, and carries the identification information of the slave node whose heartbeat has stopped.

16. The device according to claim 10 or 11, characterized in that, when the device is a common host node,

The judging unit comprises:

A sixth heartbeat detection module, configured to determine that a faulty virtual machine exists after receiving a virtual-machine failure message sent by any slave node in the subregion where the common host node resides, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine the virtual machine indicated by the virtual-machine failure message, or the virtual machine whose heartbeat has stopped, as the faulty virtual machine;

The processing unit comprises:

A fourth resource management module, configured, when a virtual machine of the common host node itself is faulty, to receive the configuration information of the faulty virtual machine sent by the management host node and to rerun the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management host node after receiving a third failure message, and the third failure message is sent by the common host node after determining that a faulty virtual machine exists and carries the identification information of the faulty virtual machine.

17. The device according to claim 10 or 12, characterized in that, when the device is the management host node,

The judging unit comprises:

A seventh heartbeat detection module, configured to determine, after receiving a third failure message, that a faulty virtual machine exists, wherein the third failure message carries the identification information of the faulty virtual machine;

The processing unit comprises:

A third high-availability module, configured to receive a virtual-machine failure message and send a restart-virtual-machine request, wherein the virtual-machine failure message is sent after the third failure message is received, and both the virtual-machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;

A third resource management module, configured to send the configuration information of the faulty virtual machine to the node where the faulty virtual machine resides, and to instruct that node to rerun the configuration information to restart the faulty virtual machine.

18. The device according to claim 10 or 15, characterized in that, when the device is a slave node,

The judging unit comprises:

A virtual machine agent module, configured to send heartbeat messages while the corresponding virtual machine is normal, and not to send heartbeat messages when the virtual machine is faulty;

An eighth heartbeat detection module, configured to determine, according to how the heartbeat messages are being sent, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the slave node has stopped, and to determine the virtual machine whose heartbeat has stopped as the faulty virtual machine;

The processing unit comprises:

A fifth resource management module, configured to receive the configuration information of the faulty virtual machine sent by the management host node, and to rerun the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management host node after receiving a third failure message; the third failure message is sent by the common host node after receiving a virtual-machine failure message, and carries the identification information of the faulty virtual machine; and the virtual-machine failure message is sent by the slave node after detecting that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.

19. A virtual cluster system, characterized by comprising:

A peer-to-peer architecture is adopted between the host nodes in different subregions;

A star architecture is adopted between the host node and the slave nodes in each subregion;

20. The system according to claim 19, characterized in that,

The common host node is the device according to claim 11, and the management host node is the device according to claim 12;

Or,

The common host node is the device according to claim 13, the management host node is the device according to claim 14, and the slave node is the device according to claim 15;

Or,

The common host node is the device according to claim 16, the management host node is the device according to claim 17, and the slave node is the device according to claim 18.