CN110377487A - A method and device for handling split-brain in a high-availability cluster - Google Patents


Info

Publication number
CN110377487A
CN110377487A (application CN201910622641.3A)
Authority
CN
China
Prior art keywords
node
watchdog
watchdog component
high-availability cluster
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910622641.3A
Other languages
Chinese (zh)
Inventor
吴业亮 (Wu Yeliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huayun Data Technology Service Co Ltd
Original Assignee
Wuxi Huayun Data Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Huayun Data Technology Service Co Ltd
Priority to CN201910622641.3A
Publication of CN110377487A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 — Error detection; error correction; monitoring
    • G06F 11/3006 — Monitoring arrangements specially adapted to the computing system or component being monitored, where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3055 — Monitoring arrangements for monitoring the status of the computing system or of a component, e.g. whether the computing system is on, off, available, not available
    • G06F 11/3433 — Recording or statistical evaluation of computer activity for performance assessment, for load management
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/505 — Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F 9/5083 — Techniques for rebalancing the load in a distributed system

Abstract

The present invention provides a method and device for handling split-brain in a high-availability cluster. The method comprises: deploying a watchdog component on each node of the high-availability cluster; detecting, through the watchdog component, the status data of the service unit in the corresponding node; establishing synchronized links between the watchdog components deployed on the nodes; and, when a watchdog component detects that the status data of the service unit in its node is abnormal, triggering a freeze event against that node. The disclosed method and device achieve the technical effect of promptly discovering split-brain caused by the failure of a cluster node, and can synchronize data between the nodes after the failed node recovers, thereby guaranteeing strong consistency of the data on every node of the high-availability cluster and avoiding disorderly contention for permissions and resources.

Description

A method and device for handling split-brain in a high-availability cluster
Technical field
The present invention relates to the field of cloud computing technology, and more particularly to a method for handling split-brain in a high-availability cluster and a device for handling split-brain in a high-availability cluster.
Background technique
Split-brain refers to the phenomenon in a high-availability (High Availability, HA) system where, when the connection between two nodes is broken, a system that was originally a single whole splits into two independent nodes; the two nodes then begin to contend for shared resources, causing system disorder and data corruption. With the rapid development of the Internet and cloud computing, and the continual growth of user-driven traffic, requirements on the reliability and performance of services keep rising. In real production environments most clusters are highly available; once split-brain occurs, multiple nodes in the cluster take on the master role, and multiple nodes that were originally slaves can each perform operations, such as writes, that should only be performed on the master node, so the data on the nodes becomes inconsistent.
In the prior art, the common practice for handling split-brain in a cluster or high-availability cluster is to shut down and restart the disconnected node, restore the node's initial environment, and then rejoin the recovered node to the cluster so it can provide service again. When the high-availability cluster recovers, the data on the nodes may be inconsistent; at that point the inconsistency can only be corrected by manually merging the data across nodes, judging which node is the newest, and copying the data of the newest node to the other nodes, so as to guarantee strong consistency of the data between the nodes.
However, the above prior art achieves consistency of data between master and slave nodes through manual intervention, which requires considerable manpower after split-brain occurs. It cannot intervene at the first moment split-brain appears in the cluster to prevent the resulting data inconsistency between nodes, nor can it keep the inter-node data divergence caused by split-brain from growing.
Summary of the invention
The object of the present invention is to disclose a method for handling split-brain in a high-availability cluster, so as to discover split-brain immediately and intervene in advance, and to synchronize data between the nodes after split-brain occurs, thereby guaranteeing strong consistency of the data on every node of the high-availability cluster. Based on the same inventive idea, the invention also discloses a device for handling split-brain in a high-availability cluster.
To achieve the first object, the present invention provides a method for handling split-brain in a high-availability cluster, comprising:
deploying a watchdog component on each node of the high-availability cluster;
detecting the status data of the service unit in the corresponding node through the watchdog component, and establishing synchronized links between the watchdog components deployed on the nodes of the cluster;
when a watchdog component detects that the status data of the service unit in its node is abnormal, the watchdog component triggers a freeze event against that node.
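The three claimed steps can be sketched in plain Python. The names `Node`, `Watchdog`, `status`, and `frozen` are illustrative stand-ins for the cluster node, the watchdog component, the service unit's status data, and the freeze event; this is a minimal model, not an implementation of the patent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a cluster node and its service unit's status data."""
    name: str
    status: str = "ok"    # "ok" or any other value, which models an anomaly
    frozen: bool = False  # models the freeze event


class Watchdog:
    """Minimal sketch of steps S1-S3: one watchdog per node, linked to the
    watchdogs of the other nodes, freezing its node on abnormal status data."""

    def __init__(self, node: Node):
        self.node = node
        self.peers = []  # watchdogs on the other nodes (synchronized links)

    def link(self, other: "Watchdog") -> None:
        # Step S2: establish a synchronized link between two watchdogs.
        self.peers.append(other)
        other.peers.append(self)

    def check(self) -> bool:
        # Step S3: abnormal status data triggers a freeze event, cutting the
        # node off from further data synchronization.
        if self.node.status != "ok":
            self.node.frozen = True
        return self.node.frozen
```

A healthy node is left untouched by `check()`; only a node whose status data is abnormal gets frozen.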
As a further improvement of the present invention, after the watchdog component triggers the freeze event against the corresponding node, the method further comprises:
deleting the data on the node whose status data is abnormal; reading, at least through the watchdog component, the flag-bit field of the service unit in the corresponding node; determining the current master node of the high-availability cluster by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of the cluster.
As a further improvement of the present invention, after the watchdog component triggers the freeze event against the corresponding node, the method further comprises:
deleting the data on the node whose status data is abnormal; notifying the HA component deployed in the corresponding node; reading, through the HA component, the flag-bit field of the service unit in the corresponding node; determining the current master node of the high-availability cluster by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of the cluster.
As a further improvement of the present invention, the service unit is a database, a server, or a load-balancing component;
the watchdog component is a kernel-level watchdog component;
the load-balancing component is Apache, Nginx, LVS, or HAProxy.
As a further improvement of the present invention, the service unit is a database, a server, or a load-balancing component;
the watchdog component is a user-space-level watchdog component;
the load-balancing component is Apache, Nginx, LVS, or HAProxy.
As a further improvement of the present invention, the synchronized links between the watchdog components deployed on the nodes of the high-availability cluster are established at least through a heartbeat mechanism.
As a further improvement of the present invention, the heartbeat mechanism is implemented with heartbeat, corosync, keepalived, or cman.
As a further improvement of the present invention, the high-availability cluster communicates in master-slave mode, symmetric mode, or multi-machine mode.
As a further improvement of the present invention, the watchdog component consists of a monitoring module and a reset module;
the monitoring module detects the status data of the service unit in the corresponding node, and when the monitoring module detects that the status data of the service unit in the corresponding node is abnormal, it calls the reset module, which performs a reset operation on the node whose status data is abnormal.
As a further improvement of the present invention, after the monitoring module calls the reset module, the method further comprises:
initiating a modify-node-data request to the current master node, and synchronizing the data corresponding to the request to those slave nodes of the high-availability cluster whose status data is not abnormal.
As a further improvement of the present invention, after the reset module performs the reset operation on the node whose status data is abnormal, the method further comprises:
deleting the data corresponding to the modify-node-data request that was initiated to the current master node after the monitoring module called the reset module.
Based on the same inventive idea, to achieve the second object, the present invention provides a device for handling split-brain in a high-availability cluster, comprising:
watchdog components deployed on each node of the high-availability cluster;
the watchdog component detects the status data of the service unit in the corresponding node; synchronized links are established between the watchdog components deployed on the nodes of the cluster; and when a watchdog component detects that the status data of the service unit in its node is abnormal, it triggers a freeze event against that node.
Compared with the prior art, the beneficial effects of the present invention are: the disclosed method and device for handling split-brain in a high-availability cluster achieve the technical effect of promptly discovering split-brain caused by the failure of a cluster node, and can synchronize data between the nodes after the failed node recovers, thereby guaranteeing strong consistency of the data on every node of the cluster and avoiding disorderly contention for permissions and resources.
Detailed description of the invention
Fig. 1 is a topology diagram of a cluster running the method of the present invention with HA components configured;
Fig. 2 is a topology diagram of a cluster running the method of the present invention without HA components;
Fig. 3 is a schematic diagram, in the scenario of Fig. 2, of the watchdog component triggering a freeze event against the corresponding node and, after the failed node recovers, of the current data of the master node being synchronized to the other nodes of the cluster;
Fig. 4 is a flowchart of the method for handling split-brain in a high-availability cluster of the present invention.
Specific embodiment
The present invention is described in detail with reference to the embodiments shown in the accompanying drawings, but it should be stated that these embodiments do not limit the invention; any equivalent transformation or substitution in function, method, or structure made by those of ordinary skill in the art according to these embodiments falls within the scope of protection of the invention.
Before elaborating the specific implementation of the disclosed method and device for handling split-brain in a high-availability cluster, a necessary explanation of the causes of split-brain and of other relevant technical terms is given.
Split-brain arises from cluster partitioning: a node in the cluster or high-availability cluster temporarily stops responding to heartbeats from the other nodes, because its processor is busy or for other reasons, and is therefore presumed to have failed. Although the node is still active, the other nodes incorrectly assume it is "dead" and contend for access to shared resources (such as shared storage), so the cluster splits into two isolated parts. A cluster is a group of cooperating service entities that provides a service platform with more scalability and availability than a single service entity; to the client (Client), a cluster is one service entity, although it is in fact composed of a group of service entities. Therefore, in this description, "high-availability cluster" and "cluster" can be understood as technical equivalents.
Embodiment one:
Referring to Fig. 4, a specific embodiment of the method for handling split-brain in a high-availability cluster comprises the following steps:
Step S1: deploy a watchdog component on each node of the high-availability cluster.
As shown in Fig. 1, the high-availability cluster (or cluster) 100 is configured with node 1, node 2, and node 3. These three nodes are an exemplary illustration of the nodes deployed in cluster 100. While cluster 100 provides services or responds to users, one of node 1, node 2, and node 3 is defined as the master node (Master), and the other two are defined as slave nodes (Slave).
Database 31, as one form of the service unit, is deployed in node 1; database 32, as one form of the service unit, in node 2; and database 33, as one form of the service unit, in node 3. In the embodiments disclosed herein, node 1 serves as the master node (Master) before split-brain occurs, and nodes 2 and 3 serve as slave nodes (Slave). The service unit may also be configured as another component of the high-availability cluster, such as a server or a load-balancing component; servers include but are not limited to virtual machine servers, web servers, application servers, containers, etc.
More specifically, in this embodiment the load-balancing component is Apache, Nginx, LVS, or HAProxy. Cluster 100 or cluster 100a interacts with the client (Client) through a RESTful API configured on top, establishing unidirectional or bidirectional access operations between the client and the service unit. The access operations may be reads, writes, modifications, migrations, image-file backups, and the like, and are realized by the specific functions of the service unit. Since the service unit is not the inventive point of the present invention, it is not expanded upon in this application.
Further, the service unit may also be configured as a data center, a cache, an MQ, an infrastructure service, or a third-party service. As shown in Fig. 1 and Fig. 2, when the service unit is configured as a database (DB), HA components 11, 12, and 13 may either be configured in the respective nodes (Fig. 1) or omitted (Fig. 2). If the service unit is configured as a server, the solution of Fig. 1 must be used, with HA components 11-13 configured in the nodes.
Exemplarily, the HA component may be vSphere YARN HA, and a suitable HA component may be selected according to the architecture of cluster 100 or cluster 100a.
This embodiment is elaborated with the scenario of Fig. 2. Watchdog component 21 is configured in node 1 and detects the status data of database 31; watchdog component 22 is configured in node 2 and detects the status data of database 32; watchdog component 23 is configured in node 3 and detects the status data of database 33. Cluster 100a communicates in master-slave, symmetric, or multi-machine mode; master-slave mode is chosen in this embodiment, so node 1 forms a master-slave relationship with nodes 2 and 3. Cluster 100a mainly realizes automatic detection of failures (Auto-Detect), automatic switching/failover (Fail Over), and automatic recovery (Fail Back).
When no split-brain occurs, cluster 100a monitors the status data of the databases in nodes 1, 2, and 3 through the watchdog components deployed in each node; when a data or configuration modification occurs in database 31 of master node 1, the data corresponding to that modification is synchronized to database 32 in node 2 and database 33 in node 3.
Step S2: detect the status data of the service unit in the corresponding node through the watchdog component, and establish synchronized links between the watchdog components deployed on the nodes of the cluster.
Synchronized links are established between watchdog components 21, 22, and 23. Preferably, the links between the watchdog components deployed on the nodes of cluster 100a are established at least through a heartbeat mechanism; specifically, the heartbeat mechanism is implemented with heartbeat, corosync, keepalived, or cman. Watchdog components 21-23 thus each detect the status data of databases 31-33 deployed in nodes 1-3, so as to determine whether the status data of the databases is consistent. Each watchdog component periodically monitors the status data of the database in its own node and synchronizes with the watchdog components configured on the other nodes, avoiding data inconsistency between the databases configured as service units on different nodes of cluster 100a and thereby preventing split-brain.
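The heartbeat link between watchdog components can be sketched with a simple last-seen/timeout model. This is only an illustration of the timeout logic — real clusters would delegate it to heartbeat, corosync, keepalived, or cman as named above — and the `now` parameter exists purely so the logic can be exercised deterministically.

```python
import time
from typing import Dict, Optional


class HeartbeatLink:
    """Toy heartbeat between watchdog components: a peer is presumed failed
    once no beat has arrived from it within the timeout window."""

    def __init__(self, timeout: float = 1.0):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, peer: str, now: Optional[float] = None) -> None:
        # Record the most recent heartbeat received from a peer watchdog.
        self.last_seen[peer] = time.monotonic() if now is None else now

    def alive(self, peer: str, now: Optional[float] = None) -> bool:
        # Alive only if we have heard from the peer within the timeout.
        now = time.monotonic() if now is None else now
        return peer in self.last_seen and now - self.last_seen[peer] <= self.timeout
```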
Step S3: when a watchdog component detects that the status data of the service unit in its node is abnormal, the watchdog component triggers a freeze event against that node.
An abnormality in the status data of the service unit can be caused by network disconnection, machine crashes, and similar reasons. Suppose watchdog component 22 detects that the status data of database 32 in node 2 is abnormal; it then triggers a freeze event against node 2, so as to break the data synchronization between node 2 and node 1. By triggering the freeze event against node 2 through watchdog component 22, split-brain of the whole cluster 100a caused by the failure of node 2 can be discovered in time. Based on the freeze event, node 2's data synchronization with the other nodes of cluster 100a is cut off, preventing the continuous synchronized update of data from database 31 in master node 1 to databases 32 and 33, and thus preventing split-brain from further spreading and worsening within cluster 100a.
Meanwhile in the present embodiment, house dog component 22 trigger to node 2 freeze event after, execute deletion state The data in node 21 that data are abnormal, and will be as in the database 21 of the node 1 of host node after the recovery of node 2 Current data executes data simultaneously operating, to be written in the database 32 of node 2.Simultaneously operating of the database 31 to database 32 After completing afterwards, detected respectively by house dog component 21 and house dog component 22, and in house dog component 21 and house dog Component 22 executes simultaneously operating.
Meanwhile in the present embodiment, house dog component 21~23 is kernel level house dog component.Kernel level house dog component Multiple user class multiplexings may be implemented, can create, cancel and dispatch the technical effect of these user-level threads.Kernel level House dog component can be hardware circuit in realization and be also possible to software timer.House dog component 21~23 can be in system Reset automatically system when failure.Under linux kernel, the basic functional principle of watchdog is: working as watchdog After (i.e. house dog component) starting (after i.e./dev/watchdog equipment is opened), if in the time interval of a certain setting/ Dev/watchdog is not performed write operation, and hardware watchdog circuit or software timer will restarting systems.By This, a kind of revealed method for handling high-availability cluster fissure, realizes in high-availability cluster 100a through this embodiment Each node in the anticipation in advance for fissure occur, and ensure each in high-availability cluster 100a after the node of failure restores The strong consistency of a node data.
Embodiment two:
This embodiment discloses a variation of the method for handling split-brain in a high-availability cluster (hereinafter, the method).
Compared with embodiment one, the main difference of the method disclosed in this embodiment is that, in the scenario of Fig. 2, after watchdog component 22 triggers the freeze event against the corresponding node, the method further comprises: deleting the data in node 2 whose status data is abnormal; reading, at least through watchdog component 22, the flag-bit field of the service unit (i.e., database 32) in the corresponding node (i.e., node 2); determining the current master node of cluster 100a by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of cluster 100a.
In the scenario of Fig. 1, after watchdog component 22 triggers the freeze event against the corresponding node, the method further comprises: deleting the data in node 2 whose status data is abnormal; notifying the HA component deployed in node 2, i.e., HA component 12; reading, through HA component 12, the flag-bit field of the service unit (i.e., database 32) in the corresponding node (i.e., node 2); determining the current master node of cluster 100a by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node (i.e., node 1) to the other nodes of cluster 100a (i.e., nodes 2 and 3).
In either scenario — that of Fig. 1 or that of Fig. 2 — after the failed node 2 recovers, the following mechanism is used to elect a new master node (Master).
Watchdog components 21-23 each read the flag-bit fields in nodes 1-3. If the flag-bit fields of both node 1 and node 3 contain the keyword "Master", the timestamps in the flag-bit fields are compared and the flag-bit field with the newest timestamp is deleted; only the node whose flag-bit field carries the earlier timestamp (i.e., node 1) is elected as the master node of cluster 100 or 100a in the current state.
The above technical solution efficiently solves the problem of multiple nodes contending for the master role after a failed node recovers, further guaranteeing strong consistency of the data of cluster 100 or 100a.
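The election mechanism can be sketched as follows, modelling each flag-bit field as a `(keyword, timestamp)` pair — an assumed representation, since the patent does not fix the field's concrete layout. Among nodes whose field contains the keyword "Master", the newest claim is discarded and the node with the earlier timestamp keeps the master role.

```python
from typing import Dict, Optional, Tuple


def elect_master(flags: Dict[str, Tuple[str, int]]) -> Optional[str]:
    """flags maps a node name to its flag-bit field, modelled as
    (keyword, timestamp). Returns the elected master, or None if no
    node claims the Master role."""
    claimants = {node: ts for node, (kw, ts) in flags.items() if kw == "Master"}
    if not claimants:
        return None
    # The claim with the NEWEST timestamp is discarded; the node with the
    # earlier timestamp is elected master.
    return min(claimants, key=claimants.get)
```

For example, if node 1 and node 3 both claim Master after recovery, node 1's earlier claim wins.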
Referring to Fig. 3, the watchdog component consists of a monitoring module and a reset module. The applicant takes watchdog component 21 in node 1 as an example.
Watchdog component 21 consists of monitoring module 211 and reset module 212, and watchdog components 22 and 23 in nodes 2 and 3 have the same logical structure. Specifically, in this embodiment, monitoring module 211 detects the status data of the service unit in the corresponding node; when monitoring module 211 detects that the status data of the service unit (i.e., database 32) in the corresponding node (i.e., node 2) is abnormal, it calls reset module 212, and reset module 212 performs a reset operation on the node whose status data is abnormal. Meanwhile, after monitoring module 211 calls reset module 212, the method further comprises: initiating a modify-node-data request to the current master node (i.e., node 1), and synchronizing the data corresponding to the request to the slave node of cluster 100a whose status data is not abnormal (i.e., node 3). After reset module 212 performs the reset operation on the node whose status data is abnormal (i.e., node 2), the method further comprises: deleting the data corresponding to the modify-node-data request that was initiated to the current master node (i.e., node 1) after monitoring module 211 called reset module 212, so as to reduce the generation of redundant data in cluster 100a and further improve the stability of cluster 100a and the consistency of its data after fault recovery.
Meanwhile in the present embodiment, which is user space level house dog component.Opposite embodiment One revealed kernel level house dog component, if using user space level house dog component, it is a technical advantage that: line Journey switch than kernel level house dog component thread switch speed faster, and support more multithreading, and allow each process have from The dispatching algorithm of oneself customization, therefore there is more fine and smooth Control granularity.
Referring to Fig. 3, the applicant describes in detail the whole process in which, in the scenario of Fig. 2, the watchdog component triggers a freeze event against the corresponding node and, after the failed node recovers, the current data of the master node is synchronized to the other nodes of the high-availability cluster. Steps ① to ⑥ in Fig. 3 give a relatively clear description of the method. Cluster 100a consists of three nodes: node 1 is the master node, and nodes 2 and 3 are slave nodes. Each node hosts a watchdog component 21 composed of monitoring module 211 and reset module 212, and a database 31-33 (or a service unit of another form) configured on it.
Monitoring module 211 in node 1 reads database 31 in real time; meanwhile, the monitoring module in node 2 (not specifically shown, for brevity) reads database 32 in real time, so as to determine the synchronization state between databases 31 and 32. If the monitoring module in node 2 finds that node 2 has failed, reset module 212 is called to reset the faulty node, monitoring module 211 in node 1 is notified, and steps ① to ⑥ are performed.
It should be noted that, in this embodiment, each node can only reset the other nodes of cluster 100a. When a reset operation needs to be performed on a node because it has failed, that node is already disabled, so the reset module of the watchdog component in that node cannot be called to reset the node itself. Meanwhile, the master node is never reset, because the master node provides data write operations so as to ensure strong consistency between the data on the other nodes of cluster 100a and the data saved by the master node.
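The reset policy just described reduces to a small predicate (the node names used below are illustrative): a node may not reset itself, and the master is never a valid reset target.

```python
def may_reset(requester: str, target: str, master: str) -> bool:
    """Reset policy from the description above: a failed node is already
    disabled and so cannot reset itself, and the master node - which
    provides data writes to keep the cluster strongly consistent - is
    never reset."""
    return target != requester and target != master
```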
As shown in Fig. 3, if node 2 fails, the data in node 1 can no longer be synchronized to node 2. Steps ① to ⑥ below describe in detail how the method contains the further spread of split-brain, and the treatment process after the failed node recovers.
Step ①: the monitoring module 211 of node 1 finds, by reading the status data of the high-availability cluster 100a, that node 2 has failed.
Step ②: the monitoring module 211 in node 1 calls the reset module 212.
Step ③: the reset module 212 first writes a piece of data into database 31 in the database cluster; this data indicates that the reset module 212 of the watchdog component 21 in node 1 is resetting node 2.
Step ④: the data written on node 1 is immediately synchronized to the watchdog component 23 configured on node 3. The monitoring module configured in the watchdog component 23 of node 3 (not shown; see the watchdog component 21 in node 1) reads this information and finds that the reset module 212 of the watchdog component 21 in node 1 is resetting node 2. During this process, node 3 remains silent, i.e., no data operation occurs on the database 33 in node 3.
Step ⑤: the reset module 212 of the watchdog component 21 in node 1 resets node 2 by performing a data reset operation on database 32; the reset operation may consist of deleting the data of database 32 in node 2 and synchronizing it anew from the data in database 31 in node 1.
Step ⑥: the reset module 212 of the watchdog component 21 on node 1 deletes from the high-availability cluster 100a the data generated in step ③, so as to prevent misjudgment the next time a split-brain phenomenon occurs, improving the reliability of the high-availability cluster 100a and the strong consistency of data between the nodes.
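Steps ① to ⑥ can be summarized as the following sketch: the master's watchdog writes a marker row announcing the reset (step ③), the remaining peers see it and stay silent (step ④), the failed node's data is deleted and resynchronized from the master (step ⑤), and the marker is removed again (step ⑥). All names here are illustrative assumptions, not the patent's code.

```python
def handle_node_failure(cluster_db: dict, master_data: dict, failed_node: str):
    """Illustrative walkthrough of steps 3-6 for the three-node cluster 100a.

    cluster_db:  maps node name -> that node's database (a dict)
    master_data: the master's authoritative database (node 1's database 31)
    failed_node: name of the node detected as failed (here node 2)
    """
    markers = []  # replicated marker rows, visible to every watchdog

    # Step 3: write a marker stating that node 1's reset module is resetting node 2.
    marker = {"resetter": "node1", "target": failed_node}
    markers.append(marker)

    # Step 4: the other nodes (node 3) read the marker and remain silent -
    # they perform no data operation while the reset is in progress.
    silent_nodes = [n for n in cluster_db if n not in ("node1", failed_node)]

    # Step 5: reset = delete the failed node's data, then resync from the master.
    cluster_db[failed_node].clear()
    cluster_db[failed_node].update(master_data)

    # Step 6: delete the marker so a later split-brain event is not misjudged.
    markers.remove(marker)

    return silent_nodes, markers

db = {"node1": {"k": 1}, "node2": {"k": 0, "stale": True}, "node3": {"k": 1}}
silent, markers = handle_node_failure(db, db["node1"], "node2")
```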
For the technical solutions that the method disclosed in this embodiment shares with embodiment one, please refer to the description in embodiment one; details are not repeated here.
Embodiment three:
With reference to Figs. 1 to 3, the present invention also discloses a device for handling split brain in a high-availability cluster (hereinafter referred to as the device), which runs the method for handling split brain in a high-availability cluster disclosed in embodiment one and/or embodiment two.
In the present embodiment, the device includes watchdog components (i.e., watchdog components 21 to 23) deployed in each node configured in the high-availability cluster (i.e., nodes 1 to 3). Each watchdog component detects the status data of the business unit in the corresponding node, and synchronization links are established between the watchdog components deployed in the nodes of the high-availability cluster. When a watchdog component detects that the status data of the business unit in the corresponding node is abnormal, the watchdog component triggers a freeze event on the corresponding node.
The device runs the method for handling split brain in a high-availability cluster disclosed in embodiment one and/or embodiment two, so as to achieve the technical effect of discovering in time the split-brain phenomenon caused by the failure of a node configured in the high-availability cluster, and to perform synchronization of the data between the nodes after the failed node recovers, thereby ensuring the strong consistency of the data in each node of the high-availability cluster and avoiding disorderly contention for permissions and resources. For the technical solutions that the device shares with the method disclosed in embodiment one and/or embodiment two, please refer to the description above; details are not repeated here.
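The device's behavior — each watchdog component polls the status data of its node's business unit and triggers a freeze event (i.e., fencing, in common HA terminology) when that status turns abnormal — can be sketched as a simple polling pass. This is a sketch under assumed names (`status_of`, `fence`), not the patent's implementation.

```python
def watchdog_poll(status_of, fence, nodes):
    """One polling pass over the cluster.

    status_of: callable returning "ok" or "abnormal" for a node name
    fence:     callable invoked to trigger a freeze event on a node
    nodes:     names of the nodes the watchdog components watch
    """
    fenced = []
    for node in nodes:
        # Monitoring module: read the business unit's status data.
        if status_of(node) == "abnormal":
            # Trigger the freeze event on the corresponding node.
            fence(node)
            fenced.append(node)
    return fenced

statuses = {"node1": "ok", "node2": "abnormal", "node3": "ok"}
fenced_log = []
result = watchdog_poll(statuses.get, fenced_log.append, list(statuses))
```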
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The series of detailed descriptions listed above are only specific descriptions of feasible embodiments of the present invention; they are not intended to limit the protection scope of the present invention, and all equivalent implementations or changes made without departing from the technical spirit of the present invention shall be included within the protection scope of the present invention.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention may be realized in other specific forms without departing from the spirit or essential attributes of the invention. Therefore, from whichever point of view, the embodiments are to be considered illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is intended that all changes falling within the meaning and scope of the equivalent elements of the claims be included within the present invention. Any reference signs in the claims shall not be construed as limiting the claims involved.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is merely for the sake of clarity. Those skilled in the art should consider the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that can be understood by those skilled in the art.

Claims (12)

1. A method for handling split brain in a high-availability cluster, characterized by comprising:
deploying a watchdog component in the nodes configured in the high-availability cluster;
detecting, by the watchdog component, the status data of the business unit in the corresponding node, and establishing synchronization links between the watchdog components deployed in the nodes of the high-availability cluster;
when the watchdog component detects that the status data of the business unit in the corresponding node is abnormal, triggering, by the watchdog component, a freeze event on the corresponding node.
2. The method according to claim 1, characterized in that, after the watchdog component triggers the freeze event on the corresponding node, the method further comprises:
deleting the data in the node whose status data is abnormal, reading, by the watchdog component, at least the flag bit field of the business unit in the corresponding node, determining the current master node in the high-availability cluster by comparing the newest timestamps in the flag bit fields, and synchronizing the current data in the master node to the other nodes in the high-availability cluster.
3. The method according to claim 1, characterized in that, after the watchdog component triggers the freeze event on the corresponding node, the method further comprises:
deleting the data in the node whose status data is abnormal, notifying the high-availability component deployed in the corresponding node, reading, by the high-availability component, the flag bit field of the business unit in the corresponding node, determining the current master node in the high-availability cluster by comparing the newest timestamps in the flag bit fields, and synchronizing the current data in the master node to the other nodes in the high-availability cluster.
4. The method according to claim 1, characterized in that the business unit is a database, a server, or a load balancing component;
the watchdog component is a kernel-level watchdog component;
the load balancing component is Apache, Nginx, LVS, or HAProxy.
5. The method according to claim 1, characterized in that the business unit is a database, a server, or a load balancing component;
the watchdog component is a user-space-level watchdog component;
the load balancing component is Apache, Nginx, LVS, or HAProxy.
6. The method according to claim 1, characterized in that synchronization links are established at least through a heartbeat mechanism between the watchdog components deployed in the nodes of the high-availability cluster.
7. The method according to claim 6, characterized in that the heartbeat mechanism is implemented by means of heartbeat, corosync, keepalive, or cman.
8. The method according to any one of claims 1 to 7, characterized in that the high-availability cluster communicates in a master-slave mode, a symmetric mode, or a multi-machine mode.
9. The method according to claim 8, characterized in that the watchdog component is composed of a monitoring module and a reset module;
the monitoring module detects the status data of the business unit in the corresponding node; when the monitoring module detects that the status data of the business unit in the corresponding node is abnormal, the reset module is called by the monitoring module, and a reset operation is executed by the reset module on the node whose status data is abnormal.
10. The method according to claim 9, characterized in that, after the monitoring module calls the reset module, the method further comprises:
initiating a modify-node-data request to the current master node, and synchronizing the data corresponding to the request to the slave nodes in the high-availability cluster whose status data is not abnormal.
11. The method according to claim 9, characterized in that, after the reset module executes the reset operation on the node whose status data is abnormal, the method further comprises:
deleting the data corresponding to the modify-node-data request that was initiated to the current master node after the monitoring module called the reset module.
12. A device for handling split brain in a high-availability cluster, characterized by comprising:
a watchdog component deployed in each node configured in the high-availability cluster;
wherein the watchdog component detects the status data of the business unit in the corresponding node, and synchronization links are established between the watchdog components deployed in the nodes of the high-availability cluster; when the watchdog component detects that the status data of the business unit in the corresponding node is abnormal, the watchdog component triggers a freeze event on the corresponding node.
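Claims 2 and 3 both determine the current master node by comparing the newest timestamp found in each node's flag bit field. A minimal sketch of that comparison, under the assumption that each flag bit field yields a list of timestamps (all names and values hypothetical):

```python
def current_master(flag_fields: dict) -> str:
    """flag_fields maps node name -> timestamps read from the business
    unit's flag bit field. The node holding the newest timestamp is
    taken as the current master node of the high-availability cluster."""
    return max(flag_fields, key=lambda node: max(flag_fields[node]))

# Node 1 holds the newest write, so it is determined to be the master.
flags = {
    "node1": [1700000050, 1700000099],
    "node2": [1700000010],
    "node3": [1700000042, 1700000043],
}
master = current_master(flags)
```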
CN201910622641.3A 2019-07-11 2019-07-11 Method and device for handling split brain in a high-availability cluster Pending CN110377487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910622641.3A CN110377487A (en) Method and device for handling split brain in a high-availability cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910622641.3A CN110377487A (en) Method and device for handling split brain in a high-availability cluster

Publications (1)

Publication Number Publication Date
CN110377487A true CN110377487A (en) 2019-10-25

Family

ID=68250876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910622641.3A Pending CN110377487A (en) 2019-07-11 2019-07-11 A kind of method and device handling high-availability cluster fissure

Country Status (1)

Country Link
CN (1) CN110377487A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653734A (en) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 Server cluster real-time master-slave control and data synchronization system and method
CN114528350A (en) * 2022-02-18 2022-05-24 苏州浪潮智能科技有限公司 Cluster split brain processing method, device and equipment and readable storage medium
CN116094940A (en) * 2023-02-15 2023-05-09 北京志凌海纳科技有限公司 VRRP brain crack inhibition method, system, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394914A (en) * 2011-09-22 2012-03-28 浪潮(北京)电子信息产业有限公司 Cluster brain-split processing method and device
CN102521060A (en) * 2011-11-16 2012-06-27 广东新支点技术服务有限公司 Pseudo halt solving method of high-availability cluster system based on watchdog local detecting technique
CN105849702A (en) * 2013-12-25 2016-08-10 日本电气方案创新株式会社 Cluster system, server device, cluster system management method, and computer-readable recording medium
CN107147540A (en) * 2017-07-19 2017-09-08 郑州云海信息技术有限公司 Fault handling method and troubleshooting cluster in highly available system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394914A (en) * 2011-09-22 2012-03-28 浪潮(北京)电子信息产业有限公司 Cluster brain-split processing method and device
CN102521060A (en) * 2011-11-16 2012-06-27 广东新支点技术服务有限公司 Pseudo halt solving method of high-availability cluster system based on watchdog local detecting technique
CN105849702A (en) * 2013-12-25 2016-08-10 日本电气方案创新株式会社 Cluster system, server device, cluster system management method, and computer-readable recording medium
CN107147540A (en) * 2017-07-19 2017-09-08 郑州云海信息技术有限公司 Fault handling method and troubleshooting cluster in highly available system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653734A (en) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 Server cluster real-time master-slave control and data synchronization system and method
CN112653734B (en) * 2020-12-11 2023-09-19 邦彦技术股份有限公司 Real-time master-slave control and data synchronization system and method for server cluster
CN114528350A (en) * 2022-02-18 2022-05-24 苏州浪潮智能科技有限公司 Cluster split brain processing method, device and equipment and readable storage medium
CN114528350B (en) * 2022-02-18 2024-01-16 苏州浪潮智能科技有限公司 Cluster brain fracture processing method, device, equipment and readable storage medium
CN116094940A (en) * 2023-02-15 2023-05-09 北京志凌海纳科技有限公司 VRRP brain crack inhibition method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20200257593A1 (en) Storage cluster configuration change method, storage cluster, and computer system
US11320991B2 (en) Identifying sub-health object storage devices in a data storage system
CN110071821B (en) Method, node and storage medium for determining the status of a transaction log
WO2019154394A1 (en) Distributed database cluster system, data synchronization method and storage medium
CN105933407B (en) method and system for realizing high availability of Redis cluster
WO2021136422A1 (en) State management method, master and backup application server switching method, and electronic device
CN110377487A (en) Method and device for handling split brain in a high-availability cluster
JP2007279890A (en) Backup system and method
CN102394914A (en) Cluster brain-split processing method and device
CN105471622A (en) High-availability method and system for main/standby control node switching based on Galera
WO2012097588A1 (en) Data storage method, apparatus and system
CN111460039A (en) Relational database processing system, client, server and method
JP2012173996A (en) Cluster system, cluster management method and cluster management program
CN106919473A (en) A kind of data disaster recovery and backup systems and method for processing business
CN108173971A (en) A kind of MooseFS high availability methods and system based on active-standby switch
CN112527567A (en) System disaster tolerance method, device, equipment and storage medium
CN114138732A (en) Data processing method and device
CN107071189B (en) Connection method of communication equipment physical interface
CN105824571A (en) Data seamless migration method and device
CN107357800A (en) A kind of database High Availabitity zero loses solution method
WO2021115043A1 (en) Distributed database system and data disaster backup drilling method
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
CN112887367B (en) Method, system and computer readable medium for realizing high availability of distributed cluster
CN116185697A (en) Container cluster management method, device and system, electronic equipment and storage medium
JP6511737B2 (en) Redundant system, redundant method and redundant program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination