CN110377487A - A method and device for handling split-brain in a high-availability cluster - Google Patents


Info

Publication number
CN110377487A
CN110377487A (application CN201910622641.3A)
Authority
CN
China
Prior art keywords
node
watchdog
watchdog component
high-availability cluster
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910622641.3A
Other languages
Chinese (zh)
Inventor
吴业亮 (Wu Yeliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huayun Data Technology Service Co Ltd
Original Assignee
Wuxi Huayun Data Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Huayun Data Technology Service Co Ltd
Priority to CN201910622641.3A
Publication of CN110377487A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 — Error detection; error correction; monitoring
    • G06F 11/3006 — Monitoring arrangements specially adapted to the computing system or component being monitored, where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3055 — Monitoring arrangements for monitoring the status of the computing system or of a component, e.g. whether the computing system is on, off, available, not available
    • G06F 11/3433 — Recording or statistical evaluation of computer activity for performance assessment, for load management
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/505 — Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F 9/5083 — Techniques for rebalancing the load in a distributed system

Abstract

The present invention provides a method and device for handling split-brain in a high-availability cluster. The method comprises: deploying a watchdog component on each node of the high-availability cluster; detecting, through the watchdog component, the status data of the service unit in the corresponding node; establishing synchronized links between the watchdog components deployed on the nodes; and, when a watchdog component detects that the status data of the service unit in its node is abnormal, triggering a freeze event against that node. The disclosed method and device achieve the technical effect of promptly discovering split-brain caused by the failure of a cluster node, and can synchronize data between the nodes after the failed node recovers, thereby guaranteeing strong consistency of the data on every node of the high-availability cluster and avoiding disorderly contention for permissions and resources.

Description

A method and device for handling split-brain in a high-availability cluster
Technical field
The present invention relates to the field of cloud computing technology, and more particularly to a method for handling split-brain in a high-availability cluster and a device for handling split-brain in a high-availability cluster.
Background technique
Split-brain refers to the phenomenon in a high-availability (High Availability, HA) system where, when the connection between two nodes is broken, a system that was originally a single whole splits into two independent nodes; the two nodes then begin to contend for shared resources, causing system disorder and data corruption. With the rapid development of the Internet and cloud computing, and the continual growth of user-driven traffic, requirements on the reliability and performance of services keep rising. In real production environments most clusters are highly available; once split-brain occurs, multiple nodes in the cluster take on the master role, and multiple nodes that were originally slaves can each perform operations, such as writes, that should only be performed on the master node, so the data on the nodes becomes inconsistent.
In the prior art, the common practice for handling split-brain in a cluster or high-availability cluster is to shut down and restart the disconnected node, restore the node's initial environment, and then rejoin the recovered node to the cluster so it can provide service again. When the high-availability cluster recovers, the data on the nodes may be inconsistent; at that point the inconsistency can only be corrected by manually merging the data across nodes, judging which node is the newest, and copying the data of the newest node to the other nodes, so as to guarantee strong consistency of the data between the nodes.
However, the above prior art achieves consistency of data between master and slave nodes through manual intervention, which requires considerable manpower after split-brain occurs. It cannot intervene at the first moment split-brain appears in the cluster to prevent the resulting data inconsistency between nodes, nor can it keep the inter-node data divergence caused by split-brain from growing.
Summary of the invention
The object of the present invention is to disclose a method for handling split-brain in a high-availability cluster, so as to discover split-brain immediately and intervene in advance, and to synchronize data between the nodes after split-brain occurs, thereby guaranteeing strong consistency of the data on every node of the high-availability cluster. Based on the same inventive idea, the invention also discloses a device for handling split-brain in a high-availability cluster.
To achieve the first object, the present invention provides a method for handling split-brain in a high-availability cluster, comprising:
deploying a watchdog component on each node of the high-availability cluster;
detecting the status data of the service unit in the corresponding node through the watchdog component, and establishing synchronized links between the watchdog components deployed on the nodes of the cluster;
when a watchdog component detects that the status data of the service unit in its node is abnormal, the watchdog component triggers a freeze event against that node.
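The three claimed steps can be sketched in plain Python. The names `Node`, `Watchdog`, `status`, and `frozen` are illustrative stand-ins for the cluster node, the watchdog component, the service unit's status data, and the freeze event; this is a minimal model, not an implementation of the patent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a cluster node and its service unit's status data."""
    name: str
    status: str = "ok"    # "ok" or any other value, which models an anomaly
    frozen: bool = False  # models the freeze event


class Watchdog:
    """Minimal sketch of steps S1-S3: one watchdog per node, linked to the
    watchdogs of the other nodes, freezing its node on abnormal status data."""

    def __init__(self, node: Node):
        self.node = node
        self.peers = []  # watchdogs on the other nodes (synchronized links)

    def link(self, other: "Watchdog") -> None:
        # Step S2: establish a synchronized link between two watchdogs.
        self.peers.append(other)
        other.peers.append(self)

    def check(self) -> bool:
        # Step S3: abnormal status data triggers a freeze event, cutting the
        # node off from further data synchronization.
        if self.node.status != "ok":
            self.node.frozen = True
        return self.node.frozen
```

A healthy node is left untouched by `check()`; only a node whose status data is abnormal gets frozen.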
As a further improvement of the present invention, after the watchdog component triggers the freeze event against the corresponding node, the method further comprises:
deleting the data on the node whose status data is abnormal; reading, at least through the watchdog component, the flag-bit field of the service unit in the corresponding node; determining the current master node of the high-availability cluster by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of the cluster.
As a further improvement of the present invention, after the watchdog component triggers the freeze event against the corresponding node, the method further comprises:
deleting the data on the node whose status data is abnormal; notifying the HA component deployed in the corresponding node; reading, through the HA component, the flag-bit field of the service unit in the corresponding node; determining the current master node of the high-availability cluster by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of the cluster.
As a further improvement of the present invention, the service unit is a database, a server, or a load-balancing component;
the watchdog component is a kernel-level watchdog component;
the load-balancing component is Apache, Nginx, LVS, or HAProxy.
As a further improvement of the present invention, the service unit is a database, a server, or a load-balancing component;
the watchdog component is a user-space-level watchdog component;
the load-balancing component is Apache, Nginx, LVS, or HAProxy.
As a further improvement of the present invention, the synchronized links between the watchdog components deployed on the nodes of the high-availability cluster are established at least through a heartbeat mechanism.
As a further improvement of the present invention, the heartbeat mechanism is implemented with heartbeat, corosync, keepalived, or cman.
As a further improvement of the present invention, the high-availability cluster communicates in master-slave mode, symmetric mode, or multi-machine mode.
As a further improvement of the present invention, the watchdog component consists of a monitoring module and a reset module;
the monitoring module detects the status data of the service unit in the corresponding node, and when the monitoring module detects that the status data of the service unit in the corresponding node is abnormal, it calls the reset module, which performs a reset operation on the node whose status data is abnormal.
As a further improvement of the present invention, after the monitoring module calls the reset module, the method further comprises:
initiating a modify-node-data request to the current master node, and synchronizing the data corresponding to the request to those slave nodes of the high-availability cluster whose status data is not abnormal.
As a further improvement of the present invention, after the reset module performs the reset operation on the node whose status data is abnormal, the method further comprises:
deleting the data corresponding to the modify-node-data request that was initiated to the current master node after the monitoring module called the reset module.
Based on the same inventive idea, to achieve the second object, the present invention provides a device for handling split-brain in a high-availability cluster, comprising:
watchdog components deployed on each node of the high-availability cluster;
the watchdog component detects the status data of the service unit in the corresponding node; synchronized links are established between the watchdog components deployed on the nodes of the cluster; and when a watchdog component detects that the status data of the service unit in its node is abnormal, it triggers a freeze event against that node.
Compared with the prior art, the beneficial effects of the present invention are: the disclosed method and device for handling split-brain in a high-availability cluster achieve the technical effect of promptly discovering split-brain caused by the failure of a cluster node, and can synchronize data between the nodes after the failed node recovers, thereby guaranteeing strong consistency of the data on every node of the cluster and avoiding disorderly contention for permissions and resources.
Detailed description of the invention
Fig. 1 is a topology diagram of a cluster running the method of the present invention with HA components configured;
Fig. 2 is a topology diagram of a cluster running the method of the present invention without HA components;
Fig. 3 is a schematic diagram, in the scenario of Fig. 2, of the watchdog component triggering a freeze event against the corresponding node and, after the failed node recovers, of the current data of the master node being synchronized to the other nodes of the cluster;
Fig. 4 is a flowchart of the method for handling split-brain in a high-availability cluster of the present invention.
Specific embodiment
The present invention is described in detail with reference to the embodiments shown in the accompanying drawings, but it should be stated that these embodiments do not limit the invention; any equivalent transformation or substitution in function, method, or structure made by those of ordinary skill in the art according to these embodiments falls within the scope of protection of the invention.
Before elaborating the specific implementation of the disclosed method and device for handling split-brain in a high-availability cluster, a necessary explanation of the causes of split-brain and of other relevant technical terms is given.
Split-brain arises from cluster partitioning: a node in the cluster or high-availability cluster temporarily stops responding to heartbeats from the other nodes, because its processor is busy or for other reasons, and is therefore presumed to have failed. Although the node is still active, the other nodes incorrectly assume it is "dead" and contend for access to shared resources (such as shared storage), so the cluster splits into two isolated parts. A cluster is a group of cooperating service entities that provides a service platform with more scalability and availability than a single service entity; to the client (Client), a cluster is one service entity, although it is in fact composed of a group of service entities. Therefore, in this description, "high-availability cluster" and "cluster" can be understood as technical equivalents.
Embodiment one:
Referring to Fig. 4, a specific embodiment of the method for handling split-brain in a high-availability cluster comprises the following steps:
Step S1: deploy a watchdog component on each node of the high-availability cluster.
As shown in Fig. 1, the high-availability cluster (or cluster) 100 is configured with node 1, node 2, and node 3. These three nodes are an exemplary illustration of the nodes deployed in cluster 100. While cluster 100 provides services or responds to users, one of node 1, node 2, and node 3 is defined as the master node (Master), and the other two are defined as slave nodes (Slave).
Database 31, as one form of the service unit, is deployed in node 1; database 32, as one form of the service unit, in node 2; and database 33, as one form of the service unit, in node 3. In the embodiments disclosed herein, node 1 serves as the master node (Master) before split-brain occurs, and nodes 2 and 3 serve as slave nodes (Slave). The service unit may also be configured as another component of the high-availability cluster, such as a server or a load-balancing component; servers include but are not limited to virtual machine servers, web servers, application servers, containers, etc.
More specifically, in this embodiment the load-balancing component is Apache, Nginx, LVS, or HAProxy. Cluster 100 or cluster 100a interacts with the client (Client) through a RESTful API configured on top, establishing unidirectional or bidirectional access operations between the client and the service unit. The access operations may be reads, writes, modifications, migrations, image-file backups, and the like, and are realized by the specific functions of the service unit. Since the service unit is not the inventive point of the present invention, it is not expanded upon in this application.
Further, the service unit may also be configured as a data center, a cache, an MQ, an infrastructure service, or a third-party service. As shown in Fig. 1 and Fig. 2, when the service unit is configured as a database (DB), HA components 11, 12, and 13 may either be configured in the respective nodes (Fig. 1) or omitted (Fig. 2). If the service unit is configured as a server, the solution of Fig. 1 must be used, with HA components 11-13 configured in the nodes.
Exemplarily, the HA component may be vSphere YARN HA, and a suitable HA component may be selected according to the architecture of cluster 100 or cluster 100a.
This embodiment is elaborated with the scenario of Fig. 2. Watchdog component 21 is configured in node 1 and detects the status data of database 31; watchdog component 22 is configured in node 2 and detects the status data of database 32; watchdog component 23 is configured in node 3 and detects the status data of database 33. Cluster 100a communicates in master-slave, symmetric, or multi-machine mode; master-slave mode is chosen in this embodiment, so node 1 forms a master-slave relationship with nodes 2 and 3. Cluster 100a mainly realizes automatic detection of failures (Auto-Detect), automatic switching/failover (Fail Over), and automatic recovery (Fail Back).
When no split-brain occurs, cluster 100a monitors the status data of the databases in nodes 1, 2, and 3 through the watchdog components deployed in each node; when a data or configuration modification occurs in database 31 of master node 1, the data corresponding to that modification is synchronized to database 32 in node 2 and database 33 in node 3.
Step S2: detect the status data of the service unit in the corresponding node through the watchdog component, and establish synchronized links between the watchdog components deployed on the nodes of the cluster.
Synchronized links are established between watchdog components 21, 22, and 23. Preferably, the links between the watchdog components deployed on the nodes of cluster 100a are established at least through a heartbeat mechanism; specifically, the heartbeat mechanism is implemented with heartbeat, corosync, keepalived, or cman. Watchdog components 21-23 thus each detect the status data of databases 31-33 deployed in nodes 1-3, so as to determine whether the status data of the databases is consistent. Each watchdog component periodically monitors the status data of the database in its own node and synchronizes with the watchdog components configured on the other nodes, avoiding data inconsistency between the databases configured as service units on different nodes of cluster 100a and thereby preventing split-brain.
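The heartbeat link between watchdog components can be sketched with a simple last-seen/timeout model. This is only an illustration of the timeout logic — real clusters would delegate it to heartbeat, corosync, keepalived, or cman as named above — and the `now` parameter exists purely so the logic can be exercised deterministically.

```python
import time
from typing import Dict, Optional


class HeartbeatLink:
    """Toy heartbeat between watchdog components: a peer is presumed failed
    once no beat has arrived from it within the timeout window."""

    def __init__(self, timeout: float = 1.0):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, peer: str, now: Optional[float] = None) -> None:
        # Record the most recent heartbeat received from a peer watchdog.
        self.last_seen[peer] = time.monotonic() if now is None else now

    def alive(self, peer: str, now: Optional[float] = None) -> bool:
        # Alive only if we have heard from the peer within the timeout.
        now = time.monotonic() if now is None else now
        return peer in self.last_seen and now - self.last_seen[peer] <= self.timeout
```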
Step S3: when a watchdog component detects that the status data of the service unit in its node is abnormal, the watchdog component triggers a freeze event against that node.
An abnormality in the status data of the service unit can be caused by network disconnection, machine crashes, and similar reasons. Suppose watchdog component 22 detects that the status data of database 32 in node 2 is abnormal; it then triggers a freeze event against node 2, so as to break the data synchronization between node 2 and node 1. By triggering the freeze event against node 2 through watchdog component 22, split-brain of the whole cluster 100a caused by the failure of node 2 can be discovered in time. Based on the freeze event, node 2's data synchronization with the other nodes of cluster 100a is cut off, preventing the continuous synchronized update of data from database 31 in master node 1 to databases 32 and 33, and thus preventing split-brain from further spreading and worsening within cluster 100a.
Meanwhile in the present embodiment, house dog component 22 trigger to node 2 freeze event after, execute deletion state The data in node 21 that data are abnormal, and will be as in the database 21 of the node 1 of host node after the recovery of node 2 Current data executes data simultaneously operating, to be written in the database 32 of node 2.Simultaneously operating of the database 31 to database 32 After completing afterwards, detected respectively by house dog component 21 and house dog component 22, and in house dog component 21 and house dog Component 22 executes simultaneously operating.
Meanwhile in the present embodiment, house dog component 21~23 is kernel level house dog component.Kernel level house dog component Multiple user class multiplexings may be implemented, can create, cancel and dispatch the technical effect of these user-level threads.Kernel level House dog component can be hardware circuit in realization and be also possible to software timer.House dog component 21~23 can be in system Reset automatically system when failure.Under linux kernel, the basic functional principle of watchdog is: working as watchdog After (i.e. house dog component) starting (after i.e./dev/watchdog equipment is opened), if in the time interval of a certain setting/ Dev/watchdog is not performed write operation, and hardware watchdog circuit or software timer will restarting systems.By This, a kind of revealed method for handling high-availability cluster fissure, realizes in high-availability cluster 100a through this embodiment Each node in the anticipation in advance for fissure occur, and ensure each in high-availability cluster 100a after the node of failure restores The strong consistency of a node data.
Embodiment two:
This embodiment discloses a variation of the method for handling split-brain in a high-availability cluster (hereinafter, the method).
Compared with embodiment one, the main difference of the method disclosed in this embodiment is that, in the scenario of Fig. 2, after watchdog component 22 triggers the freeze event against the corresponding node, the method further comprises: deleting the data in node 2 whose status data is abnormal; reading, at least through watchdog component 22, the flag-bit field of the service unit (i.e., database 32) in the corresponding node (i.e., node 2); determining the current master node of cluster 100a by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node to the other nodes of cluster 100a.
In the scenario of Fig. 1, after watchdog component 22 triggers the freeze event against the corresponding node, the method further comprises: deleting the data in node 2 whose status data is abnormal; notifying the HA component deployed in node 2, i.e., HA component 12; reading, through HA component 12, the flag-bit field of the service unit (i.e., database 32) in the corresponding node (i.e., node 2); determining the current master node of cluster 100a by comparing the newest timestamps in the flag-bit fields; and synchronizing the current data of the master node (i.e., node 1) to the other nodes of cluster 100a (i.e., nodes 2 and 3).
In either scenario — that of Fig. 1 or that of Fig. 2 — after the failed node 2 recovers, the following mechanism is used to elect a new master node (Master).
Watchdog components 21-23 each read the flag-bit fields in nodes 1-3. If the flag-bit fields of both node 1 and node 3 contain the keyword "Master", the timestamps in the flag-bit fields are compared and the flag-bit field with the newest timestamp is deleted; only the node whose flag-bit field carries the earlier timestamp (i.e., node 1) is elected as the master node of cluster 100 or 100a in the current state.
The above technical solution efficiently solves the problem of multiple nodes contending for the master role after a failed node recovers, further guaranteeing strong consistency of the data of cluster 100 or 100a.
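The election mechanism can be sketched as follows, modelling each flag-bit field as a `(keyword, timestamp)` pair — an assumed representation, since the patent does not fix the field's concrete layout. Among nodes whose field contains the keyword "Master", the newest claim is discarded and the node with the earlier timestamp keeps the master role.

```python
from typing import Dict, Optional, Tuple


def elect_master(flags: Dict[str, Tuple[str, int]]) -> Optional[str]:
    """flags maps a node name to its flag-bit field, modelled as
    (keyword, timestamp). Returns the elected master, or None if no
    node claims the Master role."""
    claimants = {node: ts for node, (kw, ts) in flags.items() if kw == "Master"}
    if not claimants:
        return None
    # The claim with the NEWEST timestamp is discarded; the node with the
    # earlier timestamp is elected master.
    return min(claimants, key=claimants.get)
```

For example, if node 1 and node 3 both claim Master after recovery, node 1's earlier claim wins.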
Referring to Fig. 3, the watchdog component consists of a monitoring module and a reset module. The applicant takes watchdog component 21 in node 1 as an example.
Watchdog component 21 consists of monitoring module 211 and reset module 212, and watchdog components 22 and 23 in nodes 2 and 3 have the same logical structure. Specifically, in this embodiment, monitoring module 211 detects the status data of the service unit in the corresponding node; when monitoring module 211 detects that the status data of the service unit (i.e., database 32) in the corresponding node (i.e., node 2) is abnormal, it calls reset module 212, and reset module 212 performs a reset operation on the node whose status data is abnormal. Meanwhile, after monitoring module 211 calls reset module 212, the method further comprises: initiating a modify-node-data request to the current master node (i.e., node 1), and synchronizing the data corresponding to the request to the slave node of cluster 100a whose status data is not abnormal (i.e., node 3). After reset module 212 performs the reset operation on the node whose status data is abnormal (i.e., node 2), the method further comprises: deleting the data corresponding to the modify-node-data request that was initiated to the current master node (i.e., node 1) after monitoring module 211 called reset module 212, so as to reduce the generation of redundant data in cluster 100a and further improve the stability of cluster 100a and the consistency of its data after fault recovery.
Meanwhile in the present embodiment, which is user space level house dog component.Opposite embodiment One revealed kernel level house dog component, if using user space level house dog component, it is a technical advantage that: line Journey switch than kernel level house dog component thread switch speed faster, and support more multithreading, and allow each process have from The dispatching algorithm of oneself customization, therefore there is more fine and smooth Control granularity.
Referring to Fig. 3, the applicant describes in detail the whole process in which, in the scenario of Fig. 2, the watchdog component triggers a freeze event against the corresponding node and, after the failed node recovers, the current data of the master node is synchronized to the other nodes of the high-availability cluster. Steps ① to ⑥ in Fig. 3 give a relatively clear description of the method. Cluster 100a consists of three nodes: node 1 is the master node, and nodes 2 and 3 are slave nodes. Each node hosts a watchdog component 21 composed of monitoring module 211 and reset module 212, and a database 31-33 (or a service unit of another form) configured on it.
Monitoring module 211 in node 1 reads database 31 in real time; meanwhile, the monitoring module in node 2 (not specifically shown, for brevity) reads database 32 in real time, so as to determine the synchronization state between databases 31 and 32. If the monitoring module in node 2 finds that node 2 has failed, reset module 212 is called to reset the faulty node, monitoring module 211 in node 1 is notified, and steps ① to ⑥ are performed.
It should be noted that, in this embodiment, each node can only reset the other nodes of cluster 100a. When a reset operation needs to be performed on a node because it has failed, that node is already disabled, so the reset module of the watchdog component in that node cannot be called to reset the node itself. Meanwhile, the master node is never reset, because the master node provides data write operations so as to ensure strong consistency between the data on the other nodes of cluster 100a and the data saved by the master node.
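The reset policy just described reduces to a small predicate (the node names used below are illustrative): a node may not reset itself, and the master is never a valid reset target.

```python
def may_reset(requester: str, target: str, master: str) -> bool:
    """Reset policy from the description above: a failed node is already
    disabled and so cannot reset itself, and the master node - which
    provides data writes to keep the cluster strongly consistent - is
    never reset."""
    return target != requester and target != master
```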
As shown in Fig. 3, if node 2 fails, the data in node 1 can no longer be synchronized to node 2. Steps ① to ⑥ below describe in detail how the method contains the further spread of split-brain, and the treatment process after the failed node recovers.
Step ①: the monitoring module 211 of node 1 finds, by reading the status data of the high-availability cluster 100a, that node 2 has failed.
Step ②: the monitoring module 211 in node 1 calls the reset module 212.
Step ③: the reset module 212 first writes a piece of data into database 31 in the database cluster; this data indicates that the reset module 212 of the watchdog component 21 in node 1 is resetting node 2.
Step ④: the data written on node 1 is immediately synchronized to the watchdog component 23 configured on node 3. The monitoring module configured in the watchdog component 23 of node 3 (not shown; see the watchdog component 21 in node 1) reads this information and finds that the reset module 212 of the watchdog component 21 in node 1 is resetting node 2. During this process, node 3 remains silent, i.e., no data operation occurs on the database 33 in node 3.
Step ⑤: the reset module 212 of the watchdog component 21 in node 1 resets node 2 by performing a data reset operation on database 32; the reset operation may consist of deleting the data of database 32 in node 2 and synchronizing it anew from the data in database 31 in node 1.
Step ⑥: the reset module 212 of the watchdog component 21 on node 1 deletes from the high-availability cluster 100a the data generated in step ③, so as to prevent misjudgment the next time a split-brain phenomenon occurs, improving the reliability of the high-availability cluster 100a and the strong consistency of data between the nodes.
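Steps ① to ⑥ can be summarized as the following sketch: the master's watchdog writes a marker row announcing the reset (step ③), the remaining peers see it and stay silent (step ④), the failed node's data is deleted and resynchronized from the master (step ⑤), and the marker is removed again (step ⑥). All names here are illustrative assumptions, not the patent's code.

```python
def handle_node_failure(cluster_db: dict, master_data: dict, failed_node: str):
    """Illustrative walkthrough of steps 3-6 for the three-node cluster 100a.

    cluster_db:  maps node name -> that node's database (a dict)
    master_data: the master's authoritative database (node 1's database 31)
    failed_node: name of the node detected as failed (here node 2)
    """
    markers = []  # replicated marker rows, visible to every watchdog

    # Step 3: write a marker stating that node 1's reset module is resetting node 2.
    marker = {"resetter": "node1", "target": failed_node}
    markers.append(marker)

    # Step 4: the other nodes (node 3) read the marker and remain silent -
    # they perform no data operation while the reset is in progress.
    silent_nodes = [n for n in cluster_db if n not in ("node1", failed_node)]

    # Step 5: reset = delete the failed node's data, then resync from the master.
    cluster_db[failed_node].clear()
    cluster_db[failed_node].update(master_data)

    # Step 6: delete the marker so a later split-brain event is not misjudged.
    markers.remove(marker)

    return silent_nodes, markers

db = {"node1": {"k": 1}, "node2": {"k": 0, "stale": True}, "node3": {"k": 1}}
silent, markers = handle_node_failure(db, db["node1"], "node2")
```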
For the technical solutions that the method disclosed in this embodiment shares with embodiment one, please refer to the description in embodiment one; details are not repeated here.
Embodiment three:
With reference to Figs. 1 to 3, the present invention also discloses a device for handling split brain in a high-availability cluster (hereinafter referred to as the device), which runs the method for handling split brain in a high-availability cluster disclosed in embodiment one and/or embodiment two.
In the present embodiment, the device includes watchdog components (i.e., watchdog components 21 to 23) deployed in each node configured in the high-availability cluster (i.e., nodes 1 to 3). Each watchdog component detects the status data of the business unit in the corresponding node, and synchronization links are established between the watchdog components deployed in the nodes of the high-availability cluster. When a watchdog component detects that the status data of the business unit in the corresponding node is abnormal, the watchdog component triggers a freeze event on the corresponding node.
The device runs the method for handling split brain in a high-availability cluster disclosed in embodiment one and/or embodiment two, so as to achieve the technical effect of discovering in time the split-brain phenomenon caused by the failure of a node configured in the high-availability cluster, and to perform synchronization of the data between the nodes after the failed node recovers, thereby ensuring the strong consistency of the data in each node of the high-availability cluster and avoiding disorderly contention for permissions and resources. For the technical solutions that the device shares with the method disclosed in embodiment one and/or embodiment two, please refer to the description above; details are not repeated here.
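The device's behavior — each watchdog component polls the status data of its node's business unit and triggers a freeze event (i.e., fencing, in common HA terminology) when that status turns abnormal — can be sketched as a simple polling pass. This is a sketch under assumed names (`status_of`, `fence`), not the patent's implementation.

```python
def watchdog_poll(status_of, fence, nodes):
    """One polling pass over the cluster.

    status_of: callable returning "ok" or "abnormal" for a node name
    fence:     callable invoked to trigger a freeze event on a node
    nodes:     names of the nodes the watchdog components watch
    """
    fenced = []
    for node in nodes:
        # Monitoring module: read the business unit's status data.
        if status_of(node) == "abnormal":
            # Trigger the freeze event on the corresponding node.
            fence(node)
            fenced.append(node)
    return fenced

statuses = {"node1": "ok", "node2": "abnormal", "node3": "ok"}
fenced_log = []
result = watchdog_poll(statuses.get, fenced_log.append, list(statuses))
```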
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The series of detailed descriptions listed above are only specific descriptions of feasible embodiments of the present invention; they are not intended to limit the protection scope of the present invention, and all equivalent implementations or changes made without departing from the technical spirit of the present invention shall be included within the protection scope of the present invention.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention may be realized in other specific forms without departing from the spirit or essential attributes of the invention. Therefore, from whichever point of view, the embodiments are to be considered illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is intended that all changes falling within the meaning and scope of the equivalent elements of the claims be included within the present invention. Any reference signs in the claims shall not be construed as limiting the claims involved.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is merely for the sake of clarity. Those skilled in the art should consider the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that can be understood by those skilled in the art.

Claims (12)

1. A method for handling split brain in a high-availability cluster, characterized by comprising:
deploying a watchdog component in the nodes configured in the high-availability cluster;
detecting, by the watchdog component, the status data of the business unit in the corresponding node, and establishing synchronization links between the watchdog components deployed in the nodes of the high-availability cluster;
when the watchdog component detects that the status data of the business unit in the corresponding node is abnormal, triggering, by the watchdog component, a freeze event on the corresponding node.
2. The method according to claim 1, characterized in that, after the watchdog component triggers the freeze event on the corresponding node, the method further comprises:
deleting the data in the node whose status data is abnormal, reading, by the watchdog component, at least the flag bit field of the business unit in the corresponding node, determining the current master node in the high-availability cluster by comparing the newest timestamps in the flag bit fields, and synchronizing the current data in the master node to the other nodes in the high-availability cluster.
3. The method according to claim 1, characterized in that, after the watchdog component triggers the freeze event on the corresponding node, the method further comprises:
deleting the data in the node whose status data is abnormal, notifying the high-availability component deployed in the corresponding node, reading, by the high-availability component, the flag bit field of the business unit in the corresponding node, determining the current master node in the high-availability cluster by comparing the newest timestamps in the flag bit fields, and synchronizing the current data in the master node to the other nodes in the high-availability cluster.
4. The method according to claim 1, characterized in that the business unit is a database, a server, or a load balancing component;
the watchdog component is a kernel-level watchdog component;
the load balancing component is Apache, Nginx, LVS, or HAProxy.
5. The method according to claim 1, characterized in that the business unit is a database, a server, or a load balancing component;
the watchdog component is a user-space-level watchdog component;
the load balancing component is Apache, Nginx, LVS, or HAProxy.
6. The method according to claim 1, characterized in that synchronization links are established at least through a heartbeat mechanism between the watchdog components deployed in the nodes of the high-availability cluster.
7. The method according to claim 6, characterized in that the heartbeat mechanism is implemented by means of heartbeat, corosync, keepalive, or cman.
8. The method according to any one of claims 1 to 7, characterized in that the high-availability cluster communicates in a master-slave mode, a symmetric mode, or a multi-machine mode.
9. The method according to claim 8, characterized in that the watchdog component is composed of a monitoring module and a reset module;
the monitoring module detects the status data of the business unit in the corresponding node; when the monitoring module detects that the status data of the business unit in the corresponding node is abnormal, the reset module is called by the monitoring module, and a reset operation is executed by the reset module on the node whose status data is abnormal.
10. The method according to claim 9, characterized in that, after the monitoring module calls the reset module, the method further comprises:
initiating a modify-node-data request to the current master node, and synchronizing the data corresponding to the request to the slave nodes in the high-availability cluster whose status data is not abnormal.
11. The method according to claim 9, characterized in that, after the reset module executes the reset operation on the node whose status data is abnormal, the method further comprises:
deleting the data corresponding to the modify-node-data request that was initiated to the current master node after the monitoring module called the reset module.
12. A device for handling split brain in a high-availability cluster, characterized by comprising:
a watchdog component deployed in each node configured in the high-availability cluster;
wherein the watchdog component detects the status data of the business unit in the corresponding node, and synchronization links are established between the watchdog components deployed in the nodes of the high-availability cluster; when the watchdog component detects that the status data of the business unit in the corresponding node is abnormal, the watchdog component triggers a freeze event on the corresponding node.
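Claims 2 and 3 both determine the current master node by comparing the newest timestamp found in each node's flag bit field. A minimal sketch of that comparison, under the assumption that each flag bit field yields a list of timestamps (all names and values hypothetical):

```python
def current_master(flag_fields: dict) -> str:
    """flag_fields maps node name -> timestamps read from the business
    unit's flag bit field. The node holding the newest timestamp is
    taken as the current master node of the high-availability cluster."""
    return max(flag_fields, key=lambda node: max(flag_fields[node]))

# Node 1 holds the newest write, so it is determined to be the master.
flags = {
    "node1": [1700000050, 1700000099],
    "node2": [1700000010],
    "node3": [1700000042, 1700000043],
}
master = current_master(flags)
```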
CN201910622641.3A 2019-07-11 2019-07-11 Method and device for handling split brain in a high-availability cluster Pending CN110377487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910622641.3A CN110377487A (en) Method and device for handling split brain in a high-availability cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910622641.3A CN110377487A (en) Method and device for handling split brain in a high-availability cluster

Publications (1)

Publication Number Publication Date
CN110377487A true CN110377487A (en) 2019-10-25

Family

ID=68250876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910622641.3A Pending CN110377487A (en) 2019-07-11 2019-07-11 A kind of method and device handling high-availability cluster fissure

Country Status (1)

Country Link
CN (1) CN110377487A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653734A (en) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 Server cluster real-time master-slave control and data synchronization system and method
CN114528350A (en) * 2022-02-18 2022-05-24 苏州浪潮智能科技有限公司 Cluster split brain processing method, device and equipment and readable storage medium
CN116094940A (en) * 2023-02-15 2023-05-09 北京志凌海纳科技有限公司 VRRP brain crack inhibition method, system, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394914A (en) * 2011-09-22 2012-03-28 浪潮(北京)电子信息产业有限公司 Cluster brain-split processing method and device
CN102521060A (en) * 2011-11-16 2012-06-27 广东新支点技术服务有限公司 Pseudo halt solving method of high-availability cluster system based on watchdog local detecting technique
CN105849702A (en) * 2013-12-25 2016-08-10 日本电气方案创新株式会社 Cluster system, server device, cluster system management method, and computer-readable recording medium
CN107147540A (en) * 2017-07-19 2017-09-08 郑州云海信息技术有限公司 Fault handling method and troubleshooting cluster in highly available system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394914A (en) * 2011-09-22 2012-03-28 浪潮(北京)电子信息产业有限公司 Cluster brain-split processing method and device
CN102521060A (en) * 2011-11-16 2012-06-27 广东新支点技术服务有限公司 Pseudo halt solving method of high-availability cluster system based on watchdog local detecting technique
CN105849702A (en) * 2013-12-25 2016-08-10 日本电气方案创新株式会社 Cluster system, server device, cluster system management method, and computer-readable recording medium
CN107147540A (en) * 2017-07-19 2017-09-08 郑州云海信息技术有限公司 Fault handling method and troubleshooting cluster in highly available system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653734A (en) * 2020-12-11 2021-04-13 邦彦技术股份有限公司 Server cluster real-time master-slave control and data synchronization system and method
CN112653734B (en) * 2020-12-11 2023-09-19 邦彦技术股份有限公司 Real-time master-slave control and data synchronization system and method for server cluster
CN114528350A (en) * 2022-02-18 2022-05-24 苏州浪潮智能科技有限公司 Cluster split brain processing method, device and equipment and readable storage medium
CN114528350B (en) * 2022-02-18 2024-01-16 苏州浪潮智能科技有限公司 Cluster brain fracture processing method, device, equipment and readable storage medium
CN116094940A (en) * 2023-02-15 2023-05-09 北京志凌海纳科技有限公司 VRRP brain crack inhibition method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20200257593A1 (en) Storage cluster configuration change method, storage cluster, and computer system
US11320991B2 (en) Identifying sub-health object storage devices in a data storage system
CN110071821B (en) Method, node and storage medium for determining the status of a transaction log
WO2019154394A1 (en) Distributed database cluster system, data synchronization method and storage medium
CN105933407B (en) method and system for realizing high availability of Redis cluster
WO2021136422A1 (en) State management method, master and backup application server switching method, and electronic device
CN110377487A (en) Method and device for handling split brain in a high-availability cluster
JP2007279890A (en) Backup system and method
CN102394914A (en) Cluster brain-split processing method and device
CN105471622A (en) High-availability method and system for main/standby control node switching based on Galera
WO2012097588A1 (en) Data storage method, apparatus and system
CN111460039A (en) Relational database processing system, client, server and method
JP2012173996A (en) Cluster system, cluster management method and cluster management program
CN106919473A (en) A kind of data disaster recovery and backup systems and method for processing business
CN108173971A (en) A kind of MooseFS high availability methods and system based on active-standby switch
CN112527567A (en) System disaster tolerance method, device, equipment and storage medium
CN114138732A (en) Data processing method and device
CN107071189B (en) Connection method of communication equipment physical interface
CN105824571A (en) Data seamless migration method and device
CN107357800A (en) A kind of database High Availabitity zero loses solution method
WO2021115043A1 (en) Distributed database system and data disaster backup drilling method
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
CN112887367B (en) Method, system and computer readable medium for realizing high availability of distributed cluster
CN116185697A (en) Container cluster management method, device and system, electronic equipment and storage medium
JP6511737B2 (en) Redundant system, redundant method and redundant program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination