WO2019119263A1 - Distributed cluster split-brain processing method, apparatus, and device - Google Patents

Distributed cluster split-brain processing method, apparatus, and device

Info

Publication number
WO2019119263A1
Authority
WO
WIPO (PCT)
Prior art keywords
voting
service
cluster
clusters
initiate
Prior art date
Application number
PCT/CN2017/117131
Other languages
French (fr)
Chinese (zh)
Inventor
关超
Original Assignee
海能达通信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 海能达通信股份有限公司
Priority to PCT/CN2017/117131
Publication of WO2019119263A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40: Support for services or applications

Definitions

  • The present invention relates to the field of distributed clusters, and in particular to a method, apparatus, and device for handling split-brain in a distributed cluster.
  • In a highly available distributed cluster, there are two clusters that back each other up.
  • One of the clusters serves users, while the other acts as a backup cluster; the two negotiate the primary and standby roles with each other over a heartbeat link.
  • Only one of the two clusters may be in the active state, that is, only one of them may serve users. If both are active and both serve users while being unable to synchronize data with each other, split-brain has occurred, which leads to confusion of user data. Specifically, the user data stored in the two clusters may become inconsistent, so that it is impossible to know which cluster holds the valid data.
  • The embodiments of the present invention disclose a method, an apparatus, and a device for handling split-brain in a distributed cluster. They address the following problems in the prior art: the cluster selected as active may be unable to serve users; when the voters and the users are not the same parties, the voting result may be unusable; and if the party that did not initiate the vote suffers a failure, it may not learn the voting result, so that split-brain still occurs.
  • An embodiment of the present invention provides a method for handling split-brain in a distributed cluster, and the method may include:
  • When two service clusters connected to each other in the distributed system cannot detect each other's running status, both service clusters are switched to read-only mode.
  • The service cluster that has the right to initiate voting initiates a vote to the set of voters, where the set of voters is determined from the use terminals and/or the arbitration terminals according to a preset selection rule; a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server in the network node other than the use terminals;
  • If the service cluster that initiated the vote obtains a predetermined number of votes within a preset time, it is switched from read-only mode to normal service mode, and the voting result is notified to the service cluster of the two that did not initiate the vote.
  • Determining, according to the result of judging whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting from the two service clusters includes:
  • If both service clusters are connected to the database normally, each of the two service clusters detects whether the other service cluster has initiated a vote;
  • When the first service cluster of the two does not detect that the other service cluster has initiated a vote, the first service cluster has the right to initiate voting;
  • If only one of the two service clusters is connected to the database normally, the service cluster that is connected to the database has the right to initiate voting;
  • If the connections between both service clusters and the database are abnormal, neither service cluster has the right to initiate voting.
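  • The service cluster with the right to initiate voting initiating a vote to the set of voters includes:
  • The service cluster that initiates the vote writes a voting record and a timestamp to the database.
  • Each of the two service clusters detecting whether the other service cluster has initiated a vote includes:
  • The first service cluster of the two queries the database for the voting record and timestamp of the other service cluster;
  • If the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time counted from the timestamp, the other service cluster has initiated a vote; if the voting record does not exist, or the current time exceeds the preset time counted from the timestamp, the other service cluster has not initiated a vote.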
  • Determining the set of voters from the use terminals and/or the arbitration terminals according to a preset selection rule includes:
  • If the number of use terminals that satisfy the preset condition does not reach the preset size of the voter set, arbitration terminals that satisfy the preset condition are selected and, together with the use terminals that satisfy the preset condition, make up the set of voters.
  • The preset condition is:
  • The terminal has been online for longer than the preset time and its state is stable.
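  • Notifying the voting result to the service cluster of the two that did not initiate the vote includes:
  • The voting result is written to the database, and the service cluster that did not initiate the vote reads the voting result from the database.
  • Alternatively, notifying the voting result to the service cluster of the two that did not initiate the vote includes:
  • The voting result is sent to the use terminals and/or arbitration terminals connected to the service cluster that initiated the vote; when such a terminal is connected to the service cluster that did not initiate the vote, the voting result is sent on to that service cluster.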
  • the method further includes:
  • If the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that has not yet initiated a vote initiates a vote, and the method returns to the step of judging whether the service cluster that initiated the vote obtains the predetermined number of votes within the preset time.
  • The predetermined number of votes is more than half of the votes in the set of voters.
  • The number of votes obtained by the service cluster that initiated the vote is a weighted sum of the voters' votes.
  • The weight of a voter varies according to the voter's importance.
  • A service cluster that does not have the right to initiate voting does not initiate a vote.
  • An embodiment of the invention further provides an apparatus for handling split-brain in a distributed cluster, the apparatus comprising:
  • a first mode switching unit configured to switch the two service clusters to read-only mode when two service clusters connected to each other in the distributed system cannot sense each other's running state;
  • a first judging unit configured to judge whether the connections between the two service clusters and the database are normal;
  • a first determining unit configured to determine, according to the result of judging whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting from the two service clusters;
  • a first vote-initiating unit configured to have the service cluster with the right to initiate voting initiate a vote to a set of voters, where the set of voters is determined from the use terminals and/or the arbitration terminals according to a preset selection rule;
  • a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server in the network node other than the use terminals;
  • a second judging unit configured to judge whether the service cluster that initiated the vote obtains a predetermined number of votes within a preset time;
  • a second mode switching unit configured to switch the service cluster that initiated the vote from read-only mode to normal service mode if it obtains the predetermined number of votes within the preset time;
  • a notification unit configured to notify the voting result to the service cluster of the two that did not initiate the vote if the service cluster that initiated the vote obtains the predetermined number of votes within the preset time.
  • the first determining unit includes:
  • a detecting subunit configured so that, if each of the two service clusters is connected to the database normally, each of the two service clusters detects whether the other service cluster has initiated a vote;
  • a first determining subunit configured to: when the first service cluster of the two service clusters does not detect that another service cluster initiates voting, determining that the first service cluster has the right to initiate voting;
  • a second determining subunit configured to determine that a service cluster that is normally connected to the database has the right to initiate voting if only one of the two service clusters is normally connected to the database
  • a third determining subunit configured to determine that the two service clusters do not have the right to initiate voting if the connection between the two service clusters and the database is abnormal.
  • the first initiating voting unit includes:
  • the detecting subunit includes:
  • a querying subunit configured to query the database for the voting record and timestamp of the other service cluster;
  • a determining subunit configured to: when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time counted from the timestamp, determine that the other service cluster has initiated a vote; and when the voting record of the other service cluster does not exist, or the current time exceeds the preset time counted from the timestamp, determine that the other service cluster has not initiated a vote.
  • the apparatus further includes:
  • a third judging unit configured to judge whether the number of use terminals that meet the preset condition reaches the preset size of the voter set;
  • a second determining unit configured to determine the voters in the set of voters from the use terminals if the number of use terminals that meet the preset condition reaches the preset size of the voter set;
  • a third determining unit configured to, if the number of use terminals that meet the preset condition does not reach the preset size of the voter set, select arbitration terminals that meet the preset condition and determine the set of voters from them together with the use terminals that meet the preset condition.
  • The preset condition is:
  • The terminal has been online for longer than the preset time and its state is stable.
  • a first sending subunit configured to send the voting result to the use terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
  • a second sending subunit configured to send the voting result to the service cluster that did not initiate the vote when the use terminals and/or arbitration terminals are connected to that service cluster.
  • the apparatus further includes:
  • a second vote-initiating unit configured so that, if the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that has not yet initiated a vote initiates a vote, after which the second judging unit is triggered again.
  • The predetermined number of votes is more than half of the votes in the set of voters.
  • The number of votes obtained by the service cluster that initiated the vote is a weighted sum of the voters' votes.
  • The weight of a voter varies according to the voter's importance.
  • A service cluster that does not have the right to initiate voting does not initiate a vote.
  • Embodiments of the present invention provide a storage medium on which a program is stored; when the program is executed by a processor, the method for handling split-brain in a distributed cluster is implemented.
  • An embodiment of the present invention provides a device for handling split-brain in a distributed cluster, the device including: a memory for storing a program; and a processor for running the program, where, when running the program, the processor implements the method for handling split-brain in a distributed cluster according to the present invention.
  • In this embodiment, when two service clusters connected to each other cannot sense each other's running state, both are switched to read-only mode. According to whether each of the two service clusters can connect to the database normally, at most one service cluster with the right to initiate voting is determined; a service cluster without that right does not initiate a vote. The service cluster with the right to initiate voting initiates a vote to the set of voters. If it obtains the predetermined number of votes within the preset time, it is switched from read-only mode to normal service mode, the voting result is notified to the service cluster of the two that did not initiate the vote, and the service cluster that did not initiate the vote stops working.
  • Because the voters are drawn from the use terminals, the voting result is closer to actual usage.
  • In addition, the voting result is transmitted over multiple paths, so the service cluster that did not initiate the vote can learn the result even if some of its links fail.
  • FIG. 1 is a schematic flowchart of a method for handling split-brain in a distributed cluster according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of a method for determining a service cluster with the right to initiate voting according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of a cluster initiating a vote in a distributed cluster according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of notifying the other cluster after voting is completed in a distributed cluster according to an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of an apparatus for handling split-brain in a distributed cluster according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a device for handling split-brain in a distributed cluster according to an embodiment of the present invention.
  • Referring to FIG. 1, a schematic flowchart of a method for handling split-brain in a distributed cluster according to an embodiment of the present invention is shown.
  • the method may include:
  • In a highly available distributed cluster, there may be two service clusters connected to each other, and service clusters connected to each other may sense each other's state through some mechanism, for example, by using a heartbeat link to sense whether the other party is running normally.
  • When a heartbeat link fault occurs, the two service clusters connected to each other can no longer sense each other's running status. Likewise, when either of the two service clusters goes down, the two service clusters can no longer sense each other's running status.
  • The two service clusters that are connected to each other may also be two service clusters that back each other up.
  • Read-only mode can be understood as follows: a terminal can only read data from the database through the service cluster and cannot write data to the database; for example, it cannot add or modify data in the database.
  • To implement read-only mode, the cluster can filter service types at the proxy entry, filtering out all services other than data-read services, that is, accepting only services that read data. Alternatively, the restriction can be applied at the database write service, that is, services that write data to the database are not performed.
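  • The entry-point filtering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Mode` enum, the `READ_KINDS` set, and the request-kind strings are hypothetical names introduced here, since the source only distinguishes data-read services from all other services.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"
    NORMAL = "normal"


# Hypothetical request kinds; the source only separates "read" from everything else.
READ_KINDS = {"read"}


def accept_request(mode: Mode, kind: str) -> bool:
    """Return True if the entry point should pass the request on to the cluster."""
    if mode is Mode.NORMAL:
        # Normal service mode: every kind of service is accepted.
        return True
    # Read-only mode: only data-read services get through.
    return kind in READ_KINDS
```

  • In this sketch a cluster in read-only mode still answers reads, so users keep access to existing data while the vote is pending.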
  • S102: Determine whether the connections between the two service clusters and the database are normal, and, according to the result, determine from the two service clusters at most one service cluster that has the right to initiate voting;
  • a service cluster that has the right to initiate voting may initiate a vote;
  • a service cluster that does not have the right to initiate voting may not initiate a vote.
  • For the two service clusters, determining whether their connections to the database are normal can be understood as determining whether the two service clusters can read and write the database normally.
  • Specifically, it may be determined whether each of the two service clusters can read data from the database normally and whether it can write data to the database normally.
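  • A read-and-write probe of this kind might look like the sketch below. It uses Python's standard `sqlite3` module purely as a stand-in for the shared database; the `probe` table and the function name are assumptions made for illustration, and the source does not say how the check is actually performed.

```python
import sqlite3


def database_connection_ok(conn: sqlite3.Connection, cluster_id: str) -> bool:
    """Treat the connection as normal only if both a write and a read succeed."""
    try:
        # Write path: record a probe row for this cluster (hypothetical table).
        conn.execute(
            "CREATE TABLE IF NOT EXISTS probe (cluster_id TEXT PRIMARY KEY, n INTEGER)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO probe (cluster_id, n) VALUES (?, 1)",
            (cluster_id,),
        )
        conn.commit()
        # Read path: the row just written must be readable.
        row = conn.execute(
            "SELECT n FROM probe WHERE cluster_id = ?", (cluster_id,)
        ).fetchone()
        return row is not None
    except sqlite3.Error:
        # Any database error counts as an abnormal connection.
        return False
```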
  • S102 may specifically include:
  • A service cluster that initiates a vote needs to write a voting record and a timestamp to the database, so that either service cluster can determine from the other's voting record and timestamp whether the other has initiated a vote.
  • The first service cluster of the two queries the database for the voting record and timestamp of the other service cluster;
  • If the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time counted from the timestamp, the other service cluster has initiated a vote; if the voting record does not exist, or the current time exceeds the preset time counted from the timestamp, the other service cluster has not initiated a vote.
  • For example, if the two service clusters connected to each other are cluster A and cluster B, then when the first service cluster is cluster A the other service cluster is cluster B, and when the first service cluster is cluster B the other service cluster is cluster A.
  • The timestamp record includes the running status of the service cluster, which may include the health status of the service cluster, that is, whether the service cluster is alive, whether it has initiated a vote, and so on. The two service clusters can therefore learn through the database whether the other has initiated a vote.
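  • The record-and-timestamp check can be sketched as below, following the rule stated for the determining subunit: a vote counts as initiated only if a record exists and the current time is within the preset time counted from its timestamp. The function and parameter names are hypothetical, and times are plain numbers of seconds for simplicity.

```python
from typing import Optional


def other_cluster_is_voting(
    record_ts: Optional[float], now: float, preset_time: float
) -> bool:
    """record_ts is the timestamp of the other cluster's voting record, or None if absent."""
    if record_ts is None:
        # No voting record in the database: the other cluster has not initiated a vote.
        return False
    # A record only counts while the preset time since its timestamp has not elapsed.
    return now - record_ts <= preset_time
```

  • Expiring stale records in this way means a vote that was started long ago and never finished does not block the other cluster forever.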
  • If both service clusters are connected to the database normally, each can obtain the other's running status from the database. If a service cluster detects that the other service cluster connected to it has not initiated a vote, that service cluster has the right to initiate voting; if the other service cluster has already initiated a vote, that service cluster does not currently have the right to initiate voting.
  • For example, suppose cluster B cannot connect to the database. In this state, cluster B cannot obtain the running status of cluster A from the database, and cluster B cannot write its own running status to the database.
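  • Seen from a single cluster, the rule for the right to initiate voting reduces to two inputs: whether its own database connection is normal, and whether it sees an active voting record from the other cluster. The sketch below encodes that local view; the function name is hypothetical. Note one assumption: if both clusters are connected and neither has yet written a record, both pass this check, and the extracted text does not say how a truly simultaneous start is resolved.

```python
def may_initiate_vote(own_db_ok: bool, peer_vote_active: bool) -> bool:
    """Local decision made by one cluster about its right to initiate voting."""
    if not own_db_ok:
        # A cluster that cannot reach the database never has the right to vote.
        return False
    # A connected cluster has the right unless the other cluster is already voting.
    return not peer_vote_active
```

  • With cluster B cut off from the database, `may_initiate_vote(False, False)` is false for B, while A, which sees no record from B, gets `may_initiate_vote(True, False)` and may proceed.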
  • S103: The service cluster with the right to initiate voting initiates a vote to the set of voters, where the set of voters is determined from the use terminals and/or the arbitration terminals according to a preset selection rule; a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server in the network node other than the use terminals;
  • At most one service cluster having the right to initiate voting is determined in S102, that is, at most one service cluster can initiate a vote. Once the service cluster with the right to initiate voting has been determined, it initiates a vote to a preset set of voters.
  • For example, suppose the two service clusters connected to each other are cluster A and cluster B. When a heartbeat link failure occurs between cluster A and cluster B, that is, neither can sense the other's running status, it is first determined whether the connection between cluster A and the database and the connection between cluster B and the database are normal. If both cluster A and cluster B can connect to the database, either of them may have the right to initiate voting, so cluster A and cluster B each determine whether the other has initiated a vote. If cluster B detects that cluster A has not initiated a vote, cluster B is the cluster with the right to initiate voting, and cluster B can then initiate a vote to the preset set of voters.
  • When determining the set of voters, if User5 and User6 are not frequently online users, they are considered users without voting rights and are therefore not considered for the voter set. As can be seen from the figure, the set of voters is User1, User2, User3, User4, Arbiter1, Arbiter2, and Arbiter3. User3 in the voter set may have a single point of failure and be unable to vote, but even if a small number of terminals cannot vote, the voting result is not affected.
  • The voters in the set may be pre-configured, for example preset by a technician, or dynamically selected according to the users' situation; in either case, the priority of the users must be respected.
  • Determining the set of voters from the use terminals and the arbitration terminals according to a preset selection rule includes:
  • If the number of use terminals that satisfy the preset condition does not reach the preset size of the voter set, arbitration terminals that satisfy the preset condition are selected and, together with the use terminals that satisfy the preset condition, make up the set of voters.
  • the preset condition may be that the online time of the use terminal exceeds a preset time and the state is stable.
  • That is, the use terminal is frequently online and stable, where "frequently online" can be understood as frequently performing network operations through the two interconnected service clusters mentioned in S101; examples include a fixed collection station of a public security system or a bank server.
  • When enough use terminals have been online longer than the preset time with a stable state, the whole set of voters may be determined from the use terminals; when there are not enough such use terminals, part of the voter set may be determined from the use terminals and part from the arbitration terminals; when no use terminal has been online longer than the preset time with a stable state, the whole voter set may be determined from the arbitration terminals. Therefore, even if some use terminals fail, the execution of this embodiment is not affected, that is, the service cluster can still initiate a vote.
  • A use terminal may be a terminal that performs network operations by connecting to either of the two service clusters mentioned above; for example, it may be understood as a terminal that logs in to either of the two service clusters and is provided with network services by the service cluster it has logged in to.
  • The network node also includes many frequently online servers that do not need to connect to either of the two service clusters for network operations. Such servers can serve as arbitration terminals, and voters in the voter set can also be determined from the arbitration terminals.
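  • The selection rule can be illustrated with the sketch below: eligible use terminals are taken first, and eligible arbitration terminals fill any remaining places. The `Terminal` type, the field names, and the choice to take terminals in list order are assumptions made for illustration; the source only fixes the eligibility condition (online longer than the preset time, state stable) and the preference for use terminals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Terminal:
    name: str
    online_seconds: float
    stable: bool


def eligible(t: Terminal, min_online: float) -> bool:
    """Preset condition: online longer than the preset time and in a stable state."""
    return t.stable and t.online_seconds > min_online


def select_voters(
    use_terminals: List[Terminal],
    arbiters: List[Terminal],
    size: int,
    min_online: float,
) -> List[Terminal]:
    # Use terminals have priority, up to the preset size of the voter set.
    chosen = [t for t in use_terminals if eligible(t, min_online)][:size]
    if len(chosen) < size:
        # Not enough eligible use terminals: top up from eligible arbitration terminals.
        extra = [t for t in arbiters if eligible(t, min_online)]
        chosen += extra[: size - len(chosen)]
    return chosen
```

  • If no use terminal is eligible, the same function returns a set made up entirely of arbitration terminals, which matches the third case described above.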
  • S104: Determine whether the service cluster that initiated the vote obtains a predetermined number of votes within a preset time.
  • The predetermined number of votes may be a majority of the votes in the set of voters; more specifically, as an example, it may be more than half of the votes in the set of voters. The number of voters in the set may be odd or even.
  • For example, the vote may be considered successful if the service cluster that initiated it obtains more than half of the votes in the set of voters within the preset time.
  • The number of votes obtained by the service cluster that initiated the vote is a weighted sum of the voters' votes, and the weight of a voter varies according to the voter's importance.
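  • A weighted tally of this kind can be sketched as follows. The interpretation that "more than half" means more than half of the total weight of the voter set is an assumption, as are the function name and the example weights; the source states only that the count is a weighted sum and that weights follow importance.

```python
from typing import Dict


def vote_passes(weights: Dict[str, int], votes: Dict[str, bool]) -> bool:
    """weights: every voter in the set; votes: replies received within the preset time."""
    total = sum(weights.values())
    # Only approvals from members of the voter set are counted, each at its own weight.
    obtained = sum(w for name, w in weights.items() if votes.get(name, False))
    # Assumed threshold: strictly more than half of the total weight.
    return 2 * obtained > total
```

  • For example, with weights `{"User1": 2, "User2": 1, "Arbiter1": 1}`, approval from User1 alone gives 2 of 4 and does not pass, while approval from User1 and User2 gives 3 of 4 and passes. A voter that does not reply in time simply contributes nothing.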
  • S105: If the service cluster that initiated the vote obtains the predetermined number of votes within the preset time, it is switched from read-only mode to normal service mode, and the voting result is notified to the service cluster of the two that did not initiate the vote.
  • For example, if cluster B initiates a vote and obtains more than half of the votes in the set of voters within the preset time, the vote is successful. Cluster B can then be regarded as the primary cluster,
  • and cluster A as the backup cluster of cluster B. It can be understood that after B's vote succeeds, B is the cluster that provides normal service to the use terminals, and A stops serving at this time.
  • The cluster whose vote succeeds can connect to the terminals normally, provide network services for the use terminals, and perform read and write operations on the database.
  • Notifying the voting result to the service cluster of the two that did not initiate the vote may specifically include the following two implementations:
  • Embodiment 1: The voting result is written to the database, and the service cluster that did not initiate the vote reads the voting result from the database.
  • Embodiment 2: The voting result is sent to the use terminals and/or arbitration terminals connected to the service cluster that initiated the vote; when such a use terminal and/or arbitration terminal is connected to the service cluster that did not initiate the vote, it sends the voting result on to that service cluster.
  • Specifically, the voting result is sent to the voters connected to the service cluster that initiated the vote and to use terminals other than the voters, where a voter is a use terminal and/or arbitration terminal that votes for the service cluster that initiated the vote.
  • In this embodiment, Embodiment 1 alone may be performed, Embodiment 2 alone may be performed, or Embodiment 1 and Embodiment 2 may be performed together. Preferably, both are performed.
  • For example, cluster B is the service cluster that initiated the vote and cluster A is the service cluster that did not. After the vote succeeds, cluster B can write the voting result to the database and send it to the voters connected to cluster B and to use terminals other than the voters. Here User1, User2, User3, User4, Arbiter1, Arbiter2, and Arbiter3 are voters, and User5 and User6 are non-voting use terminals. User3 may be a use terminal that is not connected to cluster B, for example because of a single point of failure or a network failure. The database, the voters, or the use terminals other than the voters that have received the voting result can then inform cluster A of the result.
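  • The benefit of combining both notification paths can be shown with a small reachability sketch. It only models whether at least one path delivers the result to the cluster that did not initiate the vote; the function name and the set-based representation of terminal connections are assumptions, not details from the source.

```python
from typing import Set


def result_reaches_peer(
    peer_db_ok: bool,
    informed_terminals: Set[str],
    terminals_connected_to_peer: Set[str],
) -> bool:
    """True if the non-initiating cluster can learn the result over any path."""
    # Path 1 (Embodiment 1): the peer reads the result from the database.
    via_database = peer_db_ok
    # Path 2 (Embodiment 2): a terminal that received the result is also connected to the peer.
    via_terminals = bool(informed_terminals & terminals_connected_to_peer)
    return via_database or via_terminals
```

  • In this sketch, cluster A still learns the result when its database link is down as long as one informed terminal, such as User1, is connected to it, and it misses the result only when every path fails at once.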
  • If the service cluster that initiated the vote does not obtain more than half of the votes in the set of voters within the preset time, the service cluster of the two that has not yet initiated a vote initiates a vote to the set of voters, and execution returns to S104. It should be noted that the service cluster that has not yet initiated a vote can do so only on the premise that it is connected to the database normally.
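  • The hand-over of the vote to the other cluster can be sketched as below. The `run_vote` and `db_ok` callables are hypothetical stand-ins for S103/S104 and for the database connection check. Returning `None` when both attempts fail (so that both clusters stay in read-only mode) is an assumption, since the source does not describe that case.

```python
from typing import Callable, Optional


def elect(
    first: str,
    second: str,
    run_vote: Callable[[str], bool],
    db_ok: Callable[[str], bool],
) -> Optional[str]:
    """Return the cluster that wins the vote, or None if neither attempt succeeds."""
    if run_vote(first):
        # The first initiator obtained the predetermined number of votes in time.
        return first
    # The other cluster may only take over if its database connection is normal.
    if db_ok(second) and run_vote(second):
        return second
    return None
```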
  • In this embodiment, when two service clusters connected to each other cannot sense each other's running state, both are switched to read-only mode. According to whether each of the two service clusters can connect to the database normally, at most one service cluster with the right to initiate voting is determined; a service cluster without that right does not initiate a vote. The service cluster with the right to initiate voting initiates a vote to the set of voters. If it obtains the predetermined number of votes within the preset time, it is switched from read-only mode to normal service mode, the voting result is notified to the service cluster of the two that did not initiate the vote, and the service cluster that did not initiate the vote stops working.
  • Because the voters are drawn from the use terminals, the voting result is closer to actual usage.
  • In addition, the voting result is transmitted over multiple paths, so the service cluster that did not initiate the vote can learn the result even if some of its links fail.
  • the apparatus may include:
  • a first mode switching unit 501 configured to switch the two service clusters to read-only mode when two service clusters connected to each other in the distributed system cannot detect each other's running state;
  • a first judging unit 502 configured to judge whether the connections between the two service clusters and the database are normal;
  • a first determining unit 503 configured to determine, according to the result of judging whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting from the two service clusters;
  • a first vote-initiating unit 504 configured to have the service cluster with the right to initiate voting initiate a vote to a set of voters, where the set of voters is determined from the use terminals and/or the arbitration terminals according to a preset selection rule;
  • a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server in the network node other than the use terminals;
  • a second judging unit 505 configured to judge whether the service cluster that initiated the vote obtains a predetermined number of votes within a preset time;
  • a second mode switching unit 506 configured to switch the service cluster that initiated the vote from read-only mode to normal service mode if it obtains the predetermined number of votes within the preset time;
  • a notification unit 507 configured to notify the voting result to the service cluster of the two that did not initiate the vote if the service cluster that initiated the vote obtains the predetermined number of votes within the preset time.
  • the first determining unit includes:
  • a detecting subunit configured so that, if each of the two service clusters is connected to the database normally, each of the two service clusters detects whether the other service cluster has initiated a vote;
  • a first determining subunit configured to: when the first service cluster of the two service clusters does not detect that another service cluster initiates voting, determining that the first service cluster has the right to initiate voting;
  • a second determining subunit configured to: if only one of the two service clusters is normally connected to the database, determine that the service cluster that is normally connected to the database has the right to initiate voting;
  • a third determining subunit configured to determine that the two service clusters do not have the right to initiate voting if the connection between the two service clusters and the database is abnormal.
  • the first initiating voting unit includes:
  • the detecting subunit includes:
  • a querying subunit configured to query the database for the voting record and timestamp of the other service cluster;
  • a determining subunit configured to: when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time counted from the timestamp, determine that the other service cluster has initiated a vote; and when the voting record of the other service cluster does not exist, or the current time exceeds the preset time counted from the timestamp, determine that the other service cluster has not initiated a vote.
  • it also includes:
  • a third judging unit configured to judge whether the number of use terminals that satisfy the preset condition reaches the preset size of the voter set;
  • a second determining unit configured to determine the voters of the voter set from the use terminals if the number of use terminals that satisfy the preset condition exceeds the preset size of the voter set;
  • a third determining unit configured to: if the number of use terminals that satisfy the preset condition does not exceed the preset size of the voter set, select arbitration terminals from the arbitration terminals that satisfy the preset condition so that they, together with the use terminals that satisfy the preset condition, form the voter set.
  • the preset condition is:
  • the use terminal has been online for longer than the preset time and its state is stable.
  • the notification unit includes:
  • a writing subunit configured to write the voting result into the database;
  • a reading subunit configured for the service cluster that did not initiate voting to read the voting result from the database.
  • the notification unit may also include:
  • a first sending subunit configured to send the voting result to the use terminals and/or arbitration terminals connected to the service cluster that initiated the voting;
  • a second sending subunit configured to send the voting result to the service cluster that did not initiate voting when a use terminal and/or arbitration terminal is connected to that service cluster.
  • it also includes:
  • the second initiating voting unit is configured so that, if the service cluster that initiated the voting does not obtain the predetermined number of votes within the preset time, the service cluster that has not yet initiated voting initiates voting and processing returns to the second judging unit 505.
  • the predetermined number of votes is more than half of the votes in the set of voters.
  • the number of votes obtained by the service cluster that initiated the voting is a weighted sum of the voters' votes.
  • the weight of a voter varies according to the voter's importance.
  • a service cluster that does not have the right to initiate voting does not initiate voting.
  • the voting result is delivered over multiple paths, so the service cluster that did not initiate voting can learn the result even if some of its links fail.
  • Embodiments of the present invention provide a storage medium on which a program is stored; when the program is executed by a processor, the method for processing split-brain in a distributed cluster is implemented.
  • FIG. 6 is a schematic structural diagram of a processing device for a distributed cluster brain splitting according to an embodiment of the present invention.
  • the device includes a memory 601 and a processor 602.
  • the memory 601 is configured to store a program
  • the processor 602 is configured to run the program. Specifically, when the processor executes the program, the following steps are implemented:
  • when two interconnected service clusters in the distributed system cannot sense each other's running state, the two service clusters are switched to the read-only mode;
  • the service cluster having the right to initiate voting initiates a vote to the voter set, where the voter set is determined from the use terminals and/or arbitration terminals according to a preset selection rule, the use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and the arbitration terminal is a server among the network nodes other than the use terminals;
  • if the service cluster that initiated the voting obtains the predetermined number of votes within the preset time, that service cluster is switched from the read-only mode to the normal service mode, and the service cluster that did not initiate voting is notified of the voting result.
  • determining, from the two service clusters and according to the judgment result of whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting includes:
  • if both service clusters are normally connected to the database, each of the two service clusters detects whether the other service cluster has initiated voting;
  • when the first service cluster of the two service clusters does not detect that the other service cluster has initiated voting, the first service cluster has the right to initiate voting;
  • if only one of the two service clusters is normally connected to the database, the service cluster that is normally connected to the database has the right to initiate voting;
  • if neither of the two service clusters is normally connected to the database, neither service cluster has the right to initiate voting.
  • the service cluster with the right to initiate voting initiates a vote to the set of voters, including:
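  • the service cluster that initiates the voting writes a voting record and a timestamp into the database.
  • each of the two service clusters detecting whether the other service cluster has initiated voting includes:
  • the first service cluster of the two service clusters queries the database for the voting record and timestamp of the other service cluster;
  • when the voting record of the other service cluster does not exist, or it exists with a timestamp but the current time exceeds the preset time counted from that timestamp, the other service cluster has not initiated voting.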
  • determining the set of voters from the using terminal and/or the arbitration terminal according to a preset selection rule including:
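  • if the number of use terminals that satisfy the preset condition does not exceed the preset size of the voter set, arbitration terminals are selected from the arbitration terminals that satisfy the preset condition so that they, together with the use terminals that satisfy the preset condition, form the voter set.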
  • the preset condition is:
  • the use terminal has been online for longer than the preset time and its state is stable.
  • notifying the service cluster that did not initiate voting of the voting result includes:
  • the voting result is written into the database, and the service cluster that did not initiate voting reads the voting result from the database.
  • notifying the service cluster that did not initiate voting of the voting result may also include:
  • when a use terminal and/or arbitration terminal is connected to the service cluster that did not initiate voting, the voting result is sent to that service cluster.
  • the method further includes:
  • if the service cluster that initiated the voting does not obtain the predetermined number of votes within the preset time, the service cluster that has not yet initiated voting initiates voting, and the method returns to the step of judging whether the service cluster that initiated the voting has obtained the predetermined number of votes within the preset time.
  • the predetermined number of votes is more than half of the votes in the set of voters.
  • the number of votes obtained by the service cluster that initiated the voting is a weighted sum of the voters' votes.
  • the weight of a voter varies according to the voter's importance.
  • a service cluster that does not have the right to initiate voting does not initiate voting.
  • the device in this embodiment may be a server, a PC, a PAD, a mobile phone, or the like.
  • the memory in the device of this embodiment may include computer-readable storage media in the form of non-permanent memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory or flash memory.
  • the memory includes at least one memory chip.
  • the voting result is closer to the actual usage situation; in addition, the voting result is delivered over multiple paths, so the service cluster that did not initiate voting can learn the result even if some of its links fail.

Abstract

Disclosed are a distributed cluster split-brain processing method, apparatus, and device. The method comprises: when running states of opposite parties cannot be sensed between two mutually connected service clusters, switching the two service clusters into a read-only mode; determining, according to the result of whether the two service clusters can be normally connected to a database, at most one service cluster having the right to initiate voting, the service cluster having the voting right initiating voting to a preset voter set, and the service cluster not having the right to initiate voting not initiating voting; and if the service cluster that initiates voting obtains a predetermined number of votes within a preset time period, switching the service cluster that initiates voting from the read-only mode to a normal service mode, notifying the service cluster that does not initiate voting in the two clusters of a voting result, and stopping the work of the service cluster that does not initiate voting. Therefore, by means of the method of the present invention, the occurrence of split-brain can be completely avoided, and the voting result is closer to an actual use situation.

Description

Method, Apparatus, and Device for Processing Split-Brain in a Distributed Cluster
Technical Field
The present invention relates to the field of distributed clusters, and in particular to a method, apparatus, and device for processing split-brain in a distributed cluster.
Background
In a highly available distributed cluster, there are two clusters that back each other up. While the two clusters are connected, one of them serves users and the other acts as the backup cluster, and the two negotiate the active and standby roles over a heartbeat link. When the heartbeat link between the two clusters fails, however, only one of the two clusters may remain alive, that is, only one of them may serve users. If both remain alive and serve users at the same time while being unable to synchronize data, split-brain has occurred. Split-brain corrupts user data: the user data stored in the two clusters may become inconsistent, so it is impossible to tell which cluster holds the valid data.
The prior art offers many methods for handling split-brain, but many of them do not solve how to select, without ambiguity, the active cluster that serves users. Moreover, when the votes of the voters and the users are inconsistent, the voting result may be unusable, and when the non-voting party suffers a fault, it cannot learn the voting result, which leads to split-brain.
Summary of the Invention
In view of this, embodiments of the present invention disclose a method, apparatus, and device for processing split-brain in a distributed cluster, which solve the following problems of the prior art: the active cluster that serves users cannot be selected without ambiguity; the voting result may be unusable when the votes of the voters and the users are inconsistent; and the non-voting party cannot learn the voting result when it suffers a fault, which leads to split-brain.
An embodiment of the present invention provides a method for processing split-brain in a distributed cluster. The method may include:
when two interconnected service clusters in a distributed system cannot sense each other's running state, switching the two service clusters to read-only mode;
judging whether the connections between the two service clusters and a database are normal, and determining, from the two service clusters and according to the judgment result, at most one service cluster having the right to initiate voting;
initiating, by the service cluster having the right to initiate voting, a vote to a voter set, where the voter set is determined from use terminals and/or arbitration terminals according to a preset selection rule, a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the use terminals;
judging whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time;
if the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time, switching that service cluster from read-only mode to normal service mode, and notifying the service cluster that did not initiate voting of the voting result.
Optionally, determining, from the two service clusters and according to the judgment result of whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting includes:
if both service clusters are normally connected to the database, detecting, by each of the two service clusters, whether the other service cluster has initiated voting;
when a first service cluster of the two service clusters does not detect that the other service cluster has initiated voting, the first service cluster has the right to initiate voting;
if only one of the two service clusters is normally connected to the database, the service cluster that is normally connected to the database has the right to initiate voting;
if neither of the two service clusters is normally connected to the database, neither service cluster has the right to initiate voting.
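The case analysis above can be read as a rule that each service cluster evaluates locally. The following Python sketch is one possible reading, not the patented implementation; the function name `has_right_to_initiate` and its two boolean inputs are hypothetical. It assumes that a cluster which cannot reach the database also cannot record a vote there, so a connected cluster never sees a vote from a disconnected peer.

```python
def has_right_to_initiate(own_db_ok: bool, peer_vote_detected: bool) -> bool:
    """Decide locally whether this service cluster may initiate a vote.

    own_db_ok          -- this cluster's connection to the database is normal
    peer_vote_detected -- a vote initiated by the other cluster was detected
                          (only meaningful when own_db_ok is True)
    """
    if not own_db_ok:
        # No normal database connection: no right to initiate a vote.
        return False
    # Connected: the right exists only if the other cluster has not voted yet.
    return not peer_vote_detected


# Enumerate the situations described in the text (hypothetical inputs).
for own_ok, peer_voted in [(True, False), (True, True), (False, False)]:
    print(own_ok, peer_voted, "->", has_right_to_initiate(own_ok, peer_voted))
```

If both clusters are connected and check at the same moment, both may see no vote. The text does not say how that race is settled; an atomic check-and-write in the database would be one assumption.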
Optionally, initiating, by the service cluster having the right to initiate voting, a vote to the voter set includes:
writing, by the service cluster that initiates the vote, a voting record and a timestamp into the database.
Optionally, detecting, by each of the two service clusters, whether the other service cluster has initiated voting when both service clusters are normally connected to the database includes:
querying, by the first service cluster of the two service clusters, the database for the voting record and timestamp of the other service cluster;
when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time counted from the timestamp, the other service cluster has initiated voting;
when the voting record of the other service cluster does not exist, or its voting record and timestamp exist but the current time exceeds the preset time counted from the timestamp, the other service cluster has not initiated voting.
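The record-and-timestamp check described above amounts to a simple expiry test. The sketch below is illustrative only: the `VoteRecord` type, the function name `peer_vote_active`, and the use of seconds as the time unit are assumptions, and the database lookup itself is left out (the caller passes the record it found, or `None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRecord:
    cluster_id: str    # which service cluster wrote the record
    timestamp: float   # when the vote was initiated, in seconds


def peer_vote_active(record: Optional[VoteRecord], now: float, preset_time: float) -> bool:
    """Return True if the other cluster counts as having initiated a vote."""
    if record is None:
        # No voting record exists: the other cluster has not initiated a vote.
        return False
    # A record exists: it counts only while the preset time has not been exceeded.
    return now - record.timestamp <= preset_time


record = VoteRecord(cluster_id="cluster-B", timestamp=100.0)
print(peer_vote_active(record, now=120.0, preset_time=30.0))
print(peer_vote_active(record, now=131.0, preset_time=30.0))
```

Letting an old record expire is what allows the second cluster to take over the vote after the first one fails to win in time.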
Optionally, determining the voter set from the use terminals and/or arbitration terminals according to the preset selection rule includes:
judging whether the number of use terminals that satisfy a preset condition reaches the preset size of the voter set;
if the number of use terminals that satisfy the preset condition exceeds the preset size of the voter set, determining the voters of the voter set from the use terminals;
if the number of use terminals that satisfy the preset condition does not exceed the preset size of the voter set, selecting arbitration terminals from the arbitration terminals that satisfy the preset condition so that they, together with the use terminals that satisfy the preset condition, form the voter set.
Optionally, the preset condition is:
the use terminal has been online for longer than a preset time and its state is stable.
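The selection rule above gives priority to use terminals and fills the remaining places with arbitration terminals. A minimal sketch follows, under stated assumptions: the `Terminal` type and all function and parameter names are hypothetical, the same online-time-and-stability condition is applied to both kinds of terminal, and when there are more eligible use terminals than places the sketch simply takes the first ones, because the text does not say how they are chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Terminal:
    name: str
    online_seconds: float
    stable: bool


def meets_condition(t: Terminal, min_online_seconds: float) -> bool:
    # Preset condition: online longer than the preset time and in a stable state.
    return t.stable and t.online_seconds > min_online_seconds


def select_voters(use_terminals: List[Terminal],
                  arbitration_terminals: List[Terminal],
                  set_size: int,
                  min_online_seconds: float) -> List[Terminal]:
    users = [t for t in use_terminals if meets_condition(t, min_online_seconds)]
    if len(users) > set_size:
        # Enough eligible use terminals: the voter set comes from them alone.
        return users[:set_size]  # assumption: take the first set_size terminals
    # Otherwise top up with eligible arbitration terminals.
    arbiters = [t for t in arbitration_terminals if meets_condition(t, min_online_seconds)]
    return users + arbiters[: set_size - len(users)]


users = [Terminal("u1", 100, True), Terminal("u2", 5, True), Terminal("u3", 200, False)]
arbiters = [Terminal("a1", 500, True), Terminal("a2", 500, True)]
print([t.name for t in select_voters(users, arbiters, set_size=3, min_online_seconds=60)])
```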
Optionally, notifying the service cluster that did not initiate voting of the voting result includes:
writing the voting result into the database;
reading, by the service cluster that did not initiate voting, the voting result from the database.
Optionally, notifying the service cluster that did not initiate voting of the voting result includes:
sending the voting result to the use terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
when a use terminal and/or arbitration terminal is connected to the service cluster that did not initiate voting, sending the voting result to the service cluster that did not initiate voting.
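The two notification options above give the voting result more than one route to the service cluster that did not initiate voting: through the database, and through terminals that are connected to that cluster. The sketch below only enumerates which routes are currently usable. The function name, the `terminal_links` mapping, and the path labels are hypothetical and are not part of the described method.

```python
from typing import Dict, List


def paths_to_non_initiator(non_initiator_db_ok: bool,
                           terminal_links: Dict[str, bool]) -> List[str]:
    """List the routes over which the non-initiating cluster can learn the result.

    non_initiator_db_ok -- the non-initiating cluster can read the database
    terminal_links      -- terminal name -> whether that terminal is connected
                           to the non-initiating cluster
    """
    paths = []
    if non_initiator_db_ok:
        # Route 1: the result is written to the database and read from there.
        paths.append("database")
    # Route 2: any notified terminal that is also connected to the
    # non-initiating cluster forwards the result to it.
    paths.extend("terminal:" + name
                 for name, linked in sorted(terminal_links.items()) if linked)
    return paths


print(paths_to_non_initiator(False, {"t1": True, "t2": False}))
```

In this reading the result is lost only when the returned list is empty, that is, when the database and every terminal link to the non-initiating cluster are unavailable at once.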
Optionally, after judging whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time, the method further includes:
if the service cluster that initiated the vote has not obtained the predetermined number of votes within the preset time, initiating a vote by the service cluster that has not yet initiated voting, and returning to the step of judging whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time.
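The fallback above can be pictured as a short loop over the two service clusters. In the following sketch, `wins_vote` is a hypothetical callback standing in for "initiate a vote and judge whether the predetermined number of votes arrives within the preset time". Stopping after each cluster has tried once is an assumption, since the text does not say what happens if the second vote also fails.

```python
from typing import Callable, Optional


def resolve(first: str, second: str, wins_vote: Callable[[str], bool]) -> Optional[str]:
    """Return the cluster that wins its vote, trying `first` and then `second`."""
    for initiator in (first, second):
        if wins_vote(initiator):
            # This cluster would now leave read-only mode and serve users.
            return initiator
    # Neither cluster won: in this sketch both stay in read-only mode.
    return None


# Example: cluster A fails to collect enough votes in time, cluster B succeeds.
print(resolve("A", "B", lambda cluster: cluster == "B"))
```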
Optionally, the predetermined number of votes is more than half of the votes in the voter set.
Optionally, the number of votes obtained by the service cluster that initiated the vote is a weighted sum of the voters' votes.
Optionally, the weight of a voter varies according to the voter's importance.
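Combining the majority threshold with weighted votes gives a small tally routine. The sketch below assumes that "more than half" is measured against the total weight of the voter set when weights are used; the text does not spell this out. The function name and the example weights are hypothetical.

```python
from typing import Dict, Set


def has_required_votes(weights: Dict[str, float], approvals: Set[str]) -> bool:
    """Return True if the approving voters carry more than half of the total weight.

    weights   -- voter name -> weight (a more important voter has a larger weight)
    approvals -- names of the voters that voted for the initiating cluster
    """
    total = sum(weights.values())
    obtained = sum(w for voter, w in weights.items() if voter in approvals)
    return obtained > total / 2


weights = {"u1": 2, "u2": 1, "a1": 1}           # total weight 4
print(has_required_votes(weights, {"u1"}))        # 2 of 4 is not more than half
print(has_required_votes(weights, {"u1", "a1"}))  # 3 of 4 is more than half
```

With all weights equal to 1 this reduces to the plain majority rule.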
Optionally, a service cluster that does not have the right to initiate voting does not initiate voting.
An embodiment of the present invention further provides an apparatus for processing split-brain in a distributed cluster. The apparatus includes:
a first mode switching unit, configured to switch two interconnected service clusters in a distributed system to read-only mode when the two service clusters cannot sense each other's running state;
a first judging unit, configured to judge whether the connections between the two service clusters and a database are normal;
a first determining unit, configured to determine, from the two service clusters and according to the judgment result of whether the connections between the two service clusters and the database are normal, at most one service cluster having the right to initiate voting;
a first initiating voting unit, configured for the service cluster having the right to initiate voting to initiate a vote to a voter set, where the voter set is determined from use terminals and/or arbitration terminals according to a preset selection rule, a use terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the use terminals;
a second judging unit, configured to judge whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time;
a second mode switching unit, configured to switch the service cluster that initiated the vote from read-only mode to normal service mode if that service cluster has obtained the predetermined number of votes within the preset time;
a notification unit, configured to notify the service cluster that did not initiate voting of the voting result if the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time.
Optionally, the first determining unit includes:
a detecting subunit, configured so that, if both service clusters are normally connected to the database, each of the two service clusters detects whether the other service cluster has initiated voting;
a first determining subunit, configured to determine that a first service cluster of the two service clusters has the right to initiate voting when the first service cluster does not detect that the other service cluster has initiated voting;
a second determining subunit, configured to determine that the service cluster normally connected to the database has the right to initiate voting if only one of the two service clusters is normally connected to the database;
a third determining subunit, configured to determine that neither service cluster has the right to initiate voting if neither of the two service clusters is normally connected to the database.
Optionally, the first initiating voting unit includes:
a writing unit, configured to write a voting record and a timestamp into the database.
Optionally, the detecting subunit includes:
a querying subunit, configured to query the database for the voting record and timestamp of the other service cluster;
a determining subunit, configured to: determine that the other service cluster has initiated voting when its voting record and timestamp exist and the current time does not exceed the preset time counted from the timestamp; and determine that the other service cluster has not initiated voting when its voting record does not exist or the current time exceeds the preset time counted from the timestamp.
Optionally, the apparatus further includes:
a third judging unit, configured to judge whether the number of use terminals that satisfy a preset condition reaches the preset size of the voter set;
a second determining unit, configured to determine the voters of the voter set from the use terminals if the number of use terminals that satisfy the preset condition exceeds the preset size of the voter set;
a third determining unit, configured to: if the number of use terminals that satisfy the preset condition does not exceed the preset size of the voter set, select arbitration terminals from the arbitration terminals that satisfy the preset condition so that they, together with the use terminals that satisfy the preset condition, form the voter set.
Optionally, the preset condition is:
the use terminal has been online for longer than a preset time and its state is stable.
Optionally, the notification unit includes:
a writing subunit, configured to write the voting result into the database;
a reading subunit, configured for the service cluster that did not initiate voting to read the voting result from the database.
Optionally, the notification unit includes:
a first sending subunit, configured to send the voting result to the use terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
a second sending subunit, configured to send the voting result to the service cluster that did not initiate voting when a use terminal and/or arbitration terminal is connected to that service cluster.
Optionally, the apparatus further includes:
a second initiating voting unit, configured so that, if the service cluster that initiated the vote has not obtained the predetermined number of votes within the preset time, the service cluster that has not yet initiated voting initiates a vote and processing returns to the second judging unit.
Optionally, the predetermined number of votes is more than half of the votes in the voter set.
Optionally, the number of votes obtained by the service cluster that initiated the vote is a weighted sum of the voters' votes.
Optionally, the weight of a voter varies according to the voter's importance.
Optionally, a service cluster that does not have the right to initiate voting does not initiate voting.
An embodiment of the present invention provides a storage medium on which a program is stored. When the program is executed by a processor, the method for processing split-brain in a distributed cluster is implemented.
An embodiment of the present invention provides a device for processing split-brain in a distributed cluster. The device includes: a memory configured to store a program; and a processor configured to run the program, where the processor, when running the program, implements the method for processing split-brain in a distributed cluster according to the present invention.
In this embodiment, when two interconnected service clusters cannot sense each other's running state, the two service clusters are switched to read-only mode. According to whether each of the two service clusters can connect to the database normally, at most one service cluster having the right to initiate voting is determined; that service cluster initiates a vote to the voter set, and a service cluster without the right does not initiate voting. If the predetermined number of votes is obtained within the preset time, the service cluster that initiated the vote is switched from read-only mode to normal service mode, the service cluster that did not initiate voting is notified of the voting result, and the service cluster that did not initiate voting stops working. In this way, the occurrence of split-brain is completely avoided while data remains readable. Moreover, because the voters in the voter set are chosen with priority given to use terminals, the voting result reflects actual usage more closely. In addition, the voting result is delivered over multiple paths, so the party that did not initiate voting can learn the result even if some of its links fail.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for processing split-brain in a distributed cluster according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for determining a service cluster having the voting right according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a cluster initiating a vote in a distributed cluster according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of notifying the other cluster after voting is completed in a distributed cluster according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for processing split-brain in a distributed cluster according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a device for processing split-brain in a distributed cluster according to an embodiment of the present invention.
具体实施方式Detailed ways
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
参考图1,示出了本发明实施例一种分布式集群脑裂的处理方法的流程示意图,在本实施例中,所述方法可以包括:Referring to FIG. 1 , a schematic flowchart of a method for processing a distributed cluster brain split according to an embodiment of the present invention is shown. In this embodiment, the method may include:
S101: When two interconnected service clusters in a distributed system cannot perceive each other's running state, the two service clusters switch to read-only mode;
In this embodiment, a highly available distributed cluster may contain two interconnected service clusters. Interconnected service clusters can perceive each other's state through some mechanism, for example through a heartbeat link that shows whether the peer is running normally. When the heartbeat link fails, the two interconnected service clusters can no longer perceive each other's running state. Likewise, when either of the two service clusters goes down, the two clusters can no longer perceive each other's running state. Furthermore, the two interconnected service clusters may be two service clusters that back each other up.
In this embodiment, read-only mode means that a user terminal can only read data from the database through the service cluster and cannot write data to the database; for example, it cannot add or modify data in the database. In read-only mode, the cluster may filter services by type at the proxy entry, discarding every service other than data reads, i.e. accepting only services that read data; alternatively, the restriction may be applied in the database write service, i.e. services that write data to the database are not executed.
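The proxy-entry filtering described for read-only mode can be sketched as follows. This is a minimal illustration only: the class `ProxyEntry`, the `Mode` enum and the operation name `"read"` are hypothetical names introduced here, and the source does not specify how services are classified.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    READ_ONLY = "read_only"


# Hypothetical operation name; the source only distinguishes
# data-read services from all other services.
READ_OPERATIONS = frozenset({"read"})


class ProxyEntry:
    """Filters incoming services at the cluster's proxy entry."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def switch_to_read_only(self) -> None:
        # S101: entered when the peer cluster's state can no longer be perceived.
        self.mode = Mode.READ_ONLY

    def switch_to_normal(self) -> None:
        # S105: entered by the cluster whose vote succeeded.
        self.mode = Mode.NORMAL

    def accepts(self, operation: str) -> bool:
        # In normal mode every service is accepted; in read-only mode
        # everything except data reads is filtered out.
        if self.mode is Mode.NORMAL:
            return True
        return operation in READ_OPERATIONS
```

The alternative mentioned in the text, restricting the database write service instead, would place the same check on the write path rather than at the entry.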
S102: Determine whether the connections between the two service clusters and the database are normal and, according to the result, determine from the two service clusters at most one service cluster that has the right to initiate voting;
It should be noted that, in this embodiment, a service cluster that has the right to initiate voting may initiate a vote, and a service cluster that does not have that right may not.
In this embodiment, determining whether the connections between the two interconnected service clusters and the database are normal can be understood as determining whether the two service clusters can read and write the database normally. Specifically, it is determined whether each of the two service clusters can read data from the database normally and whether it can write data to the database normally.
In this embodiment, to ensure that split-brain does not occur, at most one service cluster with the right to initiate voting may be determined at any one time. Referring to FIG. 2, as an example, S102 may specifically include:
S201: If the connections of both service clusters to the database are normal, each of the two service clusters checks whether the other service cluster has initiated a vote;
In this embodiment, a service cluster that initiates a vote must write a voting record and a timestamp to the database, so that any service cluster can determine from another cluster's voting record and timestamp whether that cluster has initiated a vote. Specifically, S201 may include:
A first service cluster of the two service clusters queries the database for the voting record and timestamp of the other service cluster;
When the voting record and timestamp of the other service cluster exist and the current time has not exceeded the preset time counted from the timestamp, the other service cluster has initiated a vote;
When the voting record of the other service cluster does not exist, or when its voting record and timestamp exist but the current time has exceeded the preset time counted from the timestamp, the other service cluster has not initiated a vote.
For example: if the two interconnected service clusters are cluster A and cluster B, then when the first service cluster is cluster A the other service cluster is cluster B, and when the first service cluster is cluster B the other service cluster is cluster A.
The timestamp includes the service cluster's own running state, which may include the health of the service cluster (i.e. whether it is alive), whether it has initiated a vote, and so on. The two service clusters can therefore learn through the database whether the other service cluster has initiated a vote.
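The check in S201 can be illustrated with a short sketch. The `VoteRecord` type, its field names and the use of seconds as the time unit are assumptions made for illustration; the source only states that a voting record and a timestamp are written to the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRecord:
    cluster_id: str   # hypothetical field: which cluster wrote the record
    timestamp: float  # time the vote was initiated, in seconds


def other_has_initiated_vote(
    record: Optional[VoteRecord], now: float, preset_time: float
) -> bool:
    """Return True if the other cluster's vote is still considered active."""
    if record is None:
        # No voting record in the database: no vote was initiated.
        return False
    # The vote counts only while the current time has not exceeded
    # the preset time counted from the timestamp.
    return now - record.timestamp <= preset_time
```

An expired record is treated the same as a missing one, so a stale vote from an earlier round does not block the other cluster.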
S202: When the first service cluster of the two service clusters does not detect that the other service cluster has initiated a vote, the first service cluster has the right to initiate voting;
In this embodiment, when the connections of both service clusters to the database are normal, both clusters can obtain each other's running state from the database. If a service cluster detects that the other service cluster connected to it has not initiated a vote, that service cluster has the right to initiate voting; if the other service cluster has already initiated a vote, the service cluster does not have the right to initiate voting in the current state.
S203: If only one of the two service clusters has a normal connection to the database, the service cluster with the normal connection has the right to initiate voting;
In this embodiment, if only one of the two service clusters has a normal connection to the database, the other service cluster, whose connection to the database is abnormal, cannot obtain from the database the state of the service cluster connected to it, nor can it send its own running state to the database. Therefore the service cluster with the normal database connection has the right to initiate voting, and the service cluster that cannot connect to the database normally does not have the right to initiate voting.
For example: suppose the two interconnected service clusters are cluster A and cluster B. If cluster A can connect to the database normally and cluster B cannot, then in this state cluster B cannot obtain cluster A's running state from the database, and cluster B cannot send its own running state to the database either.
S204: If the connections of both service clusters to the database are abnormal, neither of the two service clusters has the right to initiate voting.
In this embodiment, when neither of the two interconnected service clusters can connect to the database normally, neither can obtain the other's running state from the database, and neither can carry out services normally. In this case neither has the right to initiate voting, i.e. neither may initiate a vote.
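Steps S201 to S204 can be condensed into one decision made locally by each cluster. The following is a sketch under stated assumptions: the function name and its two boolean inputs are hypothetical, and a cluster is assumed to learn about the other side only through its own database connection and the other side's voting record.

```python
def has_right_to_initiate(own_db_ok: bool, peer_vote_active: bool) -> bool:
    """Decide, from one cluster's point of view, whether it may initiate a vote."""
    if not own_db_ok:
        # S203 (the disconnected side) and S204: a cluster that cannot
        # read and write the database never has the right.
        return False
    # S201/S202: with a working database connection, the right exists only
    # if no active vote from the other cluster is found. A peer that cannot
    # reach the database cannot have written a fresh record, which covers
    # the connected side of S203.
    return not peer_vote_active
```

The sketch does not address the case where both clusters query the database at the same instant and both find no record; the source does not describe how that case is resolved.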
S103: The service cluster that has the right to initiate voting initiates a vote to a voter set, where the voter set is determined from user terminals and/or arbitration terminals according to a preset selection rule; a user terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the user terminals;
In this embodiment, S102 determines at most one service cluster with the right to initiate voting, i.e. at most one service cluster can initiate a vote. When such a service cluster has been determined, it initiates a vote to the preset voter set.
For example: as shown in FIG. 3, the two interconnected service clusters are cluster A and cluster B. When the heartbeat link between cluster A and cluster B fails, i.e. neither side can perceive the other's running state, it is first determined whether the connection between cluster A and the database and the connection between cluster B and the database are normal. If both cluster A and cluster B can connect to the database normally, both are candidates for the right to initiate voting. Cluster A and cluster B then each check whether the other has already initiated a vote. If cluster B detects that cluster A has not yet initiated a vote, cluster B is the cluster with the right to initiate voting and may initiate a vote to the preset voter set. When the voter set is determined, if User5 and User6 are users who are not frequently online, they are regarded as users without voting rights and are not considered for the voter set. As the figure shows, the resulting voter set is User1, User2, User3, User4, Arbiter1, Arbiter2 and Arbiter3. User3 in the voter set may have suffered a single point of failure and be unable to vote, but a small number of user terminals being unable to vote does not affect the voting result.
In this embodiment, the voters in the voter set may be preconfigured, for example set in advance by a technician, or selected dynamically according to the users' situation. In either case the user-first principle must be followed. Specifically, in S103, determining the voter set from the user terminals and arbitration terminals according to the preset selection rule includes:
Determining whether the number of user terminals that meet a preset condition has reached the preset size of the voter set;
If the number of user terminals that meet the preset condition exceeds the preset size of the voter set, determining the voters in the voter set from the user terminals;
If the number of user terminals that meet the preset condition does not exceed the preset size of the voter set, selecting arbitration terminals from the arbitration terminals so that they, together with the user terminals that meet the preset condition, form the voter set.
It should be noted that, in this embodiment, the preset condition may be that the user terminal has been online for longer than a preset time and is in a stable state. In other words, the user terminal is frequently online and stable, where "frequently online" means that it frequently performs network operations through the two interconnected service clusters mentioned in S101; examples are fixed collection stations of a public security system and servers of a bank.
In this embodiment, when many user terminals have been online longer than the preset time and are in a stable state, the voter set can be determined from user terminals alone; when few user terminals meet that condition, part of the voter set is determined from user terminals and part from arbitration terminals; when no user terminal meets that condition, the entire voter set may be determined from arbitration terminals. Therefore, even if some user terminals fail, execution of this embodiment is not affected; that is, the service cluster can still initiate a vote.
It should be noted that, in this embodiment, a user terminal may be a terminal that performs network operations by connecting to either of the two service clusters mentioned above; for example, a terminal that logs in to either of the two service clusters and is provided with network services by that cluster. In addition, the network nodes include many servers that are frequently online but do not need to connect to either of the two service clusters to perform network operations. Such servers may serve as arbitration terminals, and voters in the voter set may also be determined from the arbitration terminals.
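The user-first selection rule can be sketched as below. The `Terminal` type, its fields, and the choice to take qualifying terminals in input order are assumptions for illustration; the source does not say which terminals are chosen when more qualify than are needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Terminal:
    name: str
    kind: str              # "user" or "arbiter" (hypothetical labels)
    online_seconds: float  # how long the terminal has been online
    stable: bool


def select_voters(
    terminals: List[Terminal], set_size: int, min_online_seconds: float
) -> List[Terminal]:
    """Build the voter set, preferring qualifying user terminals."""
    # Preset condition: online longer than the preset time and stable.
    users = [
        t for t in terminals
        if t.kind == "user" and t.online_seconds > min_online_seconds and t.stable
    ]
    chosen = users[:set_size]
    # Fill any remaining places with arbitration terminals.
    arbiters = [t for t in terminals if t.kind == "arbiter"]
    chosen += arbiters[: set_size - len(chosen)]
    return chosen
```

With the terminals of FIG. 3, where User5 and User6 are not frequently online and the voter set has seven places, this yields User1 to User4 plus Arbiter1 to Arbiter3.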
S104: Determine whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time.
In this embodiment, the predetermined number of votes may be a majority of the votes in the voter set; more specifically, as an example, more than half of the votes in the voter set. The number of voters in the voter set may be odd or even. After a vote is initiated, the vote is deemed successful as long as the initiating service cluster obtains more than half of the votes in the voter set within the preset time.
Specifically, the number of votes obtained by the initiating service cluster is the weighted sum of the voters' votes, and each voter's weight varies with that voter's importance.
In practice, after the service cluster with the right to initiate voting has initiated a vote, some of the voters may fail. The failed voters no longer take part in the vote, but as long as the voters that can still vote make up more than half of the voter set, the voting result is not affected.
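The weighted tally of S104 can be sketched as follows. Interpreting "more than half of the votes" as more than half of the total weight of the voter set is an assumption made here, as are the function and parameter names.

```python
from typing import Dict, Set


def vote_succeeded(weights: Dict[str, float], votes_received: Set[str]) -> bool:
    """Return True if the weighted votes received exceed half of the total weight."""
    total = sum(weights.values())
    # Only votes from members of the voter set are counted;
    # failed voters simply contribute nothing.
    obtained = sum(w for voter, w in weights.items() if voter in votes_received)
    return obtained > total / 2
```

With seven equally weighted voters, four votes are enough, so the failure of a single voter such as User3 does not prevent a successful vote.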
S105: If the service cluster that initiated the vote obtains the predetermined number of votes within the preset time, switch that service cluster from read-only mode to normal service mode and notify the service cluster of the two that did not initiate the vote of the voting result.
For example: as shown in FIG. 3, if cluster B initiates a vote and obtains more than half of the votes in the voter set within the preset time, the vote has succeeded. Cluster B can then be regarded as the primary cluster and cluster A as the backup cluster of cluster B; in other words, after B's vote succeeds, B is the cluster that provides normal service to user terminals, and A must stop serving.
In this embodiment, normal service mode means that the cluster whose vote succeeded can connect to user terminals normally, provide them with network services, and perform read and write operations on the database.
In this embodiment, notifying the service cluster of the two that did not initiate the vote of the voting result may specifically be implemented in the following two ways:
Implementation 1: The voting result is written to the database, and the service cluster that did not initiate the vote reads the voting result from the database.
Implementation 2: The voting result is sent to the user terminals and/or arbitration terminals connected to the service cluster that initiated the vote; when such a user terminal and/or arbitration terminal is connected to the service cluster that did not initiate the vote, it sends the voting result to that service cluster.
In this embodiment, the voting result is sent to the voters connected to the initiating service cluster and to the user terminals other than the voters, where the voters are the user terminals and/or arbitration terminals that vote for the initiating service cluster.
In this embodiment, when the service cluster of the two that did not initiate the vote is notified of the voting result, Implementation 1 may be used, Implementation 2 may be used, or both may be used together. The preferred approach in this embodiment is to use both Implementation 1 and Implementation 2. In this way, when the service cluster that did not initiate the vote can connect to the database, it can learn the voting result by reading the running state of the initiating service cluster from the database; and even when it cannot connect to the database normally, it can learn the other side's voting result from the user terminals connected to it.
For example: as shown in FIG. 4, suppose cluster B is the service cluster that initiated the vote and cluster A is the service cluster that did not. After the vote ends, cluster B can report the voting result to the database, to the voters connected to cluster B, and to the user terminals other than the voters. Here User1, User2, User3, User4, Arbiter1, Arbiter2 and Arbiter3 are the voters, while User5 and User6 are user terminals without voting rights, i.e. user terminals other than the voters. User3 may be a user terminal that is not connected to cluster B, possibly because of a single point of failure or a network fault. The database, the voters, or the user terminals other than the voters that received the voting result can then report it to cluster A.
However, if the vote times out, i.e. the service cluster that initiated the vote does not obtain more than half of the votes in the voter set within the preset time, the service cluster of the two that did not initiate the vote initiates a vote to the preset voter set, and execution returns to S104. It should be noted that the service cluster that did not initiate the vote can do so only if it can connect to the database normally.
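The timeout fallback can be sketched as a single extra round. This is a simplified illustration: `collect_votes` is a hypothetical callable that returns True when the given cluster obtains the predetermined number of votes within the preset time, and the sketch stops after one fallback round, whereas the source only states that execution returns to S104.

```python
from typing import Callable, Optional


def run_voting(
    initiator: str,
    other: str,
    other_db_ok: bool,
    collect_votes: Callable[[str], bool],
) -> Optional[str]:
    """Return the cluster that should enter normal service mode, if any."""
    if collect_votes(initiator):
        # S105: the initiating cluster's vote succeeded.
        return initiator
    # Timeout: the other cluster may initiate a vote, but only if it
    # can connect to the database normally.
    if other_db_ok and collect_votes(other):
        return other
    # No cluster won; both remain in read-only mode.
    return None
```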
In this embodiment, when two interconnected service clusters cannot perceive each other's running state, both are switched to read-only mode. Depending on whether each of the two service clusters can connect to the database normally, at most one service cluster is given the right to initiate voting, and a service cluster without that right does not initiate a vote. The service cluster with the right initiates a vote to the voter set. If it obtains the predetermined number of votes within the preset time, it is switched from read-only mode to normal service mode, the service cluster that did not initiate the vote is notified of the voting result, and that cluster stops working. In this way split-brain is avoided while data remains readable. Moreover, because the voters in the voter set are chosen on a user-terminal-first basis, the voting result reflects actual usage more closely. In addition, the voting result is delivered over multiple paths, so the side that did not initiate the vote can learn the result even if some of its links fail.
Referring to FIG. 5, a schematic structural diagram of an apparatus for handling split-brain in a distributed cluster according to an embodiment of the present invention is shown. The apparatus may include:
A first mode switching unit 501, configured to switch two interconnected service clusters in a distributed system to read-only mode when the two service clusters cannot perceive each other's running state;
A first judging unit 502, configured to determine whether the connections between the two service clusters and the database are normal;
A first determining unit 503, configured to determine from the two service clusters, according to the result of whether their connections to the database are normal, at most one service cluster that has the right to initiate voting;
A first vote-initiating unit 504, configured to cause the service cluster that has the right to initiate voting to initiate a vote to a voter set, where the voter set is determined from user terminals and/or arbitration terminals according to a preset selection rule; a user terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the user terminals;
A second judging unit 505, configured to determine whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time;
A second mode switching unit 506, configured to switch the service cluster that initiated the vote from read-only mode to normal service mode if it obtains the predetermined number of votes within the preset time;
A notification unit 507, configured to notify the service cluster of the two that did not initiate the vote of the voting result if the service cluster that initiated the vote obtains the predetermined number of votes within the preset time.
Optionally, the first determining unit includes:
A detecting subunit, configured so that, if the connections of both service clusters to the database are normal, each of the two service clusters checks whether the other service cluster has initiated a vote;
A first determining subunit, configured to determine that the first service cluster of the two service clusters has the right to initiate voting when the first service cluster does not detect that the other service cluster has initiated a vote;
A second determining subunit, configured to determine, if only one of the two service clusters has a normal connection to the database, that the service cluster with the normal connection has the right to initiate voting;
A third determining subunit, configured to determine, if the connections of both service clusters to the database are abnormal, that neither of the two service clusters has the right to initiate voting.
Optionally, the first vote-initiating unit includes:
A writing unit, configured to write a voting record and a timestamp to the database.
Optionally, the detecting subunit includes:
A querying subunit, configured to query the database for the voting record and timestamp of the other service cluster;
A deciding subunit, configured to decide that the other service cluster has initiated a vote when its voting record and timestamp exist and the current time has not exceeded the preset time counted from the timestamp, and to decide that the other service cluster has not initiated a vote when its voting record does not exist or the current time has exceeded the preset time counted from the timestamp.
Optionally, the apparatus further includes:
A third judging unit, configured to determine whether the number of user terminals that meet a preset condition has reached the preset size of the voter set;
A second determining unit, configured to determine the voters in the voter set from the user terminals if the number of user terminals that meet the preset condition exceeds the preset size of the voter set;
A third determining unit, configured to select, if the number of user terminals that meet the preset condition does not exceed the preset size of the voter set, arbitration terminals from the arbitration terminals that meet the preset condition so that they, together with the user terminals that meet the preset condition, form the voter set.
Optionally, the preset condition is:
The user terminal has been online for longer than a preset time and is in a stable state.
Optionally, the notification unit includes:
A writing subunit, configured to write the voting result to the database;
A reading subunit, configured so that the service cluster that did not initiate the vote reads the voting result from the database.
Optionally, the notification unit includes:
A first sending subunit, configured to send the voting result to the user terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
A second sending subunit, configured to send the voting result to the service cluster that did not initiate the vote when the user terminal and/or arbitration terminal is connected to that service cluster.
Optionally, the apparatus further includes:
A second vote-initiating unit, configured so that, if the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that did not initiate the vote initiates a vote, and control returns to the second judging unit.
Optionally, the predetermined number of votes is more than half of the votes in the voter set.
Optionally, the number of votes obtained by the service cluster that initiated the vote is the weighted sum of the voters' votes.
Optionally, each voter's weight varies with that voter's importance.
Optionally, a service cluster that does not have the right to initiate voting does not initiate a vote.
With the apparatus of this embodiment, split-brain is avoided while data remains readable. Moreover, because the voters in the voter set are chosen on a user-terminal-first basis, the voting result reflects actual usage more closely. In addition, the voting result is delivered over multiple paths, so the side that did not initiate the vote can learn the result even if some of its links fail.
An embodiment of the present invention provides a storage medium on which a program is stored; when the program is executed by a processor, the method for handling split-brain in a distributed cluster is implemented.
Referring to FIG. 6, a schematic structural diagram of a device for handling split-brain in a distributed cluster according to an embodiment of the present invention is shown. In this embodiment, the device includes a memory 601 and a processor 602;
The memory 601 is configured to store a program;
The processor 602 is configured to run the program. Specifically, when the processor executes the program, the following steps are implemented:
当分布式系统中相互连接的两个业务集群之间无法感知到对方的运行状态时,所述两个业务集群切换为只读模式;When the two service clusters connected to each other in the distributed system cannot detect the running status of the other party, the two service clusters are switched to the read-only mode.
判断所述两个业务集群与数据库之间的连接是否正常,并依据所述两个业务集群与所述数据库之间的连接是否正常的判断结果,从所述两个业务集群中确定至多一个具有发起投票权利的业务集群;Determining whether the connection between the two service clusters and the database is normal, and determining, according to a judgment result of whether the connection between the two service clusters and the database is normal, determining at most one of the two service clusters a business cluster that initiates voting rights;
所述具有发起投票权利的业务集群向投票者集合发起投票;其中,所述投票者集合为依据预设的选择规则从使用终端和/或仲裁终端中确定的;所述使用终端为通过连接所述两个业务集群中任何一个进行网络操作的终端,所述仲裁终端为网络节点中除所述使用终端之外的服务器;The service cluster having the right to initiate voting initiates a vote to the set of voters; wherein the set of voters is determined from the use terminal and/or the arbitration terminal according to a preset selection rule; a terminal that performs network operation in any one of two service clusters, where the arbitration terminal is a server other than the use terminal in the network node;
判断发起投票的业务集群是否在预设的时间内获得了预定的票数;Determining whether the business cluster that initiated the voting has obtained a predetermined number of votes within a preset time;
若所述发起投票的业务集群在预设的时间内获得了预定的票数,将所述发起投票的业务集群由只读模式切换为正常服务模式,将投票结果通知所述两个集群中未发起投票的业务集群。If the polling service cluster obtains a predetermined number of votes within a preset time, the polling service cluster is switched from the read-only mode to the normal service mode, and the voting result is notified that the two clusters are not initiated. Voting business cluster.
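For illustration only, the outcome of the steps above for a single cluster can be sketched as follows. The `Mode` enum, the function `resolve_split_brain` and its parameters are hypothetical names that do not appear in the embodiment, and vote collection within the preset time is reduced to a count that is assumed to have been gathered already.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read-only"
    NORMAL = "normal"


def resolve_split_brain(has_right: bool, votes_obtained: int, votes_required: int) -> Mode:
    """Return the mode of one cluster after the peer became unreachable.

    Hypothetical helper: both clusters are assumed to have switched to
    READ_ONLY already. Only a cluster that holds the right to initiate a
    vote and obtains the required number of votes within the preset time
    switches to NORMAL; in every other case it stays READ_ONLY.
    """
    if has_right and votes_obtained >= votes_required:
        return Mode.NORMAL
    return Mode.READ_ONLY
```

Because at most one cluster holds the right to initiate a vote at a time, at most one cluster can reach `Mode.NORMAL` in this sketch.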
Optionally, determining from the two service clusters, based on the result of determining whether the connections between the two service clusters and the database are normal, at most one service cluster that has the right to initiate a vote includes:
if the connections between both service clusters and the database are normal, each of the two service clusters detecting whether the other service cluster has initiated a vote;
when a first service cluster of the two service clusters does not detect that the other service cluster has initiated a vote, the first service cluster having the right to initiate a vote;
if only one of the two service clusters has a normal connection to the database, the service cluster with the normal connection to the database having the right to initiate a vote;
if neither of the two service clusters has a normal connection to the database, neither of the two service clusters having the right to initiate a vote.
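The three cases above can be sketched as a decision that each cluster makes locally. This is a minimal sketch under assumptions: `may_initiate_vote` and its parameters are hypothetical names, and it assumes a cluster that cannot reach the database also cannot have a vote record visible there, so the "only one connected" case needs no separate branch.

```python
def may_initiate_vote(own_db_ok: bool, peer_vote_in_progress: bool) -> bool:
    """Decide locally whether this cluster has the right to initiate a vote.

    own_db_ok: this cluster's connection to the database is normal.
    peer_vote_in_progress: a still-valid vote record of the other cluster
        was found in the database (always False if the peer cannot write).
    """
    if not own_db_ok:
        # Covers "neither connected" and the disconnected side of
        # "only one connected": no database access, no right to vote.
        return False
    # Connected: the right exists only if the peer has not already started.
    return not peer_vote_in_progress
```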
Optionally, the service cluster that has the right to initiate a vote initiating a vote to the voter set includes:
the service cluster that initiates the vote writing a voting record and a timestamp to the database.
Optionally, if the connections between both service clusters and the database are normal, each of the two service clusters detecting whether the other service cluster has initiated a vote includes:
the first service cluster of the two service clusters querying the database for the voting record and timestamp of the other service cluster;
when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time starting from the timestamp, determining that the other service cluster has initiated a vote;
when the voting record of the other service cluster does not exist, or the voting record and timestamp of the other service cluster exist but the current time exceeds the preset time starting from the timestamp, determining that the other service cluster has not initiated a vote.
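The timestamp check described above can be sketched as follows. The function name, the use of plain numbers of seconds for times, and the convention that `None` means "no voting record found" are assumptions made for this example.

```python
from typing import Optional


def peer_has_initiated_vote(record_timestamp: Optional[float],
                            now: float,
                            preset_time: float) -> bool:
    """Return True if the other cluster's vote is considered in progress.

    record_timestamp: timestamp stored with the peer's voting record,
        or None when no record exists in the database.
    now: current time, in the same unit as record_timestamp.
    preset_time: how long a voting record stays valid after its timestamp.
    """
    if record_timestamp is None:
        return False  # no record: the peer has not initiated a vote
    # A record counts only while the current time is within the window.
    return now - record_timestamp <= preset_time
```

An expired record is treated the same as a missing one, which lets a cluster take over the vote after the peer's attempt has timed out.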
Optionally, determining the voter set from the user terminals and/or arbitration terminals according to the preset selection rule includes:
determining whether the number of user terminals that satisfy a preset condition reaches the preset size of the voter set;
if the number of user terminals that satisfy the preset condition exceeds the preset size of the voter set, determining the voters in the voter set from those user terminals;
if the number of user terminals that satisfy the preset condition does not exceed the preset size of the voter set, selecting arbitration terminals from the arbitration terminals that satisfy the preset condition so that, together with the user terminals that satisfy the preset condition, they form the voter set.
Optionally, the preset condition is:
the terminal has been online for longer than a preset time and its state is stable.
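The selection rule above, which fills the voter set from user terminals first and tops it up with arbitration terminals, might look like the sketch below. The `Terminal` class, its fields, and the choice to take terminals in list order are assumptions. The embodiment says both "reaches" and "exceeds" for the threshold; this sketch treats "at least `set_size` eligible user terminals" as sufficient.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Terminal:
    name: str
    online_seconds: float  # how long the terminal has been online
    stable: bool           # whether its state is considered stable


def eligible(t: Terminal, min_online: float) -> bool:
    """Preset condition: online longer than min_online and stable."""
    return t.online_seconds > min_online and t.stable


def select_voters(user_terminals: List[Terminal],
                  arbitration_terminals: List[Terminal],
                  set_size: int,
                  min_online: float) -> List[Terminal]:
    """Build the voter set, giving priority to user terminals."""
    users = [t for t in user_terminals if eligible(t, min_online)]
    if len(users) >= set_size:
        return users[:set_size]
    # Not enough user terminals: fill the remaining seats with arbiters.
    arbiters = [t for t in arbitration_terminals if eligible(t, min_online)]
    return users + arbiters[: set_size - len(users)]
```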
Optionally, notifying the service cluster of the two that did not initiate the vote of the voting result includes:
writing the voting result to the database;
the service cluster that did not initiate the vote reading the voting result from the database.
Optionally, notifying the service cluster of the two that did not initiate the vote of the voting result includes:
sending the voting result to the user terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
when those user terminals and/or arbitration terminals are connected to the service cluster that did not initiate the vote, their sending the voting result to the service cluster that did not initiate the vote.
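Seen from the cluster that did not initiate the vote, the two notification paths above can be combined as in the sketch below. Combining them in one function, the dictionary standing in for the database, the key `"vote_result"` and the list of messages forwarded by terminals are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def learn_vote_result(db_reachable: bool,
                      db: Dict[str, str],
                      terminal_messages: List[str]) -> Optional[str]:
    """Return the voting result known to the non-initiating cluster.

    Path 1: read the result that the initiator wrote to the database.
    Path 2: use a result forwarded by a terminal that is connected to
            both clusters.
    Returns None when neither path has delivered a result yet.
    """
    if db_reachable and "vote_result" in db:
        return db["vote_result"]
    if terminal_messages:
        return terminal_messages[0]
    return None
```

With both paths available, the loss of the database link alone does not prevent the non-initiating cluster from learning the result.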
Optionally, after determining whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time, the steps further include:
if the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that has not initiated a vote initiating a vote, and returning to the step of determining whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time.
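The hand-over described above, in which the other cluster initiates a vote after the first attempt fails, can be sketched as a loop over the clusters. The function `run_voting_rounds` and the callable `collect_votes`, which stands for one complete vote bounded by the preset time, are hypothetical. The sketch gives each cluster a single attempt, which is an assumption; the embodiment does not state when retrying stops.

```python
from typing import Callable, Optional, Sequence


def run_voting_rounds(order: Sequence[str],
                      collect_votes: Callable[[str], int],
                      required: int) -> Optional[str]:
    """Let each cluster in `order` initiate a vote in turn.

    collect_votes(cluster) returns the number of votes that cluster
    obtained within the preset time. The first cluster that reaches
    `required` wins; None means no cluster obtained enough votes.
    """
    for cluster in order:
        if collect_votes(cluster) >= required:
            return cluster
    return None
```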
Optionally, the predetermined number of votes is more than half of the votes in the voter set.
Optionally, the number of votes obtained by the service cluster that initiated the vote is the weighted sum of the voters' votes.
Optionally, a voter's weight varies according to its importance.
Optionally, a service cluster that does not have the right to initiate a vote does not initiate one.
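A weighted tally with a majority threshold can be sketched as follows. The function and parameter names are hypothetical, and measuring "more than half" against the total weight of the voter set is an assumption; the embodiment does not specify how the threshold interacts with the weights.

```python
from typing import Dict


def vote_passes(votes_for: Dict[str, bool], weights: Dict[str, float]) -> bool:
    """Return True if the weighted votes exceed half of the total weight.

    votes_for: voter name -> True if that voter voted for the initiator.
               Voters that did not answer in time are simply absent.
    weights:   voter name -> weight, for every voter in the voter set.
    """
    total = sum(weights.values())
    obtained = sum(weights[name] for name, yes in votes_for.items() if yes)
    # Strictly more than half, so a tie never activates a cluster.
    return obtained > total / 2
```

For example, with weights 2, 1 and 1, a single vote from the heaviest voter yields exactly half of the total weight and therefore does not pass.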
The device in this embodiment may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The memory in the device of this embodiment may take the form of volatile memory in a computer-readable storage medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
With the device of this embodiment, split-brain is completely avoided while data remains readable. In addition, because the voters in the voter set are selected with priority given to user terminals, the voting result more closely reflects actual usage. Furthermore, the voting result is delivered over multiple paths, so the cluster that did not initiate the vote can learn the result even if some of its links have failed.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (28)

  1. A method for processing split-brain in a distributed cluster, characterized in that the method comprises:
    when two interconnected service clusters in a distributed system cannot sense each other's running state, switching the two service clusters to read-only mode;
    determining whether the connections between the two service clusters and a database are normal and, based on the result of that determination, determining from the two service clusters at most one service cluster that has the right to initiate a vote;
    the service cluster that has the right to initiate a vote initiating a vote to a voter set, wherein the voter set is determined from user terminals and/or arbitration terminals according to a preset selection rule, a user terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the user terminals;
    determining whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time;
    if the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time, switching that service cluster from read-only mode to normal service mode, and notifying the service cluster of the two that did not initiate the vote of the voting result.
  2. The method according to claim 1, characterized in that determining from the two service clusters, based on the result of determining whether the connections between the two service clusters and the database are normal, at most one service cluster that has the right to initiate a vote comprises:
    if the connections between both service clusters and the database are normal, each of the two service clusters detecting whether the other service cluster has initiated a vote;
    when a first service cluster of the two service clusters does not detect that the other service cluster has initiated a vote, the first service cluster having the right to initiate a vote;
    if only one of the two service clusters has a normal connection to the database, the service cluster with the normal connection to the database having the right to initiate a vote;
    if neither of the two service clusters has a normal connection to the database, neither of the two service clusters having the right to initiate a vote.
  3. The method according to claim 1, characterized in that the service cluster that has the right to initiate a vote initiating a vote to the voter set comprises:
    the service cluster that initiates the vote writing a voting record and a timestamp to the database.
  4. The method according to claim 3, characterized in that, if the connections between both service clusters and the database are normal, each of the two service clusters detecting whether the other service cluster has initiated a vote comprises:
    the first service cluster of the two service clusters querying the database for the voting record and timestamp of the other service cluster;
    when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time starting from the timestamp, determining that the other service cluster has initiated a vote;
    when the voting record of the other service cluster does not exist, or the voting record and timestamp of the other service cluster exist but the current time exceeds the preset time starting from the timestamp, determining that the other service cluster has not initiated a vote.
  5. The method according to claim 1, characterized in that determining the voter set from the user terminals and/or arbitration terminals according to the preset selection rule comprises:
    determining whether the number of user terminals that satisfy a preset condition reaches the preset size of the voter set;
    if the number of user terminals that satisfy the preset condition exceeds the preset size of the voter set, determining the voters in the voter set from those user terminals;
    if the number of user terminals that satisfy the preset condition does not exceed the preset size of the voter set, selecting arbitration terminals from the arbitration terminals that satisfy the preset condition so that, together with the user terminals that satisfy the preset condition, they form the voter set.
  6. The method according to claim 5, characterized in that the preset condition is:
    the terminal has been online for longer than a preset time and its state is stable.
  7. The method according to claim 1, characterized in that notifying the service cluster of the two that did not initiate the vote of the voting result comprises:
    writing the voting result to the database;
    the service cluster that did not initiate the vote reading the voting result from the database.
  8. The method according to claim 1, characterized in that notifying the service cluster of the two that did not initiate the vote of the voting result comprises:
    sending the voting result to the user terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
    when those user terminals and/or arbitration terminals are connected to the service cluster that did not initiate the vote, their sending the voting result to the service cluster that did not initiate the vote.
  9. The method according to claim 1, characterized in that, after determining whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time, the method further comprises:
    if the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that has not initiated a vote initiating a vote, and returning to the step of determining whether the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time.
  10. The method according to claim 1, characterized in that the predetermined number of votes is more than half of the votes in the voter set.
  11. The method according to claim 1, characterized in that the number of votes obtained by the service cluster that initiated the vote is the weighted sum of the voters' votes.
  12. The method according to claim 11, characterized in that a voter's weight varies according to its importance.
  13. The method according to claim 1, characterized in that a service cluster that does not have the right to initiate a vote does not initiate a vote.
  14. An apparatus for processing split-brain in a distributed cluster, the apparatus comprising:
    a first mode switching unit, configured to switch two interconnected service clusters in a distributed system to read-only mode when the two service clusters cannot sense each other's running state;
    a first determining unit, configured to determine whether the connections between the two service clusters and a database are normal;
    a first deciding unit, configured to determine from the two service clusters, based on the result of determining whether the connections between the two service clusters and the database are normal, at most one service cluster that has the right to initiate a vote;
    a first vote-initiating unit, configured for the service cluster that has the right to initiate a vote to initiate a vote to a voter set, wherein the voter set is determined from user terminals and/or arbitration terminals according to a preset selection rule, a user terminal is a terminal that performs network operations by connecting to either of the two service clusters, and an arbitration terminal is a server among the network nodes other than the user terminals;
    a second determining unit, configured to determine whether the service cluster that initiated the vote has obtained a predetermined number of votes within a preset time;
    a second mode switching unit, configured to switch the service cluster that initiated the vote from read-only mode to normal service mode if that service cluster has obtained the predetermined number of votes within the preset time;
    a notification unit, configured to notify the service cluster of the two that did not initiate the vote of the voting result if the service cluster that initiated the vote has obtained the predetermined number of votes within the preset time.
  15. The apparatus according to claim 14, characterized in that the first deciding unit comprises:
    a detecting subunit, configured so that, if the connections between both service clusters and the database are normal, each of the two service clusters detects whether the other service cluster has initiated a vote;
    a first deciding subunit, configured to determine that a first service cluster of the two service clusters has the right to initiate a vote when the first service cluster does not detect that the other service cluster has initiated a vote;
    a second deciding subunit, configured to determine, if only one of the two service clusters has a normal connection to the database, that the service cluster with the normal connection to the database has the right to initiate a vote;
    a third deciding subunit, configured to determine, if neither of the two service clusters has a normal connection to the database, that neither of the two service clusters has the right to initiate a vote.
  16. The apparatus according to claim 14, characterized in that the first vote-initiating unit comprises:
    a writing unit, configured to write a voting record and a timestamp to the database.
  17. The apparatus according to claim 16, characterized in that the detecting subunit comprises:
    a querying subunit, configured to query the database for the voting record and timestamp of the other service cluster;
    a judging subunit, configured to determine that the other service cluster has initiated a vote when the voting record and timestamp of the other service cluster exist and the current time does not exceed the preset time starting from the timestamp, and to determine that the other service cluster has not initiated a vote when the voting record of the other service cluster does not exist or the current time exceeds the preset time starting from the timestamp.
  18. The apparatus according to claim 14, characterized by further comprising:
    a third determining unit, configured to determine whether the number of user terminals that satisfy a preset condition reaches the preset size of the voter set;
    a second deciding unit, configured to determine the voters in the voter set from the user terminals if the number of user terminals that satisfy the preset condition exceeds the preset size of the voter set;
    a third deciding unit, configured to select, if the number of user terminals that satisfy the preset condition does not exceed the preset size of the voter set, arbitration terminals from the arbitration terminals that satisfy the preset condition so that, together with the user terminals that satisfy the preset condition, they form the voter set.
  19. The apparatus according to claim 18, characterized in that the preset condition is:
    the terminal has been online for longer than a preset time and its state is stable.
  20. The apparatus according to claim 14, characterized in that the notification unit comprises:
    a writing subunit, configured to write the voting result to the database;
    a reading subunit, configured for the service cluster that did not initiate the vote to read the voting result from the database.
  21. The apparatus according to claim 15, characterized in that the notification unit comprises:
    a first sending subunit, configured to send the voting result to the user terminals and/or arbitration terminals connected to the service cluster that initiated the vote;
    a second sending subunit, configured to send the voting result to the service cluster that did not initiate the vote when those user terminals and/or arbitration terminals are connected to that service cluster.
  22. The apparatus according to claim 14, characterized by further comprising:
    a second vote-initiating unit, configured so that, if the service cluster that initiated the vote does not obtain the predetermined number of votes within the preset time, the service cluster of the two that has not initiated a vote initiates a vote, and processing returns to the second determining unit.
  23. The apparatus according to claim 14, characterized in that the predetermined number of votes is more than half of the votes in the voter set.
  24. The apparatus according to claim 14, characterized in that the number of votes obtained by the service cluster that initiated the vote is the weighted sum of the voters' votes.
  25. The apparatus according to claim 24, characterized in that a voter's weight varies according to its importance.
  26. The apparatus according to claim 14, characterized in that a service cluster that does not have the right to initiate a vote does not initiate a vote.
  27. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processing device, implements the method of claims 1-13.
  28. A device for processing split-brain in a distributed cluster, characterized in that the device comprises:
    a memory, configured to store a program;
    a processor, configured to run the program, wherein when the processor runs the program, the processor implements the distributed cluster split-brain processing method of claims 1-13.
PCT/CN2017/117131 2017-12-19 2017-12-19 Distributed cluster split-brain processing method, apparatus, and device WO2019119263A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/117131 WO2019119263A1 (en) 2017-12-19 2017-12-19 Distributed cluster split-brain processing method, apparatus, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/117131 WO2019119263A1 (en) 2017-12-19 2017-12-19 Distributed cluster split-brain processing method, apparatus, and device

Publications (1)

Publication Number Publication Date
WO2019119263A1 true WO2019119263A1 (en) 2019-06-27

Family

ID=66994284

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/117131 WO2019119263A1 (en) 2017-12-19 2017-12-19 Distributed cluster split-brain processing method, apparatus, and device

Country Status (1)

Country Link
WO (1) WO2019119263A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140207925A1 (en) * 2013-01-18 2014-07-24 Microsoft Corporation Cluster voter model
CN103905247A (en) * 2014-03-10 2014-07-02 北京交通大学 Two-unit standby method and system based on multi-client judgment
CN105704187A (en) * 2014-11-27 2016-06-22 华为技术有限公司 Processing method and apparatus of cluster split brain
CN106789193A (en) * 2016-12-06 2017-05-31 郑州云海信息技术有限公司 A kind of cluster ballot referee method and system
CN106789197A (en) * 2016-12-07 2017-05-31 高新兴科技集团股份有限公司 A kind of cluster election method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115190046A (en) * 2022-04-13 2022-10-14 统信软件技术有限公司 Detection method and detection device for server cluster and computing equipment
CN115190046B (en) * 2022-04-13 2024-01-23 统信软件技术有限公司 Detection method, detection device and computing equipment of server cluster

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17935101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17935101

Country of ref document: EP

Kind code of ref document: A1