CN112181305A - Database cluster network partition selection method and device

Info

Publication number: CN112181305A
Application number: CN202011062473.6A
Authority: CN
Inventors: 冷建全; 杨尚
Original assignee: Beijing Kingbase Information Technologies Co Ltd
Current assignee: Beijing Kingbase Information Technologies Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05

Abstract

The invention relates to a method and a device for selecting a partition of a database cluster network. When a network partition fault occurs, a first node writes a first partition selection message into a shared storage, traverses the shared storage, determines that a second node has written a second partition selection message into the shared storage, and changes the first partition number in the first partition selection message written into the shared storage to the second partition number in the second partition selection message. After all nodes have written partition selection messages, the partition corresponding to the partition number written in the shared storage is determined to be the target partition; the nodes in the target partition have access authority to the database cluster network, and the remaining nodes do not. In other words, the partition numbers written into the shared storage by the first node and the second node are the same, which ensures that a unique surviving partition can be selected even when the number of nodes that fail simultaneously exceeds half of the total number of nodes in the cluster, thereby improving the availability of the database cluster.

Description

Database cluster network partition selection method and device

Technical Field

The present disclosure relates to the field of database technologies, and in particular, to a method and an apparatus for selecting a partition of a database cluster network.

Background

With the rapid development of the internet, the data volume has explosively increased, and the high-availability cluster technology is widely applied to the field of databases. A database cluster is a cluster that combines a group of database servers, each of which is called a node of the cluster, to provide services to users using a uniform interface.

While the cluster is providing services, faults such as network congestion or network interruption may occur at the nodes and cause a network partition, that is, the cluster is split into multiple partitions. Nodes in the same partition can communicate with each other but not with nodes in another partition. Data inconsistency can arise when nodes in multiple partitions still have access to shared resources. A voting algorithm is typically used to select the unique surviving partition. Each node in the cluster confirms the existence of the other nodes through a heartbeat mechanism, and each heartbeat a node receives from another node counts as one vote. If the number of votes received by a node exceeds half of the total number of nodes in the whole cluster, the partition where the node is located is in an available state and can provide service externally; if the number of votes received by the node does not exceed half of the total number of nodes in the whole cluster, the partition where the node is located is in a paralyzed state, and all the nodes in that partition stop service and cannot access the shared resource.
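
The majority rule used by such a voting algorithm can be sketched as follows. This is only an illustration of the rule as described in this background section; the function name `partition_is_available` and its parameters are hypothetical and do not come from any particular cluster manager.

```python
def partition_is_available(votes_received: int, total_nodes: int) -> bool:
    """Majority rule: a node's partition stays available only if the node
    has received votes (heartbeats) from more than half of all nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    # Integer arithmetic avoids rounding issues: votes > total / 2.
    return 2 * votes_received > total_nodes


# A 5-node cluster split into a 3-node side and a 2-node side.
for votes in (3, 2):
    print(votes, partition_is_available(votes, 5))
```

Under this rule a side holding exactly half of the nodes, or fewer, never qualifies, which is why the cluster can end up with no surviving partition at all.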

However, with the prior-art method, when the number of nodes in the cluster that fail simultaneously exceeds half of the total number of nodes in the whole cluster, no surviving partition exists, so the whole cluster is paralyzed and the availability of the database cluster suffers.

Disclosure of Invention

In order to solve the technical problem or at least partially solve the technical problem, the present disclosure provides a database cluster network partition selection method and apparatus.

In a first aspect, the present disclosure provides a database cluster network partition selection method, including:

when a network partition fault occurs, a first node writes a first partition selection message into a shared storage, wherein the first partition selection message comprises: a first proposal number and a first partition number;

the first node traverses the shared storage, and the first node determines that a second node has written a second partition selection message in the shared storage, wherein the second partition selection message comprises: a second proposal number and a second partition number;

the first node changes the first partition number written in the shared storage into a second partition number;

and after all the nodes write the partition selection message, determining that the partition corresponding to the partition number written in the shared storage is a target partition, wherein the nodes in the target partition have the access authority of the database cluster network, and the rest nodes have no access authority of the database cluster network.

Optionally, the method further includes:

the first node traverses the shared storage, determines that no second node has written a second partition selection message in the shared storage, and then determines to write the first partition selection message again.

Optionally, before the first node writes the first partition selection message into the shared storage, the method further includes:

the first node determines, according to the state of each node and the connection state between the nodes stored in the shared storage, the number of the partition containing the most nodes as the first partition number;

or, the first node determines the partition number where the first node is located as the first partition number;

or, the first node determines that the partition number where the node with the largest node identifier is located is the first partition number;

or, the first node determines that the partition number where the node with the smallest node identifier is located is the first partition number.

Optionally, after all the nodes write the partition selection message, before determining that the partition corresponding to the partition number in the shared storage is the target partition, the method further includes:

and determining that the number of nodes contained in the partition corresponding to the partition number is greater than a preset threshold value.

Optionally, after determining that the partition corresponding to the partition number in the shared storage is the target partition, the method further includes:

and the nodes in the target partition send partition updating instructions to the shared storage, wherein the partition updating instructions are used for indicating that the nodes in the target partition have the access authority of the database cluster network, and the rest nodes do not have the access authority of the database cluster network.

In a second aspect, the present disclosure provides a database cluster network partition selecting apparatus, including:

a first writing module, configured to write a first partition selection message into a shared storage when a network partition failure occurs, where the first partition selection message includes: a first proposal number and a first partition number;

a processing module, configured to traverse the shared storage, and determine that a second node has written a second partition selection message in the shared storage, where the second partition selection message includes: a second proposal number and a second partition number;

the second writing module is used for changing the first partition number written in the shared storage into a second partition number;

and the determining module is used for determining that the partition corresponding to the partition number written in the shared storage is a target partition after all the nodes write in the partition selection message, wherein the nodes in the target partition have the access authority of the database cluster network, and the rest nodes have no access authority of the database cluster network.

Optionally, the processing module is further configured to traverse the shared storage and determine that no second node has written a second partition selection message in the shared storage; the second writing module is then further configured to determine to write the first partition selection message again.

Optionally, the first writing module is configured to, before writing the first partition selection message into the shared storage, determine, according to the state of each node and the connection state between the nodes stored in the shared storage, the number of the partition containing the most nodes as the first partition number;

or, the first writing module is further configured to determine the partition number where the first node is located as the first partition number;

or, the first writing module is further configured to determine the partition number where the node with the largest node identifier is located as the first partition number;

or, the first writing module is further configured to determine the partition number where the node with the smallest node identifier is located as the first partition number.

Optionally, the determining module is configured to determine that the number of nodes included in the partition corresponding to the partition number is greater than a preset threshold before determining that the partition corresponding to the partition number written in the shared storage is the target partition after all the nodes write the partition selection message.

Optionally, the apparatus further includes:

a sending module, configured to send a partition update instruction to the shared storage, where the partition update instruction is used to indicate that the nodes in the target partition have the access right of the database cluster network, and the other nodes have no access right of the database cluster network.

In a third aspect, the present disclosure provides a database cluster system, including: at least two database nodes for performing the steps of the method of any of the first aspects and a shared storage.

In a fourth aspect, the present disclosure provides a computer device comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of the first aspect when executing the program.

In a fifth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the first aspects.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

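when a network partition fault occurs, the first node writes a first partition selection message into the shared storage, wherein the first partition selection message comprises a first proposal number and a first partition number; the first node traverses the shared storage and determines that the second node has written a second partition selection message in the shared storage, wherein the second partition selection message comprises a second proposal number and a second partition number; the first node changes the first partition number written in the shared storage into the second partition number; and after all nodes have written partition selection messages, the partition corresponding to the partition number written in the shared storage is determined to be the target partition, the nodes in the target partition have access authority to the database cluster network, and the other nodes do not. That is, the partition numbers written into the shared storage by the first node and the second node are the same, which ensures that a unique surviving partition can be selected even when the number of nodes that fail simultaneously exceeds half of the total number of nodes in the cluster, thereby improving the availability of the database cluster.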

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a network architecture diagram of a database cluster provided by the present disclosure;

FIG. 2 is a schematic flowchart of an embodiment of a database cluster network partition selection method provided by the present disclosure;

FIG. 3 is a schematic flowchart of another embodiment of a database cluster network partition selection method provided by the present disclosure;

FIG. 4 is a schematic flowchart of yet another embodiment of a database cluster network partition selection method provided by the present disclosure;

FIG. 5 is a timing diagram of an embodiment of a database cluster network partition selection method provided by the present disclosure;

FIG. 6 is a schematic structural diagram of a database cluster network partition selection apparatus provided by the present disclosure;

FIG. 7 is a schematic structural diagram of another database cluster network partition selection apparatus provided by the present disclosure;

FIG. 8 is a schematic structural diagram of a database cluster system provided by the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

The technical solutions of the present disclosure are described in several specific embodiments, and the same or similar concepts may be referred to one another, and are not described in detail in each place.

Fig. 1 is a network architecture diagram of a database cluster provided by an embodiment of the present disclosure. As shown in fig. 1, the embodiment provides a network architecture of a Real Application Cluster (RAC), including: a client, at least two database nodes, and a shared storage. The client is connected to the database nodes, and the database nodes are connected to the shared storage. The RAC is a parallel database cluster with two or more database nodes. The nodes communicate with each other through a private network and monitor the operating state of the nodes. All data files, online log files, control files, and the like of the database are stored in the shared storage device of the cluster. All nodes in the cluster can read and write the shared storage at the same time, and the shared storage can include a plurality of different storage devices.

A method for selecting a partition of a database cluster network according to an embodiment of the present disclosure is described in detail based on a network architecture diagram shown in fig. 1.

Fig. 2 is a schematic flowchart of an embodiment of a partition selection method for a database cluster network provided in the present disclosure, and as shown in fig. 2, the method of this embodiment includes:

S201: when a network partition failure occurs, the first node writes a first partition selection message into the shared storage.

Wherein the first partition selection message comprises: a first proposal number and a first partition number.

Each node in the database cluster has its own node identifier (a fixed number) and a corresponding storage area in the shared storage device. Each node continuously writes heartbeat information into its storage area, and the heartbeat information of all nodes together describes the communication state of the database cluster network. When a network partition fault occurs, that is, when the first node cannot detect the heartbeats of other nodes within a specified time, the first node initiates a partition vote and writes a first partition selection message into the storage area corresponding to the first node in the shared storage, where the first partition selection message includes a first proposal number and a first partition number. Each partition selection message carries its own proposal number, which is an incrementing sequence number generated by the node and globally unique throughout the database cluster.
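
As a minimal sketch, a partition selection message and a per-node proposal number generator might look like the following. The disclosure does not say how global uniqueness is achieved; the scheme here (round counter multiplied by an assumed maximum cluster size, plus the node identifier) is purely an assumption, as are the names `PartitionSelectionMessage`, `ProposalNumberGenerator`, and `MAX_NODES`.

```python
from dataclasses import dataclass
from typing import Optional

MAX_NODES = 100  # assumed upper bound on node identifiers (hypothetical)


@dataclass
class PartitionSelectionMessage:
    proposal_number: int
    partition_number: Optional[int]  # None models an empty proposal


class ProposalNumberGenerator:
    """Yields increasing numbers that never collide across nodes,
    because each number encodes the generating node's identifier."""

    def __init__(self, node_id: int) -> None:
        if not 0 <= node_id < MAX_NODES:
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self.round = 0

    def next(self) -> int:
        self.round += 1
        return self.round * MAX_NODES + self.node_id


generator = ProposalNumberGenerator(node_id=1)
message = PartitionSelectionMessage(generator.next(), partition_number=1)
print(message)
```

Because the node identifier occupies the low digits, two nodes can never produce the same proposal number, and each node's own sequence is strictly increasing.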

One possible implementation: the first node writes the first proposal number in the first partition selection message into the storage area corresponding to the first node in the shared storage.

S202: the first node traverses the shared storage and determines that the second node has written the second partition selection message in the shared storage.

Wherein the second partition selection message includes: a second proposal number and a second partition number.

The first node traverses the storage areas in the shared storage that correspond to the other nodes in the database cluster, and determines that the second node has written a second partition selection message in the storage area corresponding to the second node.

S203: the first node changes the first partition number written in the shared storage to the second partition number.

The first node obtains the second proposal number in the second partition selection message in the shared storage and compares it with the first proposal number. If the second proposal number is smaller than the first proposal number, the first node changes the first partition number written in the shared storage to the second partition number.

Optionally, if the second proposal number is greater than the first proposal number, the first node abandons the vote and reinitiates the partition vote after a specified time, for example, after 15 s.
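
The comparison in S203 can be sketched as below. The shared storage is modeled as an in-memory dictionary keyed by node identifier, which is an assumption made for illustration; the function name `compare_and_adopt` and the return values are likewise hypothetical. Proposal numbers are assumed to be unique, so the equal case is not treated separately.

```python
from typing import Dict, Tuple

# (proposal number, partition number); node identifiers are the dict keys.
Message = Tuple[int, int]


def compare_and_adopt(shared: Dict[int, Message], first: int, second: int) -> str:
    """Apply the S203 rule from the first node's point of view."""
    first_proposal, _first_partition = shared[first]
    second_proposal, second_partition = shared[second]
    if second_proposal < first_proposal:
        # Keep the own proposal number, take over the other partition number.
        shared[first] = (first_proposal, second_partition)
        return "adopted"
    # The second proposal number is greater: give up this vote; the caller
    # is expected to start a new vote after a waiting period.
    return "abandoned"


shared = {1: (101, 1), 2: (102, 2)}
print(compare_and_adopt(shared, first=2, second=1), shared[2])
```

In this sketch an abandoned vote leaves the storage untouched; whether a real node would also clear its storage area is not stated in the description.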

S204: after all the nodes have written partition selection messages, the partition corresponding to the partition number written in the shared storage is determined to be the target partition.

The nodes in the target partition have the access authority of the database cluster network, and the other nodes have no access authority of the database cluster network.

After all the nodes complete partition voting, the partition numbers written in the storage areas corresponding to the nodes in the shared storage are the same, and the partition corresponding to that partition number is the target partition, namely, the only surviving partition in the database cluster.
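
A simple way to picture S204 is a check that every participating node has written a message and that all written partition numbers agree. The sketch below assumes the same dictionary model of the shared storage as a stand-in for the real device; `agreed_partition` and `node_ids` are hypothetical names.

```python
from typing import Dict, Iterable, Optional, Tuple

Message = Tuple[int, int]  # (proposal number, partition number)


def agreed_partition(shared: Dict[int, Message],
                     node_ids: Iterable[int]) -> Optional[int]:
    """Return the target partition number, or None if voting is not
    finished or the written partition numbers still differ."""
    numbers = set()
    for node_id in node_ids:
        message = shared.get(node_id)
        if message is None:
            return None  # this node has not written its message yet
        numbers.add(message[1])
    return numbers.pop() if len(numbers) == 1 else None


print(agreed_partition({1: (101, 1), 2: (102, 1)}, [1, 2]))
```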

In this embodiment, when a network partition failure occurs, the first node writes a first partition selection message into the shared storage, where the first partition selection message includes a first proposal number and a first partition number; the first node traverses the shared storage and determines that the second node has written a second partition selection message in the shared storage, where the second partition selection message includes a second proposal number and a second partition number; the first node changes the first partition number written in the shared storage into the second partition number; and after all nodes have written partition selection messages, the partition corresponding to the partition number written in the shared storage is determined to be the target partition, the nodes in the target partition have access authority to the database cluster network, and the other nodes do not. That is, the partition numbers written into the shared storage by the first node and the second node are the same, which ensures that a unique surviving partition can be selected even when the number of nodes that fail simultaneously exceeds half of the total number of nodes in the cluster, thereby improving the availability of the database cluster.

Fig. 3 is a schematic flowchart of another embodiment of a database cluster network partition selection method provided by the present disclosure. Based on the embodiment shown in fig. 2, fig. 3 describes a possible implementation of S202 and S203, and includes:

S302: the first node traverses the shared storage and determines that no second node has written a second partition selection message in the shared storage.

The first node traverses the storage areas in the shared storage that correspond to the other nodes in the database cluster, and determines that no other node has written a partition selection message in its corresponding storage area.

S303: the first node determines to write the first partition selection message again.

The first node writes the first partition number in the first partition selection message into its corresponding storage area in the shared storage again.

In this embodiment, the first node traverses the shared storage, determines that no second node has written a second partition selection message in the shared storage, and then determines to write the first partition selection message again. This ensures that the partition number written by the nodes in the shared storage is unique, so that a unique surviving partition can be selected even when the number of nodes in the cluster that fail simultaneously exceeds half of the total number of nodes in the whole cluster.

Optionally, in S201, before the first node writes the first partition selection message into the shared storage, the method may further include the following step:

the first node determines, according to the state of each node and the connection state between the nodes stored in the shared storage, the number of the partition containing the most nodes as the first partition number. If several partitions contain the same number of nodes, the first node determines the partition number where the node with the largest node identifier is located, or the partition number where the node with the smallest node identifier is located, as the first partition number.

The rules by which the first node determines the first partition number according to the state of each node and the connection state between nodes stored in the shared storage may further include, but are not limited to, the following:

the first node determines the partition number where the first node is located as the first partition number;

the first node determines the partition number where the node with the largest node identifier is located as the first partition number;

and the first node determines the partition number where the node with the smallest node identifier is located as the first partition number.
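
The first rule (largest partition, with a tie-break on node identifier) can be sketched as follows. The sketch assumes the partitions have already been derived from the node states and connection states in the shared storage and are given as a mapping from partition number to the set of member node identifiers; that input format and the name `choose_first_partition` are assumptions for illustration.

```python
from typing import Dict, Set


def choose_first_partition(partitions: Dict[int, Set[int]],
                           prefer_largest_id: bool = True) -> int:
    """Pick the partition with the most nodes; on a tie, pick the one
    holding the largest (or, optionally, the smallest) node identifier.
    Every partition is assumed to contain at least one node."""

    def rank(item):
        _number, members = item
        tie_break = max(members) if prefer_largest_id else -min(members)
        return (len(members), tie_break)

    best_number, _members = max(partitions.items(), key=rank)
    return best_number


layout = {1: {1, 2}, 2: {3, 4}, 3: {5}}
print(choose_first_partition(layout))
print(choose_first_partition(layout, prefer_largest_id=False))
```

Because every node applies the same deterministic rule to the same shared state, nodes that see the same layout start from the same first partition number, which reduces the number of conflicting proposals.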

Optionally, in S204, after all the nodes write the partition selection message, before determining that the partition corresponding to the partition number written in the shared storage is the target partition, the method may further include the following steps:

and determining that the number of nodes contained in the partition corresponding to the partition number written in the shared storage is greater than a preset threshold value.

The preset threshold is an integer, the minimum value is 0, the maximum value is N-1, and N is the total number of nodes of the whole database cluster. After all the nodes complete voting, the first node reads the partition number written in the shared storage, and judges whether the partition number where the first node is located is the partition number written in the shared storage. If the partition number of the first node is the partition number written in the shared storage, judging whether the number of the nodes contained in the partition of the first node is greater than a preset threshold value, and if the number of the nodes contained in the partition of the first node is greater than the preset threshold value, determining that the partition of the first node is the target partition; and if the number of the nodes contained in the partition where the first node is located is less than or equal to a preset threshold, the surviving partition is not selected, and the cluster stops service. Nodes in the cluster may reinitiate the partition voting or wait for manual intervention. If the partition number of the first node is not the partition number written in the shared storage, the partition of the first node is definitely not the target partition, and it is not necessary to judge whether the number of nodes included in the partition of the first node is greater than a preset threshold value.
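
The decision described above can be summarized in a short sketch. The function name `evaluate_partition`, its parameters, and the three string results are hypothetical labels chosen for this illustration.

```python
def evaluate_partition(own_partition: int, written_partition: int,
                       own_partition_size: int, threshold: int,
                       total_nodes: int) -> str:
    """Decide, from one node's point of view, the outcome of the vote."""
    if not 0 <= threshold <= total_nodes - 1:
        raise ValueError("threshold must be between 0 and N-1")
    if own_partition != written_partition:
        # Not the elected partition: the size check is unnecessary.
        return "not_target"
    if own_partition_size > threshold:
        return "target"
    # Elected partition is too small: no surviving partition is selected.
    return "no_survivor"


print(evaluate_partition(1, 1, own_partition_size=2, threshold=1, total_nodes=5))
```

With a threshold of 0, any non-empty elected partition survives; raising the threshold trades availability for a stronger guarantee that the surviving partition is large enough to be useful.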

Fig. 4 is a schematic flowchart of another embodiment of a database cluster network partition selection method provided by the present disclosure. Based on the embodiment shown in fig. 2, after S204 the method may further include the following step:

S401: the nodes in the target partition send a partition update instruction to the shared storage.

The partition update instruction is used to indicate that the nodes in the target partition have the access right of the database cluster network, and the other nodes have no access right of the database cluster network.

One possible implementation is: the nodes in the target partition send a partition update instruction to the shared storage to lock the shared storage device. The nodes in the target partition can then access the shared resources of the database cluster network, whereas the nodes in non-target partitions, on finding that the shared storage is locked, cannot access them.

Another possible implementation is: the nodes in the target partition send a partition update instruction to the nodes in non-target partitions through serial ports to cut off the power supply of those nodes, so that the nodes in non-target partitions cannot access the shared resources of the database cluster network.

In this embodiment, a partition update instruction is sent to the shared storage by a node in the target partition, where the partition update instruction is used to indicate that the node in the target partition has an access right of the database cluster network, and the other nodes do not have an access right of the database cluster network, so that it is ensured that only one surviving partition in the database cluster network can access the shared resource, and thus, the data consistency of the database cluster is improved.
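
The locking variant can be pictured with a toy model in which the shared storage records which nodes are allowed to access it. This is a hypothetical in-memory stand-in, not a real storage API; the class `SharedStorage` and its methods are invented for this sketch.

```python
from typing import FrozenSet, Iterable, Optional


class SharedStorage:
    """Toy model of a shared storage that can be locked to one partition."""

    def __init__(self) -> None:
        # None means no partition update has been applied yet.
        self._allowed: Optional[FrozenSet[int]] = None

    def apply_partition_update(self, target_nodes: Iterable[int]) -> None:
        """Lock the storage so that only target-partition nodes may use it."""
        self._allowed = frozenset(target_nodes)

    def can_access(self, node_id: int) -> bool:
        return self._allowed is None or node_id in self._allowed


storage = SharedStorage()
storage.apply_partition_update({1, 2})
print(storage.can_access(1), storage.can_access(3))
```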

Taking a database cluster including node 1 and node 2 as an example, node 1 and node 2 each have a storage area in the shared storage. When a network partition failure occurs, each node initiates a partition selection. Assuming that node 1 and node 2 each select the partition where it is located as the surviving partition, the two nodes can still reach a consistent selection when submitting their partition selections. One possible timing sequence, shown in fig. 5, is as follows:

S501: node 1 initiates partition selection with the proposal content being proposal number 101 and partition 1, and enters phase 1.

Node 1 writes empty proposal content into the storage area corresponding to node 1 in the shared storage.

S502: node 1 reads the storage area corresponding to node 2 in the shared storage and obtains empty proposal content.

S503: node 1 sets the proposal content to proposal number 101 and partition 1, and enters phase 2.

S504: node 1 writes the proposal content into the storage area corresponding to node 1 in the shared storage.

S505: node 1 reads the storage area corresponding to node 2 in the shared storage and obtains empty proposal content. Node 1 submits the proposal content.

S506: node 2 initiates partition selection with the proposal content being proposal number 102 and partition 2, and enters phase 1.

Node 2 writes empty proposal content into the storage area corresponding to node 2 in the shared storage.

S507: node 2 reads the storage area corresponding to node 1 in the shared storage and obtains node 1's proposal content: proposal number 101 and partition 1.

S508: node 2 sets the proposal content to proposal number 102 together with the partition number that corresponds to the largest proposal number among the other nodes' proposal contents, namely partition 1, and enters phase 2.

S509: node 2 writes the proposal content into the storage area corresponding to node 2 in the shared storage.

S510: node 2 reads the storage area corresponding to node 1 in the shared storage and reads node 1's proposal content: proposal number 101 and partition 1. Node 2 submits the proposal content.
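
The sequence S501 to S510 can be replayed with a small simulation. The sketch makes several assumptions that are not stated in the description: the shared storage is an in-memory dictionary, an empty proposal is stored as the proposal number with `None` as its partition, and the second read in phase 2 performs no additional check. The function name `run_selection` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# (proposal number, partition number or None for an empty proposal)
Proposal = Tuple[int, Optional[int]]


def run_selection(shared: Dict[int, Proposal], node_id: int,
                  proposal_number: int, own_partition: int) -> int:
    """Run both phases for one node and return the submitted partition."""
    # Phase 1: publish an empty proposal, then read the other areas.
    shared[node_id] = (proposal_number, None)
    others = [p for n, p in shared.items()
              if n != node_id and p[1] is not None]
    if others:
        # Adopt the partition attached to the largest other proposal number.
        partition = max(others, key=lambda p: p[0])[1]
    else:
        partition = own_partition
    # Phase 2: publish the full proposal, read the other areas once more,
    # and submit. What a real node verifies here is not modeled.
    shared[node_id] = (proposal_number, partition)
    _ = [p for n, p in shared.items() if n != node_id]
    return partition


storage: Dict[int, Proposal] = {}
print(run_selection(storage, node_id=1, proposal_number=101, own_partition=1))
print(run_selection(storage, node_id=2, proposal_number=102, own_partition=2))
print(storage)
```

Both calls end with the same partition number in the shared storage, which matches the outcome of the timing sequence: node 2 gives up its own partition 2 in favor of partition 1.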

Fig. 6 is a schematic structural diagram of a database cluster network partition selection apparatus provided in the present disclosure, where the apparatus of this embodiment includes: a first write module 601, a processing module 602, a second write module 603, and a determination module 604.

The first writing module 601 is configured to write a first partition selection message into the shared storage when a network partition failure occurs, where the first partition selection message includes: a first proposal number and a first partition number;

a processing module 602, configured to traverse the shared storage, and determine that a second node has written a second partition selection message in the shared storage, where the second partition selection message includes: a second proposal number and a second partition number;

a second writing module 603, configured to change the first partition number written in the shared storage into a second partition number;

a determining module 604, configured to determine, after all the nodes have written partition selection messages, that the partition corresponding to the partition number written in the shared storage is the target partition, where the nodes in the target partition have the access right of the database cluster network, and the other nodes have no access right of the database cluster network.

The first writing module 601, the processing module 602, and the second writing module 603 may be implemented by a partition selection module of a database node, and the determining module 604 may be implemented by a cluster management module of the database node.

The above-described embodiment of the apparatus may be correspondingly used to implement the technical solution of the method embodiment shown in any one of fig. 2 to fig. 5, and the implementation principle and the technical effect are similar, which are not described herein again.

Optionally, the processing module 602 is further configured to traverse the shared storage and determine that no second node has written a second partition selection message in the shared storage; the second writing module 603 is then further configured to determine to write the first partition selection message again.

Optionally, the first writing module 601 is configured to, before writing the first partition selection message into the shared storage, determine, according to the state of each node and the connection state between nodes stored in the shared storage, the number of the partition containing the most nodes as the first partition number.

Optionally, the determining module 604 is configured to, after all the nodes have written partition selection messages and before determining that the partition corresponding to the partition number written in the shared storage is the target partition, determine that the number of nodes contained in the partition corresponding to the partition number is greater than a preset threshold.

Fig. 7 is a schematic structural diagram of another database cluster network partition selection apparatus provided by the present disclosure, where fig. 7 is based on the embodiment shown in fig. 6, and further includes: a sending module 701.

The sending module 701 is configured to send a partition update instruction to the shared storage, where the partition update instruction is used to indicate that the nodes in the target partition have the access right of the database cluster network and the other nodes do not have the access right of the database cluster network.
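As a non-limiting illustration, the content of such a partition update instruction may be pictured as a per-node access flag. The disclosure does not define the format of the instruction; the dictionary representation, the `"access"` key, and both function names below are hypothetical.

```python
def build_partition_update(all_nodes, target_partition_nodes):
    """Build a hypothetical instruction: node id -> True if access is granted."""
    target = set(target_partition_nodes)
    return {node: node in target for node in all_nodes}


def apply_partition_update(storage, instruction):
    """Record the access flags in a dict standing in for the shared storage."""
    storage["access"] = dict(instruction)
    return storage
```

With nodes 1 to 5 and a target partition made up of nodes 3, 4, and 5, the instruction grants access to nodes 3, 4, and 5 and denies it to nodes 1 and 2.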

Fig. 8 is a schematic structural diagram of a database cluster system provided in the present disclosure, including: at least two database nodes 801 and a shared storage 802. The database node 801 is configured to execute the technical solution of any one of the method embodiments shown in fig. 2 to fig. 5; the implementation principle and the technical effect are similar and are not described herein again.

An embodiment of the present disclosure provides a computer device, including: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor, when executing the program, implements the technical solution of any one of the method embodiments shown in fig. 2 to fig. 5; the implementation principle and the technical effect are similar and are not described herein again.

The embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the technical solution of any one of the method embodiments shown in fig. 2 to 5.

It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A database cluster network partition selection method is characterized by comprising the following steps:

2. The method of claim 1, further comprising:

3. The method according to claim 1 or 2, wherein before the first node writes the first partition selection message into the shared storage, further comprising:

4. The method according to claim 3, wherein after the nodes all write the partition selection message, before determining the partition corresponding to the partition number in the shared storage as the target partition, further comprising:

5. The method according to claim 3, wherein after determining that the partition corresponding to the partition number in the shared storage is the target partition, further comprising:

6. A database cluster network partition selection apparatus, comprising:

7. A database cluster system, comprising: at least two database nodes and a shared storage, the database nodes being adapted to perform the steps of the method of any of claims 1-5.

8. A computer device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the processor executes the program.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.