CN117873998A

CN117873998A - High-availability database cluster management system based on raft protocol

Info

Publication number: CN117873998A
Application number: CN202311712709.XA
Authority: CN
Inventors: 陆太宝; 苏飞; 周国剑
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2023-12-13
Filing date: 2023-12-13
Publication date: 2024-04-12

Abstract

The invention discloses a high-availability database cluster management system based on the raft protocol, which belongs to the field of database management and comprises: a node deployment module, used for configuring a set of agent components for each database cluster, building a raft cluster on the agent components, deploying a Leader node and Follower nodes in the raft cluster, deploying a DB cluster, and deploying a Master node and Slave nodes in the DB cluster; a node master election module, used for realizing initial master election and disaster-recovery master switchover for the raft cluster and the DB cluster based on the consensus capability of the raft cluster; and a state query module, used for providing an HTTP API service through which the control plane queries the cluster state or performs a manual master switchover. The invention avoids the performance problems of a centralized management component, isolates faults between businesses, reduces dependence on external components, makes cluster deployment lighter, and provides consistent read and write capability for service data.

Description

High-availability database cluster management system based on raft protocol

Technical Field

The invention belongs to the field of database management, and particularly relates to a high-availability database cluster management system based on a raft protocol.

Background

The database system is the cornerstone of a business system, and its reliability and availability directly determine the stability of that business system. Building a highly available database system has long been a central concern of business developers, and many solutions and much experience have accumulated in this area. In the early days, database deployment and management depended heavily on manual operation by DBAs, so labor costs were high and fault recovery times were long. As businesses grew, these problems became increasingly prominent and drove the development of various high-availability database cluster management architectures. A common high-availability database architecture uses a third-party consensus storage system to store the consistent state of the cluster and the cluster management tasks, and pairs it with a self-developed management component that manages the database cluster and performs disaster-recovery switchover, thereby forming an automatically managed, highly available architecture.

The system aims to solve the following problems: the scale bottleneck of the third-party consensus storage system, the wide impact of a failure in the third-party consensus system, and the loss of efficiency and the mutual interference between businesses once a management component centrally manages a large number of clusters. The scheme of the invention adopts a decentralized approach: a raft cluster is built on the agent components of the database instances to replace the third-party storage system and the central management component. Based on the consensus capability of the raft cluster, the scheme provides an autonomous information center and arbitration center, realizes capabilities such as consistent master election and disaster-recovery master switchover for the cluster, and provides highly autonomous and highly available database cluster services, that is, a raft-based scheme for a highly available database cluster architecture.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a high-availability database cluster management system based on a raft protocol, which can effectively solve the problems in the background art.

In order to achieve the above purpose, the present invention provides the following technical solution, a system comprising:

a node deployment module, used for configuring a set of agent components for each database cluster, building a raft cluster on the agent components, deploying a Leader node and Follower nodes in the raft cluster, deploying a DB cluster, and deploying a Master node and Slave nodes in the DB cluster;

a node master election module, used for realizing initial master election and disaster-recovery master switchover for the raft cluster and the DB cluster based on the consensus capability of the raft cluster;

and a state query module, used for providing an HTTP API service through which the control plane queries the cluster state or performs a manual master switchover.

The invention is further improved in that the node deployment module comprises a node allocation unit and a state reporting unit. The node allocation unit is used for deploying each agent instance together with one DB instance (1:1) on the same physical node or Pod. The state reporting unit provides a Reporter so that each agent instance reports its local DB state at regular intervals; the Leader node writes the reported data and synchronizes it to each Follower node, so that each agent instance stores a copy of globally consistent state data.
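The reporting path described above can be illustrated with a minimal in-process sketch. It is not a Raft implementation: `AgentStore` and `leader_write` are hypothetical names, and the majority acknowledgement that a real raft cluster requires before committing an entry is deliberately left out. The sketch only shows the intended outcome, namely that every agent ends up holding the same state table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentStore:
    """Per-agent copy of the cluster-wide DB state table (hypothetical type)."""
    name: str
    db_states: Dict[str, str] = field(default_factory=dict)

    def apply(self, db_instance: str, state: str) -> None:
        # Apply one replicated entry to the local copy.
        self.db_states[db_instance] = state


def leader_write(leader: AgentStore, followers: List[AgentStore],
                 db_instance: str, state: str) -> None:
    """Leader records a reported DB state and pushes it to every Follower.

    Assumption: replication always succeeds. A real raft cluster would
    commit the entry only after a majority of nodes acknowledge it.
    """
    leader.apply(db_instance, state)
    for follower in followers:
        follower.apply(db_instance, state)


leader = AgentStore("agent-1")
followers = [AgentStore("agent-2"), AgentStore("agent-3")]
leader_write(leader, followers, "db-1", "healthy")
leader_write(leader, followers, "db-2", "unreachable")
```

After the two writes, all three stores contain the same two entries, which is the "globally consistent state data" each agent is expected to hold.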

The invention is further improved in that the node allocation unit is also used for processing requests from the management console through an ApiHandler unit, either querying and returning the cluster state or executing a cluster setting request. Returning the cluster state is realized through a storage module, the StatusStorage unit, which stores the state of the cluster and the state list of each DB cluster. The cluster setting request is realized through a Raftnode, which builds and maintains the raft cluster and receives the instance state and cluster state changes reported by the Reporter unit.

The invention is further improved in that the Master node of the DB cluster and the Leader node of the raft cluster cannot be on the same physical node or Pod; when the Leader selects its own local DB instance as the Master node, the Leader role is automatically transferred to another raft node.
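A minimal sketch of this placement rule follows. The types and the function name `pick_transfer_target` are hypothetical, and choosing the first eligible Follower is an assumption, since the text does not say how the transfer target is selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentNode:
    name: str  # agent instance identifier (hypothetical)
    host: str  # physical node or Pod shared by the agent and its DB instance


def pick_transfer_target(leader: AgentNode, master_host: str,
                         followers: List[AgentNode]) -> Optional[AgentNode]:
    """Return the Follower that should take over the Leader role, or None.

    None means either that no transfer is needed (Leader and Master are on
    different hosts) or that no Follower on another host is available.
    """
    if leader.host != master_host:
        return None
    for follower in followers:
        if follower.host != master_host:
            return follower  # assumption: first eligible Follower wins
    return None


leader = AgentNode("agent-1", "host-1")
followers = [AgentNode("agent-2", "host-2"), AgentNode("agent-3", "host-3")]
target = pick_transfer_target(leader, "host-1", followers)
```

Here the Leader and the newly selected Master share `host-1`, so the Leader role would move to `agent-2`.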

The invention is further improved in that the node master election module comprises a cluster master election flow handled by an Elector unit. The Elector unit is triggered by a timer or by a real-time message and comprises three sub-flows: initial master election, forced master switchover, and disaster-recovery master switchover. Initial master election is executed when the cluster has not been initialized; forced master switchover is executed on a manual master switchover request; and disaster-recovery master switchover is triggered when the time for which the master state has been abnormal exceeds a configured disaster-recovery time threshold. The cluster state changes as the master election flow proceeds.
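The choice among the three sub-flows can be sketched as a small decision function. The names `SubFlow` and `choose_sub_flow` are hypothetical, and the order in which the conditions are checked is an assumption; the text only states the condition for each sub-flow.

```python
from enum import Enum
from typing import Optional


class SubFlow(Enum):
    INIT_ELECTION = "initial master election"
    FORCED_SWITCH = "forced master switchover"
    DISASTER_SWITCH = "disaster-recovery master switchover"


def choose_sub_flow(initialized: bool, manual_switch_requested: bool,
                    master_abnormal_seconds: float,
                    dr_threshold_seconds: float) -> Optional[SubFlow]:
    """Pick the sub-flow to run on one timer tick or real-time message."""
    if not initialized:
        return SubFlow.INIT_ELECTION
    if manual_switch_requested:
        return SubFlow.FORCED_SWITCH
    # Disaster recovery fires only once the abnormal time exceeds the threshold.
    if master_abnormal_seconds > dr_threshold_seconds:
        return SubFlow.DISASTER_SWITCH
    return None  # nothing to do on this trigger


chosen = choose_sub_flow(initialized=True, manual_switch_requested=False,
                         master_abnormal_seconds=45.0,
                         dr_threshold_seconds=30.0)
```

With a 30-second threshold and a Master that has been abnormal for 45 seconds, the disaster-recovery sub-flow is selected.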

The invention is further improved in that the disaster-recovery master switchover comprises a Leader node disaster-recovery switchover in the raft cluster and a Master node disaster-recovery switchover in the DB cluster. In the Leader node switchover, after the Leader node fails, the Follower nodes automatically initiate an election and one Follower is eventually promoted to Leader; when the old Leader recovers, it recognizes that a new Leader has been elected and automatically steps down to Follower. In the Master node switchover, after the Master node fails, the upper-layer raft cluster collects the abnormal condition from the agent reports and actively initiates a switchover once the configured disaster-recovery time threshold is exceeded: it selects one of the Slaves as the new Master and notifies all agent nodes to execute the switchover. After the switchover is completed, when the old Master recovers, the corresponding agent issues a master switchover request and the old Master becomes a Slave.
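The text does not spell out how a recovered Leader recognizes its successor. In standard Raft this follows from term comparison: a node that receives a message carrying a higher term adopts that term and reverts to Follower. The sketch below illustrates only that rule; `RaftPeer` and `on_heartbeat` are hypothetical names, not part of any particular raft library.

```python
from dataclasses import dataclass


@dataclass
class RaftPeer:
    name: str
    term: int
    role: str  # "Leader" or "Follower"

    def on_heartbeat(self, sender_term: int) -> bool:
        """Handle a heartbeat from a node that claims leadership.

        Returns True if the heartbeat is accepted, False if the sender is stale.
        """
        if sender_term < self.term:
            return False  # sender belongs to an older term, ignore it
        if sender_term > self.term:
            # A newer term exists, so a new Leader was elected while this
            # node was down: adopt the term and step down.
            self.term = sender_term
            self.role = "Follower"
        return True


old_leader = RaftPeer("agent-1", term=3, role="Leader")
accepted = old_leader.on_heartbeat(sender_term=4)
```

After the call, `agent-1` has adopted term 4 and holds the Follower role, which matches the step-down behavior described for the recovered Leader.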

The invention is further improved in that, when a node of the raft cluster and a node of the DB cluster fail at the same time in the same environment, the cluster state is recovered automatically by handling the situation as a single-Leader failure scenario or a single-Master failure scenario.

The invention is further improved in that the cluster state is maintained by the Leader node according to a cluster state model. The initialization state is the first state of a new cluster and changes to the running state once a master has been elected. While the cluster is in the running state, if the system detects a Master failure, the cluster changes to the failure state, which is held for at most a period T. If the Master recovers within T, the cluster returns to the running state; otherwise a master re-election is initiated and the cluster changes to the recovery state or the UnRecovering state. If a new master is selected, the cluster enters the recovery state, where it waits until the new master has applied all of its logs and then enters the Changing state; if the old master recovers while the logs are being applied, the cluster returns to the running state instead. If no new master meeting the conditions is found, the cluster enters the UnRecovering state; it then enters the recovery state after the inconsistency has been handled manually, or returns to the running state if the old master recovers in the meantime. Finally, when the recovery state gives way to the Changing state, the new master has been determined, each agent component executes the master switchover, and the cluster returns to the running state after all nodes have completed the operation.
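The state model can be written down as a transition table. In the sketch below the enum member names and the event names are assumptions chosen for illustration (only "UnRecovering" and "Changing" appear as literal state names in the text), and the expiry of the period T is left to the caller, who raises `candidate_found` or `no_candidate` only after T has elapsed.

```python
from enum import Enum


class ClusterState(Enum):
    INITIALIZING = "initialization"
    RUNNING = "running"
    FAILURE = "failure"
    RECOVERING = "recovery"
    UNRECOVERING = "UnRecovering"
    CHANGING = "Changing"


S = ClusterState

# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (S.INITIALIZING, "master_elected"): S.RUNNING,
    (S.RUNNING, "master_failed"): S.FAILURE,
    (S.FAILURE, "master_recovered"): S.RUNNING,        # recovered within T
    (S.FAILURE, "candidate_found"): S.RECOVERING,      # T expired, new master chosen
    (S.FAILURE, "no_candidate"): S.UNRECOVERING,       # T expired, nobody eligible
    (S.RECOVERING, "logs_applied"): S.CHANGING,
    (S.RECOVERING, "master_recovered"): S.RUNNING,
    (S.UNRECOVERING, "manually_resolved"): S.RECOVERING,
    (S.UNRECOVERING, "master_recovered"): S.RUNNING,
    (S.CHANGING, "switch_completed"): S.RUNNING,
}


def next_state(state: ClusterState, event: str) -> ClusterState:
    """Return the next state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


state = S.INITIALIZING
for event in ["master_elected", "master_failed", "candidate_found",
              "logs_applied", "switch_completed"]:
    state = next_state(state, event)
```

The loop walks the full failover path (initialization, running, failure, recovery, Changing) and ends back in the running state. Keeping the model as data makes it easy to check that every state has a path back to running.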

An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a high availability database cluster management system based on a raft protocol as described above when executing the computer program.

A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a high availability database cluster management system based on a raft protocol as described above.

Compared with the prior art, the invention has the beneficial effects that:

1. Each database cluster has its own set of management components, which avoids the performance problems of a centralized management component and isolates faults between businesses;

2. No third-party consensus storage system is used, which avoids the scale bottleneck of such a system and effectively reduces the impact of a failure;

3. The raft cluster is built on the existing agent components, which realizes an autonomous information center and arbitration center, reduces dependence on external components, and makes cluster deployment lighter;

4. Based on the consensus capability of the raft cluster, master election and disaster-recovery switchover of the cluster are realized, and consistent read and write capability for service data is provided.

Drawings

FIG. 1 shows a schematic diagram of a high availability database cluster management system based on the raft protocol of the present invention;

FIG. 2 illustrates a flow chart of a core module of a high availability database cluster management system agent component based on a raft protocol of the present invention;

FIG. 3 is a schematic diagram showing the master election flow of a high availability database cluster management system based on a raft protocol according to the present invention;

FIG. 4 is a schematic diagram of a cluster state model of a high availability database cluster management system based on the raft protocol according to the present invention;

FIG. 5 shows a Leader failure disaster recovery schematic diagram of a high availability database cluster management system based on the raft protocol of the present invention;

FIG. 6 shows a Master failure disaster recovery schematic diagram of a high availability database cluster management system based on the raft protocol of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.

Example 1

Referring to fig. 1, the present invention provides a high availability database cluster management system based on a raft protocol, comprising:

a node deployment module, used for building a raft cluster on the agent components, deploying a Leader node and Follower nodes in the raft cluster, deploying a DB cluster, and deploying a Master node and Slave nodes in the DB cluster;

Each database cluster is provided with its own set of agent components, and a raft cluster dedicated to that database cluster is built on the agent components. Each agent instance is deployed together with one DB instance (1:1) on the same physical node or Pod, and the agent instance is responsible for querying and reporting the state of its local DB instance. Each agent instance reports the DB state at regular intervals, and the Leader node is responsible for writing the data and synchronizing it to each Follower, so that each agent instance stores a copy of globally consistent state data. The Leader instance in the agent cluster is responsible for the initial master election of the DB cluster and for disaster-recovery master switchover under abnormal conditions, and master switchover actions are notified to all agent instances through the raft protocol. The DB Master node and the raft Leader node cannot be on the same physical node or Pod; when the Leader selects its own local DB instance as the Master node, the Leader role is automatically transferred to another raft node.

FIG. 2 shows a flow chart of the core modules of the agent component of the high availability database cluster management system based on the raft protocol. The ApiHandler unit processes requests from the management console, either querying and returning the cluster state or executing a cluster setting request. Returning the cluster state is realized through the StatusStorage unit, a storage module that is responsible for storing the state of the cluster and the state list of each DB and that provides snapshots and persistent storage. The cluster setting request is realized through the Raftnode, which is responsible for building and maintaining the raft cluster, receiving instance state and cluster state changes, and guaranteeing that changes are written consistently. The Reporter unit is responsible for periodically querying the state of the DB instance and reporting it to the Raftnode. The Elector unit is responsible for checking and switching the cluster master and is started only in the agent Leader process; the details are given in the master election flow. The Maintainer unit is responsible for carrying out the master switchover on the local DB instance.
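The division of labor between ApiHandler, StatusStorage, and Raftnode can be sketched as follows. The URL paths, the JSON fields, and the `propose` callable standing in for the Raftnode are all assumptions made for illustration; the text names the units but does not define their interfaces.

```python
import json
from typing import Callable, Dict, Tuple


class StatusStorage:
    """Holds the cluster state and the per-DB state list (in memory only here)."""

    def __init__(self) -> None:
        self.cluster_state = "running"
        self.db_states: Dict[str, str] = {}

    def snapshot(self) -> dict:
        return {"cluster_state": self.cluster_state,
                "db_states": dict(self.db_states)}


class ApiHandler:
    """Routes console requests: reads go to storage, writes go to the Raftnode."""

    def __init__(self, storage: StatusStorage,
                 propose: Callable[[dict], None]) -> None:
        self.storage = storage
        self.propose = propose  # hypothetical stand-in for Raftnode proposals

    def handle(self, method: str, path: str, body: str = "") -> Tuple[int, str]:
        if method == "GET" and path == "/cluster/status":
            return 200, json.dumps(self.storage.snapshot(), sort_keys=True)
        if method == "POST" and path == "/cluster/master":
            target = json.loads(body or "{}").get("target")
            if not target:
                return 400, json.dumps({"error": "target is required"})
            # A manual switchover is a cluster setting request, so it is
            # proposed through raft rather than applied locally.
            self.propose({"op": "forced_switch", "target": target})
            return 202, json.dumps({"accepted": True})
        return 404, json.dumps({"error": "not found"})


proposals: list = []
storage = StatusStorage()
storage.db_states["db-1"] = "Master"
api = ApiHandler(storage, proposals.append)
status_code, status_body = api.handle("GET", "/cluster/status")
switch_code, _ = api.handle("POST", "/cluster/master", '{"target": "db-2"}')
```

Reads are answered from the local StatusStorage copy, while the switchover request is only accepted and handed to the consensus layer, which keeps every state change on the replicated path.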

FIG. 3 is a schematic diagram showing the master election flow of the high availability database cluster management system based on the raft protocol. The master election flow is handled by the Elector unit and is triggered by a timer or by a real-time message, such as cluster initialization, scaling nodes out or in, or a manual master switchover. The flow is divided into three sub-flows, namely initial master election, forced master switchover, and disaster-recovery master switchover: initial master election is executed when the cluster has not been initialized; forced master switchover is executed on a manual master switchover; and disaster-recovery master switchover is triggered when the time for which the master state has been abnormal exceeds a threshold. The cluster state changes during the election flow. The cluster state is maintained by the Leader node according to a cluster state model; FIG. 4 shows a schematic diagram of the cluster state model of the high availability database cluster management system based on the raft protocol, from which the list of cluster states is derived.

The disaster-recovery master switchover comprises a Leader node disaster-recovery switchover in the raft cluster and a Master node disaster-recovery switchover in the DB cluster. FIG. 5 shows a Leader failure disaster recovery schematic diagram of the high availability database cluster management system based on the raft protocol: after the Leader node fails, the Follower nodes automatically initiate an election and one Follower is eventually promoted to Leader; when the old Leader recovers, it recognizes that a new Leader has been elected and automatically steps down to Follower. FIG. 6 shows a Master failure disaster recovery schematic diagram of the high availability database cluster management system based on the raft protocol: after the Master node fails, the upper-layer raft cluster collects the abnormal condition from the agent reports and actively initiates a switchover once the configured disaster-recovery time threshold is exceeded; it selects one of the Slaves as the new Master and notifies all agent nodes to execute the switchover. After the switchover is completed, when the old Master recovers, the corresponding agent issues a master switchover request and the old Master becomes a Slave.
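The Master switchover step can be sketched as a function that turns the current DB view into a role assignment for every agent. The text does not say how the new Master is chosen among the Slaves; picking the healthy Slave with the most advanced replication position is an assumption, as are the names `DbInstance`, `applied_log_pos`, and `plan_switch`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DbInstance:
    name: str
    healthy: bool
    applied_log_pos: int  # replication position (hypothetical selection criterion)


def plan_switch(old_master: str,
                slaves: List[DbInstance]) -> Optional[Dict[str, str]]:
    """Return the role each DB instance should hold after the switchover.

    Returns None when no healthy Slave exists, in which case no switchover
    can be planned automatically.
    """
    candidates = [s for s in slaves if s.healthy]
    if not candidates:
        return None
    new_master = max(candidates, key=lambda s: s.applied_log_pos)
    roles = {s.name: "Slave" for s in slaves}
    roles[new_master.name] = "Master"
    roles[old_master] = "Slave"  # takes effect once the old Master recovers
    return roles


slaves = [DbInstance("db-2", True, 120),
          DbInstance("db-3", True, 150),
          DbInstance("db-4", False, 200)]
roles = plan_switch("db-1", slaves)
```

In this example `db-3` is promoted because `db-4`, although further ahead, is unhealthy, and the old Master `db-1` is assigned the Slave role for when it returns.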

Example 2

The present embodiment provides an electronic device including: a processor and a memory, wherein the memory stores a computer program for the processor to call;

the processor implements the above high-availability database cluster management system based on the raft protocol by calling the computer program stored in the memory.

The electronic device may vary considerably with its configuration or performance, and may comprise one or more processors (Central Processing Units, CPU) and one or more memories, wherein at least one computer program is stored in the memories and is loaded and executed by the processors to implement the high-availability database cluster management system based on the raft protocol provided by the above embodiment. The electronic device may also include other components for implementing the functions of the device; for example, it may have wired or wireless network interfaces, input-output interfaces, and the like, for inputting and outputting data. These are not described further in this embodiment.

Example 3

The present embodiment proposes a computer-readable storage medium having stored thereon an erasable computer program;

when the computer program runs on a computer device, the computer device is caused to implement the above high-availability database cluster management system based on the raft protocol.

For example, the computer readable storage medium can be Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), compact disk Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM), magnetic tape, floppy disk, optical data storage device, etc.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are all within the protection of the present invention.

Claims

1. A high availability database cluster management system based on a raft protocol, comprising:

2. The high availability database cluster management system based on the raft protocol according to claim 1, wherein the node deployment module comprises a node allocation unit and a state reporting unit; the node allocation unit is configured to deploy each agent instance together with one DB instance (1:1) on the same physical node or Pod; the state reporting unit is configured to provide a Reporter so that each agent instance reports its local DB state at regular intervals, and the Leader node writes the data and synchronizes it to each Follower node; and each agent instance stores a copy of globally consistent state data.

3. The high availability database cluster management system based on the raft protocol according to claim 2, wherein the node allocation unit is further configured to process requests from a management console through an ApiHandler unit, either querying and returning the cluster state or executing a cluster setting request; returning the cluster state is realized through a storage module, the StatusStorage unit, which is used for storing the state of the cluster and the state list of each DB cluster; and the cluster setting request is realized through a Raftnode, which is used for building and maintaining the raft cluster and for receiving the instance state and cluster state changes reported by the Reporter unit.

4. The high availability database cluster management system based on the raft protocol according to claim 3, wherein the Master node of the DB cluster and the Leader node of the raft cluster cannot be on the same physical node or Pod, and when the Leader selects its own local DB instance as the Master node, the Leader role is automatically transferred to another raft node.

5. The system of claim 4, wherein the node master election module comprises a cluster master election flow handled by an Elector unit; the Elector unit is triggered by a timer or by a real-time message and comprises three sub-flows, namely initial master election, forced master switchover, and disaster-recovery master switchover; the initial master election is executed when the cluster has not been initialized; the forced master switchover is executed on a manual master switchover request; the disaster-recovery master switchover is triggered when the time for which the master state has been abnormal exceeds a configured disaster-recovery time threshold; and the cluster state changes as the master election flow proceeds.

6. The system of claim 5, wherein the disaster-recovery master switchover comprises a Leader node disaster-recovery switchover in the raft cluster and a Master node disaster-recovery switchover in the DB cluster; in the Leader node disaster-recovery switchover, after the Leader node fails, the Follower nodes automatically initiate an election, one Follower is eventually promoted to Leader, and the old Leader, once recovered, recognizes that a new Leader has been elected and automatically steps down to Follower; in the Master node disaster-recovery switchover, after the Master node fails, the upper-layer raft cluster collects the abnormal condition from the agent reports, actively initiates a switchover once the configured disaster-recovery time threshold is exceeded, selects one of the Slaves as the new Master, and notifies all agent nodes to execute the switchover; and after the switchover is completed, when the old Master recovers, the corresponding agent issues a master switchover request and the old Master becomes a Slave.

7. The high availability database cluster management system based on the raft protocol according to claim 6, wherein, when a node of the raft cluster and a node of the DB cluster fail at the same time in the same environment, the cluster state is recovered automatically by handling the situation as a single-Leader failure scenario or a single-Master failure scenario.

8. The high availability database cluster management system based on the raft protocol according to claim 7, wherein the cluster state is maintained by the Leader node according to a cluster state model; the initialization state in the cluster state model is the first state of a new cluster and changes to the running state once a master has been elected; while the cluster is in the running state, if the system detects a Master failure, the cluster changes to the failure state, which is held for at most a period T; if the Master recovers within T, the cluster returns to the running state, and otherwise a master re-election is initiated and the cluster changes to the recovery state or the UnRecovering state; if a new master is selected, the cluster enters the recovery state, waits until the new master has applied all of its logs, and then enters the Changing state, or returns to the running state if the old master recovers while the logs are being applied; if no new master meeting the conditions is found, the cluster enters the UnRecovering state, and then enters the recovery state after the inconsistency has been handled manually, or returns to the running state if the old master recovers in the meantime; and finally, when the recovery state gives way to the Changing state, the new master has been determined, each agent component executes the master switchover, and the cluster returns to the running state after all nodes have completed the operation.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a high availability database cluster management system based on the raft protocol of any one of claims 1-8 when the computer program is executed by the processor.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a high availability database cluster management system based on a raft protocol according to any one of claims 1-8.