CN114185738A

CN114185738A - Method for realizing OpenGauss database high-availability cluster

Info

Publication number: CN114185738A
Application number: CN202111495712.1A
Authority: CN
Inventors: 潘浩文; 何小栋
Original assignee: Guangzhou Mass Database Technology Co ltd
Current assignee: Guangzhou Mass Database Technology Co ltd
Priority date: 2021-12-08
Filing date: 2021-12-08
Publication date: 2022-03-15

Abstract

The invention belongs to the technical field of relational database management and operating systems, and in particular relates to a method for implementing a high-availability cluster of an OpenGauss database and its application. The method creates a monitoring program that is independent of the OpenGauss database. The monitoring program detects and records the state of the database and takes the corresponding action according to the detected state. This yields a high-availability cluster scheme for OpenGauss that automatically detects the availability of the database and rapidly handles node faults in the cluster. It overcomes the shortcomings of OpenGauss in automatically detecting primary database faults and promoting a standby database, markedly improves database reliability, reduces the impact of faults on database use, and provides a strong technical guarantee for normal database operation.

Description

Method for realizing OpenGauss database high-availability cluster

Technical Field

The invention belongs to the technical field of relational database management and operating systems, and in particular relates to a method for implementing a high-availability cluster of an OpenGauss database and its application.

Background

OpenGauss is an open-source relational database management system whose kernel is derived from PostgreSQL; as a free, open-source database platform, it aims to encourage community contribution and collaboration. OpenGauss already supports streaming physical replication (stream replication) of the write-ahead log (WAL): a standby process receives WAL from the primary database over the replication stream and replays it, producing a read-only standby database. If the primary later fails and can no longer provide service, the user can run the OpenGauss failover command on the standby, which promotes the standby to primary so that the new primary takes over service from the old one.

However, this mode of operation has three problems:

(1) Because a failure of the OpenGauss database cannot be detected quickly, the database may stop serving requests for a long time.

(2) When there are multiple standby databases, the most suitable one cannot be identified and selected automatically as the new primary.

(3) The failover command must be executed manually, which carries a considerable risk of operator error.

In view of this, a high-availability cluster scheme for OpenGauss is desirable: one that automatically detects the availability of the database and, when the primary is unavailable, automatically determines the most suitable standby and promotes it to primary in time. Such a scheme would greatly improve database reliability, reduce the impact of faults on database use, and provide a strong technical guarantee for normal database operation.

Disclosure of Invention

To overcome the shortcomings of OpenGauss in automatically detecting primary database faults and promoting a standby database, the invention provides the following solution. It designs a high-availability scheme for the OpenGauss database, builds an OpenGauss high-availability cluster on that basis, and supports automatic detection of database availability and rapid handling of node faults within the cluster.

Specifically, the invention provides a method for implementing a high-availability cluster of an OpenGauss database, in which a monitoring program independent of the OpenGauss database is created; the monitoring program detects and records the state of the OpenGauss database and takes the corresponding action according to the detected state. The method comprises the following steps:

(1) A primary database and at least one standby database are established using the streaming physical replication (stream replication) of the OpenGauss database, providing data redundancy;

(2) A distributed lock (the leader key) is implemented using the features of ETCD (a highly available key/value store used mainly for shared configuration and service discovery). The leader key is created by a monitoring program with a lease period; the monitoring program holding the leader key is responsible for renewing the lease periodically, and the leader key is released automatically when the lease expires;

(3) When no leader key holder (leader) exists in the cluster, each monitoring program, acting as an ETCD client, determines whether the database it monitors is the healthiest node, and if so tries to acquire the leader key;

(4) Once a monitoring program acquires the leader key, it promotes the OpenGauss database it monitors to primary by executing the failover command; conversely, if a monitoring program fails to acquire the leader key or loses it, the OpenGauss database it monitors is started in standby mode.
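The leader-key mechanism in steps (2) to (4) can be illustrated with a minimal, single-process sketch. It does not use the real ETCD client: `LeaseKV`, `create_if_absent`, `renew`, and the key path `/opengauss/leader` are hypothetical names, and the in-memory store only imitates the two ETCD properties the method relies on, namely an atomic "create only if absent" and a key that disappears when its lease is not renewed in time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class _Entry:
    holder: str        # name of the monitoring program holding the key
    expires_at: float  # clock time at which the lease lapses


class LeaseKV:
    """Hypothetical in-memory stand-in for an ETCD key bound to a lease.
    Not the ETCD client API; it only mimics lease expiry and atomic create."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, _Entry] = {}

    def _drop_if_expired(self, key: str) -> None:
        entry = self._data.get(key)
        if entry is not None and entry.expires_at <= self._clock():
            del self._data[key]  # lease reached: the key is released automatically

    def get(self, key: str) -> Optional[str]:
        """Return the current holder, or None if nobody holds the key."""
        self._drop_if_expired(key)
        entry = self._data.get(key)
        return entry.holder if entry is not None else None

    def create_if_absent(self, key: str, holder: str, ttl: float) -> bool:
        """Acquire the leader key only if it does not exist (the lock step)."""
        self._drop_if_expired(key)
        if key in self._data:
            return False
        self._data[key] = _Entry(holder, self._clock() + ttl)
        return True

    def renew(self, key: str, holder: str, ttl: float) -> bool:
        """Extend the lease; only the current holder may do so."""
        self._drop_if_expired(key)
        entry = self._data.get(key)
        if entry is None or entry.holder != holder:
            return False
        entry.expires_at = self._clock() + ttl
        return True
```

With an injected fake clock, `node1` acquires the key, `node2` is refused while the lease is live, and once `node1` stops renewing, the key vanishes after the TTL so another node can take it. Against a real ETCD cluster, the same behaviour would come from granting a lease, keeping it alive, and writing the key in a transaction whose compare checks that the key's create revision is 0 (i.e. the key does not yet exist).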

Further, the method for implementing the OpenGauss database high-availability cluster comprises the following steps:

(1) Primary/standby streaming physical replication environment configuration

A base backup is generated from the primary database with the gs_basebackup tool of the OpenGauss database, the replconninfo parameter is added to the database configuration file, and the database is started in standby mode;
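Step (1) can be sketched as three concrete actions. The helper names below are hypothetical; the `-D`/`-h`/`-p` flags of `gs_basebackup` and `-M standby` of `gs_ctl start` are assumed from the tools' documented options and should be checked against the installed OpenGauss version. The replconninfo value shows only a minimal, assumed subset of fields, and all hosts, ports, and paths are placeholders.

```python
from typing import List


def replconninfo_line(index: int, local_host: str, local_port: int,
                      remote_host: str, remote_port: int) -> str:
    """Build one replconninfoN line for postgresql.conf.
    Only four fields are shown (assumption); real setups may need more."""
    value = (f"localhost={local_host} localport={local_port} "
             f"remotehost={remote_host} remoteport={remote_port}")
    return f"replconninfo{index} = '{value}'"


def append_setting(conf_path: str, line: str) -> None:
    """Append a parameter line to the standby's configuration file."""
    with open(conf_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def basebackup_cmd(data_dir: str, primary_host: str, primary_port: int) -> List[str]:
    """argv for taking the base backup from the primary into data_dir."""
    return ["gs_basebackup", "-D", data_dir, "-h", primary_host, "-p", str(primary_port)]


def start_standby_cmd(data_dir: str) -> List[str]:
    """argv for starting the instance in standby mode."""
    return ["gs_ctl", "start", "-D", data_dir, "-M", "standby"]
```

In use, the two command lists would be passed to `subprocess.run(cmd, check=True)` before and after calling `append_setting` on the standby's `postgresql.conf`. Returning argv lists instead of running them keeps the sketch testable without a database.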

(2) Creating the monitoring program

A monitoring program independent of the OpenGauss database is created. It runs in a continuous loop and performs fault detection and fault handling in every cycle;

(3) Cluster fault detection

The high-availability cluster determines whether a cluster fault has occurred by having the monitoring program access ETCD and check whether the leader key exists; if the leader key does not exist, the cluster is judged to be faulty;

(4) Automatic cluster fault handling

When the primary database is unavailable or absent because of a cluster fault, the monitoring program first determines whether the database it monitors is the healthiest node. The judgment compares the LSNs (Log Sequence Numbers, i.e. how far the WAL generated or received by the OpenGauss database has currently advanced) of all nodes in the cluster: the node with the largest LSN is the healthiest. In that case the monitoring program tries to acquire the leader key and sets a lease on it; the lease ensures that the leader key is released automatically if the primary goes down, which triggers the automatic cluster fault handling mechanism. After the leader key is acquired successfully, the failover command of the OpenGauss database is executed to promote the database to primary;
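The "healthiest node" rule reduces to a numeric comparison of LSNs. This sketch assumes the PostgreSQL-style text form `X/Y` (two hexadecimal 32-bit halves), which OpenGauss inherits; how each node's LSN is collected is out of scope, and the function names are hypothetical. The point it illustrates is that LSNs must be compared as numbers: as strings, `"0/9"` sorts after `"0/10"` even though it is the smaller position.

```python
from typing import Dict, List


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN (two hexadecimal 32-bit halves) into one integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def healthiest_nodes(lsns: Dict[str, str]) -> List[str]:
    """Names of all nodes whose LSN equals the cluster maximum."""
    values = {name: parse_lsn(text) for name, text in lsns.items()}
    top = max(values.values())
    return sorted(name for name, v in values.items() if v == top)


def is_healthiest(me: str, lsns: Dict[str, str]) -> bool:
    """Step (4): is this node's LSN the largest in the cluster?"""
    return me in healthiest_nodes(lsns)
```

If two nodes tie for the largest LSN, both consider themselves healthiest; the leader key then acts as the tie-breaker, because only one of them can create it and the other falls back to standby mode.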

(5) Standby database fault detection

In the high-availability cluster, each standby database is a copy of the primary and is promoted to primary when the primary becomes unavailable. The monitoring program judges whether a standby database has failed by checking whether the pid file of the OpenGauss process exists;
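The pid-file check in step (5) is a file-existence test. The file name `postmaster.pid` and the PID on its first line are assumptions taken from the PostgreSQL convention. The second function is an optional extension beyond the described method: a pid file can survive an abrupt crash, so probing the recorded PID with signal 0 (POSIX only) helps distinguish a live process from a stale file.

```python
import os

PID_FILE = "postmaster.pid"  # assumed name, following the PostgreSQL convention


def pid_file_exists(data_dir: str) -> bool:
    """Step (5) as described: the standby counts as failed if the pid file is missing."""
    return os.path.isfile(os.path.join(data_dir, PID_FILE))


def pid_file_process_alive(data_dir: str) -> bool:
    """Optional stricter check (not part of the described method):
    read the PID from the file's first line and probe the process with signal 0."""
    try:
        with open(os.path.join(data_dir, PID_FILE), encoding="utf-8") as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable pid file
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False  # stale pid file, e.g. left behind by a crash
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```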

(6) Standby database fault handling

To handle a standby database fault, the monitoring program executes the build command of the OpenGauss database: it first performs an incremental build, falls back to a full build if the incremental build fails, and restarts the standby database once the build has completed.
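The incremental-then-full fallback of step (6) can be sketched as follows. The `-b incremental` / `-b full` spelling of the build-mode option and the plain `gs_ctl restart -D` call are assumptions to verify against the OpenGauss `gs_ctl` reference for the deployed version. The command runner is injectable (a hypothetical `Runner` type) so the control flow can be exercised without a database.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]  # runs an argv, returns its exit code


def run_command(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv), check=False).returncode


def repair_standby(data_dir: str, runner: Runner = run_command) -> List[str]:
    """Step (6): incremental build, full build as fallback, then restart.
    Returns the build modes that were attempted, in order."""
    attempted: List[str] = []
    for mode in ("incremental", "full"):
        attempted.append(mode)
        if runner(["gs_ctl", "build", "-D", data_dir, "-b", mode]) == 0:
            break  # this build succeeded; skip the remaining modes
    else:
        raise RuntimeError(f"both builds failed for {data_dir}")
    if runner(["gs_ctl", "restart", "-D", data_dir]) != 0:
        raise RuntimeError(f"restart failed for {data_dir}")
    return attempted
```

Trying the incremental build first avoids copying the whole data directory when only a small divergence needs repair; the full build is the slower but more robust fallback.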

Further, in the method for implementing the OpenGauss database high-availability cluster, step (2) creates a monitoring program independent of the OpenGauss database that runs in a continuous loop and performs fault detection and fault handling in each cycle. Every sub-step is executed by the monitoring program, and the processing flow in each cycle comprises the following sub-steps:

(A) The monitoring program accesses the ETCD process and checks whether the leader key exists; if it does, go to step (B), otherwise go to step (E);

(B) The monitoring program checks whether it holds the leader key itself. If it does, its node is the primary of the cluster and the cycle ends immediately; if not, go to step (C);

(C) The monitoring program checks whether the monitored OpenGauss database process is running normally. If it is, the monitored database continues running in standby mode and the cycle ends; if not, go to step (D);

(D) The monitoring program executes the gs_ctl build command provided by the OpenGauss database to repair the monitored database, which then runs in standby mode, and the cycle ends;

(E) The monitoring program determines whether the monitored database is the healthiest node in the cluster by comparing the LSNs of all nodes; the node with the largest LSN is the healthiest. If the monitored database is the healthiest node, go to step (F), otherwise go to step (G);

(F) The monitoring program accesses the ETCD process and tries to acquire the leader key (i.e. create the leader key path in ETCD). If it succeeds, it executes the gs_ctl failover command provided by the OpenGauss database to promote the monitored database to primary, and the cycle ends; otherwise, go to step (G);

(G) The monitoring program waits until a new leader key holder (leader) emerges in the cluster, then executes the gs_ctl build command provided by the OpenGauss database to repair the standby; the monitored database runs in standby mode and the cycle ends.
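Sub-steps (A) to (G) form a small decision procedure that can be written as one function per cycle. This is a sketch under stated assumptions: `Monitor` and its callback fields are hypothetical hooks that a real monitoring program would wire to ETCD queries and `gs_ctl` invocations, and the function returns the letter of the sub-step that ended the cycle so the branching is easy to check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Monitor:
    """Hypothetical hooks a monitoring program would wire to ETCD and gs_ctl."""
    name: str
    leader_holder: Callable[[], Optional[str]]  # (A)/(B): holder of the leader key, or None
    db_running: Callable[[], bool]              # (C): is the local OpenGauss process healthy?
    is_healthiest: Callable[[], bool]           # (E): largest LSN in the cluster?
    try_acquire_leader: Callable[[], bool]      # (F): create the leader key with a lease
    wait_for_leader: Callable[[], str]          # (G): block until some node holds the key
    failover: Callable[[], None]                # e.g. gs_ctl failover
    build: Callable[[], None]                   # e.g. gs_ctl build, then run as standby


def run_cycle(m: Monitor) -> str:
    """One pass of the per-cycle flow; returns the sub-step that ended it."""
    holder = m.leader_holder()          # (A) does the leader key exist?
    if holder is not None:
        if holder == m.name:            # (B) this node is the primary
            # Lease renewal (step (2) of the method) is assumed to run here
            # or in a separate keep-alive thread; it is not shown.
            return "B"
        if m.db_running():              # (C) healthy standby, nothing to do
            return "C"
        m.build()                       # (D) repair and keep running as standby
        return "D"
    if m.is_healthiest() and m.try_acquire_leader():  # (E) then (F)
        m.failover()
        return "F"
    m.wait_for_leader()                 # (G) not healthiest, or lost the race
    m.build()
    return "G"
```

A driver would call `run_cycle` inside a `while True:` loop with a sleep between cycles; the interval would need to be shorter than the lease TTL so that the primary's monitoring program renews the lease before it lapses.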

In addition, the invention also relates to the application of the method for implementing the OpenGauss database high-availability cluster in a relational database management or operating system.

In summary, the present invention provides, for the first time, a method for implementing a high-availability cluster of an OpenGauss database, with the following advantages:

(1) In a running OpenGauss high-availability cluster, when the primary database becomes unavailable, the cluster automatically detects and confirms the primary fault and promptly produces a new primary, effectively preventing long service outages.

(2) When several standby databases are available, the most suitable one is determined and selected automatically and promptly promoted to be the new primary.

(3) In a running OpenGauss high-availability cluster, when a standby database becomes unavailable, the cluster automatically detects and confirms the standby fault and repairs the standby promptly.

(4) The whole process is executed automatically by the monitoring program, which avoids manual operating errors.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of the processing performed by the monitoring program in each cycle of the method of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions are described in detail below with reference to the embodiments and the accompanying drawings. The embodiments described are illustrative of some, not all, embodiments of the invention; the invention may be embodied in various other specific forms, and the details of the specification may be modified and changed without departing from the spirit of the invention.

It should also be understood that the scope of the invention is not limited to the particular embodiments described below, and that the terminology used in the examples serves only to describe those embodiments and is not intended to limit the scope of the invention.

Example 1: A method for implementing an OpenGauss database high-availability cluster, in which a monitoring program independent of the OpenGauss database is created; the monitoring program detects and records the state of the OpenGauss database and takes the corresponding action according to the detected state. The method is implemented as follows. A primary database and at least one standby database are established using OpenGauss stream replication, providing data redundancy. A distributed lock (the leader key) is implemented using the features of ETCD; the leader key is created by a monitoring program with a lease, the monitoring program holding the leader key renews the lease periodically, and the leader key is released automatically when the lease expires. When no leader key holder exists in the cluster, each monitoring program, acting as an ETCD client, determines whether the database it monitors is the healthiest node and, if so, tries to acquire the leader key. Once a monitoring program acquires the leader key, it promotes the OpenGauss database it monitors to primary by executing the failover command; conversely, if it fails to acquire the leader key or loses it, the OpenGauss database it monitors is started in standby mode.

Specifically, the method for realizing the OpenGauss database high-availability cluster comprises the following steps:

(1) Primary/standby streaming physical replication environment configuration

A base backup is generated from the primary database with the gs_basebackup tool of the OpenGauss database, the replconninfo parameter is added to the database configuration file, and the database is started in standby mode.

(2) Creating the monitoring program

A monitoring program independent of the OpenGauss database is created. It runs in a continuous loop and performs fault detection and fault handling in every cycle; in each cycle it follows sub-steps (A) to (G) described above (see Fig. 1).

(3) Cluster fault detection

The high-availability cluster determines whether a cluster fault has occurred by having the monitoring program access ETCD and check whether the leader key exists; if the leader key does not exist, the cluster is judged to be faulty.

(4) Automatic cluster fault handling

When the primary database is unavailable or absent because of a cluster fault, the monitoring program first determines whether the database it monitors is the healthiest node. The judgment compares the LSNs of all nodes in the cluster: the node with the largest LSN is the healthiest. In that case the monitoring program tries to acquire the leader key and sets a lease on it; the lease ensures that the leader key is released automatically if the primary goes down, which triggers the automatic cluster fault handling mechanism. After the leader key is acquired successfully, the failover command of the OpenGauss database is executed to promote the database to primary.

(5) Standby database fault detection

In the high-availability cluster, each standby database is a copy of the primary and is promoted to primary when the primary becomes unavailable. The monitoring program judges whether a standby database has failed by checking whether the pid file of the OpenGauss process exists.

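(6) Standby database fault handling

To handle a standby database fault, the monitoring program executes the build command of the OpenGauss database: it first performs an incremental build, falls back to a full build if the incremental build fails, and restarts the standby database once the build has completed.

The above description is only an example of the present invention and is not intended to limit it. Various modifications and variations will be apparent to those skilled in the art. Any modification, substitution, or the like made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.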

Claims

1. A method for implementing a high-availability cluster of an OpenGauss database, characterized in that a monitoring program independent of the OpenGauss database is created, the monitoring program detects and records the state of the OpenGauss database and takes the corresponding action according to the detected state, and the method comprises the following steps:

(1) establishing a primary database and at least one standby database by means of the streaming physical replication of the OpenGauss database, to provide data redundancy;

(2) implementing a leader key using the features of ETCD, wherein the leader key is created by the monitoring program with a lease period, the monitoring program holding the leader key is responsible for renewing the lease periodically, and the leader key is released automatically when the lease expires;

(3) when no leader key holder exists in the cluster, the monitoring program, acting as an ETCD client, determines whether the database it monitors is the healthiest node, and tries to acquire the leader key if so;

2. The method for implementing the OpenGauss database high-availability cluster according to claim 1, comprising the following steps:

(1) primary/standby streaming physical replication environment configuration

(2) creating the monitoring program

(3) cluster fault detection

(4) automatic cluster fault handling

When the primary database is unavailable or absent because of a cluster fault, the monitoring program first determines whether the database it monitors is the healthiest node; the judgment compares the LSNs of all nodes in the cluster, and the node with the largest LSN is the healthiest. In that case the monitoring program tries to acquire the leader key and sets a lease on it, the lease ensuring that the leader key is released automatically if the primary goes down, which triggers the automatic cluster fault handling mechanism; after the leader key is acquired successfully, the failover command of the OpenGauss database is executed to promote the database to primary;

(5) standby database fault detection

(6) standby database fault handling

3. The method according to claim 2, wherein in step (2) a monitoring program independent of the OpenGauss database is created, the monitoring program runs continuously in a loop and performs fault detection and fault handling in each cycle, and the processing flow in each cycle comprises the following sub-steps:

(F) the monitoring program accesses the ETCD process and tries to acquire the leader key; if it succeeds, it executes the gs_ctl failover command provided by the OpenGauss database to promote the monitored database to primary and the cycle ends; otherwise, go to step (G);

(G) the monitoring program waits until a new leader key holder emerges in the cluster, then executes the gs_ctl build command provided by the OpenGauss database to repair the standby database; the monitored database runs in standby mode and the cycle ends.

4. Use of the method for implementing a high-availability cluster of an OpenGauss database according to any one of claims 1 to 3 in a relational database management or operating system.