CN114185738A - Method for realizing OpenGauss database high-availability cluster - Google Patents

Info

Publication number
CN114185738A
Authority
CN
China
Prior art keywords
database
monitoring program
opengauss
cluster
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111495712.1A
Other languages
Chinese (zh)
Inventor
潘浩文
何小栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Mass Database Technology Co ltd
Original Assignee
Guangzhou Mass Database Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Mass Database Technology Co ltd
Priority to CN202111495712.1A
Publication of CN114185738A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of relational database management and operating systems, and in particular relates to a method for realizing a high-availability cluster of an OpenGauss database and its application. The method creates a monitoring program that is independent of the OpenGauss database; the monitoring program detects and records the state of the OpenGauss database and acts on the detected state. The resulting high-availability cluster supports automatic detection of the database availability state and rapid handling of node faults in the cluster. This overcomes the shortcomings of the OpenGauss database in automatic main library fault detection and standby library promotion, improves the reliability of the database, reduces the adverse effect of faults on database use, and helps keep the database in normal service.

Description

Method for realizing OpenGauss database high-availability cluster
Technical Field
The invention belongs to the technical field of relational database management and operating systems, and particularly relates to a method for realizing a high-availability cluster of an OpenGauss database and application thereof.
Background
OpenGauss is an open-source relational database management system whose kernel is derived from PostgreSQL. As a free, open-source database platform, it aims to encourage community contribution and cooperation. The OpenGauss database currently supports streaming physical replication (stream replication) of the write-ahead log (WAL): a standby library receives the WAL from the main library and replays it, which yields a read-only standby library. If the main library later fails and can no longer provide service, the user can execute the OpenGauss failover command on a standby library to promote it to the main library, and the new main library takes over service from the old one.
However, the above operation mode has three problems:
(1) A failure of the OpenGauss database cannot be detected quickly, so the database may be out of service for a long time.
(2) When there are multiple standby libraries, the most suitable new main library cannot be determined and selected automatically.
(3) The failover command must be executed manually, which carries a considerable risk of operator error.
In view of this, a high-availability cluster scheme for the OpenGauss database is needed that automatically detects the availability state of the database and, when the main library is unavailable, automatically determines the most suitable standby library and promptly promotes it to the main library. Such a scheme would greatly improve the reliability of the database, reduce the adverse effect of faults on database use, and help keep the database in normal service.
Disclosure of Invention
To overcome the shortcomings of the OpenGauss database in automatic main library fault detection and standby library promotion, the invention provides a solution. The invention aims to design a high-availability scheme for the OpenGauss database, build an OpenGauss high-availability cluster on that basis, and support automatic detection of the database availability state and rapid handling of node faults within the cluster.
Specifically, the invention provides a method for realizing a high-availability cluster of an OpenGauss database. The method creates a monitoring program independent of the OpenGauss database; the monitoring program detects and records the state of the OpenGauss database and acts on the detected state. The method comprises the following steps:
(1) establishing a main library and at least one standby library by using the streaming physical replication (stream replication) of the OpenGauss database, to achieve data redundancy;
(2) implementing a distributed lock (the leader key) on top of ETCD, a highly available key/value storage system mainly used for shared configuration and service discovery; the leader key is created by a monitoring program with a lease, the monitoring program that holds the leader key renews the lease regularly, and the leader key is released automatically when the lease expires;
(3) when no leader key holder (leader) exists in the cluster, each monitoring program, acting as an ETCD client, judges whether the database it monitors is the healthiest node and, if so, tries to acquire the leader key;
(4) after a monitoring program acquires the leader key, it promotes the OpenGauss database it monitors to the main library by executing the failover command; conversely, if the monitoring program cannot acquire the leader key or loses it, the OpenGauss database it monitors is started in standby mode.
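The lease behaviour of the leader key described in steps (2) to (4) can be illustrated with a small sketch. It is not the patented implementation and does not talk to ETCD: `InMemoryLeaseStore` is a hypothetical in-process stand-in, and its method names (`holder`, `acquire`, `renew`) are assumptions chosen for illustration. A real deployment would presumably rely on ETCD's own lease and atomic key-creation facilities instead.

```python
import time
from typing import Callable, Optional


class InMemoryLeaseStore:
    """Hypothetical stand-in for ETCD: a single leader key with a lease."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._holder: Optional[str] = None
        self._expires_at = 0.0

    def holder(self) -> Optional[str]:
        """Return the current leader key holder, or None if there is none."""
        if self._holder is not None and self._clock() >= self._expires_at:
            self._holder = None  # lease expired: the key is released automatically
        return self._holder

    def acquire(self, node: str, ttl: float) -> bool:
        """Create the leader key with a lease, but only if nobody holds it."""
        if self.holder() is not None:
            return False
        self._holder = node
        self._expires_at = self._clock() + ttl
        return True

    def renew(self, node: str, ttl: float) -> bool:
        """Extend the lease; only the current holder may renew."""
        if self.holder() != node:
            return False
        self._expires_at = self._clock() + ttl
        return True
```

In this sketch, a holder that stops renewing (for example because its host went down) loses the key once the lease time has passed, and another node's `acquire` call then succeeds. That is the property the method uses to trigger failover.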
Further, the method for realizing the high-availability cluster of the OpenGauss database comprises the following steps:
(1) Configuring the main/standby streaming physical replication environment
Generating a base backup from the main library with the gs_basebackup tool of the OpenGauss database, editing the configuration file of the database to add the replconninfo parameter, and starting the database in standby mode;
(2) Creating a monitoring program
Creating a monitoring program independent of the OpenGauss database, wherein the monitoring program runs in a continuous loop and performs fault detection and fault handling in each cycle;
(3) Cluster fault detection
The high-availability cluster decides whether the cluster has failed by having the monitoring program access ETCD and check whether the leader key exists; if the leader key does not exist, the cluster is judged to have failed;
(4) Automatic cluster fault handling
When a cluster fault leaves the main library unavailable or absent, the monitoring program first judges whether the database it monitors is the healthiest node. The judgment compares the LSN (Log Sequence Number, that is, the amount of WAL the OpenGauss database has generated or received so far) of all nodes in the cluster; if this node's LSN is the largest, its database is the healthiest node. The monitoring program then tries to acquire the leader key and sets its lease; the lease ensures that the leader key is released automatically after the main library goes down, which triggers the automatic cluster fault handling mechanism. After the leader key is successfully acquired, the monitoring program executes the failover command of the OpenGauss database to promote the database to the main library;
(5) Standby library fault detection
In the high-availability cluster, each standby library is a copy of the main library and is promoted to the main library when the main library becomes unavailable. Whether a standby library has failed is judged by the monitoring program checking whether the pid file of the OpenGauss process exists;
(6) Standby library fault handling
To handle a standby library fault, the monitoring program executes the build command of the OpenGauss database: it first performs an incremental build, performs a full build if the incremental build fails, and restarts the standby library after the build command completes.
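The "most healthy node" judgment in step (4) is a comparison of LSNs. The sketch below shows one way to express it, assuming that each LSN is available as text in the PostgreSQL-style `HIGH/LOW` hexadecimal form; the source does not state the textual format, and the helper names `parse_lsn` and `is_healthiest` are hypothetical.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'HIGH/LOW' in hexadecimal into one integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_healthiest(node: str, lsn_by_node: Dict[str, str]) -> bool:
    """Return True when `node` has the largest LSN among all cluster nodes."""
    own = parse_lsn(lsn_by_node[node])
    return all(own >= parse_lsn(lsn) for lsn in lsn_by_node.values())
```

Two nodes with equal LSNs both count as healthiest in this sketch; the attempt to acquire the leader key then decides between them, since only one attempt can succeed.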
Further, in the method for realizing the OpenGauss database high-availability cluster, the monitoring program created in step (2) runs in a continuous loop and performs fault detection and fault handling in each cycle. Every sub-step is executed by the monitoring program, and the processing flow in each cycle comprises the following sub-steps:
(A) the monitoring program accesses the ETCD process and checks whether the leader key exists; if it exists, go to step (B), otherwise go to step (E);
(B) the monitoring program judges whether it holds the leader key itself; if it does, this node is the main node of the cluster and the cycle ends directly; if it does not, go to step (C);
(C) the monitoring program checks whether the OpenGauss database process it monitors is running normally; if so, the monitored database continues to run in standby mode and the monitoring program ends the cycle; if not, go to step (D);
(D) the monitoring program executes the gs_ctl build command provided by the OpenGauss database to repair the monitored database, the monitored database runs in standby mode, and the monitoring program ends the cycle;
(E) the monitoring program judges whether the monitored database is the healthiest node in the cluster by comparing the LSNs of all nodes in the cluster: if this node's LSN is the largest, its database is the healthiest node; if so, go to step (F), otherwise go to step (G);
(F) the monitoring program accesses the ETCD process and tries to acquire the leader key (that is, to create the leader key path in ETCD); if it succeeds, it executes the gs_ctl failover command provided by the OpenGauss database to promote the monitored database to the main library and ends the cycle; otherwise go to step (G);
(G) the monitoring program waits for a new leader key holder (leader) to emerge in the cluster, then executes the gs_ctl build command provided by the OpenGauss database to repair the standby library; the monitored database runs in standby mode and the monitoring program ends the cycle.
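Sub-steps (A) to (G) amount to a small decision procedure, sketched below for a single cycle. This is an illustrative reading of the flow, not the patented program: every callable passed to `run_cycle` is a hypothetical hook that would wrap ETCD access or the build and failover commands, and the returned labels are invented for the example.

```python
from typing import Callable, Optional


def run_cycle(
    node: str,
    *,
    leader: Callable[[], Optional[str]],   # current leader key holder, or None
    try_acquire: Callable[[], bool],       # attempt to create the leader key
    db_running: Callable[[], bool],        # is the monitored database process up?
    is_healthiest: Callable[[], bool],     # does this node have the largest LSN?
    build_standby: Callable[[], None],     # repair through the build command
    failover: Callable[[], None],          # promote through the failover command
    wait_for_leader: Callable[[], None],   # block until a new leader appears
) -> str:
    """Run one monitoring cycle and return a label for the branch taken."""
    holder = leader()                      # step (A)
    if holder is not None:
        if holder == node:                 # step (B): this node is the main node
            return "primary"
        if db_running():                   # step (C): healthy standby
            return "standby-ok"
        build_standby()                    # step (D): repair the standby
        return "standby-repaired"
    if is_healthiest() and try_acquire():  # steps (E) and (F)
        failover()
        return "promoted"
    wait_for_leader()                      # step (G)
    build_standby()
    return "standby-rebuilt"
```

A monitoring program would call such a function repeatedly, with a pause between cycles. Passing the hooks in as parameters keeps the decision logic testable without a running database or ETCD.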
In addition, the invention also relates to the application of the method for realizing the OpenGauss database high-availability cluster in a relational database management or operating system.
In summary, the invention provides, for the first time, a method for realizing a high-availability cluster of an OpenGauss database, with the following advantages:
(1) In a running OpenGauss database high-availability cluster, when the main library becomes unavailable, the cluster automatically detects and confirms the main library fault and promptly produces a new main library, which avoids long service outages.
(2) When several standby libraries are available, the most suitable one is determined automatically and promptly promoted to the new main library.
(3) In a running OpenGauss database high-availability cluster, when a standby library becomes unavailable, the cluster automatically detects and confirms the standby library fault and repairs the standby library promptly.
(4) The monitoring program carries out the whole process automatically, which avoids manual operating errors.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention are briefly described below, it is obvious that the following drawings are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic view of the processing flow of the monitor program in each cycle of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail and completely with reference to the following embodiments and accompanying drawings. It is to be understood that the embodiments described are merely illustrative of some, but not all, of the present invention and that the invention may be embodied or carried out in various other specific forms, and that various modifications and changes in the details of the specification may be made without departing from the spirit of the invention.
Also, it should be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention.
Example 1: A method for realizing an OpenGauss database high-availability cluster creates a monitoring program independent of the OpenGauss database; the monitoring program detects and records the state of the OpenGauss database and acts on the detected state. The method is implemented as follows: a main library and at least one standby library are established by using the stream replication of the OpenGauss database, to achieve data redundancy; a distributed lock (the leader key) is implemented on top of ETCD, where the leader key is created by a monitoring program with a lease, the monitoring program that holds the leader key renews the lease regularly, and the leader key is released automatically when the lease expires; when no leader key holder exists in the cluster, each monitoring program, acting as an ETCD client, judges whether the database it monitors is the healthiest node and, if so, tries to acquire the leader key; after a monitoring program acquires the leader key, it promotes the OpenGauss database it monitors to the main library by executing the failover command; conversely, if it cannot acquire the leader key or loses it, the OpenGauss database it monitors is started in standby mode.
Specifically, the method for realizing the OpenGauss database high-availability cluster comprises the following steps:
(1) Configuring the main/standby streaming physical replication environment
Generate a base backup from the main library with the gs_basebackup tool of the OpenGauss database, edit the configuration file of the database to add the replconninfo parameter, and start the database in standby mode.
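As a rough illustration of this configuration step, the sketch below only assembles the commands and the configuration line; it does not run anything. The command-line flags (`-D`, `-h`, `-p`, `-M standby`), the numbered parameter name `replconninfo1`, and the key names inside its value are assumptions based on common OpenGauss usage and are not given in the source, so they should be checked against the OpenGauss documentation for the version in use. Hosts, ports, and paths are placeholders.

```python
from typing import List


def basebackup_command(data_dir: str, primary_host: str, primary_port: int) -> List[str]:
    """Command that copies a base backup from the main library (flags assumed)."""
    return ["gs_basebackup", "-D", data_dir, "-h", primary_host, "-p", str(primary_port)]


def replconninfo_line(local_host: str, local_port: int,
                      remote_host: str, remote_port: int,
                      name: str = "replconninfo1") -> str:
    """Configuration line describing the replication channel (format assumed)."""
    value = (f"localhost={local_host} localport={local_port} "
             f"remotehost={remote_host} remoteport={remote_port}")
    return f"{name} = '{value}'"


def start_standby_command(data_dir: str) -> List[str]:
    """Command that starts the database in standby mode (flags assumed)."""
    return ["gs_ctl", "start", "-D", data_dir, "-M", "standby"]
```

Building the argument lists separately from executing them makes it easy to log or review the exact commands before they are run on a new standby node.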
(2) Creating a monitoring program
Create a monitoring program independent of the OpenGauss database. The monitoring program runs in a continuous loop and performs fault detection and fault handling in each cycle; the processing flow in each cycle comprises the following sub-steps (see FIG. 1):
(A) the monitoring program accesses the ETCD process and checks whether the leader key exists; if it exists, go to step (B), otherwise go to step (E);
(B) the monitoring program judges whether it holds the leader key itself; if it does, this node is the main node of the cluster and the cycle ends directly; if it does not, go to step (C);
(C) the monitoring program checks whether the OpenGauss database process it monitors is running normally; if so, the monitored database continues to run in standby mode and the monitoring program ends the cycle; if not, go to step (D);
(D) the monitoring program executes the gs_ctl build command provided by the OpenGauss database to repair the monitored database, the monitored database runs in standby mode, and the monitoring program ends the cycle;
(E) the monitoring program judges whether the monitored database is the healthiest node in the cluster by comparing the LSNs of all nodes in the cluster: if this node's LSN is the largest, its database is the healthiest node; if so, go to step (F), otherwise go to step (G);
(F) the monitoring program accesses the ETCD process and tries to acquire the leader key (that is, to create the leader key path in ETCD); if it succeeds, it executes the gs_ctl failover command provided by the OpenGauss database to promote the monitored database to the main library and ends the cycle; otherwise go to step (G);
(G) the monitoring program waits for a new leader key holder (leader) to emerge in the cluster, then executes the gs_ctl build command provided by the OpenGauss database to repair the standby library; the monitored database runs in standby mode and the monitoring program ends the cycle.
(3) Cluster fault detection
The high-availability cluster decides whether the cluster has failed by having the monitoring program access ETCD and check whether the leader key exists; if the leader key does not exist, the cluster is judged to have failed.
(4) Automatic cluster fault handling
When a cluster fault leaves the main library unavailable or absent, the monitoring program first judges whether the database it monitors is the healthiest node. The judgment compares the LSNs of all nodes in the cluster; if this node's LSN is the largest, its database is the healthiest node. The monitoring program then tries to acquire the leader key and sets its lease; the lease ensures that the leader key is released automatically after the main library goes down, which triggers the automatic cluster fault handling mechanism. After the leader key is successfully acquired, the monitoring program executes the failover command of the OpenGauss database to promote the database to the main library.
(5) Standby library fault detection
In the high-availability cluster, each standby library is a copy of the main library and is promoted to the main library when the main library becomes unavailable. Whether a standby library has failed is judged by the monitoring program checking whether the pid file of the OpenGauss process exists.
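The pid file check can be sketched in a few lines. The file name `postmaster.pid` and its location in the data directory are assumptions carried over from the PostgreSQL lineage; the source only says that the pid file of the OpenGauss process is checked, and the function name is hypothetical.

```python
import os


def standby_pid_file_exists(data_dir: str, pid_file_name: str = "postmaster.pid") -> bool:
    """Return True if the database process's pid file is present in data_dir."""
    return os.path.isfile(os.path.join(data_dir, pid_file_name))
```

As in the source, this tests only for the file's existence. A pid file left behind by a crashed process would still count as "running", so an implementation might additionally verify that the recorded process is alive.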
(6) Standby library fault handling
To handle a standby library fault, the monitoring program executes the build command of the OpenGauss database: it first performs an incremental build, performs a full build if the incremental build fails, and restarts the standby library after the build command completes.
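The fallback from incremental to full build can be sketched as follows. The `gs_ctl` arguments shown (`build -b incremental`, `build -b full`, `restart -M standby`) are assumptions about the command line and should be verified against the OpenGauss documentation; `run_command` and `repair_standby` are hypothetical helper names. The source does not say what happens if the full build also fails, so this sketch simply skips the restart in that case.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(argv: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(argv)).returncode == 0


def repair_standby(data_dir: str,
                   run: Callable[[Sequence[str]], bool] = run_command) -> List[str]:
    """Try an incremental build, fall back to a full build, then restart."""
    steps = ["incremental"]
    ok = run(["gs_ctl", "build", "-D", data_dir, "-b", "incremental"])
    if not ok:
        steps.append("full")  # incremental build failed: rebuild from scratch
        ok = run(["gs_ctl", "build", "-D", data_dir, "-b", "full"])
    if ok:
        steps.append("restart")
        run(["gs_ctl", "restart", "-D", data_dir, "-M", "standby"])
    return steps
```

The returned list records which steps were attempted, which a monitoring program could write to its log. The `run` parameter lets the sequence be exercised with a fake runner instead of real commands.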
The above description is only an example of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, replacement, or the like that comes within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (4)

1. A method for realizing a high-availability cluster of an OpenGauss database, characterized in that a monitoring program independent of the OpenGauss database is created, the monitoring program detects and records the state of the OpenGauss database and performs corresponding processing according to the detected database state, and the method comprises the following steps:
(1) establishing a main library and at least one standby library by using the streaming physical replication of the OpenGauss database, to achieve data redundancy;
(2) implementing a leader key on top of ETCD, wherein the leader key is created by the monitoring program with a lease, the monitoring program holding the leader key is responsible for renewing the lease regularly, and the leader key is released automatically when the lease expires;
(3) when no leader key holder exists in the cluster, the monitoring program, acting as an ETCD client, judges whether the database it monitors is the healthiest node and, if so, tries to acquire the leader key;
(4) after the monitoring program acquires the leader key, the OpenGauss database it monitors is promoted to the main library by executing a failover command; conversely, if the monitoring program cannot acquire the leader key or loses the leader key, the OpenGauss database it monitors is started in standby mode.
2. The method for realizing the OpenGauss database high-availability cluster according to claim 1, characterized in that the method comprises the following steps:
(1) Configuring the main/standby streaming physical replication environment
Generating a base backup from the main library with the gs_basebackup tool of the OpenGauss database, editing the configuration file of the database to add the replconninfo parameter, and starting the database in standby mode;
(2) Creating a monitoring program
Creating a monitoring program independent of the OpenGauss database, wherein the monitoring program runs in a continuous loop and performs fault detection and fault handling in each cycle;
(3) Cluster fault detection
The high-availability cluster decides whether the cluster has failed by having the monitoring program access ETCD and check whether the leader key exists; if the leader key does not exist, the cluster is judged to have failed;
(4) Automatic cluster fault handling
When a cluster fault leaves the main library unavailable or absent, the monitoring program first judges whether the database it monitors is the healthiest node, the judgment being a comparison of the LSNs of all nodes in the cluster, wherein if this node's LSN is the largest, its database is the healthiest node; the monitoring program then tries to acquire the leader key and sets its lease, the lease ensuring that the leader key is released automatically after the main library goes down, which triggers the automatic cluster fault handling mechanism; after the leader key is successfully acquired, the failover command of the OpenGauss database is executed to promote the database to the main library;
(5) Standby library fault detection
In the high-availability cluster, each standby library is a copy of the main library and is promoted to the main library when the main library becomes unavailable; whether a standby library has failed is judged by the monitoring program checking whether the pid file of the OpenGauss process exists;
(6) Standby library fault handling
To handle a standby library fault, the monitoring program executes the build command of the OpenGauss database, first performing an incremental build, performing a full build if the incremental build fails, and restarting the standby library after the build command completes.
3. The method according to claim 2, characterized in that the monitoring program created in step (2) is independent of the OpenGauss database, runs in a continuous loop, and performs fault detection and fault handling in each cycle, and the processing flow in each cycle comprises the following sub-steps:
(A) the monitoring program accesses the ETCD process and checks whether the leader key exists; if it exists, go to step (B), otherwise go to step (E);
(B) the monitoring program judges whether it holds the leader key itself; if it does, this node is the main node of the cluster and the cycle ends directly; if it does not, go to step (C);
(C) the monitoring program checks whether the OpenGauss database process it monitors is running normally; if so, the monitored database continues to run in standby mode and the monitoring program ends the cycle; if not, go to step (D);
(D) the monitoring program executes the gs_ctl build command provided by the OpenGauss database to repair the monitored database, the monitored database runs in standby mode, and the monitoring program ends the cycle;
(E) the monitoring program judges whether the monitored database is the healthiest node in the cluster by comparing the LSNs of all nodes in the cluster, wherein if this node's LSN is the largest, its database is the healthiest node; if so, go to step (F), otherwise go to step (G);
(F) the monitoring program accesses the ETCD process and tries to acquire the leader key; if it succeeds, it executes the gs_ctl failover command provided by the OpenGauss database to promote the monitored database to the main library and ends the cycle; otherwise go to step (G);
(G) the monitoring program waits for a new leader key holder to emerge in the cluster, then executes the gs_ctl build command provided by the OpenGauss database to repair the standby library; the monitored database runs in standby mode and the monitoring program ends the cycle.
4. Use of the method of any of claims 1-3 for implementing a high availability cluster of an OpenGauss database in a relational database management or operating system.
CN202111495712.1A 2021-12-08 2021-12-08 Method for realizing OpenGauss database high-availability cluster Pending CN114185738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111495712.1A CN114185738A (en) 2021-12-08 2021-12-08 Method for realizing OpenGauss database high-availability cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111495712.1A CN114185738A (en) 2021-12-08 2021-12-08 Method for realizing OpenGauss database high-availability cluster

Publications (1)

Publication Number Publication Date
CN114185738A (en) 2022-03-15

Family

ID=80603946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111495712.1A Pending CN114185738A (en) 2021-12-08 2021-12-08 Method for realizing OpenGauss database high-availability cluster

Country Status (1)

Country Link
CN (1) CN114185738A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269274A (en) * 2022-08-04 2022-11-01 广州鼎甲计算机科技有限公司 Data recovery method, apparatus, computer device, storage medium, and program product

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269274A (en) * 2022-08-04 2022-11-01 广州鼎甲计算机科技有限公司 Data recovery method, apparatus, computer device, storage medium, and program product
CN115269274B (en) * 2022-08-04 2023-09-29 广州鼎甲计算机科技有限公司 Data recovery method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US7802128B2 (en) Method to avoid continuous application failovers in a cluster
WO2017177941A1 (en) Active/standby database switching method and apparatus
US9785521B2 (en) Fault tolerant architecture for distributed computing systems
US8108733B2 (en) Monitoring distributed software health and membership in a compute cluster
WO2021103499A1 (en) Multi-active data center-based traffic switching method and device
JP2001188684A (en) System and method for selective rejuvenation on transparent time base
CN113726553A (en) Node fault recovery method and device, electronic equipment and readable storage medium
CN111880906A (en) Virtual machine high-availability management method, system and storage medium
CN109144789A (en) A kind of method, apparatus and system for restarting OSD
US7373542B2 (en) Automatic startup of a cluster system after occurrence of a recoverable error
CN114185738A (en) Method for realizing OpenGauss database high-availability cluster
CN116781488A (en) Database high availability implementation method, device, database architecture, equipment and product
CN111917576B (en) Storage cluster control method and device, computer readable storage medium and processor
CN118018463A (en) Fault processing method, device, equipment and readable storage medium
CN117851099A (en) Fault processing method of k8s cluster, computing equipment and cluster
CN111897626A (en) Cloud computing scene-oriented virtual machine high-reliability system and implementation method
CN109104314B (en) Method and device for modifying log configuration file
CN115964142A (en) Application service management method, device and storage medium
CN115373896A (en) Replica data recovery method and system based on distributed block storage
JP4485560B2 (en) Computer system and system management program
CN111831489A (en) Sentinel mechanism-based MySQL fault switching method and device
CN111124757A (en) Data node heartbeat detection algorithm of distributed transaction database
US8713359B1 (en) Autonomous primary-mirror synchronized reset
JP2015106226A (en) Dual system
US20240219986A1 (en) Multi-node system and power supply control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination