CN115001950A - Database cluster fault processing method, storage medium and equipment - Google Patents

Database cluster fault processing method, storage medium and equipment

Info

Publication number
CN115001950A
CN115001950A
Authority
CN
China
Prior art keywords
database
cluster
response
database cluster
gateway
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210594391.9A
Other languages
Chinese (zh)
Inventor
郭道兵
李翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingbase Information Technologies Co Ltd
Original Assignee
Beijing Kingbase Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingbase Information Technologies Co Ltd
Priority to CN202210594391.9A
Publication of CN115001950A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0659: Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L 41/0661: Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0663: Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0805: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L 43/0811: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity

Abstract

The invention provides a fault handling method for a database cluster, a storage medium, and a device. The fault handling method comprises the following steps: acquiring an abnormal event occurring in the database cluster where the current database is located; confirming the active/standby state of the current database; acquiring the connectivity state between the current database and a trusted gateway; and configuring a virtual IP according to the connectivity state and the active/standby state, wherein the virtual IP is the external connection address of the database cluster. By configuring the virtual IP, the invention ensures that transparent application failover is always achieved, so that application connections and read/write requests remain normal.

Description

Database cluster fault processing method, storage medium and equipment
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method, a storage medium, and a device for handling a failure of a database cluster.
Background
Clustering is a relatively recent technology through which comparatively high performance, reliability, and flexibility can be obtained at low cost, and task scheduling is a core technology in a cluster system. A cluster is a group of mutually independent computers interconnected by a high-speed network; they form a group and are managed as a single system. A client interacts with the cluster as if it were a single server. Clusters are configured to improve availability and scalability.
In a primary/standby database cluster, connections that serve application read/write requests can only be provided by the primary database; the standby database usually provides no external connections, or only read-only query connections for reporting systems. When a primary/standby switchover occurs (especially in the case of failover), the application side may be unaware of the switchover, and its connection configuration remains as it was before the switchover, so application read/write requests fail. To reduce the impact of switchover on application connections, the primary/standby cluster usually needs to support Transparent Application Failover (TAF).
A primary/standby cluster is considered to support transparent application failover if, after a primary/standby switchover, application connections follow the switchover, so that database connections remain available and read/write requests are served normally. Transparent application failover is typically implemented by configuring a connection string containing multiple IP addresses that are tried in turn.
Fig. 1 is a diagram of a prior-art architecture for implementing transparent application failover. In this conventional mechanism, a Net Manager service is statically configured in JDBC (Java Database Connectivity) together with a connection string containing multiple IP addresses (for example, using the LOAD_BALANCE and FAILOVER parameters). When LOAD_BALANCE is set to OFF and FAILOVER is enabled, the multiple IP addresses are tried in order and the iteration stops as soon as an address that can be connected is found, so that read/write requests stay on the database service that is reachable. In this way the basic capability of transparent application failover in a primary/standby cluster is achieved.
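To make the round-robin behaviour concrete, the following minimal sketch (not the Net Manager implementation itself; the addresses, port, and timeout are hypothetical placeholders) tries each configured address in order and keeps the first one that accepts a TCP connection, which is the effect the multi-address configuration described above is meant to produce:

```python
import socket

# Hypothetical cluster addresses as they might appear in a multi-address
# connection string: intended primary first, then the standby node(s).
CANDIDATE_ADDRESSES = [("192.0.2.10", 5432), ("192.0.2.11", 5432)]

def connect_first_available(addresses, timeout=3.0):
    """Try each address in order and return the first TCP connection that
    succeeds, mimicking the multi-IP iteration of the prior-art scheme."""
    for host, port in addresses:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return host, port, sock      # stop iterating at the first success
        except OSError:
            continue                     # address unreachable: try the next one
    raise ConnectionError("no configured database address is reachable")

if __name__ == "__main__":
    host, port, sock = connect_first_available(CANDIDATE_ADDRESSES)
    print(f"connected to {host}:{port}")
    sock.close()
```

Note that this iteration only checks reachability; as explained below, if the original primary rejoins the cluster as a read-only standby at its old address, the first reachable address may still lead write requests to the wrong node.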
The prerequisite for this multi-IP iteration is that the Net Manager service component is available, and not every customer purchases a database product that provides it. In addition, after a primary/standby failover handled by this scheme, when the original primary database rejoins the cluster with its original IP address or hostname as a standby (read-only) database, both its instance and its listener service are normal; the multi-IP connection string is still traversed from top to bottom according to the original logic, so write requests may be sent to the read-only node and fail. Manual intervention is then required (for example, reordering the connection string so that the IP address of the new primary node is at the top, or switching the node back to the primary role). It can thus be seen that this scheme, when used in a primary/standby cluster, does not truly achieve transparent application failover.
In view of the above, it is desirable that Transparent Application Failover is still achieved when, after a primary/standby failover or switchover, the original primary database rejoins the cluster as a standby (read-only) database, so that application connections and read/write requests remain normal.
Disclosure of Invention
An object of the present invention is to provide a method, a storage medium, and a device for handling a failure of a database cluster, which can solve any of the above problems.
It is a further object of the present invention to prevent application connectivity anomalies.
It is another further object of the present invention to prevent split-brain.
In particular, the invention provides a fault handling method for a database cluster, comprising the following steps:
acquiring an abnormal event occurring in the database cluster where the current database is located;
confirming the active/standby state of the current database;
acquiring the connectivity state between the current database and a trusted gateway;
and configuring a virtual IP according to the connectivity state and the active/standby state, wherein the virtual IP is the external connection address of the database cluster.
Optionally, the step of configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
in the case that the current database is the primary database, if the connectivity state is normal, keeping the virtual IP on the current database;
and if the connectivity state is abnormal, deleting the virtual IP.
Optionally, the abnormal event comprises:
discovering that another database in the database cluster has failed; and/or
discovering that another database in the database cluster cannot perform data synchronization with the current database.
Optionally, configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
in the case that the current database is a standby database, if the connectivity state is normal, promoting the current database to be the primary database, and adding and starting the virtual IP;
and if the connectivity state is abnormal, demoting the current database to an abnormal mode and attempting to remove the current database from the cluster.
Optionally, the step of acquiring the connectivity state between the current database and the trusted gateway further comprises:
sending a probe message to the trusted gateway of the database cluster, and confirming the connectivity state between the current database and the trusted gateway according to the response of the trusted gateway.
Optionally, the response of the trusted gateway comprises a normal response and a fault response, the fault response comprising an error response, a timeout response, and no response; and
the step of confirming the connectivity state between the current database and the trusted gateway according to the response of the trusted gateway comprises:
determining that the connectivity state is normal when the response of the trusted gateway is a normal response;
and determining that the connectivity state is abnormal when the response of the trusted gateway is a fault response.
Optionally, the trusted gateway is a gateway device of the network segment where the database cluster is located.
Optionally, the virtual IP is an IP address in the same network segment as the database cluster.
According to another aspect of the present invention, there is also provided a machine-readable storage medium having stored thereon a machine-executable program which, when executed by a processor, implements the database cluster fault handling method of any one of the above.
According to another aspect of the present invention, there is also provided a computer device comprising a memory, a processor, and a machine-executable program stored on the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements the database cluster fault handling method of any one of the above.
The database cluster fault handling method of the invention acquires an abnormal event occurring in the database cluster where the current database is located; confirms the active/standby state of the current database; acquires the connectivity state between the current database and the trusted gateway; and configures a virtual IP according to the connectivity state and the active/standby state, the virtual IP being the external connection address of the database cluster. By configuring the virtual IP, the invention ensures that transparent application failover is always achieved, so that application connections and read/write requests remain normal.
Furthermore, the database cluster fault handling method introduces the concept of a trusted gateway. By sending a probe message to the trusted gateway of the database cluster, the current database can determine its connectivity state with the trusted gateway from the gateway's response and thereby judge its own network state, so that corresponding measures can be taken to prevent split-brain.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the invention will be described in detail hereinafter by way of example and not by way of limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a diagram of an architecture for implementing transparent application failover in the prior art;
FIG. 2 is a schematic diagram of a data interaction process between a user side and a database cluster of the database cluster fault handling method according to an embodiment of the present invention;
FIG. 3 is a schematic architecture diagram of a database cluster of a method of failure handling of the database cluster according to one embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of a method of fault handling for a database cluster of one embodiment of the present invention;
FIG. 5 is a schematic flow diagram of a method for fault handling for a database cluster in the case where the current database is the master database, according to one embodiment of the present invention;
FIG. 6 is a schematic flow diagram of a method for failure handling of a database cluster in the case where a current database is a standby database, according to one embodiment of the present invention;
FIG. 7 is a schematic diagram of a machine-readable storage medium according to one embodiment of the invention; and
FIG. 8 is a schematic diagram of a computer device according to one embodiment of the invention.
Detailed Description
In a primary/standby database cluster, connections that serve application read/write requests can only be provided by the primary database; the standby database usually provides no external connections, or only read-only query connections for reporting systems. When a primary/standby switchover occurs (the switchover is not necessarily planned, especially in the case of failover), the application side may be unaware of the switchover, and its connection configuration remains as it was before the switchover, so application read/write requests fail. To reduce the impact of switchover on application connections, the primary/standby cluster usually needs to support Transparent Application Failover (TAF).
A primary/standby cluster is considered to support transparent application failover if, after a primary/standby switchover, application connections follow the switchover, so that database connections remain available and read/write requests are served normally. Transparent application failover is typically implemented by configuring a connection string containing multiple IP addresses that are tried in turn.
The prerequisite for this multi-IP iteration is that a Net Manager (network application manager) service component is available, and not all customers purchase database products that provide it. In addition, after a primary/standby failover handled by this scheme, when the original primary database rejoins the cluster with its original IP address or hostname as a standby (read-only) database, both its instance and its listener service are normal; the multi-IP connection string is still traversed from top to bottom according to the original logic, so write requests may be sent to the read-only node and fail. Manual intervention is then required (for example, reordering the connection string so that the IP address of the new primary node is at the top, or switching the node back to the primary role). It can be seen that this scheme, when used in a primary/standby cluster, does not truly achieve transparent application failover.
In view of the above, it is required that transparent application failover is still achieved when, after a primary/standby failover, the original primary database rejoins the cluster as a standby (read-only) database, so that application connections and read/write requests remain normal.
To solve the problem of transparent failover in the above primary/standby cluster architecture, the concepts of a virtual IP and a trusted gateway are introduced. When the primary/standby switchover trigger condition is met, before the qualifying standby database is promoted to primary, the virtual IP can be dynamically migrated to the new primary database according to the switchover conditions, while the virtual IP address on the failed node or on the original primary database is unconfigured (deleted). As a result, even if a node later rejoins the database cluster (as a standby database), it provides no connection service externally because it carries no virtual IP, which avoids the problem that a write request reaches a standby database because multiple addresses are tried in order.
Fig. 2 is a schematic diagram of the data interaction process between the user side and the database cluster in the database cluster fault handling method according to an embodiment of the present invention. The data interaction involves the user side 400, the database cluster, and the virtual IP. The virtual IP is an IP address in the same network segment as the database cluster and serves as the external connection address of the cluster: an IP address in the same network segment is added on the primary database server (on the same network card as the public IP) as a virtual address for external connections. The database cluster comprises a primary database server 100 and a standby database server 200. The cluster may consist of one primary database and one standby database, or one primary database and multiple standby databases.
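As an illustration of how such a virtual IP is typically attached to and removed from the primary server's network card on Linux, the following sketch uses the iproute2 `ip` command; the interface name and the address are placeholder assumptions, and the patent does not prescribe this particular mechanism:

```python
import subprocess

VIP_CIDR = "192.0.2.100/24"   # hypothetical virtual IP in the cluster's network segment
INTERFACE = "eth0"            # hypothetical NIC that also carries the node's public IP

def add_virtual_ip():
    """Attach the virtual IP to the local interface (run on the primary node)."""
    subprocess.run(["ip", "addr", "add", VIP_CIDR, "dev", INTERFACE], check=True)

def delete_virtual_ip():
    """Remove the virtual IP from the local interface (run on a failed or demoted node)."""
    subprocess.run(["ip", "addr", "del", VIP_CIDR, "dev", INTERFACE], check=True)
```

In practice a gratuitous ARP announcement is usually sent after the address moves so that clients and switches learn its new location quickly.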
Fig. 3 is a schematic architecture diagram of the database cluster in the fault handling method according to an embodiment of the present invention. The architecture comprises a primary database server 100, a standby database server 200, and a trusted gateway 300. The trusted gateway 300 is a gateway device of the network segment where the database cluster is located, and may be a router or a switch. The database cluster first designates a gateway device of its network segment as the trusted gateway 300, and both the primary and standby databases interact with it. In subsequent checks, once a database finds that it has lost connectivity to the trusted gateway 300, it can be regarded as having lost connectivity to all other devices in the network at the same time. The trusted gateway is a design that ensures the virtual IP address is always present only on the primary database node.
The database cluster uses an existing device in the local network segment as the trusted gateway 300. After the IP address or hostname of the trusted gateway 300 is provided, every database in the cluster sends ICMP (Internet Control Message Protocol) messages to the trusted gateway 300 using ping and judges its connectivity state with the trusted gateway 300 from the returned messages. If ping returns a normal reply, the trusted gateway 300 responds normally. If ping returns an error message, the trusted gateway 300 responds with an error. If ping returns a timeout message, the trusted gateway 300 responds with a timeout. If ping times out without receiving any message, the trusted gateway 300 gives no response. When the response of the trusted gateway 300 is a normal response, the connectivity state is determined to be normal; when the response of the trusted gateway 300 is an abnormal response, the connectivity state is determined to be abnormal.
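The probe-and-classify behaviour described above can be sketched as follows. This is an illustrative outline only: it shells out to the Linux iputils `ping` command with a single echo request, whereas the actual cluster software may use raw ICMP sockets or repeated probes, and the output matching is an assumption:

```python
import subprocess

def probe_trusted_gateway(gateway_ip: str, timeout_s: int = 2) -> str:
    """Send one ICMP echo request to the trusted gateway and classify the
    result into the response categories used in the description."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), gateway_ip],
            capture_output=True, text=True, timeout=timeout_s + 1,
        )
    except subprocess.TimeoutExpired:
        return "no_response"              # ping itself produced nothing in time
    if result.returncode == 0:
        return "normal"                   # normal reply received
    if "Unreachable" in result.stdout or result.stderr:
        return "error"                    # an error message came back
    return "timeout"                      # no reply before the deadline

def connectivity_is_normal(gateway_ip: str) -> bool:
    """Map the gateway response onto the two connectivity states of the method."""
    return probe_trusted_gateway(gateway_ip) == "normal"
```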
FIG. 4 is a schematic flow chart of the database cluster fault handling method according to an embodiment of the present invention. The method comprises the following steps:
step S202, abnormal events occurring in the database cluster where the current database is located are obtained. The abnormal events comprise the discovery of the fault of any other database in the database cluster and the failure of data synchronization between any other database in the database cluster and the current database. The current database is a database within a database cluster. The current database can be a main database or a standby database.
Step S204: confirm the active/standby state of the current database, i.e., whether the current database is currently the primary database or a standby database.
Step S206: acquire the connectivity state between the current database and the trusted gateway. Step S206 may include: sending a probe message to the trusted gateway of the database cluster, and confirming the connectivity state between the current database and the trusted gateway according to the response of the trusted gateway.
The trusted gateway may be a gateway device of the network segment where the database cluster is located, for example a router or a switch; the device needs no modification, only its IP address. The response of the trusted gateway may include a normal response and a fault response.
Fault responses include error responses, timeout responses, and no response. The database cluster uses an existing device in the local network segment as the trusted gateway; after its IP address or hostname is provided, every database in the cluster sends ICMP messages to the trusted gateway using ping and judges its connectivity state with the trusted gateway from the returned messages. If ping returns a normal reply, the trusted gateway responds normally. If ping returns an error message, the trusted gateway responds with an error. If ping returns a timeout message, the trusted gateway responds with a timeout. If ping times out without receiving any message, the trusted gateway gives no response. When the response of the trusted gateway is a normal response, the connectivity state is determined to be normal; when the response is an abnormal response, the connectivity state is determined to be abnormal.
Step S208: configure the virtual IP according to the connectivity state and the active/standby state. In other embodiments, even when no abnormal event is detected in the current database, a network check is performed at preset time intervals to determine whether the local network has a problem and whether other databases have lost connectivity, so as to avoid misjudging a failure.
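For the periodic check mentioned above, a minimal sketch could simply re-probe the trusted gateway on a fixed interval (the gateway address and interval are hypothetical; probing by shelling out to `ping` is an assumption, as in the earlier sketch):

```python
import subprocess
import time

GATEWAY_IP = "192.0.2.1"   # hypothetical trusted-gateway address
CHECK_INTERVAL_S = 5       # hypothetical preset check interval

def gateway_reachable(ip: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo request to the gateway succeeds."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), ip],
                            capture_output=True)
    return result.returncode == 0

def periodic_network_check() -> None:
    """Re-check the local network at preset intervals even when no abnormal
    event has been reported, so that a later failure is not misjudged."""
    while True:
        if not gateway_reachable(GATEWAY_IP):
            print("trusted gateway unreachable: the local network may be at fault")
        time.sleep(CHECK_INTERVAL_S)
```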
The database cluster fault handling method acquires an abnormal event occurring in the database cluster where the current database is located; confirms the active/standby state of the current database; acquires the connectivity state between the current database and the trusted gateway; and configures the virtual IP according to the connectivity state and the active/standby state, the virtual IP being the external connection address of the database cluster. By configuring the virtual IP, the method prevents abnormal application connectivity compared with a conventional primary/standby cluster, always achieves transparent application failover, and keeps application connections and read/write requests normal.
Fig. 5 is a schematic flowchart of the database cluster fault handling method in the case where the current database is the primary database, according to an embodiment of the present invention. In this embodiment, the step of configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
step S302, the current database is confirmed to be a main database.
And step S304, confirming the communication state of the current database and the trust gateway. If the connection state is normal connection, executing step S306; if the connection status is abnormal, step S308 is executed.
And S306, maintaining the virtual IP of the current database, and kicking the database with the abnormal event out of the cluster.
Step S308, closing the current database and deleting the virtual IP.
When data synchronization between the current database (primary) and another database (standby) is interrupted, the standby database is preliminarily judged to have failed, and trusted gateway probing is performed at the same time. If the trusted gateway cannot be reached, the local network is judged to be abnormal: the primary database is shut down and the virtual IP address is deleted.
When data synchronization between the current database (primary) and another database (standby) is interrupted, the standby database is preliminarily judged to have failed, and trusted gateway probing is performed at the same time. If the trusted gateway is reachable, the local network is normal: no state change is made, the virtual IP keeps running on the current database, and the cluster attempts to remove the failed standby database.
Fig. 6 is a schematic flowchart of the database cluster fault handling method in the case where the current database is a standby database, according to an embodiment of the present invention. In this embodiment, the step of configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
step S402, the current database is confirmed to be a standby database.
And step S404, confirming the communication state of the current database and the trust gateway. If the connection state is normal connection, go to step S406; if the connection status is abnormal, step S408 is executed.
Step S406, the current database is promoted to be a main database, and a virtual IP is added and started.
And step S408, degrading the current database into an abnormal mode, and trying to kick the current database out of the database cluster.
When data synchronization between the current database (standby) and another database (primary) is interrupted, the primary database is preliminarily judged to have failed, and trusted gateway probing is performed at the same time. If the trusted gateway is reachable, the local network is normal: automatic failover is executed, the virtual IP is added and started, and the current database (standby) is promoted to be the primary database.
When data synchronization between the current database (standby) and another database (primary) is interrupted, the primary database is preliminarily judged to have failed, and trusted gateway probing is performed at the same time. If the trusted gateway cannot be reached, the local network is judged to be abnormal and automatic failover cannot be performed. The current database (standby) is demoted to an abnormal mode and the cluster attempts to remove it; the database is merely kept running and performs no further fault handling. In this case the virtual IP remains running on the other database (primary) unchanged.
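Putting the branches of FIG. 5 and FIG. 6 together, the per-node decision can be condensed into the following sketch. It is an illustrative summary of steps S302 to S308 and S402 to S408 rather than code from the patent; the returned action names are hypothetical labels for the operations described above:

```python
def decide_actions(is_primary: bool, gateway_reachable: bool) -> list[str]:
    """Return the actions a node should take after data synchronization with
    its peer is interrupted, given its role and the trusted-gateway check."""
    if is_primary:
        if gateway_reachable:
            # Local network is fine, so the standby is the failed party.
            return ["keep_virtual_ip", "remove_failed_standby_from_cluster"]
        # The primary itself is isolated: release the VIP to avoid split-brain.
        return ["shut_down_database", "delete_virtual_ip"]
    if gateway_reachable:
        # The primary is unreachable but the network is fine: fail over.
        return ["promote_to_primary", "add_and_start_virtual_ip"]
    # An isolated standby must not promote itself.
    return ["demote_to_abnormal_mode", "leave_cluster"]

if __name__ == "__main__":
    for is_primary in (True, False):
        for reachable in (True, False):
            print(is_primary, reachable, "->", decide_actions(is_primary, reachable))
```

Because the isolated side always gives up (or never acquires) the virtual IP, at most one node in the cluster carries the external connection address at any time.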
Table 1 compares the two schemes and shows the effect of the database cluster fault handling method of this embodiment (scheme II). Scheme I: automatic primary/standby switchover, with application switching achieved by trying multiple IP addresses in turn. Scheme II: a primary/standby database cluster that introduces a trusted gateway and switches the virtual IP automatically. The comparison in the table takes a cluster with one primary database and one standby database as an example.
TABLE 1
[Table 1 is provided as images in the original publication (Figure BDA0003667177260000081 and Figure BDA0003667177260000091) and is not reproduced here.]
As can be seen from Table 1, compared with scheme I, the database cluster fault handling method of this embodiment (scheme II) avoids split-brain when the primary or standby database has a network problem, and prevents application connectivity (read/write requests) from becoming abnormal.
That is, compared with a conventional primary/standby cluster, the database cluster fault handling method of this embodiment prevents split-brain and at the same time keeps application connectivity (read/write requests) normal, without requiring any additional external service or device (the trusted gateway is an existing device in the network segment).
The embodiment also provides a machine-readable storage medium and a computer device. Fig. 7 is a schematic diagram of a machine-readable storage medium according to an embodiment of the present invention, and fig. 8 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
The machine-readable storage medium 40 has stored thereon a machine-executable program 41, the machine-executable program 41 when executed by a processor implementing the method of fault handling for a database cluster of any of the embodiments described above.
The computer device 50 may comprise a memory 520, a processor 510 and a machine executable program 41 stored on the memory 520 and running on the processor 510, and the processor 510 implements the method of failure handling of a database cluster of any of the embodiments described above when executing the machine executable program 41.
It should be noted that the logic and/or steps shown in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any machine-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
For the purposes of this description, a machine-readable storage medium 40 can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium 40 may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system.
The computer device 50 may be, for example, a server, a desktop computer, a notebook computer, a tablet computer, or a smartphone. In some examples, computer device 50 may be a cloud computing node. Computer device 50 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer device 50 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
The computer device 50 may include a processor 510 adapted to execute stored instructions, a memory 520 providing temporary storage for the operation of the instructions during operation. Processor 510 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. Memory 520 may include Random Access Memory (RAM), read only memory, flash memory, or any other suitable storage system.
The processor 510 may also be linked through a system interconnect to a display interface suitable for connecting the computer device 50 to a display device. The display device may include a display screen as a built-in component of the computer device 50. The display device may also include a computer monitor, television, or projector, etc. externally connected to the computer device 50. In addition, a Network Interface Controller (NIC) may be adapted to connect computer device 50 to a network via a system interconnect. In some embodiments, the NIC may use any suitable interface or protocol (such as an internet small computer system interface, etc.) to transfer data. The network may be a cellular network, a radio network, a Wide Area Network (WAN)), a Local Area Network (LAN), the internet, or the like. The remote device may be connected to the computing device through a network.
The flowcharts provided by this embodiment are not intended to indicate that the operations of the method are to be performed in any particular order, or that all the operations of the method are included in each case. Further, the method may include additional operations. Additional variations on the above-described method are possible within the scope of the technical ideas provided by the method of this embodiment.
Thus, it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications consistent with the principles of the invention may be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.

Claims (10)

1. A fault handling method for a database cluster, comprising the following steps:
acquiring an abnormal event occurring in the database cluster where the current database is located;
confirming the active/standby state of the current database;
acquiring the connectivity state between the current database and a trusted gateway;
and configuring a virtual IP according to the connectivity state and the active/standby state, wherein the virtual IP is the external connection address of the database cluster.
2. The database cluster fault handling method according to claim 1, wherein the step of configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
in the case that the current database is the primary database, if the connectivity state is normal, keeping the virtual IP on the current database;
and if the connectivity state is abnormal, deleting the virtual IP.
3. The database cluster fault handling method according to claim 1, wherein the abnormal event comprises:
discovering that another database in the database cluster has failed; and/or
discovering that another database in the database cluster cannot perform data synchronization with the current database.
4. The database cluster fault handling method according to claim 3, wherein configuring the virtual IP according to the connectivity state and the active/standby state further comprises:
in the case that the current database is a standby database, if the connectivity state is normal, promoting the current database to be the primary database, and adding and starting the virtual IP;
and if the connectivity state is abnormal, demoting the current database to an abnormal mode and attempting to remove the current database from the cluster.
5. The database cluster fault handling method according to claim 1, wherein the step of acquiring the connectivity state between the current database and the trusted gateway further comprises:
sending a probe message to the trusted gateway of the database cluster, and confirming the connectivity state between the current database and the trusted gateway according to the response of the trusted gateway.
6. The database cluster fault handling method according to claim 5, wherein the response of the trusted gateway comprises a normal response and a fault response, the fault response comprising an error response, a timeout response, and no response; and
the step of confirming the connectivity state between the current database and the trusted gateway according to the response of the trusted gateway comprises:
determining that the connectivity state is normal when the response of the trusted gateway is a normal response;
and determining that the connectivity state is abnormal when the response of the trusted gateway is a fault response.
7. The database cluster fault handling method according to claim 1, wherein
the trusted gateway is a gateway device of the network segment where the database cluster is located.
8. The database cluster fault handling method according to claim 1, wherein
the virtual IP is an IP address in the same network segment as the database cluster.
9. A machine-readable storage medium having stored thereon a machine-executable program which, when executed by a processor, implements the database cluster fault handling method according to any one of claims 1 to 8.
10. A computer device comprising a memory, a processor, and a machine-executable program stored on the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements the database cluster fault handling method according to any one of claims 1 to 8.
CN202210594391.9A 2022-05-27 2022-05-27 Database cluster fault processing method, storage medium and equipment Pending CN115001950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210594391.9A CN115001950A (en) 2022-05-27 2022-05-27 Database cluster fault processing method, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210594391.9A CN115001950A (en) 2022-05-27 2022-05-27 Database cluster fault processing method, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN115001950A true CN115001950A (en) 2022-09-02

Family

ID=83029205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210594391.9A Pending CN115001950A (en) 2022-05-27 2022-05-27 Database cluster fault processing method, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115001950A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357800A (en) * 2017-05-18 2017-11-17 杭州沃趣科技股份有限公司 A kind of database High Availabitity zero loses solution method
CN107391633A (en) * 2017-06-30 2017-11-24 北京奇虎科技有限公司 Data-base cluster Automatic Optimal processing method, device and server
CN113360579A (en) * 2021-06-30 2021-09-07 平安普惠企业管理有限公司 Database high-availability processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US7225356B2 (en) System for managing operational failure occurrences in processing devices
US7076691B1 (en) Robust indication processing failure mode handling
JP5860497B2 (en) Failover and recovery for replicated data instances
US20080205286A1 (en) Test system using local loop to establish connection to baseboard management control and method therefor
JP2006114040A (en) Failover scope for node of computer cluster
CN107666493B (en) Database configuration method and equipment thereof
CN104503965A (en) High-elasticity high availability and load balancing realization method of PostgreSQL (Structured Query Language)
US20150256622A1 (en) Connection management device, communication system, connection management method, and computer program product
US11403319B2 (en) High-availability network device database synchronization
EP3648405B1 (en) System and method to create a highly available quorum for clustered solutions
EP2597818A1 (en) Cluster management system and method
US7499987B2 (en) Deterministically electing an active node
CN111651320A (en) High-concurrency connection method and system
US8671180B2 (en) Method and system for generic application liveliness monitoring for business resiliency
CN110351122B (en) Disaster recovery method, device, system and electronic equipment
CN114840495A (en) Database cluster split-brain prevention method, storage medium and device
CN115426258B (en) Information configuration method, device, switch and readable storage medium
US10367711B2 (en) Protecting virtual computing instances from network failures
CN113596195B (en) Public IP address management method, device, main node and storage medium
CN115001950A (en) Database cluster fault processing method, storage medium and equipment
JP2002344450A (en) High availability processing method, and executing system and processing program thereof
CN114697191A (en) Resource migration method, device, equipment and storage medium
JP2015114952A (en) Network system, monitoring control unit, and software verification method
CN114598643B (en) Data backup method and device
CN115499296B (en) Cloud desktop hot standby management method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination