CN117389772A - Fault processing method and computing device of database system - Google Patents

Fault processing method and computing device of database system

Info

Publication number
CN117389772A
CN117389772A (application number CN202311174554.9A)
Authority
CN
China
Prior art keywords
node
slave
master node
master
period
Prior art date
Legal status
Pending
Application number
CN202311174554.9A
Other languages
Chinese (zh)
Inventor
胡义成
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by XFusion Digital Technologies Co Ltd filed Critical XFusion Digital Technologies Co Ltd
Priority to CN202311174554.9A priority Critical patent/CN117389772A/en
Publication of CN117389772A publication Critical patent/CN117389772A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2358Change logging, detection, and notification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Abstract

The application provides a fault handling method for a database system and a computing device. In an embodiment, the database system includes a master node, a plurality of slave nodes, and a cluster manager, and the method is applied to the cluster manager. The method includes: when the master node of the i-th term fails, demoting the master node of the i-th term to a slave node and sending a first lock message to the slave nodes, the first lock message being used to disconnect the slave nodes from the master node of the i-th term; and, when the number of slave nodes in the first locked state (slave nodes disconnected from the master node of the i-th term) is greater than or equal to a preset threshold, determining the master node of the (i+1)-th term from among the slave nodes in the first locked state based on their terms and the sequence numbers of their persistently stored write-ahead logs. Thus each slave node can participate in master selection regardless of whether its WAL log has been replayed, which reduces the RTO to a certain extent.

Description

Fault processing method and computing device of database system
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a method and a computing device for processing a fault of a database system.
Background
A database cluster is a system made up of a plurality of database nodes, used to handle large numbers of data requests and to support high availability. One of the database nodes is typically selected as the master node, and the other database nodes are called slave nodes or standby nodes; a slave node receives data copies from the master node to provide data redundancy and high availability.
Cluster management software is used to manage the database cluster. When the cluster management software detects that the master node of the database cluster has failed and cannot provide service, it automatically initiates arbitration and selects one of the available nodes as the new master node to continue providing service. The period from the moment service stops because of the database outage until service is restored is the recovery time objective (RTO). RTO is an important index for measuring the high availability of a database: the smaller the RTO, the higher the availability.
When the cluster management software selects a master node from the available nodes, it controls all available nodes to disconnect from the old master and to finish replaying their write-ahead logs (WAL), which may lengthen the RTO and affect the service.
The information disclosed in this background section is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The embodiments of the present application provide a fault handling method for a database system and a computing device. For each slave node, the slave node can participate in master selection without considering whether its WAL log has been replayed, which reduces the RTO to a certain extent.
In a first aspect, an embodiment of the present application provides a fault handling method for a database system, where the database system includes a master node, a plurality of slave nodes (for example, N slave nodes), and a cluster manager. The method is applied to the cluster manager and includes:
when the master node of the i-th term fails, demoting the master node of the i-th term to a slave node and sending a first lock message to the slave nodes, the first lock message being used to disconnect the slave nodes from the master node of the i-th term; and, when the number of slave nodes in the first locked state is greater than or equal to a preset threshold (which may be half of the slave nodes or more than half of the slave nodes, determined according to actual requirements), determining the master node of the (i+1)-th term from among the slave nodes in the first locked state based on their terms and the sequence numbers of their persistently stored write-ahead logs; a slave node in the first locked state is a slave node that has been disconnected from the master node of the i-th term.
With this scheme, each slave node can participate in master selection regardless of whether its WAL log has been replayed, which reduces the RTO to a certain extent. A sketch of this decision logic is shown below.
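A minimal sketch of the selection logic, assuming a simple in-memory representation of node state (SlaveStatus, pick_new_master, and the threshold handling are illustrative names, not the patent's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlaveStatus:
    node_id: str
    term: int                 # term reported by the node
    flushed_lsn: int          # LSN of the last persistently stored WAL record
    lock1_ok: bool            # True once the node confirmed the first locked state
    faulty: bool = False

def pick_new_master(slaves: list[SlaveStatus], threshold: int) -> Optional[SlaveStatus]:
    """Select the master of term i+1 from slaves in the first locked state.

    Candidates only need to have *persisted* their WAL; whether the WAL has
    been replayed is not considered, which is the point of the scheme.
    """
    locked = [s for s in slaves if s.lock1_ok and not s.faulty]
    if len(locked) < threshold:            # not enough locked slaves yet
        return None
    # Prefer the largest term; break ties with the largest flushed LSN.
    return max(locked, key=lambda s: (s.term, s.flushed_lsn))
```

With a threshold of, say, more than half of the slave nodes, the cluster manager would call pick_new_master each time a lock1 acknowledgement arrives and promote the returned node once one is available.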
In one possible implementation, the method further includes: sending detection requests to the master node and the slave nodes of the i-th term at a time interval, so that the master node and the slave nodes of the i-th term send the parameter values of the detection parameters to the cluster manager; the detection request includes the detection parameters, where the detection parameters include: whether the node is faulty, the term, and a sequence number, the sequence number indicating the write-ahead log that has been persisted.
In this scheme, only the persistently stored write-ahead log is considered, not whether the WAL log has been replayed, so the RTO can be reduced to a certain extent.
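As a hedged illustration, the reported status could be modeled as a small record per node (the field names are assumptions made for this sketch, not the patent's wire format):

```python
from dataclasses import dataclass, asdict

@dataclass
class ProbeResult:
    node_id: str
    faulty: bool       # whether the node reports a failure
    term: int          # current term known by the node
    flushed_lsn: int   # largest LSN already persisted (flushed) to disk

# A slave that has flushed WAL up to LSN 4 in term 7 and is healthy:
report = ProbeResult(node_id="dn2", faulty=False, term=7, flushed_lsn=4)
print(asdict(report))   # what would be forwarded to the cluster manager
```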
In one example, the method further includes: providing a first configuration interface, the first configuration interface being used to determine the time interval configured by a user.
In one example, the method further includes: determining network information indicating the network condition between the master node and the slave nodes of the i-th term; and determining the time interval based on the network information. Illustratively, when the network information indicates that there is no network delay between the master node and the slave nodes of the i-th term, the time interval is determined to be a first value; when the network information indicates that there is network delay between the master node and the slave nodes of the i-th term, the time interval is determined to be a second value, where the second value is greater than the first value.
In this scheme, the time interval can be flexibly configured and adapted to scenario requirements, so the RTO can be reduced to a certain extent.
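A minimal sketch of that interval selection, assuming a measured round-trip delay and two illustrative interval values (the probe_interval helper and all constants are assumptions for the example):

```python
def probe_interval(rtt_ms: float,
                   no_delay_interval_s: float = 1.0,
                   delay_interval_s: float = 3.0,
                   delay_threshold_ms: float = 5.0) -> float:
    """Pick the detection interval from the observed master/slave network delay.

    No observable delay -> the smaller first value; otherwise the larger second value.
    """
    return no_delay_interval_s if rtt_ms <= delay_threshold_ms else delay_interval_s

print(probe_interval(0.8))   # -> 1.0 (first value)
print(probe_interval(42.0))  # -> 3.0 (second value)
```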
In one possible implementation, determining the master node of the (i+1)-th term from the slave nodes in the first locked state based on their terms and the sequence numbers of their persistently stored write-ahead logs includes:
determining, from among the non-faulty slave nodes in the first locked state, the slave node with the largest term and the largest sequence number of the persistently stored write-ahead log as the target node; and sending a promotion message to the target node, so that the target node takes itself as the master node of the (i+1)-th term based on the promotion message.
In this scheme, the slave node with the largest term and the largest sequence number of the persistently stored write-ahead log is selected as the master node. Its WAL log can then be regarded as the newest, and the data subsequently obtained by replaying this newest WAL log is consistent with, or closest to, the latest data of the old master, which reduces the impact on the service.
In an exemplary embodiment, when, among the non-faulty slave nodes in the first locked state, the slave nodes with the largest term and the largest sequence number of the persistently stored write-ahead log include the master node of the i-th term, the master node of the i-th term is taken as the target node, which reduces the possibility of a master-standby switchover as much as possible and reduces the impact on the service.
In one possible implementation, the method further includes: sending unlock information to the master node of the (i+1)-th term, so that the master node of the (i+1)-th term waits to be connected by the slave nodes based on the unlock information; and sending second lock information to the slave nodes, so that the slave nodes establish connections with the master node of the (i+1)-th term based on the second lock information and synchronize their terms after connecting to the master node of the (i+1)-th term.
In one possible implementation, the method further includes: when the failure of the master node of the i-th term is determined, waiting for the failure of the master node of the i-th term to recover according to a waiting duration; and sending the first lock message to the slave nodes includes: sending the first lock message to the slave nodes after the waiting duration has elapsed.
In this scheme, waiting a certain time for the failed master node to recover allows the old master to participate in the subsequent master selection process and, as far as possible, to be used as the new master, which reduces the possibility of a master-standby switchover and the impact on the service.
In one example, the method further includes: providing a second configuration interface, the second configuration interface being used to determine the waiting duration configured by a user.
In one example, the method further includes: determining impact information, the impact information indicating the degree of impact on user services of switching a slave node to the master node; and determining the waiting duration according to the impact information. Illustratively, when the impact information indicates that the impact on user services of switching a slave node to the master node is less than or equal to a first threshold (that is, the impact is small), the waiting duration is determined to be a third value; when the impact information indicates that the impact is greater than or equal to a second threshold (that is, the impact is large), the waiting duration is determined to be a fourth value, where the fourth value is greater than the third value.
In this scheme, the waiting duration can be flexibly configured and adapted to scenario requirements, so the RTO and the possibility of a master-standby switchover can be reduced to a certain extent. A sketch follows.
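A minimal sketch of how the waiting duration might be chosen and applied before issuing lock1, assuming an abstract impact score and illustrative thresholds (all names and constants here are assumptions, not the patent's implementation):

```python
import time

def wait_duration(impact: float, low: float, high: float,
                  short_wait_s: float = 5.0, long_wait_s: float = 30.0) -> float:
    """Small switchover impact -> short wait; large impact -> long wait."""
    if impact <= low:
        return short_wait_s
    if impact >= high:
        return long_wait_s
    return short_wait_s  # middle band: treated like the small-impact case in this sketch

def wait_then_lock1(old_master, slaves, impact: float,
                    low: float = 0.3, high: float = 0.7) -> None:
    """Wait for the old master to recover, then issue lock1 so arbitration can start."""
    deadline = time.monotonic() + wait_duration(impact, low, high)
    while time.monotonic() < deadline and not old_master.recovered():
        time.sleep(0.5)
    for node in [old_master] + list(slaves):   # the old master is demoted and locked as well
        node.send_lock1()
```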
In a second aspect, an embodiment of the present application provides a fault handling apparatus for a database system. The apparatus includes a plurality of modules, each module being configured to execute a corresponding step of the fault handling method provided in the first aspect; the division into modules is not limited here. For the specific functions performed by each module of the apparatus and the corresponding beneficial effects, refer to the steps of the fault handling method provided in the first aspect, which are not repeated here.
Illustratively, the fault handling apparatus of the database system includes:
a processing module, configured to, when the master node of the i-th term fails, demote the master node of the i-th term to a slave node and send a first lock message to the slave nodes, the first lock message being used to disconnect the slave nodes from the master node of the i-th term;
a decision module, configured to, when the number of slave nodes in the first locked state is greater than or equal to a preset threshold, determine the master node of the (i+1)-th term from among the slave nodes in the first locked state based on their terms and the sequence numbers of their persistently stored write-ahead logs; a slave node in the first locked state is a slave node that has been disconnected from the master node of the i-th term.
In one possible implementation, the apparatus further includes:
a detection module, configured to send detection requests to the master node and the slave nodes of the i-th term at a time interval, so that the master node and the slave nodes of the i-th term send the parameter values of the detection parameters to the cluster manager; the detection request includes the detection parameters, where the detection parameters include: whether the node is faulty, the term, and a sequence number indicating the write-ahead log that has been persisted.
In one possible implementation, the apparatus further includes:
a detection interval configuration module, configured to provide a first configuration interface used to determine the time interval configured by a user; or to determine network information indicating the network condition between the master node and the slave nodes of the i-th term and determine the time interval based on the network information. Illustratively, when the network information indicates that there is no network delay between the master node and the slave nodes of the i-th term, the time interval is determined to be a first value; when the network information indicates that there is network delay, the time interval is determined to be a second value, where the second value is greater than the first value.
In one possible implementation, the decision module includes:
a selection unit, configured to determine, from among the non-faulty slave nodes in the first locked state, the slave node with the largest term and the largest sequence number of the persistently stored write-ahead log as the target node;
and a promotion unit, configured to send a promotion message to the target node, so that the target node takes itself as the master node of the (i+1)-th term based on the promotion message.
In one example, the selection unit is configured to, when, among the non-faulty slave nodes in the first locked state, the slave nodes with the largest term and the largest sequence number of the persistently stored write-ahead log include the master node of the i-th term, take the master node of the i-th term as the target node.
In one possible implementation, the apparatus further includes:
an unlocking module, configured to send unlock information to the master node of the (i+1)-th term, so that the master node of the (i+1)-th term waits to be connected by the slave nodes based on the unlock information;
a connection module, configured to send second lock information to the slave nodes, so that the slave nodes establish connections with the master node of the (i+1)-th term based on the second lock information and synchronize their terms after connecting to the master node of the (i+1)-th term.
In one possible implementation, the processing module is further configured to, when the failure of the master node of the i-th term is determined, wait for the failure of the master node of the i-th term to recover according to a waiting duration, and send the first lock message to the slave nodes after the waiting duration has elapsed.
In one example, the apparatus further comprises:
a waiting duration configuration module, configured to provide a second configuration interface used to determine the waiting duration configured by a user; or to determine impact information indicating the degree of impact on user services of switching a slave node to the master node, and determine the waiting duration according to the impact information. Illustratively, when the impact information indicates that the impact is less than or equal to a first threshold (that is, the impact is small), the waiting duration is determined to be a third value; when the impact information indicates that the impact is greater than or equal to a second threshold (that is, the impact is large), the waiting duration is determined to be a fourth value, where the fourth value is greater than the third value.
In a third aspect, an embodiment of the present application provides a fault handling device of a database system, including: at least one memory for storing a program; at least one processor for executing the memory-stored program, the processor being for performing the method provided in the first aspect when the memory-stored program is executed.
In a fourth aspect, embodiments of the present application provide a fault handling apparatus for a database system, wherein the apparatus executes computer program instructions to perform the method provided in the first aspect. The apparatus may be, for example, a chip, or a processor.
In one example, the apparatus may include a processor, which may be coupled to a memory, read instructions in the memory and perform the method provided in the first aspect in accordance with the instructions. The memory may be integrated into the chip or the processor, or may be separate from the chip or the processor.
In a fifth aspect, embodiments of the present application provide a computer storage medium having instructions stored therein, which, when run on a computer, cause the computer to perform the method provided in the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product including instructions which, when run on a computer, cause the computer to perform the method provided in the first aspect.
Drawings
FIG. 1 is a system architecture diagram of a data processing system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a computing device provided in an embodiment of the present application;
FIG. 3a is a schematic diagram of a scenario for pre-write log persistence provided by an embodiment of the present application;
FIG. 3b is a schematic diagram of a scenario of pre-written log playback provided by an embodiment of the present application;
FIG. 3c is a schematic diagram of a scenario in which a master node and a slave node perform pre-write log synchronization according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a scenario of database system fault handling in the related art;
FIG. 5 is a schematic flow chart of a method for processing a failure of a database system according to an embodiment of the present application;
fig. 6a is a schematic diagram of a scenario for detecting a time interval configuration of a database node according to an embodiment of the present application;
FIG. 6b is a schematic diagram of a scenario of a latency configuration provided by an embodiment of the present application;
FIG. 7a is a first schematic diagram of a scenario of the fault handling method of the database system provided in FIG. 5;
FIG. 7b is a second schematic diagram of a scenario of the fault handling method of the database system provided in FIG. 5;
FIG. 8 is a schematic diagram of a fault handling apparatus of the database system provided in FIG. 5;
FIG. 9 is a flowchart of another method for processing a failure of a database system according to an embodiment of the present application;
FIG. 10a is a first schematic diagram of a scenario of the fault handling method of the database system provided in FIG. 9;
FIG. 10b is a second schematic diagram of a scenario of the fault handling method of the database system provided in FIG. 9;
fig. 11 is a schematic structural diagram of a fault handling apparatus of the database system provided in fig. 9.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be described below with reference to the accompanying drawings.
In the description of embodiments of the present application, words such as "exemplary," "such as" or "for example," are used to indicate by way of example, illustration, or description. Any embodiment or design described herein as "exemplary," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, B alone, or both A and B. In addition, unless otherwise indicated, the term "plurality" means two or more; for example, a plurality of systems means two or more systems, and a plurality of terminals 200 means two or more terminals 200.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
First, a data processing system to which the scheme provided in the embodiments of the present application may be applied is described. FIG. 1 is an example architecture diagram of a data processing system provided in an embodiment of the present application. As shown in FIG. 1, the data processing system may include a terminal 200 and a plurality of server nodes. A database (including a plurality of database nodes (DataNodes) 101) and database management software are deployed on the server nodes; the database and its management software may form a database system. One server node may be one computing device 100, or a plurality of server nodes may form one computing device. A database node 101 may be understood as a database-related instance, embodied as a process. One database node 101 may be deployed in one server node, or a plurality of database nodes 101 may be deployed in one server node. The plurality of database nodes 101 may include a plurality of slave nodes (also called standby nodes) and a master node (also called a coordination node or management node). A slave node may hold the same data as the master node, and the master node is used to manage and monitor each slave node; for example, the master node may issue data operation instructions, such as data query instructions, to each slave node to instruct the database node to perform a data query. The database management software may be deployed in one or more server nodes and may include the cluster manager 102. The terminal 200 may communicate (exchange data) with the database, for example over a network. The terminal 200 may run a database client, for example an installed database client or a database client accessed through a browser; the client may be a desktop application, a Web application, a command-line client, or the like. A user may send SQL statements, such as query, modification, or deletion statements, to the database through the database client run by the terminal 200 to query, modify, or delete data. It should be appreciated that the database client may be a software program running in a terminal 200 such as a notebook computer, a desktop computer, or a server.
Embodiments of the present application provide a computing device 100.
In scenario S1 of this embodiment, computing device 100 may be a single-node server, such as a rack server. Illustratively, as shown in fig. 2, the computing device 100 may include a power supply 121 and a motherboard 110, the power supply 121 electrically connected to the motherboard 110 for powering devices on the motherboard 110.
Illustratively, the motherboard includes a central processing unit (CPU) 111, a memory 112 inserted in a memory slot (a volatile memory), a programmable logic device (PLD) 113, a baseboard management controller (BMC) 114, a PCIE slot 115, a network card 122, a hard disk 123, and a fan 124.
The memory 112 serves as an external cache. The memory 112 may be, for example, a random access memory (RAM). By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The PCIE slot 115 is used to add at least one of a GPU card, a network card, a video capture card, an HBA (host bus adapter) card, a RAID (redundant array of independent disks) card, and an SSD (solid state drive), and may also support various other adapter cards.
The programmable logic device 113 may be a complex programmable logic device (CPLD, a digital integrated circuit whose logic function is configured by the user as required), or a field programmable gate array (FPGA).
The hard disk 123 may be a hard disk drive (HDD) or a solid state drive (SSD). It should be understood that the hard disk 123 is merely an example of a nonvolatile memory and is not a particular limitation; the nonvolatile memory may be selected in practical applications according to the actual situation.
It should be noted that FIG. 2 is only an example of the computing device 100 and is not a limitation on the specific implementation; in practical applications the device may include more or fewer components than shown in FIG. 2, such as a single-chip microcomputer (an integrated circuit chip, equivalent to a microcomputer), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, a general-purpose processor, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In scenario S2 of this embodiment, the computing device 100 may be a multi-node server, which may be a blade server, a high-density server, or a full-cabinet server (an independent product formed by integrating the original rack with the machine, for example an architecture in which server nodes are disaggregated). The multi-node server includes a server chassis and a plurality of server nodes, where each server node has the complete functions of a server and may in some cases be called a computing node. For the structure of a server node, refer to the motherboard 110 in FIG. 2; it may include more or fewer devices than shown in FIG. 2.
For each database node 101, the database node 101 contains a data file (the carrier that holds the data stored in the database, such as tables and indexes) and a log file (which records a plurality of write-ahead logs (WAL) in order). If the data file in the database node 101 is to be modified, first the modification operations on the data file need to be recorded in the WAL log, and the WAL log describing these modification operations is persisted, specifically by flushing it into the log file on a nonvolatile memory such as the hard disk 123; then the data file is modified and persisted. When a fault occurs, the WAL log can be replayed to restore the database. WAL log replay can be understood as modifying the data based on the modification operation records in the WAL log. The log file includes a plurality of WAL logs arranged in sequence, each WAL log having a log sequence number (LSN) that distinguishes different WAL logs. The LSN records the position and size of a WAL log in the log file and is generally represented by a monotonically increasing integer, so a larger LSN means the WAL log was generated later and is newer. In addition, since LSNs are increasing integers, a database node 101 stores all WAL logs whose LSNs are less than or equal to its maximum LSN; therefore, the overlap between the WAL logs of different database nodes 101 can be determined from the maximum LSN of the WAL logs each node contains. Illustratively, if the maximum WAL LSN of one database node 101A equals the maximum WAL LSN of another database node 101B, then the WAL logs of database node 101A and database node 101B are the same; if the maximum WAL LSN of database node 101A is greater than that of database node 101B, then database node 101A contains all WAL logs of node 101B. Illustratively, as shown in FIG. 3a, the processor 111 may generate a new WAL log (distinguished by its LSN, assumed here to be LSN4) according to the LSNs of the existing WAL logs, write the modification operation record into the latest WAL log (LSN4), and control the memory 112 to write the WAL log (LSN4) into the log file on the hard disk 123. As shown in FIG. 3b, during WAL log replay, the processor 111 reads the WAL logs from the hard disk 123 into the memory 112, then reads them from the memory 112 and replays them in order of increasing LSN, for example LSN1 → LSN2 → LSN3 → LSN4.
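A minimal sketch of the LSN ordering and replay idea described above, using an in-memory list in place of a real log file (WalRecord, append_wal, and replay are illustrative assumptions, not the patent's implementation):

```python
from dataclasses import dataclass

@dataclass
class WalRecord:
    lsn: int            # monotonically increasing log sequence number
    change: tuple       # (key, new_value) describing the modification

def append_wal(log: list[WalRecord], change: tuple) -> WalRecord:
    """Record the modification in the WAL before the data file is changed."""
    next_lsn = log[-1].lsn + 1 if log else 1
    rec = WalRecord(lsn=next_lsn, change=change)
    log.append(rec)                 # in a real system this is flushed (persisted) here
    return rec

def replay(log: list[WalRecord]) -> dict:
    """Rebuild the data by applying WAL records in increasing LSN order."""
    data: dict = {}
    for rec in sorted(log, key=lambda r: r.lsn):
        key, value = rec.change
        data[key] = value
    return data

log: list[WalRecord] = []
append_wal(log, ("x", 1))
append_wal(log, ("x", 2))
append_wal(log, ("y", 9))
print(replay(log))   # {'x': 2, 'y': 9} -- the state implied by LSN1..LSN3
```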
In this embodiment, for the N database nodes 101, when the database is first run the cluster manager 102 may divide the N nodes into one master node and N-1 slave nodes. A slave node can receive data from the master node to form a data copy, so that the data of the slave node and the master node are identical, providing data redundancy and high availability. It should be noted that, in general, to prevent the latest data from being lost after a subsequent failure, the cluster manager 102 or the master node needs to confirm that the WAL logs of more than half of the slave nodes have been flushed to disk before the master node can commit a transaction, update the data file, and persist it. In addition, after the master node fails, the cluster manager 102 may apply a majority rule when selecting a new master node, that is, more than half of the database nodes 101 must be normal and have completed log replay before a master can be selected, which ensures that at least one database node 101 has the latest data.
The data copy includes the WAL log. The process of synchronizing the WAL logs of the master node and a slave node may include: the master node writes the WAL log in the memory 112 to the hard disk 123 to persist it and sends it to the slave node, and the slave node flushes the WAL log to disk to form the data copy. In practical applications, after the cluster manager 102 divides the N nodes 101 into 1 master node and N-1 slave nodes, it automatically initiates arbitration when it detects that the master node has failed and cannot provide service, selects one of the available nodes 101 as the new master node to continue processing the service, and the slave nodes then synchronize WAL logs with the new master node. In addition, considering the difference between the WAL logs held by the master node and those held by a slave node, the master node generally contains the latest WAL logs that the slave node does not yet contain; therefore the slave node synchronizes WAL logs with the master node. The period from the moment the service stops because of the master node outage until the service is restored is called the recovery time objective (RTO). RTO is an important index for measuring the high availability of a database: the smaller the RTO, the higher the availability.
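A minimal sketch of the majority-flush commit rule mentioned above (the quorum_flushed helper and its parameters are assumptions for illustration):

```python
def quorum_flushed(commit_lsn: int, slave_flushed_lsns: list[int]) -> bool:
    """The master may commit once more than half of the slaves have flushed WAL up to commit_lsn."""
    acked = sum(1 for lsn in slave_flushed_lsns if lsn >= commit_lsn)
    return acked > len(slave_flushed_lsns) // 2

print(quorum_flushed(10, [12, 9, 11]))  # True: 2 of 3 slaves have flushed LSN >= 10
```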
In summary, embodiments of the present application use a master-slave topology and reselect a master node after the master node fails, to achieve failover and fault-tolerance capabilities. By synchronizing data between the master node and the slave nodes, it is ensured that all nodes 101 have the same copy of the data.
Next, the process of WAL log synchronization when a slave node first connects to the master node is described in detail with reference to FIG. 3c:
As shown in FIG. 3c, when a slave node connects to the master node for the first time, it determines a WAL log request based on the largest LSN of the WAL logs it has stored (which reflects which WAL logs the slave node holds) and sends the WAL log request to the master node; the WAL log request carries the LSN of the starting log (for convenience of description and distinction, called the start LSN, indicating the largest LSN of the WAL logs stored by the slave node). The walmaster thread of the master node determines the largest LSN of the WAL logs stored by the master node (which reflects which WAL logs the master node holds and, for convenience, is called the end LSN), determines the WAL logs whose LSNs lie between the start LSN and the end LSN (the WAL logs that differ between the master and the slave, more precisely the WAL logs the slave node does not yet contain) as the WAL logs to be synchronized, and reads them into a send message queue (output_xlog_message); the walsender thread then transmits the WAL logs to be synchronized from the send message queue to the slave node. The slave node receives the WAL logs sent by the master node, caches them in a write message queue (WALDataWriterQueue), wakes up the WALRcvWriter thread to flush the received WAL logs from the write message queue into the log file, and the ParallelRecov thread replays the log file to finish the incremental update of the data file.
In some possible implementations, the cluster manager 102 includes a server side (which may be called the server for convenience of description and distinction) and a proxy side (which may be called the agent). In some possible arrangements, each node 101 has one agent, and one server manages the agents of all nodes 101.
Server side (server): used to collect the state information reported by the agents, and, acting as the arbitration center and the global configuration center, to divide the N database nodes 101 into 1 master node and N-1 slave nodes.
Agent side (agent): responsible for state detection of the database nodes 101 it manages, reporting the state information of those database nodes 101 to the server, and issuing the server's commands to the database nodes 101. In some possible implementations, one agent is deployed in each server node to monitor the status of all database nodes 101 deployed in that server node.
It should be noted that in a distributed architecture with multiple database nodes 101, time synchronization is a significant problem, because the clocks of the database nodes 101 may be inconsistent due to different geographical locations, machine environments, and so on, yet time information is necessary to identify stale information. In the related art, the cluster manager 102 adopts the concept of a term and divides time into terms, which are consecutive numbers and can be regarded as a logical time. For each term, the cluster manager 102 starts with an arbitration in which a plurality of database nodes 101 attempt to become the master node; after a master node is selected, that master node serves for that term, and if the master node fails, a new master node is selected and the next term begins. Notably, for any term, when the master node of that term and a slave node connect for the first time, the master node synchronizes its term to the connected slave node, so that the master node and the slave node are aligned in logical time; if a slave node fails to connect to the master node, the latest term of the master node is not synchronized to it, its term is then smaller than the master node's term, and the corresponding data is regarded as stale. Thus, the term can indicate whether data is up to date.
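A minimal sketch of this term bookkeeping (Node and connect_slave are illustrative names assumed for the example):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    term: int    # logical time; a newer term means newer information

def connect_slave(master: Node, slave: Node) -> None:
    """On the first connection of a term, the master pushes its term to the slave."""
    slave.term = master.term

master = Node("dn1", term=5)
dn2, dn3 = Node("dn2", term=4), Node("dn3", term=4)
connect_slave(master, dn2)        # dn2 now carries term 5
# dn3 never connects in term 5, so dn3.term stays 4 and its data is treated as stale
print(dn2.term, dn3.term)         # 5 4
```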
Next, the master selection and arbitration process after a master node failure in the related art is described in detail. The process mainly includes an arbitration phase, a master selection phase and a master switchover phase. Suppose the system is currently in the i-th term (i ≥ 1).
Before master arbitration, the cluster manager 102 sends detection requests to the N database nodes 101 at a preset time interval; the detection request is used to request reporting of the parameter values of the detection parameters; the detection parameters include whether the node is faulty, the term, and the target LSN, which indicates the WAL log whose replay is complete, typically the largest LSN among the replayed write-ahead logs.
Each of the N database nodes 101 reports its own state information (i.e., the dn state information in FIG. 4), which represents the parameter values of the detection parameters.
In a specific implementation, the cluster manager 102 arbitrates at a certain arbitration interval. The arbitration interval is generally a manually set fixed value. For each arbitration, the cluster manager 102 enters the arbitration phase when it judges, based on the status information reported by the database nodes 101, that the master node of the i-th term has failed, and waits according to a waiting duration, which indicates how long to wait for fault recovery of the master node of the i-th term and is generally a manually set fixed value. Notably, if the master node of the i-th term recovers from the fault within the waiting duration, it can participate in the subsequent arbitration, master selection, and other processes, and when it meets the conditions for being the master node it becomes the master node of the (i+1)-th term, which reduces the possibility of a change of master as much as possible; however, if the master node of the i-th term does not recover within the waiting duration, a change of master is likely to occur.
Notably, the failure of the master node of the i-th term is mainly one of the following two cases.
Case 1: if the master node of the i-th term fails during operation, the failure is reported to the cluster manager 102, and the cluster manager 102 determines that the master node of the i-th term has failed.
Case 2: if the instance of the master node of the i-th term hangs and can no longer run, the cluster manager 102 does not receive the information reported by the master node of the i-th term within a period of time and therefore considers that the master node of the i-th term has failed.
For case 1, after the master node of the i-th term fails, it may recover by itself (self-heal); if self-healing is not possible, then, considering that the cluster manager 102 deploys a management process such as an agent in the server node where the database node 101 is located, the agent can help the master node of the i-th term recover from the failure.
For case 2, since the master node of the i-th term can no longer run, and considering that the cluster manager 102 deploys a management process such as an agent in the server node where the database node 101 is located, the agent can help restart the master node of the i-th term to recover from the failure.
In the arbitration phase, after the waiting duration has elapsed, LOCK1 is performed (a first lock message, denoted the lock1 message, is issued to the slave nodes) and the failed node is arbitrated. The lock1 message includes a message type, a server node ID, and a DataNodeID, where the server node ID indicates the server node on which the database node 101 (which may be indicated by the DataNodeID) is located, and the message type is used to determine the command to execute, which prohibits connecting to the master node and WAL log replay (the lock1 command in FIG. 4). Correspondingly, the lock1 message is used to control the database node 101 indicated by the DataNodeID on the server node indicated by the server node ID to execute the command that prohibits WAL log replay and connecting to the master node.
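A hedged sketch of such a lock1 message and its handling on a database node (the field names follow the description above; the handler itself is an assumption for illustration):

```python
from dataclasses import dataclass

@dataclass
class Lock1Message:
    msg_type: str        # identifies the command to execute, here "lock1"
    server_node_id: int  # server node hosting the target database node
    datanode_id: int     # target database node on that server node

def handle_lock1(node, msg: Lock1Message) -> bool:
    """Sketch of a slave handling lock1: permanently drop the master connection.

    In the related art the node additionally has to finish replaying its local
    WAL before the lock1 state counts as successful (the two conditions below).
    """
    if msg.msg_type != "lock1":
        return False
    node.disconnect_master(permanent=True)   # condition 1: link dropped, no reconnect
    return node.is_disconnected() and node.replay_finished()   # condition 2 in the related art
```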
Arbitrating the failed node includes determining that the master node of the i-th term is demoted to a slave node. If the master node of the i-th term failed while still running, then, when the cluster manager 102 determines the failure, it directly demotes the master node of the i-th term to a slave node (standby) and modifies its dn state to slave. If the instance of the master node of the i-th term has hung and cannot run, the cluster manager 102 decides that the master node of the i-th term is demoted to a slave node, but only once the master node of the i-th term has recovered from the failure (that is, it is running again) is its dn state modified to slave.
During LOCK1, the cluster manager 102 sends the first lock message (the lock1 message) to each slave node. After receiving the lock1 message, each slave node executes the command that prohibits connecting to the master node and WAL log replay (the lock1 command in FIG. 4), determines its execution result (the lock1 result in FIG. 4) and reports it to the cluster manager 102. The execution result of the slave node includes whether the slave node has successfully entered the lock1 state; the lock1 state is successful mainly when both of the following hold: 1) the link between the slave node and the master node is permanently disconnected and is no longer actively reconnected; 2) the log file of the slave node has been completely replayed and is no longer growing.
The condition for the cluster manager to determine that the database management system enters the master selection phase is that at least half of the slave nodes have successfully entered the lock1 state: the link with the master node is permanently disconnected and no longer actively reconnected, and WAL log file replay has ended.
Specifically, the cluster manager 102 enters the master selection phase after judging that more than half of the slave nodes have entered the lock1 state. The principle for selecting the master node is that the selected master node should recover the data of the failed old master as completely as possible, which concretely means: its WAL log is the newest, so that replaying the newest WAL log guarantees the newest data, thereby reducing the impact on the service to a certain extent. Considering the various scenarios possible in database operation, it can happen that the largest replayed WAL LSN of a database node 101 in the largest term is smaller than the largest replayed WAL LSN of a database node 101 in a historical term; therefore neither the term alone nor the maximum LSN alone can guarantee that the data is the newest (here "newest" can be understood as the data at the time the master node failed), so to guarantee the newest data with high probability it is generally necessary to consider the term and the WAL LSN together.
The cluster manager 102 selects the slave node with the largest term (indicating that its data is not stale but the newest) and the largest target LSN (indicating that the replayed write-ahead log is the newest, and therefore that the data after replay is the newest) as the master node of the (i+1)-th term. It should be noted that if the master node of the i-th term recovers within the waiting duration, then after entering the promotion phase its term and target LSN are generally the largest, and even when other slave nodes have equally large term and target LSN, the master node of the i-th term is used as the master node of the (i+1)-th term, which reduces the possibility of a change of master as much as possible; however, if the master node of the i-th term does not recover within the waiting duration, the slave node with the largest term and target LSN is very likely to be promoted to master, and a change of master occurs.
Illustratively, the way the master node of the (i+1)-th term is determined may be: the server sends failover information (denoted the failover message) to the database node 101 through the agent, selecting the slave node with the largest term and the largest LSN among the database nodes 101 as the master node of the (i+1)-th term. The failover message includes a message type, a server node ID, a DataNodeID, a node type, and the term; the node type describes the master node, the term describes the term of the master node, the server node ID indicates the server node on which the database node 101 (which may be indicated by the DataNodeID) is located, and the message type describes the command to execute, which makes the database node 101 become the master node of the latest term. Correspondingly, the failover message is used to control the database node 101 indicated by the DataNodeID on the server node indicated by the server node ID to take itself as the master node of the latest term.
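A hedged sketch of such a failover (promotion) message, applying the same largest-(term, LSN) rule as the earlier pick_new_master sketch (the field names mirror the description above; promote_candidate is an illustrative helper, not the patent's implementation):

```python
from dataclasses import dataclass

@dataclass
class FailoverMessage:
    msg_type: str        # "failover": become the master of the latest term
    server_node_id: int
    datanode_id: int
    node_type: str       # "master"
    term: int            # the new (latest) term the node will serve

def promote_candidate(candidates: list, new_term: int) -> FailoverMessage:
    """Build the failover message for the node with the largest (term, target LSN)."""
    best = max(candidates, key=lambda c: (c.term, c.target_lsn))
    return FailoverMessage(msg_type="failover",
                           server_node_id=best.server_node_id,
                           datanode_id=best.datanode_id,
                           node_type="master",
                           term=new_term)
```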
Then the master switchover phase (which may also be called the master unlock / standby LOCK2 phase) is entered.
The cluster manager 102 sends unlock information (denoted the unlock message) to the master node of the (i+1)-th term and second lock information (denoted the lock2 message) to the slave nodes. The lock2 message includes the connection information of the master node of the (i+1)-th term.
The unlock message may include a message type, a server node ID, and a DataNodeID, where the server node ID indicates the server node on which the database node 101 (which may be indicated by the DataNodeID) is located, and the message type indicates the command to execute, which performs a polling connection (the unlock command in FIG. 4). Correspondingly, the unlock message is used to control the database node 101 indicated by the DataNodeID on the server node indicated by the server node ID to perform a polling connection.
The lock2 message may include a message type, a server node ID, a DataNodeID, a host IP, and a port. The host IP can be understood as the communication address of the network card of the server node indicated by the server node ID; the port indicates the access address of the database node 101 (DataNode); the message type is used to determine the command to execute, which connects to the master node of the latest term (the lock2 command in FIG. 4); the server node ID indicates the server node on which the database node 101 (which may be indicated by the DataNodeID) is located. Correspondingly, the lock2 message is used to control the database node 101 indicated by the DataNodeID on the server node indicated by the server node ID to connect to the database node 101 indicated by the host IP and the port.
The master node of the (i+1)-th term executes the polling connection command (the unlock command in FIG. 4) based on the unlock message and waits for connections from the slave nodes to be established.
Each slave node executes the connect command (the lock2 command in FIG. 4) based on the lock2 message and establishes a connection with the master node of the (i+1)-th term.
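A hedged sketch of the switchover messages and their handling (the UnlockMessage/Lock2Message fields follow the description above; the handler bodies are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class UnlockMessage:
    msg_type: str        # "unlock": start polling for incoming slave connections
    server_node_id: int
    datanode_id: int

@dataclass
class Lock2Message:
    msg_type: str        # "lock2": connect to the master of the latest term
    server_node_id: int
    datanode_id: int
    host_ip: str         # address of the server node hosting the new master
    port: int            # access port of the new master database node

def handle_unlock(new_master, msg: UnlockMessage) -> None:
    """The promoted master starts waiting (polling) for slave connections."""
    new_master.start_listening()

def handle_lock2(slave, msg: Lock2Message) -> None:
    """A slave connects to the new master and synchronizes its term afterwards."""
    slave.connect(msg.host_ip, msg.port)
    slave.sync_term_from_master()
```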
It should be noted that when the cluster manager 102 manages the N database nodes 101 for the first time, the database has no master node at that point, and the master node and the slave nodes need to be determined according to the foregoing arbitration phase (without waiting according to the waiting duration), master selection phase, and master switchover phase.
Illustratively, as shown in fig. 4, assume that there are 3 database nodes 101: datanode (1), datanode (2), and datanode (3), an agent cm_agent (for ease of description, the different agents of the 3 database nodes 101 are described as a whole), and a server cm_server; assume the master node of the i-th term is datanode (1), and the slave nodes are datanode (2) and datanode (3).
The agent cm_agent detects the dn state information of each of datanode (1), datanode (2), and datanode (3) (indicating whether the node is faulty, its term, and a target LSN indicating how much of the WAL log has been replayed, typically the largest LSN of the replayed WAL log), and reports the dn state information to the server cm_server.
The agent cm_agent reports the state information of each node to the server cm_server at a certain time interval (a certain frequency), and the server cm_server arbitrates at an arbitration interval.
For each arbitration, the server cm_server determines whether the master node (datanode (1)) is faulty; once a fault is detected, it enters the arbitration stage and waits based on a waiting duration. As described above, the waiting duration indicates how long to wait for the faulty master node of the i-th term to recover, and is typically a fixed value.
After the waiting duration expires, for the failed master node (datanode (1)), the server cm_server demotes the master node (datanode (1)) to a slave node (standby), and, when the master node (datanode (1)) is running, controls the agent cm_agent to modify the dn state of the master node (datanode (1)) to slave node.
Then, the server cm_server sends, through the agent cm_agent, lock1 messages to the non-faulty slave nodes datanode (1), datanode (2), and datanode (3), and datanode (1), datanode (2), and datanode (3) each execute the lock1 command (a command to forbid connecting to the master node and to replay the WAL log); the agent cm_agent detects the results of the lock1 command executed by datanode (1), datanode (2), and datanode (3), and reports the results to the server cm_server.
When the server cm_server determines that at least 2 slave nodes among datanode (1), datanode (2), and datanode (3) have entered the lock1 state, the master selection stage is entered.
After determining to enter the master selection stage, the server cm_server selects datanode (1), which has the largest term and the largest target LSN, as the master node of the (i+1)-th term and sends a failover message to datanode (1); when the agent cm_agent detects that dn1 (datanode (1)) has executed the failover command successfully, master selection is determined to be complete.
After master selection is completed, the master-unlock, slave-LOCK2 stage is entered.
The server cm_server sends an unlock message to datanode (1) through the agent cm_agent, and sends lock2 messages to datanode (2) and datanode (3). Optionally, the server cm_server may wait until it detects, through the agent cm_agent, that dn1 (datanode (1)) has executed the unlock command successfully before sending the lock2 messages to datanode (2) and datanode (3).
Datanode (1) executes the unlock command based on the unlock message and waits for connections to be established with the slave nodes.
Datanode (2) and datanode (3) each execute the lock2 command based on the lock2 message and establish a connection with datanode (1).
In addition, the agent cm_agent detects the results of datanode (2) and datanode (3) executing the lock2 command and reports them to the server cm_server, so that the server cm_server knows which slave nodes are connected to the master node (dn1).
The master arbitration is ended.
For the above scheme, there are the following problems:
1. In the arbitration phase, the system waits for the faulty master node to recover according to a fixed waiting duration, during which a slave node is not promoted to master even if it already meets the promotion conditions. If the master node's fault is not recovered within the waiting duration, a new master node is selected from the available slave nodes after the waiting duration; if the fault is recovered within the waiting duration, the old master still has to be demoted to a slave node, enter the lock1 state, and participate in master selection after the waiting duration, and, if it meets the promotion conditions, it is re-selected as master, after which the unlock step is still required. In summary, if the waiting duration is short, the master node's fault may not be recovered in time, so the old master cannot participate in master selection and a slave node is easily switched to master; if the waiting duration is long, the RTO may be prolonged.
2. After the database loses its master due to the master node's failure, master selection can only be performed after several slave nodes (more than half of the slave nodes) have finished replaying their WAL logs; this wastes time, and the more WAL log there is to replay, the longer the wasted time, so the database may stay unavailable for a long period. Even if the failed master node recovers automatically, it is still necessary to wait for the slave nodes to finish WAL log replay before the database becomes available again.
3. The time interval at which the cluster manager 102 detects the status information of the database nodes 101 and the arbitration interval are both fixed, so some time may be wasted between the moment the cluster manager 102 detects a failure of a database node 101 and the moment master arbitration begins.
In order to solve the above problems, an embodiment of the present application provides a fault handling method for a database system. Compared with the foregoing scheme, the method includes six improvements.
Improvement 1: the condition for a single slave node to successfully enter the lock1 state is only that the link between the slave node and the master node is permanently disconnected and is no longer actively reconnected; completion of WAL log replay on the standby machine (namely, the slave node) is no longer required. Thus, a slave node whose WAL log replay is not complete can still participate in master selection and can even be promoted to master directly; the WAL log replay is then completed before the new master node provides service externally.
Improvement 2: even when a slave node's WAL log has not been fully replayed, the cluster manager 102 may query its term and the last LSN of the persisted WAL log (which may be denoted as write_lsn for convenience of description and distinction). The write_lsn indicates how much of the WAL log has been persisted; the cluster manager 102 then selects the slave node with the largest term and the largest write_lsn as the master node, so a slave node can be promoted to master regardless of whether its WAL log replay is complete, reducing the RTO to some extent.
It should be noted that if a slave node has the largest term and the largest write_lsn, the persisted WAL log it holds is the most up to date; before providing service externally, it replays this up-to-date WAL log, so that its data is the latest and as close as possible to the data held by the old master at the time of the fault, reducing the impact on the service.
In a specific implementation, the detection parameters may be modified to: whether the node is faulty, the term, and the write_lsn, where write_lsn indicates the largest LSN among the persisted WAL logs and thereby indicates all WAL logs that the database node 101 has persisted; that is, the database node 101 has stored all WAL log records with LSNs less than or equal to this maximum LSN.
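A minimal sketch of the per-node status report after Improvement 2 follows: the agent reports whether the node has failed, its term, and write_lsn (the largest LSN of the WAL records already persisted), and the candidate with the largest (term, write_lsn) is preferred. The names are illustrative assumptions, not the patented interface.

```python
from dataclasses import dataclass

@dataclass
class DnStatus:
    data_node_id: int
    failed: bool        # whether the node is currently faulty
    term: int           # term (tenure) the node last saw
    write_lsn: int      # largest LSN among WAL records flushed to disk

def pick_new_master(statuses: list[DnStatus]) -> DnStatus:
    """Select the candidate with the largest (term, write_lsn); WAL replay up to
    write_lsn happens later, before the new master serves traffic."""
    healthy = [s for s in statuses if not s.failed]
    return max(healthy, key=lambda s: (s.term, s.write_lsn))
```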
Improvement 3: after the main node fails, in the waiting time, the main node is recovered, and the old main node is directly re-used as the main node in the next period (term) because the term of the main node is maximum and the write_ lsn is maximum, so that the main selection is realized quickly without entering a lock1 state and participating in the main selection; correspondingly, the waiting time length here does not indicate the time for the main node waiting for the fault to recover the fault any more, but can be understood as the time length that the main node waiting for the ith period of the fault can be regarded as the main node of the (i+1) th period again.
Improvement 4: parameterizing a time interval at which the cluster manager 102 sends the detection request; specifically, the time interval for the cluster manager 102 to send the detection request may be modified according to practical situations, for example, the time interval for the cluster manager 102 to send the detection request may be reduced when the network states of the master node and the slave node are good; the time interval for the cluster manager 102 to send a detection request may be increased when the network status of the master node and the slave node is poor.
Improvement 5: the arbitration interval of the cluster manager 102 is parameterized, and in particular, the arbitration interval may be flexibly changed in combination with the time interval of sending the detection request, so as to reduce the duration of the period from when the cluster manager 102 detects the failure of the master node to when the arbitration starts. Illustratively, the agent cm_agent reports the state information of each node to the server cm_server according to a certain time interval (a certain frequency), and the server cm_server can determine an arbitration interval based on the reporting frequency of the cm_agent; for example, the arbitration interval may be set to be slightly larger than the time interval for sending the detection request.
Improvement 6: the waiting duration (the time to wait for the faulty master node to recover) is parameterized. Specifically, the waiting duration may be modified according to the actual situation: for example, the degree of impact on user services after the master node is switched to a slave node may be predicted, and the waiting duration is increased when the impact is large and reduced when the impact is small. The impact on user services can be determined from the historical performance data of the master node of the current term; for example, the historical performance data may include the service processing speed of the database and the user's traffic volume. The faster the database processes services and the smaller the user's traffic, the smaller the impact of switching the master node to a slave node on user services, and vice versa.
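A minimal sketch of Improvement 6 follows: the impact of a master/standby switch is predicted from the master's historical performance data, and a larger predicted impact yields a longer waiting duration. The scoring formula and the constants are assumptions for illustration only; the text only requires that faster processing and lower traffic mean a smaller impact and hence a shorter wait.

```python
def predict_switch_impact(processing_speed_tps: float, user_traffic_tps: float) -> float:
    """Return an impact score in [0, 1]: 0 = switching is cheap, 1 = very disruptive."""
    # Higher load relative to the master's processing speed -> bigger disruption.
    return min(user_traffic_tps / max(processing_speed_tps, 1e-9), 1.0)

def wait_seconds_for(impact: float, short_wait: float = 2.0, long_wait: float = 30.0) -> float:
    # Larger predicted impact -> wait longer for the old master to recover.
    return short_wait + (long_wait - short_wait) * impact
```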
Next, a detailed description will be given of a fault handling method of the database system provided in the embodiment of the present application, in combination with the database system provided above.
Fig. 5 is a flowchart of a fault handling method of a database system according to an embodiment of the present application. The present embodiment is applicable to a database system, and in particular, to cluster manager 102 and database node 101. As shown in fig. 5, the fault handling method of the database system provided in the embodiment of the present application at least includes the following steps:
Step 501, the cluster manager 102 determines a slave node and a master node in an ith tenure; wherein i is a positive integer of 1 or more.
Step 502, the cluster manager 102 sends a detection request to the slave nodes and the master node of the i-th term at a time interval; the detection request includes detection parameters; the detection parameters include whether the node is faulty, the term, and a sequence number, where the sequence number indicates the persisted pre-written log.
Illustratively, the sequence number is the maximum sequence number (LSN) of the pre-written log that has been persisted.
The time interval of detection can be fixed or flexible.
In some possible implementations, the cluster manager 102 may send the detection request to the slave nodes and the master node of the i-th term at a preset time interval; after receiving the detection request, each slave node determines the parameter values of the detection parameters. The master node of the i-th term behaves similarly and is not described again.
It should be noted that, in the embodiment of the present application, the preset time interval is configured to be adjustable.
According to one possible implementation, cluster manager 102 may provide a configuration interface (which may be referred to as a first configuration interface for ease of description and distinction) for determining a user configured time interval. Illustratively, the first configuration Interface is for invoking a User Interface (UI). The user can access the configuration interface through the terminal 200 such as a mobile phone, a computer, etc., and input the time interval in the User Interface (UI).
According to one possible implementation, the cluster manager 102 may detect a network condition between the i-th master node and the slave node, to obtain network information; based on the network information, a time interval is determined.
Illustratively, the cluster manager 102 determines that the time interval is a first value when the network information indicates that there is no network delay between the master node and the slave node for the ith tenure; when the network information indicates that there is a network delay between the master node and the slave node for the ith tenure, the time interval is determined to be a second value (greater than the first value).
In practical applications, the cluster manager 102 may set an initial time interval when the database starts running; for example, the initial time interval may be determined through the first configuration interface. Thereafter, the time interval may be updated based on the network information. Illustratively, the time interval is reduced when the network information indicates that there is no network delay between the master node of the i-th term and the slave nodes, and increased when the network information indicates that there is a network delay between the master node of the i-th term and the slave nodes.
Illustratively, as shown in fig. 6a, the user accesses the first configuration interface through a computer (or another terminal 200 device such as a mobile phone); the computer displays a UI, and the user configures the time interval in the UI (step A1 shown in fig. 6a); the cluster manager 102 detects the network condition of the database (step A2 shown in fig. 6a), where the network condition describes the communication duration between the master node and the standby nodes; the cluster manager 102 updates the user-configured time interval based on the network information (indicating the detected network condition) (step A3 shown in fig. 6a).
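A hedged sketch of steps A1 to A3 in fig. 6a follows: the user sets an initial detection interval through the first configuration interface, and the cluster manager then nudges it down when no master/slave network delay is observed and up when delay appears. The class and method names, step factors, and the lower bound are illustrative assumptions.

```python
class DetectionIntervalManager:
    def __init__(self, user_configured_ms: float):
        self.interval_ms = user_configured_ms      # step A1: initial value from the UI

    def update_from_network(self, delay_observed: bool) -> float:
        # steps A2-A3: adjust the configured interval based on the detected network state
        if delay_observed:
            self.interval_ms *= 1.5                             # delay present: probe less aggressively
        else:
            self.interval_ms = max(self.interval_ms * 0.8, 100.0)  # no delay: probe faster, with a floor
        return self.interval_ms
```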
Step 503, the slave nodes and the master node of the i-th term each report the parameter values of the detection parameters to the cluster manager 102.
Step 504, the master node of the i-th term performs fault recovery.
In this embodiment of the present application, after a failure occurs while the master node of the i-th term is running, fault recovery is performed actively. If the master node of the i-th term can no longer run, then, considering that the cluster manager 102 deploys a management process such as an agent on the server node where the database node 101 is located, the master node of the i-th term may be restarted (re-pulled) with the help of the agent; if the master node of the i-th term cannot be pulled up within a short period (for example, several seconds), it is determined that the master node of the i-th term has failed.
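A minimal sketch, under assumed names, of the recovery attempt described in step 504 follows: if the old master's process is down, the agent on its server node tries to restart (re-pull) it, and if the process is not back within a few seconds the master of the i-th term is declared failed. The restart command and liveness probe are placeholders supplied by the caller.

```python
import subprocess
import time

def try_recover_master(start_cmd: list[str], is_alive, timeout_s: float = 5.0) -> bool:
    subprocess.Popen(start_cmd)                 # attempt to re-pull the DataNode process
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_alive():                          # e.g. the process answers a status probe
            return True                         # recovered within the short period
        time.sleep(0.2)
    return False                                # still down: treat the i-th term master as failed
```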
Step 505, the cluster manager 102 arbitrates according to the arbitration interval, and for each arbitration, based on the situation reported by the master node in the ith period, determines whether the master node in the ith period fails, and if so, step 506 is executed.
The arbitration interval may be fixed or may be flexibly variable. The arbitration interval may be a parameter that is manually configurable flexibly, or may be flexibly varied based on the time interval at which the detection request is sent, thereby reducing the time difference between detecting a failure of the master node and starting arbitration, for example. For example, the server cm_server may determine the arbitration interval based on the reported time interval of cm_agent.
If the ith master node fails, it indicates that the database is in a state of no master, and the master node needs to be reselected.
Illustratively, the cluster manager 102 may determine, through the detection parameters reported by the master node of the i-th term, whether the master node of the i-th term has failed; alternatively, if the cluster manager 102 does not receive information reported by the master node of the i-th term within a period of time, it may consider the master node of the i-th term to be faulty.
Step 506, cluster manager 102 waits according to the waiting time.
In this embodiment of the present application, the waiting duration may be fixed or may be flexibly changed. The waiting period is generally 0 or more.
According to one possible implementation, cluster manager 102 may provide a configuration interface (which may be referred to as a second configuration interface for ease of description and distinction) for determining a user configured wait period. Illustratively, the second configuration Interface is for invoking a User Interface (UI). The user can access the configuration interface through the terminal 200 devices such as a mobile phone, a computer and the like, and input the waiting time in a User Interface (UI).
According to one possible implementation, the cluster manager 102 may detect the influence of switching the master node to a slave node on user services to obtain influence degree information, and determine the waiting duration according to the influence degree information.
Specifically, when the influence degree information indicates that the influence of switching a slave node to master on user services is small, for example, less than or equal to a first threshold, the cluster manager 102 may determine the waiting duration to be a third value; the first threshold may be set according to the actual situation, which is not specifically limited in the embodiments of the present application. When the influence degree information indicates that the influence on user services is large, for example, greater than or equal to a second threshold, the waiting duration is determined to be a fourth value, the fourth value being greater than the third value; the second threshold may likewise be set according to the actual situation, which is not specifically limited in the embodiments of the present application. When the influence degree information indicates a normal degree of influence, for example, greater than the first threshold and smaller than the second threshold, the waiting duration may be left unchanged. By way of example, assuming the degree of influence is represented by a value between 0 and 1, the first threshold may be 0.3 and the second threshold may be 0.7.
The influence degree information is generally determined by monitoring the historical operating condition of the master node of the i-th term. Illustratively, the historical operating condition detected by the cluster manager 102 may include the service processing speed of the database (reflecting how fast the database processes services), the upper limit of service throughput (such as the maximum amount of data that can be processed per unit time), and the user's traffic volume (the amount of service traffic per unit time); the influence degree information is then determined based on the service processing speed, the upper limit of service throughput, and the user's traffic volume. The faster the service processing speed, the higher the throughput upper limit, and the lower the traffic volume, the smaller the influence of switching the master node to a slave node on user services, and vice versa. It should be noted that the user's traffic volume may be estimated by monitoring how the database processes the user's services, or may be actively reported to the cluster manager 102 by the user.
In practical applications, the cluster manager 102 may set an initial waiting duration when the database starts running; for example, the initial waiting duration may be determined through the second configuration interface. Thereafter, the waiting duration may be updated according to the influence degree information: illustratively, the waiting duration is reduced when the influence degree information indicates that the influence of a master/slave switch on user services is small, and increased when the influence degree information indicates that the influence is large.
Illustratively, as shown in fig. 6b, the user accesses the second configuration interface through a computer (or another terminal 200 device such as a mobile phone); the computer displays a UI, and the user configures the waiting duration in the UI (step B1 shown in fig. 6b); the cluster manager 102 detects the service processing condition of the database (step B2 shown in fig. 6b), where the service processing condition describes the service processing speed, the upper limit of service throughput, and the user's traffic volume; the cluster manager 102 updates the user-configured waiting duration based on the performance information (indicating the detected service processing condition) (step B3 shown in fig. 6b).
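A hedged sketch of steps B1 to B3 in fig. 6b follows: the user configures an initial waiting duration, and the cluster manager then raises or lowers it according to the impact score of a master/standby switch. The example thresholds 0.3 and 0.7 come from the text above; the step sizes and class name are assumptions.

```python
class WaitPeriodManager:
    def __init__(self, user_configured_s: float):
        self.wait_s = user_configured_s            # step B1: initial value from the UI

    def update_from_impact(self, impact: float) -> float:
        # steps B2-B3: impact in [0, 1], derived from processing speed,
        # throughput upper limit and user traffic of the current master
        if impact <= 0.3:                          # small impact: fail over sooner
            self.wait_s = max(self.wait_s - 1.0, 0.0)
        elif impact >= 0.7:                        # large impact: give the old master more time
            self.wait_s += 1.0
        return self.wait_s                         # between the thresholds: keep it unchanged
```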
Step 507, cluster manager 102 determines whether the waiting time period is exceeded, if yes, step 508 and step 509 are executed, and if no, step 506 is executed.
Step 508, the cluster manager 102 changes the master node of the ith tenure to the slave node.
It should be noted that if the master node of the i-th term fails while it is still running, then, after determining the failure, the cluster manager 102 directly demotes the master node of the i-th term to a slave node (standby) and modifies its dn state to slave node. If the instance of the master node of the i-th term has crashed and cannot run, the cluster manager 102 first demotes it to a slave node (standby); once the fault of the master node of the i-th term is determined to be recovered and the node is running again, its dn state is modified to slave node.
It should be noted that step 508 may also be performed during the waiting period rather than after it, so that the first lock message can be sent to all slave nodes as soon as possible and the master selection phase can be entered more quickly.
Step 509, the cluster manager 102 sends a first lock message to the slave node, where the first lock message is used to disconnect the master node from the slave node.
It should be noted that, in the embodiment of the present application, the execution sequence of steps 509 and 508 is not limited, for example, step 509 may precede step 508, and for example, step 509 and step 508 may be executed in parallel; the execution sequence of the step 508 and the step 509 is flexibly changed according to actual situations. In addition, the number of execution times of step 509 is not limited.
It should be noted that the first lock message differs from the lock1 message in fig. 4 in that the first lock message in step 509 is only used to break the link between the master node and the slave node; the slave node is not required to complete WAL log replay. A slave node that successfully enters the first lock state (lock1 state) has permanently disconnected its link with the master node and no longer actively reconnects.
Step 510, the slave node executes a command to disconnect from the master node in the ith tenure based on the first lock message.
After it is determined that the fault of the master node of the i-th term has been recovered, that node may be treated directly as a slave node in the first lock state, without the first lock message being sent to it or it needing to execute the first lock command. In some scenarios with ordinary RTO requirements, the master node of the i-th term may also actually execute the first lock command.
Step 511, the cluster manager 102 determines whether the number of slave nodes in the first lock state is greater than half of the slave nodes, and if so, step 512 is executed; the slave node in the first lock state is a slave node that has disconnected from the master node of the i-th term and does not actively reconnect to it.
It should be noted that the number of slave nodes in the first locking state is greater than half, which is merely an example, and may be greater than other values in practical applications, for example, greater than two-thirds of the number of slave nodes, and may be specifically designed flexibly in combination with practical situations.
Step 512, the cluster manager 102 determines, from among the non-failed slave nodes in the first lock state, the slave node with the largest term and the largest sequence number of the persisted pre-written log as the target node.
Specifically, if a slave node in the first lock state has the largest term and the largest sequence number (the largest sequence number indicates that its most recently generated pre-written log is the latest), its WAL log may be regarded as the most up to date and consistent with the WAL log of the master node of the i-th term; the data subsequently replayed from this latest WAL log is consistent with the data of the master node of the i-th term and is the latest, so this slave node in the first lock state may be taken as the target node, that is, the master node of the (i+1)-th term. The time at which a pre-written log record is generated refers to the time it is generated in the master node's memory. Illustratively, when the sequence number is the largest sequence number (LSN) of the persisted pre-written log, the target node is the slave node in the first lock state with the largest term and the largest LSN.
It should be noted that the target node may be the recovered master node of the i-th term.
It is noted that there may be multiple slave nodes in the first lock state whose persisted WAL log has the largest sequence number, and there may be multiple ways to determine the target node from among them.
In one possible manner, the cluster manager 102 may select, from among the multiple slave nodes with the largest sequence number, the slave node in the first lock state with the smallest node number as the target node. In practical applications, the cluster manager 102 numbers the N nodes so as to distinguish between them; the target node can thus be determined based on the slave node's number.
In another possible manner, the non-failed slave nodes in the first lock state may include the recovered master node of the i-th term; when the cluster manager 102 determines that the multiple slave nodes in the first lock state whose persisted WAL log has the largest sequence number include the master node of the i-th term, it may use that master node as the target node. In this way no master replacement is required, reducing the impact on the service.
It should be noted that the above implementations are merely exemplary and not limiting; the target node may be determined according to actual requirements from among the multiple slave nodes in the first lock state whose persisted WAL log has the largest sequence number.
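A hedged sketch of step 512 and the tie-breaking rules above follows: among non-failed slaves that entered the lock1 state, pick the one with the largest (term, write_lsn); on a tie, prefer the old master of the i-th term if it is among the candidates, otherwise the candidate with the smallest node number. The names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LockedSlave:
    node_id: int          # number the cluster manager assigned to the node
    failed: bool
    term: int
    write_lsn: int        # largest LSN of the persisted WAL

def choose_target(slaves: list[LockedSlave], old_master_id: Optional[int]) -> LockedSlave:
    candidates = [s for s in slaves if not s.failed]
    best_key = max((s.term, s.write_lsn) for s in candidates)
    tied = [s for s in candidates if (s.term, s.write_lsn) == best_key]
    for s in tied:
        if s.node_id == old_master_id:             # keep the old master if possible: no real switch
            return s
    return min(tied, key=lambda s: s.node_id)      # otherwise the smallest node number wins the tie
```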
In step 513, the cluster manager 102 sends a promotion message to the target node.
Illustratively, the promotion message is the failover message in fig. 4.
Step 514, the target node takes itself as the master node of the (i+1)-th term based on the promotion message.
Step 515, the cluster manager 102 sends an unlock message to the master node of the (i+1)-th term.
Illustratively, this unlock message is the unlock message in fig. 4.
In step 516, the master node of the (i+1)-th term executes a polling-connection command based on the unlock message and accepts connections from the slave nodes.
Step 517, cluster manager 102 sends a second lock message to the slave nodes, respectively.
Illustratively, the second lock message is the lock2 message in fig. 4.
Step 517 may be performed after step 515 or in parallel with step 515.
Step 518, based on the second lock message, the slave node executes the command to connect to the master node of the (i+1)-th term, and synchronizes its term after connecting to the master node of the (i+1)-th term.
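A compact sketch, with assumed helper names, of the cluster-manager side of steps 509 to 518 after the waiting duration expires follows: lock the slaves, wait for enough lock1 acknowledgements, promote the best candidate, then unlock it and point the remaining slaves at it.

```python
def fail_over(cm, slaves, quorum):
    cm.broadcast_lock1(slaves)                              # step 509: break links to the old master
    locked = cm.wait_for_lock1_acks(slaves)                 # step 510: slaves report the lock1 result
    if len(locked) < quorum:                                # step 511: e.g. more than half of the slaves
        return None
    target = cm.pick_max_term_and_write_lsn(locked)         # step 512: no WAL replay required here
    cm.send_promote(target)                                 # steps 513-514: target becomes the new master
    cm.send_unlock(target)                                  # steps 515-516: new master polls for connections
    cm.broadcast_lock2(slaves, master=target)               # steps 517-518: slaves reconnect and sync the term
    return target
```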
In this scheme, master selection can be performed without all logs having been replayed, so the RTO is reduced to a certain extent; in addition, the time interval, the arbitration interval, and the waiting duration can all be flexibly configured, further reducing the RTO.
Based on the above provided fault handling method of the database system, a specific application of the fault handling method of the database system will be described.
Fig. 7a provides one possible application scenario. As shown in fig. 7a, assume that there are 3 database nodes 101: datanode (1), datanode (2), and datanode (3), an agent cm_agent (for ease of description, the different agents of the 3 database nodes 101 are taken as a whole), and a server cm_server; assume the master node of the i-th term is datanode (1).
The agent cm_agent detects, at the configured time interval, the dn state information of each of datanode (1), datanode (2), and datanode (3) (indicating whether the node is faulty, its term, and its write_lsn, where write_lsn indicates the sequence number of the persisted WAL log and reflects how much of the old master's data the node holds: the larger the LSN, the closer the node's data will be to the old master's data after log replay), and reports it to the server cm_server. In this embodiment of the present application, the detection time interval may change flexibly according to the network state between the master node datanode (1) and the slave nodes datanode (2) and datanode (3).
The agent cm_agent reports the state information of each node to the server cm_server at a certain time interval (a certain frequency), and the server cm_server determines an arbitration interval based on the reporting frequency of cm_agent and arbitrates at that interval; for each arbitration, the server cm_server checks whether the master node (datanode (1)) is faulty, and, once a fault is detected, it enters the arbitration phase and waits based on the waiting duration. In this embodiment of the present application, the waiting duration indicates how long the faulty master node of the i-th term can still be taken as the master node again, and it can be determined from the historical performance data of the faulty master node of the i-th term: for example, when services were being processed quickly, the performance is good and a master/standby switch would affect the service more, so the waiting duration is long; otherwise, the waiting duration is short.
The fault of the master node (datanode (1)) is recovered within the waiting duration.
After the waiting duration expires, for the failed master node (datanode (1)), the server cm_server demotes the master node (datanode (1)) to a slave node (standby), and, since the master node (datanode (1)) is running, controls the agent cm_agent to modify the dn state of the master node (datanode (1)) to slave node.
Then, the server cm_server sends, through the agent cm_agent, lock1 messages to the slave nodes datanode (1), datanode (2), and datanode (3), and datanode (1), datanode (2), and datanode (3) each execute the lock1 command (a command to forbid connecting to the master node); the agent cm_agent detects the results of the lock1 command executed by datanode (1), datanode (2), and datanode (3), and reports them to the server cm_server.
When the server cm_server determines that at least 2 slave nodes among datanode (1), datanode (2), and datanode (3) have entered the lock1 state (the link between the slave node and the master node is permanently disconnected and no longer actively reconnected), the master selection stage is entered.
The server cm_server selects datanode (1), which has the largest term and the largest write_lsn, as the master node of the (i+1)-th term and sends a failover message to datanode (1); when the agent cm_agent detects that dn1 (datanode (1)) has executed the failover command successfully, master selection is determined to be complete. Since datanode (1) recovered within the waiting duration, its term and LSN are generally the largest when the promotion stage is entered; even if other slave nodes also have the largest term and LSN, datanode (1) is used as the master node of the (i+1)-th term.
After master selection is completed, the master-unlock, slave-LOCK2 stage is entered.
The server cm_server sends an unlock message to datanode (1) through the agent cm_agent, and sends lock2 messages to datanode (2) and datanode (3). Optionally, the server cm_server may wait until it detects, through the agent cm_agent, that dn1 (datanode (1)) has executed the unlock command successfully before sending the lock2 messages to datanode (2) and datanode (3).
Datanode (1) executes the unlock command based on the unlock message and waits for connections to be established with the slave nodes.
Datanode (2) and datanode (3) each execute the lock2 command based on the lock2 message and establish a connection with datanode (1).
In addition, the agent cm_agent detects the results of datanode (2) and datanode (3) executing the lock2 command and reports them to the server cm_server, so that the server cm_server knows which slave nodes are connected to the master node (dn1).
The master arbitration is ended.
It should be noted that, the difference between fig. 7a and fig. 4 is that, on the one hand, log playback is no longer required when the slave node enters the lock1 state, so that RTO can be reduced to some extent; on the other hand, the waiting time, the arbitration interval, and the detected time interval are all adjustable parameters.
Fig. 7b provides another possible application scenario. As shown in fig. 7b, the difference with respect to fig. 7a is that the failure of the master node (datanode (1)) is not recovered within the waiting duration. Datanode (2) and datanode (3) enter the lock1 state; the server cm_server selects datanode (3) as the master node of the (i+1)-th term, sends a failover message to datanode (3), and determines that master selection is complete when the agent cm_agent detects that dn3 (datanode (3)) has executed the failover command successfully, thereby replacing the master. The server cm_server then sends an unlock message to datanode (3) through the agent cm_agent; optionally, after detecting through the agent cm_agent that dn3 (datanode (3)) has executed the unlock command successfully, the server cm_server sends lock2 messages to datanode (1) and datanode (2). Datanode (3) executes the unlock command based on the unlock message and waits for connections from the slave nodes; datanode (1) and datanode (2) each execute the lock2 command based on the lock2 message and establish a connection with datanode (3). If the fault of datanode (1) has not been recovered, its connection may fail.
Based on the above examples of fig. 5, fig. 7a and fig. 7b, an embodiment of the present application provides a fault handling method of a database system, where the database system includes a master node, a plurality of slave nodes such as N slave nodes, and a cluster manager, and the method is applied to the cluster manager, and includes:
in the event of a failure of the master node of the i-th term (see the description of step 505 for details), the master node of the i-th term is demoted to a slave node (see the description of step 508 above for details) and a first lock message is sent to the slave nodes, the first lock message being used to disconnect the master node of the i-th term from the slave nodes (see the description of step 509 above for details); when the number of slave nodes in the first lock state is greater than or equal to a preset threshold (which may be half of the slave nodes or more than half, determined according to actual requirements), the master node of the (i+1)-th term is determined from the slave nodes in the first lock state based on their terms and the sequence numbers of their persistently stored pre-written logs; the slave node in the first lock state is a slave node disconnected from the master node of the i-th term (see the description of step 512 for details; the target node in step 512 is the master node of the (i+1)-th term).
In the scheme, for each slave node, the slave node can participate in master selection without considering whether the WAL log is played back or not, and RTO is reduced to a certain extent.
In one possible implementation, the method further includes: sending detection requests to the master node and the slave nodes of the i-th term at the time interval, so that the master node and the slave nodes of the i-th term send the parameter values of the detection parameters to the cluster manager; the detection request includes the detection parameters, which include: whether the node is faulty, the term, and a sequence number, where the sequence number indicates the pre-written log that has been persisted. For details, see the description of step 502 above.
In this scheme, only the persistently stored pre-written log is considered, not whether the WAL log has been replayed, so the RTO can be reduced to a certain extent.
In one example, the method further comprises: a first configuration interface is provided for determining a user configured time interval.
In one example, the method further comprises: determining network information indicating a network condition between the master node and the slave node at the ith tenure; based on the network information, a time interval is determined. Illustratively, determining the time interval to be a first value when the network information indicates that there is no network delay between the master node and the slave node for the ith tenure; determining that the time interval is a second value when the network information indicates that network delay exists between the master node and the slave node in the ith period; the second value is greater than the first value.
The manner of determining the time interval may be referred to in fig. 6a and step 502, and the description about the time interval is not repeated.
In this scheme, the time interval can be flexibly configured to adapt to scenario requirements, so the RTO can be reduced to a certain extent.
In one possible implementation, determining, from the slave nodes of the first lock state, the master node of the i+1st tenure based on the tenure of the slave node of the first lock state and the sequence number of the persistently stored pre-written log, includes:
determining, from among the non-failed slave nodes in the first lock state, the slave node with the largest term and the largest sequence number of the persistently stored pre-written log as the target node; and sending promotion information (i.e., the promotion message described in step 513) to the target node, so that this slave node takes itself as the master node of the (i+1)-th term based on the promotion information.
In this scheme, the slave node with the largest term and the largest sequence number of the persistently stored pre-written log is selected as the master node; its WAL log can then be regarded as the latest, and the data subsequently obtained by replaying this latest WAL log is consistent with, or closest to, the latest data of the old master, reducing the impact on the service.
In an exemplary embodiment, when, among the non-failed slave nodes in the first lock state, the slave nodes with the largest term and the largest sequence number of the persistently stored pre-written log include the master node of the i-th term, the master node of the i-th term is taken as the target node, so that the possibility of a master/standby switch is reduced as much as possible and the impact on the service is reduced.
In one possible implementation, the method further includes: sending an unlock message (namely, the unlock message described in step 515) to the master node of the (i+1)-th term, so that the master node of the (i+1)-th term waits for connections from the slave nodes based on the unlock message; and sending a second lock message (i.e., the second lock message described in step 517) to the slave nodes, so that the slave nodes establish connections with the master node of the (i+1)-th term based on the second lock message and synchronize their terms after connecting to that master node.
In one possible implementation, the method further includes: when the fault of the master node of the i-th term is determined, waiting for the master node of the i-th term to recover according to the waiting duration; and sending the first lock message to the slave nodes includes: sending the first lock message to the slave nodes after the waiting duration has expired.
In this scheme, waiting for a certain time for the master node to recover allows it to participate in the subsequent master selection, so the old master is used as the new master whenever possible, reducing the possibility of a master/standby switch and the impact on the service.
In one example, the method further comprises: a second configuration interface is provided for determining a user configured wait period.
In one example, the method further includes: determining influence degree information, where the influence degree information indicates the degree of influence of switching a slave node to master on user services; and determining the waiting duration according to the influence degree information. Illustratively, when the influence degree information indicates that the influence is less than or equal to a first threshold (the influence is small), the waiting duration is determined to be a third value; when the influence degree information indicates that the influence is greater than or equal to a second threshold (the influence is large), the waiting duration is determined to be a fourth value, the fourth value being greater than the third value.
The manner of determining the waiting duration may refer to the description of the waiting duration in fig. 6b and step 506, and will not be repeated.
In this scheme, the waiting duration can be flexibly configured to adapt to scenario requirements, so both the RTO and the possibility of a master/standby switch can be reduced to a certain extent.
Based on the same conception as the embodiment of the method, the embodiment of the application also provides a fault processing device of the database system. The fault handling device of the database system includes a plurality of modules, each module is configured to execute each step in the fault handling method of the database system provided in the embodiment of the present application, and the division of the modules is not limited herein. It will be clear to those skilled in the art that, in practical application, each step allocation in the fault handling method of the database system provided in the embodiments of the present application may be completed by different modules, that is, the internal structure of the device is divided into different modules, so as to complete all or part of the functions described above. In addition, the specific names of the modules are only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the modules in the above apparatus may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
For example, the fault handling device of the database system is configured to execute the fault handling method of the database system provided in the embodiment of the present application, and fig. 8 is a schematic structural diagram of the fault handling device of the database system provided in the embodiment of the present application. As shown in fig. 8, a fault handling device of a database system provided in an embodiment of the present application includes:
a first processing module 801, configured to reduce the master node in the ith period to a slave node in the event of a failure of the master node in the ith period, and send a first lock message to the slave node, where the first lock message is used to disconnect the master node in the ith period from the slave node;
a first decision module 802, configured to determine, from among the slave nodes in the first locked state, a master node in an i+1st tenure based on the tenure of the slave node in the first locked state and the serial number of the persistently stored pre-written log if the number of the slave nodes in the first locked state is greater than or equal to a preset threshold; the slave node in the first locking state is a slave node disconnected with the master node in the ith period.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method of the application. The electronic device may be a server node. The structure of the electronic device can be seen from the structure described in fig. 2.
In a specific application, the hard disk 123 may store a computer program, and when the electronic device is running, the CPU111 may read the computer program stored in the hard disk 123 to the memory, and read the computer program from the memory, to implement steps in the fault handling method of the database system, for example, steps 501 to 518 in fig. 5.
By way of example, a computer program may be divided into one or more modules/units, which may be a series of computer program instruction segments capable of performing a specific function, stored in the hard disk 123 and executed by the CPU111 to perform the present application. For example, the computer program may be split into a first processing module 801, a first decision module 802, the specific functions of each module being described above.
Next, a detailed description will be given of a fault handling method of the database system provided in the embodiment of the present application, in combination with the database system provided above.
Fig. 9 is a flowchart of a fault handling method of a database system according to an embodiment of the present application. The present embodiment is applicable to a database system, and in particular, to cluster manager 102 and database node 101. As shown in fig. 9, the fault handling method of the database system provided in the embodiment of the present application at least includes the following steps:
Step 901, the cluster manager 102 determines a slave node and a master node in an ith period; wherein i is a positive integer of 1 or more.
Step 902, the cluster manager 102 sends a detection request to the slave nodes and the master node of the i-th term at a time interval; the detection request includes detection parameters; the detection parameters include whether the node is faulty, the term, and a sequence number, where the sequence number indicates the persisted pre-written log.
Details refer to the description of step 502 above, and will not be repeated.
Step 903, the slave nodes and the master node of the i-th term each report the parameter values of the detection parameters to the cluster manager 102.
Step 904, the master node of the i-th term performs fault recovery.
Details refer to step 504, and will not be described again.
Step 905, the cluster manager 102 arbitrates at the arbitration interval; for each arbitration, it determines, based on the information reported by the master node of the i-th term, whether the master node of the i-th term has failed, and if so, step 906 is executed.
Details refer to step 505, and will not be described again.
Step 906, cluster manager 102 waits according to the waiting time.
Details refer to step 506, and will not be described again.
Step 907, the cluster manager 102 determines whether the failure of the master node at the i-th tenure is recovered in the waiting period, if so, step 908 is executed, and if not, steps 909 and 910 are executed.
In step 908, the cluster manager 102 determines the master node of the i-th term as the target node if the term of the master node of the i-th term and the sequence number of its pre-written log flushed to disk (persisted) are both the largest.
In one possible implementation, when the master node of the i-th term recovers from its fault, its term and the sequence number of its persisted pre-written log may be considered the largest, and the master node of the i-th term is directly taken as the target node (the master node of the (i+1)-th term).
In another possible implementation, the cluster manager 102 compares the detection parameters reported by all the database nodes 101 to determine whether the term of the master node of the i-th term and the sequence number of its persisted pre-written log are the largest.
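A minimal sketch of the check behind steps 907 and 908 in the second implementation above follows: after the old master recovers within the waiting duration, the cluster manager compares the detection parameters reported by all nodes and re-promotes the old master only if its term and persisted-WAL sequence number are still the largest. The names and the report format are assumptions.

```python
def old_master_still_best(reports: dict[int, tuple[int, int]], old_master_id: int) -> bool:
    """reports maps node_id -> (term, write_lsn) as reported in step 903."""
    old_term, old_lsn = reports[old_master_id]
    # The old master is reused only if no node reports a larger term or a larger write_lsn.
    return all(old_term >= term and old_lsn >= lsn for term, lsn in reports.values())
```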
Step 909, the cluster manager 102 changes the master node in the i-th tenure to the slave node.
Details refer to the description of step 508, and will not be repeated.
Step 910, the cluster manager 102 sends a first lock message to the slave node, where the first lock message is used to disconnect the master node from the slave node during the i-th period.
Wherein the first lock message may be the first lock message in step 509. The execution steps of steps 909 and 910 are not limited in sequence. Details refer to the description of step 509, and will not be repeated.
Step 911, the slave node executes a command to disconnect from the master node in the ith tenure based on the first lock message.
Step 912, the cluster manager 102 determines whether the number of slave nodes in the first lock state is greater than half of the slave nodes, and if so, step 913 is executed; the slave node in the first lock state is a slave node that has disconnected from the master node and does not actively reconnect to it.
It should be noted that the number of slave nodes in the first locking state is greater than half, which is merely an example, and may be greater than other values in practical applications, for example, greater than two-thirds of the number of slave nodes, and may be specifically designed flexibly in combination with practical situations.
Step 913, the cluster manager 102 determines, from among the non-failed slave nodes in the first lock state, the slave node with the largest term and the largest sequence number of the persisted pre-written log as the target node.
Details refer to step 512, and will not be described again.
Step 914, the cluster manager 102 sends a promotion message to the target node.
In step 915, the target node takes itself as the master node of the (i+1)-th term based on the promotion message.
In step 916, the cluster manager 102 sends an unlock message to the master node of the (i+1)-th term.
Illustratively, this unlock message is the unlock message in fig. 4.
Step 917, the master node in the (i+1) th tenure executes a polling connection command based on the unlocking message, and receives the connection of the slave node.
Step 918, cluster manager 102 sends a second lock message to the slave nodes, respectively.
Illustratively, the second lock message is the lock2 message in fig. 4.
It should be noted that step 918 may be performed after step 916 or in parallel with step 916.
Step 919, based on the second lock message, the slave node executes the command to connect to the master node of the (i+1)-th term, and synchronizes its term after connecting to the master node of the (i+1)-th term.
In this scheme, the old master's log does not need to be fully replayed within the waiting duration, and the old master can be used as the new master once it recovers, so the RTO is reduced to a certain extent. In addition, the time interval, the arbitration interval, and the waiting duration can all be flexibly configured, further reducing the RTO.
Based on the above provided fault handling method of the database system, a specific application of the fault handling method of the database system will be described.
Fig. 10a provides a possible application scenario. As shown in fig. 10a, assume that there are 3 database nodes 101: datanode (1), datanode (2), and datanode (3), an agent cm_agent (for ease of description, the different agents of the 3 database nodes 101 are taken as a whole), and a server cm_server; assume the master node of the i-th term is datanode (1).
The agent cm_agent detects, at the configured time interval, the dn state information of each of datanode (1), datanode (2), and datanode (3) (indicating whether the node is faulty, its term, and its write_lsn, where write_lsn indicates the sequence number of the persisted WAL log and reflects how much of the old master's data the node holds: the larger the LSN, the closer the node's data will be to the old master's data after log replay), and reports it to the server cm_server. In this embodiment of the present application, the detection time interval may change flexibly according to the network state between the master node datanode (1) and the slave nodes datanode (2) and datanode (3).
The agent cm_agent reports the state information of each node to the server cm_server at a certain time interval (a certain frequency); the server cm_server determines an arbitration interval based on the reporting frequency of cm_agent and arbitrates at that interval. For each arbitration, the server cm_server checks whether the master node (datanode (1)) is faulty; once a fault is detected, it enters the arbitration phase and waits based on the waiting duration. As described above, the waiting duration indicates how long the faulty master node of the i-th term can still be taken as the master node again, and it can be determined from the historical performance data of the faulty master node of the i-th term: for example, when services were being processed quickly, the performance is good and a master/standby switch would affect the service more, so the waiting duration is long; otherwise, the waiting duration is short.
The fault of the master node (datanode (1)) is recovered within the waiting duration.
The server cm_server determines that the fault of datanode (1) has been recovered within the waiting duration and that datanode (1) meets the promotion condition (its term and write_lsn are the largest); it therefore determines datanode (1) to be the new master node of the (i+1)-th term and sends a failover message to datanode (1), and master selection is determined to be complete when the agent cm_agent detects that dn1 (datanode (1)) has executed the failover command successfully.
And (5) finishing the selection of the master, and entering a stage of unlocking the master and preparing LOCK 2.
The server cm_server sends a unlock message to the datinode (1) through the agent cm_agent, and sends lock2 messages to the datinode (2) and the datinode (3). Optionally, the server cm_server may detect that dn1 (datanode (1)) is successfully executed by the agent cm_agent, and send a lock2 message to datanode (2) and datanode (3).
The datanode (1) executes a unlock command based on the unlock information, waiting for a connection to be established with the slave node.
The datinode (2) and the datinode (3) execute a lock2 command based on the lock2 message respectively, and establish connection with the datinode (1).
In addition, the agent cm_agent detects the result of the datinode (2) and the datinode (3) executing the lock2 and reports the result to the server cm_server, so that the server cm_server knows which slave nodes connected with the master node (dn 1) exist.
The master arbitration is ended.
Fig. 10b provides another possible application scenario. As shown in fig. 10b, it differs from fig. 10a in that the failure of the master node (datanode (1)) is not recovered during the waiting period.
When the waiting duration is exceeded, the server cm_server judges that datanode (1) has not returned to normal. For the failed master node (datanode (1)), the server cm_server then decides to demote it to a slave node (standby); once datanode (1) is running again, the server controls the agent cm_agent to modify the dn state of datanode (1) to slave.
Through the agent cm_agent, the server cm_server sends lock1 messages to the slave nodes datanode (1), datanode (2) and datanode (3), and each of them executes a lock1 command (a command that disables connection to the master node). The agent cm_agent detects the lock1 execution results of datanode (1), datanode (2) and datanode (3) and reports them to the server cm_server.
The server cm_server determines that at least two slave nodes, datanode (2) and datanode (3), have entered the lock1 state, and proceeds to the master selection stage.
The server cm_server judges that dn3 (datanode (3)) satisfies the promotion condition: its term and write_lsn are the largest. Datanode (3) is therefore taken as the master node of the (i+1)-th tenure, a failover message is sent to datanode (3), and master selection is deemed complete when the agent cm_agent detects that dn3 (datanode (3)) has executed the failover successfully.
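The promotion condition above amounts to picking, among the non-faulty slave nodes in the lock1 state, the node with the largest term and, for equal terms, the largest write_lsn. The following is a minimal sketch of that rule; select_new_master and the per-node status objects (with is_faulty, term and write_lsn attributes, like the DnStatus sketch earlier) are illustrative assumptions.

```python
def select_new_master(candidates):
    """candidates: per-node status objects for slave nodes in the lock1 state."""
    healthy = [c for c in candidates if not c.is_faulty]
    if not healthy:
        return None                                   # no node satisfies the promotion condition
    # largest tenure first, then largest persisted WAL sequence number
    return max(healthy, key=lambda c: (c.term, c.write_lsn))
```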
Master selection is complete, and the process enters the stage of unlocking the master node and issuing LOCK2 to the slave nodes.
The server cm_server sends an unlock message to datanode (3) through the agent cm_agent, and sends lock2 messages to datanode (1) and datanode (2). Optionally, the server cm_server may first detect, through the agent cm_agent, that dn3 (datanode (3)) has executed the unlock successfully, and then send the lock2 messages to datanode (1) and datanode (2).
Datanode (3) executes an unlock command based on the unlock message and waits for connections to be established with the slave nodes.
Datanode (1) and datanode (2) each execute a lock2 command based on the lock2 message and establish a connection with datanode (3). If the fault of datanode (1) has not recovered, its connection may fail.
In addition, the agent cm_agent detects the results of datanode (1) and datanode (2) executing lock2 and reports them to the server cm_server, so that the server cm_server knows which slave nodes are connected to the master node (dn3).
The master arbitration is ended.
Based on the above examples of fig. 9, fig. 10a and fig. 10b, the embodiment of the present application provides a fault handling method of a database system, where the database system includes a master node, a plurality of slave nodes (for example, N slave nodes) and a cluster manager. The method is applied to the cluster manager and includes:
In case of a fault of the master node of the i-th tenure, waiting for the fault of the master node of the i-th tenure to recover according to the waiting duration; if the fault of the master node of the i-th tenure recovers within the waiting duration and that node has the largest tenure and the largest serial number of the persistently stored pre-written log, determining the master node of the i-th tenure as the master node of the (i+1)-th tenure (see steps 905 to 908 for details).
In this scheme, the old master's log does not need to finish replaying within the waiting duration, and the old master can serve as the new master once it recovers, so RTO is reduced to a certain extent.
In one possible implementation, the method further includes: sending detection requests to the master node and the slave nodes of the i-th tenure according to the time interval, so that the master node and the slave nodes of the i-th tenure send parameter values of the detection parameters to the cluster manager; the detection request includes the detection parameters, which include: whether a node has failed, the tenure, and a sequence number indicating the pre-written log that has been persisted. For details, see the description of step 502 above.
In this scheme, only the pre-written log that has been persistently stored is considered, regardless of whether the WAL log has been replayed, so RTO can be reduced to a certain extent.
In one example, the method further comprises: a first configuration interface is provided for determining a user configured time interval.
In one example, the method further comprises: determining network information indicating a network condition between the master node and the slave node at the ith tenure; based on the network information, a time interval is determined. Illustratively, determining the time interval to be a first value when the network information indicates that there is no network delay between the master node and the slave node for the ith tenure; determining that the time interval is a second value when the network information indicates that network delay exists between the master node and the slave node in the ith period; the second value is greater than the first value.
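As an illustration only, the choice of the time interval from the network information might look like the following sketch; the concrete first and second values are assumptions, not values given in this application.

```python
FIRST_VALUE_S = 1.0    # no network delay between master and slaves: probe more frequently
SECOND_VALUE_S = 5.0   # network delay present: probe less frequently

def choose_detect_interval(has_network_delay: bool) -> float:
    # the second value is greater than the first value, as described above
    return SECOND_VALUE_S if has_network_delay else FIRST_VALUE_S
```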
The manner of determining the time interval may be referred to in fig. 6a and step 502, and the description about the time interval is not repeated.
In this scheme, the time interval can be flexibly configured to adapt to scenario requirements, so RTO (recovery time objective) can be reduced to a certain extent.
In one possible implementation, the method further includes: a second configuration interface is provided for determining a user configured wait period.
In one possible implementation, the method further includes: determining influence degree information, where the influence degree information indicates the degree of influence that switching a slave node to the master node has on the user service; and determining the waiting duration according to the influence degree information. Illustratively, when the influence degree information indicates that the influence of switching a slave node to the master node on the user service is less than or equal to a first threshold (the influence is small), the waiting duration is determined to be a third value; when the influence degree information indicates that the influence of switching a slave node to the master node on the user service is greater than or equal to a second threshold (the influence is large), the waiting duration is determined to be a fourth value; the fourth value is greater than the third value.
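A minimal sketch of mapping the influence degree information to a waiting duration follows; the thresholds, the third and fourth values, and the handling of the intermediate case are illustrative assumptions.

```python
FIRST_THRESHOLD = 0.3   # influence at or below this is considered small
SECOND_THRESHOLD = 0.7  # influence at or above this is considered large
THIRD_VALUE_S = 5.0     # small impact of switching: a shorter wait is acceptable
FOURTH_VALUE_S = 30.0   # large impact of switching: wait longer for the old master

def choose_wait_duration(influence_degree: float) -> float:
    if influence_degree <= FIRST_THRESHOLD:
        return THIRD_VALUE_S                        # switching barely affects the user service
    if influence_degree >= SECOND_THRESHOLD:
        return FOURTH_VALUE_S                       # switching strongly affects the user service
    return (THIRD_VALUE_S + FOURTH_VALUE_S) / 2     # intermediate case, not specified in the text
```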
For the implementation of determining the waiting duration, refer to the description of the waiting duration in fig. 6b and step 506, which is not repeated here.
In this scheme, the waiting duration can be flexibly configured to adapt to scenario requirements, so both RTO and the likelihood of a master-slave switch can be reduced to a certain extent.
In one possible implementation, the method further includes: if the failure of the master node in the ith period is not recovered within the waiting time, after the waiting time is exceeded, the master node in the ith period is reduced to a slave node, and a first lock message is sent to the slave node, wherein the first lock message is used for disconnecting the master node in the ith period from the slave node; and when the number of the slave nodes in the first locking state is greater than or equal to a preset threshold value, determining the master node in the i+1th period from the slave nodes in the first locking state based on the period of the slave nodes in the first locking state and the serial number of the persistently stored pre-written log.
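As a rough illustration of this implementation, the flow might be sketched as below; demote_to_slave, send_lock1, count_locked_slaves, get_locked_slave_statuses and the cluster object are hypothetical helpers, and select_new_master is the selection rule sketched earlier (passed in here so the snippet stands alone).

```python
def handle_unrecovered_master(old_master_id, slave_ids, cluster,
                              preset_threshold, select_new_master):
    """After the waiting duration is exceeded: demote the old master and try to pick a new one."""
    cluster.demote_to_slave(old_master_id)             # old master of the i-th tenure becomes a slave
    for node_id in slave_ids:
        cluster.send_lock1(node_id)                    # disconnect the slaves from the old master
    if cluster.count_locked_slaves() >= preset_threshold:
        statuses = cluster.get_locked_slave_statuses() # slaves that entered the first lock state
        return select_new_master(statuses)             # master node of the (i+1)-th tenure, or None
    return None                                        # not enough slaves in the first lock state yet
```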
In this scheme, the cluster manager waits a certain duration for the master node's fault to recover; if the fault does not recover, a new master node is selected from the available slave nodes, so that the impact on the service is reduced as much as possible.
In one example, determining the master node of the i+1th tenure from the slave nodes of the first lock state based on the tenure of the slave node of the first lock state and the sequence number of the persistently stored pre-written log, comprises:
Determining, from among the slave nodes in the first lock state that have not failed, the slave node with the largest tenure and the largest serial number of the persistently stored pre-written log as the target node; and sending promotion information (i.e., the promotion message described in step 513) to the target node, so that this slave node with the largest serial number of the persistently stored pre-written log takes itself as the master node of the (i+1)-th tenure based on the promotion information.
In this scheme, the slave node with the largest tenure and the largest serial number of the persistently stored pre-written log is selected as the master node. Its WAL log can then be regarded as the most recent, and the data obtained by subsequently replaying that WAL log is identical or closest to the latest data of the old master, which reduces the impact on the service.
In an exemplary embodiment, if, among the slave nodes in the first lock state that have not failed, the nodes with the largest tenure and the largest serial number of the persistently stored pre-written log include the master node of the i-th tenure, the master node of the i-th tenure is taken as the target node, which reduces the likelihood of a master-slave switch as much as possible and thereby reduces the impact on the service.
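A minimal sketch of this tie-breaking preference follows; pick_target_node and the status objects (with node_id, is_faulty, term and write_lsn attributes) are illustrative assumptions.

```python
def pick_target_node(candidates, old_master_id):
    """Prefer the old master when it is among the nodes with the largest term and write_lsn."""
    healthy = [c for c in candidates if not c.is_faulty]
    if not healthy:
        return None
    best_key = max((c.term, c.write_lsn) for c in healthy)
    best = [c for c in healthy if (c.term, c.write_lsn) == best_key]
    for c in best:
        if c.node_id == old_master_id:
            return c              # old master qualifies: keep it to avoid a master-slave switch
    return best[0]                # otherwise any node with the largest (term, write_lsn)
```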
In one possible implementation, the method further includes: sending unlock information (i.e., the unlock message described in step 916) to the master node of the (i+1)-th tenure, so that the master node of the (i+1)-th tenure waits for slave-node connections based on the unlock information; and sending second lock information (i.e., the second lock message described in step 918) to the slave nodes, so that each slave node establishes a connection with the master node of the (i+1)-th tenure based on the second lock information and synchronizes its tenure after connecting to that master node.
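A minimal sketch of this final unlock/lock2 exchange is given below; send_unlock and send_lock2 are hypothetical cluster-manager helpers, and in the examples above these messages are actually delivered through the agent cm_agent.

```python
def finish_failover(new_master_id, slave_ids, cluster):
    """Unlock the new master, then point every slave at it with a second lock message."""
    cluster.send_unlock(new_master_id)    # new master starts waiting for slave connections
    for slave_id in slave_ids:
        cluster.send_lock2(slave_id)      # slave reconnects to the master of the (i+1)-th tenure
        # after connecting, the slave synchronizes the (i+1)-th tenure from the new master
```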
Based on the same conception as the embodiment of the method, the embodiment of the application also provides a fault processing device of the database system. The fault handling device of the database system includes a plurality of modules, each module is configured to execute each step in the fault handling method of the database system provided in the embodiment of the present application, and the division of the modules is not limited herein. It will be clear to those skilled in the art that, in practical application, each step allocation in the fault handling method of the database system provided in the embodiments of the present application may be completed by different modules, that is, the internal structure of the device is divided into different modules, so as to complete all or part of the functions described above. In addition, the specific names of the modules are only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the modules in the above apparatus may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
For example, the fault handling device of the database system is configured to perform the fault handling method of the database system provided in the embodiment of the present application, and fig. 11 is a schematic structural diagram of the fault handling device of the database system provided in the embodiment of the present application. As shown in fig. 11, a fault handling device of a database system provided in an embodiment of the present application includes:
A second processing module 1101, configured to wait, according to the waiting duration, for the fault of the master node of the i-th tenure to recover in case of a fault of the master node of the i-th tenure;
and a second decision module 1102, configured to determine the master node of the i-th tenure as the master node of the (i+1)-th tenure if the fault of the master node of the i-th tenure recovers within the waiting duration and that node has the largest tenure and the largest serial number of the persistently stored pre-written log.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method of the application. The electronic device may be a server node. The structure of the electronic device can be seen from the structure described in fig. 2.
In a specific application, the hard disk 123 may store a computer program. When the electronic device runs, the CPU 111 may load the computer program from the hard disk 123 into the memory and execute it from the memory to implement the steps of the fault handling method of the database system, for example, steps 901 to 919 in fig. 9.
By way of example, the computer program may be divided into one or more modules/units, which may be a series of computer program instruction segments capable of performing specific functions, stored in the hard disk 123 and executed by the CPU 111 to carry out the present application. For example, the computer program may be split into the second processing module 1101 and the second decision module 1102, the specific functions of each module being described above.
In addition to the methods, apparatus and electronic devices described above, embodiments of the present application may also provide a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the fault handling method of the database system of the various embodiments of the present application described in the "methods" section of the present specification. Wherein the computer program product may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. Wherein the computer program code may be in the form of source code, object code, executable files, or in some intermediate form, etc. The computer program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device or entirely on the remote computing device or server.
Further, embodiments of the present application may also provide a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the fault handling method of a database system according to the various embodiments of the present disclosure described in the "method" section of this specification. The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. It should be noted that the content contained in the computer-readable medium may be appropriately added or deleted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present application.
The basic principles of the present application have been described above in connection with specific embodiments; however, it should be noted that the advantages, benefits, effects, and the like mentioned in the present application are merely examples and are not limiting, and they are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only and are not intended to be limiting, since the disclosure is not necessarily limited to practice with those specific details.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended and mean "including but not limited to," and may be used interchangeably therewith. The terms "or" and "and" as used herein refer to, and may be used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and may be used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatuses, devices, and methods of the present disclosure, components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims (10)

1. A method of fault handling for a database system, the database system comprising a master node, a plurality of slave nodes and a cluster manager, the method being applied to the cluster manager, the method comprising:
under the condition of failure of the master node in the ith period, the master node in the ith period is reduced to a slave node, and a first lock message is sent to the slave node, wherein the first lock message is used for disconnecting the master node in the ith period from the slave node;
Determining a master node in an i+1th period from the slave nodes in the first locking state based on the period of the slave nodes in the first locking state and the serial number of the permanently stored pre-written log under the condition that the number of the slave nodes in the first locking state is larger than or equal to a preset threshold value; the slave node in the first locking state is a slave node disconnected with the master node in the ith period.
2. The method according to claim 1, wherein the method further comprises:
sending detection requests to the master node and the slave nodes in the ith period according to the time interval, so that the master node and the slave nodes in the ith period send parameter values of detection parameters to the cluster manager; the detection request includes the detection parameters; wherein the detection parameters include: whether a node fails, a tenure, and a sequence number indicating a pre-written log that has been persisted.
3. The method according to claim 2, wherein the method further comprises:
a first configuration interface is provided for determining the time interval configured by the user.
4. The method according to claim 2, wherein the method further comprises:
Determining network information indicating a network condition between the master node and the slave node of the ith tenure;
and determining the time interval according to the network information.
5. The method of claim 4, wherein said determining said time interval based on said network information comprises:
determining that the time interval is a first value when the network information indicates that no network delay exists between the master node and the slave node in the ith period;
determining that the time interval is a second value when the network information indicates that a network delay exists between the master node and the slave node in the ith tenure; the second value is greater than the first value.
6. The method of claim 1, wherein the determining the master node of the (i+1) th tenure from the slave nodes of the first lock state based on the tenure of the slave nodes of the first lock state and the sequence number of the persistently stored pre-written log comprises:
determining, from the slave nodes in the first locking state which have not failed, a slave node with the maximum tenure and the maximum serial number of the permanently stored pre-written log as a target node;
and sending promotion information to the target node, so that the slave node with the maximum serial number of the persistently stored pre-written log takes itself as the master node of the (i+1)th tenure based on the promotion information.
7. The method of claim 6, wherein the determining, from among the slave nodes of the first lock state that have not failed, the slave node having the greatest tenure and the largest sequence number of the persistently stored pre-written log as the target node comprises:
among the slave nodes in the first locking state which have not failed, when the slave nodes with the maximum tenure and the maximum serial number of the persistently stored pre-written log include the master node in the ith period, taking the master node in the ith period as the target node.
8. The method according to claim 1, wherein the method further comprises:
sending unlocking information to the master node in the (i+1) th period, so that the master node in the (i+1) th period waits for connecting the slave node based on the unlocking information;
and sending second lock information to the slave node so that the slave node establishes connection with the master node of the (i+1) th tenure based on the second lock information, and synchronizes the tenure after connecting to the master node of the (i+1) th tenure.
9. The method according to any one of claims 1 to 8, further comprising:
when determining a fault of the master node in the ith period, waiting for the fault of the master node in the ith period to recover according to the waiting duration;
the sending a first lock message to the slave node includes:
and after the waiting time is exceeded, sending a first lock message to the slave node.
10. A computing device comprising a processor and a memory; wherein,
the memory is used for storing programs;
the processor is configured to execute the program stored in the memory, and when the program stored in the memory is executed, to perform the method of any one of claims 1 to 9.