CN117667469A - Control method and device

Info

Publication number: CN117667469A
Authority: CN (China)
Prior art keywords: node, cluster, database cluster, snapshot
Legal status: Pending
Application number: CN202211042061.5A
Other languages: Chinese (zh)
Inventor: 周智伟
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Abstract

Embodiments of the present application provide a control method and device. In the method, a first node determines a node arbitration policy for each node in a cluster based on the change in that node's online state and its working state, and determines a cluster arbitration policy based on the change in the number of nodes in the cluster. During failure recovery, the first node uses the node arbitration policy and the cluster arbitration policy to decide whether the database information of a node needs to be processed and whether the metadata information in the cluster needs to be moved, thereby reducing unnecessary network overhead and improving the resource utilization of the cluster.

Description

Control method and device
Technical Field
Embodiments of the present application relate to the field of databases, and in particular to a control method and device.
Background
Currently, in the field of computer database software applications, a shared-storage database cluster is a multi-instance system that shares one data store: a user can log in to any database instance in the cluster and obtain the complete database service.
To handle changes in the database node information of a shared-storage cluster, the industry generally adopts the following approach: third-party or self-developed database cluster monitoring software monitors the database nodes in the cluster in real time; when a node fails or a new node joins, the monitoring software senses the event and issues a corresponding instruction to the database nodes, which invoke a preset processing flow to complete the failure recovery of the cluster.
This approach responds to node changes in real time and completes cluster failure recovery promptly. However, in a shared-storage cluster, data pages and locks are no longer private resources of a single process but distributed public resources of the entire cluster, and they are managed with a DRC (Distributed Resource Catalog, i.e., distributed resource directory). For load balancing, the DRC is typically distributed evenly across the nodes in the cluster, so when the number of nodes changes, the number of DRC entries managed by each node also changes, and DRC resources inevitably have to be rebuilt or transferred when a node fails or a new node is added. In this scenario, the prior-art management approach contains redundant processing logic, which increases fault handling time and network resource consumption.
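As a hedged illustration of why DRC entries move when the node count changes (a minimal Python sketch; the names drc_owner, moved_buckets, and NUM_DRC_BUCKETS are assumptions made for this example, not part of the patent), an even bucket-to-node mapping can be modelled as follows; removing or adding a node changes the owner of most buckets, which is the transfer cost the later embodiments try to avoid:

```python
# Illustrative sketch only: even DRC distribution over cluster nodes.
# A DRC bucket's owner is derived from the current node list, so when the
# number of nodes changes, most buckets map to a different owner and must
# be rebuilt or transferred.

NUM_DRC_BUCKETS = 6  # assumed bucket count for the example (DRC1..DRC6)

def drc_owner(bucket_id: int, nodes: list[str]) -> str:
    """Evenly assign DRC buckets to nodes (round-robin by bucket id)."""
    return nodes[bucket_id % len(nodes)]

def moved_buckets(old_nodes: list[str], new_nodes: list[str]) -> list[int]:
    """Buckets whose owner changes when the node set changes."""
    return [b for b in range(NUM_DRC_BUCKETS)
            if drc_owner(b, old_nodes) != drc_owner(b, new_nodes)]

if __name__ == "__main__":
    before = ["DB1", "DB2", "DB3"]
    after = ["DB1", "DB2"]               # DB3 fails and is removed
    print(moved_buckets(before, after))  # most buckets change owner
```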
Disclosure of Invention
Embodiments of the present application provide a control method and device. In the method, a first node serving as the master control node can determine an arbitration policy for each node based on the state of that node in the cluster, determine an arbitration policy for the cluster based on the change in the number of nodes in the cluster, and execute corresponding operations based on the node arbitration policies and the cluster arbitration policy so as to reduce network overhead.
In a first aspect, an embodiment of the present application provides a control method applied to a database cluster. The method includes: a first node, which belongs to the database cluster, obtains the change in online state and the working state of each node in the database cluster; the first node determines a node arbitration policy for each node based on that node's change in online state and working state, where the node arbitration policy indicates whether the database information of a node in the database cluster needs to be processed; the first node obtains the change in the number of nodes in the database cluster; the first node determines a cluster policy for the database cluster based on the change in the number of nodes, where the cluster policy indicates whether the metadata information in the database cluster needs to be moved; and the first node executes the corresponding failure recovery operation based on the node arbitration policies and the cluster policy. In this way, by combining the state of each node with the state of the cluster, the embodiment of the present application decides whether related data in the cluster needs to be processed, which avoids the network consumption caused by frequently processing data, shortens the time occupied by failure recovery, and improves the overall resource utilization of the cluster. Moreover, by arbitrating each node individually, the node arbitration results of all nodes in the cluster can be obtained at the same time, so that multiple nodes can be processed simultaneously while nodes that do not require processing are skipped.
Illustratively, the metadata information may be the DRC described in the embodiments below.
Illustratively, the database information of a node includes, but is not limited to, logs and transactions.
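The first aspect can be pictured with the following minimal sketch (Python; every class and function name here is an illustrative assumption, not the patent's API): per-node arbitration policies and a single cluster policy are computed first, and the failure recovery operation then acts on both.

```python
# Hedged sketch of the first-aspect flow; every name here is illustrative.
from dataclasses import dataclass

@dataclass
class NodePolicy:
    process_logs_and_transactions: bool  # node arbitration policy

@dataclass
class ClusterPolicy:
    relocate_metadata: bool              # e.g., whether the DRC must be moved

def execute_recovery(node_policies: dict[str, NodePolicy],
                     cluster_policy: ClusterPolicy) -> None:
    """Perform the failure recovery operation decided by the two policies."""
    for node_id, policy in node_policies.items():
        if policy.process_logs_and_transactions:
            print(f"process logs and transactions of {node_id}")
    if cluster_policy.relocate_metadata:
        print("relocate metadata (DRC) across the cluster")

execute_recovery({"node2": NodePolicy(True), "node3": NodePolicy(False)},
                 ClusterPolicy(relocate_metadata=False))
```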
In one possible implementation manner, before the first node obtains the change in online state and the working state of each node in the database cluster, the method further includes: the first node negotiates with the other nodes in the database cluster to determine that the first node is the organizer and the other nodes in the database cluster are participants. In this way, the nodes in the cluster can select the master control device, i.e., the organizer, through negotiation, and the remaining nodes act as participants.
In a possible implementation manner, the first node obtains the change in online state and the working state of each node in the database cluster as follows: the first node obtains a stable node snapshot from the shared storage of the database cluster, where the stable node snapshot includes identification information of at least one node that was online in the database cluster before the last failure recovery operation was executed; the first node obtains a real-time online node snapshot, which includes identification information of at least one node currently online in the database cluster; and the first node determines the change in the online state of each node based on the stable node snapshot and the real-time online node snapshot. In this way, the node can select the most suitable recovery procedure by comparing the latest stable node snapshot stored in the cluster with the real-time online node snapshot, combined with the states of the online nodes.
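As a hedged illustration of this comparison (Python; classify_nodes and the set names are assumed for the example), the stable node snapshot can be diffed against the real-time online node snapshot to find removed nodes and candidate new nodes, while nodes present in both are further distinguished by their working state:

```python
# Illustrative only: diff the stable snapshot (state before the last recovery)
# against the real-time online snapshot (current cluster membership).

def classify_nodes(stable: set[str], online: set[str]) -> dict[str, set[str]]:
    return {
        "removed":       stable - online,   # were in the cluster, now gone
        "possibly_new":  online - stable,   # joined since the last recovery
        "still_present": online & stable,   # restarted or stable; decided by working state
    }

print(classify_nodes({"n1", "n2", "n3"}, {"n1", "n2", "n4"}))
# e.g. {'removed': {'n3'}, 'possibly_new': {'n4'}, 'still_present': {'n1', 'n2'}}
```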
In one possible implementation manner, the first node determines the node arbitration policy for each node based on that node's change in online state and working state as follows: if the change in a node's online state indicates that the node is a restarted node and its working state is the target state, the arbitration policy for that node is that its logs and transactions need to be processed in the failure recovery operation; if the change in a node's online state indicates that the node is a removed node, the arbitration policy for that node is that its logs and transactions need to be processed in the failure recovery operation. Thus, in the embodiment of the present application, the first node can determine a corresponding node arbitration result based on the state of each node, so that it can skip processing in the recovery flow or perform corresponding processing on at least one node at the same time.
In one possible implementation, the first node determines the cluster policy for the database cluster based on the change in the number of nodes as follows: if the change indicates that the number of nodes in the database cluster is unchanged, the cluster policy is that the metadata information in the database cluster does not need to be moved. In this way, when the number of nodes in the cluster is unchanged, no metadata information needs to be moved, which reduces unnecessary network consumption and improves resource utilization.
In one possible implementation, the first node determines the cluster policy for the database cluster based on the change in the number of nodes as follows: if the change indicates that the number of nodes in the database cluster has changed, the cluster policy is that the metadata information in the database cluster needs to be moved. In this way, metadata in the cluster is moved only when the number of nodes changes, which effectively reduces network consumption.
In a second aspect, embodiments of the present application provide a control apparatus. The apparatus includes: one or more processors; a memory; and one or more computer programs stored on the memory which, when executed by the one or more processors, cause the apparatus to perform the following steps: obtaining the change in online state and the working state of each node in the database cluster; determining a node arbitration policy for each node based on that node's change in online state and working state, where the node arbitration policy indicates whether the database information of a node in the database cluster needs to be processed; obtaining the change in the number of nodes in the database cluster; determining a cluster policy for the database cluster based on the change in the number of nodes, where the cluster policy indicates whether the metadata information in the database cluster needs to be moved; and executing the corresponding failure recovery operation based on the node arbitration policies and the cluster policy.
In one possible implementation, the computer program, when executed by the one or more processors, causes the apparatus to perform the following step: negotiating with the other nodes in the database cluster to determine that this node is the organizer, with the other nodes in the database cluster as participants.
In one possible implementation, the computer program, when executed by the one or more processors, causes the apparatus to perform the following steps: obtaining a stable node snapshot from the shared storage of the database cluster, where the stable node snapshot includes identification information of at least one node that was online in the database cluster before the last failure recovery operation was executed; obtaining a real-time online node snapshot, which includes identification information of at least one node currently online in the database cluster; and determining the change in the online state of each node based on the stable node snapshot and the real-time online node snapshot.
In one possible implementation, the computer program, when executed by the one or more processors, causes the apparatus to perform the following steps: if the change in a node's online state indicates that the node is a restarted node and its working state is the target state, determining that the arbitration policy for that node is that its logs and transactions need to be processed in the failure recovery operation; if the change in a node's online state indicates that the node is a removed node, determining that the arbitration policy for that node is that its logs and transactions need to be processed in the failure recovery operation.
In one possible implementation, the computer program, when executed by the one or more processors, causes the apparatus to perform the following step: if the change in the number of nodes indicates that the number of nodes in the database cluster is unchanged, determining that the cluster policy is that the metadata information in the database cluster does not need to be moved.
In one possible implementation, the computer program, when executed by the one or more processors, causes the apparatus to perform the following step: if the change in the number of nodes indicates that the number of nodes in the database cluster has changed, determining that the cluster policy is that the metadata information in the database cluster needs to be moved.
In a third aspect, embodiments of the present application provide a computer-readable medium storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, embodiments of the present application provide a chip that includes a processing circuit and transceiver pins. The transceiver pins and the processing circuit communicate with each other via an internal connection path, and the processing circuit performs the method of the first aspect or any possible implementation of the first aspect to control the receiving pin to receive signals and the transmitting pin to transmit signals.
Drawings
FIG. 1 is a schematic diagram of an exemplary database cluster;
FIG. 2 is a schematic diagram of an exemplary server structure;
FIG. 3 is a schematic diagram of an exemplary database cluster;
FIGS. 4a and 4b are schematic diagrams of exemplary prior-art fault handling flows;
FIG. 5 is a schematic diagram of an exemplary database cluster;
FIG. 6 is a schematic diagram of an exemplary database;
FIG. 7 is a schematic flowchart of an exemplary control method;
FIG. 8 is a schematic diagram of an exemplary application scenario;
FIG. 9 is a schematic diagram of an exemplary application scenario;
FIG. 10 is a schematic diagram of an exemplary application scenario;
FIG. 11 is a schematic diagram of an exemplary apparatus structure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
The terms first and second and the like in the description and in the claims of embodiments of the present application are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
Before describing the technical solutions of the embodiments of the present application, a storage system of the embodiments of the present application is first described with reference to the accompanying drawings. Referring to FIG. 1, a schematic diagram of a storage system is provided in an embodiment of the present application. The storage system includes one or more servers, for example, server 1, server 2, and so on. The number of servers can be set according to actual requirements, which is not limited in this application.
Fig. 2 is a schematic diagram illustrating the structure of an exemplary server. Referring to fig. 2, a Database (DB) application (or Database software, hereinafter, simply referred to as Database) is deployed in the server. The database is a collection of files (including data files, temporary files, redo log files, control files, etc.) stored on a physical disk or file system. A database instance is a set of operating system processes (or a multi-threaded process) and some memory. The database may be operated by the database instance, and access to or modification of the database is typically accomplished by the database instance.
The storage system shown in FIG. 1 may also be referred to as a shared-storage database cluster or a shared-storage cluster (hereinafter referred to as the cluster). As shown in FIG. 2, each server in the cluster includes storage, which comprises shared storage and may also comprise local storage. For example, in order to allow multiple instances to access and modify data simultaneously, in the embodiment of the present application the database in each server stores its data files, control files, and log files on the shared storage. As shown in FIG. 3, the shared storage of the servers in the cluster (including shared storage 1 of server 1, shared storage 2 of server 2, …, and shared storage n of server n) forms a shared storage array (which may also be referred to as a shared storage disk array or shared storage cluster; this is not limited in this application and is hereinafter referred to simply as shared storage).
For example, the local storage in the server may be used to save a configuration file, a local archive log, a remote archive log, and the like, and may be set according to actual requirements, which is not limited in this application.
In the embodiment of the present application, the communication network of the storage system may be divided into an internal network and an external network. The internal network is used for exchanging information and data between the servers; the external (public) network is used to provide database services externally, and a user can log in to the shared-storage database cluster through the public network address to access the database.
Illustratively, in the prior-art embodiment, the cluster includes a cluster control device (or software) for monitoring the operating states of the nodes (i.e., databases) in the cluster. FIGS. 4a and 4b show an exemplary prior-art fault handling flow. Referring to FIG. 4a, an exemplary cluster includes DB1, DB2, and DB3, where DB1 manages DRC1 and DRC2, DB2 manages DRC3 and DRC4, and DB3 manages DRC5 and DRC6. Referring to FIG. 4b, if DB3 becomes abnormal and restarts, the cluster control device detects the abnormality of DB3 and immediately executes the fault handling procedure. Specifically, the cluster control device reallocates the DRC: for example, DB1 manages DRC1 to DRC3 and DB2 manages DRC4 to DRC6. After DB3 restarts successfully, the cluster control device detects that DB3 has recovered and reassigns the DRC management again, for example, restoring the correspondence shown in FIG. 4a. In the transitions of FIGS. 4a to 4b and 4b to 4a, the DBs communicate with one another over the network to confirm which DRCs each manages, which increases network resource consumption. In addition, in the prior art, the cluster control device cannot process multiple node-join events at the same time, nor can it process node-failure events and node-join events that occur simultaneously: the cluster control device performs the corresponding processing immediately after detecting a node event (a failure or a new node joining), and processes the next detected node event only after the current processing is completed. Illustratively, the node events referred to in this application are, optionally, changes in node information in the cluster, including the failure and restart of a node in the cluster, the addition of a new node to the cluster, the removal of an existing node from the cluster, and so on.
The embodiment of the application provides a control method which supports a cluster to process a plurality of events simultaneously in one processing flow so as to reduce fault recovery time. In addition, the method effectively reduces the movement of DRC resources in the recovery process so as to save network resources.
FIG. 5 is a schematic diagram of an exemplary cluster configuration. Referring to FIG. 5, the cluster includes server 1, server 2, and server 3; the number of servers is merely illustrative and is not limited in this application. Server 1 includes, but is not limited to, cluster management system (CM) software 1 (hereinafter referred to as CM1) and DB1; server 2 includes, but is not limited to, CM2 and DB2; and server 3 includes, but is not limited to, CM3 and DB3. It should be noted that each server further includes other modules or interfaces, which are not limited in this application.
For example, the CM may be used to maintain the nodes within the cluster and to provide the real-time online node snapshot. Maintaining the nodes within the cluster means, optionally, that the CM can query the nodes currently in the cluster, for example, obtain the identification information of the online nodes.
In an embodiment of the present application, a node snapshot includes: real-time online node snapshots and stable node snapshots.
Real-time online node snapshot: the collection of node identification information stored on the CM may be understood as the nodes that the CM considers should be in the cluster. For example, when nodes are added or removed, the real-time online node snapshot information may change.
Stable node snapshot: the set of node identification information stored on the shared storage; it can be understood as the nodes that could normally provide services externally after the last processing flow was completed. For example, after the failure handling of a removed node is finished, the removed node is also deleted from the stable node snapshot, and the new stable node snapshot is saved to the shared storage.
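A minimal sketch of this bookkeeping, under the assumption that the snapshot is stored as a simple file on the shared storage (Python; the path and function names are placeholders, not the actual storage format): after a recovery flow finishes, the organizer overwrites the stable node snapshot with the current real-time online node snapshot, so removed nodes disappear from it.

```python
# Illustrative sketch: persist the new stable node snapshot to shared storage
# after a recovery flow finishes. The path below is a placeholder, not a real
# layout used by the patent.
import json
from pathlib import Path

STABLE_SNAPSHOT_PATH = Path("/shared_storage/stable_node_snapshot.json")  # assumed

def save_stable_snapshot(online_nodes: set[str]) -> None:
    """Overwrite the previous stable snapshot with the current online nodes."""
    STABLE_SNAPSHOT_PATH.write_text(json.dumps(sorted(online_nodes)))

def load_stable_snapshot() -> set[str]:
    """Read the stable snapshot saved by the previous recovery flow."""
    if not STABLE_SNAPSHOT_PATH.exists():
        return set()
    return set(json.loads(STABLE_SNAPSHOT_PATH.read_text()))
```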
As shown in fig. 5, communication between servers and the shared storage array may be performed through an internal network. For example, CMs between servers may communicate with each other, and DBs between servers may also communicate with each other. Also, the CM or DB in each server may also communicate with a shared storage array. By way of example, the communication protocols between different modules or software may be the same or different, and the present application is not limited.
Fig. 6 is a schematic diagram of an exemplary database. Referring to fig. 6, exemplary databases include, but are not limited to: listening threads, working thread pools, system buffers, background threads, DSS (Distributed Storage Service, distributed storage services), DMS (Distribute Memory Service, distributed memory services), and the like. Exemplary DMS include, but are not limited to: the system comprises a page interaction protocol module, a DRC management module and a fault recovery module.
Illustratively, the DSS may provide shared storage access capability, and the database may access the shared storage through the DSS.
Illustratively, a DMS may be used to manage, control, and maintain the status and information of nodes (i.e., databases) in a cluster. Illustratively, a page exchange protocol module in the DMS may be used to manage page usage of the nodes (i.e., databases). DRC management in DMS may be used to maintain DRC information for a node. The fault recovery module may be used to handle node events.
Illustratively, the fault recovery module may further include, but is not limited to: a lock-grabbing thread (module), an arbitration thread (module), and a recovery thread (module).
Illustratively, the lock-grabbing module is configured to communicate with the CM to obtain a distributed lock. In the embodiment of the application, every node in the cluster takes part in lock grabbing: the node that grabs the lock may be called the organizer, and the nodes that do not grab the lock may be called participants. In the embodiment of the application, the lock-grabbing modules of the nodes may communicate with one another to acquire the lock in a grabbing, i.e., preemptive, manner. In other embodiments, the lock-grabbing modules of the nodes may instead select the organizer by voting, and the elected organizer acquires the lock. The specific mode can be set according to actual requirements, which is not limited in this application. In this embodiment, the lock-grabbing process may be executed after the cluster initialization is completed, and the node that grabs the lock serves as the organizer in subsequent flows until it exits the cluster, after which the other nodes in the cluster re-execute the lock-grabbing procedure.
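The preemptive election can be pictured with the sketch below (Python; the DistributedLock class is a toy stand-in for whatever lock primitive the CM actually exposes, which the patent does not specify): every node races to acquire the distributed lock, the winner acts as the organizer, and the others become participants.

```python
# Illustrative only: preemptive organizer election via a distributed lock.
import threading

class DistributedLock:
    """Toy stand-in for a CM-provided distributed lock (first caller wins)."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.owner = None

    def try_acquire(self, node_id: str) -> bool:
        with self._lock:
            if self.owner is None:
                self.owner = node_id
                return True
            return False

def elect_roles(nodes: list[str], lock: DistributedLock) -> dict[str, str]:
    """The node that grabs the lock becomes the organizer; the rest are participants."""
    return {n: ("organizer" if lock.try_acquire(n) else "participant") for n in nodes}

print(elect_roles(["node1", "node2", "node3"], DistributedLock()))
# node1 grabs the lock first here, so it becomes the organizer.
```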
Illustratively, the arbitration module is configured to arbitrate (or determine) whether a failure recovery procedure needs to be initiated according to the stable node snapshot, the real-time online node snapshot, and the online node status. In one example, if the arbitration module determines that a fault recovery procedure needs to be initiated, a determination is made as to what type of fault recovery to initiate.
Illustratively, the fault recovery module is configured to execute a fault recovery procedure based on an arbitration result of the arbitration module.
Referring to fig. 5, fig. 7 is a flow chart illustrating an exemplary control method. Referring to fig. 7, specific examples include, but are not limited to:
S701, node 1 requests a stable node snapshot from the shared storage.
S702, the shared storage sends a stable node snapshot to the node 1.
In the embodiment of the present application, the node 1 is taken as an organizer for illustration. The specific determination manner of the organizer can be referred to above, and will not be described herein. In this embodiment of the present application, a node may be understood as a database, and may also be understood as a server where the database is located.
Illustratively, the arbitration module in node 1 sends stable node snapshot request information to the shared storage, where the request information is used to request the latest stable node snapshot stored in the shared storage. It can also be understood that the latest stored stable node snapshot in the shared storage is the real-time online node snapshot obtained when the control method flow is executed last time. That is, after each control flow execution is finished, the organizer (e.g., node 1) may save the real-time online node snapshot obtained in the current flow into the shared storage as a stable node snapshot.
Illustratively, the shared store sends the most recently saved stable node snapshot to the arbitration module of node 1 in response to the received operation. Alternatively, each new stable node snapshot stored in the shared storage may overwrite the previous stable node snapshot. Optionally, the shared storage may also store the stable node snapshot obtained each time, which is not limited in this application.
S703, node 1 requests a real-time online node snapshot from the CM.
S704, the CM sends the real-time online node snapshot to node 1.
Illustratively, the arbitration module of node 1 sends real-time online node snapshot request information to the CM (i.e., CM1 in node 1) for requesting real-time online node snapshots from the CM.
Illustratively, the CM obtains a snapshot of real-time online nodes in the cluster in response to the received real-time online node snapshot request information.
In this embodiment of the present application, as described above, CMs in each node (i.e., server) communicate with each other, and each time a new node is added or a node is deleted in a cluster, the CM of each node obtains a cluster change. The specific manner in which the CM obtains the node changes in the cluster may refer to the prior art embodiments, which are not limited in this application. It is understood that the CM records identification information of each node currently on-line in the cluster. Correspondingly, the CM acquires a real-time online node snapshot, wherein the snapshot comprises identification information of each node currently online in the cluster. For example, if a newly added node requests to join the cluster, the newly added node may send join indication information for indicating that the node will join the cluster, and the CM of each node will receive the information and confirm that the cluster newly joins a node.
In the embodiment of the application, a real-time online node is a node that the cluster management system considers to be in the cluster, regardless of the state of the node. That is, as described above, the CM of each node in the cluster maintains the identification information of the real-time online nodes according to the node changes in the cluster. If a node restarts abnormally, the CM corresponding to the abnormal node detects the abnormality and restarts the node (which can also be understood as the CM controlling the initialization of that node's database). Alternatively, if the abnormal node still fails to start after the threshold number of attempts or the predetermined time period is exceeded, the CM removes the node from the cluster. A node removed from the cluster is no longer a real-time online node, i.e., it no longer belongs to the cluster. It can be understood that a node that has joined the cluster is considered by the CM to remain in the cluster until it is removed from the cluster. In the embodiment of the present application, the abnormal situation of a node may also be referred to as the node dropping offline; that is, although the node has dropped, it is still in the cluster.
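A hedged sketch of this CM behavior (Python; MAX_RESTART_ATTEMPTS and the function names are assumptions): the CM keeps an abnormal node in the real-time online set while it retries the restart, and removes the node only after the retry threshold is exceeded without a successful start.

```python
# Illustrative only: CM handling of an abnormal (dropped) node.
from typing import Callable

MAX_RESTART_ATTEMPTS = 3  # assumed threshold

def handle_abnormal_node(node_id: str,
                         online_nodes: set[str],
                         try_restart: Callable[[str], bool]) -> None:
    """Retry restarting the node; remove it from the cluster only on repeated failure."""
    for _ in range(MAX_RESTART_ATTEMPTS):
        if try_restart(node_id):
            return  # node restarted: it stays in the real-time online snapshot
    online_nodes.discard(node_id)  # give up: node is no longer a real-time online node
```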
S705, the node 1 requests the operation state from other nodes.
Illustratively, node 1, as the organizer, sends working-state request information to each node (i.e., database) in the cluster through the internal network to request the working state of each node.
S706, nodes 2 to n send their working states to node 1.
Illustratively, nodes 2 to n, in response to the received working-state request information, send the working states of their databases to node 1.
In the embodiment of the present application, the working states of a node include, but are not limited to: the OUT (offline) state, the JOIN (joining) state, the REFORM (recovering) state, and the IN (joined) state.
Illustratively, the OUT state means that the node has not been started, or has not yet been initialized to the target state; the target state is the JOIN state. For example, when a node expects to join the cluster, it sends join indication information to the other nodes in the cluster to indicate that it is joining. The other nodes, in response to the received join indication information, determine that the node has joined the cluster, and accordingly the real-time online node snapshot includes the identification information of the node. Optionally, the node then executes a corresponding initialization procedure; the specific procedure may refer to the prior art and is not limited in this application. Before the node completes initialization, it is in the OUT state.
Illustratively, the JOIN state means that the node has completed initialization and is waiting for the REFORM, i.e., waiting for the failure recovery procedure to be executed. It can be understood that after a node joins the cluster and completes initialization, the cluster needs to perform a corresponding REFORM for the node, i.e., a failure recovery procedure; for example, the cluster may need to reallocate the DRC or perform other operations (the specific operations may refer to the prior art and are not detailed in this application), so that the node joining the cluster becomes a stable node. It can be understood that, in the embodiment of the present application, a stable node is a node that has joined the cluster and is in a normal state, i.e., it can process database transactions normally.
Illustratively, the REFORM state is the state in which the node is executing the failure recovery procedure. In the embodiment of the present application, both restarted nodes and newly added nodes need to execute the REFORM procedure, i.e., the failure recovery procedure, after which they can be considered to have joined the cluster and to be in a normal state.
Illustratively, the IN state means that the node has finished executing the failure recovery procedure, has joined the cluster, and is in a normal state, i.e., it can process database transactions normally. "Normally joining the cluster" as described in the embodiments of the present application may include the joining of a new node, and may also include the above scenario in which a node restarts after a failure: the node has been in the cluster all along, but the node itself failed; after it restarts and executes the failure recovery procedure, it is considered to have normally joined the cluster.
In one possible implementation, an offline node, i.e., a node in the OUT state, does not receive the request information because it is offline. As described above, the nodes in the cluster can communicate with one another, so if any node goes offline, every node learns that the node has gone offline.
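The four working states and their typical order can be summarized with a small sketch (Python; the enum and the transition map are illustrative, not the patent's data structures):

```python
# Illustrative summary of the node working states and their order.
from enum import Enum, auto

class NodeState(Enum):
    OUT = auto()     # not started, or not yet initialized to the target state
    JOIN = auto()    # initialization finished; waiting for the recovery (REFORM) flow
    REFORM = auto()  # currently executing the failure recovery flow
    IN = auto()      # recovery finished; stable node, transactions processed normally

# Typical progression for a restarted or newly added node.
TRANSITIONS = {
    NodeState.OUT: NodeState.JOIN,
    NodeState.JOIN: NodeState.REFORM,
    NodeState.REFORM: NodeState.IN,
}
```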
And S707, the node 1 arbitrates based on the stable node snapshot, the real-time online node snapshot and the node working state, and obtains a node arbitration result.
Illustratively, Table 1 is an exemplary set of arbitration rules, specifically as follows:
TABLE 1
| In real-time online node snapshot | In stable node snapshot | Working state | Node arbitration result |
| Yes | Yes | OUT | Do not process; end the current flow |
| Yes | Yes | JOIN | Process the node's logs and transactions |
| Yes | Yes | REFORM | Do not process; end the current flow |
| Yes | Yes | IN | Do not process |
| Yes | No | OUT | Do not process; end the current flow |
| Yes | No | JOIN | Process the joining of the new node |
| Yes | No | REFORM | Do not process; end the current flow |
| No | Yes | (any) | Process the removed node's logs and transactions |
| No | No | — | This combination does not exist |
Referring to Table 1, it exemplarily shows the node arbitration results corresponding to the different combinations. The details of Table 1 are described below:
Classification 1: the node is in the real-time online node snapshot, the node is in the stable node snapshot, and the state of the node is OUT.
Corresponding to the scene: the nodes are in the cluster, i.e., in the node snapshot in real-time. And the node is in the cluster before the last fault recovery flow is finished in the stable node snapshot. The state of the node is OUT, i.e. offline, or the node is restarted and the target state (JOIN) is not reached.
Arbitration result: do not process; end the current flow.
That is, if, in the current flow, node 1 detects that an old node (i.e., a node already in the cluster) has restarted but has not reached the target state, the current flow is ended and S701 is executed again.
Classification 2: the node is in a real-time online node snapshot, the node is in a stable node snapshot, and the state of the node is JOIN.
Corresponding to the scene: the node is in the cluster, i.e., in the real-time online node snapshot, and the node was in the cluster before the last fault recovery flow finished, i.e., in the stable node snapshot. The state of the node is JOIN, i.e., the target state: the node has restarted, completed initialization, entered the target state (JOIN), and is waiting for the fault recovery procedure to be executed.
Arbitration result: process the node's logs and transactions.
In this example, if node 1 detects that an old node has restarted and has initialized to the target state (JOIN state), the corresponding node arbitration result is determined as: process the node's logs and transactions. That is, when the policy is executed in S709, the logs and transactions of this class of node are processed accordingly.
Classification 3: the node is in the real-time online node snapshot, the node is in the stable node snapshot, and the state of the node is REFORM.
Corresponding to the scene: the node is in the cluster, i.e., in the real-time online node snapshot, and the node was in the cluster before the last fault recovery flow finished, i.e., in the stable node snapshot. The state of the node is REFORM, i.e., the node has restarted, completed initialization, entered the target state (JOIN), and is currently executing the failure recovery procedure.
Arbitration result: do not process; end the current flow.
For example, if the node 1 detects that there is an old node restart in the cluster and the node is executing the recovery process, the current process is ended and S701 is executed again.
Classification 4: the node is IN a real-time online node snapshot, the node is IN a stable node snapshot, and the state of the node is IN.
Corresponding to the scene: the nodes are in the cluster, i.e., in the node snapshot in real-time. And the node is in the cluster before the last fault recovery flow is finished in the stable node snapshot. The node is IN, i.e. is a stable node, and can be understood as a node IN a cluster and IN a normal state.
Arbitration result: do not process.
Illustratively, for stable nodes in the cluster, no processing needs to be performed on this class of nodes, and execution continues with S708.
Classification 5: the node is in the real-time online node snapshot, the node is not in the stable node snapshot, and the state of the node is OUT.
Corresponding to the scene: the nodes are in the cluster, i.e., in the node snapshot in real-time. However, the node is not in the stable node snapshot, i.e., the node is not in the cluster until the end of the last failure handling flow. That is, the node is a newly added node. And the state of the node is OUT, namely the newly added node and the target state (JOIN) is not reached.
Arbitration result: do not process; end the current flow.
For example, if the node 1 detects that there is a newly added node in the cluster (i.e., the node has joined the cluster) and the node does not complete initialization, i.e., does not reach the target state (JOIN state), the current flow is ended, and S701 is re-executed.
Classification 6: the node is in the real-time online node snapshot, the node is not in the stable node snapshot, and the state of the node is JOIN.
Corresponding to the scene: the nodes are in the cluster, i.e., in the node snapshot in real-time. However, the node is not in the stable node snapshot, i.e., the node is not in the cluster until the end of the last failure handling flow. That is, the node is a newly added node. And, the state of the node is JOIN, i.e., the newly added node and enters the target state (JOIN).
Arbitration result: process the joining of the new node.
Illustratively, if the node 1 detects that there is a new node in the cluster (i.e., the node has joined the cluster) and the node completes initialization, i.e., reaches the target state (JOIN state), then it determines that the corresponding node arbitration result is to handle the joining of the new node. That is, in S709, the cluster needs to perform a corresponding process of new node joining for the class of nodes, for example, needs to allocate DRC or the like for the new node.
Classification 7: the node is in the real-time online node snapshot, the node is not in the stable node snapshot, and the state of the node is REFORM.
Corresponding to the scene: the node is in the cluster, i.e., in the real-time online node snapshot, but the node is not in the stable node snapshot, i.e., the node was not in the cluster before the end of the last failure handling flow. That is, the node is a newly added node. The state of the node is REFORM, i.e., the new node has completed initialization, has entered the target state (JOIN), and is executing the failure recovery procedure.
Arbitration result: do not process; end the current flow.
For example, if the node 1 detects that there is a newly added node in the cluster, and the node is executing the recovery process (e.g., executing S709), the process ends and S701 is re-executed.
Classification 8: the node is not in the real-time online node snapshot, and the node is in the stable node snapshot.
Corresponding to the scene: the nodes are in the stable node snapshot, that is, the nodes are in the cluster before the last processing flow is finished. However, if the node is not in the real-time online snapshot, the node is not currently in the cluster, i.e., the removed node. For this type of node, it is not necessary to obtain the state of the node.
Arbitration result: process the removed node's logs and transactions.
For example, if node 1 detects that a removed node exists in the cluster, it determines that the corresponding node arbitration result is to process the logs and transactions of the removed node, for example, deleting the related logs, rolling back transactions, and so on.
Classification 9: this classification does not exist.
It should be noted that, in the embodiment of the present application, only a table is used as an example for illustration, and in other embodiments, the arbitration rule may be in other forms, which is not limited in the present application.
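The eight classifications amount to a lookup keyed on (in real-time online snapshot, in stable snapshot, working state). A hedged Python sketch follows; the enum values and the arbitrate_node name are assumptions, and the function simply encodes Table 1 rather than the patent's implementation:

```python
# Illustrative encoding of the Table 1 arbitration rules.
from enum import Enum, auto
from typing import Optional

class State(Enum):
    OUT = auto(); JOIN = auto(); REFORM = auto(); IN = auto()

class Result(Enum):
    NO_ACTION = auto()          # classification 4: stable node
    END_FLOW = auto()           # classifications 1, 3, 5, 7: wait and re-run from S701
    PROCESS_RESTARTED = auto()  # classification 2: replay logs / roll back transactions
    PROCESS_NEW_NODE = auto()   # classification 6: handle the join of a new node
    PROCESS_REMOVED = auto()    # classification 8: clean up the removed node

def arbitrate_node(in_online: bool, in_stable: bool, state: Optional[State]) -> Result:
    if not in_online and in_stable:
        return Result.PROCESS_REMOVED           # state not needed for removed nodes
    if in_online and in_stable:
        return {State.JOIN: Result.PROCESS_RESTARTED,
                State.IN: Result.NO_ACTION}.get(state, Result.END_FLOW)
    if in_online and not in_stable:
        return {State.JOIN: Result.PROCESS_NEW_NODE}.get(state, Result.END_FLOW)
    raise ValueError("classification 9: such a node does not exist")
```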
S708, node 1 arbitrates based on the change in the number of nodes in the cluster and obtains the cluster arbitration result.
Illustratively, after the node 1 obtains the node arbitration result as an organizer, it further detects whether there is a change between the number of nodes in the cluster and the number of nodes in the last failure flow. It should be noted that, as described above, in some cases, the arbitration result of the node may be to end the current flow, and for this type of scenario, S708 is not performed, but S701 is re-performed.
Illustratively, the node 1 may count whether the number of nodes in the cluster changes based on the number of nodes included in the real-time online node snapshot and the number of nodes included in the stable node snapshot. The change of the number of nodes can be understood as whether there is a node change between the current fault processing flow and the last fault processing flow.
For example, if the number of nodes included in the real-time online node snapshot is different from the number of nodes included in the stable node snapshot, it may be determined that the number of nodes is changed. For example, if the number of nodes included in the real-time online node snapshot is the same as the number of nodes included in the stable node snapshot, it may be determined that the number of nodes is unchanged.
In the embodiment of the application, the cluster may include old node restart, new node joining and/or node removing, etc., and there may be several scenarios as follows:
1) Only node restarts exist.
Illustratively, after node 1 determines, based on the arbitration results, that the cluster contains only one or more restarted nodes, node 1 further checks whether the number of nodes in the cluster has changed.
Illustratively, in a scenario where there is only a node reboot, the number of nodes in the cluster is unchanged. Accordingly, the node 1 may determine that the number of nodes is unchanged based on the number of nodes included in the real-time online node snapshot and the number of nodes included in the stable node snapshot.
In the embodiment of the application, if only the nodes are restarted and the number of the nodes is unchanged, the corresponding strategy is that the distributed metadata information is not required to be moved.
In the embodiment of the present application, metadata information relocation includes, but is not limited to, DRC reallocation, and the specific flow may be referred to above, and will not be described herein.
2) There is only node removal.
Illustratively, node 1 (i.e., the organizer, not described in detail below) determines that only removed nodes are present in the cluster based on the arbitration results of the respective nodes. Node 1 detects whether a number of nodes in the cluster has changed.
Illustratively, the nodes have been removed and the number of nodes in the cluster will change accordingly. Accordingly, the node 1 detects that the number of nodes in the cluster changes, and the detection method may refer to the above, which is not described herein. In the case that the cluster only includes node removal and the number of cluster nodes changes, the corresponding policy is to need to relocate the distributed metadata information.
3) Only new node joins exist.
Illustratively, node 1 determines that only newly added nodes exist in the cluster based on the arbitration results for each node. Node 1 detects whether a number of nodes in the cluster has changed.
Illustratively, if there are newly added nodes in the cluster, the number of nodes in the cluster changes. Accordingly, the node 1 detects that the number of nodes in the cluster changes, and the detection method may refer to the above, which is not described herein. Under the condition that the cluster only comprises the new addition of nodes and the number of the cluster nodes is changed, the corresponding strategy is that the distributed metadata information needs to be moved.
4) There is both a node reboot and a node removal.
Illustratively, node 1 determines that there is both a node reboot and a node remove in the cluster based on the arbitration results of the nodes. Node 1 detects whether a number of nodes in the cluster has changed.
Illustratively, if there is a node removal in the cluster and no new node is added, the number of nodes in the cluster may change. Accordingly, the node 1 detects that the number of nodes in the cluster changes, and the detection method may refer to the above, which is not described herein. Under the condition that the cluster has the nodes restarted and removed at the same time and the number of the cluster nodes is changed, the corresponding strategy is to need to carry out relocation on the distributed metadata information.
5) There is both a node restart and a node addition.
Illustratively, node 1 determines that there is both a node restart and a node addition in the cluster based on the arbitration results of the nodes. Node 1 detects whether a number of nodes in the cluster has changed.
For example, if there is a new node in the cluster and no node is removed, the number of nodes in the cluster may change. Accordingly, the node 1 detects that the number of nodes in the cluster changes, and the detection method may refer to the above, which is not described herein. Under the condition that the cluster has node restarting and node newly increasing and the number of the cluster nodes is changed, the corresponding strategy is to need to carry out relocation on the distributed metadata information.
6) Node removal and node addition exist simultaneously.
Illustratively, node 1 determines that there is both node removal and node addition in the cluster based on the arbitration results for each node. Node 1 detects whether a number of nodes in the cluster has changed.
In one example, if the node 1 detects that the number of nodes in the cluster changes, the corresponding policy is that the distributed metadata information needs to be migrated.
In another example, if the node 1 detects that the number of nodes in the cluster is unchanged, that is, the number of removed nodes is equal to the number of newly added nodes, the corresponding policy is that the distributed metadata information needs to be migrated.
7) Node removal, node addition, and node restart exist at the same time.
Illustratively, node 1 determines that there is a node removal, a node addition, and a node restart in the cluster at the same time based on the arbitration results of the nodes. Node 1 detects whether a number of nodes in the cluster has changed.
In one example, if the node 1 detects that the number of nodes in the cluster changes, the corresponding policy is that the distributed metadata information needs to be migrated.
In another example, if the node 1 detects that the number of nodes in the cluster is unchanged, that is, the number of removed nodes is equal to the number of newly added nodes, the corresponding policy is that the distributed metadata information needs to be migrated.
That is, in S708 in the embodiment of the present application, if the node 1 detects that the number of nodes in the cluster has changed from that in the last flow, it is determined that the corresponding arbitration result is to perform DRC relocation. Illustratively, if node 1 detects that the number of nodes in the cluster has not changed, then it is determined that the corresponding arbitration result is that DRC relocation need not be performed.
In this embodiment, the DRC relocation refers to reallocating DRCs in the cluster, for example, as shown in fig. 4a to fig. 4b, that is, reallocating DRCs corresponding to DBs in the cluster, and the specific reallocating method may refer to the prior art embodiment, which is not limited in this application.
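For the cluster-level decision in S708, a minimal sketch (Python; names are assumptions) compares the stable and real-time node sets: a membership change (which covers both a changed node count and the equal removal-plus-addition case of scenarios 6 and 7) triggers DRC relocation, while the restart-only case of scenario 1 does not.

```python
# Illustrative sketch of the S708 cluster arbitration: decide whether the DRC
# (distributed metadata) must be relocated.

def need_drc_relocation(stable_nodes: set[str], online_nodes: set[str]) -> bool:
    """Restart-only changes leave the membership identical, so no relocation."""
    return stable_nodes != online_nodes

print(need_drc_relocation({"n1", "n2", "n3"}, {"n1", "n2", "n3"}))  # False: restarts only
print(need_drc_relocation({"n1", "n2", "n3"}, {"n1", "n2"}))        # True: node removed
print(need_drc_relocation({"n1", "n2", "n3"}, {"n1", "n2", "n4"}))  # True: swap, count unchanged
```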
In the embodiment of the present application, node 1 executes the flow in FIG. 7 in a polling manner; that is, after node 1 finishes executing the flow, execution restarts from S701, so that cluster abnormalities are handled in time.
In the embodiment of the present application, cluster abnormalities may include: the restart of an online node in the cluster, the joining (or addition) of an offline node, and the like. These can all be regarded as abnormal phenomena of the cluster and can also be understood as changes of the cluster state.
S709, the node 1 executes the corresponding policy based on the node arbitration result and the cluster arbitration result.
Illustratively, node 1 performs the corresponding processing on the nodes based on the node arbitration results obtained in S707 and the cluster arbitration result obtained in S708. Note that, as described above, if a node arbitration result is to end the current flow, S708 and S709 do not need to be executed, i.e., S701 is re-executed.
For example, suppose node 1 obtains the arbitration result of node 2 as "process the node's logs and transactions", and the cluster arbitration result is that DRC relocation is required. Accordingly, the cluster re-processes the logs and transactions of node 2, for example by log replay and transaction rollback; the organizer, node 2 itself, and/or any other node may participate in this processing, which is not limited in this application, and details of the specific processing may refer to the prior-art embodiments and are not described here. Further, based on the cluster arbitration result, node 1 determines that DRC relocation needs to be performed, so the cluster may reallocate the current DRC; the specific allocation procedure may refer to FIGS. 4a to 4b and is not described here. It should be noted that FIGS. 4a to 4b are only a schematic illustration of a DRC reassignment flow, and the DRC reassignment (i.e., relocation) flow in the embodiments of the present application may be any feasible manner in the prior-art embodiments, which is not limited in this application.
The flow in fig. 7 is described in detail below in several specific embodiments.
Scene one:
Taking the scenario shown in FIG. 5 as an example, in this scenario the cluster includes nodes 1 to 3.
For example, node 1 executes the flow in FIG. 7, and this execution is referred to as the first loop flow in this example. It should be noted that, in the embodiment of the present application, "first loop flow", "second loop flow", and so on are only used to distinguish different loop flows, and do not limit the number or the order of the loop flows. For example, node 1 may also execute one or more loop flows before the first loop flow, which is not limited in this application and is not repeated below.
Illustratively, node 1 performs S701 and S702 and obtains the stable node snapshot. For example, assume that in the previous flow the cluster included nodes 1 to 3; that is, the stable node snapshot obtained this time includes the identification information of nodes 1 to 3, indicating that the online nodes in the cluster in the previous flow were nodes 1 to 3.
Illustratively, the node 1 performs S703 and S704 as an organizer, and obtains a real-time online node snapshot. The real-time online node snapshot includes identification information of the nodes 1 to 3 to indicate that the current cluster includes the nodes 1 to 3.
Illustratively, node 1 executes S705 and S706 and obtains the working states of the nodes (including nodes 1 to 3). The working states of nodes 1 to 3 are IN, that is, they are in the cluster and can process transactions normally (which can be understood as the devices being in a normal working state).
Illustratively, node 1 performs S707 and, based on the arbitration rules in Table 1, determines that the node arbitration result of every node is "do not process", i.e., every node (including nodes 1 to 3) is a stable node and requires no processing.
Illustratively, node 1 performs S708, node 1 obtains a change in the number of nodes in the cluster, in which example, node 1 detects that the number of nodes in the cluster is unchanged, and determines that the cluster arbitration result is: no DRC relocation is required.
Illustratively, node 1 performs S709, and node 1 determines the corresponding execution policy to do nothing based on the node arbitration result and the arbitration result in the cluster. Node 1 re-executes S701. It should be noted that details not described in the first to third scenes may refer to the relevant content in fig. 7, and the description is not repeated in this application.
Illustratively, the present flow ends. In this example, node 1 re-executes S701 to S709; this is referred to as the second loop flow, and its execution is the same as above and is not described again here. Fig. 8 is a schematic diagram of an exemplary application scenario. Referring to fig. 8, in this scenario the device of node 3 fails and the database process exits. CM3 detects that DB3 has exited abnormally, attempts to re-pull DB3, and succeeds, i.e., DB3 restarts. That is, in this scenario, node 3 remains in the cluster and reboots.
As described above, the states of the nodes include: the OUT state, the JOIN state, the REFORM state, and the IN state. Illustratively, after restarting, node 3 is in the OUT state, switches to the JOIN state after initialization is complete, and waits for a REFORM, i.e., a failure recovery procedure.
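As a minimal illustrative sketch only (Python is used purely for illustration; the enum and the transition list are assumptions drawn from the description above, not part of the claimed method), the node states and the typical path of a restarted node such as node 3 could be modeled as follows:

```python
from enum import Enum, auto

class NodeState(Enum):
    """Working states a node can report (names follow the description above)."""
    OUT = auto()     # database process not yet initialized (e.g., just restarted)
    JOIN = auto()    # initialization complete, waiting for failure recovery (REFORM)
    REFORM = auto()  # failure recovery in progress
    IN = auto()      # in the cluster, processing transactions normally

# Hypothetical path of a restarted node such as node 3:
RESTART_PATH = [NodeState.OUT, NodeState.JOIN, NodeState.REFORM, NodeState.IN]
```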
In the embodiments of the present application, the state of a node is asynchronous with the flow in fig. 7. For example, if node 1 re-executes S701 to S709 (the second loop flow) and node 3 becomes abnormal after node 1 has executed S705 and S706, node 1 still regards node 3 as a stable node in that flow. When node 1 executes S701 to S709 yet again, node 3 may already have entered the JOIN state before S706 is executed, in which case the node state acquired by node 1 is the JOIN state. Of course, in another embodiment, node 3 may still be in the OUT state before node 1 executes S706, in which case the node state acquired by node 1 is the OUT state; this is not limited in this application.
In this example, when node 1 executes S706 in the second loop flow, the state obtained for node 3 is the OUT state. Specifically, node 1 repeats S701 to S705; since node 3 is a rebooted node it is still in the cluster, and node 1 and node 2 are also still in the cluster. Correspondingly, the real-time online node snapshot includes nodes 1 to 3. Moreover, because nodes 1 to 3 were in the cluster during the previous flow, the stable node snapshot also includes nodes 1 to 3.
Illustratively, in S706, node 1 obtains the working state of node 3 as the OUT state.
Illustratively, node 1 executes S707 and determines that the node arbitration results of node 1 and node 2 are both that no processing is needed, while the arbitration result of node 3 is that no processing is needed and the current flow is ended. Accordingly, node 1 ends the present flow, i.e., does not continue with S708 and S709, and re-executes S701.
Node 1 then performs the third loop flow:
Suppose that node 3 has completed initialization, i.e., entered the target state (the JOIN state), before node 1 performs S706 in the third loop flow.
For example, node 1 repeats S701 to S705; the details may be referred to above and are not described here.
Illustratively, when node 1 executes S706, the working states obtained for node 1 and node 2 are IN, and the working state of node 3 is JOIN.
Illustratively, node 1 executes S707 and obtains that the node arbitration results of node 1 and node 2 are both that no processing is needed, and the node arbitration result of node 3 is to process that node's logs and transactions.
Illustratively, node 1 performs S708, detects that the number of nodes in the cluster has not changed, and determines that the cluster arbitration result is that DRC relocation is not required.
Illustratively, node 1 executes S709 and obtains the execution policy: node 1 and node 2 require no processing, the logs and transactions of node 3 need to be processed, and no DRC relocation is needed in the cluster. Accordingly, the cluster executes based on the acquired policy, i.e., performs no processing on node 1 and node 2 and performs the corresponding processing (such as transaction rollback) on the logs and transactions of node 3; the present application does not limit the specific processing.
In the embodiments of the present application, compared with the prior art shown in fig. 4a to 4b: in the prior art, when a node abnormality occurs, the cluster immediately performs corresponding processing. For example, when a node exits, the cluster performs DRC relocation, and after the node restarts the cluster performs DRC relocation again, so at least two DRC relocations are required in that process. In the embodiments of the present application, when a node is abnormal, the organizer obtains the state of the whole cluster based on the real-time online node snapshot, the stable node snapshot, and the working states of the nodes, instead of processing only a single node. As described in scenario one, in some specific scenarios that include a node reboot in the cluster (i.e., the number of nodes in the cluster is unchanged), the cluster does not need to perform DRC relocation, thereby reducing the network consumption and other resource consumption caused by DRC relocation and improving the overall processing efficiency of failure recovery.
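To make the arbitration just described easier to follow, the following is a minimal, hypothetical sketch (building on the NodeState enum sketched earlier; the function names, the policy strings, and the encoding of table 1 are illustrative assumptions, not the patented implementation):

```python
from dataclasses import dataclass

@dataclass
class NodeView:
    node_id: int
    in_stable_snapshot: bool    # was online when the last recovery flow finished
    in_realtime_snapshot: bool  # is online now
    state: NodeState            # OUT / JOIN / REFORM / IN (see the enum above)

def arbitrate_node(view: NodeView) -> str:
    """Illustrative stand-in for the per-node arbitration rules of table 1."""
    if view.in_stable_snapshot and not view.in_realtime_snapshot:
        return "process logs and transactions"            # removed node
    if view.in_stable_snapshot and view.in_realtime_snapshot:
        if view.state == NodeState.IN:
            return "no processing"                        # stable node
        if view.state == NodeState.JOIN:
            return "process logs and transactions"        # rebooted node, ready for recovery
        return "no processing; end current flow"          # OUT or REFORM: not ready yet
    # not in the stable snapshot but online now: newly added node
    if view.state == NodeState.JOIN:
        return "handle joining of new node"
    return "no processing; end current flow"              # new node still in the OUT state

def arbitrate_cluster(stable_count: int, realtime_count: int) -> str:
    """Cluster-level decision: relocate the DRC only if the node count changed."""
    return "no DRC relocation" if stable_count == realtime_count else "perform DRC relocation"
```

With this sketch, the second loop flow of scenario one (node 3 rebooted but still OUT) ends without cluster arbitration, while the third loop flow (node 3 in JOIN) yields log and transaction processing for node 3 and no DRC relocation for the cluster, matching the behavior described above.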
Scenario two:
Fig. 9 is a schematic diagram of an exemplary application scenario. Referring to fig. 9, in this scenario node 3 (i.e., server 3) is removed at some time after the previous failure processing flow ends. For example, the device of node 3 fails; after detecting the failure, CM3 attempts to pull DB3 up (i.e., triggers a database restart in node 3), and node 3 is removed from the cluster after several attempts fail. Also in this scenario, node 4, i.e., server 4, is newly joined; server 4 includes CM4 and database 4 (DB4).
In this example, node 1 performs the first loop flow in accordance with fig. 7. Node 1 obtains the stable node snapshot, the real-time online node snapshot, and the working states of the nodes; the specific obtaining manner may refer to the related content in fig. 7 and is not described here.
In this example, since node 3 was still in the cluster before the last failure processing flow ended, the stable node snapshot includes identification information of node 1, node 2, and node 3, indicating that those nodes were in the cluster before the last failure processing flow ended.
In this example, node 3 has been removed and node 4 is a newly added node. Correspondingly, the real-time online node snapshot includes node 1, node 2, and node 4, indicating that the current cluster includes node 1, node 2, and node 4. Optionally, before the current loop is executed, or before S703 is executed, node 4 has sent join indication information to each node in the cluster to indicate that it is joining the cluster. Accordingly, each node determines, in response to the received join indication information, that node 4 has joined the cluster.
Illustratively, node 1 obtains the states of all nodes in the cluster (node 1, node 2, and node 4); the specific acquisition manner may refer to the relevant content in fig. 7 and is not described here. Illustratively, the states of node 1 and node 2 are the IN state, and the state of node 4 is the JOIN state.
In one possible implementation, similar to the description in scenario one, since the working state of node 4 is asynchronous with the loop flow executed by node 1, the state acquired by node 1 for node 4 may also be the OUT state; this is not limited in this application. The handling when node 4 is in the OUT state is similar to scenario one and is omitted here.
Illustratively, node 1 performs S707 and arbitrates each node based on table 1: the arbitration results of node 1 and node 2 are both that no processing is needed, the arbitration result of node 3 is to process that node's logs and transactions, and the arbitration result of node 4 is to handle the joining of a new node.
Illustratively, node 1 performs S708 and checks the number of nodes in the cluster. In this example, the number of nodes in the cluster is unchanged (node 3 was removed and node 4 was added); accordingly, node 1 determines that the corresponding cluster arbitration result is that DRC relocation is not required.
Illustratively, node 1 executes S709 and obtains, based on the node arbitration results and the cluster arbitration result, an execution policy comprising: node 1 and node 2 require no processing, the related logs and transactions of node 3 need to be processed, and node 4 needs to perform the related processing for joining as a new node, for example inheriting the DRC left behind by node 3. Other content to be processed can refer to prior art embodiments; the scheme of this embodiment of the present application mainly describes the handling of the DRC in the cluster, and other node processing can refer to the related steps in prior art embodiments, which are not repeated here.
Thus, the method in the embodiments of the present application can process node removal and node addition simultaneously. Moreover, when the number of cluster nodes does not change, DRC relocation does not need to be performed, whereas in the prior art the cluster would reallocate the DRC after detecting that node 3 was removed and reallocate it again after the new node 4 joined. Compared with the prior art, the method and the device can effectively reduce the network consumption caused by DRC allocation, and by processing node removal and node addition simultaneously can effectively improve the processing efficiency of the failure recovery process.
After the current failure recovery flow (i.e., the loop flow) finishes, the CM saves the acquired real-time online node snapshot (i.e., the identification information of node 1, node 2, and node 4) into the shared storage, and this snapshot serves as the stable node snapshot corresponding to the current flow, for use in the next flow.
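As an illustrative sketch only (the helper names and the use of plain Python sets and a dict for the shared storage are assumptions, not the patented implementation), the snapshot comparison and the persistence step at the end of a recovery flow could look like this:

```python
def classify_nodes(stable: set[int], realtime: set[int]) -> dict[str, set[int]]:
    """Compare the stable snapshot with the real-time online snapshot."""
    return {
        "removed": stable - realtime,    # e.g., node 3 in this scenario
        "added": realtime - stable,      # e.g., node 4 in this scenario
        "retained": stable & realtime,   # stable or rebooted nodes
    }

def finish_recovery_flow(shared_storage: dict, realtime: set[int]) -> None:
    """Persist the real-time snapshot as the stable snapshot for the next flow."""
    shared_storage["stable_node_snapshot"] = set(realtime)

# Scenario two:
print(classify_nodes({1, 2, 3}, {1, 2, 4}))
# {'removed': {3}, 'added': {4}, 'retained': {1, 2}}
```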
Illustratively, node 1 may continue to perform the failure recovery flow of fig. 7; this execution is referred to as the second loop flow:
In the second loop flow, node 1 executes S701 and S702 and acquires the stable node snapshot. After the first loop flow finished, the stored stable node snapshot includes node 1, node 2, and node 4, indicating that the nodes online in the previous (i.e., first) loop flow were node 1, node 2, and node 4.
Illustratively, node 1 performs S703 and S704 and obtains a real-time online node snapshot; the snapshot includes node 1, node 2, and node 4, indicating that the currently online nodes are node 1, node 2, and node 4.
Illustratively, node 1 performs S705 and S706. Assume that in this example the current state of node 4 is the REFORM state, that is, the cluster is executing REFORM for node 4; it may also be understood that, after the first loop flow is over, the cluster is still processing the relevant transactions of node 4.
Illustratively, node 1 obtains the working states of node 1 and node 2 as IN and the state of node 4 as REFORM. Node 1 then executes S707 and determines that the node arbitration results of node 1 and node 2 are that no processing is needed, and the node arbitration result of node 4 is that no processing is needed and the current flow is ended, so the flow ends. Note that, in this example, since only the arbitration flow is performed and the recovery flow (i.e., S709) is not, the stable node snapshot does not need to be saved. That is, the stable node snapshot currently in storage is still the one saved by the last recovery flow.
Correspondingly, node 1 executes the third loop flow:
For example, node 1 performs S701 to S704; the details may be referred to above and are not described here. Node 1 then performs S705 and S706. Assume that node 4 has completed the REFORM procedure, i.e., it has successfully joined the cluster and its working state is normal; correspondingly, the current state of node 4 is the IN state. For example, the states of node 1, node 2, and node 4 obtained by node 1 are all IN; the subsequent steps may refer to the description in scenario one and are not repeated here.
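Putting the pieces above together, a hypothetical driver for one loop flow (S701 to S709) might look as follows; it reuses the NodeState, NodeView, arbitrate_node, arbitrate_cluster, and finish_recovery_flow sketches, and the cluster handle with its query and execute methods is purely an assumption, not an API of any real system. Note how a node in the OUT or REFORM state ends the flow before S708 and S709, so the stable node snapshot is not re-saved, as in the second loop flow of this scenario.

```python
def run_loop_flow(shared_storage: dict, cluster) -> None:
    """Illustrative driver for one loop flow (S701-S709); not the patented implementation."""
    stable = shared_storage.get("stable_node_snapshot", set())     # S701-S702
    realtime = cluster.query_online_nodes()                        # S703-S704
    states = {n: cluster.query_state(n) for n in realtime}         # S705-S706

    results = {}                                                   # S707: per-node arbitration
    for node in stable | realtime:
        view = NodeView(node, node in stable, node in realtime,
                        states.get(node, NodeState.OUT))
        results[node] = arbitrate_node(view)

    if any(r.endswith("end current flow") for r in results.values()):
        return  # skip S708/S709; the stable node snapshot keeps its previous value

    cluster_result = arbitrate_cluster(len(stable), len(realtime)) # S708
    cluster.execute(results, cluster_result)                       # S709: recovery operations
    finish_recovery_flow(shared_storage, realtime)                 # save the new stable snapshot
```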
Scenario three:
Fig. 10 is a schematic diagram of an exemplary application scenario. Referring to fig. 10, in this scenario the cluster includes nodes 1 to 3, all of which are stable nodes. Node 4 and node 5 are then newly added to the cluster, where node 4 includes, but is not limited to, CM4 and DB4, and node 5 includes, but is not limited to, CM5 and DB5.
In this example, node 1 performs the first loop flow in accordance with fig. 7. Node 1 obtains the stable node snapshot, the real-time online node snapshot, and the working states of the nodes; the specific obtaining manner may refer to the related content in fig. 7 and is not described here.
In this example, the stable node snapshot includes node 1, node 2, and node 3, and the real-time online snapshot includes node 1, node 2, node 3, node 4, and node 5.
Node 1 executes S705 and S706. In the embodiments of the present application, node 4 and node 5 are newly added nodes, and the states of their devices may be the same or different, depending on the performance of the nodes. For example, when node 4 and node 5 join the cluster simultaneously and node 1 performs S705 and S706, the working state fed back by node 4 may be the OUT state while the working state fed back by node 5 is the JOIN state. As another example, the working states fed back by node 4 and node 5 may both be OUT or both be JOIN; this is not limited in this application.
In this example, the state fed back by node 4 is taken to be the OUT state and the state fed back by node 5 to be the JOIN state.
For example, node 1 executes S707 and determines that the node arbitration results of nodes 1 to 3 are all that no processing is needed, the node arbitration result of node 4 is that no processing is needed and the current flow is ended, and the node arbitration result of node 5 is to handle the joining of a new node.
Illustratively, the current flow ends as a result of node 4's arbitration result.
Node 1 re-executes S701, i.e., enters the second loop flow:
Node 1 executes S701 to S704 and acquires the real-time online node snapshot, which includes nodes 1 to 5, and the stable node snapshot, which includes nodes 1 to 3. As described above, the stable node snapshot is saved after a recovery flow completes; since the recovery flow of S709 was not executed in the first loop flow, the currently saved stable node snapshot is still the one saved before the first loop flow. That is, the stable node snapshot acquired this time is the same as the one acquired in the first loop flow.
Illustratively, node 1 executes S705 and S706 to obtain the working states of all nodes (nodes 1 to 5) in the cluster; the specific acquisition manner may refer to the relevant content in fig. 7 and is not described here. Illustratively, node 1 obtains the following: the states of nodes 1 to 3 are IN, and the states of node 4 and node 5 are JOIN. That is, node 4 has entered the JOIN state before node 1 performs S705.
Node 1 performs S707 and, based on table 1, acquires the arbitration results of the nodes: the arbitration results of nodes 1 to 3 are that no processing is needed, and the arbitration results of node 4 and node 5 are to handle the joining of new nodes.
Illustratively, node 1 performs S708, detects that the number of nodes in the cluster has changed, and determines that the cluster arbitration result is that DRC relocation needs to be performed.
Illustratively, node 1 executes S709 and, based on the node arbitration results and the cluster arbitration result, determines the execution policy as: nodes 1 to 3 require no processing, node 4 and node 5 need to perform the joining procedure for new nodes, and the cluster needs to perform DRC relocation. Accordingly, the cluster reassigns the DRC among its nodes so that node 4 and node 5 also manage part of the DRC. For example, if the cluster includes 120 DRC entries and nodes 1 to 3 each manage 40 of them, then after DRC relocation each of nodes 1 to 5 manages 24 entries; the specific reallocation method can refer to prior art embodiments and is not repeated in the present application.
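A minimal sketch of the even redistribution in this example (purely illustrative; the function name and the round-robin assignment are assumptions, and the actual relocation method is left to prior art embodiments as stated above):

```python
def redistribute_drc(drc_ids: list[int], node_ids: list[int]) -> dict[int, list[int]]:
    """Spread DRC entries evenly over the given nodes (round-robin, illustrative only)."""
    assignment: dict[int, list[int]] = {node: [] for node in node_ids}
    for i, drc in enumerate(drc_ids):
        assignment[node_ids[i % len(node_ids)]].append(drc)
    return assignment

# 120 DRC entries, relocated from nodes 1-3 (40 each) to nodes 1-5 (24 each):
new_assignment = redistribute_drc(list(range(120)), [1, 2, 3, 4, 5])
print({node: len(entries) for node, entries in new_assignment.items()})
# {1: 24, 2: 24, 3: 24, 4: 24, 5: 24}
```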
The above description has been presented mainly from the perspective of interaction between the network elements. It will be appreciated that, in order to achieve the above-described functions, the control apparatus includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is implemented as hardware or as computer-software-driven hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In one example, fig. 11 shows a schematic block diagram of a control device 1100 according to an embodiment of the present application. The control device may include: a processor 1101, a transceiver/transceiving pin 1102, and, optionally, a memory 1103. The processor 1101 is operable to perform the steps performed by an organizer (which may be understood as a database or a node) in the methods of the foregoing embodiments, and to control the receive pin to receive signals and the transmit pin to transmit signals.
The various components of the control device 1100 are coupled together by a bus system 1104, where the bus system 1104 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are labeled in the drawing as the bus system 1104.
Alternatively, the memory 1103 may be used for storing instructions in the foregoing method embodiments.
It should be understood that the control device 1100 according to the embodiment of the present application may correspond to a node in each of the methods of the foregoing embodiments, and that the foregoing and other management operations and/or functions of each element in the control device 1100 are respectively for implementing the corresponding steps of each of the foregoing methods, which are not repeated herein for brevity.
For all relevant content of each step in the above method embodiments, reference may be made to the functional description of the corresponding functional module; details are not repeated here.
Based on the same technical idea, the embodiments of the present application also provide a computer readable storage medium storing a computer program, where the computer program includes at least one piece of code, and the at least one piece of code is executable by a control device to control the control device to implement the above-mentioned method embodiments.
Based on the same technical idea, the embodiments of the present application also provide a computer program for implementing the above-mentioned method embodiments when the computer program is executed by a control device.
The program may be stored in whole or in part on a storage medium that is packaged with the processor, or in part or in whole on a memory that is not packaged with the processor.
Based on the same technical concept, the embodiment of the application also provides a processor, which is used for realizing the embodiment of the method. The processor may be a chip.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read Only Memory, ROM), erasable programmable read-only memory (Erasable Programmable ROM, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable ROM, EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a network device. Alternatively, the processor and the storage medium may reside as discrete components in a network device.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many variations may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, and such variations also fall within the protection of the present application.

Claims (15)

1. A control method, characterized by being applied to a database cluster, the method comprising:
A first node obtains a change condition of an online state and a working state of each node in the database cluster; the first node belongs to the database cluster;
the first node determines a node arbitration policy corresponding to each node based on the change condition of the online state and the working state of each node; the node arbitration policy is used for indicating whether an operation on the database information of the nodes in the database cluster is needed;
the first node obtains the change condition of the number of nodes in the database cluster;
the first node determines a cluster policy corresponding to the database cluster based on the change condition of the number of nodes in the database cluster; the cluster policy is used for indicating whether metadata information in the database cluster needs to be moved or not;
and the first node executes corresponding fault recovery operation based on the node arbitration policy and the cluster policy.
2. The method of claim 1, wherein before the first node obtains the change condition of the online state and the working state of each node in the database cluster, the method comprises:
The first node negotiates with other nodes in the database cluster to determine that the first node is an organizer, and the other nodes in the database cluster are participants.
3. The method of claim 1, wherein the first node obtains the change condition of the online state and the working state of each node in the database cluster, comprising:
the first node acquires a stable node snapshot from the shared storage of the database cluster, wherein the stable node snapshot comprises identification information of at least one node in an online state in the database cluster before the last fault recovery operation is executed;
the first node obtains a real-time online node snapshot, wherein the real-time online node snapshot comprises identification information of at least one node currently in an online state in the database cluster;
the first node determines a change condition of the online state of each node based on the stable node snapshot and the real-time online node snapshot.
4. The method of claim 1, wherein the first node determines a node arbitration policy corresponding to each node based on the change condition of the online state and the working state of each node, comprising:
If the change condition of the online state of the single node indicates that the single node is a restart node and the working state of the single node is a target state, determining that the arbitration policy of the single node is that the log and the transaction of the single node need to be processed in the fault recovery operation;
if the change condition of the online state of the single node indicates that the single node is a removed node, determining that the arbitration policy of the single node is that the log and the transaction of the single node need to be processed in the fault recovery operation.
5. The method of claim 1, wherein the first node determines a cluster policy corresponding to the database cluster based on a change in a number of nodes in the database cluster, comprising:
if the change condition of the number of nodes indicates that the number of nodes in the database cluster is unchanged, determining that the cluster policy is not to move the metadata information in the database cluster.
6. The method of claim 1, wherein the first node determines a cluster policy corresponding to the database cluster based on a change in a number of nodes in the database cluster, comprising:
if the change condition of the number of nodes indicates that the number of nodes in the database cluster has changed, determining that the cluster policy is that the metadata information in the database cluster needs to be moved.
7. A control apparatus, characterized by comprising:
one or more processors;
a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory and, when executed by the one or more processors, cause the apparatus to perform the following steps:
acquiring a change condition of an online state and a working state of each node in the database cluster;
determining a node arbitration policy corresponding to each node based on the change condition of the online state and the working state of each node; the node arbitration policy is used for indicating whether an operation on the database information of the nodes in the database cluster is needed;
acquiring the change condition of the number of nodes in the database cluster;
determining a cluster policy corresponding to the database cluster based on the change condition of the number of nodes in the database cluster; the cluster policy is used for indicating whether metadata information in the database cluster needs to be moved or not;
And executing corresponding fault recovery operation based on the node arbitration policy and the cluster policy.
8. The apparatus of claim 7, wherein the computer program, when executed by the one or more processors, causes the apparatus to perform the following step: negotiating with other nodes in the database cluster to determine that the apparatus is the organizer, and the other nodes in the database cluster are participants.
9. The apparatus of claim 7, wherein the computer program, when executed by the one or more processors, causes the apparatus to perform the steps of: obtaining a stable node snapshot from the shared storage of the database cluster, wherein the stable node snapshot comprises identification information of at least one node in an online state in the database cluster before the last fault recovery operation is executed;
acquiring a real-time online node snapshot, wherein the real-time online node snapshot comprises identification information of at least one node currently in an online state in a database cluster;
and determining the change condition of the online state of each node based on the stable node snapshot and the real-time online node snapshot.
10. The apparatus of claim 7, wherein the computer program, when executed by the one or more processors, causes the apparatus to perform the steps of: if the change condition of the online state of the single node indicates that the single node is a restart node and the working state of the single node is a target state, determining that the arbitration policy of the single node is that the log and the transaction of the single node need to be processed in the fault recovery operation;
if the change condition of the online state of the single node indicates that the single node is a removed node, determining that the arbitration policy of the single node is that the log and the transaction of the single node need to be processed in the fault recovery operation.
11. The apparatus of claim 7, wherein the computer program, when executed by the one or more processors, causes the apparatus to perform the steps of:
if the change condition of the number of nodes indicates that the number of nodes in the database cluster is unchanged, determining that the cluster policy is not to move the metadata information in the database cluster.
12. The apparatus of claim 7, wherein the computer program, when executed by the one or more processors, causes the apparatus to perform the steps of:
if the change condition of the number of nodes indicates that the number of nodes in the database cluster has changed, determining that the cluster policy is that the metadata information in the database cluster needs to be moved.
13. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-6.
14. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method according to any of claims 1-6.
15. A chip comprising one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from a memory of an electronic device and to send the signal to the processor, the signal including computer instructions stored in the memory; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any of claims 1-6.
CN202211042061.5A 2022-08-29 2022-08-29 Control method and device Pending CN117667469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211042061.5A CN117667469A (en) 2022-08-29 2022-08-29 Control method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211042061.5A CN117667469A (en) 2022-08-29 2022-08-29 Control method and device

Publications (1)

Publication Number Publication Date
CN117667469A true CN117667469A (en) 2024-03-08

Family

ID=90071903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211042061.5A Pending CN117667469A (en) 2022-08-29 2022-08-29 Control method and device

Country Status (1)

Country Link
CN (1) CN117667469A (en)


Legal Events

Date Code Title Description
PB01 Publication