CN109639794B - Stateful cluster recovery method, apparatus, device and readable storage medium - Google Patents

Stateful cluster recovery method, apparatus, device and readable storage medium

Info

Publication number
CN109639794B
CN109639794B (application CN201811507350.1A)
Authority
CN
China
Prior art keywords
identity
node
master
cluster
state
Prior art date
Legal status
Active
Application number
CN201811507350.1A
Other languages
Chinese (zh)
Other versions
CN109639794A (en)
Inventor
杜鹏飞 (Du Pengfei)
Current Assignee
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN201811507350.1A
Publication of CN109639794A
Application granted
Publication of CN109639794B

Classifications

    • H ELECTRICITY · H04 ELECTRIC COMMUNICATION TECHNIQUE · H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/51 Network services: discovery or management thereof, e.g. service location protocol [SLP] or web services
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5041 Network service management characterised by the time relationship between creation and deployment of a service
    • H04L61/5007 Address allocation: Internet protocol [IP] addresses
    • H04L67/01 Protocols
    • H04L69/163 In-band adaptation of TCP data exchange; in-band control procedures

Abstract

The invention discloses a stateful cluster recovery method, which comprises the following steps: after a target node restarts, acquiring the identity file recorded by the distributed coordination service; determining the master node identity from the identity file, and judging whether the master node identity is the same as the local identity; if so, acquiring a distributed lock of the distributed coordination service, and setting, on the local network card, the VIP (virtual IP address) through which the stateful cluster provides access services externally; if not, after the master node acquires the distributed lock, joining the stateful cluster as a slave node and joining the master-identity application queue. The method guarantees the integrity of the stateful cluster's data while the cluster runs, when the whole cluster restarts, and when a single node restarts. The invention also discloses a stateful cluster recovery apparatus, device and readable storage medium, which have corresponding technical effects.

Description

Stateful cluster recovery method, apparatus, device and readable storage medium
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for recovering a stateful cluster.
Background
In IT systems such as cloud computing, big data and artificial intelligence, many key services store the core data of the business, and their normal operation is a prerequisite for the stable running of the system. To avoid single points of failure and data loss, a cluster of redundantly backed-up nodes is generally formed to provide the service externally as a unit. Services whose data changes over time are called stateful services; examples include a MariaDB cluster based on Galera technology as the database service, active-standby clusters of ovn-db and MongoDB, and rabbitmq-server as the message forwarding service. When a node providing the service fails (e.g., power loss or network failure), the service on the other nodes can continue to work.
Multiple nodes form a stateful cluster; each node stores its own copy of the data, and data consistency across nodes is maintained through cluster heartbeats and synchronization. Some clusters let multiple nodes provide read-write capability simultaneously, such as Galera-MariaDB and rabbitmq-server; others are divided into master and slave roles, where only the master node provides read-write capability and the slaves only provide reads. In terms of cluster recovery, each of these clusters can easily handle a single-node failure and rejoin. However, it is difficult to restore the cluster to normal when several nodes fail (e.g., power loss or network oscillation), or even all nodes fail, or the cluster is shut down deliberately (e.g., taken down for maintenance). The problem is especially prominent in scenarios that require automatic recovery after a full power-on. Specifically: when the cluster restarts, the node that went down last should be started first, because its data is the most complete (the data of nodes shut down earlier may be incomplete). That is, when the whole cluster restarts, an arbitration module is needed to decide which node starts first; this is usually determined by the shutdown order, and the arbitration module probes to find which node was shut down last and lets that node start first to guarantee data integrity. As shown in fig. 1 (the start-up sequence follows the hollow arrows and is the reverse of the shutdown order), a MariaDB cluster is commonly started through a MariaDB agent managed by Pacemaker. However, managing the start-up and operation of the cluster through an additional Pacemaker module has the following disadvantages:
the first disadvantage is that: the pacemaker itself relies on corosyn, which has poor stability when the network oscillates, and the complexity of the system is increased.
Disadvantage two: when Pacemaker manages each business module, an agent must be configured; each agent is implemented differently, and when the business module's version is upgraded, the agent may become incompatible.
Disadvantage three: Pacemaker suits business modules started natively, but it handles containerized business modules poorly.
Disadvantage four: the Pacemaker state machine is complex, and the agents are tightly coupled to the business modules, which makes maintenance difficult.
In summary, how to effectively guarantee data integrity when a cluster restarts is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The object of the present invention is to provide a stateful cluster recovery method, apparatus, device and readable storage medium, so as to guarantee data integrity when the cluster restarts.
In order to solve the technical problems, the invention provides the following technical scheme:
a stateful cluster recovery method, comprising:
after a target node restarts, acquiring the identity file recorded by the distributed coordination service;
determining the master node identity from the identity file, and judging whether the master node identity is the same as the local identity;
if so, acquiring a distributed lock of the distributed coordination service, and setting, on the local network card, the VIP (virtual IP address) through which the stateful cluster provides access services externally;
and if not, after the master node acquires the distributed lock, joining the stateful cluster as a slave node and joining the master-identity application queue.
Preferably, setting, on the local network card, the VIP through which the stateful cluster provides access services externally includes:
setting the master-slave service state of the local machine to the master state, and adding the VIP to the local network card.
Preferably, after joining the stateful cluster as a slave node and joining the master-identity application queue, the method further includes:
cyclically listening for the distributed lock of the distributed coordination service and for state change messages of the master-slave services.
Preferably, after cyclically listening for the distributed lock of the distributed coordination service and the state change messages of the master-slave services, the method further includes:
upon acquiring the distributed lock, executing the step of acquiring the distributed lock of the distributed coordination service and setting, on the local network card, the VIP through which the stateful cluster provides access services externally;
and writing the local identity into the identity file as the master node identity.
Preferably, acquiring the distributed lock includes:
acquiring the distributed lock by contention.
Preferably, acquiring the identity file recorded by the distributed coordination service after the target node restarts includes:
after the target node restarts, judging whether the VIP exists on the local network card;
if the VIP exists, deleting the VIP from the local network card, initializing the local state to the initialization state, and writing that state into the identity file;
and acquiring the identity file through an interface of the distributed coordination service.
Preferably, the distributed coordination service continuously maintains the identity file and updates it when the master identity changes; the identity file saves the current state of the target node, the current state being any one of an initialization state, a master-ready state, a master-waiting state, a slave-ready state and a slave state.
A stateful cluster recovery apparatus, comprising:
an identity file acquisition module, configured to acquire the identity file recorded by the distributed coordination service after the target node restarts;
an identity judgment module, configured to determine the master node identity from the identity file and judge whether the master node identity is the same as the local identity;
a master identity determining module, configured to, if they are the same, acquire a distributed lock of the distributed coordination service and set, on the local network card, the VIP (virtual IP address) through which the stateful cluster provides access services externally;
and a slave identity determining module, configured to, if they are different, join the stateful cluster as a slave node and join the master-identity application queue after the master node acquires the distributed lock.
A stateful cluster recovery device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above stateful cluster recovery method when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above stateful cluster recovery method.
By applying the method provided by the embodiment of the present invention: after the target node restarts, the identity file recorded by the distributed coordination service is acquired; the master node identity is determined from the identity file and compared with the local identity; if they are the same, the distributed lock of the distributed coordination service is acquired, and the VIP through which the stateful cluster provides access services externally is set on the local network card; if not, after the master node acquires the distributed lock, the node joins the stateful cluster as a slave and joins the master-identity application queue.
Whether only the target node restarts or the whole cluster restarts, the target node first obtains the identity file after restarting and then uses it to determine the master node identity. It judges whether the master node identity is the same as the local identity; if so, the target node was the master node before the restart, so it acquires the distributed lock and sets, on the local network card, the VIP through which the stateful cluster provides external access. The master node is locked by means of the distributed lock, and when the master node fails, the other nodes contend for the master identity by contending for the lock. If the identities differ, the target node was a slave node before the restart, and it can join the stateful cluster and the master-identity application queue once the master node has acquired the distributed lock. Because the VIP through which the stateful cluster serves the outside still sits on the master node after the restart, no particular restart order is required even when the whole cluster restarts: each node directly rejoins the cluster with the master or slave identity it held in the stateful service before the restart. In addition, if the master node fails while the service is running, another node in the master-identity application queue can obtain the master identity by contending for the distributed lock. The data integrity of the stateful cluster can therefore be guaranteed while the cluster runs, when the whole cluster restarts, and when a single node restarts. Compared with managing the start-up and operation of the cluster through an additional Pacemaker module, the method provided by the embodiment of the present invention is simple in structure and easy to implement.
In addition, in the embodiment of the present invention, because the VIP through which the stateful cluster serves the outside is placed on the master node, a TCP connection can be established directly between the service and the client, bypassing the SLB (server load balancer) and achieving better maintainability and more efficient communication.
Accordingly, the embodiments of the present invention further provide a stateful cluster recovery apparatus, device and readable storage medium corresponding to the above stateful cluster recovery method, which have the above technical effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a restart diagram of a stateful cluster recovery method according to the prior art;
FIG. 2 is a flowchart of an implementation of a stateful cluster recovery method in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a stateful cluster in operation before a restart in an embodiment of the present invention;
FIG. 4 is a restart diagram of a stateful cluster being restarted in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the start-up of the master-election module in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the operation of the master-election module on a slave node when the master node fails in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a master node reboot in an embodiment of the present invention;
FIG. 8 is a schematic diagram of a slave node reboot in an embodiment of the present invention;
FIG. 9 is a schematic diagram of node IP planning in an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a stateful cluster recovery apparatus in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a stateful cluster recovery device in an embodiment of the present invention;
FIG. 12 is another schematic structural diagram of a stateful cluster recovery device in an embodiment of the present invention.
Detailed Description
The core of the present invention is to provide a stateful cluster recovery method.
in order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1:
referring to fig. 2, fig. 2 is a flowchart of a stateful cluster restoration method in an embodiment of the present invention, where the method is applicable to each node in a stateful cluster, and the method includes the following steps:
s101, after the target node is restarted, acquiring an identity identification file of the distributed coordination service record.
It should be noted that the target node in the embodiment of the present invention may be any node in the stateful cluster. The restart of the target node may be caused by a power-down restart, a fault restart or another planned restart, and it may be a single-point restart or the restart of each node when the whole stateful cluster restarts. That is, when the stateful cluster restarts, every node may be regarded as a target node and execute the recovery method provided by the embodiment of the present invention. The stateful cluster may be, for example, MariaDB + Galera, rabbitmq-server, ovn-db or MongoDB. In addition, the recovery method provided by the embodiment of the present invention is also applicable to restart recovery of stateful cluster software; for the specific recovery of stateful cluster software, refer to the stateful cluster recovery method described herein.
After the target node restarts, it first acquires the identity file recorded by the distributed coordination service. Specifically, the distributed coordination service may be a common distributed coordination service such as etcd, Consul or ZooKeeper, which is not detailed here.
The distributed coordination service continuously maintains the identity file and updates it when the master identity changes. The identity file saves the current state of the target node: one of the initialization state, master state, master-ready state, master-waiting state, slave-ready state and slave state. Here, init represents the initialization state, master the master state, to_master the master-ready state, wait_master the master-waiting state, to_slave the slave-ready state (also called the standby-ready state), and slave the slave state (also called the standby state). Whenever the state of the target node changes, its current state is written into the identity file. Specifically, the identity file may be the /var/run/leader file.
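As an illustration, the two records might look as follows. This is a minimal sketch assuming etcd v3 as the coordination service and the key name master-ID used in the later embodiment; the method itself does not prescribe this layout.
etcdctl get master-ID                  # prints the key and its value, e.g. host1
cat /var/run/leader                    # one of: init | to_master | wait_master | master | to_slave | slave
echo master > /var/run/leader          # written by the master-election module on each state change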
S102, determining the master node identity from the identity file, and judging whether the master node identity is the same as the local identity.
The node ID corresponding to the master state can be read from the identity file and compared with the local ID; if they are the same, the master node identity is the same as the local identity. Of course, if the target node restarts together with the whole cluster, the current state recorded for the local identity can also be read from the identity file (i.e., the state of the target node last saved by the distributed coordination service before the restart); if that state is the master state, the master node identity can likewise be considered the same as the local identity, and otherwise different. In this way it can be determined from the judgment result whether the target node was the master node before the stateful cluster restarted, or whether a new master node appeared after the target node went offline.
If the judgment result is yes, the target node is still the master node; if no, the target node is a slave node. The node then joins the stateful cluster with the corresponding identity according to the judgment result: if yes, the operation of step S103 is executed; if no, the operation of step S104 is executed.
S103, acquiring a distributed lock of the distributed coordination service, and setting, on the local network card, the VIP through which the stateful cluster provides access services externally.
In the embodiment of the present invention, the master node identity is locked by means of a distributed lock; that is, when the target node is determined to be the master node, it can acquire the distributed lock of the distributed coordination service. A Virtual IP Address, namely the VIP through which the stateful cluster provides external access, can then be set on the local network card. A VIP differs from a real IP address; for example, a proxy server assigns a range of virtual IP addresses and, according to certain rules and the number of clients, hands one virtual IP address to each client so that the client can connect to the Internet indirectly. Here, the VIP is mainly used for switching between different hosts, chiefly for master-slave switching of servers. Specifically, the concrete access process via the VIP can refer to the VIP of today's common SLB services: when a client accesses the access service provided by the stateful cluster, it does so through the VIP. Deploying the VIP on the master node means that all service and business processing lands on the master node, and the SLB can be removed. It should be noted that, in the embodiment of the present invention, the services deployed in the stateful cluster refer to services that can restart automatically after an abnormal exit, implemented via systemd or docker. The distributed lock is bound to the master node, and when the master node fails, the other nodes acquire the master identity by contending for the lock.
Specifically, when adding the VIP, the master-slave service state of the local machine can also be set to the master state. That is, besides adding the VIP to the local network card, the master-slave service state on the local node is set to the master state, so that external access via the VIP works and the services on the other slave nodes can be managed. A master-slave service is a service with two selectable states, master and slave; for example, the ovn-db service has a master state and a slave state, and its state on a given node is master when deployed on the master node and slave when deployed on a slave node.
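As a concrete illustration, attaching and announcing a VIP on a Linux network card can be sketched as below; the interface name eth0, the /24 prefix and the address 10.0.0.254 (taken from the IP planning of the later embodiment) are assumptions, not prescriptions of the method.
ip addr add 10.0.0.254/24 dev eth0     # attach the cluster VIP on the node holding the master identity
arping -c 3 -U -I eth0 10.0.0.254      # optional gratuitous ARP so peers learn the VIP's new location quickly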
S104, after the master node acquires the distributed lock, joining the stateful cluster as a slave node and joining the master-identity application queue.
After the master node acquires the distributed lock, the target node can join the stateful cluster as a slave node and, at the same time, join the master-identity application queue, so that if the master node later fails, it can obtain the master identity, guaranteeing the service stability of the stateful cluster.
By applying the method provided by the embodiment of the present invention: after the target node restarts, the identity file recorded by the distributed coordination service is acquired; the master node identity is determined from the identity file and compared with the local identity; if they are the same, the distributed lock of the distributed coordination service is acquired, and the VIP through which the stateful cluster provides access services externally is set on the local network card; if not, after the master node acquires the distributed lock, the node joins the stateful cluster as a slave and joins the master-identity application queue.
Whether only the target node restarts or the whole cluster restarts, the target node first obtains the identity file after restarting and then uses it to determine the master node identity. It judges whether the master node identity is the same as the local identity; if so, the target node was the master node before the restart, so it acquires the distributed lock and sets, on the local network card, the VIP through which the stateful cluster provides external access. The master node is locked by means of the distributed lock, and when the master node fails, the other nodes contend for the master identity by contending for the lock. If the identities differ, the target node was a slave node before the restart, and it can join the stateful cluster and the master-identity application queue once the master node has acquired the distributed lock. Because the VIP through which the stateful cluster serves the outside still sits on the master node after the restart, no particular restart order is required even when the whole cluster restarts: each node directly rejoins the cluster with the master or slave identity it held in the stateful service before the restart. In addition, if the master node fails while the service is running, another node in the master-identity application queue can obtain the master identity by contending for the distributed lock. The data integrity of the stateful cluster can therefore be guaranteed while the cluster runs, when the whole cluster restarts, and when a single node restarts. Compared with managing the start-up and operation of the cluster through an additional Pacemaker module, the method provided by the embodiment of the present invention is simple in structure and easy to implement.
In addition, in the embodiment of the present invention, because the VIP through which the stateful cluster serves the outside is placed on the master node, a TCP connection can be established directly between the service and the client, bypassing the SLB and achieving better maintainability and more efficient communication.
It should be noted that, based on the above embodiment, the embodiments of the present invention also provide corresponding improvements. Steps in the preferred/improved embodiments below that are the same as, or correspond to, steps in the above embodiment may be cross-referenced, as may the corresponding beneficial effects; they are not listed again in the preferred/improved embodiments.
Preferably, considering that the master node may fail while the cluster is running, and to avoid the cluster being left without a master, the embodiment of the present invention further provides the following solution:
after step S104 is executed, that is, after the slave node is added to the stateful cluster and added to the application master identity queue, the following steps may also be executed:
step one, cyclically listening for the distributed lock of the distributed coordination service and for state change messages of the master-slave services;
step two, upon acquiring the distributed lock, executing the step of acquiring the distributed lock of the distributed coordination service and setting, on the local network card, the VIP through which the stateful cluster provides access services externally;
and step three, writing the local identity into the identity file as the master node identity.
For convenience of description, the above three steps are described together below.
Cyclically listening for the distributed lock of the distributed coordination service allows the node to know the running state of the cluster in real time, such as whether the master node is normal. By obtaining the master-slave service state change messages, the node can adjust the state of its own master-slave services in time, so as to better provide access services externally. When the master node fails, the distributed lock can be acquired by contention, specifically in the following ways:
the first method is as follows: zookeeper implementing distributed lock
Implementations include realizing a shared lock through the uniqueness of node names and realizing a shared lock through ephemeral sequential nodes. The idea of the name-uniqueness approach: because node names are unique, when locking, all contenders try to create the /test/Lock node together; only one creation succeeds, and that node obtains the lock. To unlock, the /test/Lock node is simply deleted, and the remaining nodes contend to create it again, until every node has acquired the lock. The idea of the ephemeral-sequential-node approach: to lock, every contender creates an ephemeral sequential node under the /lock directory; if a contender finds that its own node has the smallest sequence number under /lock/, it obtains the lock. Otherwise, it watches the node whose sequence number immediately precedes its own (the largest node smaller than its own) and waits.
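A minimal sketch of the name-uniqueness variant, typed into the stock ZooKeeper zkCli shell; the /test/Lock path follows the description above, while the stored value host1 is an assumption.
create -e /test/Lock "host1"           # every contender runs this; exactly one creation succeeds
                                       # -e makes the node ephemeral, so the lock is released
                                       # automatically if its holder's session dies
delete /test/Lock                      # explicit unlock; the others contend to create it again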
Method two: implementing a distributed lock with Redis
Redis implements a distributed lock mainly with four commands. setnx (SET if Not eXists, maintaining an optimistic lock): sets the value for a key only when the key does not exist; setnx differs from set in that set overwrites the value when the key exists, while setnx assigns the key and value only when the key does not exist. getset: returns the old value of a key while setting a new one. expire: sets an expiration time. del: deletes a key. The concrete scheme: when acquiring the lock, lock with setnx and add a timeout to the lock with the expire command, so that the lock is released automatically once the timeout is exceeded; the lock's value is a randomly generated UUID, which is checked on release. In addition, an acquisition timeout is set when acquiring the lock, and acquisition is abandoned once it is exceeded. When releasing the lock, the UUID is used to judge whether the lock is one's own; if so, delete is executed to release it.
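A minimal sketch with redis-cli, assuming the key name master_lock and a 10-second timeout; modern Redis lets a single SET combine the setnx and expire steps atomically.
LOCK_ID=$(uuidgen)                               # the randomly generated UUID used as the lock's value
redis-cli SET master_lock "$LOCK_ID" NX EX 10    # NX: only if absent; EX 10: auto-release after 10 s
# Release only if the lock is still our own, judging by the UUID (a server-side
# Lua script would make this check-and-delete atomic):
[ "$(redis-cli GET master_lock)" = "$LOCK_ID" ] && redis-cli DEL master_lock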
Method three: implementing a distributed lock with a database
The implementation uses optimistic locks and pessimistic locks. Optimistic lock: add a version-number field to the table; before each update, query the data together with its version number, then attach the version number as a condition after the WHERE clause of the update statement. A successful update means the lock was obtained; an unsuccessful one means it was not. Pessimistic lock: use SELECT ... FOR UPDATE (X lock) or SELECT ... LOCK IN SHARE MODE (S lock); X locks are generally used more, because write operations usually follow.
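A minimal sketch of the optimistic-lock variant, issued through the mysql client; the table and column names (lock_table, name, owner, version) and the previously read version 42 are illustrative assumptions.
mysql -e "UPDATE lock_table SET owner = 'host1', version = version + 1 \
          WHERE name = 'master_lock' AND version = 42;"
# "1 row affected" means this node obtained the lock; "0 rows affected" means
# another node bumped the version first, so acquisition failed.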
Once the distributed lock is acquired, the step of acquiring the distributed lock of the distributed coordination service and setting, on the local network card, the VIP through which the stateful cluster provides access services externally can be executed, i.e., the operation of step S103 above. The local identity is then written into the identity file as the master node identity, ensuring that the identity file always records the latest state information.
Preferably, considering that the target node may have been the master node before the restart but, because of an overly long failure time or a similar reason, a new master node may already be running normally by the time the target node restarts, acquiring the identity file recorded by the distributed coordination service after the target node restarts specifically includes:
step one, after the target node restarts, judging whether the VIP exists on the local network card;
step two, if the VIP exists, deleting the VIP from the local network card, initializing the local state to the initialization state, and writing that state into the identity file;
and step three, acquiring the identity file through an interface of the distributed coordination service.
For convenience of description, the above three steps are described together below.
To avoid two VIPs existing in the same stateful cluster and destroying the cluster's data consistency, after the target node restarts it can judge whether the VIP exists on the local network card; if so, it deletes the VIP from the local network card, initializes the local state to the init state, and writes that state into the identity file. It then acquires the identity file through an interface of the distributed coordination service.
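Condensed into shell, the restart check might look as follows, under the same assumptions as above (interface eth0, VIP 10.0.0.254, etcd key master-ID).
if ip addr show dev eth0 | grep -q '10\.0\.0\.254'; then
    ip addr del 10.0.0.254/24 dev eth0 # never allow two VIPs in one stateful cluster
fi
echo init > /var/run/leader            # reset the local state to the initialization state
etcdctl get master-ID                  # then fetch the identity record through the service interface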
Embodiment 2:
To help those skilled in the art better understand the technical solutions provided by the embodiments of the present invention, they are described in detail below, taking etcd as the concrete distributed coordination service.
It should be noted that the premise of this embodiment is that the SLB's load-balancing function is cancelled and all service processing lands on the master node; this places certain requirements on the overall scale and service pressure of the system and is only suitable for small and medium-scale service architectures. All services can restart automatically after an abnormal exit, implemented via systemd or docker. When the cluster restarts, the previous master node must be able to start normally; for example, if the cluster contains nodes 1, 2 and 3 and node 1 is the master, then after the cluster restarts as a whole, node 1 must come up, while nodes 2 and 3 need not.
Based on the stateful cluster recovery method provided in Embodiment 1, a master-election module is designed; that is, the master-election module implements the stateful cluster recovery method provided in Embodiment 1.
Referring to fig. 3, fig. 4, fig. 5 and fig. 6: fig. 3 is a schematic diagram of a stateful cluster in operation before a restart in an embodiment of the present invention; fig. 4 is a restart diagram of the stateful cluster being restarted; fig. 5 is a schematic diagram of the start-up of the master-election module; and fig. 6 is a schematic diagram of the operation of the master-election module on a slave node when the master node fails.
First, a master-election module is deployed on every node in the stateful cluster. The module is initialized; for example, any node can be designated in etcd as the master node, i.e., some node in the cluster serves as the initial master node. After the master-election module on each node starts, the following steps can be executed:
if the local network card has VIP, deleting the VIP; setting the state of a local machine to be an initialization state, and setting the initialization state to var/run/leader; checking the ID of the host node through the ETCD interface, and judging whether the ID of the host node is the same as the ID of the host node; specifically, if the two nodes are the same, the node is set as a master node in the initialization setting; if not, it means that the present node is set as a slave node.
When the node is set as the master node: acquire the leader lock through the etcd interface, change the local state to the master-ready state, and write the state into /var/run/leader; then wait 5 seconds and notify the local standby services to switch to the master state (e.g., ovn-db; multi-master services such as Galera + MariaDB need no change); write the local node's ID into etcd as the master node identity, change the local state to the master state, and write the master state into /var/run/leader; finally add the VIP to the network card, and cyclically listen for etcd's leader lock and handle identity changes, such as changing from the master identity to the slave identity.
When the node is set as a slave node: change the local state to the master-waiting state and write it into /var/run/leader; cyclically query etcd to see whether the leader lock has been acquired; if not, the master node has not started yet, so wait 1 second and query again until the master starts. Once the master is observed to have started, notify the local master services to switch to the standby state (this is a no-op for multi-master services and for standby services already in the standby state). Then change the local state to the slave state and write it into /var/run/leader. Accordingly, like the master node, the slave node then cyclically listens for etcd's leader lock and handles identity changes, such as changing from the master identity to the slave identity.
The slave nodes cyclically listen for etcd's leader lock; when the master node fails, they contend for the leader lock in etcd, and the winner waits 5 seconds after obtaining it. It then changes its state to the master-ready state and writes it to /var/run/leader; notifies the local standby services to switch to the master state (multi-master services need no change); writes its node ID into etcd as the master node identity, changes its state to the master state, and writes the master state into /var/run/leader; and finally adds the VIP to the network card and continues to cyclically listen for etcd's leader lock and handle identity changes.
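The promotion sequence on the winning node, condensed into shell steps under the assumptions already introduced (etcd key master-ID, VIP 10.0.0.254 on eth0); the service-switch step is left as a comment because it is service-specific.
echo to_master > /var/run/leader       # enter the master-ready state
sleep 5                                # quiet period before promoting local services
# (notify local active-standby services, e.g. ovn-db, to switch to the master
#  state here; multi-master services such as Galera + MariaDB need no change)
etcdctl put master-ID "$(hostname)"    # record this node as the master identity
echo master > /var/run/leader          # enter the master state
ip addr add 10.0.0.254/24 dev eth0     # attach the cluster VIP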
In fig. 5 and fig. 6, the 5-second quiet time and the 1-second wait in the cyclic query can be replaced by other durations, provided the quiet time is at least twice the query wait time. All nodes in the cluster run the master-election module (i.e., the master-election service shown in fig. 3), which contends for the master identity through etcd's distributed lock, writes identity information into the /var/run/leader file (the state information includes master, slave, to_slave and to_master), continuously maintains that file, and updates its value, i.e., changes the state, whenever the master identity changes.
The master-election module guarantees that a node's identity before exiting is consistent with its identity after restarting: the master node's election module still obtains the master identity after exiting and restarting, and a non-master node's election module can only obtain a non-master identity after exiting and restarting.
The master-election module also switches the VIP whenever the master identity changes, keeping the node carrying the VIP bound to the master node.
When a node restarts, /var/run/leader does not exist, because it is an in-memory file; each cluster service (such as MariaDB or rabbitmq-server) must wait for the file before initializing and running again. When the master-election module starts, it sets the value of /var/run/leader according to the steps above. When the value becomes slave or master, each cluster service chooses its start-up mode according to that value.
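A hedged reconstruction of that wait, as the start scripts might implement it; the actual loop appears only as images in the original publication, so this shape is an assumption.
while [ ! -f /var/run/leader ]; do     # block until the master-election module has decided
    sleep 1
done
role=$(cat /var/run/leader)            # master or slave selects the start-up mode below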
Referring to fig. 7, fig. 7 is a schematic diagram of a master node restart in an embodiment of the present invention, taking as an example the master node starting MariaDB and rabbitmq-server; fig. 8 is a schematic diagram of a slave node restart in an embodiment of the present invention.
For convenience of explanation, the following three scenarios, initialization, master failure, and cluster power-off restart, illustrate the steps of restarting the MariaDB and rabbitmq-server clusters:
Fig. 9 shows the node IP planning; fig. 9 is a schematic diagram of node IP planning in the embodiment of the present invention. It should be noted that the IPs in the planning diagram of fig. 9 could equally be other concrete IP addresses and are not limited to those shown. The start scripts are as follows:
the starting scripts of the main selecting module, mariardb and rabbitmq-server are as follows: the system comprises a leader _ start.sh, a maridb _ start.sh and a rabbitstart.sh, wherein all three services are managed by a system md and are automatically pulled up again after exception, the maridb of the three nodes form a galera + maridb cluster, and the rabbitmq-server of the three nodes form a rabbitmq cluster.
First, initialization during deployment:
1) When deploying, set the KV value in etcd: "master-ID": "host1"
2) Three nodes start up simultaneously and three services per node start up simultaneously.
3) /var/run/leader does not exist on nodes 1, 2 and 3.
4) mariadb_start.sh and rabbit_start.sh stay in a state of waiting for this file. (The wait loop itself is shown only as images in the original publication; a hedged reconstruction is sketched above, after the description of /var/run/leader.)
5) When leader_start.sh on node 1 queries etcd and finds the value of master-ID the same as the local ID, it obtains the leader lock and becomes the master node, writes master into /var/run/leader, and adds 10.0.0.254 to the 10.0.0.11 network card.
6) After mariadb_start.sh and rabbit_start.sh on node 1 detect that /var/run/leader is master, they each start their service process with the master identity, respectively:
mariadb.sh:
sed -i -e '/^safe_to_bootstrap/s/0/1/' /var/lib/mysql/grastate.dat    # mark this node safe to bootstrap the Galera cluster
mysqld_safe --wsrep-new-cluster
rabbit.sh:
rabbitmqctl force_boot                 # boot without waiting for the other cluster members
/usr/sbin/rabbitmq-server
7) leader_start.sh on nodes 2 and 3 writes slave into /var/run/leader.
8) After mariadb_start.sh and rabbit_start.sh on nodes 2 and 3 detect that /var/run/leader is slave, they each start their service process with the standby identity, respectively:
mariadb.sh:
sed -i -e '/^safe_to_bootstrap/s/1/0/' /var/lib/mysql/grastate.dat    # a standby node must not bootstrap the cluster
mysqld_safe
rabbit.sh:
/usr/sbin/rabbitmq-server
9) At this point the MariaDB cluster and the RabbitMQ cluster are established, and clients can access the MariaDB service and the rabbitmq-server service through 10.0.0.254.
Second, abnormal power failure of node 1:
1) Node 2 wins the contention for the leader lock resource and becomes the master identity, while node 3 keeps the slave (standby) identity.
2) leader_start.sh on node 2 changes the value of /var/run/leader to "master".
3) leader_start.sh on node 2 changes the KV value in etcd to "master-ID": "host2"
4) leader_start.sh on node 2 adds 10.0.0.254 to the 10.0.0.12 network card.
5) The MariaDB and RabbitMQ clusters are already established and support multi-master reads and writes, so nothing needs to change. Clients can access the MariaDB service and the rabbitmq-server service through 10.0.0.254.
Third, whole power-off restart of the three nodes:
1) The KV value in etcd is: "master-ID": "host2"
The subsequent operations are the same as in "initialization during deployment": since host2 was the master node before the restart, host2 obtains the master identity on restart, and the other nodes obtain the standby identity.
In this way, the traffic on a single node in the cluster can be accessed directly through the VIP. After the nodes in the cluster contend for the master identity through the distributed coordination component, the master's node information is written into shared storage, and when the cluster later restarts, the master identity is still obtained by the previous master node. During service operation, if the master node fails, another node can acquire the master identity and update the new node ID into shared storage. When the whole cluster service recovers, no attempt is made to restore the cluster state from before the shutdown; instead, the master node forcibly bootstraps the cluster and the slave nodes join it as members, which removes any requirement on the start-up order during the restart.
Embodiment 3:
Corresponding to the above method embodiment, the embodiment of the present invention further provides a stateful cluster recovery apparatus; the stateful cluster recovery apparatus described below and the stateful cluster recovery method described above may be cross-referenced.
Referring to fig. 10, the apparatus includes the following modules:
the identity file acquisition module 101 is used for acquiring the identity file recorded by the distributed coordination service after the target node restarts;
the identity judgment module 102 is used for determining the master node identity from the identity file and judging whether the master node identity is the same as the local identity;
the master identity determining module 103 is used for, if they are the same, acquiring a distributed lock of the distributed coordination service and setting, on the local network card, the VIP through which the stateful cluster provides access services externally;
and the slave identity determining module 104 is used for, if they are different, joining the stateful cluster as a slave node and joining the master-identity application queue after the master node acquires the distributed lock.
By applying the apparatus provided by the embodiment of the present invention: after the target node restarts, the identity file recorded by the distributed coordination service is acquired; the master node identity is determined from the identity file and compared with the local identity; if they are the same, the distributed lock of the distributed coordination service is acquired, and the VIP through which the stateful cluster provides access services externally is set on the local network card; if not, after the master node acquires the distributed lock, the node joins the stateful cluster as a slave and joins the master-identity application queue.
Whether only the target node restarts or the whole cluster restarts, the target node first obtains the identity file after restarting and then uses it to determine the master node identity. It judges whether the master node identity is the same as the local identity; if so, the target node was the master node before the restart, so it acquires the distributed lock and sets, on the local network card, the VIP through which the stateful cluster provides external access. The master node is locked by means of the distributed lock, and when the master node fails, the other nodes contend for the master identity by contending for the lock. If the identities differ, the target node was a slave node before the restart, and it can join the stateful cluster and the master-identity application queue once the master node has acquired the distributed lock. Because the VIP through which the stateful cluster serves the outside still sits on the master node after the restart, no particular restart order is required even when the whole cluster restarts: each node directly rejoins the cluster with the master or slave identity it held in the stateful service before the restart. In addition, if the master node fails while the service is running, another node in the master-identity application queue can obtain the master identity by contending for the distributed lock. The data integrity of the stateful cluster can therefore be guaranteed while the cluster runs, when the whole cluster restarts, and when a single node restarts. Compared with managing the start-up and operation of the cluster through an additional Pacemaker module, the apparatus provided by the embodiment of the present invention is simple in structure and easy to implement.
In addition, in the embodiment of the present invention, because the VIP through which the stateful cluster serves the outside is placed on the master node, a TCP connection can be established directly between the service and the client, bypassing the SLB and achieving better maintainability and more efficient communication.
In an embodiment of the present invention, the master identity determining module 103 is specifically configured to set the master-slave service state of the local machine to the master state and add the VIP to the local network card.
In one embodiment of the present invention, the apparatus further comprises:
a cyclic listening module, configured to cyclically listen for the distributed lock of the distributed coordination service and for state change messages of the master-slave services after the node joins the stateful cluster as a slave and joins the master-identity application queue.
In one embodiment of the present invention, the apparatus further comprises:
a master identity contention module, configured to, after the cyclic listening for the distributed lock of the distributed coordination service and the state change messages of the master-slave services, acquire the distributed lock, execute the step of acquiring the distributed lock of the distributed coordination service and setting, on the local network card, the VIP through which the stateful cluster provides access services externally, and write the local identity into the identity file as the master node identity.
In an embodiment of the present invention, the master identity contention module is specifically configured to acquire the distributed lock by contention.
In a specific embodiment of the present invention, the identity file acquisition module 101 is specifically configured to judge, after the target node restarts, whether the VIP exists on the local network card; if so, delete the VIP from the local network card, initialize the local state to the initialization state, and write that state into the identity file; and acquire the identity file through an interface of the distributed coordination service.
In one embodiment of the present invention, the distributed coordination service continuously maintains the identity file and updates it when the master identity changes; the identity file saves the current state of the target node, which is any one of an initialization state, a master-ready state, a master-waiting state, a slave-ready state and a slave state.
Embodiment 4:
Corresponding to the above method embodiment, the embodiment of the present invention further provides a stateful cluster recovery device; the stateful cluster recovery device described below and the stateful cluster recovery method described above may be cross-referenced.
Referring to fig. 11, the stateful cluster restoring apparatus includes:
a memory D1 for storing computer programs;
processor D2, configured to, when executing the computer program, implement the steps of the stateful cluster restoring method of the above-described method embodiment.
Specifically, fig. 12 is a schematic structural diagram of the stateful cluster recovery device provided in this embodiment. The stateful cluster recovery device may vary considerably in configuration and performance, and may include one or more processors (CPUs) 322 and a memory 332, as well as one or more storage media 330 (for example, one or more mass storage devices) storing an application 342 or data 344. The memory 332 and the storage media 330 may provide transient or persistent storage. The program stored on a storage medium 330 may include one or more modules (not shown), each of which may include a series of instructions operating on the data processing device. Still further, the central processor 322 may be configured to communicate with the storage medium 330 and to execute, on the stateful cluster recovery device 301, the series of instruction operations stored in the storage medium 330.
The stateful cluster recovery device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™ and FreeBSD™.
The steps of the stateful cluster recovery method described above may be implemented by this structure of the stateful cluster recovery device.
Example five:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the stateful cluster recovery method described above may be referred to in correspondence with each other.
A readable storage medium has a computer program stored thereon; when executed by a processor, the computer program implements the steps of the stateful cluster recovery method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any of various other readable storage media capable of storing program code.
Those skilled in the art will further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To illustrate this interchangeability of hardware and software clearly, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A method for stateful cluster recovery, comprising:
after a target node is restarted, acquiring an identity file recorded by a distributed coordination service;
determining a master node identity by using the identity file, and judging whether the master node identity is the same as a local identifier;
if yes, acquiring a distributed lock of the distributed coordination service, and setting, in a local network card, a VIP through which the stateful cluster provides access services to the outside, wherein the VIP is a virtual IP address;
and if not, joining the stateful cluster with a slave node identity and joining a master identity application queue after the master node acquires the distributed lock.
2. The stateful cluster recovery method according to claim 1, wherein the setting, in the local network card, of the VIP through which the stateful cluster provides access services to the outside comprises:
setting the master-slave service state of the local machine to a master state, and adding the VIP to the local network card.
3. The stateful cluster recovery method according to claim 2, wherein after the joining the stateful cluster with the slave node identity and the joining the master identity application queue, the method further comprises:
cyclically monitoring the distributed lock of the distributed coordination service and the state change messages of the master-slave service.
4. The stateful cluster recovery method according to claim 3, wherein after the cyclically monitoring the distributed lock of the distributed coordination service and the state change messages of the master-slave service, the method further comprises:
acquiring the distributed lock, and executing the step of acquiring the distributed lock of the distributed coordination service and setting, in the local network card, the VIP through which the stateful cluster provides access services to the outside;
and writing the local identifier into the identity file as the master node identity.
5. The stateful cluster recovery method according to claim 4, wherein the acquiring the distributed lock comprises:
acquiring the distributed lock in a contention manner.
6. The stateful cluster recovery method according to claim 1, wherein the acquiring, after the target node is restarted, the identity file recorded by the distributed coordination service comprises:
after the target node is restarted, judging whether the VIP exists in the local network card;
if the VIP exists, deleting the VIP from the local network card, initializing the local state to an initialization state, and writing the state into the identity file;
and acquiring the identity file through an interface of the distributed coordination service.
7. The stateful cluster recovery method according to any one of claims 1 to 6, wherein the distributed coordination service continuously maintains the identity file and updates the identity file when the master identity changes; the identity file stores the current state of the target node, the current state being any one of an initialization state, a master preparation state, a master waiting state, a slave preparation state and a slave state.
8. A stateful cluster recovery apparatus, comprising:
an identity file obtaining module, configured to acquire, after a target node is restarted, an identity file recorded by a distributed coordination service;
an identity judging module, configured to determine a master node identity by using the identity file, and judge whether the master node identity is the same as a local identifier;
a master identity determining module, configured to, if the identities are the same, acquire a distributed lock of the distributed coordination service, and set, in a local network card, a VIP through which the stateful cluster provides access services to the outside, wherein the VIP is a virtual IP address;
and a slave identity determining module, configured to, if the identities differ, join the stateful cluster with a slave node identity and join a master identity application queue after the master node acquires the distributed lock.
9. A stateful cluster recovery device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the stateful cluster recovery method of any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of the stateful cluster recovery method according to any one of claims 1 to 7.
CN201811507350.1A 2018-12-10 2018-12-10 State cluster recovery method, device, equipment and readable storage medium Active CN109639794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811507350.1A CN109639794B (en) 2018-12-10 2018-12-10 State cluster recovery method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN109639794A CN109639794A (en) 2019-04-16
CN109639794B true CN109639794B (en) 2021-07-13

Family

ID=66072601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811507350.1A Active CN109639794B (en) 2018-12-10 2018-12-10 State cluster recovery method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN109639794B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110417600B (en) * 2019-08-02 2022-10-25 秒针信息技术有限公司 Node switching method and device of distributed system and computer storage medium
CN111026807A (en) * 2019-11-25 2020-04-17 深圳壹账通智能科技有限公司 Distributed lock synchronization method and device, computer equipment and readable storage medium
CN111212127A (en) * 2019-12-29 2020-05-29 浪潮电子信息产业股份有限公司 Storage cluster, service data maintenance method, device and storage medium
CN111131492A (en) * 2019-12-31 2020-05-08 中国联合网络通信集团有限公司 Node access method and system
CN111405015B (en) * 2020-03-09 2022-09-30 中国建设银行股份有限公司 Data processing method, device, equipment and storage medium
CN111400107B (en) * 2020-04-21 2023-03-03 贵州新致普惠信息技术有限公司 Self-starting recovery system and method for database multi-master cluster
CN111949214B (en) * 2020-08-07 2022-07-26 苏州浪潮智能科技有限公司 Disk misoperation method, system, equipment and medium for preventing HANA cluster
CN111970148A (en) * 2020-08-14 2020-11-20 北京金山云网络技术有限公司 Distributed task scheduling method and system
CN112073265B (en) * 2020-08-31 2022-05-13 帷幄匠心科技(杭州)有限公司 Internet of things monitoring method and system based on distributed edge computing
CN112037873B (en) * 2020-08-31 2022-09-13 合肥工业大学 Single-point optimization method based on cluster selection and consensus mechanism
CN112272220B (en) * 2020-10-16 2022-05-13 苏州浪潮智能科技有限公司 Cluster software start control method, system, terminal and storage medium
CN112269683B (en) * 2020-10-22 2022-12-06 苏州浪潮智能科技有限公司 Off-line node on-line service recovery method and related components
CN112269693B (en) * 2020-10-23 2024-03-01 北京浪潮数据技术有限公司 Node self-coordination method, device and computer readable storage medium
CN112202687B (en) 2020-12-03 2021-05-25 苏州浪潮智能科技有限公司 Node synchronization method, device, equipment and storage medium
CN112887367B (en) * 2021-01-11 2022-11-01 华云数据控股集团有限公司 Method, system and computer readable medium for realizing high availability of distributed cluster
CN113326511B (en) * 2021-06-25 2024-04-09 深信服科技股份有限公司 File repair method, system, equipment and medium
CN113949691A (en) * 2021-10-15 2022-01-18 湖南麒麟信安科技股份有限公司 ETCD-based virtual network address high-availability implementation method and system
CN113660350A (en) * 2021-10-18 2021-11-16 恒生电子股份有限公司 Distributed lock coordination method, device, equipment and storage medium
CN114124903A (en) * 2021-11-15 2022-03-01 新华三大数据技术有限公司 Virtual IP address management method and device
CN114501094A (en) * 2022-02-09 2022-05-13 浙江博采传媒有限公司 Sequence playing method and device based on virtual production and storage medium
CN114900449B (en) * 2022-03-30 2024-02-23 网宿科技股份有限公司 Resource information management method, system and device
CN115277114B (en) * 2022-07-08 2023-07-21 北京城市网邻信息技术有限公司 Distributed lock processing method and device, electronic equipment and storage medium
CN115277712B (en) * 2022-07-08 2023-07-25 北京城市网邻信息技术有限公司 Distributed lock service providing method, device and system and electronic equipment
CN115269248B (en) * 2022-07-28 2023-08-08 安超云软件有限公司 Method and device for preventing brain fracture under double-node cluster, electronic equipment and storage medium
CN115484139B (en) * 2022-09-02 2024-03-15 武汉众智数字技术有限公司 Video strategy management decentralization method based on video network monitoring
CN115617917B (en) * 2022-12-16 2023-03-10 中国西安卫星测控中心 Method, device, system and equipment for controlling multiple activities of database cluster

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467508A (en) * 2010-11-04 2012-05-23 中兴通讯股份有限公司 Method for providing database service and database system
CN102868736A (en) * 2012-08-30 2013-01-09 浪潮(北京)电子信息产业有限公司 Design and implementation method of cloud computing monitoring framework, and cloud computing processing equipment
CN103763378A (en) * 2014-01-24 2014-04-30 中国联合网络通信集团有限公司 Task processing method and system and nodes based on distributive type calculation system
CN105159818A (en) * 2015-08-28 2015-12-16 东北大学 Log recovery method in memory data management and log recovery simulation system in memory data management
CN106713250A (en) * 2015-11-18 2017-05-24 杭州华为数字技术有限公司 Data access method and device based on distributed system
CN106844092A (en) * 2016-12-09 2017-06-13 武汉烽火信息集成技术有限公司 A kind of method of the MariaDB Galera Cluster of automatic recovery power down
CN106656624A (en) * 2017-01-04 2017-05-10 合肥康捷信息科技有限公司 Optimization method based on Gossip communication protocol and Raft election algorithm
CN106911524A (en) * 2017-04-27 2017-06-30 紫光华山信息技术有限公司 A kind of HA implementation methods and device
CN108881489A (en) * 2018-08-03 2018-11-23 高新兴科技集团股份有限公司 A kind of coordination system and method for Distributed Services

Similar Documents

Publication Publication Date Title
CN109639794B (en) State cluster recovery method, device, equipment and readable storage medium
US9405530B2 (en) System and method for supporting patching in a multitenant application server environment
CN109491776B (en) Task arranging method and system
EP3127018B1 (en) Geographically-distributed file system using coordinated namespace replication
US5822531A (en) Method and system for dynamically reconfiguring a cluster of computer systems
CN111131146B (en) Multi-supercomputing center software system deployment and incremental updating method in wide area environment
US8533525B2 (en) Data management apparatus, monitoring apparatus, replica apparatus, cluster system, control method and computer-readable medium
CN109173270B (en) Game service system and implementation method
CN110569085A (en) configuration file loading method and system
CN114064414A (en) High-availability cluster state monitoring method and system
CN110543335A (en) Application program configuration management method and system
CN114116912A (en) Method for realizing high availability of database based on Keepalived
WO2018004403A1 (en) Managing a lifecycle of a software container
CN111818188B (en) Load balancing availability improving method and device for Kubernetes cluster
CN116132530A (en) Method for realizing MQTT Broker server by applying Raft algorithm based on Netty framework
Cisco CWM to CWM Communications
CN114124700A (en) Cluster parameter configuration method and device, electronic equipment and readable storage medium
US10608882B2 (en) Token-based lightweight approach to manage the active-passive system topology in a distributed computing environment
Masetti et al. Increasing Availability by Implementing Software Redundancy in the CMS Detector Control System
CN111124428A (en) Application automatic publishing method based on middleware creating and related device
CN114356214B (en) Method and system for providing local storage volume for kubernetes system
CN116431291B (en) Deployment method, system, equipment and storage medium of virtualization management platform
CN114915545B (en) Application scheduling deployment management method based on DHCP network cluster
CN115617917B (en) Method, device, system and equipment for controlling multiple activities of database cluster
Eberhardt et al. Smac: State management for geo-distributed containers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant