CN109639794A - Stateful cluster recovery method, apparatus, device and readable storage medium - Google Patents
- Publication number: CN109639794A
- Application number: CN201811507350.1A
- Authority: CN (China)
- Prior art keywords: node, cluster, state, identity, stateful
- Legal status: Granted (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
All codes below fall under H (Electricity) › H04 (Electric communication technique) › H04L (Transmission of digital information, e.g. telegraphic communication):
- H04L67/51: Network services; discovery or management thereof, e.g. service location protocol [SLP] or web services
- H04L41/0668: Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
- H04L41/50: Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5041: Network service management characterised by the time relationship between creation and deployment of a service
- H04L61/5007: Address allocation; Internet protocol [IP] addresses
- H04L67/01: Network arrangements or protocols for supporting network services or applications; protocols
- H04L69/163: Implementation or adaptation of IP, TCP or UDP; in-band adaptation of TCP data exchange; in-band control procedures
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a stateful cluster recovery method comprising the following steps: after a target node restarts, obtaining the identity file recorded by a distributed coordination service; determining the master node identity from the identity file, and judging whether the master node identity is the same as the local node identity; if so, acquiring the distributed lock of the distributed coordination service and configuring on the local network interface card the VIP through which the stateful cluster provides access service externally; if not, after the master node has acquired the distributed lock, joining the stateful cluster as a slave node and joining the master-identity application queue. The method guarantees the data integrity of a running stateful cluster whether the whole cluster restarts or a single node restarts. The invention also discloses a stateful cluster recovery apparatus, device and readable storage medium, which have corresponding technical effects.
Description
Technical field
The present invention relates to the field of computer application technology, and more particularly to a stateful cluster recovery method, apparatus, device and readable storage medium.
Background
In IT systems such as cloud computing, big data and artificial intelligence, many key services store the core data of the business, and their normal operation is a prerequisite for stable operation of the whole system. To solve single-point-of-failure and data-loss problems, redundant backups on multiple nodes are generally used to form a cluster that provides service externally as a whole. Services whose data is mutable are called stateful services, for example the Galera-based MariaDB cluster for database service, the active/standby clusters of ovn-db and MongoDB, and the rabbitmq-server cluster for message forwarding. When a node providing the service fails (e.g., power loss or network failure), the service on the other nodes can continue to work.
Multiple nodes form a stateful cluster in which each node stores its own data, and cluster heartbeats and synchronization keep the data consistent across nodes. In some clusters, such as Galera-MariaDB and rabbitmq-server, multiple nodes provide read/write capability simultaneously; other clusters are divided into master and slave roles, where only the master provides read/write capability and slaves can only serve reads.
For cluster recovery, each of these clusters can easily handle a single-point failure and the failed node's rejoining. However, if multiple nodes fail (e.g., power loss or network flapping), or even all of them, or after a planned shutdown of the whole cluster (e.g., for maintenance), returning the cluster to normal is comparatively difficult. The problem is most acute where the cluster must recover automatically once all nodes power on again. Specifically, when the cluster restarts, the node that went down last should start first so that the data is as complete as possible. In other words, a whole-cluster restart needs an arbitration module to choose which node starts first. This usually depends on the order in which the nodes shut down: the arbitration module detects which node shut down last and starts it first to guarantee data integrity (the node shut down last holds the most complete data, whereas nodes shut down earlier may hold incomplete data). This is how the common approach works, for example Pacemaker starting a MariaDB cluster through the MariaDB agent, as shown in Fig. 1 (the hollow arrows show the order; the startup order is the reverse of the shutdown order). However, managing cluster startup and operation through an additional Pacemaker module has the following drawbacks:
Disadvantage one: Pacemaker itself depends on Corosync, which is unstable during network flapping, and it also increases system complexity.
Disadvantage two: Pacemaker needs an agent configured for each business module it manages, and every agent is implemented differently; when a business module is upgraded, its agent may become incompatible.
Disadvantage three: Pacemaker suits business modules started natively, but it cannot manage business modules that have been containerized.
Disadvantage four: Pacemaker's state machine is complex, and agent implementations are tightly coupled to the business modules, which makes maintenance difficult.
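To make the arbitration rule described in the background concrete (the node that shut down last boots first), here is a minimal Python sketch. It is an illustration under stated assumptions, not Pacemaker's agent code: the `NodeRecord` type and the use of synchronized wall-clock shutdown times are hypothetical, and real agents (for example the Galera resource agent) typically compare a last-committed sequence number instead.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical record an arbitration module might collect per node."""
    name: str
    shutdown_time: float  # assumed comparable across nodes (synchronized clocks)


def startup_order(nodes: list[NodeRecord]) -> list[str]:
    """Return node names in boot order: the reverse of shutdown order.

    The node that shut down last is assumed to hold the most complete data,
    so it boots first and the others rejoin and resynchronize from it.
    """
    latest_first = sorted(nodes, key=lambda n: n.shutdown_time, reverse=True)
    return [n.name for n in latest_first]


if __name__ == "__main__":
    nodes = [
        NodeRecord("node-a", shutdown_time=100.0),
        NodeRecord("node-b", shutdown_time=300.0),
        NodeRecord("node-c", shutdown_time=200.0),
    ]
    print(startup_order(nodes))  # ['node-b', 'node-c', 'node-a']
```

The whole difficulty of this approach is that the arbitration module must know the shutdown order reliably, which is exactly the dependency the invention sets out to remove.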
In conclusion when how to efficiently solve cluster reboot, the problems such as ensureing data integrity, be current this field skill
Art personnel technical problem urgently to be solved.
Summary of the invention
The object of the present invention is to provide a stateful cluster recovery method, apparatus, device and readable storage medium that guarantee data integrity when a cluster restarts.
To solve the above technical problem, the present invention provides the following technical solutions:
A stateful cluster recovery method, comprising:
after a target node restarts, obtaining an identity file recorded by a distributed coordination service;
determining a master node identity from the identity file, and judging whether the master node identity is the same as the identity of the local node;
if so, acquiring the distributed lock of the distributed coordination service, and configuring on the local network interface card the VIP through which the stateful cluster provides access service externally;
if not, after the master node has acquired the distributed lock, joining the stateful cluster as a slave node and joining the queue of nodes applying for the master identity.
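The four claimed steps can be sketched as a single function. This is a minimal, hypothetical Python model, not the patent's implementation: `FakeCoordinator` stands in for etcd, Consul or ZooKeeper, the identity file is reduced to a dict with a `master` key, the network interface card is a set of addresses, and returning `"waiting"` (so the caller retries) is an assumed way of expressing "after the master node acquires the distributed lock".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeCoordinator:
    """Hypothetical in-memory stand-in for etcd, Consul or ZooKeeper."""
    identity_file: dict = field(default_factory=dict)  # e.g. {"master": "node-1"}
    lock_holder: Optional[str] = None                  # who holds the distributed lock
    candidates: deque = field(default_factory=deque)   # master-identity application queue

    def try_lock(self, node_id: str) -> bool:
        """Grant the lock if it is free (or already held by this node)."""
        if self.lock_holder in (None, node_id):
            self.lock_holder = node_id
            return True
        return False


def recover(node_id: str, coord: FakeCoordinator, nic: set, vip: str) -> str:
    """Run S101-S104 once for a restarted node; return "master", "slave" or "waiting"."""
    identity = dict(coord.identity_file)             # S101: read the identity file
    if identity.get("master") == node_id:            # S102: recorded master == this node?
        if coord.try_lock(node_id):                  # S103: lock first, then the VIP
            nic.add(vip)
            return "master"
        # Assumption: if another node already holds the lock, fall through to S104.
    if coord.lock_holder is None:                    # S104 needs a master holding the lock
        return "waiting"                             # caller retries later
    if node_id not in coord.candidates:
        coord.candidates.append(node_id)             # join the application queue
    return "slave"
```

Because a slave simply retries until some master holds the lock, the nodes can come up in any order, which is the property the method relies on for whole-cluster restarts.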
Preferably, configuring on the local network interface card the VIP through which the stateful cluster provides access service externally comprises:
setting the master/slave service state of the local node to master, and adding the VIP to the local network interface card.
Preferably, after joining the stateful cluster as a slave node and joining the master-identity application queue, the method further comprises:
cyclically monitoring the distributed lock of the distributed coordination service and the state change messages of the master/slave service.
Preferably, after cyclically monitoring the distributed lock of the distributed coordination service and the state change messages of the master/slave service, the method further comprises:
upon acquiring the distributed lock, performing the step of acquiring the distributed lock of the distributed coordination service and configuring on the local network interface card the VIP through which the stateful cluster provides access service externally; and
writing the local node identity into the identity file as the master node identity.
Preferably, acquiring the distributed lock comprises:
acquiring the distributed lock competitively.
Preferably, obtaining the identity file recorded by the distributed coordination service after the target node restarts comprises:
after the target node restarts, judging whether the VIP exists on the local network interface card;
if so, deleting the VIP from the local network interface card, initializing the local state to the init state, and writing it into the identity file;
obtaining the identity file through the interface of the distributed coordination service.
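The restart cleanup in this claim can be sketched as follows. The sketch only builds iproute2 commands (`ip -o addr show dev IFACE` lists addresses, `ip addr del VIP/PREFIX dev IFACE` removes one) instead of running them, so it can be tested without root; the function names and the identity dict layout are hypothetical. Following the claim's wording, the local state is reset to `init` only when a stale VIP was found.

```python
def parse_addresses(ip_output: str) -> set:
    """Collect addresses (without prefix length) from `ip -o addr show dev IFACE` output."""
    addrs = set()
    for line in ip_output.splitlines():
        tokens = line.split()
        for i in range(len(tokens) - 1):
            if tokens[i] in ("inet", "inet6"):
                addrs.add(tokens[i + 1].split("/")[0])
    return addrs


def startup_cleanup(node_id: str, vip: str, prefix: int, iface: str,
                    ip_output: str, identity: dict):
    """Return (commands_to_run, updated_identity) for the post-restart cleanup.

    identity uses a hypothetical layout: {"master": id, "states": {id: state}}.
    """
    commands = []
    updated = {"master": identity.get("master"),
               "states": dict(identity.get("states", {}))}
    if vip in parse_addresses(ip_output):
        # A stale VIP left over from before the restart must not keep answering clients.
        commands.append(["ip", "addr", "del", f"{vip}/{prefix}", "dev", iface])
        updated["states"][node_id] = "init"    # per the claim: reset to the init state
    return commands, updated
```

On a real node the returned commands would be executed, e.g. with `subprocess.run(cmd, check=True)`, and the updated identity written back through the coordination service.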
Preferably, the distributed coordination service persistently maintains the identity file and updates it when the master identity changes; the identity file stores the current state of the target node, which is any one of an init state, a to-master (preparing) state, a master state, a wait-master state, a to-slave (preparing) state and a slave state.
A stateful cluster recovery apparatus, comprising:
an identity file obtaining module, configured to obtain, after a target node restarts, the identity file recorded by a distributed coordination service;
an identity judgment module, configured to determine a master node identity from the identity file and judge whether the master node identity is the same as the local node identity;
a master identity determining module, configured to, if so, acquire the distributed lock of the distributed coordination service and configure on the local network interface card the VIP through which the stateful cluster provides access service externally;
a slave identity determining module, configured to, if not, join the stateful cluster as a slave node after the master node has acquired the distributed lock, and join the master-identity application queue.
A stateful cluster recovery device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above stateful cluster recovery method when executing the computer program.
A readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above stateful cluster recovery method.
With the method provided by the embodiments of the present invention, after a target node restarts, the identity file recorded by the distributed coordination service is obtained; the master node identity is determined from the identity file, and it is judged whether the master node identity is the same as the local node identity; if so, the distributed lock of the distributed coordination service is acquired and the VIP through which the stateful cluster provides access service externally is configured on the local network interface card; if not, after the master node has acquired the distributed lock, the node joins the stateful cluster as a slave and joins the master-identity application queue.
Whether only the target node restarts or the whole cluster restarts, the restarted target node first obtains the identity file and then determines the master node identity from it. It then judges whether the master node identity matches the local identity. If it does, the target node was the master before the restart, so it acquires the distributed lock and configures on the local network interface card the VIP through which the stateful cluster provides access service externally. Locking the master node with a distributed lock means that, if the master fails, the other nodes can compete for the master identity by competing for the lock. If the identities differ, the target node was a slave before the restart, so after the master has acquired the distributed lock it joins the stateful cluster as a slave and enters the master-identity application queue. Because the VIP through which the stateful cluster provides service is still configured on the master node after the restart, even a whole-cluster restart does not require any particular restart order: each node rejoins the cluster directly with the master/slave identity it held in the stateful service before the restart. In addition, if the master fails during service operation, the other nodes in the master-identity application queue can obtain the master identity by competing for the distributed lock. In this way, the data integrity of the stateful cluster is guaranteed while it runs, whether the whole cluster or a single node restarts. Compared with managing cluster startup and operation through an additional Pacemaker module, the method provided by the embodiments has a simple architecture and is easy to implement.
In addition, because the embodiments place the VIP through which the stateful cluster provides service on the master node, TCP connections can be established directly between the service and its clients, bypassing the SLB (server load balancer) and achieving better maintainability and more efficient communication.
Correspondingly, embodiments of the present invention also provide a stateful cluster recovery apparatus, device and readable storage medium corresponding to the above method, which have the above technical effects; details are not repeated here.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a restart diagram of an existing stateful cluster recovery method;
Fig. 2 is a flowchart of a stateful cluster recovery method in an embodiment of the present invention;
Fig. 3 is an operation diagram of the stateful cluster before restart in an embodiment of the present invention;
Fig. 4 is a restart diagram of the stateful cluster in an embodiment of the present invention;
Fig. 5 is a startup diagram of the master election module in an embodiment of the present invention;
Fig. 6 is an operation diagram of the master election module on a slave node when the master node fails, in an embodiment of the present invention;
Fig. 7 is a diagram of a master node restart in an embodiment of the present invention;
Fig. 8 is a diagram of a slave node restart in an embodiment of the present invention;
Fig. 9 is a node IP planning diagram in an embodiment of the present invention;
Fig. 10 is a structural diagram of a stateful cluster recovery apparatus in an embodiment of the present invention;
Fig. 11 is a structural diagram of a stateful cluster recovery device in an embodiment of the present invention;
Fig. 12 is a structural diagram of a stateful cluster recovery device in an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a stateful cluster recovery method.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment one:
Referring to Fig. 2, which is a flowchart of a stateful cluster recovery method in an embodiment of the present invention, the method can be applied to every node in the stateful cluster and includes the following steps:
S101: after the target node restarts, obtain the identity file recorded by the distributed coordination service.
It should be noted that the target node in the embodiments may be any node in the stateful cluster. It may restart because of a power loss, a fault, or another planned reason, and the restart may be of a single node or of every node when the whole stateful cluster restarts. That is, when the stateful cluster restarts, each node can be regarded as a target node and executes the recovery method provided by the embodiments. The stateful cluster may specifically be mariadb+galera, rabbitmq-server, ovn-db or mongodb. The method is also applicable to restarts during the recovery of stateful cluster software; in that case the recovery procedure is the stateful cluster recovery method described herein.
After restarting, the target node first obtains the identity file recorded by the distributed coordination service. The distributed coordination service may be any common one, such as etcd, Consul or ZooKeeper, which are not described one by one here. The distributed coordination service persistently maintains the identity file and updates it when the master identity changes. The identity file stores the current state of the target node, which is one of the init state, the to-master (preparing to become master) state, the master state, the wait-master state, the to-slave (preparing to become slave) state and the slave state. These are denoted init (init state), to_master (preparing to become master), master (master state), wait_master (waiting to become master), to_slave (preparing to become slave, also called preparing standby) and slave (slave state, also called standby). Whenever the state of the target node changes, the node writes its current state to the identity file. Specifically, the identity file may be /var/run/leader.
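The six states and the identity file can be modeled as below. The string values match the identifiers given in the description; the JSON layout with a `master` field and a per-node `states` map is an assumption, since the description only says the file records the node's current state (and, after a failover, the master identity).

```python
import json
from enum import Enum


class NodeState(str, Enum):
    """States named in the description; values match its identifiers."""
    INIT = "init"
    TO_MASTER = "to_master"
    MASTER = "master"
    WAIT_MASTER = "wait_master"
    TO_SLAVE = "to_slave"
    SLAVE = "slave"


def record_state(raw: str, node_id: str, state: NodeState) -> str:
    """Return new identity-file contents with node_id's current state updated.

    The JSON layout {"master": ..., "states": {...}} is hypothetical.
    Entering MASTER also records node_id as the master identity.
    """
    data = json.loads(raw) if raw.strip() else {"master": None, "states": {}}
    data.setdefault("states", {})[node_id] = state.value
    if state is NodeState.MASTER:
        data["master"] = node_id
    return json.dumps(data, sort_keys=True)
```

Keeping the serialized form as plain text fits a file such as /var/run/leader that the coordination service persists and hands back after a restart.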
S102: determine the master node identity from the identity file, and judge whether it is the same as the local node identity.
Whether the master node identity matches the local identity can be judged by reading from the identity file the node ID corresponding to the master state and checking whether it equals the local ID; if it does, the master node identity is the same as the local identity. Alternatively, if this restart of the target node is part of a whole-cluster restart, the node can read its own current state from the identity file (here, the current state specifically means the state of the target node last saved by the distributed coordination service before the restart). If that state is the master state, the master node identity is regarded as the same as the local identity; otherwise, it is regarded as different. The result thus shows whether the target node was the master before the stateful cluster restarted, or whether a new master appeared after the target node went offline.
If the result is yes, the target node is still the master; if it is no, the target node is a slave. The node joins the stateful cluster with the identity indicated by the result: if yes, step S103 is performed; if no, step S104 is performed.
S103: acquire the distributed lock of the distributed coordination service, and configure on the local network interface card the VIP through which the stateful cluster provides access service externally.
In the embodiments, the master node identity is locked with a distributed lock: when the target node is determined to be the master, it acquires the distributed lock of the distributed coordination service. It then configures on the local network interface card the VIP (Virtual IP Address) through which the stateful cluster provides access service externally. A VIP differs from the real IP address of a proxy server: the proxy server reserves a range of virtual IP addresses according to the number of client machines on the internal network and assigns one to each client according to certain rules, which gives the clients indirect connectivity to the Internet. A VIP is mainly used to switch between hosts, chiefly for master/slave failover of servers. The way clients access the service through the VIP can follow the VIP in currently common SLB services: when a client accesses the service provided by the stateful cluster, it does so through the VIP. Deploying the VIP on the master node means that all service or business processing lands on the master node, so the SLB can be eliminated. Note that the businesses or services deployed in the stateful cluster in the embodiments are all ones that restart automatically after an abnormal exit, for example via systemd or Docker. Binding the distributed lock to the master node allows the other nodes to obtain the master identity by competing for the lock when the master node fails.
Specifically, when the VIP is added, the master/slave service state of the local node can also be set to master. That is, when the VIP is added to the local network interface card, the master/slave service on this node is set to master, so that it provides access service externally through the VIP and manages the services on the slave nodes. A master/slave service is a service whose state can be either master or slave, such as ovn-db: its state on a given node is master when deployed on the master node and slave when deployed on a slave node.
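The ordering in S103 matters: the VIP should only appear on a node that actually holds the lock, otherwise two nodes could answer on the service address. A minimal sketch, assuming a caller-supplied `try_lock` callable in place of a real coordination-service client; `LocalNode` and `become_master` are hypothetical names, and the function returns the `ip addr add` command rather than executing it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LocalNode:
    """Hypothetical view of one node: its master/slave service state and NIC addresses."""
    node_id: str
    service_state: str = "init"
    nic_addrs: set = field(default_factory=set)


def become_master(node: LocalNode, try_lock: Callable[[str], bool],
                  vip: str, prefix: int, iface: str) -> Optional[List[str]]:
    """S103: acquire the lock, then promote the local service and add the VIP.

    Returns the iproute2 command that would add the VIP, or None when the lock
    was not obtained (in which case nothing is changed).
    """
    if not try_lock(node.node_id):
        return None                      # never expose the VIP without the lock
    node.service_state = "master"        # e.g. switch the local ovn-db to master
    node.nic_addrs.add(vip)              # record the VIP as configured
    return ["ip", "addr", "add", f"{vip}/{prefix}", "dev", iface]
```

Taking the lock before touching the network interface card means a losing contender leaves no trace, so no cleanup is needed on that path.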
S104: after the master node has acquired the distributed lock, join the stateful cluster as a slave node and join the master-identity application queue.
Once the master node holds the distributed lock, the target node can join the stateful cluster as a slave and, at the same time, join the master-identity application queue, so that it can obtain the master identity if the master node later fails. This ensures the service stability of the stateful cluster.
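S104 can be sketched as a bounded wait: poll until some node holds the distributed lock, then register in the application queue. The `lock_holder` callable, the polling interval and the timeout are assumptions (a real client would more likely use a watch than polling); `clock` and `sleep` are injectable so the sketch can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Optional


def join_as_slave(node_id: str,
                  lock_holder: Callable[[], Optional[str]],
                  candidates: deque,
                  poll_interval: float = 0.5,
                  timeout: float = 30.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """S104: wait until a master holds the lock, then queue up as a candidate.

    Returns the current master's ID (the node to replicate from).
    Raises TimeoutError if no master appears within `timeout` seconds.
    """
    deadline = clock() + timeout
    while (holder := lock_holder()) is None:
        if clock() >= deadline:
            raise TimeoutError("no master acquired the distributed lock in time")
        sleep(poll_interval)
    if node_id not in candidates:
        candidates.append(node_id)       # join the master-identity application queue
    return holder
```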
With the method provided by the embodiments of the present invention, after a target node restarts, the identity file recorded by the distributed coordination service is obtained; the master node identity is determined from the identity file, and it is judged whether the master node identity is the same as the local node identity; if so, the distributed lock of the distributed coordination service is acquired and the VIP through which the stateful cluster provides access service externally is configured on the local network interface card; if not, after the master node has acquired the distributed lock, the node joins the stateful cluster as a slave and joins the master-identity application queue.
Whether only the target node restarts or the whole cluster restarts, the restarted target node first obtains the identity file and then determines the master node identity from it. It then judges whether the master node identity matches the local identity. If it does, the target node was the master before the restart, so it acquires the distributed lock and configures on the local network interface card the VIP through which the stateful cluster provides access service externally. Locking the master node with a distributed lock means that, if the master fails, the other nodes can compete for the master identity by competing for the lock. If the identities differ, the target node was a slave before the restart, so after the master has acquired the distributed lock it joins the stateful cluster as a slave and enters the master-identity application queue. Because the VIP through which the stateful cluster provides service is still configured on the master node after the restart, even a whole-cluster restart does not require any particular restart order: each node rejoins the cluster directly with the master/slave identity it held in the stateful service before the restart. In addition, if the master fails during service operation, the other nodes in the master-identity application queue can obtain the master identity by competing for the distributed lock. In this way, the data integrity of the stateful cluster is guaranteed while it runs, whether the whole cluster or a single node restarts. Compared with managing cluster startup and operation through an additional Pacemaker module, the method provided by the embodiments has a simple architecture and is easy to implement.
In addition, because the embodiments place the VIP through which the stateful cluster provides service on the master node, TCP connections can be established directly between the service and its clients, bypassing the SLB (server load balancer) and achieving better maintainability and more efficient communication.
It should be noted that, based on the above embodiment, embodiments of the present invention also provide corresponding improved solutions. Steps in the preferred/improved embodiments that are the same as or correspond to steps in the above embodiment can be cross-referenced, as can the corresponding beneficial effects; they are not described again one by one in the preferred/improved embodiments herein.
Preferably, considering that problems such as master node failure may occur while the cluster is running, and to prevent the cluster from being left without a master node, the embodiments of the present invention further provide the following solution:
After step S104 is performed, i.e., after the node has joined the stateful cluster as a slave and joined the master-identity application queue, the following steps may also be performed:
Step 1: cyclically monitor the distributed lock of the distributed coordination service and the state change messages of the master/slave service;
Step 2: upon acquiring the distributed lock, perform the step of acquiring the distributed lock of the distributed coordination service and configuring on the local network interface card the VIP through which the stateful cluster provides access service externally;
Step 3: write the local node identity into the identity file as the master node identity.
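Steps 1 to 3 can be modeled with a lease-based lock: the master keeps renewing a TTL lease, and a slave that sees the lease lapse competes for it and, if it wins, records itself as master. This is a hypothetical in-memory model (`LeaseLock`, `monitor_once`) with time passed in explicitly; the lease semantics loosely mirror etcd leases or ZooKeeper session expiry but are not their APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseLock:
    """Hypothetical in-memory lock with a TTL lease."""
    ttl: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node_id: str, now: float) -> bool:
        """Take (or renew) the lock if it is free, expired, or already ours."""
        if self.holder is None or now >= self.expires_at or self.holder == node_id:
            self.holder, self.expires_at = node_id, now + self.ttl
            return True
        return False


def monitor_once(node_id: str, lock: LeaseLock, identity: dict,
                 candidates: deque, now: float) -> str:
    """One iteration of a slave's monitoring loop; returns the node's role."""
    # Step 1: if the master's lease is still valid, stay a slave.
    if lock.holder is not None and now < lock.expires_at:
        return "slave"
    # Step 2: the lease lapsed, so compete for the lock (then run S103 on success).
    if not lock.try_acquire(node_id, now):
        return "slave"
    # Step 3: record this node as the new master in the identity file.
    identity["master"] = node_id
    if node_id in candidates:
        candidates.remove(node_id)
    return "master"
```

The master must call `try_acquire` again before `ttl` elapses to keep the lock; a winning slave would then promote its service and add the VIP as in S103.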
For ease of description, the three steps are described together below.
Cyclically monitoring the distributed lock of the distributed coordination service lets the node learn the cluster's running status in real time, for example whether the master node is healthy. Receiving state change messages of the master/slave service lets the node promptly adjust the state of its local master/slave service, so that access service continues to be provided externally. When the master node fails, the distributed lock can be acquired competitively. Specifically, this can be done in the following ways:
Mode one: implementing the distributed lock with ZooKeeper
Implementation is realized shared lock including the use of the uniqueness of nodename and is realized altogether using temporal order's node
Enjoy lock.Wherein, the algorithm thinking of shared lock is realized using the uniqueness of nodename: utilizing title uniqueness, locking operation
When, it is only necessary to all nodes creation/test/Lock node together, only one is created successfully, and winner is locked.When unlock,
Deletion/test/Lock node is only needed, remaining node is again introduced into competition creation node, until all nodes are all locked.It utilizes
The algorithm thinking of temporal order's node realization shared lock: for locking operation, can allow all nodes all go/lock catalogue under create
Build temporal order's node, if creation node find itself creation sequence node number be /lock/ catalogue under the smallest node,
Then locked.Otherwise, the node smaller than the sequence number of oneself creation node (the maximum section smaller than the node of oneself creation is monitored
Point), into waiting.
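The following minimal Python sketch illustrates the ephemeral-sequential-znode recipe. It uses a hypothetical in-memory `FakeZooKeeper` class in place of a real ensemble and client. A production system would more likely use an existing recipe, such as the `Lock` recipe in the kazoo client library. Only the ordering logic of Mode 1 is modeled, not real watches or network sessions.

```python
import itertools


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not a real client)."""

    def __init__(self):
        self._counter = itertools.count()
        self.nodes = {}  # znode path -> owning session id

    def create_ephemeral_sequential(self, prefix, session):
        # Real ZooKeeper appends a zero-padded, monotonically increasing counter.
        path = f"{prefix}{next(self._counter):010d}"
        self.nodes[path] = session
        return path

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire_session(self, session):
        # Ephemeral znodes vanish when the session that created them ends.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]


class SequentialLock:
    """Mode 1 recipe: the smallest sequence number under /lock/ holds the lock."""

    def __init__(self, zk, session, lock_dir="/lock/"):
        self.zk, self.session, self.lock_dir = zk, session, lock_dir
        self.path = None

    def try_acquire(self):
        """Return (True, None) if held, else (False, predecessor_path_to_watch)."""
        if self.path is None:
            self.path = self.zk.create_ephemeral_sequential(self.lock_dir + "lock-", self.session)
        siblings = sorted(p for p in self.zk.nodes if p.startswith(self.lock_dir))
        index = siblings.index(self.path)
        if index == 0:
            return True, None
        # Watch only the next-smaller znode, not the current holder.
        return False, siblings[index - 1]

    def release(self):
        if self.path is not None:
            self.zk.delete(self.path)
            self.path = None
```

Each waiter watches only its predecessor. When the lock is released, only one waiter is woken, not every node at once. A crashed holder's ephemeral znode disappears with its session, so the lock cannot be leaked.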
Mode 2: distributed lock implemented with Redis
A Redis distributed lock mainly uses four commands:
- SETNX (SET if Not eXists) sets the key to the value only when the key does not exist. By contrast, SET overwrites the value if the key already exists, whereas SETNX assigns the key and value only when the key is absent.
- GETSET returns the old value of a key and sets a new value.
- EXPIRE sets an expiry time.
- DEL deletes a key.
Implementation: when acquiring the lock, the node uses SETNX to lock, then uses EXPIRE to add a timeout so that the lock is released automatically once the timeout elapses. The value of the lock is a randomly generated UUID, which is checked when the lock is released. An acquisition timeout is also set; if it is exceeded, the node gives up trying to acquire the lock. When releasing, the node checks via the UUID that the lock is its own, and only then executes DEL.
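The SETNX/EXPIRE/UUID protocol of Mode 2 can be sketched as follows. `FakeRedis` is a hypothetical in-memory stand-in with an injectable clock, so the sketch runs without a server. `acquire_lock` and `release_lock` are likewise illustrative names, not a library API.

```python
import time
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis commands named in Mode 2."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}      # key -> value
        self.deadline = {}  # key -> absolute expiry time

    def _expire_if_due(self, key):
        if key in self.deadline and self.clock() >= self.deadline[key]:
            self.data.pop(key, None)
            self.deadline.pop(key, None)

    def setnx(self, key, value):
        self._expire_if_due(key)
        if key in self.data:
            return False          # key exists: SETNX does nothing
        self.data[key] = value
        return True

    def expire(self, key, seconds):
        self._expire_if_due(key)
        if key not in self.data:
            return False
        self.deadline[key] = self.clock() + seconds
        return True

    def get(self, key):
        self._expire_if_due(key)
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)
        self.deadline.pop(key, None)


def acquire_lock(r, key, ttl, acquire_timeout, sleep=time.sleep, poll=0.05):
    """SETNX + EXPIRE with a random UUID value; give up after acquire_timeout."""
    token = str(uuid.uuid4())
    give_up_at = r.clock() + acquire_timeout   # share the store's clock
    while r.clock() < give_up_at:
        if r.setnx(key, token):
            r.expire(key, ttl)    # lock auto-releases after ttl seconds
            return token
        sleep(poll)
    return None


def release_lock(r, key, token):
    """Delete the lock only if it still carries our UUID."""
    if r.get(key) == token:
        r.delete(key)
        return True
    return False
```

There are two caveats. First, SETNX and EXPIRE are separate commands, so a crash between them leaves a lock with no expiry. For this reason, Redis also offers the single atomic form `SET key value NX PX milliseconds`. Second, the GET-then-DEL check in `release_lock` is not atomic against a real server. It is usually done in a short Lua script instead.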
Mode 3: distributed lock implemented with a database
Implementation: optimistic locking or pessimistic locking.
Optimistic locking: add a version-number field to the table. Before each update, first query the row together with its version number; then, when updating, add the version number as a condition in the WHERE clause. If the update succeeds, the lock has been acquired; if it fails, the lock was not obtained.
Pessimistic locking: use SELECT ... FOR UPDATE (X lock) or SELECT ... LOCK IN SHARE MODE (S lock). The X lock is generally used more often, because write operations usually follow.
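Here is a minimal sketch of the optimistic variant of Mode 3. It uses Python's built-in `sqlite3` module so it runs without a database server. The following are assumptions, not part of the described method:
- the table name `dist_lock` and its columns;
- the extra `owner IS NULL` condition, which keeps a second node from taking a lock that is already held.

```python
import sqlite3


def init_lock_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dist_lock ("
        " name TEXT PRIMARY KEY, owner TEXT, version INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO dist_lock (name, owner, version) VALUES ('leader', NULL, 0)"
    )
    conn.commit()


def read_lock(conn, name):
    """First query the row together with its version number."""
    return conn.execute(
        "SELECT owner, version FROM dist_lock WHERE name = ?", (name,)
    ).fetchone()


def try_lock(conn, name, owner, seen_version):
    """Conditional UPDATE: succeeds only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE dist_lock SET owner = ?, version = version + 1"
        " WHERE name = ? AND version = ? AND owner IS NULL",
        (owner, name, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1   # one row updated means the lock was obtained


def unlock(conn, name, owner):
    cur = conn.execute(
        "UPDATE dist_lock SET owner = NULL, version = version + 1"
        " WHERE name = ? AND owner = ?",
        (name, owner),
    )
    conn.commit()
    return cur.rowcount == 1
```

SQLite does not support `SELECT ... FOR UPDATE`, so the pessimistic variant is not shown. On MySQL/InnoDB it would be issued inside a transaction.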
When the distributed lock is obtained, the node can perform the operation of step S103 above: acquiring the distributed lock of the distributed coordination service and configuring, on the local network interface card, the VIP through which the stateful cluster provides access service externally. It then writes the local identifier into the identity file as the master-node identity identifier, so that the identity file always records the latest state information.
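Steps 1 to 3 can be sketched as an event handler that a slave runs each time its watch on the lock fires. The names `FakeCoordinator` and `Node` are hypothetical stand-ins. The `master-ID` key name is borrowed from the ETCD example of Embodiment two. Promoting services and configuring the VIP are reduced to updating a role string and a set of addresses.

```python
from dataclasses import dataclass, field


class FakeCoordinator:
    """Hypothetical in-memory stand-in for the distributed coordination service."""

    def __init__(self):
        self.lock_holder = None   # node currently holding the distributed lock
        self.identity = {}        # stands in for the identity file it records

    def try_lock(self, node_id):
        if self.lock_holder is None:
            self.lock_holder = node_id
            return True
        return False

    def session_lost(self, node_id):
        # When the holder dies, its lease expires and the lock becomes free.
        if self.lock_holder == node_id:
            self.lock_holder = None


@dataclass
class Node:
    node_id: str
    vip: str
    nic: set = field(default_factory=set)   # addresses configured on the local NIC
    role: str = "slave"

    def on_lock_event(self, coord):
        """Handle one event seen while cyclically monitoring the lock (Step 1)."""
        if self.role == "master" or coord.lock_holder is not None:
            return False                        # a master exists: stay in the queue
        if not coord.try_lock(self.node_id):    # Step 2: compete for the lock
            return False
        self.role = "master"                    # Step 2: promote local services
        self.nic.add(self.vip)                  # Step 2: bring up the VIP
        coord.identity["master-ID"] = self.node_id   # Step 3: record the new master
        return True
```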
Preferably, consider the case where the target node was the master node before restarting. Because the fault lasted too long, or for similar reasons, a new master node may already be working normally by the time the target node restarts. Therefore, obtaining the identity file recorded by the distributed coordination service after the target node restarts specifically includes:
Step 1: after the target node restarts, judge whether the VIP exists on the local network interface card;
Step 2: if it does, delete the VIP from the local network interface card, initialize the local state to the init state, and write it into the identity file;
Step 3: obtain the identity file through the interface of the distributed coordination service.
For ease of description, the three steps above are explained together below.
Two VIPs in the same stateful cluster would destroy the cluster's data consistency. To avoid this, after the target node restarts it first judges whether the VIP exists on its local network interface card. If it does, the node deletes the VIP from the local network interface card, initializes the local state to the init state, and writes it into the identity file. It then obtains the identity file through the interface of the distributed coordination service.
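The restart procedure can be sketched as a single function. The parameter names are hypothetical:
- the NIC is modeled as a set of addresses;
- the local state is modeled as a dictionary;
- the coordination service's interface is modeled as a callable.

The final comparison with the local identifier anticipates the judgment step that follows in the method.

```python
def recover_after_restart(local_id, vip, nic, local_state, fetch_master_id):
    """Restart Steps 1-3 (all names hypothetical).

    nic             -- set of addresses currently on the local NIC
    local_state     -- dict standing in for this node's entry in the identity file
    fetch_master_id -- callable querying the coordination service's interface
    """
    # Steps 1-2: a VIP left over from before the crash must be removed, so that it
    # cannot coexist with the VIP a newly elected master may already hold.
    # On Linux this would typically be `ip addr del <vip>/<prefix> dev <ifname>`.
    if vip in nic:
        nic.discard(vip)
    local_state["state"] = "init"
    # Step 3: read the authoritative master identity from the coordination service.
    master_id = fetch_master_id()
    # Next step of the method: compare the recorded master with the local identifier.
    return "master" if master_id == local_id else "slave"
```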
Embodiment two:
To help those skilled in the art better understand the technical solution provided by the embodiments of the invention, it is described in detail below, taking ETCD as the distributed coordination service.
Note that this embodiment presupposes that the load-balancing function of the SLB is removed and all business processing falls on the master node. This places certain requirements on overall system size and traffic load, so it is suitable only for small and medium-scale business architectures. Each service can be restarted automatically after an abnormal exit via systemd or docker. When the cluster restarts, the previous master node must be able to start normally. For example, if the cluster contains nodes 1, 2 and 3 and node 1 is the master node, then after the whole cluster restarts node 1 must be present, whereas nodes 2 and 3 may be absent.
Based on the stateful cluster recovery method provided in Embodiment one, a master-election module is designed, which can implement the stateful cluster recovery method of Embodiment one.
Refer to Figs. 3 to 6, which illustrate the embodiment of the invention:
- Fig. 3 is a schematic of the operation of the stateful cluster before restart;
- Fig. 4 is a schematic of the stateful cluster during restart;
- Fig. 5 is a schematic of the startup of the master-election module;
- Fig. 6 is a schematic of the operation of the master-election module on a slave node when the master node fails.
First, the master-election module is deployed on every node of the stateful cluster and initialized. For example, any one node can be designated as the master node in ETCD, i.e., one node of the cluster is taken as the initial master node. After the master-election module starts on a node, it performs the following steps:
If the VIP exists on the local network interface card, delete it. Set the local state to the init state and write the init state into /var/run/leader. Query the master node ID through the ETCD interface and judge whether it is identical to the local node ID. If it is identical, this node was set as the master node during initialization; if it is different, this node is a slave node.
When this node is the master node, it acquires the leader lock through the ETCD interface, changes the local state to the master-preparing state, and writes this state into /var/run/leader. It then waits 5 seconds and notifies the standby services on this node (for example ovn-db) to switch to the master state; multi-master services such as galera+mariadb need no change. It then writes the local node ID into the master node identity in ETCD, changes the local state to the master state, and writes the master state into /var/run/leader. Finally, it adds the VIP to the network interface card. From then on it cyclically monitors the ETCD leader lock and handles identity changes, for example changing from the master identity to the slave identity.
When this node is a slave node, it changes the local state to the wait-for-master state and writes this state into /var/run/leader. It then cyclically queries ETCD to check whether the leader lock has been acquired. If it has not, the master node has not started yet, so the node waits 1 second and queries again, until the master node starts. After detecting that the master node has started, it notifies the master services on this node to switch to the standby state; multi-master services, and services already in the standby state, need no change. It then changes the local state to the slave state and writes it into /var/run/leader. Like the master node, the slave node also cyclically monitors the ETCD leader lock and handles identity changes, for example a change between the slave and master identities.
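The startup sequence of the master-election module (Fig. 5) can be replayed with a small Python sketch that records the values written to /var/run/leader. The following are assumptions:
- `FakeEtcd` and the action names;
- the string `wait_master` for the wait-for-master state, since the text does not give that state's file value;
- falling back to the slave path when another node already holds the leader lock.

```python
class FakeEtcd:
    """Hypothetical in-memory stand-in for the ETCD calls the text uses."""

    def __init__(self, master_id):
        self.kv = {"master-ID": master_id}   # initial master chosen at deployment
        self.leader = None                   # holder of the "leader" lock

    def acquire_leader(self, node_id):
        if self.leader is None:
            self.leader = node_id
        return self.leader == node_id


def start_election_module(local_id, etcd, write_state, act, sleep, poll=1.0, silence=5.0):
    """Replay the startup steps; write_state mimics writes to /var/run/leader."""
    act("remove_stale_vip")
    write_state("init")
    if etcd.kv["master-ID"] == local_id and etcd.acquire_leader(local_id):
        write_state("to_master")                 # master-preparing state
        sleep(silence)
        act("promote_standby_services")          # e.g. ovn-db; galera+mariadb unchanged
        etcd.kv["master-ID"] = local_id
        write_state("master")
        act("add_vip")
        return "master"
    write_state("wait_master")                   # hypothetical value for the wait state
    while etcd.leader is None:                   # master has not started yet
        sleep(poll)
    act("demote_master_services")
    write_state("slave")
    return "slave"
```

The subsequent loop that keeps monitoring the leader lock after startup is not modeled here.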
Each slave node cyclically monitors the ETCD leader lock. When the master node fails, the slave nodes compete for the leader lock in ETCD, and the winner waits quietly for 5 seconds after obtaining it. The winner then changes its current state to the master-preparing state and writes it into /var/run/leader. Next, it notifies the standby services on this node to switch to the master state; multi-master services need no change. It writes the local node ID into the master node identity in ETCD, changes the local state to the master state, and writes the master state into /var/run/leader. Finally, it adds the VIP to the network interface card. It continues to cyclically monitor the ETCD leader lock and handle identity changes, for example changing from the master identity to the slave identity.
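The takeover on a slave (Fig. 6) can be sketched as follows, again with a hypothetical in-memory `FakeEtcd`. The durations default to the 5-second silence and 1-second poll of Figs. 5 and 6. The function refuses a silence period shorter than twice the poll interval, which reflects the timing rule given for those figures. Only the node that wins the lock changes any state.

```python
class FakeEtcd:
    """Hypothetical stand-in: the old master's lease has expired, so no one holds the lock."""

    def __init__(self, master_id):
        self.kv = {"master-ID": master_id}
        self.leader = None

    def acquire_leader(self, node_id):
        if self.leader is None:
            self.leader = node_id
            return True
        return False


def take_over(local_id, etcd, write_state, act, sleep, silence=5.0, poll=1.0):
    if silence < 2 * poll:
        raise ValueError("silence period must be at least twice the poll interval")
    if not etcd.acquire_leader(local_id):
        return False                        # another slave won: remain a slave
    sleep(silence)                          # stay quiet before changing anything
    write_state("to_master")
    act("promote_standby_services")         # multi-master services need no change
    etcd.kv["master-ID"] = local_id         # later restarts will favour this node
    write_state("master")
    act("add_vip")
    return True
```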
The 5-second silence period in Figs. 5 and 6 and the 1-second wait between cyclic queries can be replaced by other durations, but the silence period must be at least twice the query wait time.
The master-election module (the election service shown in Fig. 3) runs on all nodes of the cluster. It competes for the master identity through the ETCD distributed lock and writes the identity information into the /var/run/leader file (status values: master, to_slave, slave, to_master). It continuously maintains this file and updates its value, i.e., changes the state, whenever the master identity changes.
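Services read /var/run/leader while the election module rewrites it. One reasonable way to update the file safely is an atomic write-then-rename, sketched below. The state set holds:
- the four values listed above;
- `init`, which the startup steps also write;
- the hypothetical `wait_master`, used here for the wait-for-master state.

The function names are illustrative.

```python
import os
import tempfile

# Listed values plus "init" and the hypothetical "wait_master".
VALID_STATES = {"init", "to_master", "master", "wait_master", "to_slave", "slave"}


def write_leader_file(path, state):
    """Replace the file atomically so readers never see a half-written value."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(state + "\n")
        os.replace(tmp_path, path)   # atomic rename within one POSIX filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_leader_file(path):
    """Return the current state, or None if the file has not been created yet."""
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None
```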
The master-election module guarantees that a node's identity after restart is the same as before the module exited. The master node's election module still obtains the master identity after it exits and restarts, whereas a non-master node's election module can obtain only a non-master identity. When the master identity changes, the election module also switches the VIP, keeping the VIP bound to the node that is the master.
When a node restarts, /var/run/leader does not exist because it is an in-memory file. Each service (such as mariadb and rabbitmq-server) therefore waits for this file to be initialized before proceeding. When the master-election module starts, it sets the value of /var/run/leader according to the steps above. Once the value is slave or master, each service selects its startup mode accordingly.
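A service start script's decision logic can be sketched in Python: wait until /var/run/leader holds `master` or `slave`, then pick the commands for that role. The commands are the ones this embodiment lists for mariadb.sh and rabbit.sh in the deployment example. The `START_COMMANDS` table, the function names and the one-second poll are assumptions. The sketch only returns the command lists rather than running them.

```python
import time

# Commands as listed in this embodiment's deployment example.
START_COMMANDS = {
    "master": {
        "mariadb": [
            ["sed", "-i", "-e", "/^safe_to_bootstrap/s/0/1/", "/var/lib/mysql/grastate.dat"],
            ["mysqld_safe", "--wsrep-new-cluster"],
        ],
        "rabbitmq": [["rabbitmqctl", "force_boot"], ["/usr/sbin/rabbitmq-server"]],
    },
    "slave": {
        "mariadb": [
            ["sed", "-i", "-e", "/^safe_to_bootstrap/s/1/0/", "/var/lib/mysql/grastate.dat"],
            ["mysqld_safe"],
        ],
        "rabbitmq": [["/usr/sbin/rabbitmq-server"]],
    },
}


def wait_for_role(read_role, sleep=time.sleep, poll=1.0):
    """Block until the file holds a final value; transient states keep us waiting."""
    while True:
        role = read_role()            # None while the file does not exist yet
        if role in START_COMMANDS:
            return role
        sleep(poll)


def plan_start(service, read_role, sleep=time.sleep):
    """Return the command list the given service would run for its role."""
    return START_COMMANDS[wait_for_role(read_role, sleep)][service]
```

In a real script, each returned command could be executed in order with `subprocess.run(cmd, check=True)`.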
Refer to Fig. 7, a schematic of a master node restart in the embodiment of the invention, illustrated with the master node starting mariadb and rabbitmq-server. Refer to Fig. 8, a schematic of a slave node restart in the embodiment of the invention.
For ease of description, three scenarios (initialization, master node failure, and whole-cluster power-off restart) are used below to illustrate the steps for restarting the mariadb and rabbitmq-server clusters:
The node IP plan is shown in Fig. 9, a schematic of node IP planning in the embodiment of the invention. The IPs in Fig. 9 are not limiting and may be replaced by other specific IP addresses. The startup scripts are as follows:
The startup scripts of the master-election module, mariadb and rabbitmq-server are leader_start.sh, mariadb_start.sh and rabbit_start.sh, respectively. The three services are managed by systemd and are restarted automatically after an abnormal exit. The mariadb instances of the three nodes form a galera+mariadb cluster, and the rabbitmq-server instances of the three nodes form a rabbitmq cluster.
One. Initialization at deployment:
1) At deployment, set the ETCD KV value "master-ID": "host1".
2) The three nodes start simultaneously, and the three services on each node also start simultaneously.
3) /var/run/leader does not exist on nodes 1, 2 and 3.
4) mariadb_start.sh and rabbit_start.sh keep waiting for this file.
5) leader.sh on node 1 queries the etcd value of "master-ID" and finds that it matches the local host. It acquires the "leader" lock, becomes the master node, writes master into /var/run/leader, and adds 10.0.0.254 to the network interface card that holds 10.0.0.11.
6) After mariadb.sh and rabbit.sh on node 1 detect that /var/run/leader is master, they start their business processes with the master identity as follows:
mariadb.sh:
sed -i -e '/^safe_to_bootstrap/s/0/1/' /var/lib/mysql/grastate.dat
mysqld_safe --wsrep-new-cluster
rabbit.sh:
rabbitmqctl force_boot
/usr/sbin/rabbitmq-server
7) leader_py.sh on nodes 2 and 3 writes slave into /var/run/leader.
8) After mariadb.sh and rabbit.sh on nodes 2 and 3 detect that /var/run/leader is slave, they start their business processes with the standby identity as follows:
mariadb.sh:
sed -i -e '/^safe_to_bootstrap/s/1/0/' /var/lib/mysql/grastate.dat
mysqld_safe
rabbit.sh:
/usr/sbin/rabbitmq-server
9) The mariadb cluster and the rabbitmq cluster are now built, and clients can access the mariadb and rabbitmq-server services through 10.0.0.254.
Two. Abnormal power loss of node 1:
1) Node 2 wins the competition for the leader lock and takes the master identity; node 3 keeps its slave (standby) identity.
2) leader.sh on node 2 changes the value of /var/run/leader to "master".
3) leader.sh on node 2 changes the ETCD KV value to "master-ID": "host2".
4) leader.sh on node 2 adds 10.0.0.254 to the network interface card that holds 10.0.0.12.
5) The mariadb and rabbitmq clusters are already built and support multi-master reads and writes, so no change is needed. Clients can access the mariadb and rabbitmq-server services through 10.0.0.254.
Three. Power-off restart of all three nodes:
1) The ETCD KV value is "master-ID": "host2".
The subsequent operations are identical to those of "One. Initialization at deployment". Because host2 was the master node before the restart, host2 obtains the master identity on restart and the other nodes obtain the standby identity.
In this way, the services on a single node of the cluster can be accessed directly through the VIP. After a node wins the master identity through the distributed coordination component, it writes its node information into shared storage, so on subsequent cluster restarts the previous master node still obtains the master identity. If the master node fails during operation, another node obtains the master identity and updates the new node ID in shared storage. When the whole cluster recovers, each service no longer tries to restore the cluster state it had before shutdown. Instead, the master node forcibly bootstraps the cluster and the slave nodes join it as members, which removes any requirement on boot order during restart.
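The three scenarios can be condensed into a toy model that shows why boot order does not matter. A node's role after a restart depends only on the `master-ID` value persisted in ETCD, while the leader lock lease is lost with its holder. `ClusterModel` is a simplification, and so is its deterministic rule that the first surviving host wins the lock. In the described method, the survivors genuinely race for the lock.

```python
class ClusterModel:
    """Toy model of the three scenarios; names and the tie-break rule are hypothetical."""

    def __init__(self, hosts, master_id):
        self.hosts = list(hosts)
        self.kv = {"master-ID": master_id}  # persisted in ETCD, survives power loss
        self.leader = None                  # lock lease, lost when its holder goes down
        self.roles = {}                     # running host -> "master" or "slave"

    def boot(self, host):
        # The role depends only on the persisted master-ID, never on boot order.
        if self.kv["master-ID"] == host and self.leader is None:
            self.leader = host
            self.roles[host] = "master"
        else:
            self.roles[host] = "slave"

    def fail(self, host):
        self.roles.pop(host, None)
        if self.leader == host:
            self.leader = None
            # Survivors compete for the lock; here the first in host order wins.
            for candidate in self.hosts:
                if candidate in self.roles:
                    self.leader = candidate
                    self.roles[candidate] = "master"
                    self.kv["master-ID"] = candidate   # recorded for later restarts
                    break

    def power_off_all(self):
        self.roles.clear()
        self.leader = None   # leases expire; the persisted kv is kept
```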
Embodiment three:
Corresponding to the method embodiment above, an embodiment of the invention further provides a stateful cluster recovery device. The device described below and the stateful cluster recovery method described above may be cross-referenced.
As shown in Fig. 10, the device comprises the following modules:
Identity file acquisition module 101, configured to obtain, after the target node restarts, the identity file recorded by the distributed coordination service;
Identifier judgment module 102, configured to determine the master-node identity identifier from the identity file, and judge whether it is identical to the local identifier;
Master identity determination module 103, configured to, if so, acquire the distributed lock of the distributed coordination service, and configure, on the local network interface card, the VIP through which the stateful cluster provides access service externally;
Slave identity determination module 104, configured to, if not, join the stateful cluster with a slave-node identity after the master node acquires the distributed lock, and join the master-identity application queue.
The device provided by the embodiment of the invention works as follows:
- After the target node restarts, it obtains the identity file recorded by the distributed coordination service.
- It determines the master-node identity identifier from the identity file and judges whether it is identical to the local identifier.
- If so, it acquires the distributed lock of the distributed coordination service and configures, on the local network interface card, the VIP through which the stateful cluster provides access service externally.
- If not, after the master node acquires the distributed lock, it joins the stateful cluster with a slave-node identity and joins the master-identity application queue.
This applies whether only the target node restarts or the whole cluster restarts.
After restarting, the target node first obtains the identity file and determines the master-node identity from it. It then judges whether the master-node identity identifier is identical to the local identifier:
- If they are identical, the target node was the master node before the restart. It acquires the distributed lock and configures, on the local network interface card, the VIP through which the stateful cluster provides access service externally. Because the master node is locked by means of a distributed lock, the other nodes can compete for the master identity by competing for that lock when the master fails.
- If they differ, the target node was a slave node before the restart. After the master node acquires the distributed lock, it joins the stateful cluster with a slave-node identity and joins the master-identity application queue.
The VIP through which the stateful cluster provides service is still configured on the master node after a restart. Consequently, even a whole-cluster restart imposes no requirement on restart order, and each node can rejoin the cluster directly according to the master/slave identity it held in the stateful service before the restart.
In addition, if the master node fails during operation, the other nodes in the master-identity application queue can obtain the master identity by competing for the distributed lock. The integrity of stateful cluster data is thus ensured during normal operation, whole-cluster restarts and single-node restarts.
Compared with managing cluster startup and operation through an additional Pacemaker module, the device provided by the embodiment of the invention has a simple architecture and is easy to implement.
In addition, in the embodiment of the invention the VIP through which the stateful cluster provides service externally is configured on the master node. The service and its clients can therefore establish TCP connections directly and bypass the SLB, which improves maintainability and communication efficiency.
In a specific embodiment of the invention, the master identity determination module 103 is specifically configured to set the state of the local master/slave services to the master state and to add the VIP to the local network interface card.
In a specific embodiment of the invention, the device further comprises:
a cyclic monitoring module, configured to cyclically monitor the distributed lock of the distributed coordination service and the state-change messages of the master/slave services after the node has joined the stateful cluster with a slave-node identity and joined the master-identity application queue.
In a specific embodiment of the invention, the device further comprises:
a master identity competition module, configured to act after the cyclic monitoring of the distributed lock and of the state-change messages of the master/slave services. It obtains the distributed lock; performs the step of acquiring the distributed lock of the distributed coordination service and configuring, on the local network interface card, the VIP through which the stateful cluster provides access service externally; and writes the local identifier into the identity file as the master-node identity identifier.
In a specific embodiment of the invention, the master identity competition module is specifically configured to acquire the distributed lock competitively.
In a specific embodiment of the invention, the identity file acquisition module 101 is specifically configured to:
- after the target node restarts, judge whether the VIP exists on the local network interface card;
- if it does, delete the VIP from the local network interface card, initialize the local state to the init state, and write it into the identity file;
- obtain the identity file through the interface of the distributed coordination service.
In a specific embodiment of the invention, the distributed coordination service persistently maintains the identity file and updates it when the master identity changes. The identity file stores the current state of the target node, which is any one of the following: the init state, the master-preparing state, the master state, the wait-for-master state, the slave-preparing state, or the slave state.
Embodiment four:
Corresponding to the method embodiment above, an embodiment of the invention further provides a stateful cluster recovery equipment. The equipment described below and the stateful cluster recovery method described above may be cross-referenced.
As shown in Fig. 11, the equipment comprises:
a memory D1, for storing a computer program;
a processor D2, for implementing the steps of the stateful cluster recovery method of the method embodiment above when executing the computer program.
Specifically, refer to Fig. 12, a schematic of a specific structure of the stateful cluster recovery equipment of this embodiment. The equipment may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The programs stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the data processing equipment. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the stateful cluster recovery equipment 301, the series of instruction operations in the storage medium 330.
The stateful cluster recovery equipment 301 may further include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, for example Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps of the stateful cluster recovery method described above can be implemented by this structure of the stateful cluster recovery equipment.
Embodiment five:
Corresponding to the method embodiment above, an embodiment of the invention further provides a readable storage medium. The readable storage medium described below and the stateful cluster recovery method described above may be cross-referenced.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the stateful cluster recovery method of the method embodiment above.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the invention.
Claims (10)
1. A stateful cluster recovery method, characterized by comprising:
after a target node restarts, obtaining an identity file recorded by a distributed coordination service;
determining a master-node identity identifier from the identity file, and judging whether the master-node identity identifier is identical to the local identifier;
if so, acquiring a distributed lock of the distributed coordination service, and configuring, on the local network interface card, a VIP through which the stateful cluster provides access service externally;
if not, after the master node acquires the distributed lock, joining the stateful cluster with a slave-node identity, and joining a master-identity application queue.
2. The stateful cluster recovery method according to claim 1, characterized in that configuring, on the local network interface card, the VIP through which the stateful cluster provides access service externally comprises:
setting the state of the local master/slave services to the master state, and adding the VIP to the local network interface card.
3. The stateful cluster recovery method according to claim 1, characterized in that, after joining the stateful cluster with the slave-node identity and joining the master-identity application queue, the method further comprises:
cyclically monitoring the distributed lock of the distributed coordination service and the state-change messages of the master/slave services.
4. The stateful cluster recovery method according to claim 3, characterized in that, after cyclically monitoring the distributed lock of the distributed coordination service and the state-change messages of the master/slave services, the method further comprises:
obtaining the distributed lock, and performing the step of acquiring the distributed lock of the distributed coordination service and configuring, on the local network interface card, the VIP through which the stateful cluster provides access service externally;
writing the local identifier into the identity file as the master-node identity identifier.
5. The stateful cluster recovery method according to claim 4, characterized in that obtaining the distributed lock comprises:
obtaining the distributed lock competitively.
6. The stateful cluster recovery method according to claim 1, characterized in that obtaining, after the target node restarts, the identity file recorded by the distributed coordination service comprises:
after the target node restarts, judging whether the VIP exists on the local network interface card;
if so, deleting the VIP from the local network interface card, initializing the local state to the init state, and writing it into the identity file;
obtaining the identity file through an interface of the distributed coordination service.
7. The stateful cluster recovery method according to any one of claims 1 to 6, characterized in that the distributed coordination service persistently maintains the identity file and updates the identity file when the master identity changes; the identity file stores the current state of the target node, and the current state is any one of the following: the init state, the master-preparing state, the master state, the wait-for-master state, the slave-preparing state, or the slave state.
8. A stateful cluster recovery device, characterized by comprising:
an identity file acquisition module, configured to obtain, after a target node restarts, an identity file recorded by a distributed coordination service;
an identifier judgment module, configured to determine a master-node identity identifier from the identity file, and judge whether the master-node identity identifier is identical to the local identifier;
a master identity determination module, configured to, if so, acquire a distributed lock of the distributed coordination service, and configure, on the local network interface card, a VIP through which the stateful cluster provides access service externally;
a slave identity determination module, configured to, if not, join the stateful cluster with a slave-node identity after the master node acquires the distributed lock, and join a master-identity application queue.
9. A stateful cluster recovery equipment, characterized by comprising:
a memory, for storing a computer program;
a processor, for implementing the steps of the stateful cluster recovery method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the steps of the stateful cluster recovery method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811507350.1A CN109639794B (en) | 2018-12-10 | 2018-12-10 | State cluster recovery method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811507350.1A CN109639794B (en) | 2018-12-10 | 2018-12-10 | State cluster recovery method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109639794A true CN109639794A (en) | 2019-04-16 |
CN109639794B CN109639794B (en) | 2021-07-13 |
Family
ID=66072601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811507350.1A Active CN109639794B (en) | 2018-12-10 | 2018-12-10 | State cluster recovery method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109639794B (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110417600A (en) * | 2019-08-02 | 2019-11-05 | 秒针信息技术有限公司 | Node switching method, device and the computer storage medium of distributed system |
CN111026807A (en) * | 2019-11-25 | 2020-04-17 | 深圳壹账通智能科技有限公司 | Distributed lock synchronization method and device, computer equipment and readable storage medium |
CN111131492A (en) * | 2019-12-31 | 2020-05-08 | 中国联合网络通信集团有限公司 | Node access method and system |
CN111212127A (en) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | Storage cluster, service data maintenance method, device and storage medium |
CN111400107A (en) * | 2020-04-21 | 2020-07-10 | 贵州新致普惠信息技术有限公司 | Self-starting recovery system and method for database multi-master cluster |
CN111405015A (en) * | 2020-03-09 | 2020-07-10 | 中国建设银行股份有限公司 | Data processing method, device, equipment and storage medium |
CN111949214A (en) * | 2020-08-07 | 2020-11-17 | 苏州浪潮智能科技有限公司 | Method, system, device and medium for preventing disk misoperation in a HANA cluster |
CN111970148A (en) * | 2020-08-14 | 2020-11-20 | 北京金山云网络技术有限公司 | Distributed task scheduling method and system |
CN112037873A (en) * | 2020-08-31 | 2020-12-04 | 合肥工业大学 | Single-point optimization method based on cluster selection and consensus mechanism |
CN112073265A (en) * | 2020-08-31 | 2020-12-11 | 帷幄匠心科技(杭州)有限公司 | Internet of things monitoring method and system based on distributed edge computing |
CN112269693A (en) * | 2020-10-23 | 2021-01-26 | 北京浪潮数据技术有限公司 | Node self-coordination method, device and computer readable storage medium |
CN112272220A (en) * | 2020-10-16 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Cluster software start control method, system, terminal and storage medium |
CN112269683A (en) * | 2020-10-22 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Offline node online service recovery method and related components |
CN112887367A (en) * | 2021-01-11 | 2021-06-01 | 华云数据控股集团有限公司 | Method, system and computer readable medium for realizing high availability of distributed cluster |
CN113326511A (en) * | 2021-06-25 | 2021-08-31 | 深信服科技股份有限公司 | File repair method, system, device and medium |
CN113660350A (en) * | 2021-10-18 | 2021-11-16 | 恒生电子股份有限公司 | Distributed lock coordination method, device, equipment and storage medium |
CN113949691A (en) * | 2021-10-15 | 2022-01-18 | 湖南麒麟信安科技股份有限公司 | ETCD-based virtual network address high-availability implementation method and system |
CN114124903A (en) * | 2021-11-15 | 2022-03-01 | 新华三大数据技术有限公司 | Virtual IP address management method and device |
CN114501094A (en) * | 2022-02-09 | 2022-05-13 | 浙江博采传媒有限公司 | Sequence playing method and device based on virtual production and storage medium |
WO2022116660A1 (en) * | 2020-12-03 | 2022-06-09 | 苏州浪潮智能科技有限公司 | Node synchronization method and apparatus, device and storage medium |
CN114900449A (en) * | 2022-03-30 | 2022-08-12 | 网宿科技股份有限公司 | Resource information management method, system and device |
CN115277712A (en) * | 2022-07-08 | 2022-11-01 | 北京城市网邻信息技术有限公司 | Distributed lock service providing method, device and system and electronic equipment |
CN115277114A (en) * | 2022-07-08 | 2022-11-01 | 北京城市网邻信息技术有限公司 | Distributed lock processing method and device, electronic equipment and storage medium |
CN115269248A (en) * | 2022-07-28 | 2022-11-01 | 江苏安超云软件有限公司 | Method and device for preventing split brain under dual-node cluster, electronic equipment and storage medium |
CN115484139A (en) * | 2022-09-02 | 2022-12-16 | 武汉众智数字技术有限公司 | Video strategy management decentralized method based on video network monitoring |
CN115617917A (en) * | 2022-12-16 | 2023-01-17 | 中国西安卫星测控中心 | Method, device, system and equipment for controlling multiple activities of database cluster |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102467508A (en) * | 2010-11-04 | 2012-05-23 | 中兴通讯股份有限公司 | Method for providing database service and database system |
CN102868736A (en) * | 2012-08-30 | 2013-01-09 | 浪潮(北京)电子信息产业有限公司 | Design and implementation method of cloud computing monitoring framework, and cloud computing processing equipment |
CN103763378A (en) * | 2014-01-24 | 2014-04-30 | 中国联合网络通信集团有限公司 | Task processing method and system and nodes based on distributive type calculation system |
CN105159818A (en) * | 2015-08-28 | 2015-12-16 | 东北大学 | Log recovery method in memory data management and log recovery simulation system in memory data management |
CN106656624A (en) * | 2017-01-04 | 2017-05-10 | 合肥康捷信息科技有限公司 | Optimization method based on Gossip communication protocol and Raft election algorithm |
CN106713250A (en) * | 2015-11-18 | 2017-05-24 | 杭州华为数字技术有限公司 | Data access method and device based on distributed system |
CN106844092A (en) * | 2016-12-09 | 2017-06-13 | 武汉烽火信息集成技术有限公司 | Method for automatically recovering a MariaDB Galera Cluster after a power failure |
CN106911524A (en) * | 2017-04-27 | 2017-06-30 | 紫光华山信息技术有限公司 | HA implementation method and device |
CN108881489A (en) * | 2018-08-03 | 2018-11-23 | 高新兴科技集团股份有限公司 | Coordination system and method for distributed services |
- 2018-12-10: CN201811507350.1A filed in CN (CN109639794B), Active
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110417600A (en) * | 2019-08-02 | 2019-11-05 | 秒针信息技术有限公司 | Node switching method and device for a distributed system, and computer storage medium |
CN111026807A (en) * | 2019-11-25 | 2020-04-17 | 深圳壹账通智能科技有限公司 | Distributed lock synchronization method and device, computer equipment and readable storage medium |
CN111212127A (en) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | Storage cluster, service data maintenance method, device and storage medium |
CN111131492A (en) * | 2019-12-31 | 2020-05-08 | 中国联合网络通信集团有限公司 | Node access method and system |
CN111405015B (en) * | 2020-03-09 | 2022-09-30 | 中国建设银行股份有限公司 | Data processing method, device, equipment and storage medium |
CN111405015A (en) * | 2020-03-09 | 2020-07-10 | 中国建设银行股份有限公司 | Data processing method, device, equipment and storage medium |
CN111400107A (en) * | 2020-04-21 | 2020-07-10 | 贵州新致普惠信息技术有限公司 | Self-starting recovery system and method for database multi-master cluster |
CN111400107B (en) * | 2020-04-21 | 2023-03-03 | 贵州新致普惠信息技术有限公司 | Self-starting recovery system and method for database multi-master cluster |
CN111949214A (en) * | 2020-08-07 | 2020-11-17 | 苏州浪潮智能科技有限公司 | Method, system, device and medium for preventing disk misoperation in a HANA cluster |
CN111970148A (en) * | 2020-08-14 | 2020-11-20 | 北京金山云网络技术有限公司 | Distributed task scheduling method and system |
CN112037873A (en) * | 2020-08-31 | 2020-12-04 | 合肥工业大学 | Single-point optimization method based on cluster selection and consensus mechanism |
CN112073265A (en) * | 2020-08-31 | 2020-12-11 | 帷幄匠心科技(杭州)有限公司 | Internet of things monitoring method and system based on distributed edge computing |
CN112073265B (en) * | 2020-08-31 | 2022-05-13 | 帷幄匠心科技(杭州)有限公司 | Internet of things monitoring method and system based on distributed edge computing |
CN112037873B (en) * | 2020-08-31 | 2022-09-13 | 合肥工业大学 | Single-point optimization method based on cluster selection and consensus mechanism |
CN112272220A (en) * | 2020-10-16 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Cluster software start control method, system, terminal and storage medium |
CN112272220B (en) * | 2020-10-16 | 2022-05-13 | 苏州浪潮智能科技有限公司 | Cluster software start control method, system, terminal and storage medium |
CN112269683A (en) * | 2020-10-22 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Offline node online service recovery method and related components |
CN112269683B (en) * | 2020-10-22 | 2022-12-06 | 苏州浪潮智能科技有限公司 | Off-line node on-line service recovery method and related components |
CN112269693B (en) * | 2020-10-23 | 2024-03-01 | 北京浪潮数据技术有限公司 | Node self-coordination method, device and computer readable storage medium |
CN112269693A (en) * | 2020-10-23 | 2021-01-26 | 北京浪潮数据技术有限公司 | Node self-coordination method, device and computer readable storage medium |
WO2022116660A1 (en) * | 2020-12-03 | 2022-06-09 | 苏州浪潮智能科技有限公司 | Node synchronization method and apparatus, device and storage medium |
US11895185B2 (en) | 2020-12-03 | 2024-02-06 | Inspur Suzhou Intelligent Technology Co., Ltd. | Node synchronization method and apparatus, device and storage medium |
CN112887367A (en) * | 2021-01-11 | 2021-06-01 | 华云数据控股集团有限公司 | Method, system and computer readable medium for realizing high availability of distributed cluster |
CN113326511B (en) * | 2021-06-25 | 2024-04-09 | 深信服科技股份有限公司 | File repair method, system, equipment and medium |
CN113326511A (en) * | 2021-06-25 | 2021-08-31 | 深信服科技股份有限公司 | File repair method, system, device and medium |
CN113949691A (en) * | 2021-10-15 | 2022-01-18 | 湖南麒麟信安科技股份有限公司 | ETCD-based virtual network address high-availability implementation method and system |
CN113660350A (en) * | 2021-10-18 | 2021-11-16 | 恒生电子股份有限公司 | Distributed lock coordination method, device, equipment and storage medium |
CN114124903A (en) * | 2021-11-15 | 2022-03-01 | 新华三大数据技术有限公司 | Virtual IP address management method and device |
CN114501094A (en) * | 2022-02-09 | 2022-05-13 | 浙江博采传媒有限公司 | Sequence playing method and device based on virtual production and storage medium |
CN114900449B (en) * | 2022-03-30 | 2024-02-23 | 网宿科技股份有限公司 | Resource information management method, system and device |
CN114900449A (en) * | 2022-03-30 | 2022-08-12 | 网宿科技股份有限公司 | Resource information management method, system and device |
CN115277114B (en) * | 2022-07-08 | 2023-07-21 | 北京城市网邻信息技术有限公司 | Distributed lock processing method and device, electronic equipment and storage medium |
CN115277114A (en) * | 2022-07-08 | 2022-11-01 | 北京城市网邻信息技术有限公司 | Distributed lock processing method and device, electronic equipment and storage medium |
CN115277712A (en) * | 2022-07-08 | 2022-11-01 | 北京城市网邻信息技术有限公司 | Distributed lock service providing method, device and system and electronic equipment |
CN115269248A (en) * | 2022-07-28 | 2022-11-01 | 江苏安超云软件有限公司 | Method and device for preventing split brain under dual-node cluster, electronic equipment and storage medium |
CN115269248B (en) * | 2022-07-28 | 2023-08-08 | 安超云软件有限公司 | Method and device for preventing split brain under dual-node cluster, electronic equipment and storage medium |
CN115484139A (en) * | 2022-09-02 | 2022-12-16 | 武汉众智数字技术有限公司 | Video strategy management decentralized method based on video network monitoring |
CN115484139B (en) * | 2022-09-02 | 2024-03-15 | 武汉众智数字技术有限公司 | Video strategy management decentralization method based on video network monitoring |
CN115617917A (en) * | 2022-12-16 | 2023-01-17 | 中国西安卫星测控中心 | Method, device, system and equipment for controlling multiple activities of database cluster |
CN115617917B (en) * | 2022-12-16 | 2023-03-10 | 中国西安卫星测控中心 | Method, device, system and equipment for controlling multiple activities of database cluster |
Also Published As
Publication number | Publication date |
---|---|
CN109639794B (en) | 2021-07-13 |
Similar Documents
Publication | Title |
---|---|
CN109639794A (en) | Stateful cluster recovery method, apparatus, device and readable storage medium |
US11360854B2 (en) | Storage cluster configuration change method, storage cluster, and computer system |
CN104715001B (en) | Method and system for performing write operations on a shared resource in a cluster of a data processing system |
US9984140B1 (en) | Lease based leader election system |
ES2881606T3 (en) | Geographically distributed file system using coordinated namespace replication |
CN109753364A (en) | Implementation method, device and medium for a network-based distributed lock |
CN110990200B (en) | Flow switching method and device based on multiple active data centers |
CN109101341A (en) | Distributed lock allocation method and device |
CN106850260A (en) | Deployment method and device for a virtual resource management platform |
DE102005053727A1 (en) | Distributed lock |
CN107666493B (en) | Database configuration method and device therefor |
CN103973725A (en) | Distributed collaboration method and collaboration device |
CN101137984A (en) | Systems, methods, and software for distributed loading of databases |
JP2012173996A (en) | Cluster system, cluster management method and cluster management program |
CN106603319A (en) | Fault processing method, management server, and logic server |
CN106325768B (en) | Dual-machine storage system and method |
CN113515316A (en) | Novel edge cloud operating system |
CN111342986B (en) | Distributed node management method and device, distributed system and storage medium |
CN109508223A (en) | Virtual machine batch creation method, system and device |
CN116346834A (en) | Session synchronization method, device, computing equipment and computer storage medium |
US20230058193A1 (en) | Computer system and storage medium |
US10608882B2 (en) | Token-based lightweight approach to manage the active-passive system topology in a distributed computing environment |
CN109558205B (en) | Disk access method and device |
CN110647427A (en) | Active/standby system based on shared storage and implementation method thereof |
CN105491101B (en) | Data processing method and apparatus |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |