CN110661599B - HA implementation method, device and storage medium between main node and standby node - Google Patents

HA implementation method, device and storage medium between main node and standby node

Info

Publication number
CN110661599B
Authority
CN
China
Prior art keywords
node
standby
main
arbitration
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810687830.4A
Other languages
Chinese (zh)
Other versions
CN110661599A (en)
Inventor
朱骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201810687830.4A priority Critical patent/CN110661599B/en
Publication of CN110661599A publication Critical patent/CN110661599A/en
Application granted granted Critical
Publication of CN110661599B publication Critical patent/CN110661599B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/22Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/203Failover techniques using migration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663Performing the actions predefined by failover planning, e.g. switching to standby network elements

Abstract

The embodiment of the invention discloses a method, a device and a storage medium for realizing HA between a master node and a standby node, belonging to the technical field of communication. The method comprises the following steps: an arbitration node monitors the communication conditions between itself and the master node and the standby node of a communication service, and between the master node and the standby node; whether the states of the master node and the standby node are valid is analyzed according to the monitoring result; and the master node and the standby node of the communication service are coordinated and managed according to their states. By adopting the embodiment of the invention, when the master node or the standby node in a cloud network is abnormal, rapid master-standby switching can be realized without depending on a hardware channel.

Description

HA implementation method, device and storage medium between main node and standby node
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a High Availability (HA) realization method, device and storage medium for a main node and a standby node.
Background
In order to improve service reliability, system communication devices usually adopt a master node and a standby node, and services are deployed on both. Normally, only the service on the master node works; when the master node or the service on it becomes abnormal, the standby node is quickly switched to master and takes over the service of the original master node, so that the service is not interrupted.
On an existing physical network device (PNF), the master node and the standby node are usually hardware single boards (physical CPUs), and the offline state or abnormality of a hardware single board can be quickly sensed through a hardware channel. In a cloud network (VNF), however, there is often no such sensing channel for the abnormality of a node (virtual machine or container).
Disclosure of Invention
In view of this, embodiments of the present invention provide an HA implementation method and apparatus between a master node and a standby node, and a storage medium, so as to solve the problem in the prior art that, when a node in a cloud network is abnormal, the abnormality often cannot be sensed through a hardware channel in order to implement HA.
The technical scheme adopted by the embodiment of the invention for solving the technical problems is as follows:
according to a first aspect of the embodiments of the present invention, a method for implementing an HA between a master node and a standby node is provided, including:
monitoring, by an arbitration node, the communication conditions between the arbitration node and a master node and a standby node of a communication service, and between the master node and the standby node;
analyzing whether the states of the master node and the standby node are valid according to the monitoring result;
and performing coordinated management on the master node and the standby node of the communication service according to the states of the master node and the standby node.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for implementing an HA between a master node and a standby node, the apparatus including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the first aspect.
According to a third aspect of embodiments of the present invention there is provided a storage medium storing one or more programs executable by one or more processors to perform the steps of the method according to the first aspect.
The method, the device and the storage medium for realizing the HA between the main node and the standby node of the embodiment of the invention judge the states of the main node and the standby node by monitoring the communication conditions between the arbitration node and the main node and the standby node of the communication service and between the main node and the standby node, and carry out coordination management on the main node and the standby node according to the states of the main node and the standby node, thereby realizing rapid main-standby switching without depending on a hardware channel when the main node or the standby node is abnormal in a cloud network.
Drawings
Fig. 1 is a flowchart of an HA implementation method between a master node and a standby node according to an embodiment of the present invention;
Fig. 2 is a logical structure diagram of the HA of a master node and a standby node according to a first embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating switching between a master node and a standby node when the communication links of the nodes are normal in the first embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating operation when the communication link between a master node and a standby node is abnormal according to the first embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating operation of a master node, a standby node, and an arbitration node when a communication link between the master node and the standby node is abnormal according to an embodiment of the present invention;
Fig. 6 is a schematic diagram illustrating operation when the communication links between a standby node and both a master node and an arbitration node are abnormal according to an embodiment of the present invention;
Fig. 7 is a schematic diagram illustrating operation when the communication links between every two nodes are abnormal according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the module structure of an HA implementation apparatus between a master node and a standby node according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the embodiments of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention clearer and more obvious, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the embodiments of the invention and are not limiting of the embodiments of the invention.
An embodiment of the present invention provides a method for implementing HA between a master node and a standby node. Referring to Fig. 1, the method includes:
step S101, an arbitration node monitors the communication conditions between the arbitration node and a master node and a standby node of a communication service, and between the master node and the standby node;
step S102, whether the states of the master node and the standby node are valid is analyzed according to the monitoring result;
step S103, the master node and the standby node of the communication service are coordinated and managed according to the states of the master node and the standby node.
In practical application, for the HA logical structure of the master node and the standby node, refer to Fig. 2. To implement the HA logical structure of Fig. 2, a high-reliability auxiliary process is deployed in advance on a certain named node, so that the named node becomes the arbitration node. The named node is a virtual network system; each node of the network system knows that the named node exists and can communicate with it. In this embodiment, the high-reliability auxiliary process may also be replaced by a thread or another execution entity; for convenience of description, it is collectively referred to as HAHelp. The HAHelp assists the communication service in completing functions such as electing the master and standby nodes and switching the standby node to master. A high-reliability execution thread is then deployed on all nodes (which may include the arbitration node). The high-reliability execution thread is usually located in the boot management or root process of the node, or it may be a separate process; it is referred to as HAClient. The HAClient monitors and manages the service state of its node.
The arbitration node uses the HAHelp to select the master and standby HAClients, and the nodes where the master and standby HAClients are located become the master node and the standby node. Specifically, the HAHelp scans all candidate nodes, selects two nodes as the master node and the standby node according to the running condition of each node within a given time limit, and records the locations of the master and standby nodes.
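The election step can be illustrated with a minimal Python sketch. The patent does not specify how the "running condition" is measured, so the numeric health score, the function name `elect_master_and_standby`, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def elect_master_and_standby(health: Dict[str, float]) -> Tuple[str, str]:
    """Pick the two healthiest candidates as (master, standby).

    `health` maps a node name to a hypothetical score describing how well
    the node ran during the observation window; higher is better.
    """
    if len(health) < 2:
        raise ValueError("need at least two candidate nodes")
    # Sort by score, descending; break ties by name so the result is deterministic.
    ranked = sorted(health, key=lambda name: (-health[name], name))
    return ranked[0], ranked[1]


# HAHelp would record these two locations after the scan.
roles = elect_master_and_standby({"vm-a": 0.97, "vm-b": 0.99, "vm-c": 0.80})
print(roles)  # ('vm-b', 'vm-a')
```

Any other ranking criterion could be substituted; the only property taken from the text is that two nodes are chosen and their locations recorded.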
The arbitration node and the selected master and standby nodes monitor the communication conditions in real time to implement keep-alive monitoring. Once the node where the master is located becomes abnormal, the standby node completes actions such as the standby-to-master switch with the assistance of the arbitration node and the partner node. It should be noted that, in this embodiment, HA among the arbitration node, the master node, and the standby node is implemented by the HAHelp and the HAClient.
In practical application, in this embodiment, the arbitration node's coordinated management of the master node and the standby node includes: when the state of the master node and/or the standby node is invalid, resetting the master node and/or the standby node, and notifying the standby node to convert into the master node or re-electing the master node and/or the standby node; and when the states of the master node and the standby node are both valid, keeping the master node and the standby node unchanged.
In order to simplify the model, the embodiment is described by taking a pair of master and slave nodes as an example, but the method is also applicable to high-reliability management of a plurality of pairs of master and slave nodes, that is, one arbitration node can coordinate management of a plurality of pairs of master and slave nodes.
In a possible scheme, the step S102 of analyzing whether the states of the master node and the standby node are valid according to the monitoring result includes:
step S1021, if the communication links between the arbitration node and the master node and the standby node are normal, determining whether the states of the master node and the standby node are valid according to the running conditions of the master node and the standby node;
step S1022, if the communication links between the arbitration node and the master node and the standby node are all abnormal, determining whether the states of the master node and the standby node are valid according to whether the communication link between the master node and the standby node is abnormal;
step S1023, if at least one communication link between the arbitration node and the master node and the standby node is normal, determining whether the states of the master node and the standby node are valid according to the communication states of the arbitration node, the master node and the standby node.
In a feasible scheme, step S1021, "if the communication links between the arbitration node and the master node and the standby node are normal, determine whether the states of the master node and the standby node are valid according to the operating conditions of the master node and the standby node", includes the following two scenarios:
first, if a service exception notification sent by the master node is received or the master node is detected to have been reset, confirming that the state of the master node is invalid;
second, if a service exception notification sent by the standby node is received or the standby node is detected to have been reset, confirming that the state of the standby node is invalid.
In one possible implementation, before performing step S1021, the method further includes:
the master node or the standby node experiences an abnormality that it can detect itself and sends a service exception notification to the arbitration node; or
if the master node or the standby node is abnormal, the cloud network forcibly resets the master node or the standby node.
In practical applications, if the links between the arbitration node and the master and standby nodes, and between the master node and the standby node, are all normal (as shown in Fig. 2), then as long as the master and standby HAClients can reach each other, the current master HAClient remains valid even if the HAHelp has crashed or been unreachable for a long time.
Referring to Fig. 3, if a detectable anomaly occurs on the master node (the HAClient itself is normal), the master HAClient detects the anomaly of the master service and actively notifies the HAHelp to initiate switching. The HAHelp resets the node where the master HAClient is located and then notifies the standby HAClient to switch to master; this switch must be fast to meet the requirement of non-stop routing (NSR). If an undetectable anomaly occurs on the master node (the HAClient itself is abnormal), the cloud network quickly resets the node after detecting the anomaly, and the HAHelp, after detecting that the master node has been reset, notifies the standby HAClient to switch to master.
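The two switchover paths of Fig. 3 can be sketched as a small event handler. This is an illustrative Python model only: the class name `HAHelpSketch`, its method names, and the action strings are hypothetical, and real reset or notification calls are replaced by entries in a log.

```python
from typing import List, Optional


class HAHelpSketch:
    """Tracks roles and records the actions HAHelp would take (illustrative only)."""

    def __init__(self, master: str, standby: Optional[str]) -> None:
        self.master = master
        self.standby = standby
        self.actions: List[str] = []  # ordered log of actions, for inspection

    def _promote_standby(self) -> None:
        if self.standby is None:
            raise RuntimeError("no standby node available to promote")
        self.actions.append(f"notify {self.standby}: switch to master")
        # The standby slot is left empty; reselecting a standby is not shown here.
        self.master, self.standby = self.standby, None

    def on_service_exception(self, node: str) -> None:
        """Detectable anomaly: the master HAClient reported a service exception."""
        if node == self.master:
            self.actions.append(f"reset {node}")
            self._promote_standby()

    def on_reset_detected(self, node: str) -> None:
        """Undetectable anomaly: the cloud network has already reset the node."""
        if node == self.master:
            self._promote_standby()


ha = HAHelpSketch(master="node-1", standby="node-2")
ha.on_service_exception("node-1")
print(ha.master, ha.actions)
```

In the detectable case HAHelp issues the reset itself; in the undetectable case it only reacts to a reset that the cloud network has already performed, which is why the second handler logs no reset.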
In a possible scheme, step S1023, "if at least one communication link between the arbitration node and the master node and the standby node is normal, determine whether the states of the master node and the standby node are valid according to the communication states of the three nodes", includes the following scenarios:
If the communication link between the arbitration node and the master node is abnormal, then when a master-node-unreachable notification sent by the standby node is received, the state of the master node is determined to be invalid; otherwise, the states of the master node and the standby node are determined to be valid.
Referring to Fig. 5, at this time the communication link between the arbitration node and the master node is abnormal, while the communication link between the arbitration node and the standby node is normal. The standby HAClient notifies the HAHelp that the master HAClient is unreachable. After receiving this notification, the HAHelp checks whether the original master node still exists; if so, the HAHelp notifies the cloud network to reset the original master node, then notifies the standby node to convert into the master node and reselects a standby node.
In practical application, when the master HAClient finds that both the HAHelp and the standby HAClient have been unreachable for a preset time, it may reset its own node, that is, the master commits suicide.
If the communication link between the arbitration node and the standby node is abnormal, when the unreachable notice of the standby node sent by the main node is received, the state of the standby node is determined to be invalid, otherwise, the states of the main node and the standby node are determined to be valid.
Referring to Fig. 6, at this time the communication link between the arbitration node and the standby node is abnormal, while the communication link between the arbitration node and the master node is normal. The master HAClient notifies the HAHelp that the standby HAClient is unreachable. After receiving this notification, the HAHelp checks whether the original standby node still exists; if so, the HAHelp notifies the cloud network to reset the original standby node and then reselects a standby node.
In practical application, when the standby HAClient finds that both the HAHelp and the master HAClient have been unreachable for a preset time, it may reset its own node, that is, the standby commits suicide.
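The self-reset ("suicide") rule for an HAClient can be sketched as a small timer. This is a minimal Python illustration under stated assumptions: the class name `SuicideTimer`, the `observe` method, and the use of caller-supplied timestamps in seconds are not from the patent, which only says that the condition must last for a preset time.

```python
from typing import Optional


class SuicideTimer:
    """Decides when an HAClient should reset its own node (illustrative only)."""

    def __init__(self, limit_seconds: float) -> None:
        self.limit_seconds = limit_seconds            # the "preset time"
        self.isolated_since: Optional[float] = None   # start of current isolation

    def observe(self, now: float, arbiter_ok: bool, peer_ok: bool) -> bool:
        """Record one keep-alive result; return True if the node should reset."""
        if arbiter_ok or peer_ok:
            # Reaching either HAHelp or the partner HAClient ends the isolation.
            self.isolated_since = None
            return False
        if self.isolated_since is None:
            self.isolated_since = now
        return now - self.isolated_since >= self.limit_seconds


timer = SuicideTimer(limit_seconds=3.0)
print(timer.observe(0.0, arbiter_ok=False, peer_ok=False))  # False
print(timer.observe(2.0, arbiter_ok=False, peer_ok=False))  # False
print(timer.observe(3.0, arbiter_ok=False, peer_ok=False))  # True
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real client would more likely read a monotonic clock.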
If the communication links between the arbitration node and the master node and between the arbitration node and the standby node are both normal, then when a peer-unreachable notification sent by the master node or the standby node is received, the state of the standby node is determined to be invalid.
Referring to Fig. 4, suppose the links between the arbitration node and the master and standby nodes are normal, and the communication link between the master node and the standby node is abnormal (that is, communication is unreachable). At this time, the master and standby HAClients each find that the opposite end is unreachable and send a peer-unreachable notification to the HAHelp to confirm their own connectivity; an HAClient that cannot reach the HAHelp either resets the node where it is located. Upon receiving the peer-unreachable notification from either HAClient, the HAHelp starts detecting the connectivity of the master HAClient; if the master HAClient is reachable, the HAHelp notifies the standby HAClient to reset its node.
With reference to the second scenario of step S1021, it can be seen that, in this embodiment, if the HAHelp and the master HAClient can reach each other, the standby node is reset regardless of whether the HAHelp and the standby HAClient can reach each other (if the standby node is unreachable, it may be reset by means of the cloud network), and a standby node is then reselected.
In practical application, if the communication link between the master and standby HAClients is normal, and the communication link between the HAHelp and one of the master and standby HAClients is also normal, the HAHelp confirms that the master and standby nodes are valid and performs no processing.
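The scenarios of step S1023 described above can be condensed into one decision function. The following Python sketch is an interpretation rather than the patent's implementation: the function name `analyze_s1023` and the notice strings `"master_unreachable"` (sent by the standby) and `"standby_unreachable"` (sent by the master) are hypothetical labels.

```python
from typing import Optional, Tuple


def analyze_s1023(arbiter_master_ok: bool,
                  arbiter_standby_ok: bool,
                  notice: Optional[str]) -> Tuple[bool, bool]:
    """Return (master_valid, standby_valid) when at least one arbiter link is up.

    `notice` is None when no unreachable notification has been received.
    """
    if not (arbiter_master_ok or arbiter_standby_ok):
        raise ValueError("both arbiter links are down: this is the S1022 case")
    if arbiter_master_ok and arbiter_standby_ok:
        # Fig. 4: either side reports its peer unreachable -> the standby is invalid.
        return True, notice is None
    if not arbiter_master_ok:
        # Fig. 5: only the standby can report; it says the master is unreachable.
        return notice != "master_unreachable", True
    # Fig. 6: only the master can report; it says the standby is unreachable.
    return True, notice != "standby_unreachable"


print(analyze_s1023(False, True, "master_unreachable"))   # (False, True)
print(analyze_s1023(True, True, "standby_unreachable"))   # (True, False)
print(analyze_s1023(True, False, None))                   # (True, True)
```

Note the asymmetry in the both-links-normal branch: because the arbitration node can still reach the master, a broken master-standby link is resolved in the master's favor.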
In a possible scheme, step S1022, "if the communication links between the arbitration node and the master node and the standby node are both abnormal, determine whether the states of the master node and the standby node are valid according to whether the communication link between the master node and the standby node is abnormal", includes:
if the communication links between the arbitration node and both the master node and the standby node are abnormal, then if one of the nodes is detected to have been reset within a preset time, confirming that the states of the master node and the standby node are invalid; otherwise, resetting the arbitration node so that the cloud network reselects the arbitration node.
In this situation, the HAHelp can reach neither the master nor the standby HAClient, and two cases are distinguished according to the communication link between the master and standby HAClients:
(I) The master and standby HAClients cannot reach each other (see Fig. 7). In this case, the master and standby nodes are considered to have failed and need to be re-elected.
In this case, the HAHelp cannot actively notify the master node and the standby node to reset, so the failed node initiates the suicide action itself: an HAClient that detects that neither the HAHelp nor the partner HAClient has been reachable for the preset time limit resets its own node. As long as one HAClient is normal and resets its node successfully, the HAHelp detects that the node has been reset, considers both the master node and the standby node invalid, immediately initiates re-election to elect new master and standby nodes, and notifies the cloud network to reset the other node.
If neither the master nor the standby HAClient can reset its node because of an undetected abnormality, the HAHelp cannot judge the states of the master and standby nodes. This condition is called arbitration (HAHelp) failure. To prevent a situation in which there is never a master node, the HAHelp needs to notify the cloud network to reset the master and standby nodes and re-elect. However, if the problem lies in the HAHelp's own communication link, this would be a false detection. To reduce the probability of false detection, the HAHelp first migrates; only if the number of migrations reaches a preset value and the communication link with the master node or the standby node still cannot be recovered does it notify the cloud network to perform the reset.
In practical application, HAHelp migration means that the arbitration node is actively reset and the cloud network reselects a new arbitration node.
(II) The master and standby HAClients can reach each other. In this case, the master node and the standby node are considered valid and need to be maintained. The HAHelp, however, cannot sense this because of its own connectivity problem. As described above, the HAHelp migrates; if the master and standby nodes are normal, communication can be recovered. If the links are still unreachable after several migrations, the master and standby nodes are forcibly reset and re-election is initiated.
For example, suppose an undetected abnormality occurs on the master and standby HAClients at the same time. In this scenario, neither HAClient is reachable from the HAHelp, and neither the master node nor the standby node resets itself. The HAHelp finds that the master and standby nodes exist but cannot confirm whether they can communicate with each other (they are actually disconnected). The HAHelp therefore migrates itself to another node (a node other than the master and standby nodes) and again checks connectivity to the master and standby HAClients. If they are still unreachable, it continues to migrate. Once the number of migrations reaches the preset value, the HAHelp confirms that the master and standby HAClients are abnormal, resets the original master and standby nodes, and elects new master and standby nodes.
Therefore, after "resetting the arbitration node to cause the cloud network to reselect the arbitration node" is performed, the method further includes:
if the number of times the cloud network has reselected the arbitration node reaches a preset value, and the communication links between the arbitration node selected by the cloud network and the master node and the standby node are still abnormal, confirming that the states of the master node and the standby node are invalid.
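The bounded migration rule can be sketched as a short retry loop. In this Python illustration the function name `migrate_until_reachable` and the two callables are hypothetical stand-ins: `migrate` represents resetting the arbitration node so that the cloud network selects a new one, and `links_ok` represents the connectivity check performed by the newly selected arbitration node.

```python
from typing import Callable


def migrate_until_reachable(links_ok: Callable[[], bool],
                            migrate: Callable[[], None],
                            max_migrations: int) -> bool:
    """Return True if connectivity recovers, False if the nodes are deemed invalid.

    A False result corresponds to confirming that the master and standby
    states are invalid after the preset number of migrations.
    """
    for _ in range(max_migrations):
        migrate()          # arbitration node is reset and reselected
        if links_ok():     # new arbitration node probes master and standby
            return True
    return False


# Example: connectivity comes back after the second migration.
probe_results = iter([False, True])
migrations = []
recovered = migrate_until_reachable(
    links_ok=lambda: next(probe_results),
    migrate=lambda: migrations.append("migrated"),
    max_migrations=3,
)
print(recovered, len(migrations))  # True 2
```

Bounding the retries reflects the trade-off stated in the text: migrating first lowers the chance of a false detection caused by the arbitration node's own link, while the limit ensures the system is not left without a master indefinitely.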
In a possible solution, the step "performing coordinated management on the master node and the standby node of the communication service according to the states of the master node and the standby node" includes:
when the state of the master node and/or the standby node is invalid, resetting the master node and/or the standby node, and notifying the standby node to convert into the master node or re-electing the master node and/or the standby node.
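The coordination step can be summarized as a mapping from the two validity flags to an ordered list of actions. The following Python sketch is one reading of the text; the function name `coordinate` and the action strings are illustrative placeholders, not calls to any real cloud or orchestration API.

```python
from typing import List


def coordinate(master_valid: bool, standby_valid: bool) -> List[str]:
    """Map the analyzed node states to the arbitration node's actions."""
    if master_valid and standby_valid:
        return []  # both valid: keep the master and standby unchanged
    if master_valid:
        # Only the standby is invalid: replace it, the master keeps serving.
        return ["reset standby", "reselect standby"]
    if standby_valid:
        # Only the master is invalid: fail over, then fill the standby slot.
        return ["reset master", "promote standby to master", "reselect standby"]
    # Both invalid: start over with a fresh election.
    return ["reset master", "reset standby", "re-elect master and standby"]


print(coordinate(False, True))
```

Returning actions as data rather than executing them makes the decision logic easy to check in isolation from the reset mechanism.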
Please refer to table 1, which lists the correspondence between the communication connection status of each node in the present embodiment and the way of the arbitration node to coordinate and manage the master node and the slave node, where T in the table indicates that the communication connection is normal, and F indicates that the communication connection is abnormal.
TABLE 1
[Table 1 is an image in the original publication and is not reproduced here.]
The HA implementation method between the master node and the standby node in this embodiment judges the states of the master node and the standby node by monitoring the communication conditions between the arbitration node and the master node and the standby node of the communication service, and between the master node and the standby node, and performs coordinated management on the master node and the standby node according to those states. As a result, when the master node or the standby node in a cloud network is abnormal, fast master-standby switching is achieved without depending on a hardware channel, and the occurrence of dual master nodes can also be avoided.
On the basis of the foregoing embodiment, a second embodiment of the present invention provides an apparatus for implementing an HA between a master node and a standby node, referring to fig. 8, where the apparatus includes: a memory 801, a processor 802 and a computer program 803 stored on the memory 801 and executable on the processor 802, the computer program 803 realizing the steps of the method according to the first embodiment when executed by the processor 802.
On the basis of the foregoing embodiment, a third embodiment of the present invention provides a storage medium, where the storage medium includes a stored program, and the program, when running, controls a device on which the storage medium is located to perform the operations according to the first embodiment.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and are not intended to limit the scope of the embodiments of the invention. Any modifications, equivalents and improvements that may occur to those skilled in the art without departing from the scope and spirit of the embodiments of the present invention are intended to be within the scope of the claims of the embodiments of the present invention.

Claims (10)

1. A method for realizing HA between a main node and a standby node comprises the following steps:
monitoring communication conditions between the arbitration node and a main node and a standby node of a communication service and between the main node and the standby node by the arbitration node;
analyzing whether the states of the main node and the standby node are valid according to the monitoring result, which includes:
if the communication links between the arbitration node and the main node and the standby node are normal, determining whether the states of the main node and the standby node are valid according to the operating conditions of the main node and the standby node;
if the communication links between the arbitration node and the main and standby nodes are abnormal, determining whether the states of the main and standby nodes are valid according to whether the communication links between the main node and the standby nodes are abnormal;
if at least one communication link between the arbitration node and the main node and the standby node is normal, determining whether the states of the main node and the standby node are effective or not according to the communication states of the arbitration node, the main node and the standby node;
and performing coordination management on the master node and the standby node of the communication service according to the states of the master node and the standby node.
2. The method as claimed in claim 1, wherein, if the communication links between the arbitration node and both the main node and the standby node are normal, determining whether the states of the main node and the standby node are valid according to the operating conditions of the main node and the standby node comprises:
if a service exception notification sent by the main node is received or the main node is detected to be reset, determining that the state of the main node is invalid;
and if a service exception notification sent by the standby node is received or the standby node is detected to be reset, determining that the state of the standby node is invalid.
3. The method as claimed in claim 2, wherein, if the communication links between the arbitration node and both the main node and the standby node are normal, before determining whether the states of the main node and the standby node are valid according to the operating conditions of the main node and the standby node, the method further comprises:
the main node or the standby node experiences a self-detectable abnormality and sends a service exception notification to the arbitration node; or
if the main node or the standby node is abnormal, the cloud network forcibly resets the main node or the standby node.
4. The method as claimed in claim 1, wherein, if at least one communication link between the arbitration node and the main node and the standby node is normal, determining whether the states of the main node and the standby node are valid according to the communication states of the arbitration node, the main node and the standby node comprises:
if the communication link between the arbitration node and the main node is abnormal, when a main-node-unreachable notification sent by the standby node is received, determining that the state of the main node is invalid;
if the communication link between the arbitration node and the standby node is abnormal, when a standby-node-unreachable notification sent by the main node is received, determining that the state of the standby node is invalid;
and if the communication links between the arbitration node and the main node and between the arbitration node and the standby node are both normal, when a peer-node-unreachable notification sent by the main node or the standby node is received, determining that the state of the standby node is invalid.
5. The method for implementing HA between a main node and a standby node according to claim 1, wherein, if the communication links between the arbitration node and the main node and the standby node are both abnormal, determining whether the states of the main node and the standby node are valid according to whether the communication link between the main node and the standby node is abnormal comprises:
when the communication links between the arbitration node and both the main node and the standby node are abnormal, if one of the nodes is detected to have been reset within a preset time, determining that the states of both the main node and the standby node are invalid; otherwise, resetting the arbitration node so that the cloud network reselects an arbitration node.
6. The method for implementing HA between a main node and a standby node as claimed in claim 5, wherein after determining that the states of both the main node and the standby node are invalid, the method further comprises: notifying the cloud network to reset the other node;
after resetting the arbitration node so that the cloud network reselects an arbitration node, the method further comprises:
if the number of times the cloud network has selected an arbitration node reaches a preset value, and the communication link between the arbitration node selected by the cloud network and the main node, or between the arbitration node selected by the cloud network and the standby node, is still abnormal, determining that the states of the main node and the standby node are invalid.
7. The method for implementing HA between a main node and a standby node as claimed in claim 1, wherein the method further comprises:
when the main node detects that its communication links with both the corresponding standby node and the arbitration node are abnormal, the main node resets itself;
and when the standby node detects that its communication links with both the corresponding main node and the arbitration node are abnormal, the standby node resets itself.
8. The method for implementing HA between a main node and a standby node as claimed in any one of claims 1 to 7, wherein performing coordinated management of the main node and the standby node of the communication service according to the states of the main node and the standby node comprises:
when the state of the main node and/or the standby node is invalid, resetting the main node and/or the standby node, and notifying the standby node to switch to the main node role, or re-electing the main node and/or the standby node.
9. An apparatus for implementing HA between a main node and a standby node, the apparatus comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 7.
10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement steps of a method as claimed in any one of claims 1 to 7.
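
The decision logic recited in claims 1, 2, 4, 5, 7 and 8 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Observation`, `Decision`, `analyze`, `should_self_reset` and `plan_actions`, all field names, and the action strings are hypothetical, and the mapping from invalid states to actions in `plan_actions` is only one possible reading of claim 8. The reselection counter of claim 6 is omitted.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One monitoring round as seen by the arbitration node (hypothetical fields)."""
    arb_main_ok: bool       # link arbitration node <-> main node is normal
    arb_standby_ok: bool    # link arbitration node <-> standby node is normal
    main_exception: bool = False      # service exception notification from main
    main_reset: bool = False          # main node detected to be reset
    standby_exception: bool = False   # service exception notification from standby
    standby_reset: bool = False       # standby node detected to be reset
    main_says_peer_unreachable: bool = False
    standby_says_peer_unreachable: bool = False
    reset_within_timeout: bool = False  # a node reset was seen within the preset time


@dataclass
class Decision:
    main_valid: bool = True
    standby_valid: bool = True
    reset_arbiter: bool = False


def analyze(obs: Observation) -> Decision:
    d = Decision()
    if not obs.arb_main_ok and not obs.arb_standby_ok:
        # Both arbitration links abnormal (claim 5).
        if obs.reset_within_timeout:
            d.main_valid = d.standby_valid = False
        else:
            d.reset_arbiter = True  # let the cloud network pick a new arbiter
        return d
    if obs.arb_main_ok and obs.arb_standby_ok:
        # Both arbitration links normal: judge by operating conditions (claim 2).
        if obs.main_exception or obs.main_reset:
            d.main_valid = False
        if obs.standby_exception or obs.standby_reset:
            d.standby_valid = False
        # Either node reports its peer unreachable (claim 4, last clause).
        if obs.main_says_peer_unreachable or obs.standby_says_peer_unreachable:
            d.standby_valid = False
        return d
    # Exactly one arbitration link is normal (claim 4, first two clauses).
    if not obs.arb_main_ok and obs.standby_says_peer_unreachable:
        d.main_valid = False
    if not obs.arb_standby_ok and obs.main_says_peer_unreachable:
        d.standby_valid = False
    return d


def should_self_reset(peer_link_ok: bool, arbiter_link_ok: bool) -> bool:
    """Claim 7: a node resets itself when it can reach neither peer nor arbiter."""
    return not peer_link_ok and not arbiter_link_ok


def plan_actions(d: Decision) -> list:
    """Assumed mapping of claim 8 onto concrete actions (illustrative only)."""
    if d.reset_arbiter:
        return ["reset_arbiter"]
    actions = []
    if not d.main_valid:
        actions.append("reset_main")
    if not d.standby_valid:
        actions.append("reset_standby")
    if not d.main_valid and d.standby_valid:
        actions.append("promote_standby")
    elif not d.main_valid and not d.standby_valid:
        actions.append("re_elect")
    return actions
```

For example, under these assumptions, `plan_actions(analyze(Observation(True, True, main_exception=True)))` yields a reset of the main node followed by promotion of the standby node, while an observation with both arbitration links abnormal and no reset seen yields only a reset of the arbitration node.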
CN201810687830.4A 2018-06-28 2018-06-28 HA implementation method, device and storage medium between main node and standby node Active CN110661599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810687830.4A CN110661599B (en) 2018-06-28 2018-06-28 HA implementation method, device and storage medium between main node and standby node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810687830.4A CN110661599B (en) 2018-06-28 2018-06-28 HA implementation method, device and storage medium between main node and standby node

Publications (2)

Publication Number Publication Date
CN110661599A CN110661599A (en) 2020-01-07
CN110661599B CN110661599B (en) 2022-04-29

Family

ID=69026562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810687830.4A Active CN110661599B (en) 2018-06-28 2018-06-28 HA implementation method, device and storage medium between main node and standby node

Country Status (1)

Country Link
CN (1) CN110661599B (en)

Also Published As

Publication number Publication date
CN110661599A (en) 2020-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant