WO2016070530A1 - Method and system for processing operation of primary and standby device

Method and system for processing operation of primary and standby device

Info

Publication number
WO2016070530A1
Authority
WO
WIPO (PCT)
Prior art keywords
standby
primary
primary device
link
standby device
Prior art date
Application number
PCT/CN2015/073275
Other languages
French (fr)
Chinese (zh)
Inventor
杨青海
毕忠良
杨骐
尹旺中
陈宗立
朱田
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016070530A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity

Definitions

  • The present invention relates to the field of communications, and in particular to a method and system for processing the operation of primary and standby devices.
  • In a cluster system, bearers are usually configured as a primary device and a standby device.
  • The primary device performs the relevant functions, and the standby device exists as a backup of the primary device.
  • When the primary device goes down, the standby device is promoted to primary device and takes over the work of the original primary device so that service is not interrupted; when the standby device goes down, the primary device selects a new standby device.
  • In the prior art, heartbeat messages are generally used for keep-alive between the primary device and the standby device of a cluster system.
  • When keep-alive between the primary device and the standby device fails, it cannot be determined whether the cause is a node fault or a link fault, so the primary device and the standby device may evolve along the wrong path: when no heartbeat message from the peer is received for longer than a certain time threshold, the peer is considered abnormal, and the standby device is promoted to primary device or a new standby device is selected.
  • However, when the keep-alive link between the primary device and the standby device is briefly interrupted and then recovers, the system may evolve into a "dual master" state (that is, the standby device has switched to primary device while the original primary device is still running).
  • In related technologies currently on the market, after dual master occurs in a cluster, recovery is generally performed by restarting the system, but this reduces the stability of the system and degrades the user experience.
  • In the related art, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. No effective solution to this problem has yet been proposed.
  • To solve the above technical problem, the present invention provides a method and system for processing the operation of primary and standby devices.
  • According to an embodiment of the present invention, a method for processing the operation of primary and standby devices is provided, including: after determining that the primary device and the first standby device have lost contact, the primary device detects link connectivity between the primary device and other devices, and at the same time the first standby device detects link connectivity between the first standby device and other devices, where the other devices are the devices, other than the primary device and the standby device, in the cluster system in which the primary device and the first standby device are located;
  • the primary device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, and the first standby device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity.
  • Preferably, the primary device processing the operation of the primary device and/or the first standby device according to the detection result of the link connectivity includes: when the primary device detects that the link is connected, replacing the first standby device with a second standby device; when the primary device detects that the link is not connected, prohibiting the primary device from running for a second predetermined time period.
  • Preferably, the first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity includes: when the first standby device detects that the link is connected, determining whether the primary device is running; when the primary device is not running, using the first standby device as the primary device; when the primary device is running, prohibiting the first standby device from running for a third predetermined time period.
  • Preferably, whether the primary device is running is determined in at least one of the following ways: notification by a third party other than the primary device and the first standby device; detection, by the first standby device, of specified information on a message transmission channel of the forwarding plane.
  • Preferably, the first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.
  • Preferably, determining that the primary device and the first standby device have lost contact includes: when the primary device and/or the first standby device does not receive a keep-alive message, determining that the primary device and the first standby device have lost contact.
  • According to another embodiment of the present invention, a system for processing the operation of primary and standby devices is provided, including: a primary device, configured to detect, after determining that the primary device and the first standby device have lost contact, link connectivity between the primary device and other devices, and to process the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, where the other devices are the devices, other than the primary device and the standby device, in the cluster system in which the primary device and the first standby device are located; and the first standby device, configured to detect link connectivity between the first standby device and other devices while the primary device detects link connectivity between the primary device and other devices, and to process the operation of the primary device and/or the first standby device according to the detection result of the link connectivity.
  • Preferably, the primary device is further configured to: when the primary device detects that the link is connected, replace the first standby device with a second standby device; and when the primary device detects that the link is not connected, prohibit the primary device from running for the second predetermined time period.
  • Preferably, the first standby device is further configured to: when the first standby device detects that the link is connected, determine whether the primary device is running; and when the primary device is not running, use the first standby device as the primary device.
  • With the present invention, the primary device and the standby device each detect, at the same time, their own link connectivity with other devices, and then process the operation of the primary device and/or the standby device according to the detected link connectivity.
  • This technical solution solves the problem in the related art that, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. The solution thereby enhances system stability and improves the user experience.
  • FIG. 1 is a flowchart of a method for processing the operation of primary and standby devices according to an embodiment of the present invention;
  • FIG. 2 is a structural block diagram of a cluster system according to a preferred embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the processing performed after the primary and standby devices detect link connectivity according to a preferred embodiment of the present invention;
  • FIG. 4 is a structural block diagram of a system for processing the operation of primary and standby devices according to an embodiment of the present invention;
  • FIG. 5 is another structural block diagram of a system for processing the operation of primary and standby devices according to an embodiment of the present invention.
  • In the related art, when keep-alive between the primary device and the standby device fails, it cannot be confirmed whether a device or the link is faulty, so the primary device and the standby device may evolve along the wrong path: the standby device switches directly to primary device, which leaves two primary devices in the system. To address this problem, the following technical solutions are provided.
  • FIG. 1 is a flowchart of a method for processing the operation of primary and standby devices according to an embodiment of the present invention. The process includes the following steps:
  • Step S102: after determining that the primary device and the first standby device have lost contact, the primary device detects link connectivity between the primary device and other devices, and the first standby device detects link connectivity between the first standby device and other devices, where the other devices are the devices, other than the primary device and the standby device, in the cluster system in which the primary device and the first standby device are located;
  • Step S104: the primary device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, and the first standby device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity.
  • Through the above steps, the primary device and the standby device each detect, at the same time, their own link connectivity with other devices, and then process the operation of the primary device and/or the standby device according to the detected link connectivity.
  • This solves the problem in the related art that, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. System stability is thereby enhanced and the user experience improved.
  • Step S104 of the embodiment of the present invention can be implemented in the following ways:
  • The primary device processing the operation of the primary device and/or the first standby device according to the detection result of the link connectivity includes: when the primary device detects that the link is connected, replacing the first standby device with the second standby device; when the primary device detects that the link is not connected, prohibiting the primary device from running for a second predetermined time period.
  • The first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity includes: when the first standby device detects that the link is connected, determining whether the primary device is running; when the primary device is not running, using the first standby device as the primary device; when the primary device is running, prohibiting the first standby device from running for a third predetermined time period.
  • The first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity also includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.
  • The process in which the primary device processes the operation of the primary device and/or the standby device according to the link connectivity and the process in which the standby device processes the operation of the primary device and/or the standby device according to the link connectivity can be combined.
  • The judgment process of the primary device and the judgment process of the standby device do not contradict each other, and the two processes can coexist.
  • The primary device side and the standby device side detect the connectivity of the link at the same time. When the primary device detects that the link is connected, the primary device replaces the first standby device with the second standby device: the link has not failed, yet the primary device and the standby device have still lost contact, which indicates that the first standby device is faulty and needs to be replaced by the second standby device. When the primary device detects that the link is not connected, the primary device is prohibited from running for the second predetermined time period.
  • The standby device can determine whether the primary device is running in multiple ways. In an optional example of the embodiment of the present invention, whether the primary device is running is determined in at least one of the following ways: notification by a third party other than the primary device and the first standby device; detection, by the first standby device, of specified information on a message transmission channel of the forwarding plane.
  • In step S102, determining that the primary device and the first standby device have lost contact can be implemented as follows: when the primary device and/or the first standby device does not receive a keep-alive message, it is determined that the primary device and the first standby device have lost contact.
  • The primary device and the standby device each calculate the connectivity of the link based on their connectivity with other devices in the system.
  • The connectivity value is either TRUE (T), indicating that the link between the primary device and the standby device is connected, or FALSE (F), indicating that the link between the primary device and the standby device is not connected.
  • the two-way keep-alive and two-way detection are used between the active device and the standby device to ensure that the two ends of the link are aware of the state change of the keep-alive link at the same time. If the keep-alive fails in either direction, it is determined that the primary device and the standby device are disconnected.
  • The detection results fall into four cases, written as the primary device's result followed by the standby device's result:
  • TT: the connectivity detection of both the primary device and the standby device is TRUE (T). The primary device selects a new standby device. The standby device detects whether the primary device is in place through a third-party mechanism: if the primary device is running, the standby device is suspended for a preset period of time; if the primary device is not running, the standby device is switched to primary device.
  • FT: the connectivity detection of the primary device is FALSE (F) and that of the standby device is TRUE (T). The primary device is suspended for a preset period of time. The standby device detects whether the primary device is running through the third-party mechanism: if the primary device is running, the standby device is suspended; if the primary device is not running, the standby device is switched to primary device.
  • TF: the connectivity detection of the primary device is TRUE (T) and that of the standby device is FALSE (F). The primary device selects a new standby device; the standby device is suspended.
  • FF: the connectivity detection of both devices is FALSE (F). The primary device and the standby device are both suspended.
  • In summary, whichever of the primary device and the standby device detects FALSE (F) is suspended for the predetermined time period. When the primary device detects TRUE (T), it selects a new standby device. When the standby device detects TRUE (T), it detects whether the primary device is running through the third-party mechanism: if the primary device is running, the standby device is suspended; if the primary device is not running, the standby device is switched to primary device.
  • The cluster system of the preferred embodiment of the present invention is briefly described below. As shown in FIG. 2, the cluster system consists of several devices; for convenience of description, only three devices are shown in FIG. 2. The primary device (primary node) and the standby device (standby node) perform bidirectional keep-alive and bidirectional detection. The primary device and the standby device each perform connectivity detection with the other devices (other nodes) in the system.
  • The connectivity detection between the primary or standby device and the other devices in the system uses a message-based detection mechanism, which may be, but is not limited to, one of the following: communication link detection, such as a TCP link or a TIPC link; asynchronous message keep-alive.
  • These are technical means commonly used in the related art and are not described further in the embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the processing performed after the primary and standby devices detect link connectivity according to a preferred embodiment of the present invention.
  • The technical solution illustrated in FIG. 3 can be summarized as follows: the primary device and the standby device send heartbeat messages to each other, and each of them receives and checks the messages it receives.
  • When the connectivity detection is FALSE (F), the device restarts itself;
  • When the connectivity detection is TRUE (T): if the device is the primary device, it selects a new standby device; if the device is the standby device, it detects whether the primary device is running through the third-party mechanism. If the primary device is running, the standby device is suspended; if the primary device is not running, the standby device is switched to primary device.
  • FIG. 4 is a structural block diagram of a system for processing the operation of primary and standby devices according to the embodiment of the present invention. The system includes:
  • the primary device 40, configured to detect link connectivity between the primary device 40 and the other devices 44 after determining that the primary device 40 and the first standby device 42 have lost contact, and to process the operation of the primary device 40 and/or the first standby device 42 according to the detection result of the link connectivity, where the other devices 44 are the devices, other than the primary device 40 and the first standby device 42, in the cluster system in which the primary device 40 and the first standby device 42 are located;
  • the first standby device 42, configured to detect link connectivity between the first standby device 42 and the other devices 44 while the primary device 40 detects link connectivity between the primary device 40 and the other devices 44, and to process the operation of the primary device 40 and/or the first standby device 42 according to the detection result of the link connectivity.
  • Through the combined action of these components, the primary device and the standby device each detect, at the same time, their own link connectivity with other devices, and then process the operation of the primary device and/or the standby device according to the detected link connectivity.
  • This solves the problem in the related art that, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. System stability is thereby enhanced and the user experience improved.
  • Optionally, the primary device 40 is further configured to: when the primary device 40 detects that the link is connected, replace the first standby device 42 with the second standby device 46; and when the primary device 40 detects that the link is not connected, prohibit the primary device from running for the second predetermined time period. The first standby device 42 is further configured to: determine whether the primary device 40 is running when the first standby device 42 detects that the link is connected, and use the first standby device 42 as the primary device when the primary device 40 is not running.
  • In summary, the embodiment of the present invention achieves the following technical effects: the "dual master" problem caused in the related art by the standby device switching directly is solved; the actual condition of the keep-alive link is detected correctly; and the system evolves along a unified path instead of each device evolving separately, which prevents the dual master phenomenon described above and improves the stability of the system.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the order given here, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module.
  • Thus, the invention is not limited to any specific combination of hardware and software.
  • In the technical solution provided by the present invention, the primary device and the standby device each detect, at the same time, their own link connectivity with other devices, and then process the operation of the primary device and/or the standby device according to the detected link connectivity.
  • This solves the problem in the related art that, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. The solution thereby achieves the effect of enhancing system stability and improving the user experience.
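The message-based connectivity detection toward other cluster devices mentioned in the definitions above (for example over a TCP link) could look roughly like the following. This is a minimal sketch, not the patent's implementation: the function names `tcp_reachable` and `link_connectivity`, the one-second timeout, and the "at least one other device reachable" rule for combining results are all assumptions, since the source does not say how per-device results are turned into TRUE (T) or FALSE (F).

```python
import socket
from typing import Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def link_connectivity(peers: Iterable[Tuple[str, int]], timeout_s: float = 1.0) -> bool:
    """TRUE (T) if at least one other cluster device is reachable, else FALSE (F).

    Assumption: "any peer reachable" is used purely for illustration; the
    source does not specify the combination rule.
    """
    return any(tcp_reachable(host, port, timeout_s) for host, port in peers)
```

Each device would call `link_connectivity` with the addresses of the other cluster devices, excluding the primary and standby themselves, once keep-alive has failed.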

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Hardware Redundancy (AREA)

Abstract

Provided are a method and system for processing operation of a primary and a standby device. The method comprises: after determining that a primary device is out of contact with a first standby device, detecting, by a primary device, link connectivity between the primary device and other devices; meanwhile, detecting, by the first standby device, link connectivity between the first standby device and other devices, wherein the other devices are devices, except the primary device and the standby device, in a cluster system where the primary device and the first standby device are located; processing, by the primary device, operation of the primary device and/or the first standby device according to a detection result of the link connectivity; and processing, by the first standby device, the operation of the primary device and/or the first standby device according to the detection result of the link connectivity. By means of the technical solution, the problem in the related art that system stability is reduced due to the fact that a node failure or a link failure are not distinguished is solved, thereby achieving the effect of enhancing the system stability.

Description

Method and system for processing operation of primary and standby devices
Technical field
The present invention relates to the field of communications, and in particular to a method and system for processing the operation of primary and standby devices.
Background
In a cluster system, bearers are usually configured as a primary device and a standby device. The primary device performs the relevant functions, and the standby device exists as a backup of the primary device. When the primary device goes down, the standby device is promoted to primary device and takes over the work of the original primary device so that service is not interrupted; when the standby device goes down, the primary device selects a new standby device. In this way, the primary device and the standby device together enhance the stability of the system.
In the prior art, heartbeat messages are generally used for keep-alive between the primary device and the standby device of a cluster system. When keep-alive between the primary device and the standby device fails, it cannot be determined whether the cause is a node fault or a link fault, so the primary device and the standby device may evolve along the wrong path: when no heartbeat message from the peer is received for longer than a certain time threshold, the peer is considered abnormal, and the standby device is promoted to primary device or a new standby device is selected. However, when the keep-alive link between the primary device and the standby device is briefly interrupted and then recovers, the system may evolve into a "dual master" state (that is, the standby device has switched to primary device while the original primary device is still running). In related technologies currently on the market, after dual master occurs in a cluster, recovery is generally performed by restarting the system, but this reduces the stability of the system and degrades the user experience.
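To illustrate why the timeout-only behaviour described above can end in dual master, here is a minimal sketch. The function name, the device labels and the interval counts are hypothetical and serve only as an illustration of the failure mode; they do not come from the source.

```python
def naive_roles_after_silence(silent_intervals: int, threshold: int, primary_alive: bool):
    """Timeout-only rule: the standby promotes itself once the peer is silent too long.

    Returns a sorted list of the devices acting as primary afterwards.
    """
    acting_primary = set()
    if primary_alive:
        # A link fault does not stop the original primary from running.
        acting_primary.add("original primary")
    if silent_intervals > threshold:
        # The standby cannot tell a node fault from a link fault.
        acting_primary.add("promoted standby")
    return sorted(acting_primary)


# Node fault: the primary really is down, so promotion is correct.
print(naive_roles_after_silence(silent_intervals=4, threshold=3, primary_alive=False))
# Link fault: the primary is still running, so two primaries result.
print(naive_roles_after_silence(silent_intervals=4, threshold=3, primary_alive=True))
```

The second call models a keep-alive link interruption with a healthy primary, which is the case the invention sets out to handle.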
In the related art, after the primary device and the standby device lose contact, a node fault cannot be distinguished from a link fault, so the standby device converts directly into a primary device and two primary devices run in the system, which reduces system stability and degrades the user experience. No effective solution to this problem has yet been proposed.
Summary of the invention
To solve the above technical problem, the present invention provides a method and system for processing the operation of primary and standby devices.
According to an embodiment of the present invention, a method for processing the operation of primary and standby devices is provided, including: after determining that the primary device and the first standby device have lost contact, the primary device detects link connectivity between the primary device and other devices, and at the same time the first standby device detects link connectivity between the first standby device and other devices, where the other devices are the devices, other than the primary device and the standby device, in the cluster system in which the primary device and the first standby device are located; the primary device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, and the first standby device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity.
Preferably, the primary device processing the operation of the primary device and/or the first standby device according to the detection result of the link connectivity includes: when the primary device detects that the link is connected, replacing the first standby device with a second standby device; when the primary device detects that the link is not connected, prohibiting the primary device from running for a second predetermined time period.
Preferably, the first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity includes: when the first standby device detects that the link is connected, determining whether the primary device is running; when the primary device is not running, using the first standby device as the primary device.
Preferably, when the primary device is running, the first standby device is prohibited from running for a third predetermined time period.
Preferably, whether the primary device is running is determined in at least one of the following ways: notification by a third party other than the primary device and the first standby device; detection, by the first standby device, of specified information on a message transmission channel of the forwarding plane.
Preferably, the first standby device processing the operation of the primary device and the first standby device according to the detection result of the link connectivity includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.
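The primary-side and standby-side rules above can be sketched as two small decision functions. This is an illustrative sketch only: the names `Action`, `primary_action`, `standby_action` and `resolve` are hypothetical, and `resolve` additionally assumes that a primary device which has suspended itself is observed by the standby device as not running, which the source does not state.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    REPLACE_STANDBY = "primary selects a second standby device"
    SUSPEND_SELF = "device is prohibited from running for a predetermined period"
    BECOME_PRIMARY = "first standby device takes over as primary"


def primary_action(link_connected: bool) -> Action:
    """Primary side: a connected link points to a faulty standby; otherwise stand down."""
    return Action.REPLACE_STANDBY if link_connected else Action.SUSPEND_SELF


def standby_action(link_connected: bool, primary_running: bool) -> Action:
    """Standby side: take over only when connected and the primary is not running."""
    if not link_connected:
        return Action.SUSPEND_SELF   # first predetermined time period
    if primary_running:
        return Action.SUSPEND_SELF   # third predetermined time period
    return Action.BECOME_PRIMARY


def resolve(primary_connected: bool, standby_connected: bool) -> Tuple[Action, Action]:
    """Combine both sides (assumption: a suspended primary counts as not running)."""
    p = primary_action(primary_connected)
    primary_running = p is not Action.SUSPEND_SELF
    return p, standby_action(standby_connected, primary_running)


for p_conn in (True, False):
    for s_conn in (True, False):
        p, s = resolve(p_conn, s_conn)
        print(f"primary link={p_conn!s:5} standby link={s_conn!s:5} -> {p.name}, {s.name}")
```

Under that assumption, none of the four combinations leaves the original primary running while the standby also becomes primary, which is the dual-master state the method is meant to avoid.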
优选地,确定主用设备和第一备用设备失联,包括:当所述主用设备和/或所述第一备用设备未接收到保活报文时,确定所述主用设备和所述第一备用设备失联。Preferably, determining that the primary device and the first standby device are disconnected, including: when the primary device and/or the first standby device does not receive a keep-alive message, determining the primary device and the The first standby device is disconnected.
根据本发明的另一个实施例,还提供了一种主备设备的运行处理系统,包括:主用设备,设置为在确定主用设备和第一备用设备失联后,检测该主用设备与其他设备的链路连通性,以及根据所述链路连通性的检测结果对所述主用设备和/或第一备用设备的运行进行处理,其中,所述其他设备为所述主用设备和所述第一备用设备所在的集群系统中,除所述主用设备和所述备用设备之外的设备;所述第一备用设备,设置为在所述主用设备检测所述主用设备与其他设备的链路连通性时,检测所述第一备用设备与其他设备的链路连通性,以及根据所述链路连通性的检测结果对所述主用设备和/或所述第一备用设备的运行进行处理。 According to another embodiment of the present invention, an operating system for an active and standby device is further provided, including: a primary device, configured to detect the primary device and the primary device after determining that the primary device and the first standby device are disconnected Link connectivity of other devices, and processing of the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, wherein the other device is the primary device and The cluster system in which the first standby device is located, except for the primary device and the standby device; the first standby device is configured to detect the primary device and the primary device Detecting link connectivity of the first standby device with other devices, and detecting the primary device and/or the first standby according to the detection result of the link connectivity. The operation of the device is processed.
优选地,所述主用设备还设置为当所述主用设备检测所述链路为连通时,则将所述第一备用设备更换为第二备用设备;以及当所述主用设备检测所述链路未连通时,则在第二预定时间段内禁止所述主用设备运行。Preferably, the primary device is further configured to: when the primary device detects that the link is connected, replace the first standby device with a second standby device; and when the primary device detects the When the link is not connected, the active device is prohibited from running during the second predetermined time period.
优选地,所述第一备用设备还设置为当所述第一备用设备检测所述链路为连通时,则判断所述主用设备是否正在运行;在所述主用设备未在运行时,将所述第一备用设备作为主用设备。Preferably, the first standby device is further configured to: when the first standby device detects that the link is connected, determine whether the active device is running; when the primary device is not running, The first standby device is used as a primary device.
With the present invention, after the primary device and the standby device lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system, and the primary device and/or the standby device is then handled according to the detection results. This solves a problem in the related art: after the primary and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to primary, two primary devices end up running in the system, system stability drops, and user experience suffers. The invention thereby improves system stability and user experience.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of an operation processing method for primary and standby devices according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a cluster system according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of the processing performed after the primary and standby devices detect link connectivity according to a preferred embodiment of the present invention;
FIG. 4 is a structural block diagram of an operation processing system for primary and standby devices according to an embodiment of the present invention;
FIG. 5 is another structural block diagram of the operation processing system for primary and standby devices according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another.
In the related art, when keep-alive between the primary device and the standby device fails, it cannot be determined whether the cause is a device failure or a link failure. The primary and standby devices may therefore evolve along the wrong path: the standby device switches directly to primary, leaving two primary devices in the system. The following technical solution addresses this problem.
To solve the above technical problem, this embodiment provides an operation processing method for primary and standby devices. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: after it is determined that the primary device and the first standby device have lost contact, the primary device detects its link connectivity with other devices and, at the same time, the first standby device detects its link connectivity with the other devices, where the other devices are the devices in the cluster system containing the primary device and the first standby device, other than the primary device and the first standby device;
Step S104: the primary device processes the operation of the primary device and/or the first standby device according to the link connectivity detection result, and the first standby device processes the operation of the primary device and/or the first standby device according to the link connectivity detection result.
Through the above steps, after the primary device and the standby device lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system, and the primary device and/or the standby device is then handled according to the detection results. This solves the problem that, after the primary and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to primary, two primary devices end up running in the system, system stability drops, and user experience suffers. System stability and user experience are improved as a result.
In the embodiments of the present invention, the technical solution of step S104 can be realized in the following ways:
(1) The primary device processing the operation of the primary device and/or the first standby device according to the link connectivity detection result includes: when the primary device detects that the link is connected, replacing the first standby device with a second standby device; and when the primary device detects that the link is not connected, prohibiting the primary device from running for a second predetermined time period.
(2) The first standby device processing the operation of the primary device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is connected, determining whether the primary device is running; when the primary device is not running, making the first standby device the primary device; and when the primary device is running, prohibiting the first standby device from running for a third predetermined time period.
(3) The first standby device processing the operation of the primary device and the first standby device according to the link connectivity detection result includes: when the first standby device detects that the link is not connected, prohibiting the first standby device from running for a first predetermined time period.
It should be noted that the processing in (1) to (3) above, in which the primary device and the standby device each handle the operation of the primary device and/or the standby device according to link connectivity, can be combined. The decision process of the primary device and that of the standby device do not conflict, and the two can coexist.
In practice, after the primary device and the standby device lose contact, both sides detect link connectivity at the same time. When the primary device detects that its link is connected, it replaces the first standby device with a second standby device: the link has not failed, yet the primary and standby devices have still lost contact, which indicates that the first standby device is faulty and must be replaced. When the primary device detects that its link is not connected, the primary device is prohibited from running for the second predetermined time period.
The standby device can determine whether the primary device is running in several ways. In an optional example of the embodiment of the present invention, this is determined in at least one of the following ways: notification by a third party other than the primary device and the first standby device; detection by the first standby device of specified information on a forwarding-plane message transmission channel.
Optionally, in step S102, the loss of contact between the primary device and the first standby device can be determined as follows: when the primary device and/or the first standby device does not receive a keep-alive message, it is determined that the primary device and the first standby device have lost contact.
In summary, the primary device and the standby device each compute the connectivity of their own link from their connection state with the other devices in the system. The connectivity value is either TRUE (T), indicating that the link between the primary device and the standby device is connected, or FALSE (F), indicating that the link between the primary device and the standby device is not connected.
In addition, bidirectional keep-alive and bidirectional detection are used between the primary device and the standby device, so that both ends of the link perceive a state change of the keep-alive link at the same time. If keep-alive fails in either direction, the primary device and the standby device are judged to have lost contact.
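The bidirectional keep-alive rule above can be illustrated with a minimal sketch. The class name `KeepAliveMonitor`, the function `lost_contact`, and the timeout value are hypothetical and chosen only for illustration; the patent does not specify how the keep-alive timer is implemented.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveMonitor:
    """Tracks one receive direction of the keep-alive link (hypothetical helper)."""

    timeout_s: float        # assumed: silence longer than this counts as a failure
    last_received_s: float  # time at which the last keep-alive message arrived

    def on_keepalive(self, now_s: float) -> None:
        # Record the arrival time of a keep-alive message from the peer.
        self.last_received_s = now_s

    def direction_failed(self, now_s: float) -> bool:
        # This direction has failed if no message arrived within the timeout.
        return now_s - self.last_received_s > self.timeout_s


def lost_contact(primary_rx: KeepAliveMonitor,
                 standby_rx: KeepAliveMonitor,
                 now_s: float) -> bool:
    # A failure in either direction is treated as loss of contact.
    return primary_rx.direction_failed(now_s) or standby_rx.direction_failed(now_s)
```

In this sketch each device would own one monitor for the messages it receives; `lost_contact` simply states the "either direction" rule in one place.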
In the above technical solution provided by the embodiment of the present invention, after the primary device and the standby device lose contact, each computes the connectivity of its own link. The results fall into the four cases shown in Table 1, where the first letter of each case is the primary device's result and the second is the standby device's:
Table 1
               Primary (T)    Primary (F)
Standby (T)    TT             FT
Standby (F)    TF             FF
The four cases are briefly described below:
a) TT: connectivity detection is TRUE (T) on both the primary device and the standby device;
The primary device selects a new standby device. The standby device uses a third-party mechanism to detect whether the primary device is present: if the primary device is running, the standby device is suspended for a preset time period; if the primary device is not running, the standby device becomes the primary device.
b) FT: connectivity detection is FALSE (F) on the primary device and TRUE (T) on the standby device;
The primary device is suspended for a preset time period. The standby device uses a third-party mechanism to detect whether the primary device is running: if the primary device is running, the standby device is suspended; if the primary device is not running, the standby device becomes the primary device.
c) TF: connectivity detection is TRUE (T) on the primary device and FALSE (F) on the standby device;
The primary device selects a new standby device; the standby device is suspended.
d) FF: connectivity detection is FALSE (F) on both the primary device and the standby device;
Both the primary device and the standby device are suspended.
The four cases can be summarized as follows. Whichever device detects a connectivity of FALSE (F), primary or standby, suspends operation for a predetermined time period. When the device that detects a connectivity of TRUE (T) is the primary device, it selects a new standby device. When it is the standby device, it uses a third-party mechanism to detect whether the primary device is running: if the primary device is running, the standby device is suspended; if the primary device is not running, the standby device becomes the primary device.
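The summary rule above can be written as a small decision function. This is a sketch only: the names `Role`, `Action`, and `decide` are hypothetical, and the result of the third-party probe is passed in as a plain boolean rather than obtained from any real mechanism.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


class Action(Enum):
    SUSPEND = "suspend for a predetermined time period"
    SELECT_NEW_STANDBY = "select a new standby device"
    BECOME_PRIMARY = "become the primary device"


def decide(role: Role, link_ok: bool,
           primary_running: Optional[bool] = None) -> Action:
    """Return the action one device takes after contact with its peer is lost."""
    if not link_ok:
        # Connectivity FALSE (F): the device suspends, whatever its role.
        return Action.SUSPEND
    if role is Role.PRIMARY:
        # Connectivity TRUE (T) on the primary: the standby is assumed faulty.
        return Action.SELECT_NEW_STANDBY
    if primary_running is None:
        raise ValueError("a standby with a connected link must probe the primary")
    # Connectivity TRUE (T) on the standby: depends on the third-party probe.
    return Action.SUSPEND if primary_running else Action.BECOME_PRIMARY
```

Each device evaluates `decide` independently with its own detection result, so case FT, for example, corresponds to `decide(Role.PRIMARY, False)` on one side and `decide(Role.STANDBY, True, primary_running=...)` on the other.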
To aid understanding of the technical solution applied after the primary and standby devices lose contact, a preferred embodiment is described below; it does not limit the embodiments of the present invention:
First, the cluster system of the preferred embodiment is briefly described. As shown in FIG. 2, the cluster system is divided into several devices; for ease of description, FIG. 2 shows only three. The primary device (primary node) and the standby device (standby node) keep each other alive in both directions, with bidirectional detection. The primary device and the standby device each perform connectivity detection with the other devices (other nodes) in the system.
The primary device and the standby device detect connectivity with the other devices in the system using a message-based detection mechanism. Possible schemes include, but are not limited to, communication link detection (for example over a TCP link or a TIPC link) and asynchronous message keep-alive. Because these schemes are common techniques in the related art, they are not described further here.
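As one possible illustration of communication link detection over TCP, the sketch below probes a list of other devices and derives a single connectivity value. The function names and the aggregation rule (connected if any other device answers) are assumptions made for this example; the patent does not state how the per-device results are combined.

```python
import socket
from typing import Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def link_connected(peers: Iterable[Tuple[str, int]],
                   timeout_s: float = 1.0) -> bool:
    """Assumed rule: the link is TRUE (T) if at least one other device answers."""
    return any(tcp_reachable(host, port, timeout_s) for host, port in peers)
```

A stricter rule, such as requiring a majority of the other devices to answer, could be substituted in `link_connected` without changing the probe itself.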
FIG. 3 is a schematic diagram of the processing performed after the primary and standby devices detect link connectivity according to a preferred embodiment of the present invention. The technical solution illustrated in FIG. 3 can be summarized as follows. The primary device and the standby device send heartbeat messages to each other, and each receives and checks the messages it gets. Once loss of contact between primary and standby is detected through the keep-alive messages, a device whose connectivity detection is FALSE (F) restarts itself. A device whose connectivity detection is TRUE (T) acts according to its role: the primary device selects a new standby device, while the standby device uses a third-party mechanism to detect whether the primary device is running. If the primary device is running, the standby device is suspended; if the primary device is not running, the standby device becomes the primary device.
It should be noted that in FIG. 3, "present" refers to whether a device is running, and "suicide" means that a device is not used for a predetermined time period.
An embodiment of the present invention further provides an operation processing system for primary and standby devices. FIG. 4 is a structural block diagram of the system according to an embodiment of the present invention. As shown in FIG. 4, the system includes:
a primary device 40, configured to detect, after it is determined that the primary device 40 and the first standby device 42 have lost contact, the link connectivity between the primary device 40 and other devices 44, and to process the operation of the primary device 40 and/or the first standby device 42 according to the link connectivity detection result, where the other devices 44 are the devices in the cluster system containing the primary device 40 and the first standby device 42, other than the primary device 40 and the first standby device 42;
a first standby device 42, configured to detect the link connectivity between the first standby device 42 and the other devices 44 while the primary device 40 detects its link connectivity with the other devices 44, and to process the operation of the primary device 40 and/or the first standby device 42 according to the link connectivity detection result.
Through the combined action of the devices in the above system, after the primary device and the standby device lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system, and the primary device and/or the standby device is then handled according to the detection results. This solves the problem that, after the primary and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to primary, two primary devices end up running in the system, system stability drops, and user experience suffers. System stability and user experience are improved as a result.
Optionally, as shown in FIG. 5, the primary device 40 is further configured to replace the first standby device 42 with a second standby device 46 when the primary device 40 detects that the link is connected, and to prohibit the primary device 40 from running for a second predetermined time period when the primary device 40 detects that the link is not connected. The first standby device 42 is further configured to determine whether the primary device 40 is running when the first standby device 42 detects that the link is connected, and to make the first standby device 42 the primary device when the primary device 40 is not running.
In summary, the embodiments of the present invention achieve the following technical effects: they solve the "dual primary" problem caused in the related art by the standby device switching over directly; they correctly detect the actual state of the keep-alive link; and they define a unified evolution path for the system so that devices do not evolve independently, which prevents the dual-primary condition and improves system stability.
From the description of the above implementations, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Objects so named are interchangeable where appropriate, so that the embodiments of the invention described here can be carried out in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can be fabricated as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
With the above technical solution provided by the embodiments of the present invention, after the primary device and the standby device lose contact, each of them simultaneously detects its own link connectivity with the other devices in the cluster system, and the primary device and/or the standby device is then handled according to the detection results. This solves a problem in the related art: after the primary and standby devices lose contact, a node failure cannot be distinguished from a link failure, so the standby device switches directly to primary, two primary devices end up running in the system, system stability drops, and user experience suffers. System stability and user experience are improved as a result.

Claims (10)

  1. 一种主备设备的运行处理方法,包括:A method for processing an active/standby device includes:
    在确定主用设备和第一备用设备失联后,所述主用设备检测该主用设备与其他设备的链路连通性,同时所述第一备用设备检测该第一备用设备与其他设备的链路连通性,其中,所述其他设备为所述主用设备和所述第一备用设备所在的集群系统中,除所述主用设备和所述备用设备之外的设备;After determining that the primary device and the first standby device are disconnected, the primary device detects link connectivity between the primary device and the other device, and the first standby device detects the first standby device and other devices. Link connectivity, wherein the other device is a device other than the primary device and the standby device in the cluster system in which the primary device and the first standby device are located;
    所述主用设备根据所述链路连通性的检测结果对所述主用设备和/或所述第一备用设备的运行进行处理,以及所述第一备用设备根据所述链路连通性的检测结果对所述主用设备和/或所述第一备用设备的运行进行处理。The primary device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, and the first standby device according to the link connectivity The detection result processes the operation of the primary device and/or the first standby device.
  2. 根据权利要求1所述的方法,其中,所述主用设备根据所述链路连通性的检测结果对所述主用设备和/或所述第一备用设备的运行进行处理,包括:The method of claim 1, wherein the active device processes the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, including:
    当所述主用设备检测所述链路为连通时,则将所述第一备用设备更换为第二备用设备;When the primary device detects that the link is connected, the first standby device is replaced with a second standby device;
    当所述主用设备检测所述链路未连通时,则在第二预定时间段内禁止所述主用设备运行。When the primary device detects that the link is not connected, the active device is prohibited from running for a second predetermined period of time.
  3. 根据权利要求1所述的方法,其中,所述第一备用设备根据所述链路连通性的检测结果对所述主用设备和所述第一备用设备的运行进行处理,包括:The method according to claim 1, wherein the first standby device processes the operations of the primary device and the first standby device according to the detection result of the link connectivity, including:
    当所述第一备用设备检测所述链路为连通时,则判断所述主用设备是否正在运行;When the first standby device detects that the link is connected, determining whether the active device is running;
    在所述主用设备未在运行时,将所述第一备用设备作为主用设备。The first standby device is used as a primary device when the primary device is not operating.
  4. 根据权利要求3所述的方法,其中,The method of claim 3, wherein
    在所述主用设备正在运行时,则在第三预定时间段内禁止所述第一备用设备运行。When the primary device is running, the first standby device is prohibited from operating for a third predetermined period of time.
  5. 根据权利要求3或4任一项所述的方法,其中,通过以下至少之一方式判断所述主用设备是否正在运行:The method according to any one of claims 3 or 4, wherein the primary device is determined to be running by at least one of the following:
    通过所述主用设备和所述第一备用设备外的第三方告知;Notifying by the third party outside the primary device and the first standby device;
    通过所述第一备用设备在转发面消息传输通道的指定信息检测。 The specified information is detected by the first standby device on the forwarding plane message transmission channel.
  6. 根据权利要求1所述的方法,其中,所述第一备用设备根据所述链路连通性的检测结果对所述主用设备和所述第一备用设备的运行进行处理,包括:The method according to claim 1, wherein the first standby device processes the operations of the primary device and the first standby device according to the detection result of the link connectivity, including:
    当所述第一备用设备检测所述链路未连通时,则在第一预定时间段内禁止所述第一备用设备运行。When the first standby device detects that the link is not connected, the first standby device is prohibited from running for a first predetermined time period.
  7. 根据权利要求1至4任一项所述的方法,其中,确定主用设备和第一备用设备失联,包括:The method according to any one of claims 1 to 4, wherein determining that the primary device and the first standby device are disconnected comprises:
    当所述主用设备和/或所述第一备用设备未接收到保活报文时,确定所述主用设备和所述第一备用设备失联。When the active device and/or the first standby device do not receive the keep-alive message, it is determined that the primary device and the first standby device are disconnected.
  8. An operation processing system for primary and standby devices, comprising:
    a primary device, configured to, after determining that the primary device and a first standby device have lost contact, detect link connectivity between the primary device and other devices, and process the operation of the primary device and/or the first standby device according to the detection result of the link connectivity, wherein the other devices are the devices, in the cluster system in which the primary device and the standby device are located, other than the primary device and the first standby device; and
    the first standby device, configured to, while the primary device detects the link connectivity between the primary device and the other devices, detect link connectivity between the first standby device and the other devices, and process the operation of the primary device and/or the first standby device according to the detection result of the link connectivity.
  9. The system according to claim 8, wherein the primary device is further configured to: when the primary device detects that the link is connected, replace the first standby device with a second standby device; and when the primary device detects that the link is not connected, prohibit the primary device from running for a second predetermined time period.
  10. The system according to claim 8, wherein the first standby device is further configured to: when the first standby device detects that the link is connected, determine whether the primary device is running; and when the primary device is not running, make the first standby device the primary device.
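
The decision logic recited in claims 3, 4, 6, 7 and 9 can be sketched roughly as follows. This is an illustrative Python sketch only, not the applicant's implementation. The names (Action, Decision, lost_contact, standby_decision, primary_decision) are hypothetical, the three period values are placeholders because the claims give no durations, and the keep-alive timeout is an assumed way of deciding that a keep-alive message was "not received".

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROMOTE_SELF = "first standby device becomes the primary device"
    SUSPEND_SELF = "this device is prohibited from running for a while"
    REPLACE_STANDBY = "first standby device is replaced by a second standby device"


@dataclass(frozen=True)
class Decision:
    action: Action
    hold_seconds: float = 0.0  # only meaningful for SUSPEND_SELF


# Placeholder durations (assumptions; the claims only call them "predetermined").
FIRST_PERIOD = 30.0   # standby device, link not connected (claim 6)
SECOND_PERIOD = 30.0  # primary device, link not connected (claim 9)
THIRD_PERIOD = 10.0   # standby device, primary still running (claim 4)


def lost_contact(last_keepalive: float, now: float, timeout: float) -> bool:
    """Claim 7: contact is lost when no keep-alive message is received.

    Treating "not received" as "none within `timeout` seconds" is an assumption.
    """
    return now - last_keepalive > timeout


def standby_decision(link_connected: bool, primary_running: bool) -> Decision:
    """What the first standby device does after contact is lost."""
    if not link_connected:
        # Claim 6: the standby device itself is isolated, so it backs off.
        return Decision(Action.SUSPEND_SELF, FIRST_PERIOD)
    if primary_running:
        # Claim 4: the primary device is still in service, so do not take over.
        return Decision(Action.SUSPEND_SELF, THIRD_PERIOD)
    # Claim 3: link is connected and the primary device is not running.
    return Decision(Action.PROMOTE_SELF)


def primary_decision(link_connected: bool) -> Decision:
    """What the primary device does after contact is lost (claim 9)."""
    if link_connected:
        return Decision(Action.REPLACE_STANDBY)
    return Decision(Action.SUSPEND_SELF, SECOND_PERIOD)
```

In this sketch each device decides using only its own link check and, for the standby device, the primary-running indication of claim 5. A device that finds itself isolated always backs off, which is how the two roles avoid both running as primary at the same time.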
PCT/CN2015/073275 2014-11-04 2015-02-25 Method and system for processing operation of primary and standby device WO2016070530A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410614104.1 2014-11-04
CN201410614104.1A CN105634779B (en) 2014-11-04 2014-11-04 The operation processing method and device of master/slave device

Publications (1)

Publication Number Publication Date
WO2016070530A1 (en) 2016-05-12

Family

ID=55908461

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/073275 WO2016070530A1 (en) 2014-11-04 2015-02-25 Method and system for processing operation of primary and standby device

Country Status (2)

Country Link
CN (1) CN105634779B (en)
WO (1) WO2016070530A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019036892A1 (en) * 2017-08-22 2019-02-28 深圳瀚飞科技开发有限公司 Remote communication detection system and detection method for online monitoring platform
CN107688547B (en) * 2017-08-23 2020-06-16 苏州浪潮智能科技有限公司 Method and system for switching between main controller and standby controller
CN107579860A (en) * 2017-09-29 2018-01-12 新华三技术有限公司 Node electoral machinery and device
CN109728981A (en) * 2019-03-19 2019-05-07 江苏汇智达信息科技有限公司 A kind of cloud platform fault monitoring method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101674199A (en) * 2009-09-22 2010-03-17 中兴通讯股份有限公司 Method for realizing switching during network fault and finders
CN101729290A (en) * 2009-11-04 2010-06-09 中兴通讯股份有限公司 Method and device for realizing business system protection
CN102480423A (en) * 2010-11-30 2012-05-30 中兴通讯股份有限公司 Method and system for protecting layer 2 tunneling protocol (L2TP) network

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN101207408B (en) * 2006-12-22 2012-07-11 中兴通讯股份有限公司 Apparatus and method of synthesis fault detection for main-spare taking turns
US8094569B2 (en) * 2008-12-05 2012-01-10 Cisco Technology, Inc. Failover and failback of communication between a router and a network switch
US8244125B2 (en) * 2009-01-21 2012-08-14 Calix, Inc. Passive optical network protection switching
WO2012103725A1 (en) * 2011-06-29 2012-08-09 华为技术有限公司 Method and apparatus for maintaining connectivity of transmission lines
US8675479B2 (en) * 2011-07-12 2014-03-18 Tellabs Operations, Inc. Methods and apparatus for improving network communication using ethernet switching protection
CN103931139B (en) * 2013-03-19 2017-02-15 华为技术有限公司 Method and device for redundancy protection, and device and system
CN103560955B (en) * 2013-10-24 2016-09-28 华为技术有限公司 Redundance unit changing method and device

Also Published As

Publication number Publication date
CN105634779A (en) 2016-06-01
CN105634779B (en) 2019-09-03

Similar Documents

Publication Publication Date Title
US8438307B2 (en) Method and device of load-sharing in IRF stack
US8825844B2 (en) Notifying network operator when virtual addresses do not match on network elements configured for interchassis redundancy
CN101588304B (en) Implementation method of VRRP and device
CN105933407B (en) method and system for realizing high availability of Redis cluster
CN106330475B (en) Method and device for managing main and standby nodes in communication system and high-availability cluster
US10560550B1 (en) Automatic configuration of a replacement network device in a high-availability cluster
WO2016070530A1 (en) Method and system for processing operation of primary and standby device
CN110730125B (en) Message forwarding method and device, dual-active system and communication equipment
CN109462533B (en) Link switching method, link redundancy backup network and computer readable storage medium
CN102780615B (en) Link backup method and routing forwarding device
WO2016095344A1 (en) Link switching method and device, and line card
CN102891769A (en) Link fault informing method and apparatus
CN109495916B (en) Communication method and device
CN103856357A (en) Stack system fault processing method and stack system
CN105024798A (en) Method and device for time synchronization
CN103220189B (en) Multi-active detection (MAD) backup method and equipment
CN104994173A (en) Message processing method and system
WO2017036165A1 (en) Link fault detection method and apparatus
US20150263884A1 (en) Fabric switchover for systems with control plane and fabric plane on same board
CN108259325B (en) Route maintenance method and route equipment
CN107872822B (en) Service bearing method and device
WO2017146718A1 (en) Ring protection network division
CN106130783B (en) Port fault processing method and device
WO2019000954A1 (en) Method, device and system for monitoring node survival state
CN108174417B (en) Main/standby switching method and device, related electronic equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15856894

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15856894

Country of ref document: EP

Kind code of ref document: A1