CN106603261B - Hot backup method, first main device, standby device and communication system - Google Patents

Info

Publication number
CN106603261B
Authority
CN
China
Prior art keywords
link quality
link
standby
active
active device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510675648.3A
Other languages
Chinese (zh)
Other versions
CN106603261A (en)
Inventor
张耀坤
牛承光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510675648.3A priority Critical patent/CN106603261B/en
Publication of CN106603261A publication Critical patent/CN106603261A/en
Application granted granted Critical
Publication of CN106603261B publication Critical patent/CN106603261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/22 Alternate routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An embodiment of the invention provides a hot backup method, a first main device, a standby device and a communication system, which help to save system resources. In the method, a first main device obtains a first link quality level between the first main device and a standby device, X1 link quality levels between the first main device and the other N-1 main devices, and Y link quality levels between the N-1 main devices and the standby device. When the total number of abnormal levels among the first link quality level and the X1 link quality levels is less than the number of abnormal levels among the Y link quality levels, the number of normal levels among the X1 link quality levels is greater than or equal to the number of normal levels among the Y link quality levels, and the link quality level of the first main device is an abnormal level, the first main device performs hot backup to the standby device; otherwise, the first main device remains in the main-device state.

Description

Hot backup method, first main device, standby device and communication system
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a hot backup method, first main equipment, standby equipment and a communication system.
Background
With the development of communication technology, more and more functions are integrated into the communication devices in a network. A failure of a communication device prevents the terminal devices it serves from communicating normally, and may even affect the normal operation of the whole network. Therefore, each communication device is generally configured with a backup channel so that it can still operate normally when the main communication channel fails, and the network is also configured with backup devices so that the whole network can operate normally when a device fails.
Hot backup is a reliable protection technology in the network: when a single communication device in the network fails, the services on the failed device are quickly switched to a standby communication device, and the terminal devices do not perceive the failover. Hot backup generally includes dual-machine hot backup and multi-machine hot backup. In dual-machine hot backup, one standby communication device protects one active communication device; in multi-machine hot backup, one standby communication device protects N active communication devices. Dual-machine hot backup is thus a special case of multi-machine hot backup.
Hot backup has a wide range of application scenarios. For example, a Broadband Remote Access Server (BRAS) implements hot backup by using the Virtual Router Redundancy Protocol (VRRP). Taking dual-machine hot backup between two BRASs as an example, when the primary BRAS fails, a primary/standby switchover occurs and the data traffic on the primary BRAS, both uplink and downlink, is switched to the standby BRAS. The user equipment does not perceive the switchover and therefore does not need to reconnect to the network. When the primary BRAS recovers from the fault, the traffic is switched back and the primary BRAS continues to work.
Common multi-machine hot backup technology can only judge link-interruption faults correctly and is prone to misjudging cases of severe packet loss on a link, which may waste resources.
Disclosure of Invention
The embodiments of the invention provide a hot backup method, a first main device, a standby device and a communication system, in which whether to perform backup is determined by judging the link quality between each main device and the standby device, so that each communication device in the system can communicate normally after backup and system resources are saved.
In a first aspect, a hot backup method is provided, including:
a first active device acquires a first link quality level, wherein the first link quality level is used for representing the link quality between the first active device and a standby device;
when the first link quality level indicates that a link is abnormal, the first active device obtains N-1 Xi-th link quality levels between the first active device and other N-1 active devices, wherein the Xi-th link quality level is used for indicating the link quality between the first active device and the ith active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N;
the first active device sends a request message to each of the other N-1 active devices, where the request message is used to request each of the other N-1 active devices to send the link quality level between that active device and the standby device;
the first active device receives M Yj-th link quality levels sent by M active devices among the other N-1 active devices, where the Yj-th link quality level is used to represent the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
when the total number of levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is less than the number of the N-1 Xi-th link quality levels indicating a link abnormality, the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, and the service of the first active device is damaged, the first active device switches the service to the standby device.
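The switching condition of the first aspect can be sketched as a small counting function. This is only an illustration, assuming each link quality level has already been reduced to a boolean flag (`True` meaning the level indicates a link abnormality); the function name `should_switch_to_standby` and its parameter names are hypothetical and do not come from the patent text.

```python
from typing import Sequence


def should_switch_to_standby(
    first_abnormal: bool,
    x_abnormal: Sequence[bool],
    y_abnormal: Sequence[bool],
    service_damaged: bool,
) -> bool:
    """Evaluate the first-aspect condition on the first active device.

    first_abnormal: first link quality level (first active <-> standby).
    x_abnormal: the N-1 Xi-th levels (first active <-> i-th active).
    y_abnormal: the M Yj-th levels received (j-th active <-> standby),
        with 0 <= M <= N-1.
    Each flag is True when the level indicates a link abnormality
    (an assumed encoding for this sketch).
    """
    # Abnormal levels among the first level and the M Yj-th levels.
    abnormal_first_and_y = int(first_abnormal) + sum(y_abnormal)
    # Abnormal levels among the N-1 Xi-th levels.
    abnormal_x = sum(x_abnormal)
    # Normal levels on each side.
    normal_y = len(y_abnormal) - sum(y_abnormal)
    normal_x = len(x_abnormal) - sum(x_abnormal)
    return (
        abnormal_first_and_y < abnormal_x
        and normal_y >= normal_x
        and service_damaged
    )
```

For example, with N = 4, if all three links from the first active device to the other active devices are abnormal while all three reported links to the standby device are normal, and the service is damaged, the function returns `True`.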
With reference to the first aspect, a first possible implementation manner of the first aspect is provided, and the method further includes:
When the total number of levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a link abnormality, or when the number of the M Yj-th link quality levels indicating a normal link is less than the number of the N-1 Xi-th link quality levels indicating a normal link, or when the service of the first active device itself is not damaged, the first active device does not back up data to the standby device within a preset silence time window.
With reference to the first possible implementation manner of the first aspect, a second possible implementation manner of the first aspect is further provided, and the method further includes:
When the silence time window expires, the first active device acquires the first link quality level again;
and when the first link quality level indicates that the link is normal, the first active device informs the standby device to switch the service back to the first active device.
With reference to the first aspect, the first possible implementation manner of the first aspect, or the second possible implementation manner of the first aspect, a third possible implementation manner of the first aspect is further provided, and the method further includes:
the first active device suspends sending backup data to the standby device within a waiting time window that begins when the request message is sent to each of the other N-1 active devices.
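The silence time window and the switch-back check described in the implementation manners above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `BackupGate`, its fields, and the use of explicit timestamps are hypothetical, and the patent only says that the window is preset, without giving a duration. The passage does not state what happens if the link is still abnormal when the window expires, so the sketch covers only the normal-link case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupGate:
    """Tracks whether the first active device may send backup data.

    silence_seconds is a hypothetical configuration value.
    """
    silence_seconds: float
    silent_until: Optional[float] = None

    def start_silence(self, now: float) -> None:
        # Called when the switching condition is not met: no backup data
        # is sent to the standby device until the window expires.
        self.silent_until = now + self.silence_seconds

    def may_send_backup(self, now: float) -> bool:
        # Backup is allowed if no window is active or it has expired.
        return self.silent_until is None or now >= self.silent_until

    def should_notify_switch_back(self, now: float, first_link_abnormal: bool) -> bool:
        # After the window expires the first link quality level is acquired
        # again; only a normal link leads to a switch-back notification.
        expired = self.silent_until is not None and now >= self.silent_until
        return expired and not first_link_abnormal
```

A caller would start the window when the switching condition fails, then query `should_notify_switch_back` with the freshly measured first link quality level once the window has expired.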
In a second aspect, a hot backup method is provided, including:
the standby device acquires a first link quality level, where the first link quality level is used to represent the link quality between a first active device and the standby device;
when the first link quality level indicates a link abnormality, the standby device obtains N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, where the Yj-th link quality level is used to represent the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N;
the standby device sends a request message to each of the other N-1 active devices, where the request message is used to request each of the other N-1 active devices to send the link quality level between that active device and the first active device;
the standby device receives M Xi-th link quality levels sent by the other N-1 active devices, where the Xi-th link quality level is used to represent the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
when the total number of levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is less than the number of the M Xi-th link quality levels indicating a link abnormality, and the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, the standby device raises an access priority, where the raised access priority is used to instruct users of the first active device to switch to the standby device.
With reference to the second aspect, a first possible implementation manner of the second aspect is further provided, and the method further includes:
When the total number of levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is greater than or equal to the number of the M Xi-th link quality levels indicating a link abnormality, or the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link, the standby device suspends receiving the backup data sent by the first active device within a preset silence time window.
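The standby-side decision of the second aspect mirrors the active-side one, with the roles of the Xi-th and Yj-th levels swapped and without the service-damage clause. The following sketch assumes the same boolean encoding (`True` for a level that indicates a link abnormality); the names `StandbyAction` and `standby_decision` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class StandbyAction(Enum):
    RAISE_ACCESS_PRIORITY = "raise access priority"
    SUSPEND_RECEIVING_BACKUP = "suspend receiving backup data"


def standby_decision(
    first_abnormal: bool,
    y_abnormal: Sequence[bool],
    x_abnormal: Sequence[bool],
) -> StandbyAction:
    """Evaluate the second-aspect condition on the standby device.

    first_abnormal: first link quality level (first active <-> standby).
    y_abnormal: the N-1 Yj-th levels measured by the standby device.
    x_abnormal: the M Xi-th levels received (i-th active <-> first active).
    """
    abnormal_first_and_y = int(first_abnormal) + sum(y_abnormal)
    abnormal_x = sum(x_abnormal)
    normal_y = len(y_abnormal) - sum(y_abnormal)
    normal_x = len(x_abnormal) - sum(x_abnormal)
    if abnormal_first_and_y < abnormal_x and normal_y >= normal_x:
        # The first active device appears to be the faulty side, so its
        # users are steered to the standby device.
        return StandbyAction.RAISE_ACCESS_PRIORITY
    # Otherwise backup data from the first active device is not received
    # during the preset silence time window.
    return StandbyAction.SUSPEND_RECEIVING_BACKUP
```

If no Xi-th levels are received (M equal to zero), the first condition cannot hold, so the sketch falls through to suspending the reception of backup data.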
In a third aspect, a first active device is provided, including:
a first obtaining module, configured to obtain a first link quality level, where the first link quality level is used to indicate link quality between the first active device and the standby device;
a determining module, configured to determine whether the first link quality level indicates a link abnormality;
a second obtaining module, configured to obtain, when the determining module determines that the first link quality level indicates a link abnormality, N-1 Xi-th link quality levels between the first active device and the other N-1 active devices, where the Xi-th link quality level is used to represent the link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N;
a sending module, configured to send a request message to each of the other N-1 active devices, where the request message is used to request each of the other N-1 active devices to send the link quality level between that active device and the standby device;
a receiving module, configured to receive M Yj-th link quality levels sent by M active devices among the other N-1 active devices, where the Yj-th link quality level is used to represent the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
a processing module, configured to switch the service to the standby device when the total number of levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is less than the number of the N-1 Xi-th link quality levels indicating a link abnormality, the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, and the service of the first active device is damaged.
With reference to the third aspect, a first possible implementation manner of the third aspect is provided, where the processing module is further configured to not back up data to the standby device within a preset silence time window when the total number of levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a link abnormality, or when the number of the M Yj-th link quality levels indicating a normal link is less than the number of the N-1 Xi-th link quality levels indicating a normal link, or when the service of the first active device itself is not damaged.
With reference to the first possible implementation manner of the third aspect, a second possible implementation manner of the third aspect is further provided, where the first obtaining module is further configured to obtain the first link quality level again after the silence time window expires;
the sending module is further configured to notify the standby device to switch the service back to the first active device when the determining module determines that the first link quality level indicates that the link is normal.
With reference to the third aspect, the first possible implementation manner of the third aspect, or the second possible implementation manner of the third aspect, a third possible implementation manner of the third aspect is further provided, where the sending module is further configured to suspend sending backup data to the standby device within a waiting time window that begins when the request message is sent to each of the other N-1 active devices.
In a fourth aspect, a standby device is provided, comprising:
A first obtaining module, configured to obtain a first link quality level, where the first link quality level is used to indicate link quality between a first active device and a standby device;
a determining module, configured to determine whether the first link quality level indicates a link abnormality;
a second obtaining module, configured to obtain, when the determining module determines that the first link quality level indicates a link abnormality, N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, where the Yj-th link quality level is used to represent the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N;
a sending module, configured to send a request message to each of the other N-1 active devices, where the request message is used to request each of the other N-1 active devices to send the link quality level between that active device and the first active device;
a receiving module, configured to receive M Xi-th link quality levels sent by the other N-1 active devices, where the Xi-th link quality level is used to represent the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
a processing module, configured to raise an access priority when the total number of levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is less than the number of the M Xi-th link quality levels indicating a link abnormality, and the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, where the raised access priority is used to instruct users of the first active device to switch to the standby device.
With reference to the fourth aspect, a first possible implementation manner of the fourth aspect is provided, where the processing module is further configured to suspend receiving backup data sent by the first active device within a preset silence time window when the total number of levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is greater than or equal to the number of the M Xi-th link quality levels indicating a link abnormality, or the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link.
In a fifth aspect, a communication system is provided, including N active devices and the standby device provided in the fourth aspect or in the first possible implementation manner of the fourth aspect, where any one of the N active devices is the first active device provided in the third aspect or in any possible implementation manner of the third aspect.
With the hot backup method, the first main device, the standby device and the communication system provided by the embodiments of the invention, the link states between the first main device and each other main device and the link states between each main device and the standby device are counted, so that the cause of a link problem between the first main device and the standby device can be judged accurately and more reasonable hot backup processing can be performed. Erroneous hot backup processing caused by link packet loss between the first main device and the standby device is avoided, and system resources are saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1A is a schematic diagram of BRAS dual-machine hot backup;
Fig. 1B is another schematic diagram of BRAS dual-machine hot backup;
Fig. 2 is a schematic diagram of a multi-machine hot standby network including N primary devices and one standby device;
FIG. 3 is a flowchart of a hot backup method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a hot backup method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a hot backup method according to an embodiment of the present invention;
FIG. 6 is a flowchart of a hot backup method according to an embodiment of the present invention;
FIG. 7 is a flowchart of a hot backup method according to an embodiment of the present invention;
FIG. 8 is a flowchart of a hot backup method according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a first active device according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a standby device according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a first active device according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a standby device according to an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of a communication system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Hot backup is a commonly used technology in communication networks. Taking a BRAS that uses VRRP to implement hot backup as an example, the basic flow of BRAS dual-machine hot backup is shown in Fig. 1A and Fig. 1B. Fig. 1A is a schematic diagram of BRAS dual-machine hot backup; Fig. 1B is another schematic diagram of BRAS dual-machine hot backup.
In Fig. 1A and Fig. 1B, terminal 11 communicates with switch 13 through access device 12, switch 13 can communicate with BRAS 14 and BRAS 15, and BRAS 14 and BRAS 15 each access core network 17 through Core Router (CR) 16.
BRAS 14 is the primary BRAS, and BRAS 15 is the standby BRAS. BRAS 14 is responsible for the access of terminal 11 and carries the user services, and BRAS 15 synchronizes from BRAS 14 database information such as user sessions, for example the user's Media Access Control (MAC) address, Internet Protocol (IP) address, Dynamic Host Configuration Protocol (DHCP) lease, and DHCP option 82. A routing policy can make the priority of the user routes advertised by BRAS 14 higher than that of BRAS 15, so that downstream traffic from the network side to terminal 11 is preferentially forwarded through BRAS 14, as shown by path 18 in Fig. 1A.
When BRAS 14 fails, whether through a link fault or a whole-device fault, a primary/standby switchover of VRRP and of the Redundant User Information (RUI) occurs: BRAS 15 becomes the new primary BRAS and BRAS 14 becomes the new standby BRAS. BRAS 15 may advertise the user's IP route with a high priority, and BRAS 14 may withdraw the user's IP route or lower its priority, so that downstream traffic from the network side to terminal 11 is switched to BRAS 15. On the user access side, BRAS 15 sends an Address Resolution Protocol (ARP) message to update the MAC table in switch 13, so that the uplink traffic of terminal 11 is also switched to BRAS 15, as shown by path 19 in Fig. 1B. Terminal 11 does not perceive this switchover protection process, that is, terminal 11 does not need to dial up again. When BRAS 14 recovers from the fault, the users that accessed BRAS 15 may be switched back to BRAS 14.
A BRAS that uses VRRP to implement multi-machine hot standby has several important modules:
Module 1: a VRRP module. VRRP is deployed between every two BRASs, and the primary and standby roles are determined by election. After VRRP is deployed, the primary BRAS sends VRRP heartbeat messages to the standby BRAS. Because these are Layer 2 multicast messages, a Layer 2 channel must be established between the primary BRAS and the standby BRAS. If both BRASs are attached to the same Layer 2 network, for example under a switch, a Layer 2 channel already exists between them. If the primary BRAS and the standby BRAS are far apart, or no channel through a Layer 2 network exists between them, a Layer 2 Virtual Private Network (L2VPN) tunnel across the metropolitan area network needs to be deployed between them. The VRRP module may track the state of the access ports, such as available (up) or unavailable (down), to dynamically adjust the priority parameter of VRRP. The VRRP module may also be linked with detection technologies such as Bidirectional Forwarding Detection (BFD) to trigger re-election of the primary and standby BRASs and to perform the switchover in a coordinated manner.
Module 2: a Remote Backup Service (RBS) module. The RBS module is used to transfer backup data between the two BRASs. The RBS module is implemented using the Transmission Control Protocol (TCP) and provides mechanisms for batch backup and real-time backup of services. Batch backup occurs after the TCP connection is successfully established. Real-time backup occurs when a user successfully gets access, when a user's attributes change successfully after the user is online, or when the backup period of the user statistics arrives.
Module 3: an RUI processing module. The RUI processing module is responsible for backing up user information, tracking the VRRP state, determining the primary and standby device states of the access users, and controlling the switching of forwarding routes, charging control and the like according to those states.
VRRP can effectively determine the primary/standby relationship of two BRASs and, combined with detection technologies such as BFD, ensures that the standby BRAS quickly senses a failure of the primary BRAS and quickly switches over. Specifically, as shown in Fig. 1A and Fig. 1B, the VRRP heartbeat messages are forwarded through the switch connected downstream of the two BRASs, and BFD tracks the network-side link of the primary BRAS. The standby BRAS can therefore quickly sense a whole-device failure of the primary BRAS, a failure of its link to the switch, and a failure of its link to the network-side CR. In relatively complex multi-machine hot standby networking, fast fault sensing and switchover are likewise achieved by combining VRRP with BFD detection.
The VRRP and BFD detection technologies in current multi-machine hot standby can only handle failure types such as link interruption and whole-device failure, and may misjudge failures such as link packet loss. Fig. 2 is a schematic diagram of a multi-machine hot standby network including N active devices and one standby device. The network shown in Fig. 2 takes a multi-machine hot standby network with 3 primary BRASs and one standby BRAS as an example: the 3 primary BRASs are BRAS 21, BRAS 22 and BRAS 23, and the standby BRAS is BRAS 24. BRAS 21, BRAS 22 and BRAS 23 each establish VRRP and RBS with BRAS 24, forming 3 device pairs, and the two BRASs in each device pair have a primary/standby relationship. Either BRAS in a device pair can judge the state of its peer according to VRRP and BFD, and determines whether the RBS is connected according to whether the route is reachable. These detection technologies take effect only when a link or a whole device fails; when the intermediate network loses packets severely, neither BRAS in a device pair can accurately judge the state of its peer.
Specifically, when the RBS repeatedly disconnects or BFD oscillates, the primary BRAS still considers itself the VRRP master and keeps that role unchanged, while the VRRP state of the standby BRAS oscillates continuously because BFD oscillates and because the VRRP heartbeats sent by the primary BRAS are received only occasionally. Based on this information, however, neither the primary BRAS nor the standby BRAS can determine whether the problem lies with the peer or with the network where the peer is located. The standby BRAS also cannot accurately judge whether it has a problem itself; only if the standby BRAS has non-RUI users and a large number of them drop offline can it learn that its own network is faulty.
In the following three scenarios, VRRP cannot perform correct processing. (1) The network where the primary BRAS is located loses packets, causing RBS oscillation. The RUI should switch from the primary BRAS to the standby BRAS, but because the link only loses packets and is not down, a VRRP switchover cannot be triggered. (2) The network where the standby BRAS is located is faulty and loses packets, causing RBS oscillation. The primary and standby states of the RUI remain unchanged and data should not be backed up to the standby BRAS, but in practice backup starts whenever the RBS recovers, which wastes resources. (3) A network fault between the primary BRAS and the standby BRAS causes RBS oscillation. In this case neither side can guarantee that links to all users are reachable: some users can reach the primary BRAS and others can reach the standby BRAS, and the RUI should keep the services of all users available.
therefore, the VRRP can only make a correct backup selection when the whole machine fails or the link fails, but the VRRP may make an erroneous judgment when the whole machine or the link fails, which may result in a waste of resources.
In a multi-machine hot backup scenario, the links between the multiple active devices and the single standby device may be degraded to different extents, but detection between two devices alone cannot determine which device is causing the link problem. To make the correct backup selection in a multi-machine hot backup scenario, it is therefore first necessary to correctly identify the device that causes the link problem. In an N:1 multi-machine hot backup scenario, when an active BRAS finds that the packet loss rate between itself and the other active BRASs is very low but the packet loss rate between itself and the standby BRAS is very high, the standby BRAS is the device causing the link problem. When all the other active BRASs and the standby BRAS find that the packet loss rate between themselves and one particular active BRAS is very high, that active BRAS is the device causing the link problem. A performance detection protocol can therefore be configured between all the BRASs so that, when VRRP oscillation or RBS oscillation occurs, these packet loss rates can be compared to identify the device causing the link problem.
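The comparison of packet loss rates described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `find_suspects`, the helper `link`, the device names, and the 5% loss threshold are hypothetical, and the rule "a device is suspect when every measured link that touches it is lossy" is a simplification of the reasoning in the text, not the claimed method.

```python
from typing import Dict, FrozenSet, List


def link(a: str, b: str) -> FrozenSet[str]:
    """Hypothetical helper: an unordered pair of device names identifies a link."""
    return frozenset((a, b))


def find_suspects(loss: Dict[FrozenSet[str], float], threshold: float = 0.05) -> List[str]:
    """Return the devices for which every measured link exceeds the loss threshold.

    The threshold value is an assumption chosen for illustration.
    """
    devices = sorted({name for pair in loss for name in pair})
    suspects = []
    for device in devices:
        rates = [rate for pair, rate in loss.items() if device in pair]
        if rates and all(rate > threshold for rate in rates):
            suspects.append(device)
    return suspects


# Three active BRASs (A1..A3) and one standby BRAS (S); the loss rates are made up.
standby_faulty = {
    link("A1", "A2"): 0.01, link("A1", "A3"): 0.00, link("A2", "A3"): 0.01,
    link("A1", "S"): 0.30, link("A2", "S"): 0.40, link("A3", "S"): 0.35,
}
active_faulty = {
    link("A1", "A2"): 0.30, link("A2", "A3"): 0.40, link("A2", "S"): 0.50,
    link("A1", "A3"): 0.00, link("A1", "S"): 0.01, link("A3", "S"): 0.01,
}

print(find_suspects(standby_faulty))
print(find_suspects(active_faulty))
```

In the first data set only the links to the standby BRAS are lossy, and in the second only the links to one active BRAS are lossy, which corresponds to the two cases distinguished above.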
Based on the above concept, an embodiment of the present invention provides a hot backup method that can accurately determine the device causing the link failure and then perform hot backup processing reasonably.
Fig. 3 is a flowchart of a hot backup method according to an embodiment of the present invention. The hot backup method is applied where the ratio of the number of active devices to the number of standby devices is N:1, where N is greater than or equal to 2. The hot backup method provided by the embodiment of the invention is described in detail below with reference to fig. 3.
S301, the first active device obtains a first link quality level, where the first link quality level is used to indicate a link quality between the first active device and the standby device.
Specifically, the hot backup method provided in this embodiment is applied to a multi-machine hot backup system including N active devices and one standby device, where N is greater than or equal to 2.
The hot backup method provided in this embodiment is executed by any one of the active devices, which is referred to as the first active device. First, the first active device needs to acquire the first link quality level, which represents the link quality between the first active device and the standby device. The first link quality level may be represented by various parameters, such as the packet loss rate or the data throughput rate between the first active device and the standby device. From the first link quality level, the first active device can know whether the link between itself and the standby device is normal. The first link quality level may be divided into several classes, such as normal, abnormal, and unavailable, where unavailable is the state in which the first active device cannot acquire the first link quality level. Alternatively, the first link quality level may be a numerical value or a ratio, with a corresponding value range set for each of the normal, abnormal, and unavailable states; these are not illustrated one by one here.
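As a minimal sketch of how such a classification might look, assuming the link quality level is derived from a measured packet loss rate, the following Python maps a loss rate to one of the three classes. The names `LinkQuality` and `classify` and the 5% threshold are illustrative assumptions and are not specified by this embodiment.

```python
from enum import Enum
from typing import Optional


class LinkQuality(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    UNAVAILABLE = "unavailable"


def classify(loss_rate: Optional[float], abnormal_threshold: float = 0.05) -> LinkQuality:
    """Map a measured packet loss rate (0.0 to 1.0) to a link quality level.

    None means that no measurement could be obtained, which this sketch
    treats as the unavailable state. The threshold is an assumed value.
    """
    if loss_rate is None:
        return LinkQuality.UNAVAILABLE
    if loss_rate > abnormal_threshold:
        return LinkQuality.ABNORMAL
    return LinkQuality.NORMAL


for sample in (0.01, 0.20, None):
    print(sample, classify(sample).value)
```

A throughput-based level could be classified in the same way with a different value range per state.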
S302, when the first link quality level indicates that the link is abnormal, the first active device obtains N-1 Xi-th link quality levels between the first active device and other N-1 active devices, wherein the Xi-th link quality level is used for indicating the link quality between the first active device and the ith active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N.
Specifically, after the first active device obtains the first link quality level, if the first link quality level indicates that the link is normal, it means that the link state between the first active device and the standby device is normal, and the processing may be performed according to a normal hot backup process.
When the first link quality level indicates that the link is abnormal, it means that there is a problem in the link between the first active device and the standby device, and the first active device needs to determine whether the link problem is caused by a failure of the standby device or the link problem is caused by a failure of the first active device.
For example, the first active device first obtains the N-1 Xi-th link quality levels between itself and the other N-1 active devices, where the Xi-th link quality level indicates the link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N. Because the system contains N-1 active devices in addition to the first active device, the first active device obtains N-1 Xi-th link quality levels. Let W of the N-1 Xi-th link quality levels indicate an abnormal link and S of them indicate a normal link; then the sum of W and S is N-1.
S303, the first active device sends a request message to each of the other N-1 active devices, where the request message is used to request that active device to send the link quality level between itself and the standby device.
Specifically, the first active device needs to know the link quality levels between the other N-1 active devices and the standby device, that is, the link states between the other N-1 active devices and the standby device. The first active device therefore sends a request message to each of the other N-1 active devices.
It should be noted that the abnormal state of the link between the first active device and the standby device may be caused by a failure of the first active device, in which case the links between the first active device and the other N-1 active devices may also be abnormal. The request message sent by the first active device is therefore not necessarily received by all of the other N-1 active devices.
S304, the first active device respectively receives M Yj-th link quality levels between the j-th active device and the standby device, which are sent by M active devices among the other N-1 active devices, wherein the Yj-th link quality level is used for representing the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1.
Specifically, after sending the request message to the other N-1 active devices, the first active device receives the Yj-th link quality level sent by the j-th active device, where the Yj-th link quality level indicates the link quality between the j-th active device and the standby device. Because the links between the first active device and the other N-1 active devices may themselves have problems, the first active device may fail to receive the Yj-th link quality level from some of the other N-1 active devices, that is, it may receive fewer than N-1 link quality levels. Suppose the first active device receives M link quality levels in total, where M is less than or equal to N-1. Let P of them indicate an abnormal link and Q of them indicate a normal link; then the sum of P and Q is less than or equal to N-1.
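The counts W, S, P and Q introduced above can be tallied as in the following sketch. It assumes, for illustration only, that each level is reported as the string "normal" or "abnormal" and that an active device whose reply never arrives is simply absent from the received set; the function name `tally` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally(levels: Iterable[str]) -> Tuple[int, int]:
    """Return (number of abnormal levels, number of normal levels)."""
    counts = Counter(levels)
    return counts["abnormal"], counts["normal"]


# N = 4: the first active device is device 1; devices 2..4 are the other actives.
x_levels = {2: "abnormal", 3: "abnormal", 4: "normal"}  # measured by device 1
y_levels = {3: "normal", 4: "normal"}                   # device 2 never replied, so M = 2

W, S = tally(x_levels.values())
P, Q = tally(y_levels.values())
print(W, S, P, Q)
```

Here W + S equals N-1 because the first active device measures all N-1 links itself, while P + Q equals M and may be smaller than N-1.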
S305, when the number of link quality levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is less than the number of the N-1 Xi-th link quality levels indicating a link abnormality, and the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, and the service of the first active device is damaged, the first active device switches the service to the standby device.
For example, the M Yj-th link quality levels represent the link quality levels between each of the M active devices and the standby device. If M is greater than 3, the M Yj-th link quality levels are the Y2-th link quality level, the Y3-th link quality level, …, and the Y(M+1)-th link quality level, where the Y2-th link quality level represents the link state from the second active device to the standby device, the Y3-th link quality level represents the link state from the third active device to the standby device, and the Y(M+1)-th link quality level represents the link state from the (M+1)-th active device to the standby device. The M Yj-th link quality levels for other values of M are not illustrated one by one here.
For example, the N-1 Xi-th link quality levels represent the link quality levels between each of the other N-1 active devices and the first active device. If N is greater than 3, the N-1 Xi-th link quality levels are the X2-th link quality level, the X3-th link quality level, …, and the XN-th link quality level, where the X2-th link quality level represents the link state between the first active device and the second active device, the X3-th link quality level represents the link state between the first active device and the third active device, and the XN-th link quality level represents the link state between the first active device and the N-th active device. The N-1 Xi-th link quality levels for other values of N are not illustrated one by one here.
Specifically, after the first active device obtains the first link quality level, the N-1 Xi-th link quality levels, and the M Yj-th link quality levels, it can determine which device caused the link problem between the first active device and the standby device. The first condition is that the number of link quality levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is less than the number of the N-1 Xi-th link quality levels indicating a link abnormality, that is, P + 1 < W. The 1 is added to P because the first active device itself also considers that the link quality level between itself and the standby device indicates a link abnormality. In other words, fewer of the standby device's links are reported as abnormal than of the first active device's links. The second condition is that the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, that is, Q ≥ S. In other words, at least as many of the standby device's links are reported as normal as of the first active device's links. Taken together, the standby device's links are less likely to be abnormal, and at least as likely to be normal, as the first active device's links.
In addition, the first active device must also find that its own service is damaged. When all of these conditions hold, the first active device determines that the problem on the link between the first active device and the standby device is caused by its own fault, and it therefore switches the service to the standby device.
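The decision made by the first active device in S305 can be sketched as below. This is an illustrative reading of the conditions P + 1 < W, Q ≥ S and "service damaged", not a definitive implementation; the function name, the string encoding of the levels, and the boolean `service_damaged` input are assumptions.

```python
from typing import List


def should_switch_to_standby(first_level: str,
                             x_levels: List[str],
                             y_levels: List[str],
                             service_damaged: bool) -> bool:
    """Return True when the first active device should hand its service over.

    first_level: level of the link between the first active device and the standby.
    x_levels:    the N-1 levels between the first active device and the other actives.
    y_levels:    the M received levels between other actives and the standby.
    """
    if first_level != "abnormal":
        return False  # the link is fine, so normal hot backup continues
    W = x_levels.count("abnormal")
    S = x_levels.count("normal")
    P = y_levels.count("abnormal")
    Q = y_levels.count("normal")
    # The +1 accounts for the first active device's own abnormal link to the standby.
    return P + 1 < W and Q >= S and service_damaged


# N = 4: two of the first active device's three peer links are abnormal,
# while both replies received about the standby device report a normal link.
x = ["abnormal", "abnormal", "normal"]
y = ["normal", "normal"]
print(should_switch_to_standby("abnormal", x, y, service_damaged=True))
print(should_switch_to_standby("abnormal", x, y, service_damaged=False))
```

The second call differs only in that the service is not damaged, which is the case where the embodiment keeps the service on the first active device.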
In addition to evaluating the link quality levels, the first active device switches the service only when it determines that its own service is damaged. The reason is as follows: in an N:1 multi-machine hot backup system, the standby device is normally in the standby state, the service is switched to it only when the first active device really has a problem, and the standby device is not suited to remaining in the network as an active device for a long time. If the service of the first active device is not damaged, the first active device can still provide network service for its users; the possible problems on the links between the first active device and the other active devices, and on the link between the first active device and the standby device, affect only the hot backup of data. To ensure that users in the system are not switched frequently and that the system remains stable, the service of the first active device is switched to the standby device only when that service is damaged and the link problem between the first active device and the standby device is caused by a failure of the first active device. The first active device may determine whether its own service is damaged by detecting whether accessed users have gone offline, whether they have had no data transmission for a long time, or whether their data transmission with other accessed users is normal. In other words, the first active device determines that its own service is damaged when it detects that accessed users have gone offline, have had no data transmission for a long time, or have abnormal data transmission with other accessed users.
The first active device may use a conventional service switching method to complete the service switching to the standby device, which is not described herein again.
In the hot backup method provided by this embodiment, the first active device collects the link states between itself and each of the other active devices, as well as the link states between each of the other active devices and the standby device. This allows the cause of the link problem between the first active device and the standby device to be determined accurately, so that more reasonable hot backup processing can be performed, erroneous hot backup processing caused by packet loss on the link between the first active device and the standby device is avoided, and system resources are saved.
Further, in the embodiment shown in fig. 3, after the first active device switches the service to the standby device, the first active device transitions to the standby device state, and the standby device transitions to the active device state. At this time, the first active device may also start a preset silence time window, within which the first active device will not switch back to the active device state. Frequent switching between the active device and the standby device can thereby be avoided.
Further, after the silence time window expires, the first active device again obtains the first link quality level between the first active device and the standby device, and if the first link quality level is normal, the first active device notifies the standby device to switch the service back to the first active device. The length of the silence time window can be set according to an empirical value, and can generally be set to the mean time to repair of the system.
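The switch-back behaviour guarded by the silence time window can be sketched as a small state holder. The class and method names are hypothetical, the current time is passed in explicitly so the example is deterministic, and the choice to simply remain in the standby state when the link is still abnormal after the window is an assumption, since the text does not describe that case.

```python
from typing import Optional


class FirstActiveDevice:
    """Illustrative model of the first active device's role and silence window."""

    def __init__(self, silence_seconds: float) -> None:
        self.silence_seconds = silence_seconds  # e.g. the system's mean time to repair
        self.state = "active"
        self.silence_until: Optional[float] = None

    def switch_to_standby(self, now: float) -> None:
        """Hand the service over and start the silence time window."""
        self.state = "standby"
        self.silence_until = now + self.silence_seconds

    def maybe_switch_back(self, now: float, first_link_level: str) -> bool:
        """Return True if the standby device should be told to hand the service back."""
        if self.state != "standby" or self.silence_until is None:
            return False
        if now < self.silence_until:
            return False  # still inside the silence window: never switch back
        if first_link_level == "normal":
            self.state = "active"
            return True
        return False  # assumed behaviour: stay in the standby state and re-check later


device = FirstActiveDevice(silence_seconds=30.0)
device.switch_to_standby(now=100.0)
print(device.maybe_switch_back(now=110.0, first_link_level="normal"))
print(device.maybe_switch_back(now=131.0, first_link_level="normal"))
```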
Further, in the embodiment shown in fig. 3, the first active device may trigger the switching of the service to the standby device as follows: the first active device lowers its access priority or performs initialization, which triggers the user equipment accessing the first active device to switch to the standby device. That is, after the first active device determines that the link problem between itself and the standby device is caused by its own failure, it may actively lower its access priority so that the user equipment accessing it switches to the standby device, thereby completing the service switching. Alternatively, the first active device may perform initialization, which likewise causes the user equipment accessing it to switch to the standby device, thereby completing the service switching. When lowering the access priority, the first active device must lower it below the access priority of the standby device, so that users originally accessing the first active device can switch to the standby device.
Further, after the first active device lowers its access priority or performs initialization, the hot backup method provided in this embodiment further includes: the first active device sends the standby device a message indicating that the first active device has lowered its access priority or performed initialization. The standby device thereby learns that the first active device has lowered its priority or performed initialization, and can prepare for access by the user equipment that originally accessed the first active device. It should be noted that the standby device may not receive this message because of the problem on the link between the first active device and the standby device. However, because the first active device has already lowered its access priority, the switching of the user equipment can still be completed even if the standby device does not receive the message.
Fig. 4 is a flowchart of a hot backup method according to an embodiment of the present invention. The hot backup method provided in this embodiment, on the basis of the method provided in the embodiment shown in fig. 3, may further include:
S401, when the number of link quality levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a link abnormality, or when the number of the M Yj-th link quality levels indicating a normal link is less than the number of the N-1 Xi-th link quality levels indicating a normal link, or when the service of the first active device is not damaged, the first active device does not back up data to the standby device within a preset silence time window.
For example, after the first active device obtains the first link quality level, the N-1 Xi-th link quality levels, and the M Yj-th link quality levels, it may make the following determination. If the number of link quality levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a link abnormality, that is, P + 1 ≥ W (the 1 is added because the first active device itself also considers that the link quality level between itself and the standby device indicates a link abnormality), then at least as many of the standby device's links are reported as abnormal as of the first active device's links. The first active device may then consider that the link state between the standby device and the other N-1 active devices is inferior to the link state between the first active device and the other N-1 active devices, and it may determine that the link problem between the first active device and the standby device is caused by a failure of the standby device.
For example, if the number of the M Yj-th link quality levels indicating a normal link is smaller than the number of the N-1 Xi-th link quality levels indicating a normal link, that is, Q < S, then fewer of the standby device's links are reported as normal than of the first active device's links. The first active device may then determine that the link state from the standby device to the other N-1 active devices is inferior to the link state from the first active device to the other N-1 active devices, and it may likewise determine that the link problem between the first active device and the standby device is caused by a failure of the standby device. That is, the collective vote of the devices is that the standby device's links are at least as likely to be abnormal as the first active device's links, or that the standby device's links are less likely to be normal than the first active device's links.
When the first active device determines that the link problem between the first active device and the standby device is caused by a failure of the standby device, continuing to send backup data to the standby device may cause the backup to fail or to be repeated because of link packet loss and similar reasons, which wastes resources. The first active device therefore keeps the active device state and does not perform hot backup, that is, it does not back up data to the standby device.
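As stated, the condition of S401 is the logical complement of the condition of S305, so exactly one of the two branches applies whenever the first link quality level indicates an abnormal link. The following sketch checks this exhaustively over small counts. The function names are hypothetical and the predicates take the counts P, Q, W and S directly.

```python
from itertools import product


def s305_switch(P: int, Q: int, W: int, S: int, service_damaged: bool) -> bool:
    """S305: switch the service to the standby device."""
    return (P + 1 < W) and (Q >= S) and service_damaged


def s401_hold_backup(P: int, Q: int, W: int, S: int, service_damaged: bool) -> bool:
    """S401: keep the active role and do not back up within the silence window."""
    return (P + 1 >= W) or (Q < S) or (not service_damaged)


# Any case where both branches, or neither branch, would apply is a mismatch.
mismatches = [
    case
    for case in product(range(5), range(5), range(5), range(5), (False, True))
    if s305_switch(*case) == s401_hold_backup(*case)
]
print(len(mismatches))
```

The check follows from De Morgan's laws: negating a conjunction of three conditions gives the disjunction of their negations, which is how S401 is worded.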
Further, the first active device may also start a silence time window, within which the first active device does not perform hot backup processing, that is, does not back up data to the standby device. After the silence time window expires, the first active device may perform the hot backup processing again according to the embodiment shown in fig. 3. The length of the silence time window can be set according to an empirical value, and can generally be set to the mean time to repair of the system.
Further, in the embodiment shown in fig. 4, after the first active device determines that the link state from the standby device to the other N-1 active devices is inferior to the link state from the first active device to the other N-1 active devices, the method further includes: the first active device sends the silence time window to the standby device, where the silence time window is used to make the standby device not process any backup data received within the silence time window.
Specifically, when the first active device determines that the link problem between the first active device and the standby device is caused by a failure of the standby device, the first active device decides to keep the active device state. After starting the silence time window, the first active device may further send the silence time window to the standby device. On receiving the silence time window, the standby device determines that the first active device will not perform hot backup within that window, and it therefore does not process backup data even if it receives any.
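On the standby side, ignoring backup data inside a received silence time window might look like the sketch below. The class name, the per-device bookkeeping, and the representation of the window as a duration in seconds are assumptions made for illustration; the embodiment does not specify the message format.

```python
from typing import Dict, List, Tuple


class StandbyBackupReceiver:
    """Illustrative standby-side handler for silence windows and backup data."""

    def __init__(self) -> None:
        self.silence_until: Dict[str, float] = {}  # active device id -> window end time
        self.stored: List[Tuple[str, bytes]] = []

    def on_silence_window(self, active_id: str, now: float, duration: float) -> None:
        """Record the silence time window announced by an active device."""
        self.silence_until[active_id] = now + duration

    def on_backup_data(self, active_id: str, now: float, record: bytes) -> bool:
        """Store the record unless the sender's silence window is still open."""
        if now < self.silence_until.get(active_id, float("-inf")):
            return False  # inside the window: received backup data is not processed
        self.stored.append((active_id, record))
        return True


receiver = StandbyBackupReceiver()
receiver.on_silence_window("active-1", now=0.0, duration=60.0)
print(receiver.on_backup_data("active-1", now=10.0, record=b"session-a"))
print(receiver.on_backup_data("active-2", now=10.0, record=b"session-b"))
print(receiver.on_backup_data("active-1", now=61.0, record=b"session-c"))
```

Keeping the window per active device means that backup data from other active devices, whose links to the standby device may be healthy, is still accepted.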
Further, the method provided by the embodiment corresponding to fig. 3 or the method provided by the embodiment corresponding to fig. 4 further includes: within a waiting time window that begins when the request message is sent to each of the other N-1 active devices, the first active device suspends sending backup data to the standby device.
Specifically, after the first active device sends the request message to the other N-1 active devices, it has not yet determined which device's failure is causing the link problem between the first active device and the standby device, so the first active device may suspend sending backup data to the standby device within a preset waiting time window. This avoids the resource waste that would result if the standby device has failed and the first active device continued to back up data to it.
Fig. 5 is a flowchart of a hot backup method according to an embodiment of the present invention. The hot backup method provided by this embodiment is applied to the case where the number ratio of the active device to the standby device is N: 1, wherein N is greater than or equal to 2. The method provided by the embodiment of the invention is explained below with reference to fig. 5.
S501, the standby device obtains a first link quality level, where the first link quality level is used to indicate a link quality between the first active device and the standby device.
The hot backup method provided by this embodiment is executed by the standby device. For the meaning of the first link quality level, refer to the corresponding content in the embodiment corresponding to fig. 3, which is not described herein again.
S502, when the first link quality level indicates that the link is abnormal, the standby device obtains N-1 Yj-th link quality levels between the standby device and each of the other N-1 active devices except the first active device, wherein the Yj-th link quality level is used for representing the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N.
Specifically, after the standby device obtains the first link quality level, if the first link quality level indicates that the link is normal, it means that the link state between the first active device and the standby device is normal, and the processing may be performed according to a normal hot backup process.
When the first link quality level indicates that the link is abnormal, it means that the link between the first active device and the standby device has a problem, and the standby device needs to determine whether the link problem is caused by its own failure or by a failure of the first active device. The standby device obtains the N-1 Yj-th link quality levels between itself and the other N-1 active devices, where the Yj-th link quality level indicates the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N. The meaning of the N-1 Yj-th link quality levels is the same as in the embodiment corresponding to fig. 3, and is not described herein again. Let p of the N-1 Yj-th link quality levels indicate an abnormal link and q of them indicate a normal link; then p + q = N-1.
S503, the standby device sends a request message to each of the other N-1 active devices, where the request message is used to request that active device to send the link quality level between itself and the first active device.
Specifically, the standby device obtains the link quality levels between the other N-1 active devices and the first active device by sending the request message.
It should be noted that the abnormal state of the link between the first active device and the standby device may be caused by a failure of the standby device, and therefore the request message sent by the standby device is not necessarily received by all of the other N-1 active devices.
S504, the standby device respectively receives M Xi-th link quality levels between the i-th active device and the first active device, which are sent by M active devices among the other N-1 active devices, wherein the Xi-th link quality level is used for representing the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1.
Specifically, the meaning of the M Xi-th link quality levels is the same as in the embodiment corresponding to fig. 3, and is not described herein again. After the standby device sends the request message, it receives the link quality levels, sent by the other active devices, between each of those active devices and the first active device. Because the links between the standby device and the other N-1 active devices may themselves have problems, the standby device may not receive all N-1 link quality levels sent by the other N-1 active devices. Suppose the standby device receives M Xi-th link quality levels in total, where M is less than or equal to N-1. Let w of the Xi-th link quality levels indicate an abnormal link and s of them indicate a normal link; then w + s is less than or equal to N-1.
S505, when the number of link quality levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is less than the number of the M Xi-th link quality levels indicating a link abnormality, and the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, the standby device raises its access priority, and the raised access priority is used for indicating the users of the first active device to switch to the standby device.
Specifically, after the standby device obtains the first link quality level, the N-1 Yj-th link quality levels, and the M Xi-th link quality levels, it can determine which device caused the link problem between the first active device and the standby device. The first condition is that the number of link quality levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is less than the number of the M Xi-th link quality levels indicating a link abnormality, that is, p + 1 < w. The 1 is added to p because the standby device itself also considers that the link quality level between itself and the first active device indicates a link abnormality. In other words, fewer of the standby device's links are reported as abnormal than of the first active device's links. The second condition is that the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, that is, q ≥ s. In other words, at least as many of the standby device's links are reported as normal as of the first active device's links. That is, the collective vote of the devices is that the standby device's links are less likely to be abnormal, and at least as likely to be normal, as the first active device's links.
The standby device will determine that the link problem between the first active device and the standby device is due to the failure of the first active device, and therefore the standby device will determine that the service of the first active device needs to be switched to the standby device.
For example, the standby device may switch the service of the first active device to itself by raising its access priority. When the standby device raises its access priority, user equipment that originally accessed the first active device can be switched to the standby device. The access priority must be raised above the access priority of the first active device, so that users originally accessing the first active device switch to the standby device.
In the hot backup method provided by this embodiment, the link states between the standby device and each of the other primary devices are counted, and the link states between each of the primary devices and the first primary device are counted, so that the cause of the link problem between the first primary device and the standby device can be accurately determined, more reasonable hot backup processing can be performed, an erroneous hot backup processing due to the loss of the link between the first primary device and the standby device is avoided, and system resources are saved.
Fig. 6 is a flowchart of a hot backup method according to an embodiment of the present invention, where the hot backup method according to the embodiment further includes, on the basis of the method according to the embodiment of fig. 5:
S601, when the number of link quality levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is greater than or equal to the number indicating a link abnormality among the M Xi-th link quality levels, or the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link, the standby device suspends receiving the backup data sent by the first active device within a preset silence time window.
Specifically, after the standby device obtains the first link quality level, the N-1 Yj-th link quality levels, and the M Xi-th link quality levels, it may make the following determination. If the number of link quality levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is greater than or equal to the number indicating a link abnormality among the M Xi-th link quality levels, that is, p + 1 ≥ w (the 1 is added because the standby device itself also considers the link between itself and the first active device abnormal), then at least as many links of the standby device are reported abnormal as links of the first active device. The standby device can therefore consider that the link state between itself and the other N-1 active devices is worse than the link state between the first active device and the other N-1 active devices, and can determine that the link problem between the first active device and the standby device is caused by a failure of the standby device.
Likewise, if the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link, that is, q < s, then fewer links of the standby device are reported normal than links of the first active device. The standby device can again consider that its link state to the other N-1 active devices is worse than that of the first active device, and can determine that the link problem between the first active device and the standby device is caused by a failure of the standby device. In other words, the links of the standby device are at least as likely to be abnormal as the links of the first active device, or are less likely to be normal than the links of the first active device.
When the standby device determines that the link problem between the first active device and the standby device is caused by a failure of the standby device, continuing to receive the backup data sent by the first active device may lead to failed or repeated backups because of link packet loss and similar causes, which wastes resources. The standby device therefore suspends receiving the backup data sent by the first active device. Further, the standby device may start a silence time window and suspend receiving the backup data sent by the first active device within that window. After the silence time window expires, the standby device may perform the hot backup processing of the embodiment shown in fig. 5 again. The length of the silence time window can be set according to an empirical value, and can generally be set to the mean time to repair of the system.
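The standby-side vote of S505 and S601 can be sketched as follows. This is only an illustrative sketch: the function name, the enum, and the use of boolean lists (True meaning "this link quality level indicates an abnormal link") are assumptions introduced here, not part of the described embodiment.

```python
from enum import Enum


class StandbyAction(Enum):
    RAISE_ACCESS_PRIORITY = "S505: raise access priority"
    SUSPEND_BACKUP_RECEPTION = "S601: suspend receiving backup data"


def standby_vote(yj_abnormal, xi_abnormal):
    """Decide the standby device's action from the collected link quality levels.

    yj_abnormal: N-1 booleans, one per other active device j; True means the
                 link between the standby device and device j is abnormal.
    xi_abnormal: M booleans, one per responding active device i; True means the
                 link between device i and the first active device is abnormal.
    """
    p = sum(yj_abnormal)             # abnormal links seen by the standby device
    q = len(yj_abnormal) - p         # normal links seen by the standby device
    w = sum(xi_abnormal)             # abnormal links towards the first active device
    s = len(xi_abnormal) - w         # normal links towards the first active device

    # The "+ 1" counts the (already abnormal) link between the standby device
    # and the first active device itself.
    if p + 1 < w and q >= s:
        return StandbyAction.RAISE_ACCESS_PRIORITY
    # Otherwise p + 1 >= w or q < s, which is the S601 condition.
    return StandbyAction.SUSPEND_BACKUP_RECEPTION
```

For instance, with three other active devices that all see a healthy standby device while two of them report a degraded link to the first active device, the sketch returns the S505 action.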
Further, in the embodiment shown in fig. 5 or fig. 6, the standby device determines, according to the link states between itself and each active device, whether to switch the service of the first active device to the standby device. However, the first active device may also be the one that determines whether to start service switching. After the first active device determines that the link problem between the first active device and the standby device is caused by the standby device and that no service switching is to be performed, it may send a silence time window to the standby device. On receiving the silence time window, the standby device learns that the first active device has decided not to switch services and has attributed the link problem to a failure of the standby device, and the standby device then does not process backup data received within the silence time window. That is, the standby device does not need to make any judgment of its own in this case. When the silence time window expires, the standby device may proceed according to the method of the embodiment shown in fig. 5 or fig. 6.
In addition, when the first active device is the one that determines whether to start service switching, and it determines that the link problem between the first active device and the standby device is caused by the first active device and that service switching is to be performed, it may send the standby device a message indicating that the first active device has lowered its access priority or is initializing. On receiving this message, the standby device learns that the first active device has decided that service switching is required, and the standby device can then accept the service switching of the first active device directly, without making any judgment of its own.
Further, in the hot backup method provided by the embodiments shown in fig. 3 to fig. 6, the active device and the standby device are BRAS, and the first active device performs hot backup to the standby device through VRRP. The hot backup method provided by the embodiment of the invention is further explained below by taking BRAS as an example of hot backup through VRRP.
In order to apply the hot backup method provided by this embodiment under the existing system architecture shown in fig. 1A and fig. 1B, additional functional modules need to be deployed in the BRAS-based system. First, performance detection is deployed between the BRASs, pairwise: between every two active BRASs, and between each active BRAS and its corresponding standby BRAS. An "inter-BRAS link quality detection" module is added to the system and a detection period D is set, for example D = 10 seconds. Every period D, the module collects the packet loss detection result (mainly the throughput index) of IP Flow Performance Measurement (IP FPM) and converts it into a packet loss ratio (DropPacketRate) between 0 and 100; the larger the value, the more serious the packet loss.
A "BRAS terminal offline rate detection" module is added to the system. A statistical period P is set, for example P = 60 seconds, and a detection period E is set, for example E = 10 seconds. Every period E, the module calculates the average number of timeout-detected terminal offline events per second (DownRate) over the period P. The system sets a maximum offline rate (MaxDownRate), for example 200 sessions (Session) per second. An offline health index (ClientDownHealthTarget) is then derived as ClientDownHealthTarget = DownRate × 100 / MaxDownRate; its value ranges from 0 to 100, and the larger the value, the more serious the offline condition.
An "RBS oscillation detection" module is added to the system. A statistical period Q is set, for example 1800 seconds (30 minutes), and a detection period F is set, for example 10 seconds. Every period F, the module calculates the average RBS oscillation frequency (RbsShakeRate) per fixed time (for example, every 300 seconds) over the period Q, where a transition from Down to Up or from Up to Down counts as one oscillation. RbsShakeRate ranges, for example, from 0 to 25 (300/12). A maximum oscillation frequency (MaxRbsShakeRate) is set, for example 300 (calculated as 1 oscillation per second). The RBS health index (RBSHealthTarget) is derived as RBSHealthTarget = RbsShakeRate × 100 / MaxRbsShakeRate and takes a value between 0 and 100; the larger the value, the less healthy the RBS.
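The two rate-based indices can be sketched as below. The RBS formula is taken from the text; the offline index is assumed here to follow the same shape (rate × 100 / maximum rate), and the clamping to the stated 0 to 100 range is also an assumption. All function names are hypothetical.

```python
def scaled_index(rate, max_rate):
    """Scale a measured rate to an index in [0, 100] relative to its maximum.

    Clamping to [0, 100] is an assumption; the text only states the range.
    """
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    return max(0.0, min(100.0, rate * 100.0 / max_rate))


def client_down_health_target(down_rate, max_down_rate=200.0):
    """Offline index; 200 sessions/second is the example MaxDownRate."""
    return scaled_index(down_rate, max_down_rate)


def rbs_health_target(rbs_shake_rate, max_rbs_shake_rate=300.0):
    """RBS index; 300 is the example MaxRbsShakeRate."""
    return scaled_index(rbs_shake_rate, max_rbs_shake_rate)
```

With the example maxima, an average of 50 timed-out sessions per second gives an offline index of 25, and larger values of either index mean a less healthy device.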
In addition, a new Device Health Advertisement Protocol (DHAP) is defined, and the contents of the protocol are shown in table 1.
TABLE 1 DHAP protocol
In table 1, the version number (Version), role (Role), message type (MessageType), and advertisement period (Interval) fields each occupy 1 byte, the health index (HealthTarget) and reserved (Reserved) fields each occupy 2 bytes, and the system identifier (SystemID) field occupies 4 bytes.
The Version field is the version number of the protocol; for example, the value is 1 for the current, first version. The Role field is the role of the device; for example, the value 1 is an active device and the value 2 is a standby device, set according to whether the device currently acts as an active device or a standby device in the system. The MessageType field is the message type; for example, the value 1 is a health advertisement (Health Advertisement), and other types are possible. Interval is the advertisement period in seconds, with a default of 1 and a range of 1 second to 255 seconds. Reserved is a reserved field. The SystemID field is the system ID, which uniquely identifies the device by an IP address of the device; for example, the source IP address used for RBS communication may be selected. The HealthTarget field is the health index of the device, with the value DropPacketRate + ClientDownHealthTarget + RBSHealthTarget; the smaller the value, the healthier the device, and the value range is 0 to 300.
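A minimal sketch of packing and parsing a DHAP message with the field sizes given above. Table 1 is not reproduced here, so the on-wire field order used below (Version, Role, MessageType, Interval, HealthTarget, Reserved, SystemID) and the network byte order are assumptions, as are the class and attribute names.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Assumed layout: four 1-byte fields, two 2-byte fields, one 4-byte field,
# in network byte order (12 bytes in total).
_DHAP_LAYOUT = struct.Struct("!BBBBHHI")


@dataclass
class DhapMessage:
    version: int = 1            # first protocol version
    role: int = 1               # 1 = active device, 2 = standby device
    message_type: int = 1       # 1 = health advertisement
    interval: int = 1           # advertisement period in seconds (1..255)
    health_target: int = 0      # 0..300, or 65535 for an unavailable peer
    system_id: str = "0.0.0.0"  # IPv4 address identifying the device

    def pack(self) -> bytes:
        return _DHAP_LAYOUT.pack(
            self.version, self.role, self.message_type, self.interval,
            self.health_target, 0,  # Reserved is sent as zero (assumption)
            int(ipaddress.IPv4Address(self.system_id)),
        )

    @classmethod
    def unpack(cls, data: bytes) -> "DhapMessage":
        version, role, msg_type, interval, health, _reserved, sid = _DHAP_LAYOUT.unpack(data)
        return cls(version, role, msg_type, interval, health,
                   str(ipaddress.IPv4Address(sid)))
```

A message built as `DhapMessage(role=2, health_target=120, system_id="10.0.0.1")` packs to 12 bytes and round-trips through `unpack` unchanged.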
The transport layer of the DHAP Protocol may be, for example, a User Datagram Protocol (UDP) or a Transmission Control Protocol (TCP), and is implemented according to the UDP in the embodiment of the present invention.
Detection parameters for detecting the peer device need to be designed for the DHAP protocol, including an advertisement period (Advertisement_Interval), a skew time (Skew_Time), a peer offline detection period (PeerDown_Interval), and the like. The advertisement period is the period at which DHAP protocol packets are advertised, in seconds; for example, it defaults to 1 second and can be configured from 1 second to 255 seconds. The skew time is an adjustment to the peer offline detection period, in seconds, and its value can be, for example, (256 - HealthTarget)/256. The peer offline detection period is the period for detecting that the peer device is offline, in seconds, and its value may be, for example, (3 × Advertisement_Interval) + Skew_Time.
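The two example formulas can be written out as a short sketch; the function names are hypothetical, and the formulas are applied exactly as given, without clamping.

```python
def skew_time(health_target):
    """Adjustment time in seconds: (256 - HealthTarget) / 256."""
    return (256 - health_target) / 256


def peer_down_interval(advertisement_interval, health_target):
    """Peer offline detection period in seconds:
    (3 x Advertisement_Interval) + Skew_Time."""
    return 3 * advertisement_interval + skew_time(health_target)
```

With the default 1-second advertisement period, a perfectly healthy device (HealthTarget 0) gives a detection period of 4 seconds, and a HealthTarget of 128 gives 3.5 seconds. Because HealthTarget can reach 300, the formula as written yields a small negative adjustment above 256.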
Two states may be defined for the DHAP protocol: an initialization (Initialization) state and a running (Running) state. In the initialization state, each device configures its neighbor information and then switches to the running state. In the running state, whenever a device receives an advertisement (Advertisement) message from a neighbor device, it immediately responds with an advertisement message of its own; that is, the peer's advertisement message is treated as a request. The device records the health index (HealthTarget) advertised by the neighbor device in a database and periodically monitors the state of the neighbor devices in the database. If no advertisement is received from a neighbor device before the detection period expires, the neighbor is regarded as unavailable and the peer health index (PeerHealthTarget) parameter is set to the maximum value 65535.
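The running-state bookkeeping described above might look like the following sketch. The class, its method names, and the explicit `now` timestamps are assumptions made for illustration; only the 65535 "unavailable" value and the expiry rule come from the text.

```python
UNAVAILABLE = 65535  # maximum PeerHealthTarget, marks an unavailable neighbor


class NeighborTable:
    """Tracks the last advertised HealthTarget of each neighbor device."""

    def __init__(self, peer_down_interval):
        self.peer_down_interval = peer_down_interval  # seconds
        self.health = {}      # system_id -> last known HealthTarget
        self.last_seen = {}   # system_id -> time of last advertisement

    def on_advertisement(self, system_id, health_target, now):
        """Record a neighbor's advertised health index."""
        self.health[system_id] = health_target
        self.last_seen[system_id] = now

    def sweep(self, now):
        """Periodic check: mark neighbors whose advertisements are overdue."""
        for system_id, seen in self.last_seen.items():
            if now - seen > self.peer_down_interval:
                self.health[system_id] = UNAVAILABLE
```

Passing the time in explicitly keeps the sketch deterministic; a real device would presumably drive `sweep` from a timer.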
The information collected by each device in the neighbor device health information database (HealthTargetDB) is defined as shown in table 2.
Table 2 HealthTargetDB information
The health information of the device is also recorded in the database.
The variable MySystemId is defined to record the system identifier (ID) of the device itself. The variable TotalDeviceNumber is defined to record the total number of devices, including the device itself. The health thresholds are defined as: Normal ≤ 50; 50 < Abnormal ≤ 100; Unavailable > 100. The terminal offline thresholds are defined as: Normal ≤ 100; Abnormal > 100. The RBS oscillation thresholds are defined as: Normal = 0; Abnormal > 2 (more than 2 per minute).
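As an illustration, the health thresholds can be turned into a small classifier. This sketch assumes the thresholds read as Normal up to and including 50, Abnormal above 50 up to and including 100, and Unavailable above 100; the function name and the returned strings are hypothetical.

```python
def classify_health(health_target):
    """Map a HealthTarget value to a health level using the example thresholds."""
    if health_target <= 50:
        return "Normal"
    if health_target <= 100:
        return "Abnormal"
    return "Unavailable"
```

Under this reading, a neighbor whose PeerHealthTarget was set to 65535 after a missed advertisement is classified as Unavailable.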
After the above modules and protocols are set in the system, the hot backup processing may be performed according to the method shown in fig. 7 or fig. 8, where fig. 7 is a processing flow of the active BRAS, and fig. 8 is a processing flow of the standby BRAS.
Fig. 7 is a flowchart of a hot backup method according to an embodiment of the present invention, where this embodiment is applied to a multi-machine hot backup scenario including N primary BRASs and one backup BRAS through VRRP hot backup, and an execution main body of this embodiment is any primary BRAS, which is referred to as a first primary BRAS. As shown in fig. 7, the hot backup method of the present embodiment includes:
S701, the first primary BRAS detects the HealthTarget of the backup BRAS. When the first primary BRAS finds that the HealthTarget of the backup BRAS has reached the abnormal (Abnormal) level, S702 is executed; otherwise, the first primary BRAS proceeds according to the existing hot backup process. The HealthTarget of the backup BRAS detected by the first primary BRAS is the first link quality level.
S702, the first primary BRAS compares the HealthTarget values between itself and the other N-1 primary BRASs and finds that the IP FPM index between itself and W devices is degraded and the IP FPM index between itself and S devices is normal, where W + S = N-1. The HealthTarget of the other N-1 primary BRASs detected by the first primary BRAS is the Xi-th link quality level.
S703, the first primary BRAS sends a first request message to the other N-1 primary BRASs, requesting each of them to answer whether its link to the backup BRAS is degraded, and advertises in the message the health from itself to the backup BRAS. It then waits for response messages within a time window Tw and suspends backing up data to the backup BRAS within Tw.
S704, within the waiting time window Tw, the first primary BRAS receives L response messages, of which P answer that the health from the peer to the backup BRAS is degraded and Q answer that the health from the peer to the backup BRAS is not degraded, where P + Q = L and L ≤ N-1. This amounts to a show-of-hands vote. The response messages received by the first primary BRAS are the Yj-th link quality levels.
S705, if (P + 1 ≥ W or Q < S), the links from the backup BRAS to the other primary BRASs are less stable than those of the first primary BRAS, and the vote is that the backup BRAS is faulty: VRRP goes silent and does not switch, a silence time window Ts is started, and data backup to the backup BRAS is suspended within Ts. Messages notifying VRRP silence are continuously sent to the backup BRAS (the backup BRAS may not receive them).
S706, if (P + 1 < W and Q ≥ S) and the first primary BRAS also finds its own health abnormal (Abnormal), meaning its service is damaged (for example, BRAS RUI detects users going offline, or ARP RUI detection reports unreachable), the links from the backup BRAS to the other primary BRASs are more stable than those of the first primary BRAS, and the vote is that the first primary BRAS itself is faulty: it immediately lowers its VRRP priority or initializes VRRP, and triggers VRRP/RUI switching. It sends the backup BRAS a VRRP heartbeat message carrying the current priority, or a VRRP initialization message (the backup BRAS may not receive it), and at the same time starts a silence time window Ts.
S707, in cases other than those of S705 and S706, the first primary BRAS keeps the VRRP active device state, starts a silence time window Ts, and does not back up data to the peer.
In S705 to S707, the first primary BRAS makes the following determination within the silence time window Ts. If the throughput between the first primary BRAS and the backup BRAS returns to normal, the silent state is released and VRRP negotiates with the peer in its current state; the first primary BRAS either keeps the active device state unchanged or switches back to the active device state after a delay. Otherwise, when the silence time window Ts expires, S701 is executed again.
S708, the RUI user backup is restored.
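The branch taken by the first primary BRAS in S705 to S707 can be summarized in a short sketch. The function name, the returned labels, and the boolean `self_abnormal` input are hypothetical; P, Q, W and S are the vote counts defined in S702 and S704.

```python
def active_bras_branch(P, Q, W, S, self_abnormal):
    """Pick the S705/S706/S707 branch for the first primary BRAS.

    P, Q: peers answering that their link to the backup BRAS is / is not degraded.
    W, S: peers whose link to the first primary BRAS is degraded / normal.
    self_abnormal: True if the first primary BRAS finds its own service damaged.
    """
    if P + 1 >= W or Q < S:
        # Backup BRAS voted faulty: stay silent, do not switch.
        return "S705"
    if self_abnormal:
        # First primary BRAS voted faulty and its service is damaged:
        # lower VRRP priority or initialize VRRP, trigger VRRP/RUI switching.
        return "S706"
    # Vote points at the first primary BRAS but its service is not damaged:
    # keep the active device state, do not back up data during Ts.
    return "S707"
```

In every branch a silence time window Ts is started, so the sketch only distinguishes what happens to the VRRP state.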
Fig. 8 is a flowchart of a hot backup method according to an embodiment of the present invention, where the embodiment is applied to a multi-machine hot backup scenario that includes N primary BRASs and one backup BRAS through VRRP hot backup, and the execution body of the embodiment is the backup BRAS. As shown in fig. 8, the hot backup method of the present embodiment includes:
S801, the standby BRAS detects whether the RBS health index (RBSHealthTarget) between itself and the first primary BRAS exceeds a threshold. When it exceeds the threshold, S802 is executed; otherwise, processing follows the existing hot backup process. The HealthTarget of the first primary BRAS detected by the standby BRAS is the first link quality level.
S802, if the standby BRAS receives the VRRP silence message sent by the first primary BRAS, the process goes to S810.
S803, if the standby BRAS receives a VRRP initialization message or a lowered-priority VRRP message sent by the first primary BRAS, the process goes to S811.
S804, the standby BRAS compares the IP FPM packet loss ratio (DropHealthTarget) between itself and the other N-1 primary BRASs and finds that the IP FPM index between itself and p devices is degraded and the IP FPM index between itself and q devices is normal, where p + q = N-1. The DropHealthTarget of the other N-1 primary BRASs detected by the standby BRAS is the Yj-th link quality level.
S805, the standby BRAS sends a second request message to the other N-1 primary BRASs, requesting each of them to answer whether its link to the first primary BRAS is degraded, and advertises in the message (through the HealthTarget field of the message) the health index (throughput degradation) from itself to the first primary BRAS. It then waits for response messages within a time window Tw and suspends receiving the backup data sent by the first primary BRAS within Tw.
S806, within the waiting time window Tw, the backup BRAS receives l response messages, of which w answer that the throughput from the peer to the first primary BRAS is degraded and s answer that the health index from the peer to the first primary BRAS is not degraded, where w + s = l and l ≤ N-1. This amounts to a show-of-hands vote. The response messages received by the standby BRAS are the Xi-th link quality levels.
S807, if (p + 1 ≥ w or q < s), the links from the backup BRAS to the other primary BRASs are less stable than those of the first primary BRAS, and the vote is that the backup BRAS is faulty: proceed to S810.
S808, if (p + 1 < w and q ≥ s), the links from the backup BRAS to the other primary BRASs are more stable than those of the first primary BRAS, and the vote is that the first primary BRAS is faulty: the VRRP priority of the backup BRAS is immediately raised and the process proceeds to S811.
S809, in cases other than those of S807 and S808, each device keeps its current VRRP state, whether the active device state or the standby device state, but a device in the active device state does not back up data to the peer.
S810, the VRRP of the backup BRAS transitions to the initialization state and is not processed even if the RBS backup message is received.
S811, the VRRP of the backup BRAS is raised from the backup device state to the active device state.
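The step dispatch on the backup BRAS (S802 to S811) can be sketched as a single function. The function name, the two boolean message flags, and the returned step labels are hypothetical; p, q, w and s are the vote counts from S804 and S806.

```python
def standby_bras_next_step(p, q, w, s,
                           got_silence_msg=False,
                           got_init_or_low_priority_msg=False):
    """Return the terminal step (S810 or S811) reached by the backup BRAS."""
    # S802: the first primary BRAS already decided not to switch.
    if got_silence_msg:
        return "S810"
    # S803: the first primary BRAS already decided to switch.
    if got_init_or_low_priority_msg:
        return "S811"
    # S807: backup BRAS voted faulty, VRRP goes to the initialization state.
    if p + 1 >= w or q < s:
        return "S810"
    # S808: first primary BRAS voted faulty, raise priority and take over.
    return "S811"
```

As written, the S807 and S808 conditions are logical complements, so the sketch never reaches the S809 fallback; S809 is left out here on that basis, which is an interpretation rather than something the text states.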
The embodiments in fig. 7 and fig. 8 are only one implementation in which BRASs perform hot backup through VRRP, and the hot backup method provided by the embodiments of the present invention is not limited thereto. The link quality level between BRASs may include at least one of a packet loss proportion, an overtime offline rate proportion, and a remote backup oscillation proportion; the abnormal level means that at least one of, or the sum of any of, the packet loss proportion, the overtime offline rate proportion, and the remote backup oscillation proportion exceeds a preset abnormal proportion threshold.
Fig. 9 is a schematic structural diagram of a first active device according to an embodiment of the present invention. The first active device provided in this embodiment is applied to a system in which active devices and a standby device form an N:1 hot backup, where N is greater than or equal to 2. As shown in fig. 9, the first active device provided in this embodiment includes:
A first obtaining module 91, configured to obtain a first link quality level, where the first link quality level is used to indicate link quality between the first active device and the standby device.
A determining module 92, configured to determine whether the first link quality level indicates a link anomaly.
A second obtaining module 93, configured to obtain, when the determining module 92 determines that the first link quality level indicates a link abnormality, N-1 Xi-th link quality levels between the first active device and the other N-1 active devices, where the Xi-th link quality level is used to indicate link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N.
A sending module 94, configured to send a request packet to each of the other N-1 active devices, where the request packet is used to request each of the other N-1 active devices to send the link quality level between itself and the standby device.
A receiving module 95, configured to receive M Yj-th link quality levels sent by M active devices among the other N-1 active devices, where the Yj-th link quality level is used to indicate link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1.
A processing module 96, configured to switch a service to the standby device when the number of link quality levels indicating a link abnormality among the first link quality level and the M Yj-th link quality levels is less than the number of the N-1 Xi-th link quality levels indicating a link abnormality, the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, and a service of the first active device is damaged.
The first active device provided in this embodiment is used to implement the technical solution of the method embodiment shown in fig. 3, and the implementation principle and the technical effect are similar, which are not described herein again.
Further, in the embodiment shown in fig. 9, the processing module 96 is further configured to start a preset silence time window after switching the service to the standby device, and not to switch back to the active device state within the silence time window.
Further, in the embodiment shown in fig. 9, the processing module 96 is specifically configured to reduce access priority or initialize, and trigger a user accessing the first active device to switch a service to the standby device.
Further, in the embodiment shown in fig. 9, the sending module 94 is further configured to send, to the standby device, a message that the first active device lowers the access priority or initializes, after the processing module 96 lowers the access priority or initializes.
Further, in the embodiment shown in fig. 9, the first obtaining module 91 is further configured to obtain the first link quality level again after the silence time window is expired. The sending module 94 is further configured to notify the standby device to switch back the service to the first active device when the determining module 92 determines that the first link quality level indicates that the link is normal.
Further, in the embodiment shown in fig. 9, the processing module 96 is further configured to not backup data to the standby device within a preset silence time window when a sum of the first link quality level and the number of the M Yj link quality levels indicating the link abnormality is greater than or equal to the number of the N-1 Xi link quality levels indicating the link abnormality, or when the number of the M Yj link quality levels indicating the link normality is smaller than the number of the N-1 Xi link quality levels indicating the link normality, or when the service of the first active device itself is not damaged.
Further, in the embodiment shown in fig. 9, the sending module 94 is further configured to send the silence time window to the standby device after the processing module 96 determines that the link status from the standby device to the other N-1 primary devices is worse than the link status from the first primary device to the other N-1 primary devices, where the silence time window is used for the standby device not to process the received backup data within the silence time window.
Further, in the embodiment shown in fig. 9, the sending module 94 is further configured to suspend sending backup data to the standby device within a waiting time window from when the request packet is sent to each active device of the other N-1 active devices respectively.
Further, in the embodiment shown in fig. 9, the active device and the standby device are BRASs;
The processing module 96 is specifically configured to perform a hot backup to the standby device via VRRP.
Further, the link quality levels in the embodiment shown in fig. 9 include: at least one of a packet loss proportion, an overtime offline rate proportion and a remote backup oscillation proportion. The link quality level may be the first link quality level, the Xi link quality level, or the Yj link quality level. The abnormal level comprises that at least one of the packet loss proportion, the overtime offline rate proportion and the remote backup oscillation proportion exceeds a preset abnormal proportion threshold or the sum of any two of the packet loss proportion, the overtime offline rate proportion and the remote backup oscillation proportion exceeds a preset abnormal proportion threshold.
Fig. 10 is a schematic structural diagram of a standby device according to an embodiment of the present invention. The standby device provided in this embodiment is applied to a system in which active devices and a standby device form an N:1 hot backup, where N is greater than or equal to 2. As shown in fig. 10, the standby device provided in this embodiment includes:
A first obtaining module 101, configured to obtain a first link quality level, where the first link quality level is used to indicate link quality between a first active device and the standby device.
A determining module 102, configured to determine whether the first link quality level indicates a link abnormality.
A second obtaining module 103, configured to obtain, when the determining module 102 determines that the first link quality level indicates a link abnormality, N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, where the Yj-th link quality level is used to indicate link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N.
A sending module 104, configured to send a request packet to each of the other N-1 active devices, where the request packet is used to request each of the other N-1 active devices to send a link quality level between the other N-1 active devices and the first active device.
A receiving module 105, configured to receive M Xi-th link quality levels sent by active devices among the other N-1 active devices, where the Xi-th link quality level is used to indicate link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1.
A processing module 106, configured to raise an access priority when the number of link quality levels indicating a link abnormality among the first link quality level and the N-1 Yj-th link quality levels is less than the number of the M Xi-th link quality levels indicating a link abnormality, and the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, where the raised access priority is used to instruct users of the first active device to switch to the standby device.
The standby device provided in this embodiment is used to implement the technical solution of the method embodiment shown in fig. 5; its implementation principle and technical effect are similar and are not described here again.
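The condition evaluated by the processing module 106 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the names `Level` and `standby_should_raise_priority` are hypothetical, each link quality level is assumed to have already been reduced to a normal or abnormal verdict, and the second comparison counts normal links, as in the processor operations described for fig. 12.

```python
from enum import Enum


class Level(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def standby_should_raise_priority(first, y_levels, x_levels):
    """Return True when the standby device should raise its access priority.

    first    -- level of the link between the first active device and the standby device
    y_levels -- the N-1 levels of the links between the standby device and the other active devices
    x_levels -- the M levels reported by other active devices for their links to the first active device
    """
    def count(levels, wanted):
        return sum(1 for level in levels if level is wanted)

    # Anomalies seen on the standby side include the first link itself.
    standby_side_abnormal = count([first], Level.ABNORMAL) + count(y_levels, Level.ABNORMAL)
    active_side_abnormal = count(x_levels, Level.ABNORMAL)

    fewer_anomalies = standby_side_abnormal < active_side_abnormal
    at_least_as_many_normal = count(y_levels, Level.NORMAL) >= count(x_levels, Level.NORMAL)
    return fewer_anomalies and at_least_as_many_normal
```

For example, with N = 3, an abnormal first link, two normal Yj-th levels and two abnormal Xi-th levels, the standby side shows one anomaly against two on the side of the first active device, so the function returns True.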
Further, in the embodiment shown in fig. 10, the processing module 106 is further configured to suspend receiving the backup data sent by the first active device within a preset silence time window when the sum of the number of first link quality levels indicating a link anomaly and the number of the N-1 Yj-th link quality levels indicating a link anomaly is greater than or equal to the number of the M Xi-th link quality levels indicating a link anomaly, or when the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link.
Further, in the embodiment shown in fig. 10, the receiving module 105 is further configured to receive a silence time window sent by the first active device; the processing module 106 is further configured not to process received backup data during the silence time window.
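The silence time window behavior can be illustrated with a small sketch. The classes `SilenceWindow` and `StandbyReceiver`, the injected `clock` parameter and the use of seconds as the unit are assumptions made for illustration; the source does not specify how the window is timed or represented.

```python
import time


class SilenceWindow:
    """Tracks whether a silence time window is currently in effect."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, which makes the sketch testable
        self._until = None       # no window has been started yet

    def start(self, seconds):
        self._until = self._clock() + seconds

    def active(self):
        return self._until is not None and self._clock() < self._until


class StandbyReceiver:
    """Applies backup data unless a silence time window is in effect."""

    def __init__(self, window):
        self.window = window
        self.applied = []

    def on_backup_data(self, record):
        if self.window.active():
            return False         # data received during the window is not processed
        self.applied.append(record)
        return True
```

A monotonic clock is used by default so that wall-clock adjustments do not shorten or lengthen the window.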
Further, in the embodiment shown in fig. 10, the receiving module 105 is further configured to receive a message, sent by the first active device, indicating that the first active device has lowered its access priority or is initializing; the processing module 106 is further configured to accept the switching of users of the first active device to the standby device after this message is received.
Further, in the embodiment shown in fig. 10, the active devices and the standby device are BRASs, and the first active device is hot-backed up to the standby device through VRRP.
Further, the link quality level in the embodiment shown in fig. 10 includes at least one of a packet loss proportion, a timeout offline rate proportion, and a remote backup oscillation proportion. The link quality level may be the first link quality level, an Xi-th link quality level, or a Yj-th link quality level. A link quality level indicates an anomaly when at least one of the packet loss proportion, the timeout offline rate proportion, and the remote backup oscillation proportion exceeds a preset abnormal proportion threshold, or when the sum of any two of them exceeds a preset abnormal proportion threshold.
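The anomaly test described above can be sketched as follows. The names `LinkQuality` and `is_abnormal` are hypothetical, the two threshold parameters are assumptions (the text does not say whether the single-metric and pairwise checks share one threshold), and "the sum of any two" is read here as "the sum of at least one pair".

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LinkQuality:
    packet_loss: float          # packet loss proportion, 0.0 to 1.0
    timeout_offline: float      # timeout offline rate proportion
    backup_oscillation: float   # remote backup oscillation proportion


def is_abnormal(quality, single_threshold, pair_threshold):
    """Return True if the link quality level indicates a link anomaly."""
    ratios = (quality.packet_loss, quality.timeout_offline, quality.backup_oscillation)
    # Any single proportion above its threshold marks the link as abnormal.
    if any(ratio > single_threshold for ratio in ratios):
        return True
    # Otherwise check whether the sum of some pair of proportions is too high.
    return any(a + b > pair_threshold for a, b in combinations(ratios, 2))
```

With both thresholds set to 0.05, a link with proportions (0.03, 0.03, 0.0) is abnormal because one pair sums to more than 0.05, while (0.02, 0.01, 0.0) is normal.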
Fig. 11 is a schematic structural diagram of a first active device according to an embodiment of the present invention. As shown in fig. 11, the first active device of this embodiment includes a processor 111, a transmitter 112, and a receiver 113. Optionally, the first active device may further include a memory 114. The processor 111, the transmitter 112, the receiver 113, and the memory 114 may be connected through a system bus or in another manner; fig. 11 takes connection through a system bus as an example. The system bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus. The memory 114 is configured to store a computer program. The processor 111 may read the code corresponding to the stored computer program from the memory 114 and perform the following operations:
Acquiring a first link quality level, where the first link quality level represents the link quality between the first active device and the standby device;
Judging whether the first link quality level represents a link anomaly;
When the first link quality level is judged to represent a link anomaly, obtaining N-1 Xi-th link quality levels between the first active device and the other N-1 active devices, where the Xi-th link quality level represents the link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N;
Sending, through the transmitter 112, a request message to each of the other N-1 active devices, where the request message requests each of the other N-1 active devices to send the link quality level between that active device and the standby device;
Receiving, through the receiver 113, M Yj-th link quality levels sent by M of the other N-1 active devices, where the Yj-th link quality level represents the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
When the sum of the number of first link quality levels representing a link anomaly and the number of the M Yj-th link quality levels representing a link anomaly is less than the number of the N-1 Xi-th link quality levels representing a link anomaly, the number of the M Yj-th link quality levels representing a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels representing a normal link, and the service of the first active device is damaged, switching the service to the standby device.
The first active device of this embodiment is used to implement the hot backup method shown in fig. 3; its implementation principle and technical effect are similar and are not described here again.
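The switchover decision made by the first active device can be sketched as follows. The function name and its boolean-list inputs are hypothetical: each link is assumed to have been classified beforehand, with True meaning abnormal, and how service damage is detected is outside the scope of this sketch.

```python
def first_active_should_switch(first_abnormal, y_abnormal, x_abnormal, service_damaged):
    """Return True when the first active device should switch its service to the standby device.

    first_abnormal  -- True if the link between the first active device and the standby device is abnormal
    y_abnormal      -- M booleans for the links between other active devices and the standby device
    x_abnormal      -- N-1 booleans for the links between the first active device and other active devices
    service_damaged -- True if the service of the first active device is damaged
    """
    standby_side_anomalies = int(first_abnormal) + sum(y_abnormal)
    own_side_anomalies = sum(x_abnormal)
    normal_y = len(y_abnormal) - sum(y_abnormal)
    normal_x = len(x_abnormal) - sum(x_abnormal)

    # All three conditions must hold before the service is switched.
    return bool(
        standby_side_anomalies < own_side_anomalies
        and normal_y >= normal_x
        and service_damaged
    )
```

With N = 3, an abnormal first link, two normal reported Yj-th levels and two abnormal Xi-th levels, the function returns True only if the service is also damaged; the links of the first active device look worse than those of the standby device, which points to the first active device as the faulty side.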
In the first active device provided in this embodiment, the processor 111 is configured to implement the processing of the determining module 92, the processing module 96, the first obtaining module 91, and the second obtaining module 93 in the first active device shown in fig. 9; the transmitter 112 is configured to implement the processing of the sending module 94 in the first active device shown in fig. 9; and the receiver 113 is configured to implement the processing of the receiving module 95 in the first active device shown in fig. 9.
Fig. 12 is a schematic structural diagram of a standby device according to an embodiment of the present invention. As shown in fig. 12, the standby device of this embodiment includes a processor 121, a transmitter 122, and a receiver 123. Optionally, the standby device may further include a memory 124. The processor 121, the transmitter 122, the receiver 123, and the memory 124 may be connected through a system bus or in another manner; fig. 12 takes connection through a system bus as an example. The system bus may be an ISA bus, a PCI bus, an EISA bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 12, but this does not mean that there is only one bus or one type of bus. The memory 124 is configured to store a computer program. The processor 121 may read the code corresponding to the stored computer program from the memory 124 and perform the following operations:
Acquiring a first link quality level, where the first link quality level represents the link quality between the first active device and the standby device;
Judging whether the first link quality level represents a link anomaly;
When the first link quality level is judged to represent a link anomaly, obtaining N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, where the Yj-th link quality level represents the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N;
Sending, through the transmitter 122, a request message to each of the other N-1 active devices, where the request message requests each of the other N-1 active devices to send the link quality level between that active device and the first active device;
Receiving, through the receiver 123, M Xi-th link quality levels sent by the other N-1 active devices, where the Xi-th link quality level represents the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than or equal to zero and less than or equal to N-1;
When the sum of the number of first link quality levels representing a link anomaly and the number of the N-1 Yj-th link quality levels representing a link anomaly is less than the number of the M Xi-th link quality levels representing a link anomaly, and the number of the N-1 Yj-th link quality levels representing a normal link is greater than or equal to the number of the M Xi-th link quality levels representing a normal link, raising an access priority, where the raised access priority indicates that users of the first active device switch to the standby device.
In the standby device provided in this embodiment, the processor 121 is configured to implement the processing of the first obtaining module 101, the second obtaining module 103, the determining module 102, and the processing module 106 in the standby device shown in fig. 10; the transmitter 122 is configured to implement the processing of the sending module 104 in the standby device shown in fig. 10; and the receiver 123 is configured to implement the processing of the receiving module 105 in the standby device shown in fig. 10.
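The request and receive operations performed through the transmitter 122 and the receiver 123 can be illustrated as below, showing why only M of the N-1 requested devices may contribute a level. The function `collect_link_levels`, the `query` callable and the device identifiers are hypothetical; the source does not define a message format or transport.

```python
def collect_link_levels(peers, query):
    """Request a link quality level from every peer and keep the replies received.

    peers -- identifiers of the other N-1 active devices
    query -- hypothetical callable that sends the request message to one peer and
             returns the level it reports, or None if no reply arrives
    """
    replies = {}
    for peer in peers:
        level = query(peer)
        if level is not None:
            replies[peer] = level
    return replies


# Three peers are asked and one does not answer, so N-1 = 3 and M = 2.
reported = {"active-2": "abnormal", "active-4": "normal"}
levels = collect_link_levels(["active-2", "active-3", "active-4"], reported.get)
```

Only the replies in `levels` would then enter the comparison of abnormal and normal counts.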
Fig. 13 is a schematic structural diagram of a communication system according to an embodiment of the present invention. The communication system includes active devices and a standby device and may be the N:1 multi-machine hot backup system shown in fig. 1. As shown in fig. 13, the communication system of this embodiment includes N active devices 131 and one standby device 132.
The active device 131 includes a first active device as shown in fig. 9 or fig. 11; the standby device 132 includes a standby device as shown in fig. 10 or 12.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. A hot backup method, the method comprising:
a first active device acquires a first link quality level, wherein the first link quality level represents the link quality between the first active device and a standby device;
when the first link quality level indicates a link anomaly, the first active device obtains N-1 Xi-th link quality levels between the first active device and the other N-1 active devices, wherein the Xi-th link quality level represents the link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N;
the first active device sends a request message to each of the other N-1 active devices, wherein the request message requests each of the other N-1 active devices to send the link quality level between that active device and the standby device;
the first active device receives M Yj-th link quality levels sent by M of the other N-1 active devices, wherein the Yj-th link quality level represents the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than zero and less than or equal to N-1;
when the sum of the number of first link quality levels representing a link anomaly and the number of the M Yj-th link quality levels representing a link anomaly is smaller than the number of the N-1 Xi-th link quality levels representing a link anomaly, the number of the M Yj-th link quality levels representing a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels representing a normal link, and the service of the first active device is damaged, the first active device switches the service to the standby device.
2. The method of claim 1, further comprising:
when the sum of the number of first link quality levels representing a link anomaly and the number of the M Yj-th link quality levels representing a link anomaly is greater than or equal to the number of the N-1 Xi-th link quality levels representing a link anomaly, or when the number of the M Yj-th link quality levels representing a normal link is less than the number of the N-1 Xi-th link quality levels representing a normal link, or when the service of the first active device itself is not damaged, the first active device does not back up data to the standby device within a preset silence time window.
3. The method of claim 2, further comprising:
when the silence time window expires, the first active device acquires the first link quality level again;
when the first link quality level indicates that the link is normal, the first active device notifies the standby device to switch the service back to the first active device.
4. The method according to any one of claims 1 to 3, further comprising:
the first active device suspends sending backup data to the standby device within a waiting time window that starts when the request message is sent to each of the other N-1 active devices.
5. A hot backup method, the method comprising:
a standby device acquires a first link quality level, wherein the first link quality level represents the link quality between a first active device and the standby device;
when the first link quality level indicates a link anomaly, the standby device obtains N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, wherein the Yj-th link quality level represents the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N;
the standby device sends a request message to each of the other N-1 active devices, wherein the request message requests each of the other N-1 active devices to send the link quality level between that active device and the first active device;
the standby device receives M Xi-th link quality levels sent by the other N-1 active devices, wherein the Xi-th link quality level represents the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than zero and less than or equal to N-1;
when the sum of the number of first link quality levels representing a link anomaly and the number of the N-1 Yj-th link quality levels representing a link anomaly is less than the number of the M Xi-th link quality levels representing a link anomaly, and the number of the N-1 Yj-th link quality levels representing a normal link is greater than or equal to the number of the M Xi-th link quality levels representing a normal link, the standby device raises an access priority, wherein the raised access priority indicates that users of the first active device switch to the standby device.
6. The method of claim 5, further comprising:
when the sum of the number of first link quality levels representing a link anomaly and the number of the N-1 Yj-th link quality levels representing a link anomaly is greater than or equal to the number of the M Xi-th link quality levels representing a link anomaly, or the number of the N-1 Yj-th link quality levels representing a normal link is less than the number of the M Xi-th link quality levels representing a normal link, the standby device suspends receiving the backup data sent by the first active device within a preset silence time window.
7. A first active device, the first active device comprising:
a first obtaining module, configured to obtain a first link quality level, wherein the first link quality level indicates the link quality between the first active device and a standby device;
a determining module, configured to determine whether the first link quality level indicates a link anomaly;
a second obtaining module, configured to obtain, when the determining module determines that the first link quality level indicates a link anomaly, N-1 Xi-th link quality levels between the first active device and the other N-1 active devices, wherein the Xi-th link quality level indicates the link quality between the first active device and the i-th active device, N is greater than or equal to 2, and i is greater than 1 and less than or equal to N;
a sending module, configured to send a request packet to each of the other N-1 active devices, wherein the request packet requests each of the other N-1 active devices to send the link quality level between that active device and the standby device;
a receiving module, configured to receive M Yj-th link quality levels sent by M of the other N-1 active devices, wherein the Yj-th link quality level indicates the link quality between the j-th active device and the standby device, j is greater than 1 and less than or equal to N, and M is greater than zero and less than or equal to N-1;
a processing module, configured to switch a service to the standby device when the sum of the number of first link quality levels indicating a link anomaly and the number of the M Yj-th link quality levels indicating a link anomaly is less than the number of the N-1 Xi-th link quality levels indicating a link anomaly, the number of the M Yj-th link quality levels indicating a normal link is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a normal link, and the service of the first active device is damaged.
8. The first active device according to claim 7, wherein the processing module is further configured not to back up data to the standby device within a preset silence time window when the sum of the number of first link quality levels indicating a link anomaly and the number of the M Yj-th link quality levels indicating a link anomaly is greater than or equal to the number of the N-1 Xi-th link quality levels indicating a link anomaly, or when the number of the M Yj-th link quality levels indicating a normal link is less than the number of the N-1 Xi-th link quality levels indicating a normal link, or when the service of the first active device itself is not damaged.
9. The first active device according to claim 8, wherein the first obtaining module is further configured to obtain the first link quality level again after the silence time window expires;
the sending module is further configured to notify the standby device to switch the service back to the first active device when the determining module determines that the first link quality level indicates that the link is normal.
10. The first active device according to any one of claims 7 to 9, wherein the sending module is further configured to suspend sending backup data to the standby device within a waiting time window that starts when the request packet is sent to each of the other N-1 active devices.
11. A standby device, the standby device comprising:
a first obtaining module, configured to obtain a first link quality level, wherein the first link quality level indicates the link quality between a first active device and the standby device;
a determining module, configured to determine whether the first link quality level indicates a link anomaly;
a second obtaining module, configured to obtain, when the determining module determines that the first link quality level indicates a link anomaly, N-1 Yj-th link quality levels between the standby device and each of the N-1 active devices other than the first active device, wherein the Yj-th link quality level indicates the link quality between the standby device and the j-th active device, N is greater than or equal to 2, and j is greater than 1 and less than or equal to N;
a sending module, configured to send a request packet to each of the other N-1 active devices, wherein the request packet requests each of the other N-1 active devices to send the link quality level between that active device and the first active device;
a receiving module, configured to receive M Xi-th link quality levels sent by the other N-1 active devices, wherein the Xi-th link quality level indicates the link quality between the i-th active device and the first active device, i is greater than 1 and less than or equal to N, and M is greater than zero and less than or equal to N-1;
a processing module, configured to raise an access priority when the sum of the number of first link quality levels indicating a link anomaly and the number of the N-1 Yj-th link quality levels indicating a link anomaly is smaller than the number of the M Xi-th link quality levels indicating a link anomaly, and the number of the N-1 Yj-th link quality levels indicating a normal link is greater than or equal to the number of the M Xi-th link quality levels indicating a normal link, wherein the raised access priority indicates that users of the first active device switch to the standby device.
12. The standby device according to claim 11, wherein the processing module is further configured to suspend receiving the backup data sent by the first active device within a preset silence time window when the sum of the number of first link quality levels indicating a link anomaly and the number of the N-1 Yj-th link quality levels indicating a link anomaly is greater than or equal to the number of the M Xi-th link quality levels indicating a link anomaly, or when the number of the N-1 Yj-th link quality levels indicating a normal link is less than the number of the M Xi-th link quality levels indicating a normal link.
13. A communication system, wherein the communication system comprises N active devices and the standby device of claim 11 or 12, and any one of the N active devices is the first active device of any one of claims 7 to 10.
CN201510675648.3A 2015-10-15 2015-10-15 Hot backup method, first main device, standby device and communication system Active CN106603261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510675648.3A CN106603261B (en) 2015-10-15 2015-10-15 Hot backup method, first main device, standby device and communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510675648.3A CN106603261B (en) 2015-10-15 2015-10-15 Hot backup method, first main device, standby device and communication system

Publications (2)

Publication Number Publication Date
CN106603261A CN106603261A (en) 2017-04-26
CN106603261B true CN106603261B (en) 2019-12-06

Family

ID=58554246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510675648.3A Active CN106603261B (en) 2015-10-15 2015-10-15 Hot backup method, first main device, standby device and communication system

Country Status (1)

Country Link
CN (1) CN106603261B (en)

Also Published As

Publication number Publication date
CN106603261A (en) 2017-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant