CN111953591A - Fault processing method and device - Google Patents

Publication number
CN111953591A
Authority
CN
China
Prior art keywords
port
state
current state
keepalive
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010692829.8A
Other languages
Chinese (zh)
Inventor
殷建忠
李玉刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou H3C Technologies Co Ltd
New H3C Technologies Co Ltd
Original Assignee
Hangzhou H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou H3C Technologies Co Ltd filed Critical Hangzhou H3C Technologies Co Ltd
Priority to CN202010692829.8A priority Critical patent/CN111953591A/en
Publication of CN111953591A publication Critical patent/CN111953591A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • H04L45/245 Link aggregation, e.g. trunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/22 Alternate routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery

Abstract

The application provides a fault handling method and device. The method comprises: receiving a first keepalive message sent by a second network device, wherein the first keepalive message includes the current state of a second DR port of the second network device, and the current state of the second DR port is a first state; according to the current state of the second DR port, updating the working role of the first network device to master device, releasing the MAD mechanism, and updating the current state of a first DR port to a second state; and sending a second keepalive message to the second network device, wherein the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of a second uplink port connected to an external network to the first state.

Description

Fault processing method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a fault handling method and apparatus.
Background
Distributed Resilient Network Interconnect (DRNI) is a cross-device link aggregation technology, and specifically means that two physical devices are virtualized into one device on an aggregation layer to implement cross-device link aggregation, thereby providing device-level redundancy protection and traffic load sharing.
Fig. 1 is a schematic diagram of a conventional DRNI network model. Device A and Device B share the load and jointly forward traffic; when one device fails, traffic can be quickly switched to the other device, so that services continue to run normally.
Device A and device B form a Distributed aggregation (DR) system and may also be referred to as DR devices. The two DR devices are neighbors of each other in the DR system, where device A is the Primary device and device B is the Secondary device.
While the DR system is running, a fault may occur, for example an IPL link fault. If the IPL link fails, device B starts a Multi-Active Detection (MAD) mechanism to prevent traffic from being forwarded incorrectly, and sets both its DR port and its uplink port connected to the external network to the down state, so that only device A keeps working in single-active mode.
If the DR port on device A then also fails, the result is the conventional DRNI secondary-fault scenario shown in Fig. 2. Because the DR port on device B is still in the down state, the traffic of the DR system is interrupted until the DR port fault on device A is recovered, which may take a long time, and the service keep-alive requirement in the secondary-fault scenario cannot be met.
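The conventional behavior described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the class `DrDevice`, its fields, and the helper `system_forwards` are hypothetical names that do not come from the application, and a device is assumed to forward traffic only when both its DR port and its uplink port are up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass
class DrDevice:
    # Hypothetical model of one DR device; field names are illustrative.
    name: str
    role: Role
    dr_port_up: bool = True
    uplink_port_up: bool = True
    mad_active: bool = False

    def on_ipl_failure(self) -> None:
        # Conventional behavior: only the secondary device reacts. It starts
        # MAD and sets its DR port and uplink port to the down state.
        if self.role is Role.SECONDARY:
            self.mad_active = True
            self.dr_port_up = False
            self.uplink_port_up = False


def system_forwards(devices: Iterable[DrDevice]) -> bool:
    # Assumption: a device forwards only if both of its ports are up.
    return any(d.dr_port_up and d.uplink_port_up for d in devices)


device_a = DrDevice("A", Role.PRIMARY)
device_b = DrDevice("B", Role.SECONDARY)

# Primary fault: the IPL link fails.
for dev in (device_a, device_b):
    dev.on_ipl_failure()
forwards_after_ipl_fault = system_forwards([device_a, device_b])

# Secondary fault: the DR port on device A fails as well.
device_a.dr_port_up = False
forwards_after_secondary_fault = system_forwards([device_a, device_b])
```

In this model traffic still flows after the IPL fault alone, but stops after the secondary fault because device B remains held down by MAD, which is the gap the application sets out to close.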
Disclosure of Invention
In view of this, the present application provides a fault handling method and apparatus, so as to solve the prior-art problem that the traffic of the DR system is interrupted for a long time when the DR system suffers multiple faults.
In a first aspect, the present application provides a fault handling method, where the method is applied to a first network device in a DR system, where the DR system further includes a second network device, where an IPL link between the first network device and the second network device fails, the first network device starts a MAD mechanism, and a first DR port of the first network device and a first uplink port connected to an IP network are both in a first state, where the method includes:
receiving a first keepalive message sent by the second network device through a keepalive link, wherein the first keepalive message comprises a current state of a second DR port of the second network device, and the current state of the second DR port is a first state;
updating the working role of the first network device to master device according to the current state of the second DR port, releasing the MAD mechanism, and updating the current state of the first DR port to a second state;
and sending a second keepalive message to the second network device through the keepalive link, wherein the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of a second uplink port connected to an external network to the first state.
In a second aspect, the present application provides a fault handling apparatus, where the apparatus is applied to a first network device in a DR system, where the DR system further includes a second network device, where an IPL link between the first network device and the second network device fails, the first network device starts an MAD mechanism, and a first DR port of the first network device and a first uplink port connected to an IP network are both in a first state, and the apparatus includes:
a receiving unit, configured to receive a first keepalive message sent by the second network device through a keepalive link, where the first keepalive message includes a current state of a second DR port of the second network device, and the current state of the second DR port is a first state;
an updating unit, configured to update the working role of the first network device to master device according to the current state of the second DR port, release the MAD mechanism, and update the current state of the first DR port to a second state;
and a sending unit, configured to send a second keepalive message to the second network device through the keepalive link, where the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of a second uplink port connected to an external network to the first state.
In a third aspect, the present application provides a network device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to perform the method provided by the first aspect of the present application.
Therefore, by applying the fault handling method and apparatus provided by the present application, the first network device receives, through a keepalive link, a first keepalive message sent by the second network device that includes the current state of a second DR port of the second network device, where the current state of the second DR port is the first state. According to the current state of the second DR port, the first network device updates its own working role to master device, releases the MAD mechanism, and updates the current state of the first DR port to the second state. Through the keepalive link, the first network device sends the second network device a second keepalive message that includes the updated current state of the first DR port, so that the second network device updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of a second uplink port connected to the IP network to the first state.
This solves the prior-art problem that the traffic of the DR system is interrupted for a long time when the DR system suffers multiple faults. Traffic continues to be forwarded under multiple faults, no additional configuration is needed, usability is high, traffic switching performance during fault recovery is improved, and the scope of impact of a fault is reduced.
Drawings
Fig. 1 is a schematic diagram of a conventional DRNI network model;
fig. 2 is a schematic diagram of a conventional DRNI secondary failure scenario;
fig. 3 is a flowchart of a fault handling method according to an embodiment of the present application;
fig. 4 is a timing flowchart of a fault handling method according to an embodiment of the present application;
fig. 5 is a structural diagram of a fault handling apparatus according to an embodiment of the present application;
fig. 6 is a hardware structure diagram of a network device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the corresponding listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The following describes the fault handling method provided in the embodiments of the present application in detail. Referring to fig. 3, fig. 3 is a flowchart illustrating a fault handling method according to an embodiment of the present application. The method is applied to a first network device in a DR system. The DR system also includes a second network device. The IPL link between the first network device and the second network device fails, and both the first DR port of the first network device and the first uplink port connected to the external network are in the first state. The fault handling method provided by the embodiment of the application can comprise the following steps.
Step 310, receiving a first keepalive message sent by the second network device through a keepalive link, where the first keepalive message includes a current state of a second DR port of the second network device, and the current state of the second DR port is a first state.
Specifically, as shown in Fig. 2, device A and device B form a DR system and are neighbors of each other. Device A is the master device and device B is the slave device. Device A and device B exchange protocol messages and data traffic over the IPL link, and detect the neighbor state over a keepalive link. Device B includes a first DR port and a first uplink port connected to an external network (in one example, the external network may be an IP network). Device A includes a second DR port and a second uplink port connected to the external network. The first DR port and the second DR port are Layer 2 aggregate interfaces and belong to the same DR group.
In this embodiment, the first network device is device B and the second network device is device A. If the IPL link fails, device B starts the MAD mechanism to prevent traffic from being forwarded incorrectly and sets both the first DR port and the first uplink port to the first state, so that only device A keeps working in single-active mode. The first state is the down state.
If device A then detects that its second DR port has failed, device A changes the current state of the second DR port from the second state to the first state. The second state is the up state. Device A generates a first keepalive message that includes the current state of the second DR port of device A; that current state is the first state, that is, the second DR port is in the down state.
In this embodiment of the present application, device A adds a field (specifically, a Local DR State field) to the keepalive message, and the value of the field represents the current state of the DR port. For example, a Local DR State value of 1 indicates that the DR port is currently in the down state, and a value of 0 indicates that it is currently in the up state. The default value of the Local DR State field is 0.
It can be understood that, after a DR port is set to the down state, the forwarding entries installed on that DR port for forwarding traffic are deleted. After the DR port returns to the up state, MAC address learning or ARP entry learning can be performed again, and forwarding entries are generated from the learning result.
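As an illustration of the Local DR State field, the following Python sketch encodes and decodes a keepalive payload. Only the field name and its values (0 for up, the default, and 1 for down) come from the description. The byte layout, the 4-byte sender identifier, and the function names `build_keepalive` and `parse_keepalive` are assumptions made for the example, since the application does not specify the message format.

```python
import struct
from enum import IntEnum
from typing import Tuple


class LocalDrState(IntEnum):
    UP = 0    # default value of the field
    DOWN = 1


# Hypothetical layout: 4-byte sender ID followed by a 1-byte Local DR State,
# in network byte order. The real message format is not given in the text.
_KEEPALIVE_FORMAT = "!IB"


def build_keepalive(sender_id: int,
                    dr_state: LocalDrState = LocalDrState.UP) -> bytes:
    """Pack a keepalive payload carrying the sender's DR port state."""
    return struct.pack(_KEEPALIVE_FORMAT, sender_id, dr_state)


def parse_keepalive(payload: bytes) -> Tuple[int, LocalDrState]:
    """Unpack a keepalive payload into (sender ID, Local DR State)."""
    sender_id, raw_state = struct.unpack(_KEEPALIVE_FORMAT, payload)
    return sender_id, LocalDrState(raw_state)


# Device A (ID 1 here, an arbitrary choice) reports its failed DR port.
first_keepalive = build_keepalive(1, LocalDrState.DOWN)
sender, state = parse_keepalive(first_keepalive)
```

Keeping 0 as the up value means a sender that never sets the field is read as reporting an up DR port, which matches the stated default.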
And step 320, updating the working role of the first network device to be a master device according to the current state of the second DR port, removing the MAD mechanism, and updating the current state of the first DR port to be the second state.
Specifically, after receiving the first keepalive message, device B obtains the current state of the second DR port from it. From that state, device B perceives that the second DR port of device A is in the down state and determines that device A cannot continue to forward traffic.
Device B updates its own working role to master device. In its role as master device, device B releases the MAD mechanism and updates the current state of the first DR port to the second state, that is, the up state.
Further, device B updates the current state of the first uplink port to the second state, that is, the up state. At this point, the first uplink port and the first DR port are both in the up state.
Furthermore, once the first uplink port and the first DR port are both in the up state, device B performs MAC address learning or ARP entry learning on the first DR port and, from the learning result, generates forwarding entries for forwarding traffic.
Device B thus forwards traffic in place of device A through the first uplink port and the first DR port, which solves the prior-art problem that the traffic of the DR system is interrupted for a long time when the DR system suffers multiple faults. No additional configuration is needed, usability is high, traffic switching performance during fault recovery is improved, and the scope of impact of a fault is reduced.
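Step 320 as performed by device B can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the class `SlaveDevice`, its methods, and the MAC table represented as a dictionary are hypothetical, and learning is assumed to be possible only while the DR port is up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class PortState(Enum):
    DOWN = "down"   # the "first state"
    UP = "up"       # the "second state"


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class SlaveDevice:
    # Starting point: IPL link failed, MAD started, both ports held down.
    role: Role = Role.SLAVE
    mad_active: bool = True
    dr_port: PortState = PortState.DOWN
    uplink_port: PortState = PortState.DOWN
    mac_table: Dict[str, str] = field(default_factory=dict)

    def on_first_keepalive(self, peer_dr_state: PortState) -> None:
        # Take over only when the peer reports its DR port as down.
        if peer_dr_state is not PortState.DOWN:
            return
        self.role = Role.MASTER
        self.mad_active = False
        self.dr_port = PortState.UP
        self.uplink_port = PortState.UP

    def learn_mac(self, mac: str, port: str) -> bool:
        # Assumption: entries can only be learned while the DR port is up.
        if self.dr_port is not PortState.UP:
            return False
        self.mac_table[mac] = port
        return True


device_b = SlaveDevice()
learned_before = device_b.learn_mac("00:11:22:33:44:55", "dr1")
device_b.on_first_keepalive(PortState.DOWN)
learned_after = device_b.learn_mac("00:11:22:33:44:55", "dr1")
```

The role change, the MAD release, and the port updates are applied together in one handler so that the device is never master while still held down by MAD.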
Step 330, sending a second keepalive message to the second network device through the keepalive link, where the second keepalive message includes the updated current state of the first DR port, so that the second network device updates the work role of the second network device to a slave device, and starts the MAD mechanism to update the current states of the second DR port and a second uplink port connected to an external network to the first state.
Specifically, after device B updates the current state of the first DR port to the up state, device B generates a second keepalive message. The second keepalive message includes the updated current state of the first DR port of device B; that current state is the second state, that is, the first DR port is in the up state.
In this embodiment of the present application, device B adds a field (specifically, a Local DR State field) to the keepalive message, and the value of the field represents the current state of the DR port. For example, a Local DR State value of 1 indicates that the DR port is currently in the down state, and a value of 0 indicates that it is currently in the up state. The default value of the Local DR State field is 0.
Device B sends the second keepalive message to device A through the keepalive link. After receiving the second keepalive message, device A obtains the current state of the first DR port of device B. Device A perceives that the first DR port of device B is in the up state while its own second DR port is in the down state, so device A updates its own working role to slave device. At the same time, device A starts the MAD mechanism and updates the current states of the second DR port and of the second uplink port connected to the external network to the first state, that is, both ports are set to the down state. This prevents the packet loss that would otherwise occur if the second DR port later recovered from its fault and attracted traffic while it still had no forwarding entries.
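The reaction of device A to the second keepalive message can be sketched in the same style. The class `FormerMaster` and its field names are hypothetical, and the starting state assumes that device A has already set its failed second DR port to the down state while its uplink port is still up.

```python
from dataclasses import dataclass


@dataclass
class FormerMaster:
    # Starting point: device A is master, its DR port has failed (down),
    # and its uplink port is still up.
    role: str = "master"
    mad_active: bool = False
    dr_port_up: bool = False
    uplink_port_up: bool = True

    def on_second_keepalive(self, peer_dr_up: bool) -> None:
        # Step down only if the peer's DR port is up and the local one is down.
        if peer_dr_up and not self.dr_port_up:
            self.role = "slave"
            self.mad_active = True
            self.dr_port_up = False
            self.uplink_port_up = False


device_a = FormerMaster()
device_a.on_second_keepalive(peer_dr_up=True)
```

Because MAD is now active on device A, the uplink port is also taken down, so device A stops attracting traffic from the external network that it could not deliver through its DR port.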
Therefore, by applying the fault handling method provided by the present application, the first network device receives, through a keepalive link, a first keepalive message sent by the second network device that includes the current state of a second DR port of the second network device, where the current state of the second DR port is the first state. According to the current state of the second DR port, the first network device updates its own working role to master device, releases the MAD mechanism, and updates the current state of the first DR port to the second state. Through the keepalive link, the first network device sends the second network device a second keepalive message that includes the updated current state of the first DR port, so that the second network device updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of a second uplink port connected to an external network to the first state.
This solves the prior-art problem that the traffic of the DR system is interrupted for a long time when the DR system suffers multiple faults. Traffic continues to be forwarded under multiple faults, no additional configuration is needed, usability is high, traffic switching performance during fault recovery is improved, and the scope of impact of a fault is reduced.
Optionally, in one implementation scenario, device A and device B may each perceive that the current state of the peer DR port is the down state. In that case, each device updates its own working role to master device and releases its own MAD mechanism. Whichever device's DR port later recovers from its fault can immediately perform MAC address learning or ARP entry learning, generate forwarding entries, and forward traffic. That device then sends a keepalive message carrying the up state of its DR port, so that the peer, based on the current state of its own DR port and the DR port state contained in the keepalive message, updates its working role again, starts the MAD mechanism, and updates the current states of its own DR port and uplink port.
Specifically, continuing the example in the foregoing steps, the IPL link between device A and device B has failed, and the second DR port of device A has failed and has not recovered. As described in the foregoing steps, device B has the working role of master device, its first DR port and first uplink port are both in the up state, and device B forwards the traffic; device A has the working role of slave device, and its second DR port and second uplink port are both in the down state.
On this basis, if device B detects that its first DR port has failed, device B changes the current state of the first DR port from the up state to the down state. Device B generates a third keepalive message that includes the current state of the first DR port of device B; that current state is the first state, that is, the first DR port is in the down state.
In this embodiment of the present application, device B adds a field (specifically, a Local DR State field) to the keepalive message, and the value of the field represents the current state of the DR port. For example, a Local DR State value of 1 indicates that the DR port is currently in the down state, and a value of 0 indicates that it is currently in the up state. The default value of the Local DR State field is 0.
Device B sends the third keepalive message to device A. After receiving the third keepalive message, device A obtains the current state of the first DR port from it. From that state, device A perceives that the first DR port of device B is in the down state and determines that device B cannot continue to forward traffic.
Device A updates its own working role to master device. In its role as master device, device A releases the MAD mechanism. At this point, the DR ports of device A and device B have both failed, and neither device has the MAD mechanism started.
After that, whichever DR port recovers from its fault first, the device on which that DR port is located updates the current state of the DR port to the up state. The device then sends the peer a keepalive message that includes the current state of the DR port, so that the peer updates its own working role to slave device and starts the MAD mechanism again, restoring the master/slave state.
Further, if the second DR port of device A recovers from its fault first, device A generates a fourth keepalive message that includes the current state of the second DR port, which is the up state. After receiving the fourth keepalive message sent by device A, device B obtains the current state of the second DR port from it. According to that state, device B updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the first DR port and the first uplink port to the down state.
Further, if the first DR port of device B recovers from its fault first, device B generates a fifth keepalive message that includes the current state of the first DR port, which is the up state. Device B sends the fifth keepalive message to device A. After receiving the fifth keepalive message, device A obtains the current state of the first DR port. According to that state, device A updates its own working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and the second uplink port to the down state.
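The double-fault scenario and the recovery that follows can be walked through with a small symmetric model. This is a sketch under several assumptions that the description does not state explicitly: each device is reduced to a role, a MAD flag, and a "DR link faulty" flag; the DR port counts as up only when it is neither faulty nor held down by MAD; the uplink port is up whenever MAD is not started; and the case where both DR ports report up is left unchanged. The names `Device`, `on_keepalive`, and `exchange` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    role: str
    mad_active: bool = False
    dr_faulty: bool = False

    @property
    def dr_port_up(self) -> bool:
        # Assumption: up only if the link is healthy and MAD is not holding it.
        return not self.dr_faulty and not self.mad_active

    @property
    def uplink_port_up(self) -> bool:
        # Assumption: the uplink port is held down only by MAD.
        return not self.mad_active

    def on_keepalive(self, peer_dr_up: bool) -> None:
        if not peer_dr_up:
            # The peer cannot forward: become master and release MAD.
            self.role, self.mad_active = "master", False
        elif not self.dr_port_up:
            # The peer can forward and the local DR port cannot: step down.
            self.role, self.mad_active = "slave", True


def exchange(x: Device, y: Device) -> None:
    # Snapshot both states first, as if the two messages crossed in flight.
    x_up, y_up = x.dr_port_up, y.dr_port_up
    x.on_keepalive(y_up)
    y.on_keepalive(x_up)


# State after the earlier steps: A is slave with a failed DR port, B is master.
a = Device("A", role="slave", mad_active=True, dr_faulty=True)
b = Device("B", role="master")

# The first DR port (on device B) fails too: third keepalive message.
b.dr_faulty = True
exchange(a, b)
roles_during_double_fault = (a.role, b.role, a.mad_active, b.mad_active)

# The second DR port (on device A) recovers first: fourth keepalive message.
a.dr_faulty = False
exchange(a, b)
```

During the double fault both devices are master with MAD released, so whichever DR port recovers first comes up immediately; the next keepalive exchange then returns the pair to one master and one slave.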
Alternatively, in another implementation scenario based on the foregoing embodiment, if the second DR port recovers from its fault while device A has the MAD mechanism started, the second DR port remains in the down state and does not attract traffic. The first DR port and the first uplink port of device B continue to work stably.
If the first DR port subsequently also fails, device A may follow the procedure performed by device B in the foregoing embodiment and take over traffic forwarding from device B, thereby avoiding the interruption or packet loss that an unrecovered fault would cause.
If the IPL link recovers from its fault before the second DR port does, the prior-art implementation may be followed.
Specifically, if the IPL link recovers before the second DR port, device A and device B synchronize the current states of their DR ports over the IPL link. Because the current state of the first DR port of device B is the up state, the aggregate interface to which the second DR port of device A belongs may be set to the global up state (this up state is the logical state of the aggregate interface). The aggregate interface can then learn forwarding entries at the logical level, but its aggregation member port (that is, the second DR port) is still in the down state, so the second DR port does not attract traffic, and traffic loss caused by forwarding entries not yet having been learned is avoided.
Device B can then synchronize its forwarding entries to device A over the IPL link. Once synchronization of the forwarding entries is complete, device A installs the forwarding entries on the second DR port. When the second DR port then recovers from its fault, device A may update the second DR port and the second uplink port from the down state to the up state.
Optionally, in this embodiment of the present application, service keep-alive when the DR system suffers multiple faults may also be implemented in another manner.
Specifically, when the IPL link has already failed and device A detects that its second DR port has failed, device A may actively stop sending and receiving keepalive messages. Device B receives no keepalive message from device A within a preset time and therefore determines that the keepalive protocol with device A has failed. Device B perceives that the distributed aggregation system has split, updates its own working role to master device, releases the MAD mechanism, and updates the current states of its DR port and uplink port to the up state. Device B then takes over from device A and continues to forward traffic.
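This timeout-based alternative can be sketched as follows. The description only speaks of a "preset time", so the 3-second value, the explicit `now_s` timestamps, and the names `KeepaliveMonitor`, `SlaveDevice`, and `poll` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveMonitor:
    timeout_s: float          # the "preset time"; the value is an assumption
    last_rx_s: float = 0.0

    def on_keepalive(self, now_s: float) -> None:
        # Record the arrival time of the latest keepalive from the peer.
        self.last_rx_s = now_s

    def peer_lost(self, now_s: float) -> bool:
        return now_s - self.last_rx_s > self.timeout_s


@dataclass
class SlaveDevice:
    # Device B after the IPL fault: slave, MAD started, ports held down.
    role: str = "slave"
    mad_active: bool = True
    dr_port_up: bool = False
    uplink_port_up: bool = False

    def take_over(self) -> None:
        self.role = "master"
        self.mad_active = False
        self.dr_port_up = True
        self.uplink_port_up = True


def poll(device: SlaveDevice, monitor: KeepaliveMonitor, now_s: float) -> bool:
    """Take over if MAD is active and the peer has been silent too long."""
    if device.mad_active and monitor.peer_lost(now_s):
        device.take_over()
        return True
    return False


monitor = KeepaliveMonitor(timeout_s=3.0)
device_b = SlaveDevice()
monitor.on_keepalive(now_s=10.0)                       # last message from A
took_over_early = poll(device_b, monitor, now_s=12.0)  # 2 s of silence
took_over_late = poll(device_b, monitor, now_s=13.5)   # 3.5 s of silence
```

Compared with carrying the DR port state in the keepalive message, this variant needs no new field, but device B reacts only after the preset time has elapsed.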
The following describes the fault handling method provided in the application embodiment in detail with reference to fig. 4. Fig. 4 is a timing flowchart of a fault handling method according to an embodiment of the present application.
As shown in fig. 4, device a is a master device and device B is a slave device. And the equipment A and the equipment B exchange protocol messages and data traffic through an IPL link. Meanwhile, the device A and the device B detect the neighbor state through a Keepalive link. The device B comprises a first DR port and a first uplink port connected with an external network. The device A comprises a second DR port and a second uplink port connected with an external network. The first DR opening and the second DR opening are two-layer aggregation openings and belong to the same DR group.
In the embodiment of the present application, if the IPL link fails, that is, the IPL link failure is a primary failure. In order to prevent the traffic from being forwarded incorrectly, the device B starts the MAD mechanism, and sets both the first DR port and the first uplink port to the down state, and only the device a keeps single-active operation.
At this time, if the device a detects that the second DR port of the device a has a failure, that is, the failure of the second DR port is a secondary failure. To avoid traffic disruption, device a sets the current state of the second DR port from the up state to the down state. The equipment A generates a first keepalive message, wherein the first keepalive message comprises the current state of a second DR port of the equipment A, and the current state of the second DR port is a down state.
It can be understood that, after the DR interface is set to the down state, the forwarding table entry configured at the DR interface for forwarding the traffic is deleted. After the DR interface is restored to the up state, the learning of MAC address or ARP table item can be carried out again, and the forwarding table item can be generated according to the learning result.
After receiving the first keepalive message, device B obtains the current state of the second DR port from it. From this state, device B learns that the second DR port of device A is down and determines that device A can no longer forward traffic.
Device B updates its own working role to master device. As the master device, device B releases the MAD mechanism and updates the current state of the first DR port to the up state.
Further, device B also updates the current state of the first uplink port to the up state. At this point, the first uplink port and the first DR port are both in the up state.
Furthermore, once both ports are in the up state, device B performs MAC address learning or ARP entry learning on the first DR port and, according to the learning result, generates forwarding table entries for forwarding traffic.
Device B thus takes over traffic forwarding from device A through the first uplink port and the first DR port, which solves the prior-art problem that traffic in a DR system is interrupted for a long time when the DR system suffers multiple failures. Traffic continues to be forwarded under multiple failures without any additional configuration, which keeps the scheme easy to use, improves traffic switchover performance during failure recovery, and reduces the scope of the failure's impact.
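The takeover performed by device B can be sketched as a single handler. The names `SlaveInMad` and `on_first_keepalive` are hypothetical, and the port states are plain strings for brevity; the sketch assumes device B starts in the state reached after the IPL failure.

```python
from dataclasses import dataclass

DOWN, UP = "down", "up"


@dataclass
class SlaveInMad:
    """Hypothetical view of device B after the IPL failure (names are illustrative)."""
    role: str = "slave"
    mad_active: bool = True
    dr_port: str = DOWN
    uplink_port: str = DOWN

    def on_first_keepalive(self, peer_dr_state: str) -> bool:
        """Take over forwarding if the peer reports its DR port as down.

        Returns True when a role switch happened.
        """
        if not (self.mad_active and peer_dr_state == DOWN):
            return False
        self.role = "master"      # update the working role
        self.mad_active = False   # release the MAD mechanism
        self.dr_port = UP         # reopen the first DR port
        self.uplink_port = UP     # reopen the first uplink port
        return True


device_b = SlaveInMad()
switched = device_b.on_first_keepalive(peer_dr_state=DOWN)
```

If the peer reports its DR port as up, the handler leaves device B isolated, which matches the single-active behaviour after the primary failure.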
After device B updates the current state of the first DR port to up, device B generates a second keepalive message. The second keepalive message includes the updated current state of the first DR port of device B, which is the up state.
In this embodiment, device B adds a field (specifically, a Local DR State field) to the keepalive message and uses the value of the field to indicate the current state of the DR port. For example, a Local DR State value of 1 indicates that the DR port is currently down, and a value of 0 indicates that the DR port is currently up. The default value of the Local DR State field is 0.
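The value mapping of the Local DR State field can be sketched as below. The application does not give the on-wire layout of the keepalive message, so the dictionary and the key `local_dr_state` are stand-ins; treating a message without the field as carrying the default value 0 is likewise an assumption of this sketch.

```python
LOCAL_DR_STATE_UP = 0    # default value per the description
LOCAL_DR_STATE_DOWN = 1


def encode_local_dr_state(dr_port_is_up: bool) -> int:
    """Map the DR port state to the Local DR State field value."""
    return LOCAL_DR_STATE_UP if dr_port_is_up else LOCAL_DR_STATE_DOWN


def decode_local_dr_state(keepalive: dict) -> str:
    """Read the peer's DR port state from a (hypothetical) keepalive dictionary."""
    # Assumption: a missing field is treated as the default value 0 (up).
    value = keepalive.get("local_dr_state", LOCAL_DR_STATE_UP)
    if value == LOCAL_DR_STATE_DOWN:
        return "down"
    if value == LOCAL_DR_STATE_UP:
        return "up"
    raise ValueError(f"unexpected Local DR State value: {value!r}")


first_keepalive = {"local_dr_state": encode_local_dr_state(dr_port_is_up=False)}
second_keepalive = {"local_dr_state": encode_local_dr_state(dr_port_is_up=True)}
```

With this mapping, the first keepalive message from device A decodes to down and the second keepalive message from device B decodes to up.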
Device B sends the second keepalive message to device A over the keepalive link. After receiving it, device A obtains the current state of the first DR port of device B. Device A thus learns that the first DR port of device B is up while its own second DR port is down. Device A updates its working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of the second uplink port connected to the external network to the down state. This prevents the packet loss that would otherwise occur if the second DR port later recovered and attracted traffic while it still had no forwarding table entries.
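Device A's side of the exchange can be sketched in the same style. The names `MasterWithFailedDrPort` and `on_second_keepalive` are hypothetical; the sketch assumes device A has already set its second DR port down after the secondary failure.

```python
from dataclasses import dataclass

DOWN, UP = "down", "up"


@dataclass
class MasterWithFailedDrPort:
    """Hypothetical view of device A after the secondary failure (names are illustrative)."""
    role: str = "master"
    mad_active: bool = False
    dr_port: str = DOWN     # set down when the second DR port failed
    uplink_port: str = UP

    def on_second_keepalive(self, peer_dr_state: str) -> bool:
        """Step down if the peer's DR port is up while the local DR port is down.

        Returns True when a role switch happened.
        """
        if not (peer_dr_state == UP and self.dr_port == DOWN):
            return False
        self.role = "slave"       # update the working role
        self.mad_active = True    # start the MAD mechanism
        self.dr_port = DOWN       # keep the second DR port down
        self.uplink_port = DOWN   # also close the second uplink port
        return True


device_a = MasterWithFailedDrPort()
stepped_down = device_a.on_second_keepalive(peer_dr_state=UP)
```

Because the MAD mechanism now holds both ports down, a later physical recovery of the second DR port does not pull traffic toward a port without forwarding entries.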
Based on the same inventive concept, an embodiment of the present application further provides a fault handling apparatus corresponding to the fault handling method. Referring to fig. 5, fig. 5 is a structural diagram of the fault handling apparatus provided in this embodiment. The apparatus is applied to a first network device in a DR system that further includes a second network device. The IPL link between the first network device and the second network device has failed, the first network device has started the MAD mechanism, and the first DR port of the first network device and the first uplink port connected to an IP network are both in a first state. The apparatus includes:
a receiving unit 510, configured to receive a first keepalive message sent by the second network device through a keepalive link, where the first keepalive message includes a current state of a second DR port of the second network device, and the current state of the second DR port is the first state;
an updating unit 520, configured to update the working role of the first network device to master device according to the current state of the second DR port, release the MAD mechanism, and update the current state of the first DR port to a second state;
a sending unit 530, configured to send a second keepalive message to the second network device through the keepalive link, where the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and a second uplink port connected to an external network to the first state.
Optionally, the updating unit 520 is further configured to update the current state of the first uplink port to the second state;
the apparatus further includes: a forwarding unit (not shown in the figure), configured to forward traffic through the first uplink port and the first DR port.
Optionally, the apparatus further comprises: a learning unit (not shown in the figure) configured to perform MAC address learning or ARP entry learning at the first DR port;
a generating unit (not shown in the figure), configured to generate a forwarding table entry for forwarding the traffic according to the learning result.
Optionally, the sending unit 530 is further configured to send, when the first DR port fails, a third keepalive message to the second network device through the keepalive link, where the third keepalive message includes the current state of the first DR port, so that the second network device updates its working role to master device and releases the MAD mechanism.
Optionally, the receiving unit 510 is further configured to receive a fourth keepalive message sent by the second network device, where the fourth keepalive message includes the current state of the second DR port, and the current state of the second DR port is the second state;
the updating unit 520 is further configured to update the working role of the first network device to slave device according to the current state of the second DR port, start the MAD mechanism, and update the current state of the first DR port to the first state.
Optionally, the sending unit 530 is further configured to send, when the first DR port recovers from the failure, a fifth keepalive message to the second network device through the keepalive link, where the fifth keepalive message includes the current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current state of the second DR port to the first state.
Optionally, the first state is a down state and the second state is an up state.
Thus, with the fault handling apparatus provided by the present application, the apparatus receives, through the keepalive link, a first keepalive message sent by the second network device. The message includes the current state of the second DR port of the second network device, which is the first state. According to the current state of the second DR port, the apparatus updates the working role of the first network device to master device, releases the MAD mechanism, and updates the current state of the first DR port to the second state. Through the keepalive link, the apparatus then sends the second network device a second keepalive message that includes the updated current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and of the second uplink port connected to the IP network to the first state.
This solves the prior-art problem that traffic in a DR system is interrupted for a long time when the DR system suffers multiple failures. Traffic continues to be forwarded under multiple failures without any additional configuration, which keeps the scheme easy to use, improves traffic switchover performance during failure recovery, and reduces the scope of the failure's impact.
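The division of work among the receiving, updating, and sending units can be illustrated with a small two-device simulation. Everything here is a sketch under assumptions: the `Device` class, the dictionary message format, and the `deque` that stands in for the keepalive link are hypothetical, and only the first and second keepalive messages are modelled.

```python
from collections import deque
from dataclasses import dataclass, field

DOWN, UP = "down", "up"  # first state, second state


@dataclass
class Device:
    """Hypothetical network device combining the three units (names are illustrative)."""
    name: str
    role: str
    mad_active: bool
    dr_port: str
    uplink_port: str
    inbox: deque = field(default_factory=deque)  # stands in for the keepalive link

    def send_keepalive(self, peer: "Device") -> None:
        """Sending unit: report the current DR port state to the peer."""
        peer.inbox.append({"sender": self.name, "dr_state": self.dr_port})

    def receive_keepalive(self) -> dict:
        """Receiving unit: take the next keepalive message off the link."""
        return self.inbox.popleft()

    def update(self, keepalive: dict) -> None:
        """Updating unit: adjust role, MAD mechanism, and port states."""
        peer_state = keepalive["dr_state"]
        if self.mad_active and peer_state == DOWN:
            # Peer can no longer forward: become master and reopen the ports.
            self.role, self.mad_active = "master", False
            self.dr_port = self.uplink_port = UP
        elif not self.mad_active and peer_state == UP and self.dr_port == DOWN:
            # Peer has taken over: become slave and isolate the ports.
            self.role, self.mad_active = "slave", True
            self.dr_port = self.uplink_port = DOWN


# State after the IPL failure and the DR port failure on the second device.
first = Device("first", "slave", True, DOWN, DOWN)
second = Device("second", "master", False, DOWN, UP)

second.send_keepalive(first)             # first keepalive message
first.update(first.receive_keepalive())
first.send_keepalive(second)             # second keepalive message
second.update(second.receive_keepalive())
```

After the exchange, the first network device is the master with both ports up, and the second network device is the slave with the MAD mechanism active and both ports down.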
Based on the same inventive concept, the present application further provides a network device. As shown in fig. 6, the network device includes a processor 610, a transceiver 620, and a machine-readable storage medium 630. The machine-readable storage medium 630 stores machine-executable instructions that can be executed by the processor 610, and the machine-executable instructions cause the processor 610 to perform the fault handling method provided by the present application. The fault handling apparatus shown in fig. 5 can be implemented with the hardware structure of the network device shown in fig. 6.
The machine-readable storage medium 630 may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the machine-readable storage medium 630 may also be at least one storage device located remotely from the processor 610.
The processor 610 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this embodiment, the processor 610 reads the machine-executable instructions stored in the machine-readable storage medium 630, and the instructions cause the processor 610, itself and by invoking the transceiver 620, to perform the fault handling method described in the embodiments of the present application.
Additionally, embodiments of the present application provide a machine-readable storage medium 630 that stores machine-executable instructions. When invoked and executed by the processor 610, the instructions cause the processor 610, itself and by invoking the transceiver 620, to perform the fault handling method described in the embodiments of the present application.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
As for the embodiments of the fault handling apparatus and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (14)

1. A fault handling method, applied to a first network device in a DR system, wherein the DR system further includes a second network device, an IPL link between the first network device and the second network device fails, the first network device starts a MAD mechanism, and a first DR port of the first network device and a first uplink port connected to an external network are both in a first state, the method comprising:
receiving a first keepalive message sent by the second network device through a keepalive link, wherein the first keepalive message includes a current state of a second DR port of the second network device, and the current state of the second DR port is the first state;
updating the working role of the first network device to master device according to the current state of the second DR port, releasing the MAD mechanism, and updating the current state of the first DR port to a second state;
sending a second keepalive message to the second network device through the keepalive link, wherein the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and a second uplink port connected to an external network to the first state.
2. The method of claim 1, wherein before sending the second keepalive message to the second network device through the keepalive link, the method further comprises:
updating the current state of the first uplink port to the second state;
forwarding traffic through the first uplink port and the first DR port.
3. The method of claim 2, further comprising:
performing MAC address learning or ARP entry learning at the first DR port;
generating a forwarding table entry for forwarding the traffic according to the learning result.
4. The method of claim 1, further comprising:
when the first DR port fails, sending a third keepalive message to the second network device through the keepalive link, wherein the third keepalive message includes the current state of the first DR port, so that the second network device updates its working role to master device and releases the MAD mechanism.
5. The method of claim 4, wherein after sending the third keepalive message to the second network device, the method further comprises:
receiving a fourth keepalive message sent by the second network device, wherein the fourth keepalive message includes the current state of the second DR port, and the current state of the second DR port is the second state;
updating the working role of the first network device to slave device according to the current state of the second DR port, starting the MAD mechanism, and updating the current state of the first DR port to the first state.
6. The method of claim 4, wherein after sending the third keepalive message to the second network device, the method further comprises:
when the first DR port recovers from the failure, sending a fifth keepalive message to the second network device through the keepalive link, wherein the fifth keepalive message includes the current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current state of the second DR port to the first state.
7. The method of any of claims 1-6, wherein the first state is a down state and the second state is an up state.
8. A fault handling apparatus, applied to a first network device in a DR system, wherein the DR system further includes a second network device, an IPL link between the first network device and the second network device fails, the first network device starts a MAD mechanism, and a first DR port of the first network device and a first uplink port connected to an IP network are both in a first state, the apparatus comprising:
a receiving unit, configured to receive a first keepalive message sent by the second network device through a keepalive link, wherein the first keepalive message includes a current state of a second DR port of the second network device, and the current state of the second DR port is the first state;
an updating unit, configured to update the working role of the first network device to master device according to the current state of the second DR port, release the MAD mechanism, and update the current state of the first DR port to a second state;
a sending unit, configured to send a second keepalive message to the second network device through the keepalive link, wherein the second keepalive message includes the updated current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current states of the second DR port and a second uplink port connected to an external network to the first state.
9. The apparatus according to claim 8, wherein the updating unit is further configured to update the current state of the first uplink port to the second state;
the apparatus further comprises: a forwarding unit, configured to forward traffic through the first uplink port and the first DR port.
10. The apparatus of claim 9, further comprising:
a learning unit, configured to perform MAC address learning or ARP entry learning at the first DR port;
a generating unit, configured to generate a forwarding table entry for forwarding the traffic according to the learning result.
11. The apparatus of claim 8, wherein the sending unit is further configured to,
when the first DR port fails, send a third keepalive message to the second network device through the keepalive link, wherein the third keepalive message includes the current state of the first DR port, so that the second network device updates its working role to master device and releases the MAD mechanism.
12. The apparatus of claim 11, wherein the receiving unit is further configured to,
receive a fourth keepalive message sent by the second network device, wherein the fourth keepalive message includes the current state of the second DR port, and the current state of the second DR port is the second state;
the updating unit is further configured to update the working role of the first network device to slave device according to the current state of the second DR port, start the MAD mechanism, and update the current state of the first DR port to the first state.
13. The apparatus of claim 11, wherein the sending unit is further configured to,
when the first DR port recovers from the failure, send a fifth keepalive message to the second network device through the keepalive link, wherein the fifth keepalive message includes the current state of the first DR port, so that the second network device updates its working role to slave device, starts the MAD mechanism, and updates the current state of the second DR port to the first state.
14. The apparatus of any of claims 8-13, wherein the first state is a down state and the second state is an up state.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010692829.8A CN111953591A (en) 2020-07-17 2020-07-17 Fault processing method and device

Publications (1)

Publication Number Publication Date
CN111953591A true CN111953591A (en) 2020-11-17

Family

ID=73341578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010692829.8A Pending CN111953591A (en) 2020-07-17 2020-07-17 Fault processing method and device

Country Status (1)

Country Link
CN (1) CN111953591A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140314097A1 (en) * 2013-04-23 2014-10-23 Telefonaktiebolaget L M Ericsson (Publ) Method and system for network and intra-portal link (ipl) sharing in distributed relay control protocol (drcp)
CN106878047A (en) * 2016-12-13 2017-06-20 新华三技术有限公司 Fault handling method and device
CN107547370A (en) * 2017-09-25 2018-01-05 新华三技术有限公司 Flow forwarding method, apparatus and system
CN107547398A (en) * 2017-05-23 2018-01-05 新华三技术有限公司 Message forwarding method, device and equipment
CN107682261A (en) * 2017-10-24 2018-02-09 新华三技术有限公司 Flow forwarding method and device
CN108737189A (en) * 2018-05-25 2018-11-02 新华三技术有限公司 DR device roles update method and device
CN109194521A (en) * 2018-09-29 2019-01-11 新华三技术有限公司 A kind of flow forwarding method and equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113300878A (en) * 2021-04-13 2021-08-24 北京华三通信技术有限公司 Method and device for realizing data smoothing
CN113949623A (en) * 2021-10-18 2022-01-18 迈普通信技术股份有限公司 MLAG double-master abnormity repairing method and device, electronic equipment and storage medium
CN113949623B (en) * 2021-10-18 2024-04-26 迈普通信技术股份有限公司 MLAG double-master exception repairing method and device, electronic equipment and storage medium
CN114221899A (en) * 2021-11-30 2022-03-22 新华三技术有限公司合肥分公司 Fault processing method and device
CN114221899B (en) * 2021-11-30 2024-03-08 新华三技术有限公司合肥分公司 Fault processing method and device

Similar Documents

Publication Publication Date Title
CN108574614B (en) Message processing method, device and network system
WO2018054156A1 (en) Vxlan message forwarding method, device and system
JP5152642B2 (en) Packet ring network system, packet transfer method, and node
CN112769587A (en) Forwarding method and device for access flow of dual-homing device and storage medium
CN110912780A (en) High-availability cluster detection method, system and controlled terminal
CN109861867B (en) MEC service processing method and device
US20130304805A1 (en) Network system and network relay apparatus
US20150365320A1 (en) Method and device for dynamically switching gateway of distributed resilient network interconnect
EP1678894A1 (en) Redundant routing capabilities for a network node cluster
CN101729426B (en) Method and system for quickly switching between master device and standby device of virtual router redundancy protocol (VRRP)
EP2748989B1 (en) Methods and apparatus for avoiding inter-chassis redundancy switchover to non-functional standby nodes
CN107911291A (en) VRRP routers switching method, router, VRRP active-standby switch system and storage medium
US10447652B2 (en) High availability bridging between layer 2 networks
CN112367254A (en) Cross-device link aggregation method and device and electronic device
CN111953591A (en) Fault processing method and device
CN114978987B (en) Server Redundancy Backup Method
JP2003258822A (en) Packet ring network and inter-packet ring network connection method used in the same
CN113037565B (en) Message processing method and device
JP2006333077A (en) Line redundancy method and relaying device to be used therein
CN113645312A (en) Method and device for protecting sub-ring network link based on ERPS protocol
CN113973020B (en) Method, device and system for sending multicast message
WO2024001359A1 (en) Communication state switching method, port configuration method, and communication system and medium
CN112104548A (en) Communication method and device
CN112511419B (en) Distributed forwarding system
CN108768798B (en) Equipment access method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201117