WO2022262739A1 - Fault handling method, switching device, and storage medium - Google Patents

Fault handling method, switching device, and storage medium

Info

Publication number
WO2022262739A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
fault
state
peer
operating state
Prior art date
Application number
PCT/CN2022/098763
Other languages
English (en)
French (fr)
Inventor
林宁
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2022262739A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/55: Prevention, detection or correction of errors
    • H04L49/557: Error correction, e.g. fault recovery or fault tolerance
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/24: Multipath
    • H04L45/243: Multipath using M+N parallel active paths
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/55: Prevention, detection or correction of errors

Definitions

  • The present application relates to, but is not limited to, the communication field, and in particular to a fault handling method, a switching device, and a storage medium.
  • The switching system is an important part of a network node.
  • Cross-device link aggregation technology has begun to be applied to various switching systems.
  • A switching system that applies cross-device link aggregation technology includes a first device and a second device.
  • A cross-device aggregation link is obtained by binding communication ports of the first device and the second device, and a communication connection is then established with the network element through the cross-device aggregation link, which can effectively improve the reliability of data exchange.
  • A peer-link link is configured between the first device and the second device, and the hardware entries associated with the cross-device aggregation link are transferred through the peer-link link.
  • When the peer-link link fails, the second device disconnects its communication connection with the network element, and the first device alone completes the data exchange.
  • If the first device then also fails, the switching system cannot perform data exchange, which causes abnormalities in the network node, so the stability and reliability of the network cannot be guaranteed.
  • Embodiments of the present application provide a fault handling method, a switching device, and a storage medium.
  • In a first aspect, an embodiment of the present application provides a fault handling method applied to the second device of a cross-device link aggregation switching system. The cross-device link aggregation switching system further includes a first device, and a peer-link link is connected between the first device and the second device. The fault handling method includes: while the peer-link link is in a fault state, upon detecting that the operating state of the first device has changed to a fault state, taking over the data exchange processing between the first device and the network element.
  • In a second aspect, an embodiment of the present application provides a fault handling method applied to the first device of a cross-device link aggregation switching system. The cross-device link aggregation switching system further includes a second device, and a peer-link link is connected between the first device and the second device. The fault handling method includes: while the peer-link link is in a fault state, upon detecting that the first device is faulty, sending a first fault notification to the second device, so that the second device takes over the data exchange processing between the first device and the network element when it determines, according to the first fault notification, that the operating state of the first device has changed to a fault state.
  • An embodiment of the present application further provides a switching device, including: a memory, a processor, and a computer program stored in the memory and operable on the processor. When the processor executes the computer program, it implements the fault handling method of the first aspect or the fault handling method of the second aspect.
  • FIG. 1 is a flowchart of a fault handling method applied to a second device, provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a network node provided by another embodiment of the present application.
  • FIG. 3 is a flowchart of determining that the first device has failed, provided by another embodiment of the present application.
  • FIG. 4 is a flowchart of updating a hardware entry of the second device, provided by another embodiment of the present application.
  • FIG. 5 is a flowchart of updating a hardware entry of the first device, provided by another embodiment of the present application.
  • FIG. 6 is a flowchart of restoring the cross-device aggregation link, provided by another embodiment of the present application.
  • FIG. 7 is a flowchart of processing when the second device fails, provided by another embodiment of the present application.
  • FIG. 8 is a flowchart of the second device exiting the back-off state, provided by another embodiment of the present application.
  • FIG. 9 is a flowchart of a fault handling method applied to the first device, provided by another embodiment of the present application.
  • FIG. 10 is a flowchart of updating a hardware entry of the first device, provided by another embodiment of the present application.
  • FIG. 11 is a flowchart of restoring the cross-device aggregation link, provided by another embodiment of the present application.
  • FIG. 12 is a flowchart of processing when the second device fails, provided by another embodiment of the present application.
  • FIG. 13 is a flowchart of determining that the second device is faulty, provided by another embodiment of the present application.
  • FIG. 14 is a structural diagram of a network node in an example of the present application.
  • FIG. 15 is a flowchart of the fault handling method in an example of the present application.
  • FIG. 16 is an apparatus diagram of a cross-device link aggregation switching device provided by another embodiment of the present application.
  • The present application provides a fault handling method, a switching device, and a storage medium.
  • A peer-link link is connected between the first device and the second device of the cross-device link aggregation switching system. While the peer-link link is in a fault state, when the second device detects that the operating state of the first device has changed to a fault state, the second device takes over the data exchange processing between the first device and the network element.
  • In this way, when the first device fails while the peer-link link is in a fault state, the data exchange function of the cross-device link aggregation switching system is taken over by the second device, which avoids abnormalities in the network node and effectively improves the stability of the communication system.
  • FIG. 1 is a flowchart of a fault handling method provided by an embodiment of the present application. The fault handling method is applied to the second device of a cross-device link aggregation switching system. The cross-device link aggregation switching system further includes a first device, and a peer-link link is connected between the first device and the second device. The fault handling method includes, but is not limited to, step S100.
  • Step S100: while the peer-link link is in a fault state, upon detecting that the operating state of the first device has changed to a fault state, take over the data exchange processing between the first device and the network element.
  • The devices in the cross-device link aggregation switching system may be common switches or routing devices. This embodiment does not limit the specific device type, as long as the devices can form a cross-device aggregation link and exchange data with the network element.
  • The network element in this embodiment may be a device in a network node, such as a server or a gateway device, as long as it can exchange data with the switching devices through a cross-device aggregation link.
  • The number of network elements is not limited, and those skilled in the art can select the corresponding number of ports on the switching devices according to the number of network elements. For example, as shown in FIG. 2, the network node includes a cross-device link aggregation switching system and at least two network elements. The cross-device link aggregation switching system includes a first device 220 and a second device 230, and a peer-link link is connected between the first device 220 and the second device 230. The two network elements are respectively the upstream and downstream devices of the cross-device link aggregation switching system: the first network element 210 is connected in the uplink direction, and the second network element 240 is connected in the downlink direction.
  • A first port 221 and a third port 222 are selected in the first device 220, and a second port 231 and a fourth port 232 are selected in the second device 230. A first communication link is connected between the first port 221 and the first network element 210, a second communication link between the second port 231 and the first network element 210, a third communication link between the third port 222 and the second network element 240, and a fourth communication link between the fourth port 232 and the second network element 240. The cross-device aggregation link for the first network element 210 is obtained by binding the first port 221 and the second port 231, and the cross-device aggregation link for the second network element 240 is obtained by binding the third port 222 and the fourth port 232.
  • The structure shown in FIG. 2 is only an example. Those skilled in the art can adjust the number, type, and connection sequence of the devices and cross-device aggregation links according to actual needs, which does not limit the technical solution of this embodiment.
  • The peer-link link is the communication link connected to the peer-link port, so a fault of the peer-link link can be detected by checking the state of the peer-link port. How to judge link status is well known to those skilled in the art and is not described here.
  • When the peer-link link fails, the second device performs a back-off operation and disconnects its links with the upstream and downstream devices, so that only the first device carries the data exchange function. The second device itself, however, remains in a normal operating state.
  • The technical solution of this embodiment uses the fault state of the first device to wake up the second device, so that the second device restores its communication connection with the network element and takes over the data exchange function previously carried by the first device. This keeps the cross-device link aggregation switching system running continuously and ensures the stability of the network node.
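
The back-off and wake-up behaviour described above can be sketched as a small state model. This is only an illustration under stated assumptions, not the implementation of the application: the class `SecondDevice`, its fields, and its method names are hypothetical and were introduced for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(Enum):
    NORMAL = "normal"
    FAULT = "fault"


@dataclass
class SecondDevice:
    """Hypothetical model of the second device; all names are illustrative."""
    peer_link: LinkState = LinkState.NORMAL
    second_link_open: bool = True   # second communication link to the network element
    taking_over: bool = False

    def on_peer_link_fault(self) -> None:
        # Back-off: close the link to the network element, but keep running.
        self.peer_link = LinkState.FAULT
        self.second_link_open = False

    def on_first_device_fault(self) -> None:
        # Step S100: take over only while the peer-link link is in a fault state.
        if self.peer_link is LinkState.FAULT:
            self.second_link_open = True
            self.taking_over = True


dev = SecondDevice()
dev.on_peer_link_fault()
dev.on_first_device_fault()
print(dev.second_link_open, dev.taking_over)
```

In this sketch the takeover is deliberately gated on the peer-link fault state, mirroring the precondition of step S100.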
  • An alternative transmission link is connected between the first device and the second device, and the change of the operating state of the first device to a fault state is determined by any of the following steps:
  • Step S300: obtaining a first fault notification sent by the first device through the alternative transmission link, where the first fault notification is generated and sent by the first device when a fault occurs; or
  • Step S310: detecting that the alternative transmission link is abnormal; or
  • Step S320: sending a keep-alive message to the first device through the alternative transmission link, and receiving no keep-alive response message fed back by the first device through the alternative transmission link within a preset keep-alive period.
  • For step S300, when the first device fails or restarts, it may be able to send the first fault notification before it shuts down. For example, the first device sends a death notification message to the second device before the whole machine becomes abnormal. As another example, the first device sends a restart-related message to the second device before performing the restart. The first device may also notify the second device through other trigger information. This embodiment does not limit the type of message.
  • With step S300, notifying the second device of the fault of the first device through the first fault notification is relatively accurate, and the possibility of misjudgment by the second device is low.
  • However, the notification is usually generated when the first device detects that it is about to fail, so the first device may fail before the first fault notification can be sent. For example, the first device starts to generate the first fault notification after detecting a restart command, but the restart is executed before the notification is sent, so the first device enters the fault state without the notification having been sent. Therefore, in addition to the fault detection method of step S300, the operating state of the first device can also be determined by detecting an abnormality of the alternative transmission link.
  • For step S310, the alternative transmission link is a transmission link between the first device and the second device, and the technical solution of this embodiment is premised on the normal operation of the second device. Therefore, when the first device fails and can no longer send or receive data, the alternative transmission link becomes abnormal, and it can be determined that the operating state of the first device has changed to a fault state. It should be noted that, when the failure of the first device is determined in step S310 through an abnormality of a transmission link, any available transmission link between the first device and the second device other than the peer-link link can be used, which is not limited in this embodiment.
  • For step S320, the alternative transmission link may be the Keepalive link between the first device and the second device, so keep-alive messages can be sent to the first device through this link. When the first device is in a normal operating state, it can respond to a keep-alive message by feeding back a keep-alive response message.
  • Depending on the resource allocation of the first device, responding to a keep-alive message may take some time, so a keep-alive period can be set. If no keep-alive response message is received within this period, it can be determined that the first device is in a fault state.
  • Other message detection methods, such as heartbeat message detection, can also be used, as long as the operating state of the first device can be determined through message interaction. They are not described here.
  • Any one of step S300, step S310, and step S320 can be selected, or two or more of them can be combined into a condition set. For example, when the alternative transmission link fails, it may be that the operating state of the first device has not changed and only the port corresponding to the alternative transmission link has failed. In this case, a new transmission link can be substituted, and the operating state of the first device can be determined through keep-alive messages.
  • As another example, if the trigger information sent by the first device is received, it can be determined that the first device, rather than the alternative transmission link itself, has failed.
  • Those skilled in the art may also determine that the operating state of the first device has changed to a fault state through other methods or a combination of conditions, which is not further limited here.
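
The three detection conditions of steps S300, S310, and S320 can be sketched as a single predicate. This is a minimal sketch assuming the simplest policy, in which any one condition is sufficient; the `Observation` fields, the function name, and the 3-second default keep-alive period are assumptions made for illustration and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the second device currently observes (hypothetical fields)."""
    fault_notification_received: bool     # step S300
    alt_link_abnormal: bool               # step S310
    seconds_since_keepalive_sent: float   # step S320
    keepalive_reply_received: bool        # step S320


def first_device_failed(obs: Observation, keepalive_period: float = 3.0) -> bool:
    """Return True if any of the S300/S310/S320 conditions holds."""
    if obs.fault_notification_received:
        return True
    if obs.alt_link_abnormal:
        return True
    # S320: no reply within the preset keep-alive period.
    return (not obs.keepalive_reply_received
            and obs.seconds_since_keepalive_sent >= keepalive_period)


print(first_device_failed(Observation(False, False, 5.0, False)))
```

A stricter condition set, for example requiring a keep-alive timeout on a substitute link before trusting an abnormal alternative transmission link, could be expressed by combining the same three inputs differently.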
  • The second device communicates with the network element through a second communication link, and an alternative transmission link is connected between the first device and the second device.
  • Before it is detected in step S100 shown in FIG. 1 that the operating state of the first device has changed to a fault state, the method also includes, but is not limited to, the following steps S400 to S420.
  • Step S400: upon detecting that the operating state of the peer-link link has changed to a fault state, close the second communication link.
  • Step S410: acquire first update information sent by the first device through the alternative transmission link, where the first update information represents a change of a hardware entry of the first device.
  • Step S420: update the hardware entry of the second device according to the first update information.
  • The second communication link can be closed by closing the communication port of the second device that corresponds to the second communication link.
  • For example, the second device 230 performs a back-off operation according to an existing protocol, disconnects the second port 231, and enters a back-off state.
  • The peer-link link is used to transmit the update information of the hardware entries of the first device and the second device. The hardware entries usually include table information of the device, such as the Media Access Control (MAC) address table, the routing table, and the Address Resolution Protocol (ARP) table, so the data volume of the update information is usually not large.
  • Therefore, the first update information can be transferred between the first device and the second device through an alternative transmission link. The alternative transmission link may be any available link between the first device and the second device. For example, a Keepalive link may be used as the alternative transmission link. Those skilled in the art can select the alternative transmission link from any communicable transmission link according to the amount of data exchanged, which is not limited in this embodiment.
  • The number of alternative transmission links can also be arbitrary.
  • For example, a first alternative transmission link can be selected for the first port 221 and the second port 231, and a second alternative transmission link can be selected for the third port 222 and the fourth port 232.
  • Alternatively, only one alternative transmission link may be determined. This embodiment imposes no requirement, and the number of alternative transmission links can be determined according to actual needs.
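
Step S420 can be illustrated with a small table-update routine. The sketch below is hedged: the message shape (`table`, `op`, `key`, `value`), the table names, and the function `apply_update` are assumptions made for this example, since the application does not specify a format for the update information.

```python
from typing import Dict

# Hypothetical hardware tables keyed by table name ("mac", "arp", "route").
HardwareTables = Dict[str, Dict[str, str]]


def apply_update(tables: HardwareTables, update: dict) -> None:
    """Step S420: apply one change received over the alternative transmission link.

    `update` uses an assumed message shape:
    {"table": "mac", "op": "add" | "delete", "key": ..., "value": ...}
    """
    table = tables.setdefault(update["table"], {})
    if update["op"] == "add":
        table[update["key"]] = update["value"]
    elif update["op"] == "delete":
        table.pop(update["key"], None)
    else:
        raise ValueError(f"unknown op: {update['op']}")


second_device_tables: HardwareTables = {"mac": {}, "arp": {}}
apply_update(second_device_tables,
             {"table": "mac", "op": "add",
              "key": "00:11:22:33:44:55", "value": "port-1"})
print(second_device_tables["mac"])
```

Because each update touches a single entry, the messages stay small, which is consistent with carrying them over a low-bandwidth link such as a Keepalive link.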
  • The first device communicates with the network element through the first communication link, and the first communication link and the second communication link belong to the same cross-device aggregation link.
  • After step S100 shown in FIG. 1 is executed, the method also includes, but is not limited to, the following step S500.
  • Step S500: while the peer-link link remains in a fault state, upon detecting that the operating state of the first device has changed to a normal operating state, obtain second update information and send it to the first device, so that the first device updates its hardware entry according to the second update information and keeps the first communication link closed, where the second update information represents a change of the hardware entry of the second device.
  • After step S100 shown in FIG. 1 is executed, the second device has completed the takeover of the first device, and the cross-device link aggregation switching system exchanges data with the upstream and downstream devices through the second device.
  • Although the first device can recover from the fault by being repaired or restarted, if the peer-link link is still in a fault state, the first device and the second device cannot restore the cross-device aggregation link. Therefore, the first communication link is kept closed, so that the first device stays in the back-off state and the second device continues to perform data exchange alone.
  • While the second device communicates with the network element and carries the data exchange function of the cross-device link aggregation switching system, the hardware entries of the second device may change. The first device is in the back-off state, but its fault has been resolved.
  • The hardware entries of the second device are therefore synchronized to the first device through the alternative transmission link, which ensures that the hardware entries of the two devices of the cross-device link aggregation switching system stay synchronized.
  • After the first device recovers, the roles of the first device and the second device can be re-determined, that is, the current second device is determined as the first device and the current first device is determined as the second device, and the fault handling method described in the above embodiment is then executed over the alternative transmission link. Alternatively, the previously determined roles can be kept unchanged, and the cross-device aggregation link is re-established after the peer-link link returns to normal. This is not limited in this embodiment.
  • After step S500 shown in FIG. 5 is executed, that is, after the second update information is sent to the first device through the alternative transmission link, the method also includes, but is not limited to, the following step S600.
  • Step S600: upon detecting that the operating state of the peer-link link has changed to a normal operating state, resume the data exchange processing with the network element based on the cross-device aggregation link through the second communication link in cooperation with the first communication link, where the first communication link is opened by the first device upon detecting that the operating state of the peer-link link has changed to a normal operating state.
  • When the first device recovers from a fault while the peer-link link remains in a fault state, cross-device link aggregation cannot be achieved, so the first communication link remains closed and the second port of the second device carries the data exchange function. Once the peer-link link returns to normal operation, cross-device link aggregation again has its hardware foundation.
  • The first device can then reopen the first communication link, so that the cross-device link aggregation switching system again exchanges data with the network element through the cross-device aggregation link. How cross-device link aggregation itself is implemented is not described in detail here.
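
The recovery sequence of steps S500 and S600, seen from the first device, can be sketched as follows. This is an illustrative model only: the class `FirstDevice`, its fields, and the flat dictionary used for hardware entries are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FirstDevice:
    """Hypothetical model of a first device that has just recovered from its fault."""
    first_link_open: bool = False   # closed while the device was down
    hardware_entries: Dict[str, str] = field(default_factory=dict)

    def on_second_update(self, second_update: Dict[str, str]) -> None:
        # Step S500 (receiving side): sync entries from the second device
        # and keep the first communication link closed.
        self.hardware_entries.update(second_update)
        self.first_link_open = False

    def on_peer_link_restored(self) -> None:
        # Step S600: the peer-link link is normal again, so rejoin the aggregation.
        self.first_link_open = True


device = FirstDevice()
device.on_second_update({"00:aa:bb:cc:dd:ee": "port-2"})
device.on_peer_link_restored()
print(device.first_link_open, device.hardware_entries)
```

Synchronizing entries before reopening the link means the first device already holds current tables at the moment it rejoins the cross-device aggregation link.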
  • The method also includes, but is not limited to, the following step S700.
  • Step S700: upon detecting that the second device is faulty, generate a second fault notification and send it to the first device through the alternative transmission link, so that the first device takes over the data exchange processing between the second device and the network element by opening the first communication link.
  • While the peer-link link remains in a fault state and the first device remains in the back-off state, if the second device fails, the first device can take over the second device in the same way that the second device took over the first device in the above embodiment. The first device opens the first communication link and restores its communication connection with the network element, which ensures that the cross-device link aggregation switching system maintains its switching function and improves the stability of the network.
  • With the technical solution of this embodiment, the first device and the second device protect each other: when one device fails, it wakes up the other device to take over the switching function of the network node, which effectively improves the stability of the network.
  • For how the second device generates the second fault notification and sends it to the first device, refer to the description of step S300 in the embodiment shown in FIG. 3, in which the first device sends the first fault notification to the second device.
  • The method also includes, but is not limited to, the following step S800.
  • Step S800: reopen the second communication link that is in the closed state.
  • After the peer-link link fails, the second device enters the back-off state according to the existing protocol. When the first device then also fails, the switching system suffers a second fault, and, as described in the above embodiment, the second device takes over the first device to maintain the switching function of the switching system. Since the second device performs the back-off operation by disconnecting the second communication link, after being woken up it can reopen the second communication link and establish a communication connection with the network element, which quickly restores the switching function and improves network stability.
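
The mutual protection described for steps S700 and S800 is symmetric, which the following sketch tries to make explicit. It assumes a simplified model in which each device has one link to the network element; `Device` and `handle_fault` are hypothetical names, not terms from the application.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    link_open: bool
    healthy: bool = True


def handle_fault(failed: Device, standby: Device) -> None:
    """Assumed handler: the backed-off peer reopens its link when the active device fails."""
    failed.healthy = False
    failed.link_open = False
    if standby.healthy:
        standby.link_open = True


# Peer-link link already faulty: the first device is active, the second has backed off.
first = Device("first", link_open=True)
second = Device("second", link_open=False)

handle_fault(first, second)    # the second device takes over (S100, S800)
first.healthy = True           # the first device recovers but stays backed off (S500)
handle_fault(second, first)    # the first device takes over in turn (S700)
print(first.link_open, second.link_open)
```

The same handler serves both directions, which reflects the statement that the two devices form a mutual protection state.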
  • FIG. 9 shows a fault handling method applied to the first device of the cross-device link aggregation switching system. The cross-device link aggregation switching system further includes a second device, and a peer-link link is connected between the first device and the second device. The fault handling method includes, but is not limited to, step S900.
  • Step S900: while the peer-link link is in a fault state, upon detecting that the first device is faulty, send a first fault notification to the second device, so that the second device takes over the data exchange processing between the first device and the network element when it determines, according to the first fault notification, that the operating state of the first device has changed to a fault state.
  • When the second device is in the back-off state, if the first device fails and the second device performs no operation, the cross-device link aggregation switching system stops working, which affects the stability of the network. Therefore, the second device needs to be woken up to take over the data exchange processing between the first device and the network element.
  • The second device can detect the operating state of the first device itself, but such detection may produce misjudgments. Therefore, to improve the accuracy with which the second device determines the fault state of the first device, the first device can send the first fault notification to the second device when it detects a fault.
  • For the specific principle and method, refer to the description of step S300 in the embodiment shown in FIG. 3, which is not repeated here.
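
Step S900 can be sketched as a helper that the first device calls just before it restarts. The JSON message layout, the field names, and the function `notify_before_restart` are assumptions for illustration. The application leaves the message type open, so any transport callable can stand in for the alternative transmission link.

```python
import json
from typing import Callable


def notify_before_restart(send: Callable[[bytes], None],
                          device_id: str, reason: str) -> bytes:
    """Assumed helper: build and send the first fault notification, return the payload."""
    payload = json.dumps({
        "type": "fault_notification",   # hypothetical message type
        "device": device_id,
        "reason": reason,
    }).encode("utf-8")
    send(payload)                       # e.g. write to the keepalive link
    return payload


sent = []                               # stand-in for the alternative transmission link
notify_before_restart(sent.append, "switch-A", "restart")
print(json.loads(sent[0])["type"])
```

Since the notification may not get out before the restart executes, a receiver would still want the link-abnormality or keep-alive checks as a fallback, as the text above explains.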
  • The first device communicates with the network element through the first communication link, and the first communication link and the second communication link belong to the same cross-device aggregation link.
  • After step S900 in the embodiment shown in FIG. 9 is executed, the method also includes, but is not limited to, the following steps S1000 to S1020.
  • Step S1000: while the peer-link link remains in a fault state, when the operating state of the first device changes to a normal operating state, obtain the second update information sent by the second device through the alternative transmission link, where the second update information represents a change of the hardware entry of the second device.
  • Step S1010: update the hardware entry of the first device according to the second update information.
  • Step S1020: keep the first communication link in a closed state.
  • After step S1020 in the embodiment shown in FIG. 10 is executed, the method also includes, but is not limited to, the following steps S1100 to S1110.
  • Step S1100: upon detecting that the operating state of the peer-link link has changed to a normal operating state, open the first communication link.
  • Step S1110: resume the data exchange processing with the network element based on the cross-device aggregation link through the first communication link in cooperation with the second communication link.
  • After step S1010 in the embodiment shown in FIG. 10 is executed, the method also includes, but is not limited to, the following step S1200.
  • Step S1200: while the peer-link link is in a fault state, upon detecting that the operating state of the second device has changed to a fault state, open the first communication link to take over the data exchange processing between the second device and the network element.
  • The detection of the operating state of the second device by the first device includes, but is not limited to, any of the following steps:
  • Step S1300: obtaining a second fault notification sent by the second device through the alternative transmission link, where the second fault notification is generated and sent by the second device when a fault occurs; or
  • Step S1310: detecting that the alternative transmission link is abnormal; or
  • Step S1320: sending a keep-alive message to the second device through the alternative transmission link, and receiving no keep-alive response message fed back by the second device through the alternative transmission link within the preset keep-alive period.
  • The technical solution of this embodiment is similar to that of the embodiment of steps S300 to S320, in which the second device detects the first device. Here the first device detects the second device.
  • Apart from the executing entity and the direction of information transmission, the specific principles and steps of the detection are the same. For brevity, the details are not repeated here.
  • the network node takes the structure shown in FIG. 14 as an example.
  • switch A is provided with ports A1 and port A2
  • switch B is provided with ports B1 and B2.
  • Switch A and switch B form a cross-device link aggregation switch system, wherein port A1 and port B1 is bound to obtain cross-device aggregated links, and port A2 and port B2 are bound to obtain cross-device aggregated links.
  • the fault judgment method takes switch A actively sending a death notification message as an example.
  • The fault handling method includes, but is not limited to, the following steps S1510 to S1550.
  • Step S1510: the cross-device link aggregation switch system is connected to a gateway device and a server, where a first communication link connects port A1 and the gateway device, a second communication link connects port B1 and the gateway device, a third communication link connects port A2 and the server, and a fourth communication link connects port B2 and the server.
  • Step S1520: switch B determines that the peer-link has failed.
  • Step S1521: switch B disables port B1 and port B2, disconnecting the second and fourth communication links and thereby interrupting its communication with the gateway device and the server, and enters the back-off state; the keepalive link between switch A and switch B continues to communicate normally.
  • Step S1522: switch A keeps working and synchronizes changes to its hardware entries to switch B over the keepalive link.
  • Step S1530: switch A suffers a whole-machine restart fault and, before restarting, sends a death notification message to switch B over the keepalive link.
  • Step S1531: switch B re-enables port B1 and port B2, opens the second and fourth communication links, and restores normal communication between switch B and the gateway device and the server.
  • Step S1540: switch A completes the restart while the peer-link remains faulty.
  • Step S1541: switch A keeps port A1 and port A2 closed, keeps the first and third communication links disconnected, and stays in the back-off state.
  • Step S1542: switch B and switch A synchronize hardware entries over the keepalive link.
  • Step S1550: the peer-link fault is cleared; switch A opens port A1 and port A2, opens the first and third communication links, and rejoins the cross-device link aggregation, so that switch A and switch B again form a complete cross-device link aggregation system.
  • The second device can maintain a normal backup relationship throughout the process, and the two devices can keep their hardware entries synchronized, which greatly improves the stability and reliability of the network.
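The sequence of steps S1510 to S1550 can be walked through with a toy model. This is a sketch under stated assumptions: the names `Switch`, `MlagPair`, and the method names are hypothetical, the model tracks only whether each switch is running and whether its aggregation member ports are open, and the death notification is assumed to arrive reliably over the keepalive link.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    running: bool = True      # operating state of the device
    ports_open: bool = True   # state of its aggregation member ports


class MlagPair:
    """Toy model of the S1510-S1550 example; not real switch behaviour."""

    def __init__(self) -> None:
        self.a = Switch("A")          # first device
        self.b = Switch("B")          # second device
        self.peer_link_up = True

    def peer_link_fails(self) -> None:        # S1520, S1521
        self.peer_link_up = False
        self.b.ports_open = False             # B backs off

    def a_restarts(self) -> None:             # S1530, S1531
        self.a.running = False
        self.a.ports_open = False
        if not self.peer_link_up:
            # Death notification received over the keepalive link: B takes over.
            self.b.ports_open = True

    def a_comes_back(self) -> None:           # S1540, S1541
        self.a.running = True
        # A stays backed off for as long as the peer-link is still down.
        self.a.ports_open = self.peer_link_up

    def peer_link_recovers(self) -> None:     # S1550
        self.peer_link_up = True
        for sw in (self.a, self.b):
            if sw.running:
                sw.ports_open = True

    def forwarding(self) -> list:
        return [sw.name for sw in (self.a, self.b) if sw.running and sw.ports_open]
```

Calling `peer_link_fails()`, `a_restarts()`, `a_comes_back()`, and `peer_link_recovers()` in that order leaves at least one switch forwarding at every stage, which is the property the example is meant to demonstrate.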
  • an embodiment of the present application also provides a cross-device link aggregation switching device.
  • The cross-device link aggregation switching device 1600 includes a memory 1610, a processor 1620, and a computer program stored in the memory 1610 and executable on the processor 1620.
  • the processor 1620 and the memory 1610 may be connected through a bus or in other ways.
  • The non-transitory software programs and instructions required to implement the fault handling method of the above embodiments are stored in the memory 1610. When executed by the processor 1620, they perform the fault handling method applied to the second device in the above embodiments, for example method step S100 in FIG. 1, method steps S300 to S320 in FIG. 3, method steps S400 to S420 in FIG. 4, method step S500 in FIG. 5, method step S600 in FIG. 6, method step S700 in FIG. 7, and method step S800 in FIG. 8; or they perform the fault handling method applied to the first device in the above embodiments, for example method step S900 in FIG. 9, method steps S1000 to S1020 in FIG. 10, method steps S1100 to S1110 in FIG. 11, method step S1200 in FIG. 12, and method steps S1300 to S1320 in FIG. 13; or they perform the fault handling method applied to the cross-device link aggregation switching system in the above embodiments, for example method steps S1510 to S1550 in FIG. 15.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • In addition, an embodiment of the present application provides a computer-readable storage medium that stores computer-executable instructions. When the instructions are executed by a processor or controller, for example by a processor in the above cross-device link aggregation switching system embodiment, the processor performs the fault handling method applied to the second device in the above embodiments, for example method step S100 in FIG. 1, method steps S300 to S320 in FIG. 3, method steps S400 to S420 in FIG. 4, method step S500 in FIG. 5, method step S600 in FIG. 6, method step S700 in FIG. 7, and method step S800 in FIG. 8; or the fault handling method applied to the first device in the above embodiments, for example method step S900 in FIG. 9, method steps S1000 to S1020 in FIG. 10, method steps S1100 to S1110 in FIG. 11, method step S1200 in FIG. 12, and method steps S1300 to S1320 in FIG. 13; or the fault handling method applied to the cross-device link aggregation switching system in the above embodiments, for example method steps S1510 to S1550 in FIG. 15.
  • Those of ordinary skill in the art will understand that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • As is also well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • The embodiments of the present application include the following: a peer-link is connected between the first device and the second device of the cross-device link aggregation switching system, and while the peer-link is in a fault state, when the second device detects that the operating state of the first device has changed to a fault state, the second device takes over the data exchange processing performed by the first device with the network element.
  • According to the solution provided by the embodiments of the present application, when the peer-link is in a fault state and the first device fails, the second device can take over the data exchange function of the cross-device link aggregation switching system, which prevents the network node from becoming abnormal and effectively improves the stability of the communication system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

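A fault handling method, a switching device, and a storage medium. A peer-link is connected between a first device and a second device of a cross-device link aggregation switching system. While the peer-link is in a fault state, when the second device detects that the operating state of the first device has changed to a fault state, the second device takes over the data exchange processing performed by the first device with a network element.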

Description

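Fault handling method, switching device, and storage medium
Cross-reference to related applications
The present application is based on, and claims priority to, Chinese patent application No. 202110658832.2 filed on June 15, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to, but is not limited to, the field of communications, and in particular to a fault handling method, a switching device, and a storage medium.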
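Background
The switching system is an important part of a network node. With the development of communication technology, cross-device link aggregation has come to be applied in various switching systems. A switching system that uses cross-device link aggregation includes a first device and a second device. Binding communication ports of the first device and the second device yields a cross-device aggregated link, and establishing the communication connection with a network element over that aggregated link effectively improves the reliability of data exchange.
A peer-link is configured between the first device and the second device and carries the hardware entries associated with the cross-device aggregated link. Under the relevant protocols, when the peer-link fails, the second device disconnects its communication connection with the network element and the first device alone performs data exchange. If the first device then fails as well, the switching system cannot exchange data, the network node becomes abnormal, and the stability and reliability of the network cannot be guaranteed.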
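Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present application provide a fault handling method, a switching device, and a storage medium.
In a first aspect, an embodiment of the present application provides a fault handling method applied to a second device of a cross-device link aggregation switching system. The system further includes a first device, and a peer-link is connected between the first device and the second device. The fault handling method includes: while the peer-link is in a fault state, when it is detected that the operating state of the first device has changed to a fault state, taking over the data exchange processing between the first device and a network element.
In a second aspect, an embodiment of the present application further provides a fault handling method applied to a first device of a cross-device link aggregation switching system. The system further includes a second device, and a peer-link is connected between the first device and the second device. The fault handling method includes: while the peer-link is in a fault state, when it is detected that the first device has failed, sending a first fault notification to the second device, so that the second device, upon determining from the first fault notification that the operating state of the first device has changed to a fault state, takes over the data exchange processing between the first device and a network element.
In a third aspect, an embodiment of the present application provides a switching device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the fault handling method of the first aspect or the fault handling method of the second aspect.
Other features and advantages of the present application will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the application. The objectives and other advantages of the application may be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.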
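Brief description of the drawings
The drawings provide a further understanding of the technical solution of the present application and form part of the description. Together with the embodiments of the application they serve to explain the technical solution and do not limit it.
FIG. 1 is a flowchart of a fault handling method applied to a second device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a network node according to another embodiment of the present application;
FIG. 3 is a flowchart of determining that the first device has failed according to another embodiment of the present application;
FIG. 4 is a flowchart of updating the hardware entries of the second device according to another embodiment of the present application;
FIG. 5 is a flowchart of updating the hardware entries of the first device according to another embodiment of the present application;
FIG. 6 is a flowchart of restoring the cross-device aggregated link according to another embodiment of the present application;
FIG. 7 is a flowchart of handling a failure of the second device according to another embodiment of the present application;
FIG. 8 is a flowchart of the second device exiting the back-off state according to another embodiment of the present application;
FIG. 9 is a flowchart of a fault handling method applied to a first device according to another embodiment of the present application;
FIG. 10 is a flowchart of updating the hardware entries of the first device according to another embodiment of the present application;
FIG. 11 is a flowchart of restoring the cross-device aggregated link according to another embodiment of the present application;
FIG. 12 is a flowchart of handling a failure of the second device according to another embodiment of the present application;
FIG. 13 is a flowchart of determining that the second device has failed according to another embodiment of the present application;
FIG. 14 is a structural diagram of a network node in an example of the present application;
FIG. 15 is a flowchart of a fault handling method in an example of the present application;
FIG. 16 is a device diagram of a cross-device link aggregation switching device according to another embodiment of the present application.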
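Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it.
It should be noted that although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device or in an order different from that of the flowchart. The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
The present application provides a fault handling method, a switching device, and a storage medium. A peer-link is connected between the first device and the second device of a cross-device link aggregation switching system. While the peer-link is in a fault state, when the second device detects that the operating state of the first device has changed to a fault state, the second device takes over the data exchange processing performed by the first device with the network element. According to the solution provided by the embodiments of the present application, when the peer-link is in a fault state and the first device fails, the second device can take over the data exchange function of the cross-device link aggregation switching system, which prevents the network node from becoming abnormal and effectively improves the stability of the communication system.
The embodiments of the present application are further described below with reference to the drawings.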
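As shown in FIG. 1, FIG. 1 is a flowchart of a fault handling method according to an embodiment of the present application. The method is applied to a second device of a cross-device link aggregation switching system. The system further includes a first device, and a peer-link is connected between the first device and the second device. The fault handling method includes, but is not limited to, step S100.
Step S100: while the peer-link is in a fault state, when it is detected that the operating state of the first device has changed to a fault state, take over the data exchange processing between the first device and the network element.
It should be noted that the devices of the cross-device link aggregation switching system may be common switches or routing devices. This embodiment does not limit the specific device type, provided the devices can form a cross-device link aggregation system and exchange data with network elements over cross-device aggregated links. It can be understood that a network element in this embodiment may be a device in the network node such as a server or a gateway device, provided it can exchange data with the switching devices over a cross-device aggregated link. This embodiment does not limit the specific type or number of network elements, and a person skilled in the art would select the corresponding number of ports on the switching devices according to the number of network elements. For example, as shown in FIG. 2, the network node includes a cross-device link aggregation switching system and at least two network elements. The switching system includes a first device 220 and a second device 230 with a peer-link connected between them. The two network elements are the upstream and downstream devices of the switching system: the first network element 210 is connected in the uplink direction and the second network element 240 in the downlink direction. Because network elements are present both upstream and downstream, a first port 221 and a third port 222 are selected on the first device 220, and a second port 231 and a fourth port 232 on the second device 230. A first communication link connects the first port 221 and the first network element 210, a second communication link connects the second port 231 and the first network element 210, a third communication link connects the third port 222 and the second network element 240, and a fourth communication link connects the fourth port 232 and the second network element 240. Binding the first port 221 and the second port 231 yields the cross-device aggregated link for the first network element 210, and binding the third port 222 and the fourth port 232 yields the cross-device aggregated link for the second network element 240. The structure shown in FIG. 2 is only an example; a person skilled in the art may adjust the number, type, and connection order of the devices and cross-device aggregated links according to actual needs, and this does not limit the technical solution of this embodiment.
It can be understood that the peer-link is the communication link attached to the peer-link ports, so a fault of the peer-link can be detected by checking the state of the peer-link ports. Persons skilled in the art know how to determine link state, so this is not elaborated here.
It is worth noting that under the existing protocol, after the peer-link fails the second device backs off: it disconnects its links to the upstream and downstream devices and only the first device carries the data exchange function, even though the second device itself is still running normally. If the first device then fails, the existing protocol keeps the second device in the back-off state and the cross-device link aggregation switching system stops working. In contrast, the technical solution of this embodiment uses the fault state of the first device to wake the second device, so that the second device restores its communication connection with the network element and takes over the data exchange function previously carried by the first device. This keeps the switching system running and preserves the stability of the network node.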
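In addition, referring to FIG. 3, in an embodiment an alternative transmission link is also connected between the first device and the second device, and the change of the first device's operating state to a fault state is determined by the following steps: step S300, receiving a first fault notification sent by the first device over the alternative transmission link, the first fault notification being generated and sent by the first device when it fails; or step S310, detecting that the alternative transmission link is abnormal; or step S320, sending a keep-alive message to the first device over the alternative transmission link and receiving no keep-alive response message from the first device over the alternative transmission link within a preset keep-alive period.
It should be noted that in step S300, when the first device fails or restarts, it is able to send the first fault notification before the device shuts down. For example, the first device sends a death notification message to the second device before the whole machine becomes abnormal, or sends a restart-related message to the second device before performing a restart. The second device may also be informed through other trigger information; this embodiment does not limit the message type.
It should be noted that informing the second device by a first fault notification, as in step S300, is a relatively accurate approach with a low chance of misjudgment by the second device. However, the first fault notification is usually generated when the first device detects an impending fault, so the fault may occur before the notification is sent. For example, the first device begins to generate and send the notification after detecting a restart instruction but restarts before the notification goes out; the notification is then never sent although the first device is already in the fault state. Therefore, besides the detection of step S300, the operating state of the first device can also be determined by detecting an abnormality of the alternative transmission link. The alternative transmission link runs between the first device and the second device, and the technical solution of this embodiment assumes that the second device is running normally. When the first device fails and can no longer send or receive data, the alternative transmission link becomes abnormal, and it is determined that the first device's operating state has changed to a fault state. The link-abnormality check of step S310 may also be applied to any other available transmission link between the first device and the second device other than the peer-link; this embodiment does not limit this.
It should be noted that in step S320 the alternative transmission link may be the keepalive link between the first device and the second device, over which keep-alive messages can be sent to the first device. A first device that is running normally responds to a keep-alive message with a keep-alive response message. Because the first device's resource allocation may delay the response, a keep-alive period can be set; if no keep-alive response message is received within that period, the first device can be determined to be in a fault state. Other message exchanges, such as heartbeat detection, can also be used; any message-exchange method that determines the first device's operating state suffices.
It is worth noting that steps S300, S310, and S320 may be used individually, or two or more may be combined into a set of conditions. For example, when the alternative transmission link fails, it may be that the first device's operating state has not changed and only the port of the alternative transmission link has failed. In that case a new transmission link can be substituted and the first device's operating state determined through keep-alive messages. As another example, if trigger information from the first device is received after switching to a new transmission link, it can be determined that the first device has failed rather than the alternative transmission link itself. A person skilled in the art may also determine that the first device's operating state has changed to a fault state by other means or by a combination of conditions; this is not limited here.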
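One way to combine the conditions, following the first example in the paragraph above, is to re-probe over a substitute link before declaring the peer failed. The sketch below is illustrative only: the function name, the link names, and the `probe` callback (assumed to return `True` when a keep-alive response arrives within the keep-alive period) are assumptions, not part of the source.

```python
from typing import Callable, Iterable


def classify_alt_link_fault(spare_links: Iterable[str],
                            probe: Callable[[str], bool]) -> str:
    """Decide what an abnormal alternative transmission link means.

    If the peer answers a keep-alive probe on any substitute link, only the
    original link (or its port) has failed; otherwise the peer is treated as failed.
    """
    for link in spare_links:
        if probe(link):
            return "link-fault-only"
    # No substitute link, or no reply on any of them: fall back to step S310 alone.
    return "peer-failed"
```

With no spare links the function degrades to the single-condition behaviour of step S310, so the combination is strictly more conservative about declaring a device fault.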
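In addition, in an embodiment the second device is communicatively connected to the network element through a second communication link, and an alternative transmission link is also connected between the first device and the second device. Referring to FIG. 4, in an embodiment, in step S100 shown in FIG. 1, before it is detected that the operating state of the first device has changed to a fault state, the method further includes, but is not limited to, the following steps S400 to S420.
Step S400: when it is detected that the operating state of the peer-link has changed to a fault state, close the second communication link.
Step S410: obtain first update information sent by the first device over the alternative transmission link, where the first update information represents changes to the hardware entries of the first device.
Step S420: update the hardware entries of the second device according to the first update information.
It is worth noting that the second communication link can be closed by closing the communication port of the second device that corresponds to it. For example, in the structure shown in FIG. 2, when the peer-link fails, the second device 230 performs the back-off operation under the existing protocol, disconnects the second port 231, and enters the back-off state.
As persons skilled in the art know, in a cross-device link aggregation switching system the peer-link carries update information for the hardware entries of the first device and the second device. Hardware entries usually comprise entry information about the devices attached to the communication ports, such as the Media Access Control (MAC) address table, the routing table, and the Address Resolution Protocol (ARP) table, so the volume of update information is usually small. When the peer-link is in a fault state, the first update information can be passed between the first device and the second device over the alternative transmission link, so that the second device can take over the switching function with correct hardware entries. The alternative transmission link may be any available link between the first device and the second device, for example the keepalive link; a person skilled in the art can select it from any communicable transmission link according to the volume of data exchanged, and this embodiment does not limit this. The number of alternative transmission links is also arbitrary. For example, in the structure shown in FIG. 2 the first device 220 and the second device 230 each have two ports for upstream and downstream data exchange, so a first alternative transmission link may be selected for the first port 221 and the second port 231, and a second alternative transmission link for the third port 222 and the fourth port 232. A single alternative transmission link may also be used; the number is determined by actual needs.
It should be noted that after the second device takes over the switching function, the port specified by the hardware entries is changed so that data is sent and received over the second communication link. How hardware entries are used in a switching device is well known to those skilled in the art and is not elaborated here.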
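Steps S410 and S420, and the port change on takeover, can be sketched with a MAC table held as a dictionary. The update format (`added` and `removed` keys), the function names, and the port-name mapping are hypothetical choices for illustration; the source does not define a message layout.

```python
def apply_update(mac_table: dict, update: dict) -> dict:
    """Return a copy of the table with the peer's entry changes applied (S420)."""
    table = dict(mac_table)
    for mac in update.get("removed", []):
        table.pop(mac, None)
    table.update(update.get("added", {}))
    return table


def retarget(mac_table: dict, port_map: dict) -> dict:
    """On takeover, point each entry at the local member port of the same
    aggregated link, so traffic leaves over the second device's own links."""
    return {mac: port_map.get(port, port) for mac, port in mac_table.items()}
```

For example, an entry learned on the first device's port `A1` would be rewritten to the bound port `B1` of the same cross-device aggregated link when the second device takes over.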
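In addition, the first device is communicatively connected to the network element through a first communication link, and the first communication link and the second communication link belong to the same cross-device aggregated link. Referring to FIG. 5, in an embodiment, after step S100 shown in FIG. 1 is executed, the method further includes, but is not limited to, the following step S500.
Step S500: while the peer-link remains in the fault state, when it is detected that the operating state of the first device has changed to a normal operating state, obtain second update information and send it to the first device over the alternative transmission link, so that the first device updates its hardware entries according to the second update information and keeps the first communication link closed, where the second update information represents changes to the hardware entries of the second device.
It should be noted that after step S100 shown in FIG. 1 is executed, the second device has taken over from the first device and the switching system exchanges data with the upstream and downstream devices through the second device. During this time the first device may recover from the fault through repair or restart, but if the peer-link is still faulty, the first device and the second device cannot restore the cross-device aggregated link. The first device is therefore kept in the back-off state by keeping the first communication link closed, and the second device continues to perform data exchange alone.
It can be understood that in this scenario the second device is connected to the network element and carries the data exchange function of the switching system, and its hardware entries may change in the process. The first device is in the back-off state but its fault has cleared, so the second device's hardware entries are synchronized to the first device over the alternative transmission link to keep the hardware entries of the two devices consistent.
It is worth noting that in the scenario where the second device carries data exchange and the first device is backed off, the roles of the two devices may be reassigned, that is, the current second device is designated the first device and the current first device the second device, and the fault handling method of the above embodiments is then executed over the alternative transmission link. Alternatively, the previously determined roles may be kept unchanged and the cross-device aggregated link re-established after the peer-link returns to normal. This embodiment does not limit this.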
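In addition, referring to FIG. 6, in an embodiment, in step S500 shown in FIG. 5, after the second update information is sent to the first device over the alternative transmission link, the method further includes, but is not limited to, the following step S600.
Step S600: when it is detected that the operating state of the peer-link has changed to a normal operating state, resume, through the second communication link in cooperation with the first communication link, the data exchange processing with the network element over the cross-device aggregated link, where the first communication link is opened by the first device when it detects that the operating state of the peer-link has changed to a normal operating state.
It should be noted that, based on the description of the above embodiments, after the first device recovers from the fault, cross-device link aggregation is not possible while the peer-link remains faulty, so the first communication link stays closed and the second port of the second device carries the data exchange function. Once the peer-link returns to a normal operating state, the hardware basis for cross-device link aggregation is in place. For better system stability and switching performance, the first device can then reopen the first communication link so that the switching system again exchanges data with the network element over the cross-device aggregated link. Persons skilled in the art know how to implement cross-device link aggregation when the hardware basis is present and the switching system is fault-free, so this is not elaborated here.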
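In addition, referring to FIG. 7, in an embodiment, after step S400 shown in FIG. 4 is executed, the method further includes, but is not limited to, the following step S700.
Step S700: when it is detected that the second device has failed, generate a second fault notification and send it to the first device over the alternative transmission link, so that the first device, upon determining from the second fault notification that the operating state of the second device has changed to a fault state, takes over the data exchange processing between the second device and the network element by opening the first communication link.
It is worth noting that while the peer-link remains faulty and the first device remains backed off, if the second device fails, the first device can open the first communication link and restore the communication connection with the network element, in the same way that the second device took over from the first device in the above embodiments. It thereby takes over from the second device, so that the switching system maintains its switching function and network stability improves.
It is worth noting that with the technical solution of this embodiment, while the peer-link is in a fault state the first device and the second device protect each other: when one device fails, the other is woken to take over the switching function of the network node, which effectively improves network stability.
It should be noted that for the way the second device generates the second fault notification and sends it to the first device, refer to the principle by which the first device sends the first fault notification to the second device in the description of step S300 of the embodiment shown in FIG. 3. The only difference is which device sends and which receives; for brevity, the details are not repeated here.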
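In addition, referring to FIG. 8, in an embodiment, after step S100 shown in FIG. 1, the method further includes, but is not limited to, the following step S800.
Step S800: reopen the second communication link that is in the closed state.
It is worth noting that after the peer-link fails, the second device enters the back-off state under the existing protocol. When the first device then also fails, the switching system suffers a second fault; as described in the above embodiments, the second device can take over from the first device to maintain the switching function of the switching system. Because the second device backs off by disconnecting the second communication link, once woken it can reopen the second communication link and establish the communication connection with the network element, which quickly restores the switching function and improves network stability.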
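In addition, referring to FIG. 9, FIG. 9 provides a fault handling method applied to a first device of a cross-device link aggregation switching system. The system further includes a second device, and a peer-link is connected between the first device and the second device. The fault handling method includes, but is not limited to, step S900.
Step S900: while the peer-link is in a fault state, when it is detected that the first device has failed, send a first fault notification to the second device, so that the second device, upon determining from the first fault notification that the operating state of the first device has changed to a fault state, takes over the data exchange processing between the first device and the network element.
It should be noted that for the principle and manner by which the second device takes over the data exchange processing between the first device and the network element, refer to the technical solution of the embodiment shown in FIG. 1. The difference is that this embodiment is described with the first device as the executing entity; for brevity, the details are not repeated here.
It should be noted that under the existing protocol, when the second device is in the back-off state and the first device fails, no action is taken, so the switching system stops working and network stability suffers. The second device therefore needs to be woken to take over the data exchange processing between the first device and the network element. The second device could detect the first device's operating state itself, but such detection may misjudge. To improve the accuracy with which the second device determines the first device's fault state, the first device can send a first fault notification to the second device when it detects the fault. For the specific principle and manner, refer to the description of step S300 in the embodiment shown in FIG. 3, which is not repeated here.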
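In addition, referring to FIG. 10, in an embodiment the first device is communicatively connected to the network element through a first communication link, and the first communication link and the second communication link belong to the same cross-device aggregated link. After step S900 in the embodiment shown in FIG. 9 is executed, the method further includes, but is not limited to, the following steps S1000 to S1020.
Step S1000: while the peer-link remains in the fault state, when the operating state of the first device changes to a normal operating state, obtain second update information sent by the second device over the alternative transmission link, where the second update information represents changes to the hardware entries of the second device.
Step S1010: update the hardware entries of the first device according to the second update information.
Step S1020: keep the first communication link closed.
It should be noted that the technical solution of this embodiment is similar to that of the embodiment shown in FIG. 5. The difference is that this embodiment is described with the first device as the executing entity. For the process and principle by which the first device, after returning to a normal operating state, stays in the back-off state and updates its hardware entries from the second update information, refer to the description of the embodiment shown in FIG. 5; for brevity, the details are not repeated here.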
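In addition, referring to FIG. 11, after step S1020 in the embodiment shown in FIG. 10 is executed, the method further includes, but is not limited to, the following steps S1100 to S1110.
Step S1100: when it is detected that the operating state of the peer-link has changed to a normal operating state, open the first communication link.
Step S1110: resume, through the first communication link in cooperation with the second communication link, the data exchange processing with the network element over the cross-device aggregated link.
It should be noted that the technical solution of this embodiment is similar to that of the embodiment shown in FIG. 5. The difference is that this embodiment is described with the first device as the executing entity. For the process and principle by which the first device reopens the first communication link, refer to the description of the embodiment shown in FIG. 5; for brevity, the details are not repeated here.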
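In addition, referring to FIG. 12, in an embodiment, after step S1010 in the embodiment shown in FIG. 10 is executed, the method further includes, but is not limited to, the following step S1200.
Step S1200: while the peer-link is in a fault state, when it is detected that the operating state of the second device has changed to a fault state, open the first communication link to take over the data exchange processing between the second device and the network element.
It should be noted that the technical solution of this embodiment is similar to that of the embodiment shown in FIG. 6. The difference is that this embodiment is described with the first device as the executing entity. For the process and principle by which the first device takes over from the second device, refer to the description of the embodiment shown in FIG. 6; for brevity, the details are not repeated here.
In addition, referring to FIG. 13, in an embodiment, in the embodiment shown in FIG. 12, the first device's detection of the operating state of the second device includes, but is not limited to, the following steps: step S1300, receiving a second fault notification sent by the second device over the alternative transmission link, the second fault notification being generated and sent by the second device when it fails; or step S1310, detecting that the alternative transmission link is abnormal; or step S1320, sending a keep-alive message to the second device over the alternative transmission link and receiving no keep-alive response message from the second device over the alternative transmission link within a preset keep-alive period.
It should be noted that the technical solution of this embodiment is similar to that of the embodiment shown in FIG. 3. The difference is that in this embodiment the first device detects the second device, whereas in the embodiment shown in FIG. 3 the second device detects the first device. The specific principles and steps of the detection are the same except for the executing entity and the direction of information transfer; for brevity, the details are not repeated here.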
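To better illustrate the technical solution of the present application, a specific example follows. In this example the network node has the structure shown in FIG. 14, with switch A as the first device and switch B as the second device. Switch A is provided with port A1 and port A2, and switch B is provided with port B1 and port B2. Switch A and switch B form a cross-device link aggregation switch system, in which port A1 and port B1 are bound to obtain one cross-device aggregated link and port A2 and port B2 are bound to obtain another. The alternative transmission link is a keepalive link, the fault of switch A is a restart fault, and the fault of switch A is determined by switch A actively sending a death notification message.
Based on the structure of FIG. 14 and referring to FIG. 15, the fault handling method includes, but is not limited to, the following steps S1510 to S1550.
Step S1510: the cross-device link aggregation switch system is connected to a gateway device and a server, where a first communication link connects port A1 and the gateway device, a second communication link connects port B1 and the gateway device, a third communication link connects port A2 and the server, and a fourth communication link connects port B2 and the server.
Step S1520: switch B determines that the peer-link has failed.
Step S1521: switch B disables port B1 and port B2, disconnecting the second and fourth communication links and thereby interrupting its communication with the gateway device and the server, and enters the back-off state; the keepalive link between switch A and switch B continues to communicate normally.
Step S1522: switch A keeps working and synchronizes changes to its hardware entries to switch B over the keepalive link.
Step S1530: switch A suffers a whole-machine restart fault and, before restarting, sends a death notification message to switch B over the keepalive link.
Step S1531: switch B re-enables port B1 and port B2, opens the second and fourth communication links, and restores normal communication between switch B and the gateway device and the server.
Step S1540: switch A completes the restart while the peer-link remains faulty.
Step S1541: switch A keeps port A1 and port A2 closed, keeps the first and third communication links disconnected, and stays in the back-off state.
Step S1542: switch B and switch A synchronize hardware entries over the keepalive link.
Step S1550: the peer-link fault is cleared; switch A opens port A1 and port A2, opens the first and third communication links, and rejoins the cross-device link aggregation, so that switch A and switch B again form a complete cross-device link aggregation system.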
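As the above shows, compared with existing fault handling methods for cross-device link aggregation, the technical solution of this example only requires that a communicable link exist between the first device and the second device of the cross-device link aggregation switch system. Then, if the first device becomes abnormal after the peer-link has failed, the second device can quickly take over the switching function of the entire network node, and once the first device and the peer-link return to normal the cross-device link aggregation system is re-established, so that the devices protect each other. This avoids the communication abnormality of the network node that would otherwise occur when the first device fails after the peer-link has failed. The second device can maintain a normal backup relationship throughout the process, and the two devices can keep their hardware entries synchronized, which greatly improves the stability and reliability of the network.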
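In addition, referring to FIG. 16, an embodiment of the present application further provides a cross-device link aggregation switching device. The cross-device link aggregation switching device 1600 includes a memory 1610, a processor 1620, and a computer program stored in the memory 1610 and executable on the processor 1620.
The processor 1620 and the memory 1610 may be connected by a bus or in other ways.
The non-transitory software programs and instructions required to implement the fault handling method of the above embodiments are stored in the memory 1610. When executed by the processor 1620, they perform the fault handling method applied to the second device in the above embodiments, for example method step S100 in FIG. 1, method steps S300 to S320 in FIG. 3, method steps S400 to S420 in FIG. 4, method step S500 in FIG. 5, method step S600 in FIG. 6, method step S700 in FIG. 7, and method step S800 in FIG. 8; or they perform the fault handling method applied to the first device in the above embodiments, for example method step S900 in FIG. 9, method steps S1000 to S1020 in FIG. 10, method steps S1100 to S1110 in FIG. 11, method step S1200 in FIG. 12, and method steps S1300 to S1320 in FIG. 13; or they perform the fault handling method applied to the cross-device link aggregation switching system in the above embodiments, for example method steps S1510 to S1550 in FIG. 15.
The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separate; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.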
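In addition, an embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When the instructions are executed by a processor or controller, for example by a processor in the above cross-device link aggregation switching system embodiment, the processor performs the fault handling method applied to the second device in the above embodiments, for example method step S100 in FIG. 1, method steps S300 to S320 in FIG. 3, method steps S400 to S420 in FIG. 4, method step S500 in FIG. 5, method step S600 in FIG. 6, method step S700 in FIG. 7, and method step S800 in FIG. 8; or the fault handling method applied to the first device in the above embodiments, for example method step S900 in FIG. 9, method steps S1000 to S1020 in FIG. 10, method steps S1100 to S1110 in FIG. 11, method step S1200 in FIG. 12, and method steps S1300 to S1320 in FIG. 13; or the fault handling method applied to the cross-device link aggregation switching system in the above embodiments, for example method steps S1510 to S1550 in FIG. 15. Those of ordinary skill in the art will understand that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. As is also well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The embodiments of the present application include the following: a peer-link is connected between the first device and the second device of the cross-device link aggregation switching system, and while the peer-link is in a fault state, when the second device detects that the operating state of the first device has changed to a fault state, the second device takes over the data exchange processing performed by the first device with the network element. According to the solution provided by the embodiments of the present application, when the peer-link is in a fault state and the first device fails, the second device can take over the data exchange function of the cross-device link aggregation switching system, which prevents the network node from becoming abnormal and effectively improves the stability of the communication system.
The above describes some implementations of the present application in detail, but the application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or substitutions without departing from the scope of the application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the application.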

Claims (15)

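  1. A fault handling method, applied to a second device of a cross-device link aggregation switching system, the cross-device link aggregation switching system further comprising a first device, a peer-link being connected between the first device and the second device, the fault handling method comprising:
    while the peer-link is in a fault state, when it is detected that the operating state of the first device has changed to a fault state, taking over the data exchange processing between the first device and a network element.
  2. The method according to claim 1, wherein an alternative transmission link is also connected between the first device and the second device, and the change of the operating state of the first device to a fault state is determined in at least one of the following ways:
    receiving a first fault notification sent by the first device over the alternative transmission link, the first fault notification being generated and sent by the first device when it fails;
    or,
    detecting that the alternative transmission link is abnormal;
    or,
    sending a keep-alive message to the first device over the alternative transmission link, and receiving no keep-alive response message from the first device over the alternative transmission link within a preset keep-alive period.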
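  3. The method according to claim 1, wherein the second device is communicatively connected to the network element through a second communication link, an alternative transmission link is also connected between the first device and the second device, and before it is detected that the operating state of the first device has changed to a fault state, the method comprises:
    when it is detected that the operating state of the peer-link has changed to a fault state, closing the second communication link;
    obtaining first update information sent by the first device over the alternative transmission link, wherein the first update information represents changes to the hardware entries of the first device;
    updating the hardware entries of the second device according to the first update information.
  4. The method according to claim 3, wherein the first device is communicatively connected to the network element through a first communication link, the first communication link and the second communication link belong to the same cross-device aggregated link, and after the taking over of the data exchange processing between the first device and the network element, the method further comprises:
    while the peer-link remains in the fault state, when it is detected that the operating state of the first device has changed to a normal operating state, obtaining second update information and sending the second update information to the first device over the alternative transmission link, so that the first device updates the hardware entries of the first device according to the second update information and keeps the first communication link closed, wherein the second update information represents changes to the hardware entries of the second device.
  5. The method according to claim 4, wherein after the sending of the second update information to the first device over the alternative transmission link, the method further comprises:
    when it is detected that the operating state of the peer-link has changed to a normal operating state, resuming, through the second communication link in cooperation with the first communication link, the data exchange processing with the network element over the cross-device aggregated link, wherein the first communication link is opened by the first device when it detects that the operating state of the peer-link has changed to a normal operating state.
  6. The method according to claim 4, wherein after the sending of the second update information to the first device over the alternative transmission link, the method further comprises:
    when it is detected that the second device has failed, generating a second fault notification and sending the second fault notification to the first device over the alternative transmission link, so that the first device, upon determining from the second fault notification that the operating state of the second device has changed to a fault state, takes over the data exchange processing between the second device and the network element by opening the first communication link.
  7. The method according to claim 3, wherein before the taking over of the data exchange processing between the first device and the network element, the method further comprises:
    reopening the second communication link that is in the closed state.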
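  8. A fault handling method, applied to a first device of a cross-device link aggregation switching system, the cross-device link aggregation switching system further comprising a second device, a peer-link being connected between the first device and the second device, the fault handling method comprising:
    while the peer-link is in a fault state, when it is detected that the first device has failed, sending a first fault notification to the second device, so that the second device, upon determining from the first fault notification that the operating state of the first device has changed to a fault state, takes over the data exchange processing between the first device and a network element.
  9. The method according to claim 8, wherein an alternative transmission link is also connected between the first device and the second device, and before the sending of the first fault notification to the second device, the method further comprises:
    while the peer-link is in a fault state, generating first update information, the first update information representing changes to the hardware entries of the first device;
    sending the first update information to the second device over the alternative transmission link, so that the second device updates the hardware entries of the second device according to the first update information.
  10. The method according to claim 9, wherein the first device is communicatively connected to the network element through a first communication link, the first communication link and the second communication link belong to the same cross-device aggregated link, and after the sending of the first fault notification to the second device so that the second device, upon determining from the first fault notification that the operating state of the first device has changed to a fault state, takes over the data exchange processing between the first device and the network element, the method further comprises:
    while the peer-link remains in the fault state, when the operating state of the first device changes to a normal operating state, obtaining second update information sent by the second device over the alternative transmission link, wherein the second update information represents changes to the hardware entries of the second device;
    updating the hardware entries of the first device according to the second update information;
    keeping the first communication link closed.
  11. The method according to claim 10, wherein after the updating of the hardware entries of the first device according to the second update information, the method further comprises:
    when it is detected that the operating state of the peer-link has changed to a normal operating state, opening the first communication link;
    resuming, through the first communication link in cooperation with the second communication link, the data exchange processing with the network element over the cross-device aggregated link.
  12. The method according to claim 10, wherein after the updating of the hardware entries of the first device according to the second update information, the method further comprises:
    while the peer-link is in a fault state, when it is detected that the operating state of the second device has changed to a fault state, opening the first communication link to take over the data exchange processing between the second device and the network element.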
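  13. The method according to claim 12, wherein the change of the operating state of the second device to a fault state is determined in at least one of the following ways:
    receiving a second fault notification sent by the second device over the alternative transmission link, the second fault notification being generated and sent by the second device when it fails;
    or,
    detecting that the alternative transmission link is abnormal;
    or,
    sending a keep-alive message to the second device over the alternative transmission link, and receiving no keep-alive response message from the second device over the alternative transmission link within a preset keep-alive period.
  14. A switching device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the fault handling method according to any one of claims 1 to 7, or the processor, when executing the computer program, implements the fault handling method according to any one of claims 8 to 13.
  15. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the fault handling method according to any one of claims 1 to 13.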
PCT/CN2022/098763 2021-06-15 2022-06-14 故障处理方法、交换设备、存储介质 WO2022262739A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110658832.2 2021-06-15
CN202110658832.2A CN115484218A (zh) 2021-06-15 2021-06-15 故障处理方法、交换设备、存储介质

Publications (1)

Publication Number Publication Date
WO2022262739A1 true WO2022262739A1 (zh) 2022-12-22

Family

ID=84419133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098763 WO2022262739A1 (zh) 2021-06-15 2022-06-14 故障处理方法、交换设备、存储介质

Country Status (2)

Country Link
CN (1) CN115484218A (zh)
WO (1) WO2022262739A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190230026A1 (en) * 2018-01-19 2019-07-25 Super Micro Computer, Inc. Automatic multi-chassis link aggregation configuration with link layer discovery
CN108989200A (zh) * 2018-07-11 2018-12-11 深圳市信锐网科技术有限公司 数据包转发方法、装置和系统
CN111988213A (zh) * 2020-07-16 2020-11-24 浪潮思科网络科技有限公司 一种在evpn mlag环境下同步vxlan隧道的方法及设备、介质
CN112671642A (zh) * 2021-01-14 2021-04-16 新华三信息安全技术有限公司 一种报文转发方法及设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248618A (zh) * 2023-05-08 2023-06-09 河北豪沃尔智能科技有限责任公司 信息传输装置、信息传输线路故障检测方法及设备
CN116248618B (zh) * 2023-05-08 2023-09-08 河北豪沃尔智能科技有限责任公司 信息传输装置、信息传输线路故障检测方法及设备

Also Published As

Publication number Publication date
CN115484218A (zh) 2022-12-16

Similar Documents

Publication Publication Date Title
US10764119B2 (en) Link handover method for service in storage system, and storage device
EP2951962B1 (en) Systems and methods for layer-2 traffic polarization during failures in a virtual link trunking domain
US9385944B2 (en) Communication system, path switching method and communication device
US7549078B2 (en) Redundancy in routing devices
WO2018050120A1 (zh) 主备伪线pw切换
CN112134796B (zh) 一种实现流量切换的方法、装置及系统
US8824275B2 (en) Route calculating after switching occurs from a primary main control board to a standby main control board
CN111698028B (zh) 一种fc链路检测方法、装置、设备及机器可读存储介质
EP2748989B1 (en) Methods and apparatus for avoiding inter-chassis redundancy switchover to non-functional standby nodes
WO2022262739A1 (zh) 故障处理方法、交换设备、存储介质
US9960993B2 (en) Packet network linear protection systems and methods in a dual home or multi-home configuration
CN110278094B (zh) 链路恢复方法及装置、系统、存储介质、电子装置
CN111988213A (zh) 一种在evpn mlag环境下同步vxlan隧道的方法及设备、介质
CN111586505B (zh) Pon接入系统中主备倒换业务快速恢复的实现方法及装置
CN111953591A (zh) 故障处理方法及装置
WO2013007124A1 (zh) 虚拟路由冗余协议通告链路保护方法及系统
CN115333991B (zh) 跨设备链路聚合方法、装置、系统及计算机可读存储介质
WO2021114978A1 (zh) 路由器链路更新方法、路由器及存储介质
CN111224803B (zh) 一种堆叠系统中多主检测方法及堆叠系统
CN112653596B (zh) 一种路由信息下发、网关设备切换的方法及装置
CN115567343B (zh) 一种基于mlag的以太环网主备倒换方法、设备及介质
WO2020083271A1 (zh) 一种聚合链路收敛方法、装置及存储介质
CN111083068B (zh) 一种聚合链路收敛方法、装置及存储介质
WO2016169139A1 (zh) 一种单纤故障的响应方法及装置
CN117221216A (zh) 网络设备故障处理方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22824218

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE