WO2007140689A1

WO2007140689A1 - A failure processing method and a system and a device thereof

Info

Publication number: WO2007140689A1
Application number: PCT/CN2007/001194
Authority: WO
Inventors: Jianhua Gao; Dan Li
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2006-06-09
Filing date: 2007-04-12
Publication date: 2007-12-13
Also published as: US8040793B2; EP1986371B1; EP1986371A1; CN101087207B; ES2376361T3; US20090086623A1; CN101087207A; DK1986371T3; EP1986371A4; ATE537664T1

Abstract

A failure processing method is applied to a label switched path (LSP) that comprises a first node, a second node and at least one third node. The first node and the second node are adjacent nodes between which communication breaks down, the first node restarts, and the third node is the normal node closest to the first node. When the communication between the first node and the second node is broken, the third node maintains the control state information of the LSP for a period of time. When the first node and the second node resume communication within that period, the first node, the second node and the third node restore the control state information of the LSP. The invention also provides a failure processing system and a device on an LSP. With the invention, the LSP can be recovered reliably when communication breaks down between several nodes on the LSP.

Description

Fault processing method, system and device thereof

The present invention relates to generalized multi-protocol label switching (GMPLS) technology, and more particularly to a method, system and device for fault processing in GMPLS.

Background of the invention

Two forces are driving change: the rapid growth of Internet Protocol (IP) services, and the new bandwidth utilization model introduced by wavelength division multiplexing (WDM) technology. Because IP services are bursty and unpredictable, network bandwidth must be allocated dynamically. Traditional static optical transport networks have difficulty meeting this need, and the intelligent optical network emerged in response. The intelligent optical network introduces IP-based intelligent control technology directly into the optical network. Connections can then be set up and torn down dynamically, and network resources can be allocated on demand according to traffic engineering. This also provides good network protection and restoration performance.

Introducing the GMPLS control plane into the intelligent optical network brings several benefits. The network gains strong survivability in the event of a fault, bandwidth can be requested, released and reconfigured dynamically, network management is simplified, and new value-added services become possible. When the IP-based GMPLS signaling protocol is applied to telecom and optical transport networks, its biggest challenges are stability and security.

To keep services running without interruption as far as possible, a control plane failure must not affect the services already established on the transport plane or interrupt them. In practice, the network must isolate and recover from control plane failures well, whether one or several control nodes fail. After one or more consecutive control nodes fail and recover, the signaling state established for those nodes before the failure must be restored to normal. For handling node communication failures, the Internet Engineering Task Force Request for Comments (IETF RFC) 3473 defines how the Resource Reservation Protocol with Traffic Engineering extensions (RSVP-TE) recovers from the restart of nodes in the control plane.

FIG. 1 shows a flow chart of an existing node restart processing method. Referring to Figure 1, the method includes:

In step 101, when node B loses power, node A does not receive a HELLO message from node B.

Node A and node B are two nodes on the GMPLS control plane. When both are in a normal state, they exchange HELLO messages to notify each other of the running state of their control plane software. They also periodically send refresh messages to refresh the control state information held in both nodes. After node B loses power, it can no longer send HELLO messages to node A.

In step 102, node A starts a recovery wait timer (Restart_Timer) for self-refresh.

Suppose a label switched path (LSP) passes through both node A and node B. Once node A determines that it can no longer receive HELLO messages from node B, it starts its own recovery wait timer. From then on, node A stops periodically sending refresh messages for the LSP to node B. Instead, it self-refreshes by maintaining the control state information related to the LSP. In other words, although node A receives no periodic refresh messages from neighbor node B, it still maintains the control state information of the LSP while the recovery wait timer runs. If no refresh message from the neighbor node has been received when the recovery wait timer expires, the unrefreshed LSP is deleted.

More specifically, on a functioning LSP, each node receives a Path message from its upstream node and a reservation (RESV) message from its downstream node. For that LSP, the node creates a Path State Block (PSB) and a Reservation State Block (RSB). These blocks store the control state information carried in the Path and RESV messages, such as the label value, the bandwidth value and the routing information of the LSP. The node sends Path messages to its downstream neighbor based on its PSB, and RESV messages to its upstream neighbor based on its RSB.

After node B loses power, it can send neither RESV messages to its upstream neighbor nor Path messages to its downstream neighbor. As a result, the RSB in the upstream neighbor, node A, can no longer be refreshed periodically. Node A therefore self-refreshes the RSB related to itself and node B.
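The refresh-versus-self-refresh behaviour described above can be sketched as a small soft-state model. This is a minimal illustration, not the RSVP-TE data structure: the `StateBlock` class, its field names and the numeric timeout are hypothetical. The sketch models self-refresh simply as "do not age the block out while the flag is set".

```python
from dataclasses import dataclass


@dataclass
class StateBlock:
    """Hypothetical PSB/RSB entry; field names are illustrative, not from RFC 3473."""

    lsp_id: str
    label: int
    bandwidth: float
    last_refresh: float = 0.0
    self_refresh: bool = False  # set while the neighbor cannot send refreshes

    def refresh(self, now: float) -> None:
        # A Path (for a PSB) or RESV (for an RSB) refresh from the neighbor arrived.
        self.last_refresh = now

    def expired(self, now: float, timeout: float) -> bool:
        # While self-refreshing, the node keeps the block alive itself,
        # so the missing neighbor refreshes do not age it out.
        if self.self_refresh:
            return False
        return now - self.last_refresh > timeout
```

In this model, node A would set `self_refresh = True` on its RSB for the LSP when node B's HELLO messages stop, and clear it when the recovery wait timer is stopped.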

In step 103, node A continuously sends a HELLO message to node B, requesting node B to answer.

In steps 104-105, node B powers on and restarts, starts its recovery restart timer (Recovery_Timer), and sends node A a HELLO message indicating that node B has restarted.

The recovery restart timer in node B sets a deadline. By that time, node B's neighbors must finish restoring the control state information of all LSPs passing through node B and node A. When the timer expires, node B deletes any LSP that has not been recovered.

A HELLO message usually contains fields such as the source instance (src-instance) and the destination instance (dst-instance).

- src-instance: during normal operation, this field carries a constant value of the node that sends the HELLO message. The value is saved across power-off, and after the node restarts it is set to the saved value plus 1.
- dst-instance: this field carries the src-instance value from the most recent HELLO message received from the peer node. If no HELLO message has been received from the peer, or if this is the first HELLO message sent after a restart, the field is 0.

The two fields therefore distinguish two situations:

- When a node powers off and restarts, its HELLO messages carry a src-instance value equal to its pre-restart value plus 1, and a dst-instance value of 0.
- When the node itself runs normally but the communication link between the nodes is interrupted, its HELLO messages carry a src-instance value equal to the value before the link was interrupted, and a dst-instance value of 0.

From the combination of the two values, the receiving node determines whether its neighbor has restarted or only the communication link has been interrupted.
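The classification rule above can be sketched as a small function. This is a simplified illustration under stated assumptions: the names `classify_hello` and `NeighborStatus` are hypothetical, and the logic covers only the combinations described in this document. It does not cover the full Hello procedures of RSVP-TE. Any change of src-instance is treated as a restart, which includes the "previous value + 1" case described here.

```python
from enum import Enum


class NeighborStatus(Enum):
    NORMAL = "normal"
    RESTARTED = "restarted"
    LINK_INTERRUPTED = "link_interrupted"


def classify_hello(last_src: int, msg_src: int, msg_dst: int) -> NeighborStatus:
    """Classify a neighbor from the instance fields of its latest HELLO.

    last_src: src-instance value previously recorded from this neighbor.
    msg_src, msg_dst: src-instance and dst-instance carried in the new HELLO.
    """
    if msg_src != last_src:
        # A restarted node advertises its saved value + 1, so src changes.
        return NeighborStatus.RESTARTED
    if msg_dst == 0:
        # Same src-instance, but our HELLOs are not reaching the neighbor.
        return NeighborStatus.LINK_INTERRUPTED
    return NeighborStatus.NORMAL
```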

In steps 106-107, node A takes three actions:

- It stops the recovery wait timer.
- It stops self-refreshing the control state information of the LSPs passing through node A and node B.
- It sends node B a Path message carrying a recovery label (Recovery_Label).

When node A determines from the HELLO message that node B has restarted, it initiates recovery of the LSPs passing through nodes A and B. It builds Path messages from the control state information in its own PSBs and sends them to node B, one Path message per LSP.

In steps 108-109, node B returns an acknowledgement (ACK) message for the Path message to node A, indicating that it has received the Path message. Node B then restores the control state information related to the LSP according to the received Path message.

Node B loses the control state information in its PSBs when it powers off. On receiving the Path message from node A, it therefore creates a corresponding PSB and records in it the upstream control state information contained in the Path message. If the LSP through node B continues to a downstream node, node B also sends a Path message to that downstream node, which replies with a RESV message. On receiving the RESV message, node B creates the corresponding RSB and records the control state information it carries. Once both the PSB and the RSB have been re-established in node B, recovery of the corresponding LSP is complete.
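The rule that an LSP counts as recovered in node B only once both its PSB and its RSB exist can be sketched as follows. The class and method names are hypothetical, and the message contents are reduced to plain dictionaries for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveringNode:
    """Hypothetical restarted node rebuilding per-LSP state from messages."""

    name: str
    psb: Dict[str, dict] = field(default_factory=dict)
    rsb: Dict[str, dict] = field(default_factory=dict)

    def on_recovery_path(self, lsp_id: str, upstream_state: dict) -> None:
        # A Path message carrying a recovery label re-creates the PSB.
        self.psb[lsp_id] = dict(upstream_state)

    def on_resv(self, lsp_id: str, downstream_state: dict) -> None:
        # The RESV message from the downstream neighbor re-creates the RSB.
        self.rsb[lsp_id] = dict(downstream_state)

    def lsp_recovered(self, lsp_id: str) -> bool:
        # Recovery is complete only when both state blocks are present.
        return lsp_id in self.psb and lsp_id in self.rsb
```

If the downstream node never answers, as in the multi-node case discussed next, `lsp_recovered` stays false for that LSP.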

In step 110, the recovery restart timer expires and node B deletes the LSPs that have not been recovered. That is, any LSP in node B whose control state information is still unrestored when the timer expires is deleted. This completes the existing node restart processing flow.

The above process is for the case of single-node restart. When there are multiple consecutive nodes restarting on one LSP, processing according to the above procedure will cause the LSP to be deleted.

Specifically, assume an LSP passes through nodes A, B, C and D in sequence, and nodes B and C lose power. Upstream node B restarts first, while downstream node C takes a long time to restart. The LSP is then lost as follows:

1. After node B restarts, node A determines from node B's HELLO message that node B has restarted and sends it a Path message. Node B recovers the corresponding PSB from that Path message.
2. Because node C has not yet restarted, node B receives no RESV message from node C and cannot recover its RSB. It therefore cannot send a RESV message to node A to refresh the RSB in node A.
3. Node A stopped self-refreshing as soon as it received the HELLO message from node B. If it receives no RESV refresh message from node B for a long time, the corresponding RSB in node A times out and is deleted.
4. Node A then sends a Path-Tear message to node B, instructing node B to delete its local PSB, and the corresponding LSP is deleted.

Conversely, suppose downstream node C restarts first while upstream node B takes a long time to restart. The LSP is again lost:

1. After restarting, node C receives no Path message from node B, so the PSB on node C cannot be recovered.
2. Node D stops self-refreshing its PSB once it receives the HELLO message that node C sends after restarting. Because node D then receives no Path message from node C for a long time, its PSB times out and is deleted.
3. When the timer of its own RSB expires, node D sends a Resv-Tear message to node C, instructing node C to delete its local RSB, and the corresponding LSP is deleted.

With the existing node restart processing method, therefore, an LSP cannot be reliably recovered when communication failures occur on several of its nodes, so a control plane failure affects traffic on the transport plane.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a fault processing method that can reliably recover an LSP when communication failures occur on several of its nodes.

A fault processing method is applied to a label switched path (LSP) that includes a first node, a second node and at least one third node. The first node and the second node are adjacent nodes between which communication is interrupted, the first node is restarted, and the third node is the normal node closest to the restarted first node. The method includes:

When the communication between the first node and the second node is interrupted, the third node maintains control state information of the LSP for a period of time;

The first node, the second node, and the third node restore control state information of the LSP when communication is resumed between the first node and the second node within the period of time.

An embodiment of the present invention further provides a fault processing system that includes a first node, a second node and at least one third node. The first node and the second node are adjacent nodes between which communication is interrupted, the first node is restarted, and the third node is the normal node closest to the restarted first node. In this system:

The third node is configured to maintain the control state information of the LSP for a period of time when communication between the first node and the second node is interrupted. It restores that information if the first node and the second node resume communication within the period;

The first node is configured to restore control state information of the LSP if the first node and the second node resume communication during the period of time;

The second node is configured to restore control state information of the LSP if the first node and the second node resume communication during the period of time.

An embodiment of the present invention further provides a device on a label switched path (LSP). The LSP includes a first device, a second device and at least one third device. The first device and the second device are adjacent devices between which communication is interrupted, the first device is restarted, and the third device is the normal device closest to the restarted first device. When the device is the third device, it includes:

a first module, configured to start timing when communication between the first device and the second device is interrupted, and to stop timing if the two devices resume communication before the elapsed time exceeds the period of time;

a second module, configured to maintain the control state information of the device while communication between the first device and the second device is interrupted and the time measured by the first module has not exceeded the period of time. The second module restores the control state information if the two devices resume communication within that period.
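The two modules of the third device can be sketched as two small cooperating classes. This is a minimal sketch under stated assumptions: the class names, the float time values and the choice to leave deletion to the caller after the period has elapsed are all illustrative.

```python
from typing import Optional


class TimingModule:
    """First module: measures how long communication has been interrupted."""

    def __init__(self, period: float) -> None:
        self.period = period  # the "period of time"; units are an assumption
        self.started_at: Optional[float] = None

    def start(self, now: float) -> None:
        self.started_at = now

    def stop(self) -> None:
        self.started_at = None

    def within_period(self, now: float) -> bool:
        # True while timing runs and the elapsed time has not exceeded the period.
        return self.started_at is not None and now - self.started_at <= self.period


class StateModule:
    """Second module: maintains, then restores, the device's LSP control state."""

    def __init__(self, timing: TimingModule) -> None:
        self.timing = timing
        self.maintaining = False
        self.restored = False

    def on_interrupted(self, now: float) -> None:
        # Communication between the first and second devices is interrupted.
        self.timing.start(now)
        self.maintaining = True  # keep (self-refresh) the control state

    def on_resumed(self, now: float) -> bool:
        # Restore only if communication resumed within the period; otherwise
        # this sketch just reports failure and leaves deletion to the caller.
        ok = self.timing.within_period(now)
        self.timing.stop()
        self.maintaining = False
        self.restored = ok
        return ok
```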

It can be seen from the above technical solution that the method according to the inventive concept can reliably recover the LSP when a communication failure occurs on multiple nodes on the LSP. Specifically, the present invention has the following beneficial effects:

Consecutive nodes may suffer a communication failure because a node has been powered off for a long time or the link between nodes is interrupted, and the nodes may return to normal at different times. In that case, the nodes on the LSP keep the control state information of the LSP by self-refreshing it during the recovery waiting time. If all the failed nodes return to normal within the recovery waiting time, the control state information not yet restored in the nodes on the LSP is recovered. This effectively prevents abnormal deletion of LSPs and improves the reliability of LSP recovery.

In addition, when the LSP cannot be recovered because of a failure other than a communication interruption, the LSP can be deleted quickly. The faulty connection is thus removed quickly and accurately. This helps release the network resources occupied by the LSP promptly and improves network resource utilization.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a conventional node restart processing method;

FIG. 2 is an exemplary flowchart of a fault processing method of the present invention;

FIG. 3 is a schematic diagram of the operation states of the nodes on an LSP according to Embodiments 1, 2 and 3 of the present invention;

FIG. 4 is a flowchart of a fault processing method according to Embodiment 1 of the present invention;

FIG. 5 is a flowchart of a fault processing method according to Embodiment 2 of the present invention;

FIG. 6 is a flowchart of a fault processing method according to Embodiment 3 of the present invention;

FIG. 7 is a schematic diagram of the operation states of the nodes on an LSP according to Embodiments 4 and 5 of the present invention;

FIG. 8 is a flowchart of a fault processing method according to Embodiment 4 of the present invention;

FIG. 9 is a flowchart of a fault processing method according to Embodiment 5 of the present invention.

Mode for carrying out the invention

In order to make the technical solutions and advantages of the present invention more comprehensible, the present invention will be further described in detail below with reference to the accompanying drawings.

An embodiment of the present invention describes a fault processing method. In this method, the normal node on a label switched path (LSP) that is closest to the restarting node self-refreshes the control state information of the LSP during the recovery waiting time.

FIG. 2 is an exemplary flowchart of the fault processing method of the present invention. As shown in FIG. 2, the method includes:

In step 201, when at least one node on the LSP restarts, the restarting node determines that its control channel communication with a neighboring node is interrupted;

In step 202, the nodes on the LSP maintain the control state information of the LSP within the recovery waiting time. When the restarting node and the neighboring node resume communication within that time, the nodes on the LSP restore the control state information of the LSP.

In the method of the present invention, if the restarting node and the neighboring node are still in the control channel communication interruption state after the recovery waiting time expires, the node on the LSP deletes the control state information corresponding to the LSP.

The method according to the inventive concept can be carried out in three processing modes:

In the first mode, the restarting node sends communication interruption fault information after determining that communication with its neighboring node is interrupted. On receiving this information, the normal node on the LSP closest to the restarting node times the recovery waiting time. During that time, the nodes on the LSP perform recovery waiting processing by maintaining the control state information of the LSP. When the restarting node resumes normal communication with the neighboring node, they stop the recovery waiting processing and restore the control state information of the LSP;

In the second mode, the restarting node sends communication interruption fault information after determining that communication with the neighboring node is interrupted, and it times the recovery waiting time itself. During that time, the nodes on the LSP perform recovery waiting processing by maintaining the control state information of the LSP. When the restarting node resumes normal communication with the neighboring node, they stop the recovery waiting processing and restore the control state information of the LSP;

In the third mode, the restarting node constructs a normal recovery response message after determining that communication with the neighboring node is interrupted. It sends this message to the normal node on the LSP and times the recovery waiting time itself. During that time, the nodes on the LSP perform recovery waiting processing by maintaining the control state information of the LSP. When the restarting node resumes normal communication with the neighboring node, they stop the recovery waiting processing and restore the control state information of the LSP.
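The three modes differ in two respects: which node times the recovery waiting time, and what the restarting node sends toward the normal node. The table-like sketch below captures only those two differences. The names `ModeProfile`, `MODES` and `timer_node` are hypothetical. In all three modes, the nodes on the LSP maintain the control state information while the wait runs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModeProfile:
    timer_on_restarting_node: bool  # False: the closest normal node times the wait
    message_from_restarting_node: str


MODES: Dict[int, ModeProfile] = {
    1: ModeProfile(False, "communication interruption fault information"),
    2: ModeProfile(True, "communication interruption fault information"),
    3: ModeProfile(True, "constructed normal recovery response message"),
}


def timer_node(mode: int, restarting: str, closest_normal: str) -> str:
    """Return the node that runs the recovery wait timer in the given mode."""
    return restarting if MODES[mode].timer_on_restarting_node else closest_normal
```

With the topology of FIG. 3 (restarting node B, closest normal node A), mode 1 puts the timer on node A, as in Embodiment 1. Mode 2 puts it on node B, as in Embodiment 2.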

The restarting node and its neighboring node may be in the control channel communication interruption state for either of two reasons:

- the neighboring node has powered off and has not yet restarted, or
- the communication link between the two nodes is interrupted while the neighboring node runs normally.

Whichever reason applies, the multi-node fault processing method according to the inventive concept performs the same operations.

The fault processing method of the present invention is described below through five embodiments. Each takes as an example the case where the neighboring node of the restarting node does not restart for a long time.

Example 1

This embodiment takes as an example an LSP passing through nodes A, B, C and D in sequence. FIG. 3 shows the operation states of the nodes on the LSP. As shown in FIG. 3, node B and node C lose power; node B restarts, while node C cannot restart for a long time. The roles of the nodes are:

- node B is the restarting node, and node C is its neighboring node;
- node A is the normal node closest to node B in the upstream direction;
- node D is the normal node closest to node B in the downstream direction.

Following the first mode, the recovery waiting time is set on the normal node on the LSP closest to the restarting node, that is, node A.

FIG. 4 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 4, the method includes the following steps.

In steps 401-402, when node B powers on and restarts, it sends HELLO messages to node A and node C. Through the HELLO mechanism, it learns that its control channel communication with node C is interrupted.

In the HELLO message sent by node B, the src-instance value is the pre-power-down value plus 1 and the dst-instance value is 0, so node A and node C learn that node B has restarted.

After sending a HELLO message to node C, node B receives no response, so it determines that its communication with node C is interrupted. The cause may be that node C has powered off and not yet restarted, or that the communication link between node B and node C has failed.

In steps 403-404, node A sends a recovery message to node B instructing it to restore the control state information of the LSP. Node B performs the recovery operation according to the received message and notifies node A of the communication interruption fault between node B and node C.

Here, node A sends node B a recovery message, such as a Path message carrying a recovery label. Node B creates a PSB from the received message and saves in it the control state information the message carries, which recovers its local control state information.

Normally, node B would send a recovery message to node C after recovering its PSB. Because node C has not yet restarted, however, the RSB in node B cannot be recovered by the normal protocol procedure described in RFC3473. Node B therefore notifies node A of the communication interruption fault between node B and node C, sending the notification in the direction opposite to that of the received recovery message.

In step 405, node A starts the recovery wait timer, which times the recovery waiting time, and maintains the RSB control state information of the LSP through self-refresh processing.

From the communication interruption information received from node B, node A determines that the LSP cannot be recovered for the time being. It therefore starts the recovery wait timer. While the timer runs, node A self-refreshes the control state information in its RSB for the LSP, which prevents the RSB's own timer from timing out and deleting the RSB. The recovery wait timer may be configured in advance and started in this step, or both configured and started in this step.

In addition, node D, the downstream neighbor of node C, is also in the communication interruption state with node C. It therefore self-refreshes the control state information of the LSP, for example its PSB for the LSP, until the recovery wait timer in node D times out.

In steps 406-407, when node C restarts before the recovery wait timer in node A expires, node C sends HELLO messages to node B and node D, indicating that it has powered on and restarted.

In steps 408-412, node B sends a recovery message to node C. After node C has recovered its control state information, it sends a recovery message to node D, and node D enters the normal control state information refresh process. Node D then sends a recovery response message to node A via node C and node B. Node A stops the recovery wait timer and enters the normal refresh process.

After receiving the recovery message, node C determines whether it is the destination node of the LSP. If node C is the destination node, recovery proceeds as follows:

1. Node C restores the control state information in its PSB and RSB for the LSP and sends a recovery response message, such as a RESV message, to node B.
2. On receiving the recovery response message, node B restores its RSB for the LSP and sends a recovery response message to node A.
3. On receiving the recovery response message, node A stops its recovery wait timer and starts normal control state information refresh processing.

If node C is not the destination node, recovery proceeds as follows:

1. Node C restores the control state information in its PSB for the LSP and forwards the recovery message downstream to the normal node D.
2. On receiving the recovery message, node D synchronizes its PSB for the LSP and starts normal control state information refresh processing.
3. Node D then sends the recovery response message upstream hop by hop. Node C and node B, on receiving it, restore the control state information in their RSBs for the LSP.
4. On receiving the recovery response message, node A stops its own recovery wait timer and starts normal control state information refresh processing.
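The hop-by-hop exchange above can be sketched as an ordered list of messages. This is a simplified sketch under stated assumptions: the function name is hypothetical, the recovery message is labeled "Path" and the recovery response "RESV", and the last node in `path` is the node that answers (node C when it is the destination, otherwise node D).

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (message type, sender, receiver)


def recovery_messages(path: List[str], first_sender: int) -> List[Message]:
    """Recovery exchange after the late node restarts (Embodiment 1 sketch).

    path: nodes from the upstream normal node to the node that answers.
    first_sender: index of the node that sends the first recovery Path
    toward the restarted node (node B in FIG. 3).
    """
    msgs: List[Message] = []
    # Recovery messages travel downstream to the last node of the path.
    for i in range(first_sender, len(path) - 1):
        msgs.append(("Path", path[i], path[i + 1]))
    # Recovery responses travel upstream hop by hop; the one reaching
    # path[0] (node A) lets it stop its recovery wait timer.
    for i in range(len(path) - 1, 0, -1):
        msgs.append(("RESV", path[i], path[i - 1]))
    return msgs
```

For example, `recovery_messages(["A", "B", "C", "D"], 1)` yields the non-destination case: Path B to C, Path C to D, then RESV D to C, C to B and B to A.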

Suppose instead that the communication between node B and node C was interrupted because the link between them failed, and node C is the destination node. In that case, node C restores its own control state information after receiving the recovery message and starts normal refresh processing.

This completes the node failure processing flow in this embodiment.

The above describes the case where node C restarts before the recovery wait timer in node A times out. If node C still has not restarted when the timer reaches the preset recovery waiting time, that is, when the timer expires, the LSP is deleted as follows:

- Node A initiates deletion of the unrecovered LSP and sends a message to delete it.
- On receiving the LSP deletion message, node B deletes its control state information for the LSP.
- Node D deletes its control state information for the LSP when its local restart timer expires.

Even if node C restarts at some point after node A's recovery wait timer has expired, node D never receives a recovery message from upstream node C, so the control state information of the LSP on node D is still deleted.

Thus, communication failures may occur on several nodes of the LSP, and the nodes may return to normal at different times. In that case, the normal node closest to the restarting node keeps its control state information for the LSP by self-refreshing it within the set recovery waiting time. If all the failed nodes return to normal within the recovery waiting time, the unrecovered control state information in the nodes on the LSP is restored. This effectively prevents abnormal deletion of the LSP and improves the reliability of LSP recovery.

Some other, non-communication fault may also occur on the LSP. For example, the control state information of the LSP may already have been deleted in an upstream node, so the downstream node quickly deletes its own LSP control state information. In that case, this embodiment can delete the LSP following the method described in RFC3473. The faulty connection is therefore removed quickly and accurately, which helps release the network resources occupied by the LSP promptly and improves network resource utilization.
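The timing difference between the existing method and this embodiment can be sketched as a rough comparison, seen from node A. Several simplifying assumptions apply, and none of them comes from the text: message latency is zero, and node A reacts at the moment node B restarts. In addition, `rsb_timeout` and `recovery_wait` are hypothetical parameters standing for the RSB refresh timeout and the configured recovery waiting time. Node D's side of the LSP is ignored.

```python
def lsp_survives(b_restart: float, c_restart: float, rsb_timeout: float,
                 recovery_wait: float, with_recovery_wait: bool) -> bool:
    """Does node A keep the LSP until node C restarts? (rough timing model)"""
    if with_recovery_wait:
        # Embodiment 1: node A self-refreshes its RSB until the recovery
        # wait timer, started on node B's fault report, expires.
        deadline = b_restart + recovery_wait
    else:
        # Existing method: node A stops self-refreshing on node B's HELLO,
        # so its RSB ages out rsb_timeout later with no RESV from node B.
        deadline = b_restart + rsb_timeout
    return c_restart <= deadline
```

With node B restarting at t = 10, node C at t = 200, an RSB timeout of 90 and a recovery waiting time of 300, the existing method loses the LSP but the embodiment keeps it. If node C restarts only at t = 400, the recovery wait timer also expires and the LSP is deleted.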

Example 2

This embodiment again takes the LSP shown in FIG. 3 as an example: node B restarts, and node C fails to restart for a long time. Following the second mode, the recovery waiting time is set on the restarting node, that is, node B. Node B also sends the communication interruption fault information to node A.

FIG. 5 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 5, the method includes the following steps.

In steps 501-502, when node B powers on and restarts, it sends HELLO messages to node A and node C. Through the HELLO mechanism, it learns that its control channel communication with node C is interrupted.

In the HELLO message sent by node B, the src-instance value is the pre-power-down value plus 1 and the dst-instance value is 0, so node A and node C learn that node B has restarted.

In steps 503-505, the following happens:

1. Node A sends a recovery message to node B instructing it to restore the control state information of the LSP.
2. Node B performs the recovery operation according to the received message, starts a recovery wait timer, and notifies node A of the communication interruption fault between itself and node C.
3. Node A self-refreshes the RSB control state information of the LSP.

Because node C has not restarted, the RSB in node B cannot be recovered by the normal protocol procedure described in RFC3473. After recovering its PSB, node B therefore starts a recovery wait timer that times the recovery waiting time. It also notifies node A of the communication interruption fault between node B and node C. On receiving this fault information from node B, node A self-refreshes its RSB control state information for the LSP. The recovery wait timer may be configured in advance and started in this step, or both configured and started in this step.

In addition, node D, the downstream neighbor of node C, is also in a communication interruption state with node C. Therefore, node D self-refreshes the control state information corresponding to the LSP as long as the recovery wait timer in node D has not timed out.

In steps 506-507, node C restarts before the recovery wait timer in node B times out, and sends a HELLO message to node B and node D indicating that it has been powered on and restarted.

In steps 508-512, node B stops the recovery wait timer, enters the normal refresh process, and sends a recovery message to node C. After completing the recovery of its control state information, node C sends a recovery message to node D, and node D enters the normal refresh process; node D then sends a recovery response message to node A through node C and node B.

After learning through the HELLO mechanism that node C has restarted, node B stops its own recovery wait timer. After receiving the recovery message, node C determines whether it is the destination node of the LSP. If node C is the destination node, it restores the control state information in the PSB and the RSB corresponding to the LSP and sends a recovery response message, such as a RESV message, to node B; after receiving the recovery response message, node B restores the RSB corresponding to the LSP and sends a recovery response message to node A; after receiving the recovery response message, node A synchronizes the control state information in its RSB and starts normal control state information refresh processing.

If node C is not the destination node, it restores the control state information in the PSB corresponding to the LSP and forwards the recovery message downstream to the normal node D; after receiving the recovery message, node D synchronizes the PSB corresponding to the LSP and starts normal control state information refresh processing. Node D then sends the recovery response message hop by hop in the upstream direction, and node C and node B, on receiving it, restore the control state information in the RSB corresponding to the LSP. After receiving the recovery response message, node A synchronizes the control state information in its own RSB and starts normal control state information refresh processing.
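The hop-by-hop exchange in the two cases above can be traced with a small simulation. This sketch rests on simplifying assumptions: messages are not modelled, only the order of PSB/RSB actions; `normal` is a hypothetical set of nodes that kept their control state; and the recovery message is assumed to stop at the first normal downstream node or at the destination, which then answers upstream.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (node, action)


def hop_by_hop_recovery(path: List[str], sender: str, normal: Set[str]) -> List[Event]:
    """Trace PSB/RSB actions after `sender` sends a recovery message downstream.

    `path` lists the LSP nodes from source to destination; `normal` holds the
    nodes whose control state was never lost. `sender` must not be the last node.
    """
    events: List[Event] = []
    i = path.index(sender) + 1
    # Downstream leg: recovering nodes restore their PSB and forward the
    # recovery message until it reaches a normal node or the destination.
    while True:
        node = path[i]
        if node in normal:
            events.append((node, "sync PSB"))
            break
        events.append((node, "restore PSB"))
        if i == len(path) - 1:  # this node is the destination of the LSP
            events.append((node, "restore RSB"))
            break
        i += 1
    # Upstream leg: the recovery response (e.g. RESV) travels back hop by hop;
    # recovering nodes restore their RSB, the first normal node synchronizes it.
    for j in range(i - 1, -1, -1):
        node = path[j]
        if node in normal:
            events.append((node, "sync RSB"))
            break
        events.append((node, "restore RSB"))
    return events
```

With path A, B, C, D, sender B and normal nodes A and D, the trace follows the non-destination case: C restores its PSB, D synchronizes, then C and B restore their RSBs and A synchronizes. With path A, B, C the trace follows the destination case.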

So far, the fault processing flow in this embodiment is completed.

The above describes the case where node C restarts before the recovery wait timer in node B times out. If node C has still not restarted when the recovery wait timer reaches the preset recovery waiting time, that is, after the timer stops counting, node B initiates deletion of the unrecovered LSP toward node A and sends node A a message for deleting the LSP. After receiving this message from node B, node A deletes the control state information corresponding to the LSP. Node D deletes its control state information corresponding to the LSP when its local restart timer times out; moreover, even if node C restarts at some point after the recovery wait timer in node B has expired, node D never receives a recovery message from upstream node C and therefore deletes the control state information corresponding to the LSP.

When communication failures occur on multiple nodes on the LSP and the nodes return to normal at different times, the restarting node maintains the control state information corresponding to the LSP within the recovery waiting time, and if all nodes with communication failures return to normal within the recovery waiting time, the unrecovered control state information of the nodes on the LSP is restored. This effectively prevents the LSP from being deleted abnormally and improves the reliability of LSP recovery. If a fault other than a communication fault occurs on the LSP, the LSP can be deleted quickly according to the method described in RFC3473, so that the faulty connection is removed quickly and accurately, which facilitates the rapid release of the network resources occupied by the LSP and improves network resource utilization.
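The distinction drawn above between communication faults and other faults can be summarised as a small decision function. It is an illustrative sketch only: `FaultKind`, `lsp_outcome` and the outcome strings are hypothetical, and `resumed_after` stands for the time, measured from the start of the recovery wait timer, at which all interrupted nodes are back in communication (`None` if they never are).

```python
from enum import Enum, auto
from typing import Optional


class FaultKind(Enum):
    COMM_INTERRUPTION = auto()  # control-channel loss: neighbor down or link cut
    OTHER = auto()              # any other fault on the LSP


def lsp_outcome(kind: FaultKind, wait_time: float, resumed_after: Optional[float]) -> str:
    """Decide what happens to the LSP's control state (hypothetical helper)."""
    if kind is FaultKind.OTHER:
        # Other faults: delete the LSP at once, as described in RFC 3473.
        return "delete immediately"
    if resumed_after is not None and resumed_after < wait_time:
        # Communication came back inside the recovery waiting time: keep the LSP.
        return "recover"
    # Timer expired first: the node holding the timer initiates deletion.
    return "delete after wait"
```

The function only summarises the outcomes described in the text; it says nothing about how each node carries them out.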

Example 3

In this embodiment, the LSP shown in Figure 3 is again taken as an example: node B restarts and node C fails to restart for a long time. Here, according to mode 3, the recovery waiting time is set on the restarting node, node B, and a normal recovery response message is sent to node A.

FIG. 6 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 6, the method includes:

In steps 601-602, when node B is powered on and restarts, node B sends a HELLO message to node A and node C, and learns through the HELLO message mechanism that the control channel communication between node B and node C is interrupted.

After sending the HELLO message to node C, node B receives no response from node C and therefore determines that node B and node C are in a communication interruption state. The interruption may occur either because node C has powered off and not yet restarted or because the communication link between node B and node C is faulty.

In steps 603-605, node A sends a recovery message to node B instructing it to restore the control state information corresponding to the LSP. Node B performs the recovery operation according to the received recovery message, starts a recovery wait timer, creates a PSB according to the recovery message sent by node A and the transport plane information saved in node B, constructs a normal recovery response message, and sends it to node A. After receiving the recovery response message, node A synchronizes the RSB control state information corresponding to the LSP and starts normal refresh processing.

Since node C has not restarted, node B can send refresh messages such as RESV messages to upstream node A as usual, but it receives no refresh messages such as RESV messages from node C, so node B holds only part of the RSB control state information. Therefore, after completing the recovery of the PSB and the partial recovery of the RSB, node B starts the recovery wait timer, which runs for the recovery waiting time, and self-refreshes the RSB control state information corresponding to the LSP. At the same time, node B constructs a normal recovery response message and sends it to node A. The purpose of this operation is that only the restarting node, node B, knows that its communication with node C is interrupted, while the other normal nodes on the LSP continue to refresh the LSP in the normal manner.
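Mode 3 differs from mode 2 mainly in what the upstream node sees. The following minimal sketch uses hypothetical names, with dictionaries standing in for the PSB/RSB contents and for the transport-plane label saved in node B. It shows the restarting node answering with an ordinary recovery response while it privately holds a partial RSB and runs its recovery wait timer.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Mode3RestartingNode:
    """Hypothetical restarting node (node B) following mode 3."""
    transport_plane: Dict[str, int]  # LSP id -> label preserved in the data plane
    psb: Dict[str, dict] = field(default_factory=dict)
    rsb: Dict[str, dict] = field(default_factory=dict)
    self_refreshed: Set[str] = field(default_factory=set)
    timer_running: bool = False

    def on_recovery_message(self, lsp_id: str, recovery_msg: dict, downstream_up: bool) -> dict:
        label = self.transport_plane[lsp_id]
        # Create the PSB from the upstream recovery message plus local transport-plane info.
        self.psb[lsp_id] = {**recovery_msg, "label": label}
        # Without a RESV from the downstream neighbor only part of the RSB exists.
        self.rsb[lsp_id] = {"label": label, "partial": not downstream_up}
        if not downstream_up:
            self.timer_running = True         # start the recovery wait timer
            self.self_refreshed.add(lsp_id)   # keep the partial RSB alive locally
        # The reply looks like any normal recovery response, so upstream nodes
        # refresh the LSP normally and only this node knows about the interruption.
        return {"type": "RESV", "lsp": lsp_id, "label": label}
```

Because the reply carries no failure indication, node A needs no mode-specific logic. This is the design point of mode 3.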

In addition, node D, the downstream neighbor of node C, is also in a communication interruption state with node C. Therefore, node D self-refreshes the control state information corresponding to the LSP as long as its recovery wait timer has not timed out; for example, node D self-refreshes the PSB corresponding to the LSP.

In steps 606-607, node C restarts before the recovery wait timer in node B times out, and sends a HELLO message to node B and node D indicating that it has been powered on and restarted.

In steps 608-611, node B stops the recovery wait timer, enters the normal refresh process, and sends a recovery message to node C. After completing the recovery of its control state information, node C sends a recovery message to node D, and node D enters the normal refresh process; node D then sends a recovery response message to node C and node B, which perform recovery according to it.

After learning through the HELLO mechanism that node C has restarted, node B stops its own recovery wait timer. After receiving the recovery message from node B, node C determines whether it is the destination node of the LSP.

If node C is the destination node, it restores the control state information in the PSB and the RSB corresponding to the LSP and sends a recovery response message, such as a RESV message, to node B; after receiving the recovery response message, node B restores the control state information in the RSB corresponding to the LSP and starts normal control state information refresh processing with node C.

If node C is not the destination node, it restores the control state information in the PSB corresponding to the LSP and forwards the recovery message downstream to the normal node D; after receiving the recovery message, node D recovers the PSB corresponding to the LSP and starts normal control state information refresh processing. Node D then sends a recovery response message hop by hop in the upstream direction; node C, on receiving it, recovers the control state information in the RSB corresponding to the LSP, and node B, on receiving it, restores its own control state information and starts normal control state information refresh processing with node C.

So far, the fault processing flow in this embodiment is completed.

The above describes the case where node C restarts before the recovery wait timer in node B times out. If node C has still not restarted when the recovery wait timer reaches the set recovery waiting time, that is, after the timer stops counting, node B initiates deletion of the unrecovered LSP toward node A and sends node A a message for deleting the LSP. After receiving this message, node A deletes the control state information corresponding to the LSP. Node D deletes its local control state information corresponding to the LSP when its local restart timer times out; moreover, even if node C restarts at some point after the recovery wait timer in node B has expired, node D never receives a recovery message from upstream node C and therefore deletes the control state corresponding to the LSP.

When communication failures occur on multiple nodes on the LSP and the nodes return to normal at different times, the restarting node maintains the control state information corresponding to the LSP within the recovery waiting time, and if all nodes with communication failures return to normal within the recovery waiting time, the unrecovered control state information of the nodes on the LSP is restored. This effectively prevents the LSP from being deleted abnormally and improves the reliability of LSP recovery. If a fault other than a communication fault occurs on the LSP, the LSP can be deleted quickly according to the method described in RFC3473, so that the faulty connection is removed quickly and accurately, which facilitates the rapid release of the network resources occupied by the LSP and improves network resource utilization.

Example 4

In this embodiment, an LSP passing through nodes A, B, C, and D in sequence is taken as an example. FIG. 7 is a schematic diagram of the operating state of the nodes in this embodiment. As shown in FIG. 7, node B and node C are powered down; node C restarts, while node B cannot restart for a long time. Node C is the restarting node, node B is the neighbor node of node C, node A is the normal node in the upstream direction closest to the restarting node C, and node D is the normal node in the downstream direction closest to the restarting node C. Here, according to mode 1, the recovery waiting time is preset in the normal node on the LSP closest to the restarting node, that is, node D.

FIG. 8 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 8, the method includes the following steps.

In steps 801-802, when node C is powered on and restarts, node C sends a HELLO message to node B and node D, and learns through the HELLO message mechanism that the control channel between node C and node B is in a communication interruption state.

After sending the HELLO message to node B, node C receives no response from node B and therefore determines that node B and node C are in a communication interruption state. The interruption may occur either because node B has powered off and not yet restarted or because the communication link between node B and node C is faulty.

In steps 803-804, node D sends a recovery message to node C instructing it to restore the control state information corresponding to the LSP; node C performs the recovery operation according to the received recovery message and notifies node D of the communication interruption failure information between node B and node C.

Here, the recovery message sent by node D may be a Recovery_Path message. Node C recovers the control state information in the PSB corresponding to the LSP according to the received recovery message. Since node B has not restarted, node C notifies node D of the communication interruption failure information between itself and node B.

In step 805, node D starts a recovery wait timer and self-refreshes the PSB control state information corresponding to the LSP.

Based on the communication interruption failure information from node C, node D determines that the LSP temporarily cannot be recovered. It then starts the recovery wait timer, which runs for the preset recovery waiting time, and self-refreshes the control state information in the PSB corresponding to the LSP, which prevents the timer associated with the PSB from timing out and deleting the PSB. In addition, node A, the upstream neighbor of node B, is in a communication interruption state with node B. Therefore, node A self-refreshes the control state information corresponding to the LSP as long as its recovery wait timer has not timed out; for example, node A self-refreshes the RSB corresponding to the LSP.
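The role of self-refresh can be seen in a toy model of RSVP soft state, in which a PSB is deleted once no refresh has arrived for `lifetime` seconds. The sketch below is an illustration under assumptions, not the patent's implementation. `SoftState`, `psb_alive_at` and the numeric values are hypothetical, and refreshes from node C are assumed to stop at time 0.

```python
from dataclasses import dataclass


@dataclass
class SoftState:
    """Toy soft state: deleted if not refreshed within `lifetime` seconds."""
    lifetime: float
    expires_at: float

    def refresh(self, now: float) -> None:
        self.expires_at = now + self.lifetime

    def alive(self, now: float) -> bool:
        return now < self.expires_at


def psb_alive_at(t_end: float, lifetime: float, refresh_interval: float,
                 wait_time: float, self_refresh: bool) -> bool:
    """Is node D's PSB still present at t_end if refreshes from node C stopped at t=0?"""
    psb = SoftState(lifetime, expires_at=lifetime)  # last real refresh at t=0
    t = refresh_interval
    # While the recovery wait timer runs, node D refreshes the PSB itself.
    while self_refresh and t < min(t_end, wait_time):
        psb.refresh(t)
        t += refresh_interval
    return psb.alive(t_end)
```

With a 30 s lifetime, a 10 s refresh interval and a 120 s recovery wait timer, the PSB disappears at 30 s without self-refresh. With self-refresh it survives while the timer runs, and it still ages out after the timer expires, which fits the intent that unrecovered LSPs are eventually removed.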

In steps 806-807, when Node B restarts before the recovery wait timer in Node D times out, Node B sends a HELLO message to Node A and Node C, indicating that Node B is powered on and restarted.

In steps 808-813, node A and node C both send a recovery message to node B. After completing the recovery of its control state information, node B sends a recovery message to node C. After completing the synchronization of its control state information, node C sends a recovery message to node D. After receiving the recovery message, node D stops the recovery wait timer, synchronizes its own control state information, and enters the normal refresh process; node D then sends a recovery response message to the normal node A in the upstream direction through node C and node B.

After receiving the recovery message from node C in the downstream direction, node B determines whether it is the source node of the LSP.

If node B is the source node, it restores the control state information in the PSB and the RSB corresponding to the LSP and sends a recovery message, such as a Path message carrying the recovery label, to node C; node C synchronizes the PSB corresponding to the LSP according to the received recovery message and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information, stops its own recovery wait timer, and starts normal control state information refresh processing.

If node B is not the source node of the LSP, node B keeps the partial PSB control state information it already holds and performs normal refresh processing in the downstream direction. When node B receives the recovery message sent by node A in the upstream direction, it sends a recovery message, such as a Path message carrying the recovery label, to node C. Node C synchronizes the PSB corresponding to the LSP according to the received recovery message and sends a recovery message to node D; after receiving it, node D synchronizes its own control state information, stops its own recovery wait timer, and starts normal control state information refresh processing. Node D then sends a recovery response message, such as a RESV message, through node C and node B to the normal node A in the upstream direction; node C and node B restore their local RSB control state information on receiving it, and node A, on receiving it, synchronizes its local RSB control state information and enters normal refresh processing.

So far, the fault processing flow in this embodiment is completed.

The above describes the case where node B restarts before the recovery wait timer in node D expires. If node B has still not restarted when the recovery wait timer reaches the preset recovery waiting time, that is, after the timer stops counting, node D initiates deletion of the unrecovered LSP.

This embodiment is similar to the first embodiment: when communication failures occur on multiple nodes on the LSP and the nodes return to normal at different times, it prevents the LSP from being deleted abnormally and improves the reliability of LSP recovery. When a fault other than a communication failure occurs, this embodiment can delete the LSP in combination with the method described in RFC3473, so that the faulty connection is removed quickly and accurately, which facilitates the rapid release of the network resources occupied by the LSP and improves network resource utilization.

Example 5

In this embodiment, the LSP shown in FIG. 7 is again taken as an example: node C restarts and node B cannot restart for a long time. Here, according to mode 2, the recovery waiting time is set in advance on the restarting node, node C.

FIG. 9 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 9, the method includes:

In steps 901-902, when node C is powered on and restarts, node C sends a HELLO message to node B and node D, and learns through the HELLO message mechanism that the control channel between node C and node B is in a communication interruption state.

In steps 903-905, node D sends a recovery message to node C instructing it to restore the control state information corresponding to the LSP; node C performs the recovery operation according to the received recovery message, starts a recovery wait timer, and notifies node D of the communication interruption failure information between node B and node C; node D then self-refreshes the PSB control state information corresponding to the LSP.

In steps 906-907, node B restarts before the recovery wait timer in node C times out, and sends a HELLO message to node A and node C indicating that it has been powered on and restarted.

In steps 908-913, node A and node C both send a recovery message to node B. After completing the recovery of its control state information, node B sends a recovery message to node C; after completing the synchronization of its control state information, node C stops its recovery wait timer and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information and enters the normal refresh process; node D then sends a recovery response message through node C and node B to the normal node A in the upstream direction.

If node B is the source node, it restores the control state information in the PSB and the RSB corresponding to the LSP and sends a recovery message, such as a Path message carrying the recovery label, to node C; node C synchronizes the PSB corresponding to the LSP according to the received recovery message, stops its own recovery wait timer, and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information and starts normal control state information refresh processing.

If node B is not the source node of the LSP, node B keeps the partial PSB control state information it already holds and performs normal refresh processing in the downstream direction. When node B receives the recovery message sent by node A in the upstream direction, it sends a recovery message, such as a Path message carrying the recovery label, to node C. Node C synchronizes the PSB corresponding to the LSP according to the received recovery message, stops its own recovery wait timer, and sends a recovery message to node D; after receiving it, node D synchronizes its own control state information and starts normal control state information refresh processing. Node D then sends a recovery response message, such as a RESV message, hop by hop through node C and node B to the normal node A in the upstream direction. After receiving the recovery response message, node C and node B restore their local RSB control state information, and node A synchronizes its local RSB control state information and enters normal refresh processing.

So far, the fault processing flow in this embodiment is completed.

The above describes the case where node B restarts before the recovery wait timer in node C expires. If node B has still not restarted when the recovery wait timer reaches the preset recovery waiting time, that is, after the timer stops counting, node D deletes the unrecovered LSP.

This embodiment is similar to the second embodiment: when communication failures occur on multiple nodes on the LSP and the nodes return to normal at different times, it prevents the LSP from being deleted abnormally and improves the reliability of LSP recovery. When a fault other than a communication failure occurs, this embodiment can delete the LSP according to the method described in RFC3473, so that the faulty connection is removed quickly and accurately, which facilitates the rapid release of the network resources occupied by the LSP and improves network resource utilization.

In addition, if mode 3 is used to handle the LSP shown in FIG. 7, node C starts its recovery wait timer, determines that its communication with node B is interrupted, constructs a normal recovery response message, and sends it to node D. After node B resumes communication with node C, recovery proceeds in a manner similar to Embodiment 3. Specifically, node C creates the PSB corresponding to the LSP according to the recovery message sent by the downstream node and the transport plane information it has saved, maintains the control state information in the PSB through self-refresh, constructs a normal recovery response message, and sends it to the normal node D on the LSP. Node D refreshes its control state information corresponding to the LSP according to the received normal recovery response message and sends a normal refresh message to the restarting node C. On receiving the refresh message, node C restores the RSB control state information corresponding to the LSP and enters the normal refresh process. If node C determines within the recovery waiting time that communication with node B has resumed, node C sends a refresh message to node B, and node B determines whether it is the source node of the LSP.

If node B is the source node, node B recovers or synchronizes its own PSB control state information corresponding to the LSP and sends a recovery message to node C. Node C synchronizes the PSB control state information corresponding to the LSP according to the received recovery message, stops its own recovery wait timer, and sends a refresh message to node B. On receiving the refresh message, node B restores the RSB control state information corresponding to the LSP and enters the normal refresh process, and the flow ends.

If node B is not the source node, node B keeps the partial PSB control state information it already holds and performs normal refresh processing in the downstream direction. After receiving the recovery message sent by node A in the upstream direction, node B sends a recovery message, such as a Path message carrying the recovery label, to node C. According to the received recovery message, node C stops its recovery wait timer, synchronizes the PSB control state information corresponding to the LSP, and sends a refresh message through node B to the normal node A in the upstream direction. Node B, on receiving the refresh message, recovers the RSB control state information corresponding to the LSP; node A, on receiving it, synchronizes the RSB control state information corresponding to the LSP and enters the normal refresh process, and the flow ends.

In Embodiments 1, 2, 4, and 5 above, the restarting node sends the communication interruption failure information to the normal node after receiving the recovery message from its upstream or downstream neighbor node. Alternatively, if no recovery message has been received, the restarting node may, after detecting that communication with its neighbor node is interrupted, send the communication interruption failure information to the normal node according to the LSP routing information it recorded in advance.
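With pre-recorded routing information, the restarting node can find the normal node to notify by walking the route away from the unreachable neighbor. This is a minimal sketch. It assumes, as in the embodiments, that the node to notify is the closest normal node on the opposite side of the restarting node from the failed neighbor. The function name and the `normal` set are hypothetical.

```python
from typing import List, Optional, Set


def node_to_notify(route: List[str], restarting: str, failed_neighbor: str,
                   normal: Set[str]) -> Optional[str]:
    """Return the closest normal node on the far side from failed_neighbor."""
    i = route.index(restarting)
    # Move away from the failed neighbor: upstream if it is downstream, and vice versa.
    step = -1 if route.index(failed_neighbor) > i else 1
    k = i + step
    while 0 <= k < len(route):
        if route[k] in normal:
            return route[k]
        k += step
    return None  # no normal node on that side of the recorded route
```

On the route A, B, C, D, a restarting node B that cannot reach C notifies A, and a restarting node C that cannot reach B notifies D, matching the examples above.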

In addition, the five embodiments above all concern communication failures caused by a node on the LSP remaining un-restarted for a long time. The method according to the present invention also applies where the restarting node and its neighbor node both operate normally but the communication link between them is interrupted. In that case, the restarted node detects, through the HELLO message mechanism after restarting, that the communication link between itself and the neighbor node is interrupted, and can then proceed according to the method of any of the above embodiments.

An embodiment of the present invention further provides a fault processing system, including a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which communication is interrupted, the first node restarts, and the third node is the normal node closest to the restarted first node.

In the system, the third node is configured to maintain the control state information of the LSP for a period of time while communication between the first node and the second node is interrupted, and to restore the control state information of the LSP if communication between the first node and the second node resumes within that period;

a first node, configured to restore control state information of the LSP if the communication between the first node and the second node is resumed within a period of time;

The second node is configured to restore the control state information of the LSP if the communication between the first node and the second node is resumed within a period of time.

Corresponding to the fault processing method, the system also supports the following three modes.

In the first mode, the first node is configured to construct a recovery response message after receiving the recovery message from the third node, send it to the third node, and start timing, and to stop timing if it resumes communication with the second node before the timed duration exceeds the period;

The third node is configured to send the recovery message to the first node and, while the timed duration does not exceed the period, to maintain the control state information of the LSP according to the recovery response message.

In the second mode, the first node is configured to send communication interruption failure information concerning the link between the first node and the second node;

The third node is configured to start timing after receiving the communication interruption failure information; to maintain the control state information of the LSP while the timed duration does not exceed the period and communication between the first node and the second node remains interrupted; and to stop timing when the first node resumes communication with the second node.

In the third mode, the first node is configured to send the communication interruption failure information and start timing, and to stop timing if it resumes communication with the second node before the timed duration exceeds the period;

The third node is configured to maintain the control state information of the LSP while the timed duration does not exceed the period.

The operation of the system is similar to that described in the method and will not be described here.

An embodiment of the present invention provides a device on a label switched path (LSP). The LSP includes a first device, a second device, and at least one third device, where the first device and the second device are adjacent devices between which communication is interrupted, the first device restarts, and the third device is the normal device closest to the restarted first device. When the device is the third device, it includes:

a first module, configured to start timing when communication between the first device and the second device is interrupted, and to stop timing if the first device and the second device resume communication before the timed duration exceeds the period;

a second module, configured to maintain the control state information of the device while the duration timed by the first module does not exceed the period and communication between the first device and the second device remains interrupted, and to restore the control state information of the device if the first device and the second device resume communication.

When the device is the first device, the device further includes:

a third module, configured to report the communication interruption failure information between the device and the second device to the third device; in this case, the first module is configured to start timing when the communication interruption failure information is reported, and to stop timing if the device and the second device resume communication before the timed duration exceeds the period.

When the device is the first device, the device further includes:

a fourth module, configured to construct a normal recovery response message and return it to the third device when the third device sends a recovery message to the device;

The first module is configured to start timing when the recovery response message is returned, and to stop timing if the device and the second device resume communication before the timed duration exceeds the period.
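The module split can be pictured as one class whose methods play the role of each module. Which methods a given device uses depends on whether it acts as the first device or the third device. This is a hypothetical Python sketch: the class, method names and message strings are illustrative, timing uses plain numbers instead of a real clock, and what happens after the period is exceeded is only reported, not implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LspDevice:
    """Hypothetical device on the LSP; method names mirror the modules above."""
    wait_time: float
    started_at: Optional[float] = None
    sent: List[str] = field(default_factory=list)

    # First module: timing.
    def start_timing(self, now: float) -> None:
        self.started_at = now

    def stop_timing(self) -> None:
        self.started_at = None

    def within_period(self, now: float) -> bool:
        return self.started_at is not None and now - self.started_at < self.wait_time

    # Second module (third device): hold or restore control state.
    def control_state_action(self, now: float, communication_resumed: bool) -> str:
        if not self.within_period(now):
            return "period exceeded"  # assumption: LSP deletion follows, as in the embodiments
        if communication_resumed:
            self.stop_timing()
            return "restore"
        return "maintain"

    # Third module (first device): report the interruption, then start timing.
    def report_interruption(self, now: float) -> None:
        self.sent.append("COMM_INTERRUPTION")
        self.start_timing(now)

    # Fourth module (first device): answer a recovery message normally, then start timing.
    def answer_recovery(self, now: float) -> None:
        self.sent.append("RECOVERY_RESPONSE")
        self.start_timing(now)
```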

The method, system, and device described in the above embodiments prevent abnormal deletion of the LSP and improve the reliability of LSP recovery when communication failures occur on multiple nodes on the LSP and the nodes return to normal at different times. When a fault other than a communication failure occurs on the LSP, the embodiments can delete the LSP in combination with the method described in RFC3473, so that the faulty connection is removed quickly and accurately, which facilitates the rapid release of the network resources occupied by the LSP and improves network resource utilization.

The above is only the preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalents, improvements, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims


1. A method for processing a fault, applied to a label switched path (LSP) including a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which communication is interrupted, the first node is restarted, and the third node is the normal node closest to the restarted first node, the method including:

When communication between the first node and the second node is interrupted, the third node maintains control state information of the LSP for a period of time; and

When communication between the first node and the second node is resumed within the period of time, the first node, the second node, and the third node restore the control state information of the LSP.

2. The method according to claim 1, wherein the maintaining control state information of the LSP comprises:

The third node sends a recovery message to the first node;

The first node returns a recovery response message to the third node and starts timing; while the timed duration does not exceed the period of time, the third node maintains the control state information of the LSP according to the recovery response message, and if the first node resumes communication with the second node, the first node stops timing.

3. The method according to claim 1, wherein the maintaining the control state information of the LSP includes:

The first node sends communication interruption fault information;

The third node starts timing upon receiving the communication interruption fault information and maintains the control state information of the LSP while the timed duration does not exceed the period of time, where, if communication between the first node and the second node resumes, the third node stops timing.

4. The method according to claim 1, wherein the maintaining the control state information of the LSP includes:

The first node sends communication interruption fault information to the third node and starts timing; the third node maintains the control state information of the LSP while the timed duration does not exceed the period of time, where, if communication between the first node and the second node resumes, the first node stops timing.
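Claims 2 to 4 differ mainly in which event starts the timer and which node runs it; what the third node ends up doing with the LSP control state is the same in all three. The hypothetical Python sketch below tabulates that difference and derives the third node's action at a given time. The names `HoldVariant`, `VARIANTS`, and `third_node_state` are illustrative assumptions, times are measured from the triggering event, and the `"delete"` branch reflects the behaviour described in claim 12 rather than anything stated in claims 2 to 4.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HoldVariant:
    trigger: str       # event that starts the timer
    timer_owner: str   # node that runs (and later stops) the timer


# Hypothetical summary of claims 2 to 4, keyed by claim number.
VARIANTS: Dict[int, HoldVariant] = {
    2: HoldVariant("first node returns a recovery response to the third node", "first node"),
    3: HoldVariant("third node receives the fault information", "third node"),
    4: HoldVariant("first node sends the fault information to the third node", "first node"),
}


def third_node_state(claim: int, now: float, t_resume: Optional[float], period: float) -> str:
    """Action of the third node at time `now`, measured from the trigger.

    `t_resume` is when the first and second nodes resume communication,
    or None if they have not (yet) done so.
    """
    if claim not in VARIANTS:
        raise ValueError(f"claim {claim} does not define a hold-timer variant")
    # A resumption only counts if it has already happened and fell inside the
    # period; the timer owner then stops timing and recovery proceeds.
    if t_resume is not None and t_resume <= min(now, period):
        return "restore"
    if now <= period:
        return "maintain"
    # Period elapsed with the control channel still down (see claim 12).
    return "delete"
```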

5. The method according to claim 3 or 4, wherein the first node is an upstream node of the second node; the third node comprises an upstream third node, or comprises an upstream third node and a downstream third node; the upstream third node is an upstream node of the first node, and the downstream third node is a downstream node of the second node;

The upstream third node maintains the reservation state block (RSB) control state information corresponding to the LSP by self-refresh;

When the first node resumes communication with the second node before the timed duration exceeds the period of time, and the first node sends a recovery message to the second node, the second node determines whether it is the destination node of the LSP;

If the second node is the destination node, the second node restores its own PSB and RSB control state information corresponding to the LSP according to the received recovery message and sends a recovery response message to the first node; the first node restores its own RSB control state information corresponding to the LSP according to the received recovery response message and forwards the recovery response message hop by hop in the upstream direction to the upstream third node or the source node; each node that receives the recovery response message restores its own RSB control state information corresponding to the LSP, and the processing flow ends;

If the second node is not the destination node, the second node restores its own PSB control state information corresponding to the LSP according to the received recovery message and forwards the recovery message hop by hop in the downstream direction to the downstream third node; each downstream node that receives the recovery message restores or synchronizes its own PSB control state information corresponding to the LSP and enters the normal refresh process; the downstream third node then sends the recovery response message hop by hop in the upstream direction to the upstream third node or the source node, and each node that receives the recovery response message restores its own RSB control state information corresponding to the LSP.
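The hop-by-hop flow of claim 5 can be traced with a small simulation. In this hypothetical sketch the LSP is a Python list ordered from source to destination, `path[first]` is the restarted first node, `path[first + 1]` is the second node, and the upstream third node is assumed to be `path[first - 1]`. The `psb`/`rsb` flags only record which state blocks the recovery flow touches, so nodes outside the flow are left unchanged; the function name and trace format are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    psb: bool = False  # path state block for the LSP restored/synchronised
    rsb: bool = False  # reservation state block for the LSP restored/synchronised


def recover_claim5(path: List[Node], first: int,
                   downstream_third: Optional[int] = None) -> List[str]:
    """Trace the claim-5 recovery once path[first] and path[first + 1] talk again.

    downstream_third: index of the downstream third node, or None when the
    second node is the destination of the LSP.
    """
    second = first + 1
    if first < 1:
        raise ValueError("an upstream third node path[first - 1] is required")
    if downstream_third is None and second != len(path) - 1:
        raise ValueError("second node must be the destination when downstream_third is None")
    if downstream_third is not None and not second < downstream_third < len(path):
        raise ValueError("downstream third node must lie downstream of the second node")

    trace: List[str] = []
    turnaround = second if downstream_third is None else downstream_third
    # Recovery message: forwarded hop by hop downstream; each receiver
    # restores (or synchronises) its PSB.
    for i in range(second, turnaround + 1):
        path[i].psb = True
        trace.append(f"{path[i].name}: PSB")
    if downstream_third is None:
        # The destination also restores its RSB before answering.
        path[second].rsb = True
        trace.append(f"{path[second].name}: RSB")
    # Recovery response: forwarded hop by hop upstream as far as the upstream
    # third node; each receiver restores its RSB.
    for i in range(turnaround - 1, first - 2, -1):
        path[i].rsb = True
        trace.append(f"{path[i].name}: RSB")
    return trace
```

Passing `downstream_third=None` models the branch in which the second node is the destination. The mirror-image flow of claim 7 could be modelled the same way with the roles of upstream and downstream swapped.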

6. The method according to claim 5, wherein, while the upstream third node maintains the RSB control state information, the method further includes:

The downstream third node maintains the path state block (PSB) control state information corresponding to the LSP by self-refresh during its own restart timing period.
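Self-refresh matters because RSVP control state is soft state: a PSB or RSB that is not refreshed within its lifetime is removed. The sketch below borrows the state-lifetime rule L = (K + 0.5) * 1.5 * R from RFC 2205 (with the suggested default K = 3), which the patent text does not itself specify, and models self-refresh as the node resetting its own refresh timestamp while its restart timing period runs. `SoftState`, `state_lifetime`, and `survives` are hypothetical names.

```python
from dataclasses import dataclass


def state_lifetime(refresh_period: float, k: int = 3) -> float:
    """Soft-state lifetime L = (K + 0.5) * 1.5 * R, the RSVP rule in RFC 2205."""
    return (k + 0.5) * 1.5 * refresh_period


@dataclass
class SoftState:
    """A PSB or RSB that is removed if not refreshed within its lifetime."""
    lifetime: float
    last_refresh: float = 0.0

    def refresh(self, now: float) -> None:
        # Normally triggered by a refresh from the neighbour; during the
        # restart timing period the node calls it on its own ("self-refresh").
        self.last_refresh = now

    def alive(self, now: float) -> bool:
        return now - self.last_refresh <= self.lifetime


def survives(restart_period: float, refresh_period: float,
             self_refresh: bool, horizon: float) -> bool:
    """Neighbour refreshes stop at t = 0. Does the state still exist at `horizon`?"""
    state = SoftState(lifetime=state_lifetime(refresh_period))
    t = refresh_period
    while t <= horizon:
        # Self-refresh only runs while the node's own restart timer is active.
        if self_refresh and t <= restart_period:
            state.refresh(t)
        t += refresh_period
    return state.alive(horizon)
```

With a 30-second refresh period (the RFC 2205 default) the lifetime is 157.5 seconds, so without self-refresh the state kept for the LSP would lapse well before a 300-second restart period ended; with self-refresh it survives until the restart period ends and lapses afterwards.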

7. The method according to claim 3 or 4, wherein the first node is a downstream node of the second node, the third node comprises a downstream third node, or comprises an upstream third node and a downstream third node, wherein the downstream third node is a downstream node of the first node, and the upstream third node is an upstream node of the second node;

The downstream third node maintains PSB control state information corresponding to the LSP by self-refresh;

When the first node resumes communication with the second node and the first node sends a recovery message to the second node, the second node determines whether it is the source node of the LSP;

If the second node is the source node, the second node restores its own PSB and/or RSB control state information corresponding to the LSP according to the received recovery message and sends the recovery message hop by hop in the downstream direction to the downstream third node; the downstream third node restores or synchronizes its own control state information and sends a recovery response message hop by hop in the upstream direction to the second node; each node that receives the recovery response message restores or synchronizes its own RSB control state information corresponding to the LSP and enters the normal refresh process, and the processing flow ends;

If the second node is not the source node, the second node, after receiving the recovery message from the upstream third node, restores its own PSB control state information corresponding to the LSP according to that recovery message and sends the recovery message hop by hop in the downstream direction to the downstream third node; the downstream third node restores or synchronizes its own control state information and sends a recovery response message hop by hop in the upstream direction to the upstream third node; each upstream node that receives the recovery response message restores or synchronizes its own RSB control state information corresponding to the LSP and enters the normal refresh process.

8. The method according to claim 7, wherein, while the downstream third node maintains the PSB control state information, the method further includes:

The upstream third node maintains the reservation state block (RSB) control state information corresponding to the LSP by self-refresh during its own restart timing period.

9. The method according to claim 2, wherein the first node is an upstream node of the second node; the third node includes an upstream third node, or includes an upstream third node and a downstream third node; the upstream third node is an upstream node of the first node, and the downstream third node is a downstream node of the second node;

While communication between the first node and the second node remains interrupted and the timed duration does not exceed the period of time, the first node creates its own PSB corresponding to the LSP according to the recovery message sent by the upstream third node and its self-saved transmission plane information, maintains the control state information in the PSB through the self-refresh process, and sends the constructed recovery response message to the upstream third node; upon receiving the recovery response message, the upstream third node maintains its own control state information corresponding to the LSP;
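The distinctive step in claim 9 is that, while the second node is still unreachable, the first node rebuilds its PSB from two inputs: the recovery message received from the upstream third node and the transmission plane information it saved across the restart. A minimal Python sketch follows; the message field `upstream_label`, the class names, and the `data_plane` tuple layout `(in_port, out_port, out_label)` are hypothetical placeholders, since the patent does not specify the message or cross-connect formats.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class RecoveryMessage:
    lsp_id: str
    upstream_label: int   # hypothetical: label the upstream hop reports for the LSP


@dataclass
class RestartedNode:
    # Transmission-plane (cross-connect) entries survive the control-plane
    # restart. Hypothetical layout: lsp_id -> (in_port, out_port, out_label).
    data_plane: Dict[str, Tuple[int, int, int]]
    psb: Dict[str, dict] = field(default_factory=dict)

    def on_recovery(self, msg: RecoveryMessage) -> Dict[str, str]:
        """Rebuild the PSB while the second node is still unreachable."""
        if msg.lsp_id not in self.data_plane:
            raise KeyError(f"no saved transmission-plane entry for {msg.lsp_id}")
        in_port, out_port, out_label = self.data_plane[msg.lsp_id]
        self.psb[msg.lsp_id] = {
            "in_label": msg.upstream_label,  # from the upstream third node
            "in_port": in_port,              # from the saved transmission plane
            "out_port": out_port,
            "out_label": out_label,
            "self_refresh": True,            # kept alive locally, not by the downstream hop
        }
        # The constructed response lets the upstream third node keep its own state.
        return {"type": "RecoveryResponse", "lsp_id": msg.lsp_id}
```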

If the first node resumes communication with the second node and the second node receives a recovery message from the first node, the second node determines whether it is the destination node of the LSP;

If the second node is the destination node, the second node restores its own PSB and RSB control state information corresponding to the LSP and sends a recovery response message to the first node; the first node restores its own RSB control state information corresponding to the LSP; the first node and the second node enter the normal refresh process, and the processing flow ends;

If the second node is not the destination node, the second node restores its own PSB control state information corresponding to the LSP according to the received recovery message and sends the recovery message hop by hop in the downstream direction to the downstream third node; each node that receives the recovery message restores its own PSB control state information corresponding to the LSP and enters the normal refresh process; the downstream third node then sends a recovery response message hop by hop in the upstream direction to the upstream third node or the source node, and each node that receives the recovery response message restores or synchronizes its own RSB control state information corresponding to the LSP and enters the normal refresh process.

10. The method according to claim 2, wherein the first node is a downstream node of the second node, and the third node comprises a downstream third node, or an upstream third node and a downstream third node, wherein the upstream third node is an upstream node of the second node, and the downstream third node is a downstream node of the first node;

While communication between the first node and the second node remains interrupted and the timed duration does not exceed the period of time, the first node creates its own PSB corresponding to the LSP according to the recovery message sent by the downstream third node and its self-saved transmission plane information, maintains the control state information in the PSB through the self-refresh process, and sends a recovery response message to the downstream third node; the downstream third node synchronizes its own control state information corresponding to the LSP according to the received recovery response message;

If the first node resumes communication with the second node and the first node sends a recovery message to the second node, the second node determines whether it is the source node of the LSP;

If the second node is the source node, the second node restores or synchronizes its own PSB and/or RSB control state information corresponding to the LSP according to the received recovery message and sends the recovery message to the first node; the first node, upon receiving the recovery message, restores its own PSB control state information corresponding to the LSP and sends a recovery response message to the second node; the second node, upon receiving the recovery response message, restores its own RSB control state information corresponding to the LSP and enters the normal refresh process, and the processing flow ends;

If the second node is not the source node, the second node restores its PSB control state information corresponding to the LSP according to the recovery message from the upstream third node and sends the recovery message to the first node; the first node, upon receiving the recovery message, restores its own PSB control state information corresponding to the LSP and sends a recovery response message to the second node; the second node, upon receiving the recovery response message, restores or synchronizes its own RSB control state information corresponding to the LSP and enters the normal refresh process.

11. The method according to claim 1, wherein the control channel communication interruption between the first node and the second node occurs because the second node has not yet restarted or because the communication link between the first node and the second node is broken.

12. The method according to claim 1, wherein, after the third node has maintained the control state information of the LSP for the period of time, the method further includes: if the first node and the second node are still in the control channel interruption state after the period of time, the first node and the third node delete the control state information corresponding to the LSP.

13. A fault handling system, comprising a first node, a second node, and at least one third node, wherein the first node and the second node are adjacent nodes between which communication is interrupted, the first node is restarted, and the third node is the normal node closest to the restarted first node; characterized in that:

The first node is configured to restore the control state information of the LSP if communication between the first node and the second node resumes; the second node is configured to restore the control state information of the LSP if communication between the first node and the second node resumes.

14. The system of claim 13 wherein:

The first node is configured to construct a recovery response message after receiving the recovery message from the third node, send it to the third node, and start timing; and, while the timed duration does not exceed the period of time, to stop timing if communication between the first node and the second node resumes;

The third node is configured to send the recovery message to the first node and, while the timed duration does not exceed the period of time, to maintain the control state information of the LSP according to the recovery response message.

15. The system of claim 13 wherein:

The first node is further configured to send communication interruption fault information between the first node and the second node;

The third node is configured to start timing after receiving the communication interruption fault information; while the timed duration does not exceed the period of time, to maintain the control state information of the LSP if communication between the first node and the second node remains interrupted; and to stop timing if the first node resumes communication with the second node.

16. The system of claim 13 wherein:

The first node is further configured to send communication interruption fault information and start timing, and, while the timed duration does not exceed the period of time, to stop timing if the first node resumes communication with the second node;

The third node is further configured to maintain the control state information of the LSP while the timed duration does not exceed the period of time.

17. A device on a label switched path (LSP), where the LSP includes a first device, a second device, and at least one third device, the first device and the second device are adjacent devices between which communication is interrupted, the first device is restarted, and the third device is the normal device closest to the restarted first device; characterized in that, when the device is the third device, the device includes:

18. The device according to claim 17, wherein, when the device is the first device, the device further includes:

a third module, configured to report, to the third device, communication interruption fault information between the device and the second device;

The first module is configured to start timing when the communication interruption fault information is reported, and to stop timing if the device and the second device resume communication before the timed duration exceeds the period of time.

The first module is configured to start timing when the recovery response message is returned, and to stop timing if the device and the second device resume communication before the timed duration exceeds the period of time.