WO2007140689A1 - A failure processing method and a system and a device thereof

A failure processing method and a system and a device thereof

Info

Publication number
WO2007140689A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
lsp
recovery
state information
control state
Prior art date
Application number
PCT/CN2007/001194
Other languages
French (fr)
Chinese (zh)
Inventor
Jianhua Gao
Dan Li
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to DK07720767.8T (patent DK1986371T3)
Priority to AT07720767T (patent ATE537664T1)
Priority to ES07720767T (patent ES2376361T3)
Priority to EP07720767A (patent EP1986371B1)
Publication of WO2007140689A1
Priority to US12/331,125 (patent US8040793B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/50 Routing or path finding of packets in data switching networks using label swapping, e.g. multi-protocol label switch [MPLS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062 Network aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062 Network aspects
    • H04Q2011/0079 Operation or maintenance aspects
    • H04Q2011/0081 Fault tolerance; Redundancy; Recovery; Reconfigurability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q11/00 Selecting arrangements for multiplex systems
    • H04Q11/0001 Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062 Network aspects
    • H04Q2011/0088 Signalling aspects

Abstract

A failure processing method is applied to a label switched path (LSP) comprising a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which a communication breakdown occurs, the first node restarts, and the third node is the normal node closest to the first node. When communication between the first node and the second node is broken, the third node maintains the control state information of the LSP for a period of time; when the first node and the second node resume communication within that period, the first node, the second node, and the third node recover the control state information of the LSP. The invention also comprises a failure processing system and a device on an LSP. With the invention, the LSP can be recovered reliably when communication breakdowns occur at several nodes on the LSP.

Description

Fault Processing Method, System, and Device

Technical Field

The present invention relates to Generalized Multi-Protocol Label Switching (GMPLS) technology, and in particular to a fault processing method in GMPLS and a system and device thereof.

Background of the Invention

At present, driven both by the bandwidth demand arising from the rapid growth of Internet Protocol (IP) services and by the new bandwidth usage models introduced by wavelength division multiplexing (WDM) technology, the bursty and unpredictable nature of IP services requires that network bandwidth be allocated dynamically. Traditional static optical transport networks can hardly meet this requirement, and so the intelligent optical network has emerged. An intelligent optical network introduces IP-centric intelligent control technology directly on the optical fiber network. It thereby supports dynamic connection setup and teardown, allocates network resources on demand based on traffic engineering, and provides good network protection and restoration performance.
A GMPLS control plane is introduced into the intelligent optical network, giving the network strong survivability when faults occur and enabling bandwidth to be dynamically requested, released, and reconfigured. This simplifies network management and provides new value-added services. When the IP-based GMPLS signaling protocol is applied to telecom and optical transport networks, the greatest challenges it faces are stability and security. To ensure as far as possible that services are not interrupted, no control plane fault should affect services already established on the transport plane or cause service interruption. In practice, whether one control node or several consecutive control nodes in the control plane fail, the network must be able to isolate and recover from control plane faults. After one or several consecutive control nodes fail and recover, the signaling state that the services established before the failure hold on those control nodes must be restorable to normal. For handling node communication faults, Internet Engineering Task Force Request for Comments (IETF RFC) 3473 defines the Resource Reservation Protocol with Traffic Engineering (RSVP-TE), which provides recovery handling for the restart of nodes in the control plane.
FIG. 1 shows a flowchart of an existing node restart processing method. Referring to FIG. 1, the method includes:
In step 101, when node B loses power, node A no longer receives HELLO messages from node B.
Node A and node B are two nodes on the GMPLS control plane. When both are in the normal state, they exchange HELLO messages to notify each other of the running state of their control plane software, and they periodically send refresh messages to refresh the control state information held in both nodes. After node B loses power, it cannot send HELLO messages to node A.
In step 102, node A starts a recovery wait timer (Restart_Timer) and performs self-refresh.
When a label switched path (LSP) passes through both node A and node B, node A starts its own recovery wait timer once it determines that it is not receiving HELLO messages from node B. From then on, node A no longer periodically sends refresh messages for that LSP to node B; instead it performs self-refresh by maintaining the control state information related to the LSP. In other words, although node A receives no periodic refresh messages from neighbor node B, it keeps the control state information for the LSP while the recovery wait timer is running. If no refresh message has been received from the neighbor node by the time the recovery wait timer expires, the LSPs that were not refreshed are deleted.
More specifically, on an LSP operating normally, each node receives Path messages from its upstream node and reservation (RESV) messages from its downstream node. The node establishes a path state block (PSB) and a reservation state block (RSB) for the LSP, which store the control state information carried in the Path and RESV messages respectively, such as the label value, the bandwidth value, and the route traversed by the LSP. Based on the information in its PSB, the node sends Path messages to its downstream neighbor; based on the information in its RSB, it sends RESV messages to its upstream neighbor. A node that has lost power can send neither RESV messages to its upstream neighbor nor Path messages to its downstream neighbor, so the RSB in the upstream neighbor cannot be refreshed periodically; this is why node A self-refreshes its own RSB associated with node B.
In step 103, node A keeps sending HELLO messages to node B, requesting a reply from node B.
In steps 104 to 105, node B powers on and restarts, starts a restart recovery timer (Recovery_Timer), and sends a HELLO message to node A indicating that node B has restarted.
The purpose of the restart recovery timer in node B is as follows: node B requires its neighbor nodes to finish recovering the control state information of all LSPs passing through node B and node A before the timer expires. After the restart recovery timer expires, node B deletes any LSP that has not been recovered.
A HELLO message usually contains information elements such as the source instance (src-instance) and the destination instance (dst-instance). The src-instance carries a value that stays constant while the sending node is running normally; this value can be stored across power loss, and after a restart the node adds 1 to the stored value. The dst-instance carries the src-instance value from the HELLO message most recently received from the peer node; if no HELLO message has been received from the peer, or if this is the first HELLO message sent after a restart, the value is 0.
After a node loses power and restarts, the HELLO messages it sends carry a src-instance equal to the pre-restart value plus 1 and a dst-instance of 0. When the node itself is running normally but the communication link between nodes is interrupted, the HELLO messages it sends carry a src-instance equal to the value before the interruption and a dst-instance of 0. A node receiving a HELLO message uses the combination of changes in the src-instance and dst-instance values carried in the message to determine whether the neighbor has restarted or whether only the communication link was interrupted.
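The decision rule described above can be sketched in a few lines of Python. This is an illustration only, not the protocol implementation: the names `Hello` and `classify_neighbor` and the returned labels are hypothetical, and any change in src-instance (not only an increase by exactly 1) is treated as a restart, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    src_instance: int  # instance value of the sending node
    dst_instance: int  # last src-instance the sender heard from the peer, or 0


def classify_neighbor(last_src_instance: int, hello: Hello) -> str:
    """Classify the neighbor from a received HELLO (hypothetical helper).

    last_src_instance is the src-instance previously recorded for the neighbor.
    """
    if hello.dst_instance != 0:
        # The neighbor still remembers our instance: nothing was lost.
        return "normal"
    if hello.src_instance == last_src_instance:
        # Same instance but dst-instance reset: only the link was interrupted.
        return "link_interrupted"
    # Instance changed and dst-instance is 0: the neighbor restarted.
    return "restarted"
```

For example, with a recorded value of 7, a HELLO carrying (8, 0) would be classified as a restart and a HELLO carrying (7, 0) as a link interruption.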
In steps 106 to 107, node A stops the recovery wait timer, stops self-refreshing the control state information related to the LSPs passing through node A and node B, and sends node B a Path message carrying a recovery label (Recovery_Label).
Once node A has determined from node B's HELLO message that node B has restarted, it uses the control state information in its own PSB to initiate recovery, by means of Path messages, of the LSPs passing through nodes A and B, with one Path message per LSP.
In steps 108 to 109, node B returns an acknowledgment (ACK) message for the Path message to node A, indicating that node B has received the Path message from node A, and node B recovers the control state information related to the LSP according to the received Path message.
Because the control state information in node B's PSB was lost at power-down, node B creates the corresponding PSB after receiving the Path message from node A and records in it the upstream-direction control state information contained in the Path message. In addition, if the LSP passing through node B has a further downstream node, node B sends a Path message to that downstream node, and the downstream node sends a RESV message back to node B; on receiving the RESV message, node B creates the corresponding RSB and records the control state information from the RESV message. Once both the PSB and the RSB have been established in node B, recovery of the corresponding LSP is complete.
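As a minimal sketch of the bookkeeping described above, the following Python models a restarted node that rebuilds a PSB from each Path message and an RSB from each RESV message, and treats an LSP as recovered only when both exist. The class and method names (`LspState`, `RestartedNode`, `on_path`, `on_resv`, `on_recovery_timer_expired`) are hypothetical, and the state blocks are reduced to plain dictionaries for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LspState:
    psb: Optional[dict] = None  # path state block, rebuilt from a Path message
    rsb: Optional[dict] = None  # reservation state block, rebuilt from a RESV message

    @property
    def recovered(self) -> bool:
        # Recovery is complete only when both blocks have been re-established.
        return self.psb is not None and self.rsb is not None


@dataclass
class RestartedNode:
    lsps: Dict[str, LspState] = field(default_factory=dict)

    def on_path(self, lsp_id: str, info: dict) -> None:
        self.lsps.setdefault(lsp_id, LspState()).psb = dict(info)

    def on_resv(self, lsp_id: str, info: dict) -> None:
        self.lsps.setdefault(lsp_id, LspState()).rsb = dict(info)

    def on_recovery_timer_expired(self) -> List[str]:
        """Delete LSPs that were not fully recovered and return their ids."""
        stale = sorted(k for k, s in self.lsps.items() if not s.recovered)
        for lsp_id in stale:
            del self.lsps[lsp_id]
        return stale
```

In this sketch an LSP for which only the Path message arrived before the restart recovery timer expired is removed, which mirrors the deletion of unrecovered LSPs described for node B.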
In step 110, the restart recovery timer expires, and node B deletes the LSPs that have not been recovered.
After the restart recovery timer in node B stops, any LSP in node B whose control state information has not been recovered is deleted.
This completes the existing node restart processing flow.
The above flow addresses the restart of a single node. When several consecutive nodes on an LSP restart, processing according to the above flow causes the LSP to be deleted.
Specifically, suppose an LSP passes through nodes A, B, C, and D in that order, nodes B and C lose power, and upstream node B restarts first while downstream node C needs a long time to restart. After node B restarts, node A determines from node B's HELLO message that node B has restarted and sends a Path message to node B, and node B recovers the corresponding PSB according to that Path message. However, because node C has not yet restarted, node B cannot receive a RESV message from node C and cannot recover its RSB, so it cannot send a RESV message to node A to refresh the RSB in node A. Node A stopped self-refreshing as soon as it again received a HELLO message from node B; if node A then receives no RESV refresh message from node B for a long time, it deletes its corresponding RSB. Node A then sends a path tear (Path_Tear) message to node B, telling node B to delete its local PSB, and the corresponding LSP is thereby deleted.
If, on the LSP passing through nodes A, B, C, and D in that order, nodes B and C lose power and downstream node C restarts first while upstream node B needs a long time to restart, node C cannot receive a Path message from node B after restarting, so the PSB on node C cannot be recovered. Node D, on receiving the HELLO message that node C sends after restarting, stops self-refreshing its PSB; the PSB on node D then times out and is deleted because no Path message arrives from node C for a long time. After the timer for its own RSB expires, node D sends a reservation tear (Resv_Tear) message to node C, telling node C to delete its local RSB, and the corresponding LSP is thereby deleted.
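The two scenarios above share one timing condition, which can be illustrated with a deliberately simplified model. The function name `prior_art_lsp_survives`, the single `state_timeout` parameter, and the zero message delay are all assumptions made for this sketch; real refresh and cleanup timing is more involved.

```python
def prior_art_lsp_survives(first_restart: float,
                           second_restart: float,
                           state_timeout: float) -> bool:
    """Simplified model of the prior-art flow with two adjacent failed nodes.

    first_restart:  time at which the first failed node restarts; its healthy
                    neighbor stops self-refresh at this moment.
    second_restart: time at which the other failed node restarts; only then
                    can the missing Path or RESV refresh be produced.
    state_timeout:  assumed time the healthy neighbor keeps unrefreshed state
                    once self-refresh has stopped.
    Message delays are assumed to be zero.
    """
    if second_restart < first_restart:
        raise ValueError("second_restart must not precede first_restart")
    refresh_arrives = second_restart
    state_deadline = first_restart + state_timeout
    return refresh_arrives <= state_deadline
```

Under these assumptions, a first restart at t = 10 with a 30-unit timeout tolerates a second restart at t = 15, but a second restart at t = 100 leads to deletion of the LSP, which is the failure mode described above.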
It can be seen that with the existing node restart processing method, when communication faults occur at multiple nodes on an LSP, the LSP cannot be recovered reliably, so that control plane faults affect services on the transport plane.

Summary of the Invention

Embodiments of the present invention provide a fault processing method that can reliably recover an LSP when communication faults occur at multiple nodes on the LSP.
A fault processing method is applied to a label switched path (LSP) that includes a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which communication is interrupted, the first node restarts, and the third node is the normal node closest to the restarted first node. The method includes:
when communication between the first node and the second node is interrupted, the third node maintains the control state information of the LSP for a period of time; and
when communication between the first node and the second node is restored within the period of time, the first node, the second node, and the third node recover the control state information of the LSP.
Embodiments of the present invention further provide a fault processing system including a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which communication is interrupted, the first node restarts, and the third node is the normal node closest to the restarted first node, wherein:
the third node is configured, within a period of time, to maintain the control state information of the LSP if communication between the first node and the second node is interrupted, and to recover the control state information of the LSP if communication between the first node and the second node is restored;
the first node is configured to recover the control state information of the LSP if communication between the first node and the second node is restored within the period of time; and
the second node is configured to recover the control state information of the LSP if communication between the first node and the second node is restored within the period of time.
Embodiments of the present invention further provide a device on a label switched path (LSP), where the LSP includes a first device, a second device, and at least one third device, the first device and the second device are adjacent devices between which communication is interrupted, the first device restarts, and the third device is the normal device closest to the restarted first device. When the device is the third device, it includes:
a first module, configured to start timing when communication between the first device and the second device is interrupted, and to stop timing if communication between the first device and the second device is restored before the timed duration exceeds a period of time; and
a second module, configured, while the duration timed by the first module has not exceeded the period of time, to maintain the control state information of the device if communication between the first device and the second device is interrupted, and to recover the control state information of the device if communication between the first device and the second device is restored.
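The division of work between the two modules of the third device can be sketched as follows. This is one possible reading under stated assumptions, not the claimed implementation: the class names `WaitTimer` and `StateKeeper`, the status strings, and the use of an explicit `now` argument instead of a real clock are all hypothetical.

```python
from typing import Optional


class WaitTimer:
    """Sketch of the first module: times the communication interruption."""

    def __init__(self, wait_period: float) -> None:
        self.wait_period = wait_period
        self.started_at: Optional[float] = None

    def start(self, now: float) -> None:
        # Start timing only once per interruption.
        if self.started_at is None:
            self.started_at = now

    def stop(self) -> None:
        self.started_at = None

    def within_period(self, now: float) -> bool:
        return (self.started_at is not None
                and now - self.started_at <= self.wait_period)


class StateKeeper:
    """Sketch of the second module: maintains or recovers control state."""

    def __init__(self, timer: WaitTimer) -> None:
        self.timer = timer
        self.status = "normal"

    def on_interrupted(self, now: float) -> None:
        self.timer.start(now)
        self.status = "maintained"

    def on_restored(self, now: float) -> bool:
        """Recover state if communication returned within the wait period."""
        if self.timer.within_period(now):
            self.timer.stop()
            self.status = "recovered"
            return True
        return False
```

Passing the current time in explicitly keeps the sketch deterministic; a real device would presumably drive the same logic from its own timer facility.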
As the above technical solutions show, the method according to the present invention can reliably recover an LSP when communication faults occur at multiple nodes on the LSP. Specifically, the present invention has the following beneficial effects:
When communication faults occur between several consecutive nodes on an LSP, for example because a node stays powered down for a long time without restarting or because a link between nodes is interrupted, and the nodes return to normal at different times, the nodes on the LSP maintain and self-refresh the control state information corresponding to the LSP during the recovery waiting time. If all the nodes affected by the communication fault return to normal within the recovery waiting time, the control state information that has not yet been recovered in the nodes on the LSP is recovered. This effectively prevents abnormal deletion of the LSP and improves the reliability of LSP recovery.
In addition, when an LSP cannot be recovered because of a fault other than a communication interruption, the present invention allows the LSP to be deleted quickly, so that the faulty connection is torn down quickly and accurately, the network resources occupied by the LSP are released quickly, and network resource utilization improves.

Brief Description of the Drawings

Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art will understand the features and advantages of the invention more clearly. In the drawings:
FIG. 1 is a flowchart of an existing node restart processing method;
FIG. 2 is an exemplary flowchart of the fault processing method of the present invention;
FIG. 3 is a schematic diagram of the running status of the nodes on the LSP in Embodiments 1, 2, and 3 of the present invention;
FIG. 4 is a flowchart of the fault processing method in Embodiment 1 of the present invention;
FIG. 5 is a flowchart of the fault processing method in Embodiment 2 of the present invention;
FIG. 6 is a flowchart of the fault processing method in Embodiment 3 of the present invention;
FIG. 7 is a schematic diagram of the running status of the nodes on the LSP in Embodiments 4 and 5 of the present invention;
FIG. 8 is a flowchart of the fault processing method in Embodiment 4 of the present invention;
FIG. 9 is a flowchart of the fault processing method in Embodiment 5 of the present invention.

Mode for Carrying Out the Invention

To make the technical solutions and advantages of the present invention clearer, the invention is described in further detail below through specific embodiments and with reference to the accompanying drawings.
Embodiments of the present invention describe a fault processing method in which the normal node on a label switched path (LSP) closest to the restarting node self-refreshes the control state information corresponding to the LSP during a recovery waiting time.
FIG. 2 is an exemplary flowchart of the fault processing method of the present invention. As shown in FIG. 2, the method includes:
In step 201, when at least one node on the LSP restarts, the restarting node determines that its control channel communication with a neighbor node is interrupted;
In step 202, the nodes on the LSP maintain the control state information corresponding to the LSP in those nodes during the recovery waiting time, and when the restarting node and the neighbor node resume communication within the recovery waiting time, the nodes on the LSP recover the control state information corresponding to the LSP.
In the method of the present invention, if the control channel communication between the restarting node and the neighbor node is still interrupted after the recovery waiting time has expired, the nodes on the LSP delete the control state information corresponding to the LSP.
The method according to the present invention includes the following three processing modes:
In the first mode, after determining that communication between the restarting node and its neighbor node is interrupted, the restarting node sends communication interruption fault information. After the normal node on the LSP closest to the restarting node receives this information, it starts timing according to the recovery waiting time. During the recovery waiting time, the nodes on the LSP perform recovery wait processing by maintaining the control state information corresponding to the LSP; when the restarting node and the neighbor node resume normal communication, the nodes stop recovery wait processing and recover the control state information of the LSP;
In the second mode, after determining that communication between the restarting node and its neighbor node is interrupted, the restarting node sends communication interruption fault information, and the restarting node itself starts timing according to the recovery waiting time. During the recovery waiting time, the nodes on the LSP perform recovery wait processing by maintaining the control state information corresponding to the LSP; when the restarting node and the neighbor node resume normal communication, the nodes stop recovery wait processing and recover the control state information of the LSP;
In the third mode, after determining that communication between the restarting node and its neighbor node is interrupted, the restarting node constructs a normal recovery response message and sends it to the normal nodes on the LSP, and the restarting node itself starts timing according to the recovery waiting time. During the recovery waiting time, the nodes on the LSP perform recovery wait processing by maintaining the control state information corresponding to the LSP; when the restarting node and the neighbor node resume normal communication, the nodes stop recovery wait processing and recover the control state information of the LSP.
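The three modes differ only in what the restarting node sends and in which node runs the recovery waiting time. A small lookup table, sketched here in Python, makes that comparison explicit. The names `ProcessingMode`, `MODES`, and `timing_node`, and the string labels, are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingMode:
    message_from_restarting_node: str  # what the restarting node sends
    timing_node: str                   # which node times the recovery wait


# Summary of the three processing modes described above (labels are illustrative).
MODES = {
    1: ProcessingMode("communication_interruption_fault_info", "nearest_normal_node"),
    2: ProcessingMode("communication_interruption_fault_info", "restarting_node"),
    3: ProcessingMode("normal_recovery_response", "restarting_node"),
}


def timing_node(mode: int) -> str:
    """Return which node runs the recovery waiting time in the given mode."""
    return MODES[mode].timing_node
```

In every mode the nodes on the LSP maintain the control state during the wait and recover it once communication resumes, so the table captures only the parts that vary.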
The control channel communication between the restarting node and the neighbor node may be interrupted for either of the following reasons: the neighbor node has lost power and has not yet restarted, or the communication link between the restarting node and the neighbor node is interrupted while the neighbor node is running normally. Whichever of these reasons causes the control channel interruption, the multi-node fault processing method according to the present invention performs the same operations.
Taking as an example the case where the neighbor node of the restarting node does not restart for a long time, the fault processing method of the present invention is described below through five embodiments.
Embodiment 1
This embodiment takes as an example an LSP passing through nodes A, B, C, and D in that order. FIG. 3 shows the running status of the nodes on the LSP in this embodiment. As shown in FIG. 3, node B and node C lose power; node B restarts, and node C cannot restart for a long time. Node B is the restarting node, node C is the neighbor node of node B, node A is the nearest normal node upstream of restarting node B, and node D is the nearest normal node downstream of restarting node B. Here, following the first mode, the recovery waiting time is set on node A, the normal node on the LSP closest to the restarting node.
Figure 4 is a flowchart of the failure processing method in this embodiment. Referring to Figure 4, the method includes the following steps.
In steps 401 to 402, when node B powers on and restarts, node B sends HELLO messages to node A and node C, and node B learns through the HELLO message mechanism that the control channel between node B and node C is in the communication interruption state.
In the HELLO message sent by node B, the src-instance value is the value before power-down plus 1 and the dst-instance value is 0, so that node A and node C learn that node B has restarted.
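The instance-value rule above can be illustrated with a minimal Python sketch. The `Hello` class and both function names are hypothetical and exist only for this illustration; the receiver-side check is an assumed simplification, not the complete procedure of any RSVP-TE implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """Hypothetical, simplified HELLO message holding only the two instance values."""
    src_instance: int  # the sender's own instance value
    dst_instance: int  # the neighbor instance value known to the sender; 0 if unknown


def build_restart_hello(src_instance_before_power_down: int) -> Hello:
    """HELLO sent by a node that has just restarted, following the rule in the text:
    src-instance is the pre-power-down value plus 1, dst-instance is 0."""
    return Hello(src_instance=src_instance_before_power_down + 1, dst_instance=0)


def neighbor_restarted(last_seen_src_instance: int, hello: Hello) -> bool:
    """Assumed receiver-side check: the sender's instance value has changed and
    the sender no longer knows the receiver's instance value."""
    return hello.src_instance != last_seen_src_instance and hello.dst_instance == 0


# Node B restarts; node A had last seen src-instance 41 from node B (illustrative value).
hello_from_b = build_restart_hello(src_instance_before_power_down=41)
print(hello_from_b)
print(neighbor_restarted(41, hello_from_b))
```

The sketch ignores wrap-around of the instance counter, which a real implementation would have to handle.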
After sending the HELLO message to node C, node B receives no response to it from node C, and therefore determines that communication between node B and node C is interrupted. The interruption may be caused either by node C not having restarted after powering down or by a failure of the communication link between node B and node C.
In steps 403 to 404, node A sends a recovery message to node B, notifying node B to restore the control state information corresponding to the LSP. Node B performs the recovery operation according to the received recovery message and notifies node A of the communication interruption fault information between node B and node C.
Here, node A sends node B a recovery message, such as a Path message carrying a recovery label, so that node B creates a PSB according to the received recovery message and saves the control state information carried in the recovery message in that PSB, thereby restoring its local control state information.
Under normal circumstances, node B would send a recovery message to node C after restoring the PSB. However, because node C has not yet restarted, the RSB in node B cannot be restored through the normal protocol procedure described in RFC 3473. Node B therefore notifies node A, in the direction opposite to that of the received recovery message, of the communication interruption fault information between node B and node C.
In step 405, node A starts a recovery wait timer that times according to the recovery waiting time, and maintains the RSB control state information corresponding to the LSP through self-refresh processing.
From the communication interruption fault information sent by node B, node A determines that the LSP cannot be recovered for the time being. It therefore starts the recovery wait timer, which times according to the recovery waiting time, and while the timer is running it self-refreshes the control state information in the RSB corresponding to the LSP in node A, so that the RSB is not deleted when its own timer expires. The recovery wait timer may be configured in advance and started in this step, or both configured and started in this step.
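The effect of self-refresh during the recovery waiting time can be sketched with a simulated clock. This is a minimal illustration under stated assumptions: `StateBlock`, `rsb_survives`, and all numeric values (a 300-second recovery waiting time, a 30-second refresh interval, a 90-second state lifetime) are hypothetical and are not taken from the patent or from RFC 3473.

```python
from dataclasses import dataclass


@dataclass
class StateBlock:
    """Hypothetical soft-state block (for example an RSB) with a cleanup deadline."""
    lifetime: float    # seconds the block survives without a refresh
    expires_at: float  # simulated time at which the block would be deleted

    def refresh(self, now: float) -> None:
        # A refresh pushes the cleanup deadline forward by one lifetime.
        self.expires_at = now + self.lifetime

    def alive(self, now: float) -> bool:
        return now < self.expires_at


def rsb_survives(wait_time: float, refresh_interval: float,
                 lifetime: float, self_refresh: bool) -> bool:
    """Step a simulated clock through the recovery waiting time and report
    whether the block is still present when the waiting time ends."""
    rsb = StateBlock(lifetime=lifetime, expires_at=lifetime)
    now = 0.0
    while now < wait_time:
        now += refresh_interval
        if not rsb.alive(now):
            return False  # the block's own timer expired, so it was deleted
        if self_refresh:
            rsb.refresh(now)  # node A refreshes its own state, no neighbor needed
    return True


print(rsb_survives(300.0, 30.0, 90.0, self_refresh=True))
print(rsb_survives(300.0, 30.0, 90.0, self_refresh=False))
```

With self-refresh the block outlives the whole waiting period; without it, the block is removed once its lifetime elapses because no refresh arrives from the unreachable neighbor.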
In addition, node D, the downstream neighbor of node C, is also in the communication interruption state with node C. Node D therefore self-refreshes the control state information corresponding to the LSP until the recovery wait timer in node D expires, for example by self-refreshing the PSB corresponding to the LSP in node D.
In steps 406 to 407, if node C restarts before the recovery wait timer in node A expires, node C sends HELLO messages to node B and node D indicating that node C has powered on and restarted.
In steps 408 to 412, node B sends a recovery message to node C. After restoring its control state information, node C sends a recovery message to node D, and node D enters normal control state information refresh processing. Node D sends a recovery response message to node A via node C and node B; node A then stops the recovery wait timer and enters normal refresh processing.
After receiving the recovery message, node C determines whether it is the destination node of the LSP. If node C is the destination node, it restores the control state information in the PSB and RSB corresponding to the LSP in node C and sends a recovery response message, such as a RESV message, to node B. Node B restores the RSB corresponding to the LSP in node B according to the received recovery response message and sends a recovery response message to node A. After receiving the recovery response message, node A stops its recovery wait timer and starts normal control state information refresh processing.
If node C is not the destination node, it restores the control state information in the PSB corresponding to the LSP in node C and forwards the recovery message to the normal node D in the downstream direction. After receiving the recovery message, node D synchronizes the PSB corresponding to the LSP and starts normal control state information refresh processing. Node D then sends a recovery response message hop by hop in the upstream direction. Node C and node B, on receiving the recovery response message, restore the control state information in the RSB corresponding to the LSP; node A, on receiving it, stops its recovery wait timer and starts normal control state information refresh processing.
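The two branches taken by the restarted node on receiving a recovery message can be summarized in a short Python sketch. The function name and the action strings are hypothetical; the sketch only lists the order of actions described in the text and does not model real RSVP-TE message formats.

```python
def handle_recovery_message(node: str, is_destination: bool,
                            upstream: str, downstream: str) -> list[str]:
    """Return the ordered actions of a restarted node that receives a recovery message."""
    if is_destination:
        # The destination node can restore both blocks itself and answer at once.
        return [
            f"{node}: restore PSB and RSB",
            f"{node} -> {upstream}: recovery response (e.g. RESV)",
        ]
    # A transit node restores its PSB, forwards the recovery message downstream,
    # and restores its RSB only when the recovery response comes back.
    return [
        f"{node}: restore PSB",
        f"{node} -> {downstream}: forward recovery message",
        f"{node}: restore RSB on recovery response from {downstream}",
        f"{node} -> {upstream}: recovery response",
    ]


for action in handle_recovery_message("C", is_destination=False,
                                      upstream="B", downstream="D"):
    print(action)
```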
If the communication interruption between node B and node C was caused by a broken link between the nodes, then, where node C is the destination node, node C restores its own control state information after receiving the recovery message and starts normal refresh processing.
This completes the node failure processing flow in this embodiment.
The above describes the case where node C restarts before the recovery wait timer in node A expires. If node C still has not restarted when the recovery wait timer reaches the preset recovery waiting time and stops timing, node A initiates deletion of the unrecovered LSP and sends a message for deleting the LSP. After receiving that message, node B deletes the control state information corresponding to the LSP in node B. Node D deletes the control state information corresponding to the LSP in node D because its local restart time expires; alternatively, even if node C restarts at some time after the recovery wait timer set on node A has expired, node D still deletes the control state information corresponding to the LSP on node D, because it never receives a recovery message from its upstream node C.
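The two outcomes at node A, recovery before the timer expires or deletion afterwards, reduce to a single comparison. The following Python sketch is illustrative only; `node_a_outcome` and its string results are hypothetical names, and the times are arbitrary values in seconds.

```python
from typing import Optional


def node_a_outcome(recovery_wait_time: float,
                   neighbor_restart_time: Optional[float]) -> str:
    """Decide what node A does, given when (if ever) node C restarts.

    neighbor_restart_time is measured from the moment the recovery wait
    timer starts; None means node C never restarts.
    """
    if neighbor_restart_time is not None and neighbor_restart_time < recovery_wait_time:
        # Node C came back in time: the recovery response reaches node A.
        return "stop recovery wait timer and resume normal refresh"
    # Timer expired first: node A tears down the unrecovered LSP.
    return "initiate LSP deletion and send delete message toward node B"


print(node_a_outcome(300.0, 120.0))
print(node_a_outcome(300.0, 450.0))
print(node_a_outcome(300.0, None))
```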
With the above flow, when communication failures occur on several consecutive nodes of an LSP and the nodes return to normal at different times, the normal node nearest to the restarting node maintains the control state information corresponding to the LSP in that node by self-refreshing it during the configured recovery waiting time. If all the nodes affected by the communication failure return to normal within the recovery waiting time, the control state information that has not yet been restored in the nodes on the LSP is restored. This effectively prevents abnormal deletion of the LSP and improves the reliability of LSP recovery. If some other, non-communication failure occurs on the LSP, for example where the upstream node has already deleted the control state information of the LSP so that the downstream node will quickly delete its own LSP control state information, this embodiment can delete the LSP according to the method described in RFC 3473. The faulty connection is thus torn down quickly and accurately, which helps release the network resources occupied by the LSP quickly and improves network resource utilization.
Embodiment 2
This embodiment again uses the LSP shown in Figure 3 as an example: node B restarts, and node C cannot restart for a long time. Here, in accordance with mode 2, the recovery waiting time is set on the restarting node, that is, on node B, and node B sends communication interruption fault information to node A.
Figure 5 is a flowchart of the failure processing method in this embodiment. Referring to Figure 5, the method includes the following steps.
In steps 501 to 502, when node B powers on and restarts, node B sends HELLO messages to node A and node C, and node B learns through the HELLO message mechanism that the control channel between node B and node C is in the communication interruption state.
In the HELLO message sent by node B, the src-instance value is the value before power-down plus 1 and the dst-instance value is 0, so that node A and node C learn that node B has restarted.
After sending the HELLO message to node C, node B receives no response to it from node C, and therefore determines that communication between node B and node C is interrupted. The interruption may be caused either by node C not having restarted after powering down or by a failure of the communication link between node B and node C.
In steps 503 to 505, node A sends a recovery message to node B, notifying node B to restore the control state information corresponding to the LSP. Node B performs the recovery operation according to the received recovery message, then starts the recovery wait timer and notifies node A of the communication interruption fault information between itself and node C. Node A self-refreshes the RSB control state information corresponding to the LSP.
Here, node A sends node B a recovery message, such as a Path message carrying a recovery label, so that node B creates a PSB according to the received recovery message and saves the control state information carried in the recovery message in that PSB, thereby restoring its local control state information.
Because node C has not restarted, the RSB in node B cannot be restored through the normal protocol procedure described in RFC 3473. After restoring the PSB, node B therefore starts a recovery wait timer that times according to the recovery waiting time and notifies node A of the communication interruption fault information between node B and node C. After receiving the communication interruption fault information from node B, node A self-refreshes the RSB control state information corresponding to the LSP in node A. The recovery wait timer may be configured in advance and started in this step, or both configured and started in this step.
In addition, node D, the downstream neighbor of node C, is also in the communication interruption state with node C. Node D therefore self-refreshes the control state information corresponding to the LSP until the recovery wait timer in node D expires.
In steps 506 to 507, if node C restarts before the recovery wait timer in node B expires, node C sends HELLO messages to node B and node D indicating that node C has powered on and restarted.
In steps 508 to 512, node B stops the recovery wait timer, enters normal refresh processing, and sends a recovery message to node C. After restoring its control state information, node C sends a recovery message to node D, and node D enters normal refresh processing. Node D sends a recovery response message to node A via node C and node B.
After learning through the HELLO mechanism that node C has restarted, node B stops its recovery wait timer. After receiving the recovery message, node C determines whether it is the destination node of the LSP. If node C is the destination node, it restores the control state information in the PSB and RSB corresponding to the LSP in node C and sends a recovery response message, such as a RESV message, to node B. Node B restores the RSB corresponding to the LSP in node B according to the received recovery response message and sends a recovery response message to node A. After receiving the recovery response message, node A synchronizes the control state information in its own RSB and starts normal control state information refresh processing.
If node C is not the destination node, it restores the control state information in the PSB corresponding to the LSP in node C and forwards the recovery message to the normal node D in the downstream direction. After receiving the recovery message, node D synchronizes the PSB corresponding to the LSP and starts normal control state information refresh processing. Node D then sends a recovery response message hop by hop in the upstream direction. Node C and node B, on receiving the recovery response message, restore the control state information in the RSB corresponding to the LSP; node A, on receiving it, synchronizes the control state information in its own RSB and starts normal control state information refresh processing.
This completes the failure processing flow in this embodiment.
The above describes the case where node C restarts before the recovery wait timer in node B expires. If node C still has not restarted when the recovery wait timer reaches the preset recovery waiting time and stops timing, node B initiates, toward node A, deletion of the unrecovered LSP and sends a message for deleting the LSP. After receiving the LSP deletion message from node B, node A deletes the control state information corresponding to the LSP in node A. Node D deletes the control state information corresponding to the LSP in node D because its local restart time expires; alternatively, even if node C restarts at some time after the recovery wait timer set on node B has expired, node D still deletes the control state information corresponding to the LSP on node D, because it never receives a recovery message from its upstream node C.
With the above flow, when communication failures occur on several consecutive nodes of an LSP and the nodes return to normal at different times, the restarting node maintains the control state information corresponding to the LSP in that node during the recovery waiting time. If all the nodes affected by the communication failure return to normal within the recovery waiting time, the control state information that has not yet been restored in the nodes on the LSP is restored. This effectively prevents abnormal deletion of the LSP and improves the reliability of LSP recovery. If some other, non-communication failure occurs on the LSP, this embodiment can quickly delete the LSP according to the method described in RFC 3473. The faulty connection is thus torn down quickly and accurately, which helps release the network resources occupied by the LSP quickly and improves network resource utilization.
Embodiment 3
This embodiment again uses the LSP shown in Figure 3 as an example: node B restarts, and node C cannot restart for a long time. Here, in accordance with mode 3, the recovery waiting time is set on the restarting node, that is, on node B, and node B constructs a normal recovery response message and sends it to node A.
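For orientation, the three modes used in Embodiments 1 to 3 differ only in which node runs the recovery wait timer and in what the restarting node B sends upstream to node A. The Python sketch below records that comparison as a lookup table; the `Mode` type, the `MODES` table, and the helper function are hypothetical names introduced for this illustration.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    timer_node: str     # node that runs the recovery wait timer
    sent_upstream: str  # what restarting node B sends to node A


FAULT_INFO = "communication interruption fault information"
NORMAL_RESPONSE = "normal recovery response message"

# Summary of Embodiments 1 to 3 (LSP A-B-C-D, node B restarting, node C down).
MODES = {
    1: Mode(timer_node="A (nearest normal node)", sent_upstream=FAULT_INFO),
    2: Mode(timer_node="B (restarting node)", sent_upstream=FAULT_INFO),
    3: Mode(timer_node="B (restarting node)", sent_upstream=NORMAL_RESPONSE),
}


def upstream_aware_of_interruption(mode: int) -> bool:
    """In mode 3 only the restarting node knows about the interruption."""
    return MODES[mode].sent_upstream == FAULT_INFO


for number, mode in MODES.items():
    print(number, mode.timer_node, "|", mode.sent_upstream,
          "| A aware:", upstream_aware_of_interruption(number))
```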
Figure 6 shows a flowchart of the failure processing method in this embodiment. Referring to Figure 6, the method includes the following steps.
In steps 601 to 602, when node B powers on and restarts, node B sends HELLO messages to node A and node C, and node B learns through the HELLO message mechanism that the control channel between node B and node C is in the communication interruption state.
After sending the HELLO message to node C, node B receives no response to it from node C, and therefore determines that communication between node B and node C is interrupted. The interruption may be caused either by node C not having restarted after powering down or by a failure of the communication link between node B and node C.
In steps 603 to 605, node A sends a recovery message to node B, notifying node B to restore the control state information corresponding to the LSP. Node B performs the recovery operation according to the received recovery message and then starts the recovery wait timer. Node B also creates a PSB in the node according to the recovery message sent by node A and the transport plane information saved in node B, constructs a normal recovery response message, and sends it to node A. After receiving the recovery response message, node A synchronizes the RSB control state information corresponding to the LSP and starts normal refresh processing.
Here, node A sends node B a recovery message, such as a Path message carrying a recovery label, so that node B creates a PSB according to the received recovery message and saves the control state information carried in the recovery message in that PSB, thereby restoring its local control state information.
Because node C has not yet restarted, node B can still send refresh messages, such as RESV messages, normally toward upstream node A on the basis of its RSB, but it receives no refresh messages, such as RESV messages, from node C, so node B holds only part of the RSB control state information. Therefore, after restoring the PSB and partially restoring the RSB, node B starts a recovery wait timer that times according to the recovery waiting time and self-refreshes the RSB control state information corresponding to the LSP in node B. At the same time, node B constructs a normal recovery response message and sends it to node A. The purpose of this operation is that only the restarting node, node B, is aware of the communication interruption with node C, while the other normal nodes on the LSP refresh the LSP in the normal way.
In addition, node D, the downstream neighbor of node C, is also in the communication interruption state with node C. Node D therefore self-refreshes the control state information corresponding to the LSP until the recovery wait timer in node D expires, for example by self-refreshing the PSB corresponding to the LSP in node D.
In steps 606 to 607, node C restarts before the recovery wait timer in node B expires and sends HELLO messages to node B and node D indicating that node C has powered on and restarted.
In steps 608 to 611, node B stops the recovery wait timer, enters normal refresh processing, and sends a recovery message to node C. After restoring its control state information, node C sends a recovery message to node D, and node D enters normal refresh processing. Node D sends a recovery response message to nodes C and B, and nodes C and B perform recovery according to the recovery response message.
After learning through the HELLO mechanism that node C has restarted, node B stops its recovery wait timer. After receiving the recovery message from node B, node C determines whether it is the destination node of the LSP.
If node C is the destination node, it restores the control state information in the PSB and RSB corresponding to the LSP in node C and sends a recovery response message, such as a RESV message, to node B. Node B restores the control state information in the RSB corresponding to the LSP in node B according to the received recovery response message, and normal control state information refresh processing starts between node B and node C.
If node C is not the destination node, it restores the control state information in the PSB corresponding to the LSP in node C and forwards the recovery message to the normal node D in the downstream direction. After receiving the recovery message, node D restores the PSB corresponding to the LSP and starts normal control state information refresh processing. Node D then sends a recovery response message hop by hop in the upstream direction. Node C, on receiving the recovery response message, restores the control state information in the RSB corresponding to the LSP. Node B, on receiving the recovery response message, restores its own control state information, and normal control state information refresh processing starts between node B and node C.
This completes the failure processing flow in this embodiment.
The above describes the case where node C restarts before the recovery wait timer in node B expires. If node C still has not restarted when the recovery wait timer reaches the configured recovery waiting time and stops timing, node B initiates, toward node A, deletion of the unrecovered LSP and sends node A a message for deleting the LSP. After receiving the LSP deletion message, node A deletes the control state information corresponding to the LSP in node A. Node D deletes its local control state information corresponding to the LSP because its local restart time expires; alternatively, even if node C restarts at some time after the recovery wait timer set on node B has expired, the control state information corresponding to the LSP in node D is still deleted, because node D never receives a recovery message from its upstream node C.
With the above flow, when communication failures occur on several consecutive nodes of an LSP and the nodes return to normal at different times, the restarting node maintains the control state information corresponding to the LSP in that node during the recovery waiting time. If all the nodes affected by the communication failure return to normal within the recovery waiting time, the control state information that has not yet been restored in the nodes on the LSP is restored. This effectively prevents abnormal deletion of the LSP and improves the reliability of LSP recovery. If some other, non-communication failure occurs on the LSP, this embodiment can quickly delete the LSP according to the method described in RFC 3473. The faulty connection is thus torn down quickly and accurately, which helps release the network resources occupied by the LSP quickly and improves network resource utilization.
Embodiment 4
This embodiment is described using an LSP that passes through nodes A, B, C, and D in sequence. Figure 7 is a schematic diagram of the operating status of the nodes in this embodiment. As shown in Figure 7, node B and node C have powered down; node C restarts, while node B cannot restart for a long time. Node C is the restarting node, node B is the neighbor node of node C, node A is the nearest normal node upstream of restarting node C, and node D is the nearest normal node downstream of restarting node C. Here, in accordance with mode 1, the recovery waiting time is preset on the normal node on the LSP nearest to the restarting node, that is, on node D.
Figure 8 is a flowchart of the failure processing method in this embodiment. Referring to Figure 8, the method includes the following steps.
In steps 801 to 802, when node C powers on and restarts, node C sends HELLO messages to node B and node D, and node C learns through the HELLO message mechanism that the control channel between node C and node B is in the communication interruption state.
After sending the HELLO message to node B, node C receives no response to it from node B, and therefore determines that communication between node B and node C is interrupted. The interruption may be caused either by node B not having restarted after powering down or by a failure of the communication link between node B and node C.
In steps 803 to 804, node D sends a recovery message to node C, notifying node C to restore the control state information corresponding to the LSP. Node C performs the recovery operation according to the received recovery message and notifies node D of the communication interruption fault information between node B and node C.
Here, the recovery message sent by node D may be a Recovery_Path message. According to the received recovery message, node C restores the control state information in the PSB corresponding to the LSP on node C. Because node B has not yet restarted, node C notifies node D of the communication interruption failure information between itself and node B.
In step 805, node D starts the recovery wait timer and performs self-refresh processing on the PSB control state information corresponding to the LSP.
From the communication interruption failure information received from node C, node D determines that the LSP cannot be recovered for the time being. Node D therefore starts a recovery wait timer that counts according to the preset recovery wait time, and self-refreshes the control state information in the PSB corresponding to the LSP on node D, so that the PSB is not deleted when the timer associated with the PSB times out. In addition, node A, the upstream neighbor of node B, is also in a communication-interrupted state with node B, so node A self-refreshes the control state information corresponding to the LSP, for example the RSB corresponding to the LSP on node A, until the recovery wait timer in node A times out.
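The effect of self-refresh on a soft-state block can be illustrated with a short simulation. This is a sketch under assumptions: the names `StateBlock` and `hold_state`, and the lifetime, wait, and interval values, are hypothetical and chosen only to show why a PSB or RSB survives the recovery wait when it is refreshed locally.

```python
from dataclasses import dataclass


@dataclass
class StateBlock:
    """A soft-state entry (PSB or RSB) that is deleted unless refreshed in time."""
    lsp_id: str
    lifetime: float            # assumed: seconds the state survives without a refresh
    last_refresh: float = 0.0

    def refresh(self, now: float) -> None:
        self.last_refresh = now

    def expired(self, now: float) -> bool:
        return now - self.last_refresh > self.lifetime


def hold_state(block: StateBlock, start: float, wait: float, interval: float) -> bool:
    """Self-refresh `block` every `interval` seconds until the recovery wait ends.

    Returns True if the block never expired during the wait.
    """
    now = start
    while now < start + wait:
        if block.expired(now):
            return False
        block.refresh(now)     # self-refresh stands in for the missing neighbor refresh
        now += interval
    return not block.expired(start + wait)


psb = StateBlock("lsp-1", lifetime=90.0)
print(hold_state(psb, start=0.0, wait=600.0, interval=30.0))  # True
```

If the refresh interval were longer than the block lifetime, the same call would return False, which corresponds to the abnormal deletion that the self-refresh is meant to prevent.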
In steps 806 to 807, when node B restarts before the recovery wait timer in node D times out, node B sends HELLO messages to node A and node C, indicating that node B has powered on and restarted.
In steps 808 to 813, node A and node C each send a recovery message to node B. After restoring its control state information, node B sends a recovery message to node C. After synchronizing its control state information, node C sends a recovery message to node D. On receiving the recovery message, node D stops the recovery wait timer, synchronizes its own control state information, and enters normal refresh processing. Node D then sends a recovery response message through node C and node B to node A, the normal node in the upstream direction.
After receiving the recovery message sent by node C in the downstream direction, node B determines whether it is the source node of the LSP.
If node B is the source node, node B restores the control state information in the PSB and RSB corresponding to the LSP on node B, and sends node C a recovery message, such as a Path message carrying a recovery label. According to the received recovery message, node C synchronizes the PSB corresponding to the LSP on node C and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information, stops its recovery wait timer, and begins normal refresh processing of the control state information.
If node B is not the source node of the LSP, node B maintains the partial PSB control state information that it already holds and performs normal refresh processing toward the downstream direction. Only after receiving the recovery message sent by node A in the upstream direction does node B send node C a recovery message, such as a Path message carrying a recovery label. According to the received recovery message, node C synchronizes the PSB corresponding to the LSP on node C and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information, stops its recovery wait timer, and begins normal refresh processing of the control state information. Node D then sends a recovery response message, such as a RESV message, hop by hop through node C and node B to node A, the normal node in the upstream direction. Node C and node B restore their local RSB control state information after receiving the recovery response message, and node A synchronizes its local RSB control state information after receiving it and enters normal refresh processing.
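The two cases above can be summarized as an ordered list of messages. The following is a simplified sketch, not the method itself: the function `recovery_sequence` and the message labels are hypothetical, the last node of the path is assumed to be the downstream normal node, and the relative order of the two recovery messages sent to the restarted node is an assumption, since the text does not fix it.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, message kind)


def recovery_sequence(path: List[str], restarted: str) -> List[Message]:
    """Order of messages once `restarted` comes back before the wait timer expires."""
    i = path.index(restarted)
    msgs: List[Message] = []
    if i + 1 < len(path):
        # The downstream neighbor returns the state it holds for the LSP.
        msgs.append((path[i + 1], restarted, "Recovery_Path"))
    if i > 0:
        # Not the source: the upstream recovery message must arrive first.
        msgs.append((path[i - 1], restarted, "Path(recovery label)"))
    # Recovery messages then travel hop by hop to the last node.
    for a, b in zip(path[i:], path[i + 1:]):
        msgs.append((a, b, "Path(recovery label)"))
    if i > 0:
        # The recovery response travels back hop by hop to the upstream normal node.
        rev = path[i - 1:][::-1]
        for a, b in zip(rev, rev[1:]):
            msgs.append((a, b, "Resv"))
    return msgs


for m in recovery_sequence(["A", "B", "C", "D"], "B"):
    print(m)
```

With the path A, B, C, D and node B restarted, the list ends with the Resv message from node B to node A, after which node A enters normal refresh processing.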
This completes the fault processing flow of this embodiment.

The above describes the case in which node B restarts before the recovery wait timer in node D times out. If the recovery wait timer reaches the preset recovery wait time, that is, the timer stops counting, and node B still has not restarted, node D initiates deletion of the LSP that has not been recovered.
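The two outcomes of the recovery wait timer can be written as a single decision. This is a minimal sketch with hypothetical names (`Outcome`, `resolve`); treating a restart that coincides exactly with the expiry as too late is an assumption of the sketch, since the text does not address that boundary.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RECOVER_LSP = "recover"
    DELETE_LSP = "delete"


def resolve(recovery_wait: float, neighbor_restart_at: Optional[float]) -> Outcome:
    """Decide what the timing node does with the LSP.

    `neighbor_restart_at` is the time, measured from the start of the
    recovery wait timer, at which the failed neighbor restarts; None means
    it never restarts.
    """
    if neighbor_restart_at is not None and neighbor_restart_at < recovery_wait:
        return Outcome.RECOVER_LSP   # timer is stopped and state is synchronized
    return Outcome.DELETE_LSP        # timer ran out; the unrecovered LSP is deleted


print(resolve(600.0, 120.0))  # Outcome.RECOVER_LSP
print(resolve(600.0, None))   # Outcome.DELETE_LSP
```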
This embodiment is similar to Embodiment 1. When communication failures occur on multiple consecutive nodes on an LSP and the nodes return to normal at different times, abnormal deletion of the LSP is effectively prevented and the reliability of LSP recovery is improved. When other, non-communication failures occur on the LSP, this embodiment can delete the LSP in combination with the method described in RFC3437, so that the failed connection is torn down quickly and accurately, the network resources occupied by the LSP are released quickly, and the utilization of network resources is improved.
Embodiment 5
This embodiment again uses the LSP shown in FIG. 7 as an example: node C restarts, and node B cannot restart for a long time. Here, according to Mode 2, the recovery wait time is preset on the restarting node, that is, on node C.
FIG. 9 is a flowchart of the fault processing method in this embodiment. Referring to FIG. 9, the method includes the following steps.
In steps 901 to 902, when node C powers on and restarts, node C sends HELLO messages to node B and node D, and learns through the HELLO message mechanism that the control channel between node C and node B is in a communication-interrupted state.
In steps 903 to 905, node D sends a recovery message to node C, instructing node C to restore the control state information corresponding to the LSP. Node C performs the recovery operation according to the received recovery message, then starts the recovery wait timer and notifies node D of the communication interruption failure information between node B and node C. Node D performs self-refresh processing on the PSB control state information corresponding to the LSP.
In steps 906 to 907, node B restarts before the recovery wait timer in node C times out, and sends HELLO messages to node A and node C, indicating that node B has powered on and restarted.

In steps 908 to 913, node A and node C each send a recovery message to node B. After restoring its control state information, node B sends a recovery message to node C. After synchronizing its control state information, node C stops the recovery wait timer and sends a recovery message to node D. On receiving the recovery message, node D synchronizes its own control state information and enters normal refresh processing. Node D then sends a recovery response message through node C and node B to node A, the normal node in the upstream direction.
After receiving the recovery message sent by node C in the downstream direction, node B determines whether it is the source node of the LSP.
If node B is the source node, node B restores the control state information in the PSB and RSB corresponding to the LSP on node B, and sends node C a recovery message, such as a Path message carrying a recovery label. According to the received recovery message, node C synchronizes the PSB corresponding to the LSP on node C, stops its recovery wait timer, and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information and begins normal refresh processing of the control state information.
If node B is not the source node of the LSP, node B maintains the partial PSB control state information that it already holds and performs normal refresh processing toward the downstream direction. Only after receiving the recovery message sent by node A in the upstream direction does node B send node C a recovery message, such as a Path message carrying a recovery label. According to the received recovery message, node C synchronizes the PSB corresponding to the LSP on node C, stops its recovery wait timer, and sends a recovery message to node D. After receiving the recovery message, node D synchronizes its own control state information and begins normal refresh processing of the control state information. Node D then sends a recovery response message, such as a RESV message, hop by hop through node C and node B to node A, the normal node in the upstream direction. Node C and node B restore their local RSB control state information after receiving the recovery response message, and node A synchronizes its local RSB control state information after receiving it and enters normal refresh processing.
This completes the fault processing flow of this embodiment.
The above describes the case in which node B restarts before the recovery wait timer in node C times out. If the recovery wait timer reaches the preset recovery wait time, that is, the timer stops counting, and node B still has not restarted, node D deletes the LSP that has not been recovered.
This embodiment is similar to Embodiment 2. When communication failures occur on multiple consecutive nodes on an LSP and the nodes return to normal at different times, abnormal deletion of the LSP is effectively prevented and the reliability of LSP recovery is improved. When other, non-communication failures occur on the LSP, this embodiment can delete the LSP in combination with the method described in RFC3437, so that the failed connection is torn down quickly and accurately, the network resources occupied by the LSP are released quickly, and the utilization of network resources is improved.
In addition, if Mode 3 is used to handle the LSP shown in FIG. 7, the recovery wait timer in node C performs the timing, and when node C determines that communication with node B is interrupted, node C constructs a normal recovery response message and sends it to node D. After communication between node B and node C resumes, recovery proceeds in a manner similar to Embodiment 3. Specifically, node C creates the PSB corresponding to the LSP on itself according to the recovery message sent by the downstream node and the transport plane information that node C has saved, maintains the control state information in the PSB through self-refresh processing, constructs a normal recovery response message, and sends it to node D, the normal node on the LSP. According to the received normal recovery response message, node D refreshes the control state information corresponding to the LSP on node D and sends a normal refresh message toward the restarting node C. On receiving the refresh message, node C restores the RSB control state information corresponding to the LSP on node C and enters normal refresh processing. If node C determines within the recovery wait time that communication with node B has resumed, node C sends a refresh message to node B, and node B determines whether it is the source node of the LSP.
If node B is the source node, node B restores or synchronizes the PSB control state information corresponding to the LSP on itself and sends a recovery message to node C. According to the received recovery message, node C synchronizes the PSB control state information corresponding to the LSP on node C, stops its recovery timer, and sends a refresh message to node B. On receiving the refresh message, node B restores the RSB control state information corresponding to the LSP on node B and enters normal refresh processing, and the processing flow ends.
If node B is not the source node, node B maintains the partial PSB control state information that it already holds and performs normal refresh processing toward the downstream direction. Only after receiving the recovery message sent by node A in the upstream direction does node B send node C a recovery message, such as a Path message carrying a recovery label. After receiving the recovery message, node C stops its recovery timer, synchronizes the PSB control state information corresponding to the LSP on node C, and sends a refresh message through node B to node A, the normal node in the upstream direction. On receiving the refresh message, node B restores the RSB control state information corresponding to the LSP on node B; on receiving the refresh message, node A synchronizes the RSB control state information corresponding to the LSP on node A and enters normal refresh processing, and the processing flow ends.
In Embodiments 1, 2, 4, and 5 above, the restarting node sends the communication interruption failure information to the normal node after receiving a recovery message from its upstream or downstream neighbor node. Alternatively, without having received a recovery message, the restarting node may, upon detecting that communication with a neighbor node is interrupted, send the communication interruption failure information to the normal node according to the LSP route information recorded in advance on the restarting node.
In addition, the five embodiments above all address the case in which a communication failure arises because a node on the LSP does not restart for a long time. The method according to the idea of the present invention can also be applied where the restarting node and its neighbor node both operate normally but the communication link between them is interrupted. In that case, the restarted node detects through the HELLO message mechanism, after restarting, that the communication link between itself and the neighbor node is interrupted, and can then operate according to the method of any of the embodiments above.

An embodiment of the present invention further provides a fault processing system, including a first node, a second node, and at least one third node, where the first node and the second node are adjacent nodes between which communication is interrupted, the first node restarts, and the third node is the normal node closest to the restarted first node.
In the system, the third node is configured to, within a period of time, maintain the control state information of the LSP if communication between the first node and the second node is interrupted, and restore the control state information of the LSP if communication between the first node and the second node resumes;
the first node is configured to, within the period of time, restore the control state information of the LSP if communication between the first node and the second node resumes;
the second node is configured to, within the period of time, restore the control state information of the LSP if communication between the first node and the second node resumes.
Corresponding to the fault processing method, the system likewise has the following three modes.
In the first mode, the first node is configured to construct a recovery response message after receiving a recovery message from the third node, send it to the third node, and start timing; and, while the timed duration has not exceeded the period of time, to stop timing if communication between the first node and the second node resumes;
the third node is configured to send the recovery message to the first node and, while the timed duration has not exceeded the period of time, to maintain the control state information of the LSP according to the recovery response message.
In the second mode, the first node is configured to send the communication interruption failure information between the first node and the second node;
the third node is configured to start timing after receiving the communication interruption failure information; while the timed duration has not exceeded the period of time, to maintain the control state information of the LSP if communication between the first node and the second node is interrupted; and to stop timing if communication between the first node and the second node resumes.
In the third mode, the first node is configured to send the communication interruption failure information and start timing; and, while the timed duration has not exceeded the period of time, to stop timing if communication between the first node and the second node resumes;

the third node is configured to maintain the control state information of the LSP while the timed duration has not exceeded the period of time.
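For comparison, the three modes can be tabulated by which node runs the timer and what starts it. The table below is an illustrative restatement only; the `Mode` record, its field names, and the `timer_owner` helper are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Mode:
    timer_node: str      # which node runs the recovery wait timer
    timer_start: str     # event that starts the timer
    sent_to_third: str   # what the first node sends to the third node


# Hypothetical summary of the three system modes described above.
SYSTEM_MODES: Dict[int, Mode] = {
    1: Mode("first", "recovery response returned", "recovery response message"),
    2: Mode("third", "failure information received", "communication interruption failure information"),
    3: Mode("first", "failure information sent", "communication interruption failure information"),
}


def timer_owner(mode: int) -> str:
    """Return which node ('first' or 'third') runs the timer in the given mode."""
    return SYSTEM_MODES[mode].timer_node


print([timer_owner(m) for m in (1, 2, 3)])  # ['first', 'third', 'first']
```

In every mode the timer is stopped when communication between the first node and the second node resumes within the period of time.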
The system operates in a manner similar to that described for the method, and the details are not repeated here.
An embodiment of the present invention provides a device on a label switched path (LSP). The LSP includes a first device, a second device, and at least one third device, where the first device and the second device are adjacent devices between which communication is interrupted, the first device restarts, and the third device is the normal device closest to the restarted first device. When the device is the third device, it includes:
a first module, configured to start timing when communication between the first device and the second device is interrupted, and, while the timed duration has not exceeded a period of time, to stop timing if the first device and the second device resume communication;
a second module, configured to, while the duration timed by the first module has not exceeded the period of time, maintain the control state information of the device if communication between the first device and the second device is interrupted, and restore the control state information of the device if the first device and the second device resume communication.
When the device is the first device, it further includes:
a third module, configured to report to the third device the communication interruption failure information between the device and the second device; where the first module is configured to start timing when the communication interruption failure information is reported, and, while the timed duration has not exceeded the period of time, to stop timing if the device and the second device resume communication.
When the device is the first device, it further includes:
a fourth module, configured to construct a normal recovery response message and return it to the third device when the third device sends a recovery message to the device; where
the first module is configured to start timing when the recovery response message is returned, and, while the timed duration has not exceeded the period of time, to stop timing if the device and the second device resume communication.
With the method, system, and device described in the embodiments above, when communication failures occur on multiple consecutive nodes on an LSP and the nodes return to normal at different times, abnormal deletion of the LSP is effectively prevented and the reliability of LSP recovery is improved. When other, non-communication failures occur on the LSP, the embodiments can delete the LSP in combination with the method described in RFC3437, so that the failed connection is torn down quickly and accurately, the network resources occupied by the LSP are released quickly, and the utilization of network resources is improved.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A fault processing method, applied to a label switched path (LSP) that includes a first node, a second node, and at least one third node, wherein the first node and the second node are adjacent nodes between which communication is interrupted, the first node restarts, and the third node is the normal node closest to the restarted first node, characterized in that the method comprises:
when communication between the first node and the second node is interrupted, maintaining, by the third node, the control state information of the LSP for a period of time; and
when communication between the first node and the second node resumes within the period of time, restoring, by the first node, the second node, and the third node, the control state information of the LSP.
2. The method according to claim 1, characterized in that maintaining the control state information of the LSP comprises:
sending, by the third node, a recovery message to the first node;
constructing, by the first node, a recovery response message, returning it to the third node, and starting timing; and

while the timed duration has not exceeded the period of time, maintaining, by the third node, the control state information of the LSP according to the recovery response message, wherein the first node stops timing if communication between the first node and the second node resumes.
3. The method according to claim 1, characterized in that maintaining the control state information of the LSP comprises:
sending, by the first node, communication interruption failure information; and
starting timing, by the third node, upon receiving the communication interruption failure information, and maintaining the control state information of the LSP while the timed duration has not exceeded the period of time, wherein the third node stops timing if communication between the first node and the second node resumes.
4. The method according to claim 1, characterized in that maintaining the control state information of the LSP comprises:

sending, by the first node, communication interruption failure information to the third node, and starting timing; and

maintaining, by the third node, the control state information of the LSP while the timed duration has not exceeded the period of time, wherein the first node stops timing if communication between the first node and the second node resumes.
5. The method according to claim 3 or 4, characterized in that the first node is an upstream node of the second node; the third node comprises an upstream third node, or comprises an upstream third node and a downstream third node, the upstream third node being an upstream node of the first node and the downstream third node being a downstream node of the second node;
the upstream third node maintains, through self-refresh, the reservation state block (RSB) control state information corresponding to the LSP;
while the timed duration has not exceeded the period of time, if the first node resumes communication with the second node and the first node sends a recovery message to the second node, the second node determines whether it is the destination node of the LSP;
if the second node is the destination node, the second node restores, according to the received recovery message, the PSB and RSB control state information corresponding to the LSP on itself, and sends a recovery response message to the first node; the first node restores, according to the received recovery response message, the RSB control state information corresponding to the LSP on itself, and sends the recovery response message hop by hop in the upstream direction to the upstream third node or the source node; each node that receives the recovery response message restores the RSB control state information corresponding to the LSP on itself; and the processing flow ends;
if the second node is not the destination node, the second node restores, according to the received recovery message, the PSB control state information corresponding to the LSP on itself, and sends the recovery message hop by hop in the downstream direction to the downstream third node; each downstream node that receives the recovery message restores or synchronizes the PSB control state information corresponding to the LSP on itself and enters normal refresh processing; the downstream third node then sends a recovery response message hop by hop in the upstream direction to the upstream third node or the source node; and each node that receives the recovery response message restores the RSB control state information corresponding to the LSP on itself.
6. The method according to claim 5, characterized in that, while the upstream third node maintains the RSB control state information, the method further comprises:
maintaining, by the downstream third node, through self-refresh within its own restart timing time, the path state block (PSB) control state information corresponding to the LSP.
7. The method according to claim 3 or 4, characterized in that the first node is a downstream node of the second node; the third node comprises a downstream third node, or comprises an upstream third node and a downstream third node, the downstream third node being a downstream node of the first node and the upstream third node being an upstream node of the second node;
所述下游第三节点通过自刷新来维持该 LSP对应的 PSB控制状态信 息;  The downstream third node maintains PSB control state information corresponding to the LSP by self-refresh;
当在所计时长未超出所述一段时间时, 若所述第一节点与所迷第二 节点恢复通信, 并且所述第一节点向所述第二节点发送恢复消息时, 则 所述第二节点判断自身是否为所述 LSP上的源节点;  When the first node does not resume communication with the second node, and the first node sends a recovery message to the second node, the second node The node determines whether it is a source node on the LSP;
如果所述第二节点是所述源节点, 则所述第二节点根据接收到的恢 复消息, 恢复自身中 LSP对应的 PSB和 /或 SB中的控制状态信息, 并 且所述第二节点沿下游方向将所述恢复消息逐跳发送至所述下游第三 节点; 所述下游第三节点恢复或同步自身的控制状态信息, 并沿上游方 向逐跳将恢复应答消息发送给所述第二节点, 接收到所述恢复应答消息 的节点对自身中 LSP对应的 RSB控制状态信息进行恢复或同步, 进入 正常刷新处理, 并结束本处理流程;  If the second node is the source node, the second node recovers control state information in the PSB and/or SB corresponding to the LSP in the self according to the received recovery message, and the second node is downstream The direction sends the recovery message to the downstream third node hop by hop; the downstream third node recovers or synchronizes its own control state information, and sends a recovery response message to the second node hop by hop in the upstream direction. The node that receives the recovery response message restores or synchronizes the RSB control state information corresponding to the LSP in the LSP, enters the normal refresh process, and ends the processing flow;
如果所述第二节点不是源节点, 则所述第二节点在收到上游第三节 点的恢复消息后, 才艮据所述恢复消息, 恢复自身中 LSP对应的 PSB控 制状态信息, 并沿下游方向将所述恢复消息逐跳发送至所述下游第三节 点; 所述下游第三节点恢复或同步自身的控制状态信息, 并沿上游方向 将恢复应答消息逐跳发送至所述上游第三节点, 接收到所述恢复应答消 息的上游节点对自身中 LSP对应的 RSB控制状态信息进行恢复或同步, 进入正常刷新处理。 If the second node is not the source node, the second node recovers the PSB control state information corresponding to the LSP in the LSP according to the recovery message after receiving the recovery message of the upstream third node, and is downstream Direction sends the recovery message hop by hop to the downstream third section The downstream third node recovers or synchronizes its own control state information, and sends a recovery response message hop by hop to the upstream third node in the upstream direction, and the upstream node that receives the recovery response message has an LSP in itself The corresponding RSB control status information is restored or synchronized, and the normal refresh processing is entered.
8. The method according to claim 7, characterized in that, while the downstream third node maintains the PSB control state information, the method further comprises:
the upstream third node maintaining, by self-refresh within its own restart timer period, the reservation state block (RSB) control state information corresponding to the LSP.
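The self-refresh described in claims 6 and 8 can be pictured as a node extending the lifetime of its own state block locally, without waiting for refresh messages from the unreachable neighbour, but never beyond its restart timer. The sketch below is a minimal model under that assumption; `StateBlock`, `self_refresh`, and the parameter names are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class StateBlock:
    kind: str          # "PSB" or "RSB"
    expires_at: float  # time at which the block would be torn down


def self_refresh(block, now, refresh_lifetime, restart_deadline):
    """Extend the block's lifetime locally, never past the restart deadline.

    Returns True if the block is still held after this call.
    """
    if now >= block.expires_at or now >= restart_deadline:
        # The block already timed out, or the restart time has elapsed.
        return False
    block.expires_at = min(now + refresh_lifetime, restart_deadline)
    return True
```

Calling `self_refresh` periodically keeps the block alive; once `restart_deadline` is reached the call returns False and the caller is expected to release the state.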
9. The method according to claim 2, characterized in that the first node is an upstream node of the second node; the third node comprises an upstream third node, or comprises an upstream third node and a downstream third node, wherein the upstream third node is an upstream node of the first node and the downstream third node is a downstream node of the second node;
while the timed duration has not exceeded the period of time, if communication between the first node and the second node is interrupted, the first node creates in itself the PSB corresponding to the LSP according to the recovery message sent by the upstream third node and the transport plane information it has saved, maintains the control state information in that PSB by self-refresh processing, and sends the constructed recovery response message to the upstream third node; the upstream third node maintains the control state information corresponding to the LSP in itself according to the received recovery response message;
while the timed duration has not exceeded the period of time, if the first node resumes communication with the second node and the second node receives a recovery message from the first node, the second node determines whether it is the destination node of the LSP;
if the second node is the destination node, the second node restores the PSB and RSB control state information corresponding to the LSP in itself, and sends a recovery response message to the first node; the first node, on receiving the recovery response message, restores the RSB control state information corresponding to the LSP in itself; the first node and the second node enter normal refresh processing, and the processing flow ends;
if the second node is not the destination node, the second node restores the PSB control state information corresponding to the LSP in itself according to the received recovery message, and sends the recovery message hop by hop in the downstream direction to the downstream third node; each node that receives the recovery message restores the PSB control state information corresponding to the LSP in itself and enters normal refresh processing; and the downstream third node sends the recovery response message hop by hop in the upstream direction to the upstream third node or the source node, and each node that receives the recovery response message restores or synchronizes the RSB control state information corresponding to the LSP in itself and enters normal refresh processing.
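As a rough illustration of the first node's behaviour in claim 9 while the link to the second node is still down, the sketch below rebuilds a PSB from an upstream recovery message plus locally saved transport plane data and constructs the reply. Every field name (`lsp_id`, `sender`, `out_interface`) and the message type string are assumptions made for illustration; the application does not specify these formats.

```python
def rebuild_psb(recovery_msg, transport_plane):
    """Rebuild a PSB for one LSP and construct the recovery response.

    recovery_msg: dict with hypothetical keys "lsp_id" and "sender".
    transport_plane: dict mapping lsp_id to saved cross-connect data
    (hypothetical layout with an "out_interface" entry).
    Returns (psb, response), or (None, None) if nothing was saved.
    """
    lsp_id = recovery_msg["lsp_id"]
    cross_connect = transport_plane.get(lsp_id)
    if cross_connect is None:
        # No saved transport plane entry: there is nothing to rebuild from.
        return None, None

    psb = {
        "lsp_id": lsp_id,
        "prev_hop": recovery_msg["sender"],
        "out_interface": cross_connect["out_interface"],
        # Maintained by local self-refresh because the downstream
        # neighbour cannot be reached yet.
        "self_refresh": True,
    }
    response = {
        "type": "RecoveryResponse",
        "lsp_id": lsp_id,
        "to": recovery_msg["sender"],
    }
    return psb, response
```

The point of the design is that the upstream third node receives an ordinary-looking response and can therefore keep its own state, even though the first node could not consult the second node when building it.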
10. The method according to claim 2, characterized in that the first node is a downstream node of the second node, and the third node comprises a downstream third node, or an upstream third node and a downstream third node, wherein the upstream third node is an upstream node of the second node and the downstream third node is a downstream node of the first node;
while the timed duration has not exceeded the period of time, if communication between the first node and the second node is interrupted, the first node creates in itself the PSB corresponding to the LSP according to the recovery message sent by the downstream third node and the transport plane information it has saved, maintains the control state information in the PSB by self-refresh processing, and sends the recovery response message to the downstream third node; the downstream third node synchronizes the control state information corresponding to the LSP in itself according to the received recovery response message;
while the timed duration has not exceeded the period of time, if the first node resumes communication with the second node and the first node sends a recovery message to the second node, the second node determines whether it is the source node of the LSP;
if the second node is the source node, the second node restores or synchronizes the PSB and/or RSB control state information corresponding to the LSP in itself according to the received recovery message, and sends the recovery message to the first node; the first node, on receiving the recovery message, restores the PSB control state information corresponding to the LSP in itself and sends a recovery response message to the second node; the second node, on receiving the recovery response message, restores the RSB control state information corresponding to the LSP in itself and enters normal refresh processing, and the processing flow ends;
if the second node is not the source node, the second node restores the PSB control state information corresponding to the LSP in itself according to the recovery message from the upstream third node, and sends the recovery message to the first node; the first node, on receiving the recovery message, restores the PSB control state information corresponding to the LSP in itself and sends a recovery response message to the second node; the second node, on receiving the recovery response message, restores or synchronizes the RSB control state information corresponding to the LSP in itself and enters normal refresh processing.
11. The method according to claim 1, characterized in that the first node and the second node being in a control channel communication interruption state means that the second node has not yet restarted, or that the communication link between the first node and the second node is interrupted.
12. The method according to claim 1, characterized in that, after the third node maintains the control state information of the LSP for the period of time, the method further comprises:
if, after the period of time, the first node and the second node are still in the control channel interruption state, the first node and the third node delete the control state information corresponding to the LSP.
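Claims 11 and 12, read together with the timed behaviour in the earlier claims, amount to a three-way decision: keep the state while the interruption lasts and the period has not run out, restore it if communication resumes, and delete it once the period has passed with the interruption still present. A minimal sketch of that decision follows; `Action`, `next_action`, and the parameter names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    MAINTAIN = "maintain"
    RECOVER = "recover"
    DELETE = "delete"


def control_channel_interrupted(second_restarted, link_up):
    # Interrupted if the second node has not restarted yet, or if the
    # link between the first and second nodes is down.
    return (not second_restarted) or (not link_up)


def next_action(elapsed, hold_time, second_restarted, link_up):
    """Decide what to do with the LSP control state.

    Assumption: timing stops as soon as communication resumes, so the
    RECOVER branch is only expected while elapsed <= hold_time.
    """
    if not control_channel_interrupted(second_restarted, link_up):
        return Action.RECOVER
    if elapsed <= hold_time:
        return Action.MAINTAIN
    return Action.DELETE
```

For example, `next_action(5, 30, False, True)` yields `Action.MAINTAIN` because the second node has not restarted and the period has not been exceeded.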
13. A failure processing system, comprising a first node, a second node and at least one third node, wherein the first node and the second node are adjacent nodes between which a communication interruption occurs, the first node restarts, and the third node is the normal node nearest to the restarted first node, characterized in that:
the third node is configured to, within a period of time, maintain the control state information of the LSP if communication between the first node and the second node is interrupted, and restore the control state information of the LSP if communication between the first node and the second node resumes;
the first node is configured to, within the period of time, restore the control state information of the LSP if communication between the first node and the second node resumes;
the second node is configured to, within the period of time, restore the control state information of the LSP if communication between the first node and the second node resumes.
14. The system according to claim 13, characterized in that:
the first node is configured to construct a recovery response message after receiving a recovery message from the third node, send it to the third node, and start timing; and, while the timed duration has not exceeded the period of time, to stop timing if communication between the first node and the second node resumes;
the third node is configured to send the recovery message to the first node, and, while the timed duration has not exceeded the period of time, to maintain the control state information of the LSP according to the recovery response message.
15. The system according to claim 13, characterized in that:
the first node is further configured to send communication interruption failure information concerning the first node and the second node;
the third node is configured to start timing after receiving the communication interruption failure information; while the timed duration has not exceeded the period of time, to maintain the control state information of the LSP if communication between the first node and the second node is interrupted; and to stop timing if the first node resumes communication with the second node.
16. The system according to claim 13, characterized in that:
the first node is further configured to send communication interruption failure information and start timing; and, while the timed duration has not exceeded the period of time, to stop timing if the first node resumes communication with the second node;
the third node is further configured to maintain the control state information of the LSP while the timed duration has not exceeded the period of time.
17. A device on a label switched path (LSP), the LSP comprising a first device, a second device and at least one third device, wherein the first device and the second device are adjacent devices between which a communication interruption occurs, the first device restarts, and the third device is the normal device nearest to the restarted first device, characterized in that, when the device is the third device, it comprises:
a first module, configured to start timing when communication between the first device and the second device is interrupted, and, while the timed duration has not exceeded a period of time, to stop timing if the first device and the second device resume communication;
a second module, configured to, while the duration timed by the first module has not exceeded the period of time, maintain the control state information of the device if communication between the first device and the second device is interrupted, and restore the control state information of the device if the first device and the second device resume communication.
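One way to picture the split between the first module (timing) and the second module (state handling) in claim 17 is a timer object owned by a device object. The sketch below assumes an injected clock so the behaviour can be exercised without waiting; the class names `HoldTimer` and `ThirdDevice`, the state strings, and the omission of expiry handling are all illustrative choices, not details from the application.

```python
class HoldTimer:
    """Plays the role of the first module: times the interruption."""

    def __init__(self, period, clock):
        self.period = period
        self.clock = clock        # callable returning the current time
        self.started_at = None

    def start(self):
        self.started_at = self.clock()

    def stop(self):
        self.started_at = None

    def within_period(self):
        if self.started_at is None:
            return False
        return self.clock() - self.started_at <= self.period


class ThirdDevice:
    """Combines the timer with the second module's state handling."""

    def __init__(self, period, clock):
        self.timer = HoldTimer(period, clock)
        self.state = "normal"

    def on_interrupted(self):
        self.timer.start()           # first module starts timing
        self.state = "maintaining"   # second module keeps the state

    def on_restored(self):
        # Restore only if communication came back within the period.
        if self.timer.within_period():
            self.timer.stop()
            self.state = "recovered"
```

Keeping the clock as a constructor argument is a testing convenience; a deployment would more likely pass a monotonic system clock.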
18. The device according to claim 17, characterized in that, when the device is the first device, it further comprises:
a third module, configured to report to the third device communication interruption failure information concerning the device and the second device;
the first module being configured to start timing when the communication interruption failure information is reported, and, while the timed duration has not exceeded a period of time, to stop timing if the device and the second device resume communication.
19. The device according to claim 17, characterized in that, when the device is the first device, it further comprises:
a fourth module, configured to construct a normal recovery response message and return it to the third device when the third device sends a recovery message to the device;
the first module being configured to start timing when the recovery response message is returned, and, while the timed duration has not exceeded the period of time, to stop timing if the device and the second device resume communication.
PCT/CN2007/001194 2006-06-09 2007-04-12 A failure processing method and a system and a device thereof WO2007140689A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
DK07720767.8T DK1986371T3 (en) 2006-06-09 2007-04-12 Procedure for error handling and associated system and device
AT07720767T ATE537664T1 (en) 2006-06-09 2007-04-12 AN ERROR PROCESSING METHOD AND SYSTEM AND APPARATUS
ES07720767T ES2376361T3 (en) 2006-06-09 2007-04-12 METHOD, SYSTEM AND DEVICE FOR FAILURE PROCESSING.
EP07720767A EP1986371B1 (en) 2006-06-09 2007-04-12 A failure processing method and a system and a device thereof
US12/331,125 US8040793B2 (en) 2006-06-09 2008-12-09 Method, system and device for processing failure

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200610092914A CN101087207B (en) 2006-06-09 2006-06-09 A processing method for multi-node communication failure
CN200610092914.0 2006-06-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/331,125 Continuation US8040793B2 (en) 2006-06-09 2008-12-09 Method, system and device for processing failure

Publications (1)

Publication Number Publication Date
WO2007140689A1 (en) 2007-12-13

Family

ID=38801054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/001194 WO2007140689A1 (en) 2006-06-09 2007-04-12 A failure processing method and a system and a device thereof

Country Status (7)

Country Link
US (1) US8040793B2 (en)
EP (1) EP1986371B1 (en)
CN (1) CN101087207B (en)
AT (1) ATE537664T1 (en)
DK (1) DK1986371T3 (en)
ES (1) ES2376361T3 (en)
WO (1) WO2007140689A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101616069B (en) * 2008-06-23 2012-04-04 华为技术有限公司 Information restoration method based on graceful restart and router
EP2182770B1 (en) * 2008-11-04 2011-09-21 HTC Corporation Method for improving uplink transmission in a wireless communication system
US20100284268A1 (en) * 2009-05-07 2010-11-11 Shan Zhu Node State Recovery for a Communication Network
US20100284269A1 (en) * 2009-05-07 2010-11-11 Shan Zhu Multi-Node State Recovery for a Communication Network
CN101964925B (en) * 2009-07-21 2014-04-30 中兴通讯股份有限公司 Method and system for controlling recovery of planar node after restarting in automatic switched optical network
US8850014B2 (en) * 2010-05-06 2014-09-30 Telefonaktiebolaget L M Ericsson (Publ) Handling failure of request message during set up of label switched path
US9197380B2 (en) * 2010-12-17 2015-11-24 Cisco Technology, Inc. Repeater nodes in shared media networks
CN102130830B (en) * 2010-12-28 2013-04-17 华为技术有限公司 Standby label switched path (LSP) activation method, device and system
CN102136947A (en) * 2011-03-10 2011-07-27 华为技术有限公司 Method and device for processing link faults
CN102123097B (en) 2011-03-14 2015-05-20 杭州华三通信技术有限公司 Method and device for protecting router
JP5835043B2 (en) * 2012-03-19 2015-12-24 富士通株式会社 Restart method and node device
WO2014040628A1 (en) * 2012-09-13 2014-03-20 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for node realignment in a telecommunications network
CN103354521B (en) * 2013-07-08 2016-09-07 杭州华三通信技术有限公司 The optimization method of a kind of LSP based on LDP renewal and device
US9176833B2 (en) * 2013-07-11 2015-11-03 Globalfoundries U.S. 2 Llc Tolerating failures using concurrency in a cluster
US9641455B2 (en) * 2013-07-18 2017-05-02 International Business Machines Corporation Mechanism for terminating relay operations in a distributed switch with cascaded configuration
CN104980295A (en) * 2014-04-09 2015-10-14 中兴通讯股份有限公司 Method, device and system for preventing network node from aging
CN105530117A (en) * 2014-10-24 2016-04-27 中兴通讯股份有限公司 Method, device and system for updating protocol state of control channel
CN105991480A (en) * 2015-02-27 2016-10-05 中兴通讯股份有限公司 Method and device for implementing RSVP-TE protocol message processing
CN106028385B (en) * 2016-05-05 2019-06-07 南京邮电大学 A kind of mobile Internet of things system failure terms method
CN107248941B (en) * 2017-06-30 2020-01-10 华为技术有限公司 Method and device for detecting path
US10721159B2 (en) * 2018-04-25 2020-07-21 Hewlett Packard Enterprise Development Lp Rebuilt flow events
CN110661550B (en) * 2019-09-27 2021-08-31 青岛联众芯云科技有限公司 Method, device, storage medium and electronic equipment for forwarding message in HPLC communication link
CN115509776B (en) * 2022-09-29 2024-02-06 南京远能电力工程有限公司监理分公司 Data analysis method and system based on intelligent supervision platform of electric power engineering

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1492601A (en) * 2002-10-25 2004-04-28 华为技术有限公司 Recovering method for controlling plane after multi-node restarting in intelligent optical network
CN1567844A (en) 2003-06-22 2005-01-19 华为技术有限公司 A method for controlling establishment of label switch path (LSP) based on resource reservation protocol (RSVP)
US20050259570A1 (en) * 2004-05-19 2005-11-24 Kddi Corporation Fault recovery method and program therefor
CN1801802A (en) * 2004-12-31 2006-07-12 华为技术有限公司 Node restarting method on universal multi protocol label exchange path

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7298693B1 (en) * 1999-10-21 2007-11-20 Tellabs Operations, Inc. Reverse notification tree for data networks
US6590868B2 (en) * 2001-06-02 2003-07-08 Redback Networks Inc. Method and apparatus for restart communication between network elements
US6950427B1 (en) * 2001-08-09 2005-09-27 Cisco Technology, Inc. Technique for resynchronizing LSDB in OSPF after a software reload in a non-stop forwarding intermediate node of a computer network
US7639680B1 (en) * 2001-10-04 2009-12-29 Cisco Technology, Inc. Out of band data base synchronization for OSPF
CA2420842C (en) * 2002-03-06 2010-05-11 Nippon Telegraph And Telephone Corporation Upper layer node, lower layer node, and node control method
US7760652B2 (en) * 2002-04-16 2010-07-20 Enterasys Networks, Inc. Methods and apparatus for improved failure recovery of intermediate systems
US20040229294A1 (en) * 2002-05-21 2004-11-18 Po-Ying Chan-Hui ErbB surface receptor complexes as biomarkers
US7248579B1 (en) * 2002-10-15 2007-07-24 Cisco Technology, Inc. System and method for providing a link state database (LSDB) snapshot for neighbor synchronization
WO2005036839A2 (en) * 2003-10-03 2005-04-21 Avici Systems, Inc. Rapid alternate paths for network destinations
US7680028B1 (en) * 2003-11-21 2010-03-16 Cisco Technology, Inc. Method and apparatus for restarting RSVP processes in multiple network devices
JPWO2005057864A1 (en) * 2003-12-12 2007-07-12 富士通株式会社 Network path switching system
US7957266B2 (en) * 2004-05-28 2011-06-07 Alcatel-Lucent Usa Inc. Efficient and robust routing independent of traffic pattern variability
JP4434867B2 (en) * 2004-07-15 2010-03-17 富士通株式会社 MPLS network system and node
KR100693052B1 (en) * 2005-01-14 2007-03-12 삼성전자주식회사 Apparatus and method of fast reroute for mpls multicast
CA2557678A1 (en) * 2005-09-08 2007-03-08 Jing Wu Recovery from control plane interruptions in communication networks
US20070280140A1 (en) * 2006-05-30 2007-12-06 Thiruvengadam Venketesan Self-optimizing network tunneling protocol
JP4580372B2 (en) * 2006-08-10 2010-11-10 株式会社日立製作所 Network system
CN101155124B (en) * 2006-09-27 2011-09-14 华为技术有限公司 Method for implementing fast multicast rerouting
US7969866B2 (en) * 2008-03-31 2011-06-28 Telefonaktiebolaget L M Ericsson (Publ) Hierarchical virtual private LAN service hub connectivity failure recovery

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117632569A (en) * 2024-01-12 2024-03-01 宝德计算机系统股份有限公司 Method and device for remedying PCI equipment information loss at BMC webpage end
CN117632569B (en) * 2024-01-12 2024-04-12 宝德计算机系统股份有限公司 Method and device for remedying PCI equipment information loss at BMC webpage end

Also Published As

Publication number Publication date
US8040793B2 (en) 2011-10-18
EP1986371B1 (en) 2011-12-14
EP1986371A1 (en) 2008-10-29
CN101087207B (en) 2010-05-12
ES2376361T3 (en) 2012-03-13
US20090086623A1 (en) 2009-04-02
CN101087207A (en) 2007-12-12
DK1986371T3 (en) 2012-02-27
EP1986371A4 (en) 2009-04-15
ATE537664T1 (en) 2011-12-15

Similar Documents

Publication Publication Date Title
WO2007140689A1 (en) A failure processing method and a system and a device thereof
EP1753182A1 (en) A method for node restart recovery in the general multi-protocol label-switching path
US7787362B2 (en) Method and device for recovering a shared mesh network
WO2008055436A1 (en) Method of controlling the status of graceful restart and router
WO2009082923A1 (en) Link fault processing method and data forwarding device
WO2006081767A1 (en) A method for implementing master and backup transmission path
WO2009023996A1 (en) Method for implementing network interconnect via link aggregation
WO2007128176A1 (en) A service switching method and the network node thereof
WO2013053267A1 (en) Lacp link switching and data transmission method and device
WO2011057540A1 (en) Method, device and system for updating ring network topology information
US20120014246A1 (en) Method and system for setting up path through autonomous distributed control, and communication device
WO2007036103A1 (en) A method for recovering the service-forwarding route and the system thereof
WO2009089726A1 (en) Method, system and apparatus for synchronizing the border gateway protocol routes
CN101123563B (en) A method, device and network for stable restart of multi-hop counterfeit wire
EP1921797B1 (en) Recovery method and apparatus for optical network lsp occuring abnormal delete
WO2009055995A1 (en) Maintaining method for automatic switched optical network system when operation engenders alarm
WO2010102560A1 (en) Method, device and system for exiting graceful restart
CN101123473B (en) Recovery method for control plane after node restart in automatic switching optical network
WO2012097571A1 (en) Method and node device for recovering ring network service
WO2011000184A1 (en) Method and system for refreshing the single ring address in an ethernet ring
WO2015154583A1 (en) Method, device and system for updating protocol state of control channel
WO2013033997A1 (en) Method and system for implementing cross reversal of control plane
KR100501320B1 (en) method for recovery of CR-LSP in Multi Protocol Label Switching system
WO2019001487A1 (en) Path data deletion method, and message forwarding method and apparatus
WO2012109873A1 (en) Method and apparatus for managing diameter routing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07720767

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2007720767

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE