WO2010003323A1 - Method, system and apparatus for link failure recovery - Google Patents

Method, system and apparatus for link failure recovery

Info

Publication number
WO2010003323A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
layer
node
link
timer
Prior art date
Application number
PCT/CN2009/070482
Other languages
English (en)
French (fr)
Inventor
刘庆智
夏洪淼
郭大勇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2010003323A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure

Definitions

  • the present invention relates to the field of network communications, and in particular, to a method, system and apparatus for link failure recovery.
  • Both the IP layer and the transport layer have corresponding technical implementation protection.
  • For example, multi-protocol label switching fast re-routing (MPLS FRR) can be used at the IP layer, while the transport layer can use 1+1 protection, 1:1 protection or shared protection.
  • MPLS FRR multi-protocol label switching fast re-routing
  • the router can detect the fault and quickly protect it.
  • both the router and the optical device can detect the fault and initiate their own protection recovery mechanisms. Because the startup time of the two is similar, when the router switches the faulty link to the protection link, the optical layer recovery is successful. That is, the original link of the fault has been repaired and can be switched back to the original link. At this point, the router does useless work and may also cause route flapping.
  • To avoid this situation, the prior art introduces a hold-off timer (usually 50 ms).
  • When the router detects a fault, it sets the timer and, for the predetermined time, waits for the optical layer to take protection measures first; the router itself takes no action.
  • While waiting, the router determines whether the fault still exists through a mechanism such as bidirectional forwarding detection (BFD). If the transport layer has already recovered, the router turns off the timer and resumes normal operation. If the predetermined time of the timer is exceeded and the router detects that the fault still exists, it immediately starts the protection recovery mechanism of the IP layer.
  • BFD bidirectional forwarding detection
  • the inventor has found that at least the following problems exist in the prior art:
  • When the network fault is an IP layer fault, the protection recovery mechanism of the IP layer is started only after the timer set by the router expires, which prolongs the recovery time of the IP layer fault.
  • To solve this problem, embodiments of the present invention provide a method, system and apparatus for link failure recovery, which overcome the defect that, when an IP layer fault occurs in the network, the router sets a timer and keeps waiting, prolonging the recovery time of the IP layer fault.
  • An embodiment of the present invention provides a method for link failure recovery, which includes: starting a timer when a network fault is detected; receiving fault information reported by a node that has detected the fault type; obtaining the fault type from the fault information; and, if the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, turning off the timer and switching the faulty IP layer link to an IP layer protection link.
  • An embodiment of the present invention further provides an apparatus for link failure recovery, comprising: a detecting module, configured to detect a network fault; a starting module, configured to start a timer when the detecting module detects a network fault; a receiving module, configured to receive the fault information reported by the node that has detected the fault type; a parsing module, configured to obtain the fault type from the fault information received by the receiving module; and a protection module, configured to turn off the timer and switch the faulty IP layer link to an IP layer protection link when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault.
  • An embodiment of the present invention further provides a system for link failure recovery, including a first node and a second node.
  • The first node is configured to report fault information carrying the fault type to the second node when a network fault is detected.
  • The second node is configured to: start a timer when a network fault is detected; receive the fault information reported by the first node; obtain the fault type from the fault information; and, when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, turn off the timer and switch the faulty IP layer link to an IP layer protection link.
  • Compared with the prior art, the embodiments of the present invention start a timer when a network fault is detected, parse the fault information carrying the fault type sent by the downstream node, and obtain the fault type. When the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the timer is turned off immediately and the faulty IP layer link is switched to the IP layer protection link, which greatly speeds up fault recovery, shortens the fault recovery time and avoids useless waiting.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for link failure recovery in an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for performing link failure recovery at the head node in an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for performing link failure recovery at an intermediate protection node in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of switching a faulty IP layer link to an IP layer temporary protection link in an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of the head node R1 re-establishing a link in an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for link failure recovery in still another embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an apparatus for link failure recovery in an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a system for link failure recovery in an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention.
  • Nodes R1, R2, R3, and R4 are routers, and nodes N1, N2, N3, and N4 are optical nodes.
  • R1 and N1, R2 and N2, R3 and N3, and R4 and N4 are connected through POS (Packet over SONET/SDH) or OTN (Optical Transport Network) interfaces.
  • The link between routers R1 and R2 is carried by the LSP (Label Switched Path) between optical nodes N1 and N2, and the link between routers R2 and R3 is carried by the LSP between optical nodes N2 and N3.
  • the link between the routers R3 and R4 is carried by the LSP between the optical nodes N3 and N4.
  • An LSP is established between routers R1 and R4, and the path is R1 - R2 - R3 - R4.
  • the LSP between node R3 and node R4 fails.
  • Step S201 When a network fault is detected, start a timer;
  • the first node (such as R1 in FIG. 1) may detect the network failure and start the timer.
  • the first node can only detect the fault, but it cannot be determined whether the fault belongs to the IP layer or the transport layer.
  • Step S202 Receive fault information reported by a node that has detected a fault type.
  • the first node (such as R1 in FIG. 1) sends information carrying a flag bit to the downstream node (such as R3 in FIG. 1), and the flag is used to indicate that the downstream node reports the fault type when detecting a network fault.
  • the downstream node can detect whether the fault occurred at the IP layer or the transport layer.
  • the downstream node sends fault information carrying the fault type to the head node, where the fault type is used to indicate whether the network fault that occurred is an IP layer fault or a transport layer fault.
  • Step S203 Obtain the fault type from the fault information.
  • The head node parses the fault information reported by the downstream node and learns whether the network fault is an IP layer fault or a transport layer fault. If the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the head node turns off the timer and switches the faulty IP layer link to the IP layer protection link.
  • The above method further includes: if the predetermined time of the timer has not been exceeded and the fault type is a transport layer fault, waiting for the timer to expire.
  • If the predetermined time of the timer is exceeded, the node checks whether the network fault still exists; if it does, the faulty IP layer link is switched to the IP layer protection link; if it does not, the transport layer fault has been recovered. Note that if the network fault is still detected after the predetermined time is exceeded, either an IP layer link fault or a transport layer fault has not yet been recovered. In the former case, the faulty IP layer link needs to be switched to the IP layer protection link; in the latter case, the current IP layer link also needs to be switched to the IP layer protection link.
  • The above method may also be performed by an intermediate protection node (such as R2 in FIG. 1): when the intermediate protection node detects a network fault, it starts the timer; the downstream node sends the fault information carrying the fault type to the intermediate protection node; the intermediate protection node parses the fault information and learns whether the network fault is an IP layer fault or a transport layer fault; and, if the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the intermediate protection node turns off the timer and switches the faulty IP layer link to an IP layer temporary protection link.
  • Through the above steps, if the fault type obtained before the predetermined time of the timer is exceeded is an IP layer fault, the timer can be turned off directly and the faulty IP layer link switched to the IP layer protection link, which greatly speeds up fault recovery, shortens the recovery time and avoids useless waiting.
  • FIG. 3 is a flowchart of the method by which the head node performs link failure recovery.
  • the scenario in Figure 1 is taken as an example.
  • R1 is the head node.
  • the detailed process of link failure recovery is as follows:
  • Step S301: When the path is established, the head node R1 sends a Path message to the downstream nodes. Specifically, an LSP is established between the head node R1 and node R4: the head node R1 sends a Path message to the next node R2, which processes the Path message and sends it to the next node R3, and so on, so that the message is sent in turn to each transit node specified by the ERO (Explicit Route Object) carried in the Path message.
  • The RSVP-TE (Resource Reservation Protocol with Traffic Engineering extensions) protocol is extended: when the Path message is sent to the downstream nodes, a flag bit is added to the header of the session attribute object of the message. The flag bit instructs a downstream node to report the fault type, such as an IP layer fault or a transport layer fault, to the upstream when it detects a fault. That is, after receiving the message, a downstream node that detects the fault type needs to send fault information including the fault type to the head node.
  • Step S302 When the first node R1 detects that the LSP is faulty, the timer is started.
  • the first node R1 detects that a fault has occurred through BFD, but cannot determine the type of the fault, that is, whether the fault belongs to the IP layer or the transport layer. At this time, the first node R1 starts the timer.
  • Step S303: Node R3 determines the fault type through detection. Specifically, the downstream node R3 can determine, through POS and OTN interface techniques, whether the fault occurred at the IP layer or the transport layer.
  • Step S304 The node R3 sends the fault information to indicate the fault type.
  • the downstream node R3 carries the fault type in the sent fault information, and is used to identify the fault type, such as an IP layer fault or a transport layer fault.
  • the downstream node R3 may directly send the fault information to the head node R1, or may send the fault information to the upstream node R2, and the node R2 sends the fault information to the head node R1.
  • Step S305 The first node R1 parses the fault information to obtain a fault type.
  • Step S306: The head node R1 checks whether the timer has exceeded the predetermined time; if not, step S307 is performed; if so, step S310 is performed.
  • Step S307: Determine whether the fault type obtained by the head node R1 is an IP layer fault.
  • If so, step S308 is performed; if not, that is, the fault type is a transport layer fault, step S309 is performed.
  • Step S308 The first node R1 closes the timer, and switches the faulty IP layer link to the IP layer protection link. At this time, the process ends.
  • Step S309 waiting for the timer to exceed the predetermined time, proceeds to step S306;
  • Step S310 The first node R1 detects whether the fault still exists, and if yes, performs step S311; if not, does not perform any operation;
  • Step S311 The first node R1 switches the faulty IP layer link to an IP layer protection link. The process ends.
  • According to the above steps, the head node detects the fault and starts the timer. After receiving the fault information reported by the downstream node, it obtains the fault type by parsing the fault information.
  • When the fault belongs to the IP layer, the timer can be turned off early.
  • The IP layer protection mechanism is then started immediately to switch the faulty IP layer link to the IP layer protection link, which shortens the fault recovery time.
  • FIG. 4 is a flowchart of the method by which an intermediate protection node performs link failure recovery.
  • the scenario in Figure 1 is taken as an example.
  • Node R2 is used as an intermediate protection node.
  • the specific process of link failure recovery is as follows:
  • Step S401 When the node R2 detects that the LSP is faulty, the timer is started.
  • the node R2 detects that the fault has occurred through the BFD, but cannot determine the type of the fault, that is, whether the fault belongs to the IP layer or the transport layer. At this point, node R2 starts the timer and waits for the transport layer to resume.
  • Step S402: After detecting the fault type, node R3 reports the fault information to node R2.
  • Step S403: Node R2 parses the received fault information to obtain the fault type.
  • Step S404 the node R2 detects whether the timer exceeds a predetermined time, if not, step S405 is performed; if it has been exceeded, step S410 is performed;
  • step S405 it is determined whether the fault type acquired by the intermediate protection node R2 is an IP layer fault; if the fault type is an IP layer fault, step S406 is performed; if the fault type is not an IP layer fault, that is, the fault type is a transport layer fault, step S409 is performed;
  • Step S406 The node R2 closes the timer, and switches the faulty IP layer link to the IP layer temporary protection link.
  • the IP layer temporary protection link is a temporary link between node R2 and node R4, that is, the path is changed to node R2 - node R5 - node R4.
  • Step S407: Node R2 notifies the head node R1 that the IP layer between node R3 and node R4 has failed.
  • Step S408 The first node R1 starts a recovery mechanism, and re-selects a new route to establish a link. At this time, the process ends.
  • the first node R1 selects a route that does not pass through the faulty link, and re-establishes the link.
  • the new link is: the first node R1 - the node R6 - the node R4.
  • Step S409 waiting for the timer to exceed the predetermined time, proceeds to step S404;
  • Step S410 The node R2 detects whether the fault still exists, if yes, performs step S411; if not, does not perform any operation;
  • Step S411 The node R2 switches the faulty IP layer link to the IP layer temporary protection link, and then proceeds to step S407.
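  • After the intermediate protection node learns that the IP layer has failed, it turns off the timer and switches the faulty IP layer link to the IP layer protection link.
  • Until the head node starts the recovery mechanism to repair the original link, the nodes affected by the fault can temporarily continue to exchange information.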
  • A further embodiment of the present invention provides a method for link failure recovery.
  • The head node R1 acts as the protection point, and a fault occurs between node R3 and node R4.
  • The head node R1 receives the fault information reported by the downstream node before it has itself detected the fault.
  • The processing in this scenario, shown in FIG. 7, includes:
  • Step S701: After detecting the fault type, node R3 reports the fault information to the upstream node. Node R3 may send the fault information directly to the head node R1, or may send it to the upstream node R2, which forwards it to the head node R1.
  • Step S702 After receiving the fault information, the first node R1 continues to detect the link. Specifically, after receiving the fault information, the first node R1 does not parse the fault information, but continues to detect the link.
  • Step S703 After detecting that the link is faulty, the first node R1 starts a timer.
  • Step S704: The head node R1 parses the fault information. If the fault type is an IP layer fault, it turns off the timer, switches the faulty IP layer link to the IP layer protection link, and selects a new route to establish a path.
  • The protection point does not parse the reported fault information at first. After detecting the fault, it starts the timer and parses the fault information. If the fault is an IP layer fault, it turns off the timer and switches the faulty IP layer link to the IP layer protection link, which considerably shortens the fault recovery time.
  • An embodiment of the present invention provides a device for recovering a link fault, as shown in FIG. 8, including: a detecting module 801, configured to detect a network fault;
  • a starting module 802, configured to start a timer when the detecting module 801 detects a network fault;
  • a receiving module 803, configured to receive the fault information reported by the node that has detected the fault type;
  • a parsing module 804, configured to obtain the fault type from the fault information received by the receiving module 803;
  • and a protection module 805, configured to turn off the timer and switch the faulty IP layer link to an IP layer protection link when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault.
  • The apparatus further includes: a sending module 806, configured to send information including a flag bit to the node, where the flag bit instructs the node to report the fault type when it detects a network fault.
  • the foregoing protection module 805 is further configured to wait for the timer to expire when the predetermined time of the timer is not exceeded and the fault type is a transport layer fault.
  • the foregoing protection module 805 is further configured to switch the faulty IP layer link to an IP layer protection link when the timer is exceeded for a predetermined time and the detecting module detects a network failure. It should be noted that if the predetermined time is exceeded, the network fault is still detected. In this case, the IP layer link failure may not be recovered, or the transport layer fault may not be recovered. For the former case, the foregoing protection module 805 needs to switch the faulty IP layer link to the IP layer protection link. For the latter case, the protection module 805 also needs to switch the current IP layer link to the IP layer protection link.
  • The apparatus provided by the embodiment of the present invention obtains the fault type by parsing the reported fault information and, when the fault type is an IP layer fault, turns off the timer and performs the corresponding processing, thereby shortening the fault recovery time when the IP layer fails.
  • the embodiment of the present invention further provides a system for link failure recovery, as shown in FIG. 9, including: a first node 901 and a second node 902;
  • the first node 901 is configured to report, to the second node 902, fault information including a fault type when the network fault is detected;
  • The second node 902 is configured to: start a timer when a network fault is detected; receive the fault information reported by the first node 901; obtain the fault type from the received fault information; and, when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, turn off the timer and switch the faulty IP layer link to the IP layer protection link.
  • The second node 902 includes: a detecting module 9021, configured to detect a network fault; a starting module 9022, configured to start a timer when the detecting module 9021 detects a network fault; a receiving module 9023, configured to receive the fault information carrying the fault type reported by the first node 901; a parsing module 9024, configured to obtain the fault type from the fault information received by the receiving module 9023; and a protection module 9025, configured to turn off the timer and switch the faulty IP layer link to the IP layer protection link when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault.
  • The second node 902 further includes: a sending module 9026, configured to send information including a flag bit to the first node 901, where the flag bit instructs the first node 901 to report the fault type when it detects a network fault.
  • the foregoing protection module 9025 is further configured to wait for the timer to expire when the predetermined time of the timer is not exceeded and the fault type is a transport layer fault.
  • the protection module 9025 is further configured to switch the faulty IP layer link to an IP layer protection link when the timer is exceeded for a predetermined time and the detection module detects a network failure.
  • With the method, system and apparatus provided by the embodiments of the present invention, a timer is started when a network fault is detected, the fault information carrying the fault type sent by the downstream node is parsed, and the fault type is obtained. When the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the timer is turned off immediately and the faulty IP layer link is switched to the IP layer protection link, which greatly speeds up fault recovery, shortens the fault recovery time and avoids useless waiting.
  • the present invention can be implemented by hardware or by means of software plus a necessary general hardware platform.
  • The technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and which includes several instructions that cause a computer device (such as a personal computer, a server or a network device) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Description

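Method, System and Apparatus for Link Failure Recovery
This application claims priority to Chinese Patent Application No. 200810127676.1, filed with the Chinese Patent Office on July 7, 2008 and entitled "Method, System and Apparatus for Link Failure Recovery", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of network communications, and in particular to a method, system and apparatus for link failure recovery.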
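Background
In existing multi-layer networks, a network fault may occur at the IP layer or at the transport layer, so both layers usually have their own protection techniques. For example, multi-protocol label switching fast reroute (MPLS FRR) can be used at the IP layer, while the transport layer can use 1+1 protection, 1:1 protection or shared protection.
When an IP layer fault occurs in the network, the router can detect the fault and protect against it quickly. When a transport layer fault occurs, both the router and the optical equipment can usually detect the fault, and each starts its own protection recovery mechanism. Because the two start protection at about the same time, by the time the router has switched the faulty link to the protection link, optical layer recovery has also succeeded; that is, the original faulty link has been repaired and traffic can be switched back to it. The router has then done useless work, and route flapping may result.
To avoid this situation, the prior art introduces a hold-off timer (usually 50 ms). When the router detects a fault, it sets the timer and, for the predetermined time, waits for the optical layer to take protection measures first; the router itself takes no action. While waiting, the router determines whether the fault still exists through a mechanism such as bidirectional forwarding detection (BFD). If the transport layer has already recovered, the router turns off the timer and resumes normal operation. If the predetermined time of the timer is exceeded and the router detects that the fault still exists, it immediately starts the protection recovery mechanism of the IP layer.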
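To make the cost of the prior-art scheme concrete, the following minimal Python sketch models when a router would start IP-layer protection under a plain hold-off timer. The function name `prior_art_ip_switch_time`, its parameters and the string labels `"ip"` and `"transport"` are illustrative assumptions, not part of the source; only the 50 ms default comes from the text.

```python
from typing import Optional

HOLD_OFF_MS = 50.0  # typical hold-off value quoted in the text


def prior_art_ip_switch_time(fault_layer: str,
                             transport_repair_ms: Optional[float],
                             hold_off_ms: float = HOLD_OFF_MS) -> Optional[float]:
    """Time (ms after detection) at which IP-layer protection starts.

    Returns None when the transport layer repairs the fault before the
    hold-off timer expires, so no IP-layer switch is needed.
    `transport_repair_ms` is None when the transport layer cannot repair
    the fault (always the case for an IP-layer fault).
    """
    if fault_layer not in ("ip", "transport"):
        raise ValueError("fault_layer must be 'ip' or 'transport'")
    repaired_in_time = (
        fault_layer == "transport"
        and transport_repair_ms is not None
        and transport_repair_ms <= hold_off_ms
    )
    if repaired_in_time:
        return None
    # The fault is still present at expiry, so the router only reacts then.
    return hold_off_ms
```

Under these assumptions an IP-layer fault always costs the full hold-off interval before protection starts, which is the delay the invention sets out to remove.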
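In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: when the network fault is an IP layer fault, the protection recovery mechanism of the IP layer is started only after the timer set by the router expires, which prolongs the recovery time of the IP layer fault.
Summary of the Invention
To solve the above problem, embodiments of the present invention provide a method, system and apparatus for link failure recovery, which overcome the defect that, when an IP layer fault occurs in the network, the router sets a timer and keeps waiting, prolonging the recovery time of the IP layer fault.
An embodiment of the present invention provides a method for link failure recovery, including: starting a timer when a network fault is detected; receiving fault information reported by a node that has detected the fault type; obtaining the fault type from the fault information; and, if the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, turning off the timer and switching the faulty IP layer link to an IP layer protection link.
Correspondingly, an embodiment of the present invention further provides an apparatus for link failure recovery, including: a detecting module, configured to detect a network fault; a starting module, configured to start a timer when the detecting module detects a network fault; a receiving module, configured to receive fault information reported by a node that has detected the fault type; a parsing module, configured to obtain the fault type from the fault information received by the receiving module; and a protection module, configured to turn off the timer and switch the faulty IP layer link to an IP layer protection link when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault.
Correspondingly, an embodiment of the present invention further provides a system for link failure recovery, including a first node and a second node, wherein:
the first node is configured to report fault information carrying the fault type to the second node when a network fault is detected; and
the second node is configured to: start a timer when a network fault is detected; receive the fault information reported by the first node; obtain the fault type from the fault information; and, when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, turn off the timer and switch the faulty IP layer link to an IP layer protection link.
Compared with the prior art, the embodiments of the present invention start a timer when a network fault is detected, parse the fault information carrying the fault type sent by the downstream node, and obtain the fault type. When the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the timer is turned off immediately and the faulty IP layer link is switched to the IP layer protection link, which greatly speeds up fault recovery, shortens the fault recovery time and avoids useless waiting.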
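Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 2 is a flowchart of a method for link failure recovery in an embodiment of the present invention;
FIG. 3 is a flowchart of a method for performing link failure recovery at the head node in an embodiment of the present invention;
FIG. 4 is a flowchart of a method for performing link failure recovery at an intermediate protection node in an embodiment of the present invention;
FIG. 5 is a schematic diagram of switching a faulty IP layer link to an IP layer temporary protection link in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the head node R1 re-establishing a link in an embodiment of the present invention;
FIG. 7 is a flowchart of a method for link failure recovery in still another embodiment of the present invention;
FIG. 8 is a schematic diagram of an apparatus for link failure recovery in an embodiment of the present invention;
FIG. 9 is a schematic diagram of a system for link failure recovery in an embodiment of the present invention.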
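Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention. Nodes R1, R2, R3 and R4 are routers, and nodes N1, N2, N3 and N4 are optical nodes. R1 and N1, R2 and N2, R3 and N3, and R4 and N4 are connected through POS (Packet over SONET/SDH) or OTN (Optical Transport Network) interfaces. The link between routers R1 and R2 is carried by the LSP (Label Switched Path) between optical nodes N1 and N2, the link between routers R2 and R3 is carried by the LSP between optical nodes N2 and N3, and the link between routers R3 and R4 is carried by the LSP between optical nodes N3 and N4. An LSP is established between routers R1 and R4, with the path R1 - R2 - R3 - R4. The LSP between node R3 and node R4 fails.
FIG. 2 is a flowchart of a method for link failure recovery in an embodiment of the present invention, which includes:
Step S201: When a network fault is detected, start a timer.
Specifically, the head node (such as R1 in FIG. 1) may start the timer when it detects the network fault. The head node can only detect the fault; it cannot determine whether the fault belongs to the IP layer or the transport layer.
Step S202: Receive fault information reported by a node that has detected the fault type.
Specifically, the head node (such as R1 in FIG. 1) sends information carrying a flag bit to the downstream node (such as R3 in FIG. 1); the flag bit instructs the downstream node to report the fault type when it detects a network fault. The downstream node can detect whether the fault occurred at the IP layer or the transport layer. The downstream node sends fault information carrying the fault type to the head node, where the fault type indicates whether the network fault is an IP layer fault or a transport layer fault.
Step S203: Obtain the fault type from the fault information.
Specifically, the head node parses the fault information reported by the downstream node and learns whether the network fault is an IP layer fault or a transport layer fault.
If the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the timer is turned off and the faulty IP layer link is switched to the IP layer protection link. That is, the head node turns off the timer and switches the faulty IP layer link to the IP layer protection link.
The above method further includes: if the predetermined time of the timer has not been exceeded and the fault type is a transport layer fault, waiting for the timer to expire.
If the predetermined time of the timer is exceeded, the node checks whether the network fault still exists; if it does, the faulty IP layer link is switched to the IP layer protection link; if it does not, the transport layer fault has been recovered. Note that if the network fault is still detected after the predetermined time is exceeded, either an IP layer link fault or a transport layer fault has not yet been recovered. In the former case, the faulty IP layer link needs to be switched to the IP layer protection link; in the latter case, the current IP layer link also needs to be switched to the IP layer protection link.
The above method may also be performed by an intermediate protection node (such as R2 in FIG. 1): when the intermediate protection node detects a network fault, it starts the timer; the downstream node sends the fault information carrying the fault type to the intermediate protection node; the intermediate protection node parses the fault information and learns whether the network fault is an IP layer fault or a transport layer fault; and, if the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault, the intermediate protection node turns off the timer and switches the faulty IP layer link to an IP layer temporary protection link.
Through the above steps, if the fault type obtained before the predetermined time of the timer is exceeded is an IP layer fault, the timer can be turned off directly and the faulty IP layer link switched to the IP layer protection link, which greatly speeds up fault recovery, shortens the recovery time and avoids useless waiting.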
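The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the names `FaultType`, `Action` and `decide` are hypothetical, and treating "not exceeded" as `elapsed_ms <= hold_off_ms` is a choice made here, since the source does not define the boundary case.

```python
from enum import Enum


class FaultType(Enum):
    IP_LAYER = "ip"
    TRANSPORT_LAYER = "transport"


class Action(Enum):
    SWITCH_TO_PROTECTION = "switch"  # turn off the timer and switch links
    WAIT_FOR_TIMER = "wait"          # give the transport layer time to recover
    NO_ACTION = "none"               # fault already cleared after expiry


def decide(elapsed_ms: float, hold_off_ms: float,
           fault_type: FaultType, fault_still_present: bool) -> Action:
    """Choose the protecting node's action once the fault type is known."""
    if elapsed_ms <= hold_off_ms:
        # Timer not exceeded: an IP-layer fault is handled at once,
        # a transport-layer fault keeps waiting for the timer.
        if fault_type is FaultType.IP_LAYER:
            return Action.SWITCH_TO_PROTECTION
        return Action.WAIT_FOR_TIMER
    # Timer exceeded: switch only if the fault is still detected.
    if fault_still_present:
        return Action.SWITCH_TO_PROTECTION
    return Action.NO_ACTION
```

Keeping the rule free of I/O and timers makes it easy to check each branch in isolation; a real node would call it from its timer and message handlers.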
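FIG. 3 is a flowchart of the method by which the head node performs link failure recovery. Taking the scenario in FIG. 1 as an example, R1 is the head node. When the LSP between node R3 and node R4 fails, the detailed process of link failure recovery is as follows:
Step S301: When the path is established, the head node R1 sends a Path message to the downstream nodes.
Specifically, an LSP is established between the head node R1 and node R4: the head node R1 sends a Path message to the next node R2, which processes the Path message and sends it to the next node R3, and so on, so that the message is sent in turn to each transit node specified by the ERO (Explicit Route Object) carried in the Path message. The RSVP-TE (Resource Reservation Protocol with Traffic Engineering extensions) protocol is extended: when the Path message is sent to the downstream nodes, a flag bit is added to the header of the session attribute object of the message. The flag bit instructs a downstream node to report the fault type, such as an IP layer fault or a transport layer fault, to the upstream when it detects a fault. That is, after receiving the message, a downstream node that detects the fault type needs to send fault information including the fault type to the head node.
Step S302: When the head node R1 detects that the LSP has failed, it starts the timer.
Specifically, the head node R1 detects through BFD that a fault has occurred, but cannot determine the type of the fault, that is, whether it belongs to the IP layer or the transport layer. At this point, the head node R1 starts the timer.
Step S303: Node R3 determines the fault type through detection. Specifically, the downstream node R3 can determine, through POS and OTN interface techniques, whether the fault occurred at the IP layer or the transport layer.
Step S304: Node R3 sends fault information indicating the fault type. Specifically, the downstream node R3 carries the fault type in the fault information it sends, identifying the fault as, for example, an IP layer fault or a transport layer fault. The downstream node R3 may send the fault information directly to the head node R1, or may send it to the upstream node R2, which forwards it to the head node R1.
Step S305: The head node R1 parses the fault information to obtain the fault type.
Step S306: The head node R1 checks whether the timer has exceeded the predetermined time; if not, step S307 is performed; if so, step S310 is performed.
Step S307: Determine whether the fault type obtained by the head node R1 is an IP layer fault.
If so, step S308 is performed; if not, that is, the fault type is a transport layer fault, step S309 is performed.
Step S308: The head node R1 turns off the timer and switches the faulty IP layer link to the IP layer protection link. The process then ends.
Step S309: Wait for the timer to exceed the predetermined time, then go to step S306.
Step S310: The head node R1 checks whether the fault still exists; if it does, step S311 is performed; if it does not, no operation is performed.
Step S311: The head node R1 switches the faulty IP layer link to the IP layer protection link. The process then ends.
According to the above steps, the head node detects the fault and starts the timer. After receiving the fault information reported by the downstream node, it obtains the fault type by parsing the fault information. When the fault belongs to the IP layer, the timer can be turned off early and the IP layer protection mechanism started immediately to switch the faulty IP layer link to the IP layer protection link, which shortens the fault recovery time.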
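The flag bit of step S301 and the fault report of step S304 could be modeled as below. This is a sketch under loud assumptions: the source does not say which bit of the session attribute flags is used or how the fault type is encoded, so `REPORT_FAULT_TYPE_FLAG` (0x80), the two fault-type codes and the `PathMessage` and `FaultReport` classes are all hypothetical and do not reflect any real RSVP-TE implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical bit value: the source only says "a flag bit is added".
REPORT_FAULT_TYPE_FLAG = 0x80

# Hypothetical fault-type codes carried in the fault information.
FAULT_TYPE_IP = 1
FAULT_TYPE_TRANSPORT = 2


@dataclass
class PathMessage:
    """Simplified stand-in for an RSVP-TE Path message."""
    ero: List[str] = field(default_factory=list)  # transit nodes, e.g. R2, R3, R4
    session_flags: int = 0                         # session attribute flags


@dataclass
class FaultReport:
    """Fault information sent upstream by the node that classified the fault."""
    reporter: str
    fault_type: int


def request_fault_type_reporting(msg: PathMessage) -> PathMessage:
    """Head node side (S301): set the flag before sending the Path message."""
    msg.session_flags |= REPORT_FAULT_TYPE_FLAG
    return msg


def maybe_report(node: str, msg: PathMessage,
                 detected_type: int) -> Optional[FaultReport]:
    """Downstream side (S304): report only if the head node asked for it."""
    if msg.session_flags & REPORT_FAULT_TYPE_FLAG:
        return FaultReport(reporter=node, fault_type=detected_type)
    return None
```

For example, `maybe_report("R3", request_fault_type_reporting(PathMessage(ero=["R2", "R3", "R4"])), FAULT_TYPE_IP)` yields a report from R3 marked as an IP-layer fault, while a Path message without the flag yields no report.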
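FIG. 4 is a flowchart of the method by which an intermediate protection node performs link failure recovery. Taking the scenario in FIG. 1 as an example, node R2 acts as the intermediate protection node. When a fault occurs between node R3 and node R4, the detailed process of link failure recovery is as follows:
Step S401: When node R2 detects that the LSP has failed, it starts the timer.
Specifically, node R2 detects through BFD that a fault has occurred, but cannot determine the type of the fault, that is, whether it belongs to the IP layer or the transport layer. At this point, node R2 starts the timer and gives priority to waiting for the transport layer to recover.
Step S402: After detecting the fault type, node R3 reports the fault information to node R2.
Step S403: Node R2 parses the received fault information to obtain the fault type.
Step S404: Node R2 checks whether the timer has exceeded the predetermined time; if not, step S405 is performed; if so, step S410 is performed.
Step S405: Determine whether the fault type obtained by the intermediate protection node R2 is an IP layer fault. If the fault type is an IP layer fault, step S406 is performed; if not, that is, the fault type is a transport layer fault, step S409 is performed.
Step S406: Node R2 turns off the timer and switches the faulty IP layer link to the IP layer temporary protection link.
As shown in FIG. 5, the IP layer temporary protection link is a temporary link between node R2 and node R4; that is, the path is changed to node R2 - node R5 - node R4.
Step S407: Node R2 notifies the head node R1 that the IP layer between node R3 and node R4 has failed.
Step S408: The head node R1 starts the recovery mechanism and selects a new route to establish a link. The process then ends.
As shown in FIG. 6, the head node R1 selects a route that does not pass through the faulty link and re-establishes the link. The new link is: head node R1 - node R6 - node R4.
Step S409: Wait for the timer to exceed the predetermined time, then go to step S404.
Step S410: Node R2 checks whether the fault still exists; if it does, step S411 is performed; if it does not, no operation is performed.
Step S411: Node R2 switches the faulty IP layer link to the IP layer temporary protection link, then goes to step S407.
After the intermediate protection node learns that the IP layer has failed, it turns off the timer and switches the faulty IP layer link to the IP layer protection link. Until the head node starts the recovery mechanism to repair the original link, the nodes affected by the fault can temporarily continue to exchange information.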
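The two reroutes in this flow (R2's temporary path in step S406 and R1's new route in step S408) can be illustrated with a small path search that avoids the failed link. The topology below is an assumption inferred only from the paths named in the text (R1 - R2 - R3 - R4, R2 - R5 - R4 and R1 - R6 - R4); the function name `path_avoiding` and the use of a fewest-hops breadth-first search are illustrative choices, since the source does not say how the route is selected.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Link = Tuple[str, str]

# Assumed router-level topology, inferred from the paths named in the text.
LINKS: List[Link] = [
    ("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
    ("R2", "R5"), ("R5", "R4"),
    ("R1", "R6"), ("R6", "R4"),
]


def path_avoiding(src: str, dst: str,
                  failed: Iterable[Link]) -> Optional[List[str]]:
    """Breadth-first search for a fewest-hops path that skips failed links."""
    failed_set = {frozenset(link) for link in failed}
    adjacency: Dict[str, List[str]] = {}
    for a, b in LINKS:
        if frozenset((a, b)) in failed_set:
            continue  # a failed link is unusable in both directions
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route left that avoids the failed links
```

With the R3 - R4 link marked as failed, the search from R2 gives the temporary path through R5 and the search from R1 gives the new route through R6, matching FIG. 5 and FIG. 6 under the assumed topology.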
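A further embodiment of the present invention provides a method for link failure recovery in which the head node R1 acts as the protection point and a fault occurs between node R3 and node R4. The head node R1 receives the fault information reported by the downstream node before it has itself detected the fault. The processing in this scenario, shown in FIG. 7, includes:
Step S701: After detecting the fault type, node R3 reports the fault information to the upstream node.
Node R3 may send the fault information directly to the head node R1, or may send it to the upstream node R2, which forwards it to the head node R1.
Step S702: After receiving the fault information, the head node R1 continues to monitor the link.
Specifically, after receiving the fault information, the head node R1 does not parse it, but continues to monitor the link.
Step S703: After detecting that the link has failed, the head node R1 starts the timer.
Step S704: The head node R1 parses the fault information. If the fault type is an IP layer fault, it turns off the timer, switches the faulty IP layer link to the IP layer protection link, and selects a new route to establish a path.
The protection point does not parse the reported fault information at first. After detecting the fault, it starts the timer and parses the fault information. If the fault is an IP layer fault, it turns off the timer and switches the faulty IP layer link to the IP layer protection link, which considerably shortens the fault recovery time.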
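The ordering in this embodiment, where the report arrives first and is parsed only after the node's own detection, can be sketched as follows. The class name `HeadNode`, its attributes and the JSON encoding of the report are assumptions made for illustration; the source does not specify a message format.

```python
import json
from typing import Optional


class HeadNode:
    """Head node acting as protection point (sketch of steps S702 to S704)."""

    def __init__(self) -> None:
        self.pending_report: Optional[str] = None  # raw, unparsed report
        self.timer_running = False
        self.on_protection_link = False

    def on_fault_report(self, raw_report: str) -> None:
        # S702: keep the report without parsing it; link monitoring continues.
        self.pending_report = raw_report

    def on_fault_detected(self) -> None:
        # S703: the node's own detection starts the hold-off timer.
        self.timer_running = True
        # S704: only now parse the stored report, if one has arrived.
        if self.pending_report is None:
            return
        fault_type = json.loads(self.pending_report)["fault_type"]
        if fault_type == "ip":
            self.timer_running = False      # turn off the timer
            self.on_protection_link = True  # switch to the IP-layer protection link
```

In this sketch a stored IP-layer report lets the node switch as soon as it detects the fault, while a stored transport-layer report leaves the timer running so the optical layer can recover first.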
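An embodiment of the present invention provides an apparatus for link failure recovery, as shown in FIG. 8, including:
a detecting module 801, configured to detect a network fault;
a starting module 802, configured to start a timer when the detecting module 801 detects a network fault;
a receiving module 803, configured to receive fault information reported by a node that has detected the fault type;
a parsing module 804, configured to obtain the fault type from the fault information received by the receiving module 803; and
a protection module 805, configured to turn off the timer and switch the faulty IP layer link to an IP layer protection link when the predetermined time of the timer has not been exceeded and the fault type is an IP layer fault.
The apparatus further includes: a sending module 806, configured to send information including a flag bit to the node, where the flag bit instructs the node to report the fault type when it detects a network fault.
The protection module 805 is further configured to wait for the timer to expire when the predetermined time of the timer has not been exceeded and the fault type is a transport layer fault.
The protection module 805 is further configured to switch the faulty IP layer link to the IP layer protection link when the predetermined time of the timer is exceeded and the detecting module detects a network fault. Note that if the network fault is still detected after the predetermined time is exceeded, either an IP layer link fault or a transport layer fault has not yet been recovered. In the former case, the protection module 805 needs to switch the faulty IP layer link to the IP layer protection link; in the latter case, the protection module 805 also needs to switch the current IP layer link to the IP layer protection link.
The apparatus provided by the embodiment of the present invention obtains the fault type by parsing the reported fault information and, when the fault type is an IP layer fault, turns off the timer and performs the corresponding processing, thereby shortening the fault recovery time when the IP layer fails.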
本发明实施例还提供了一种链路故障恢复的系统, 如图 9所示, 包括: 第 一节点 901和第二节点 902; 其中,
第一节点 901 , 用于在检测到网络故障时, 向第二节点 902上报包括故障 类型的故障信息;
第二节点 902,用于在检测到网络故障时,启动定时器;接收第一节点 901 上报的故障信息;从接收的故障信息中获取故障类型; 当未超过上述定时器预 定的时间且故障类型为 IP层故障时, 关闭定时器, 将故障的 IP层链路切换为 IP层保护链路。
上述第二节点 902包括: 检测模块 9021 , 用于检测网络故障; 启动模块
9022,用于当检测模块 9021检测到网络故障时, 启动定时器;接收模块 9023, 用于接收第一节点 901上报的携带故障类型的故障信息; 解析模块 9024, 用 于从接收模块 9023接收的故障信息中获取故障类型; 保护模块 9025, 用于当 未超过上述定时器预定的时间且故障类型为 IP层故障时, 关闭定时器, 将故 障的 IP层链路切换为 IP层保护链路。
上述第二节点 902还包括: 发送模块 9026, 用于向第一节点 901发送包 括标志位的信息,该标志位用于指示第一节点 901在检测到网络故障时,上报 故障类型。
上述保护模块 9025, 还用于当未超过上述定时器预定的时间且故障类型 为传送层故障时, 等待定时器超时。
上述保护模块 9025, 还用于当超过上述定时器预定的时间且检测模块检 测到网络故障时 , 将故障的 IP层链路切换为 IP层保护链路。
通过本发明实施例提供的方法、 系统和装置, 在检测到网络故障时, 启动 定时器, 解析下游节点发送的带有故障类型的故障信息, 并获取故障类型, 当 未超过上述定时器预定的时间且故障类型为 IP层故障时, 立即关闭定时器并 将故障的 IP层链路切换为 IP层保护链路, 从而大大加快故障恢复速度, 缩短 故障的恢复时间, 避免无用的等待。
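From the description of the above embodiments, those skilled in the art will clearly understand that the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
The above are merely preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.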

Claims

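1. A link failure recovery method, comprising:
starting a timer when a network fault is detected;
receiving fault information reported by a node that has detected a fault type, the fault information carrying the fault type;
obtaining the fault type from the fault information; and
if the predetermined time of the timer has not been exceeded and the fault type is an IP-layer fault, stopping the timer and switching the failed IP-layer link to an IP-layer protection link.
2. The method of claim 1, wherein the node reports the fault information in response to a message it has received that carries a flag bit, the flag bit instructing the node to report the fault type when it detects a network fault.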
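3. The method of claim 1, further comprising: if the predetermined time of the timer has not been exceeded and the fault type is a transport-layer fault, waiting for the timer to expire.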
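4. The method of claim 1, further comprising:
if the predetermined time of the timer has been exceeded, checking whether the network fault still exists; and, if it does, switching the failed IP-layer link to the IP-layer protection link.
5. The method of claim 3, further comprising:
if the predetermined time of the timer has been exceeded, checking whether the network fault still exists; and, if it does, switching the current IP-layer link to the IP-layer protection link.
6. A link failure recovery apparatus, comprising:
a detection module, configured to detect a network fault;
a starting module, configured to start a timer when the detection module detects a network fault;
a receiving module, configured to receive fault information reported by a node that has detected a fault type, the fault information carrying the fault type;
a parsing module, configured to obtain the fault type from the fault information received by the receiving module; and
a protection module, configured to, when the predetermined time of the timer has not been exceeded and the fault type is an IP-layer fault, stop the timer and switch the failed IP-layer link to an IP-layer protection link.
7. The apparatus of claim 6, further comprising: a sending module, configured to send to the node a message carrying a flag bit, the flag bit instructing the node to report the fault type when it detects a network fault.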
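8. The apparatus of claim 6, wherein the protection module is further configured to wait for the timer to expire when the predetermined time of the timer has not been exceeded and the fault type is a transport-layer fault.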
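9. The apparatus of claim 6, wherein the protection module is further configured to switch the failed IP-layer link to the IP-layer protection link when the predetermined time of the timer has been exceeded and the detection module detects a network fault.
10. The apparatus of claim 8, wherein the protection module is further configured to switch the current IP-layer link to the IP-layer protection link when the predetermined time of the timer has been exceeded and the detection module detects a network fault.
11. A link failure recovery system, comprising a first node and a second node, wherein:
the first node is configured to report fault information carrying a fault type to the second node when it detects a network fault; and
the second node is configured to: start a timer when it detects a network fault; receive the fault information reported by the first node; obtain the fault type from the fault information; and, when the predetermined time of the timer has not been exceeded and the fault type is an IP-layer fault, stop the timer and switch the failed IP-layer link to an IP-layer protection link.
12. A storage medium, comprising a number of instructions that cause a computer device to perform the following method:
starting a timer when a network fault is detected;
receiving fault information reported by a node that has detected a fault type, the fault information carrying the fault type;
obtaining the fault type from the fault information; and
if the predetermined time of the timer has not been exceeded and the fault type is an IP-layer fault, stopping the timer and switching the failed IP-layer link to an IP-layer protection link.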
PCT/CN2009/070482 2008-07-07 2009-02-20 一种链路故障恢复的方法、系统和装置 WO2010003323A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810127676A CN101626317A (zh) 2008-07-07 2008-07-07 一种链路故障恢复的方法、系统和装置
CN200810127676.1 2008-07-07

Publications (1)

Publication Number Publication Date
WO2010003323A1 true WO2010003323A1 (zh) 2010-01-14

Family

ID=41506675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/070482 WO2010003323A1 (zh) 2008-07-07 2009-02-20 一种链路故障恢复的方法、系统和装置

Country Status (2)

Country Link
CN (1) CN101626317A (zh)
WO (1) WO2010003323A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877659B (zh) * 2010-06-30 2014-07-16 中兴通讯股份有限公司 一种丢包监控的方法、设备和系统
CN106411539B (zh) * 2015-07-21 2019-07-12 中国移动通信集团公司 一种多层网络中业务保护的方法、设备和系统
CN106357529B (zh) * 2015-07-21 2019-12-10 中国移动通信集团公司 一种多层网络中业务保护的方法、设备和系统
CN105045674B (zh) * 2015-09-15 2018-11-20 浪潮(北京)电子信息产业有限公司 一种链路切换方法和系统
CN106685682A (zh) * 2015-11-10 2017-05-17 中国移动通信集团公司 一种故障处理方法、网络设备及系统
US11553393B2 (en) 2018-06-13 2023-01-10 Huawei Technologies Co., Ltd. Transmission control method and device
CN110875824B (zh) * 2018-08-30 2023-10-13 华为技术有限公司 一种故障多层链路恢复方法和控制器
CN110661705B (zh) * 2019-09-29 2022-06-28 北京物芯科技有限责任公司 一种硬件网络交换引擎和网络故障处理系统及方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020063916A1 (en) * 2000-07-20 2002-05-30 Chiu Angela L. Joint IP/optical layer restoration after a router failure
CN1284329C (zh) * 2002-12-30 2006-11-08 北京邮电大学 一种多层网络故障恢复方法
CN101099337A (zh) * 2005-06-16 2008-01-02 中兴通讯股份有限公司 一种对设备主动上报信息进行传输的方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHEN, JIANHUA ET AL.: "Survivability of IP over WDM Network Using Multilayer Recovery Mechanism.", ZTE COMMUNICATIONS., no. 6, 10 December 2004 (2004-12-10), pages 20 - 24 *

Also Published As

Publication number Publication date
CN101626317A (zh) 2010-01-13

Similar Documents

Publication Publication Date Title
WO2010003323A1 (zh) 一种链路故障恢复的方法、系统和装置
EP2761829B1 (en) Point-to-point based multicast label distribution protocol local protection solution
WO2009082923A1 (fr) Procédé de traitement de défaut de liaison et dispositif de transfert de données
WO2009056053A1 (fr) Procédé, dispositif et système de commutateur de capacité de flux à ingénierie de trafic et commutation multiprotocole par étiquette
US9774494B2 (en) Method, node, and system for detecting performance of layer 3 virtual private network
WO2012037820A1 (zh) 多协议标签交换系统、节点设备及双向隧道的建立方法
WO2008055436A1 (fr) Procédé de commande d'état de redémarrage progressif et de routeur
WO2008141567A1 (en) Multi-protocol label switching network flow switch method and equipment
WO2009046644A1 (fr) Procédé et dispositif pour la commutation de flux de trafic
WO2008028382A1 (fr) Procédé et appareil pour exécuter une détection de liaison, conversion de stratégie d'acheminement de bout en bout
EP2866394B1 (en) Method and device for sending inter-domain fault information
WO2009082917A1 (fr) Procédé de redémarrage élégant d'un routeur, routeur et son système de communication
WO2006079292A1 (fr) Procede pour la creation d'un trajet pour l'etiquette retour dans un systeme de commutation d'etiquettes multiprotocole
WO2013097523A1 (zh) 一种因特网协议安全隧道切换方法、装置及传输系统
WO2012016458A1 (zh) 用于二层虚拟专用网络的数据传输方法和装置
WO2013185567A1 (zh) 一种分组传送网络保护倒换装置和方法
WO2010020146A1 (zh) 流量工程隧道的关联保护方法、装置及系统
WO2011095101A1 (zh) 一种分组传送网络的线性1:n保护方法、装置和系统
WO2008119294A1 (fr) Procédé et matériel de restauration du commerce en réseau
WO2014206207A1 (zh) 一种路由撤销方法和网络设备
EP2254289B1 (en) Method, device, and system for establishing label switching path in fast rerouting switching
WO2007036103A1 (fr) Procede de reparation de trajet de reacheminement de service et systeme associe
WO2008031352A1 (fr) Procédé et appareil de récupération d'un effacement anormal survenu dans le lsp d'un réseau optique
WO2012142888A1 (zh) 基于多协议标签交换网络的隧道组保护实现方法及装置
WO2011140923A1 (zh) 一种建立标签交换路径的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09793798

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09793798

Country of ref document: EP

Kind code of ref document: A1