CN100563201C - Method and device for detecting router unit faults - Google Patents

Method and device for detecting router unit faults

Info

Publication number
CN100563201C
Authority
CN
China
Prior art keywords
test pack
router
unit
fault
described test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004101009014A
Other languages
Chinese (zh)
Other versions
CN1783837A (en)
Inventor
尹相东
闫志伟
李占有
关旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNB2004101009014A
Publication of CN1783837A
Application granted
Publication of CN100563201C
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method for detecting router unit faults, which addresses the weak fault-detection capability and poor reliability of existing router fault-detection approaches. In the method, a forwarding engine of the router builds a test packet and sends it so that it traverses the other service processing units of the router's service channel and is looped back from the switching network. The forwarding engine then determines whether the test packet has been received. If it has, the state of the test packet is statistically analyzed to determine whether the service channel is faulty; if it has not, the router's service channel is judged to be faulty.

Description

Method and device for detecting router unit faults
Technical field
The present invention relates to data transmission technology in the communications field, and in particular to a method for detecting faults in a router used to transmit data.
Background art
Core routers play an increasingly important role in modern communication networks, and the reliability demanded of them keeps rising. On a national backbone network, the failure of one high-capacity port on a core router may affect service for an entire province, and a service interruption of a few minutes constitutes a major network incident.
A considerable proportion of major communication network incidents are caused by hardware failure. To ensure that a core router meets high-reliability requirements, the design must consider, in addition to redundancy protection, another key factor: improving the system's fault-handling capability so that, after a fault occurs, the system can detect and locate it automatically, shortening recovery time and improving availability.
While communication equipment is running, timely and comprehensive fault detection is the basis for improving the system's fault-handling capability. However, the detection methods commonly used in communication equipment are difficult to apply to router products.
One common method of detecting router faults is to periodically query the state of the key chips on a board. The key boards of communication equipment usually rely on a CPU for board configuration and management. While the equipment is running, the CPU periodically reads the status registers of the key chips on the board and can thereby detect chip faults. Complex service processing chips generally contain status registers; for logic devices (FPGA or EPLD), status registers can be reserved at design time. By reading these registers, the CPU can detect chip faults to some extent. Depending on how quickly faults must be detected, the CPU may poll the chips at intervals ranging from milliseconds to minutes.
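As a rough illustration of the periodic register query described above, the following Python sketch compares status registers against expected values. The register addresses, the expected values, and the `read_register` callback are hypothetical stand-ins, not details from the patent; a real board would read the registers over its own bus.

```python
from typing import Callable, Dict, List


def find_abnormal_registers(
    read_register: Callable[[int], int],
    expected: Dict[int, int],
) -> List[int]:
    """Return the addresses whose current value differs from the expected one."""
    return [addr for addr, good in expected.items() if read_register(addr) != good]


# Hypothetical register map: address -> value read when the chip is healthy.
EXPECTED = {0x00: 0x1, 0x04: 0x0}

# Simulated chip whose register 0x04 reports an error bit.
simulated_chip = {0x00: 0x1, 0x04: 0x8}

print(find_abnormal_registers(simulated_chip.__getitem__, EXPECTED))  # [4]
```

Only the registers listed in `EXPECTED` are checked, so a fault that does not show up in the selected registers goes unnoticed.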
Detecting chip faults by having the CPU periodically query chip status has two shortcomings:
1. A complex chip usually has a large number of status registers, and the CPU cannot check them all; normally only one or a few are selected. When part of the chip's functionality is abnormal, the registers the CPU reads may not accurately reflect the chip's fault state, so the accuracy of this method is limited.
2. Compared with conventional telecommunications equipment such as transmission equipment and telephone exchanges, a router is unusual in that its CPU must also process protocol packets. Loading the CPU with too much chip fault detection increases its burden, and under heavy traffic, fault detection may affect the router's packet forwarding.
Communication equipment also often uses the alarm function of service processing chips to detect router faults. Many service processing chips can monitor the state of the traffic they handle and, on finding a problem such as loss of synchronization, bit errors, or loss of frame, actively report it to the CPU through an interrupt. After receiving the interrupt, the CPU carries out further processing such as fault recovery and fault location.
This method has little impact on CPU usage and can detect problems caused by upstream equipment or by minor faults in the chip itself. For serious chip faults, however, and particularly when the failed chip can no longer raise an interrupt, the system cannot detect the fault this way.
It follows that, compared with traditional optical transmission and voice switching equipment, router products generally have weaker fault-detection capability, and their system reliability is hard to guarantee. There is therefore a pressing need for a fault detection method suited to router products that, without affecting normal system functions, strengthens the router system's fault-handling capability and improves system availability.
Summary of the invention
The invention provides a method for detecting router unit faults, to address the weak fault-detection capability and poor reliability of existing router fault-detection approaches.
To solve this problem, the invention provides the following technical solution:
A method for detecting router unit faults, comprising the following steps:
A. A forwarding engine in the router's service channel builds a test packet;
B. The forwarding engine sends the test packet so that it traverses the other service processing units of the router's service channel and is looped back from the switching network;
C. The forwarding engine determines whether the test packet has been received. If it has, the state of the test packet is statistically analyzed to determine whether the service channel is faulty; if it has not, the router's service channel is judged to be faulty.
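Steps A to C can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `send_and_loop_back` callback is a hypothetical stand-in for the hardware path, the 64-byte default length is arbitrary, and the "statistical analysis" of the returned packet is reduced here to a simple content comparison.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DetectionResult:
    received: bool
    faulty: bool
    reason: str


def build_test_packet(length: int = 64) -> bytes:
    """Step A: build a test packet with pseudo-random content."""
    return bytes(random.getrandbits(8) for _ in range(length))


def run_detection_cycle(
    send_and_loop_back: Callable[[bytes], Optional[bytes]],
    length: int = 64,
) -> DetectionResult:
    """Steps B and C: send the packet, then judge the channel from what returns."""
    sent = build_test_packet(length)
    returned = send_and_loop_back(sent)  # None models a packet that never came back
    if returned is None:
        return DetectionResult(False, True, "test packet not received")
    if returned != sent:  # assumed analysis: compare content byte for byte
        return DetectionResult(True, True, "test packet content corrupted")
    return DetectionResult(True, False, "ok")


# A healthy channel returns the packet unchanged; a broken one returns nothing.
print(run_detection_cycle(lambda pkt: pkt).reason)   # ok
print(run_detection_cycle(lambda pkt: None).reason)  # test packet not received
```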
According to the above method:
The length of the test packet is chosen to be the one most likely to expose router faults, or is set by the user.
In this method, the test packet starts from the upstream forwarding engine, passes through the upstream flow control unit and the switching network interface conversion unit to the switching network, is looped back from the switching network, passes through the switching network interface conversion unit and the downstream flow control unit, and is received and statistically analyzed by the downstream forwarding engine.
When the service channel is judged to be faulty, an alarm is sent to the device management module; after receiving the alarm, the device management module automatically attempts service recovery.
When the CPU determines that service cannot be recovered automatically, it queries each device on the whole link one by one to check whether it is normal.
When the query finds an abnormal device on the link, the test packet is directed onto a different loopback path: a loopback is set in each field replaceable unit (FRU) so that the fault is located to an FRU.
A forwarding engine in a router's service channel, comprising:
a test packet building unit, configured to build a test packet;
a test packet sending unit, configured to send the test packet so that it traverses the other service processing units of the router's service channel and is looped back from the switching network;
a detection unit, configured to determine whether the test packet has been received; if it has, the state of the test packet is statistically analyzed to determine whether the service channel is faulty; if it has not, the router's service channel is judged to be faulty.
The invention has the following beneficial effects:
1. Fault detection for the service channel is performed by the service channel itself (which may be a network processor, logic device, or ASIC) and consumes no CPU resources, overcoming the problem of fault detection affecting traffic under heavy load. Although the CPU is needed for fault alarms and fault location, normal service has already been interrupted by then, and the CPU no longer needs to process protocol packets.
2. Fault detection is initiated by the service processing unit itself, so the invention can still detect a fault even when a seriously failed service processing chip cannot raise an interrupt alarm.
3. Test packets occupy limited bandwidth and do not affect normal packet forwarding. They are sent periodically during normal router operation and traverse each unit of the service channel, which improves fault detection for the service channel.
4. After a fault occurs, the system can locate it to an FRU, helping maintenance personnel restore service as quickly as possible.
Description of drawings
Fig. 1 is a service processing flow diagram of a core router according to the invention.
Fig. 2 shows the path taken by the test packet in the fault detection state.
Fig. 3 shows the path taken by the test packet in the fault location state.
Embodiment
To explain the scheme in detail, the service processing units of a core router are first introduced briefly. As shown in Fig. 1, the service channel of a core router mainly comprises a physical interface unit, a forwarding engine, a flow control unit, a switching network interface conversion unit, and a switching network.
The forwarding engine of a core router may be implemented with a network processor, or with a logic device, an ASIC (Application Specific Integrated Circuit), or other means. The forwarding engine and the flow control unit may implement the upstream and downstream directions on separate chips, or handle both directions on a single chip. For ease of explanation, it is assumed here that the forwarding engine is a network processor and that the upstream and downstream directions of the forwarding engine and of the flow control unit are each implemented on separate chips (the differences among these design choices do not affect the implementation of the invention).
The specific scheme for fault detection and location by periodically sending test packets is as follows:
In this embodiment the network processor is responsible for building and analyzing test packets. Because a network processor has strong traffic processing capability and flexible control, having it build the test packets, send them periodically, and finally analyze the received test packets greatly relieves the CPU. If the forwarding engine is implemented with a logic device (mainly a field programmable gate array, FPGA) or an ASIC, it can likewise send and analyze test packets, provided the requirement is specified at the design stage. Note that once an ASIC design is complete, the test packet construction method, analysis method, alarm thresholds, and so on can no longer be modified, so an ASIC is less flexible than a network processor. Because building test packets requires fairly strong software processing capability, among the service processing units only the forwarding engine can do this work; other units such as the flow control unit, the switching network interface conversion unit, and the switching network are not suitable.
The test packet length is chosen to be the one most likely to expose router faults, and may also be set by the user. The content of the test packet may be pseudo-random numbers or user-defined. Because excessive test packet traffic would affect normal service traffic, the test packet traffic must be reasonably limited. For a 10G port, for example, it is usually kept below 1k bytes per second, so its effect on the port's line-rate forwarding is negligible.
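To make the bandwidth figure concrete, the short Python calculation below estimates what share of a 10G port such a test flow would occupy. It assumes that "1k bytes" means 1000 bytes and that the port carries 10 × 10^9 bits per second; the function name is illustrative only.

```python
def probe_traffic_share(test_bytes_per_second: float, port_bits_per_second: float) -> float:
    """Fraction of the port's capacity used by the test packet flow."""
    return test_bytes_per_second * 8 / port_bits_per_second


share = probe_traffic_share(1_000, 10e9)
print(f"{share:.1e}")  # 8.0e-07
```

Under these assumptions the test flow uses less than one millionth of the port's capacity, which is consistent with the statement that its effect on forwarding is negligible.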
The test packet is sent during normal operation so that it traverses the other service processing units of the router's service channel and is looped back. It should traverse as many service processing chips as possible to widen the fault detection range. The path of the test packet is shown by the dotted line in Fig. 2. In a core router, the test packet may start from the upstream forwarding engine, pass through the upstream flow control unit and the switching network interface conversion unit to the switching network, be looped back from the switching network, pass through the switching network interface conversion unit and the downstream flow control unit, and be received and statistically analyzed by the downstream forwarding engine. If the upstream and downstream processing of the forwarding engine and the flow control unit is implemented on the same chip, the test packet passes through those chips twice, and the fault detection coverage is unchanged. Because the test packet cannot pass through the physical interface unit, that unit is outside the fault detection range of this method.
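The coverage of this path can be checked with a small Python model. The unit names come from the description of Fig. 1 and of the test path; the data layout and the `uncovered_units` helper are illustrative assumptions.

```python
# Units of the service channel, as listed for Fig. 1.
SERVICE_CHANNEL_UNITS = [
    "physical interface unit",
    "forwarding engine",
    "flow control unit",
    "switching network interface conversion unit",
    "switching network",
]

# Order in which the test packet visits the units: (direction, unit).
TEST_PACKET_PATH = [
    ("upstream", "forwarding engine"),
    ("upstream", "flow control unit"),
    ("upstream", "switching network interface conversion unit"),
    ("loopback", "switching network"),
    ("downstream", "switching network interface conversion unit"),
    ("downstream", "flow control unit"),
    ("downstream", "forwarding engine"),
]


def uncovered_units(units, path):
    """Return the units that the test packet never passes through."""
    visited = {unit for _, unit in path}
    return [unit for unit in units if unit not in visited]


print(uncovered_units(SERVICE_CHANNEL_UNITS, TEST_PACKET_PATH))
# ['physical interface unit']
```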
If the downstream network processor fails to receive the test packet several consecutive times, the router's service channel is judged to be faulty and an alarm is sent to the device management module.
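The "several consecutive times" rule can be sketched as a counter that resets whenever a test packet arrives. The threshold of 3 is an assumption chosen for illustration; the text does not give a number.

```python
class ConsecutiveMissDetector:
    """Raise an alarm after `threshold` test packets in a row fail to arrive."""

    def __init__(self, threshold: int = 3):  # 3 is an assumed value
        self.threshold = threshold
        self.misses = 0

    def record(self, received: bool) -> bool:
        """Record one test cycle; return True when an alarm should be sent."""
        if received:
            self.misses = 0  # any successful cycle clears the count
            return False
        self.misses += 1
        return self.misses >= self.threshold


detector = ConsecutiveMissDetector()
cycles = [False, False, True, False, False, False]
print([detector.record(ok) for ok in cycles])
# [False, False, False, False, False, True]
```

Requiring consecutive misses keeps a single lost packet from triggering an alarm, at the cost of a detection delay of `threshold` sending intervals.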
After the router's system operation monitoring unit (the module in the router software that manages the service processing units) receives the alarm, it first attempts service recovery to shorten the interruption (the system can automatically recover from problems not caused by hardware failure, such as loss of synchronization or stalled forwarding). When the system operation monitoring unit determines that service cannot be recovered automatically, it carries out fault location automatically.
When the system operation monitoring unit performs fault location, it first queries, one by one, whether each device on the test packet's transmission path is normal. Specifically, the CPU reads the status registers inside each chip and compares them with the correct values; if a register's content is abnormal, the chip can be judged abnormal. If this still fails to locate the fault, fault isolation can be used to locate the fault to an FRU, as follows. The system operation monitoring unit sets a loopback at each service processing unit in turn and runs the test packet loopback test several times. In any run where the loopback test is normal (the content of the received test packet is identical to that of the packet sent), every service processing unit on the loopback path is normal; in a run where the loopback test is abnormal (the test packet is not received, or its content differs from that of the packet sent), the faulty unit can be determined from the order of the loopback tests. For example, as shown in Fig. 3, the switching network board and the line processing unit are both field replaceable and are therefore FRUs. During fault location, software directs the test packet to loop back at the interface between the two boards (the switching network interface conversion unit; see the loopback path shown by the dotted line in Fig. 3). If this loopback is abnormal, the fault is located in the line processing unit; otherwise it is located in the switching network board or in the interface between the two boards. Once the fault has been located to an FRU, the system operation monitoring unit can give explicit fault information in the alarm, helping maintenance personnel pinpoint the fault and take recovery measures as soon as possible.
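The loopback-based isolation can be sketched as a search over loopback points ordered from the nearest to the farthest: the first loopback that fails bounds the fault. In the Python sketch below, the section names follow the Fig. 3 example, while `locate_faulty_section`, `make_loopback_tester`, and the simulated fault model are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List, Optional

# Each entry names the section that a successively longer loopback adds to the path.
SECTIONS = ["line processing unit", "switching network board or board interface"]


def locate_faulty_section(
    sections: List[str],
    loopback_ok: Callable[[str], bool],
) -> Optional[str]:
    """Run loopbacks from the nearest point outward; return the first failing section."""
    for section in sections:
        if not loopback_ok(section):
            return section
    return None  # every loopback passed, so no fault was isolated


def make_loopback_tester(faulty: Optional[str]) -> Callable[[str], bool]:
    """Simulate hardware: a loopback fails once its path includes the faulty section."""
    def loopback_ok(section: str) -> bool:
        if faulty is None:
            return True
        return SECTIONS.index(section) < SECTIONS.index(faulty)
    return loopback_ok


print(locate_faulty_section(SECTIONS, make_loopback_tester("line processing unit")))
# line processing unit
```

With only two loopback points, as in Fig. 3, the search reduces to the single decision described above: a failed loopback at the board interface points to the line processing unit, and a successful one points to the switching network board or the interface between the boards.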
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (10)

1. A method for detecting router unit faults, characterized in that the method comprises the following steps:
A. A forwarding engine in the router's service channel builds a test packet;
B. The forwarding engine sends the test packet so that it traverses the other service processing units of the router's service channel and is looped back from the switching network;
C. The forwarding engine determines whether the test packet has been received. If it has, the state of the test packet is statistically analyzed to determine whether the service channel is faulty; if it has not, the router's service channel is judged to be faulty.
2, the method for claim 1 is characterized in that, the described test pack of timed sending in the router course of work.
3, the method for claim 1, it is characterized in that, test pack is from the forwarded upstream engine, arrive switching network through uplink traffic control unit, switching network interface conversion unit, then from the switching network loopback, pass through switching network interface conversion unit, downlink traffic control unit again, receive and carry out statistical analysis by descending forwarding engine.
As claim 1,2 or 3 described methods, it is characterized in that 4, the bag length of described test pack is that principle is determined with the fault of the easiest exposure router, or is set by the user.
5, method as claimed in claim 4 is characterized in that, the content of described test pack is a pseudo random number, and perhaps the content of described test pack is set by the user.
6, method as claimed in claim 2 is characterized in that, sends alarm to device management module when judging the service channel fault.
7, method as claimed in claim 6 is characterized in that, device management module carries out the business recovery trial after receiving described alarm automatically.
8, method as claimed in claim 7 is characterized in that, when business can't be recovered automatically, whether each Service Processing Unit of inquiring about the test pack process one by one was normal.
9, method as claimed in claim 7 is characterized in that, when having device undesired in inquiring link, the control test pack changes loop-back path, in each Field Replaceable Unit loopback is set, so that fault location is arrived Field Replaceable Unit.
10. A forwarding engine in a router's service channel, characterized in that the forwarding engine comprises:
a test packet building unit, configured to build a test packet;
a test packet sending unit, configured to send the test packet so that it traverses the other service processing units of the router's service channel and is looped back from the switching network;
a detection unit, configured to determine whether the test packet has been received; if it has, the state of the test packet is statistically analyzed to determine whether the service channel is faulty; if it has not, the router's service channel is judged to be faulty.
CNB2004101009014A 2004-12-02 2004-12-02 A kind of method for detecting route unit fault and device Expired - Fee Related CN100563201C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004101009014A CN100563201C (en) 2004-12-02 2004-12-02 A kind of method for detecting route unit fault and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004101009014A CN100563201C (en) 2004-12-02 2004-12-02 A kind of method for detecting route unit fault and device

Publications (2)

Publication Number Publication Date
CN1783837A CN1783837A (en) 2006-06-07
CN100563201C true CN100563201C (en) 2009-11-25

Family

ID=36773617

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004101009014A Expired - Fee Related CN100563201C (en) 2004-12-02 2004-12-02 A kind of method for detecting route unit fault and device

Country Status (1)

Country Link
CN (1) CN100563201C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100428703C (en) * 2006-08-24 2008-10-22 华为数字技术有限公司 Method and system for set testing of router
CN101094111B (en) * 2007-07-18 2010-05-26 中兴通讯股份有限公司 Method and system for carrying out testing whole set of network device
CN101505240B (en) * 2008-02-05 2011-03-30 华为技术有限公司 Fault detection method and apparatus
CN101588271B (en) * 2008-05-20 2011-10-26 中兴通讯股份有限公司 Method for detecting routing in IP multimedia subsystem (IMS)
CN101330410B (en) * 2008-07-17 2011-06-08 华为技术有限公司 Far-end loopback test method, system and exchange
CN101505242B (en) * 2008-12-25 2012-10-17 华为技术有限公司 Router fault detection method and router device
CN101998422B (en) * 2009-08-18 2015-07-22 中兴通讯股份有限公司 Test method and system for data carrying in calling establishing process
CN101808021A (en) * 2010-04-16 2010-08-18 华为技术有限公司 Fault detection method, device and system, message statistical method and node equipment
CN102143014A (en) * 2010-11-03 2011-08-03 华为数字技术有限公司 Single board failure detection method, single board and router
CN103490928A (en) * 2013-09-22 2014-01-01 华为技术有限公司 Message transmission route stoppage determining method, message transmission route stoppage determining device and message transmission route stoppage determining system
CN108234476A (en) * 2017-12-29 2018-06-29 天津芯海创科技有限公司 The action listener method and monitoring system of exchange chip
CN108199980A (en) * 2017-12-29 2018-06-22 天津芯海创科技有限公司 The action listener method and monitoring system of exchange chip
CN112751688B (en) * 2019-10-30 2023-08-01 中兴通讯股份有限公司 Flow control processing method of OTN (optical transport network) equipment, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
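Research and Implementation of Fault Testing Technology for High-Performance Routers. 王圣. Master of Engineering thesis, National University of Defense Technology. 2003
Research and Implementation of Fault Testing Technology for High-Performance Routers. 王圣. Master of Engineering thesis, National University of Defense Technology. 2003 *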

Also Published As

Publication number Publication date
CN1783837A (en) 2006-06-07

Similar Documents

Publication Publication Date Title
US6728216B1 (en) Arrangement in a network repeater for monitoring link integrity and selectively down shifting link speed based on local configuration signals
CN101355466B (en) Method and apparatus for transmitting continuous check information message
CN100563201C (en) A kind of method for detecting route unit fault and device
CN101132320B (en) Method for detecting interface trouble and network node equipment
US7502328B2 (en) Method of monitoring link performance and diagnosing active link state in Ethernet passive optical network
CN100459528C (en) Method for inspecting Qos in telecommunication network
CN101729303A (en) Method and device for measuring network performance parameter
CN101710896B (en) Method and device for detecting link quality
CN101247270A (en) System and method for implementing bidirectional forwarding detection
EP0952700B1 (en) Network equipment such as a network repeater and testing method therefor
CN100488070C (en) Link switching device and its method in communication system
CN101483592A (en) Method and apparatus for inhibiting bidirectional forwarding detection link oscillation
CN104796329A (en) Automatic link switching method and automatic link switching device
CN101714939A (en) Fault treatment method for Ethernet ring network host node and corresponding Ethernet ring network
CN103684818A (en) Method and device for detecting failures of network channel
CN100466591C (en) Master-slave device system
CN107070739A (en) A kind of router operation troubles intelligent detecting method and system
CN101330410B (en) Far-end loopback test method, system and exchange
CN100386997C (en) Data transmission system and method between telecommunication equipments based on point-to-point connection
EP0939512B1 (en) Method and arrangement in a network repeater for automatically changing link speed
US7046693B1 (en) Method and system for determining availability in networks
CN101848165B (en) The method recovered after controlling interrupted communication link and interface board
CN101465762B (en) Method, equipment and system for detecting error connection between protection set ports
CN112714060B (en) Link detection method and device
CN110138657B (en) Aggregation link switching method, device, equipment and storage medium between switches

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091125

Termination date: 20181202
