CN100563201C - Method and device for detecting router unit faults - Google Patents
Method and device for detecting router unit faults
- Publication number
- CN100563201C (also indexed as CNB2004101009014A and CN200410100901A)
- Authority
- CN
- China
- Prior art keywords
- test pack
- router
- unit
- fault
- described test
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a method for detecting router unit faults, which addresses the low fault-detection capability and poor reliability of existing router fault-detection methods. In the method, the forwarding engine of the router builds a test packet and sends it so that it traverses the other service processing units of the router service channel and is looped back from the switching network. The forwarding engine then determines whether the test packet has been received. If the test packet is received, statistical analysis of the state of the test packet is used to determine whether the service channel is faulty; if the test packet is not received, the router service channel is judged to be faulty.
Description
Technical field
The present invention relates to data transmission technology in the communications field, and in particular to a method for detecting faults in routers used to forward data.
Background technology
Core routers play an increasingly important role in modern communication networks, and the demands on their reliability are rising accordingly. On a national backbone network, the failure of a single high-capacity port on a core router may affect service for an entire province, and a service interruption of only a few minutes can constitute a major network incident.
A considerable proportion of major communication network incidents are caused by hardware failures. To ensure that a core router meets high reliability requirements, the design must consider, in addition to redundancy protection, another key factor: improving the fault-handling capability of the system, so that after a fault occurs the system can detect and locate it automatically, shortening recovery time and improving system availability.
While communication equipment is running, timely and comprehensive fault detection is the basis for improving the system's fault-handling capability. However, the detection methods commonly used in communication equipment are difficult to apply to router products.
One common method of detecting router faults is to poll the state of the key chips on a board periodically. On the key boards of communication equipment, a CPU usually handles configuration and management of the board. While the equipment is running, the CPU periodically reads the status registers of the key chips on the board and can thereby detect chip faults. Complex service processing chips generally contain status registers, and for logic devices (FPGA or EPLD) status registers can be reserved at design time. By reading these registers the CPU can, to some extent, discover faulty chip states. Depending on how quickly faults must be detected, the polling period can range from milliseconds to minutes.
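The periodic register polling described above can be pictured with a minimal sketch. The register map, the chip names, and the `read_register` callback are hypothetical placeholders; the text does not name specific chips or registers.

```python
from typing import Callable, Dict, List

# Hypothetical layout: chip name -> {register address: value expected when healthy}.
ExpectedMap = Dict[str, Dict[int, int]]


def find_abnormal_chips(read_register: Callable[[str, int], int],
                        expected: ExpectedMap) -> List[str]:
    """Return the chips for which any polled status register deviates
    from its expected value. Only the registers listed in `expected`
    are checked, which mirrors the limitation discussed in the text:
    a fault that does not show up in the polled registers goes unseen."""
    abnormal = []
    for chip, registers in expected.items():
        if any(read_register(chip, addr) != value
               for addr, value in registers.items()):
            abnormal.append(chip)
    return abnormal
```

A CPU task would call such a function on a timer; the polling period is a trade-off between detection latency and CPU load.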
Detecting chip faults by having the CPU poll chip status periodically has two shortcomings:
1. A complex chip usually has a large number of status registers, and the CPU cannot check them all; normally only one or a few of them are selected for checking. When part of the chip's functionality becomes abnormal, the registers the CPU reads may not accurately reflect the fault, so the accuracy of this method is limited.
2. Compared with traditional telecommunication equipment such as transmission equipment and telephone exchanges, a distinctive feature of router products is that the CPU must also process protocol packets. Assigning too much chip fault detection to the CPU increases its load, and under heavy traffic fault detection may affect packet forwarding in the router.
Communication equipment also often uses the alarm function of service processing chips to detect router faults. Many service processing chips can monitor the state of the services they handle and, when they detect problems such as loss of synchronization, bit errors, or loss of frame, actively report them to the CPU through an interrupt. After receiving the interrupt, the CPU carries out further processing such as fault recovery and fault location.
This method has little effect on CPU utilization and can detect problems caused by upstream equipment or by minor faults in the chip itself. For serious faults in the chip, however, and in particular when the failed chip can no longer raise an interrupt, the system cannot detect the fault in this way.
It follows that, compared with traditional equipment such as optical transmission and voice switching equipment, router products generally have weak fault-detection capability, and their system reliability is difficult to guarantee. A fault detection method suited to router products that does not affect normal system operation is therefore urgently needed, in order to strengthen the fault-handling capability of the router system and improve its availability.
Summary of the invention
The invention provides a method for detecting router unit faults, to solve the low fault-detection capability and poor reliability of existing methods for detecting router faults.
To solve the above problem, the invention provides the following technical solution:
A method for detecting router unit faults, comprising the following steps:
A. A forwarding engine in the router service channel builds a test packet;
B. The forwarding engine sends the test packet so that it traverses the other service processing units of the router service channel and is looped back from the switching network;
C. The forwarding engine determines whether the test packet has been received. If the test packet is received, statistical analysis of the state of the test packet is used to determine whether the service channel is faulty; if the test packet is not received, the router service channel is judged to be faulty.
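Steps A to C can be sketched as a single probe, under stated assumptions. The `TestPacket` layout, the sequence number, and the `loopback` callback (standing in for the hardware path through the service channel and back from the switching network) are illustrative and not taken from the patent; the pseudo-random payload is one of the content options the description permits.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TestPacket:
    seq: int        # sequence number (illustrative addition)
    payload: bytes  # packet content


def build_test_packet(seq: int, length: int, rng: random.Random) -> TestPacket:
    """Step A: build a test packet with a pseudo-random payload."""
    return TestPacket(seq, bytes(rng.getrandbits(8) for _ in range(length)))


def probe_once(seq: int, length: int,
               loopback: Callable[[TestPacket], Optional[TestPacket]],
               rng: random.Random) -> str:
    """Steps B and C: send one packet through the loopback path and
    classify the outcome as 'ok', 'corrupted', or 'lost'."""
    sent = build_test_packet(seq, length, rng)
    received = loopback(sent)  # None models a packet that never returns
    if received is None:
        return "lost"
    return "ok" if received == sent else "corrupted"
```

The per-probe outcomes would then feed whatever statistics the forwarding engine keeps; the patent does not specify which statistics are used.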
According to the above method:
The length of the test packet is chosen so as to expose router faults most readily, or is set by the user.
In this method, the test packet starts from the upstream forwarding engine, passes through the upstream traffic control unit and the switching network interface conversion unit to the switching network, is looped back from the switching network, passes again through the switching network interface conversion unit and the downstream traffic control unit, and is received and statistically analyzed by the downstream forwarding engine.
When the service channel is judged to be faulty, an alarm is sent to the device management module; after receiving the alarm, the device management module automatically attempts service recovery.
When the CPU determines that service cannot be recovered automatically, it checks the devices along the whole link one by one to see whether each is normal.
When a device on the link is found to be abnormal, the test packet is directed onto different loopback paths, with a loopback set at each field replaceable unit (FRU), so that the fault is located to a field replaceable unit.
A forwarding engine in a router service channel, comprising:
a test packet building unit, configured to build a test packet;
a test packet sending unit, configured to send the test packet so that it traverses the other service processing units of the router service channel and is looped back from the switching network;
a detecting unit, configured to determine whether the test packet has been received; if the test packet is received, statistical analysis of the state of the test packet is used to determine whether the service channel is faulty; if the test packet is not received, the router service channel is judged to be faulty.
The invention has the following beneficial effects:
1. Fault detection of the service channel is performed by the service channel itself (by a network processor, logic device, or ASIC) and does not consume CPU resources, which avoids the problem of fault detection affecting services under heavy traffic. Although the CPU is needed for fault alarming and fault location, normal service has already been interrupted by then, and the CPU no longer needs to process protocol packets.
2. Fault detection is initiated actively by the service processing unit, so the invention can still detect a fault even when a seriously failed service processing chip can no longer report an interrupt alarm.
3. Test packets occupy only a limited bandwidth and do not affect normal packet forwarding. Test packets are sent periodically during normal router operation and traverse each unit of the service channel, which improves the fault-detection capability for the service channel.
4. After a fault occurs, the system can locate it to an FRU, which helps maintenance personnel restore service as quickly as possible.
Description of drawings
Fig. 1 is a flow chart of service processing in a core router according to the invention.
Fig. 2 shows the path traversed by the test packet in the fault detection state of the invention.
Fig. 3 shows the path traversed by the test packet in the fault location state of the invention.
Embodiment
To explain the scheme in detail, the service processing units of a core router are first introduced briefly. As shown in Fig. 1, the service channel of a core router mainly comprises a physical interface unit, a forwarding engine, a traffic control unit, a switching network interface conversion unit, and the switching network.
The forwarding engine of a core router can be implemented with a network processor, or in other ways such as a logic device or an ASIC (Application Specific Integrated Circuit). For the forwarding engine and the traffic control unit, the upstream and downstream directions can be implemented by separate chips, or both directions can be handled by the same chip. For ease of explanation, it is assumed here that the forwarding engine is implemented with a network processor and that the upstream and downstream directions of the forwarding engine and the traffic control unit are each implemented by separate chips (the differences between these specific designs do not affect the implementation of the invention).
The specific scheme for fault detection and location by periodically sending test packets is described below:
In this embodiment the network processor is responsible for building and analyzing test packets. Because a network processor has strong service processing capability and is flexible to control, having it build the test packets, send them periodically, and finally analyze the received test packets greatly offloads the CPU. If the forwarding engine is implemented with a logic device (mainly a field programmable gate array, FPGA) or an ASIC, the functions of sending and analyzing test packets can likewise be implemented, provided the requirement is made explicit at the design stage. Note that once an ASIC design is complete, the method of building test packets, the analysis method, the alarm thresholds and so on can no longer be modified, so an ASIC is less flexible than a network processor. Because building test packets requires fairly strong software processing capability, within the service processing units the forwarding engine can do this work, whereas other units such as the traffic control unit, the switching network interface conversion unit, and the switching network are not suitable.
The test packet length is chosen so as to expose router faults most readily, and can also be set by the user. The content of the test packet can be a pseudo-random number or set by the user. Because excessive test packet traffic would affect normal service traffic, the test packet traffic must be limited appropriately; for example, on a 10G port the test packet traffic is usually kept below 1k bytes per second, so that its effect on line-rate forwarding at the port is negligible.
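The traffic limit above can be checked with a little arithmetic. The sketch below assumes that "1k" means 1000 bytes and that "10G" means 10 Gbit/s; the 64-byte packet length in the usage note is purely illustrative, since the text leaves the length open.

```python
def send_interval_seconds(packet_len_bytes: int,
                          budget_bytes_per_s: float) -> float:
    """Minimum spacing between test packets that keeps the average
    test traffic within the given byte-per-second budget."""
    if packet_len_bytes <= 0 or budget_bytes_per_s <= 0:
        raise ValueError("packet length and budget must be positive")
    return packet_len_bytes / budget_bytes_per_s


def bandwidth_fraction(budget_bytes_per_s: float,
                       port_bits_per_s: float) -> float:
    """Share of the port rate consumed by test traffic at the budget."""
    return budget_bytes_per_s * 8 / port_bits_per_s
```

Under these assumptions, a 1000 byte/s budget is 8000 bit/s, or a fraction of 8e-7 of a 10 Gbit/s port, and 64-byte test packets would be spaced at least 0.064 s apart.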
This test packet is sent during normal operation so that it traverses the other service processing units of the router service channel and is looped back; the test packet should traverse as many service processing chips as possible to widen the scope of fault detection. The path taken by the test packet is shown by the dashed line in Fig. 2. In a core router, the test packet can start from the upstream forwarding engine, pass through the upstream traffic control unit and the switching network interface conversion unit to the switching network, be looped back from the switching network, pass again through the switching network interface conversion unit and the downstream traffic control unit, and be received and statistically analyzed by the downstream forwarding engine. If the upstream and downstream processing of the forwarding engine and the traffic control unit is implemented by the same chip, the test packet passes through these chips twice, and the fault detection capability achieved is unchanged. Because the test packet cannot pass through the physical interface unit, the physical interface unit is outside the fault detection scope of this method.
If the downstream network processor fails to receive the test packet several times in succession, the router service channel is judged to be faulty and an alarm is sent to the device management module.
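The "several times in succession" rule can be sketched as a small counter. The class name and the default threshold of 3 are assumptions made for illustration; the text does not state how many consecutive losses trigger the alarm.

```python
class ConsecutiveLossDetector:
    """Tracks consecutive missing test packets and reports a fault
    once the run of losses reaches a configurable threshold."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.misses = 0

    def record(self, received: bool) -> bool:
        """Record one probe result. Returns True while the number of
        consecutive losses is at or above the threshold."""
        # Any received packet resets the run of losses.
        self.misses = 0 if received else self.misses + 1
        return self.misses >= self.threshold
```

Requiring a run of losses rather than a single loss keeps an isolated dropped packet from raising an alarm, at the cost of a slightly longer detection time.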
After the router's system operation monitoring unit (the module in the router software that manages the service processing units) receives the alarm, it first attempts service recovery in order to shorten the service interruption (the system can automatically recover from problems not caused by hardware failure, such as loss of synchronization or stalled forwarding). When the system operation monitoring unit determines that service cannot be recovered automatically, it carries out fault location automatically.
When the system operation monitoring unit carries out fault location, it first checks, one by one, whether each device on the path traversed by the test packet is normal. The specific check is as follows: the CPU reads the status registers inside each chip and compares them with the correct values; if a register's content is abnormal, the chip can be judged to be abnormal. If this still does not locate the fault, a fault isolation method can be used to locate the fault to an FRU, as follows: the system operation monitoring unit sets a loopback at each service processing unit in turn and runs repeated test packet loopback experiments. In each experiment where the loopback test is normal (the content of the received test packet is identical to that of the packet sent), every service processing unit on the loopback path is normal; in an experiment where the loopback test is abnormal (the test packet is not received, or the content of the received packet differs from that of the packet sent), the faulty unit can be determined from the order of the loopback experiments. For example, as shown in Fig. 3, the switching network board and the line processing unit can both be replaced in the field, so both are FRUs. During fault location, software directs the test packet to loop back at the interface between the two boards (the switching network interface conversion unit; see the loopback path shown by the dashed line in Fig. 3). If the loopback is abnormal, the fault can be located to the line processing unit; otherwise the fault can be located to the switching network board or the interface between the two boards. Once the fault has been located to an FRU, the system operation monitoring unit can include precise fault information in the alarm, which helps maintenance personnel pinpoint the fault and take recovery measures as quickly as possible.
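The loopback isolation procedure can be sketched as a walk over loopback points ordered from nearest to farthest: the first point whose loopback fails implicates the segment added since the last passing point. The function name, the `loopback_ok` callback, and the stage table are hypothetical; the table only mirrors the two-board example of Fig. 3.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_fault(stages: Sequence[Tuple[str, str]],
                 loopback_ok: Callable[[str], bool]) -> Optional[str]:
    """`stages` lists (loopback point, FRU implicated if this is the
    first point to fail), ordered from nearest to farthest.
    Returns the implicated FRU, or None if every loopback passes."""
    for point, suspect in stages:
        if not loopback_ok(point):
            return suspect
    return None


# Stage table modeled on the Fig. 3 example (names are illustrative).
FIG3_STAGES = [
    ("switching network interface conversion unit", "line processing unit"),
    ("switching network", "switching network board or inter-board interface"),
]
```

In practice `loopback_ok` would configure the loopback at the named point and run repeated test packet experiments, comparing the received content with the content sent.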
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.
Claims (10)
1. A method for detecting router unit faults, characterized in that the method comprises the following steps:
A. A forwarding engine in the router service channel builds a test packet;
B. The forwarding engine sends the test packet so that it traverses the other service processing units of the router service channel and is looped back from the switching network;
C. The forwarding engine determines whether the test packet has been received. If the test packet is received, statistical analysis of the state of the test packet is used to determine whether the service channel is faulty; if the test packet is not received, the router service channel is judged to be faulty.
2. The method of claim 1, characterized in that the test packet is sent periodically while the router is operating.
3. The method of claim 1, characterized in that the test packet starts from the upstream forwarding engine, passes through the upstream traffic control unit and the switching network interface conversion unit to the switching network, is looped back from the switching network, passes again through the switching network interface conversion unit and the downstream traffic control unit, and is received and statistically analyzed by the downstream forwarding engine.
4. The method of claim 1, 2 or 3, characterized in that the length of the test packet is chosen so as to expose router faults most readily, or is set by the user.
5. The method of claim 4, characterized in that the content of the test packet is a pseudo-random number, or is set by the user.
6. The method of claim 2, characterized in that an alarm is sent to the device management module when the service channel is judged to be faulty.
7. The method of claim 6, characterized in that the device management module automatically attempts service recovery after receiving the alarm.
8. The method of claim 7, characterized in that, when service cannot be recovered automatically, each service processing unit traversed by the test packet is checked one by one to see whether it is normal.
9. The method of claim 7, characterized in that, when a device on the link is found to be abnormal, the test packet is directed onto different loopback paths, with a loopback set at each field replaceable unit, so that the fault is located to a field replaceable unit.
10. A forwarding engine in a router service channel, characterized in that the forwarding engine comprises:
a test packet building unit, configured to build a test packet;
a test packet sending unit, configured to send the test packet so that it traverses the other service processing units of the router service channel and is looped back from the switching network;
a detecting unit, configured to determine whether the test packet has been received; if the test packet is received, statistical analysis of the state of the test packet is used to determine whether the service channel is faulty; if the test packet is not received, the router service channel is judged to be faulty.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004101009014A CN100563201C (en) | 2004-12-02 | 2004-12-02 | A kind of method for detecting route unit fault and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004101009014A CN100563201C (en) | 2004-12-02 | 2004-12-02 | A kind of method for detecting route unit fault and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1783837A CN1783837A (en) | 2006-06-07 |
CN100563201C true CN100563201C (en) | 2009-11-25 |
Family
ID=36773617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004101009014A Expired - Fee Related CN100563201C (en) | 2004-12-02 | 2004-12-02 | A kind of method for detecting route unit fault and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100563201C (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100428703C (en) * | 2006-08-24 | 2008-10-22 | 华为数字技术有限公司 | Method and system for set testing of router |
CN101094111B (en) * | 2007-07-18 | 2010-05-26 | 中兴通讯股份有限公司 | Method and system for carrying out testing whole set of network device |
CN101505240B (en) * | 2008-02-05 | 2011-03-30 | 华为技术有限公司 | Fault detection method and apparatus |
CN101588271B (en) * | 2008-05-20 | 2011-10-26 | 中兴通讯股份有限公司 | Method for detecting routing in IP multimedia subsystem (IMS) |
CN101330410B (en) * | 2008-07-17 | 2011-06-08 | 华为技术有限公司 | Far-end loopback test method, system and exchange |
CN101505242B (en) * | 2008-12-25 | 2012-10-17 | 华为技术有限公司 | Router fault detection method and router device |
CN101998422B (en) * | 2009-08-18 | 2015-07-22 | 中兴通讯股份有限公司 | Test method and system for data carrying in calling establishing process |
CN101808021A (en) * | 2010-04-16 | 2010-08-18 | 华为技术有限公司 | Fault detection method, device and system, message statistical method and node equipment |
CN102143014A (en) * | 2010-11-03 | 2011-08-03 | 华为数字技术有限公司 | Single board failure detection method, single board and router |
CN103490928A (en) * | 2013-09-22 | 2014-01-01 | 华为技术有限公司 | Message transmission route stoppage determining method, message transmission route stoppage determining device and message transmission route stoppage determining system |
CN108234476A (en) * | 2017-12-29 | 2018-06-29 | 天津芯海创科技有限公司 | The action listener method and monitoring system of exchange chip |
CN108199980A (en) * | 2017-12-29 | 2018-06-22 | 天津芯海创科技有限公司 | The action listener method and monitoring system of exchange chip |
CN112751688B (en) * | 2019-10-30 | 2023-08-01 | 中兴通讯股份有限公司 | Flow control processing method of OTN (optical transport network) equipment, electronic equipment and storage medium |
-
2004
- 2004-12-02 CN CNB2004101009014A patent/CN100563201C/en not_active Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
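Research and Implementation of Fault Testing Technology for High-Performance Routers (高性能路由器故障测试技术研究与实现). Wang Sheng. Master's thesis in engineering, National University of Defense Technology. 2003 |
Research and Implementation of Fault Testing Technology for High-Performance Routers (高性能路由器故障测试技术研究与实现). Wang Sheng. Master's thesis in engineering, National University of Defense Technology. 2003 * |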
Also Published As
Publication number | Publication date |
---|---|
CN1783837A (en) | 2006-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6728216B1 (en) | Arrangement in a network repeater for monitoring link integrity and selectively down shifting link speed based on local configuration signals | |
CN101355466B (en) | Method and apparatus for transmitting continuous check information message | |
CN100563201C (en) | A kind of method for detecting route unit fault and device | |
CN101132320B (en) | Method for detecting interface trouble and network node equipment | |
US7502328B2 (en) | Method of monitoring link performance and diagnosing active link state in Ethernet passive optical network | |
CN100459528C (en) | Method for inspecting Qos in telecommunication network | |
CN101729303A (en) | Method and device for measuring network performance parameter | |
CN101710896B (en) | Method and device for detecting link quality | |
CN101247270A (en) | System and method for implementing bidirectional forwarding detection | |
EP0952700B1 (en) | Network equipment such as a network repeater and testing method therefor | |
CN100488070C (en) | Link switching device and its method in communication system | |
CN101483592A (en) | Method and apparatus for inhibiting bidirectional forwarding detection link oscillation | |
CN104796329A (en) | Automatic link switching method and automatic link switching device | |
CN101714939A (en) | Fault treatment method for Ethernet ring network host node and corresponding Ethernet ring network | |
CN103684818A (en) | Method and device for detecting failures of network channel | |
CN100466591C (en) | Master-slave device system | |
CN107070739A (en) | A kind of router operation troubles intelligent detecting method and system | |
CN101330410B (en) | Far-end loopback test method, system and exchange | |
CN100386997C (en) | Data transmission system and method between telecommunication equipments based on point-to-point connection | |
EP0939512B1 (en) | Method and arrangement in a network repeater for automatically changing link speed | |
US7046693B1 (en) | Method and system for determining availability in networks | |
CN101848165B (en) | The method recovered after controlling interrupted communication link and interface board | |
CN101465762B (en) | Method, equipment and system for detecting error connection between protection set ports | |
CN112714060B (en) | Link detection method and device | |
CN110138657B (en) | Aggregation link switching method, device, equipment and storage medium between switches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20091125 Termination date: 20181202 |