CN102857365A - Fault preventing and intelligent repairing method and device for network management system

Publication number
CN102857365A
CN102857365A CN201210185225XA CN201210185225A CN102857365A CN 102857365 A CN102857365 A CN 102857365A CN 201210185225X A CN201210185225X A CN 201210185225XA CN 201210185225 A CN201210185225 A CN 201210185225A CN 102857365 A CN102857365 A CN 102857365A
Authority
CN
China
Prior art keywords
fault
early warning
strategy
module
message
Application number
CN201210185225XA
Other languages
Chinese (zh)
Inventor
冯冲
陈斌
戴娴娴
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to CN201210185225XA
Publication of CN102857365A

Abstract

The invention relates to a fault prevention and intelligent repair method and device for a network management system. The device comprises a fault monitoring module, a fault prevention module and a fault repair module. The method comprises the following steps: setting, in advance, corresponding early-warning thresholds, fault thresholds, early-warning repair policies and fault repair policies for the fault-related system parameters of the network management system; when monitoring shows that a fault-related system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, generating an early-warning message, looking up the corresponding early-warning repair policy and repairing the early warning; when monitoring shows that a fault-related system parameter is greater than or equal to its fault threshold, generating a fault message and reporting a fault alarm; and, according to the fault message, looking up the corresponding fault repair policy and repairing the fault. The method and device minimize the probability of faults in the network management system and speed up fault repair.

Description

Fault prevention and intelligent repair method and device in a network management system

Technical field

The present invention relates to the field of communications, and in particular to a fault prevention and intelligent repair method and device in a network management system.

Background art

In existing network management systems, fault management usually notifies maintenance personnel by means of an alarm after the system has failed, and the maintenance personnel then resolve the fault by manual intervention.

This approach has the following major shortcomings:

1. Fault repair starts only after the fault has occurred, which causes operating losses for the operator.

2. After a fault occurs, the network management system is notified by an alarm. By the time maintenance personnel investigate, the parameters of the equipment may have changed, so they may be unable to determine what actually happened when the fault occurred and therefore may never find the real cause of the problem.

3. Although an alarm is raised after the fault occurs, the maintenance personnel may not be in the office area and so cannot learn of the equipment fault in time; or the equipment may not be provided with a network management server, in which case the fault becomes known only when maintenance personnel query the equipment.

Summary of the invention

The object of the present invention is to provide a fault prevention and intelligent repair method and device in a network management system, so as to remedy the defect that existing network management systems can neither prevent faults nor repair them.

The invention provides a fault prevention and intelligent repair method in a network management system, the method comprising the following steps:

setting, in advance, a corresponding early-warning threshold and fault threshold for each fault-related system parameter in the network management system, and also setting a corresponding early-warning repair policy and fault repair policy;

when monitoring shows that a fault-related system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, generating an early-warning message, looking up the corresponding early-warning repair policy, and repairing the early warning according to that policy;

when monitoring shows that a fault-related system parameter is greater than or equal to its fault threshold, generating a fault message and reporting a fault alarm; and, according to the fault message, looking up the corresponding fault repair policy and repairing the fault according to that policy.
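The two-threshold rule in the steps above can be sketched as follows. This is a minimal illustration only: the names `Level` and `classify` and the example threshold values are assumptions introduced here and do not appear in the patent text.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical labels for the three ranges defined by the two thresholds."""
    NORMAL = "normal"
    EARLY_WARNING = "early_warning"
    FAULT = "fault"


def classify(value: float, warning_threshold: float, fault_threshold: float) -> Level:
    """Map one monitored parameter value to a level.

    value >= fault_threshold                      -> FAULT
    warning_threshold <= value < fault_threshold  -> EARLY_WARNING
    value < warning_threshold                     -> NORMAL
    """
    if warning_threshold > fault_threshold:
        # Assumption of this sketch: the early-warning threshold never exceeds the fault threshold.
        raise ValueError("early-warning threshold must not exceed fault threshold")
    if value >= fault_threshold:
        return Level.FAULT
    if value >= warning_threshold:
        return Level.EARLY_WARNING
    return Level.NORMAL
```

For instance, with an assumed early-warning threshold of 60 and an assumed fault threshold of 90, a reading of exactly 60 is already an early warning, because the comparison is "greater than or equal to".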

Preferably, the method further sets in advance a corresponding fault collection policy, early-warning re-detection policy, fault re-detection policy and fault message sending mode.

Preferably, looking up the corresponding fault repair policy according to the fault message and repairing the fault according to that policy specifically comprises the following steps:

looking up the corresponding fault collection policy according to the fault message;

collecting the system parameters related to the fault according to the fault collection policy;

generating a fault repair notification;

looking up the corresponding fault repair policy according to the fault repair notification;

repairing the fault according to the fault repair policy.

Preferably, the method further comprises the following steps:

after the early-warning repair is finished, looking up and executing the corresponding early-warning re-detection policy to detect whether the early warning has been eliminated; if so, generating an early-warning elimination message; otherwise, generating an early-warning escalation message;

after the fault repair is finished, looking up and executing the corresponding fault re-detection policy to detect whether the fault has been eliminated; if so, generating a fault elimination message and reporting a fault elimination prompt; otherwise, generating a fault message again.

Preferably, after generating the fault message again, the method further performs the following step:

looking up the corresponding fault message sending mode according to the fault message, and reporting the fault message in that sending mode.

Preferably, the early-warning re-detection policy is the corresponding health detection policy in the network management system.

Preferably, the early-warning repair policy, fault repair policy, fault collection policy, early-warning re-detection policy, fault re-detection policy and health detection policy are built-in scripts of the network management system, user-defined scripts or message events.

The present invention further provides a fault prevention and intelligent repair device in a network management system, the device comprising a fault monitoring module, a fault prevention module and a fault repair module.

The fault monitoring module is configured to provide an interface for setting the fault-related system parameters and their early-warning thresholds and fault thresholds; to monitor the fault-related system parameters and compare each parameter with its corresponding early-warning threshold and fault threshold; when a system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, to generate an early-warning message and send it to the fault prevention module; and, when a system parameter is greater than or equal to its fault threshold, to generate a fault message, report a fault alarm, and send the fault message to the fault repair module.

The fault prevention module is configured to provide an interface for setting the early-warning repair policy; and, according to the early-warning message received, to look up the corresponding early-warning repair policy and repair the early warning according to that policy.

The fault repair module is configured to provide an interface for setting the fault repair policy; and, according to the fault message received, to look up the corresponding fault repair policy and repair the fault according to that policy.

Preferably, the device further comprises a fault collection module and a fault message sending module.

The fault monitoring module is configured to send the generated fault message to the fault collection module.

The fault collection module is configured to provide an interface for setting the fault collection policy; and, according to the fault message received, to look up the corresponding fault collection policy, collect the system parameters related to the fault according to that policy, generate a fault repair notification and send it to the fault repair module.

The fault repair module is configured, according to the fault repair notification received, to look up the corresponding fault repair policy and repair the fault according to that policy.

The fault message sending module is configured to provide an interface for setting the fault message sending mode; and, when a fault message is received, to look up the corresponding fault message sending mode and report the fault message in that sending mode.

Preferably, the fault prevention module is configured to provide an interface for setting the early-warning re-detection policy; and, according to the early-warning message, to look up the corresponding early-warning re-detection policy and detect, according to that policy, whether the early warning has been eliminated; when the early warning has been eliminated, to generate an early-warning elimination message and send it to the fault monitoring module; and, when the early warning has not been eliminated, to generate an early-warning escalation message and send it to the fault monitoring module.

The fault repair module is configured to provide an interface for setting the fault re-detection policy; and, according to the fault repair notification, to look up the corresponding fault re-detection policy and detect, according to that policy, whether the fault has been eliminated; when the fault has been eliminated, to generate a fault elimination message and send it to the fault monitoring module; and, when the fault has not been eliminated, to generate a fault message and send it to the fault message sending module.

Preferably, the fault monitoring module is configured, when it generates an early-warning message, to determine whether an early-warning elimination message returned by the fault prevention module is received within a preset first time, and to clear the early warning when such a message is received within the first time; and, when it generates a fault message, to determine whether a fault elimination message returned by the fault repair module is received within a preset second time, and to report a fault elimination prompt when such a message is received within the second time.

Preferably, the fault monitoring module comprises an early-warning monitoring submodule and a fault monitoring submodule, wherein:

The early-warning monitoring submodule is configured, when a system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, to generate an early-warning message and send it to the fault prevention module; to determine whether an early-warning elimination message returned by the fault prevention module is received within the preset first time; and to clear the early warning when such a message is received within the first time.

The fault monitoring submodule is configured, when a system parameter is greater than or equal to its fault threshold, to generate a fault message, report a fault alarm, and send the fault message to the fault collection module; to determine whether a fault elimination message returned by the fault repair module is received within the preset second time; and to report a fault elimination prompt when such a message is received within the second time.

Preferably, the fault prevention module comprises an early-warning repair submodule and an early-warning detection submodule, wherein:

The early-warning repair submodule is configured, according to the early-warning message, to look up the corresponding early-warning repair policy and repair the early warning according to that policy.

The early-warning detection submodule is configured, according to the early-warning message, to look up the corresponding early-warning re-detection policy and detect, according to that policy, whether the early warning has been eliminated.

Preferably, the fault repair module comprises an intelligent repair submodule and a fault detection submodule, wherein:

The intelligent repair submodule is configured, according to the fault repair notification, to look up the corresponding fault repair policy and repair the fault according to that policy.

The fault detection submodule is configured, according to the fault repair notification, to look up the corresponding fault re-detection policy and detect, according to that policy, whether the fault has been eliminated.

The invention enables the network management system to attempt repair on its own first when an early warning or fault occurs, which minimizes the fault rate of the network management system and reduces the operator's operating costs. Manual intervention takes place only when the network management system cannot recover by itself, which greatly reduces the number of manual interventions, lowers operational risk and speeds up fault repair.

Brief description of the drawings

The accompanying drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a schematic block diagram of a preferred embodiment of the fault prevention and intelligent repair device in a network management system according to the present invention;

Fig. 2 is a flow chart of a preferred embodiment of the fault monitoring part of the fault prevention and intelligent repair method in a network management system according to the present invention;

Fig. 3 is a flow chart of a preferred embodiment of the early-warning repair part of the fault prevention and intelligent repair method in a network management system according to the present invention;

Fig. 4 is a flow chart of a preferred embodiment of the fault collection part of the fault prevention and intelligent repair method in a network management system according to the present invention.

Detailed description of the embodiments

To make the technical problem to be solved, the technical solution and the beneficial effects of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

Fig. 1 is a schematic block diagram of a preferred embodiment of the fault prevention and intelligent repair device in a network management system according to the present invention. This embodiment comprises a fault monitoring module 10, a fault prevention module 20, a fault collection module 30, a fault repair module 40 and a fault message sending module 50, wherein:

Fault monitoring module 10 is configured to provide an interface for setting the fault-related system parameters and their early-warning thresholds and fault thresholds; to monitor the fault-related system parameters and compare each parameter with its corresponding early-warning threshold and fault threshold; when a system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, to generate an early-warning message, send it to fault prevention module 20, determine whether fault prevention module 20 returns an early-warning elimination message within a preset first time, and clear the early warning when such a message is received within the first time; and, when a system parameter is greater than or equal to its fault threshold, to generate a fault message, report a fault alarm, send the fault message to fault collection module 30, determine whether fault repair module 40 returns a fault elimination message within a preset second time, and report a fault elimination prompt when the fault elimination message returned by fault repair module 40 is received within the second time. Fault monitoring module 10 comprises an early-warning monitoring submodule 11 and a fault monitoring submodule 12.

Early-warning monitoring submodule 11 is configured, when a system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, to generate an early-warning message, send it to fault prevention module 20, determine whether fault prevention module 20 returns an early-warning elimination message within the preset first time, and clear the early warning when such a message is received within the first time.

Fault monitoring submodule 12 is configured, when a system parameter is greater than or equal to its fault threshold, to generate a fault message, report a fault alarm, send the fault message to fault collection module 30, determine whether fault repair module 40 returns a fault elimination message within the preset second time, and report a fault elimination prompt when the fault elimination message returned by fault repair module 40 is received within the second time.

Fault prevention module 20 is configured to provide an interface for setting the early-warning repair policy and the early-warning re-detection policy; according to the early-warning message received, to look up the corresponding early-warning repair policy and early-warning re-detection policy, and to repair the early warning according to the early-warning repair policy; and, according to the early-warning re-detection policy, to detect whether the early warning has been eliminated. When the early warning has been eliminated, it generates an early-warning elimination message and sends it to fault monitoring module 10; when the early warning has not been eliminated, it generates an early-warning escalation message and sends it to fault monitoring module 10. Fault prevention module 20 comprises an early-warning repair submodule 21 and an early-warning detection submodule 22.

Early-warning repair submodule 21 is configured, according to the early-warning message, to look up the corresponding early-warning repair policy and repair the early warning according to that policy.

Early-warning detection submodule 22 is configured, according to the early-warning message, to look up the corresponding early-warning re-detection policy and detect, according to that policy, whether the early warning has been eliminated.

Fault collection module 30 is configured to provide an interface for setting the fault collection policy; and, according to the fault message received, to look up the corresponding fault collection policy, collect the system parameters related to the fault according to that policy, generate a fault repair notification and send it to fault repair module 40.

Fault repair module 40 is configured to provide an interface for setting the fault repair policy and the fault re-detection policy; according to the fault repair notification received, to look up the corresponding fault repair policy and fault re-detection policy, and to repair the fault according to the fault repair policy; and, according to the fault re-detection policy, to detect whether the fault has been eliminated. When the fault has been eliminated, it generates a fault elimination message and sends it to fault monitoring module 10; when the fault has not been eliminated, it generates a fault message and sends it to fault message sending module 50. Fault repair module 40 comprises an intelligent repair submodule 41 and a fault detection submodule 42.

Intelligent repair submodule 41 is configured to look up the corresponding fault repair policy and repair the fault according to that policy.

Fault detection submodule 42 is configured to look up the corresponding fault re-detection policy and detect, according to that policy, whether the fault has been eliminated.

Fault message sending module 50 is configured to provide an interface for setting the fault message sending mode; and, when a fault message is received, to look up the corresponding fault message sending mode and report the fault message in that sending mode.

Following the schematic block diagram of the device, the fault prevention and intelligent repair method in a network management system according to the present invention can be divided into six parts: the policy presetting part, the fault monitoring part, the early-warning repair part, the fault collection part, the fault repair part and the fault message reporting part. The method is described in detail below with reference to specific embodiments.

The policy presetting part is as follows: a corresponding early-warning threshold and fault threshold are set in advance for each fault-related system parameter in the network management system, together with a corresponding early-warning repair policy, fault repair policy, fault collection policy, early-warning re-detection policy, fault re-detection policy and fault message sending mode. With reference to Fig. 1, the early-warning threshold and fault threshold are set in fault monitoring module 10; the early-warning repair policy and early-warning re-detection policy are set in fault prevention module 20; the fault collection policy is set in fault collection module 30; the fault repair policy and fault re-detection policy are set in fault repair module 40; and the fault message sending mode is set in fault message sending module 50.

In the present invention, the early-warning re-detection policy may use the corresponding health detection policy of the network management system; and the early-warning repair policy, fault repair policy, fault collection policy, early-warning re-detection policy and fault re-detection policy may be set in the network management system in the form of built-in scripts, user-defined scripts or message events.
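One possible way to hold these preset values is sketched below. It is an assumption-laden illustration: the patent stores each setting in its own module, whereas this sketch flattens everything into one table for brevity, and every name (`PolicyForm`, `Policy`, `ParameterConfig`, `CONFIG`, the policy keys, the script names, the 90.0 fault threshold and the "sms" sending mode) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class PolicyForm(Enum):
    """The three policy forms named in the text."""
    BUILT_IN_SCRIPT = "built_in_script"
    USER_DEFINED_SCRIPT = "user_defined_script"
    MESSAGE_EVENT = "message_event"


@dataclass(frozen=True)
class Policy:
    form: PolicyForm
    target: str  # hypothetical: a script name or an event name


@dataclass
class ParameterConfig:
    """Preset values for one fault-related system parameter."""
    warning_threshold: float
    fault_threshold: float
    policies: Dict[str, Policy] = field(default_factory=dict)
    send_mode: Optional[str] = None  # fault message sending mode, if configured

    def lookup(self, kind: str) -> Policy:
        """Return the policy of the given kind; raises KeyError if it was never preset."""
        return self.policies[kind]


# Illustrative presets only; 60.0 echoes the CPU example in the text, the rest is assumed.
CONFIG: Dict[str, ParameterConfig] = {
    "cpu_usage": ParameterConfig(
        warning_threshold=60.0,
        fault_threshold=90.0,
        policies={
            "warning_repair": Policy(PolicyForm.BUILT_IN_SCRIPT, "lower_priority"),
            "warning_recheck": Policy(PolicyForm.BUILT_IN_SCRIPT, "cpu_health_check"),
            "fault_collect": Policy(PolicyForm.USER_DEFINED_SCRIPT, "dump_cpu_stats"),
        },
        send_mode="sms",
    ),
}
```

Keeping the policy form separate from its target lets a dispatcher decide whether to run a script or emit an event without inspecting the target string.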

Fig. 2 is a flow chart of a preferred embodiment of the fault monitoring part of the fault prevention and intelligent repair method in a network management system according to the present invention. This part is carried out by fault monitoring module 10 and comprises the following steps:

Step S001: fault monitoring module 10 monitors the fault-related system parameters;

Step S002: compare the system parameter with its corresponding early-warning threshold; if it is less than the early-warning threshold, go to step S001; otherwise, go to step S003;

Step S003: compare the system parameter with its corresponding fault threshold; if it is less than the fault threshold, go to step S004; otherwise, go to step S007;

Step S004: generate an early-warning message and send it to fault prevention module 20;

Step S005: determine whether an early-warning elimination message returned by fault prevention module 20 is received within the preset first time T1; if so, go to step S006; otherwise, end;

Step S006: clear the early warning and return to step S001;

Step S007: generate a fault message and send it to fault collection module 30;

Step S008: determine whether a fault elimination message returned by fault repair module 40 is received within the preset second time T2; if so, go to step S009; otherwise, end;

Step S009: report a fault elimination prompt and return to step S001.
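A single pass through steps S001 to S009 can be sketched as below, under the assumption that the modules exchange messages through in-process queues. The function name `monitor_once`, the queue parameters and the string labels are all hypothetical; the patent does not specify a transport or message format.

```python
import queue


def monitor_once(value, warning_threshold, fault_threshold,
                 to_prevention, to_collection, replies, t1=1.0, t2=1.0):
    """Handle one sampled value (S001) and return the outcome as a string.

    to_prevention / to_collection: queues standing in for modules 20 and 30.
    replies: queue on which elimination messages come back to the monitor.
    t1 / t2: the preset first and second times, in seconds.
    """
    if value < warning_threshold:                     # S002: below early-warning threshold
        return "normal"
    if value < fault_threshold:                       # S003: early-warning range
        to_prevention.put(("early_warning", value))   # S004
        expected, timeout = "early_warning_eliminated", t1   # S005
    else:
        to_collection.put(("fault", value))           # S007
        expected, timeout = "fault_eliminated", t2    # S008
    try:
        reply = replies.get(timeout=timeout)          # wait at most T1 or T2
    except queue.Empty:
        return "not_eliminated"                       # nothing arrived in time: end
    # S006 / S009 when the expected message arrived; any other reply ends the pass.
    return expected if reply == expected else "not_eliminated"
```

A caller would loop back to sampling whenever the result is "normal" or one of the two eliminated outcomes, and stop the automatic handling otherwise, mirroring the "return to step S001" and "end" branches.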

Fig. 3 is a flow chart of a preferred embodiment of the early-warning repair part of the fault prevention and intelligent repair method in a network management system according to the present invention. This part is carried out by fault prevention module 20 and comprises the following steps:

Step S101: fault prevention module 20 receives an early-warning message;

Step S102: according to the early-warning message, look up the corresponding early-warning repair policy and early-warning re-detection policy;

In this step, the early-warning re-detection policy may also be the corresponding health detection policy of the system. If the health detection policy is chosen to detect whether the early warning has been eliminated, a result of healthy indicates that the early warning has been repaired, and a result of unhealthy indicates that it has not;

Step S103: repair the early warning according to the early-warning repair policy;

Step S104: according to the early-warning re-detection policy, detect whether the early warning has been eliminated; if so, go to step S105; otherwise, go to step S106;

Step S105: generate an early-warning elimination message, send it to fault monitoring module 10, and end;

Step S106: generate an early-warning escalation message, send it to fault monitoring module 10, and end.
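Steps S102 to S106 reduce to "look up, repair, re-check, report", which might be sketched as follows. The function name, the policy table keyed by parameter name and the returned labels are assumptions made for illustration; policies are modelled as plain callables rather than scripts or message events.

```python
from typing import Callable, Dict, Tuple

# Hypothetical policy table: parameter name -> (repair policy, re-detection policy).
WarningPolicies = Dict[str, Tuple[Callable[[], None], Callable[[], bool]]]


def handle_early_warning(parameter: str, policies: WarningPolicies) -> str:
    """Return the message that would be sent back to the fault monitoring module."""
    repair, recheck = policies[parameter]   # S102: look up both policies
    repair()                                # S103: run the early-warning repair policy
    if recheck():                           # S104: True means the warning is gone
        return "early_warning_eliminated"   # S105
    return "early_warning_escalated"        # S106
```

In the health-detection variant described in step S102, the re-detection callable would simply return whether the system is reported healthy.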

Three examples of this part follow.

Example one: the early-warning threshold for the packet receiving rate of a certain interface is configured as 10,000 packets per second. The early-warning repair policy is a user-defined script, and the early-warning re-detection policy is the system's interface health detection policy, which is a built-in script of the system. If, while the system is running, the packet receiving rate of the interface exceeds the early-warning threshold, a fault early warning is triggered. According to the early-warning repair policy, the packets received on the interface during the preceding period are analyzed and the packets accounting for the largest share of traffic are rate-limited; the rate-limiting duration is configurable. After the rate-limiting period, the packet receiving rate of the interface is checked again according to the health detection policy. If the rate stays below the early-warning threshold for a period of time, the early warning is eliminated; otherwise, the early warning is escalated.

Example two: the early-warning threshold for overall system CPU usage is configured as 60%. The early-warning repair policy is a built-in script of the system, and the early-warning re-detection policy is the CPU health detection policy in the form of a built-in script. If, while the system is running, overall CPU usage exceeds 60%, a fault early warning is triggered. The scheduling priority of the processes with the highest CPU usage is temporarily lowered (except for processes critical to the system). After a preset waiting time, overall CPU usage is measured again: if it is below the early-warning threshold, the fault early warning is cancelled; if it is still above the threshold, early-warning escalation is triggered.

Example three: the early-warning threshold for overall system memory usage is configured as 75%. The early-warning repair policy is a system plug-in, and the early-warning re-detection policy is the memory health detection policy in the form of a built-in script. If, while the system is running, overall memory usage exceeds 75%, a fault early warning is triggered, together with the early-warning repair policy. In this example the early-warning repair policy is a garbage collection program that cleans up memory in the system that is no longer referenced but has not been released. After the garbage collection program finishes, overall memory usage is checked again: if it is below the early-warning threshold, the fault early warning is cancelled; otherwise, the early warning is escalated.
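The selection step of example two, choosing which processes to deprioritize, might look like the sketch below. It only picks the candidates; actually changing scheduling priority is platform-specific and is left out. The `Proc` record, its fields and the function name are hypothetical.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    """Hypothetical snapshot of one process."""
    pid: int
    cpu_percent: float
    critical: bool  # True for processes the system must not deprioritize


def pick_processes_to_deprioritize(procs: List[Proc], count: int) -> List[int]:
    """Return the pids of the `count` busiest non-critical processes, busiest first."""
    candidates = [p for p in procs if not p.critical]
    candidates.sort(key=lambda p: p.cpu_percent, reverse=True)
    return [p.pid for p in candidates[:count]]
```

Filtering out critical processes before sorting keeps the exclusion described in the example explicit, so a busy critical process is never selected however high its usage.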

Fig. 4 is a flow chart of a preferred embodiment of the fault collection part of the fault prevention and intelligent repair method for a network management system according to the present invention. This part is performed by fault collection module 30 and comprises the following steps:

Step S201: Fault collection module 30 receives a fault message;

Step S202: According to the fault message, the corresponding fault collection strategy is looked up;

Step S203: According to the fault collection strategy, the system parameters related to the fault are collected;

Step S204: A fault repair notice is generated and sent to fault repair module 40; the procedure ends.
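
Steps S201 to S204 can be sketched as a lookup followed by a collection call. The sketch below is an assumption-laden illustration: the FaultMessage and RepairNotice types, the collect_fault function and the sample parameters are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FaultMessage:
    fault_type: str   # hypothetical key used to look up strategies
    value: float      # parameter value that crossed the fault threshold


@dataclass
class RepairNotice:
    fault_type: str
    collected: Dict[str, float]   # system parameters gathered for the repair step


# A collection strategy is modelled as a callable returning parameter readings.
CollectionStrategy = Callable[[], Dict[str, float]]


def collect_fault(
    msg: FaultMessage, strategies: Dict[str, CollectionStrategy]
) -> RepairNotice:
    strategy = strategies[msg.fault_type]            # S202: look up the strategy
    collected = strategy()                           # S203: collect related parameters
    return RepairNotice(msg.fault_type, collected)   # S204: build the repair notice


# Made-up strategy table and readings, for illustration only.
strategies = {"cpu": lambda: {"cpu_percent": 97.0, "load_1m": 12.5}}
notice = collect_fault(FaultMessage("cpu", 97.0), strategies)
print(notice)
```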

The following is a preferred embodiment of the fault repair part of the fault prevention and intelligent repair method for a network management system according to the present invention. This part is performed by fault repair module 40 and comprises the following steps:

Step S301: Fault repair module 40 receives the fault repair notice;

Step S302: According to the fault repair notice, the corresponding fault repair strategy and fault re-detection strategy are looked up;

Step S303: According to the fault repair strategy, the fault is repaired;

Step S304: According to the fault re-detection strategy, it is checked whether the fault has been eliminated; if so, step S305 is performed; otherwise, step S306 is performed;

Step S305: A fault-cleared message is generated and sent to fault monitoring module 10; the procedure ends;

Step S306: A fault message is generated and sent to fault message sending module 50; the procedure ends.
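
The branch in steps S302 to S306 can be sketched as below. This is a minimal sketch under stated assumptions: repair_fault, the returned (message, destination) pairs and the simulated disk figures are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, Tuple

# Each fault type maps to (repair strategy, re-detection strategy).
Strategy = Tuple[Callable[[], None], Callable[[], bool]]


def repair_fault(fault_type: str, strategies: Dict[str, Strategy]) -> Tuple[str, str]:
    """Return (message kind, destination module) after one repair attempt."""
    repair, is_cleared = strategies[fault_type]   # S302: look up both strategies
    repair()                                      # S303: run the repair strategy
    if is_cleared():                              # S304: re-detect the fault
        return ("fault_cleared", "fault_monitoring_module")   # S305
    return ("fault", "fault_message_sending_module")          # S306


# Simulated disk usage in percent; the 90% limit is a made-up value.
state = {"disk_percent": 97.0}


def clean_temp_files() -> None:
    state["disk_percent"] = 60.0


strategies = {"disk": (clean_temp_files, lambda: state["disk_percent"] < 90.0)}
outcome = repair_fault("disk", strategies)
print(outcome)
```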

The following is a preferred embodiment of the fault message reporting part of the fault prevention and intelligent repair method for a network management system according to the present invention. This part is performed by fault message sending module 50 and comprises the following steps:

Step S401: Fault message sending module 50 receives the fault message;

Step S402: According to the fault message, the corresponding fault message sending mode is looked up;

Step S403: The fault message is reported using that sending mode.
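
Steps S401 to S403 amount to dispatching on a configured sending mode. The source does not say which sending modes exist, so the "log" and "email" senders, the SEND_MODES table and the report_fault function below are purely hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Records what would have been sent, so the sketch has no side effects.
sent: List[str] = []


def send_by_log(text: str) -> None:
    sent.append("log:" + text)


def send_by_email(text: str) -> None:
    sent.append("email:" + text)


# Hypothetical configuration: fault type -> sending mode.
SEND_MODES: Dict[str, Callable[[str], None]] = {
    "disk_full": send_by_email,
    "cpu_high": send_by_log,
}


def report_fault(fault_type: str, text: str) -> None:
    send = SEND_MODES[fault_type]   # S402: look up the sending mode
    send(text)                      # S403: report using that mode


report_fault("disk_full", "disk at 98%")
print(sent)
```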

The above description illustrates and describes preferred embodiments of the present invention. As stated earlier, however, it should be understood that the invention is not limited to the forms disclosed herein, and the description should not be regarded as excluding other embodiments. The invention can be used in various other combinations, modifications and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (14)

1. A fault prevention and intelligent repair method for a network management system, characterized in that the method comprises the following steps:
setting in advance, for each fault-related system parameter in the network management system, a corresponding early-warning threshold and fault threshold, and also setting a corresponding early-warning repair strategy and fault repair strategy;
when monitoring shows that a fault-related system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, generating an early-warning message, looking up the corresponding early-warning repair strategy, and repairing the early warning according to the early-warning repair strategy;
when monitoring shows that a fault-related system parameter is greater than or equal to its fault threshold, generating a fault message and reporting a fault alarm; and, according to the fault message, looking up the corresponding fault repair strategy and repairing the fault according to the fault repair strategy.
2. The method according to claim 1, characterized in that a corresponding fault collection strategy, early-warning re-detection strategy, fault re-detection strategy and fault message sending mode are also set in advance.
3. The method according to claim 2, characterized in that looking up the corresponding fault repair strategy according to the fault message and repairing the fault according to the fault repair strategy specifically comprises the following steps:
looking up the corresponding fault collection strategy according to the fault message;
collecting the system parameters related to the fault according to the fault collection strategy;
generating a fault repair notice;
looking up the corresponding fault repair strategy according to the fault repair notice;
repairing the fault according to the fault repair strategy.
4. The method according to claim 2, characterized in that the method further comprises the following steps:
after the early-warning repair is completed, looking up and executing the corresponding early-warning re-detection strategy to detect whether the early warning has been eliminated; if so, generating an early-warning-cleared message; otherwise, generating an early-warning escalation message;
after the fault repair is completed, looking up and executing the corresponding fault re-detection strategy to detect whether the fault has been eliminated; if so, generating a fault-cleared message and reporting a fault-cleared notification; otherwise, regenerating the fault message.
5. The method according to claim 4, characterized in that the method further performs the following step after regenerating the fault message:
looking up the corresponding fault message sending mode according to the fault message, and reporting the fault message using that sending mode.
6. The method according to claim 2, characterized in that the early-warning re-detection strategy is a corresponding health-detection strategy in the network management system.
7. The method according to any one of claims 1 to 6, characterized in that the early-warning repair strategy, fault repair strategy, fault collection strategy, early-warning re-detection strategy, fault re-detection strategy and health-detection strategy are each a built-in script of the network management system, a user-defined script, or a message event.
8. A fault prevention and intelligent repair device for a network management system, characterized in that the device comprises a fault monitoring module, a fault prevention module and a fault repair module;
the fault monitoring module is used for providing an interface for setting the fault-related system parameters and their early-warning thresholds and fault thresholds; for monitoring the fault-related system parameters and comparing each system parameter with its early-warning threshold and fault threshold; when the system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, for generating an early-warning message and sending it to the fault prevention module; and when the system parameter is greater than or equal to its fault threshold, for generating a fault message, reporting a fault alarm, and sending the fault message to the fault repair module;
the fault prevention module is used for providing an interface for setting the early-warning repair strategy; and for looking up the corresponding early-warning repair strategy according to the received early-warning message and repairing the early warning according to the early-warning repair strategy;
the fault repair module is used for providing an interface for setting the fault repair strategy; and for looking up the corresponding fault repair strategy according to the received fault message and repairing the fault according to the fault repair strategy.
9. The device according to claim 8, characterized in that the device further comprises a fault collection module and a fault message sending module;
the fault monitoring module is used for sending the generated fault message to the fault collection module;
the fault collection module is used for providing an interface for setting the fault collection strategy; and for looking up the corresponding fault collection strategy according to the received fault message, collecting the system parameters related to the fault according to the fault collection strategy, generating a fault repair notice, and sending it to the fault repair module;
the fault repair module is used for looking up the corresponding fault repair strategy according to the received fault repair notice and repairing the fault according to the fault repair strategy;
the fault message sending module is used for providing an interface for setting the fault message sending mode; and, on receiving a fault message, for looking up the corresponding fault message sending mode and reporting the fault message using that sending mode.
10. The device according to claim 9, characterized in that
the fault prevention module is used for providing an interface for setting the early-warning re-detection strategy; for looking up the corresponding early-warning re-detection strategy according to the early-warning message and detecting, according to the early-warning re-detection strategy, whether the early warning has been eliminated; when the early warning has been eliminated, for generating an early-warning-cleared message and sending it to the fault monitoring module; and when the early warning has not been eliminated, for generating an early-warning escalation message and sending it to the fault monitoring module;
the fault repair module is used for providing an interface for setting the fault re-detection strategy; for looking up the corresponding fault re-detection strategy according to the fault repair notice and detecting, according to the fault re-detection strategy, whether the fault has been eliminated; when the fault has been eliminated, for generating a fault-cleared message and sending it to the fault monitoring module; and when the fault has not been eliminated, for generating a fault message and sending it to the fault message sending module.
11. The device according to claim 8, characterized in that the fault monitoring module is used, after generating an early-warning message, for determining whether an early-warning-cleared message returned by the fault prevention module is received within a preset first time, and for clearing the early warning when the early-warning-cleared message is received within the first time; and, after generating a fault message, for determining whether a fault-cleared message returned by the fault repair module is received within a preset second time, and for reporting a fault-cleared notification when the fault-cleared message is received within the second time.
12. The device according to claim 8, 9 or 11, characterized in that the fault monitoring module comprises an early-warning monitoring submodule and a fault monitoring submodule, wherein
the early-warning monitoring submodule is used, when a system parameter is greater than or equal to its early-warning threshold and less than its fault threshold, for generating an early-warning message and sending it to the fault prevention module; and for determining whether an early-warning-cleared message returned by the fault prevention module is received within the preset first time, and clearing the early warning when the early-warning-cleared message is received within the first time;
the fault monitoring submodule is used, when a system parameter is greater than or equal to its fault threshold, for generating a fault message, reporting a fault alarm, and sending the fault message to the fault collection module; and for determining whether a fault-cleared message returned by the fault repair module is received within the preset second time, and reporting a fault-cleared notification when the fault-cleared message is received within the second time.
13. The device according to claim 10, characterized in that the fault prevention module comprises an early-warning repair submodule and an early-warning detection submodule, wherein
the early-warning repair submodule is used for looking up the corresponding early-warning repair strategy according to the early-warning message and repairing the early warning according to the early-warning repair strategy;
the early-warning detection submodule is used for looking up the corresponding early-warning re-detection strategy according to the early-warning message and detecting, according to the early-warning re-detection strategy, whether the early warning has been eliminated.
14. The device according to claim 9 or 10, characterized in that the fault repair module comprises an intelligent repair submodule and a fault detection submodule, wherein
the intelligent repair submodule is used for looking up the corresponding fault repair strategy according to the fault repair notice and repairing the fault according to the fault repair strategy;
the fault detection submodule is used for looking up the corresponding fault re-detection strategy according to the fault repair notice and detecting, according to the fault re-detection strategy, whether the fault has been eliminated.

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130102
