CN104243192B - Fault handling method and system - Google Patents


Info

Publication number
CN104243192B
Authority
CN
China
Legal status
Active
Application number
CN201310237951.6A
Other languages
Chinese (zh)
Other versions
CN104243192A (en)
Inventor
李宏琳
Current Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201310237951.6A
Publication of CN104243192A
Application granted
Publication of CN104243192B

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a fault handling method and system and relates to the field of fault analysis technology. In the fault association method provided by the embodiments of the invention, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established between the two faults. Establishing fault associations lets operations and maintenance personnel handle associated faults together, which improves fault handling efficiency. The embodiments further establish an in-memory cache queue in which faults are cached in queue order, so that determining a fault association only requires querying this queue to find out quickly whether an associated fault has occurred. This avoids analyzing a large data sample and further improves fault handling efficiency.

Description

Fault handling method and system
Technical field
The present invention relates to the field of fault analysis technology, and in particular to a fault handling method and system.
Background art
In everyday equipment maintenance, monitoring is typically carried out by monitoring personnel. When a fault is found, it is passed to maintenance personnel, who investigate and handle it so that normal operation is restored in time.
With this approach, however, the reported faults that maintenance personnel receive arrive in no particular order and follow no apparent pattern, so investigating and handling them is inefficient. An effective fault handling solution is therefore urgently needed to improve fault handling efficiency.
Summary of the invention
In view of the above problems, embodiments of the present invention provide a fault handling method and system in which faults are reported in an orderly manner, so that they can be handled quickly and efficiently.
The embodiments of the present invention adopt the following technical solutions:
One embodiment of the present invention provides a fault handling method, the method comprising:
when a first fault is detected, searching both before and after the first fault, within a predetermined time window, for an occurrence of a second fault;
if a second fault is found, treating the first fault and the second fault as associated faults and reporting the associated faults; if no second fault is found, reporting the first fault;
for a reported fault, if it is an associated fault, handling the associated faults together as a merged group; if it is a first fault, handling it on its own.
The method further comprises:
establishing a relation template for recording the associations between faults; and
establishing an in-memory cache queue and, whenever a fault occurs during monitoring, caching the fault in memory in queue order;
in which case searching both before and after the first fault, within the predetermined time window, for an occurrence of a second fault specifically comprises:
when the first fault is detected, querying the relation template to determine whether the first fault has an associated fault and, if it has none, reporting the first fault;
if it has an associated fault, querying the in-memory cache queue for an associated second fault that occurred within the predetermined time window before the first fault, and continuing to monitor for an occurrence of the second fault within the predetermined time window after the first fault.
The method further comprises: monitoring and managing the in-memory cache queue, and removing faults that occurred earlier than the predetermined time window before the fault currently being handled;
in which case searching both before and after the first fault, within the predetermined time window, for an occurrence of a second fault specifically comprises:
when the first fault is detected, querying the relation template to determine whether the first fault has an associated fault and, if it has none, reporting the first fault;
if it has an associated fault, querying the in-memory cache queue for an associated second fault, and continuing to monitor the in-memory cache queue for an occurrence of the second fault within the predetermined time window after the first fault.
If an associated fault exists, the method further comprises:
according to the fault associations in the relation template, establishing in the in-memory cache queue a group identified by a network element unique identifier and an association rule ID, and, when a fault is detected, caching the detected fault in the group identified by its corresponding network element unique identifier and association rule ID.
There may be one or more second faults.
The first fault and the second fault that are associated with each other are related as follows:
the first fault is a primary fault and the second fault is a secondary fault; or
the first fault is a secondary fault and the second fault is a primary fault.
An embodiment of the present invention also provides a fault handling system, the system comprising:
an associated fault lookup module, configured to search, when a first fault is detected, both before and after the first fault, within a predetermined time window, for an occurrence of a second fault;
a reporting module, configured to treat the first fault and the second fault as associated faults and report the associated faults if the associated fault lookup module finds a second fault, and to report the first fault if the associated fault lookup module finds no second fault;
a processing module, configured to handle reported faults: if a reported fault is an associated fault, the associated faults are handled together as a merged group; if it is a first fault, it is handled on its own.
The system further comprises:
a relation template module, configured to establish a relation template that records the associations between faults;
a cache module, configured to establish an in-memory cache queue and, whenever a fault occurs during monitoring, cache the fault in memory in queue order;
in which case the associated fault lookup module specifically comprises:
a fault type determination unit, configured to query the relation template when a first fault is detected and determine whether the first fault has an associated fault;
a lookup unit, configured to, when the fault type determination unit determines that an associated fault exists, query the in-memory cache queue for an associated second fault that occurred within the predetermined time window before the first fault, and continue to monitor for an occurrence of the second fault within the predetermined time window after the first fault;
a report triggering unit, configured to trigger the reporting module to report the first fault when the fault type determination unit determines that no associated fault exists, and to trigger the reporting module to report faults according to the lookup result of the lookup unit.
The system further comprises:
an in-memory cache queue management module, configured to monitor and manage the in-memory cache queue and remove faults that occurred earlier than the predetermined time window before the fault currently being handled;
in which case the associated fault lookup module specifically comprises:
a fault type determination unit, configured to query the relation template when a first fault is detected and determine whether the first fault has an associated fault;
a lookup unit, configured to, when the fault type determination unit determines that an associated fault exists, query the in-memory cache queue for an associated second fault, and continue to monitor the in-memory cache queue for an occurrence of the second fault within the predetermined time window after the first fault;
a report triggering unit, configured to trigger the reporting module to report the first fault when the fault type determination unit determines that no associated fault exists, and to trigger the reporting module to report faults according to the lookup result of the lookup unit.
The cache module further comprises:
a grouping unit, configured to, when an associated fault exists, establish in the in-memory cache queue, according to the fault associations in the relation template, a group identified by a network element unique identifier and an association rule ID, and, when a fault is detected, cache the detected fault in the group identified by its corresponding network element unique identifier and association rule ID;
there may be one or more second faults;
the first fault and the second fault that are associated with each other are related as follows:
the first fault is a primary fault and the second fault is a secondary fault; or
the first fault is a secondary fault and the second fault is a primary fault.
In the fault handling method and system provided by the embodiments of the present invention, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established. Establishing fault associations lets operations and maintenance personnel handle associated faults together, which improves fault handling efficiency.
Further, the embodiments of the present invention establish an in-memory cache queue in which faults are cached in queue order, so that determining a fault association only requires querying this queue to find out quickly whether an associated fault has occurred. This avoids analyzing a large data sample and further improves fault handling efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of a fault handling method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a fault handling method provided by another embodiment of the present invention;
Fig. 3 is a flowchart of a specific example of the fault handling method provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a fault handling system provided by one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.
In everyday equipment maintenance, monitoring personnel identify patterns in how faults occur through continued observation and analysis. In general, if several faults usually occur together, they are said to influence one another and are called associated faults. For example, if fault B usually also occurs within 10 minutes before or after fault A, alarm A and alarm B are considered to have an influence relationship. Depending on the application scenario, associated faults also have a primary-secondary relationship; in the association above, A is the primary fault and B is the secondary fault.
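One way to picture such an association is as a small record that names the primary fault, the secondary fault, and the window length. The sketch below is illustrative only: the class name `AssociationRule`, its fields, and the rule ID `"R1"` are assumptions made for this example and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssociationRule:
    """Hypothetical record of one primary/secondary fault association."""
    rule_id: str         # assumed identifier for the rule
    primary: str         # primary fault type, e.g. "A"
    secondary: str       # secondary fault type, e.g. "B"
    window_minutes: int  # length of time window T

    def partner_of(self, fault_type: str) -> Optional[str]:
        """Return the fault type associated with fault_type, or None."""
        if fault_type == self.primary:
            return self.secondary
        if fault_type == self.secondary:
            return self.primary
        return None


# The example from the text: B usually occurs within 10 minutes of A.
rule = AssociationRule("R1", primary="A", secondary="B", window_minutes=10)
```

Because the lookup is symmetric, the same record serves whether the primary or the secondary fault is detected first.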
In this fault association method, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established. Establishing fault associations lets maintenance personnel handle associated faults together, which improves fault handling efficiency.
Specifically, referring to Fig. 1, a fault handling method provided by an embodiment of the present invention comprises the following steps:
S101: Monitor for faults.
S102: When a first fault is detected, search both before and after the first fault, within a predetermined time window, for an occurrence of a second fault.
The length of the predetermined time window can be set differently for different application scenarios. For example, when maintaining communication equipment in the communications industry, the window length can be set to 10 minutes.
S103: If a second fault is found, treat the first fault and the second fault as associated faults and report the associated faults; if no second fault is found, report the first fault.
The first fault and the second fault are associated faults; that is, the two normally occur together. In practice, if faults are correlated before they are reported and are then reported together, maintenance personnel can handle the associated faults as a merged group, which greatly improves fault handling efficiency.
It should be noted that there may be one or more second faults. That is, if the first fault is fault A, the second fault may be fault B alone, or faults B, C, and D, and so on; no limitation is imposed here.
It should further be noted that the first fault and the second fault that are associated with each other may be related as follows:
The first fault is a primary fault and the second fault is a secondary fault; or the first fault is a secondary fault and the second fault is a primary fault. For example, if a certain base station fault is the primary fault, a downlink signal transmission fault can be set as the secondary fault.
S104: For a reported fault, if it is an associated fault, handle the associated faults together as a merged group; if it is a first fault, handle it on its own.
In this embodiment, the method searches for a secondary fault B in the period T before primary fault A occurred (that is, in the time window T preceding A) and establishes an association with any secondary fault B that is found. It then continues to receive secondary faults B and establishes associations with them. Once time window T has elapsed since primary fault A occurred, A is no longer associated with any further secondary fault B.
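The two-sided search of steps S102 and S103 can be sketched as a filter over recorded faults. This is a minimal illustration rather than the patented implementation: the function name `find_associated`, the `(fault_type, time)` tuples, and the sample timestamps are all assumptions, and the events are held in a plain list rather than gathered by live monitoring.

```python
from datetime import datetime, timedelta


def find_associated(first_time, events, secondary_type, window):
    """Return secondary faults within `window` before or after first_time."""
    return [
        (fault_type, t)
        for fault_type, t in events
        if fault_type == secondary_type and abs(t - first_time) <= window
    ]


t0 = datetime(2020, 1, 1, 12, 0)  # arbitrary time at which primary fault A occurs
events = [
    ("B", t0 - timedelta(minutes=4)),   # before A, inside the window
    ("B", t0 + timedelta(minutes=9)),   # after A, inside the window
    ("B", t0 + timedelta(minutes=15)),  # after A, outside the window
    ("C", t0 + timedelta(minutes=1)),   # unrelated fault type
]

matches = find_associated(t0, events, "B", timedelta(minutes=10))

# S103: report associated faults together, or the first fault alone.
report = ("associated", "A", matches) if matches else ("single", "A")
```

In a live system the part of the window after A cannot be inspected at the moment A arrives, which is why the text describes continuing to receive secondary faults until the window has elapsed.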
In summary, in the fault handling method and system provided by the embodiments of the present invention, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established. Establishing fault associations lets operations and maintenance personnel handle associated faults together, which improves fault handling efficiency.
Preferably, referring to Fig. 2, another embodiment of the present invention provides another fault handling method. This embodiment additionally establishes an in-memory cache queue in which faults are cached in queue order, so that determining a fault association only requires querying this queue to find out quickly whether an associated fault has occurred. This avoids analyzing a large data sample and can further improve fault handling efficiency.
The steps are as follows:
S201: Establish a relation template for recording the associations between faults.
S202: Establish an in-memory cache queue and, whenever a fault occurs during monitoring, cache the fault in memory in queue order.
In practice, when fault B is received, its timeout is calculated (that is, the fault occurrence time plus T minutes), and the fault is stored in the in-memory cache in queue order. If the fault currently being handled is fault A, the in-memory cache queue is checked for a fault B cached within T minutes before fault A. By adding the in-memory cache queue, the step of analyzing a large data sample is avoided, and the query runs only against the in-memory cache queue.
S203: When a first fault is detected, query the relation template to determine whether the first fault has an associated fault. If it has none, perform step S204; if it has one, perform step S205.
S204: Report the first fault, then perform step S208.
S205: Query the in-memory cache queue for an associated second fault that occurred within the predetermined time window before the first fault, and continue to monitor for an occurrence of the second fault within the predetermined time window after the first fault. If a second fault is found, perform step S206; if none is found, perform step S204.
Preferably, this embodiment also includes the following step: monitor and manage the in-memory cache queue, and remove faults that occurred earlier than the predetermined time window before the fault currently being handled.
For this step, when fault B is received, its timeout is calculated (that is, the fault occurrence time plus T minutes), and the fault is stored in the in-memory cache in queue order. If the fault currently being handled is fault A, the in-memory cache queue is checked for a fault B cached before fault A (the queue holds only faults from the T minutes preceding the fault currently being handled, fault A). In addition, within the predetermined time window after the first fault (fault A), monitoring continues for an occurrence of the second fault (fault B) in the in-memory cache queue. By adding the in-memory cache queue, the step of analyzing a large data sample is avoided, and the query runs only against the in-memory cache queue.
Further, if an associated fault exists, the method of this embodiment also includes:
According to the fault associations in the relation template, a group identified by "network element unique identifier + association rule ID" is established in the in-memory cache queue. When a fault is detected, the detected fault is cached in the group identified by its corresponding network element unique identifier and association rule ID.
Here, the network element unique identifier uniquely identifies a device in the network.
The association rule ID identifies an association rule, for example the rule that fault A is the primary fault and fault B is the secondary fault.
Accordingly, the step of querying the in-memory cache queue for an associated fault queries only the group identified by "network element unique identifier + association rule ID". This further reduces the amount of data to be processed and further improves fault handling efficiency.
S206: Treat the first fault and the second fault as associated faults and report the associated faults.
S207: Handle the associated faults together as a merged group, then end.
S208: Handle the first fault.
The first fault and the second fault are associated faults; that is, the two normally occur together. In practice, if faults are correlated before they are reported and are then reported together, maintenance personnel can handle the associated faults as a merged group, which greatly improves fault handling efficiency.
It should be noted that there may be one or more second faults. That is, if the first fault is fault A, the second fault may be fault B alone, or faults B, C, and D, and so on; no limitation is imposed here.
It should further be noted that the first fault and the second fault that are associated with each other may be related as follows:
The first fault is a primary fault and the second fault is a secondary fault; or the first fault is a secondary fault and the second fault is a primary fault. For example, if a certain base station fault is the primary fault, a downlink signal transmission fault can be set as the secondary fault.
In this embodiment, the method searches the in-memory cache queue for a secondary fault B in the period T before primary fault A occurred (that is, in the time window T preceding A) and establishes an association with any secondary fault B that is found. It then continues to receive secondary faults B and establishes associations with them. Once time window T has elapsed since primary fault A occurred, A is no longer associated with any further secondary fault B.
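The cache-based lookup of steps S202 to S205, together with the removal of faults older than the window, might look like the sketch below. It rests on several assumptions not stated in the text: faults arrive in time order, times are plain seconds, the relation template is a simple dictionary, and the names `RELATION_TEMPLATE`, `evict_expired`, and `on_fault` are invented for illustration.

```python
from collections import deque

WINDOW = 600  # time window T in seconds (10 minutes, as in the example scenario)

# Hypothetical relation template: fault type -> associated fault type.
RELATION_TEMPLATE = {"A": "B", "B": "A"}

cache = deque()  # entries are (occurrence_time, fault_type), oldest first


def evict_expired(now):
    """Remove cached faults that occurred more than WINDOW seconds before now."""
    while cache and cache[0][0] < now - WINDOW:
        cache.popleft()


def on_fault(fault_type, now):
    """Cache a detected fault and return cached associated faults, if any."""
    evict_expired(now)
    partner = RELATION_TEMPLATE.get(fault_type)  # S203: query the template
    if partner is None:
        hits = []                                # no associated fault defined
    else:
        hits = [entry for entry in cache if entry[1] == partner]  # S205
    cache.append((now, fault_type))              # S202: cache in queue order
    return hits
```

Because every fault is cached on arrival, the search after the first fault needs no separate code path in this sketch: a secondary fault that arrives later finds the earlier primary fault still in the queue, provided the window has not elapsed.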
In summary, in the fault handling method and system provided by the embodiments of the present invention, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established. Establishing fault associations lets operations and maintenance personnel handle associated faults together, which improves fault handling efficiency.
Further, the embodiments of the present invention establish an in-memory cache queue in which faults are cached in queue order, so that determining a fault association only requires querying this queue to find out quickly whether an associated fault has occurred. This avoids analyzing a large data sample and further improves fault handling efficiency.
Referring to Fig. 3, a specific example of the fault handling method provided by an embodiment of the present invention follows this overall approach: first, the faults that need to be associated are cached; then, the faults are grouped according to fault location information; finally, when both the primary and the secondary fault of a faulty device occur, the faults are associated. The implementation is outlined below; the specific sub-steps are shown in Fig. 3 and are not repeated here.
S301: Define the association rule.
On the same device, fault A is the primary alarm and fault B is the child alarm. The time window length is defined as T minutes.
S302: Receive active alarms.
i. Fault A (or fault B) is received. A group identified by "network element unique identifier + association rule ID" is established, and the fault's timeout is calculated (that is, the fault occurrence time plus T minutes). The fault is stored in the in-memory cache in queue order.
ii. Fault B (or fault A) is received. The group "network element unique identifier + association rule ID" is searched for data that has not timed out. If such data exists and the two alarms are primary and child to each other, fault A and fault B are associated.
S303: Discard on timeout.
The queue is scanned, and alarms older than time window T are deleted from the group queue. They are no longer used for association.
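Steps S301 to S303 can be sketched with one queue per "network element unique identifier + association rule ID" group. The rule table `RULES`, the function names, the identifiers `"NE-1"` and `"NE-2"`, and the use of seconds as the time unit are assumptions for illustration; the patent does not prescribe these details.

```python
from collections import defaultdict, deque

T = 10 * 60  # time window length in seconds

# S301: hypothetical rule table, rule ID -> (primary type, child type).
RULES = {"R1": ("A", "B")}

# (ne_id, rule_id) -> deque of (expires_at, fault_type), oldest first
groups = defaultdict(deque)


def receive(ne_id, fault_type, now):
    """S302: cache the alarm in its group and return the associations found."""
    pairs = []
    for rule_id, (primary, child) in RULES.items():
        if fault_type not in (primary, child):
            continue
        queue = groups[(ne_id, rule_id)]
        partner = child if fault_type == primary else primary
        for expires_at, cached_type in queue:
            # Only data that has not timed out may be associated.
            if cached_type == partner and expires_at >= now:
                pairs.append((rule_id, fault_type, cached_type))
        queue.append((now + T, fault_type))  # timeout = occurrence time + T
    return pairs


def discard_timed_out(now):
    """S303: delete alarms whose timeout has passed and drop empty groups."""
    for key in list(groups):
        queue = groups[key]
        while queue and queue[0][0] < now:
            queue.popleft()
        if not queue:
            del groups[key]
```

Keying the cache by device and rule means a lookup only scans alarms from the same network element under the same rule, which is the reduction in query scope the example describes.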
The benefit of this example is that it greatly reduces complex queries against large data samples and speeds up fault association, thereby substantially improving fault handling efficiency.
Referring to Fig. 4, an embodiment of the present invention provides a fault handling system, comprising:
An associated fault lookup module 401, configured to search, when a first fault is detected, both before and after the first fault, within a predetermined time window, for an occurrence of a second fault.
A reporting module 402, configured to treat the first fault and the second fault as associated faults and report the associated faults if the associated fault lookup module 401 finds a second fault, and to report the first fault if the associated fault lookup module 401 finds no second fault.
A processing module 403, configured to handle reported faults: if a reported fault is an associated fault, the associated faults are handled together as a merged group; if it is a first fault, it is handled on its own.
Further, the fault handling system provided by this embodiment also comprises:
A relation template module 404, configured to establish a relation template that records the associations between faults.
A cache module 405, configured to establish an in-memory cache queue and, whenever a fault occurs during monitoring, cache the fault in memory in queue order.
In this case the associated fault lookup module 401 specifically comprises:
A fault type determination unit, configured to query the relation template when a first fault is detected and determine whether the first fault has an associated fault.
A lookup unit, configured to, when the fault type determination unit determines that an associated fault exists, query the in-memory cache queue for an associated second fault that occurred within the predetermined time window before the first fault, and continue to monitor for an occurrence of the second fault within the predetermined time window after the first fault.
And a report triggering unit, configured to trigger the reporting module to report the first fault when the fault type determination unit determines that no associated fault exists, and to trigger the reporting module to report faults according to the lookup result of the lookup unit.
Further, the fault handling system provided by this embodiment also comprises:
An in-memory cache queue management module 406, configured to monitor and manage the in-memory cache queue and remove faults that occurred earlier than the predetermined time window before the fault currently being handled.
In this case the associated fault lookup module 401 specifically comprises:
A fault type determination unit, configured to query the relation template when a first fault is detected and determine whether the first fault has an associated fault.
A lookup unit, configured to, when the fault type determination unit determines that an associated fault exists, query the in-memory cache queue for an associated second fault, and continue to monitor the in-memory cache queue for an occurrence of the second fault within the predetermined time window after the first fault.
And a report triggering unit, configured to trigger the reporting module to report the first fault when the fault type determination unit determines that no associated fault exists, and to trigger the reporting module to report faults according to the lookup result of the lookup unit.
Preferably, the cache module also comprises:
A grouping unit, configured to, when an associated fault exists, establish in the in-memory cache queue, according to the fault associations in the relation template, a group identified by a network element unique identifier and an association rule ID, and, when a fault is detected, cache the detected fault in the group identified by its corresponding network element unique identifier and association rule ID.
It should be noted that there may be one or more second faults. That is, if the first fault is fault A, the second fault may be fault B alone, or faults B, C, and D, and so on; no limitation is imposed here.
It should further be noted that the first fault and the second fault that are associated with each other may be related as follows:
The first fault is a primary fault and the second fault is a secondary fault; or the first fault is a secondary fault and the second fault is a primary fault. For example, if a certain base station fault is the primary fault, a downlink signal transmission fault can be set as the secondary fault.
It should be noted that for the working principles and processing of the modules and units in this system embodiment, reference may be made to the related description of the method embodiments shown in Fig. 1, Fig. 2, and Fig. 3, which is not repeated here.
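As a rough picture of how modules 401 to 403 hand work to one another, the sketch below wires a lookup step to a reporting step and a processing step. The class names, method names, and the `Report` record are hypothetical; the relation template and cache are passed in as plain Python objects rather than modelled as modules 404 and 405.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical report: the first fault plus any associated faults found."""
    first: str
    associated: list = field(default_factory=list)


class LookupModule:
    """Stands in for associated fault lookup module 401."""

    def __init__(self, template, cache):
        self.template = template  # fault type -> associated fault type
        self.cache = cache        # list of (time_in_seconds, fault_type)

    def find(self, fault_type, now, window):
        partner = self.template.get(fault_type)
        return [
            cached_type
            for t, cached_type in self.cache
            if cached_type == partner and abs(now - t) <= window
        ]


class ReportingModule:
    """Stands in for reporting module 402."""

    def report(self, fault_type, found):
        return Report(fault_type, list(found))


class ProcessingModule:
    """Stands in for processing module 403."""

    def process(self, report):
        # Merge handling for associated faults, single handling otherwise.
        return "merged" if report.associated else "single"


lookup = LookupModule({"A": "B"}, [(0, "B"), (5000, "B")])
found = lookup.find("A", now=300, window=600)
outcome = ProcessingModule().process(ReportingModule().report("A", found))
```

Keeping the three responsibilities in separate classes mirrors the block diagram, but a real deployment could equally combine them.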
In summary, in the fault handling method and system provided by the embodiments of the present invention, the working condition of equipment is monitored in real time. When a primary fault A is detected, the method searches both before and after the time at which A occurred, within a time window T, for an occurrence of fault B; if one is found, an association is established. Establishing fault associations lets operations and maintenance personnel handle associated faults together, which improves fault handling efficiency.
Further, the embodiments of the present invention establish an in-memory cache queue in which faults are cached in queue order, so that determining a fault association only requires querying this queue to find out quickly whether an associated fault has occurred. This avoids analyzing a large data sample and further improves fault handling efficiency.
To describe the technical solutions of the embodiments clearly, the terms "first" and "second" are used in the embodiments to distinguish between identical or similar items whose functions and effects are essentially the same. Those skilled in the art will understand that "first" and "second" do not limit quantity or order of execution.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (9)

1. A fault handling method, characterized in that the method comprises:
establishing a relation template for recording associations between faults; and
establishing an in-memory cache queue and, if a fault occurs during monitoring, caching the fault in memory in queue form;
when a first fault is detected, searching both backward and forward within a predetermined time window for an occurrence of a second fault;
if an occurrence of the second fault is found, treating the first fault and the second fault as associated faults and reporting the associated faults; if no occurrence of the second fault is found, reporting the first fault;
for a reported fault, if it is an associated fault, merging it for handling; if it is a first fault, handling it;
wherein searching both backward and forward within the predetermined time window for an occurrence of a second fault when the first fault is detected specifically comprises:
when the first fault is detected, querying the relation template to determine whether the first fault has an associated fault, and, if it has no associated fault, reporting the first fault;
if it has an associated fault, querying the in-memory cache queue for an associated second fault within the predetermined time window before the first fault occurred, and, within the predetermined time window after the first fault occurred, continuing to monitor for an occurrence of the second fault.
2. The fault handling method according to claim 1, characterized in that the method further comprises: monitoring and managing the in-memory cache queue, and removing faults that occurred more than the predetermined time window before the fault currently being handled;
in which case searching both backward and forward within the predetermined time window for an occurrence of a second fault when the first fault is detected specifically comprises:
when the first fault is detected, querying the relation template to determine whether the first fault has an associated fault, and, if it has no associated fault, reporting the first fault;
if it has an associated fault, querying the in-memory cache queue for an associated second fault, and, within the predetermined time window after the first fault occurred, continuing to monitor the in-memory cache queue for an occurrence of the second fault.
3. The fault handling method according to claim 1 or 2, characterized in that, if an associated fault exists, the method further comprises:
according to the fault associations in the relation template, establishing in the in-memory cache queue a group identified by a network element unique identifier and a correlation rule ID, and then, when a fault is detected, caching the currently detected fault in the group identified by its corresponding network element unique identifier and correlation rule ID.
4. The fault handling method according to any one of claims 1-2, characterized in that there are one or more second faults.
5. The fault handling method according to any one of claims 1-2, characterized in that the relationship between a first fault and a second fault that are associated with each other is as follows:
the first fault is a primary fault and the second fault is a secondary fault; or
the first fault is a secondary fault and the second fault is a primary fault.
6. A fault handling system, characterized in that the system comprises:
a relation template module, configured to establish a relation template that records associations between faults;
a cache module, configured to establish an in-memory cache queue and, if a fault occurs during monitoring, to cache the fault in memory in queue form;
an associated fault search module, configured, when a first fault is detected, to search both backward and forward within a predetermined time window for an occurrence of a second fault;
a reporting module, configured, if the associated fault search module finds an occurrence of the second fault, to treat the first fault and the second fault as associated faults and report the associated faults, and, if the associated fault search module finds no occurrence of the second fault, to report the first fault;
a processing module, configured, for a reported fault, to merge it for handling if it is an associated fault, and to handle it if it is a first fault;
wherein the associated fault search module specifically comprises:
a fault type judging unit, configured, when the first fault is detected, to query the relation template and determine whether the first fault has an associated fault;
a search unit, configured, when the judgment result of the fault type judging unit is that an associated fault exists, to query the in-memory cache queue for an associated second fault within the predetermined time window before the first fault occurred, and, within the predetermined time window after the first fault occurred, to continue monitoring for an occurrence of the second fault.
7. The fault handling system according to claim 6, characterized by:
a reporting trigger unit, configured to trigger the reporting module to report the first fault when the judgment result of the fault type judging unit is that no associated fault exists, and to trigger the reporting module to report faults according to the search result of the search unit.
8. The fault handling system according to claim 7, characterized in that the system further comprises:
an in-memory cache queue management module, configured to monitor and manage the in-memory cache queue and to remove faults that occurred more than the predetermined time window before the fault currently being handled;
in which case the associated fault search module specifically comprises:
a fault type judging unit, configured, when the first fault is detected, to query the relation template and determine whether the first fault has an associated fault;
a search unit, configured, when the judgment result of the fault type judging unit is that an associated fault exists, to query the in-memory cache queue for an associated second fault, and, within the predetermined time window after the first fault occurred, to continue monitoring the in-memory cache queue for an occurrence of the second fault;
a reporting trigger unit, configured to trigger the reporting module to report the first fault when the judgment result of the fault type judging unit is that no associated fault exists, and to trigger the reporting module to report faults according to the search result of the search unit.
9. The fault handling system according to claim 7 or 8, characterized in that the cache module further comprises:
a grouping unit, configured, when an associated fault exists, to establish in the in-memory cache queue, according to the fault associations in the relation template, a group identified by a network element unique identifier and a correlation rule ID, and then, when a fault is detected, to cache the currently detected fault in the group identified by its corresponding network element unique identifier and correlation rule ID;
there are one or more second faults;
the relationship between a first fault and a second fault that are associated with each other is as follows:
the first fault is a primary fault and the second fault is a secondary fault; or
the first fault is a secondary fault and the second fault is a primary fault.
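
The grouping recited in claims 3 and 9 can be pictured with the short Python sketch below. It is illustrative only and not part of the claims: the contents of `RELATION_TEMPLATE`, the rule IDs, the element IDs, and the function names are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical relation template: correlation rule ID -> fault names it covers.
RELATION_TEMPLATE = {
    "rule-1": {"A", "B"},
    "rule-2": {"C", "D"},
}


def rules_for(fault_name):
    """Return the IDs of the correlation rules that mention this fault."""
    return [rid for rid, names in RELATION_TEMPLATE.items() if fault_name in names]


# One queue per (network element unique identifier, correlation rule ID) group.
groups = defaultdict(deque)


def cache_fault(element_id, time, name):
    """Cache a fault in every matching group; return False if no rule applies."""
    rule_ids = rules_for(name)
    for rule_id in rule_ids:
        groups[(element_id, rule_id)].append((time, name))
    return bool(rule_ids)


cache_fault("ne-1", 95.0, "B")
cache_fault("ne-1", 100.0, "A")
cache_fault("ne-2", 101.0, "B")
unmatched = cache_fault("ne-1", 102.0, "X")  # no rule covers "X"
```

With this layout, a lookup for faults associated with fault A on element `ne-1` only needs to scan the `("ne-1", "rule-1")` group instead of every cached fault.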
CN201310237951.6A 2013-06-17 2013-06-17 Fault handling method and system Active CN104243192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310237951.6A CN104243192B (en) 2013-06-17 2013-06-17 Fault handling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310237951.6A CN104243192B (en) 2013-06-17 2013-06-17 Fault handling method and system

Publications (2)

Publication Number Publication Date
CN104243192A CN104243192A (en) 2014-12-24
CN104243192B true CN104243192B (en) 2017-11-10

Family

ID=52230593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310237951.6A Active CN104243192B (en) 2013-06-17 2013-06-17 Fault handling method and system

Country Status (1)

Country Link
CN (1) CN104243192B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411615A (en) * 2016-11-22 2017-02-15 北京奇虎科技有限公司 Device used for cloud remediation of system application and method
CN108234189B (en) * 2016-12-22 2021-10-08 北京神州泰岳软件股份有限公司 Alarm data processing method and device
CN109659936A (en) * 2018-12-29 2019-04-19 国电南瑞科技股份有限公司 A kind of smart grid Dispatching Control System failure method of disposal and system
CN111240871B (en) * 2019-12-30 2023-07-18 潍柴动力股份有限公司 Method and device for reporting engine fault
CN113515078A (en) * 2021-05-20 2021-10-19 湖南湘船重工有限公司 Intelligent ship information monitoring and alarm processing method and system
CN115056228B (en) * 2022-07-06 2023-07-04 中迪机器人(盐城)有限公司 Abnormality monitoring and processing system and method for robot

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6239699B1 (en) * 1999-03-03 2001-05-29 Lucent Technologies Inc. Intelligent alarm filtering in a telecommunications network
CN1492624A (en) * 2002-10-22 2004-04-28 华为技术有限公司 Processing method of communication network warning and relatively analysis management device
CN101360013A (en) * 2008-09-25 2009-02-04 烽火通信科技股份有限公司 General fast fault locating method for transmission network based on correlativity analysis
CN102014020A (en) * 2010-11-12 2011-04-13 百度在线网络技术(北京)有限公司 Equipment for performing network monitoring on network equipment and method thereof
CN102098175A (en) * 2011-01-26 2011-06-15 浪潮通信信息系统有限公司 Alarm association rule obtaining method of mobile internet

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5423427B2 (en) * 2010-01-26 2014-02-19 富士通株式会社 Information management program, information management apparatus, and information management method

Also Published As

Publication number Publication date
CN104243192A (en) 2014-12-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 818, 8 / F, 34 Haidian Street, Haidian District, Beijing 100080

Patentee after: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building 6 storey block A Room 601

Patentee before: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.
