CN104734871A - Method and device for positioning failures - Google Patents

Method and device for positioning failures

Info

Publication number
CN104734871A
Authority
CN
China
Prior art keywords
fault
conduction chain
monitored object
chain
failure information
Prior art date
Legal status
Withdrawn
Application number
CN201310711392.8A
Other languages
Chinese (zh)
Inventor
郭宪杰
申山宏
刘淑霞
尚尔刚
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201310711392.8A (published as CN104734871A)
Priority to CN201480057055.4A (published as CN105659528B)
Priority to PCT/CN2014/087332 (published as WO2015090098A1)
Publication of CN104734871A
Legal status: Withdrawn (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Locating Faults (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method and device for locating faults. The method includes: obtaining current fault information; establishing, according to the obtained current fault information, a set of conduction chains of all monitored objects for all fault types within a preset time window at different time points; analyzing the correlation between the conduction chains in the set to obtain the fault-object conduction chains of all monitored objects for the different fault types; and locating the faulty objects and fault types according to the fault-object conduction chains. The method locates root faults quickly and accurately, supports efficient trouble-ticket dispatch, and improves the efficiency of daily network maintenance and fault-ticket dispatch.

Description

Method and device for implementing fault location
Technical field
The present invention relates to network management technology and, in particular, to a method and device for implementing fault location.
Background
An existing network management system manages the individual monitored objects. The parameters of each monitored object, including its name/identifier and its connection relationships, usually have to be configured through the network configuration function. For example, the monitored objects might be one switch and four computers, with the switch connected to all four computers. With this configuration data in place, the management system knows each object and identifies the monitored objects by their identifying names, such as Switcher100, Computer100, Computer101, Computer102, and Computer103.
Typically, when a monitoring result for a monitored object reaches a fault threshold, it is reported to maintenance staff; for example, an alarm is raised when CPU utilization exceeds 96%. At that point the monitored object sends a message to the supervisor (the network management system) containing information such as the object type, object identifier, monitored metric, current metric value, and alarm name, for example: Computer, ID=100, CPU, 98%, Computer CPU Utilization Ratio is too high. From the network management system's point of view, all of this alarm data is reported by the individual monitored objects, and the message types can be user-defined.
After a monitored object reports alarm data, the message type, message object, and object identifier can be obtained according to the interface definition. On receiving the message "Computer, ID=100, CPU, 98%, Computer CPU Utilization Ratio is too high" described above, for instance, the system knows that Computer100 is in an abnormal state.
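The reported message in this example can be turned into a structured record with a few lines of code. The Python sketch below assumes the comma-separated field order shown above (object type, `ID=` identifier, metric, current value, alarm name); real interface definitions vary by vendor, and the `Alarm` and `parse_alarm` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    object_type: str
    object_id: str
    metric: str
    value: str
    alarm_name: str

    @property
    def object_key(self) -> str:
        # Combine type and identifier into the name the management system uses, e.g. "Computer100"
        return f"{self.object_type}{self.object_id}"


def parse_alarm(message: str) -> Alarm:
    # Assumed format: "<type>, ID=<id>, <metric>, <value>, <alarm name>"
    parts = [p.strip() for p in message.split(",", 4)]
    if len(parts) != 5 or not parts[1].startswith("ID="):
        raise ValueError(f"unrecognized alarm message: {message!r}")
    object_type, id_field, metric, value, alarm_name = parts
    return Alarm(object_type, id_field[len("ID="):], metric, value, alarm_name)


alarm = parse_alarm("Computer, ID=100, CPU, 98%, Computer CPU Utilization Ratio is too high")
```

Splitting at most four times keeps any commas inside the free-text alarm name intact.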
In a complex real-world network, a single fault can cause many more monitored objects to fail. Typically, after a power loss, every monitored object may stop working normally, and an interrupted transmission line can cut off communication across an entire area. Hundreds of alarms may be reported within one or two minutes. If the root alarm among them can be located quickly and repaired first, the other alarms may clear on their own. How to quickly locate the underlying alarm is therefore the focus of existing analysis. The usual approach summarizes the connection relationships between monitored objects in the network (for example, Switcher100 is connected to Computer100 and three other computers) and the causal relationships between services (for example, a power loss and a low-voltage condition are sequential or causally related) into an alarm knowledge base or empirical rules, and then uses that knowledge base or those rules to locate and analyze faults in the alarm data.
Locating and analyzing faults in alarm data with an existing alarm knowledge base or empirical alarm rules is the main method of maintaining existing networks. However, applying this method to monitoring an entire network produces massive amounts of alarm data, and correlating alarms across network devices and across management systems is very difficult. In particular, periodic network build-out and routine maintenance keep the network in a state of continual change, and a priori empirical alarm rules become highly inaccurate in the face of dynamic configuration changes. As a result, root faults cannot be located quickly and accurately, and the efficiency of daily network maintenance and trouble-ticket dispatch cannot be improved.
Summary of the invention
To solve the above technical problems, the present invention provides a method and device for implementing fault location that can locate root faults quickly and accurately and improve the efficiency of daily network maintenance and fault-ticket dispatch.
To achieve the above purpose, the invention discloses a method for implementing fault location, comprising:
Obtaining current fault information, the current fault information comprising at least a monitored object, a fault type, and time information;
Establishing, according to the obtained current fault information, a set of conduction chains of all monitored objects for the different fault types within a predetermined time window at different time points;
Analyzing the correlation between the conduction chains in the established set to obtain the fault-object conduction chains of all monitored objects for the different fault types;
Locating the current fault object and fault type according to the obtained fault-object conduction chains.
Preferably, the method further includes, before obtaining the current fault information: establishing a fault metadata library according to the obtained historical fault information.
Preferably, before establishing the conduction chain set, the method further includes: determining whether the current fault information is present in the historical fault information.
Preferably, establishing the set of conduction chains of all monitored objects for the different fault types within a predetermined time window at different time points comprises:
Obtaining the conduction chain of the monitored object for the current fault type within the predetermined time window at the current time point;
Establishing, according to the historical fault information, the set of conduction chains of the current monitored object for the current fault type within the predetermined time window at different time points.
Preferably, analyzing the correlation between the conduction chains in the set to obtain the fault-object conduction chains of all monitored objects for the different fault types comprises:
Obtaining, for each monitored object in the conduction chain set, the number of times each fault occurs on it; calculating the ratio of that number to the total number of faults across all monitored objects; and taking the list of monitored objects whose ratio exceeds a predetermined threshold as the fault-object conduction chain.
Preferably, when it is determined that the current fault information is not present in the historical fault information, analyzing the correlation between the conduction chains in the set to obtain the fault-object conduction chains of all monitored objects for the different fault types comprises:
Obtaining, for each monitored object in the current conduction chain set, the number of times each fault occurs on it; calculating the ratio of that number to the total number of faults across all monitored objects in the current conduction chain; and taking the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain.
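Read literally, both variants compute the same statistic over different inputs: the whole conduction chain set when history exists, or only the current chain when it does not. The minimal Python sketch below assumes each chain is a list of (monitored object, fault type) pairs and that the denominator is the total number of fault occurrences in the analyzed chains; `fault_object_chain` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (monitored object, fault type) observed inside a conduction chain


def fault_object_chain(chains: Iterable[List[Event]], threshold: float) -> List[Tuple[Event, float]]:
    """Return the (object, fault type) pairs whose share of all fault occurrences
    in `chains` exceeds `threshold`, highest share first."""
    counts = Counter(event for chain in chains for event in chain)
    total = sum(counts.values())
    if total == 0:
        return []
    selected = [(event, n / total) for event, n in counts.items() if n / total > threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Chains recorded after three occurrences of the same trigger fault (hypothetical data)
chains = [
    [("S1", "service interrupted")],
    [("S1", "service interrupted"), ("X9", "fan alarm")],
    [("S1", "service interrupted"), ("Y2", "temperature high")],
]
```

With history, all chains are passed in; without history, the caller would pass a one-element list holding only the current chain. Sorting by ratio lets the result double as a processing-priority order.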
Preferably, after the fault-object conduction chains of all monitored objects for the different fault types are obtained, the method further includes:
Obtaining, from the fault-object conduction chains, the fault conduction chains spanning different monitored objects, and locating the fault object and fault type according to those fault conduction chains; or,
Obtaining, from the fault-object conduction chains, the object conduction chains spanning different fault types, and locating the fault object and fault type according to those object conduction chains.
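The text does not spell out how the two derived chains are computed from the fault-object conduction chains. One plausible reading, sketched below in Python as an assumption rather than the patented procedure, is a projection: for each triggering object, collect the objects found in its chains across all fault types (object conduction chain), and for each triggering fault type, collect the fault types found in its chains across all objects (fault conduction chain). All names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Trigger = Tuple[str, str]  # (triggering object O_i, triggering fault type F_j)
Member = Tuple[str, str]   # (affected object, affected fault type)


def project_chains(l_ij: Dict[Trigger, Set[Member]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Split fault-object conduction chains into per-object and per-fault-type views."""
    object_chains: Dict[str, Set[str]] = defaultdict(set)  # O_i -> objects hit by any of its faults
    fault_chains: Dict[str, Set[str]] = defaultdict(set)   # F_j -> fault types it causes on any object
    for (obj, fault), members in l_ij.items():
        for member_obj, member_fault in members:
            object_chains[obj].add(member_obj)
            fault_chains[fault].add(member_fault)
    return dict(object_chains), dict(fault_chains)


l_ij = {
    ("P1", "power down"): {("T1", "link down"), ("S1", "service interrupted")},
    ("P1", "low voltage"): {("S1", "service interrupted")},
    ("T1", "link down"): {("S1", "service interrupted")},
}
object_chains, fault_chains = project_chains(l_ij)
```

A plain union keeps every member; a frequency filter over the individual chains could be layered on top if only strongly correlated members should survive.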
The invention also discloses a device for implementing fault location, comprising:
A receiving module, configured to obtain current fault information, the current fault information comprising at least a monitored object, a fault type, and time information;
A first establishing module, configured to establish, according to the obtained current fault information, a set of conduction chains of all monitored objects for the different fault types within a predetermined time window at different time points, and to output the set to the second establishing module;
A second establishing module, configured to analyze the correlation between the conduction chains in the set established by the first establishing module, obtain the fault-object conduction chains of all monitored objects for all fault types, and output them to the locating module;
A locating module, configured to locate the fault object and fault type according to the fault-object conduction chains from the second establishing module.
Preferably, the device further comprises a fault metadata establishing module, configured to establish a fault metadata library according to the obtained fault information and pass the library information to the first establishing module.
Preferably, the first establishing module is further configured to determine whether the current fault information is present in the historical fault information;
When it determines that the current fault information is present in the historical fault information, it obtains the conduction chain of the monitored object for the current fault type within the predetermined time window at the current time point, establishes the set of conduction chains of the current monitored object for the current fault type within the predetermined time window at different time points according to the fault information, and sends a first notification to the second establishing module.
Preferably, the second establishing module is specifically configured to:
Receive the first notification from the first establishing module, obtain the number of times each monitored object in the conduction chain set fails, calculate the ratio of that number to the total number of failures across all monitored objects, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain.
Preferably, the first establishing module is further configured to send a second notification to the second establishing module when it determines that no historical fault information exists before the current fault information is obtained;
The second establishing module is further configured to receive the second notification from the first establishing module, obtain the number of times each monitored object in the current conduction chain set fails, calculate the ratio of that number to the total number of failures across all monitored objects in the current conduction chain, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain.
Preferably, the locating module is further configured to:
Obtain, from the fault-object conduction chains, the fault conduction chains spanning different monitored objects, and locate the fault object and fault type according to the obtained fault conduction chains;
Or obtain, from the fault-object conduction chains, the object conduction chains spanning different fault types, and locate the fault object and fault type according to those object conduction chains.
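The module split above maps naturally onto a small class pipeline. The Python sketch below is illustrative only: the class names, the in-memory event log, the assumption that analysis runs once the window after the current fault has closed, and the share-of-total ratio are choices made for the example, not details fixed by the text. The locating module here simply reports the triggering fault together with its strongly correlated objects.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


@dataclass(frozen=True)
class FaultInfo:
    obj: str         # monitored object
    fault_type: str  # fault type
    time: float      # timestamp in seconds


class Notice(Enum):
    FIRST = 1   # historical fault information exists
    SECOND = 2  # no historical fault information yet


class ReceiverModule:
    def receive(self, obj: str, fault_type: str, time: float) -> FaultInfo:
        return FaultInfo(obj, fault_type, time)


class FirstBuildModule:
    def __init__(self, window: float):
        self.window = window            # conduction window T0, in seconds
        self.log: List[FaultInfo] = []  # every fault received so far

    def _chain(self, trigger: FaultInfo) -> List[FaultInfo]:
        # Faults reported in (trigger.time, trigger.time + window]
        return [e for e in self.log
                if trigger.time < e.time <= trigger.time + self.window]

    def build(self, current: FaultInfo) -> Tuple[List[List[FaultInfo]], Notice]:
        # Earlier occurrences of the same fault on the same object form the history
        past = [e for e in self.log
                if (e.obj, e.fault_type) == (current.obj, current.fault_type)
                and e.time < current.time]
        chains = [self._chain(t) for t in past] + [self._chain(current)]
        return chains, (Notice.FIRST if past else Notice.SECOND)


class SecondBuildModule:
    def __init__(self, threshold: float):
        self.threshold = threshold

    def analyze(self, chains: List[List[FaultInfo]], notice: Notice) -> List[str]:
        if notice is Notice.SECOND:
            chains = chains[-1:]  # only the current chain is available
        counts = Counter(e.obj for chain in chains for e in chain)
        total = sum(counts.values())
        return [o for o, n in counts.items() if n / total > self.threshold]


class LocatingModule:
    def locate(self, current: FaultInfo, chain_objects: List[str]) -> dict:
        return {"fault_object": current.obj, "fault_type": current.fault_type,
                "affected_objects": sorted(chain_objects)}


receiver, first = ReceiverModule(), FirstBuildModule(window=180.0)
second, locator = SecondBuildModule(threshold=0.3), LocatingModule()
for raw in [("T1", "link down", 0), ("S1", "service down", 30), ("X9", "fan alarm", 100),
            ("T1", "link down", 1000), ("S1", "service down", 1040),
            ("T1", "link down", 2000), ("S1", "service down", 2020), ("Y2", "temp high", 2100)]:
    first.log.append(receiver.receive(*raw))

current = first.log[5]  # the latest T1 link-down, at t = 2000
chains, notice = first.build(current)
result = locator.locate(current, second.analyze(chains, notice))
```

Because the chain set already carries the history, the notification only tells the second module whether to analyze the full set or the current chain alone.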
The technical solution comprises: obtaining current fault information comprising a monitored object, a fault type, and time information; establishing, according to the obtained current fault information, a set of conduction chains of all monitored objects for the different fault types within a predetermined time window at different time points; analyzing the correlation between the conduction chains in the established set to obtain the fault-object conduction chains of all monitored objects for all fault types; and locating the fault object and fault type according to the obtained fault-object conduction chains. The technical solution of this application does not need to work out, one by one, the connection relationships between monitored objects and the causal relationships between fault types, which avoids a costly time investment and meets real-time requirements. Rather than insisting on logical causality, it judges strong correlation, which tolerates the uncertainty introduced by possible network changes; depending on the maintenance team's capacity, the processing priority is set according to the strength of the correlation, making fault location more flexible.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for implementing fault location according to the present invention;
Fig. 2 is a flowchart of an embodiment of the method for implementing fault location according to the present invention;
Fig. 3 is a schematic structural diagram of the device for implementing fault location according to the present invention.
Embodiment
The present invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the method for implementing fault location according to the present invention; the method comprises the following steps:
Step 101: obtain current fault information.
The current fault information comprises the monitored object, the fault type, and time information.
Preferably, before the current fault information is obtained, the method may further include:
Establishing a fault metadata library according to historical fault information.
Specifically: first, identify the minimum-granularity monitored objects and the fault categories from the existing fault information across the whole network; then establish a basic fault metadata library from these minimum-granularity monitored objects and fault types.
For example, monitored objects are the main focus of network management: a monitored object with a minor fault can be repaired, whereas one with a serious fault can only be replaced. Each monitored object usually consists of several different components. From a maintenance standpoint, a minimum-granularity monitored object is the smallest unit that can be replaced. Take a switch: if it is a small, highly integrated switch whose individual ports cannot be replaced after a failure, the whole switch must be replaced when any port fails seriously, so the minimum granularity of this monitored object is the switch itself. If it is a larger switch in which each port is a replaceable part, the minimum granularity is each port of the switch, and the port part can be replaced when that port fails. In that case, the minimum-granularity monitored object is the port number under the switch.
The fault metadata library keeps growing as the network of monitored objects expands and the set of fault types grows richer. Because the library is of limited size, entries are only added and never deleted, ensuring that historical faults remain available for continued monitoring.
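The two rules above, choosing the smallest replaceable unit as the monitored object and letting the library only grow, can be sketched in Python as follows. The `device/port` naming scheme, the `ports_replaceable` flag, and the class and function names are hypothetical. The returned flags indicate whether an object or fault type is new to the library, which is the case the method treats as initial fault information.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


def min_granularity_id(device: str, port: str, ports_replaceable: bool) -> str:
    """Return the smallest field-replaceable unit: the port if it can be swapped, else the device."""
    return f"{device}/{port}" if ports_replaceable else device


@dataclass
class FaultMetadataLibrary:
    objects: Set[str] = field(default_factory=set)
    fault_types: Set[str] = field(default_factory=set)

    def register(self, obj: str, fault_type: str) -> Tuple[bool, bool]:
        """Record an (object, fault type) pair; entries are added but never removed.

        Returns (object_is_new, fault_type_is_new).
        """
        new_obj, new_fault = obj not in self.objects, fault_type not in self.fault_types
        self.objects.add(obj)
        self.fault_types.add(fault_type)
        return new_obj, new_fault


library = FaultMetadataLibrary()
first = library.register(min_granularity_id("Switcher100", "port3", True), "port down")
second = library.register(min_granularity_id("Switcher200", "port1", False), "port down")
```

The class deliberately exposes no delete operation, mirroring the add-only rule.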
Step 102: according to the fault metadata library, establish a set of conduction chains of all monitored objects for the different fault types within a predetermined time window at different time points.
Specifically:
First, obtain the conduction chain of the current monitored object for the current fault type within the predetermined time window at the current time point.
Second, if historical fault information exists before the current fault information is obtained, establish the set of conduction chains of the current monitored object for the current fault type within the predetermined time window at different time points according to the historical fault information; if no historical fault information exists, proceed to step 103.
Preferably, a conduction chain is defined as the sequence of object faults that a given object fault affects after it occurs.
Step 103: analyze the correlation between the conduction chains in the established set to obtain the fault-object conduction chains of all monitored objects for the different fault types.
Specifically:
If historical fault information exists before the current fault information is obtained, obtain the number of times each fault occurs on each monitored object in the conduction chain set, calculate the ratio of that number to the total number of faults across all monitored objects, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain; or
If no historical fault information exists before the current fault information is obtained, obtain the number of times each monitored object in the current conduction chain set fails, calculate the ratio of that number to the total number of failures across all monitored objects in the current conduction chain, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain.
Preferably, the method further includes:
Obtaining, from the fault-object conduction chains, the fault conduction chains spanning different monitored objects, and locating the fault object and fault type according to the fault conduction chains; or,
Obtaining, from the fault-object conduction chains, the object conduction chains spanning different fault types, and locating the fault object and fault type according to the object conduction chains.
The initially reported current fault information contains basic information such as the monitored object, fault type, and time; it serves as the basic input for judging correlation and comes from the network element objects of the monitored objects. If the initial historical data is empty, every correlation is provisionally set to 100% strong correlation, but because the count is only 1, its confidence and priority are lowered. As historical data accumulates, the computed correlations become increasingly reliable.
First, the predetermined threshold can be adjusted in practical applications.
Second, a fault-object conduction chain is defined as the set of strongly correlated object faults affected by a fault type of a monitored object.
Third, a fault conduction chain is defined as a finite set of strongly correlated faults: when this fault occurs, it readily causes the other fault types on the chain (possibly on different objects).
Finally, an object conduction chain is defined as a finite set of strongly correlated objects: for this object, any fault that occurs readily affects the other objects on the chain (possibly with different faults).
Step 104: locate the fault object and fault type according to the obtained fault-object conduction chains.
When the network management system monitors the monitored objects and fault types across the whole network, the above method abandons the existing statistics-based analysis. Instead, it works on real-time, dynamic fault information, identifies strong correlations in the spatio-temporal distribution of monitored objects and fault types in the network, and judges strong correlation between fault objects with reference to the correlations of object chains in historical fault information (including but not limited to monitored objects, connections, fault times, and fault types).
The invention judges strong correlation without insisting on logical causality, which tolerates the uncertainty introduced by possible changes; depending on the maintenance team's capacity, it sets processing priority according to the strength of the correlation, achieving fault location by more flexible means.
Fig. 2 is a detailed flowchart of the method for implementing fault location according to the present invention; the method comprises the following steps:
Step 201: obtain current fault information, comprising basic information such as the monitored object, fault type, and time.
Step 202: establish a fault metadata library according to historical data; the library contains the minimum-granularity monitored objects and the fault categories.
Specifically:
Without prior knowledge, identify the minimum-granularity monitored objects $O_n$ and fault types $F_m$ from the existing fault information across the whole network, and establish a basic fault metadata library from $O_n$ and $F_m$.
The fault metadata library keeps growing as the network of monitored objects expands and the set of fault types grows richer.
The initially reported current fault information contains basic information such as the monitored object, fault type, and time; it serves as the basic input for judging correlation and comes from the network element objects of the monitored objects. If the initial historical data is empty, every correlation is provisionally set to 100% strong correlation, but because the count is only 1, its confidence and priority are lowered. As historical data accumulates, the computed correlations become increasingly reliable.
A newly added or changed fault type cannot be found in the fault metadata library, so it is treated as initial fault information for the strong-correlation calculation; likewise, a newly added monitored object, or one whose identifier has changed, cannot be found in the library and is treated as initial fault information for the strong-correlation calculation.
For a monitored object whose identifier has changed, the correlations eventually computed will match those of the original object.
Step 203: obtain the set of conduction chains $L_{ij0}$ within the time window at the current time point $T_0$.
Specifically: for the current fault of the current monitored object, obtain the set of conduction chains $L_{ij0}$ within the predetermined time window $T_0$ at the current time point.
Here, the conduction chain set $L_{ij0}$ consists of the monitored objects and their fault types that occur, in time order, within the conduction time after a given fault occurs.
Step 204: determine whether historical data exists; if so, proceed to step 205; if not, proceed to step 206.
Step 205: establish the set of conduction chains $L_{ijk}$ at time points $T_k$ according to the historical data.
Specifically:
First, establish, according to the historical fault information, the set of conduction chains of the current monitored object for the current fault type within the predetermined time window at different time points.
Then, analyze fault type $F_j$ of each monitored object $O_i$ and establish its conduction chain set at time point $T_k$.
Here, the conduction chain $L_{ijk}$ denotes the time-ordered set of object faults that occur within $T_0$ after the time point $T_k$ at which fault type $F_j$ of object $O_i$ occurs.
For example, suppose a low-output-voltage fault $F_j$ of generator $O_i$ occurs at 20:03 one evening. The time-ordered set of all fault objects that occur within the following $T_0$ can be regarded as the nodes of this fault object's conduction chain at that time point. $T_0$ is an empirical value, generally 3 or 5 minutes.
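Building $L_{ijk}$ amounts to a sliding-window scan of the time-ordered fault log. The Python sketch below assumes the window is half-open, $(T_k, T_k + T_0]$, and uses a hypothetical generator `G1`, made-up follow-on faults, and a made-up date to mirror the 20:03 example; it illustrates the idea and is not the patented implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str, str]  # (time, monitored object, fault type)


def conduction_chains(events: List[Event], obj: str, fault: str,
                      window: timedelta) -> Dict[datetime, List[Tuple[str, str]]]:
    """Map each occurrence time T_k of (obj, fault) to L_ijk: the time-ordered
    (object, fault type) pairs that occur in (T_k, T_k + window]."""
    events = sorted(events)
    chains = {}
    for t_k, o, f in events:
        if (o, f) != (obj, fault):
            continue
        chains[t_k] = [(o2, f2) for t2, o2, f2 in events if t_k < t2 <= t_k + window]
    return chains


day = datetime(2013, 12, 1)  # hypothetical date
events = [
    (day.replace(hour=20, minute=3), "G1", "low output voltage"),
    (day.replace(hour=20, minute=4), "R1", "undervoltage"),
    (day.replace(hour=20, minute=5, second=30), "B1", "battery discharge"),
    (day.replace(hour=20, minute=9), "R2", "undervoltage"),
]
chains = conduction_chains(events, "G1", "low output voltage", timedelta(minutes=3))
```

With $T_0$ = 3 minutes, the 20:09 alarm falls outside the window and is not part of the chain.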
Step 206: analyze the strong correlation between the conduction chains to obtain the fault-object conduction chains $L_{ij}$ of all monitored objects for all fault types.
The correlation between the conduction chains is judged as follows:
If historical fault information exists before the current fault information is obtained, obtain the number of times each fault occurs on each monitored object in the conduction chain set, calculate the ratio of that number to the total number of faults across all monitored objects, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain; or
If no historical fault information exists before the current fault information is obtained, obtain the number of times each fault occurs on each monitored object in the current conduction chain set, calculate the ratio of that number to the total number of faults across all monitored objects in the current conduction chain, and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault-object conduction chain.
The predetermined threshold can be adjusted in practical applications.
For example, suppose fault type $F_j$ of monitored object $O_i$ occurs, and the set of all fault objects within its conduction time $T_0$ is $L_{ijk} = F(O_i, F_j, T_k)$. Analyzing the historical data shows that fault type $F_j$ of monitored object $O_i$ has occurred $K-1$ times before, giving $K$ fault conduction chains in total.
Next, the $K$ fault conduction chains contain $M_K$ fault objects in total. Count the number of times each of these monitored objects $O_m$ appears in the $K-1$ historical conduction chain sets:
$$C_m = \sum_{k=1}^{K-1} \mathrm{Count}(L_{ijk}, O_m)$$
This gives the occurrence counts of the $M_K$ monitored objects. For normalization, each count can be converted to a frequency of occurrence, that is, the occurrence count expressed as a percentage of the total.
Finally, a fault object with a frequency of 100% has the highest correlation and stands in a strong causal relationship. However, because fault-object chains change as the network changes in a real production environment, an empirical frequency of 90% or more can be used instead, or the priority order of fault objects can be determined by ranking their frequencies from high to low. The fault-object conduction chain $L_{ij}$ is defined as the set of strongly correlated object faults affected by fault type $F_j$ of object $O_i$.
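The text says the counts are normalized into a frequency, where 100% marks the strongest correlation. One reading consistent with that is $f_m = C_m / (K-1)$, the fraction of historical chains in which $O_m$ appears, counting each object at most once per chain. The Python sketch below makes that assumption, keeps objects at or above the empirical 90% level, and returns them in descending frequency so the list also serves as a priority order; the function name and sample data are hypothetical.

```python
from typing import List, Set, Tuple


def rank_by_frequency(history: List[Set[str]], current: Set[str],
                      threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Rank the objects of the K chains by the fraction of the K-1 historical
    chains they appear in; keep those at or above `threshold`."""
    if not history:
        raise ValueError("at least one historical conduction chain is required")
    candidates = set(current).union(*history)  # the M_K objects across all K chains
    freq = {o: sum(o in chain for chain in history) / len(history) for o in candidates}
    kept = [(o, f) for o, f in freq.items() if f >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))  # highest frequency first


history = [{"S1", "X9"}, {"S1"}, {"S1", "Y2"}]  # K-1 = 3 earlier chains
current = {"S1", "Z3"}                          # the K-th (current) chain
```

Here `S1` appears in every historical chain (frequency 1.0), while `X9` and `Y2` reach only one third; lowering the threshold to 0.3 would keep them as lower-priority candidates.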
For example, a complex communication network includes network subsystems such as a wireless base station network, a backbone transmission network, an IT monitoring network, and a power and environment monitoring network. Simplifying the network model, suppose the network has three monitored nodes: power supply $P_1$, transmission $T_1$, and base station $S_1$. The three objects are causally related: after a power interruption, the transmission equipment loses power and the base station also stops providing service; when the power supply is normal, a transmission interruption still prevents the base station from providing service. That is, $P_1 \to (T_1 \to S_1)$.
When a transmission $T_1$ interruption occurs, many faults are reported within the following $T_0$ period: the base station $S_1$ interruption appears after it in the time series, along with other faults occurring near the same time point. Correlating against the conduction chains in the historical data will show that $(T_1 \to S_1)$ has a very high occurrence frequency, ideally 100% co-occurrence, while other faults that occur at random have a much lower correlation.
Likewise, after a power-loss fault of power supply $P_1$, $T_1$ and $S_1$ on its conduction chain also appear later in the time series, with very high correlation; $(P_1 \to T_1)$ and $(P_1 \to S_1)$ are conduction chains of $P_1$, and $P_1 \to (T_1 \to S_1)$ is a larger conduction chain.
However, when network expansion or maintenance changes mean that transmission $T_1$ no longer connects base station $S_1$ but $S_2$, the $(T_1 \to S_1)$ relation stops occurring and $(T_1 \to S_2)$ is a new conduction relation. Because there is no historical data when this relation first appears, a single occurrence is treated as a strong association (initially, anything that occurs once is treated as a 100% strong association, but with lower priority); $(P_1 \to T_1)$ and $(P_1 \to S_2)$ are then conduction chains of power supply $P_1$. Once the relation occurs a second time or more, its priority rises.
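The effect of such a topology change on correlation and priority can be simulated with a small tracker. In the Python sketch below, which is an assumption-laden illustration rather than the patent's algorithm, a relation's correlation is measured only over the trigger occurrences since that relation was first seen (so a brand-new relation starts at 100%), and relations observed fewer than `min_support` times are ranked below confirmed ones, matching the lowered priority described above. The class name and the `min_support` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class RelationTracker:
    """Track trigger -> follower relations across successive conduction chains."""

    def __init__(self, min_support: int = 2):
        self.min_support = min_support  # observations before a relation counts as confirmed
        self.trigger_count: Dict[str, int] = defaultdict(int)
        self.first_seen: Dict[Tuple[str, str], int] = {}  # trigger count when the relation first appeared
        self.support: Dict[Tuple[str, str], int] = defaultdict(int)

    def observe(self, trigger: str, followers: Set[str]) -> None:
        self.trigger_count[trigger] += 1
        for f in followers:
            key = (trigger, f)
            self.first_seen.setdefault(key, self.trigger_count[trigger])
            self.support[key] += 1

    def ranked(self, trigger: str) -> List[Tuple[str, float, int]]:
        """Return (follower, correlation, support), confirmed relations first."""
        rows = []
        for (t, f), count in self.support.items():
            if t != trigger:
                continue
            seen_over = self.trigger_count[t] - self.first_seen[(t, f)] + 1
            rows.append((f, count / seen_over, count))
        return sorted(rows, key=lambda r: (r[2] < self.min_support, -r[1], -r[2], r[0]))


tracker = RelationTracker()
for _ in range(3):
    tracker.observe("T1", {"S1"})  # before the change: T1 feeds S1
tracker.observe("T1", {"S2"})      # after the change: T1 now feeds S2
```

At this point `tracker.ranked("T1")` lists S1 (correlation 0.75, seen 3 times) ahead of S2 (correlation 1.0, seen once). After one more `tracker.observe("T1", {"S2"})`, S2 (1.0, seen twice) moves ahead of S1 (0.6).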
Step 207: according to the fault object conduction chain L_ij, find the root fault on the chain and thereby locate the monitored object and the fault type.
The method can generate a strongly connected spanning tree based on monitored objects and fault types. After a fault occurs, all alarms monitored along the timeline can be presented automatically as strongly connected structures according to the object conduction chains L_ij. This presentation helps users analyze and locate faults, makes it easier to dispatch a single trouble ticket for a whole class of site problems, and, combined with historical data, makes troubleshooting more convenient and efficient.
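Finding the root of each chain among the currently active alarms, and grouping the alarms it explains, can be sketched as a small graph search. Assumptions: each node is a hypothetical `object:fault_type` string, `chain` holds the learned conduction relations, and the relations among the current alarms are acyclic (a cycle would leave no root). Each returned group corresponds to one candidate trouble ticket.

```python
from typing import Dict, List, Set


def locate_roots(chain: Dict[str, List[str]], alarmed: Set[str]) -> Dict[str, List[str]]:
    """Return {root fault: alarmed faults it explains} for the current alarms.

    A root is an alarmed node that no other alarmed node conducts to."""
    has_parent = {c for n in alarmed for c in chain.get(n, [])
                  if c in alarmed and c != n}
    groups: Dict[str, List[str]] = {}
    for root in sorted(alarmed - has_parent):
        reached: Set[str] = set()
        stack = [root]
        while stack:  # walk the chain downstream, staying inside current alarms
            for nxt in chain.get(stack.pop(), []):
                if nxt in alarmed and nxt != root and nxt not in reached:
                    reached.add(nxt)
                    stack.append(nxt)
        groups[root] = sorted(reached)
    return groups


chain = {"P1:power_loss": ["T1:outage", "S1:interrupt"],
         "T1:outage": ["S1:interrupt"]}
alarmed = {"P1:power_loss", "T1:outage", "S1:interrupt", "X9:cpu_high"}
print(locate_roots(chain, alarmed))
# {'P1:power_loss': ['S1:interrupt', 'T1:outage'], 'X9:cpu_high': []}
```

Here the power loss is reported as the root of one ticket covering the transmission and base-station alarms, while the unrelated CPU alarm stands alone.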
Step 208: on the basis of step 206, the method may further comprise:
According to the fault object conduction chain L_ij, obtain the object conduction chains L_i for different fault types, and locate the faulty object and fault type according to the object conduction chain L_i, where:
The object conduction chain L_i is defined as the finite set of objects strongly correlated with object O_i; that is, whenever any fault occurs on this object, it readily affects the other objects on the chain, possibly with different faults.
Method for determining the object conduction chain L_i:
An object O_i may be detected with multiple fault types. For each fault type F_j, a conduction chain L_ij (j = 1, ..., m) can be calculated; each conduction chain contains the affected monitored objects and the faults detected on them. Across the sets of object faults in these conduction chains, the correlation between the chains is judged by calculating how frequently each object fault appears in the sets, using the same determination method as above.
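Merging the per-fault-type chains L_ij of one object into its object conduction chain L_i can be sketched as a frequency count over all observation windows, whatever fault type triggered them. The input layout, the function name `object_chain`, the per-window normalization, and the threshold value are assumptions for illustration; the sample data anticipates the chassis example that follows, with boards B1 to B3 in chassis C1.

```python
from collections import Counter
from typing import Dict, List, Set


def object_chain(windows_by_fault: Dict[str, List[Set[str]]],
                 threshold: float) -> List[str]:
    """Merge the chains L_ij of one object O_i (one entry per fault type F_j)
    into its object conduction chain L_i.

    windows_by_fault maps each fault type detected on O_i to the sets of
    objects affected in each observed window. An object is kept when it
    appears in more than `threshold` of all windows, whatever the fault type."""
    counts: Counter = Counter()
    total = 0
    for windows in windows_by_fault.values():
        for affected in windows:
            counts.update(affected)
            total += 1
    if total == 0:
        return []
    return sorted(obj for obj, c in counts.items() if c / total > threshold)


# Hypothetical history for chassis C1: two fault types, boards B1..B3 inside it.
history_c1 = {
    "comm_fail": [{"B1", "B2", "B3"}, {"B1", "B2", "B3"}],
    "power_dip": [{"B1", "B2", "B3", "X7"}],
}
print(object_chain(history_c1, threshold=0.5))  # ['B1', 'B2', 'B3']
```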
For example, for the boards in a chassis, a serious communication fault detected on the chassis affects the communication capability of every board in it. Such an association has little to do with the fault type; instead, the objects are in a parent-child relationship. It can be discovered and mined through object conduction chains, so that during fault recovery the parent fault node at the root of the conduction chain can be investigated first.
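Grouping strongly correlated objects, such as a chassis and its boards, into larger object bags amounts to finding the connected components of the strong-correlation relation. A union-find structure is one standard way to do this; the choice of union-find, the function name `object_bags`, and the sample pairs are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def object_bags(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group objects linked by strong-correlation pairs into bags, i.e. the
    connected components of the relation, using a small union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two bags
    bags: Dict[str, List[str]] = {}
    for x in list(parent):
        bags.setdefault(find(x), []).append(x)
    return sorted(sorted(bag) for bag in bags.values())


pairs = [("C1", "B1"), ("C1", "B2"), ("B2", "B3"), ("P1", "T1")]
print(object_bags(pairs))  # [['B1', 'B2', 'B3', 'C1'], ['P1', 'T1']]
```

Each resulting bag could then be handed to a single maintenance team, as the next paragraph describes.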
Strongly correlated objects can be expanded and grouped into a large object bag. All faults within one object bag can be assigned to a single on-site maintenance team, and among the strongly correlated faults in the bag, the fault node at the root of the conduction chain can be investigated first. Or:
Step 209: according to the fault object conduction chain L_ij, obtain the fault conduction chains L_j for different monitored objects, and locate the faulty object and fault type according to the fault conduction chain L_j, where:
The fault conduction chain L_j is defined as the finite set of faults strongly correlated with fault F_j; that is, when this fault occurs, it readily causes the other fault types on the chain, possibly on different monitored objects.
Method for determining the fault conduction chain L_j: a fault F_j may be detected on multiple objects. For each fault type F_j, a conduction chain L_ij (i = 1, ..., n) can likewise be obtained for each object O_i on which it occurs; each conduction chain contains the affected objects and the faults detected on them. Across the sets of object faults in these conduction chains, the correlation between the chains is judged by calculating how frequently each object fault appears in the sets, using the same determination method as above.
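The fault conduction chain L_j can be sketched as a frequency count in which only the fault type of each affected entry is counted, so the same downstream fault on different objects reinforces a single entry. The input layout, the function name `fault_chain`, the threshold, and the protocol-layer sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def fault_chain(windows_by_object: Dict[str, List[Set[Tuple[str, str]]]],
                threshold: float) -> List[str]:
    """Merge the chains L_ij of one fault F_j seen on different objects O_i
    into its fault conduction chain L_j.

    Each window holds the (object, fault_type) pairs reported after F_j.
    Only the fault type is counted, so the same downstream fault on
    different objects reinforces one entry."""
    counts: Counter = Counter()
    total = 0
    for windows in windows_by_object.values():
        for affected in windows:
            counts.update({fault for _, fault in affected})
            total += 1
    if total == 0:
        return []
    return sorted(f for f, c in counts.items() if c / total > threshold)


# Hypothetical history of a lower-layer "link_down" fault on two network elements.
link_down = {
    "NE1": [{("NE1", "ip_unreachable"), ("NE1", "bgp_down")},
            {("NE1", "ip_unreachable"), ("NE1", "bgp_down")}],
    "NE2": [{("NE2", "ip_unreachable"), ("NE2", "bgp_down"), ("NE5", "disk_full")}],
}
print(fault_chain(link_down, threshold=0.5))  # ['bgp_down', 'ip_unreachable']
```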
For example, in layered communication through a protocol stack, lower-layer communication often affects upper-layer communication. When protocol stacks at different layers are monitored, a fault in a lower-layer protocol affects the functions of the upper-layer protocols. Such an association has little to do with the objects themselves; rather, the objects have a strong logical association, which can be discovered and mined through fault conduction chains, so that during fault recovery the fault node at the root of the conduction chain can be investigated first.
Fig. 3 is a schematic structural diagram of a fault locating device according to an embodiment of the invention, comprising: a receiving module (30), a fault metadata base establishing module (31), a first establishing module (32), a second establishing module (33), and a locating module (34).
The receiving module is configured to obtain current failure information, which comprises at least the monitored object, the fault type, and time information.
The first establishing module is configured to establish, according to the obtained current failure information, the set of conduction chains of all monitored objects for different fault types within the predetermined time windows at different time points, and to output this set to the second establishing module.
The first establishing module is further configured to judge whether the current failure information exists in the historical failure information. When it does, the module obtains the conduction chain of the monitored object for the current fault type within the predetermined time window of the current time point, establishes, according to the failure information, the set of conduction chains of the current monitored object for the current fault type within the predetermined time windows at different time points, and sends a first notification to the second establishing module.
Preferably, the first establishing module is further configured to send a second notification to the second establishing module when it judges that no historical failure information exists for the obtained current failure information.
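The decision made by the first establishing module can be sketched as follows. The `FailureInfo` record mirrors the minimum fields listed for the receiving module; matching "exists in the historical failure information" on the (object, fault type) pair, the window semantics, the function name `first_establish`, and the string notifications "first" and "second" are assumptions, since the text does not define them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class FailureInfo:
    """Current failure information: monitored object, fault type, time."""
    monitored_object: str
    fault_type: str
    timestamp: float


def first_establish(current: FailureInfo, alarms: List[FailureInfo],
                    window: float) -> Tuple[str, List[FrozenSet[Tuple[str, str]]]]:
    """Return (notification, conduction chain set) for the current failure.

    "first": the (object, fault type) pair has earlier occurrences, so one
    chain per past occurrence plus the current one is returned.
    "second": no history exists, so only the current chain is returned."""
    key = (current.monitored_object, current.fault_type)

    def chain_at(t0: float) -> FrozenSet[Tuple[str, str]]:
        # (object, fault_type) pairs reported within `window` after t0.
        return frozenset((a.monitored_object, a.fault_type) for a in alarms
                         if t0 < a.timestamp <= t0 + window)

    past = [a.timestamp for a in alarms
            if (a.monitored_object, a.fault_type) == key
            and a.timestamp < current.timestamp]
    if past:
        return "first", [chain_at(t) for t in past] + [chain_at(current.timestamp)]
    return "second", [chain_at(current.timestamp)]


alarms = [FailureInfo("T1", "outage", 0), FailureInfo("S1", "interrupt", 5),
          FailureInfo("T1", "outage", 100), FailureInfo("S1", "interrupt", 103)]
print(first_establish(alarms[2], alarms, window=30)[0])  # first
print(first_establish(FailureInfo("P9", "overheat", 200), alarms, 30)[0])  # second
```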
The second establishing module is configured to analyze the correlation between the conduction chains in the set established by the first establishing module, obtain the fault object conduction chains of all monitored objects for all fault types, and output them to the locating module.
Further, the second establishing module is specifically configured to: receive the first notification from the first establishing module; obtain the number of times each monitored object fails in the conduction chain set; calculate the ratio of each monitored object's failure count to the total number of failures of all monitored objects; and take the list of monitored objects whose ratio exceeds a predetermined threshold as the fault object conduction chain.
Preferably, the second establishing module is further configured to: receive the second notification from the first establishing module; obtain the number of times each monitored object fails in the current conduction chain set; calculate the ratio of each monitored object's failure count to the total number of failures of all monitored objects in the current conduction chain; and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault object conduction chain.
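The ratio test applied by the second establishing module can be sketched as below. Read literally, the denominator is the total number of failures of all monitored objects, so the ratios sum to 1 and the threshold has to be chosen with that in mind; the function name `fault_object_chain` and the threshold value are hypothetical. The same function serves both notifications, which differ only in whether the chain set spans the history or just the current window.

```python
from collections import Counter
from typing import Iterable, List, Set


def fault_object_chain(chain_set: Iterable[Set[str]], threshold: float) -> List[str]:
    """Count how often each monitored object fails across the conduction chain
    set and keep those whose share of all failures exceeds `threshold`."""
    counts: Counter = Counter()
    for chain in chain_set:
        counts.update(chain)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(obj for obj, c in counts.items() if c / total > threshold)


# S1 accounts for 3 of 5 failures (0.6); X9 and Y3 for 0.2 each.
print(fault_object_chain([{"S1", "X9"}, {"S1"}, {"S1", "Y3"}], threshold=0.3))  # ['S1']
```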
The locating module is configured to locate the faulty object and the fault type according to the fault object conduction chain received from the second establishing module.
Further, the locating module is also configured to:
obtain, according to the fault object conduction chain, the fault conduction chains for different monitored objects, and locate the faulty object and fault type according to the obtained fault conduction chains of the different monitored objects; or obtain, according to the fault object conduction chain, the object conduction chains for different fault types, and locate the faulty object and fault type according to the object conduction chains of the different fault types.
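One plausible reading of these two alternatives is a pivot of the fault object conduction chain: grouping its (object, fault type) entries per monitored object gives the faults each object shows, and grouping them per fault type gives the objects each fault reaches. This interpretation, the entry layout, and the function name `pivot` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pivot(chain: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split fault object chain entries (object, fault_type) two ways:
    per monitored object (faults it shows) and per fault type (objects it hits)."""
    by_object: Dict[str, List[str]] = defaultdict(list)
    by_fault: Dict[str, List[str]] = defaultdict(list)
    for obj, fault in chain:
        by_object[obj].append(fault)
        by_fault[fault].append(obj)
    return dict(by_object), dict(by_fault)


per_object, per_fault = pivot([("S1", "interrupt"), ("S2", "interrupt"), ("S1", "cell_down")])
print(per_object)  # {'S1': ['interrupt', 'cell_down'], 'S2': ['interrupt']}
print(per_fault)   # {'interrupt': ['S1', 'S2'], 'cell_down': ['S1']}
```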
Finally, the device further comprises a fault metadata base establishing module, configured to establish a fault metadata base according to the obtained failure information and to pass the fault metadata base information to the first processing module.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (12)

1. A method for realizing fault location, characterized by comprising: obtaining current failure information, the current failure information comprising at least a monitored object, a fault type, and time information;
establishing, according to the obtained current failure information, a set of conduction chains of all monitored objects for different fault types within predetermined time windows at different time points;
analyzing the correlation between the conduction chains in the established conduction chain set to obtain fault object conduction chains of all monitored objects for different fault types;
locating the current faulty object and fault type according to the obtained fault object conduction chains.
2. The method according to claim 1, characterized by further comprising, before obtaining the current failure information: establishing a fault metadata base according to obtained historical failure information.
3. The method according to claim 2, characterized in that, before establishing the conduction chain set, the method further comprises: judging whether the current failure information exists in the historical failure information;
when it is judged that the current failure information exists in the historical failure information, establishing the set of conduction chains of all monitored objects for different fault types within the predetermined time windows at different time points comprises:
obtaining the conduction chain of the monitored object for the current fault type within the predetermined time window of the current time point;
establishing, according to the historical failure information, the set of conduction chains of the current monitored object for the current fault type within the predetermined time windows at different time points.
4. The method according to claim 3, characterized in that analyzing the correlation between the conduction chains in the conduction chain set to obtain the fault object conduction chains of all monitored objects for different fault types comprises:
obtaining, for each monitored object in the conduction chain set, the number of times each kind of fault occurs; calculating the ratio of the number of times each monitored object has the fault to the total number of times all monitored objects have the fault; and taking the list of monitored objects whose ratio exceeds a predetermined threshold as the fault object conduction chain.
5. The method according to claim 1 or 2, characterized in that, when it is judged that the current failure information does not exist in the historical failure information:
analyzing the correlation between the conduction chains in the conduction chain set to obtain the fault object conduction chains of all monitored objects for different fault types comprises:
obtaining, for each monitored object in the current conduction chain set, the number of times each kind of fault occurs; calculating the ratio of the number of times each monitored object has the fault to the total number of failures of all monitored objects in the current conduction chain; and taking the list of monitored objects whose ratio exceeds the predetermined threshold as the fault object conduction chain.
6. The method according to claim 1 or 2, characterized in that, after obtaining the fault object conduction chains of all monitored objects for different fault types, the method further comprises:
obtaining, according to the fault object conduction chains, fault conduction chains for different monitored objects, and locating the faulty object and fault type according to the fault conduction chains of the different monitored objects; or
obtaining, according to the fault object conduction chains, object conduction chains for different fault types, and locating the faulty object and fault type according to the object conduction chains of the different fault types.
7. A device for realizing fault location, characterized by comprising:
a receiving module, configured to obtain current failure information, the current failure information comprising at least a monitored object, a fault type, and time information;
a first establishing module, configured to establish, according to the obtained current failure information, a set of conduction chains of all monitored objects for different fault types within predetermined time windows at different time points, and to output the set to a second establishing module;
the second establishing module, configured to analyze the correlation between the conduction chains in the set established by the first establishing module, obtain fault object conduction chains of all monitored objects for all fault types, and output them to a locating module;
the locating module, configured to locate the faulty object and fault type according to the fault object conduction chains from the second establishing module.
8. The device according to claim 7, characterized in that the device further comprises a fault metadata base establishing module, configured to establish a fault metadata base according to the obtained failure information and to pass the fault metadata base information to the first processing module.
9. The device according to claim 8, characterized in that the first establishing module is further configured to judge whether the current failure information exists in the historical failure information;
and, when it is judged that the current failure information exists in the historical failure information, to obtain the conduction chain of the monitored object for the current fault type within the predetermined time window of the current time point, to establish, according to the failure information, the set of conduction chains of the current monitored object for the current fault type within the predetermined time windows at different time points, and to send a first notification to the second establishing module.
10. The device according to claim 9, characterized in that the second establishing module is specifically configured to:
receive the first notification from the first establishing module; obtain the number of times each monitored object in the conduction chain set fails; calculate the ratio of each monitored object's failure count to the total number of failures of all monitored objects; and take the list of monitored objects whose ratio exceeds a predetermined threshold as the fault object conduction chain.
11. The device according to claim 7 or 8, characterized in that the first establishing module is further configured to send a second notification to the second establishing module when it judges that no historical failure information exists for the obtained current failure information;
and the second establishing module is further configured to receive the second notification from the first establishing module; obtain the number of times each monitored object in the current conduction chain set fails; calculate the ratio of each monitored object's failure count to the total number of failures of all monitored objects in the current conduction chain; and take the list of monitored objects whose ratio exceeds the predetermined threshold as the fault object conduction chain.
12. The device according to claim 7 or 8, characterized in that the locating module is further configured to:
obtain, according to the fault object conduction chains, fault conduction chains for different monitored objects, and locate the faulty object and fault type according to the obtained fault conduction chains of the different monitored objects;
or obtain, according to the fault object conduction chains, object conduction chains for different fault types, and locate the faulty object and fault type according to the object conduction chains of the different fault types.
CN201310711392.8A 2013-12-20 2013-12-20 Method and device for positioning failures Withdrawn CN104734871A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201310711392.8A CN104734871A (en) 2013-12-20 2013-12-20 Method and device for positioning failures
CN201480057055.4A CN105659528B (en) 2013-12-20 2014-09-24 Method and device for realizing fault location
PCT/CN2014/087332 WO2015090098A1 (en) 2013-12-20 2014-09-24 Method and apparatus for realizing fault location

Publications (1)

Publication Number Publication Date
CN104734871A true CN104734871A (en) 2015-06-24

Family

ID=53402074

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201310711392.8A Withdrawn CN104734871A (en) 2013-12-20 2013-12-20 Method and device for positioning failures
CN201480057055.4A Active CN105659528B (en) 2013-12-20 2014-09-24 Method and device for realizing fault location

Country Status (2)

Country Link
CN (2) CN104734871A (en)
WO (1) WO2015090098A1 (en)

Also Published As

Publication number Publication date
WO2015090098A1 (en) 2015-06-25
CN105659528B (en) 2019-10-08
CN105659528A (en) 2016-06-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20150624