CN101345656A - Global fault rate measuring method - Google Patents


Publication number
CN101345656A
Authority
CN
China
Prior art keywords
alarm
equipment
fault
measuring method
rate measuring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101420796A
Other languages
Chinese (zh)
Other versions
CN101345656B (en)
Inventor
杨安印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN2008101420796A
Publication of CN101345656A
Application granted
Publication of CN101345656B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A global fault rate measuring method comprises: defining alarm conditions for each device; an alarm processing center first collecting the alarm conditions and then obtaining the parameter values of each device, and generating an alarm when a parameter value of a device satisfies the alarm conditions; judging a device to be a faulty device when its alarm duration is greater than or equal to the time specified by a preset fault judgment standard, and repeating this step for each device to obtain the total number of faulty devices; and the alarm processing center obtaining the total number of devices and computing the global fault rate. The method judges the faults that users care about more accurately and provides an evaluation of the overall performance of the network environment.

Description

Global fault rate measuring method
Technical field
The present invention relates to the field of communication networks and, more particularly, to a global fault rate measuring method for a communication network.
Background technology
Monitoring the operating condition of a communication network gives device managers and user maintenance personnel timely and effective analysis data. It therefore helps the relevant personnel analyze the overall condition of devices and the network, locate performance bottlenecks, take targeted measures in time, and improve device and network service quality; it can also provide a relatively accurate prediction of device service quality over a coming period. Device fault rate analysis is one of many such analysis means. In a broad sense, the device fault rate is the probability that a device (or component) that has been working up to a certain moment fails within a unit of time after that moment; the larger the probability, the more likely a fault is. In a narrow sense, definitions of the device fault rate differ, mainly in what counts as a fault and in the strategy used to discover faults, and different strategies affect the accuracy and the convergence speed of the fault rate. Fault rate data gives users and user maintenance personnel an important reference for understanding network device faults and future operating trends, and can help locate the position of a fault in the network or in a device. When the fault rate is too high, users and user maintenance personnel can take measures to repair devices and the network, improve service quality, and reduce potential losses.
At present, network management technologies generally compute device fault rate statistics, which can effectively monitor the physical faults of individual devices.
However, existing fault rate computing techniques have the following problems:
1. Fault statistics cover only physical faults of the device itself and do not consider faults caused by the overall environment, such as software faults.
2. There is no early warning of potential faults: a fault is noticed only when it occurs, and there is no advance prediction.
3. Fault rate judgment is confined to individual devices; there is no concept of a global fault rate across devices, so an understanding of the overall network fault situation is lacking.
4. The fault rate calculation is coupled to other strategies, for example to the general alarms provided by the system. As a result, filtering out or adding alarms when judging faulty devices affects how other modules normally process alarms, so flexibility is limited and independence is poor.
5. Fault customization is weak, and the user's main points of concern are not sufficiently emphasized. Because the fault definition cannot be changed flexibly, many events the user does not care about affect the final fault rate to some extent, while the events the user does care about have less influence on the final result. The fault rate is therefore less useful, cannot focus on the problems the user cares about, and may lead users and user maintenance personnel to misjudge.
Summary of the invention
The technical problem to be solved by the present invention is to provide a global fault rate measuring method that judges the fault conditions the user cares about more accurately and gives an evaluation of the overall performance of the network environment.
A global fault rate measuring method comprises the following steps:
1.1 Define the alarm conditions of each device;
1.2 The alarm processing center first collects the alarm conditions and then obtains the parameter values of each device; when a parameter value of a device meets the alarm conditions, an alarm is generated;
1.3 When the alarm duration of a device is greater than or equal to the time specified by a preset fault judgment standard, the device is judged to be a faulty device; this step is repeated for each device to obtain the total number of faulty devices;
1.4 The alarm processing center obtains the total number of devices and calculates the global fault rate.
In this technical solution, the alarm conditions comprise an alarm type and a level type. In step 1.3, a device is judged to be a faulty device when the duration of its alarms of a given level type is greater than or equal to the time specified for that level type by the preset fault judgment standard.
In step 1.2 of this technical solution, the alarm processing center stores alarm information in an alarm record database; in step 1.3, the preset fault judgment standard is stored in the alarm record database.
In this technical solution, the alarm type may be temperature, and the level type may be set according to different temperature values.
In step 1.3 of this technical solution, if multiple alarms of a device overlap in time, the overlapping alarm time is counted only once.
In step 1.2 of this technical solution, the alarm processing center obtains the parameter values of each device by polling or by active reporting from the device.
This technical solution further comprises a fault trend judgment step after step 1.4: multiple fault judgment standards are preset and ordered according to the level types of the alarm conditions, and the judgment results obtained under the different fault judgment standards are compared and analyzed to predict the fault trend of each device.
The method of the present invention sets alarm conditions for devices according to the events the user cares about, gives a more accurate fault rate, helps the user locate fault points, and at the same time gives a comprehensive evaluation of faults across the devices of the whole network. When calculating the fault rate, the user can freely add or remove fault events through the alarm condition settings, and can independently assign different fault judgment standards to different devices. A fault judgment standard covers not only event alarms but can also include a filter applied on top of the alarms; for example, the alarm duration can be added as a filtering condition for fault judgment. In addition, the method predicts future network operating conditions well.
The features and advantages of the present invention will become clearer after reading the detailed description of the embodiments in conjunction with the accompanying drawings.
Description of drawings
Fig. 1 is a flow chart of an embodiment of the present invention;
Fig. 2 is a flow chart of alarm generation in an embodiment of the present invention;
Fig. 3 is a flow chart for judging whether a device is a faulty device in an embodiment of the present invention; and
Fig. 4 is a flow chart of the fault rate calculation in an embodiment of the present invention.
Embodiment
Fig. 1 shows a flow chart of an embodiment of the present invention. A global fault rate measuring method comprises the following steps: defining the alarm conditions of each device; the alarm processing center first collecting the alarm conditions and then obtaining the parameter values of each device, and generating an alarm when a parameter value of a device meets the alarm conditions; judging a device to be a faulty device when its alarm duration is greater than or equal to the time specified by a preset fault judgment standard, and repeating this step for each device to obtain the total number of faulty devices; and the alarm processing center obtaining the total number of devices and calculating the global fault rate, where global fault rate = total number of faulty devices / total number of devices. The alarm conditions comprise an alarm type, a level type, and an alarm position.
The present embodiment is described in further detail below with reference to Fig. 2, Fig. 3, and Fig. 4.
Fig. 2 shows the alarm generation flow chart of the present embodiment.
201 Define alarm conditions, that is, define the events the user cares about as alarms. In the present embodiment the alarm type of the alarm conditions is temperature. This is only one embodiment of the invention; depending on actual needs, the alarm type may also be the network packet loss rate, the network traffic volume, and so on, without affecting the protection scope of the present invention. The level types of the alarm conditions are defined as: slight alarm, common alarm, significant alarm, and high severity alarm. When the device temperature is between 40° and 50°, a slight alarm is generated; between 50° and 70°, a common alarm; between 70° and 80°, a significant alarm; and above 80°, a high severity alarm.
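The temperature level mapping of step 201 can be sketched as a small lookup table. This is a minimal illustration, not the patent's implementation: the names `LEVEL_LOWER_BOUNDS` and `classify_temperature` are hypothetical, and because the text does not say which level a boundary value such as exactly 50° belongs to, the sketch assumes each interval includes its lower bound.

```python
from typing import List, Optional, Tuple

# (lower bound, level type), checked from the most severe level downwards.
# Thresholds come from the embodiment; treating each interval as including
# its lower bound is an assumption, since the text leaves boundaries open.
LEVEL_LOWER_BOUNDS: List[Tuple[float, str]] = [
    (80.0, "high_severity"),
    (70.0, "significant"),
    (50.0, "common"),
    (40.0, "slight"),
]


def classify_temperature(temp: float) -> Optional[str]:
    """Return the level type for a temperature, or None if no alarm applies."""
    for lower_bound, level in LEVEL_LOWER_BOUNDS:
        if temp >= lower_bound:
            return level
    return None
```

Other alarm types mentioned in the text, such as packet loss rate, would only need a different table of thresholds.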
202 Register the alarm conditions of each device with the alarm processing center, so that the center holds the alarm conditions of every device. The alarm processing center can obtain each device's parameter values by polling or by active reporting from the device. If a parameter value of a device reaches the alarm conditions, an alarm is generated and stored in the alarm record database described in step 204. Taking temperature as an example, when the alarm processing center obtains a temperature value of 100° for a device, it generates a high severity alarm and stores it in the alarm record database. The stored alarm information includes the alarm start time, the level type, and so on; at this point the alarm clearing time is empty, and the alarm is called a current alarm. When the parameter value no longer satisfies the alarm conditions, the alarm must be cleared. For example, suppose the alarm record database already holds a record that this device is in a high severity alarm, and the temperature value now obtained for the device is 60°. The high severity alarm is then cleared: its end time is set to the current time, and it becomes a history alarm. At the same time a common alarm is generated, with its start time recorded as the current time. If the current parameter value falls in the same interval as the previous one, that is, the nature of the event has not changed, nothing is done.
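The alarm lifecycle of step 202 (generate a current alarm, clear it into a history alarm when the level changes, do nothing when the level is unchanged) can be sketched as below. This is a hedged illustration under stated assumptions: `AlarmRecord`, `AlarmProcessingCenter`, `on_reading`, and `temperature_level` are hypothetical names, an in-memory list stands in for the alarm record database, times are plain numbers, and the interval boundaries are assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AlarmRecord:
    device_id: str
    level: str
    start_time: float
    end_time: Optional[float] = None  # None: still a current alarm


def temperature_level(temp: float) -> Optional[str]:
    """Map a temperature to a level type (interval boundaries are assumed)."""
    if temp >= 80:
        return "high_severity"
    if temp >= 70:
        return "significant"
    if temp >= 50:
        return "common"
    if temp >= 40:
        return "slight"
    return None


class AlarmProcessingCenter:
    def __init__(self, classify: Callable[[float], Optional[str]]) -> None:
        self.classify = classify
        self.records: List[AlarmRecord] = []      # stand-in for the database
        self.current: Dict[str, AlarmRecord] = {}  # current alarm per device

    def on_reading(self, device_id: str, value: float, now: float) -> None:
        """Handle one polled or reported parameter value."""
        new_level = self.classify(value)
        active = self.current.get(device_id)
        if active is not None and active.level == new_level:
            return  # same interval as last time: nothing to do
        if active is not None:
            active.end_time = now  # clear: current alarm becomes history
            del self.current[device_id]
        if new_level is not None:
            record = AlarmRecord(device_id, new_level, now)
            self.records.append(record)
            self.current[device_id] = record
```

With readings of 100°, then 95°, then 60° for one device, the sketch stores a high severity alarm, ignores the second reading, and on the third closes the high severity alarm and opens a common alarm at the same timestamp.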
The fault judgement standard of 203 each equipment of user definition.For an equipment, can be separately to failure criterion of its definition, as, when alarm duration of this equipment more than or equal to default fault judgement during the time, this equipment is faulty equipment.
204 alarm and the fault judgement standards that equipment is produced are stored in the described record alert database that is used for failure rate respectively.
As shown in Figure 3, it is whether the judgment device of present embodiment is the flow chart of faulty equipment.
301 obtain the alarm at the different stage type of a certain equipment, put into list of devices list.At first from the record alert database shown in the step 204, fetch all alarms that meet the default fault judgement standard of step 203 at this equipment, because a plurality of incidents that the user may define produce simultaneously or have overlapping in time, therefore, the alarm time of lap is only regarded an alarm as, promptly only calculates once overlapping time.
302 merge alarm.For example, deposit alarm in list, each alarm all has zero-time startTime and concluding time endTime.At first alarm among the list is sorted according to the alarm time of origin, what take place earlier is arranged in the front, certainly, also can come the front to what the back took place and do not influence protection scope of the present invention according to actual needs, farthest merges the alarm among the list then.Specifically, two adjacent alarm C are as follows with the merging method of alarm D:
(1) if the alarm D startTime more than or equal to the alarm C startTime and smaller or equal to the alarm C endTime, the endTime that perhaps alarms D is more than or equal to the startTime of alarm C and smaller or equal to the endTime that alarms C, that is to say, if the startTime of alarm D is between the startTime and endTime of alarm C, the endTime that perhaps alarms D is after the startTime of alarm C and before the endTime of alarm C, obviously, alarm C and alarm D have lap in time, and then these two alarms can be merged into a new alarm E.The zero-time of the new time period of alarm E is the minimum value among the startTime of alarm C and alarm D, and the concluding time of alarm E is the maximum among the endTime of alarm C and alarm D.After merging calls alarm E with alarm D alarm C, promptly C and alarm D are alarmed in deletion from list, and an alarm E added among the list, the next one that continues to merge alarm E according to the method described above then and be right after after alarm E is alarmed, if alarm E is last alarm, then can not remerge alarm.
(2), then alarm C, when alarm D can not merge, at this moment, just continue the alarm that merging is alarmed D and abutted against alarm D back if alarm C and alarm D when not having lap in time.Merge all alarms that can merge among the list according to above-mentioned merging method, till can not remerging.
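The merge procedure of step 302 is the familiar interval-merging technique, sketched below. The function name `merge_alarms` and the representation of an alarm as a `(startTime, endTime)` tuple are assumptions for illustration. Because the list is sorted by start time first, alarm D never starts before alarm C, so the overlap test reduces to checking that D starts no later than C ends. The comparison is inclusive, matching the "greater than or equal" wording, so alarms that merely touch are merged. Current alarms with an empty end time are not handled here; a caller would first have to assign them an end time, for instance the evaluation time, which is also an assumption.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (startTime, endTime)


def merge_alarms(alarms: List[Interval]) -> List[Interval]:
    """Merge overlapping alarm intervals so shared time is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(alarms):  # earliest alarm first
        if merged and start <= merged[-1][1]:
            # Overlaps the previous (possibly already merged) alarm:
            # keep its start, extend its end to the later of the two.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, the alarms (5, 8), (0, 3), (2, 6), and (10, 12) merge into (0, 8) and (10, 12).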
303 Calculate the alarm duration. The values endTime - startTime of all alarms in list are summed to obtain the alarm duration w.
304 Judge whether the alarm duration is greater than or equal to the preset fault judgment time. If it is, the device is a faulty device; otherwise it is not.
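Steps 303 and 304 amount to a sum followed by a threshold comparison, as in the sketch below. The names `alarm_duration` and `is_faulty_device` are hypothetical, and the sketch assumes its input is the already merged, non-overlapping list of `(startTime, endTime)` pairs with times expressed in one common unit.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (startTime, endTime), already merged


def alarm_duration(merged_alarms: List[Interval]) -> float:
    """Step 303: sum endTime - startTime over all merged alarms (w)."""
    return sum(end - start for start, end in merged_alarms)


def is_faulty_device(merged_alarms: List[Interval], fault_time: float) -> bool:
    """Step 304: faulty when w >= the preset fault judgment time."""
    return alarm_duration(merged_alarms) >= fault_time
```

With merged alarms (0, 8) and (10, 12), the duration w is 10, so the device is faulty under a fault judgment time of 10 but not under one of 11.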
Fig. 4 shows the fault rate calculation flow chart of the present embodiment.
401 First obtain all managed devices in the network, place them in a device list, list, and save the total number of devices T.
402 Check whether any devices remain in list. If list is empty, go to step 407 to calculate the global fault rate; otherwise, go to step 403.
403 Determine whether the device is a faulty device.
404 Judge whether the device is a faulty device.
405 If it is a faulty device, the faulty device count F is incremented by 1; the initial value of F is 0.
406 Remove the first element from list, then return to step 402 to judge whether the next device is a faulty device.
407 The global fault rate is calculated as f = F/T, where f is the global fault rate.
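The loop of steps 401 to 407 can be sketched as follows. The function name `global_fault_rate` and the `is_faulty` callback are hypothetical; the callback stands in for the per-device judgment of Fig. 3. The text does not say what happens when no devices are managed, so raising an error in that case is an assumption.

```python
from collections import deque
from typing import Callable, Iterable


def global_fault_rate(devices: Iterable[str],
                      is_faulty: Callable[[str], bool]) -> float:
    """Compute f = F / T over all managed devices."""
    pending = deque(devices)   # step 401: the device list
    total = len(pending)       # T, saved before the list is consumed
    if total == 0:
        # Assumption: the rate is undefined without any managed device.
        raise ValueError("no managed devices: fault rate is undefined")
    faulty = 0                 # F starts at 0
    while pending:             # step 402: devices remaining?
        device = pending.popleft()  # step 406: remove the first element
        if is_faulty(device):  # steps 403-404: judge the device
            faulty += 1        # step 405
    return faulty / total      # step 407
```

For four devices of which two are judged faulty, the result is 0.5.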
In addition, the present embodiment further comprises a fault trend judgment step: multiple fault judgment standards are preset and ordered according to the level types of the alarm conditions, and the judgment results obtained under the different fault judgment standards are compared and analyzed to predict the fault trend of each device. For example, for a device's temperature parameter, if the level types of the alarm conditions set a common alarm at 20-30° and a high severity alarm at 30-40°, two different fault judgment standards can be configured, one for common alarms and the other for high severity alarms. Two fault rates are calculated according to the method described above and ordered by severity level; the fault trend of the device can be seen from the sizes of the two fault rates, which provides a good prediction function.
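One way to read the fault trend step is to compute one fault rate per level type and list the rates in order of severity, as sketched below. The text does not specify how the rates are compared or interpreted, so this only produces the ordered rates. All names (`fault_rates_by_level`, `durations`, `standards`, `level_order`) are hypothetical, and the input is assumed to hold, for each device, the merged alarm duration per level type.

```python
from typing import Dict, List, Tuple


def fault_rates_by_level(
    durations: Dict[str, Dict[str, float]],  # device -> level -> duration
    standards: Dict[str, float],             # level -> minimum fault time
    level_order: List[str],                  # levels, least severe first
) -> List[Tuple[str, float]]:
    """Return (level, fault rate) pairs ordered by severity level."""
    total = len(durations)
    if total == 0:
        raise ValueError("no managed devices: fault rate is undefined")
    rates: List[Tuple[str, float]] = []
    for level in level_order:
        faulty = sum(
            1
            for per_level in durations.values()
            if per_level.get(level, 0.0) >= standards[level]
        )
        rates.append((level, faulty / total))
    return rates
```

A usage sketch with made-up durations: with four devices, a common alarm standard of 20 and a high severity standard of 10, two devices exceed the first and one exceeds the second, giving rates of 0.5 and 0.25.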
In step 201 of the present embodiment, defining the events the user cares about as alarms highlights user-defined faults and excludes other, non-essential factors. For the points of concern regarding device faults, the necessary alarm functions are defined, and alarms are divided into level types of several different severities. When an event of concern occurs, or a measurable value of an event meets the alarm conditions, the corresponding fault rate alarm is generated. An alarm need not be a serious event: any parameter the user cares about, together with a threshold on its value, can be defined as an alarm. This gives the user great flexibility, better reflects the events the user cares about, and excludes interfering events.
When judging whether a device is a faulty device, the user customizes the definition of a faulty device in terms of the alarm severity level and the minimum total alarm time, which provides the standard for judging faulty devices. The alarm processing center only needs to extract the qualifying alarms from the alarm record database and can use the alarm time as the basis for judgment. This adds one more level of filtering, so the judgment is finer-grained and more accurate.
In the present embodiment, the environmental factors of the whole system are taken into account. The user can freely add or remove fault judgment factors and can control whether an individual device is judged to be a faulty device. Faults are judged indirectly through alarms, and secondary filtering of the fault rate is supported: after defining fault alarms, the user can further filter for the alarms of interest according to a user-defined fault threshold, which also makes the calculated result more accurate.
The fault types of a faulty device can be defined flexibly, and non-physical faults can also be taken into account. At the same time, the alarms on which fault judgment relies are independent of other modules, so defining these alarms does not affect how other modules process alarms.
The steps described above are one embodiment of the present invention.
In summary, although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various variations or modifications within the scope of the appended claims; as long as they do not exceed the protection scope described in the claims of the present invention, they all fall within the protection scope of the present invention.

Claims (7)

1. A global fault rate measuring method, characterized by comprising the following steps:
1.1 defining the alarm conditions of each device;
1.2 an alarm processing center first collecting the alarm conditions and then obtaining the parameter values of each device, and generating an alarm when a parameter value of a device meets the alarm conditions;
1.3 judging a device to be a faulty device when its alarm duration is greater than or equal to the time specified by a preset fault judgment standard, and repeating this step for each device to obtain the total number of faulty devices;
1.4 the alarm processing center obtaining the total number of devices and calculating the global fault rate.
2. The fault rate measuring method according to claim 1, characterized in that:
the alarm conditions comprise an alarm type and a level type;
in step 1.3, a device is judged to be a faulty device when the duration of its alarms of a given level type is greater than or equal to the time specified for that level type by the preset fault judgment standard.
3. The fault rate measuring method according to claim 2, characterized in that:
in step 1.2, the alarm processing center stores alarm information in an alarm record database;
in step 1.3, the preset fault judgment standard is stored in the alarm record database.
4. The fault rate measuring method according to claim 3, characterized in that the alarm type is temperature, and the level type is set according to different temperature values.
5. The fault rate measuring method according to any one of claims 1 to 4, characterized in that:
in step 1.3, if multiple alarms of a device overlap in time, the overlapping alarm time is counted only once.
6. The fault rate measuring method according to claim 5, characterized in that:
in step 1.2, the alarm processing center obtains the parameter values of each device by polling or by active reporting from the device.
7. The fault rate measuring method according to claim 5, characterized in that
a fault trend judgment step is further included after step 1.4:
multiple fault judgment standards are preset and ordered according to the level types of the alarm conditions, and the judgment results obtained under the different fault judgment standards are compared and analyzed to predict the fault trend of each device.
CN2008101420796A 2008-08-25 2008-08-25 global fault rate measuring method Expired - Fee Related CN101345656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101420796A CN101345656B (en) 2008-08-25 2008-08-25 global fault rate measuring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101420796A CN101345656B (en) 2008-08-25 2008-08-25 global fault rate measuring method

Publications (2)

Publication Number Publication Date
CN101345656A true CN101345656A (en) 2009-01-14
CN101345656B CN101345656B (en) 2011-12-28

Family

ID=40247551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101420796A Expired - Fee Related CN101345656B (en) 2008-08-25 2008-08-25 global fault rate measuring method

Country Status (1)

Country Link
CN (1) CN101345656B (en)

Also Published As

Publication number Publication date
CN101345656B (en) 2011-12-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111228

Termination date: 20170825
