CN101989931A - Operation alarm processing method and device - Google Patents


Info

Publication number
CN101989931A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010589714
Other languages
Chinese (zh)
Inventor
廖昕
杨涛
陈松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Qinzhi Digital Technology Co Ltd
Original Assignee
Chengdu Qinzhi Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Qinzhi Digital Technology Co Ltd
Priority to CN 201010589714
Publication of CN101989931A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an operation and maintenance (O&M) alarm processing method comprising the following steps: (1) alarm collection: a robot schedules various probes to collect the operational health of IT resources; (2) alarm processing: alarms are generated, their fields are expanded, and invalid alarms are shielded through processes such as abnormality judgment, information expansion, correlation analysis, and storage; and (3) alarm notification: the relevant O&M personnel are notified of alarms by means such as SMS, alarm lamp, mail, and message. The method and its associated device improve the accuracy, validity, and timeliness of O&M alarms.

Description

An O&M alarm processing method and device
Technical field
The present invention relates to the field of IT operation and maintenance (O&M), and in particular to an O&M alarm processing method.
Background art
With the deepening of informatization, IT systems have increasingly become critical infrastructure for core business processing. To keep IT resources such as networks, servers, and databases running normally, they must be maintained. When an abnormality occurs, the system should promptly generate an alarm and notify the O&M personnel, who can then locate and diagnose the abnormality from the alarm and carry out the corresponding maintenance operations. The accuracy, timeliness, and validity of alarms are therefore crucial to the timely discovery, early warning, and resolution of system faults.
Summary of the invention
The invention provides an IT O&M alarm processing method whose main steps are: 1) a collection point collects system running state and performance indicators; 2) the collection point uploads the data to a processing server; 3) the processing server judges, according to predefined rules, whether an abnormal condition exists and, if so, generates an abnormality; 4) correlation analysis is performed on each newly generated abnormality to determine whether a new alarm should be generated; 5) for each newly generated alarm, operations such as sending an SMS, driving an alarm lamp, and sending an instant message are performed.
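The five steps above can be pictured as a small pipeline. The following Python sketch is illustrative only and is not taken from the patent: the names `Sample`, `Rule`, and `process` are hypothetical, and the correlation step is reduced to a simple stand-in (suppress a second abnormality for the same device and indicator) rather than the rule-based analysis described later.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Sample:
    """One collected data point (steps 1 and 2). Field names are assumptions."""
    device: str
    metric: str
    value: float
    timestamp: float


# A rule returns True when the sample is abnormal (step 3).
Rule = Callable[[Sample], bool]


def process(
    samples: Iterable[Sample],
    rules: List[Rule],
    active: Set[Tuple[str, str]],
    notify: Callable[[str], None],
) -> List[str]:
    """Run steps 3 to 5 over uploaded samples and return the new alarm texts."""
    new_alarms = []
    for s in samples:
        # Step 3: judge abnormality against the predefined rules.
        if not any(rule(s) for rule in rules):
            continue
        # Step 4 (simplified stand-in for correlation analysis): an abnormality
        # already covered by an active alarm does not raise a new one.
        key = (s.device, s.metric)
        if key in active:
            continue
        active.add(key)
        # Step 5: hand the new alarm to the notification callback.
        message = f"{s.device}/{s.metric} abnormal: {s.value}"
        notify(message)
        new_alarms.append(message)
    return new_alarms
```

Passing `notify` as a callback keeps the sketch independent of any particular channel (SMS, lamp, or instant message).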
The invention also provides a device built according to the above method, as shown in Fig. 1. The device comprises three parts: a collection unit, an alarm processing unit, and an alarm sending unit. The collection unit is responsible for collecting the state and performance data of the IT infrastructure. The alarm processing unit comprises four subunits, including: abnormality judgment, which analyzes the data according to predefined rules and determines whether an abnormality has occurred; correlation analysis, which analyzes a new abnormality against events already discovered and judges whether the abnormality should trigger a new event; and information expansion: an original alarm may contain only basic information, and after expansion its content is richer, so O&M personnel can understand the alarm more effectively and make the best judgment.
The data collected by the collection unit comprise status data and performance data. The unit supports multiple collection methods, including SNMP, Telnet/SSH, JDBC, and JMX, and covers IT infrastructure such as networks, servers, databases, and middleware.
In IT O&M, automatically judging whether the system is running abnormally is very important. Some faults, such as the system being inaccessible, prevent business from being processed, and users will report them. Other problems are latent: users do not perceive them, but they can be judged from domain knowledge. For example, if the night-time traffic on a certain link is normally below 1M, traffic above 1M may indicate an abnormality. The abnormality judgment unit identifies problems in the running system according to rules. In a rule, each datum collected by the collection unit is called a "value", and each value carries attributes such as the corresponding device, module, indicator, and collection time. A rule is an expression that computes whether a value satisfies a condition; an expression is composed of macros, identifiers, and operators. For each value received, the abnormality judgment unit performs macro substitution and then evaluates the expression; if the result is true, an abnormality has occurred. The flexibility of expressions lets this judgment method adapt to many types of devices, indicators, and scenarios.
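As a rough illustration of rule evaluation by macro substitution, the Python sketch below expands macros in a rule string and then evaluates the result. The `${name}` macro syntax, the attribute names (`metric`, `hour`, `value`), and the function `is_abnormal` are assumptions made for this example; the patent does not specify an expression grammar. Only a small subset of operators (comparisons, `and`, `or`, `not`) is supported, and the expression is walked with the standard `ast` module rather than passed to `eval`.

```python
import ast
import operator
from string import Template

# Comparison operators supported by this sketch.
_OPS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def _eval(node):
    """Evaluate a restricted boolean expression tree."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BoolOp):
        values = [_eval(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _OPS[type(node.ops[0])]
        return compare(_eval(node.left), _eval(node.comparators[0]))
    raise ValueError("unsupported expression element")


def is_abnormal(rule: str, value: dict) -> bool:
    """Substitute ${macro} references with the value's attributes, then evaluate."""
    expanded = Template(rule).substitute({k: repr(v) for k, v in value.items()})
    return bool(_eval(ast.parse(expanded, mode="eval")))


# The night-time link traffic example, with the threshold written as 1.0.
NIGHT_TRAFFIC_RULE = (
    "${metric} == 'traffic' and (${hour} >= 22 or ${hour} < 6) and ${value} > 1.0"
)
```

For instance, `is_abnormal(NIGHT_TRAFFIC_RULE, {"metric": "traffic", "hour": 23, "value": 3.2})` evaluates to true, while the same reading at hour 14 does not, because the rule only applies at night.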
Original alarm information contains only attributes such as the alarm source, time of occurrence, and content. Because business systems are increasingly complex, the information expansion unit expands the attributes of the alarm information to help O&M personnel better grasp the risks or problems an alarm may cause, its impact on the business, and so on.
In an IT system, resources such as networks, servers, and databases are interconnected. When one component becomes abnormal, the components associated with it may report the same abnormality, producing a series of alarms. Finding the real cause and location of the fault by analyzing the correlation among this series of alarms is key to ensuring alarm validity.
After an alarm occurs, the O&M personnel who need to know about it must be notified in time. To suit different levels of urgency, the alarm notification unit provides multiple alarm modes such as SMS, mail, light, and message. SMS, light, and message are suited to urgent alarms with high real-time requirements; mail is suited to general alarms.
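The mapping from urgency to notification mode can be sketched as a small routing table. This Python example is an assumption-laden illustration: the two urgency levels, the channel names, and the `dispatch` function are hypothetical, and the actual senders are supplied by the caller as plain callables.

```python
from enum import Enum
from typing import Callable, Dict, List


class Urgency(Enum):
    GENERAL = "general"
    URGENT = "urgent"


# Urgent alarms go to the real-time channels; general alarms go to mail.
CHANNELS = {
    Urgency.URGENT: ("sms", "lamp", "message"),
    Urgency.GENERAL: ("mail",),
}


def dispatch(
    text: str,
    urgency: Urgency,
    senders: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Send the alarm text on every configured channel for this urgency.

    Returns the names of the channels actually used; channels without a
    registered sender are skipped.
    """
    used = []
    for name in CHANNELS[urgency]:
        sender = senders.get(name)
        if sender is not None:
            sender(text)
            used.append(name)
    return used
```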
In addition, according to embodiments of the invention, the collection point consists of a robot and a plurality of probes; the robot is responsible for scheduling the probes to perform collection.
In addition, according to embodiments of the invention, the collection methods supported by the probes include SNMP, Telnet, SSH, JDBC, JMX, and so on.
In addition, according to embodiments of the invention, collection points can be distributed across multiple locations while the data are stored centrally.
In addition, according to embodiments of the invention, the collection probes are divided into SNMP probes, JDBC probes, Telnet/SSH probes, JMX probes, and so on.
In addition, according to embodiments of the invention, the collection unit and the alarm processing unit are connected by a data bus and a message bus; the data bus is used to report data, and the message bus is used to issue collection instructions.
In addition, according to embodiments of the invention, one alarm processing unit can receive data from multiple collection units.
In addition, according to embodiments of the invention, when a transmission fault occurs, the collection unit can try one or more backup alarm processing units.
In addition, according to embodiments of the invention, when data cannot be transmitted, the collection unit can save the most recent data until transmission recovers.
In addition, according to embodiments of the invention, when the alarm processing unit finds that data must be collected again, it can notify the collection unit through the message bus to re-collect.
In addition, according to embodiments of the invention, abnormality judgment is computed by a conditional expression, which references indicator values, environment values, and the like through macro definitions.
In addition, according to embodiments of the invention, information expansion identifies a set of alarms by a conditional expression and defines the values of the expansion fields by value expressions.
In addition, according to embodiments of the invention, correlation analysis defines resource correlation, temporal correlation, and business correlation between alarms by rules.
In addition, according to embodiments of the invention, correlation analysis implements operations such as shielding, compression, escalation, and association.
In addition, according to embodiments of the invention, the alarm notification unit and the alarm processing unit transmit alarms over the TCP protocol; the alarm processing unit can push alarms to multiple alarm notification units.
In addition, according to embodiments of the invention, the alarm notification unit controls the switching, flashing, and color of the alarm lamp through serial-port levels.
In addition, according to embodiments of the invention, the alarm notification unit sends alarms by controlling an SMS modem through the serial port.
Description of drawings
The invention will be described by way of example with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the alarm processing method.
Fig. 2 is a schematic diagram of the operation of the collection device.
Fig. 3 is a workflow diagram of the collection device.
Fig. 4 is a flow chart of alarm information expansion.
Fig. 5 is a flow chart of correlation analysis.
 
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract, and drawings) may, unless specifically stated otherwise, be replaced by other equivalent or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
The invention is further described below with reference to the drawings.
As shown in Fig. 1, the device of the invention comprises a collection unit, an alarm processing unit, and an alarm notification unit. The collection unit comprises a robot and various probes such as SNMP and Telnet probes. Depending on the technical interfaces a device supports, a probe collects the device's running state by different technical means. The collection unit passes the collected data to the alarm processing unit over the data bus. The collection unit also receives instructions from the alarm processing unit and performs operations such as re-collection and supplementary collection when errors occur in the collected data. The connection between the collection unit and the alarm processing unit supports backup: when the collection unit finds that it cannot communicate with the alarm processing unit currently in use, it automatically connects to a backup alarm processing unit. If no alarm processing unit can be reached, the collection unit saves the most recent data until the remaining disk space falls below a specified size. When the remaining space is insufficient, the collection unit discards the "old" data. In this way the timeliness and accuracy of alarms are guaranteed as far as possible.
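The failover and buffering behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the class name `Uploader` is hypothetical, each alarm processing unit is modelled as a callable that raises `ConnectionError` when unreachable, and the disk-space limit from the text is replaced by a cap on the number of buffered records.

```python
from collections import deque
from typing import Callable, Deque, Sequence

# A sender delivers one record to an alarm processing unit and raises
# ConnectionError when that unit cannot be reached (assumed convention).
Sender = Callable[[dict], None]


class Uploader:
    def __init__(self, endpoints: Sequence[Sender], max_buffered: int):
        # endpoints[0] is the unit in use; the rest are backups, tried in order.
        self.endpoints = list(endpoints)
        # A bounded deque silently drops the oldest record when full, which
        # stands in for discarding "old" data when space runs out.
        self.buffer: Deque[dict] = deque(maxlen=max_buffered)

    def _send(self, record: dict) -> bool:
        """Try each endpoint in turn; report whether any accepted the record."""
        for send in self.endpoints:
            try:
                send(record)
                return True
            except ConnectionError:
                continue
        return False

    def submit(self, record: dict) -> None:
        """Queue a new record, then try to drain the queue."""
        self.buffer.append(record)
        self.flush()

    def flush(self) -> None:
        """Upload buffered records oldest first, stopping at the first failure."""
        while self.buffer:
            if not self._send(self.buffer[0]):
                return
            self.buffer.popleft()
```

Sending the oldest buffered record first preserves ordering once the connection recovers; a real collector would persist the buffer to disk rather than keep it in memory.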
After receiving the raw data, the alarm processing unit first analyzes whether an abnormality has occurred according to the preset abnormality judgment rules. An abnormality may concern a specific technical indicator of an IT resource or business system, or a measure of user experience; it may also be a judgment derived from a combined computation over several indicators. To accommodate the complexity of different devices and business systems, abnormality rules are described by abnormality expressions, so users can describe abnormal conditions with expressions based on their own understanding of the IT system. Because macro substitution and evaluation of expressions can be time-consuming, the abnormality judgment module records expression-processing performance and analyzes it periodically, and uses the results to tune the number of concurrent threads used for expression processing.
To make alarms more readable and help O&M personnel analyze them more accurately, the information expansion unit expands the alarm fields. In this device, alarm information reserves expansion fields. As shown in Fig. 4, the system first defines a conditional expression that determines the set of alarms satisfying the condition, and then defines value expressions for one or more expansion fields. For each alarm, the system judges whether its attributes satisfy the conditional expression; if so, it substitutes macros such as the alarm's original attributes, environment information, business information, and device maintenance information into the value expressions and computes the values of the expansion fields.
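The expansion flow described above (a condition selects the alarms, then value expressions fill the expansion fields) can be sketched as follows. The names `ExpansionRule` and `expand`, the `${name}` macro syntax, and the field names in the usage note are assumptions for illustration; the condition is written as a Python callable rather than a parsed expression to keep the example short.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List, Mapping


@dataclass(frozen=True)
class ExpansionRule:
    # Selects the set of alarms this rule applies to.
    condition: Callable[[Mapping[str, str]], bool]
    # Expansion field name -> value template containing ${macro} references.
    fields: Mapping[str, str]


def expand(
    alarm: Mapping[str, str],
    context: Mapping[str, str],
    rules: List[ExpansionRule],
) -> Dict[str, str]:
    """Return a copy of the alarm with expansion fields filled in.

    Macros resolve against the alarm's own attributes first, then against the
    supplied context (environment, business, or maintenance information).
    """
    macros = {**context, **alarm}
    expanded = dict(alarm)
    for rule in rules:
        if rule.condition(alarm):
            for name, template in rule.fields.items():
                # safe_substitute leaves unknown macros in place instead of failing.
                expanded[name] = Template(template).safe_substitute(macros)
    return expanded
```

For example, a rule whose condition matches database hosts and whose fields are `{"impact": "May affect ${service}"}` would add an `impact` field to a `db01` alarm when the context supplies `service`.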
For each newly generated alarm, the correlation analysis unit compares it with historical alarms to determine whether the events are correlated and to identify the root alarm and the derived alarms. Correlation includes temporal correlation, resource correlation, and business correlation. As shown in Fig. 5, correlation processing comprises the following steps: 1) the user creates correlation rules and sets their priorities; the rules provided by the system can describe correlation in time, resource, and business; 2) the system reads the preset rules; 3) when a new alarm is generated, the system computes an alarm set from the alarm's attributes and the correlation rules; if the set contains more than one element, the alarm is correlated with other alarms, and the root alarm and derived alarms are further analyzed (by default, the alarm generated first is the root alarm); 4) for correlated alarms, operations such as shielding, compression, and escalation are performed according to the actions predefined in the rules; 5) correlated alarms can be displayed in groups on the display unit.
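Steps 3 and 4 above can be illustrated with a short Python sketch. It is a simplified reading of the text, not the patented implementation: the names `Alarm`, `CorrelationRule`, and `correlate` are hypothetical, a lower `priority` number is assumed to mean the rule is tried first, and only the shielding action is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Alarm:
    id: str
    resource: str
    time: float
    shielded: bool = False
    root_id: Optional[str] = None


@dataclass(frozen=True)
class CorrelationRule:
    priority: int                           # assumed: lower number is tried first
    related: Callable[[Alarm, Alarm], bool]  # time, resource, or business test
    action: str                             # only "shield" is handled here


def correlate(new: Alarm, history: List[Alarm], rules: List[CorrelationRule]) -> List[Alarm]:
    """Compute the alarm set for a new alarm and apply the first matching rule."""
    for rule in sorted(rules, key=lambda r: r.priority):
        group = [a for a in history if rule.related(a, new)] + [new]
        if len(group) > 1:
            # Default from the text: the earliest alarm is the root alarm.
            root = min(group, key=lambda a: a.time)
            for alarm in group:
                if alarm is not root:
                    alarm.root_id = root.id
                    if rule.action == "shield":
                        alarm.shielded = True
            return group
    # A set with a single element means no correlation was found.
    return [new]
```

A typical `related` test might combine resource and temporal correlation, for example "both alarms sit behind the same switch and occurred within 60 seconds of each other".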
After expansion and correlation analysis, alarms must be notified to the relevant O&M personnel, by means including query, SMS, mail, and light. As shown in Fig. 1, in this device the alarm notification device and the alarm processing device communicate over TCP, and the alarm processing device pushes alarms to the alarm notification device. The alarm notification device is connected to the SMS modem, the alarm lamp, and so on through serial ports. The device communicates with the SMS modem over a serial protocol to send SMS messages, and controls the switching of the alarm lamp through high and low levels.
The invention is not limited to the foregoing embodiments. It extends to any novel feature, or any novel combination of features, disclosed in this specification, and to any novel method or process step, or any novel combination of steps, disclosed.

Claims (10)

1. An O&M alarm processing method, characterized in that the method comprises the following steps: 1) alarm collection: a robot schedules various probes to collect the operational health of IT resources; 2) alarm processing: through processes such as abnormality judgment, information expansion, correlation analysis, and storage, alarms are generated, fields are expanded, invalid alarms are shielded, and so on; 3) alarm notification: the relevant O&M personnel are notified of alarms by means such as SMS, alarm lamp, mail, and message.
2. The O&M alarm processing method according to claim 1, characterized in that the correlation analysis mainly comprises the following steps: 1) the user creates correlation rules; 2) the system reads the preset rules; 3) when a new alarm is generated, the system computes an alarm set from the alarm's attributes and the correlation rules; if the set contains more than one element, the alarm is correlated with other alarms, and the root alarm and derived alarms are further analyzed; 4) for correlated alarms, operations such as shielding, compression, and escalation are performed according to the actions predefined in the rules.
3. A device applying the O&M alarm processing method according to claim 1 or 2, characterized in that it comprises an alarm collection unit, an alarm processing unit, and an alarm notification unit.
4. The device according to claim 3, characterized in that the collection unit comprises two parts, a collection robot and collection probes; the collection probes are divided into SNMP probes, JDBC probes, Telnet/SSH probes, JMX probes, and so on; the collection robot is responsible for scheduling the collection probes to collect the operating indicators of different devices.
5. The device according to claim 3, characterized in that the collection unit and the alarm processing unit are connected by a data bus and a message bus; the data bus is used to report data, and the message bus is used to issue collection instructions; one alarm processing unit can receive data from multiple collection units.
6. The device according to claim 3, characterized in that when a transmission fault occurs with the primary alarm processing unit, the collection unit can automatically connect to one or more backup alarm processing units; when transmission to all alarm processing units fails, the collection unit saves the most recent data until the remaining disk space falls below a specified size; when the remaining space is insufficient, the collection unit discards the "old" data; after transmission recovers, the saved data are uploaded automatically.
7. The device according to claim 3, characterized in that when the alarm processing unit finds that data must be collected again, it can notify the collection unit through the message bus to re-collect.
8. The device according to claim 3, characterized in that abnormality judgment is computed by a conditional expression; the conditional expression is composed of operators and symbols, can reference indicator values, attribute values, and environment values through macro definitions, and evaluates to the logical value true or false; for each value received, the abnormality judgment unit performs macro substitution and then evaluates the expression, and if the result is true, an abnormality has occurred.
9. The device according to claim 3, characterized in that information expansion identifies a set of alarms by a conditional expression and defines the values of the expansion fields by value expressions; a value expression is composed of operators and symbols, can reference indicator values, attribute values, and environment values through macro definitions, and evaluates to a number, a character string, or a logical value.
10. The device according to claim 3, characterized in that the alarm notification unit and the alarm processing unit transmit alarms over the TCP protocol; the alarm processing unit pushes alarm information to multiple alarm notification units; the notification unit controls the switching, flashing, and color of the alarm lamp through serial-port levels, or sends alarms by controlling an SMS modem through the serial port.
CN 201010589714 2010-12-15 2010-12-15 Operation alarm processing method and device Pending CN101989931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010589714 CN101989931A (en) 2010-12-15 2010-12-15 Operation alarm processing method and device

Publications (1)

Publication Number Publication Date
CN101989931A true CN101989931A (en) 2011-03-23

Family

ID=43746289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010589714 Pending CN101989931A (en) 2010-12-15 2010-12-15 Operation alarm processing method and device

Country Status (1)

Country Link
CN (1) CN101989931A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB02 Change of applicant information

Address after: 610041 Sichuan province Chengdu city Chengdu high tech Zone Tianyun Road No. 150 High Tech International Plaza D block, room 404

Applicant after: Chengdu Qinzhi Digital Technology Co., Ltd.

Address before: 610041 Sichuan province Chengdu city Chengdu high tech Zone Tianyun Road No. 150 High Tech International Plaza D block, room 404

Applicant before: Chengdu Qinzhi Digital Technology Co., Ltd.

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: CHENGDU QINZHI DIGITAL TECHNOLOGY CO., LTD. TO: CHINA CHENGDU WISERV TECHNOLOGY CO., LTD.

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110323