CN106254137B - The alarm root analysis system and method for supervisory systems - Google Patents

Publication number
CN106254137B
Authority
CN
China
Prior art keywords
alarm
link
information
daughter
root
Prior art date
Legal status
Active
Application number
CN201610772896.4A
Other languages
Chinese (zh)
Other versions
CN106254137A (en)
Inventor
李保平
张宏
谢超
刘庆锐
杨建荣
Current Assignee
Guangzhou Huitong Guoxin Technology Co ltd
Original Assignee
Guangzhou Huitong Guoxin Mdt Infotech Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huitong Guoxin Mdt Infotech Ltd
Priority to CN201610772896.4A
Publication of CN106254137A
Application granted
Publication of CN106254137B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0618 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on the physical or logical position
    • H04L41/0622 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on time
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an alarm root cause analysis system and method for a supervisory system. The system comprises an IT element collection module, an alarm root cause analysis and positioning module, and an alarm root cause processing module. The IT element collection module collects information on the monitored IT elements by means of information collection tools and writes the collected IT elements to a database. The alarm root cause analysis and positioning module sorts out and analyzes links according to the physical and logical access relationships of each IT element, and locates the alarm root cause through link matching, event link superposition, weight area calculation, and noise filtering. The alarm root cause processing module handles the alarm root cause. The invention locates the alarm root cause and associates it with the indicators in its scope of influence, so that operation and maintenance personnel can quickly locate and review alarms and resolve faults on that basis.

Description

The alarm root analysis system and method for supervisory systems
Technical field
The present invention relates to the technical field of IT supervisory systems, and in particular to an alarm root cause analysis system and method for a supervisory system.
Background technique
As IT technology develops, IT environments contain an ever wider variety of equipment and applications, and the requirements on IT supervisory systems have gradually come to cover every IT level: hardware, network messages, clusters, virtualization, operating systems, application software, databases, middleware, network element monitoring, code components, and browsers. When an event occurs at one IT element node, however, the network management system often lacks linkage analysis, so fault prompts also appear on the nodes associated with that element horizontally and vertically. Operation and maintenance personnel then have to check the nodes one by one with network management tools and manual scripts, and rely on experience to find the root of the event in a mass of alarm information. Locating the root late delays the repair and causes unnecessary business loss and wasted operation and maintenance cost.
Besides relying on the experience of operation and maintenance personnel, an IT supervisory system can also set alarm thresholds on the relevant indicators of each IT element. After the indicator information of each IT element is collected, the system checks whether the value of a given indicator meets the alarm threshold condition and, if so, raises an alarm directly on that indicator. Because thresholds are defined per indicator, an indicator is displayed as an alarm as soon as it meets its threshold condition. In an IT system, however, IT elements basically do not exist in isolation; each forms physical and logical access relationships with its surroundings. This technique therefore has the following shortcomings:
1. When an indicator of one IT element raises an alarm, the IT elements built on top of it and the IT elements that have access relationships with it generate corresponding event information. When that event information also exceeds the alarm thresholds defined for those elements, the original root alarm on a single indicator of a single IT element turns into alarm prompts on multiple indicators of multiple IT elements. This interferes with the personnel handling operations, who cannot find and locate the root alarm in time.
2. After the alarm root cause has been handled, indicators that alarmed through linkage but were not handled themselves may return to normal as new data arrives, and their alarm information is buried. When the history is reviewed later, it is difficult to find out why the indicator alarmed at the time, how it returned to normal, and what the scope of influence of the event was.
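The first shortcoming can be illustrated with a minimal Python sketch. The element names, indicator names, values, and thresholds are all invented for illustration and do not come from the patent; the sketch only shows that per-indicator threshold checks raise an alarm on every element affected by a single fault.

```python
# Hypothetical thresholds and readings, invented for illustration only.
THRESHOLDS = {"disk_latency_ms": 50, "db_query_ms": 200, "page_load_ms": 1000}

# One snapshot of collected indicators: (element, indicator) -> value.
snapshot = {
    ("storage-1", "disk_latency_ms"): 180,  # the actual fault
    ("db-1", "db_query_ms"): 950,           # slow only because storage is slow
    ("web-1", "page_load_ms"): 4200,        # slow only because the database is slow
    ("web-2", "page_load_ms"): 300,         # healthy
}


def threshold_alarms(snapshot, thresholds):
    """Raise one alarm per indicator whose value exceeds its own threshold."""
    return sorted(
        (element, indicator)
        for (element, indicator), value in snapshot.items()
        if value > thresholds[indicator]
    )


alarms = threshold_alarms(snapshot, THRESHOLDS)
for element, indicator in alarms:
    print(f"ALARM {element} {indicator}")
```

In this toy snapshot one storage fault surfaces as three alarms on three elements, and nothing in the output marks which of them is the root.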
Summary of the invention
The present invention proposes an alarm root cause analysis system and method for a supervisory system. It addresses the existing weaknesses in locating the alarm root cause and associating its scope of influence when massive alarm information from IT elements is processed. Using root cause analysis of the indicator alarms of each element, it locates the alarm root cause and associates it with the indicators in its scope of influence, so that operation and maintenance personnel can quickly locate and review alarms and resolve faults on that basis.
The technical solution of the present invention is realized as follows:
An alarm root cause analysis method for a supervisory system, comprising the following steps:
(1) Information collection: collect information on each monitored IT element through various information collection methods, and write the collected IT elements to a database;
(2) Physical and logical access relationship link sorting and dimensionality reduction: analyze the business scope relationships, logical access relationships, and configuration management information inside and outside each IT element, and perform dimensionality reduction on the relationship links of each IT element to form the single links of each IT element;
(3) Link matching: match the collected IT elements against the single-link relationships of each IT element to obtain a link overview of the IT elements on each single link;
(4) Time matching and event link superposition: according to the occurrence time, duration, and delay of the alarm event information, store the alarm event information into each single link to produce an event overview of each single link;
(5) Weight area calculation: from the indicator information of the alarming IT sub-element on each single link and the indicator information of the other IT sub-elements, calculate the data fluctuation rate of each IT sub-element before and after the alarm; if the data fluctuation rate of another IT sub-element is greater than that of the alarming IT sub-element, include the alarming IT sub-element in the overall weight area of the alarm;
(6) Noise filtering: filter the alarming IT sub-elements in the overall weight area according to the historical data fluctuation of each alarming IT sub-element and the data fluctuation of its associated IT sub-elements, to obtain the alarm root cause.
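Steps (3) and (4) could be sketched roughly as follows in Python. The `AlarmEvent` fields, the function names, the sample links, and the rule of subtracting the delay from the occurrence time are assumptions made for this sketch; the patent does not specify data structures or how the delay is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEvent:
    element: str     # IT element that raised the alarm
    start: float     # occurrence time, in seconds
    duration: float  # how long the alarm lasted, in seconds
    delay: float     # reporting delay, in seconds (assumed meaning)


def match_links(collected, single_links):
    """Step (3): keep, per single link, only the elements that were collected."""
    return {
        name: [e for e in path if e in collected]
        for name, path in single_links.items()
    }


def overlay_events(link_overview, events, window):
    """Step (4): attach to each link the events of its elements that overlap
    the time window, ordered by delay-corrected start time (an assumption)."""
    t0, t1 = window
    overview = {}
    for name, path in link_overview.items():
        hits = [
            ev for ev in events
            if ev.element in path
            and ev.start - ev.delay <= t1
            and ev.start - ev.delay + ev.duration >= t0
        ]
        overview[name] = sorted(hits, key=lambda ev: ev.start - ev.delay)
    return overview


# Hypothetical single links and events.
single_links = {"L1": ["web", "app", "db", "storage"], "L2": ["web", "cache"]}
collected = {"web", "app", "db", "storage"}  # "cache" was not collected
events = [
    AlarmEvent("web", start=110, duration=40, delay=1),
    AlarmEvent("db", start=104, duration=50, delay=2),
    AlarmEvent("storage", start=100, duration=60, delay=5),
    AlarmEvent("cache", start=120, duration=10, delay=1),
]

link_overview = match_links(collected, single_links)
event_overview = overlay_events(link_overview, events, window=(90, 200))
```

Ordering the events on a link by corrected start time is one way to make the sequence of alarms along that link visible; the patent text itself only states that events are stored into each single link.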
Further, the IT elements include hardware, network messages, software systems, application software, code, and browsers.
Further, in step (1), information on each IT element is collected through agent-based or agentless access.
Further, in step (5), the data fluctuation rate of each IT element is calculated as: data fluctuation rate = (data after the alarm / data before the alarm) - 1.
The beneficial effects of the present invention are as follows: it automatically sorts out the physical and logical relationships around each IT element, reduces the dimensionality of the relationship topology, superimposes events on the link relationships by time, calculates the weight area, and finally filters and screens according to duration and fluctuation. This locates the alarm root cause and associates it with the indicators in its scope of influence, so that operation and maintenance personnel can quickly locate and review alarms and resolve faults on that basis.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a functional block diagram of the alarm root cause analysis system of the supervisory system of the present invention;
Fig. 2 is a flow chart of the alarm root cause analysis method of the supervisory system of the present invention;
Fig. 3 is a functional block diagram of an embodiment of the physical and logical access relationship link sorting and dimensionality reduction in Fig. 2.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the invention proposes an alarm root cause analysis system for a supervisory system, comprising an IT element collection module, an alarm root cause analysis and positioning module, and an alarm root cause processing module. The IT element collection module collects information on the monitored IT elements by means of information collection tools and writes the collected IT elements to a database. The alarm root cause analysis and positioning module sorts out and analyzes links according to the physical and logical access relationships of each IT element, and locates the alarm root cause through link matching, event link superposition, weight area calculation, and noise filtering. The alarm root cause processing module handles the alarm root cause. The invention automatically sorts out the physical and logical relationships around each IT element, reduces the dimensionality of the relationship topology, superimposes events on the link relationships by time, calculates the weight area, and finally filters and screens according to duration and fluctuation. This locates the alarm root cause and associates it with the indicators in its scope of influence, so that operation and maintenance personnel can quickly locate and review alarms and resolve faults on that basis.
The IT elements include hardware, network messages, software systems, application software, code, and browsers. Specifically, each IT element further comprises several IT sub-elements.
The database also contains the indicator information of each IT element. The IT element collection module acquires and collects data from the monitored IT elements. The collected content includes each IT element and its indicator information, such as hardware model, CPU usage, memory usage, network traffic, process name, response time of each code method, and browser return code.
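One possible way to represent IT elements, their sub-elements, and indicator information is sketched below in Python. The class names, field names, and sample values are hypothetical; the patent names the kinds of information collected but not a schema.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class IndicatorSample:
    name: str                 # e.g. "cpu_usage_pct" (hypothetical name)
    value: Union[float, str]  # numeric reading or a label such as a process name
    timestamp: float          # collection time, seconds since the epoch


@dataclass
class SubElement:
    name: str
    indicators: list = field(default_factory=list)


@dataclass
class ITElement:
    name: str
    kind: str  # e.g. "hardware", "application software", "browser"
    sub_elements: list = field(default_factory=list)


def flatten(element):
    """Turn one IT element into database-style rows, one per indicator sample."""
    return [
        (element.name, sub.name, s.name, s.value, s.timestamp)
        for sub in element.sub_elements
        for s in sub.indicators
    ]


server = ITElement(
    "server-01",
    "hardware",
    [
        SubElement("os", [
            IndicatorSample("cpu_usage_pct", 93.0, 1700000000.0),
            IndicatorSample("memory_usage_mb", 14200.0, 1700000000.0),
        ]),
        SubElement("process", [
            IndicatorSample("process_name", "nginx", 1700000000.0),
        ]),
    ],
)
rows = flatten(server)
```

Keeping the element, sub-element, and indicator as separate levels mirrors the hierarchy the text describes, and the flattened rows are a convenient shape for writing to a database.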
As shown in Fig. 2, the invention also proposes an alarm root cause analysis method for a supervisory system, comprising the following steps:
(1) Information collection: collect information on each monitored IT element through various information collection methods, and write the collected IT elements to a database;
Specifically, in step (1), information on each IT element is collected through agent-based or agentless access. The IT elements include hardware, network messages, software systems, application software, code, and browsers. The indicator information of an IT element includes, for example, hardware model, CPU usage, memory usage, network traffic, process name, response time of each code method, and browser return code.
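Step (1) might look like the following minimal Python sketch, which uses the standard-library `sqlite3` module as a stand-in database. The `collect_agentless` function, the table layout, and the sample readings are hypothetical; a real collector would poll devices or receive data from agents, which the patent does not detail.

```python
import sqlite3
import time


def collect_agentless(targets):
    """Stand-in for agentless collection: read values from an in-memory dict
    instead of polling real devices (a simplification for this sketch)."""
    now = time.time()
    for element, indicators in targets.items():
        for name, value in indicators.items():
            yield (element, name, float(value), now)


def write_to_database(conn, rows):
    """Write the collected indicator rows to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indicator "
        "(element TEXT, name TEXT, value REAL, collected_at REAL)"
    )
    conn.executemany("INSERT INTO indicator VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Hypothetical monitored elements and readings.
targets = {
    "server-01": {"cpu_usage_pct": 93, "memory_usage_mb": 14200},
    "switch-01": {"network_flow_mbps": 870},
}

conn = sqlite3.connect(":memory:")
write_to_database(conn, collect_agentless(targets))
count = conn.execute("SELECT COUNT(*) FROM indicator").fetchone()[0]
```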
(2) Physical and logical access relationship link sorting and dimensionality reduction: analyze the business scope relationships, logical access relationships, and configuration management information inside and outside each IT element, and perform dimensionality reduction on the relationship links of each IT element to form the single links of each IT element. Specifically, the mesh relationships between a single IT element and its surrounding elements are sorted into single-link relationships running from a starting element to an end element. As shown in Fig. 3, taking five IT elements A to E as an example, A to E form a mesh relationship before dimensionality reduction and single-link relationships after sorting and dimensionality reduction.
1) The internal relationships of an IT element are the relationships between the IT element and its sub-elements. For example, a physical server element hosts a virtualization platform; each virtual host runs its own operating system; each operating system runs multiple processes; different processes provide different services; and an information system service, for instance, in turn has different page code on top of it.
2) The external relationships of an IT element are the relationships between it and other IT elements. For example, physical servers are connected to each other through switches, routers, and load balancers.
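One plausible reading of the dimensionality reduction in step (2) is to enumerate every simple path from a starting element to an end element in the relationship graph, so that a mesh becomes a set of single links. The Python sketch below does this with a depth-first search. The edges between A and E are invented here, since the topology of Fig. 3 is not reproduced in the text, and path enumeration is an assumed technique rather than the patent's stated algorithm.

```python
def single_links(graph, start, end):
    """Enumerate every simple path from start to end using depth-first search."""
    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # never revisit an element on the same link
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical mesh of access relationships among five IT elements.
mesh = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

links = single_links(mesh, "A", "E")
for link in links:
    print(" -> ".join(link))
```

With these invented edges the mesh reduces to three single links: A-B-C-D-E, A-B-D-E, and A-C-D-E.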
(3) Link matching: match the collected IT elements against the single-link relationships of each IT element to obtain a link overview of the IT elements on each single link;
(4) Time matching and event link superposition: according to the occurrence time, duration, and delay of the alarm event information, store the alarm event information into each single link to produce an event overview of each single link;
(5) Weight area calculation: from the indicator information of the alarming IT sub-element on each single link and the indicator information of the other IT sub-elements, calculate the data fluctuation rate of each IT sub-element before and after the alarm; if the data fluctuation rate of another IT sub-element is greater than that of the alarming IT sub-element, include the alarming IT sub-element in the overall weight area of the alarm;
In step (5), the data fluctuation rate of each IT element is calculated as: data fluctuation rate = (data after the alarm / data before the alarm) - 1.
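The fluctuation rate formula and the inclusion rule of step (5) can be sketched in Python as follows. The sketch follows the wording of step (5) literally, which is an interpretation: the text does not say whether rates are compared as signed values or magnitudes, nor how a pre-alarm value of zero is handled, so both choices below are assumptions, as are the sub-element names and readings.

```python
def fluctuation_rate(before, after):
    """data fluctuation rate = (data after the alarm / data before the alarm) - 1"""
    if before == 0:
        # The source does not cover this case; raising is a choice made here.
        raise ValueError("fluctuation rate is undefined when the pre-alarm value is 0")
    return after / before - 1


def weight_area(link, alarming):
    """link maps sub-element -> (value before alarm, value after alarm).

    An alarming sub-element joins the weight area when some other sub-element
    on the same link has a greater fluctuation rate (literal reading of step 5).
    """
    rates = {name: fluctuation_rate(b, a) for name, (b, a) in link.items()}
    area = [
        name for name in link
        if name in alarming
        and any(rates[other] > rates[name] for other in link if other != name)
    ]
    return rates, area


# Hypothetical readings on one single link.
link = {"web": (200.0, 300.0), "db": (100.0, 250.0), "storage": (10.0, 40.0)}
rates, area = weight_area(link, alarming={"web", "db"})
print(rates)  # {'web': 0.5, 'db': 1.5, 'storage': 3.0}
print(area)   # ['web', 'db']
```

For example, a value that rises from 100 to 250 has a fluctuation rate of 250 / 100 - 1 = 1.5, that is, a 150% increase.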
(6) Noise filtering: filter the alarming IT sub-elements in the overall weight area according to the historical data fluctuation of each alarming IT sub-element and the data fluctuation of its associated IT sub-elements, to obtain the alarm root cause.
Specifically, each IT sub-element also contains several items of indicator information.
For example, for indicator A: when the historical state of this alarming indicator is non-alarm and its historical data fluctuation has reached the current fluctuation, but the fluctuation of its associated indicators B and C is greater than the current value, indicator A can be filtered out.
Conversely, for indicator A: when the historical state of this alarming indicator is non-alarm and its historical data fluctuation has reached the current fluctuation, but the fluctuation of its associated indicators B and C is equal to or less than the current value, indicator A is classified as an alarm root cause.
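The two examples for indicator A can be encoded as a small decision function, sketched here in Python. Several points are assumptions: "reaches" is read as "greater than or equal to", "current value" is read as the current fluctuation of indicator A, and any case the text does not cover (for instance, B greater but C smaller) is reported as undecided rather than guessed.

```python
def classify_indicator(current, historical, associated, was_alarming):
    """Classify one alarming indicator as 'filtered', 'root', or 'undecided'.

    current      -- current fluctuation of the indicator
    historical   -- its historical fluctuation
    associated   -- dict of associated indicator name -> fluctuation
    was_alarming -- True if the indicator's historical state was an alarm
    """
    # Both examples assume a non-alarm history whose fluctuation has
    # reached the current fluctuation; other cases are not specified.
    if was_alarming or historical < current or not associated:
        return "undecided"
    if all(f > current for f in associated.values()):
        return "filtered"  # associated indicators fluctuate more
    if all(f <= current for f in associated.values()):
        return "root"      # nothing associated fluctuates more
    return "undecided"     # mixed case, not covered by the examples


first = classify_indicator(0.8, 0.9, {"B": 1.5, "C": 2.0}, was_alarming=False)
second = classify_indicator(0.8, 0.9, {"B": 0.8, "C": 0.3}, was_alarming=False)
mixed = classify_indicator(0.8, 0.9, {"B": 1.5, "C": 0.3}, was_alarming=False)
print(first, second, mixed)  # filtered root undecided
```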
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (4)

1. An alarm root cause analysis method for a supervisory system, characterized by comprising the following steps:
(1) Information collection: collect information on each monitored IT element through various information collection methods, and write the collected IT elements to a database;
(2) Physical and logical access relationship link sorting and dimensionality reduction: analyze the business scope relationships, logical access relationships, and configuration management information inside and outside each IT element, and perform dimensionality reduction on the relationship links of each IT element to form the single links of each IT element;
(3) Link matching: match the collected IT elements against the single-link relationships of each IT element to obtain a link overview of the IT elements on each single link;
(4) Time matching and event link superposition: according to the occurrence time, duration, and delay of the alarm event information, store the alarm event information into each single link to produce an event overview of each single link;
(5) Weight area calculation: from the indicator information of the alarming IT sub-element on each single link and the indicator information of the other IT sub-elements, calculate the data fluctuation rate of each IT sub-element before and after the alarm; if the data fluctuation rate of another IT sub-element is greater than that of the alarming IT sub-element, include the alarming IT sub-element in the overall weight area of the alarm;
(6) Noise filtering: filter the alarming IT sub-elements in the overall weight area according to the historical data fluctuation of each alarming IT sub-element and the data fluctuation of its associated IT sub-elements, to obtain the alarm root cause.
2. The alarm root cause analysis method for a supervisory system according to claim 1, characterized in that the IT elements include hardware, network messages, software systems, application software, code, and browsers.
3. The alarm root cause analysis method for a supervisory system according to claim 1, characterized in that, in step (1), information on each IT element is collected through agent-based or agentless access.
4. The alarm root cause analysis method for a supervisory system according to claim 1, characterized in that, in step (5), the data fluctuation rate of each IT element is calculated as: data fluctuation rate = (data after the alarm / data before the alarm) - 1.
CN201610772896.4A 2016-08-30 2016-08-30 The alarm root analysis system and method for supervisory systems Active CN106254137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610772896.4A CN106254137B (en) 2016-08-30 2016-08-30 The alarm root analysis system and method for supervisory systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610772896.4A CN106254137B (en) 2016-08-30 2016-08-30 The alarm root analysis system and method for supervisory systems

Publications (2)

Publication Number Publication Date
CN106254137A CN106254137A (en) 2016-12-21
CN106254137B true CN106254137B (en) 2019-05-10

Family

ID=58079648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610772896.4A Active CN106254137B (en) 2016-08-30 2016-08-30 The alarm root analysis system and method for supervisory systems

Country Status (1)

Country Link
CN (1) CN106254137B (en)

Also Published As

Publication number Publication date
CN106254137A (en) 2016-12-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building B9, No. 11, Kaiyuan Avenue Road, Dongqu street, Huangpu District, Guangzhou, Guangdong Province 510000

Patentee after: Guangzhou Huitong Guoxin Technology Co.,Ltd.

Address before: 510000, one of the 111 rooms, No. 139, West Zhongshan Road, Tianhe District, Guangdong, Guangzhou

Patentee before: GUANGZHOU HUITONE GUOXIN INFORMATION TECHNOLOGY Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Alarm source analysis system and method of supervision system

Effective date of registration: 20200904

Granted publication date: 20190510

Pledgee: Guangzhou Caold financing Company limited by guarantee

Pledgor: Guangzhou Huitong Guoxin Technology Co.,Ltd.

Registration number: Y2020980005749

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20221027

Granted publication date: 20190510

Pledgee: Guangzhou Caold financing Company limited by guarantee

Pledgor: Guangzhou Huitong Guoxin Technology Co.,Ltd.

Registration number: Y2020980005749

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Alarm root cause analysis system and method of supervision system

Effective date of registration: 20221031

Granted publication date: 20190510

Pledgee: Guangzhou Caold financing Company limited by guarantee

Pledgor: Guangzhou Huitong Guoxin Technology Co.,Ltd.

Registration number: Y2022980020291
