CN107846314A - An intelligent operation management system - Google Patents
An intelligent operation management system
- Publication number
- CN107846314A (application CN201711049087.1A)
- Authority
- CN
- China
- Prior art keywords
- module
- fault
- fault restoration
- time
- failure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system. The system includes a system monitoring module, a fault message identification module, a fault restoration module, and a fault restoration evaluation module. The system monitoring module monitors the running state of the system and, when it detects an abnormality, passes the current state parameters and the detected abnormal condition to the fault information collection module. The fault message identification module examines the abnormal conditions passed on by the collection module to confirm whether each is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair. After receiving the alarm information from the fault message identification module, the fault restoration module repairs the fault according to its fault characteristics. The fault restoration evaluation module assesses the fault restoration result of the fault restoration module. The invention can repair faults quickly and can automatically prompt the administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long.
Description
Technical field
The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system.
Background technology
IT operation and maintenance (O&M) systems keep growing in scale. As a system monitors the performance and network connectivity of network devices such as servers, virtual machines, and switches, O&M personnel receive more and more monitoring alarms every day. Faced with a massive number of O&M metrics when the system fails, O&M personnel struggle to find the root cause of the fault quickly among them. The resulting alarm storm greatly slows problem localization, and the speed of fault recovery depends largely on the experience and response speed of the O&M personnel. It is therefore desirable to build an intelligent operation platform in which automatic fault diagnosis works together with a fast recovery system: machine learning models and a big-data expert system are built for multiple scenarios, abnormalities of the operation platform are diagnosed and located online in real time, and when the system fails, a fast repair is carried out by executing the corresponding strategy so that normal operation is restored.
Summary of the invention
To overcome the above problems, the present invention provides an intelligent operation management system that diagnoses and locates system abnormalities online in real time, achieves fast repair by executing the corresponding strategy when the system fails, and automatically prompts the system administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long.
The technical solution adopted by the present invention to solve the technical problem is as follows:
An intelligent operation management system includes a system monitoring module, a fault message identification module, a fault restoration module, and a fault restoration evaluation module.
The system monitoring module monitors the running state of the system; when it detects an abnormality, it passes the current state parameters and the detected abnormal condition to the fault information collection module.
The fault message identification module examines the abnormal conditions passed on by the collection module to confirm whether each is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair.
The fault restoration module, after receiving the alarm information from the fault message identification module, repairs the fault according to its fault characteristics.
The fault restoration evaluation module assesses whether the fault restoration result of the fault restoration module is acceptable. It also includes a time detecting unit, which measures the time spent on fault restoration and judges whether that time exceeds a threshold.
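The module chain described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Anomaly` type, the `handle_anomaly` function, and the three callables standing in for the identification, restoration, and evaluation modules are all hypothetical names introduced here, and the source does not say how false alarms are recognized or how a repair is judged acceptable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Anomaly:
    """An abnormal condition plus the state parameters captured with it."""
    fault_type: str
    state: dict


def handle_anomaly(
    anomaly: Anomaly,
    is_false_alarm: Callable[[Anomaly], bool],   # stands in for the identification module
    repair: Callable[[Anomaly], None],           # stands in for the restoration module
    is_acceptable: Callable[[Anomaly], bool],    # stands in for the evaluation module
) -> Optional[bool]:
    """Run one anomaly through identification, repair, and evaluation.

    Returns None for a false alarm; otherwise returns whether the
    repair result was judged acceptable.
    """
    if is_false_alarm(anomaly):
        return None          # false alarms are dropped before any repair runs
    repair(anomaly)
    return is_acceptable(anomaly)
```

Passing the three stages in as callables keeps the sketch independent of any particular detection or repair technique, which matches the level of detail the description itself provides.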
Further, after a fault has been repaired, the fault restoration evaluation module also scores each repair result according to the running state of the system, periodically submits the self-repair execution processes that received low scores to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
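A rough sketch of this scoring and periodic reporting step follows, under stated assumptions. The source gives no scoring scale or cutoff, so the 0 to 100 scale, the cutoff of 60, and the names `RepairRecord`, `RepairEvaluator`, and `periodic_report` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepairRecord:
    fault_type: str
    script: str    # hypothetical name of the repair script that was run
    score: float   # assumed scale: 0 (still unhealthy) to 100 (fully recovered)


@dataclass
class RepairEvaluator:
    low_score_cutoff: float = 60.0   # assumed cutoff; the source does not give one
    records: list = field(default_factory=list)

    def record(self, fault_type: str, script: str, score: float) -> None:
        """Store the score given to one repair after checking system state."""
        self.records.append(RepairRecord(fault_type, script, score))

    def periodic_report(self) -> list:
        """Return the low-scoring repairs for administrator review, then clear the batch."""
        low = [r for r in self.records if r.score < self.low_score_cutoff]
        self.records.clear()
        return low
```

Calling `periodic_report()` on a schedule (for example from a daily job) would give the administrator the list of scripts worth optimizing.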
Further, the time detecting unit works as follows: when the fault restoration module receives the alarm information from the fault message identification module, the time detecting unit reads and records the current system time; after the fault restoration module has repaired the fault, the time detecting unit reads and records the current system time again, calculates the interval between the two recorded times, and judges whether that interval exceeds the threshold. When it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding script stored in the script calling module.
Further, the threshold value is repair average time needed for the failure 2-3 times.
The beneficial effects of the invention are as follows. The monitoring module can monitor the system comprehensively. When an abnormality is detected, the fault message identification module examines the abnormality information to determine whether it is a fault. For information judged to be a fault, the fault restoration module repairs the fault effectively, and the fault restoration evaluation module assesses the repair result; faults whose repair result is unsatisfactory can be submitted to the system administrator for analysis. The system can also prompt the system administrator to analyze, and optimize accordingly, repairs that take too long. The system thus not only diagnoses and locates abnormalities online and achieves fast repair by executing the corresponding strategy when the system fails, but also automatically prompts the system administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long, continuously improving the effect and efficiency of fault restoration.
Brief description of the drawings
Fig. 1 is a structural block diagram of the intelligent operation management system according to a preferred embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
It should be noted that when a component is said to be "fixed to" another component, it may be directly on the other component or an intermediate component may be present. When a component is said to be "connected to" another component, it may be directly connected to the other component or an intermediate component may be present at the same time. When a component is said to be "arranged on" another component, it may be arranged directly on the other component or an intermediate component may be present at the same time. The terms "vertical", "horizontal", "left", "right", and similar expressions are used herein for illustration only.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the invention. The terms used in this description are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to Fig. 1, a preferred embodiment of the invention provides an intelligent operation management system including a system monitoring module 10, a fault message identification module 20, a fault restoration module 30, and a fault restoration evaluation module 40. The system monitoring module 10 monitors the running state of the system; when it detects an abnormality, it passes the current state parameters and the detected abnormal condition to the fault information collection module 20. The fault message identification module 20 examines the abnormal conditions passed on by the collection module to confirm whether each is a false alarm, and forwards the information judged to be a fault to the fault restoration module 30 for repair. The fault restoration module 30, after receiving the alarm information from the fault message identification module, repairs the fault according to its fault characteristics. The fault restoration evaluation module 40 assesses whether the fault restoration result of the fault restoration module is acceptable. The fault restoration evaluation module 40 also includes a time detecting unit 410, which measures the time spent on fault restoration and judges whether that time exceeds a threshold.
Further, after a fault has been repaired, the fault restoration evaluation module 40 also scores each repair result according to the running state of the system, periodically submits the self-repair execution processes that received low scores to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
Further, the time detecting unit 410 works as follows: when the fault restoration module 30 receives the alarm information from the fault message identification module 20, the time detecting unit 410 reads and records the current system time; after the fault restoration module 30 has repaired the fault, the time detecting unit 410 reads and records the current system time again, calculates the interval between the two recorded times, and judges whether that interval exceeds the threshold. When it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding script stored in the script calling module. The threshold is 2 to 3 times the average time needed to repair that fault.
Claims (4)
- 1. An intelligent operation management system, characterized in that it includes a system monitoring module, a fault message identification module, a fault restoration module, and a fault restoration evaluation module; the system monitoring module monitors the running state of the system and, when it detects an abnormality, passes the current state parameters and the detected abnormal condition to the fault information collection module; the fault message identification module examines the abnormal conditions passed on by the collection module to confirm whether each is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair; the fault restoration module, after receiving the alarm information from the fault message identification module, repairs the fault according to its fault characteristics; the fault restoration evaluation module assesses whether the fault restoration result of the fault restoration module is acceptable; the fault restoration evaluation module also includes a time detecting unit, which measures the time spent on fault restoration and judges whether that time exceeds a threshold.
- 2. The intelligent operation management system according to claim 1, characterized in that, after a fault has been repaired, the fault restoration evaluation module also scores each repair result according to the running state of the system, periodically submits the self-repair execution processes that received low scores to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
- 3. The intelligent operation management system according to claim 1, characterized in that the time detecting unit works as follows: when the fault restoration module receives the alarm information from the fault message identification module, the time detecting unit reads and records the current system time; after the fault restoration module has repaired the fault, the time detecting unit reads and records the current system time again, calculates the interval between the two recorded times, and judges whether that interval exceeds the threshold; when it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding script stored in the script calling module.
- 4. The intelligent operation management system according to claim 3, characterized in that the threshold is 2 to 3 times the average time needed to repair that fault.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711049087.1A CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711049087.1A CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107846314A true CN107846314A (en) | 2018-03-27 |
Family
ID=61681217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711049087.1A Pending CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107846314A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197289A (en) * | 2019-06-12 | 2019-09-03 | 众诚恒祥(北京)科技有限公司 | A kind of energy-saving equipment management system based on big data |
WO2019214010A1 (en) * | 2018-05-08 | 2019-11-14 | 网宿科技股份有限公司 | Method and device for monitoring for equipment failure |
WO2021143483A1 (en) * | 2020-01-17 | 2021-07-22 | 中兴通讯股份有限公司 | System maintenance method and apparatus, device, and storage medium |
WO2023045931A1 (en) * | 2021-09-24 | 2023-03-30 | 华为技术有限公司 | Network performance abnormality analysis method and apparatus, and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104038373A (en) * | 2014-05-30 | 2014-09-10 | 国家电网公司 | Information early warning and self repairing system and method |
CN105262616A (en) * | 2015-09-21 | 2016-01-20 | 浪潮集团有限公司 | Failure repository-based automated failure processing system and method |
CN105550100A (en) * | 2015-12-11 | 2016-05-04 | 国家电网公司 | Method and system for automatic fault recovery of information system |
CN106204330A (en) * | 2016-07-18 | 2016-12-07 | 国网山东省电力公司济南市历城区供电公司 | A kind of power distribution network intelligent diagnosis system |
CN106209428A (en) * | 2016-06-28 | 2016-12-07 | 武汉合创源科技有限公司 | A kind of website failure monitoring method for early warning and system |
-
2017
- 2017-10-31 CN CN201711049087.1A patent/CN107846314A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104038373A (en) * | 2014-05-30 | 2014-09-10 | 国家电网公司 | Information early warning and self repairing system and method |
CN105262616A (en) * | 2015-09-21 | 2016-01-20 | 浪潮集团有限公司 | Failure repository-based automated failure processing system and method |
CN105550100A (en) * | 2015-12-11 | 2016-05-04 | 国家电网公司 | Method and system for automatic fault recovery of information system |
CN106209428A (en) * | 2016-06-28 | 2016-12-07 | 武汉合创源科技有限公司 | A kind of website failure monitoring method for early warning and system |
CN106204330A (en) * | 2016-07-18 | 2016-12-07 | 国网山东省电力公司济南市历城区供电公司 | A kind of power distribution network intelligent diagnosis system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019214010A1 (en) * | 2018-05-08 | 2019-11-14 | 网宿科技股份有限公司 | Method and device for monitoring for equipment failure |
CN110197289A (en) * | 2019-06-12 | 2019-09-03 | 众诚恒祥(北京)科技有限公司 | A kind of energy-saving equipment management system based on big data |
CN110197289B (en) * | 2019-06-12 | 2020-08-25 | 众诚恒祥(北京)科技有限公司 | Energy-saving equipment management system based on big data |
WO2021143483A1 (en) * | 2020-01-17 | 2021-07-22 | 中兴通讯股份有限公司 | System maintenance method and apparatus, device, and storage medium |
WO2023045931A1 (en) * | 2021-09-24 | 2023-03-30 | 华为技术有限公司 | Network performance abnormality analysis method and apparatus, and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107612756A (en) | A kind of operation management system with intelligent trouble analyzing and processing function | |
CN107862393A (en) | A kind of IT operation management system | |
CN106655522B (en) | A kind of main station system suitable for electric grid secondary equipment operation management | |
CN107846314A (en) | A kind of intelligent operation management system | |
CN108847968B (en) | Monitoring accident and abnormal event identification and multidimensional analysis method | |
CN105337765A (en) | Distributed hadoop cluster fault automatic diagnosis and restoration system | |
CN103078403B (en) | On-line state evaluation method for secondary system of intelligent substation | |
CN107656156B (en) | A kind of equipment fault diagnosis and operating status appraisal procedure and system based on cloud platform | |
CN105634133A (en) | Power supply and distribution monitoring system | |
CN105911424B (en) | A kind of recognition methods based on fault detector false positive signal | |
CN109672175B (en) | Power grid control method and device | |
CN108776625A (en) | A kind of restorative procedure of service fault, device and storage medium | |
CN112434826A (en) | Intelligent analysis and early warning system for operation and maintenance faults of charging facilities of charging pile | |
CN110020791A (en) | A kind of product design method based on liability management | |
CN107390604A (en) | The inspection method and system of unattended operation transformer station electrical secondary system novel maintenance | |
CN112396292A (en) | Substation equipment risk management and control system based on Internet of things and edge calculation | |
CN105067959B (en) | Fault Locating Method under the conditions of ring network power supply | |
CN110350660B (en) | Online monitoring method and system for relay protection function pressing plate | |
CN113471864A (en) | Transformer substation secondary equipment field maintenance device and method | |
CN104977870A (en) | Auxiliary treating system for workshop equipment accidents and method thereof | |
CN104417504B (en) | The security protection subsystem of battery replacement of electric automobile system | |
CN115422504A (en) | Power distribution equipment fault risk identification method and device | |
CN108520788A (en) | A kind of processing system and method for nuclear power plant's unit starting and alarm of stopping transport | |
CN111026097A (en) | Fault self-diagnosis and early-warning method for inspection robot | |
CN104836335A (en) | Method for quickly discovering dead halt of measurement and control apparatus based on open3000 intelligent monitoring system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180327 |