CN107846314A - An intelligent operation management system - Google Patents

Info

Publication number
CN107846314A
Authority
CN
China
Prior art keywords
module
fault
fault restoration
time
failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711049087.1A
Other languages
Chinese (zh)
Inventor
姚小艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Yizhou Union Network Technology Co Ltd
Original Assignee
Guangxi Yizhou Union Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-10-31
Filing date
2017-10-31
Publication date
2018-03-27

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0677: Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system. The system comprises a system monitoring module, a fault information identification module, a fault restoration module, and a fault restoration evaluation module. The system monitoring module monitors the running state of the system; when it detects an anomaly, it passes the current state parameters and the detected anomaly to the fault information collection module. The fault information identification module examines the anomalies collected and passed on by the collection module to determine whether each one is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair. On receiving the alarm information from the fault information identification module, the fault restoration module repairs the fault according to its fault signature. The fault restoration evaluation module evaluates the repair results of the fault restoration module. The invention repairs faults quickly and automatically reminds the administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long.

Description

An intelligent operation management system
Technical field
The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system.
Background
IT operations systems keep growing in scale. As a system monitors the performance and network connectivity of network equipment such as servers, virtual machines, and switches, O&M personnel receive more and more monitoring alarms every day. When the system fails, they face a huge number of O&M metrics and struggle to find the root cause quickly among them; the alarm storm greatly slows problem localization, and the speed of fault recovery depends largely on the experience and response speed of the O&M personnel. It is therefore desirable to build an intelligent operations platform in which automatic fault diagnosis works together with rapid system recovery: machine learning models and a big-data expert system are built for multiple scenarios, anomalies on the operations platform are diagnosed and located online in real time, and when the system fails, the corresponding strategy is executed to repair it quickly and restore normal operation.
Summary of the invention
To overcome the above problems, the present invention provides an intelligent operation management system that diagnoses and locates system anomalies online in real time, repairs faults quickly by executing the corresponding strategy when the system fails, and automatically reminds the system administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long.
The technical solution adopted by the present invention to solve this technical problem is:
An intelligent operation management system comprising a system monitoring module, a fault information identification module, a fault restoration module, and a fault restoration evaluation module;
The system monitoring module monitors the running state of the system; when it detects an anomaly, it passes the current state parameters and the detected anomaly to the fault information collection module;
The fault information identification module examines the anomalies collected and passed on by the collection module to determine whether each one is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair;
On receiving the alarm information from the fault information identification module, the fault restoration module repairs the fault according to its fault signature;
The fault restoration evaluation module evaluates whether the repair result of the fault restoration module is acceptable. The fault restoration evaluation module further includes a time detection unit, which measures the time spent on a fault repair and determines whether that time exceeds a threshold.
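The hand-offs between the four modules described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `Anomaly`, `monitor`, `identify`, `repair`, and `handle`, the limit-based anomaly check, the set of known false alarms, and the dictionary of repair actions keyed by fault signature are all assumptions introduced here, and the collection step is folded into the hand-off between `monitor` and `identify`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Anomaly:
    """What the monitoring module hands over: the anomaly plus state parameters."""
    kind: str                 # hypothetical fault signature, e.g. "disk"
    state: Dict[str, float]   # state parameters at detection time


def monitor(state: Dict[str, float], limits: Dict[str, float]) -> Optional[Anomaly]:
    """System monitoring module (assumed rule): first parameter above its limit."""
    for name, limit in limits.items():
        if state.get(name, 0.0) > limit:
            return Anomaly(kind=name, state=dict(state))
    return None


def identify(anomaly: Anomaly, known_false_alarms: Set[str]) -> bool:
    """Fault information identification module: True means a real fault."""
    return anomaly.kind not in known_false_alarms


def repair(anomaly: Anomaly, scripts: Dict[str, Callable[[Anomaly], bool]]) -> bool:
    """Fault restoration module: choose a repair action by fault signature."""
    action = scripts.get(anomaly.kind)
    return action(anomaly) if action else False


def handle(state, limits, known_false_alarms, scripts) -> str:
    """Run one pass through the modules and return the outcome as a label."""
    anomaly = monitor(state, limits)
    if anomaly is None:
        return "normal"
    if not identify(anomaly, known_false_alarms):
        return "false_alarm"
    # Fault restoration evaluation module, reduced here to pass/fail.
    return "repaired" if repair(anomaly, scripts) else "repair_failed"
```

A caller would supply its own limits and repair actions, for example `handle({"cpu": 0.5, "disk": 0.99}, {"cpu": 0.9, "disk": 0.95}, set(), {"disk": lambda a: True})`.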
Further, after a fault has been repaired, the fault restoration evaluation module also scores each repair result according to the running state of the system, periodically submits the low-scoring self-repair execution processes to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
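The scoring step above could look something like the sketch below. The source does not say how a score is computed, so the rule used here (the share of monitored parameters back within their limits), the 0 to 100 scale, the passing mark of 60, and the names `RepairRecord`, `score_repair`, and `low_score_report` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepairRecord:
    fault_kind: str   # hypothetical fault signature
    script: str       # hypothetical name of the repair script that ran
    score: float      # 0-100, higher is better (assumed scale)


def score_repair(state: Dict[str, float], limits: Dict[str, float]) -> float:
    """Score a repair by the share of monitored parameters back within limits."""
    if not limits:
        return 100.0
    healthy = sum(
        1 for name, limit in limits.items()
        if state.get(name, float("inf")) <= limit  # a missing reading counts as unhealthy
    )
    return 100.0 * healthy / len(limits)


def low_score_report(records: List[RepairRecord],
                     passing: float = 60.0) -> Dict[str, List[RepairRecord]]:
    """Group low-scoring repairs by script so the administrator sees what to optimize."""
    report: Dict[str, List[RepairRecord]] = {}
    for record in records:
        if record.score < passing:
            report.setdefault(record.script, []).append(record)
    return report
```

Running `low_score_report` on a schedule (daily, say) would correspond to the periodic submission to the administrator; the schedule itself is left to the caller.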
Further, the time detection unit works as follows: when the fault restoration module receives the alarm information from the fault information identification module, the time detection unit reads and records the current system time; once the fault restoration module has repaired the fault, the time detection unit again reads and records the current system time, computes the interval between the two readings, and determines whether that interval exceeds the threshold. If it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module.
Further, the threshold is 2 to 3 times the average time needed to repair that fault.
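The timing workflow and the 2 to 3 times threshold described above can be sketched as below. This is an illustration under stated assumptions: the class name `TimeDetectionUnit`, its methods, and the injected `clock` are hypothetical, and the sketch uses Python's `time.monotonic` instead of the wall-clock system time the source mentions, so that clock adjustments cannot distort the measured interval. Adding every measured repair to the history is also a choice made here, not something the source specifies.

```python
import time
from statistics import mean
from typing import Callable, List, Optional


class TimeDetectionUnit:
    """Times one repair and flags it when it exceeds factor x the average repair time."""

    def __init__(self, history: List[float], factor: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not 2.0 <= factor <= 3.0:
            raise ValueError("the source gives a factor between 2 and 3")
        self.history = list(history)   # past repair durations for this fault, in seconds
        self.factor = factor
        self.clock = clock
        self._start: Optional[float] = None

    def alarm_received(self) -> None:
        """First reading: the fault restoration module has received the alarm."""
        self._start = self.clock()

    def repair_finished(self) -> bool:
        """Second reading; return True when the repair should go to the administrator."""
        if self._start is None:
            raise RuntimeError("alarm_received() was not called")
        elapsed = self.clock() - self._start
        self._start = None
        # With no history there is no average yet, so nothing is flagged.
        too_slow = bool(self.history) and elapsed > self.factor * mean(self.history)
        self.history.append(elapsed)
        return too_slow
```

For example, with past repairs of 10 s each and a factor of 2, the threshold is 20 s, so a 25 s repair would be flagged and a 15 s repair would not.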
The beneficial effects of the invention are as follows. The monitoring module can monitor the system comprehensively. When an anomaly is detected, the fault information identification module examines the anomaly information to determine whether it is a fault; for information judged to be a fault, the fault restoration module repairs the fault effectively. The fault restoration evaluation module evaluates the repair results of the fault restoration module and can submit faults whose repair result is unsatisfactory to the system administrator for analysis. When a repair takes too long, the system can also remind the system administrator to analyze the case and optimize the system accordingly. The system therefore not only diagnoses and locates anomalies online and repairs faults quickly by executing the corresponding strategy when the system fails, but also automatically reminds the system administrator to optimize the handling of faults whose repair result is unsatisfactory or whose repair takes too long, continuously improving the effect and efficiency of fault repair.
Brief description of the drawings
Fig. 1 is a structural block diagram of the intelligent operation management system according to a preferred embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that when a component is said to be "fixed to" another component, it may be directly on the other component or an intervening component may be present. When a component is said to be "connected to" another component, it may be directly connected to the other component or an intervening component may be present. When a component is said to be "arranged on" another component, it may be arranged directly on the other component or an intervening component may be present. The terms "vertical", "horizontal", "left", "right" and similar expressions are used herein for illustrative purposes only.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the present invention. The terms used in this description are intended only to describe specific embodiments and are not intended to limit the present invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to Fig. 1, a preferred embodiment of the present invention provides an intelligent operation management system comprising a system monitoring module 10, a fault information identification module 20, a fault restoration module 30, and a fault restoration evaluation module 40. The system monitoring module 10 monitors the running state of the system; when it detects an anomaly, it passes the current state parameters and the detected anomaly to the fault information collection module 20. The fault information identification module 20 examines the anomalies collected and passed on by the collection module to determine whether each one is a false alarm, and forwards the information judged to be a fault to the fault restoration module 30 for repair. On receiving the alarm information from the fault information identification module, the fault restoration module 30 repairs the fault according to its fault signature. The fault restoration evaluation module 40 evaluates whether the repair result of the fault restoration module is acceptable. The fault restoration evaluation module 40 further includes a time detection unit 410, which measures the time spent on a fault repair and determines whether that time exceeds a threshold.
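The embodiment does not say how module 20 decides that an anomaly is a false alarm. Purely as an illustration of what such a check might look like, the sketch below treats an anomaly as a real fault only after it has been seen in several consecutive samples. The rule, the class name `FalseAlarmFilter`, and the default of three samples are all assumptions and are not taken from the patent.

```python
from collections import deque
from typing import Deque


class FalseAlarmFilter:
    """Confirm a fault only after `needed` consecutive abnormal samples (assumed rule)."""

    def __init__(self, needed: int = 3) -> None:
        if needed < 1:
            raise ValueError("needed must be at least 1")
        # Keeps only the most recent `needed` observations.
        self.recent: Deque[bool] = deque(maxlen=needed)

    def observe(self, abnormal: bool) -> bool:
        """Record one sample; return True once the anomaly counts as a real fault."""
        self.recent.append(abnormal)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

Under this rule a single abnormal reading followed by a normal one is dropped as a false alarm, while a sustained anomaly is forwarded for repair.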
Further, after a fault has been repaired, the fault restoration evaluation module 40 also scores each repair result according to the running state of the system, periodically submits the low-scoring self-repair execution processes to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
Further, the time detection unit 410 works as follows: when the fault restoration module 30 receives the alarm information from the fault information identification module 20, the time detection unit 410 reads and records the current system time; once the fault restoration module 30 has repaired the fault, the time detection unit 410 again reads and records the current system time, computes the interval between the two readings, and determines whether that interval exceeds the threshold. If it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module. The threshold is 2 to 3 times the average time needed to repair that fault.

Claims (4)

  1. An intelligent operation management system, characterized in that it comprises a system monitoring module, a fault information identification module, a fault restoration module, and a fault restoration evaluation module;
    The system monitoring module monitors the running state of the system; when it detects an anomaly, it passes the current state parameters and the detected anomaly to the fault information collection module;
    The fault information identification module examines the anomalies collected and passed on by the collection module to determine whether each one is a false alarm, and forwards the information judged to be a fault to the fault restoration module for repair;
    On receiving the alarm information from the fault information identification module, the fault restoration module repairs the fault according to its fault signature;
    The fault restoration evaluation module evaluates whether the repair result of the fault restoration module is acceptable; the fault restoration evaluation module further includes a time detection unit, which measures the time spent on a fault repair and determines whether that time exceeds a threshold.
  2. The intelligent operation management system according to claim 1, characterized in that, after a fault has been repaired, the fault restoration evaluation module also scores each repair result according to the running state of the system, periodically submits the low-scoring self-repair execution processes to the system administrator for analysis, and prompts the administrator to optimize the corresponding scripts stored in the script calling module.
  3. The intelligent operation management system according to claim 1, characterized in that the time detection unit works as follows: when the fault restoration module receives the alarm information from the fault information identification module, the time detection unit reads and records the current system time; once the fault restoration module has repaired the fault, the time detection unit again reads and records the current system time, computes the interval between the two readings, and determines whether that interval exceeds the threshold; if it does, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module.
  4. The intelligent operation management system according to claim 3, characterized in that the threshold is 2 to 3 times the average time needed to repair that fault.

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201711049087.1A | CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201711049087.1A | CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system

Publications (1)

Publication Number | Publication Date
CN107846314A (en) | 2018-03-27

Family

Family ID: 61681217

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201711049087.1A | Pending | CN107846314A (en) | 2017-10-31 | 2017-10-31 | A kind of intelligent operation management system

Country Status (1)

Country | Link
CN (1) | CN107846314A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104038373A (en) * | 2014-05-30 | 2014-09-10 | 国家电网公司 | Information early warning and self repairing system and method
CN105262616A (en) * | 2015-09-21 | 2016-01-20 | 浪潮集团有限公司 | Failure repository-based automated failure processing system and method
CN105550100A (en) * | 2015-12-11 | 2016-05-04 | 国家电网公司 | Method and system for automatic fault recovery of information system
CN106209428A (en) * | 2016-06-28 | 2016-12-07 | 武汉合创源科技有限公司 | A kind of website failure monitoring method for early warning and system
CN106204330A (en) * | 2016-07-18 | 2016-12-07 | 国网山东省电力公司济南市历城区供电公司 | A kind of power distribution network intelligent diagnosis system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2019214010A1 (en) * | 2018-05-08 | 2019-11-14 | 网宿科技股份有限公司 | Method and device for monitoring for equipment failure
CN110197289A (en) * | 2019-06-12 | 2019-09-03 | 众诚恒祥(北京)科技有限公司 | A kind of energy-saving equipment management system based on big data
CN110197289B (en) * | 2019-06-12 | 2020-08-25 | 众诚恒祥(北京)科技有限公司 | Energy-saving equipment management system based on big data
WO2021143483A1 (en) * | 2020-01-17 | 2021-07-22 | 中兴通讯股份有限公司 | System maintenance method and apparatus, device, and storage medium
WO2023045931A1 (en) * | 2021-09-24 | 2023-03-30 | 华为技术有限公司 | Network performance abnormality analysis method and apparatus, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180327
