CN107846314A

CN107846314A - An intelligent operation management system

Info

Publication number: CN107846314A
Application number: CN201711049087.1A
Authority: CN
Inventors: 姚小艳
Original assignee: Guangxi Yizhou Union Network Technology Co Ltd
Current assignee: Guangxi Yizhou Union Network Technology Co Ltd
Priority date: 2017-10-31
Filing date: 2017-10-31
Publication date: 2018-03-27

Abstract

The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system. The system comprises a system monitoring module, a fault message identification module, a fault restoration module and a fault restoration evaluation module. The system monitoring module monitors the running status of the system and, when it detects an abnormality, passes the current state parameters and the detected abnormal conditions to the fault information collection module. The fault message identification module examines the abnormal conditions collected and forwarded by the collection module to confirm whether they are false alarms, and forwards the information judged to be a fault to the fault restoration module for repair. The fault restoration module, after receiving the warning message from the fault message identification module, repairs the fault according to its fault signature. The fault restoration evaluation module assesses the repair result of the fault restoration module. The invention can repair faults quickly, and can automatically remind the administrator to make optimizations for faults whose repair result is unsatisfactory or whose repair time is too long.

Description

An intelligent operation management system

Technical field

The present invention relates to the technical field of system operation management, and in particular to an intelligent operation management system.

Background art

The scale of IT operation and maintenance systems keeps growing. As a system monitors the performance and network connectivity of network equipment such as servers, virtual machines and switches, operation and maintenance personnel receive more and more monitoring alarms every day. When the system fails, personnel faced with a massive number of O&M indicators find it hard to locate the root cause quickly among them; the alarm storm greatly reduces the speed of problem localization, and the speed of fault recovery depends largely on the experience and response speed of the personnel. It is therefore desirable to establish an intelligent operation platform in which automatic fault diagnosis works together with a rapid recovery system: machine learning models and a big data expert system are built for multiple scenarios, abnormalities of the operation platform are diagnosed and located online in real time, and when the system fails, rapid repair is achieved by executing the corresponding strategy so that normal operation is restored.

Summary of the invention

To overcome the above problems, the present invention provides an intelligent operation management system that diagnoses and locates system abnormalities online in real time, achieves rapid repair by executing the corresponding strategy when the system fails, and automatically reminds the system administrator to make optimizations for faults whose repair result is unsatisfactory or whose repair time is too long.

The technical solution adopted by the present invention to solve its technical problem is as follows:

An intelligent operation management system comprises a system monitoring module, a fault message identification module, a fault restoration module and a fault restoration evaluation module.

The system monitoring module monitors the running status of the system. When it detects an abnormality, it passes the current state parameters and the detected abnormal conditions to the fault information collection module.

The fault message identification module examines the abnormal conditions collected and forwarded by the collection module to confirm whether they are false alarms, and forwards the information judged to be a fault to the fault restoration module for repair.

The fault restoration module, after receiving the warning message from the fault message identification module, repairs the fault according to its fault signature.

The fault restoration evaluation module assesses whether the repair result of the fault restoration module is acceptable. The fault restoration evaluation module also includes a time detecting unit, which measures the time spent on the fault repair and judges whether that time is greater than a threshold.
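The hand-off between the modules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Anomaly` type, the limit-based detection rule, the re-sampling rule for false alarms, and the `scripts` lookup table are all hypothetical, since the text does not specify how detection, false-alarm identification, or repair are carried out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Anomaly:
    """Abnormal condition together with the state parameters captured with it."""
    signature: str            # hypothetical fault-signature key, e.g. "cpu_high"
    state: Dict[str, float]   # current state parameters at detection time


def monitor(state: Dict[str, float], limits: Dict[str, float]) -> Optional[Anomaly]:
    """System monitoring module: flag the first metric above its limit (assumed rule)."""
    for name, value in state.items():
        if name in limits and value > limits[name]:
            return Anomaly(signature=name + "_high", state=dict(state))
    return None


def is_false_alarm(anomaly: Anomaly,
                   recheck: Callable[[], Dict[str, float]],
                   limits: Dict[str, float]) -> bool:
    """Fault message identification module: re-sample the metric (assumed rule).

    If the metric is back within its limit on the second sample, the
    anomaly is treated as a false alarm.
    """
    metric = anomaly.signature[: -len("_high")]
    return recheck().get(metric, 0.0) <= limits[metric]


def repair(anomaly: Anomaly, scripts: Dict[str, Callable[[], bool]]) -> bool:
    """Fault restoration module: run the script registered for the fault signature."""
    script = scripts.get(anomaly.signature)
    return script() if script is not None else False


def handle(state: Dict[str, float],
           limits: Dict[str, float],
           recheck: Callable[[], Dict[str, float]],
           scripts: Dict[str, Callable[[], bool]]) -> str:
    """Run one pass: monitor, identify, then repair only confirmed faults."""
    anomaly = monitor(state, limits)
    if anomaly is None:
        return "normal"
    if is_false_alarm(anomaly, recheck, limits):
        return "false_alarm"
    return "repaired" if repair(anomaly, scripts) else "repair_failed"
```

Only anomalies that survive the false-alarm check reach the repair step, which mirrors the order of the modules in the text. The evaluation module is left out here because its scoring and timing behaviour is described separately.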

Further, the fault restoration evaluation module is also used, after a fault has been repaired, to score each repair result according to the running status of the system, to periodically submit the self-repair execution processes with low scores to the system administrator for analysis, and to prompt the administrator to optimize the corresponding scripts stored in the script calling module.
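One way the scoring and periodic review could look is sketched below. The scoring rule (share of monitored metrics back within their limits), the 0 to 100 scale, the `pass_mark` of 60, and the `RepairRecord` fields are assumptions made for illustration; the text only states that results are scored from the running status and that low-scoring processes are submitted to the administrator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepairRecord:
    """One self-repair execution, as kept for later review (hypothetical fields)."""
    fault_signature: str
    script_name: str
    score: float  # 0 to 100, higher is better


def score_repair(state_after: Dict[str, float], limits: Dict[str, float]) -> float:
    """Score a repair as the percentage of monitored metrics within their limits.

    A metric missing from state_after counts as out of limits.
    """
    if not limits:
        return 100.0
    ok = sum(1 for name, limit in limits.items()
             if state_after.get(name, float("inf")) <= limit)
    return 100.0 * ok / len(limits)


def low_score_report(records: List[RepairRecord],
                     pass_mark: float = 60.0) -> Dict[str, List[RepairRecord]]:
    """Group low-scoring repairs by script so the administrator sees which scripts to optimize."""
    report: Dict[str, List[RepairRecord]] = {}
    for record in records:
        if record.score < pass_mark:
            report.setdefault(record.script_name, []).append(record)
    return report
```

Calling `low_score_report` from a scheduled job would give the periodic submission described in the text. Grouping by script name is a design choice that points the administrator directly at the script to revise.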

Further, the workflow of the time detecting unit is as follows. When the fault restoration module receives the warning message from the fault message identification module, the time detecting unit detects and records the current system time. After the fault restoration module has repaired the fault, the time detecting unit again detects and records the current system time, calculates the interval between the two recorded times, and judges whether the interval is greater than the threshold. When the interval is greater than the threshold, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module.
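The two-timestamp workflow above can be sketched as a small class. The class and method names are hypothetical. The text records the current system time; this sketch instead defaults to Python's `time.monotonic`, a deliberate substitution so that wall-clock adjustments during a repair cannot distort the measured interval. A clock function can be injected for testing.

```python
import time
from typing import Callable, Optional, Tuple


class TimeDetectingUnit:
    """Measures how long a repair takes and flags it when it exceeds a threshold."""

    def __init__(self, threshold_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s
        self._clock = clock
        self._start: Optional[float] = None

    def on_alarm_received(self) -> None:
        """First timestamp: the restoration module has received the warning message."""
        self._start = self._clock()

    def on_repair_finished(self) -> Tuple[float, bool]:
        """Second timestamp: return (elapsed seconds, whether the threshold was exceeded)."""
        if self._start is None:
            raise RuntimeError("on_alarm_received() was not called first")
        elapsed = self._clock() - self._start
        self._start = None
        return elapsed, elapsed > self.threshold_s
```

A caller would submit the self-repair execution process to the administrator whenever the second element of the returned pair is true.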

Further, the threshold is 2 to 3 times the average time required to repair the fault.
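As a worked illustration of this rule, the threshold can be derived from past repair durations for the same fault. The function name, the use of a plain arithmetic mean over a history list, and the default factor of 2.5 are assumptions; the text fixes only the 2 to 3 times range.

```python
from statistics import mean
from typing import Sequence


def repair_time_threshold(history_s: Sequence[float], factor: float = 2.5) -> float:
    """Return factor times the average past repair time for one fault type.

    history_s holds earlier repair durations in seconds; factor must lie
    in the 2 to 3 range stated in the text.
    """
    if not 2.0 <= factor <= 3.0:
        raise ValueError("factor must be between 2 and 3")
    if not history_s:
        raise ValueError("no repair history for this fault")
    return factor * mean(history_s)
```

For example, with past repairs of 10, 20 and 30 seconds the average is 20 seconds, so a factor of 2 gives a threshold of 40 seconds and a factor of 3 gives 60 seconds.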

The beneficial effects of the invention are as follows. The monitoring module can monitor the system comprehensively. When an abnormality is detected, the fault message identification module examines the abnormal information to determine whether it is a fault; for information judged to be a fault, the fault restoration module repairs the fault effectively; and the fault restoration evaluation module assesses the repair result, so that faults with unsatisfactory repair results can be submitted to the system administrator for analysis. The system can also remind the system administrator to analyze and optimize the system when a repair takes too long. The system can therefore not only diagnose and locate abnormalities online and achieve rapid repair by executing the corresponding strategy when the system fails, but also automatically remind the system administrator to make optimizations for faults whose repair result is unsatisfactory or whose repair time is too long, continuously improving the effect and efficiency of fault repair.

Brief description of the drawings

Fig. 1 is a structural block diagram of the intelligent operation management system according to a preferred embodiment of the present invention.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.

It should be noted that when a component is referred to as being "fixed on" another component, it can be directly on the other component or an intermediate component may be present. When a component is considered to be "connected to" another component, it can be directly connected to the other component or an intermediate component may be present at the same time. When a component is considered to be "arranged on" another component, it can be arranged directly on the other component or an intermediate component may be present at the same time. The terms "vertical", "horizontal", "left", "right" and similar expressions used herein are for illustrative purposes only.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the present invention are intended only to describe specific embodiments and are not intended to limit the present invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Referring to Fig. 1, a preferred embodiment of the present invention provides an intelligent operation management system comprising a system monitoring module 10, a fault message identification module 20, a fault restoration module 30 and a fault restoration evaluation module 40. The system monitoring module 10 monitors the running status of the system and, when it detects an abnormality, passes the current state parameters and the detected abnormal conditions to the fault information collection module 20. The fault message identification module 20 examines the abnormal conditions collected and forwarded by the collection module to confirm whether they are false alarms, and forwards the information judged to be a fault to the fault restoration module 30 for repair. The fault restoration module 30, after receiving the warning message from the fault message identification module, repairs the fault according to its fault signature. The fault restoration evaluation module 40 assesses whether the repair result of the fault restoration module is acceptable. The fault restoration evaluation module 40 also includes a time detecting unit 410, which measures the time spent on the fault repair and judges whether that time is greater than a threshold.

Further, the fault restoration evaluation module 40 is also used, after a fault has been repaired, to score each repair result according to the running status of the system, to periodically submit the self-repair execution processes with low scores to the system administrator for analysis, and to prompt the administrator to optimize the corresponding scripts stored in the script calling module.

Further, the workflow of the time detecting unit 410 is as follows. When the fault restoration module 30 receives the warning message from the fault message identification module 20, the time detecting unit 410 detects and records the current system time. After the fault restoration module 30 has repaired the fault, the time detecting unit 410 again detects and records the current system time, calculates the interval between the two recorded times, and judges whether the interval is greater than the threshold. When the interval is greater than the threshold, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module. The threshold is 2 to 3 times the average time required to repair the fault.

Claims

1. An intelligent operation management system, characterized in that it comprises a system monitoring module, a fault message identification module, a fault restoration module and a fault restoration evaluation module;

the system monitoring module is used to monitor the running status of the system and, when an abnormality is detected, to pass the current state parameters and the detected abnormal conditions to the fault information collection module;

the fault message identification module is used to examine the abnormal conditions collected and forwarded by the collection module to confirm whether they are false alarms, and to forward the information judged to be a fault to the fault restoration module for repair;

the fault restoration module is used, after receiving the warning message from the fault message identification module, to repair the fault according to its fault signature;

the fault restoration evaluation module is used to assess whether the repair result of the fault restoration module is acceptable; the fault restoration evaluation module also includes a time detecting unit, which is used to measure the time spent on the fault repair and to judge whether that time is greater than a threshold.
2. The intelligent operation management system according to claim 1, characterized in that the fault restoration evaluation module is also used, after a fault has been repaired, to score each repair result according to the running status of the system, to periodically submit the self-repair execution processes with low scores to the system administrator for analysis, and to prompt the administrator to optimize the corresponding scripts stored in the script calling module.
3. The intelligent operation management system according to claim 1, characterized in that the workflow of the time detecting unit is as follows: when the fault restoration module receives the warning message from the fault message identification module, the time detecting unit detects and records the current system time; after the fault restoration module has repaired the fault, the time detecting unit again detects and records the current system time, calculates the interval between the two recorded times, and judges whether the interval is greater than the threshold; when the interval is greater than the threshold, the self-repair execution process for that fault is submitted to the system administrator for analysis, and the administrator is prompted to optimize the corresponding scripts stored in the script calling module.
4. The intelligent operation management system according to claim 3, characterized in that the threshold is 2 to 3 times the average time required to repair the fault.