CN117076264A - Alarm event processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN117076264A
Authority
CN
China
Prior art keywords
alarm
event
alarm event
target
initial
Prior art date
Legal status
Pending
Application number
CN202311048663.6A
Other languages
Chinese (zh)
Inventor
黄勇
俞嘉敏
Current Assignee
China Dadi Property Insurance Co ltd
Original Assignee
China Dadi Property Insurance Co ltd
Priority date
Filing date
Publication date
Application filed by China Dadi Property Insurance Co ltd
Priority to CN202311048663.6A
Publication of CN117076264A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Alarm Systems (AREA)

Abstract

The application discloses a method, an apparatus, a device and a storage medium for processing alarm events, relating to the field of intelligent alarms. The method comprises: acquiring alarm events from preset monitoring alarm tools and integrating them to obtain initial alarm events; rating the initial alarm events according to a preset health rating rule and judging whether event tracking is required for them; if so, generating a target alarm event from the initial alarm event, executing the corresponding event operation, grouping the target alarm events, and generating the corresponding alarm fault lists; and analyzing the alarm fault lists and sending the analysis result to the service system so that the service system can adjust according to the analysis result. The method integrates the data of the existing monitoring alarm tools, presents the status of each tool and its alarm events to operation and maintenance personnel as a "health degree" so that they can manage alarms more conveniently and intuitively, and closes the loop on alarm event processing through the alarm event management flow.

Description

Alarm event processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of intelligent alarm, and in particular, to a method, an apparatus, a device, and a storage medium for processing an alarm event.
Background
In current monitoring and alarm work, most tasks, including communication between different units, are still carried out manually. Notifying someone by phone requires noticing the alarm, identifying the responsible person, looking up the address book and placing the call, so describing a single alarm event takes at least 1 to 3 minutes. Alarms also depend on mailboxes, short messages and similar channels, so notification feedback is inefficient. On average about 15,000 alarms per month are tracked manually; with so many alarm events, staff become fatigued. Manual fault repair is also inefficient: roughly 30 minutes of handling are needed after an alarm event is received, and suspicious points must be analyzed and ruled out using various specialized monitoring tools. Moreover, the existing technology stack monitoring tools are numerous and complicated, cannot be effectively unified or associated with the business, and monitor different dimensions, so alarm analysis and investigation must be carried out across different platforms without a unified view. As a result, alarm history data cannot be used effectively to feed back into production and improve its stability. In addition, the circulation and closed loop of alarm events are handled through ITSM (IT Service Management), which is disconnected from the alarms themselves, so no effective association can be formed. How to notify and process alarm events more effectively is therefore a problem to be solved in the art.
Disclosure of Invention
Accordingly, the present application aims to provide a method, an apparatus, a device and a storage medium for processing alarm events that integrate the data of the existing monitoring alarm tools and present the status of each tool and its alarm events to operation and maintenance personnel as a "health degree", so that they can manage alarms more conveniently and intuitively, and that close the loop on alarm event processing through the alarm event management flow. The specific scheme is as follows:
in a first aspect, the present application provides a method for processing an alarm event, including:
acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event to obtain an initial alarm event, grading the initial alarm event according to a preset health degree grading rule, and judging whether event tracking is required to be carried out on the initial alarm event according to a grading result;
if event tracking is needed, generating a target alarm event according to the initial alarm event, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule, and generating a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event;
analyzing the target alarm event in the alarm fault list, and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
Optionally, the acquiring the alarm event obtained by the preset monitoring alarm tool and integrating the alarm event includes:
acquiring the alarm events obtained by a plurality of preset monitoring alarm tools, and using a Spark component to store the alarm events uniformly in a Hadoop (an open-source software framework providing distributed storage and computation) big data component and/or an ES (Elasticsearch, an open-source, highly scalable distributed full-text search engine) big data component, so as to obtain the initial alarm events.
Optionally, the rating the initial alarm event according to a preset health rating rule, and determining whether the event tracking is required for the initial alarm event according to a rating result includes:
acquiring a historical baseline and an alarm threshold value of the alarm generated by the service system monitored by the preset monitoring alarm tool, and grading the health degree of a technical stack instance of the service system and an upper-layer service system according to the historical baseline and the alarm threshold value to obtain an operation health score of the service system;
and judging whether event tracking is needed for the initial alarm event according to the running health score.
Optionally, if event tracking is required, generating a target alarm event according to the initial alarm event, including:
if event tracking is needed, merging the initial alarm event to an alarm event which is generated in advance to obtain the target alarm event;
or directly generating a corresponding target alarm event according to the initial alarm event.
Optionally, after the directly generating the corresponding target alarm event according to the initial alarm event, the method further includes:
generating an event notification corresponding to the target alarm event, sending the event notification to a first user through mail and/or enterprise WeChat, carrying out a telephone notification related to the event notification on a second user based on a preset notification range through a preset intelligent robot, and putting the event notification into a Kafka message queue; the preset notification range is divided according to a CMDB (Configuration Management Database) resource management system.
Optionally, before generating the event notification corresponding to the target alarm event, the method further includes:
monitoring real-time indexes of the service system, screening target alarm indexes meeting preset alarm threshold conditions from the real-time indexes, carrying out noise reduction processing on the target alarm event according to the target alarm indexes, and generating event notification corresponding to the target alarm event according to the processed target alarm event.
Optionally, the analyzing the target alarm event in the alarm fault list includes:
classifying the target alarm event through a naive Bayesian algorithm according to a preset data classification dimension, and classifying an alarm reason corresponding to the target alarm event so as to analyze the target alarm event in the alarm fault list according to the classification result of the target alarm event and the alarm reason.
In a second aspect, the present application provides an alarm event processing apparatus, including:
the event rating module is used for acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event, obtaining an initial alarm event, rating the initial alarm event according to a preset health rating rule, and judging whether the initial alarm event needs to be subjected to event tracking according to a rating result;
the fault list generation module is used for generating a target alarm event according to the initial alarm event if event tracking is required, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule and generating a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event;
and the event analysis module is used for analyzing the target alarm event in the alarm fault list and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
In a third aspect, the present application provides an electronic device comprising a processor and a memory; the memory is used for storing a computer program, and the computer program is loaded and executed by the processor to realize the alarm event processing method.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements the alarm event handling method described above.
In the application, an alarm event obtained by a preset monitoring alarm tool is acquired and integrated to obtain an initial alarm event, the initial alarm event is rated according to a preset health rating rule, and whether event tracking is required for the initial alarm event is judged according to the rating result; if event tracking is required, a target alarm event is generated according to the initial alarm event, the event operation corresponding to the target alarm event is executed, the target alarm events are grouped according to a preset grouping rule, and a plurality of alarm fault lists corresponding to the target alarm events are generated; the alarm fault list is used for storing the target alarm events to be processed and their related information; the target alarm events in the alarm fault list are analyzed, and the analysis result is sent to the service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result. With this technical scheme, the application integrates the data of the existing monitoring alarm tools and presents the status of each tool and its alarm events to operation and maintenance personnel as a "health degree", so that they can manage alarms more conveniently and intuitively. This avoids the problems that the monitoring tools cover different dimensions and that the numerous technology stack monitoring tools cannot be effectively unified. Intelligent alarming is realized through the fault lists, which improves alarm event processing efficiency without manual handling; alarm history data is used effectively, and alarm event processing is closed-loop through the alarm event management flow.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an alarm event processing method provided by the application;
FIG. 2 is a flow chart of an alarm event data processing provided by the present application;
FIG. 3 is a flowchart of a specific alarm event processing method according to the present application;
FIG. 4 is a timing diagram for processing an alarm event according to the present application;
FIG. 5 is a schematic diagram of an alarm event processing device according to the present application;
FIG. 6 is a block diagram of an electronic device according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In current monitoring and alarm work, most tasks are performed manually, which is inefficient; alarms depend on mailboxes, short messages and similar channels, so notification feedback is slow. In addition, the existing technology stack monitoring tools are numerous and complicated and cannot be effectively unified or associated with the business, the monitoring tools cover different dimensions, and alarm history data cannot be used effectively.
Referring to fig. 1, the embodiment of the application discloses a method for processing an alarm event, which comprises the following steps:
step S11, acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event, obtaining an initial alarm event, grading the initial alarm event according to a preset health degree grading rule, and judging whether event tracking is required for the initial alarm event according to a grading result.
In this embodiment, the data of the existing monitoring alarm tools can be integrated, and the overall status of an application can be presented to operation and maintenance personnel as a "health degree" based on that data. To this end, the alarm events obtained by the preset monitoring alarm tools are first acquired and integrated to obtain initial alarm events; the initial alarm events are then rated according to the preset health rating rule, and whether event tracking is required for them is determined from the rating result. As shown in fig. 2, alarm events from monitoring tools such as Zabbix, log monitoring tools, Dynatrace and Prometheus can be collected for unified alarming, after which alarm processing and fault resolution proceed. By integrating the data of the existing monitoring alarm tools, tracking alarms that currently require a manual phone call can be merged into alarm events. All real-time index data behind the alarm data is monitored to screen out the indexes that trigger an alarm threshold, and the alarm information is then noise-reduced along the resource grouping and system partition dimensions. Compressing alarms into events through noise reduction along different dimensions effectively reduces the alarm tracking workload, achieves alarm compression, facilitates quick scheduling of alarm events, and helps operation and maintenance personnel gain insight into the root cause of a problem.
It should be noted that, after the alarm events obtained by the preset monitoring alarm tools are acquired, they may be stored uniformly in the Hadoop big data component and/or the ES big data component by using the Spark component, so as to obtain the initial alarm events. The Spark component supports efficient unified stream and batch processing of alarm event data and places the alarm events into the big data component in a uniform form. In this way, the alarm events obtained by the monitoring alarm tools are collected and stored uniformly in the big data component, which achieves alarm aggregation and effectively unifies the monitoring alarm data of the individual technology stack monitoring tools.
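The integration step can be illustrated with a minimal sketch that maps tool-specific payloads onto one common record before storage. It is plain Python rather than Spark, and the record type `InitialAlarmEvent`, the `FIELD_MAPS` table and every payload field name are assumptions made for illustration; they are not taken from the application or from the tools' real APIs.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class InitialAlarmEvent:
    source: str     # monitoring tool that raised the alarm
    system: str     # service system the alarm belongs to
    metric: str     # monitored index name
    value: float    # observed value
    timestamp: int  # Unix seconds


# Hypothetical per-tool field mappings; real payloads will differ.
FIELD_MAPS = {
    "zabbix": {"system": "host_group", "metric": "item",
               "value": "lastvalue", "timestamp": "clock"},
    "prometheus": {"system": "job", "metric": "alertname",
                   "value": "value", "timestamp": "ts"},
}


def integrate(source: str, raw: Mapping[str, Any]) -> InitialAlarmEvent:
    """Convert one tool-specific alarm payload into the unified record."""
    mapping = FIELD_MAPS[source]
    return InitialAlarmEvent(
        source=source,
        system=str(raw[mapping["system"]]),
        metric=str(raw[mapping["metric"]]),
        value=float(raw[mapping["value"]]),
        timestamp=int(raw[mapping["timestamp"]]),
    )
```

In a deployment along the lines described, the same mapping would presumably run inside a Spark job whose output is written to the Hadoop and/or ES components.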
Based on the above components, the initial alarm events are rated according to the preset health rating rule as follows. In the big data component, all data can be classified and stored along the service system and technology stack dimensions. A historical baseline and an alarm threshold for alarms raised by the service system monitored by the preset monitoring alarm tool are obtained, and the technology stack instances of the service system and the upper-layer service system are rated for health according to the historical baseline and the alarm threshold, yielding a running health score for the service system. Whether event tracking is required for the initial alarm event is then judged from the running health score. In this way, the running health of the system and its alarm events is evaluated as a score, and the data of the existing monitoring alarm tools is integrated. Alarm event processing is backed by mass storage: the data submitted by each monitoring tool is stored for calculation and analysis and presented to operation and maintenance personnel as a "health degree", providing a unified open capability and helping them manage received alarm events more efficiently.
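The application does not give a scoring formula, so the following is only a sketch under stated assumptions: an instance scores 100 at or below its historical baseline, 0 at or above its alarm threshold, and is linearly interpolated in between; the system score is the worst instance score; and tracking is triggered below a hypothetical cutoff of 60. All function names and the cutoff are illustrative.

```python
from typing import Sequence


def instance_health(value: float, baseline: float, threshold: float) -> float:
    """Score in [0, 100]: 100 at or below the baseline, 0 at or above the threshold."""
    if threshold <= baseline:
        raise ValueError("threshold must exceed baseline")
    ratio = (value - baseline) / (threshold - baseline)
    clamped = min(max(ratio, 0.0), 1.0)
    return 100.0 * (1.0 - clamped)


def system_health(instance_scores: Sequence[float]) -> float:
    """Assumed aggregation: the service system is as healthy as its worst instance."""
    return min(instance_scores)


def needs_tracking(score: float, cutoff: float = 60.0) -> bool:
    """Assumed rule: track the event when the running health score drops below the cutoff."""
    return score < cutoff
```

Taking the minimum makes one failing instance visible at the system level; a weighted mean would be an equally plausible choice if partial degradation should count for less.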
Step S12, if event tracking is needed, generating a target alarm event according to the initial alarm event, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule, and generating a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event.
In this embodiment, if it is determined that the currently generated alarm event needs to be tracked, a target alarm event is generated from the initial alarm event, the event operation corresponding to the target alarm event is executed, the target alarm events are grouped according to a preset grouping rule, and alarm fault lists corresponding to the target alarm events are generated. In the preset grouping rule, the grouping dimensions of alarm events include, but are not limited to, a time dimension, a system dimension and a service dimension. The pending alarm fault lists are generated along these dimensions. Through the alarm fault lists, alarm events can be circulated and processed by management means, which ensures that the service system problem behind every alarm is recovered and that the root cause information of every alarm event is recorded, forming a standard event processing flow and closing the loop. The alarm fault list also supports notifying the relevant persons, which enables accurate scheduling. In this way, the embodiment generates the pending alarm fault lists based on the grouping dimensions of alarm events and thereby realizes the "alarm compression-alarm aggregation" clustering method.
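Grouping along the time, system and service dimensions can be sketched as follows. The dictionary keys `system`, `service` and `timestamp`, the function name `build_fault_lists` and the five-minute window are assumptions for illustration; the application does not specify the grouping rule in this detail.

```python
from collections import defaultdict


def build_fault_lists(events, window_seconds=300):
    """Group target alarm events by (system, service, time window).

    Each group stands in for one alarm fault list holding the pending events.
    """
    fault_lists = defaultdict(list)
    for event in events:
        window = event["timestamp"] // window_seconds  # time dimension
        key = (event["system"], event["service"], window)
        fault_lists[key].append(event)
    return dict(fault_lists)
```

For example, two events of the same system and service that arrive 100 seconds apart within one window end up in the same list, while an event from a different system starts a list of its own.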
And S13, analyzing the target alarm event in the alarm fault list, and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
In this embodiment, the target alarm events may be classified by a naive Bayes algorithm according to preset data classification dimensions, and the alarm reasons corresponding to the target alarm events may likewise be classified, so that the target alarm events in the alarm fault list are analyzed according to the classification results of the events and of their alarm reasons. The preset data classification dimensions include, but are not limited to, the application system, the technology stack, the scope of impact and the handler. For the target alarm events in the alarm fault list, the fault reasons behind each alarm event are classified, for example into change-related reasons, network reasons and so on. This allows the value of the data to be fully mined across the whole process of finding, solving, reviewing, classifying and eradicating problems, so that the stable operation rate of the service system improves continuously. It should be noted that the target alarm events in the alarm fault list may also include historically generated alarm fault lists, so that data can be calculated and analyzed along service dimensions together with historical alarm events, and the analysis can be fed back into production to improve the monitoring of the service system. By analyzing the target alarm events in the alarm fault list, the events and their alarm reasons are classified and the root cause information of the alarms is recorded to form a standard event processing flow, which effectively achieves "event association" and facilitates subsequent event analysis.
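A naive Bayes classifier over categorical dimensions can be sketched in a few lines. The version below uses add-alpha (Laplace) smoothing and reserves one extra slot per dimension for unseen values; the class name, the dimension names `system` and `stack`, and the reason labels are illustrative assumptions, and the application does not state which naive Bayes variant or smoothing it uses.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAlarmClassifier:
    """Categorical naive Bayes with add-alpha smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, dim) -> value counts
        self.values = defaultdict(set)              # dim -> values seen in training

    def fit(self, samples, labels):
        for features, label in zip(samples, labels):
            self.class_counts[label] += 1
            for dim, value in features.items():
                self.feature_counts[(label, dim)][value] += 1
                self.values[dim].add(value)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for dim, value in features.items():
                seen = self.feature_counts[(label, dim)][value]
                # One extra slot keeps probabilities positive for unseen values.
                denom = count + self.alpha * (len(self.values[dim]) + 1)
                score += math.log((seen + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on historical events labeled with their recorded reasons, such a model could suggest a reason class for a new target alarm event; the suggestion would still need confirmation when the root cause is recorded.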
Through the technical scheme, the alarm events obtained by the preset monitoring alarm tool are obtained and integrated, the alarm events are unified and put into a big data component to obtain initial alarm events, the technical stack instance of the service system and the upper-layer service system are subjected to health degree rating according to the historical base line and the alarm threshold value, so that the running health score of the service system is obtained, and whether the initial alarm events need to be subjected to event tracking is judged according to the running health score. If the event tracking is needed, generating a target alarm event according to the initial alarm event, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule, and generating alarm fault lists corresponding to a plurality of target alarm events. And then analyzing the target alarm event in the alarm fault list, and sending the analysis result to a service system corresponding to a preset monitoring alarm tool so that the service system can be adjusted according to the analysis result. Therefore, the data of each existing monitoring alarm tool can be integrated according to the operation health condition of the score evaluation system and the alarm event, so that mass storage is supported when the alarm event is processed, and the received alarm event can be managed more efficiently by showing the health degree to operation and maintenance personnel. And through the intelligent alarm of the alarm fault list, the circulation processing can be carried out on the alarm events through the management means, the service system problems corresponding to all alarms can be ensured to be recovered, and the root cause information of all alarm events can be recorded, so that an event processing standard flow is formed, and the closed loop is completed. 
According to the flow, the fault management of the three-level clustering method of 'alarm compression-alarm aggregation-event association' can be formed, alarms are compressed into events through noise reduction of different dimensions, corresponding alarm fault lists are formed for analysis, and the topology structure is combined to help operation and maintenance personnel to get insight into the root cause of the problem.
Based on the above embodiment, the present application can integrate the data of each existing monitoring alarm tool to process the alarm event, and the notification and tracking process of the alarm event will be described in detail in this embodiment. Referring to fig. 3, the embodiment of the application discloses a specific alarm event processing method, which comprises the following steps:
step S21, acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event, obtaining an initial alarm event, grading the initial alarm event according to a preset health degree grading rule, and judging whether event tracking is required for the initial alarm event according to a grading result.
Step S22, if event tracking is required, merging the initial alarm event to an alarm event which is generated in advance to obtain the target alarm event, or directly generating a corresponding target alarm event according to the initial alarm event.
In this embodiment, as shown in fig. 4, if event tracking is required, in a specific embodiment, the initial alarm event may be combined with an alarm event that has been generated in advance to obtain a target alarm event, and then the target alarm event may be processed according to a processing flow corresponding to the generated alarm event.
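The choice in step S22 between merging into an existing event and creating a new one can be sketched as a lookup on a correlation key. The key `(system, metric)`, the dictionary layout and the function name `merge_or_create` are assumptions for illustration; the application does not say how an initial alarm event is matched to a previously generated one.

```python
def merge_or_create(initial, open_events):
    """Merge `initial` into a matching open event, or create a new target event.

    Returns (target_event, created), where `created` is True only when no
    previously generated event matched.
    """
    key = (initial["system"], initial["metric"])  # assumed correlation key
    target = open_events.get(key)
    created = target is None
    if created:
        target = {"key": key, "alarms": []}
        open_events[key] = target
    target["alarms"].append(initial)
    return target, created
```

The `created` flag lets the caller send a fresh notification only for new events, while merged alarms simply follow the processing flow of the event they joined.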
In another specific embodiment, if no previously generated target alarm event corresponds to the initial alarm event, the corresponding target alarm event is generated directly from the initial alarm event. After the target alarm event is generated, as shown in fig. 4, an event notification corresponding to the target alarm event is generated. The event notification is sent to a first user (that is, the operation and maintenance person on duty) through mail and/or enterprise WeChat; a second user (that is, the on-duty operation and maintenance person of a professional group) is notified by telephone within a preset notification range through a preset intelligent robot and a ChatOps operation and maintenance model; and the event notification is put into a Kafka message queue so that other scenarios that need the alarm information can consume it. The preset notification range is divided according to the CMDB resource management system. Introducing the intelligent robot enables intelligent outbound calls for alarm event notifications and, combined with the CMDB, second-level response to alarm information and accurate notification. When alarm notifications are issued, external display functions such as reports and large screens can also be provided, so that current-status analysis and trend analysis can be performed on the big data component of the foregoing embodiment and the effect of event notification is improved.
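The notification routing can be sketched as a function that plans one action per channel and publishes the event for other consumers. Everything named here is hypothetical: `CMDB_ONCALL` stands in for a lookup against the CMDB, the contact identifiers are placeholders, and a `deque` replaces the Kafka producer so the sketch stays self-contained.

```python
import json
from collections import deque

# Hypothetical CMDB extract: service system -> on-duty contacts of the owning group.
CMDB_ONCALL = {
    "claims": ["claims-dba-oncall"],
    "billing": ["billing-net-oncall"],
}


def dispatch(event, duty_contacts, queue):
    """Plan the notifications for one target alarm event.

    Returns a list of (channel, recipient) pairs and appends the serialized
    event to `queue`, which stands in for a Kafka topic.
    """
    actions = [("mail", contact) for contact in duty_contacts]
    actions += [("wechat", contact) for contact in duty_contacts]
    # Phone calls go only to contacts inside the CMDB-derived notification range.
    actions += [("phone", contact) for contact in CMDB_ONCALL.get(event["system"], [])]
    queue.append(json.dumps(event, sort_keys=True))
    return actions
```

Returning the planned actions instead of sending them directly keeps the routing rule testable; the actual mail, WeChat and outbound-call integrations would consume this plan.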
It should be noted that, before the event notification corresponding to a target alarm event is generated, the real-time indexes of the service system are monitored, the target alarm indexes that meet a preset alarm threshold condition are screened from the real-time indexes, the target alarm event is noise-reduced according to the target alarm indexes, and the event notification is generated from the processed target alarm event. Through this noise reduction, a preset compression algorithm reduces unnecessary alarms as far as possible so that the remainder can be handled more efficiently. When alarms are notified, a ChatOps operation and maintenance model can be constructed and combined with the intelligent robot to place intelligent outbound calls to the persons on duty according to the generated event notification. This helps achieve real-time scheduling of alarm events, displays alarm processing, links with problem management and fault management, and realizes rapid scheduling and closed-loop management of alarm events. In this way, as shown in fig. 4, a topology based on the alarm event master control, the users and the SRE (Site Reliability Engineer) is formed, which helps operation and maintenance personnel gain insight into the root cause of a problem, so that alarm events are processed in a closed loop and intelligent root cause analysis and assisted locating are realized.
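The screening and noise reduction described above can be sketched in two small functions. The preset compression algorithm is not specified in the application, so the rule used here is an assumption: an alarm is kept only if its index currently meets the threshold condition, and repeated alarms for the same `(system, metric)` pair are collapsed to the first one. All names are illustrative.

```python
def screen_target_indices(realtime, thresholds):
    """Return the names of real-time indexes that meet their alarm threshold."""
    return {
        name
        for name, value in realtime.items()
        if name in thresholds and value >= thresholds[name]
    }


def reduce_noise(alarms, target_indices):
    """Keep the first alarm per (system, metric) whose metric is a target index."""
    seen = set()
    kept = []
    for alarm in alarms:
        key = (alarm["system"], alarm["metric"])
        if alarm["metric"] in target_indices and key not in seen:
            seen.add(key)
            kept.append(alarm)
    return kept
```

With real-time readings of 95 for `cpu` and 40 for `mem` against thresholds of 90 and 80, only `cpu` is a target index, so duplicate `cpu` alarms collapse to one and `mem` alarms are dropped before any notification is generated.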
Step S23, grouping the target alarm events according to a preset grouping rule, and generating a plurality of alarm fault lists corresponding to the target alarm events; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event.
Step S24, analyzing the target alarm event in the alarm fault list, and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
For more specific processing procedures in the steps S21, S23, and S24, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no detailed description is given here.
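The grouping in step S23 can be sketched as follows. The grouping rule is modelled as a hypothetical callable that maps an event to a group key; the text does not state the concrete rule, so the key used in the usage note is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def build_fault_lists(
    events: List[dict],
    rule: Callable[[dict], Hashable],
) -> Dict[Hashable, List[dict]]:
    """Group target alarm events into fault lists keyed by the grouping rule."""
    fault_lists: Dict[Hashable, List[dict]] = defaultdict(list)
    for event in events:
        # Each fault list stores the pending event together with its related info.
        fault_lists[rule(event)].append(event)
    return dict(fault_lists)
```

A rule such as `lambda e: (e["system"], e["level"])` would produce one fault list per business system and severity level.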
In this embodiment, if event tracking is required, the initial alarm event is merged into a previously generated alarm event to obtain the target alarm event, or the corresponding target alarm event is generated directly from the initial alarm event. When no previously generated target alarm event corresponds to the initial alarm event, the target alarm event is generated directly from the initial alarm event, an event notification corresponding to the target alarm event is then generated, the event notification is sent to the first user by mail and/or enterprise WeChat, and the second user is notified by telephone about the event notification, within a preset notification range, by a preset intelligent robot. Before the event notification is generated, the real-time indexes of the service system are monitored, target alarm indexes meeting the preset alarm threshold condition are screened from the real-time indexes, noise reduction processing is performed on the target alarm event according to the target alarm indexes, and the event notification is generated from the processed target alarm event. In this way, introducing the intelligent robot enables intelligent outbound calls for alarm event notifications, and combining it with the CMDB enables second-level response to alarm information and accurate notification.
Referring to fig. 5, the embodiment of the application also discloses an alarm event processing device, which comprises:
the event rating module 11 is configured to obtain an alarm event obtained by a preset monitoring alarm tool, integrate the alarm event, obtain an initial alarm event, rate the initial alarm event according to a preset health rating rule, and determine whether to perform event tracking on the initial alarm event according to a rating result;
the fault list generation module 12 is configured to generate a target alarm event according to the initial alarm event if event tracking is required, execute an event operation corresponding to the target alarm event, group the target alarm event according to a preset grouping rule, and generate a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event;
and the event analysis module 13 is used for analyzing the target alarm event in the alarm fault list and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
In this embodiment, alarm events obtained by preset monitoring alarm tools are acquired and integrated to obtain an initial alarm event, the initial alarm event is rated according to a preset health rating rule, and whether event tracking is required for the initial alarm event is judged according to the rating result; if event tracking is required, a target alarm event is generated from the initial alarm event, the event operation corresponding to the target alarm event is executed, the target alarm events are grouped according to a preset grouping rule, and a plurality of alarm fault lists corresponding to the target alarm events are generated; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event; the target alarm events in the alarm fault list are analyzed, and the analysis result is sent to the service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result. In this way, the data of the existing monitoring alarm tools can be integrated, and the status of each tool and its alarm events is presented to operation and maintenance personnel as a health score, which makes management more convenient and intuitive and avoids the problem that monitoring tools with different monitoring dimensions and complex technical stacks cannot be unified effectively. The fault lists enable intelligent alarming and improve the efficiency of alarm event handling without manual processing, and the alarm event management flow, by making effective use of historical alarm data, handles alarm events in a complete closed loop.
In some specific embodiments, the event rating module 11 specifically includes:
the event acquisition unit is used for acquiring the alarm events obtained by the preset monitoring alarm tools, and uniformly storing the alarm events to the HADOOP big data component and/or the ES big data component by utilizing the Spark component so as to obtain the initial alarm event.
In some specific embodiments, the event rating module 11 specifically includes:
the health degree grading unit is used for obtaining a historical baseline and an alarm threshold value of the alarms generated by the business system monitored by the preset monitoring alarm tool, and grading the health degree of the technical stack instances of the business system and of the upper-layer business system according to the historical baseline and the alarm threshold value to obtain a running health score of the business system;
and the event tracking judgment unit is used for judging whether the event tracking of the initial alarm event is required according to the running health score.
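One way to turn a historical baseline and an alarm threshold into a health score is sketched below. The text does not give the scoring formula, so everything here is an assumption made for illustration: the score is linear between the baseline (100) and the threshold (0), the business system takes the score of its worst technical stack instance, and tracking is triggered below a hypothetical cut-off of 60.

```python
from typing import Iterable


def instance_health(value: float, baseline: float, threshold: float) -> float:
    """Score one technical stack instance from 100 (at baseline) down to 0 (at threshold)."""
    if threshold <= baseline:
        raise ValueError("threshold must be greater than baseline")
    if value <= baseline:
        return 100.0
    if value >= threshold:
        return 0.0
    return 100.0 * (threshold - value) / (threshold - baseline)


def system_health(instance_scores: Iterable[float]) -> float:
    """Assumed roll-up: the upper-layer system is as healthy as its worst instance."""
    return min(instance_scores)


def needs_tracking(score: float, cutoff: float = 60.0) -> bool:
    """Assumed rule: track the initial alarm event when the score falls below the cut-off."""
    return score < cutoff
```

With a baseline of 40 and a threshold of 80, a reading of 50 scores 75, which stays above the assumed cut-off, while a reading of 90 scores 0 and would be tracked.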
In some specific embodiments, the fault list generation module 12 specifically includes:
the event merging unit is used for merging the initial alarm event to the alarm event which is generated in advance if the event tracking is needed, so as to obtain the target alarm event;
and the event generation sub-module is used for directly generating a corresponding target alarm event according to the initial alarm event.
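The choice between merging and direct generation can be sketched as a lookup against the target alarm events that are already open. The correlation key (source plus metric) and the event fields are hypothetical; the text does not say how an initial alarm event is matched to a previously generated one.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]


def to_target_event(initial: dict, open_events: Dict[Key, dict]) -> dict:
    """Merge the initial event into an open target event, or create a new one."""
    key: Key = (initial["source"], initial["metric"])  # assumed correlation key
    existing = open_events.get(key)
    if existing is not None:
        # Merge: fold the new occurrence into the event generated earlier.
        existing["count"] += 1
        existing["last_seen"] = initial["timestamp"]
        return existing
    # No matching event exists, so generate the target event directly.
    target = {
        "source": initial["source"],
        "metric": initial["metric"],
        "count": 1,
        "first_seen": initial["timestamp"],
        "last_seen": initial["timestamp"],
    }
    open_events[key] = target
    return target
```

Under these assumptions only the direct-generation branch would go on to produce a new event notification, which matches the order of the units described here.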
In some embodiments, the event generation sub-module further comprises:
a notification generation unit, configured to generate an event notification corresponding to the target alarm event, send the event notification to a first user through mail and/or enterprise WeChat, perform a phone notification related to the event notification on a second user based on a preset notification range through a preset intelligent robot, and put the event notification into a kafka message queue; the preset notification range is divided according to a CMDB resource management system.
In some embodiments, the event generation sub-module further comprises:
the event processing unit is used for monitoring the real-time index of the service system, screening target alarm indexes meeting the preset alarm threshold condition from the real-time indexes, carrying out noise reduction processing on the target alarm event according to the target alarm indexes, and generating event notification corresponding to the target alarm event according to the processed target alarm event.
In some embodiments, the event analysis module 13 specifically includes:
the event classification unit is used for classifying the target alarm event through a naive Bayesian algorithm according to a preset data classification dimension, and classifying an alarm reason corresponding to the target alarm event so as to analyze the target alarm event in the alarm fault list according to the classification results of the target alarm event and the alarm reason.
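The naive Bayesian classification step can be illustrated with a small multinomial naive Bayes classifier over alarm tokens, using Laplace (add-one) smoothing. This is a generic textbook formulation, not the classifier of the application; the token features and the class labels in the usage note are hypothetical stand-ins for the preset data classification dimensions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


class NaiveBayes:
    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> None:
        """Count class frequencies and per-class token frequencies."""
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens: List[str]) -> Optional[str]:
        """Return the label with the highest log posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Laplace smoothing: add one to every token count.
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a few labelled alarms such as `(["disk", "full"], "storage")` and `(["cpu", "high"], "compute")`, the classifier assigns a new alarm containing "disk" to the storage class; the same structure could be trained a second time with alarm causes as labels.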
Further, the embodiment of the present application further discloses an electronic device, and fig. 6 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 6 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps in the alarm event handling method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that performs the alarm event processing method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the alarm event processing method disclosed above. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The application has been described in detail above so that its principles and embodiments may be better understood; meanwhile, those skilled in the art may, following the ideas of the present application, vary the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present application.

Claims (10)

1. A method for processing an alarm event, comprising:
acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event to obtain an initial alarm event, grading the initial alarm event according to a preset health degree grading rule, and judging whether event tracking is required to be carried out on the initial alarm event according to a grading result;
if event tracking is needed, generating a target alarm event according to the initial alarm event, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule, and generating a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event;
analyzing the target alarm event in the alarm fault list, and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
2. The method for processing an alarm event according to claim 1, wherein the acquiring the alarm event obtained by the preset monitoring alarm tool and integrating the alarm event comprises:
and acquiring the alarm events obtained by a plurality of preset monitoring alarm tools, and uniformly storing the alarm events to a HADOOP big data component and/or an ES big data component by utilizing a Spark component so as to obtain the initial alarm event.
3. The method for processing an alarm event according to claim 1, wherein the step of grading the initial alarm event according to a preset health degree grading rule and determining whether the initial alarm event needs to be event tracked according to a grading result comprises:
acquiring a historical baseline and an alarm threshold value of the alarm generated by the service system monitored by the preset monitoring alarm tool, and grading the health degree of a technical stack instance of the service system and an upper-layer service system according to the historical baseline and the alarm threshold value to obtain an operation health score of the service system;
and judging whether event tracking is needed for the initial alarm event according to the running health score.
4. The method for processing an alarm event according to claim 1, wherein generating a target alarm event according to the initial alarm event if event tracking is required comprises:
if event tracking is needed, merging the initial alarm event to an alarm event which is generated in advance to obtain the target alarm event;
or directly generating a corresponding target alarm event according to the initial alarm event.
5. The method for processing an alarm event according to claim 4, wherein after directly generating a corresponding target alarm event according to the initial alarm event, further comprising:
generating an event notification corresponding to the target alarm event, sending the event notification to a first user through a mail and/or enterprise WeChat, carrying out a telephone notification related to the event notification on a second user based on a preset notification range through a preset intelligent robot, and putting the event notification into a kafka message queue; the preset notification range is divided according to a CMDB resource management system.
6. The method for processing an alarm event according to claim 5, further comprising, before generating an event notification corresponding to the target alarm event:
monitoring real-time indexes of the service system, screening target alarm indexes meeting preset alarm threshold conditions from the real-time indexes, carrying out noise reduction processing on the target alarm event according to the target alarm indexes, and generating event notification corresponding to the target alarm event according to the processed target alarm event.
7. The method according to any one of claims 1 to 6, wherein said analyzing the target alarm event in the alarm fault list comprises:
classifying the target alarm event through a naive Bayesian algorithm according to a preset data classification dimension, and classifying an alarm reason corresponding to the target alarm event so as to analyze the target alarm event in the alarm fault list according to the classification result of the target alarm event and the alarm reason.
8. An alarm event handling device, comprising:
the event rating module is used for acquiring an alarm event obtained by a preset monitoring alarm tool, integrating the alarm event, obtaining an initial alarm event, rating the initial alarm event according to a preset health rating rule, and judging whether the initial alarm event needs to be subjected to event tracking according to a rating result;
the fault list generation module is used for generating a target alarm event according to the initial alarm event if event tracking is required, executing event operation corresponding to the target alarm event, grouping the target alarm event according to a preset grouping rule and generating a plurality of alarm fault lists corresponding to the target alarm event; the alarm fault list is used for storing the target alarm event to be processed and the related information of the target alarm event;
and the event analysis module is used for analyzing the target alarm event in the alarm fault list and sending an analysis result to a service system corresponding to the preset monitoring alarm tool so that the service system can be adjusted according to the analysis result.
9. An electronic device comprising a processor and a memory; wherein the memory is for storing a computer program that is loaded and executed by the processor to implement the alarm event handling method of any of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which when executed by a processor implements the alarm event handling method of any of claims 1 to 7.
CN202311048663.6A 2023-08-18 2023-08-18 Alarm event processing method, device, equipment and storage medium Pending CN117076264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311048663.6A CN117076264A (en) 2023-08-18 2023-08-18 Alarm event processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311048663.6A CN117076264A (en) 2023-08-18 2023-08-18 Alarm event processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117076264A true CN117076264A (en) 2023-11-17

Family

ID=88716584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311048663.6A Pending CN117076264A (en) 2023-08-18 2023-08-18 Alarm event processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117076264A (en)

Similar Documents

Publication Publication Date Title
EP3798846B1 (en) Operation and maintenance system and method
CN108039959B (en) Data situation perception method, system and related device
EP3748501B1 (en) Service metric analysis from structured logging schema of usage data
EP2487860B1 (en) Method and system for improving security threats detection in communication networks
US9219639B2 (en) Automated alert management
US11816586B2 (en) Event identification through machine learning
CN110581773A (en) automatic service monitoring and alarm management system
US9798576B2 (en) Updating and redistributing process templates with configurable activity parameters
US11042525B2 (en) Extracting and labeling custom information from log messages
US9772873B2 (en) Generating process templates with configurable activity parameters by merging existing templates
CN113704065A (en) Monitoring method, device, equipment and computer storage medium
CN110209518A (en) A kind of multi-data source daily record data, which is concentrated, collects storage method and device
CN110221947A (en) Warning information method for inspecting, system, computer installation and readable storage medium storing program for executing
CN102882701A (en) Alarm system and method for intelligently monitoring power grid core service data
Solmaz et al. ALACA: A platform for dynamic alarm collection and alert notification in network management systems
CN103986607A (en) Voice-sound-light alarm monitoring system for intelligent data center
CN110363381B (en) Information processing method and device
CN111339062A (en) Data monitoring method and device, electronic equipment and storage medium
CN114116872A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN113592114A (en) User fault reporting research and judgment method and device in power grid, computer equipment and storage medium
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
CN117076264A (en) Alarm event processing method, device, equipment and storage medium
CN116795631A (en) Service system monitoring alarm method, device, equipment and medium
CN110677271A (en) Big data alarm method, device, equipment and storage medium based on ELK
CN114816943A (en) Enterprise intelligent cloud operation and maintenance system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination