CN116627695B - Alarm event root cause recommendation method, device, equipment and storage medium - Google Patents
- Publication number
- CN116627695B (application CN202310594906.XA)
- Authority
- CN
- China
- Prior art keywords
- alarm
- root cause
- event
- root
- target
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Debugging And Monitoring (AREA)
- Alarm Systems (AREA)
Abstract
The invention discloses a method, a device, equipment and a storage medium for recommending the root cause of an alarm event. The method comprises the following steps: acquiring an alarm data sample, and acquiring an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and the corresponding alarm root causes; determining, according to the event root cause ranking, the target influence condition corresponding to each alarm root cause; and determining, according to the target influence conditions, the root cause score corresponding to each alarm root cause, and recommending root causes according to the root cause scores. The acquired alarm data samples are presented to a user, who provides the event root cause ranking; the target influence conditions of each alarm root cause are determined from this ranking, which helps ensure the accuracy of the root cause recommendation. Finally, root causes are recommended according to the root cause scores, so that the true root causes of abnormal problems are located quickly through multi-dimensional input and manual selection, reducing the workload of operation and maintenance personnel and improving their working efficiency.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recommending root causes of alarm events.
Background
In existing enterprises, as operation and maintenance systems are continuously improved and strengthened, various operation and maintenance tools have emerged. These tools cover operation and maintenance monitoring across multiple layers such as business, network, and system, and each layer can be further subdivided into different infrastructure, monitoring facilities, and management facilities. When the system becomes abnormal, tens or even hundreds of alarms may be generated within a short time, making it difficult for operation and maintenance personnel to locate the real cause of the problem.
Alarm event platforms in the prior art can only aggregate different alarms onto the same platform to perform alarm convergence and forwarding. They cannot accurately locate the root cause of an alarm event, so the workload of operation and maintenance personnel is heavy and their working efficiency is low.
Disclosure of Invention
The invention provides a method, an apparatus, a device, and a storage medium for recommending root causes of alarm events, so as to realize root cause analysis of alarm events.
According to an aspect of the present invention, there is provided a method for recommending the root cause of an alarm event, the method comprising:
acquiring an alarm data sample, and acquiring an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and the corresponding alarm root causes;
determining, according to the event root cause ranking, the target influence condition corresponding to each alarm root cause; and
determining, according to the target influence conditions, the root cause score corresponding to each alarm root cause, and recommending root causes according to the root cause scores.
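The three steps can be pictured as a small orchestration function. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: it assumes the event root cause ranking from the first step is already available as a list of cause names, and the hypothetical callables `derive_conditions` and `score` stand in for the second and third steps.

```python
from typing import Callable, Dict, List


def recommend_root_causes(
    ranking: List[str],
    derive_conditions: Callable[[str], Dict[str, float]],
    score: Callable[[Dict[str, float]], float],
    top_n: int = 3,
) -> List[str]:
    """Hypothetical skeleton of the three claimed steps."""
    # Step 1 is assumed done: `ranking` is the user's event root cause ranking.
    # Step 2: derive the target influence conditions for each ranked root cause.
    conditions = {cause: derive_conditions(cause) for cause in ranking}
    # Step 3: score each root cause from its conditions ...
    scores = {cause: score(conds) for cause, conds in conditions.items()}
    # ... and recommend the highest-scoring ones (ties keep the ranking order).
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Keeping the second and third steps as injected functions mirrors the separate modules of the apparatus described below, so each can be replaced without touching the flow.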
Optionally, acquiring the alarm data sample includes: collecting historical alarm events through a data aggregation platform, wherein the historical alarm events comprise historical alarm data and the corresponding alarm root causes; converting the historical alarm data into a specified format to generate standard alarm data; and generating the alarm data sample according to the correspondence between the standard alarm data and the alarm root causes.
Optionally, acquiring the event root cause ranking based on the alarm data samples includes: acquiring a target time range input by a user, and determining the timestamp corresponding to each alarm data sample; screening the timestamps according to the target time range to obtain the target data samples corresponding to the target time range; and acquiring an event root cause ranking input by the user based on the target data samples, wherein the event root cause ranking is the ranking of the alarm root causes corresponding to each piece of standard alarm data.
Optionally, determining the target influence condition corresponding to each alarm root cause according to the event root cause ranking includes: acquiring influence conditions, wherein the influence conditions comprise occurrence time, event level, historical frequency, key characters, event source, belonging hierarchy, and alarm period; sequentially matching the alarm root cause corresponding to each piece of standard alarm data with the influence conditions to determine the root cause influence conditions corresponding to each alarm root cause; determining the occurrence count and occurrence frequency of each root cause influence condition; and, when the occurrence count and occurrence frequency meet preset thresholds, taking the root cause influence condition as a target influence condition.
Optionally, determining the root cause score corresponding to each alarm root cause according to the target influence conditions includes: acquiring weights set by the user for the target influence conditions; and inputting the weights into a preset algorithm to calculate the root cause score corresponding to each alarm root cause.
Optionally, after the target influence conditions are determined according to the event root cause ranking, the method further comprises: displaying the target influence conditions for the user to select; and taking the target influence conditions selected by the user as screened target influence conditions. Correspondingly, determining the root cause score corresponding to each alarm root cause according to the target influence conditions comprises: determining the root cause score corresponding to each alarm root cause according to the screened target influence conditions.
Optionally, performing root cause recommendation according to the root cause scores includes: sorting the alarm root causes in descending order of root cause score to generate a root cause ranking list; and selecting a specified number of alarm root causes from the root cause ranking list for root cause recommendation.
According to another aspect of the present invention, there is provided an alarm event root cause recommendation apparatus, the apparatus comprising:
an event root cause ranking acquisition module, configured to acquire an alarm data sample and acquire an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and the corresponding alarm root causes;
a target influence condition determination module, configured to determine, according to the event root cause ranking, the target influence condition corresponding to each alarm root cause; and
a root cause score calculation and recommendation module, configured to determine, according to the target influence conditions, the root cause score corresponding to each alarm root cause, and to recommend root causes according to the root cause scores.
According to another aspect of the present invention, there is provided an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the alarm event root cause recommendation method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the alarm event root cause recommendation method according to any embodiment of the present invention.
According to the technical solution of the present invention, the acquired alarm data samples are presented to the user to obtain the event root cause ranking, and the target influence condition corresponding to each alarm root cause is determined according to this ranking, which helps ensure the accuracy of root cause recommendation. Root causes are then recommended according to the root cause scores determined for the alarm root causes, so that the true root causes of abnormal problems are located quickly through multi-dimensional input and manual selection, reducing the workload of operation and maintenance personnel and improving their working efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an alarm event root cause recommendation method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of another alarm event root cause recommendation method according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an alarm event root cause recommendation apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device implementing the alarm event root cause recommendation method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of an alarm event root cause recommendation method according to an embodiment of the present invention. The method may be performed by an alarm event root cause recommendation apparatus, which may be implemented in hardware and/or software and configured in a computer. As shown in Fig. 1, the method includes:
S110, acquiring an alarm data sample, and acquiring an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and the corresponding alarm root causes.
Alarm data refers to data generated when the operation and maintenance system runs abnormally. An alarm data sample comprises standard alarm data and the corresponding alarm root cause. Standard alarm data refers to alarm data from multiple platforms whose format has been unified by the user. For example, a data aggregation platform deployed at the bottom layer may collect all types of alarm data from the network, host, and service platforms, then convert, clean, and filter the data using methods such as regular-expression parsing, KV (key-value) decomposition, or JSON parsing, unifying the formats of the various data types to generate standard alarm data. An alarm root cause refers to an important influencing factor behind the alarm data. This embodiment is built on a mature big data analysis and aggregation platform and quickly locates the true root causes of abnormal problems through multi-dimensional input and manual selection.
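As an illustration of the format unification described above, the sketch below converts three hypothetical raw alarm formats (a whitespace-delimited line parsed with a regular expression, a `key=value` string, and a JSON object) into one record type. The record fields, raw layouts, and key names are all assumptions; a real aggregation platform would define its own.

```python
import json
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StandardAlarm:
    """Hypothetical unified ("standard") alarm record; field names are illustrative."""
    timestamp: datetime
    source: str    # system or infrastructure that raised the alarm
    level: str     # e.g. "general", "warning", "serious", "recovery"
    message: str


# Assumed raw line layout: "<date> <time> <source> <level> <free text>"
LINE_RE = re.compile(r"^(\S+ \S+) (\S+) (\S+) (.*)$")


def from_regex(line: str) -> StandardAlarm:
    """Regular-expression parsing of a whitespace-delimited alarm line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised alarm line: {line!r}")
    ts, source, level, message = m.groups()
    return StandardAlarm(datetime.fromisoformat(ts), source, level, message)


def from_kv(line: str) -> StandardAlarm:
    """KV decomposition of an assumed "ts=...;src=...;lvl=...;msg=..." string."""
    fields = dict(part.split("=", 1) for part in line.split(";") if part)
    return StandardAlarm(datetime.fromisoformat(fields["ts"]), fields["src"],
                         fields["lvl"], fields["msg"])


def from_json(text: str) -> StandardAlarm:
    """JSON parsing of an object with assumed keys time, origin, severity, text."""
    obj = json.loads(text)
    return StandardAlarm(datetime.fromisoformat(obj["time"]), obj["origin"],
                         obj["severity"], obj["text"])
```

Each parser ends in the same record type, so downstream steps (sampling, screening, scoring) never need to know which platform an alarm came from.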
Optionally, acquiring the alarm data sample includes: collecting historical alarm events through a data aggregation platform, wherein the historical alarm events comprise historical alarm data and the corresponding alarm root causes; converting the historical alarm data into a specified format to generate standard alarm data; and generating the alarm data sample according to the correspondence between the standard alarm data and the alarm root causes.
Specifically, historical alarm events may come from multiple platforms, for example, the network, hosts, and services. Each historical alarm event includes historical alarm data and the corresponding alarm root cause. By aggregating and cleaning the data from these platforms, the historical alarm data is converted into the specified format to generate standard alarm data, and the alarm data sample is finally generated according to the correspondence between the standard alarm data and the alarm root causes. In addition, when the alarm data samples are generated, the data may be classified hierarchically using a custom dictionary or data labels, so as to better meet event aggregation requirements.
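Pairing standard alarm data with recorded root causes, together with dictionary-based hierarchical labelling, might look like the following. `LAYER_DICT`, the substring-matching rule, and the dictionary keys (`id`, `source`, `message`) are hypothetical; a production dictionary would more likely map exact identifiers than substrings.

```python
from dataclasses import dataclass

# Hypothetical custom dictionary: keyword found in the alarm source -> hierarchy label.
LAYER_DICT = {"sw": "network", "router": "network", "host": "host", "app": "service"}


@dataclass(frozen=True)
class AlarmSample:
    alarm_id: str
    source: str
    message: str
    layer: str        # label assigned from the custom dictionary
    root_cause: str   # root cause recorded for the historical event


def classify_layer(source: str, default: str = "unknown") -> str:
    """Return the label of the first dictionary keyword contained in `source`."""
    for keyword, layer in LAYER_DICT.items():
        if keyword in source:
            return layer
    return default


def build_samples(standard_alarms, root_causes):
    """Pair each standard alarm (a dict) with its root cause, looked up by alarm id."""
    samples = []
    for alarm in standard_alarms:
        cause = root_causes.get(alarm["id"])
        if cause is None:
            continue  # no recorded root cause, so the alarm cannot serve as a sample
        samples.append(AlarmSample(alarm["id"], alarm["source"], alarm["message"],
                                   classify_layer(alarm["source"]), cause))
    return samples
```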
Optionally, acquiring the event root cause ranking based on the alarm data samples includes: acquiring a target time range input by a user, and determining the timestamp corresponding to each alarm data sample; screening the timestamps according to the target time range to obtain the target data samples corresponding to the target time range; and acquiring an event root cause ranking input by the user based on the target data samples, wherein the event root cause ranking is the ranking of the alarm root causes corresponding to each piece of standard alarm data.
The event root cause ranking refers to the result obtained when a large amount of historical alarm data is input into the system as sample data and the historical data is then sorted out manually using the system's training function. The event root cause ranking is the ranking of the alarm root causes corresponding to each piece of standard alarm data.
Specifically, the user can manually select an abnormal time period, that is, input a target time range. The controller then determines the timestamp corresponding to each alarm data sample and screens the timestamps against the target time range to obtain the target data samples within that range. The controller gathers all target data samples within the target time range onto the same page so that the user can rank the alarm root causes. For example, suppose a system once experienced a transaction exception caused by a program memory overflow. When the target time range is selected manually, the system automatically gathers all of its alarms in that period onto the same page, and the user ranks their root causes manually. The resulting event root cause ranking might be: first, memory overflow and flushing; second, service exception; third, the induced network exception; fourth, raised host CPU usage; and fifth, transaction exception.
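A minimal sketch of this step, assuming each sample is a dictionary carrying a `timestamp` key (a hypothetical layout): the first function performs the time-range screening, and the second records the user's manual ordering as a rank per root cause. Both function names are illustrative.

```python
from datetime import datetime


def screen_by_time(samples, start: datetime, end: datetime):
    """Return the samples whose timestamp lies in the target range [start, end] (inclusive)."""
    return [s for s in samples if start <= s["timestamp"] <= end]


def record_ranking(ordered_causes):
    """Map each root cause in the user's manual ordering to its rank (1 = primary root cause)."""
    return {cause: rank for rank, cause in enumerate(ordered_causes, start=1)}
```

Applied to the five-item example above, `record_ranking` gives rank 1 to "memory overflow and flushing" and rank 5 to "transaction exception".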
S120, determining the target influence condition corresponding to each alarm root cause according to the event root cause ranking.
Specifically, using the ranked events from the event root cause ranking, the occurrence frequency, event level, generation source, and number of occurrences within fixed time periods (for example, with every 10 minutes set as one period) of each event are looked up in a large amount of historical data. The importance of each condition in root cause analysis is recommended according to indicators such as occurrence frequency and the frequency of the same source, providing a data-analysis basis for subsequently setting root cause influence conditions manually. By ranking different faults of different historical systems, multiple root cause influence condition recommendations are formed and recorded, and the conditions with the greatest influence on the root cause are finally obtained. For example, the target influence conditions may include: event level (influence degree: high), event source (influence degree: low), occurrence time (influence degree: medium), and belonging hierarchy (influence degree: medium).
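The statistics named above (occurrence frequency, event level, generation source, and counts per fixed 10-minute period) can be gathered with simple counters. This sketch assumes historical alarms are dictionaries with hypothetical `root_cause`, `source`, `level`, and `timestamp` keys, and that the window length divides 60 minutes evenly.

```python
from collections import Counter
from datetime import datetime

WINDOW_MINUTES = 10  # fixed period length taken from the example in the text


def window_start(ts: datetime, minutes: int = WINDOW_MINUTES) -> datetime:
    """Floor a timestamp to the start of its fixed-length window within the hour."""
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def condition_stats(history, cause: str):
    """For one root cause, count occurrences overall, per source, per level and per window."""
    hits = [h for h in history if h["root_cause"] == cause]
    return {
        "frequency": len(hits),
        "by_source": Counter(h["source"] for h in hits),
        "by_level": Counter(h["level"] for h in hits),
        "by_window": Counter(window_start(h["timestamp"]) for h in hits),
    }
```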
S130, determining root cause scores corresponding to the alarm root causes according to the target influence conditions, and recommending the root causes according to the root cause scores.
Optionally, determining the root cause score corresponding to each alarm root cause according to the target influence conditions includes: acquiring weights set by the user for the target influence conditions; and inputting the weights into a preset algorithm to calculate the root cause score corresponding to each alarm root cause.
Specifically, the user can assign weights to the target influence conditions by dividing them into levels 1 to 5: a condition with high influence is set to a higher level, and a condition with low influence is set to a lower level. Within each influence condition, different values are further graded into sub-levels 1 to 3. Taking the alarm frequency (weight level 4) as an example: 0 to 3 historical occurrences is sub-level 3, 3 to 10 historical occurrences is sub-level 2, and more than 10 historical occurrences is sub-level 1. The weights are then input into the preset algorithm to calculate the root cause score corresponding to each alarm root cause. The preset algorithm may be a preset root cause score formula: root cause score = level term (weight level × sub-level) + early-time term + key-information term + hierarchy term. The last three terms are scored according to preset criteria, namely whether the influence condition occurred early, whether it contains key information, and which hierarchy it belongs to. For example, an influence condition occurring before 4:00 may be scored 5, and one occurring after 4:00 may be scored 8. The key information may be set according to the type of alarm event; for example, the key information may be "deadly". The hierarchy refers to the data source; for example, a network event may be scored 5, a business event 8, and a host event 10.
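The scoring rule above can be sketched as follows. This is one reading of the formula under stated assumptions: the level term is taken as weight level × sub-level, a count of exactly 3 falls in the 0 to 3 band, 4:00 itself counts as "after 4:00", and `KEY_INFO_SCORE` is a hypothetical value because the text gives no number for the key-information term.

```python
from datetime import time

# Scores follow the examples in the text, except KEY_INFO_SCORE, which is an assumption.
HIERARCHY_SCORES = {"network": 5, "business": 8, "host": 10}
KEY_WORDS = ("deadly",)
KEY_INFO_SCORE = 5  # hypothetical: the text does not give this value


def frequency_sub_level(history_count: int) -> int:
    """Sub-level of the alarm-frequency condition (a count of exactly 3 is put in the 0-3 band)."""
    if history_count <= 3:
        return 3
    if history_count <= 10:
        return 2
    return 1


def time_score(occurred_at: time) -> int:
    """5 before 4:00 and 8 otherwise, as in the text's example (4:00 itself counts as after)."""
    return 5 if occurred_at < time(4, 0) else 8


def root_cause_score(weight_level: int, history_count: int, occurred_at: time,
                     message: str, hierarchy: str) -> int:
    """level term + early-time term + key-information term + hierarchy term."""
    level_term = weight_level * frequency_sub_level(history_count)
    key_term = KEY_INFO_SCORE if any(k in message.lower() for k in KEY_WORDS) else 0
    return level_term + time_score(occurred_at) + key_term + HIERARCHY_SCORES.get(hierarchy, 0)
```

For instance, with the alarm-frequency weight level of 4, a host event seen twice before, occurring at 10:00, and containing "deadly" scores 4 × 3 + 8 + 5 + 10 = 35 under these assumptions.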
Optionally, after the target influence conditions are determined according to the event root cause ranking, the method further comprises: displaying the target influence conditions for the user to select; and taking the target influence conditions selected by the user as screened target influence conditions. Correspondingly, determining the root cause score corresponding to each alarm root cause according to the target influence conditions comprises: determining the root cause score corresponding to each alarm root cause according to the screened target influence conditions.
Specifically, the user can also screen the target influence conditions manually. The controller displays the target influence conditions through the user terminal, and the user, drawing on their own operation and maintenance experience, selects from them the influencing factors suited to the enterprise's existing environment as input for subsequent root cause analysis. With this manual intervention added, the root cause analysis in this embodiment adapts better to different environments and different enterprises, which improves its flexibility and adaptability and helps enterprises find and solve problems more effectively.
Optionally, performing root cause recommendation according to the root cause scores includes: sorting the alarm root causes in descending order of root cause score to generate a root cause ranking list; and selecting a specified number of alarm root causes from the root cause ranking list for root cause recommendation.
Specifically, the controller may compare the root cause scores of the alarm root causes, sort the alarm root causes in descending order of root cause score to generate a root cause ranking list, and then select a specified number of alarm root causes from the list for recommendation. The specified number may be set according to user needs, for example 3, in which case the controller selects 3 alarm root causes from the ranking list. It should be noted that, by default, the controller selects the 3 alarm root causes with the highest root cause scores as the recommended result; the user may also change the recommendation target, for example, by taking the root causes ranked 5th to 7th by score as the recommended result.
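Ranking and selection can be sketched in a few lines. In this hypothetical helper, `count` is the specified number (default 3) and `start_rank` lets the user shift the window, for example to the 5th through 7th root causes. Python's stable sort keeps equal scores in their input order; that tie-breaking choice is an assumption, since the text does not specify one.

```python
def recommend(scores: dict, count: int = 3, start_rank: int = 1) -> list:
    """Sort root causes by score (high to low) and return `count` of them from `start_rank` (1-based)."""
    if start_rank < 1:
        raise ValueError("start_rank is 1-based")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    first = start_rank - 1
    return [cause for cause, _ in ranked[first:first + count]]
```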
Further, the recommended root causes of the fault can be sent to operation and maintenance personnel. For example, a mail server may be called in the conventional way to send the result to the mailbox of each operation and maintenance engineer, so that faults are found and their root causes analyzed quickly.
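A minimal sketch of the notification step using Python's standard `email` and `smtplib` modules. The subject line, addresses, and SMTP host and port are placeholders, and the text does not specify a message layout; `send_report` should only be called where a reachable mail server exists.

```python
import smtplib
from email.message import EmailMessage


def build_report(recommended, sender, recipients) -> EmailMessage:
    """Compose a plain-text e-mail listing the recommended root causes in rank order."""
    msg = EmailMessage()
    msg["Subject"] = "Alarm root cause recommendation"  # placeholder subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [f"{rank}. {cause}" for rank, cause in enumerate(recommended, start=1)]
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```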
According to the technical solution of this embodiment, the acquired alarm data samples are presented to the user to obtain the event root cause ranking, and the target influence condition corresponding to each alarm root cause is determined according to this ranking, which helps ensure the accuracy of root cause recommendation. Root causes are then recommended according to the root cause scores determined for the alarm root causes, so that the true root causes of abnormal problems are located quickly through multi-dimensional input and manual selection, reducing the workload of operation and maintenance personnel and improving their working efficiency.
Example 2
Fig. 2 is a flowchart of an alarm event root cause recommendation method according to the second embodiment of the present invention. On the basis of the first embodiment, this embodiment adds the specific process of determining the target influence condition corresponding to each alarm root cause according to the event root cause ranking. Steps S210 and S260 are substantially the same as steps S110 and S130 of the first embodiment, so they are not described in detail again here. As shown in Fig. 2, the method includes:
S210, acquiring an alarm data sample, and acquiring an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and the corresponding alarm root causes.
Optionally, acquiring the alarm data sample includes: collecting historical alarm events through a data aggregation platform, wherein the historical alarm events comprise historical alarm data and the corresponding alarm root causes; converting the historical alarm data into a specified format to generate standard alarm data; and generating the alarm data sample according to the correspondence between the standard alarm data and the alarm root causes.
Optionally, acquiring the event root cause ranking based on the alarm data samples includes: acquiring a target time range input by a user, and determining the timestamp corresponding to each alarm data sample; screening the timestamps according to the target time range to obtain the target data samples corresponding to the target time range; and acquiring an event root cause ranking input by the user based on the target data samples, wherein the event root cause ranking is the ranking of the alarm root causes corresponding to each piece of standard alarm data.
S220, acquiring influence conditions, wherein the influence conditions comprise occurrence time, event level, historical frequency, key characters, event source, belonging hierarchy, and alarm period.
An influence condition is a condition that influences the root cause; the influence conditions specifically include occurrence time, event level, historical frequency, key characters, event source, belonging hierarchy, and alarm period. The occurrence time is the specific time at which the alarm event was generated; the event level is one of general, warning, serious, and recovery; the historical frequency is how often the alarm event has occurred historically, which may be expressed as more or less; the key characters are key strings contained in the alarm event, for example "deadly"; the event source is the operation and maintenance system or infrastructure on which the alarm event was generated; the belonging hierarchy indicates whether the alarm event is a network event, a service event, or a host event; and the alarm period is the time span, from one point in time to another, in which the alarm event usually occurs.
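The seven influence conditions could be held in a record such as the one below. The class and field names are illustrative assumptions, the event levels and hierarchies mirror the values listed in the text, and `in_alarm_period` assumes the alarm period does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from enum import Enum
from typing import Tuple


class EventLevel(Enum):
    GENERAL = "general"
    WARNING = "warning"
    SERIOUS = "serious"
    RECOVERY = "recovery"


class Hierarchy(Enum):
    NETWORK = "network"
    SERVICE = "service"
    HOST = "host"


@dataclass
class InfluenceConditions:
    """The seven influence conditions of one alarm event (hypothetical field names)."""
    occurrence_time: datetime         # when the alarm event was generated
    event_level: EventLevel
    historical_frequency: str         # "more" or "less", as described in the text
    key_characters: Tuple[str, ...]   # e.g. ("deadly",)
    event_source: str                 # system or infrastructure that raised the alarm
    hierarchy: Hierarchy              # network, service or host event
    alarm_period: Tuple[time, time]   # usual start and end time of day

    def in_alarm_period(self) -> bool:
        """True if the occurrence time of day lies inside the alarm period (no midnight wrap)."""
        start, end = self.alarm_period
        return start <= self.occurrence_time.time() <= end
```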
S230, sequentially matching the alarm root cause corresponding to each piece of standard alarm data with the influence conditions to determine the root cause influence conditions corresponding to each alarm root cause.
S240, determining the occurrence count and occurrence frequency of each root cause influence condition.
S250, when the occurrence count and occurrence frequency meet preset thresholds, taking the root cause influence condition as a target influence condition.
Specifically, the alarm root cause corresponding to each piece of standard alarm data is matched in turn against the influence conditions: the occurrence frequency, event level, generation source, and number of occurrences within fixed time periods (for example, with every 10 minutes set as one period) of each event are looked up in a large amount of historical data to determine the root cause influence conditions corresponding to each alarm root cause. The occurrence count and frequency of each root cause influence condition are then determined; when they meet the preset thresholds, the root cause influence condition has a high influence on the fault and can be taken as a target influence condition. In this embodiment, the importance of each condition in root cause analysis is recommended according to indicators such as occurrence frequency and the frequency of the same source, providing a data-analysis basis for subsequently setting root cause influence conditions manually.
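Steps S230 to S250 can be sketched as counting, for each alarm root cause, how often each (condition, value) pair appears among that cause's alarms, then keeping the pairs that pass both thresholds. The text does not define how the count and the frequency differ; this sketch assumes an absolute count and a rate relative to the cause's alarm total, and the sample layout and threshold values are hypothetical.

```python
from collections import Counter


def select_target_conditions(samples, min_count: int, min_rate: float):
    """Return, per root cause, the (condition, value) pairs whose count and rate meet both thresholds."""
    # S230: group the influence-condition values observed for each alarm root cause.
    by_cause = {}
    for sample in samples:
        by_cause.setdefault(sample["root_cause"], []).append(sample["conditions"])
    targets = {}
    for cause, condition_sets in by_cause.items():
        # S240: count each (condition, value) pair and derive its rate for this cause.
        counts = Counter(pair for conds in condition_sets for pair in conds.items())
        total = len(condition_sets)
        # S250: keep only pairs that satisfy both preset thresholds.
        targets[cause] = sorted(
            pair for pair, n in counts.items() if n >= min_count and n / total >= min_rate
        )
    return targets
```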
S260, determining the root cause score corresponding to each alarm root cause according to the target influence conditions, and recommending root causes according to the root cause scores.
Optionally, determining the root cause score corresponding to each alarm root cause according to the target influence conditions includes: acquiring weights set by the user for the target influence conditions; and inputting the weights into a preset algorithm to calculate the root cause score corresponding to each alarm root cause.
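The patent leaves the "preset algorithm" unspecified. One plausible choice, shown here purely as an assumption, is a normalised weighted match score: for each alarm root cause, the user-set weights of those target influence conditions that an incoming alarm exhibits are summed and divided by the total weight of that root cause's target conditions, giving a score between 0 and 1. Weights are keyed by condition name (for example `event_source`); the function name `root_cause_scores` and the data shapes are hypothetical.

```python
def root_cause_scores(alarm_conditions, target_conditions, weights, default_weight=1.0):
    """Score each alarm root cause against one incoming alarm (assumed weighted-match scheme).

    alarm_conditions:  set of (name, value) pairs observed on the incoming alarm.
    target_conditions: {root_cause: [(name, value), ...]} target influence conditions.
    weights:           {condition name: weight} as set by the user.
    """
    scores = {}
    for root_cause, conditions in target_conditions.items():
        # Total weight of every target condition of this root cause.
        total = sum(weights.get(name, default_weight) for name, _ in conditions)
        # Weight of the target conditions the incoming alarm actually exhibits.
        matched = sum(
            weights.get(name, default_weight)
            for name, value in conditions
            if (name, value) in alarm_conditions
        )
        scores[root_cause] = matched / total if total else 0.0
    return scores


target_conditions = {
    "db_disk_full": [("event_source", "db-monitor"), ("event_level", "serious"), ("keyword", "fatal")],
    "network_jitter": [("layer", "network"), ("event_level", "warning")],
}
weights = {"event_source": 3.0, "event_level": 1.0, "keyword": 2.0, "layer": 2.0}
incoming = {("event_source", "db-monitor"), ("event_level", "serious"), ("layer", "service")}
scores = root_cause_scores(incoming, target_conditions, weights)
```

Normalising by each root cause's total weight keeps root causes with many target conditions from outscoring others simply because they have more conditions to match.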
Optionally, after the target influence conditions are determined according to the event root cause ranking, the method further includes: displaying the target influence conditions for the user to select; and taking the target influence conditions selected by the user as the screened target influence conditions. Correspondingly, determining the root cause score corresponding to each alarm root cause according to the target influence conditions includes: determining the root cause score corresponding to each alarm root cause according to the screened target influence conditions.
Optionally, recommending root causes according to the root cause scores includes: sorting the alarm root causes in descending order of root cause score to generate a root cause ranking list; and selecting a specified number of alarm root causes from the root cause ranking list as the root cause recommendation.
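The ranking and top-N selection described above can be sketched as follows, assuming the root cause scores are already available as a dict mapping root cause name to score. Breaking ties alphabetically is an added assumption so that the ranking list is deterministic; the text only requires ordering from high to low. The function name `recommend` is hypothetical.

```python
def recommend(scores: dict, top_n: int = 3) -> list:
    """Sort alarm root causes by score (high to low) and return the first top_n names.

    Ties are broken alphabetically so that the ranking list is deterministic.
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    ranking = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [root_cause for root_cause, _ in ranking[:top_n]]


scores = {"db_disk_full": 0.67, "network_jitter": 0.0, "host_cpu_high": 0.67, "dns_error": 0.3}
top_two = recommend(scores, top_n=2)  # ["db_disk_full", "host_cpu_high"]
```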
According to the technical solution of this embodiment, the acquired alarm data samples are displayed to the user to obtain the event root cause ranking, and the target influence conditions corresponding to each alarm root cause are determined according to that ranking, which helps ensure the accuracy of root cause recommendation. Finally, root causes are recommended on the basis of the root cause score of each alarm root cause; by combining multi-dimensional input with manual selection, the true root cause of an abnormal problem can be located quickly, reducing the workload of operation and maintenance personnel and improving their efficiency.
Example III
Fig. 3 is a schematic structural diagram of an alarm event root cause recommending apparatus according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes:
an event root cause ranking acquisition module 310, configured to acquire an alarm data sample and acquire an event root cause ranking based on the alarm data sample, where the alarm data sample includes standard alarm data and the corresponding alarm root causes;
a target influence condition determination module 320, configured to determine, according to the event root cause ranking, the target influence condition corresponding to each alarm root cause; and
a root cause score calculation and recommendation module 330, configured to determine the root cause score corresponding to each alarm root cause according to the target influence conditions, and to recommend root causes according to the root cause scores.
Optionally, the event root cause ranking acquisition module 310 specifically includes an alarm data sample acquisition unit, configured to: collect historical alarm events through a data aggregation platform, where the historical alarm events include historical alarm data and the corresponding alarm root causes; convert the historical alarm data into a specified format to generate standard alarm data; and generate alarm data samples according to the correspondence between the standard alarm data and the alarm root causes.
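A minimal sketch of the alarm data sample acquisition unit, assuming historical alarms arrive from two hypothetical source platforms whose field names differ. `FIELD_MAP` stands in for the "specified format conversion", and `LAYER_DICTIONARY` stands in for the custom dictionary that claim 1 mentions for hierarchical classification; every platform name, field name and dictionary term here is invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical per-platform field names mapped onto the standard alarm fields.
FIELD_MAP = {
    "platform_a": {"ts": "time", "sev": "level", "msg": "message", "host": "source"},
    "platform_b": {"startTime": "time", "priority": "level", "text": "message", "device": "source"},
}

# Hypothetical custom dictionary used for hierarchical (layer) classification.
LAYER_DICTIONARY = {
    "network": ("packet loss", "link down", "bgp"),
    "host": ("cpu", "disk", "memory"),
    "service": ("http 5", "timeout", "exception"),
}


def classify_layer(message: str) -> str:
    """Return the first layer whose dictionary terms appear in the message."""
    text = message.lower()
    for layer, terms in LAYER_DICTIONARY.items():
        if any(term in text for term in terms):
            return layer
    return "unclassified"


def to_standard(raw: dict, platform: str) -> dict:
    """Convert one raw historical alarm into the standard alarm format."""
    record = {std: raw[src] for src, std in FIELD_MAP[platform].items()}
    ts = record["time"]
    # Epoch seconds become UTC datetimes; ISO-8601 strings with an offset are parsed directly.
    if isinstance(ts, (int, float)):
        record["time"] = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:
        record["time"] = datetime.fromisoformat(ts)
    record["level"] = str(record["level"]).lower()
    record["layer"] = classify_layer(record["message"])
    return record


def build_samples(history):
    """history: iterable of (raw_alarm, platform, root_cause) from the aggregation platform."""
    return [(to_standard(raw, platform), root_cause) for raw, platform, root_cause in history]


history = [
    ({"ts": 1684939032, "sev": "SERIOUS", "msg": "Disk usage 98% on /data", "host": "db-01"},
     "platform_a", "db_disk_full"),
    ({"startTime": "2023-05-24T14:40:00+00:00", "priority": "warning",
      "text": "Link down on core switch", "device": "sw-core-1"},
     "platform_b", "network_jitter"),
]
samples = build_samples(history)
```

A data-tag approach, which the claims name as the alternative, would replace `classify_layer` with a lookup of a layer tag already attached to each alarm.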
Optionally, the event root cause ranking acquisition module 310 specifically includes an event root cause ranking acquisition unit, configured to: acquire a target time range input by the user and determine the timestamp corresponding to each alarm data sample; screen the timestamps against the target time range to obtain the target data samples within that range; and acquire the event root cause ranking input by the user based on the target data samples, where the event root cause ranking is the ranking of the alarm root causes corresponding to the standard alarm data.
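The time-range screening step can be sketched as below, assuming each alarm data sample is a dict with hypothetical `time` (a `datetime`) and `root_cause` keys and that the range is inclusive at both ends. The ranking itself is supplied by the user; listing the distinct root causes in the window by frequency is only an assumed way of preparing the candidates shown to the user.

```python
from collections import Counter
from datetime import datetime


def select_target_samples(samples, start: datetime, end: datetime):
    """Keep the alarm data samples whose timestamp falls inside [start, end]."""
    if start > end:
        raise ValueError("start of the target time range is after its end")
    return [s for s in samples if start <= s["time"] <= end]


def root_causes_in_window(target_samples):
    """Distinct alarm root causes in the window, most frequent first.

    Assumed to be the candidate list shown to the user, who then submits the actual ranking.
    """
    counts = Counter(s["root_cause"] for s in target_samples)
    return [cause for cause, _ in counts.most_common()]


samples = [
    {"time": datetime(2023, 5, 24, 9, 0), "root_cause": "db_disk_full"},
    {"time": datetime(2023, 5, 24, 10, 30), "root_cause": "network_jitter"},
    {"time": datetime(2023, 5, 24, 11, 15), "root_cause": "db_disk_full"},
    {"time": datetime(2023, 5, 25, 8, 0), "root_cause": "host_cpu_high"},
]
window = select_target_samples(samples, datetime(2023, 5, 24), datetime(2023, 5, 24, 23, 59, 59))
candidates = root_causes_in_window(window)  # ["db_disk_full", "network_jitter"]
```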
Optionally, the target influence condition determination module 320 is configured to: acquire influence conditions, which include occurrence time, event level, historical frequency, keywords, event source, hierarchy and alarm period; match the alarm root cause corresponding to each piece of standard alarm data against the influence conditions in turn to determine the root cause influence conditions corresponding to each alarm root cause; determine the occurrence count and occurrence rate of each root cause influence condition; and, when the occurrence count and occurrence rate meet preset thresholds, take the root cause influence condition as a target influence condition.
Optionally, the root cause score calculation and recommendation module 330 specifically includes a root cause score determination unit, configured to: acquire weights set by the user for the target influence conditions; and input the weights into a preset algorithm to calculate the root cause score corresponding to each alarm root cause.
Optionally, the apparatus further includes a root cause screening module, configured to: after the target influence conditions are determined according to the event root cause ranking, display the target influence conditions for the user to select; and take the target influence conditions selected by the user as the screened target influence conditions. Correspondingly, the root cause score corresponding to each alarm root cause is determined according to the screened target influence conditions.
Optionally, the root cause score calculation and recommendation module 330 specifically includes a root cause recommendation unit, configured to: sort the alarm root causes in descending order of root cause score to generate a root cause ranking list; and select a specified number of alarm root causes from the root cause ranking list as the root cause recommendation.
The alarm event root cause recommendation apparatus provided by the embodiments of the present invention can execute the alarm event root cause recommendation method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example IV
Fig. 4 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the alarm event root cause recommendation method.
In some embodiments, the alarm event root cause recommendation method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the alarm event root cause recommendation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the alarm event root cause recommendation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS (virtual private server) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (7)
1. An alarm event root cause recommendation method, comprising:
acquiring an alarm data sample, and acquiring an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and corresponding alarm root causes;
determining a target influence condition corresponding to each alarm root cause according to the event root cause ranking, wherein the event root cause ranking is the ranking of the alarm root causes corresponding to the standard alarm data;
determining root cause scores corresponding to the alarm root causes according to the target influence conditions, and recommending the root causes according to the root cause scores;
wherein determining, according to the event root cause ranking, the target influence condition corresponding to each alarm root cause comprises:
obtaining influence conditions, wherein the influence conditions comprise occurrence time, event level, historical frequency, keywords, event source, hierarchy and alarm period; the occurrence time refers to the specific time at which the alarm event is generated; the historical frequency refers to how often the alarm event has occurred historically; the keywords refer to key terms contained in the alarm event; and the alarm period refers to the time period in which the alarm event occurs;
sequentially matching the alarm root causes corresponding to the standard alarm data with the influence conditions to determine root cause influence conditions corresponding to the alarm root causes;
determining the occurrence count and the occurrence rate of each root cause influence condition;
when the occurrence count and the occurrence rate meet preset thresholds, taking the root cause influence condition as the target influence condition;
wherein acquiring the alarm data sample comprises:
collecting historical alarm events through a data aggregation platform, wherein the historical alarm events comprise historical alarm data and corresponding alarm root causes;
Performing specified format conversion on the historical alarm data to generate standard alarm data;
generating the alarm data sample according to the correspondence between the standard alarm data and the alarm root causes, including: performing hierarchical classification of each piece of standard alarm data by means of a custom dictionary or data tags;
wherein acquiring the event root cause ranking based on the alarm data sample comprises:
Acquiring a target time range input by a user, and determining a time stamp corresponding to each alarm data sample;
Screening each time stamp according to the target time range to obtain a target data sample corresponding to the target time range;
and acquiring the event root cause ranking input by the user based on the target data sample.
2. The method of claim 1, wherein said determining a root cause score for each of said alert root causes based on said target impact condition comprises:
acquiring weights set by a user based on the target influence conditions;
and inputting the weight into a preset algorithm to calculate root cause scores corresponding to the alarm root causes.
3. The method of claim 2, further comprising, after said determining the target influence condition according to the event root cause ranking:
displaying the target influence condition for the user to select;
Taking the target influence condition selected by the user as a screened target influence condition;
the determining root cause scores corresponding to the alarm root causes according to the target influence conditions comprises the following steps:
And determining root cause scores corresponding to the alarm root causes according to the screened target influence conditions.
4. The method of claim 1, wherein said making root cause recommendations based on each of said root cause scores comprises:
sorting the alarm root causes in descending order of root cause score to generate a root cause ranking list;
and selecting a specified number of alarm root causes from the root cause ranking list for root cause recommendation.
5. An alarm event root cause recommending apparatus, comprising:
an event root cause ranking acquisition module, configured to acquire an alarm data sample and acquire an event root cause ranking based on the alarm data sample, wherein the alarm data sample comprises standard alarm data and corresponding alarm root causes;
a target influence condition determination module, configured to determine the target influence condition corresponding to each alarm root cause according to the event root cause ranking, wherein the event root cause ranking is the ranking of the alarm root causes corresponding to the standard alarm data;
The root cause score calculating and recommending module is used for determining the root cause score corresponding to each alarm root cause according to the target influence condition and recommending the root cause according to each root cause score;
the target influence condition determining module is specifically configured to:
obtaining influence conditions, wherein the influence conditions comprise occurrence time, event level, historical frequency, keywords, event source, hierarchy and alarm period; the occurrence time refers to the specific time at which the alarm event is generated; the historical frequency refers to how often the alarm event has occurred historically; the keywords refer to key terms contained in the alarm event; and the alarm period refers to the time period in which the alarm event occurs;
sequentially matching the alarm root causes corresponding to the standard alarm data with the influence conditions to determine the root cause influence conditions corresponding to the alarm root causes; determining the occurrence count and the occurrence rate of each root cause influence condition; when the occurrence count and the occurrence rate meet preset thresholds, taking the root cause influence condition as the target influence condition;
the event root cause ranking acquisition module specifically comprises an alarm data sample acquisition unit configured to:
collecting historical alarm events through a data aggregation platform, wherein the historical alarm events comprise historical alarm data and corresponding alarm root causes;
Performing specified format conversion on the historical alarm data to generate standard alarm data;
generating the alarm data sample according to the correspondence between the standard alarm data and the alarm root causes, including: performing hierarchical classification of each piece of standard alarm data by means of a custom dictionary or data tags;
the event root cause ranking acquisition module specifically comprises an event root cause ranking acquisition unit configured to:
Acquiring a target time range input by a user, and determining a time stamp corresponding to each alarm data sample;
Screening each time stamp according to the target time range to obtain a target data sample corresponding to the target time range;
and acquiring the event root cause ranking input by the user based on the target data sample.
6. An electronic device, the electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
7. A computer storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-4 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310594906.XA CN116627695B (en) | 2023-05-24 | 2023-05-24 | Alarm event root cause recommendation method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310594906.XA CN116627695B (en) | 2023-05-24 | 2023-05-24 | Alarm event root cause recommendation method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116627695A CN116627695A (en) | 2023-08-22 |
CN116627695B true CN116627695B (en) | 2024-05-14 |
Family
ID=87637736
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310594906.XA Active CN116627695B (en) | 2023-05-24 | 2023-05-24 | Alarm event root cause recommendation method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116627695B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112087334A (en) * | 2020-09-09 | 2020-12-15 | 中移(杭州)信息技术有限公司 | Alarm root cause analysis method, electronic device and storage medium |
WO2022252512A1 (en) * | 2021-06-01 | 2022-12-08 | 深圳前海微众银行股份有限公司 | Root cause analysis method and apparatus, electronic device, medium, and program |
CN115794473A (en) * | 2022-12-20 | 2023-03-14 | 北京博睿宏远数据科技股份有限公司 | Root cause alarm positioning method, device, equipment and medium |
CN116032725A (en) * | 2022-12-27 | 2023-04-28 | 中国联合网络通信集团有限公司 | Method and device for generating fault root cause positioning model |
Also Published As
Publication number | Publication date |
---|---|
CN116627695A (en) | 2023-08-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN115033463B (en) | System exception type determining method, device, equipment and storage medium | |
CN114662953A (en) | Internet of things equipment operation and maintenance method, device, equipment and medium | |
CN116627695B (en) | Alarm event root cause recommendation method, device, equipment and storage medium | |
CN117034149A (en) | Fault processing strategy determining method and device, electronic equipment and storage medium | |
CN116755974A (en) | Cloud computing platform operation and maintenance method and device, electronic equipment and storage medium | |
CN115774648A (en) | Abnormity positioning method, device, equipment and storage medium | |
CN115048352B (en) | Log field extraction method, device, equipment and storage medium | |
CN114708117B (en) | Power utilization safety check rating method, device and equipment integrating priori knowledge | |
CN115794744A (en) | Log display method, device, equipment and storage medium | |
CN115801589A (en) | Event topological relation determining method, device, equipment and storage medium | |
CN114661562A (en) | Data warning method, device, equipment and medium | |
CN115146986A (en) | Data center equipment maintenance method, device, equipment and storage medium | |
CN114884813A (en) | Network architecture determination method and device, electronic equipment and storage medium | |
CN116149933B (en) | Abnormal log data determining method, device, equipment and storage medium | |
CN116822740A (en) | Power distribution network operation and maintenance scheme determining method and device, electronic equipment and storage medium | |
CN115129538A (en) | Event processing method, device, equipment and medium | |
CN118244055A (en) | Power distribution network fault processing method, device, equipment and storage medium | |
CN117934152A (en) | Risk assessment method, device, equipment and storage medium after system change | |
CN118426996A (en) | Fault positioning method, device, equipment and medium in containerized application system | |
CN116228199A (en) | Method, device, equipment and medium for acquiring vehicle problem processing countermeasures | |
CN118037414A (en) | Project risk management method and device, electronic equipment and storage medium | |
CN116467198A (en) | Method, device, electronic equipment and storage medium for determining performance actual measurement necessity | |
CN118394597A (en) | Method, device, equipment and medium for detecting index data abnormality under call chain log | |
CN117829611A (en) | Subcontractor management risk assessment early warning method based on artificial intelligence | |
CN117632617A (en) | Method, device, equipment and medium for determining chaotic experiment treatment mode |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |