WO2021217865A1 - Method and apparatus for locating root cause of alarm, computer device, and storage medium - Google Patents

Info

Publication number
WO2021217865A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
specified
root cause
cluster
designated
Prior art date
Application number
PCT/CN2020/099434
Other languages
French (fr)
Chinese (zh)
Inventor
陈桢博
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021217865A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for locating the root cause of an alarm.
  • The main purpose of this application is to provide a method, device, computer equipment, and storage medium for locating the root cause of an alarm, aiming to solve the problems of the existing approach to locating the root cause of an alarm.
  • In the existing approach, the alarm objects at each moment are manually divided into different problems, and root cause analysis is performed on each problem to determine the fault object.
  • As a result, the workload of the operation and maintenance personnel is large, the operation and maintenance work takes a long time, and its efficiency is low.
  • The present application proposes a method for locating the root cause of an alarm, and the method includes the steps:
  • the first time window of the first indicator time series data is adjusted according to the first preset rule, and the specified Pearson similarity between the specified indicator and the entry indicator is calculated;
  • at least one specified alarm object is selected from all the specified alarm objects, according to a second preset rule, as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • This application also provides a device for locating the root cause of an alarm, including:
  • the first acquisition module is configured to acquire a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any alarm object among all the alarm objects in the alarm slice;
  • the generating module is configured to obtain a specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, where the specified alarm object is any one of all the alarm objects in the alarm cluster;
  • the second acquisition module is configured to acquire the first indicator time series data corresponding to the specified indicator of the specified alarm object, and acquire the second indicator time series data corresponding to the entry indicator of the specified alarm object;
  • the adjustment module is configured to adjust, based on the second time window of the second indicator time series data and according to the first preset rule, the first time window of the first indicator time series data, and calculate the specified Pearson similarity between the specified indicator and the entry indicator;
  • the third acquisition module is configured to respectively acquire the specified Pearson similarity corresponding to each of the specified alarm objects, and the specified time difference corresponding to each specified Pearson similarity;
  • the first determining module is configured to select, according to the second preset rule and based on all the specified Pearson similarities and the specified time difference corresponding to each of them, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
  • The present application also provides a computer device including a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, a method for locating the root cause of an alarm is implemented, wherein:
  • the method for locating the root cause of the alarm includes the following steps:
  • the first time window of the first indicator time series data is adjusted according to the first preset rule, and the specified Pearson similarity between the specified indicator and the entry indicator is calculated;
  • at least one specified alarm object is selected from all the specified alarm objects, according to a second preset rule, as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • The present application also provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed, a method for locating the root cause of an alarm is implemented, wherein the method for locating the root cause of the alarm includes the following steps:
  • the first time window of the first indicator time series data is adjusted according to the first preset rule, and the specified Pearson similarity between the specified indicator and the entry indicator is calculated;
  • at least one specified alarm object is selected from all the specified alarm objects, according to a second preset rule, as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • The method, device, computer equipment, and storage medium for locating the root cause of an alarm provided in this application perform clustering and root cause analysis on the alarm objects, so that the root cause object of the alarm cluster corresponding to the alarm objects can be generated quickly and intelligently. This effectively avoids the need for manual root cause analysis, at every moment, over a large amount of raw data related to the operation and maintenance system, reduces the workload of operation and maintenance personnel, shortens the process of determining the root cause of an alarm, and improves the efficiency of operation and maintenance work.
  • FIG. 1 is a schematic flowchart of a method for locating the root cause of an alarm according to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of an apparatus for locating the root cause of an alarm according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • A method for locating the root cause of an alarm includes:
  • S1 Obtain a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any alarm object among all alarm objects in the alarm slice;
  • S2 Obtain a specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, where the specified alarm object is any one of all the alarm objects in the alarm cluster;
  • S3 Acquire first indicator time series data corresponding to the specified indicator of the specified alarm object, and acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
  • The execution subject of this method embodiment is a device for locating the root cause of an alarm.
  • The device for locating the root cause of an alarm can be implemented as a virtual device, such as software code, or as a physical device in which the relevant execution code is written or integrated, and it can carry out human-computer interaction with the user by means of a keyboard, mouse, remote control, touchpad, or voice control equipment.
  • The device for locating the root cause of alarms provided in this embodiment can quickly and intelligently generate the root cause object of the alarm cluster corresponding to the alarm objects, thereby effectively improving the efficiency of operation and maintenance work. Specifically, a specific alarm object in the alarm slice is first obtained, and an alarm cluster is generated based on that specific alarm object.
  • When a failure event occurs during the operation of the operation and maintenance system, it may trigger alarms for multiple objects, that is, generate multiple alarm objects associated with the failure event.
  • The alarm slice refers to all the alarm objects generated by the operation and maintenance system in a specific time period.
  • The specific time period can be set to, for example, every 10 minutes. In this example, each alarm object in the alarm slice belongs to only one alarm problem, which eliminates the possibility that the same alarm object in the alarm slice belongs to multiple alarm problems because of time relationships. The specific alarm object is any one of all the alarm objects in the alarm slice. The alarm cluster is obtained by clustering the specific alarm object together with the target alarm objects whose call chain distance from the specific alarm object is not greater than the preset distance threshold. Any object in the alarm cluster may be the best explanation for the alarm cluster, that is, the root cause object of the alarm cluster.
  • Then the specified alarm object in the alarm cluster is obtained, and the monitoring indicators corresponding to the application layer object in the specified alarm object are aggregated to generate an entry indicator.
  • The specified alarm object is any one of all the alarm objects in the alarm cluster. The process of performing indicator aggregation processing on the monitoring indicators corresponding to the application layer object in the specified alarm object may include: first obtaining all the monitoring indicators in the application layer object, and then normalizing and averaging all those monitoring indicators to obtain the entry indicator.
  • The indicator time series data corresponding to the specified indicator are data values collected in real time, and generally include at least collected values such as access time, visit volume, and occupancy. The specified indicator may specifically be an access time-consumption indicator.
  • The first time window of the first indicator time series data is then adjusted according to the first preset rule, and the specified Pearson similarity between the specified indicator and the entry indicator is calculated.
  • The first preset rule may refer to adjusting the first time window of the first indicator time series data according to a preset time difference threshold, to ensure that the time difference between the first time window and the second time window is within the range of the time difference threshold.
  • The specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity, are then respectively obtained.
  • For each specified alarm object, the Pearson similarity between the specified indicator and the entry indicator can be calculated multiple times, once for each of multiple sets of specified first time windows, and the Pearson similarity with the largest value is used as the specified Pearson similarity.
  • The time difference between the particular first time window and the particular second time window that correspond to the specified Pearson similarity can then be calculated, giving the specified time difference corresponding to that specified Pearson similarity.
  • Finally, at least one specified alarm object is selected from all the specified alarm objects as the root cause object of the alarm cluster according to the second preset rule, and the root cause object of the alarm cluster is output. The second preset rule is not specifically limited.
  • For example, the second preset rule may be: acquire other related features corresponding to each specified alarm object, call a preset supervised learning algorithm to perform prediction processing on those other related features together with the specified Pearson similarity and the specified time difference obtained above, calculate the root cause judgment probability value of each specified alarm object, and then output the alarm object whose root cause judgment probability value meets the condition as the root cause object of the alarm cluster.
  • Other rules can also be used to find the root cause objects of the alarm cluster.
  • For example, a weighted value can be obtained by weighting the specified Pearson similarity and the specified time difference by their corresponding weight values,
  • and the alarm object whose weighted value meets the condition is then output as the root cause object of the alarm cluster, and so on.
  • Clustering and root cause analysis are performed on the alarm objects, so that the root cause object of the alarm cluster corresponding to the alarm objects can be generated quickly and intelligently. This effectively avoids the need for manual root cause analysis, at every moment, over a large amount of raw data related to the operation and maintenance system,
  • reduces the workload of operation and maintenance personnel, shortens the process of determining the root cause of alarms, and improves the efficiency of operation and maintenance.
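  • The indicator aggregation described for step S2 (normalize all monitoring indicators of the application layer object, then average them) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text only says "normalized", so min-max normalization is an assumption, and the function names and the sample indicator names are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(series: List[float]) -> List[float]:
    """Scale one indicator series to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def build_entry_indicator(indicators: Dict[str, List[float]]) -> List[float]:
    """Normalize every monitoring indicator, then average them point by point."""
    normalized = [min_max_normalize(s) for s in indicators.values()]
    length = len(normalized[0])
    if any(len(s) != length for s in normalized):
        raise ValueError("all indicator series must have the same length")
    return [sum(column) / len(normalized) for column in zip(*normalized)]


# Hypothetical application layer indicators sampled at three time points.
entry = build_entry_indicator({
    "latency_ms": [100.0, 200.0, 300.0],
    "request_count": [10.0, 30.0, 20.0],
})
print(entry)
```

  • Averaging after normalization keeps any single indicator with a large numeric range from dominating the entry indicator.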
  • Step S1 includes:
  • S101 Calculate the call chain distance between each alarm object in the alarm slice, other than the specific alarm object, and the specific alarm object;
  • S102 Cyclically execute the step of filtering out, from the alarm slice, the target alarm objects whose call chain distance to the specific alarm object is not greater than the preset distance threshold, until the alarm slice contains no further target alarm object whose call chain distance to the specific alarm object is not greater than the preset distance threshold;
  • S103 Place all the target alarm objects obtained through screening, together with the specific alarm object, in a preset alarm set to obtain the alarm cluster.
  • The step of acquiring a specific alarm object in an alarm slice and generating an alarm cluster based on the specific alarm object may specifically include: first obtaining the specific alarm object in the alarm slice, and then calculating the call chain distance between each alarm object in the alarm slice other than the specific alarm object and the specific alarm object.
  • The call chain distance is the distance value corresponding to the number of calls that link two alarm objects. Specifically, alarm objects in the operation and maintenance system have call relationships with one another. For example, in a certain operation and maintenance system, alarm object A calls alarm object B, and alarm object B calls alarm object C. Where there is a call relationship, there is an influence relationship.
  • If A and B are two alarm objects associated through n calls, the call chain distance between A and B is n. For example, from "A calls B, B calls C" it follows that: A and B are associated through one call, so the call chain distance between A and B is 1; likewise, B and C are associated through one call, so the call chain distance between B and C is 1; A and C are associated through two calls, so the call chain distance between A and C is 2.
  • If the alarm propagation distance is taken to be n, then for any alarm object A among the multiple alarm objects included in a given alarm problem, there is at least one alarm object B whose call chain distance to A is less than or equal to n.
  • Then the target alarm objects whose call chain distance to the specific alarm object is not greater than the preset distance threshold are filtered from the alarm slice.
  • The value of the preset distance threshold is set according to the actual value of the alarm propagation distance; for example, it can be set to 1.
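  • Steps S101 to S103 can be sketched as a breadth-first search over the call graph. This is a hedged illustration under several assumptions that the text does not state: call relationships are available as (caller, callee) pairs, the call chain distance is the shortest number of calls ignoring call direction, and the cluster is read literally as the objects within the threshold of the specific alarm object. All function and variable names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def call_chain_distances(call_edges: Iterable[Tuple[str, str]],
                         source: str) -> Dict[str, int]:
    """Breadth-first search giving the number of calls from source to each object."""
    neighbors: Dict[str, Set[str]] = {}
    for caller, callee in call_edges:
        # Assumption: distance ignores call direction, so store both directions.
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances


def build_alarm_cluster(alarm_slice: List[str],
                        call_edges: Iterable[Tuple[str, str]],
                        specific: str,
                        threshold: int = 1) -> Set[str]:
    """Collect the specific alarm object and every alarm object within the threshold."""
    distances = call_chain_distances(call_edges, specific)
    cluster = {specific}
    for obj in alarm_slice:
        if obj != specific and distances.get(obj, float("inf")) <= threshold:
            cluster.add(obj)
    return cluster


# The example from the text: A calls B, and B calls C.
edges = [("A", "B"), ("B", "C")]
print(build_alarm_cluster(["A", "B", "C"], edges, "A", threshold=1))
```

  • With the threshold set to 1, C is excluded because its call chain distance to A is 2; raising the threshold to 2 would bring it into the cluster.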
  • Step S4 includes:
  • S400 Acquire a first time window of the first indicator time series data, and a second time window of the second indicator time series data;
  • S401 Perform sliding adjustment on the first time window according to the second time window and a preset time difference threshold, so as to keep the time difference between the first time window and the second time window within the range of the time difference threshold, and obtain multiple sets of specified first time windows after the sliding adjustment;
  • S402 Calculate, according to the multiple sets of specified first time windows, the Pearson similarities between the specified indicator and the entry indicator that respectively correspond to those windows;
  • The step of adjusting the first time window of the first indicator time series data according to the first preset rule, based on the second time window of the second indicator time series data, and calculating the specified Pearson similarity between the specified indicator and the entry indicator
  • may specifically include: first obtaining a first time window of the first indicator time series data and a second time window of the second indicator time series data, where the values of the first time window and the second time window are in units of minutes. Then, according to the second time window and the preset time difference threshold, the first time window is slidingly adjusted so that the time difference between the first time window and the second time window stays within the range of the time difference threshold, and multiple sets of specified first time windows are obtained after the sliding adjustment.
  • The time difference is expressed as an absolute value. Then, according to the multiple sets of specified first time windows, the Pearson similarities between the specified indicator and the entry indicator that respectively correspond to those windows are calculated. The first time window and the second time window have the same size.
  • The time difference threshold is a range value, and its specific numerical range is not limited; for example, it can be set to 0-60 min.
  • The first time window can be slidingly adjusted a preset number of times to obtain the same number of specified first time windows. The indicator data of the specified indicator differs across the specified first time windows, so multiple different Pearson similarities between the specified indicator and the entry indicator can be calculated. After the multiple Pearson similarities are obtained, the one with the largest value is selected and determined as the specified Pearson similarity.
  • The first time window of the first indicator time series data is slidingly adjusted within a preset time difference threshold, multiple Pearson similarities are then calculated from the resulting specified first time windows,
  • and the specified Pearson similarity with the largest value is accurately selected from them, which helps to subsequently determine the root cause object of the alarm cluster accurately based on the specified Pearson similarity and the preset related rules.
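  • The sliding-window search of step S4 can be sketched as follows. This is only an illustration under stated assumptions: both series are sampled once per minute and indexed from the same origin, the first window may slide in either direction, and the shift is reported as an absolute time difference in minutes. The names `pearson`, `best_similarity`, `start`, `size`, and `max_shift` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally sized windows."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # Correlation is undefined for a constant window; treat as 0.
    return cov / math.sqrt(var_x * var_y)


def best_similarity(specified: Sequence[float], entry: Sequence[float],
                    start: int, size: int, max_shift: int) -> Tuple[float, int]:
    """Slide the first window around the fixed second window.

    Returns the largest Pearson similarity and the absolute time difference
    (in samples, assumed to be minutes) at which it occurs.
    """
    reference = entry[start:start + size]  # second time window (fixed)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + size > len(specified):
            continue  # This candidate window falls outside the data.
        score = pearson(specified[lo:lo + size], reference)
        if best is None or score > best[0]:
            best = (score, abs(shift))
    if best is None:
        raise ValueError("no valid first time window within the threshold")
    return best


# The specified indicator shows the same spike two minutes before the entry indicator.
entry_series = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
specified_series = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
similarity, time_difference = best_similarity(
    specified_series, entry_series, start=2, size=5, max_shift=3)
print(similarity, time_difference)
```

  • Keeping the second window fixed and sliding only the first mirrors the text, and returning the shift alongside the score yields the specified time difference needed by the later steps.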
  • Step S6 includes:
  • S601 Acquire a second feature corresponding to the specified alarm object, where the number of second features is one or more;
  • S602 Invoke a preset supervised learning algorithm to perform prediction processing on the first feature and the second feature respectively corresponding to each of the specified alarm objects, and calculate the root cause judgment probability value corresponding to each specified alarm object;
  • S603 Use the at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
  • The step of selecting at least one specified alarm object as the root cause object of the alarm cluster and outputting the root cause object of the alarm cluster may specifically include: first taking the specified Pearson similarity and the specified time difference corresponding to the specified alarm object as the first feature. Then a second feature corresponding to the specified alarm object is acquired, where the second feature refers to other related features corresponding to the specified alarm object, and the number of second features is one or more. The second feature may specifically include one or more of the number of object calls, the number of object alarms, and the object hierarchy.
  • The supervised learning algorithm can be a random forest algorithm.
  • The random forest algorithm is an ensemble learning algorithm that constructs multiple weak learners (tree models), trains each learner on randomly sampled features and samples, and finally combines the judgments of the multiple weak learners into a comprehensive result.
  • The predictive output of a random forest takes the form of a probability value (generally, a probability greater than 0.5 is classified as 1, and otherwise as 0).
  • The features of an object are the input, and the output is the probability that the object is judged to be the root cause. An alarm cluster contains multiple alarm objects,
  • and the random forest algorithm calculates the corresponding root cause judgment probability value for each alarm object.
  • After the root cause judgment probability values are obtained, the at least one specified alarm object with the highest root cause judgment probability value is taken as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • The first feature and the second feature corresponding to the root cause object of the alarm cluster may also be output.
  • Root cause prediction is performed on multiple acquired features, that is, the first feature and the second feature, so that the at least one specified alarm object with the highest root cause judgment probability value can be output accurately. In this way,
  • the root cause object of the alarm cluster corresponding to the alarm objects is generated quickly and intelligently, which effectively reduces the workload of the operation and maintenance personnel and improves the efficiency of operation and maintenance work.
  • Step S603 includes:
  • S6030 Sort all the root cause judgment probability values in order from high to low to obtain a sorting result;
  • S6032 Use the alarm objects corresponding to the obtained specified root cause judgment probability values as the root cause objects of the alarm cluster.
  • The step of using the at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster and outputting the root cause object of the alarm cluster may specifically include: sorting all the root cause judgment probability values in order from high to low to obtain the sorting result. After the sorting result is obtained, a preset number of specified root cause judgment probability values are taken in sequence, starting from the root cause judgment probability value ranked first. The preset number is not specifically limited and can be set according to actual needs; for example, it can be set to five.
  • The alarm objects corresponding to the obtained specified root cause judgment probability values are used as the root cause objects of the alarm cluster, so that the root cause objects of the alarm cluster can subsequently be output to the operation and maintenance
  • personnel, who can then locate the alarm root cause of this alarm cluster according to the output root cause objects.
  • For example, suppose the sorting result obtained after sorting all root cause judgment probability values is: root cause judgment probability value 1, root cause judgment probability value 2, root cause judgment probability value 3, root cause judgment probability value 4, root cause judgment probability value 5, and so on. Then the alarm objects corresponding to root cause judgment probability value 1, root cause judgment probability value 2, root cause judgment probability value 3, root cause judgment probability value 4,
  • and root cause judgment probability value 5 are used as the root cause objects of the alarm cluster.
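  • The ranking in step S603 (sort from high to low, then take a preset number from the top) can be sketched as follows. The probability values are assumed to have been produced already by the supervised learning algorithm, which is not reproduced here; the function name and the sample object names are hypothetical.

```python
from typing import Dict, List


def select_root_causes(probabilities: Dict[str, float],
                       preset_number: int = 5) -> List[str]:
    """Return the alarm objects with the highest root cause judgment probability."""
    # Sort (object, probability) pairs from the highest probability to the lowest.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Keep the first `preset_number` objects of the sorting result.
    return [obj for obj, _ in ranked[:preset_number]]


# Hypothetical probability values for three alarm objects in one cluster.
probabilities = {"db": 0.91, "cache": 0.40, "gateway": 0.75}
print(select_root_causes(probabilities, preset_number=2))
```

  • If the cluster contains fewer objects than the preset number, the slice simply returns all of them in ranked order.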
  • Step S6 includes:
  • S612 Perform a weighted calculation on each specified Pearson similarity and each corresponding specified time difference according to the first weight value and the second weight value to obtain multiple weighted values;
  • S613 Use the at least one specified alarm object with the highest weighted value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
  • The step of selecting at least one specified alarm object as the root cause object of the alarm cluster and outputting the root cause object of the alarm cluster may specifically include: first calculating a first weight value corresponding to the specified Pearson similarity based on historical alarm data. The first weight value corresponding to the specified Pearson similarity can be calculated according to the frequency with which the specified alarm object occurs in the historical alarm data. After the first weight value is obtained, the second weight value corresponding to the specified time difference is obtained.
  • The value of the second weight value is not specifically limited and can be set according to actual needs; for example, it can be set to 0.5.
  • A weighted calculation is then performed on each specified Pearson similarity and each corresponding specified time difference according to the first weight value and the second weight value, to obtain multiple weighted values.
  • Finally, the at least one specified alarm object with the highest weighted value is taken as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • For the process of determining the at least one specified alarm object with the highest weighted value as the root cause object of the alarm cluster, refer to the process described above of taking the at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster; it is not repeated here.
  • The specified Pearson similarity and the specified time difference are weighted by the first weight value and the second weight value, respectively, to obtain multiple weighted values, and
  • the at least one specified alarm object with the highest weighted value is accurately output as the root cause object of the alarm cluster. In this way, the root cause object of the alarm cluster corresponding to the alarm objects is generated quickly and intelligently, which reduces the workload of operation and maintenance personnel and improves the efficiency of operation and maintenance work.
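  • Steps S612 and S613 can be sketched as below. The text does not give the form of the weighted calculation, so a plain weighted sum is an assumption here, as is the use of the raw time difference without rescaling; the function names, the tuple layout, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def weighted_value(similarity: float, time_difference: float,
                   first_weight: float, second_weight: float = 0.5) -> float:
    """Weighted sum of the two quantities (assumed form of the weighted calculation)."""
    return first_weight * similarity + second_weight * time_difference


def rank_by_weighted_value(features: Dict[str, Tuple[float, float, float]],
                           second_weight: float = 0.5,
                           top_n: int = 1) -> List[str]:
    """features maps object -> (Pearson similarity, time difference, first weight)."""
    scores = {
        obj: weighted_value(sim, diff, w1, second_weight)
        for obj, (sim, diff, w1) in features.items()
    }
    # Highest weighted value first; keep the top_n objects.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical per-object inputs: (similarity, time difference in minutes, first weight).
features = {"db": (0.9, 2.0, 0.6), "cache": (0.5, 1.0, 0.2)}
print(rank_by_weighted_value(features, second_weight=0.5, top_n=1))
```

  • In practice the time difference would likely need to be scaled to a range comparable with the similarity before weighting; the text leaves that choice open.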
  • Step S610 includes:
  • S6101 From the historical alarm data, filter out the specified historical alarm data containing each specified alarm object, and obtain the second quantity of that data for each specified alarm object;
  • S6102 Calculate the quotient of each second quantity and the first quantity to obtain the proportion value corresponding to each of the specified alarm objects;
  • the foregoing step of calculating the first weight value corresponding to the specified Pearson similarity based on the historical alarm data may specifically include: first obtaining the historical alarm data and the first quantity of the foregoing historical alarm data .
  • a timed task mechanism can be used to read historical alarm data within a preset time period according to a set time period to perform subsequent data processing and analysis.
  • Other task data such as the execution time period of the timing task and the alarm data sampling duration can be stored in a fixed configuration file in advance.
  • the fixed configuration file is read to obtain the time period for the execution of the timing task and register it in the timing task executor.
  • the alarm data sampling duration configured in the configuration file is read, and the corresponding historical alarm data is obtained from the historical alarm number record table.
  • the second quantity of the specified historical alarm data containing each specified alarm object is filtered out from the above-mentioned historical alarm data.
  • the quotient of the second quantity of each of the designated historical alarm data and the first quantity is calculated to obtain multiple proportion values corresponding to each of the designated alarm objects.
  • the obtained proportion values are determined as the first weight values corresponding to each of the specified alarm objects, so that weighted values can be calculated from the first weight values and the root cause object of the alarm cluster can then be determined from the weighted values.
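The proportion calculation in steps S6101 and S6102 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes each historical alarm record is represented as a set of alarm object names, and the function name `first_weight_values` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def first_weight_values(
    historical_alarms: List[Set[str]],
    specified_objects: Iterable[str],
) -> Dict[str, float]:
    """Return, per alarm object, the share of historical records containing it."""
    first_quantity = len(historical_alarms)  # total number of historical records
    if first_quantity == 0:
        # Assumption: with no history, every object gets a weight of zero.
        return {obj: 0.0 for obj in specified_objects}
    weights: Dict[str, float] = {}
    for obj in specified_objects:
        # Second quantity: number of records that contain this alarm object.
        second_quantity = sum(1 for record in historical_alarms if obj in record)
        weights[obj] = second_quantity / first_quantity
    return weights


# Hypothetical history: four past alarm records.
history = [{"db", "api"}, {"api"}, {"cache", "api"}, {"db"}]
weights = first_weight_values(history, ["db", "api", "cache"])
```

With this sample history, an object that appears in more past alarm records receives a larger first weight value.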
  • the method includes:
  • S620 Display the root cause object of the alarm cluster
  • S621 Receive a designated root cause object selected by the operation and maintenance personnel from the root cause objects of the alarm cluster;
  • S622 Determine the specified root cause object as the final root cause object of the alarm cluster.
  • the operation and maintenance personnel can further determine the final root cause object of the alarm cluster from all root cause objects.
  • the root cause object of the above-mentioned alarm cluster is first shown to the operation and maintenance personnel.
  • the first feature and the second feature corresponding to the root cause object of the alarm cluster can also be output, or the weight value corresponding to the root cause object of the alarm cluster.
  • the designated root cause object selected by the operation and maintenance personnel from the root cause object of the above alarm cluster is received.
  • the number of the above-mentioned designated root cause object is preferably one.
  • operation and maintenance personnel can make manual judgments based on the root cause objects, first feature, second feature, or weight value of the above-mentioned alarm cluster, and filter out erroneous root cause objects from among all root cause objects of the alarm cluster, thereby obtaining the above-mentioned designated root cause object.
  • the above-mentioned designated root cause object is determined as the final root cause object of the alarm cluster, that is, the root cause object that remains after the operation and maintenance personnel have screened out the erroneous root cause objects. Based on this final root cause object, the operation and maintenance personnel can locate the root cause of the alarm cluster more quickly and accurately.
  • root cause object 4 is regarded as the final root cause object of the alarm cluster.
  • an embodiment of the present application also provides a device for locating the root cause of an alarm, including:
  • the first obtaining module 1 is configured to obtain a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any one of the alarm objects in the alarm slice;
  • the generating module 2 is used to obtain the specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators in the specified alarm object to generate the corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
  • the second obtaining module 3 is configured to obtain first indicator time series data corresponding to the specified indicator of the specified alarm object, and obtain second indicator time series data corresponding to the entry indicator of the specified alarm object;
  • the adjustment module 4 is configured to adjust the first time window of the first indicator time series data according to the first preset rule and the second time window of the second indicator time series data, and to calculate the designated Pearson similarity between the specified indicator and the entry indicator;
  • the third acquiring module 5 is configured to respectively acquire the designated Pearson similarity corresponding to each of the designated alarm objects, and the designated time difference corresponding to each of the designated Pearson similarities;
  • the first determination module 6 is configured to screen out, according to the second preset rule, at least one designated alarm object from all the specified alarm objects as the root cause object of the alarm cluster, based on all the designated Pearson similarities and the designated time difference corresponding to each of the designated Pearson similarities, and to output the root cause object of the alarm cluster.
  • the implementation process of the functions and roles of the first acquisition module, generation module, second acquisition module, adjustment module, third acquisition module, and first determination module in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S1 to S6 in the above method for locating the root cause of an alarm, and will not be repeated here.
  • the above-mentioned first acquisition module includes:
  • the first obtaining unit is used to obtain a specific alarm object in the alarm slice
  • the first calculation unit is configured to calculate the call chain distance between each alarm object except the specific alarm object in the alarm slice and the specific alarm object;
  • the first screening unit is configured to cyclically perform the step of screening out, from the alarm slice, the target alarm objects whose call chain distance to the specific alarm object is not greater than a preset distance threshold, until the alarm slice contains no further target alarm object whose call chain distance to the specific alarm object is not greater than the preset distance threshold;
  • the placing unit is configured to place all the target alarm objects and the specific alarm objects obtained through screening in a preset alarm set to obtain the alarm cluster.
  • the implementation process of the functions and roles of the first acquisition unit, the first calculation unit, the first screening unit and the placement unit in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S100 to S103 in the above method for locating the root cause of an alarm, and will not be repeated here.
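The cluster-building behaviour of these four units can be sketched as follows. This is a minimal illustration under stated assumptions: the call chain distance is supplied by the caller as a function, and the names `build_alarm_cluster`, `hops`, and the sample object names are hypothetical rather than taken from the application.

```python
from typing import Callable, List, Set


def build_alarm_cluster(
    alarm_slice: List[str],
    specific: str,
    distance: Callable[[str, str], int],
    threshold: int,
) -> Set[str]:
    """Collect the specific alarm object plus every object within the threshold."""
    cluster = {specific}
    remaining = [obj for obj in alarm_slice if obj != specific]
    while True:
        # Screen out target objects whose call chain distance to the
        # specific alarm object does not exceed the preset threshold.
        targets = [obj for obj in remaining if distance(obj, specific) <= threshold]
        if not targets:
            break  # no further target alarm object exists in the slice
        cluster.update(targets)
        remaining = [obj for obj in remaining if obj not in cluster]
    return cluster


# Hypothetical call chain distances from each object to "gateway".
hops = {"api": 1, "db": 2, "batch": 5}
cluster = build_alarm_cluster(
    ["gateway", "api", "db", "batch"],
    "gateway",
    lambda obj, _specific: hops[obj],
    threshold=2,
)
```

Here `batch` stays outside the cluster because its assumed call chain distance of 5 exceeds the threshold of 2.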
  • the aforementioned adjustment module includes:
  • the second acquiring unit is configured to acquire the first time window of the first indicator time series data and the second time window of the second indicator time series data;
  • the adjustment unit is configured to perform sliding adjustment on the first time window according to the second time window and a preset time difference threshold, so as to keep the time difference between the first time window and the second time window within the range of the time difference threshold, and to obtain multiple sets of designated first time windows after the sliding adjustment;
  • the second calculation unit is configured to calculate, according to the multiple sets of designated first time windows, the Pearson similarity between the entry indicator and each of the multiple sets of specified indicators corresponding to those designated first time windows;
  • the second screening unit is used to screen the Pearson similarity with the largest value from the multiple groups of Pearson similarity
  • the first determining unit is configured to determine the Pearson similarity degree with the largest value as the designated Pearson similarity degree.
  • the implementation process of the functions and roles of the second acquisition unit, the adjustment unit, the second calculation unit, the second screening unit and the first determination unit in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S400 to S404 in the above method for locating the root cause of an alarm, and will not be repeated here.
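The sliding adjustment and maximum selection performed by these units can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes both series are sampled at the same interval, that the time difference threshold is expressed as a number of samples (`max_shift`), and that the designated time difference is the shift at which the largest similarity occurs. All function and variable names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(var_x * var_y)


def best_shifted_similarity(
    indicator: List[float],
    entry: List[float],
    entry_start: int,
    window: int,
    max_shift: int,
) -> Tuple[float, int]:
    """Return (largest Pearson similarity, shift in samples where it occurs)."""
    entry_window = entry[entry_start:entry_start + window]  # second time window
    best = (float("-inf"), 0)
    for shift in range(-max_shift, max_shift + 1):
        start = entry_start + shift
        if start < 0 or start + window > len(indicator):
            continue  # the shifted first time window would leave the data
        r = pearson(indicator[start:start + window], entry_window)
        if r > best[0]:
            best = (r, shift)
    return best


# Hypothetical data: the indicator shows the same spike two samples earlier.
entry = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
indicator = [0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0, 0]
similarity, shift = best_shifted_similarity(indicator, entry, 2, 6, 3)
```

A negative shift in this sketch means the specified indicator moved before the entry indicator did.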
  • the above-mentioned first determining module includes:
  • a second determining unit configured to use the designated Pearson similarity corresponding to the designated alarm object and the designated time difference as the first feature
  • the third acquiring unit is configured to acquire the second feature corresponding to the specified alarm object, wherein there are one or more second features;
  • the calling unit is used to call a preset supervised learning algorithm to perform prediction processing on the first feature and the second feature corresponding to each specified alarm object, and calculate the root cause judgment probability value corresponding to each specified alarm object;
  • the third determining unit is configured to use at least one designated alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
  • the implementation process of the functions and roles of the second determining unit, the third acquiring unit, the calling unit and the third determining unit in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S600 to S603 in the above method for locating the root cause of an alarm, and will not be repeated here.
  • the above-mentioned third determining unit includes:
  • the sorting subunit is used to sort all the root cause judgment probability values in order from high to low to obtain the sorting result
  • the first obtaining subunit is configured to sequentially obtain a preset number of designated root cause judgment probability values starting from the root cause judgment probability value ranked first in the ranking result;
  • the first determining subunit is configured to use the alarm object corresponding to the obtained specified root cause judgment probability value as the root cause object of the alarm cluster.
  • the implementation process of the functions and roles of the sorting subunit, the first acquiring subunit, and the first determining subunit in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S6030 to S6032 in the above method for locating the root cause of an alarm, and will not be repeated here.
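The sort-and-select behaviour of the three subunits can be sketched as follows. This is a minimal illustration that assumes the root cause judgment probability values are already available as a mapping from alarm object name to probability; the function name `top_root_causes` and the sample values are hypothetical.

```python
from typing import Dict, List


def top_root_causes(probabilities: Dict[str, float], preset_number: int) -> List[str]:
    """Return the alarm objects with the highest root cause probabilities."""
    # Sort from high to low, then take the first `preset_number` entries.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [obj for obj, _ in ranked[:preset_number]]


# Hypothetical probabilities produced by the supervised model.
root_causes = top_root_causes({"db": 0.82, "api": 0.35, "cache": 0.61}, 2)
```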
  • the above-mentioned first determining module includes:
  • the third calculation unit is configured to calculate the first weight value corresponding to the specified Pearson similarity based on historical alarm data
  • a fourth obtaining unit configured to obtain a second weight value corresponding to the specified time difference
  • a fourth calculation unit configured to perform a weighted calculation on each of the designated Pearson similarity and each corresponding designated time difference according to the first weight value and the second weight value to obtain multiple weighted values
  • the fourth determining unit is configured to use at least one designated alarm object with the highest weight value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
  • the implementation process of the functions and roles of the third calculation unit, the fourth acquisition unit, the fourth calculation unit, and the fourth determination unit in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S610 to S613 in the above method for locating the root cause of an alarm, and will not be repeated here.
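The weighted calculation performed by these units can be sketched as follows. The text here does not give the exact formula, so this sketch assumes a simple linear combination: the per-object first weight value multiplies the designated Pearson similarity, and a single second weight value multiplies the designated time difference, expressed as how far the object's indicator leads the entry indicator. The formula, the sign convention, and all names are assumptions made for illustration only.

```python
from typing import Dict, Tuple


def weighted_values(
    features: Dict[str, Tuple[float, float]],
    first_weights: Dict[str, float],
    second_weight: float,
) -> Dict[str, float]:
    """Combine (similarity, time difference) per alarm object into one score."""
    scores: Dict[str, float] = {}
    for obj, (similarity, time_difference) in features.items():
        # Assumed linear form: w1 * similarity + w2 * time difference.
        scores[obj] = first_weights[obj] * similarity + second_weight * time_difference
    return scores


# Hypothetical inputs: (designated Pearson similarity, lead time in minutes).
features = {"db": (0.9, 2.0), "api": (0.6, 0.0)}
scores = weighted_values(features, {"db": 0.5, "api": 0.75}, second_weight=0.1)
best = max(scores, key=scores.get)  # object with the highest weighted value
```

Under these assumed inputs, the object with the highest weighted value would be reported as the root cause object of the alarm cluster.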
  • the above-mentioned third calculation unit includes:
  • the second obtaining subunit is configured to obtain the historical alarm data and the first quantity of the historical alarm data
  • the screening subunit is used to filter the second quantity of the specified historical alarm data containing each specified alarm object from the historical alarm data;
  • the calculation subunit is configured to calculate the quotient of the second quantity of each of the designated historical alarm data and the first quantity to obtain multiple proportion values corresponding to each of the designated alarm objects;
  • the second determining subunit is configured to determine each of the obtained ratios as the first weight value corresponding to each of the designated alarm objects.
  • the device for locating the root cause of the alarm includes:
  • the display module is used to display the root cause object of the alarm cluster
  • the receiving module is configured to receive the designated root cause object selected by the operation and maintenance personnel from the root cause objects of the alarm cluster;
  • the second determining module is configured to determine the designated root cause object as the final root cause object of the alarm cluster.
  • the implementation process of the functions and roles of the display module, the receiving module and the second determining module in the above device for locating the root cause of an alarm is detailed in the implementation process corresponding to steps S620 to S622 in the above method for locating the root cause of an alarm, and will not be repeated here.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as designated alarm objects, monitoring indicators, entry indicators, indicator time series data, designated Pearson similarity, and designated time difference.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the steps for the processor to execute the method for locating the root cause of the alarm include:
  • the first time window of the first indicator time series data is adjusted according to the first preset rule, and the designated Pearson similarity between the specified indicator and the entry indicator is calculated.
  • At least one specified alarm object is selected from all the specified alarm objects according to a second preset rule as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the devices and computer equipment to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile, and has a computer program stored thereon; when the computer program is executed by a processor, a method for locating the root cause of an alarm is implemented.
  • the method for locating the root cause of the alarm includes:
  • the first time window of the first indicator time series data is adjusted according to the first preset rule, and the designated Pearson similarity between the specified indicator and the entry indicator is calculated.
  • At least one specified alarm object is selected from all the specified alarm objects according to a second preset rule as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Abstract

A method for locating the root cause of an alarm, and a related apparatus, which relate to artificial intelligence. The method comprises: acquiring a particular alarm object in an alarm slice and generating a corresponding alarm cluster; aggregating monitoring indexes in specified alarm objects in the alarm cluster and generating an entry index; acquiring first index time sequence data corresponding to a specified index of the specified alarm objects and acquiring second index time sequence data corresponding to the entry index of the specified alarm objects; adjusting a first time window of the first index time sequence data according to a second time window of the second index time sequence data, and calculating a specified Pearson similarity between the specified index and the entry index; according to the specified Pearson similarity corresponding to each specified alarm object and the specified time difference corresponding to each specified Pearson similarity, selecting at least one specified alarm object from among all of the specified alarm objects as a root cause object of the alarm cluster and outputting same. The described method can quickly generate a root cause object related to an alarm object.

Description

Method, device, computer equipment and storage medium for locating root cause of alarm
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 29, 2020, with application number 202010357568.4 and the invention title "Method, device, computer equipment and storage medium for locating root cause of alarm", the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for locating the root cause of an alarm.
Background
With the rapid development of science and technology, business scenarios in the current information age change constantly. Frequent business function updates and configuration parameter changes cause a steady stream of abnormal alarms, which lead to security risks and potential losses. In an operation and maintenance system, a fault in one object may trigger alarms for multiple objects, and at any given moment there may be a large number of associated alarms caused by faults in multiple objects. Therefore, how to quickly locate the root cause of an abnormal alarm when it occurs in the operation and maintenance system, so as to stop losses in time, has become an urgent problem. The inventor realized that the existing way of locating the root cause of an alarm requires operation and maintenance personnel to watch the operation and maintenance system constantly, segment the alarm objects at a given moment into different problems, and perform root cause analysis on each problem to determine the faulty object. This places a heavy workload on operation and maintenance personnel, takes a long time, and makes operation and maintenance work inefficient.
Technical problem
The main purpose of this application is to provide a method, device, computer equipment, and storage medium for locating the root cause of an alarm. It aims to solve the technical problem that the existing way of locating the root cause of an alarm requires operation and maintenance personnel to watch the operation and maintenance system constantly, segment the alarm objects at a given moment into different problems, and perform root cause analysis on each problem to determine the faulty object, which results in a heavy workload for operation and maintenance personnel, long processing times, and low operation and maintenance efficiency.
Technical solution
In order to achieve the above purpose, in a first aspect, this application proposes a method for locating the root cause of an alarm, the method including the steps of:
acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
adjusting a first time window of the first indicator time series data according to a first preset rule and a second time window of the second indicator time series data, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
acquiring the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
selecting, according to a second preset rule and based on all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
In a second aspect, this application also provides a device for locating the root cause of an alarm, including:
a first acquisition module, configured to acquire a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
a generation module, configured to acquire a specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
a second acquisition module, configured to acquire first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
an adjustment module, configured to adjust a first time window of the first indicator time series data according to a first preset rule and a second time window of the second indicator time series data, and calculate a specified Pearson similarity between the specified indicator and the entry indicator;
a third acquisition module, configured to acquire the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
a first determination module, configured to select, according to a second preset rule and based on all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
In a third aspect, this application also provides a computer device including a memory and a processor, wherein the memory stores a computer program, and the processor implements a method for locating the root cause of an alarm when executing the computer program, the method including the following steps:
acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
adjusting a first time window of the first indicator time series data according to a first preset rule and a second time window of the second indicator time series data, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
acquiring the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
selecting, according to a second preset rule and based on all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
In a fourth aspect, this application also provides a computer-readable storage medium on which a computer program is stored, wherein a method for locating the root cause of an alarm is implemented when the computer program is executed by a processor, the method including the following steps:
acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation processing on the monitoring indicators in the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
adjusting a first time window of the first indicator time series data according to a first preset rule and a second time window of the second indicator time series data, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
acquiring the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
selecting, according to a second preset rule and based on all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
Beneficial effects
The method, device, computer equipment and storage medium for locating the root cause of an alarm provided in this application cluster the alarm objects and perform root cause analysis on them, so that the root cause object related to the alarm cluster corresponding to the alarm objects can be generated quickly and intelligently. This effectively avoids the need for constant manual root cause analysis over a large amount of raw data related to the operation and maintenance system, reduces the workload of operation and maintenance personnel, shortens the time needed to determine the root cause of an alarm, and improves the efficiency of operation and maintenance work.
Description of the drawings
FIG. 1 is a schematic flowchart of a method for locating the root cause of an alarm according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a device for locating the root cause of an alarm according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of this application.
Best Mode of the Invention
Referring to FIG. 1, a method for locating the root cause of an alarm according to an embodiment of the present application includes:
S1: Obtain a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any one of the alarm objects in the alarm slice;
S2: Obtain a designated alarm object in the alarm cluster, and perform indicator aggregation on the monitoring indicators of the designated alarm object to generate a corresponding entry indicator, where the designated alarm object is any one of the alarm objects in the alarm cluster;
S3: Obtain first indicator time series data corresponding to a designated indicator of the designated alarm object, and obtain second indicator time series data corresponding to the entry indicator of the designated alarm object;
S4: According to the second time window of the second indicator time series data, adjust the first time window of the first indicator time series data according to a first preset rule, and calculate the designated Pearson similarity between the designated indicator and the entry indicator;
S5: Obtain, for each designated alarm object, the corresponding designated Pearson similarity and the designated time difference corresponding to that designated Pearson similarity;
S6: According to all the designated Pearson similarities and the designated time difference corresponding to each of them, select at least one designated alarm object from all the designated alarm objects as a root cause object of the alarm cluster according to a second preset rule, and output the root cause object of the alarm cluster.
As described in steps S1 to S6 above, the method of this embodiment is executed by an apparatus for locating the root cause of an alarm. In practice, the apparatus may be implemented as a virtual apparatus, such as software code, or as a physical apparatus into which the relevant executable code is written or integrated, and it may interact with the user through a keyboard, mouse, remote control, touchpad, voice-control device, or similar means. The apparatus provided in this embodiment can quickly and intelligently generate the root cause objects of the alarm cluster corresponding to the alarm objects, thereby effectively improving the efficiency of operation and maintenance work.
Specifically, a specific alarm object in the alarm slice is first obtained, and an alarm cluster is generated from it. When a fault event occurs while the operation and maintenance system is running, it may trigger alarms on multiple objects, producing multiple alarm objects associated with the fault event. The alarm slice refers to all alarm objects generated by the operation and maintenance system within a specific time period, which may be set to 10-minute intervals. In this embodiment, each alarm object in an alarm slice can belong to only one alarm problem, which rules out the possibility that the same alarm object in a slice belongs to several alarm problems because of timing. The specific alarm object is any one of the alarm objects in the alarm slice. The alarm cluster is obtained by clustering the specific alarm object with the target alarm objects whose call-chain distance to it is not greater than a preset distance threshold; any object in the alarm cluster may be the best explanation of the cluster, that is, its root cause object.
After the alarm cluster is obtained, a designated alarm object in the alarm cluster is obtained, and indicator aggregation is performed on the monitoring indicators corresponding to the application-layer object in the designated alarm object to generate the entry indicator. The designated alarm object is any one of the alarm objects in the alarm cluster. The aggregation may include: first obtaining all monitoring indicators of the application-layer object, and then performing a normalized sum-and-average calculation over them to obtain the entry indicator. Alternatively, after all monitoring indicators are obtained, they may be further filtered to select a certain number of designated monitoring indicators, and the normalized sum-and-average calculation is then performed on those to obtain the entry indicator. For example, given a monitoring indicator $A = [A_1, A_2, A_3, \ldots, A_n]$ and a monitoring indicator $B = [B_1, B_2, B_3, \ldots, B_n]$, the normalized sum-and-average calculation yields the entry indicator $C = [C_1, C_2, C_3, \ldots, C_n]$ with $C_n = (A_n + B_n)/2$.
Once the entry indicator is obtained, the first indicator time series data corresponding to the designated indicator of the designated alarm object and the second indicator time series data corresponding to the entry indicator of the designated alarm object are obtained. The indicator time series data corresponding to the designated indicator are values collected in real time and generally include at least access latency, access volume, and occupancy; the designated indicator may specifically be the access latency indicator. Then, according to the second time window of the second indicator time series data, the first time window of the first indicator time series data is adjusted according to the first preset rule, and the designated Pearson similarity between the designated indicator and the entry indicator is calculated. The first preset rule may mean adjusting the first time window according to a preset time difference threshold so that the time difference between the first time window and the second time window stays within the range of that threshold. The designated Pearson similarity can be calculated with the formula for the Pearson correlation coefficient, which is as follows:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
(Figure PCTCN2020099434-appb-000001)
where X is the designated indicator, treated as a variable, $X = [X_1, X_2, X_3, \ldots, X_n]$; Y is the entry indicator, treated as a variable, $Y = [Y_1, Y_2, Y_3, \ldots, Y_n]$; cov(X, Y) is the covariance of X and Y; μ is the mean; and σ is the standard deviation.
Next, the designated Pearson similarity corresponding to each designated alarm object and the designated time difference corresponding to each designated Pearson similarity are obtained. After the first time window of the first indicator time series data is slid according to the second time window and the first preset rule, multiple designated first time windows are produced. For the designated alarm object, a Pearson similarity between the designated indicator and the entry indicator can therefore be calculated for each of these windows, and the largest of these values is taken as the designated Pearson similarity. The designated time difference is then obtained as the time difference between the particular first time window and the particular second time window that yielded the designated Pearson similarity.
Finally, according to all the designated Pearson similarities and the designated time difference corresponding to each of them, at least one designated alarm object is selected from all the designated alarm objects as a root cause object of the alarm cluster according to the second preset rule, and the root cause object of the alarm cluster is output. The second preset rule is not specifically limited. For example, it may be: obtaining other related features of each designated alarm object, invoking a preset supervised learning algorithm to perform prediction on those features together with the designated Pearson similarity and designated time difference obtained above, thereby calculating a root cause judgment probability value for each designated alarm object, and then outputting the alarm objects whose probability values satisfy the condition as the root cause objects of the alarm cluster. Other rules may also be used, for example weighting the designated Pearson similarity and the designated time difference by their corresponding weight values to obtain a weighted value, and outputting the alarm objects whose weighted values satisfy the condition as the root cause objects of the alarm cluster, and so on. By clustering the alarm objects and performing root cause analysis on them, this embodiment can quickly and intelligently generate the root cause objects of the alarm cluster corresponding to the alarm objects, which effectively avoids constant manual root cause analysis on the large volume of raw data related to the operation and maintenance system, reduces the workload of operation and maintenance personnel, shortens the time needed to determine the root cause of an alarm, and improves the efficiency of operation and maintenance work.
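The normalized sum-and-average aggregation that produces the entry indicator in step S2 can be illustrated with a minimal Python sketch. The source does not say which normalization is used, so min-max scaling is assumed here, and the function names `min_max_normalize` and `aggregate_entry_indicator` are hypothetical.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; min-max scaling is an assumption of this sketch."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def aggregate_entry_indicator(indicators):
    """Normalize each monitoring indicator, then average them point by point.

    For two indicators A and B this gives C_n = (A_n + B_n) / 2 on the
    normalized values, matching the example in the text.
    """
    normalized = [min_max_normalize(s) for s in indicators]
    return [sum(vals) / len(vals) for vals in zip(*normalized)]


indicator_a = [10, 20, 30]      # hypothetical monitoring indicator A
indicator_b = [100, 300, 200]   # hypothetical monitoring indicator B
entry_indicator = aggregate_entry_indicator([indicator_a, indicator_b])
print(entry_indicator)  # [0.0, 0.75, 0.75]
```

Normalizing before averaging keeps an indicator with a large numeric range from dominating the entry indicator, which is presumably why the text calls for normalization first.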
Further, in an embodiment of the present application, step S1 above includes:
S100: Obtain a specific alarm object in the alarm slice;
S101: Calculate the call-chain distance between the specific alarm object and each other alarm object in the alarm slice;
S102: Repeatedly execute the step of selecting, from the alarm slice, target alarm objects whose call-chain distance to the specific alarm object is not greater than a preset distance threshold, until the alarm slice contains no remaining target alarm object whose call-chain distance to the specific alarm object is not greater than the preset distance threshold;
S103: Place all the selected target alarm objects together with the specific alarm object in a preset alarm set to obtain the alarm cluster.
As described in steps S100 to S103 above, the step of obtaining a specific alarm object in the alarm slice and generating an alarm cluster from it may specifically include the following. First, the specific alarm object in the alarm slice is obtained. Then the call-chain distance between the specific alarm object and each other alarm object in the alarm slice is calculated. The call-chain distance is a distance value corresponding to the number of calls that link a pair of alarm objects. Specifically, call relationships exist between alarm objects in an operation and maintenance system: in a given system, alarm object A calls alarm object B, and alarm object B calls alarm object C, and wherever a call relationship exists an influence relationship exists as well. For any two alarm objects that are linked through n calls, the call-chain distance between them is n. For example, if A and B are two alarm objects linked through n calls, the call-chain distance between A and B is n. Likewise, from "A calls B, B calls C" above it follows that A and B are linked through one call, so the call-chain distance between A and B is 1; similarly, B and C are linked through one call, so the call-chain distance between B and C is 1; and A and C are linked through two calls, so the call-chain distance between A and C is 2. In addition, when the call-chain distance is calculated to be n, then for any alarm object A among the multiple alarm objects included in an alarm impact problem, there is at least one alarm object B whose call-chain distance to A is less than or equal to n.
After the call-chain distances are calculated, the target alarm objects whose call-chain distance to the specific alarm object is not greater than the preset distance threshold are selected from the alarm slice. The preset distance threshold is set according to the alarm propagation distance, that is, according to the actual value of that distance; for example, it may be set to 1. The selection step is executed repeatedly until the alarm slice contains no remaining target alarm object whose call-chain distance to the specific alarm object is not greater than the preset distance threshold. Finally, all the selected target alarm objects are placed together with the specific alarm object in a preset alarm set to obtain the alarm cluster. An alarm set may be created in advance, and the alarm cluster is generated by placing the specific alarm object and the target alarm objects in it, so that the designated alarm objects in the cluster can subsequently be analyzed with the relevant rules to obtain the root cause object of the alarm cluster quickly and intelligently.
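Steps S100 to S103 can be sketched in Python as follows. This is only an illustration under stated assumptions: the call relationships are given as a hypothetical dictionary mapping each caller to the objects it calls, the call-chain distance is taken as the hop count with call direction ignored, and the names `call_chain_distances` and `build_alarm_cluster` are not from the source.

```python
from collections import deque


def call_chain_distances(call_graph, source):
    """Breadth-first search: hop count from `source` to every reachable object.

    `call_graph` maps a caller to the list of objects it calls. Direction is
    ignored here (an assumption), so "A calls B, B calls C" gives distance 2
    between A and C, as in the example in the text.
    """
    neighbors = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            neighbors.setdefault(caller, set()).add(callee)
            neighbors.setdefault(callee, set()).add(caller)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def build_alarm_cluster(alarm_slice, seed, call_graph, max_distance=1):
    """Collect the seed plus every alarm object within `max_distance` of it."""
    dist = call_chain_distances(call_graph, seed)
    cluster = {seed}
    for obj in alarm_slice:
        # Objects unreachable from the seed are treated as infinitely far away.
        if obj != seed and dist.get(obj, float("inf")) <= max_distance:
            cluster.add(obj)
    return cluster


calls = {"A": ["B"], "B": ["C"], "C": ["D"]}   # hypothetical call relationships
alarm_slice = {"A", "B", "C", "D"}             # alarm objects in one slice
print(sorted(build_alarm_cluster(alarm_slice, "A", calls, max_distance=2)))
# ['A', 'B', 'C']
```

With the threshold set to 1, as suggested in the text, the same call would return only A and B.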
Further, in an embodiment of the present application, step S4 above includes:
S400: Obtain the first time window of the first indicator time series data and the second time window of the second indicator time series data;
S401: According to the second time window and a preset time difference threshold, slide the first time window so that the time difference between the first time window and the second time window stays within the range of the time difference threshold, and obtain multiple designated first time windows after the sliding adjustment;
S402: For each of the designated first time windows, calculate the corresponding Pearson similarity between the designated indicator and the entry indicator, obtaining multiple Pearson similarities;
S403: Select the Pearson similarity with the largest value from the multiple Pearson similarities;
S404: Determine the Pearson similarity with the largest value as the designated Pearson similarity.
As described in steps S400 to S404 above, the step of adjusting the first time window of the first indicator time series data according to the second time window of the second indicator time series data and the first preset rule, and calculating the designated Pearson similarity between the designated indicator and the entry indicator, may specifically include the following. First, the first time window of the first indicator time series data and the second time window of the second indicator time series data are obtained; the sizes of both windows are expressed in minutes. Then, according to the second time window and a preset time difference threshold, the first time window is slid so that the time difference between the first time window and the second time window stays within the range of the threshold, yielding multiple designated first time windows; the time difference is taken as an absolute value. Next, a Pearson similarity between the designated indicator and the entry indicator is calculated for each designated first time window. The first time window and the second time window have the same size. The time difference threshold is a range, and its specific range is not limited; for example, it may be set to 0-60 min. In addition, the first time window may be slid a preset number of times to obtain the same number of designated first time windows; because different designated first time windows cover different data of the designated indicator, multiple different Pearson similarities between the designated indicator and the entry indicator can be calculated. After the multiple Pearson similarities are obtained, the one with the largest value is selected and determined as the designated Pearson similarity. In this embodiment, the first time window of the first indicator time series data is slid according to the preset time difference threshold, multiple Pearson similarities are calculated from the resulting designated first time windows, and the largest of them is selected as the designated Pearson similarity, which helps the root cause object of the alarm cluster to be determined accurately from that similarity under the preset rules.
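The sliding-window search in steps S400 to S404 can be sketched as below. The sketch assumes both series are sampled on the same regular grid (for instance one sample per minute), expresses the window position and the allowed shift in samples, and uses hypothetical names (`pearson`, `best_shifted_similarity`); none of these details are fixed by the source.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a flat window has no defined correlation; treat it as 0
    return cov / (dev_x * dev_y)


def best_shifted_similarity(designated, entry, entry_start, window, max_shift):
    """Slide the first window around the second one and keep the best match.

    Returns (largest Pearson similarity, absolute shift in samples), or None
    if no shifted window fits inside `designated`.
    """
    second_window = entry[entry_start:entry_start + window]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        start = entry_start + shift
        if start < 0 or start + window > len(designated):
            continue  # this shifted window would fall outside the data
        first_window = designated[start:start + window]
        r = pearson(first_window, second_window)
        if best is None or r > best[0]:
            best = (r, abs(shift))
    return best


# Hypothetical data: the entry indicator repeats the designated indicator's
# spike two samples later.
designated = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
entry = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]
similarity, time_diff = best_shifted_similarity(designated, entry, 3, 5, 3)
```

Here the best alignment is found at a shift of two samples, so `time_diff` is 2 and `similarity` is 1 up to floating-point rounding; these two values correspond to the designated Pearson similarity and the designated time difference of step S5.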
Further, in an embodiment of the present application, step S6 above includes:
S600: Take the designated Pearson similarity and the designated time difference corresponding to the designated alarm object as the first feature;
S601: Obtain the second feature corresponding to the designated alarm object, where there may be one or more second features;
S602: Invoke a preset supervised learning algorithm to perform prediction on the first feature and the second feature corresponding to each designated alarm object, and calculate the root cause judgment probability value corresponding to each designated alarm object;
S603: Take the at least one designated alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
As described in steps S600 to S603 above, the step of selecting at least one designated alarm object from all the designated alarm objects as the root cause object of the alarm cluster according to the second preset rule, based on all the designated Pearson similarities and their corresponding designated time differences, and outputting the root cause object of the alarm cluster, may specifically include the following. First, the designated Pearson similarity and the designated time difference corresponding to the designated alarm object are taken as the first feature. Then the second feature corresponding to the designated alarm object is obtained; the second feature refers to other related features of the designated alarm object, and there may be one or more of them. Specifically, the second feature may include one or more of the number of times the object is called, the number of alarms raised by the object, and the object's level. Next, a preset supervised learning algorithm is invoked to perform prediction on the first feature and the second feature corresponding to each designated alarm object, and the root cause judgment probability value corresponding to each designated alarm object is calculated. The supervised learning algorithm may specifically be the random forest algorithm. Random forest is an ensemble learning algorithm: it builds multiple weak learners (tree models), trains each learner on randomly sampled features and samples, and finally combines the judgments of the weak learners. The prediction output of a random forest is a probability value (in general, a probability greater than 0.5 is classified as 1, and otherwise as 0); the features of an object are the input, and the output is the probability that the object is judged to be the root cause. An alarm cluster contains multiple alarm objects; when the first feature and the other related features are input to the weak learners built by the random forest algorithm, the algorithm calculates a root cause judgment probability value for each alarm object. Finally, once the root cause judgment probability values are obtained, the at least one designated alarm object with the highest value is taken as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output. In addition to the root cause object itself, the first feature and the second feature corresponding to that root cause object may also be output. By invoking a supervised learning algorithm to perform root cause prediction on the acquired features, namely the first feature and the second feature, this embodiment can accurately output the at least one designated alarm object with the highest root cause judgment probability value, and can therefore quickly and intelligently generate the root cause object of the alarm cluster corresponding to the alarm objects, which effectively reduces the workload of operation and maintenance personnel and improves the efficiency of operation and maintenance work.
Further, in an embodiment of the present application, step S603 above includes:
S6030: Sort all the root cause judgment probability values from high to low to obtain a ranking result;
S6031: Starting from the root cause judgment probability value ranked first in the ranking result, take a preset number of designated root cause judgment probability values in order;
S6032: Take the alarm objects corresponding to the designated root cause judgment probability values thus obtained as the root cause objects of the alarm cluster.
As described in steps S6030 to S6032 above, the step of taking the at least one designated alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster and outputting it may specifically include the following. First, all the root cause judgment probability values are sorted from high to low to obtain a ranking result. Then, starting from the value ranked first in the ranking result, a preset number of designated root cause judgment probability values are taken in order. The preset number is not specifically limited and may be set according to actual needs; for example, it may be set to 5. Finally, the alarm objects corresponding to the designated root cause judgment probability values thus obtained are taken as the root cause objects of the alarm cluster, so that they can subsequently be output to operation and maintenance personnel as the root causes of the alarm cluster, allowing the personnel to locate the root cause of this alarm cluster from the output. For example, when the preset number is set to 5 and the ranking result obtained after sorting all root cause judgment probability values is: root cause judgment probability value 1, root cause judgment probability value 2, root cause judgment probability value 3, root cause judgment probability value 4, root cause judgment probability value 5, ..., the alarm objects corresponding to root cause judgment probability values 1 to 5 are taken as the root cause objects of the alarm cluster.
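The ranking in steps S6030 to S6032 amounts to a top-k selection. A minimal Python sketch follows, assuming the probability values are already available in a dictionary keyed by alarm object; the function name `top_root_causes` and the sample values are hypothetical.

```python
def top_root_causes(probabilities, preset_number=5):
    """Return the alarm objects with the highest root cause probabilities.

    `probabilities` maps each designated alarm object to its root cause
    judgment probability value.
    """
    # S6030: sort from high to low.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # S6031 and S6032: keep the first `preset_number` entries and return their objects.
    return [obj for obj, _ in ranked[:preset_number]]


probabilities = {"db": 0.91, "cache": 0.40, "gateway": 0.75, "auth": 0.10}
print(top_root_causes(probabilities, preset_number=2))  # ['db', 'gateway']
```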
In an embodiment of the present application, the above step S6 includes:
S610: Calculate, based on historical alarm data, a first weight value corresponding to the specified Pearson similarity;
S611: Obtain a second weight value corresponding to the specified time difference;
S612: According to the first weight value and the second weight value, perform a weighted calculation on each specified Pearson similarity and its corresponding specified time difference to obtain multiple weighted values;
S613: Use at least one specified alarm object with the highest weighted value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
As described in steps S610 to S613 above, the step of screening out, according to the second preset rule, at least one specified alarm object from all of the specified alarm objects as the root cause object of the alarm cluster, based on all of the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and outputting the root cause object of the alarm cluster, may specifically include the following. First, a first weight value corresponding to the specified Pearson similarity is calculated based on historical alarm data. The first weight value can be calculated according to the frequency with which the specified alarm object appears in the historical alarm data. After the first weight value is obtained, a second weight value corresponding to the specified time difference is obtained. The value of the second weight value is not specifically limited and can be set according to actual needs; for example, it can be set to 0.5. After the first weight value and the second weight value are obtained, a weighted calculation is performed on each specified Pearson similarity and its corresponding specified time difference according to the two weight values, yielding multiple weighted values. Finally, at least one specified alarm object with the highest weighted value is used as the root cause object of the alarm cluster, and the root cause object of the alarm cluster is output. The process of selecting the specified alarm objects with the highest weighted values is the same as the process described above for selecting the specified alarm objects with the highest root cause judgment probability values, and is not repeated here. In this embodiment, the first weight value corresponding to the specified Pearson similarity and the second weight value corresponding to the specified time difference are used in a weighted calculation to obtain the corresponding weighted values, and at least one specified alarm object with the highest weighted value is then output as the root cause object of the alarm cluster. The root cause object of the alarm cluster can thus be generated quickly and automatically, which reduces the workload of the operation and maintenance personnel and improves the efficiency of operation and maintenance work.
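The application does not state the exact form of the weighted calculation in step S612. The following sketch assumes a simple linear combination, `first_weight * similarity + second_weight * time_difference`, purely for illustration. The function names, the tuple layout of the features, and the choice to use the time difference without normalization are all assumptions.

```python
from typing import Dict, List, Tuple


def weighted_values(
    features: Dict[str, Tuple[float, float]],
    first_weights: Dict[str, float],
    second_weight: float = 0.5,
) -> Dict[str, float]:
    """Compute one weighted value per specified alarm object (S612).

    `features` maps an alarm object to (specified Pearson similarity,
    specified time difference). The linear form is an assumption.
    """
    return {
        alarm_object: first_weights[alarm_object] * similarity
        + second_weight * time_difference
        for alarm_object, (similarity, time_difference) in features.items()
    }


def highest_weighted_objects(weighted: Dict[str, float], count: int = 1) -> List[str]:
    """Return the `count` alarm objects with the highest weighted value (S613)."""
    return sorted(weighted, key=weighted.get, reverse=True)[:count]


if __name__ == "__main__":
    feats = {"db": (0.9, 2.0), "cache": (0.6, 1.0)}
    w1 = {"db": 0.5, "cache": 0.25}
    values = weighted_values(feats, w1, second_weight=0.5)
    print(values, highest_weighted_objects(values))
```

In practice the time difference would likely need to be scaled to a range comparable with the similarity before the two terms are combined; that choice is left open here.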
Further, in an embodiment of the present application, the above step S610 includes:
S6100: Obtain the historical alarm data and a first quantity of the historical alarm data;
S6101: Screen out, from the historical alarm data, a second quantity of specified historical alarm data containing each specified alarm object;
S6102: Calculate the quotient of each second quantity and the first quantity to obtain multiple proportion values corresponding to the respective specified alarm objects;
S6103: Determine each obtained proportion value as the first weight value corresponding to the respective specified alarm object.
As described in steps S6100 to S6103 above, the step of calculating the first weight value corresponding to the specified Pearson similarity based on historical alarm data may specifically include the following. First, the historical alarm data and the first quantity of the historical alarm data are obtained. A scheduled task mechanism can be used to read the historical alarm data within a preset time period at a configured interval for subsequent processing and analysis. Task data such as the execution interval of the scheduled task and the alarm data sampling duration can be stored in advance in a fixed configuration file. When the device for locating the root cause of an alarm in this embodiment is started for the first time, it reads the fixed configuration file to obtain the execution interval of the scheduled task and registers the task in the scheduled task executor. After the scheduled task is triggered, the alarm data sampling duration configured in the configuration file is read, and the corresponding historical alarm data is obtained from the historical alarm data record table. Next, the second quantity of specified historical alarm data containing each specified alarm object is screened out from the historical alarm data. After the second quantities are obtained, the quotient of each second quantity and the first quantity is calculated to obtain multiple proportion values corresponding to the respective specified alarm objects. Finally, each obtained proportion value is determined as the first weight value corresponding to the respective specified alarm object, so that the weighted value can subsequently be calculated from the first weight value and the root cause object of the alarm cluster can be determined from the weighted value.
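A minimal sketch of steps S6100 to S6103 follows. It assumes that each historical alarm record can be represented as the set of alarm object names it contains; that representation, the function name `first_weight_values`, and the sample names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Sequence, Set


def first_weight_values(
    historical_alarms: Sequence[Set[str]], specified_objects: Iterable[str]
) -> Dict[str, float]:
    """Compute the first weight value of each specified alarm object.

    The weight is the proportion of historical alarm records that contain
    the object (second quantity divided by first quantity).
    """
    objects = list(specified_objects)
    # S6100: the first quantity is the total number of historical records.
    first_quantity = len(historical_alarms)
    if first_quantity == 0:
        # No history available: fall back to zero weights in this sketch.
        return {alarm_object: 0.0 for alarm_object in objects}

    weights: Dict[str, float] = {}
    for alarm_object in objects:
        # S6101: count the records that contain this alarm object.
        second_quantity = sum(
            1 for record in historical_alarms if alarm_object in record
        )
        # S6102 and S6103: the quotient is the proportion value and the weight.
        weights[alarm_object] = second_quantity / first_quantity
    return weights


if __name__ == "__main__":
    history = [{"db", "api"}, {"db"}, {"cache"}, {"api", "cache"}]
    print(first_weight_values(history, ["db", "api", "queue"]))
```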
Further, in an embodiment of the present application, after the above step S6, the method includes:
S620: Display the root cause objects of the alarm cluster;
S621: Receive a specified root cause object selected by the operation and maintenance personnel from the root cause objects of the alarm cluster;
S622: Determine the specified root cause object as the final root cause object of the alarm cluster.
As described in steps S620 to S622 above, after the root cause objects of the alarm cluster are generated, the operation and maintenance personnel can further determine the final root cause object of the alarm cluster from among them. Specifically, the root cause objects of the alarm cluster are first displayed to the operation and maintenance personnel. In addition to the root cause objects themselves, the first feature and the second feature corresponding to each root cause object, or the weight value corresponding to each root cause object, can also be output. Then, the specified root cause object selected by the operation and maintenance personnel from the root cause objects of the alarm cluster is received. The number of specified root cause objects is preferably one. The operation and maintenance personnel can make a manual judgment based on the root cause objects, the first feature, the second feature, or the weight value, and filter out any incorrect root cause objects to arrive at the specified root cause object. Finally, the specified root cause object is determined as the final root cause object of the alarm cluster. Because the incorrect candidates have been screened out by the operation and maintenance personnel, the final root cause object allows the root cause of the current alarm cluster to be located more quickly and accurately. For example, if the root cause objects output and displayed for the alarm cluster are root cause object 1, root cause object 2, root cause object 3, root cause object 4, and root cause object 5, and the operation and maintenance personnel select root cause object 4, then root cause object 4 is used as the final root cause object of the alarm cluster.
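The confirmation in steps S621 and S622 can be sketched as a simple validation of the operator's choice against the displayed candidates. The function name `confirm_final_root_cause` and the decision to reject a selection that was not displayed are assumptions of this example; the application does not describe error handling.

```python
from typing import List


def confirm_final_root_cause(candidates: List[str], selected: str) -> str:
    """Return the operator's selection as the final root cause object.

    `candidates` are the root cause objects that were displayed (S620);
    `selected` is the object chosen by the operator (S621).
    """
    if selected not in candidates:
        # Assumed behavior: a selection outside the displayed list is rejected.
        raise ValueError(f"{selected!r} is not one of the displayed root cause objects")
    # S622: the selection becomes the final root cause object.
    return selected


if __name__ == "__main__":
    shown = [f"root cause object {i}" for i in range(1, 6)]
    print(confirm_final_root_cause(shown, "root cause object 4"))
```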
Referring to FIG. 2, an embodiment of the present application further provides a device for locating the root cause of an alarm, including:
a first acquisition module 1, configured to acquire a specific alarm object in an alarm slice and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any one of the alarm objects in the alarm slice;
a generation module 2, configured to acquire a specified alarm object in the alarm cluster and perform indicator aggregation processing on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, where the specified alarm object is any one of the alarm objects in the alarm cluster;
a second acquisition module 3, configured to acquire first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
an adjustment module 4, configured to adjust, according to a second time window of the second indicator time series data and a first preset rule, a first time window of the first indicator time series data, and calculate a specified Pearson similarity between the specified indicator and the entry indicator;
a third acquisition module 5, configured to respectively acquire the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
a first determination module 6, configured to screen out, according to a second preset rule and based on all of the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, at least one specified alarm object from all of the specified alarm objects as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
In this embodiment, for the implementation of the functions of the first acquisition module, the generation module, the second acquisition module, the adjustment module, the third acquisition module, and the first determination module of the device, refer to the implementation of the corresponding steps S1 to S6 in the method for locating the root cause of an alarm described above; details are not repeated here.
Further, in an embodiment of the present application, the first acquisition module includes:
a first acquisition unit, configured to acquire a specific alarm object in the alarm slice;
a first calculation unit, configured to respectively calculate the call chain distance between the specific alarm object and each other alarm object in the alarm slice;
a first screening unit, configured to repeatedly perform the step of screening out, from the alarm slice, a target alarm object whose call chain distance to the specific alarm object is not greater than a preset distance threshold, until no such target alarm object remains in the alarm slice;
a placing unit, configured to place all of the screened target alarm objects and the specific alarm object in a preset alarm set to obtain the alarm cluster.
In this embodiment, for the implementation of the functions of the first acquisition unit, the first calculation unit, the first screening unit, and the placing unit of the device, refer to the implementation of the corresponding steps S100 to S103 in the method for locating the root cause of an alarm described above; details are not repeated here.
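The cooperation of the four units above can be sketched as a loop that keeps moving qualifying alarm objects into the alarm set until none remain. The `distance` callable stands in for the call chain distance, whose computation is not covered in this passage; it, the function name `build_alarm_cluster`, and the sample service names are hypothetical.

```python
from typing import Callable, List, Optional


def build_alarm_cluster(
    alarm_slice: List[str],
    specific: str,
    distance: Callable[[str, str], int],
    threshold: int,
) -> List[str]:
    """Collect the specific alarm object and every alarm object in the slice
    whose call chain distance to it does not exceed `threshold`."""
    cluster = [specific]  # the preset alarm set, seeded with the specific object
    remaining = [obj for obj in alarm_slice if obj != specific]
    while True:
        # Screen out one target alarm object within the distance threshold.
        target: Optional[str] = next(
            (obj for obj in remaining if distance(obj, specific) <= threshold),
            None,
        )
        if target is None:
            break  # no qualifying alarm object is left in the slice
        remaining.remove(target)
        cluster.append(target)
    return cluster


if __name__ == "__main__":
    hops = {"gateway": 1, "payment": 1, "inventory": 2, "report": 4}
    objects = ["gateway", "order", "payment", "inventory", "report"]
    print(build_alarm_cluster(objects, "order", lambda a, b: hops[a], threshold=2))
```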
Further, in an embodiment of the present application, the adjustment module includes:
a second acquisition unit, configured to acquire the first time window of the first indicator time series data and the second time window of the second indicator time series data;
an adjustment unit, configured to slide the first time window according to the second time window and a preset time difference threshold, so that the time difference between the first time window and the second time window stays within the time difference threshold, and obtain multiple specified first time windows after the sliding adjustment;
a second calculation unit, configured to calculate, for each of the specified first time windows, the Pearson similarity between the specified indicator and the entry indicator, obtaining multiple Pearson similarities;
a second screening unit, configured to screen out the Pearson similarity with the largest value from the multiple Pearson similarities;
a first determination unit, configured to determine the Pearson similarity with the largest value as the specified Pearson similarity.
In this embodiment, for the implementation of the functions of the second acquisition unit, the adjustment unit, the second calculation unit, the second screening unit, and the first determination unit of the device, refer to the implementation of the corresponding steps S400 to S404 in the method for locating the root cause of an alarm described above; details are not repeated here.
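The sliding adjustment and maximum selection performed by these units can be sketched as follows. The sketch assumes evenly sampled series, so that a time window is an index range and the time difference is an integer shift in samples. It also assumes that the shift producing the largest similarity is what is reported as the specified time difference. The function names and the sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant window carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def best_shifted_similarity(
    specified_series: Sequence[float],
    entry_series: Sequence[float],
    start: int,
    length: int,
    max_shift: int,
) -> Tuple[float, int]:
    """Slide the first window within +/- max_shift of the second window and
    return (largest Pearson similarity, shift at which it occurs)."""
    entry_window = entry_series[start:start + length]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + length > len(specified_series):
            continue  # this shifted window falls outside the available data
        score = pearson(specified_series[lo:lo + length], entry_window)
        if best is None or score > best[0]:
            best = (score, shift)
    if best is None:
        raise ValueError("no shifted window fits inside the specified series")
    return best


if __name__ == "__main__":
    entry = [0, 0, 1, 3, 1, 0, 0, 0]
    specified = [0, 1, 3, 1, 0, 0, 0, 0]  # same shape, one sample earlier
    print(best_shifted_similarity(specified, entry, start=2, length=4, max_shift=2))
```

In the sample data the specified indicator shows the same spike one sample before the entry indicator, so the best match is found at a shift of -1.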
Further, in an embodiment of the present application, the first determination module includes:
a second determination unit, configured to use the specified Pearson similarity and the specified time difference corresponding to the specified alarm object as a first feature;
a third acquisition unit, configured to acquire a second feature corresponding to the specified alarm object, where there may be one or more second features;
a calling unit, configured to call a preset supervised learning algorithm to perform prediction processing on the first feature and the second feature corresponding to each specified alarm object, and calculate the root cause judgment probability value corresponding to each specified alarm object;
a third determination unit, configured to use at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
In this embodiment, for the implementation of the functions of the second determination unit, the third acquisition unit, the calling unit, and the third determination unit of the device, refer to the implementation of the corresponding steps S600 to S603 in the method for locating the root cause of an alarm described above; details are not repeated here.
Further, in an embodiment of the present application, the third determination unit includes:
a sorting subunit, configured to sort all of the root cause judgment probability values from high to low to obtain a sorting result;
a first acquisition subunit, configured to obtain in sequence a preset number of specified root cause judgment probability values, starting from the root cause judgment probability value ranked first in the sorting result;
a first determination subunit, configured to use the alarm objects corresponding to the obtained specified root cause judgment probability values as the root cause objects of the alarm cluster.
In this embodiment, for the implementation of the functions of the sorting subunit, the first acquisition subunit, and the first determination subunit of the device, refer to the implementation of the corresponding steps S6030 to S6032 in the method for locating the root cause of an alarm described above; details are not repeated here.
Further, in an embodiment of the present application, the first determination module includes:
a third calculation unit, configured to calculate, based on historical alarm data, a first weight value corresponding to the specified Pearson similarity;
a fourth acquisition unit, configured to obtain a second weight value corresponding to the specified time difference;
a fourth calculation unit, configured to perform, according to the first weight value and the second weight value, a weighted calculation on each specified Pearson similarity and its corresponding specified time difference to obtain multiple weighted values;
a fourth determination unit, configured to use at least one specified alarm object with the highest weighted value as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
In this embodiment, for the implementation of the functions of the third calculation unit, the fourth acquisition unit, the fourth calculation unit, and the fourth determination unit of the device, refer to the implementation of the corresponding steps S610 to S613 in the method for locating the root cause of an alarm described above; details are not repeated here.
Further, in an embodiment of the present application, the third calculation unit includes:
a second acquisition subunit, configured to obtain the historical alarm data and a first quantity of the historical alarm data;
a screening subunit, configured to screen out, from the historical alarm data, a second quantity of specified historical alarm data containing each specified alarm object;
a calculation subunit, configured to calculate the quotient of each second quantity and the first quantity to obtain multiple proportion values corresponding to the respective specified alarm objects;
a second determination subunit, configured to determine each obtained proportion value as the first weight value corresponding to the respective specified alarm object.
For the implementation of the functions of the second acquisition subunit, the screening subunit, the calculation subunit, and the second determination subunit of the device, refer to the implementation of the corresponding steps S6100 to S6103 in the method for locating the root cause of an alarm described above; details are not repeated here.
Further, in an embodiment of the present application, the device for locating the root cause of an alarm includes:
a display module, configured to display the root cause objects of the alarm cluster;
a receiving module, configured to receive the specified root cause object selected by the operation and maintenance personnel from the root cause objects of the alarm cluster;
a second determination module, configured to determine the specified root cause object as the final root cause object of the alarm cluster.
In this embodiment, for the implementation of the functions of the display module, the receiving module, and the second determination module of the device, refer to the implementation of the corresponding steps S620 to S622 in the method for locating the root cause of an alarm described above; details are not repeated here.
Referring to FIG. 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as specified alarm objects, monitoring indicators, entry indicators, indicator time series data, specified Pearson similarities, and specified time differences. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the method for locating the root cause of an alarm shown in any of the above embodiments.
The steps of the method for locating the root cause of an alarm executed by the processor include:
Acquire a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any one of the alarm objects in the alarm slice;
Acquire a specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, where the specified alarm object is any one of the alarm objects in the alarm cluster;
Acquire first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
According to a second time window of the second indicator time series data, adjust a first time window of the first indicator time series data according to a first preset rule, and calculate a specified Pearson similarity between the specified indicator and the entry indicator;
Respectively acquire the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
According to all of the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, screen out, according to a second preset rule, at least one specified alarm object from all of the specified alarm objects as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the devices or computer equipment to which the solution of the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile, and stores a computer program. When executed by a processor, the computer program implements the method for locating the root cause of an alarm shown in any of the above embodiments, where the method includes:
Acquire a specific alarm object in an alarm slice, and generate an alarm cluster according to the specific alarm object, where the specific alarm object is any one of the alarm objects in the alarm slice;
Acquire a specified alarm object in the alarm cluster, and perform indicator aggregation processing on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, where the specified alarm object is any one of the alarm objects in the alarm cluster;
Acquire first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
According to a second time window of the second indicator time series data, adjust a first time window of the first indicator time series data according to a first preset rule, and calculate a specified Pearson similarity between the specified indicator and the entry indicator;
Respectively acquire the specified Pearson similarity corresponding to each specified alarm object, and the specified time difference corresponding to each specified Pearson similarity;
According to all of the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, screen out, according to a second preset rule, at least one specified alarm object from all of the specified alarm objects as the root cause object of the alarm cluster, and output the root cause object of the alarm cluster.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A method for locating the root cause of an alarm, comprising:
    acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
    acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
    acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
    adjusting a first time window of the first indicator time series data according to a second time window of the second indicator time series data and a first preset rule, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
    acquiring, for each specified alarm object, the corresponding specified Pearson similarity and the specified time difference corresponding to that specified Pearson similarity;
    selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with a second preset rule, at least one specified alarm object from all the specified alarm objects as a root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  2. The method for locating the root cause of an alarm according to claim 1, wherein the step of acquiring a specific alarm object in an alarm slice and generating an alarm cluster according to the specific alarm object comprises:
    acquiring the specific alarm object in the alarm slice;
    calculating the call-chain distance between the specific alarm object and each other alarm object in the alarm slice;
    repeatedly performing the step of selecting, from the alarm slice, target alarm objects whose call-chain distance to the specific alarm object is not greater than a preset distance threshold, until no such target alarm object remains in the alarm slice;
    placing all the selected target alarm objects together with the specific alarm object in a preset alarm set to obtain the alarm cluster.
  3. The method for locating the root cause of an alarm according to claim 1, wherein the step of adjusting the first time window of the first indicator time series data according to the second time window of the second indicator time series data and the first preset rule, and calculating the specified Pearson similarity between the specified indicator and the entry indicator, comprises:
    acquiring the first time window of the first indicator time series data and the second time window of the second indicator time series data;
    sliding the first time window according to the second time window and a preset time difference threshold, so that the time difference between the first time window and the second time window stays within the time difference threshold, to obtain multiple specified first time windows after the sliding adjustment;
    calculating, for each of the multiple specified first time windows, the Pearson similarity between the specified indicator and the entry indicator, to obtain multiple Pearson similarities;
    selecting the Pearson similarity with the largest value from the multiple Pearson similarities;
    determining the Pearson similarity with the largest value as the specified Pearson similarity.
  4. The method for locating the root cause of an alarm according to claim 1, wherein the step of selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with the second preset rule, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    taking the specified Pearson similarity and the specified time difference corresponding to the specified alarm object as a first feature;
    acquiring a second feature corresponding to the specified alarm object, wherein there are one or more second features;
    invoking a preset supervised learning algorithm to perform prediction on the first feature and the second feature corresponding to each specified alarm object, and calculating a root cause judgment probability value for each specified alarm object;
    taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  5. The method for locating the root cause of an alarm according to claim 4, wherein the step of taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    sorting all the root cause judgment probability values in descending order to obtain a sorting result;
    acquiring, starting from the first-ranked root cause judgment probability value in the sorting result, a preset number of specified root cause judgment probability values in order;
    taking the alarm objects corresponding to the acquired specified root cause judgment probability values as the root cause objects of the alarm cluster.
  6. The method for locating the root cause of an alarm according to claim 1, wherein the step of selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with the second preset rule, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    calculating, based on historical alarm data, a first weight value corresponding to the specified Pearson similarity;
    acquiring a second weight value corresponding to the specified time difference;
    performing a weighted calculation on each specified Pearson similarity and its corresponding specified time difference according to the first weight value and the second weight value, to obtain multiple weighted values;
    taking at least one specified alarm object with the highest weighted value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  7. The method for locating the root cause of an alarm according to claim 1, wherein, after the step of selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with the second preset rule, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, the method comprises:
    displaying the root cause objects of the alarm cluster;
    receiving a specified root cause object selected by operation and maintenance personnel from the root cause objects of the alarm cluster;
    determining the specified root cause object as the final root cause object of the alarm cluster.
  8. An apparatus for locating the root cause of an alarm, comprising:
    a first acquisition module, configured to acquire a specific alarm object in an alarm slice and generate an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
    a generation module, configured to acquire a specified alarm object in the alarm cluster and perform indicator aggregation on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
    a second acquisition module, configured to acquire first indicator time series data corresponding to a specified indicator of the specified alarm object, and to acquire second indicator time series data corresponding to the entry indicator of the specified alarm object;
    an adjustment module, configured to adjust a first time window of the first indicator time series data according to a second time window of the second indicator time series data and a first preset rule, and to calculate a specified Pearson similarity between the specified indicator and the entry indicator;
    a third acquisition module, configured to acquire, for each specified alarm object, the corresponding specified Pearson similarity and the specified time difference corresponding to that specified Pearson similarity;
    a first determination module, configured to select, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with a second preset rule, at least one specified alarm object from all the specified alarm objects as a root cause object of the alarm cluster, and to output the root cause object of the alarm cluster.
  9. The apparatus for locating the root cause of an alarm according to claim 8, wherein the first acquisition module comprises:
    a first acquisition unit, configured to acquire the specific alarm object in the alarm slice;
    a first calculation unit, configured to calculate the call-chain distance between the specific alarm object and each other alarm object in the alarm slice;
    a first screening unit, configured to repeatedly perform the step of selecting, from the alarm slice, target alarm objects whose call-chain distance to the specific alarm object is not greater than a preset distance threshold, until no such target alarm object remains in the alarm slice;
    a placing unit, configured to place all the selected target alarm objects together with the specific alarm object in a preset alarm set to obtain the alarm cluster.
  10. The apparatus for locating the root cause of an alarm according to claim 8, wherein the adjustment module comprises:
    a second acquisition unit, configured to acquire the first time window of the first indicator time series data and the second time window of the second indicator time series data;
    an adjustment unit, configured to slide the first time window according to the second time window and a preset time difference threshold, so that the time difference between the first time window and the second time window stays within the time difference threshold, to obtain multiple specified first time windows after the sliding adjustment;
    a second calculation unit, configured to calculate, for each of the multiple specified first time windows, the Pearson similarity between the specified indicator and the entry indicator, to obtain multiple Pearson similarities;
    a second screening unit, configured to select the Pearson similarity with the largest value from the multiple Pearson similarities;
    a first determination unit, configured to determine the Pearson similarity with the largest value as the specified Pearson similarity.
  11. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a method for locating the root cause of an alarm,
    wherein the method for locating the root cause of an alarm comprises:
    acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
    acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
    acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
    adjusting a first time window of the first indicator time series data according to a second time window of the second indicator time series data and a first preset rule, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
    acquiring, for each specified alarm object, the corresponding specified Pearson similarity and the specified time difference corresponding to that specified Pearson similarity;
    selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with a second preset rule, at least one specified alarm object from all the specified alarm objects as a root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  12. The computer device according to claim 11, wherein the step of acquiring a specific alarm object in an alarm slice and generating an alarm cluster according to the specific alarm object comprises:
    acquiring the specific alarm object in the alarm slice;
    calculating the call-chain distance between the specific alarm object and each other alarm object in the alarm slice;
    repeatedly performing the step of selecting, from the alarm slice, target alarm objects whose call-chain distance to the specific alarm object is not greater than a preset distance threshold, until no such target alarm object remains in the alarm slice;
    placing all the selected target alarm objects together with the specific alarm object in a preset alarm set to obtain the alarm cluster.
  13. The computer device according to claim 11, wherein the step of adjusting the first time window of the first indicator time series data according to the second time window of the second indicator time series data and the first preset rule, and calculating the specified Pearson similarity between the specified indicator and the entry indicator, comprises:
    acquiring the first time window of the first indicator time series data and the second time window of the second indicator time series data;
    sliding the first time window according to the second time window and a preset time difference threshold, so that the time difference between the first time window and the second time window stays within the time difference threshold, to obtain multiple specified first time windows after the sliding adjustment;
    calculating, for each of the multiple specified first time windows, the Pearson similarity between the specified indicator and the entry indicator, to obtain multiple Pearson similarities;
    selecting the Pearson similarity with the largest value from the multiple Pearson similarities;
    determining the Pearson similarity with the largest value as the specified Pearson similarity.
  14. The computer device according to claim 11, wherein the step of selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with the second preset rule, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    taking the specified Pearson similarity and the specified time difference corresponding to the specified alarm object as a first feature;
    acquiring a second feature corresponding to the specified alarm object, wherein there are one or more second features;
    invoking a preset supervised learning algorithm to perform prediction on the first feature and the second feature corresponding to each specified alarm object, and calculating a root cause judgment probability value for each specified alarm object;
    taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  15. The computer device according to claim 14, wherein the step of taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    sorting all the root cause judgment probability values in descending order to obtain a sorting result;
    acquiring, starting from the first-ranked root cause judgment probability value in the sorting result, a preset number of specified root cause judgment probability values in order;
    taking the alarm objects corresponding to the acquired specified root cause judgment probability values as the root cause objects of the alarm cluster.
  16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method for locating the root cause of an alarm, the method comprising the following steps:
    acquiring a specific alarm object in an alarm slice, and generating an alarm cluster according to the specific alarm object, wherein the specific alarm object is any one of the alarm objects in the alarm slice;
    acquiring a specified alarm object in the alarm cluster, and performing indicator aggregation on the monitoring indicators of the specified alarm object to generate a corresponding entry indicator, wherein the specified alarm object is any one of the alarm objects in the alarm cluster;
    acquiring first indicator time series data corresponding to a specified indicator of the specified alarm object, and acquiring second indicator time series data corresponding to the entry indicator of the specified alarm object;
    adjusting a first time window of the first indicator time series data according to a second time window of the second indicator time series data and a first preset rule, and calculating a specified Pearson similarity between the specified indicator and the entry indicator;
    acquiring, for each specified alarm object, the corresponding specified Pearson similarity and the specified time difference corresponding to that specified Pearson similarity;
    selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with a second preset rule, at least one specified alarm object from all the specified alarm objects as a root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  17. The computer-readable storage medium according to claim 16, wherein the step of acquiring a specific alarm object in an alarm slice and generating an alarm cluster according to the specific alarm object comprises:
    acquiring the specific alarm object in the alarm slice;
    calculating the call-chain distance between the specific alarm object and each other alarm object in the alarm slice;
    repeatedly performing the step of selecting, from the alarm slice, target alarm objects whose call-chain distance to the specific alarm object is not greater than a preset distance threshold, until no such target alarm object remains in the alarm slice;
    placing all the selected target alarm objects together with the specific alarm object in a preset alarm set to obtain the alarm cluster.
  18. The computer-readable storage medium according to claim 16, wherein the step of adjusting the first time window of the first indicator time series data according to the second time window of the second indicator time series data and the first preset rule, and calculating the specified Pearson similarity between the specified indicator and the entry indicator, comprises:
    acquiring the first time window of the first indicator time series data and the second time window of the second indicator time series data;
    sliding the first time window according to the second time window and a preset time difference threshold, so that the time difference between the first time window and the second time window stays within the time difference threshold, to obtain multiple specified first time windows after the sliding adjustment;
    calculating, for each of the multiple specified first time windows, the Pearson similarity between the specified indicator and the entry indicator, to obtain multiple Pearson similarities;
    selecting the Pearson similarity with the largest value from the multiple Pearson similarities;
    determining the Pearson similarity with the largest value as the specified Pearson similarity.
  19. The computer-readable storage medium according to claim 16, wherein the step of selecting, according to all the specified Pearson similarities and the specified time difference corresponding to each specified Pearson similarity, and in accordance with the second preset rule, at least one specified alarm object from all the specified alarm objects as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    taking the specified Pearson similarity and the specified time difference corresponding to the specified alarm object as a first feature;
    acquiring a second feature corresponding to the specified alarm object, wherein there are one or more second features;
    invoking a preset supervised learning algorithm to perform prediction on the first feature and the second feature corresponding to each specified alarm object, and calculating a root cause judgment probability value for each specified alarm object;
    taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster.
  20. The computer-readable storage medium according to claim 19, wherein the step of taking at least one specified alarm object with the highest root cause judgment probability value as the root cause object of the alarm cluster, and outputting the root cause object of the alarm cluster, comprises:
    sorting all the root cause judgment probability values in descending order to obtain a sorting result;
    acquiring, starting from the first-ranked root cause judgment probability value in the sorting result, a preset number of specified root cause judgment probability values in order;
    taking the alarm objects corresponding to the acquired specified root cause judgment probability values as the root cause objects of the alarm cluster.
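
For illustration only, the sliding-window Pearson calculation recited in claims 3, 10, 13 and 18 can be sketched as below. The sketch assumes both series are sampled on the same regular grid, that windows are expressed as sample indices, and that the time difference threshold is given as a maximum shift in samples. The function names `pearson` and `best_shifted_pearson` and all parameters are hypothetical and are not taken from the application.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat window as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def best_shifted_pearson(first: Sequence[float],
                         second: Sequence[float],
                         second_start: int,
                         window_len: int,
                         max_shift: int) -> Tuple[float, int]:
    """Slide the first window around the fixed second window.

    Returns (largest Pearson similarity, shift in samples at which it occurs).
    A negative shift means the first window starts earlier than the second.
    """
    if second_start < 0 or second_start + window_len > len(second):
        raise ValueError("second window lies outside the second series")
    reference = second[second_start:second_start + window_len]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        start = second_start + shift
        if start < 0 or start + window_len > len(first):
            continue  # this shifted window would fall outside the data
        r = pearson(first[start:start + window_len], reference)
        if best is None or r > best[0]:
            best = (r, shift)
    if best is None:
        raise ValueError("no shifted window fits inside the first series")
    return best


# The specified indicator spikes one sample before the entry indicator.
indicator = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
entry = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
similarity, shift = best_shifted_pearson(indicator, entry,
                                         second_start=2, window_len=4, max_shift=2)
```

Under these assumptions, the specified time difference would be the returned shift multiplied by the sampling interval.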
PCT/CN2020/099434 2020-04-29 2020-06-30 Method and apparatus for locating root cause of alarm, computer device, and storage medium WO2021217865A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010357568.4 2020-04-29
CN202010357568.4A CN111555921B (en) 2020-04-29 2020-04-29 Method and device for positioning alarm root cause, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021217865A1 (en) 2021-11-04

Family

ID=72007828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099434 WO2021217865A1 (en) 2020-04-29 2020-06-30 Method and apparatus for locating root cause of alarm, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111555921B (en)
WO (1) WO2021217865A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system
CN114325232A (en) * 2021-12-28 2022-04-12 微梦创科网络科技(中国)有限公司 Fault positioning method and device
CN114389960A (en) * 2022-01-04 2022-04-22 烽火通信科技股份有限公司 Method and system for collecting and reporting network service performance
CN115473789A (en) * 2022-09-16 2022-12-13 深信服科技股份有限公司 Alarm processing method and related equipment
CN115529219A (en) * 2022-09-16 2022-12-27 中国工商银行股份有限公司 Alarm analysis method and device, computer readable storage medium and electronic equipment
CN115756919A (en) * 2022-11-10 2023-03-07 上海鼎茂信息技术有限公司 Root cause positioning method and system for multidimensional data

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114285730A (en) * 2020-09-18 2022-04-05 华为技术有限公司 Method and device for determining fault root cause and related equipment
CN113821413A (en) * 2021-09-27 2021-12-21 中国建设银行股份有限公司 Alarm analysis method and device
CN117093407B (en) * 2023-10-19 2024-03-19 北京凡得科技有限公司 Improved S-learner-based flow anomaly cascade root cause analysis method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120072782A1 (en) * 2010-09-21 2012-03-22 Verizon Patent And Licensing, Inc. Correlation of network alarm messages based on alarm time
CN103051481A (en) * 2012-11-15 2013-04-17 香港应用科技研究院有限公司 Adaptive uniform performance management of network element
CN109981326A (en) * 2017-12-28 2019-07-05 中国移动通信集团山东有限公司 The method and device of home broadband perception fault location
CN110300011A (en) * 2018-03-23 2019-10-01 中国移动通信集团有限公司 Alarm root cause localization method, device and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399347B (en) * 2018-04-23 2021-05-18 华为技术有限公司 Alarm log compression method, device and system and storage medium
CN110351118B (en) * 2019-05-28 2020-12-01 华为技术有限公司 Root cause alarm decision network construction method, device and storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114325232A (en) * 2021-12-28 2022-04-12 微梦创科网络科技(中国)有限公司 Fault positioning method and device
CN114325232B (en) * 2021-12-28 2023-07-25 微梦创科网络科技(中国)有限公司 Fault positioning method and device
CN114389960A (en) * 2022-01-04 2022-04-22 烽火通信科技股份有限公司 Method and system for collecting and reporting network service performance
CN114389960B (en) * 2022-01-04 2023-11-28 烽火通信科技股份有限公司 Method and system for collecting and reporting network service performance
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system
CN114024837B (en) * 2022-01-06 2022-04-05 杭州乘云数字技术有限公司 Fault root cause positioning method of micro-service system
CN115473789A (en) * 2022-09-16 2022-12-13 深信服科技股份有限公司 Alarm processing method and related equipment
CN115529219A (en) * 2022-09-16 2022-12-27 中国工商银行股份有限公司 Alarm analysis method and device, computer readable storage medium and electronic equipment
CN115473789B (en) * 2022-09-16 2024-02-27 深信服科技股份有限公司 Alarm processing method and related equipment
CN115756919A (en) * 2022-11-10 2023-03-07 上海鼎茂信息技术有限公司 Root cause positioning method and system for multidimensional data
CN115756919B (en) * 2022-11-10 2023-10-31 上海鼎茂信息技术有限公司 Root cause positioning method and system for multidimensional data

Also Published As

Publication number Publication date
CN111555921A (en) 2020-08-18
CN111555921B (en) 2023-04-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933848

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20933848

Country of ref document: EP

Kind code of ref document: A1