CN116048915A

CN116048915A - Index abnormality monitoring method and device, electronic equipment and storage medium

Info

Publication number: CN116048915A
Application number: CN202211708992.4A
Authority: CN
Inventors: 李睿; 韩超; 王宗强; 邓罡; 门玉森; 吕旖旎; 杨俊�; 梁策; 曹铭轩; 杨元; 李巍伟; 林恩爱; 娄峰; 何宁
Original assignee: China Travelsky Technology Co Ltd
Current assignee: China Travelsky Technology Co Ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-05-02

Abstract

The application discloses a method and a device for monitoring index anomalies, an electronic device, and a storage medium. The method comprises: acquiring, in real time, the current CPU utilization rate of a target system and the tag information of the target system; matching, based on the tag information of the target system, the anomaly detection model corresponding to the target system from among pre-constructed anomaly detection models, each of which is constructed in advance from the historical CPU utilization rate time sequence data of a system and the distribution characteristics of that data; detecting, with the anomaly detection model corresponding to the target system, whether the current CPU utilization rate of the target system is abnormal; and, if it is abnormal, raising an alarm based at least on the tag information of the target system and the current CPU utilization rate of the target system.

Description

Index abnormality monitoring method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of system state monitoring technologies, and in particular, to a method and apparatus for monitoring an abnormal index, an electronic device, and a storage medium.

Background

With the continuous development of information technology, the service carried by the system is increasingly complex and the resource scale is increasingly large, so that higher requirements are put forward on the monitoring capability of the system. The monitoring of the running state of the system is realized mainly by monitoring various monitoring indexes. The CPU utilization rate is one of the important indexes for monitoring the system state.

At present, the CPU utilization rate is monitored by an operation and maintenance personnel according to experience in a mode of setting a fixed threshold, namely, the CPU utilization rate is collected in real time, and whether the CPU utilization rate is abnormal or not is determined by comparing the CPU utilization rate with the set threshold. And if the abnormality exists, alarming in time.

However, in an actual production environment, the usage of different server systems is different, so that the characterization of the CPU usage rate is also greatly different, and therefore, a single mode of comparing according to an empirically set threshold value is adopted, so that a large number of false positives or false negatives are easy to occur.

Disclosure of Invention

In view of the shortcomings of the prior art, the present application provides a method and device for monitoring index anomalies, an electronic device, and a storage medium, so as to solve the problem that existing monitoring approaches are prone to false positives and false negatives.

In order to achieve the above object, the present application provides the following technical solutions:

the first aspect of the present application provides a method for monitoring an index anomaly, including:

acquiring the current CPU utilization rate of a target system and the tag information of the target system in real time;

based on the label information of the target system, matching an abnormality detection model corresponding to the target system from all abnormality detection models constructed in advance; the anomaly detection models are constructed by utilizing historical CPU utilization rate time sequence data and distribution characteristics of the systems in advance;

detecting whether the current CPU utilization rate of the target system is abnormal or not by using an abnormality detection model corresponding to the target system;

and if the current CPU utilization rate of the target system is detected to be abnormal, alarming is carried out at least based on the label information of the target system and the current CPU utilization rate of the target system.

Optionally, in the method for monitoring an indicator anomaly, the method for constructing an anomaly detection model corresponding to the target system includes:

acquiring offline log data of the target system;

performing structural conversion on the offline log data of the target system to obtain label information of the target system and historical CPU utilization rate time sequence data of the target system;

detecting whether historical CPU usage time sequence data of the target system accords with normal distribution;

if the historical CPU utilization rate time sequence data of the target system is detected to accord with normal distribution, outlier analysis is carried out on the historical CPU utilization rate time sequence data of the target system, and an abnormality detection model corresponding to the target system is constructed;

if the historical CPU utilization rate time sequence data of the target system is detected not to accord with normal distribution, detecting abnormal values in the historical CPU utilization rate time sequence data of the target system by using an isolation forest algorithm, and removing the abnormal values from the historical CPU utilization rate time sequence data of the target system;

and carrying out fluctuation rate analysis on the historical CPU usage time sequence data of the target system after the abnormal value is removed, and constructing an abnormality detection model corresponding to the target system.

Optionally, in the method for monitoring an indicator anomaly, before detecting whether the historical CPU usage time series data of the target system accords with a normal distribution, the method further includes:

filling the missing data in the historical CPU utilization time sequence data of the target system according to the data missing proportion in the historical CPU utilization time sequence data of the target system;

and carrying out downsampling processing on the filled historical CPU usage time sequence data of the target system so as to unify the time interval of data in the historical CPU usage time sequence data of the target system.

Optionally, in the method for monitoring an indicator anomaly, the constructing an anomaly detection model corresponding to the target system by performing outlier analysis on historical CPU usage time series data of the target system includes:

extracting the mean value and variance of the historical CPU usage time sequence data of the target system;

subtracting the product of the variance and a preset parameter from the mean value to obtain a minimum outlier detection factor, and adding the product of the variance and the preset parameter to the mean value to obtain a maximum outlier detection factor;

constructing an outlier judgment condition corresponding to the target system by using the minimum outlier detection factor and the maximum outlier detection factor; the outlier judgment condition corresponding to the target system is that a value is smaller than the minimum outlier detection factor or larger than the maximum outlier detection factor;

at least the outlier judgment condition corresponding to the target system and the label information of the target system are combined to form an anomaly detection model corresponding to the target system.

Optionally, in the method for monitoring an indicator anomaly, the step of constructing an anomaly detection model corresponding to the target system by performing a fluctuation rate analysis on historical CPU usage time series data of the target system after removing the anomaly value includes:

extracting a minimum percentile, a maximum percentile, a first quartile and a third quartile of historical CPU (Central processing Unit) utilization time sequence data of the target system after the abnormal value is removed;

subtracting the product of the preset parameter and the quartile difference from the first quartile to obtain a minimum fluctuation rate, and adding the product of the preset parameter and the quartile difference to the third quartile to obtain a maximum fluctuation rate; wherein the quartile difference is the third quartile minus the first quartile;

determining the smaller of the minimum fluctuation rate and the minimum percentile as a minimum fluctuation rate detection factor, and the larger of the maximum percentile and the maximum fluctuation rate as a maximum fluctuation rate factor;

constructing an outlier judgment condition corresponding to the target system by utilizing the minimum fluctuation rate detection factor and the maximum fluctuation rate factor; the outlier judgment condition corresponding to the target system is that a value is smaller than the minimum fluctuation rate detection factor or larger than the maximum fluctuation rate factor;
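The fluctuation rate analysis above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the function names, the multiplier `k=1.5`, and the choice of the 1st and 99th percentiles as the "minimum" and "maximum" percentiles are all assumptions, since the text does not fix them.

```python
import statistics

def fluctuation_factors(values, k=1.5, low_pct=1, high_pct=99):
    """Return (min_factor, max_factor) from quartiles and extreme percentiles.

    k, low_pct and high_pct are illustrative assumptions, not values from the text.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    quartile_diff = q3 - q1                      # third quartile minus first quartile
    min_rate = q1 - k * quartile_diff            # minimum fluctuation rate
    max_rate = q3 + k * quartile_diff            # maximum fluctuation rate
    # quantiles(n=100) returns 99 cut points; index i holds the (i + 1)-th percentile.
    pct = statistics.quantiles(values, n=100)
    min_pct, max_pct = pct[low_pct - 1], pct[high_pct - 1]
    # Take the wider of the two candidate bounds on each side.
    return min(min_rate, min_pct), max(max_rate, max_pct)

def violates(value, factors):
    """Outlier judgment condition: below the minimum or above the maximum factor."""
    low, high = factors
    return value < low or value > high
```

Taking the smaller lower bound and the larger upper bound keeps the condition permissive, so only values outside both the quartile-based range and the percentile range are reported.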

Optionally, in the method for monitoring an indicator anomaly, the detecting whether the current CPU utilization of the target system is abnormal by using an anomaly detection model corresponding to the target system includes:

detecting whether the current CPU utilization rate of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors; and if the current CPU utilization rate of the target system is detected to be smaller than the smaller of the two detection factors or larger than the larger of the two detection factors, determining that the current CPU utilization rate of the target system is abnormal.

Optionally, in the method for monitoring an indicator anomaly, the alarming at least based on the tag information of the target system and the current CPU usage of the target system includes:

calculating a current fluctuation ratio according to the current CPU utilization rate of the target system;

determining an alarm level corresponding to the current fluctuation ratio;

displaying the alarm grade and the label information of the target system;

and drawing and displaying a change trend graph by using the current CPU utilization rate of the target system and the utilization rate sequence of a preset time period preceding it.

The second aspect of the present application provides a monitoring device for index anomalies, including:

the real-time data acquisition unit is used for acquiring the current CPU utilization rate of the target system and the tag information of the target system in real time;

the model matching unit is used for matching an abnormality detection model corresponding to the target system from all the pre-constructed abnormality detection models based on the label information of the target system; the anomaly detection models are constructed by utilizing historical CPU utilization rate time sequence data and distribution characteristics of the systems in advance;

the abnormality detection unit is used for detecting whether the current CPU utilization rate of the target system is abnormal or not by using an abnormality detection model corresponding to the target system;

and the alarming unit is used for alarming at least based on the label information of the target system and the current CPU utilization rate of the target system when detecting that the current CPU utilization rate of the target system is abnormal.

Optionally, the monitoring device for index anomaly further includes:

the log acquisition unit is used for acquiring offline log data of the target system;

the conversion unit is used for carrying out structural conversion on the offline log data of the target system to obtain the tag information of the target system and the historical CPU utilization rate time sequence data of the target system;

a distribution form detection unit for detecting whether the historical CPU usage time sequence data of the target system accords with normal distribution;

the first model construction unit is used for constructing an abnormality detection model corresponding to the target system by performing outlier analysis on the historical CPU usage time sequence data of the target system when the historical CPU usage time sequence data of the target system is detected to accord with normal distribution;

a removing unit, configured to detect abnormal values in the historical CPU usage time series data of the target system by using an isolation forest algorithm when it is detected that the historical CPU usage time series data of the target system does not conform to the normal distribution, and remove the abnormal values from the historical CPU usage time series data of the target system;

and the second model construction unit is used for constructing an abnormality detection model corresponding to the target system by carrying out fluctuation rate analysis on the historical CPU usage time sequence data of the target system after the abnormal value is removed.

Optionally, the monitoring device for index anomaly further includes:

the filling unit is used for filling the missing data in the historical CPU utilization rate time sequence data of the target system according to the data missing proportion in the historical CPU utilization rate time sequence data of the target system;

and the downsampling unit is used for downsampling the filled historical CPU utilization rate time sequence data of the target system so as to unify the time interval of the data in the historical CPU utilization rate time sequence data of the target system.

Optionally, in the above-mentioned monitoring device for index anomaly, the first model building unit includes:

a first feature extraction unit for extracting a mean value and a variance of historical CPU usage time series data of the target system;

the first calculation unit is used for subtracting the product of the variance and the preset parameter from the mean value to obtain a minimum outlier detection factor, and adding the product of the variance and the preset parameter to the mean value to obtain a maximum outlier detection factor;

the first condition construction unit is used for constructing an outlier judgment condition corresponding to the target system by utilizing the minimum outlier detection factor and the maximum outlier detection factor; the outlier judgment condition corresponding to the target system is that a value is smaller than the minimum outlier detection factor or larger than the maximum outlier detection factor;

the first model composing unit is used for combining at least the outlier judgment condition corresponding to the target system and the label information of the target system into an anomaly detection model corresponding to the target system.

Optionally, in the above-mentioned monitoring device for index anomaly, the second model building unit includes:

the second feature extraction unit is used for extracting a minimum percentile, a maximum percentile, a first quartile and a third quartile of the historical CPU utilization time sequence data of the target system after the abnormal value is removed;

a second calculation unit, configured to subtract a product of a preset parameter and a quartile difference from the first quartile to obtain a minimum fluctuation rate, and add a product of the preset parameter and the quartile difference to the third quartile to obtain a maximum fluctuation rate; wherein the quartile difference is the third quartile minus the first quartile;

a factor determining unit configured to determine the smaller of the minimum fluctuation rate and the minimum percentile as a minimum fluctuation rate detection factor, and the larger of the maximum percentile and the maximum fluctuation rate as a maximum fluctuation rate factor;

a second condition construction unit for constructing an outlier judgment condition corresponding to the target system by using the minimum fluctuation rate detection factor and the maximum fluctuation rate factor; the outlier judgment condition corresponding to the target system is that a value is smaller than the minimum fluctuation rate detection factor or larger than the maximum fluctuation rate factor;

and the second model composing unit is used for combining at least the outlier judgment condition corresponding to the target system and the label information of the target system into an anomaly detection model corresponding to the target system.

Optionally, in the above-mentioned monitoring device for index abnormality, the abnormality detection unit includes:

an anomaly detection subunit, configured to detect whether the current CPU usage of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors; and if the current CPU usage of the target system is detected to be smaller than the smaller of the two detection factors or larger than the larger of the two detection factors, determine that the current CPU usage of the target system is abnormal.

Optionally, in the above monitoring device for index anomaly, the alarm unit includes:

a fluctuation ratio calculation unit for calculating a current fluctuation ratio according to a current CPU usage of the target system;

a level determining unit, configured to determine an alarm level corresponding to the current fluctuation ratio;

the information display unit is used for displaying the alarm grade and the label information of the target system;

and the drawing unit is used for drawing and displaying a change trend graph by using the current CPU utilization rate of the target system and the utilization rate sequence of a preset time period preceding it.

A third aspect of the present application provides an electronic device, comprising:

one or more processing devices;

a memory having one or more programs stored thereon;

the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the method of monitoring for indicator anomalies as set forth in any one of the preceding claims.

A fourth aspect of the present application provides a computer storage medium storing a program for implementing the method for monitoring an index anomaly as set forth in any one of the above when the program is executed.

The embodiments of the present application provide a method for monitoring index anomalies. An anomaly detection model is constructed in advance for each system from that system's historical CPU utilization time sequence data and its distribution characteristics, so that each model matches the actual distribution of that system's CPU utilization. The current CPU utilization rate of the target system and the label information of the target system are then acquired in real time, and the anomaly detection model corresponding to the target system is matched from the pre-constructed anomaly detection models based on the label information of the target system. The matched model is used to detect whether the current CPU utilization rate of the target system is abnormal, and if so, an alarm is raised based at least on the label information of the target system and the current CPU utilization rate of the target system. Because the anomaly detection model of each system is constructed from that system's own historical data and distribution characteristics, the model fits the data characteristics of systems with different purposes, which effectively ensures detection accuracy and helps avoid false positives and false negatives.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flowchart of a method for monitoring an index anomaly according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for constructing an anomaly detection model corresponding to a target system according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for preprocessing historical CPU usage time series data according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for constructing an anomaly detection model by outlier analysis of data according to an embodiment of the present application;

FIG. 5 is a flowchart of a method for constructing an anomaly detection model by performing a volatility analysis on data provided by an embodiment of the present application;

FIG. 6 is a flowchart of an alarm method according to an embodiment of the present application;

FIG. 7 is a schematic architecture diagram of a monitoring device for index anomaly according to an embodiment of the present application;

FIG. 8 is a schematic architecture diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

In this application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The embodiment of the application provides a method for monitoring index anomalies, as shown in fig. 1, comprising the following steps:

S101, acquiring the current CPU utilization rate of the target system and the tag information of the target system in real time.

The target system refers to any system that needs to be monitored, so if there are a plurality of systems that need to be monitored, each system may be used as the target system to execute the subsequent steps.

The tag information of the target system refers to information used to distinguish different systems. Optionally, the configuration information of a system may be used as the tag information of that system. Since there are generally multiple systems to be monitored, and in the embodiments of the present application a corresponding model is built for each system, the tag information of a system is needed to match that system to its model.

S102, based on label information of the target system, matching an abnormality detection model corresponding to the target system from the pre-constructed abnormality detection models.

Wherein, each abnormal detection model is constructed by utilizing the historical CPU utilization rate time sequence data and the distribution characteristics of each system in advance.

It should be noted that different systems serve different purposes, so their CPU usage behaves differently. Therefore, in the embodiments of the present application, the model corresponding to a system is constructed from the historical CPU usage time series data of that system and the distribution characteristics of that data, that is, the variation characteristics of the historically monitored CPU usage, so that the constructed model conforms to the way the CPU usage of that system varies.

Optionally, when the anomaly detection model corresponding to the target system is constructed, the label information of the target system is also stored in the anomaly detection model corresponding to the target system, so that the anomaly detection model corresponding to the target system can be matched from the anomaly detection models constructed in advance directly based on the label information of the target system.
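Steps S101 to S104 with tag-based matching can be sketched as below. This is only an illustration under stated assumptions: the `AnomalyModel` class, the use of a plain string as tag information, the dictionary keyed by tag, and the alert message format are all hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class AnomalyModel:
    tag: str      # label information stored together with the model
    lower: float  # smaller detection factor
    upper: float  # larger detection factor

    def is_abnormal(self, cpu_usage: float) -> bool:
        # Abnormal when the value falls outside [lower, upper].
        return cpu_usage < self.lower or cpu_usage > self.upper

def match_model(models: Dict[str, AnomalyModel], tag: str) -> Optional[AnomalyModel]:
    """Look up the pre-constructed model whose stored tag equals `tag`."""
    return models.get(tag)

def monitor(models: Dict[str, AnomalyModel], tag: str, cpu_usage: float) -> Optional[str]:
    """Return an alert message if the reading is abnormal, otherwise None."""
    model = match_model(models, tag)
    if model is None:
        return None  # no model was built for this system
    if model.is_abnormal(cpu_usage):
        return f"ALERT system={tag} cpu={cpu_usage:.1f}%"
    return None
```

Storing the tag inside the model, as the paragraph above describes, is what allows the lookup to be a direct key match.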

Optionally, another embodiment of the present application provides a method for constructing an anomaly detection model corresponding to a target system, as shown in fig. 2, including:

S201, acquiring offline log data of a target system.

S202, performing structural conversion on the offline log data of the target system to obtain tag information of the target system and historical CPU utilization time sequence data of the target system.

Since the offline log data includes a large amount of data and is inconvenient to process, it is necessary to perform structural conversion and parse the offline log data into multi-dimensional structured data. The converted data includes configuration information of the target system and historical CPU usage time sequence data of the target system, and the configuration information of the target system can be used as tag information of the target system.
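As a rough illustration of the structural conversion in S202, the sketch below parses log lines into tag information and a time-ordered CPU usage series. The log format is not described in the application, so the JSON-lines layout and the field names `system`, `ts`, and `cpu` are purely hypothetical.

```python
import json

def parse_offline_log(lines):
    """Convert raw log lines into (tag, [(timestamp, cpu_usage), ...]).

    Assumes one JSON object per line with hypothetical fields
    "system", "ts" (seconds) and "cpu" (percent).
    """
    tag = None
    series = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tag = tag or record["system"]
        series.append((int(record["ts"]), float(record["cpu"])))
    series.sort(key=lambda point: point[0])  # order by timestamp
    return tag, series
```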

Optionally, to improve the accuracy of subsequent data processing, in another embodiment of the present application, after step S202 is performed and before step S203 is performed, the historical CPU usage time series data may be further preprocessed. As shown in FIG. 3, a method for preprocessing historical CPU usage time series data provided in an embodiment of the present application includes:

S301, filling missing data in the historical CPU utilization time sequence data of the target system according to the data missing proportion in the historical CPU utilization time sequence data of the target system.

In the data collection process, the data at a certain time point may not be successfully collected due to the reasons of heartbeat timeout, network fluctuation and the like, or the situation of partial data loss and the like occurs in the data transmission process, so that the missing data needs to be filled. However, the filling of the missing data needs to be performed without affecting the data distribution.

Optionally, when the proportion of missing data is low, linear interpolation may be used to fill the missing data. When the proportion is higher, the missing data may be filled with the mean value. If too much data is missing, the data cannot be used for model construction and must be collected again. For example, when the proportion of missing data is less than 5%, linear interpolation may be used; when it is greater than 5% and not greater than 10%, mean filling may be used; and when it is greater than 10%, feedback is given that the data cannot be used for model construction.
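The filling rule in this example can be sketched as follows, with missing samples represented as `None`. The function name and signature are assumptions, and so is the handling of a proportion of exactly 5%, which the example leaves open; here it falls into the mean-filling branch.

```python
def fill_missing(values, low=0.05, high=0.10):
    """Fill None entries according to the proportion of missing data.

    ratio < low            -> linear interpolation between nearest known neighbours
    low <= ratio <= high   -> fill with the mean of the known values
    ratio > high           -> reject the series (ValueError)
    """
    if not values:
        return []
    missing = [i for i, v in enumerate(values) if v is None]
    ratio = len(missing) / len(values)
    if ratio > high:
        raise ValueError("too much missing data for model construction")
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    filled = list(values)
    if ratio < low:
        for i in missing:
            left = next(((j, v) for j, v in reversed(known) if j < i), None)
            right = next(((j, v) for j, v in known if j > i), None)
            if left and right:
                (li, lv), (ri, rv) = left, right
                filled[i] = lv + (rv - lv) * (i - li) / (ri - li)
            else:
                # Gap at the start or end: reuse the single available neighbour.
                filled[i] = (left or right)[1]
    else:
        mean = sum(v for _, v in known) / len(known)
        for i in missing:
            filled[i] = mean
    return filled
```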

S302, downsampling the filled historical CPU usage time sequence data of the target system to unify the time intervals of the data in the historical CPU usage time sequence data of the target system.

It should be noted that, in the subsequent analysis process, it may be necessary to calculate the fluctuation difference of the data at different time intervals, and if the time intervals of the data are not uniform, an error exists in the calculated difference, so that downsampling processing is required to be performed on the historical CPU usage time sequence data of the target system.

Optionally, a specific downsampling time interval calculation method may be as follows:

where t_i represents the time interval of the sampled data.
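A minimal downsampling sketch is given below. It takes the target interval as a given parameter instead of computing it, and it aggregates each time bucket by averaging; both choices, along with the function name, are assumptions made for illustration.

```python
from collections import defaultdict

def downsample(series, interval):
    """Resample (timestamp, value) pairs onto a uniform grid of `interval` seconds.

    Each sample is assigned to the bucket starting at the largest multiple of
    `interval` not exceeding its timestamp; a bucket's value is the mean of its samples.
    """
    buckets = defaultdict(list)
    for ts, value in series:
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]
```

For example, samples at 0 s and 20 s fall into the same 60-second bucket and are averaged, while a sample at 130 s is assigned to the bucket starting at 120 s.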

S203, detecting whether the historical CPU utilization time sequence data of the target system accords with normal distribution.

It should be noted that, for CPU usage time series data of different systems, there may be a large difference in distribution form, and for data of different distribution forms, it is obviously necessary to adopt different detection methods to determine abnormal states of the data. Therefore, in the embodiment of the present application, the data belonging to the normal distribution and the data belonging to the non-normal distribution are processed in different ways, so it is necessary to detect whether the historical CPU usage time series data of the target system conforms to the normal distribution.

Optionally, the Kolmogorov-Smirnov (K-S) test may be used to check whether the historical CPU usage time sequence data of the target system is normally distributed.

If it is detected that the historical CPU usage time series data of the target system matches the normal distribution, step S204 is executed, and if it is detected that the historical CPU usage time series data of the target system does not match the normal distribution, step S205 is executed.
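The normality check that selects between S204 and S205 can be sketched with a one-sample K-S statistic against a normal distribution fitted to the data. The function names are hypothetical. The threshold `1.36 / sqrt(n)` is the usual large-sample approximation of the 5% critical value, and because the mean and standard deviation are estimated from the same data, this threshold is conservative; the application does not specify a significance level.

```python
import math
from statistics import NormalDist, fmean, pstdev

def ks_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    mu, sigma = fmean(data), pstdev(data)
    if sigma == 0:
        return 1.0  # constant series: treat as maximally non-normal
    dist = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(data):
        cdf = dist.cdf(x)
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def looks_normal(values, coeff=1.36):
    """True if the K-S statistic is within the approximate 5% critical value."""
    return ks_statistic(values) <= coeff / math.sqrt(len(values))

def choose_branch(values):
    """Return the step to run next: S204 for normal data, S205 otherwise."""
    return "S204" if looks_normal(values) else "S205"
```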

S204, performing outlier analysis on historical CPU utilization time sequence data of the target system, and constructing an anomaly detection model corresponding to the target system.

Since the historical CPU usage time sequence data of the target system accords with normal distribution, a data point that differs greatly from the other data is evidently abnormal, that is, an outlier. An anomaly detection model corresponding to the target system can therefore be constructed by performing outlier analysis on the historical CPU usage time sequence data of the target system, and used to detect anomalies in the CPU usage of a target system whose data accords with normal distribution.

Optionally, in another embodiment of the present application, a specific implementation of step S204, as shown in fig. 4, includes the following steps:

S401, extracting the mean value and the variance of the historical CPU utilization time sequence data of the target system.

S402, subtracting the product of the variance and a preset parameter from the mean value to obtain a minimum outlier detection factor, and adding the product of the variance and the preset parameter to the mean value to obtain a maximum outlier detection factor.

The mean and variance characterize the overall level and the dispersion of the historical CPU usage time series data. Subtracting the product of the variance and the preset parameter from the mean therefore gives the lowest value permitted by downward deviation, which is taken as the minimum outlier detection factor. Correspondingly, adding the product of the variance and the preset parameter to the mean gives the highest value permitted by upward deviation, which is taken as the maximum outlier detection factor.

Alternatively, the preset parameter may be specifically determined according to the data amount in the historical CPU usage time series data of the target system, and may be set to 6, for example.

S403, constructing an outlier judgment condition corresponding to the target system by utilizing the minimum outlier detection factor and the maximum outlier detection factor.

The outlier judgment condition corresponding to the target system is that the value is smaller than the minimum outlier detection factor or larger than the maximum outlier detection factor.

S404, at least combining the outlier judgment conditions corresponding to the target system and the label information of the target system into an anomaly detection model corresponding to the target system.

The outlier judgment condition corresponding to the target system and the label information of the target system are stored together and determined as the anomaly detection model corresponding to the target system. To facilitate subsequent tracing and use of related information, the distribution form of the historical CPU usage time series data of the target system, the minimum outlier detection factor, the maximum outlier detection factor, and the like may also be stored in the anomaly detection model corresponding to the target system.
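By way of a non-limiting illustration, steps S401 to S404 can be sketched in Python as follows. The function names, the dictionary layout of the model and the tag values are assumptions made for this sketch only. The sketch multiplies the preset parameter by the variance, as the text states; an implementation based on the standard deviation would differ only in that one line.

```python
from statistics import fmean, pvariance


def build_outlier_model(series, tags, k=6.0):
    """Build a threshold model for a normally distributed series (S401-S404).

    series: historical CPU usage values; tags: label information of the system;
    k: the preset parameter (6 is the example value given in the text).
    """
    mean = fmean(series)             # S401: mean of the historical data
    variance = pvariance(series)     # S401: (population) variance
    min_factor = mean - k * variance  # S402: minimum outlier detection factor
    max_factor = mean + k * variance  # S402: maximum outlier detection factor
    # S403/S404: the judgment condition is stored together with the tags.
    return {
        "tags": tags,
        "distribution": "normal",
        "min_factor": min_factor,
        "max_factor": max_factor,
    }


def is_outlier(value, model):
    """Outlier judgment condition: below the minimum or above the maximum factor."""
    return value < model["min_factor"] or value > model["max_factor"]
```

For example, a series of 10.0 and 14.0 has mean 12 and variance 4, so with the preset parameter 6 the two detection factors are -12 and 36.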

Alternatively, in order to facilitate analysis of the historical data, after step S404 is performed, the historical CPU usage time series data of the target system may be subjected to anomaly detection using the anomaly detection model corresponding to the target system, thereby detecting the anomaly data therein, and storing the detected anomaly data in the database.

S205, detecting abnormal values in the historical CPU usage time series data of the target system by using an isolation forest algorithm, and removing the abnormal values from the historical CPU usage time series data of the target system.

Because the historical CPU usage time series data of the target system follows a non-normal distribution, abnormal data cannot be determined by outlier analysis. In the embodiment of the present application, whether data is abnormal is instead determined by the fluctuation rate of the data.

To prevent abnormal values in the historical CPU usage time series data of the target system from affecting the extraction of the data feature values, the embodiment of the present application first detects the abnormal values by the isolation forest algorithm and removes them from the historical CPU usage time series data of the target system, and then executes step S206.
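As a rough illustration of step S205, the following is a simplified one-dimensional isolation forest written with the Python standard library only. It is a sketch, not the implementation of the present application: the number of trees, the sample size, the score threshold of 0.75 and all function names are assumptions, and a practical system would more likely rely on an existing library implementation such as scikit-learn's `IsolationForest`.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(sample, rng, depth, limit):
    """Recursively split a 1-D sample at random cut points until points are isolated."""
    if depth >= limit or len(sample) <= 1 or min(sample) == max(sample):
        return ("leaf", len(sample))
    split = rng.uniform(min(sample), max(sample))
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return ("node", split,
            build_tree(left, rng, depth + 1, limit),
            build_tree(right, rng, depth + 1, limit))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, adjusted for the points left in that leaf."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


def remove_outliers(series, n_trees=100, sample_size=256, threshold=0.75, seed=0):
    """Return the series without the values whose anomaly score exceeds the threshold."""
    rng = random.Random(seed)
    psi = min(sample_size, len(series))
    limit = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(series, psi), rng, 0, limit)
             for _ in range(n_trees)]
    norm = average_path_length(psi)
    cleaned = []
    for x in series:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        # Short average paths give scores near 1, which indicates an anomaly.
        score = 2.0 ** (-mean_depth / norm) if norm > 0 else 0.0
        if score <= threshold:
            cleaned.append(x)
    return cleaned
```

A value that lies far from the rest of the series is separated by the first random cut in most trees, so its average path is short and its score is high; values inside the main cluster need several cuts and are kept.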

S206, performing fluctuation rate analysis on the historical CPU usage time sequence data of the target system with the abnormal values removed, and constructing an abnormality detection model corresponding to the target system.

Optionally, in another embodiment of the present application, a specific implementation of step S206, as shown in fig. 5, includes the following steps:

S501, extracting a minimum percentile, a maximum percentile, a first quartile and a third quartile of the historical CPU utilization time sequence data of the target system after the abnormal values are removed.

Here, the minimum percentile refers to the value at a preset small quantile; for example, the value located at 0.0001 of the historical CPU usage time series data may be determined as the minimum percentile. Similarly, the maximum percentile refers to the value at a preset large quantile; for example, the value located at 0.9999 of the historical CPU usage time series data may be determined as the maximum percentile. Specifically, both can be calculated by a quantile function. In this way, an upper limit value and a lower limit value of the historical CPU usage time series data of the target system can be determined.

Quartiles are commonly used quantiles and comprise the first to third quartiles. The first quartile is the value at the 25% position after all values in the sample are sorted in ascending order, and the third quartile is the value at the 75% position after all values in the sample are sorted in ascending order. Likewise, the quartiles of the historical CPU usage time series data of the target system can be calculated by a quantile function.

S502, subtracting a product of a preset parameter and a quartile difference from the first quartile to obtain a minimum fluctuation rate, and adding a product of the preset parameter and the quartile difference to the third quartile to obtain a maximum fluctuation rate.

Wherein the quartile difference is the third quartile minus the first quartile.

In general, in the manner of detecting an abnormal value by quartile, an abnormal upper limit value and an abnormal lower limit value are calculated in the manner of step S502, and then, whether or not data is abnormal is determined by these two limit values. In the embodiment of the present application, therefore, two limit values are also calculated by step S502 and are determined as the minimum fluctuation rate and the maximum fluctuation rate, respectively.

S503, determining the smaller value of the minimum fluctuation rate and the minimum percentile as a minimum fluctuation rate detection factor, and determining the larger value of the maximum percentile and the maximum fluctuation rate as a maximum fluctuation rate factor.

S504, constructing an outlier judgment condition corresponding to the target system by utilizing the minimum fluctuation rate detection factor and the maximum fluctuation rate factor.

The outlier judgment condition corresponding to the target system is that the value is smaller than the minimum fluctuation rate detection factor or larger than the maximum fluctuation rate factor.

S505, at least combining the outlier judgment conditions corresponding to the target system and the label information of the target system into an anomaly detection model corresponding to the target system.
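Steps S501 to S505 can be sketched, under stated assumptions, as the following Python functions. The linear-interpolation quantile function, the dictionary layout of the model and the default preset parameter of 1.5 (the conventional box-plot multiplier) are assumptions of this sketch; the text does not fix the value of the preset parameter for step S502.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an ascending list, with q between 0 and 1."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def build_fluctuation_model(series, tags, k=1.5, q_min=0.0001, q_max=0.9999):
    """Build a threshold model for a non-normally distributed series (S501-S505).

    series is assumed to have had its abnormal values removed already (S205).
    """
    if not series:
        raise ValueError("series must not be empty")
    values = sorted(series)
    # S501: minimum/maximum percentile and first/third quartile.
    p_min, p_max = quantile(values, q_min), quantile(values, q_max)
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    # S502: quartile difference and the two fluctuation rates.
    iqr = q3 - q1
    min_fluctuation = q1 - k * iqr
    max_fluctuation = q3 + k * iqr
    # S503: take the wider of the two candidate limits on each side.
    return {
        "tags": tags,
        "distribution": "non-normal",
        "min_factor": min(min_fluctuation, p_min),
        "max_factor": max(max_fluctuation, p_max),
    }
```

For the integers 0 to 100, the quartiles are 25 and 75, the quartile difference is 50, and the resulting detection factors are -50 and 150, since both lie outside the 0.0001 and 0.9999 quantiles.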

Also alternatively, in order to facilitate analysis of the historical data, after step S505 is performed, the historical CPU usage time series data of the target system may be subjected to anomaly detection using the anomaly detection model corresponding to the target system, thereby detecting the anomaly data therein, and storing the detected anomaly data in the database.

S103, detecting whether the current CPU utilization rate of the target system is abnormal or not by using an abnormality detection model corresponding to the target system.

Because the abnormality detection model corresponding to the target system is constructed based on the historical data of the target system and the distribution characteristics of the historical data, whether the current CPU utilization rate of the target system is abnormal or not can be accurately detected.

If it is detected that the current CPU usage of the target system is abnormal, step S104 is executed.

Optionally, when the anomaly detection model corresponding to the target system is constructed by using the method shown in fig. 4 or fig. 5, the specific embodiment of step S103 includes:

Detecting whether the current CPU utilization rate of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors.

If it is detected that the current CPU utilization rate of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors, it is determined that the current CPU utilization rate of the target system is abnormal.
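The comparison described for step S103 can be illustrated with the short Python sketch below. The function name and the dictionary keys `min_factor` and `max_factor` are assumptions of this sketch; the check itself is the same for a model built by the method of fig. 4 or of fig. 5.

```python
def is_cpu_usage_abnormal(current_usage, model):
    """Return True when the usage falls outside the two detection factors."""
    lower = min(model["min_factor"], model["max_factor"])  # smaller detection factor
    upper = max(model["min_factor"], model["max_factor"])  # larger detection factor
    return current_usage < lower or current_usage > upper
```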

Optionally, in order to enable the operation and maintenance personnel to know more information, when determining that the data has abnormality, information such as data source distribution condition, data abnormality label, number of data judging methods, abnormality fluctuation ratio and the like can be added into the data stream, so that subsequent alarming and problem analysis can be performed according to the information in the data stream.

The data source distribution condition refers to the distribution of the CPU utilization rate of the target system, that is, whether it follows a normal distribution, a non-normal distribution, or another distribution form. The data abnormality label is a label indicating whether an abnormality exists. The number of data judging methods refers to the number of methods that judged the current CPU utilization rate to be abnormal. That is, in addition to detecting whether the current CPU utilization rate is abnormal through the method provided by the embodiment of the application, multiple methods can be applied at the same time to further ensure the accuracy of the result.
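One possible way to attach this information to a record in the data stream is sketched below in Python. The field names, the shape of the record and the rule that a record is labelled abnormal when at least one method flags it are assumptions of this sketch and are not specified by the present application.

```python
def annotate_record(record, detectors, distribution):
    """Return a copy of the record enriched with anomaly information.

    record: dict holding at least "cpu_usage";
    detectors: mapping from a method name to a callable(value) -> bool;
    distribution: distribution form of the data source, e.g. "normal".
    """
    value = record["cpu_usage"]
    flagged_by = [name for name, detect in detectors.items() if detect(value)]
    enriched = dict(record)
    enriched["distribution"] = distribution      # data source distribution condition
    enriched["is_abnormal"] = bool(flagged_by)   # data abnormality label
    enriched["method_count"] = len(flagged_by)   # number of data judging methods
    enriched["methods"] = flagged_by
    return enriched
```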

S104, alarming at least based on the label information of the target system and the current CPU utilization rate of the target system.

Specifically, in order to enable the operation and maintenance personnel to know what type of data is abnormal, specifically what value is abnormal, and determine which system is abnormal, an alarm is required at least based on the label information of the target system, the current CPU utilization rate of the target system and the information in the abnormality detection model corresponding to the target system.

Optionally, in another embodiment of the present application, a specific implementation of step S104, as shown in fig. 6, includes the following steps:

S601, calculating a current fluctuation ratio according to the current CPU utilization rate of the target system.

Because the data distributions of different systems differ, different models are used for detection. To make the feedback results uniform and easy to compare, the embodiment of the present application calculates the current fluctuation ratio and matches the corresponding alarm level based on that ratio, thereby realizing graded alarms.

S602, determining an alarm level corresponding to the current fluctuation ratio.

S603, displaying the alarm grade and the label information of the target system.

Specifically, besides the alarm level, the tag information and the like of the target system can be used as alarm point information, namely, detailed information for display, so that operation and maintenance personnel can conveniently conduct problem investigation, diagnosis and the like according to the detailed information.

S604, drawing and displaying a change trend graph by using the current CPU utilization rate of the target system and the utilization rate sequence of a preset time period before the current CPU utilization rate of the target system.

It should be noted that the fluctuation of the sequence cannot be understood in time from the abnormal point information alone. Therefore, the sequence fluctuation before the anomaly occurred and the data containing the abnormal point are drawn in the same graph with different labels, so that the change trend before the anomaly is shown directly to the person investigating the problem. This also makes it convenient to confirm promptly whether the anomaly detection result is valid, and to further optimize the anomaly detection model afterwards.
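Steps S601 to S604 can be illustrated by the Python sketch below. The present application does not give a formula for the fluctuation ratio, so the definition used here (the distance beyond the violated detection factor divided by the width between the two factors) is an assumption, as are the level names, the level boundaries and the structure of the alarm record. The trend data are only prepared for plotting; the drawing itself is omitted.

```python
# Hypothetical alarm levels: (lower bound of the ratio, level name), highest first.
ALARM_LEVELS = [(0.5, "critical"), (0.2, "major"), (0.0, "minor")]


def fluctuation_ratio(value, model):
    """Assumed definition: excess beyond the violated factor, relative to the band width."""
    lower, upper = model["min_factor"], model["max_factor"]
    span = upper - lower
    if span <= 0:
        return 0.0
    if value > upper:
        return (value - upper) / span
    if value < lower:
        return (lower - value) / span
    return 0.0


def alarm_level(ratio):
    """S602: map the fluctuation ratio to an alarm level, or None if there is no excess."""
    for floor, level in ALARM_LEVELS:
        if ratio > floor:
            return level
    return None


def build_alarm(value, history, model, window=60):
    """S601-S604: assemble the alarm level, the tags and the labelled trend data."""
    ratio = fluctuation_ratio(value, model)
    recent = history[-window:] if window > 0 else []
    # Different labels distinguish the preceding sequence from the abnormal point.
    trend = [(v, "history") for v in recent] + [(value, "anomaly")]
    return {
        "tags": model["tags"],
        "cpu_usage": value,
        "fluctuation_ratio": ratio,
        "level": alarm_level(ratio),
        "trend": trend,
    }
```

Normalizing by the band width is one way to make ratios comparable across systems whose models were built by different methods, which is the stated purpose of the graded alarm.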

It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, nodes, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous.

Another embodiment of the present application provides a monitoring device for index anomalies, as shown in fig. 7, including:

The real-time data acquisition unit 701 is configured to acquire, in real time, a current CPU usage rate of the target system and tag information of the target system.

The model matching unit 702 is configured to match an anomaly detection model corresponding to the target system from the anomaly detection models that are constructed in advance based on the tag information of the target system.

An anomaly detection unit 703, configured to detect whether the current CPU usage of the target system is abnormal by using an anomaly detection model corresponding to the target system.

And an alarm unit 704, configured to, when detecting that the current CPU usage of the target system is abnormal, alarm based at least on the tag information of the target system and the current CPU usage of the target system.
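The cooperation of units 701 to 704 can be pictured with the following Python sketch. The class name, the use of a sorted tuple of tag items as the lookup key, the `notify` callback and the tag value "booking" are all hypothetical; the sketch only shows one way in which a model could be matched by tag information before detection and alarming.

```python
class IndexAnomalyMonitor:
    """Minimal sketch of the device of fig. 7; not the actual implementation."""

    def __init__(self, models, notify):
        # Pre-built models are indexed by their tag information.
        self._models = {self._key(m["tags"]): m for m in models}
        self._notify = notify

    @staticmethod
    def _key(tags):
        return tuple(sorted(tags.items()))

    def match_model(self, tags):
        """Model matching unit 702: look up the model by tag information."""
        return self._models.get(self._key(tags))

    @staticmethod
    def detect(usage, model):
        """Anomaly detection unit 703: compare against the detection factors."""
        return usage < model["min_factor"] or usage > model["max_factor"]

    def handle(self, usage, tags):
        """Process one sample delivered by the real-time data acquisition unit 701."""
        model = self.match_model(tags)
        if model is None:
            return False  # no model was constructed for this system
        if self.detect(usage, model):
            # Alarm unit 704: alarm with at least the tags and the current usage.
            self._notify({"tags": tags, "cpu_usage": usage})
            return True
        return False
```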

Optionally, in the monitoring device for index anomaly provided in another embodiment of the present application, the monitoring device further includes:

The log acquisition unit is used for acquiring the offline log data of the target system.

The conversion unit is used for carrying out structural conversion on the offline log data of the target system to obtain the label information of the target system and the historical CPU utilization rate time sequence data of the target system.

And the distribution form detection unit is used for detecting whether the historical CPU utilization rate time sequence data of the target system accords with normal distribution.

The first model construction unit is used for constructing an abnormality detection model corresponding to the target system by performing outlier analysis on the historical CPU usage time sequence data of the target system when the historical CPU usage time sequence data of the target system is detected to accord with normal distribution.

The removing unit is used for detecting abnormal values in the historical CPU usage time sequence data of the target system by using an isolation forest algorithm when the historical CPU usage time sequence data of the target system is detected not to accord with normal distribution, and removing the abnormal values from the historical CPU usage time sequence data of the target system.

The filling unit is used for filling the missing data in the historical CPU utilization time sequence data of the target system according to the data missing proportion in the historical CPU utilization time sequence data of the target system.

The downsampling unit is used for downsampling the filled historical CPU usage time sequence data of the target system so as to unify the time interval of the data in the historical CPU usage time sequence data of the target system.
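The work of the filling unit and the downsampling unit can be illustrated with the Python sketch below. The filling strategies and the 10% threshold are purely illustrative assumptions: this passage states only that filling depends on the proportion of missing data, not which strategy applies at which proportion. Downsampling is shown as averaging the samples that fall into fixed-width time buckets, which is one common way to unify the time interval.

```python
def fill_missing(values, max_missing_ratio=0.1):
    """Fill None entries; the strategy depends on the proportion of missing data.

    Assumed rule: at or below max_missing_ratio, carry the previous observation
    forward; above it, use the mean of the observed values.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return []
    ratio = (len(values) - len(observed)) / len(values)
    if ratio > max_missing_ratio:
        mean = sum(observed) / len(observed)
        return [mean if v is None else v for v in values]
    filled, last = [], observed[0]  # leading gaps take the first observation
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def downsample(points, interval):
    """Average (timestamp_seconds, value) pairs within fixed buckets of `interval` seconds."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % interval, []).append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]
```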

Optionally, in the monitoring device for index anomalies provided in another embodiment of the present application, the first model building unit includes:

The first feature extraction unit is used for extracting the mean value and the variance of the historical CPU usage time sequence data of the target system.

The first calculation unit is used for subtracting the product of the variance and the preset parameter from the mean value to obtain a minimum outlier detection factor, and adding the product of the variance and the preset parameter to the mean value to obtain a maximum outlier detection factor.

The first condition construction unit is used for constructing an outlier judgment condition corresponding to the target system by utilizing the minimum outlier detection factor and the maximum outlier detection factor. The outlier judgment condition corresponding to the target system is that the value is smaller than the minimum outlier detection factor or larger than the maximum outlier detection factor.

The first model composing unit is used for at least composing the outlier judging condition corresponding to the target system and the label information of the target system into an abnormality detection model corresponding to the target system.

Optionally, in the monitoring device for index anomalies provided in another embodiment of the present application, the second model building unit includes:

The second feature extraction unit is used for extracting the minimum percentile, the maximum percentile, the first quartile and the third quartile of the historical CPU utilization time sequence data of the target system after the abnormal values are removed.

The second calculation unit is used for subtracting the product of the preset parameter and the quartile difference from the first quartile to obtain the minimum fluctuation rate, and adding the product of the preset parameter and the quartile difference to the third quartile to obtain the maximum fluctuation rate. Wherein the quartile difference is the third quartile minus the first quartile.

The factor determining unit is used for determining the smaller value of the minimum fluctuation rate and the minimum percentile as a minimum fluctuation rate detection factor, and the larger value of the maximum percentile and the maximum fluctuation rate as a maximum fluctuation rate factor.

The second condition construction unit is used for constructing an outlier judgment condition corresponding to the target system by utilizing the minimum fluctuation rate detection factor and the maximum fluctuation rate factor. The outlier judgment condition corresponding to the target system is that the value is smaller than the minimum fluctuation rate detection factor or larger than the maximum fluctuation rate factor.

The second model composing unit is used for at least composing the outlier judgment condition corresponding to the target system and the label information of the target system into an anomaly detection model corresponding to the target system.

Optionally, in the monitoring device for index anomalies provided in another embodiment of the present application, the anomaly detection unit includes:

An anomaly detection subunit, configured to detect whether the current CPU usage of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors. If it is detected that the current CPU usage of the target system is smaller than the smaller of the two detection factors in the anomaly detection model corresponding to the target system, or larger than the larger of the two detection factors, it is determined that the current CPU usage of the target system is abnormal.

Optionally, in the monitoring device for index anomaly provided in another embodiment of the present application, the alarm unit includes:

The fluctuation rate calculation unit is used for calculating the current fluctuation ratio according to the current CPU utilization rate of the target system.

And the grade determining unit is used for determining the alarm grade corresponding to the current fluctuation ratio.

And the information display unit is used for displaying the alarm grade and the label information of the target system.

The drawing unit is used for drawing and displaying a change trend graph by utilizing the current CPU utilization rate of the target system and the utilization rate sequence of the preset time period before the current CPU utilization rate of the target system.

It should be noted that, for the specific working process of each unit provided in the above embodiment of the present application, reference may be made to corresponding steps in the above method embodiment accordingly, which is not described herein again.

Also, the functions of the various elements described in the above embodiments may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), etc.

Another embodiment of the present application provides an electronic device, as shown in fig. 8, which illustrates a schematic structural diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure. The electronic device in the embodiment of the disclosure may include, but is not limited to, an electronic device such as a desktop computer, a notebook computer, a tablet computer, a vehicle-mounted terminal, and the like.

As shown in fig. 8, the electronic device 800 includes one or more processing means 801, such as a central processing unit, a graphics processor, etc., and a memory 802 on which one or more programs are stored. Wherein the one or more programs, when executed by the one or more processing devices 801, cause the one or more processing devices 801 to implement the method for monitoring for an indicator anomaly provided in any one of the embodiments described above.

Alternatively, the electronic device may also comprise other constituent structures. Referring again to fig. 8, the processing means 801, the read-only memory ROM802, and the random access memory RAM803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804. In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a liquid crystal display, a speaker, a vibrator, etc.; a storage device 808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

Another embodiment of the present application provides a computer storage medium storing a program that, when executed, is configured to implement the method for monitoring an indicator anomaly as described in any one of the above.

Computer storage media include both permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. The method for monitoring the index abnormality is characterized by comprising the following steps:

2. The method of claim 1, wherein the method for constructing the anomaly detection model corresponding to the target system comprises:

acquiring offline log data of the target system;

3. The method of claim 2, wherein before detecting whether the historical CPU usage timing data of the target system meets the normal distribution, further comprising:

4. The method according to claim 2, wherein the constructing the anomaly detection model corresponding to the target system by performing outlier analysis on the historical CPU usage time series data of the target system includes:

5. The method according to claim 2, wherein the constructing the anomaly detection model corresponding to the target system by performing a fluctuation rate analysis on the historical CPU usage time series data of the target system from which the anomaly value is removed includes:

6. The method according to any one of claims 4 or 5, wherein the detecting whether the current CPU usage of the target system is abnormal using the abnormality detection model corresponding to the target system includes:

7. The method of claim 1, wherein the alerting based at least on the tag information of the target system and the current CPU usage of the target system comprises:

determining an alarm level corresponding to the current fluctuation ratio;

displaying the alarm grade and the label information of the target system;

8. A monitoring device for index anomalies, comprising:

9. An electronic device, comprising:

one or more processing devices;

a memory having one or more programs stored thereon;

when the one or more programs are executed by the one or more processing apparatuses, the one or more processing apparatuses are caused to implement the method of monitoring for an indicator anomaly as claimed in any one of claims 1 to 7.

10. A computer storage medium storing a program which, when executed, is adapted to implement the method of monitoring for an abnormality in an index as claimed in any one of claims 1 to 7.