CN113190406B - IT entity group anomaly detection method under cloud native observability - Google Patents

Info

Publication number
CN113190406B
Authority
CN
China
Prior art keywords
entity
entities
data
point
index data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110478056.8A
Other languages
Chinese (zh)
Other versions
CN113190406A (en
Inventor
宋祥雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eisoo Information Technology Co Ltd filed Critical Shanghai Eisoo Information Technology Co Ltd
Priority to CN202110478056.8A priority Critical patent/CN113190406B/en
Publication of CN113190406A publication Critical patent/CN113190406A/en
Application granted granted Critical
Publication of CN113190406B publication Critical patent/CN113190406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Abstract

The invention relates to an IT entity group anomaly detection method under cloud native observability, comprising the following steps: 1) acquiring historical time series data of an IT entity group for the same index and time period; 2) judging from the historical time series data whether the IT entity group is suitable for group anomaly detection; if so, executing step 3), otherwise ending; 3) compressing the historical time series data and performing backward difference calculation to obtain a backward difference matrix; 4) calculating the distance between each IT entity in the group and the other IT entities according to the backward difference matrix; 5) identifying abnormal IT entities with the LOF algorithm according to the distances calculated in step 4); 6) calculating the abnormal points of each IT entity and their severity, taking the normal IT entities as a reference. Compared with the prior art, the method can perform anomaly detection on the index data of multiple IT entities simultaneously and has high computational efficiency.

Description

IT entity group anomaly detection method under cloud native observability
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an IT entity group anomaly detection method under cloud native observability.
Background
Cloud-native microservice architecture is a current technical trend. Under this architecture, a large number of applications are deployed as distributed clusters, and the nodes, applications, services and other IT entities in a cluster are usually configured alike in some respects and are therefore homogeneous. Within a cluster, IT entities such as nodes, applications or services that share the same configuration or attributes form a group. Traditional anomaly detection methods generally apply similarity measurement models, probabilistic statistical models, regression models and the like to the historical time series data of a single IT entity under a given index. However, some IT entities are homogeneous under certain indexes, that is, they exhibit similar behaviors or patterns and their variation trends tend to be consistent. Given that a plurality of IT entities are homogeneous under some index, if the variation trend of the index data of one IT entity differs greatly from that of the other IT entities within a certain time period, that IT entity may be abnormal. Detecting a plurality of IT entities one by one with a traditional anomaly detection method is computationally inefficient.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an IT entity group anomaly detection method under cloud native observability that can perform anomaly detection on the index data of a plurality of homogeneous IT entities simultaneously and has high computational efficiency.
The purpose of the invention can be achieved by the following technical scheme:
An IT entity group anomaly detection method under cloud native observability comprises the following steps:
1) Acquiring historical time series data of the IT entity group for the same index and time period;
2) Judging from the historical time series data whether the IT entity group is suitable for group anomaly detection; if so, executing step 3), otherwise ending;
3) Compressing the historical time series data and performing backward difference calculation to obtain a backward difference matrix;
4) Calculating the distance between each IT entity in the IT entity group and the other IT entities according to the backward difference matrix;
5) Identifying abnormal IT entities with the LOF algorithm according to the distances calculated in step 4);
6) Calculating the abnormal points of each IT entity and their severity, taking the normal IT entities as a reference.
Further, step 2) comprises:
judging whether the following conditions are met simultaneously:
the sample size of the index data of each IT entity is not less than the set sample size;
the number of IT entities in the IT entity group is not less than a preset value, and the preset value is not less than 3;
if so, the IT entity group is judged to be suitable for group anomaly detection; otherwise it is not suitable.
Further, the historical time series data are compressed with the PAA (piecewise aggregate approximation) algorithm, which comprises:
dividing the index values (timestamps) of each index data series in the historical time series data evenly into n segments, taking the mean of the non-null values in each segment as the new data and the starting index value of each segment as the index value of the new data, so that the length of the index data is compressed to n;
data compression solves the following two problems:
when each IT entity has too many index samples, compression reduces the sample size while retaining the characteristics of the data as far as possible, which improves the efficiency of the algorithm;
when the times corresponding to the index data of the IT entities deviate slightly, for example because of machine computation, compression allows the compressed sample size to be controlled, so that the compressed sample size is the same for every IT entity and the times corresponding to the index data of the IT entities are aligned.
Further, the distance between each IT entity and the other IT entities is calculated with the FastDTW algorithm.
Further, step 5) comprises:
calculating the local outlier factor (LOF) of the IT entity and judging whether the LOF is larger than a set threshold of the local outlier factor; if so, the IT entity is judged to be an abnormal IT entity, otherwise a normal IT entity.
Further, regarding each IT entity as a sample point, the local outlier factor LOF is calculated as:
$$\mathrm{LOF}_k(O) = \frac{1}{|N_k(O)|} \sum_{P \in N_k(O)} \frac{\rho_k(P)}{\rho_k(O)}$$
where $\rho_k(O)$ is the local reachable density of point O, $\rho_k(P)$ is the local reachable density of another point P in $N_k(O)$, and $N_k(O)$ is the k-th distance neighborhood of point O.
Further, $N_k(O)$ satisfies:
$$N_k(O) = \{P' \in D \setminus \{O\} \mid d(O, P') \le d_k(O)\}$$
where $d_k(O)$ is the k-th distance of point O, $d_k(O) = d(O, P)$, and P is the k-th nearest point to point O, satisfying the following conditions:
there are at least k points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') \le d(O, P)$;
there are at most k-1 points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') < d(O, P)$;
$\rho_k(O)$ is calculated as:
$$\rho_k(O) = \frac{|N_k(O)|}{\sum_{P \in N_k(O)} d_k(O, P)}$$
where $d_k(O, P)$ is the k-th reachable distance from point P to point O, calculated as:
$$d_k(O, P) = \max\{d_k(O),\, d(O, P)\}$$
further, step 6) comprises:
61) according to the backward difference matrix, taking the normal IT entities as reference entities, and calculating, at each time point, the number of standard deviations, bias, by which the index data sample point of each IT entity differs from the reference entities;
62) determining the severity of the index data sample point from bias according to a preset severity rule.
Further, the number of standard deviations bias is calculated as follows:
when the standard deviation is not null and is not 0,
$$\mathrm{bias} = \left| \frac{x - \mathrm{mean}}{\sigma} \right|$$
when the standard deviation is not null and is 0, and the mean is not 0,
Figure BDA0003047985730000033
when the standard deviation is not null and is 0, and the mean is 0,
Figure BDA0003047985730000034
where x is the value of the index data sample point, mean is the mean of the non-null index data of the reference entities, σ is the standard deviation of the non-null index data of the reference entities, and default_mean is a preset default mean of the reference entity index data.
Further, the severity is divided into unknown, normal and abnormal, and the severity rule comprises:
621) judging whether the original value of the index data sample point of the IT entity is -1 or null; if so, marking the sample point as an unknown sample point, otherwise executing step 622);
622) judging whether bias is not larger than the set quantity; if so, marking the sample point as a normal sample point, otherwise marking it as an abnormal sample point.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention acquires the historical time series data of a group of IT entities under a given index in the same time period and judges the applicability of the group according to preset conditions. It then compresses the index data of the group, performs backward difference calculation on the compressed index data of each IT entity, and merges the results into a backward difference matrix. From this matrix it calculates the distance between each IT entity and the other IT entities under the index, identifies abnormal IT entities from those distances, and detects when each IT entity becomes abnormal and how severe the abnormality is. The invention can thus perform anomaly detection on the index data of a plurality of IT entities simultaneously, identify whether the IT entities are homogeneous, and detect whether an IT entity is abnormal, which greatly improves computational efficiency;
(2) The invention compresses the historical time series data with the PAA algorithm. Compression reduces the sample size while retaining the characteristics of the data as far as possible, which improves the efficiency of the algorithm; it also allows the compressed sample size to be controlled, so that every IT entity has the same compressed sample size and the times corresponding to the index data of the IT entities are aligned, which makes group anomaly detection possible;
(3) The invention calculates the distance between each IT entity and the other IT entities from the backward difference matrix with the FastDTW algorithm, which improves the efficiency of the algorithm;
(4) The invention calculates the local outlier factor and judges from its magnitude whether an IT entity is abnormal, which is computationally efficient;
(5) The invention obtains the severity of each index sample point of each IT entity from the number of standard deviations by which the sample point differs from the index data of the reference entities, together with the severity rule. It can therefore detect at which time an IT entity becomes abnormal and how severe the abnormality is, with high accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of a two-phase time sequence.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
An IT entity group anomaly detection method under cloud native observability, as shown in FIG. 1, comprises:
1) Acquiring historical time series data of the IT entity group for the same index and time period;
2) Judging from the historical time series data whether the IT entity group is suitable for group anomaly detection; if so, executing step 3), otherwise ending;
3) Compressing the historical time series data and performing backward difference calculation to obtain a backward difference matrix;
4) Calculating the distance between each IT entity and the other IT entities with the FastDTW algorithm according to the backward difference matrix;
5) Identifying abnormal IT entities with the LOF algorithm according to the distances calculated in step 4);
6) Calculating the abnormal points of each IT entity and their severity, taking the normal IT entities as a reference.
In step 1), a group of IT entities must first be determined. In this embodiment, multiple es servers are selected as the IT entities, and their CPU utilization over the same time period is acquired as the index data. To ensure accuracy, the time intervals of the index data of the IT entities are also kept consistent. The acquired historical time series data are shown in Table 1:
TABLE 1 historical time series data Table
Timestamp 0 1 2 3 4 5 6
1604419200000 0.134 0.134 0.136 0.132 0.198 0.068 0.066
1604419500000 0.134 0.134 0.068 0.068 0.134 0.134 0.134
1604419800000 0.066 0.066 0.134 0.134 0.132 0.136 0.136
1604420100000 0.132 0.132 0.134 0.134 0.132 0.134 0.066
1604420400000 0.134 0.068 0.132 0.132 0.134 0.198 0.134
1604420700000 0.066 0.132 0.066 0.066 0.134 0.134 0.132
1604421000000 0.136 0.134 0.134 0.134 0.134 0.134 0.132
1604421300000 0.134 0.066 0.068 0.132 0.068 0.068 0.134
1604421600000 0.134 0.134 0.132 0.134 0.134 0.134 0.068
1604421900000 0.066 0.136 0.066 0.132 0.134 0.134 0.198
The first column represents a timestamp, the first row represents the number of the es server, and the data in the rest cells represent the CPU utilization rate of the corresponding es server at the corresponding time.
The step 2) comprises the following steps:
judging whether the following conditions are met simultaneously:
the sample size of the index data of each IT entity is not less than the set sample size;
the number of the IT entities in the IT entity group is not less than a preset value, and the preset value is not less than 3;
if so, the IT entity group is judged to be suitable for group anomaly detection; otherwise it is not suitable.
According to the 80% rule, when the non-missing part of a variable is lower than 80% of the total sample size, it is recommended that the variable be deleted. The set sample size is therefore 80% of the expected sample size of a single IT entity: when the sample size of the index data of any IT entity is lower than 80% of the expected sample size, the group of IT entities is not suitable for group anomaly detection. For example, if the expected sample size of a single es server is 288 and the sample size of one es server is 200, then 200 < 288 × 80% and the group of es servers is not suitable for group anomaly detection;
since group anomaly detection needs three or more IT entities for comparison, a preset value can be set for the number of IT entities; the number of IT entities in the group must be greater than or equal to the preset value, and the preset value can be adjusted to specific requirements. For example, the example in Table 1 contains 7 IT entities: if the preset value is 3, the example in Table 1 is suitable for group anomaly detection; if the preset value is 9, the number of IT entities is less than 9 and the example in Table 1 is not suitable for group anomaly detection.
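The two preconditions of step 2) can be sketched as a small check. This is an illustrative sketch only: the function name `is_group_detectable` and its parameters are hypothetical, and the 80% ratio and the minimum of 3 entities are taken from the text above.

```python
def is_group_detectable(sample_counts, expected_count, min_entities=3, min_ratio=0.8):
    """Return True when the group meets both preconditions of step 2).

    sample_counts  -- number of samples actually collected for each IT entity
    expected_count -- number of samples a single entity should have
    min_entities   -- preset value for the group size (must be at least 3)
    min_ratio      -- share of the expected sample size that must be present
    """
    if min_entities < 3:
        raise ValueError("the preset value must be at least 3")
    enough_entities = len(sample_counts) >= min_entities
    min_samples = expected_count * min_ratio
    enough_samples = all(count >= min_samples for count in sample_counts)
    return enough_entities and enough_samples


# One es server has only 200 of the expected 288 samples (200 < 230.4).
incomplete_group = is_group_detectable([288] * 6 + [200], expected_count=288)
# Seven servers with complete data, preset value 3 and preset value 9.
table1_preset_3 = is_group_detectable([10] * 7, expected_count=10, min_entities=3)
table1_preset_9 = is_group_detectable([10] * 7, expected_count=10, min_entities=9)
```

The three calls mirror the examples in the text: the incomplete server disqualifies the group, and the 7-server example passes with a preset value of 3 but not with a preset value of 9.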
The historical time series data are compressed with the PAA (piecewise aggregate approximation) algorithm, which comprises:
dividing the index values (timestamps) of the index data in the historical time series data evenly, so that index data of length m are divided into n segments, taking the mean of the non-null values in each segment as the new data and the starting index value of each segment as the index value of the new data, which compresses the length of the index data to n;
data compression solves the following two problems:
when each IT entity has too many index samples, compression reduces the sample size while retaining the characteristics of the data as far as possible, which improves the efficiency of the algorithm;
when the times corresponding to the index data of the IT entities deviate slightly, for example because of machine computation, compression allows the compressed sample size to be controlled, so that the compressed sample size is the same for every IT entity and the times corresponding to the index data of the IT entities are aligned.
For example, in the sample data of Table 1 the sample size of the index data of each es server is 10. The index, that is, the first column of the table (the timestamps), is divided into 5 segments at equal intervals of its numerical value, and the mean of the non-null values of each index data series within each segment is then calculated, which compresses the sample size of the index data of each es server to 5. The result is shown in Table 2:
TABLE 2 History time series data table after data compression
Timestamp 0 1 2 3 4 5 6
1604419200000 0.134 0.134 0.102 0.1 0.166 0.101 0.1
1604419800000 0.099 0.099 0.134 0.134 0.132 0.135 0.101
1604420400000 0.1 0.1 0.099 0.099 0.134 0.166 0.133
1604421000000 0.135 0.1 0.101 0.133 0.101 0.101 0.133
1604421600000 0.1 0.135 0.099 0.133 0.134 0.134 0.133
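The compression from Table 1 to Table 2 can be reproduced with a short sketch. The function name `paa` and the integer bucketing of timestamps are illustrative assumptions; the text only specifies equal-width segments over the index, the mean of non-null values, and the first index value of each segment as the new index.

```python
def paa(timestamps, values, n):
    """Compress one series to n points: mean of the non-null values per time segment."""
    t_min, t_max = min(timestamps), max(timestamps)
    span = max(t_max - t_min, 1)            # avoid division by zero for a single timestamp
    starts = [None] * n                     # first timestamp observed in each segment
    buckets = [[] for _ in range(n)]
    for t, v in sorted(zip(timestamps, values), key=lambda pair: pair[0]):
        # Equal-width segments over [t_min, t_max]; the last timestamp joins the last segment.
        seg = min((t - t_min) * n // span, n - 1)
        if starts[seg] is None:
            starts[seg] = t
        if v is not None:
            buckets[seg].append(v)
    means = [sum(b) / len(b) if b else None for b in buckets]
    return starts, means


# Column 0 of Table 1 (es server 0), sampled every 5 minutes (300000 ms).
ts = [1604419200000 + 300000 * i for i in range(10)]
cpu0 = [0.134, 0.134, 0.066, 0.132, 0.134, 0.066, 0.136, 0.134, 0.134, 0.066]
starts, means = paa(ts, cpu0, 5)
rounded = [round(m, 3) for m in means]
```

With these inputs, `starts` holds the five timestamps of Table 2 and `rounded` matches column 0 of Table 2.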
Group anomaly detection mainly checks whether the variation trends of the index data of the IT entities are consistent, so the backward difference is chosen as the measure of the change of each index data series at each time. The backward difference is calculated separately for the compressed index data of each IT entity, and the results are merged into the backward difference matrix.
The backward difference is defined as follows: the backward difference at the current time is the difference between the value at the current time and the value at the previous time; because the backward difference at the first time cannot be obtained, it is filled with the next backward difference. The backward difference $\nabla x_t$ is calculated as:
$$\nabla x_t = \begin{cases} x_t - x_{t-1}, & t \ge 2 \\ \nabla x_2, & t = 1 \end{cases}$$
where $x_t$ is the index value at the t-th time.
The backward difference is calculated from the sample data in Table 1 to generate the backward difference matrix; the result is shown in Table 3:
TABLE 3 backward difference matrix calculation results table
Timestamp 0 1 2 3 4 5 6
1604419200000 0 0 -0.068 -0.064 -0.064 0.066 0.068
1604419500000 0 0 -0.068 -0.064 -0.064 0.066 0.068
1604419800000 -0.068 -0.068 0.066 0.066 -0.002 0.002 0.002
1604420100000 0.066 0.066 0 0 0 -0.002 -0.07
1604420400000 0.002 -0.064 -0.002 -0.002 0.002 0.064 0.068
1604420700000 -0.068 0.064 -0.066 -0.066 0 -0.064 -0.002
1604421000000 0.07 0.002 0.068 0.068 0 0 0
1604421300000 -0.002 -0.068 -0.066 -0.002 -0.066 -0.066 0.002
1604421600000 0 0.068 0.064 0.002 0.066 0.066 -0.066
1604421900000 -0.068 0.002 -0.066 -0.002 0 0 0.13
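As a check on Table 3, the backward difference with first-entry filling can be sketched as follows. The function names are hypothetical, and returning zeros for a series shorter than two points is an assumption the text does not cover.

```python
def backward_difference(series):
    """Backward difference of one series; the first entry is filled with the second."""
    if len(series) < 2:
        return [0.0] * len(series)          # assumption: nothing to difference
    diffs = [series[i] - series[i - 1] for i in range(1, len(series))]
    return [diffs[0]] + diffs


def backward_difference_matrix(columns):
    """Apply the backward difference to every entity (one list per entity)."""
    return [backward_difference(col) for col in columns]


# Columns 0 and 6 of Table 1.
cpu0 = [0.134, 0.134, 0.066, 0.132, 0.134, 0.066, 0.136, 0.134, 0.134, 0.066]
cpu6 = [0.066, 0.134, 0.136, 0.066, 0.134, 0.132, 0.132, 0.134, 0.068, 0.198]
diff0, diff6 = backward_difference_matrix([cpu0, cpu6])
```

Rounded to three decimals, `diff0` and `diff6` agree with columns 0 and 6 of Table 3, including the final jump of 0.13 for server 6.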
The distance between each IT entity and the other IT entities under a given index is calculated with the FastDTW algorithm according to the backward difference matrix. FastDTW is an accelerated version of the dynamic time warping (DTW) algorithm. DTW measures the similarity between two time series using dynamic programming and is mostly used to compare two speech signals: because each letter is pronounced with a different duration in each utterance, two utterances never coincide exactly, so DTW stretches or compresses them to align them as closely as possible. As shown in FIG. 2, DTW calculates the similarity between two time series by extending and shortening them, and FastDTW accelerates the DTW calculation by combining two techniques, constraints and data abstraction.
The distance between every pair of IT entities is calculated according to the backward difference matrix and collected into the distance matrix between the IT entities; the result is shown in Table 4:
TABLE 4 Distances between IT entities
Figure BDA0003047985730000073
Figure BDA0003047985730000081
The first row and the first column in table 4 each represent the number of each es server, and as can be seen from table 4, the distance matrix between the IT entities is a symmetric matrix.
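To illustrate how such a symmetric distance matrix arises, the sketch below uses the classic exact DTW recurrence rather than FastDTW, which is an approximation of it; the absolute difference as the point-wise cost and the function names are assumptions, so the values it produces are not guaranteed to match Table 4.

```python
def dtw_distance(a, b):
    """Exact dynamic time warping distance with |x - y| as the point-wise cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)           # row 0 of the cumulative cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


def distance_matrix(columns):
    """Pairwise DTW distances between entities; symmetric with a zero diagonal."""
    n = len(columns)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dtw_distance(columns[i], columns[j])
    return matrix


# Three toy series: the first two differ only by a time shift, the third is far away.
toy_matrix = distance_matrix([[0, 0, 1], [0, 1, 1], [5, 5, 5]])
```

In the toy matrix the two shifted series are at distance 0 because warping aligns them, while the third series is at distance 14 from the first.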
Regarding each IT entity as a sample point, step 5) includes:
51) Calculating the k-th distance;
$d_k(O)$ is the k-th distance of point O, $d_k(O) = d(O, P)$, where P is the k-th nearest point to point O and satisfies the following conditions:
there are at least k points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') \le d(O, P)$;
there are at most k-1 points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') < d(O, P)$;
52) Calculating the k-th distance neighborhood;
let $N_k(O)$ be the k-th distance neighborhood of point O, satisfying:
$$N_k(O) = \{P' \in D \setminus \{O\} \mid d(O, P') \le d_k(O)\}$$
$N_k(O)$ includes all points whose distance to point O is not greater than the k-th distance of point O;
53) Calculating the k-th reachable distance;
the k-th reachable distance from point P to point O is defined as:
$$d_k(O, P) = \max\{d_k(O),\, d(O, P)\}$$
that is, the k-th reachable distance from point P to point O is at least the k-th distance of point O, and the reachable distances of the k points nearest to point O are all equal to $d_k(O)$;
54) Calculating the local reachable density;
the local reachable density is defined as:
$$\rho_k(O) = \frac{|N_k(O)|}{\sum_{P \in N_k(O)} d_k(O, P)}$$
$\rho_k(O)$ is the reciprocal of the average reachable distance to point O of all points in the k-th distance neighborhood of point O. Even if more than one point lies on the boundary of the k-th neighborhood, the number of points in the neighborhood is still counted as k. If point O and its surrounding neighborhood points belong to the same cluster, the reachable distance is more likely to take the smaller value $d_k(O)$, so the sum of the reachable distances is smaller and the local reachable density is larger; if point O is far from its surrounding neighborhood points, the reachable distance may take the larger value $d(O, P)$, so the sum of the reachable distances is larger and the local reachable density is smaller;
55) Calculating the local outlier factor LOF of the IT entity;
the local outlier factor LOF is calculated as:
$$\mathrm{LOF}_k(O) = \frac{1}{|N_k(O)|} \sum_{P \in N_k(O)} \frac{\rho_k(P)}{\rho_k(O)}$$
where $\mathrm{LOF}_k(O)$ is the mean of the ratio of the local reachable density of the other points in the k-th distance neighborhood $N_k(O)$ of point O to the local reachable density of point O. The closer $\mathrm{LOF}_k(O)$ is to 1, the closer the density of point O is to the density of its neighborhood points, and point O probably belongs to the same cluster as its neighborhood; if $\mathrm{LOF}_k(O)$ is less than 1, the density of point O is higher than that of its neighborhood points and point O is a dense point; if $\mathrm{LOF}_k(O)$ is greater than 1, the density of point O is lower than that of its neighborhood points and point O may be an outlier.
Whether the LOF is larger than the set threshold of the local outlier factor is then judged: if so, the IT entity is judged to be an abnormal IT entity and marked -1; otherwise it is judged to be a normal IT entity and marked 1. In this embodiment, based on the distance matrix between the IT entities in Table 4 and with the threshold of the local outlier factor set to 1.2, the result is [1, 1, 1, 1, 1, 1, -1]: the variation trend of the CPU utilization of the es server numbered 6 differs from that of the other es servers, so it is judged to be an abnormal IT entity.
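Steps 51) to 55) and the threshold test can be sketched directly on a precomputed distance matrix. The sketch follows the definitions as written in this description, with the reachable distance taken as max{d_k(O), d(O, P)}; the original LOF formulation by Breunig et al. uses the k-distance of the neighbour P instead. Keeping every tied boundary point in the neighborhood is an assumption, as are the function names and the toy matrix, which is not the data of Table 4.

```python
def lof_scores(dist, k):
    """Local outlier factor of every point, given a symmetric distance matrix."""
    n = len(dist)
    k_dist, neigh = [], []
    for o in range(n):
        others = sorted(dist[o][p] for p in range(n) if p != o)
        kd = others[k - 1]                              # 51) k-th distance of O
        k_dist.append(kd)
        # 52) k-th distance neighborhood: every point no farther than the k-th distance
        neigh.append([p for p in range(n) if p != o and dist[o][p] <= kd])
    density = []
    for o in range(n):
        # 53) reachable distance max{d_k(O), d(O, P)}, summed over the neighborhood
        reach_sum = sum(max(k_dist[o], dist[o][p]) for p in neigh[o])
        # 54) local reachable density (infinite when all neighbours are duplicates of O)
        density.append(len(neigh[o]) / reach_sum if reach_sum > 0 else float("inf"))
    # 55) mean ratio of the neighbours' density to the point's own density
    return [sum(density[p] / density[o] for p in neigh[o]) / len(neigh[o])
            for o in range(n)]


def label_entities(scores, threshold=1.2):
    """-1 marks an abnormal entity, 1 a normal one."""
    return [-1 if s > threshold else 1 for s in scores]


# Toy matrix: four mutually close entities and a fifth one far from all of them.
toy = [
    [0, 1, 1, 1, 5],
    [1, 0, 1, 1, 5],
    [1, 1, 0, 1, 5],
    [1, 1, 1, 0, 6],
    [5, 5, 5, 6, 0],
]
scores = lof_scores(toy, k=3)
labels = label_entities(scores)
```

On the toy matrix the four close entities score 1.0 and the distant entity scores 5.0, so only the last entity is labelled -1 at the threshold of 1.2.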
Step 6) comprises the following steps:
61) according to the backward difference matrix, taking the normal IT entities as reference entities, and calculating, at each time point, the number of standard deviations, bias, by which the index data sample point of each IT entity differs from the reference entities;
62) determining the severity of the index data sample point from bias according to a preset severity rule.
The number of standard deviations bias is calculated as follows:
when the standard deviation is not null and is not 0,
$$\mathrm{bias} = \left| \frac{x - \mathrm{mean}}{\sigma} \right|$$
when the standard deviation is not null and is 0, and the mean is not 0,
Figure BDA0003047985730000093
when the standard deviation is not null and is 0, and the mean is 0,
Figure BDA0003047985730000094
where x is the value of the index data sample point, mean is the mean of the non-null index data of the reference entities, σ is the standard deviation of the non-null index data of the reference entities, and default_mean is a preset default mean of the reference entity index data.
In this embodiment, taking the backward difference matrix in Table 3 as an example, default_mean is set to 0.5. At timestamp 1604421900000, the non-null mean of the index data of the reference entities is -0.02233 and the non-null standard deviation is 0.03161. The backward difference of the CPU utilization of the es server numbered 0 is -0.068, which is |(-0.068 - (-0.02233)) / 0.03161| ≈ 1.444 standard deviations from the reference; the backward difference of the CPU utilization of the es server numbered 6 is 0.13, which is |(0.13 - (-0.02233)) / 0.03161| ≈ 4.819 standard deviations from the reference. In the same way, the number of standard deviations by which the index sample point of each IT entity differs from the index data of the reference entities can be obtained from the backward difference matrix in Table 3 and the non-null mean and non-null standard deviation of the index data of the reference entities; the results are shown in Table 5:
TABLE 5 Table of results of calculation of standard deviation
Timestamp 0 1 2 3 4 5 6
1604419200000 0.443 0.443 0.947 0.865 0.865 1.792 1.833
1604419500000 0.443 0.443 0.947 0.865 0.865 1.792 1.833
1604419800000 1.231 1.231 1.218 1.218 0.024 0.049 0.049
1604420100000 1.414 1.414 0.691 0.691 0.691 0.755 2.923
1604420400000 0.054 1.730 0.054 0.054 0.054 1.730 1.839
1604420700000 0.697 1.956 0.656 0.656 0.670 0.616 0.630
1604421000000 1.039 0.960 0.980 0.980 1.019 1.019 1.019
1604421300000 1.414 0.756 0.690 1.414 0.690 0.690 1.545
1604421600000 1.446 0.772 0.641 1.380 0.706 0.706 3.598
1604421900000 1.445 0.770 1.381 0.643 0.707 0.707 4.819
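The last row of Table 5 can be checked with a short sketch. It covers only the case in which the standard deviation is not null and not 0, and it assumes the population standard deviation, which reproduces the 0.03161 quoted in the text; the function name `bias` and the error raised for a zero standard deviation are illustrative choices.

```python
from statistics import fmean, pstdev


def bias(value, reference_values):
    """Number of standard deviations between a sample and the reference entities."""
    ref = [v for v in reference_values if v is not None]   # non-null values only
    mean, sigma = fmean(ref), pstdev(ref)                   # population standard deviation
    if sigma == 0:
        # The zero-deviation branches are not reproduced in this sketch.
        raise ValueError("zero standard deviation is not handled in this sketch")
    return abs(value - mean) / sigma


# Last row of Table 3 (timestamp 1604421900000): entities 0-5 are the normal reference.
reference = [-0.068, 0.002, -0.066, -0.002, 0, 0]
bias_server0 = bias(-0.068, reference)
bias_server6 = bias(0.13, reference)
```

Under these assumptions `bias_server6` comes out at about 4.819 and `bias_server0` at about 1.44, in line with the last row of Table 5.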
The severity is divided into unknown, normal and abnormal, denoted 0, 2 and 3 respectively, and the severity rule comprises:
621) judging whether the original value of the index data sample point of the IT entity is -1 or null; if so, marking the sample point as an unknown sample point, otherwise executing step 622);
622) judging whether bias is not larger than the set quantity; if so, marking the sample point as a normal sample point, otherwise marking it as an abnormal sample point.
In this embodiment, the number is set to 3, and according to the number of standard deviations between the index sample point of each IT entity and the reference entity index data in table 5 and the severity rule, the severity result of the index sample point of each IT entity can be obtained, as shown in table 6:
TABLE 6 severity result table of IT entity index sample points
Timestamp 0 1 2 3 4 5 6
1604419200000 2 2 2 2 2 2 2
1604419500000 2 2 2 2 2 2 2
1604419800000 2 2 2 2 2 2 2
1604420100000 2 2 2 2 2 2 2
1604420400000 2 2 2 2 2 2 2
1604420700000 2 2 2 2 2 2 2
1604421000000 2 2 2 2 2 2 2
1604421300000 2 2 2 2 2 2 2
1604421600000 2 2 2 2 2 2 3
1604421900000 2 2 2 2 2 2 3
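The severity rule of steps 621) and 622) is small enough to sketch in full. The codes 0, 2 and 3 and the set quantity of 3 come from the text; the function and constant names are hypothetical.

```python
UNKNOWN, NORMAL, ABNORMAL = 0, 2, 3


def severity(raw_value, bias_value, max_bias=3.0):
    """Severity code of one sample point.

    raw_value  -- original index value of the sample point (None or -1 means no data)
    bias_value -- number of standard deviations from the reference entities
    max_bias   -- the set quantity; larger deviations are abnormal
    """
    if raw_value is None or raw_value == -1:        # step 621)
        return UNKNOWN
    return NORMAL if bias_value <= max_bias else ABNORMAL   # step 622)


# Server 6 at the last two timestamps (Tables 1 and 5) and at 1604420100000.
last_point = severity(0.198, 4.819)
previous_point = severity(0.068, 3.598)
earlier_point = severity(0.066, 2.923)
missing_point = severity(None, 0.5)
```

The first two calls give 3 and the third gives 2, matching column 6 of Table 6; a missing value gives 0 regardless of its bias.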
To further evaluate the effect of group anomaly detection, this embodiment collects the CPU utilization data of 12 servers of a company over the same day at 5-minute intervals, and measures the effect of the group anomaly detection method with a binary confusion matrix through accuracy, precision, recall and the F1 value.
The binary confusion matrix is shown in Table 7:
TABLE 7 Binary confusion matrix
Predicted as Positive Predicted as Negative
Labeled as Positive True Positive(TP) False Negative(FN)
Labeled as Negative False Positive(FP) True Negative(TN)
As shown in Table 7, the results fall into four classes:
True Positive (TP): the true category is positive and the predicted category is positive;
False Negative (FN): the true category is positive and the predicted category is negative;
False Positive (FP): the true category is negative and the predicted category is positive;
True Negative (TN): the true category is negative and the predicted category is negative.
Accuracy is the ratio of the number of correctly predicted samples to the total number of predicted samples:
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the proportion of the samples judged to be positive that are truly positive:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
Recall is the proportion of the truly positive samples that the model judges to be positive:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
F-Measure, also called F-Score, is a weighted harmonic mean of recall and precision. This embodiment uses F1:
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
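The four measures can be computed directly from the counts of the confusion matrix in Table 7. The following Python sketch uses the standard definitions; the function name `binary_metrics` and the example counts are made up for illustration and are not results of this embodiment.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion matrix counts."""
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero (an assumption).
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 10 truly abnormal points, 8 of them detected,
# plus 2 false alarms among 90 normal points.
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because abnormal points are usually rare, accuracy alone can look high even when many anomalies are missed, which is why precision, recall and F1 are reported alongside it.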
The evaluation results of the method for detecting an abnormality of an IT entity group provided in this embodiment are shown in table 8:
TABLE 8 evaluation result table of IT entity group anomaly detection method
Figure BDA0003047985730000115
Figure BDA0003047985730000121
To verify the efficiency of the group anomaly detection method with large data volumes, this embodiment selects 10 similar IT entities over the same time period and tests how the running time of group anomaly detection changes as the data volume increases. The results are shown in Table 9:
TABLE 9 Algorithm efficiency test result table
Data volume Time(s)
60*10 1.132
120*10 2.283
180*10 3.672
240*10 4.593
300*10 5.887
360*10 7.138
420*10 8.166
480*10 9.441
540*10 10.635
600*10 11.739
As shown in Table 9, the running time of the group anomaly detection method increases gradually with the data volume. The method includes a data compression step, which reduces the data volume while preserving the data characteristics as far as possible and thereby improves the efficiency of the algorithm. When group anomaly detection is performed on a large data volume, the original data of the IT entities can therefore be compressed appropriately according to the running times in Table 9. For example, when the compressed data volume is kept at 240 × 10, group anomaly detection completes within 5 s (4.593 s in Table 9).
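One way to use Table 9 when choosing a compression target is a simple lookup of the largest tested data volume that fits a time budget. The sketch below assumes that an entry such as 240*10 means 240 sample points for each of the 10 IT entities; the function name `max_length_within` is hypothetical, and the timings are only valid for the test environment of this embodiment.

```python
from typing import List, Optional, Tuple

# (sample points per entity, running time in seconds) for 10 IT entities,
# copied from Table 9.
TABLE_9: List[Tuple[int, float]] = [
    (60, 1.132), (120, 2.283), (180, 3.672), (240, 4.593), (300, 5.887),
    (360, 7.138), (420, 8.166), (480, 9.441), (540, 10.635), (600, 11.739),
]


def max_length_within(budget_s: float,
                      timings: List[Tuple[int, float]] = TABLE_9) -> Optional[int]:
    """Largest tested per-entity length whose measured time fits the budget."""
    fitting = [length for length, seconds in timings if seconds <= budget_s]
    return max(fitting) if fitting else None


target = max_length_within(5.0)
print(target)
```

For a 5 s budget the lookup returns 240, the compressed length quoted in the paragraph above.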
This embodiment provides an IT entity group anomaly detection method under cloud native observability that performs anomaly detection on the index data of multiple IT entities simultaneously. It can identify whether the IT entities are homogeneous, which IT entities are abnormal, and the times at which the anomalies occur. Data are compressed through a PAA (Piecewise Aggregate Approximation) step and the distance between each IT entity and the other IT entities is calculated through a FastDTW step, which improves the efficiency of group anomaly detection. Abnormal IT entities are identified through an LOF (Local Outlier Factor) step, and the severity result of the index sample points of each IT entity is obtained from the number of standard deviations between those sample points and the reference entity index data together with the severity rules, which improves the accuracy of group anomaly detection.
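The PAA compression and the backward difference that follows it can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: null values are represented by `None`, segment boundaries are chosen so that segment sizes differ by at most one, each compressed point keeps the index of the first sample of its segment, and the backward difference is taken as the current value minus the previous one. The function names `paa` and `backward_difference` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def paa(index: Sequence[int], values: Sequence[Optional[float]],
        n: int) -> Tuple[List[int], List[Optional[float]]]:
    """Compress a series to n points using the mean of non-null values per segment."""
    length = len(values)
    if not 0 < n <= length:
        raise ValueError("n must be between 1 and the series length")
    new_index: List[int] = []
    new_values: List[Optional[float]] = []
    for s in range(n):
        start = s * length // n
        stop = (s + 1) * length // n
        segment = [v for v in values[start:stop] if v is not None]
        new_index.append(index[start])  # first index of the segment
        # A segment with no non-null value stays null (an assumption).
        new_values.append(sum(segment) / len(segment) if segment else None)
    return new_index, new_values


def backward_difference(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """x[t] - x[t-1]; null if either neighbour is null (an assumption)."""
    return [
        None if prev is None or cur is None else cur - prev
        for prev, cur in zip(values, values[1:])
    ]


timestamps = [0, 300, 600, 900, 1200, 1500]
series = [1.0, 3.0, None, 5.0, 2.0, 4.0]
idx, compressed = paa(timestamps, series, n=3)
diffs = backward_difference(compressed)
print(idx, compressed, diffs)
```

Applying the same compression to every IT entity and stacking the resulting difference rows would give a matrix with one row per entity, which is the shape the later distance calculation works on.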
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by a person skilled in the art through logical analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims (6)

1. A method for detecting an IT entity group anomaly under cloud native observability, wherein the IT entity group comprises a plurality of IT entities with the same configuration or attributes, the IT entities comprise nodes, applications or services in a cluster, and the method comprises the following steps:
1) Acquiring historical time sequence data of the IT entity group in the same index and time period;
2) Judging whether the IT entity group is suitable for group anomaly detection according to the historical time sequence data; if so, executing step 3); if not, ending;
3) Performing data compression on the historical time sequence data, and performing backward difference calculation to obtain a backward difference matrix;
4) Calculating the distance between each IT entity in the IT entity group and other IT entities according to the backward difference matrix;
5) Identifying abnormal IT entities through the LOF step according to the distance obtained by calculation in the step 4);
6) Calculating abnormal points and severity thereof generated by each IT entity by taking normal IT entities as a reference;
the step 2) comprises the following steps:
judging whether the following conditions are met simultaneously:
the sample size of the index data of each IT entity is not less than the set sample size;
the number of the IT entities in the IT entity group is not less than a preset value;
if so, the IT entity group is judged to be suitable for group anomaly detection; otherwise, it is not suitable;
step 5) comprises the following steps:
calculating the local outlier factor LOF of the IT entity and judging whether LOF is larger than a set local outlier factor threshold; if so, judging the IT entity to be an abnormal IT entity; otherwise, judging the IT entity to be a normal IT entity;
step 6) comprises the following steps:
61) According to the backward difference matrix, taking a normal IT entity as the reference entity, and identifying the number of standard deviations, bias, by which the index data sample point of the IT entity differs from the reference entity at each time point;
62) According to bias, judging the severity of the index data sample point through a preset severity rule;
the severity is divided into unknown, normal and abnormal, and the severity rule comprises:
621) Judging whether the original value of the index data sample point of the IT entity is -1 or null; if so, marking the index data sample point as an unknown sample point; otherwise, executing step 622);
622) Judging whether bias is no larger than the set number; if so, marking the index data sample point as a normal sample point; otherwise, marking the index data sample point as an abnormal sample point.
2. The method of claim 1, wherein the historical time series data is compressed through a PAA step, the PAA step comprising:
dividing the index data evenly into n segments according to the index value of each index datum in the historical time series data, taking the mean of the non-null values of each segment as the new data, and taking the initial value of each segment as the index value of the new data, so that the length of the index data is compressed to n.
3. The method of claim 1, wherein the distance between each IT entity and other IT entities is calculated through a FastDTW step.
4. The method of claim 1, wherein the IT entities are regarded as sample points, and the local outlier factor LOF is calculated by the following formula:
$$LOF_k(O) = \frac{1}{|N_k(O)|} \sum_{P' \in N_k(O)} \frac{\rho_k(P')}{\rho_k(O)}$$
where $\rho_k(O)$ is the local reachable density of point $O$, $\rho_k(P')$ is the local reachable density of the other points $P'$ within $N_k(O)$, and $N_k(O)$ is the k-th distance neighborhood of point $O$.
5. The method of claim 4, wherein $N_k(O)$ satisfies:
$N_k(O) = \{P' \in D \setminus \{O\} \mid d(O, P') \le d_k(O)\}$
wherein $d_k(O)$ is the k-th distance of point $O$, $d_k(O) = d(O, P)$, where $P$ is the k-th closest point to point $O$ and satisfies the following conditions:
there are at least $k$ points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') \le d(O, P)$;
there are at most $k-1$ points $P' \in D \setminus \{O\}$ in the set such that $d(O, P') < d(O, P)$;
$\rho_k(O)$ is calculated by the following formula:
$$\rho_k(O) = \frac{|N_k(O)|}{\sum_{P' \in N_k(O)} d_k(O, P')}$$
wherein $d_k(O, P')$ is the k-th reachable distance from point $P'$ to point $O$, calculated as:
$d_k(O, P') = \max\{d_k(O), d(O, P')\}$.
6. The method according to claim 1, wherein the number of standard deviations bias is calculated as follows:
when the standard deviation is not null and is not 0,
Figure FDA0003927212810000031
when the standard deviation is not null and is 0, and the mean is not 0,
Figure FDA0003927212810000032
when the standard deviation is not null and is 0, and the mean is 0,
Figure FDA0003927212810000033
wherein mean is the non-null mean of the index data of the reference entity, σ is the non-null standard deviation of the index data of the reference entity, and default_mean is the set default mean of the reference entity index data.
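The LOF step of claims 4 and 5 can be illustrated with a small Python sketch that works on a precomputed distance matrix, such as one produced by the FastDTW step. It follows the standard LOF formulation, in which the reachable distance to a neighbor uses that neighbor's k-th distance and the local reachable density is the inverse of the mean reachable distance. Both points are assumptions here, since the formulas are images in the source. The function name `lof_scores`, the toy distances and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence


def lof_scores(dist: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every entity from a symmetric distance matrix."""
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of entities")

    k_dist: List[float] = []      # k-th distance of each point
    neigh: List[List[int]] = []   # k-th distance neighborhood of each point
    for o in range(n):
        others = sorted((dist[o][p], p) for p in range(n) if p != o)
        dk = others[k - 1][0]
        k_dist.append(dk)
        neigh.append([p for d, p in others if d <= dk])  # ties are included

    density: List[float] = []     # local reachable density of each point
    for o in range(n):
        total = sum(max(k_dist[p], dist[o][p]) for p in neigh[o])
        density.append(len(neigh[o]) / total if total > 0 else math.inf)

    scores: List[float] = []
    for o in range(n):
        if math.isinf(density[o]):
            scores.append(1.0)    # assumption: exact duplicates count as inliers
            continue
        mean_neigh = sum(density[p] for p in neigh[o]) / len(neigh[o])
        scores.append(mean_neigh / density[o])
    return scores


# Toy distances: three entities close together and one far away.
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
scores = lof_scores(dist, k=2)
LOF_THRESHOLD = 2.0  # hypothetical threshold
abnormal = [i for i, s in enumerate(scores) if s > LOF_THRESHOLD]
print(scores, abnormal)
```

Scores close to 1 mean that an entity is about as dense as its neighbors, while a score well above the threshold marks the isolated entity as abnormal.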
CN202110478056.8A 2021-04-30 2021-04-30 IT entity group anomaly detection method under cloud native observability Active CN113190406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110478056.8A CN113190406B (en) 2021-04-30 2021-04-30 IT entity group anomaly detection method under cloud native observability

Publications (2)

Publication Number Publication Date
CN113190406A CN113190406A (en) 2021-07-30
CN113190406B true CN113190406B (en) 2023-02-03

Family

ID=76983217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110478056.8A Active CN113190406B (en) 2021-04-30 2021-04-30 IT entity group anomaly detection method under cloud native observability

Country Status (1)

Country Link
CN (1) CN113190406B (en)

Also Published As

Publication number Publication date
CN113190406A (en) 2021-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: An IT entity group anomaly detection method under cloud native observability

Effective date of registration: 20231115

Granted publication date: 20230203

Pledgee: Bank of Shanghai Limited by Share Ltd. Pudong branch

Pledgor: SHANGHAI EISOO INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2023310000743