CN116383645A - Intelligent system health degree monitoring and evaluating method based on anomaly detection - Google Patents
- Publication number
- CN116383645A (application CN202310284349.1A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses an intelligent method for monitoring and evaluating system health based on anomaly detection. The method classifies massive volumes of index data by morphology and designs a different anomaly detection algorithm for each type of index data, which improves the efficiency and accuracy of anomaly detection. Unsupervised anomaly detection algorithms are applied to the key indexes that affect system health, yielding an anomaly label and an anomaly degree for each index. An expert system based on the analytic hierarchy process assigns a weight to each key index in the system, and the health degrees of the key indexes are weighted and summed into a health score for the system, which directly reflects the overall health of the system.
Description
Technical Field
The invention relates to the field of intelligent operation and maintenance, and in particular to an intelligent method for monitoring and evaluating system health based on anomaly detection.
Background
As industry digitization accelerates, every industry is undergoing digital transformation. At the same time, the arrival of the 5G era has brought rapid growth in mobile services, increasingly complex service requests, and continuously expanding service channels. The rapid development of Internet applications supports industrial development but also poses great challenges to traditional operation and maintenance.
Traditional monitoring tools mainly monitor at the resource level and focus on the running state of individual resources; they can only show whether each component is running normally and do not evaluate the running state of the service system as a whole. Traditional automated operation and maintenance systems reduce the labor cost and improve the efficiency of repetitive maintenance work. In complex scenarios, however, people must still control decision making, which limits further gains in operation and maintenance efficiency. It is therefore important to apply artificial intelligence techniques appropriately so that the actual health of the current system is reflected truthfully and globally.
Defects of traditional monitoring tools: traditional operation and maintenance tools mainly monitor individual indexes directly. They can only show whether each index value lies within its normal range and cannot reflect the actual overall health of the current system. These tools are also highly specialized: data analysis depends heavily on system development and security technicians, who in practice struggle to process massive volumes of monitoring data, so considerable labor cost is required.
Defects of traditional automated operation and maintenance: index alarm thresholds are set from expert experience rather than on a scientific basis. As services change, some indexes exhibit dynamically changing trends, and traditional automated operation and maintenance cannot establish dynamic baselines for such indexes. Conventional threshold alarms also have low fault tolerance and a high false alarm rate.
Existing health assessment methods therefore suffer from high labor cost, difficulty adapting to business changes, and an inability to reflect the overall health of the system globally.
Disclosure of Invention
To address these defects of the prior art, the invention provides an intelligent method for monitoring and evaluating system health based on anomaly detection. The method classifies massive volumes of index data by morphology and designs a different anomaly detection algorithm for each type of index data, which improves the efficiency and accuracy of anomaly detection. Unsupervised anomaly detection algorithms are applied to the key indexes that affect system health, yielding an anomaly label and an anomaly degree for each index. An expert system based on the analytic hierarchy process assigns a weight to each key index in the system, and the health degrees of the key indexes are weighted and summed into a health score for the system, which directly reflects the overall health of the system.
The object of the invention is achieved by the following technical solution: an intelligent method for monitoring and evaluating system health based on anomaly detection, comprising the following steps:
(1) Key index extraction: the collected system monitoring data are extracted and preprocessed to obtain key index data, where extraction comprises key index selection and index data extraction. The system monitoring data are specifically the log files generated by a load balancer, which records every request; the key indexes are the four indexes that can be extracted from these log files: success rate, response time, request number, and concurrency number;
(2) Index anomaly detection: an unsupervised index anomaly detection model is designed for the processed key index data and applied to massive volumes of data; this step comprises index morphology classification, an anomaly detection algorithm, and anomaly quantization scoring;
(3) System health assessment: an expert system based on the analytic hierarchy process is adopted, and the health degrees of the key indexes are weighted and summed using the index anomaly detection results to obtain an overall assessment of system health; the expert system is a weighting system built from the opinions of operation and maintenance experts.
Further, the key index extraction in step (1) comprises the following substeps:
(1.1) Obtain the system monitoring log data and perform key index selection: extract the key indexes from the raw data and retain the required index fields;
(1.2) Data partitioning: partition the data obtained in step (1.1) by system name and by time, with a time interval of one month;
(1.3) For each system and each access channel (url), set up per-minute counters for the four key indexes: response time and success rate are counted per url, while request number and concurrency number are counted per system;
(1.4) Parse the incoming monitoring log records one by one, extract and compute the key indexes in each record, and update the counter states;
(1.5) Repeat steps (1.3) and (1.4) until all data for one month of a system have been read;
(1.6) From the key index counters, compute and store the success rate, response time, request number, and concurrency number for every minute of that month for that system;
(1.7) Repeat steps (1.3) to (1.6) until all monitoring log data have been processed;
(1.8) Denoising: denoise the data with the 3-sigma rule to reduce the influence of noise;
(1.9) Missing value filling: fill missing values in the data with the mean;
(1.10) Timestamp conversion: convert all timestamps in the data to a unified format for subsequent processing;
(1.11) Time-order sorting: sort the data in chronological order.
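Steps (1.3) to (1.6) can be sketched as a single pass over parsed log records. This is a minimal sketch under stated assumptions: the `LogRecord` fields, the rule that status codes below 400 count as successes, and the Little's-law estimate of concurrency are all hypothetical, because the source specifies neither the log schema nor how concurrency is derived.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical fields of one load-balancer log line; the real schema is not given.
    timestamp: int        # request start time, Unix seconds
    system: str
    url: str
    status: int           # HTTP status code
    response_time: float  # seconds


def aggregate_per_minute(records):
    """Steps (1.3)-(1.6): per-minute counters keyed by (system, url, minute) and (system, minute)."""
    url_counters = defaultdict(lambda: {"total": 0, "ok": 0, "rt_sum": 0.0})
    sys_counters = defaultdict(lambda: {"requests": 0, "rt_sum": 0.0})
    for r in records:
        minute = r.timestamp // 60 * 60
        u = url_counters[(r.system, r.url, minute)]
        u["total"] += 1
        u["ok"] += int(r.status < 400)  # assumption: status below 400 counts as a success
        u["rt_sum"] += r.response_time
        s = sys_counters[(r.system, minute)]
        s["requests"] += 1
        s["rt_sum"] += r.response_time
    url_metrics = {
        key: {"success_rate": c["ok"] / c["total"], "response_time": c["rt_sum"] / c["total"]}
        for key, c in url_counters.items()
    }
    sys_metrics = {
        # Assumption: concurrency estimated with Little's law,
        # (requests / 60 s) * mean response time = total response time / 60 s.
        key: {"requests": c["requests"], "concurrency": c["rt_sum"] / 60.0}
        for key, c in sys_counters.items()
    }
    return url_metrics, sys_metrics
```

Keeping success rate and response time per url and the other two indexes per system mirrors the split described in step (1.3); merging urls back to the system level happens later, in steps (8.2) and (8.3).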
Further, the index anomaly detection in step (2) specifically comprises index morphology classification, an anomaly detection algorithm, and anomaly quantization scoring.
Further, the index morphology classification specifically includes the following substeps:
(3.1) Obtain the key index data and check the data volume; an index whose data volume is below a threshold is classified as the other type; otherwise, go to step (3.2);
(3.2) Normalize and difference the data following the year-on-year (same-period comparison) idea, and determine whether the data are periodic; if so, the data are of the periodic type; otherwise, go to step (3.3);
(3.3) Obtain the local fluctuation of the data by wavelet transform and the global fluctuation by computing the overall variance. If the global fluctuation is far greater than the local fluctuation, the data are of the trend type; if the global fluctuation is close to the local fluctuation, the data are stable, and go to step (3.4); if neither condition holds, the data are of the other type;
(3.4) Compute the mean and the maximum: the mean represents the overall distribution of the data and the maximum represents the distribution of the abnormal values. Compare the deviation of the abnormal values from the overall distribution; if the deviation exceeds a threshold, the data are of the abrupt-change stable type; otherwise, they are of the regular stable type.
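The decision tree of steps (3.1) to (3.4) could look roughly like the following sketch. It is not the patented implementation: every threshold is illustrative, periodicity is tested with the autocorrelation of the first-differenced series at the period lag (a stand-in for the year-on-year normalization and differencing the text describes), and the local fluctuation is taken as the standard deviation of first-level Haar wavelet detail coefficients.

```python
import statistics


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = statistics.fmean(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / den if den else 0.0


def haar_detail(x):
    """First-level Haar wavelet detail coefficients, used here as the local fluctuation."""
    return [(x[i] - x[i + 1]) / 2 ** 0.5 for i in range(0, len(x) - 1, 2)]


def classify_morphology(x, period, min_len=100, period_corr=0.8,
                        trend_ratio=5.0, stable_ratio=1.5, spike_ratio=5.0):
    """Hypothetical decision tree for steps (3.1)-(3.4); all thresholds are illustrative."""
    if len(x) < min_len:                                    # (3.1) too little data
        return "other"
    diffs = [b - a for a, b in zip(x, x[1:])]               # differencing removes a linear trend
    if len(diffs) > period and autocorr(diffs, period) > period_corr:   # (3.2)
        return "periodic"
    global_fluct = statistics.pstdev(x)                     # (3.3) global fluctuation
    if global_fluct == 0:
        return "regular stable"
    local_fluct = statistics.pstdev(haar_detail(x))         # local fluctuation
    if local_fluct == 0 or global_fluct / local_fluct > trend_ratio:
        return "trend"
    if global_fluct / local_fluct > stable_ratio:
        return "other"
    deviation = (max(x) - statistics.fmean(x)) / global_fluct   # (3.4) maximum vs. overall spread
    return "abrupt-change stable" if deviation > spike_ratio else "regular stable"
```

For white noise the Haar detail coefficients have the same variance as the series itself, so the ratio of global to local fluctuation sits near 1; for a smooth trend the detail coefficients are almost constant and the ratio grows large, which is the contrast step (3.3) relies on.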
Further, the anomaly detection algorithm uses either a machine-learning-based method or a deep-learning-based method. The machine-learning-based method covers the periodic, trend, and stable types with two anomaly detection algorithms, a fixed threshold method and an isolation forest: periodic and trend indexes are spatially transformed into regular stable index data, regular stable index data are then checked with the fixed threshold method, and abrupt-change stable index data are checked with the isolation forest. The deep-learning-based method detects anomalies only in periodic data: the periodic data are fed into a deep learning model for training and prediction, and the anomaly detection result is obtained by comparing the predicted values with the true values.
Further, the machine-learning-based anomaly detection algorithm specifically comprises the following substeps:
(4.1) Index data space transformation: periodic data are transformed into stable data with the year-on-year (same-ratio) space transformation, and trend data are transformed into stable data with the period-on-period (ring-ratio) space transformation;
the formula of the year-on-year space transformation is as follows:
where x_t is the data at the current time, K is the number of periods, T is the period of the current index data, w is the set time window size, mean() is the mean of the data, and std() is the standard deviation of the data.
The year-on-year transformation computes the mean and standard deviation of the historical data in the same time window, and then computes the year-on-year value of the current point by subtracting that mean from the current value and dividing by the standard deviation;
the formula of the period-on-period space transformation is as follows:
where x_t is the data at the current time and w is the set time window size.
The period-on-period transformation in effect computes the change ratio between the means of the two windows closest to the current point;
(4.2) Index data anomaly detection: the transformed periodic data, the transformed trend data, and the regular stable data are checked with the fixed threshold method, and the abrupt-change stable data are checked with the isolation forest;
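The exact formulas for the two space transformations of step (4.1) are not reproduced in the text, so the following sketch implements them only as they are described in words. The centred window of offsets -w..+w for the year-on-year transformation and the relative-change form (recent - previous) / previous for the period-on-period transformation are assumptions.

```python
import statistics


def year_on_year(x, t, period, k_periods, window):
    """Standardize x[t] against the same time window in the previous K periods.

    Assumption: the window spans offsets -window..+window around t - k * period.
    """
    history = [
        x[t - k * period + j]
        for k in range(1, k_periods + 1)
        for j in range(-window, window + 1)
        if 0 <= t - k * period + j < t
    ]
    mu, sigma = statistics.fmean(history), statistics.pstdev(history)
    return 0.0 if sigma == 0 else (x[t] - mu) / sigma


def period_on_period(x, t, window):
    """Relative change between the mean of the latest `window` points (ending at t)
    and the mean of the `window` points before them.

    Assumption: the change ratio is (recent - previous) / previous; a zero
    previous mean returns 0.0 in this sketch to avoid division by zero.
    """
    if t + 1 < 2 * window:
        raise ValueError("need at least 2 * window points up to index t")
    recent = statistics.fmean(x[t - window + 1:t + 1])
    previous = statistics.fmean(x[t - 2 * window + 1:t - window + 1])
    return 0.0 if previous == 0 else (recent - previous) / previous
```

Both functions turn a periodic or trending series into a roughly stationary one, which is what allows step (4.2) to apply the same fixed threshold method to all three transformed or regular stable series.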
The deep-learning-based anomaly detection algorithm proceeds through the following steps:
(5.1) Data segmentation: convert the time series data into supervised samples with a sliding window; for each value to be predicted, the window extracts the data of the preceding hour and the data of the 5 minutes around the same time on each of the previous 10 days;
(5.2) Differencing: subtract from each current value the value at the same time on the previous day;
(5.3) Normalization: normalize the data with min-max normalization;
(5.4) LSTM model training: train a prediction model with the LSTM algorithm;
(5.5) Input the values to be checked into the model to obtain the anomaly labels.
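Steps (5.1) to (5.3) and the comparison in (5.5) can be sketched in plain Python as below. The LSTM of step (5.4) is omitted: any sequence regressor could be trained on the (features, target) pairs. The window layout (the preceding 60 points plus a 5-point window centred on the same time of day, assuming minute-level data), the helper names, and the residual threshold are assumptions.

```python
def make_sample(x, t, points_per_day, history=60, days=10, half_width=2):
    """(5.1) Sliding-window sample for predicting x[t].

    Features: the preceding `history` points (one hour of minute data) plus
    2 * half_width + 1 points centred on the same time on each of the
    previous `days` days. The exact layout is an assumption.
    """
    recent = x[t - history:t]
    same_time = [
        x[t - d * points_per_day + j]
        for d in range(1, days + 1)
        for j in range(-half_width, half_width + 1)
    ]
    return recent + same_time, x[t]


def day_difference(x, points_per_day):
    """(5.2) Subtract the value at the same time on the previous day."""
    return [x[i] - x[i - points_per_day] for i in range(points_per_day, len(x))]


def min_max(x):
    """(5.3) Min-max normalization to [0, 1]."""
    lo, hi = min(x), max(x)
    return [0.0] * len(x) if hi == lo else [(v - lo) / (hi - lo) for v in x]


def label_by_residual(predicted, actual, threshold):
    """(5.5) Flag points whose prediction error exceeds a threshold (threshold is an assumption)."""
    return [abs(p - a) > threshold for p, a in zip(predicted, actual)]
```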
Specifically, the fixed threshold anomaly detection comprises the following substeps:
(6.1) Data transformation: divide the data into windows with a time granularity of five minutes and take the mean of the index data in each window. This transformation eliminates invalid alarms caused by single-point spikes and thus reduces the false alarm rate;
(6.2) N-sigma model training: compute the threshold of the index from the historical data of step (1) following the N-sigma rule;
(6.3) Input the values to be checked into the model to obtain the anomaly labels.
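A minimal sketch of steps (6.1) to (6.3), assuming minute-level input and N = 3 as the default multiplier (the source does not fix N); the function names are hypothetical.

```python
import statistics


def five_minute_means(x, step=5):
    """(6.1) Mean of minute-level data over non-overlapping 5-minute windows."""
    return [statistics.fmean(x[i:i + step]) for i in range(0, len(x) - step + 1, step)]


def fit_n_sigma(history, n=3.0):
    """(6.2) Lower and upper thresholds mean -/+ n * std from historical data."""
    mu, sigma = statistics.fmean(history), statistics.pstdev(history)
    return mu - n * sigma, mu + n * sigma


def detect(values, bounds):
    """(6.3) True marks a value outside the thresholds, i.e. an anomaly."""
    lo, hi = bounds
    return [not (lo <= v <= hi) for v in values]
```

Averaging over five-minute windows is what suppresses single-point spikes: a one-minute burst to 40 in otherwise flat data at 10 raises its window mean only to 16.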
Specifically, the isolation forest anomaly detection comprises the following substeps:
(7.1) Index feature selection: anomalies are judged comprehensively from multidimensional features, using six feature values, including: the current value; the difference between the value 1 minute earlier and the current value; the difference between the value 2 minutes earlier and the current value; the difference between the mean of the values within 5 minutes before and after the same time on the previous day and the current value; and the difference between the mean of the values within 5 minutes before and after the same time over the previous 5 days and the current value;
(7.2) Isolation forest model training: train an isolation forest model on the extracted feature values;
(7.3) Input the values to be checked into the model to obtain the anomaly labels.
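The feature row of step (7.1) might be assembled as follows. The source announces six features but names five; this hypothetical `iforest_features` helper builds only the five that are named, and both the reading of "within 5 minutes before and after" as offsets -5..+5 and the averaging of five daily window means for the 5-day feature are assumptions.

```python
import statistics


def iforest_features(x, t, points_per_day=1440, half_width=5):
    """Hypothetical feature row for minute-level point x[t], following (7.1)."""
    def same_time_mean(days_back):
        # Mean of the window centred on the same time `days_back` days earlier.
        centre = t - days_back * points_per_day
        return statistics.fmean(x[centre - half_width:centre + half_width + 1])

    return [
        x[t],                                   # current value
        x[t - 1] - x[t],                        # value 1 minute earlier minus current
        x[t - 2] - x[t],                        # value 2 minutes earlier minus current
        same_time_mean(1) - x[t],               # previous day, same time
        statistics.fmean([same_time_mean(d) for d in range(1, 6)]) - x[t],  # previous 5 days
    ]
```

Rows built this way could be passed to an existing isolation forest implementation, for example scikit-learn's `sklearn.ensemble.IsolationForest`, whose `predict` method returns -1 for anomalies and 1 for normal points.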
Specifically, the anomaly quantization scoring in step (2) comprises two quantization scoring algorithms, one based on the sigmoid function and one based on the isolation forest, as follows:
the sigmoid-based quantization scoring algorithm proceeds through the following substeps:
(6.1) Check the anomaly label: if the label is normal, the anomaly score is 0; otherwise, go to step (6.2);
(6.2) Normalize the index data to eliminate the influence of the different dimensions (units and magnitudes) of the indexes on anomaly scoring;
(6.3) Pass the normalized value through the sigmoid function to obtain the anomaly score. The sigmoid function scales data into the range 0 to 1, which removes differences of scale between indexes and simplifies the subsequent calculation of the overall system health. The sigmoid function is S(x) = 1 / (1 + e^(-x)).
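A sketch of steps (6.1) to (6.3) for sigmoid-based scoring. The source does not say which normalization is used; this version assumes z-score standardization with the absolute deviation, so every anomalous point scores between 0.5 and 1 while normal points score 0.

```python
import math


def sigmoid_anomaly_score(value, is_anomaly, mean, std):
    """Steps (6.1)-(6.3): 0 for normal points, otherwise sigmoid of the standardized deviation."""
    if not is_anomaly:
        return 0.0                       # (6.1) normal label scores 0
    z = abs(value - mean) / std if std > 0 else 0.0   # (6.2) assumption: z-score normalization
    return 1.0 / (1.0 + math.exp(-z))    # (6.3) S(z) = 1 / (1 + e^(-z))
```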
the isolation-forest-based quantization scoring algorithm proceeds through the following substeps:
(7.1) Check the anomaly label: if the label is normal, the anomaly score is 0; otherwise, go to step (7.2);
(7.2) Compute the path length (PathLength) in preparation for computing the anomaly score in the next step, using the following formula:
h(x) = e + c(T.size)
where e is the number of edges that sample x traverses from the root node to a leaf node of the tree, i.e. the number of splits; T.size is the number of samples that fall into the same leaf node as x; and c(T.size) is a correction term representing the average path length of a binary tree built from T.size samples. c(n) is computed as:
c(n) = 2H(n-1) - 2(n-1)/n, with H(i) ≈ ln(i) + 0.5772156649,
where 0.5772156649 is Euler's constant;
finally, the path length is normalized so that the anomaly score lies between 0 and 1, using the formula:
s(x, n) = 2^(-E(h(x)) / c(n))
where h(x) is the PathLength of the sample on an iTree, E(h(x)) is the average PathLength of the sample over the t iTrees, and c(n) is the average path length of a binary search tree (BST) constructed from n samples.
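The path-length correction c(n) and the normalized score can be written out as below. These follow the standard isolation forest definitions; the special cases c(1) = 0 and c(2) = 1 follow common implementations and are not stated in the source.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of a binary search tree built from n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(edges, leaf_size):
    """h(x) = e + c(T.size): edges to the leaf plus a correction for the unsplit samples."""
    return edges + c(leaf_size)


def anomaly_score(path_lengths, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)) over the path lengths from all iTrees.

    Scores near 1 indicate anomalies; scores well below 0.5 indicate normal points.
    """
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / c(n))
```

A point whose average path length equals c(n) scores exactly 0.5, and shorter paths (points isolated after few splits) push the score toward 1.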
Further, the system health assessment in step (3) comprises the following substeps:
(8.1) Check whether the key index is response time or success rate; if so, execute step (8.2); otherwise, go to step (8.4);
(8.2) Merge the index data of the different access channels (url) of the same system into the system dimension and sort them chronologically;
(8.3) Compute the anomaly degree of the merged index by averaging the anomaly degrees of all urls at each timestamp, giving the anomaly degrees of response time and success rate in the system dimension;
(8.4) Set weights for the four key indexes with an index weighting system based on the analytic hierarchy process, and obtain the overall system health by weighted summation, using the following formula:
system health = 100 - QPS_rate*QPS - Current_rate*Current - SUC_rate*SUC - RTime_rate*RTime
where QPS_rate is the weight of the request number and QPS is its anomaly degree; Current_rate is the weight of the concurrency number and Current is its anomaly degree; SUC_rate is the weight of the success rate and SUC is its anomaly degree; and RTime_rate is the weight of the average response time and RTime is its anomaly degree.
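Steps (8.2) to (8.4) might be combined as in the following sketch. The source does not give the pairwise-comparison matrix or say how the AHP weights are derived from it; here the priority vector is approximated by the normalized row geometric means, a common AHP approximation, and the weights are assumed to sum to 1 and be scaled by 100 so that health falls in [0, 100] when anomaly degrees lie in [0, 1].

```python
import math
from collections import defaultdict


def ahp_weights(matrix):
    """Approximate AHP priority vector: normalized geometric mean of each row
    of a pairwise-comparison matrix (an approximation of the principal eigenvector)."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]


def merge_url_anomalies(per_url):
    """(8.2)-(8.3): average the anomaly degrees of all urls of one system at each timestamp.

    per_url maps url -> {timestamp: anomaly degree}.
    """
    buckets = defaultdict(list)
    for series in per_url.values():
        for ts, degree in series.items():
            buckets[ts].append(degree)
    return {ts: sum(v) / len(v) for ts, v in sorted(buckets.items())}


def system_health(anomalies, weights):
    """(8.4): 100 minus the weighted anomaly degrees, ordered [QPS, Current, SUC, RTime].

    Assumption: weights sum to 1 and are scaled by 100.
    """
    return 100.0 - 100.0 * sum(w * a for w, a in zip(weights, anomalies))
```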
The beneficial effects of the invention are as follows: by combining a decision tree that automatically classifies index morphology with anomaly detection algorithms, the intelligent system health monitoring and evaluation technique meets the application requirement of outputting a system health degree. The automatic index classification and the anomaly detection algorithms perform well and meet practical requirements for detection speed and accuracy.
The scheme overcomes the limitation of traditional approaches, which depend heavily on specialized technicians and consume substantial labor for operation and maintenance. The index classification decision tree automatically determines each index's type and matches a detection algorithm to it, making anomaly detection fully automatic. Meanwhile, the multidimensional evaluation of system health captures the real overall health of the system, helping operation and maintenance staff respond to system problems promptly, improving operation and maintenance efficiency, and reducing its cost.
Drawings
FIG. 1 is a design diagram of the intelligent system health detection and evaluation scheme;
FIG. 2 is a flow chart of an anomaly detection algorithm based on machine learning;
FIG. 3 is a flow chart of an anomaly detection algorithm based on deep learning;
FIG. 4 is a flow chart of a system health assessment module implementation.
Detailed Description
The overall algorithm design of the invention is as follows:
As shown in FIG. 1, the intelligent method for monitoring and evaluating system health based on anomaly detection comprises three modules.
(1) Key index extraction: the collected system monitoring data are extracted and preprocessed to obtain key index data, which specifically comprises key index selection and index data extraction. In this scheme, the system monitoring data are the log files generated by a load balancer that records every request, and the key indexes are the four indexes that can be extracted from these log files: success rate, response time, request number, and concurrency number.
(2) Index anomaly detection: an effective unsupervised index anomaly detection model is designed for the processed key index data to detect anomalies efficiently in massive volumes of data. It comprises index morphology classification, an anomaly detection algorithm, and anomaly quantization scoring.
(3) System health assessment: an expert system based on the analytic hierarchy process is adopted, and the health degrees of the key indexes are weighted and summed using the results of index anomaly detection to obtain an overall assessment of system health.
The health degree is a score of an index's health condition: if the index is detected as normal, its health degree is close to 1; if it is detected as abnormal, its health degree is close to 0.
The key index extraction module is designed and implemented through the following steps:
(1.1) Obtain the system monitoring log data, perform key index selection, extract the key indexes from the raw data, and retain the required index fields.
(1.2) Data partitioning: partition the data obtained in step (1.1) by system name and by time, with a time interval of one month. This avoids memory overflow caused by holding too much data in memory at once.
(1.3) For each system and each access channel (url), set up per-minute counters for the four key indexes: response time and success rate are counted per url, while request number and concurrency number are counted per system; the url is the address being accessed.
(1.4) Parse the incoming monitoring log records one by one, extract and compute the key indexes in each record, and update the counter states.
(1.5) Repeat steps (1.3) and (1.4) until all data for one month of a system have been read.
(1.6) From the key index counters, compute and store the success rate, response time, request number, and concurrency number for every minute of that month for that system.
(1.7) Repeat steps (1.3) to (1.6) until all monitoring log data have been processed.
(1.8) Denoising: apply the 3-sigma rule to the data to reduce the influence of noise.
(1.9) Missing value filling: fill missing values in the data with the mean.
(1.10) Timestamp conversion: convert all timestamps in the data to a unified format to facilitate subsequent processing.
(1.11) Time-order sorting: sort the data chronologically to guarantee their order.
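Steps (1.8) to (1.11) can be sketched for one index series as below. The timestamp format, the UTC time zone, and the choice to treat 3-sigma outliers as missing values (so that step (1.9) fills them) are assumptions; timestamps are converted first here only so that the final sort works on integers.

```python
import statistics
from datetime import datetime, timezone


def preprocess(points, fmt="%Y-%m-%d %H:%M:%S"):
    """Steps (1.8)-(1.11) for one index series.

    points: list of (timestamp string, value or None).
    """
    # (1.10) Convert timestamps to one unified format: Unix seconds (UTC assumed).
    rows = [
        (int(datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc).timestamp()), v)
        for ts, v in points
    ]
    observed = [v for _, v in rows if v is not None]
    mu, sigma = statistics.fmean(observed), statistics.pstdev(observed)
    # (1.8) 3-sigma denoising: values farther than 3 sigma from the mean become missing.
    rows = [(ts, v if v is not None and abs(v - mu) <= 3 * sigma else None) for ts, v in rows]
    # (1.9) Fill missing values with the mean of the remaining values.
    fill = statistics.fmean([v for _, v in rows if v is not None])
    rows = [(ts, fill if v is None else v) for ts, v in rows]
    # (1.11) Sort chronologically.
    return sorted(rows)
```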
The index anomaly detection module is designed as follows: it comprises three submodules, namely index morphology classification, the anomaly detection algorithm, and anomaly quantization scoring.
(2.1) Index morphology classification: combining ideas such as year-on-year comparison and wavelet transform, the invention designs an index morphology decision tree that automatically classifies massive volumes of index data. This improves classification efficiency and accuracy, and because indexes can be reclassified when their morphology changes or new data are added, the classification adapts to changes in the data.
(2.2) Anomaly detection algorithm: combining machine learning and deep learning, the invention designs a different anomaly detection algorithm for the index data of each morphology and applies it to obtain the anomaly status of the index at each moment.
(2.3) Anomaly quantization scoring: two index anomaly scoring algorithms are designed to accompany the anomaly detection algorithms and measure the anomaly degree of each index. They normalize the anomaly status of different indexes to between 0 and 1, so that the overall system health can then be evaluated by weighted summation.
The three submodules of the index anomaly detection module are described in detail below:
The index morphology classification submodule is designed and implemented through the following steps:
(3.1) Obtain the key index data and check the data volume; an index whose data volume is below a threshold is classified as the other type; otherwise, go to step (3.2).
(3.2) Normalize and difference the data following the year-on-year idea, and determine whether the data are periodic. If so, the data are of the periodic type; otherwise, go to step (3.3).
(3.3) Obtain the local fluctuation of the data by wavelet transform and the global fluctuation by computing the overall variance. If the global fluctuation is far greater than the local fluctuation, the data are of the trend type. If the global fluctuation is close to the local fluctuation, the data are stable, and the process goes to step (3.4). If neither condition holds, the data are of the other type.
(3.4) Compute the mean and the maximum: the mean represents the overall distribution of the data and the maximum represents the distribution of the abnormal values. Compare the deviation of the abnormal values from the overall distribution; if the deviation exceeds a threshold, the data are of the abrupt-change stable type; otherwise, they are of the regular stable type.
Design of the anomaly detection algorithm submodule
For trend and stable data, the invention uses a machine learning-based method. For periodic data, it provides two methods: one based on machine learning and one based on deep learning.
In total, the invention provides two sets of anomaly detection methods for three data types: periodic, trend, and stable. The first set is machine learning-based and consists of two anomaly detection algorithms, a fixed threshold method and an isolation forest method. Together they cover all three types. In this set, periodic and trend indexes are first space-converted into regular stable index data. The regular stable data is then checked with the fixed threshold method, and the abrupt-change stable data with the isolation forest method. The second set is deep learning-based and handles periodic data only. The periodic data is fed into a deep learning model for training and prediction, and anomalies are detected by comparing the predicted values with the actual values.
As shown in Fig. 2, the machine learning-based anomaly detection flow comprises the following steps:
(4.1) Index data space conversion: periodic data is converted into stable data by same-period (same-ratio) space conversion. Trend data is converted into stable data by period-over-period (ring-ratio) space conversion.
The formula of the same-period space conversion is as follows:
where x_t is the data at the current time, K is the number of periods, T is the period of the current index data, w is the configured time window size, mean() is the mean of the data, and std() is the standard deviation of the data.
In other words, the same-period conversion first computes the mean and standard deviation of the historical data in the same time window. It then computes the same-period value of the current point by subtracting that mean from the current value and dividing by the standard deviation.
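The same-period formula itself did not survive extraction. Going only by the symbol list and the description above, one plausible reading is sketched below in Python: x_t is standardized against the 2w + 1 historical points centred on t - kT for k = 1..K. The centred window layout and the helper name same_period_zscore are assumptions.

```python
import statistics


def same_period_zscore(x, t, period, k_periods, w):
    """Hypothetical same-period (same-ratio) conversion of point x[t].

    Assumption: the history is the 2*w + 1 points centred on t - k*period for
    k = 1..k_periods; x[t] is standardized with their mean and standard deviation.
    """
    if w >= period or t - k_periods * period - w < 0:
        raise ValueError("window must be shorter than the period and fully inside the history")
    history = [x[t - k * period + j]
               for k in range(1, k_periods + 1)
               for j in range(-w, w + 1)]
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    # A flat history has no spread; treat the point as unremarkable (0.0).
    return (x[t] - mu) / sigma if sigma > 0 else 0.0
```

A point that follows its usual pattern maps to a value near 0, while a break from the pattern shows up as a large absolute value. This is what allows the converted periodic data to go through the fixed threshold check.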
The formula of the period-over-period space conversion is as follows:
where x_t is the data at the current time and w is the configured time window size.
In effect, the period-over-period conversion computes the relative change between the means of the two most recent windows of the data.
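The period-over-period formula is likewise missing. The following minimal Python sketch assumes that "mean change ratio" means the relative change of the mean of the latest w points against the mean of the w points just before them. The function name ring_ratio is hypothetical.

```python
import statistics


def ring_ratio(x, t, w):
    """Hypothetical period-over-period (ring-ratio) conversion of point x[t].

    Assumption: relative change of the mean of the latest window x[t-w+1..t]
    against the mean of the window immediately before it.
    """
    if t + 1 < 2 * w:
        raise ValueError("need at least 2*w points up to and including t")
    current = statistics.fmean(x[t - w + 1:t + 1])
    previous = statistics.fmean(x[t - 2 * w + 1:t - w + 1])
    if previous == 0:
        raise ZeroDivisionError("mean of the previous window is zero")
    return (current - previous) / previous
```

Dividing by the previous window's mean makes the result unitless. A steadily growing series therefore becomes a series of small, slowly varying ratios that a fixed threshold can handle.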
(4.2) Index data anomaly detection: the converted periodic data, the converted trend data, and the regular stable data are checked with the fixed threshold method. The abrupt-change stable data is checked with the isolation forest method.
(4.3) Index data anomaly detection: the data is passed to the corresponding machine learning-based method for anomaly detection.
The fixed threshold anomaly detection flow comprises the following steps:
(5.1) Data transformation: the data is divided into windows with a time granularity of five minutes, and the mean of the index data within each window is computed. This transformation suppresses invalid alarms caused by single-point spikes (glitches) and lowers the false alarm rate.
(5.2) N-sigma model training: following the N-sigma principle, the threshold of the index is computed from the historical data of step (1).
(5.3) The value to be tested is fed into the model to obtain its anomaly label.
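Steps (5.1) to (5.3) can be sketched as follows. The sketch assumes one data point per minute, non-overlapping five-minute windows, and N = 3, and estimates the band mu ± N·sigma from the historical window means. The function names are hypothetical.

```python
import statistics


def window_means(values, window=5):
    """(5.1) Means over non-overlapping windows of `window` per-minute points; a trailing partial window is dropped."""
    return [statistics.fmean(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def fit_n_sigma(history, n=3.0, window=5):
    """(5.2) Threshold band mu +/- n*sigma estimated from historical window means."""
    means = window_means(history, window)
    mu = statistics.fmean(means)
    sigma = statistics.pstdev(means)
    return mu - n * sigma, mu + n * sigma


def n_sigma_labels(values, bounds, window=5):
    """(5.3) True marks a window whose mean falls outside the band."""
    low, high = bounds
    return [not (low <= m <= high) for m in window_means(values, window)]
```

For example, if the historical five-minute means are 9, 10, 11, and 10, the band is roughly [7.88, 12.12]. A lone per-minute value of 15 among values of 10 would exceed it on its own. Its window mean, however, is 11, so no alarm is raised. This is the glitch suppression described in step (5.1).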
The isolation forest anomaly detection flow comprises the following steps:
(6.1) Index feature selection: anomalies are judged jointly from multidimensional features. Six features are used:
- the current value;
- the difference between the value 1 minute earlier and the current value;
- the difference between the value 2 minutes earlier and the current value;
- the difference between the value 5 minutes earlier and the current value;
- the difference between the mean of the 5 minutes before and after the same time on the previous day and the current value;
- the difference between the mean of the 5 minutes before and after the same time on the previous 5 days and the current value.
(6.2) Isolation forest model training: an isolation forest model is trained on the extracted feature values following the isolation forest algorithm.
(6.3) The value to be tested is fed into the model to obtain its anomaly label.
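The six features of step (6.1) can be extracted as in the Python sketch below. It assumes a per-minute series indexed by minute and reads "the previous 5 days" as the same time five days earlier; the phrase could also mean an average over the last five days. Training and scoring (steps 6.2 and 6.3) could then be delegated to an off-the-shelf implementation such as scikit-learn's IsolationForest, fitted on rows of these features; that part is not shown. The function name iforest_features is hypothetical.

```python
import statistics

MINUTES_PER_DAY = 1440  # assumption: one data point per minute


def iforest_features(x, t):
    """Hypothetical extraction of the six features of step (6.1) for minute t of series x.

    Assumption: "the previous 5 days" is read as the same time 5 days earlier.
    """
    if t < 5 * MINUTES_PER_DAY + 5:
        raise ValueError("need more than 5 days of history before t")

    def same_time_mean(days_back):
        # Mean of the 11 points from 5 minutes before to 5 minutes after the same time `days_back` days ago.
        centre = t - days_back * MINUTES_PER_DAY
        return statistics.fmean(x[centre - 5:centre + 6])

    current = x[t]
    return [
        current,
        x[t - 1] - current,
        x[t - 2] - current,
        x[t - 5] - current,
        same_time_mean(1) - current,
        same_time_mean(5) - current,
    ]
```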
As shown in Fig. 3, the deep learning-based anomaly detection flow comprises the following steps:
(7.1) Data segmentation: the time series is converted into supervised samples with a sliding window. For each value to be predicted, the window extracts as features the data of the preceding hour and the data from 5 minutes before to 5 minutes after the same time on each of the previous 10 days.
(7.2) Differencing: the value at the same time on the previous day is subtracted from the current value. This removes seasonal and periodic effects and makes the data stationary.
(7.3) Normalization: the data is min-max normalized, which improves the accuracy of subsequent model training.
(7.4) LSTM model training: an LSTM prediction model is trained following the LSTM algorithm.
(7.5) The value to be tested is fed into the model, and its anomaly label is obtained by comparing the predicted value with the actual value.
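Steps (7.1) to (7.3) and a possible decision rule for step (7.5) are sketched below in Python. Training the LSTM itself (step 7.4) needs a deep learning framework and is omitted. The sketch assumes one point per minute. The text says only that predicted and actual values are compared, so the rule that flags residuals more than three standard deviations from the mean residual is an assumption. All names are hypothetical.

```python
import statistics

DAY = 1440  # assumption: one data point per minute


def day_difference(x):
    """(7.2) x[t] - x[t - DAY]; the first DAY points have no reference value and are dropped."""
    return [x[t] - x[t - DAY] for t in range(DAY, len(x))]


def min_max(x):
    """(7.3) Min-max scaling to [0, 1]; returns lo/hi so new data can be scaled the same way."""
    lo, hi = min(x), max(x)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in x], lo, hi


def make_sample(x, t, hour=60, days=10, half_width=5):
    """(7.1) Features for predicting x[t]: the previous hour plus +/- half_width
    minutes around the same time on each of the previous `days` days."""
    if t < max(hour, days * DAY + half_width):
        raise ValueError("not enough history before t")
    features = list(x[t - hour:t])
    for d in range(1, days + 1):
        centre = t - d * DAY
        features.extend(x[centre - half_width:centre + half_width + 1])
    return features, x[t]


def residual_labels(actual, predicted, n=3.0):
    """(7.5) Assumed rule: flag points whose prediction error lies more than n std from the mean error."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    return [abs(r - mu) > n * sigma for r in residuals]
```

Keeping lo and hi from the training data allows new samples to be scaled identically before they reach the model.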
Design of the anomaly quantification scoring submodule: the invention provides two anomaly quantification scoring algorithms, each described in detail below.
The sigmoid-based quantification scoring algorithm comprises the following steps:
(8.1) Check the anomaly label. If the label is normal, the anomaly score is 0; otherwise, proceed to step (8.2).
(8.2) Normalize the index data to remove the effect of differing magnitudes (units) across indexes on the anomaly score.
(8.3) Apply the sigmoid function to obtain the anomaly score. The sigmoid function maps values into the range 0 to 1. This removes scale differences between data and simplifies the subsequent calculation of overall system health. The sigmoid function is:
$$S(x) = \frac{1}{1 + e^{-x}}$$
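A minimal sketch of steps (8.1) to (8.3) follows. It assumes that step (8.2) is a z-score against the index's own history and that the sigmoid is applied to the absolute z-score. Both choices are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def sigmoid(z):
    """S(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_anomaly_score(value, is_anomaly, history):
    """Steps (8.1)-(8.3) under assumptions: z-score normalization against history, then sigmoid of |z|."""
    # (8.1) Normal points score 0.
    if not is_anomaly:
        return 0.0
    # (8.2) Remove scale differences between indexes.
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    z = abs(value - mu) / sigma if sigma > 0 else 0.0
    # (8.3) Squash into (0, 1); since z >= 0 the result lies in [0.5, 1).
    return sigmoid(z)
```

Because |z| is never negative, every flagged point scores at least 0.5 and therefore always ranks above normal points, whose score is 0. A drop and a spike of the same size receive the same score.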
The isolation forest-based quantification scoring algorithm comprises the following steps:
(9.1) Check the anomaly label. If the label is normal, the anomaly score is 0; otherwise, proceed to step (9.2).
(9.2) Compute the PathLength, i.e. the length of the partition path, in preparation for computing the degree of anomaly in the next step. The formula is:
$$h(x) = e + c(T.\mathrm{size})$$
where e is the number of edges, i.e. the number of splits, that sample x passes through from the root node to a leaf node of the tree. T.size is the number of samples that fall into the same leaf node as x, and c(T.size) is a correction term, namely the average path length of a binary tree built from T.size samples. For n > 2, c(n) is computed as:
$$c(n) = 2\left(\ln(n-1) + 0.5772156649\right) - \frac{2(n-1)}{n}$$
where 0.5772156649 is Euler's constant.
Finally, the path length is normalized so that the anomaly score of the sample falls between 0 and 1. The formula is:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where h(x) is the PathLength of the sample on one iTree, E(h(x)) is the mean PathLength of the sample over the t iTrees, and c(n) is the average path length of a binary search tree (BST) built from n samples.
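The three formulas of step (9.2) translate directly into the Python sketch below. The text does not state the cases c(2) = 1 and c(n) = 0 for n ≤ 1; they follow the usual isolation forest convention. The function names mirror the symbols above and are otherwise hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of a binary search tree built from n samples."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def path_length(e, leaf_size):
    """h(x) = e + c(T.size): edges walked plus a correction for the unsplit leaf."""
    return e + c(leaf_size)


def iforest_score(path_lengths, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)), with E(h(x)) the mean over the t iTrees."""
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / c(n))
```

Scores approaching 1 mark points that are isolated after very few splits. A score of exactly 0.5 corresponds to an average path length equal to c(n).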
Design of the system health evaluation module: as shown in Fig. 4, the system health evaluation flow comprises the following steps:
(10.1) Determine whether the key index is response time or success rate. If it is, execute step (10.2); otherwise, jump to step (10.4).
(10.2) Merge the index data from the different access channels (URLs) of the same system along the system dimension, and sort it in chronological order.
(10.3) Using the index anomaly degree calculation, sum and average the anomaly degrees at each point in time to obtain the system-level anomaly degrees of response time and success rate.
(10.4) Set weights for the four key indexes with an index weighting system based on the analytic hierarchy process (AHP), and obtain the overall system health by weighted summation. The formula is:
$$\text{system health} = 100 - \text{qps\_rate} \cdot \text{QPS} - \text{current\_rate} \cdot \text{Concurrent} - \text{SUC\_rate} \cdot \text{SUC} - \text{RTime\_rate} \cdot \text{RTime}$$
where:
- qps_rate is the weight of the request count, and QPS is the anomaly degree of the request count;
- current_rate is the weight of the concurrency, and Concurrent is the anomaly degree of the concurrency;
- SUC_rate is the weight of the success rate, and SUC is the anomaly degree of the success rate;
- RTime_rate is the weight of the average response time, and RTime is the anomaly degree of the average response time.
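A Python sketch of steps (10.2) to (10.4) follows. The text does not say how the AHP weights are derived from the expert comparison matrix or how they are scaled. Here they come from the row geometric-mean method, a common approximation of the AHP priority vector. The weights sum to 1 and are multiplied by 100, so that health stays within [0, 100] whenever every anomaly degree lies in [0, 1]. The function names are hypothetical, and the indexes are ordered as QPS, Concurrent, SUC, RTime.

```python
import math
from collections import defaultdict


def merge_url_anomalies(records):
    """(10.2)-(10.3): records are (timestamp, url, anomaly_degree) for one system;
    anomaly degrees are averaged across URLs per timestamp, then sorted by time."""
    buckets = defaultdict(list)
    for ts, _url, degree in records:
        buckets[ts].append(degree)
    return [(ts, sum(v) / len(v)) for ts, v in sorted(buckets.items())]


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method (sums to 1)."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def system_health(anomalies, weights):
    """(10.4) 100 - sum(100 * weight * anomaly); weights are assumed to sum to 1."""
    return 100.0 - sum(100.0 * w * a for w, a in zip(weights, anomalies))
```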
The above embodiments describe specific implementations of the invention in considerable detail to aid understanding of its method and core ideas. They should not be construed as limiting the scope of the invention. Those skilled in the art may make various modifications and improvements without departing from the concept of the invention, and all of these fall within its scope of protection. Accordingly, the scope of protection of the invention is defined by the appended claims.
Claims (10)
1. An intelligent monitoring and evaluating method for system health based on anomaly detection, characterized by comprising the following steps:
(1) Key index extraction: the collected system monitoring data is extracted and preprocessed to obtain key index data, where extraction comprises key index selection and index data extraction; the system monitoring data is specifically the log file data recorded by the load balancer for each monitored request; the key indexes are the four indexes that can be extracted from the log file: success rate, response time, request count, and concurrency;
(2) Index anomaly detection: an unsupervised index anomaly detection model is designed for the processed key index data and used to perform anomaly detection on massive data; this step specifically comprises index morphology classification, an anomaly detection algorithm, and anomaly quantification scoring;
(3) System health evaluation: an expert system based on the analytic hierarchy process computes a weighted sum of the health of the key indexes, combined with the index anomaly detection results, to obtain the overall system health evaluation result; the expert system is a weighting system built from the opinions of operation and maintenance experts.
2. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 1, wherein the key index extraction in step (1) comprises the following sub-steps:
(1.1) Obtain the system monitoring log data and perform key index selection: extract the key indexes from the raw data and retain the required index fields;
(1.2) Data partitioning: partition the data obtained in step (1.1) by system name and time, with a time interval of one month;
(1.3) For each system and access channel (URL), set up counters for the four key indexes at a granularity of one minute, where response time and success rate are counted per URL, and request count and concurrency are counted per system;
(1.4) Parse the incoming monitoring log records one by one, extract and compute the key indexes in each record, and update the counter state;
(1.5) Repeat steps (1.3) and (1.4) until all data of the given month for the system has been read;
(1.6) Using the key index counters, compute and store the success rate, response time, request count, and concurrency for every minute of that month for that system;
(1.7) Repeat steps (1.3) to (1.6) until all monitoring log data has been processed;
(1.8) Denoising: denoise the data with the 3-sigma rule to reduce the influence of noise;
(1.9) Missing value filling: fill missing values in the data with the mean;
(1.10) Timestamp conversion: convert the timestamps in the data to a unified format for subsequent processing;
(1.11) Chronological ordering: sort the data in chronological order.
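Steps (1.3) to (1.9) can be sketched in Python as below. The log record layout, the rule that HTTP status codes below 400 count as successes, and the treatment of 3-sigma outliers as missing values to be filled in step (1.9) are all assumptions. Concurrency is not computed because the text does not say how it is derived from the log. All names are hypothetical.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical parsed load-balancer log line."""
    ts: int        # Unix timestamp, seconds
    system: str
    url: str
    status: int    # HTTP status code
    rt_ms: float   # response time, milliseconds


def minute_counters(records):
    """(1.3)-(1.6): per-minute success rate and mean response time per (system, url, minute),
    and request count per (system, minute)."""
    url_stats = defaultdict(lambda: [0, 0, 0.0])  # requests, successes, total response time
    sys_requests = defaultdict(int)
    for r in records:
        minute = r.ts // 60
        stats = url_stats[(r.system, r.url, minute)]
        stats[0] += 1
        stats[1] += 1 if r.status < 400 else 0  # assumption: status < 400 means success
        stats[2] += r.rt_ms
        sys_requests[(r.system, minute)] += 1
    per_url = {key: {"success_rate": ok / n, "avg_rt_ms": rt / n}
               for key, (n, ok, rt) in url_stats.items()}
    return per_url, dict(sys_requests)


def three_sigma_denoise(values):
    """(1.8): mark points more than 3 std from the mean as missing (None)."""
    present = [v for v in values if v is not None]
    mu, sd = statistics.fmean(present), statistics.pstdev(present)
    return [None if v is None or (sd > 0 and abs(v - mu) > 3 * sd) else v for v in values]


def mean_fill(values):
    """(1.9): replace missing values with the mean of the remaining ones."""
    mu = statistics.fmean([v for v in values if v is not None])
    return [mu if v is None else v for v in values]
```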
3. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 1, wherein the index anomaly detection in step (2) specifically comprises index morphology classification, an anomaly detection algorithm, and anomaly quantification scoring.
4. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 3, wherein the index morphology classification specifically comprises the following sub-steps:
(3.1) Obtain the key index data and check the data volume; an index whose data volume is below a threshold is classified as the other type; otherwise, proceed to step (3.2);
(3.2) Normalize and difference the data following the same-period (same-ratio) comparison idea, and determine whether the data is periodic; if it is, the data is of the periodic type; otherwise, proceed to step (3.3);
(3.3) Obtain the local fluctuation and the global fluctuation of the data using a wavelet transform and the overall variance; if the global fluctuation is far greater than the local fluctuation, the data is of the trend type; if the global fluctuation is approximately equal to the local fluctuation, the data is stable, and the process proceeds to step (3.4); if neither condition holds, the data is of the other type;
(3.4) Compute the mean and the maximum of the data, where the mean represents the overall distribution of the data and the maximum represents the distribution of outliers; compare how far the outliers deviate from the overall distribution; if the deviation exceeds a threshold, the data is of the abrupt-change stable type; otherwise, it is of the regular stable type.
5. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 3, wherein the anomaly detection algorithm uses a machine learning-based method or a deep learning-based method. The machine learning-based method detects the periodic, trend, and stable types with two different anomaly detection algorithms, a fixed threshold method and an isolation forest method. Periodic and trend indexes are space-converted into regular stable index data; regular stable index data is detected with the fixed threshold method, and abrupt-change stable index data is detected with the isolation forest method. The deep learning-based method performs anomaly detection on periodic data only. The periodic data is input into a deep learning model for training and prediction, and the difference between the predicted value and the actual value is compared to obtain the anomaly detection result.
6. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 5, wherein the anomaly detection algorithm based on machine learning specifically comprises the following sub-steps:
(4.1) Index data space conversion: periodic data is converted into stable data by same-period (same-ratio) space conversion, and trend data is converted into stable data by period-over-period (ring-ratio) space conversion;
The formula of the same-period space conversion is as follows:
where x_t is the data at the current time, K is the number of periods, T is the period of the current index data, w is the configured time window size, mean() is the mean of the data, and std() is the standard deviation of the data.
The same-period conversion computes the mean and standard deviation of the historical data in the same time window. It then computes the same-period value of the current point, i.e. the current value minus the mean, divided by the standard deviation;
The formula of the period-over-period space conversion is as follows:
where x_t is the data at the current time and w is the configured time window size.
The period-over-period conversion computes the relative change between the means of the two most recent windows of the data;
(4.2) Index data anomaly detection: the converted periodic data, the converted trend data, and the regular stable data are checked with the fixed threshold method, and the abrupt-change stable data is checked with the isolation forest method;
The deep learning-based anomaly detection flow comprises the following steps:
(5.1) Data segmentation: the time series is converted into supervised samples with a sliding window; for each value to be predicted, the window extracts as features the data of the preceding hour and the data from 5 minutes before to 5 minutes after the same time on each of the previous 10 days;
(5.2) Differencing: the value at the same time on the previous day is subtracted from the current value;
(5.3) Normalization: the data is min-max normalized;
(5.4) LSTM model training: an LSTM prediction model is trained following the LSTM algorithm;
(5.5) The value to be tested is fed into the model to obtain its anomaly label.
7. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 6, wherein the fixed threshold anomaly detection comprises the following sub-steps:
(6.1) Data conversion: the data is divided into windows with a time granularity of five minutes, and the mean of the index data within each window is computed; this conversion suppresses invalid alarms caused by single-point spikes (glitches) and lowers the false alarm rate;
(6.2) N-sigma model training: following the N-sigma principle, the threshold of the index is computed from the historical data of step (1);
(6.3) The value to be tested is fed into the model to obtain its anomaly label.
8. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 6, wherein the isolation forest anomaly detection comprises the following sub-steps:
(7.1) Index feature selection: anomalies are judged jointly from multidimensional features, and six features are used: the current value; the difference between the value 1 minute earlier and the current value; the difference between the value 2 minutes earlier and the current value; the difference between the value 5 minutes earlier and the current value; the difference between the mean of the 5 minutes before and after the same time on the previous day and the current value; and the difference between the mean of the 5 minutes before and after the same time on the previous 5 days and the current value;
(7.2) Isolation forest model training: an isolation forest model is trained on the extracted feature values following the isolation forest algorithm;
(7.3) The value to be tested is fed into the model to obtain its anomaly label.
9. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 1, wherein the anomaly quantification scoring in step (2) comprises two anomaly quantification scoring algorithms, namely a quantification scoring algorithm based on the sigmoid function and a quantification scoring algorithm based on the isolation forest method, specifically:
The sigmoid-based quantification scoring algorithm comprises the following sub-steps:
(6.1) Check the anomaly label; if the label is normal, the anomaly score is 0; otherwise, proceed to step (6.2);
(6.2) Normalize the index data to remove the effect of differing magnitudes (units) across indexes on the anomaly score;
(6.3) Apply the sigmoid function to obtain the anomaly score; the sigmoid function maps values into the range 0 to 1, removing scale differences between data and simplifying the subsequent calculation of overall system health; the sigmoid function is:
$$S(x) = \frac{1}{1 + e^{-x}}$$
The isolation forest-based quantification scoring algorithm comprises the following sub-steps:
(7.1) Check the anomaly label; if the label is normal, the anomaly score is 0; otherwise, proceed to step (7.2);
(7.2) Compute the PathLength, i.e. the length of the partition path, in preparation for computing the degree of anomaly in the next step; the formula is:
$$h(x) = e + c(T.\mathrm{size})$$
where e is the number of edges, i.e. the number of splits, that sample x passes through from the root node to a leaf node of the tree; T.size is the number of samples that fall into the same leaf node as x; and c(T.size) is a correction term, namely the average path length of a binary tree built from T.size samples; for n > 2, c(n) is computed as:
$$c(n) = 2\left(\ln(n-1) + 0.5772156649\right) - \frac{2(n-1)}{n}$$
where 0.5772156649 is Euler's constant;
Finally, the path length is normalized so that the anomaly score of the sample falls between 0 and 1; the formula is:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where h(x) is the PathLength of the sample on one iTree; E(h(x)) is the mean PathLength of the sample over the t iTrees; and c(n) is the average path length of a binary search tree (BST) built from n samples.
10. The intelligent monitoring and evaluating method for system health based on anomaly detection according to claim 1, wherein the system health evaluation module flow in step (3) comprises the following sub-steps:
(8.1) Determine whether the key index is response time or success rate; if it is, execute step (8.2); otherwise, jump to step (8.4);
(8.2) Merge the index data from the different access channels (URLs) of the same system along the system dimension, and sort it in chronological order;
(8.3) Compute the anomaly degree of the merged index, summing and averaging the anomaly degrees at each point in time to obtain the system-level anomaly degrees of response time and success rate;
(8.4) Set weights for the four key indexes with an index weighting system based on the analytic hierarchy process, and obtain the overall system health by weighted summation; the formula is:
$$\text{system health} = 100 - \text{qps\_rate} \cdot \text{QPS} - \text{current\_rate} \cdot \text{Concurrent} - \text{SUC\_rate} \cdot \text{SUC} - \text{RTime\_rate} \cdot \text{RTime}$$
where qps_rate is the weight of the request count and QPS is the anomaly degree of the request count; current_rate is the weight of the concurrency and Concurrent is the anomaly degree of the concurrency; SUC_rate is the weight of the success rate and SUC is the anomaly degree of the success rate; and RTime_rate is the weight of the average response time and RTime is the anomaly degree of the average response time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310284349.1A CN116383645A (en) | 2023-03-22 | 2023-03-22 | Intelligent system health degree monitoring and evaluating method based on anomaly detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310284349.1A CN116383645A (en) | 2023-03-22 | 2023-03-22 | Intelligent system health degree monitoring and evaluating method based on anomaly detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383645A true CN116383645A (en) | 2023-07-04 |
Family
ID=86966737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310284349.1A Pending CN116383645A (en) | 2023-03-22 | 2023-03-22 | Intelligent system health degree monitoring and evaluating method based on anomaly detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116383645A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117538491A (en) * | 2024-01-09 | 2024-02-09 | 武汉怡特环保科技有限公司 | Station room air quality intelligent monitoring method and system |
CN117538491B (en) * | 2024-01-09 | 2024-04-05 | 武汉怡特环保科技有限公司 | Station room air quality intelligent monitoring method and system |
CN118656741A (en) * | 2024-08-14 | 2024-09-17 | 南开大学 | Intelligent operation and maintenance method based on time sequence data |
CN118656741B (en) * | 2024-08-14 | 2024-10-29 | 南开大学 | Intelligent operation and maintenance method based on time sequence data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||