CN111737099B - Data center anomaly detection method and device based on Gaussian distribution - Google Patents


Publication number: CN111737099B
Authority: CN (China)
Legal status: Active
Application number: CN202010515936.3A
Other languages: Chinese (zh)
Other versions: CN111737099A (en)
Inventors: 许明杰, 俞俊, 陈琰, 卢士达, 王琳, 梅竹, 陈海洋, 庞恒茂
Current assignees: NARI Group Corp; Nari Technology Co Ltd; State Grid Shanghai Electric Power Co Ltd; State Grid Electric Power Research Institute
Original assignees: NARI Group Corp; Nari Technology Co Ltd; State Grid Shanghai Electric Power Co Ltd; State Grid Electric Power Research Institute
Application filed by NARI Group Corp, Nari Technology Co Ltd, State Grid Shanghai Electric Power Co Ltd and State Grid Electric Power Research Institute
Priority to CN202010515936.3A
Publication of CN111737099A
Application granted
Publication of CN111737099B

Classifications

    • G06F11/3447 Performance evaluation by modeling
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Abstract

The invention provides a data center anomaly detection method and device based on Gaussian distribution. The method comprises the following steps: acquiring features of the hardware level, the software level and the physical environment of data center servers to form a multi-dimensional feature data set; performing dimensionality reduction on the acquired multi-dimensional feature data set; and computing on the dimension-reduced data with an anomaly detection model based on Gaussian distribution to obtain an anomaly detection result. Building on a Gaussian-distribution-based anomaly detection algorithm, the invention provides an anomaly monitoring method suited to high-density data centers, which can improve the efficiency of data center anomaly monitoring and reduce the management cost of data centers with high-density designs.

Description

Data center anomaly detection method and device based on Gaussian distribution
Technical Field
The invention relates to an anomaly detection method and device for data centers and belongs to the technical field of data centers.
Background
With the advent of the big data era, data centers (IDC for short) have developed rapidly [1]. According to the Data Center White Paper (2018) [2], the number of data centers worldwide is trending downward. Since 2017, as large-scale and intensive construction concepts have taken hold, the construction scale of data centers has kept growing, while efficient operation and maintenance (O&M) management and the loss of skilled staff have become prominent problems: O&M personnel are in short supply, and O&M capability cannot keep pace with data center construction. In the big data era, the large volume of data generated by networks flows into data centers, which are therefore required to be high-density, green and easy to manage [3]. As data centers move closer to these goals, however, they become increasingly difficult to manage. In particular, device monitoring and the troubleshooting of faulty devices in IDC machine rooms have been popular research topics in recent years, and academia has not yet offered a satisfactory solution. Most data center machine rooms still rely on manual inspection and physical sensor monitoring, which is inefficient and costly.
In recent years, anomaly detection for data centers has been a research hotspot [4][5]. Two main strategies are currently used: anomaly detection based on machine learning models and anomaly detection based on statistical models. Machine-learning-based anomaly detection clusters the sample set so that each data point falls into some cluster, and then judges the relationship between data points by computing Euclidean or Manhattan distances. If a sample point lies far from every cluster, or the points of a cluster are sparse, that point or cluster is judged to be abnormal. Along this line, Yin et al. [6] proposed a robust one-class support vector machine with an adaptive penalty factor designed from the Euclidean distance between each normal data point and the center of the data set, which reduces the influence of outliers on the support vector machine. Although Yin's algorithm avoids overfitting to some extent, it tends to converge prematurely, so the resulting model may not classify well. F. Xiao [7] used linear discriminant analysis and logistic regression to learn from normal data samples, identified acceptable network behavior from what was learned, and applied it to intrusion detection; an alarm is raised when data outside the learned set is observed. Although this algorithm is highly accurate, the underlying linear regression model consumes a large amount of time and memory, which lowers its efficiency. Statistical-model-based anomaly detection extracts a feature set from the state or behavior of the observed object and builds a corresponding statistical model.
By collecting the distributions of normal and abnormal samples, it can quickly judge whether newly collected samples are abnormal from their distribution. This approach does not consume much computation time and is suited to large data streams. Huorong Ren [8] segmented sequence data with a sliding window, defined data states from the values in each window, and built a high-order Markov model whose states adapt in real time for anomaly detection. Ren's algorithm adapts to changes in the data set in real time, but it cannot take into account sample sets from all periods, so it is unsuitable as the algorithm of a data center anomaly detection system. Chen Xianda [9] used a hierarchical structure to integrate the spatial and temporal correlation of sensors, and combined sensor weights with spatial information to detect sensor anomalies in the network through a Markov chain. Extracting the effective temporal correlation after the spatial correlation has been determined improves detection accuracy and reduces communication cost. However, Chen Xianda's algorithm is designed only at the sensor level; it cannot fully reflect server anomalies and performs poorly at monitoring servers that are operating abnormally.
The above analysis shows that existing anomaly detection methods for data centers, although strong in some respects, each have their own shortcomings. How to improve both the execution efficiency of the algorithm and the accuracy of anomaly monitoring remains a problem to be solved.
The cited references are as follows:
[1] 2019 cloud computing industry in-depth report [N]. China Information Weekly, 2019-12-09(012).
[2] Data Center White Paper (2018) [R]. http://www.caict.ac.cn/kxyj/qwfb/bps/201810/t20181016_186900.htm
[3] Tengqing. Research and design of a cloud-based data center platform [J/OL]. Electronic Technology and Software Engineering, 2019(23): 173-.
[4] Zhan. Research and application of network anomaly detection [D]. Beijing University of Posts and Telecommunications, 2019.
[5] Zhou Yun. Research on operating-environment-aware virtual machine anomaly detection strategies and algorithms on cloud platforms [D]. Chongqing University, 2015.
[6] Yin S, Zhu X, Jing C. Fault detection based on a robust one class support vector machine [J]. Neurocomputing, 2014, 145: 263-268.
[7] Subba B, Biswas S, Karmakar S. Intrusion Detection Systems using Linear Discriminant Analysis and Logistic Regression [C]. India Conference, IEEE, 2016: 1-6.
[8] Ren H, Ye Z, Li Z. Anomaly detection based on a dynamic Markov model [J]. Information Sciences, 2017, 411: 52-65.
[9] Chen X, Kim K T, Youn H Y. Integration of Markov random field with Markov chain for efficient event detection using wireless sensor network [J]. Computer Communications, 2008, 31(17): 4018-4025.
[10] Tingting Pan, Junhong Zhao, Wei Wu, Jie Yang. Learning imbalanced datasets based on SMOTE and Gaussian distribution [J]. Information Sciences, 2020, 512.
Disclosure of Invention
Purpose of the invention: to address the shortcomings of the prior art, the invention provides a data center anomaly detection method and device based on Gaussian distribution that can significantly improve the accuracy of detecting abnormal servers in a data center while keeping algorithm execution efficient.
Technical solution: in a first aspect, a data center anomaly detection method based on Gaussian distribution comprises the following steps:
acquiring the characteristics of a hardware level, a software level and a physical environment of a data center server to form a multi-dimensional characteristic data set;
performing dimensionality reduction processing on the acquired multi-dimensional feature data set;
and performing computation on the dimension-reduced data with an anomaly detection model based on Gaussian distribution to obtain an anomaly detection result.
Further, the multi-dimensional feature data set is represented in the form of a matrix as follows:
$$X = \left(X_1, X_2, \ldots, X_n\right)$$
where n is the feature dimension, and each matrix element $X_d$ ($1 \le d \le n$) is a vector formed by several physical quantities, namely one of X_cpu, X_gpu, X_memory, X_disk, X_net, X_thread and X_phy: X_cpu is a series of features representing the working state of the CPU, X_gpu of the GPU, X_memory of the memory, X_disk of the disk, and X_net of the network; X_thread is a series of features representing the state of process resources; and X_phy is a series of features representing the physical environment.
Further, performing dimensionality reduction on the acquired multi-dimensional feature data set comprises:
S21. For the j-th element $x_{dj}$ of the d-th dimension feature $X_d$, compute the mean of each feature $x_{dj}$ according to equation (1):
$$\mu_{dj} = \frac{1}{m}\sum_{i=1}^{m} x_{dj}^{(i)} \quad (1)$$
where the superscript (i) is the serial number of a specific sample of the feature, and m is the number of samples collected for that element feature;
S22. Replace each $x_{dj}^{(i)}$ with $x_{dj}^{(i)} - \mu_{dj}$ and scale each feature according to equation (2):
$$x_{dj}^{(i)} := \frac{x_{dj}^{(i)} - \mu_{dj}}{\mathrm{max\_x}_{dj} - \mathrm{min\_x}_{dj}} \quad (2)$$
where $\mathrm{max\_x}_{dj}$ is the maximum value and $\mathrm{min\_x}_{dj}$ the minimum value of the j-th element feature of the d-th dimension;
S23. Substitute the values $x_{dj}^{(i)}$ obtained in S22 into equation (3) to compute the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T} \quad (3)$$
S24. Sort the eigenvectors of the covariance matrix in descending order of eigenvalue (the principal components), take the first k columns to form the matrix $U_{\mathrm{reduce}}$, and then compute the new feature values according to equation (4) to obtain the new feature matrix dataset_z:
$$z = U_{\mathrm{reduce}}^{T} x \quad (4)$$
[Equation image: new feature matrix dataset_z]
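Steps S21 to S23 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes equation (2) divides the mean-centered value by the feature's range, that equation (3) is the usual $\frac{1}{m}\sum_i x^{(i)}(x^{(i)})^T$, and that the n feature groups have already been flattened into one m × p table. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mean_normalize_and_scale(samples: Matrix) -> Matrix:
    """Sketch of S21-S22: replace each value x by (x - mu) / (max - min), column by column.

    `samples` is an m x p table: row i is the i-th sample, column j is one element
    feature x_dj. Assumption: a constant column (max == min) is set to 0 instead of
    dividing by zero; the text does not cover this case.
    """
    m, p = len(samples), len(samples[0])
    scaled = [[0.0] * p for _ in range(m)]
    for j in range(p):
        column = [row[j] for row in samples]
        mu = sum(column) / m                  # S21: mean of the feature
        spread = max(column) - min(column)    # S22: range used for scaling
        for i in range(m):
            scaled[i][j] = (column[i] - mu) / spread if spread else 0.0
    return scaled


def covariance_matrix(scaled: Matrix) -> Matrix:
    """Sketch of S23: Sigma = (1/m) * sum_i x^(i) (x^(i))^T over the mean-normalized rows."""
    m, p = len(scaled), len(scaled[0])
    sigma = [[0.0] * p for _ in range(p)]
    for row in scaled:
        for a in range(p):
            for b in range(p):
                sigma[a][b] += row[a] * row[b]
    return [[value / m for value in r] for r in sigma]


# Hypothetical readings: two samples of (cpu_load, a feature that never changes).
scaled = mean_normalize_and_scale([[0.0, 5.0], [10.0, 5.0]])
print(scaled)                     # [[-0.5, 0.0], [0.5, 0.0]]
print(covariance_matrix(scaled))  # [[0.25, 0.0], [0.0, 0.0]]
```

No further centering is applied in `covariance_matrix` because the rows have already been mean-normalized in S22.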
Further, the anomaly detection model based on Gaussian distribution is generated as follows:
denote the set of the k features after dimensionality reduction as set χ, place the first element of χ into an empty set κ, and then repeat the following operations until χ is empty:
a) compute, according to the Gaussian distribution, the distribution of the first column of feature values in χ, denoted $P_{\mathrm{first}}(x)$, and compute the correlation coefficient r between $P_{\mathrm{first}}(x)$ and each distribution in κ;
b) when |r| is greater than a specified threshold, compute the η matrix and the s matrix of the two distributions to form a multivariate high-density data center distribution, denoted the Hdd distribution, remove $P_{\mathrm{first}}(x)$ from χ, and end the current iteration;
c) otherwise, put $P_{\mathrm{first}}(x)$ into set κ and return to step a);
the η matrix and the s matrix are calculated as follows:
[Equation images: calculation of the η matrix and the s matrix]
where $\eta \in \mathbb{R}^n$, $s \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^n$; η is the mean vector of the multivariate Hdd distribution, s is its covariance matrix, f is its intermediate parameter vector, formed by dividing the corresponding elements of η and s, p(x) is its probability density function, $x^{(i)}$ denotes the i-th feature, and m denotes the number of samples of that feature.
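If η and s are taken to be the usual sample mean vector and covariance matrix of the merged features (an assumption, since their defining equations survive only as images), they could be computed as in this minimal sketch. `eta_and_s` is a hypothetical helper, and f and p(x) are deliberately not computed because their exact form is not given in the text.

```python
from typing import List, Tuple


def eta_and_s(columns: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical helper: mean vector eta and covariance matrix s of a merged feature group.

    `columns` holds one list of m samples per feature in the group. Assumes the usual
    maximum-likelihood estimators (divide by m), which the text does not spell out.
    """
    n, m = len(columns), len(columns[0])
    eta = [sum(col) / m for col in columns]
    s = [
        [
            sum((columns[a][i] - eta[a]) * (columns[b][i] - eta[b]) for i in range(m)) / m
            for b in range(n)
        ]
        for a in range(n)
    ]
    return eta, s


# Two strongly correlated features, two samples each (hypothetical values).
eta, s = eta_and_s([[1.0, 3.0], [2.0, 6.0]])
print(eta)  # [2.0, 4.0]
print(s)    # [[1.0, 2.0], [2.0, 4.0]]
```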
Further, obtaining the anomaly detection result by computing with the anomaly detection model based on Gaussian distribution comprises:
after all distributions in set κ have been obtained from the anomaly detection model, obtaining the multivariate probability density function of each distribution, computing the probability value of each distribution from the dimension-reduced data, and, when the probability value is greater than a specified threshold, determining that an anomaly has occurred and identifying the dimension in which it occurs.
In a second aspect, a data center anomaly detection device based on Gaussian distribution comprises:
the data acquisition module is used for acquiring the characteristics of a hardware layer, a software layer and a physical environment of the data center server to form a multi-dimensional characteristic data set;
the preprocessing module is used for performing dimensionality reduction processing on the acquired multi-dimensional feature data set;
and the anomaly detection module is used for calculating by using an anomaly detection model based on Gaussian distribution according to the data subjected to the dimension reduction processing to obtain an anomaly detection result.
In a third aspect, a computer device comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and, when executed by the processors, implement the steps of the Gaussian-distribution-based data center anomaly detection method according to the first aspect of the invention.
Beneficial effects: building on a Gaussian-distribution-based anomaly detection algorithm, the invention provides an anomaly monitoring method suited to high-density data centers. The method acquires the operating features of the server's physical devices and software layer, captures potentially abnormal data objects in real time, reduces the dimensionality of the data to extract the factors that strongly influence anomalies, and applies an improved Gaussian probability model that measures multiple factors jointly, avoiding the detection errors caused by single-factor detection. The method can effectively improve the accuracy of detecting abnormal servers in high-density data centers, executes efficiently, and helps reduce the management cost of data centers with high-density designs.
Drawings
Fig. 1 is a flowchart of an anomaly detection method for a data center according to an embodiment of the present invention.
Detailed Description
The technical solution of the invention is further explained below with reference to the accompanying drawings.
Traditional data centers rely on staff and physical sensors for patrol inspection and risk analysis, which consumes considerable manpower and material resources. Building on a Gaussian-distribution-based anomaly detection algorithm, the invention provides an anomaly monitoring method suited to high-density data centers, which can improve the efficiency of data center anomaly monitoring and reduce the management cost of data centers with high-density designs.
As shown in Fig. 1, in one embodiment, the Gaussian-distribution-based data center anomaly detection method comprises the following steps:
Step S10: acquire the features of the physical devices and the software layer of the data center server.
For feature selection, this embodiment selects 300 features from the physical devices and software layer of the server. They mainly come from the CPU, the GPU, the hard disk, the memory, the motherboard, the power supply, the physical environment in which the hardware is located, computation, storage, processes and network throughput, plus some other composite features. Examples of the extracted features are as follows:
First, a series of features extracted from the CPU, such as CPU load, the share of CPU time spent waiting for I/O operations and the share of time in the idle state, is denoted:
X_cpu=(cpu_load,cpu_iowait,cpu_free,......,cpu_sys)
Second, a series of features extracted from the GPU, such as GPU load, the share of GPU time spent waiting for I/O operations and the share of time in the idle state, is denoted:
X_gpu=(gpu_load,gpu_iowait,gpu_free,......,gpu_sys)
Third, a series of features extracted from the memory, such as the amount of free memory, the read rate from memory per second, the write rate to memory per second and the memory access rate, is denoted:
X_memory=(memory_free,memory_read,memory_write,......,memory_visit)
Fourth, a series of features extracted from the disk, such as disk I/O throughput, the number of disk accesses, the read rate from disk per second and the write rate to disk per second, is denoted:
X_disk=(disk_io,num_of_disk_acc/sec,......,disk_read)
Fifth, features extracted from the physical environment, such as temperature, humidity, temperature difference and fan speed, are denoted:
X_phy=(tem,hum,tem_dval,......,cpu_fan_rate)
Sixth, a series of features extracted from the server's network throughput, such as the volume of data received per second, the volume of data sent per second, the network load rate, the number of packets received and the number of packets lost, is denoted:
X_net=(net_re,net_send,net_pac_re,......,net_load)
Seventh, features extracted from process resources, such as the memory, shared memory and CPU share occupied by a process, are denoted:
X_thread=(thread_mem_size,thread_share_size,thread_cpu,......,thread_time)
For the features obtained above, an unlabeled feature sample is defined as:
X ≡ (X_cpu, X_gpu, X_memory, X_disk, X_net, X_thread, X_phy)
It should be understood that the seven-dimensional feature content above is illustrative only and does not restrict the method of the invention to these same features. Because hardware facilities, physical environments and maintenance priorities differ between data centers, the corresponding feature items can be selected according to the actual situation.
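A minimal sketch of assembling one unlabeled sample X from the seven feature groups listed above. The dictionary layout, the key-sorting rule and `build_sample` are hypothetical choices for illustration; a real deployment would use whichever feature items it selected.

```python
from typing import Dict, List

# Order of the seven feature groups in X = (X_cpu, X_gpu, X_memory, X_disk, X_net, X_thread, X_phy).
GROUP_ORDER = ["X_cpu", "X_gpu", "X_memory", "X_disk", "X_net", "X_thread", "X_phy"]


def build_sample(groups: Dict[str, Dict[str, float]]) -> List[List[float]]:
    """Return one unlabeled sample X as n = 7 vectors, one per feature group.

    Keys inside each group are sorted so that the j-th element of X_d always refers
    to the same physical quantity across samples.
    """
    missing = [g for g in GROUP_ORDER if g not in groups]
    if missing:
        raise KeyError(f"missing feature groups: {missing}")
    return [[groups[g][key] for key in sorted(groups[g])] for g in GROUP_ORDER]


# Hypothetical readings; groups without readings are left empty here.
readings: Dict[str, Dict[str, float]] = {g: {} for g in GROUP_ORDER}
readings["X_cpu"] = {"cpu_load": 0.72, "cpu_iowait": 0.05, "cpu_free": 0.23}
readings["X_phy"] = {"tem": 24.5, "hum": 0.41}
X = build_sample(readings)
print(X[0])  # [0.23, 0.05, 0.72]  (cpu_free, cpu_iowait, cpu_load)
```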
Step S20: perform dimensionality reduction on the acquired feature data set.
The acquired feature data sets form a matrix, denoted:
$$X = \left(X_1, X_2, \ldots, X_n\right)$$
where each matrix element $X_d$ is a vector, namely one of the entries of X ≡ (X_cpu, X_gpu, X_memory, X_disk, X_net, X_thread, X_phy), each of which is a collection of several physical quantities; n is the server feature dimension acquired in step S10, and in this embodiment n = 7.
The dimensionality reduction is carried out according to the following steps:
S21. For the j-th element $x_{dj}$ of the d-th dimension feature $X_d$ ($1 \le d \le n$), compute the mean of each feature $x_{dj}$ according to equation (1):
$$\mu_{dj} = \frac{1}{m}\sum_{i=1}^{m} x_{dj}^{(i)} \quad (1)$$
The superscript (i) denotes a specific sample number. Following step S10, take the CPU dimension feature X_cpu as $X_1$; its first feature $x_{11}$ is cpu_load, m is the number of samples collected for this element feature, and $\mu_{11}$ is the mean of the m collected cpu_load values;
S22. Replace each $x_{dj}^{(i)}$ with $x_{dj}^{(i)} - \mu_{dj}$ and scale each feature according to equation (2):
$$x_{dj}^{(i)} := \frac{x_{dj}^{(i)} - \mu_{dj}}{\mathrm{max\_x}_{dj} - \mathrm{min\_x}_{dj}} \quad (2)$$
where $\mathrm{max\_x}_{dj}$ is the maximum value and $\mathrm{min\_x}_{dj}$ the minimum value of the j-th element feature of the d-th dimension;
S23. Substitute the values $x_{dj}^{(i)}$ obtained in S22 into equation (3) to compute the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T} \quad (3)$$
where $x^{(i)}$ is the vector of scaled feature values of the i-th sample; the covariance matrix is computed from the matrix formed by the sample values of the different features $x_{dj}$.
S24. Sort the eigenvectors of the covariance matrix in descending order of eigenvalue (the principal components), take the first k columns to form the matrix $U_{\mathrm{reduce}}$, and then compute the new feature values according to equation (4) to obtain the new feature matrix dataset_z shown in equation (5).
$$z = U_{\mathrm{reduce}}^{T} x \quad (4)$$
[Equation (5): equation image defining the new feature matrix dataset_z]
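Step S24 amounts to a principal component projection. The sketch below assumes the standard PCA reading, in which $U_{\mathrm{reduce}}$ holds the eigenvectors of the covariance matrix with the largest eigenvalues; to stay dependency-free it uses power iteration with deflation in place of a library eigen-decomposition. `top_k_eigenvectors` and `project` are hypothetical names.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def _matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def top_k_eigenvectors(sigma: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Return k unit eigenvectors of a symmetric PSD matrix, largest eigenvalue first.

    The returned vectors are the columns of U_reduce.
    """
    rng = random.Random(seed)
    a = [row[:] for row in sigma]
    p = len(a)
    columns: Matrix = []
    for _ in range(k):
        v = _normalize([rng.uniform(-1.0, 1.0) for _ in range(p)])
        for _ in range(iters):
            w = _matvec(a, v)
            if not any(w):  # remaining spectrum is zero; keep the current direction
                break
            v = _normalize(w)
        lam = sum(x * y for x, y in zip(v, _matvec(a, v)))  # Rayleigh quotient = eigenvalue
        columns.append(v)
        # Deflation: subtract lam * v v^T so the next pass converges to the next eigenvector.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return columns


def project(x: List[float], u_reduce_columns: Matrix) -> List[float]:
    """Equation (4): z = U_reduce^T x, with U_reduce given as its list of k columns."""
    return [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in u_reduce_columns]


u_reduce = top_k_eigenvectors([[2.0, 1.0], [1.0, 2.0]], k=1)
print(project([1.0, 1.0], u_reduce))  # magnitude sqrt(2); the eigenvector sign is arbitrary
```

In practice a library routine for symmetric matrices would normally replace the power iteration; the projection step is the same either way.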
Step S30: based on the dimension-reduced data, perform computation with the anomaly detection model based on Gaussian distribution to obtain an anomaly detection result.
Because applying the ordinary Gaussian distribution directly to anomaly detection for data center servers produces large errors and unsatisfactory results, the invention proposes a new probability distribution function based on the Gaussian distribution.
The general distribution for a High-density data center (Hdd) is defined as follows:
$$X \sim \mathrm{Hdd}(\mu, \sigma^{2}, t) \quad (6)$$
[Equations (7) to (10): equation images defining the general Hdd distribution]
In the general distribution, $\mu_j$ denotes the mean, $\sigma_j$ denotes the standard deviation, t is an intermediate value, and f(x) denotes the probability density function. The multivariate Hdd distribution is defined as follows:
$$X \sim \mathrm{MultHdd}(\eta, s, f) \quad (11)$$
[Equations (12) to (14): equation images defining the multivariate Hdd distribution]
where $\eta \in \mathbb{R}^n$, $s \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^n$; η is the mean vector of the multivariate Hdd distribution, s is its covariance matrix, f is its t-parameter vector, formed by dividing the elements of η by the corresponding elements of s, and p(x) is its probability density function.
Assuming x is a k-dimensional feature vector, then:
[Equation image defining $P_{\mathrm{Hddad}}(x)$]
$P_{\mathrm{Hddad}}$ denotes the probability distribution function of the multivariate distribution, where Hddad stands for high-density data center anomaly detection (Hdd anomaly detection), and $P_{\mathrm{Hdd}}(x; \mu, \sigma^{2}, t)$ and $P_{\mathrm{MultHdd}}(x; \eta, s, f)$ denote the general Hdd distribution and the multivariate Hdd distribution defined above, respectively.
Because multiple factors are considered, detection amounts to integrating several dimensions that reflect abnormal data rather than relying on a single element. For example, when an anomaly occurs, the CPU, GPU and memory may fail while the hard disk does not; the principal component analysis described above eliminates the influence of the hard disk, and measuring several elements jointly avoids the detection error that would arise from relying on the CPU alone.
Generating the anomaly detection model of the invention requires computing the correlation among all feature variables before the model is generated.
The following algorithm is performed:
1) Let set χ be the set containing the k features after dimensionality reduction, and let set κ be an empty set;
2) Put the first element of χ into set κ;
3) While χ contains elements, repeat the following operations until χ is empty:
3.1) Select the first distribution $P_{\mathrm{first}}(x)$ in χ (the distribution of the first column of feature values, obtained from the general distribution formula above) and compute its correlation coefficient r with each distribution in κ:
[Equation image defining the correlation coefficient r]
3.2) If |r| > 0.25, compute the η matrix and the s matrix of the two distributions to form a multivariate Hdd distribution, remove $P_{\mathrm{first}}(x)$ from χ, and end the current iteration.
3.3) Otherwise, put $P_{\mathrm{first}}(x)$ into set κ.
End of loop.
Set κ is the reference comparison set and stores all mutually uncorrelated distributions. In each iteration, the current $P_{\mathrm{first}}(x)$ taken from χ is compared with every distribution in κ. If its correlation with a distribution in κ exceeds 0.25, the two are strongly correlated, so $P_{\mathrm{first}}(x)$ and that distribution are combined into a multivariate distribution; if no correlation exceeds 0.25, $P_{\mathrm{first}}(x)$ is only weakly correlated with every distribution in κ and is put into κ. In this embodiment, the correlation threshold of 0.25 was obtained from experimental statistics; it is reasonable, gives a small error, and can be adjusted as needed in practice.
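The χ/κ loop can be sketched as follows. Several details are assumptions rather than statements from the text: r is taken to be the Pearson correlation of the two feature columns, $P_{\mathrm{first}}(x)$ is removed from χ in both branches so that the loop terminates, and a feature joins the group of the first κ member it correlates with. `pearson` and `group_features` are hypothetical names.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation, assumed here to be the correlation coefficient r."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0.0 or sb == 0.0 else cov / (sa * sb)


def group_features(columns: List[List[float]], threshold: float = 0.25) -> List[List[int]]:
    """Return groups of feature indices; each group becomes one (multivariate) distribution."""
    chi = list(range(len(columns)))      # features still to be examined
    kappa = [chi.pop(0)]                 # reference set, seeded with the first feature
    groups: Dict[int, List[int]] = {kappa[0]: [kappa[0]]}
    while chi:
        first = chi.pop(0)               # P_first(x), removed from chi in both branches
        for ref in kappa:
            if abs(pearson(columns[first], columns[ref])) > threshold:
                groups[ref].append(first)  # strongly correlated: join ref's multivariate group
                break
        else:
            kappa.append(first)          # weakly correlated with every member of kappa
            groups[first] = [first]
    return [groups[ref] for ref in kappa]


cols = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
print(group_features(cols))  # [[0, 1], [2]]
```

Feature 1 is a multiple of feature 0 (r = 1), so the two merge; feature 2 has zero correlation with feature 0 and becomes a new reference in κ.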
[Equation image listing the Hdd distributions $\mathrm{hddis}_i$]
Here "dis" abbreviates "distribution", hddis denotes an Hdd distribution, and $\mathrm{hddis}_i$ denotes the i-th multivariate distribution constructed by the loop above.
This completes the generation of the anomaly detection model.
After all distributions in set κ have been obtained, the multivariate probability density function of each distribution can be derived. In practical use, the probability value of each distribution is then computed from the measured values, and an anomaly is detected when the probability value is greater than a certain threshold. The threshold depends on the specific problem and on the features involved, so it cannot be fixed uniformly in advance and is configured at detection time.
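The decision step can be sketched as below: each distribution has its own configurable threshold, and a flagged distribution also reports which feature dimensions it covers. This follows the rule as stated in the text (value greater than threshold means anomaly); how the probability values themselves are computed is not shown, because the Hdd density is given only as an equation image. The threshold values and member lists are hypothetical.

```python
from typing import Dict, List


def flag_anomalies(
    prob_values: Dict[str, float],
    thresholds: Dict[str, float],
    members: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return each flagged distribution with the feature dimensions it covers.

    A distribution is flagged when its probability value is greater than its own
    configured threshold, as stated in the text.
    """
    return {
        name: members[name]
        for name, value in prob_values.items()
        if value > thresholds[name]
    }


flagged = flag_anomalies(
    prob_values={"hddis_1": 0.91, "hddis_2": 0.12},
    thresholds={"hddis_1": 0.80, "hddis_2": 0.50},
    members={"hddis_1": ["cpu_load", "gpu_load"], "hddis_2": ["disk_io"]},
)
print(flagged)  # {'hddis_1': ['cpu_load', 'gpu_load']}
```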
The method was verified over a period of time on a data center deployed by an enterprise. Experiments show that it improves the accuracy of detecting abnormal servers by nearly 20% while the algorithm executes efficiently.
According to another embodiment of the invention, a data center anomaly detection device based on Gaussian distribution comprises:
the data acquisition module is used for acquiring the characteristics of physical devices and software layers of the data center server to form a multi-dimensional characteristic data set;
the preprocessing module is used for performing dimensionality reduction processing on the acquired multi-dimensional feature data set;
and the anomaly detection module is used for calculating by using an anomaly detection model based on Gaussian distribution according to the data subjected to the dimension reduction processing to obtain an anomaly detection result.
The multidimensional characteristic data set obtained by the data acquisition module is expressed in a matrix form as follows:
$$X = \left(X_1, X_2, \ldots, X_n\right)$$
where n is the feature dimension, and each matrix element $X_d$ ($1 \le d \le n$) is a vector formed by several physical quantities, namely one of X_cpu, X_gpu, X_memory, X_disk, X_net, X_thread and X_phy: X_cpu is a series of features representing the working state of the CPU, X_gpu of the GPU, X_memory of the memory, X_disk of the disk, and X_net of the network; X_thread is a series of features representing the state of process resources; and X_phy is a series of features representing the physical environment.
The preprocessing module comprises:
a mean value calculation unit, configured to calculate, for the j-th element $X_{dj}$ of the d-th dimension feature $X_d$, the average value of each feature $X_{dj}$ as follows:
$$\mu_{dj} = \frac{1}{m}\sum_{i=1}^{m} x_{dj}^{(i)}$$
where the superscript i is the serial number of a specific sample of the feature, and m is the number of samples taken for the element feature;
a feature scaling unit, configured to replace each $x_{dj}^{(i)}$ with $x_{dj}^{(i)} - \mu_{dj}$, where $\mu_{dj}$ is the average value calculated by the mean value calculation unit, and then to perform feature scaling on each feature according to the following equation:
[Formula image not reproduced: scaling of $x_{dj}^{(i)}$ using $\max\_x_{dj}$ and $\min\_x_{dj}$]
where $\max\_x_{dj}$ is the maximum value and $\min\_x_{dj}$ the minimum value of the j-th element feature of the d-th dimension;
a covariance matrix calculation unit, configured to substitute the features $x^{(i)}$ scaled by the feature scaling unit into the following equation to calculate the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(x^{(i)}\right)^{T}$$
a new feature matrix calculation unit, configured to sort the covariance matrix elements in descending order, take the first k columns to form a new covariance matrix $U_{reduce}$, and then calculate the new feature values as follows:
$$z = U_{reduce}^{T}\,x \qquad (21)$$
thereby obtaining a new feature matrix dataset_z:
[Formula image not reproduced: the new feature matrix dataset_z]
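The four preprocessing units can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the scaling step is assumed to divide the mean-centred value by the feature's range $\max\_x_{dj} - \min\_x_{dj}$, the covariance is assumed to take the standard form $(1/m)\sum_i x^{(i)} (x^{(i)})^T$, and "sorting from large to small and taking the first k columns" is read as standard PCA, i.e. keeping the eigenvectors of the covariance matrix with the k largest eigenvalues. The function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(data, k):
    """Dimensionality reduction along the lines of the preprocessing module.

    data: (m, n) array, one row per sample x^(i), one column per feature x_dj.
    k:    number of reduced features to keep.
    Returns (dataset_z, u_reduce).
    """
    x = np.asarray(data, dtype=float)
    m = x.shape[0]

    # Mean value calculation unit: mu_dj = (1/m) * sum_i x_dj^(i).
    mu = x.sum(axis=0) / m

    # Feature scaling unit: replace x_dj^(i) by x_dj^(i) - mu_dj, then divide
    # by the range max_x_dj - min_x_dj (assumed form of the scaling step).
    value_range = x.max(axis=0) - x.min(axis=0)
    value_range[value_range == 0] = 1.0      # constant features: avoid 0/0
    x_scaled = (x - mu) / value_range

    # Covariance matrix calculation unit: Sigma = (1/m) * sum_i x^(i) x^(i)^T.
    sigma = x_scaled.T @ x_scaled / m

    # New feature matrix unit, read as standard PCA (assumption): order the
    # eigenvectors of Sigma by descending eigenvalue and keep the first k.
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    u_reduce = eigvecs[:, order[:k]]

    # z = U_reduce^T x for every sample, stacked row-wise into dataset_z.
    dataset_z = x_scaled @ u_reduce
    return dataset_z, u_reduce
```

For example, `preprocess(np.random.rand(100, 9), k=3)` returns a (100, 3) `dataset_z`. `np.linalg.eigh` is used because the covariance matrix is symmetric; an SVD of the scaled data would yield the same principal directions.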
The anomaly detection module comprises:
a model building unit, configured to generate the data center anomaly detection model based on Gaussian distribution as follows: denote the set of k features obtained after dimensionality reduction as χ, place the first element of χ into an empty set κ, and then repeat the following operations until χ is empty:
a) calculate the distribution of the first column of feature values in χ according to the Gaussian distribution, denoted $P_{first}(x)$, and calculate the correlation coefficient r between $P_{first}(x)$ and each distribution in κ;
b) when |r| is greater than a specified threshold, calculate the η matrix and the s matrix corresponding to the two distributions to form a multivariate high-density data center distribution, denoted the Hdd distribution, remove $P_{first}(x)$ from χ, and end the current iteration;
c) otherwise, put $P_{first}(x)$ into κ and return to step a);
the η matrix and the s matrix are calculated as follows:
[Formula images not reproduced: definitions of η, s, f and p(x)]
where $\eta \in \mathbb{R}^{n}$, $s \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^{n}$; η is the mean vector of the Hdd multivariate distribution, s is its covariance matrix, f is its intermediate parameter vector, formed by dividing the corresponding elements of η and s, and p(x) is its probability density function; $x^{(i)}$ denotes the i-th feature and m is the number of samples of the feature; and
an anomaly detection unit, configured to obtain, after all distributions in κ have been determined by the anomaly detection model, the multivariate probability density function of each distribution; to calculate the probability value of each distribution from the dimension-reduced data; and, when a probability value is greater than a specified threshold, to determine that an anomaly has occurred and identify the dimension in which it occurred.
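The model building and anomaly detection units can be sketched with NumPy as below; all function names are hypothetical. The steps above leave several details open, so this sketch fixes them by assumption: r is taken as the Pearson correlation between sample columns, a group in κ is compared through its first (seed) column, the first element of χ is moved rather than copied into κ, and a feature leaves χ in both branches so that the loop terminates. The intermediate vector f is not modelled; the standard multivariate Gaussian density is used for p(x). Finally, each distribution is scored by the negative log-density −log p(x) and flagged when that score exceeds a threshold; scoring with −log p(x) is an assumption, chosen because a low density is what conventionally marks an anomaly.

```python
import numpy as np


def build_groups(dataset_z, r_threshold=0.8):
    """Model building unit: greedily group correlated reduced features.

    Returns kappa as a list of column-index groups; a group with more than
    one index stands for a joint (Hdd) multivariate distribution.
    """
    z = np.asarray(dataset_z, dtype=float)
    chi = list(range(z.shape[1]))
    kappa = [[chi.pop(0)]]                      # seed kappa with the first feature
    while chi:
        first = chi.pop(0)                      # column behind P_first(x)
        for group in kappa:
            r = np.corrcoef(z[:, first], z[:, group[0]])[0, 1]
            if abs(r) > r_threshold:
                group.append(first)             # extend a joint (Hdd) distribution
                break
        else:
            kappa.append([first])               # keep as a separate distribution
    return kappa


def fit_gaussian(x):
    """eta = mean vector, s = covariance matrix (1/m normalisation)."""
    eta = x.mean(axis=0)
    centred = x - eta
    return eta, centred.T @ centred / x.shape[0]


def neg_log_density(sample, eta, s):
    """-log p(x) for a multivariate Gaussian; s must be non-singular."""
    diff = sample - eta
    _, logdet = np.linalg.slogdet(s)
    maha = diff @ np.linalg.solve(s, diff)
    return 0.5 * (len(eta) * np.log(2 * np.pi) + logdet + maha)


def detect(train_z, sample_z, score_threshold, r_threshold=0.8):
    """Anomaly detection unit: return the groups (dimensions) that look anomalous."""
    train = np.asarray(train_z, dtype=float)
    sample = np.asarray(sample_z, dtype=float)
    flagged = []
    for group in build_groups(train, r_threshold):
        eta, s = fit_gaussian(train[:, group])
        if neg_log_density(sample[group], eta, s) > score_threshold:
            flagged.append(group)               # dimension(s) where the anomaly occurs
    return flagged
```

With training rows whose first two reduced features move together, a sample that keeps each of those features within its usual range but breaks their joint relationship is flagged on the group `[0, 1]`, a case that separate univariate tests can miss. Perfectly correlated columns make s singular, so in practice a small ridge term may need to be added to s.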
Based on the same technical concept as the method embodiment, according to another embodiment of the present invention, there is provided a computer apparatus including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps in the method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (7)

1. A data center anomaly detection method based on Gaussian distribution, characterized by comprising the following steps:
obtaining features of the hardware layer, software layer and physical environment of a data center server to form a multi-dimensional feature data set, the multi-dimensional feature data set being expressed in matrix form as follows:
$$[X_1, X_2, \ldots, X_n]$$
where n is the feature dimension and each matrix element $X_d$ ($1 \le d \le n$) is a vector of several physical quantities, namely one of X_CPU, X_GPU, X_memory, X_disk, X_net, X_thread and X_phy, wherein X_CPU is a set of features characterizing the working state of the CPU, X_GPU the working state of the GPU, X_memory the working state of the memory, X_disk the working state of the disk, X_net the working state of the network, X_thread the state of process resources, and X_phy the physical environment;
performing dimensionality reduction on the acquired multi-dimensional feature data set, the dimensionality reduction comprising the following steps:
S21: for the j-th element $X_{dj}$ of the d-th dimension feature $X_d$, calculating the average value of each feature $X_{dj}$ according to equation (1):
$$\mu_{dj} = \frac{1}{m}\sum_{i=1}^{m} x_{dj}^{(i)} \qquad (1)$$
where the superscript i is the serial number of a specific sample of the feature, and m is the number of samples taken for the element feature;
S22: replacing each $x_{dj}^{(i)}$ with $x_{dj}^{(i)} - \mu_{dj}$, where $\mu_{dj}$ is the average value from step S21, and performing feature scaling on each feature according to equation (2):
[Formula image not reproduced: equation (2), scaling of $x_{dj}^{(i)}$ using $\max\_x_{dj}$ and $\min\_x_{dj}$]
where $\max\_x_{dj}$ is the maximum value and $\min\_x_{dj}$ the minimum value of the j-th element feature of the d-th dimension;
S23: substituting the features $x^{(i)}$ scaled in step S22 into equation (3) to calculate the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(x^{(i)}\right)^{T} \qquad (3)$$
S24: sorting the covariance matrix elements in descending order, taking the first k columns to form a new covariance matrix $U_{reduce}$, and then calculating the new feature values according to equation (4) to obtain a new feature matrix dataset_z:
$$z = U_{reduce}^{T}\,x \qquad (4)$$
[Formula image not reproduced: the new feature matrix dataset_z]
performing, on the dimension-reduced data, an operation using an anomaly detection model based on Gaussian distribution to obtain an anomaly detection result, wherein the anomaly detection model based on Gaussian distribution is generated as follows: denote the set of k features obtained after dimensionality reduction as χ, place the first element of χ into an empty set κ, and then repeat the following operations until χ is empty:
a) calculate the distribution of the first column of feature values in χ according to the Gaussian distribution, denoted $P_{first}(x)$, and calculate the correlation coefficient r between $P_{first}(x)$ and each distribution in κ;
b) when |r| is greater than a specified threshold, calculate the η matrix and the s matrix corresponding to the two distributions to form a multivariate high-density data center distribution, denoted the Hdd distribution, remove $P_{first}(x)$ from χ, and end the current iteration;
c) otherwise, put $P_{first}(x)$ into κ and return to step a).
2. The data center anomaly detection method based on Gaussian distribution according to claim 1, characterized in that the η matrix and the s matrix are calculated as follows:
[Formula images not reproduced: definitions of η, s, f and p(x)]
where $\eta \in \mathbb{R}^{n}$, $s \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^{n}$; η is the mean vector of the Hdd multivariate distribution, s is its covariance matrix, f is its intermediate parameter vector, formed by dividing the corresponding elements of η and s, and p(x) is its probability density function; $x^{(i)}$ denotes the i-th feature and m is the number of samples of the feature.
3. The data center anomaly detection method based on Gaussian distribution according to claim 2, wherein performing the operation using the anomaly detection model based on Gaussian distribution to obtain the anomaly detection result comprises:
after all distributions in κ have been obtained according to the anomaly detection model, obtaining the multivariate probability density function of each distribution, calculating the probability value of each distribution from the dimension-reduced data, and, when a probability value is greater than a specified threshold, determining that an anomaly has occurred and identifying the dimension in which it occurred.
4. A data center anomaly detection device based on Gaussian distribution, characterized by comprising:
a data acquisition module, configured to acquire features of the hardware layer, software layer and physical environment of a data center server to form a multi-dimensional feature data set, the multi-dimensional feature data set being expressed in matrix form as follows:
$$[X_1, X_2, \ldots, X_n]$$
where n is the feature dimension and each matrix element $X_d$ ($1 \le d \le n$) is a vector of several physical quantities, namely one of X_CPU, X_GPU, X_memory, X_disk, X_net, X_thread and X_phy, wherein X_CPU is a set of features characterizing the working state of the CPU, X_GPU the working state of the GPU, X_memory the working state of the memory, X_disk the working state of the disk, X_net the working state of the network, X_thread the state of process resources, and X_phy the physical environment;
a preprocessing module, configured to perform dimensionality reduction on the acquired multi-dimensional feature data set, the preprocessing module specifically comprising:
a mean value calculation unit, configured to calculate, for the j-th element $X_{dj}$ of the d-th dimension feature $X_d$, the average value of each feature $X_{dj}$ according to equation (1):
$$\mu_{dj} = \frac{1}{m}\sum_{i=1}^{m} x_{dj}^{(i)} \qquad (1)$$
where the superscript i is the serial number of a specific sample of the feature, and m is the number of samples taken for the element feature;
a feature scaling unit, configured to replace each $x_{dj}^{(i)}$ with $x_{dj}^{(i)} - \mu_{dj}$, where $\mu_{dj}$ is the average value calculated by the mean value calculation unit, and to perform feature scaling on each feature according to equation (2):
[Formula image not reproduced: equation (2), scaling of $x_{dj}^{(i)}$ using $\max\_x_{dj}$ and $\min\_x_{dj}$]
where $\max\_x_{dj}$ is the maximum value and $\min\_x_{dj}$ the minimum value of the j-th element feature of the d-th dimension;
a covariance matrix calculation unit, configured to substitute the features $x^{(i)}$ scaled by the feature scaling unit into equation (3) to calculate the covariance matrix:
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(x^{(i)}\right)^{T} \qquad (3)$$
a new feature matrix calculation unit, configured to sort the covariance matrix elements in descending order, take the first k columns to form a new covariance matrix $U_{reduce}$, and then calculate the new feature values according to equation (4) to obtain a new feature matrix dataset_z:
$$z = U_{reduce}^{T}\,x \qquad (4)$$
[Formula image not reproduced: the new feature matrix dataset_z]
an anomaly detection module, configured to perform an operation on the dimension-reduced data using an anomaly detection model based on Gaussian distribution to obtain an anomaly detection result, and comprising a model building unit configured to generate the data center anomaly detection model based on Gaussian distribution as follows: denote the set of k features obtained after dimensionality reduction as χ, place the first element of χ into an empty set κ, and then repeat the following operations until χ is empty:
a) calculate the distribution of the first column of feature values in χ according to the Gaussian distribution, denoted $P_{first}(x)$, and calculate the correlation coefficient r between $P_{first}(x)$ and each distribution in κ;
b) when |r| is greater than a specified threshold, calculate the η matrix and the s matrix corresponding to the two distributions to form a multivariate high-density data center distribution, denoted the Hdd distribution, remove $P_{first}(x)$ from χ, and end the current iteration;
c) otherwise, put $P_{first}(x)$ into κ and return to step a).
5. The data center anomaly detection device based on Gaussian distribution according to claim 4, wherein the η matrix and the s matrix are calculated as follows:
[Formula images not reproduced: definitions of η, s, f and p(x)]
where $\eta \in \mathbb{R}^{n}$, $s \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^{n}$; η is the mean vector of the Hdd multivariate distribution, s is its covariance matrix, f is its intermediate parameter vector, formed by dividing the corresponding elements of η and s, and p(x) is its probability density function; $x^{(i)}$ denotes the i-th feature and m is the number of samples of the feature.
6. The data center anomaly detection device according to claim 4, wherein the anomaly detection module further comprises an anomaly detection unit, configured to obtain, after all distributions in κ have been obtained according to the anomaly detection model, the multivariate probability density function of each distribution; to calculate the probability value of each distribution from the dimension-reduced data; and, when a probability value is greater than a specified threshold, to determine that an anomaly has occurred and identify the dimension in which it occurred.
7. A computer device, the device comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of any of claims 1-3.
CN202010515936.3A 2020-06-09 2020-06-09 Data center anomaly detection method and device based on Gaussian distribution Active CN111737099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010515936.3A CN111737099B (en) 2020-06-09 2020-06-09 Data center anomaly detection method and device based on Gaussian distribution

Publications (2)

Publication Number Publication Date
CN111737099A CN111737099A (en) 2020-10-02
CN111737099B true CN111737099B (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant