CN111914873A - Two-stage cloud server unsupervised anomaly prediction method - Google Patents

Two-stage cloud server unsupervised anomaly prediction method

Info

Publication number
CN111914873A
CN111914873A CN202010505118.5A CN202010505118A CN111914873A
Authority
CN
China
Prior art keywords
data
cloud server
layer
abnormal
input
Prior art date
Legal status
Pending
Application number
CN202010505118.5A
Other languages
Chinese (zh)
Inventor
刘发贵
蔡木庆
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010505118.5A priority Critical patent/CN111914873A/en
Publication of CN111914873A publication Critical patent/CN111914873A/en
Pending legal-status Critical Current

Classifications

    • G06F 18/2433: Classification techniques, single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213: Non-hierarchical clustering techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods

Abstract

The invention discloses a two-stage unsupervised anomaly prediction method for cloud servers, which addresses the anomaly prediction problem in cloud server environments. The method comprises a prediction stage and an anomaly detection stage. In the prediction stage, a many-to-many time series prediction model is trained on preprocessed historical key performance indicator (KPIs) data of the cloud server, and the model is used to predict the cloud server's KPIs data at future times. In the anomaly detection stage, a multivariate anomaly detection model is trained on the same preprocessed historical KPIs data and applied to the predicted future KPIs data, yielding an anomaly probability for each future data point. Finally, an anomaly probability threshold is set: data points whose anomaly probability exceeds the threshold are considered abnormal, and the rest are considered normal, giving the anomaly prediction result. The invention does not depend on label data, has wide applicability, and performs well.

Description

Two-stage cloud server unsupervised anomaly prediction method
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a two-stage cloud server unsupervised anomaly prediction method.
Background
The cloud server (ECS) is a simple, efficient, safe, reliable, and elastic computing service that frees developers from purchasing hardware independently: services are simply hosted on cloud servers. While cloud server hosting services have a very broad prospect, many challenges remain for cloud server providers, with the availability of cloud servers foremost among them. Software and hardware faults or misoperation can directly bring a cloud server down and, in turn, cause the loss of users. Therefore, how to effectively monitor the cloud server and effectively predict anomalies has become an important research problem.
The monitoring data of the cloud server, together with the monitoring data generated by the hosted services, form a monitoring data set that is large in volume and diverse in indicator types. Traditional monitoring, which relies entirely on observation by operation and maintenance personnel, has become unrealistic. Means of assisting operation and maintenance personnel were then derived: rules are summarized by screening the cloud server's Key Performance Indicator (KPIs) data, corresponding thresholds are set, and if a comparison result exceeds a threshold, an anomaly is considered to exist. However, these methods are not universal: they still depend strongly on operation and maintenance personnel, their simple logic makes them hard to apply to complex business scenarios, and they typically offer only monitoring capability rather than prediction capability, so they cannot predict anomalies in advance and warn operation and maintenance personnel. In recent years, intelligent operation and maintenance has become an important means of advancing cloud server operations: historical KPIs data of cloud servers are analyzed with artificial intelligence algorithms to mine their characteristics and automatically identify anomaly types, and future KPIs data are predicted by a prediction algorithm and diagnosed, so that anomalies at future times are discovered, an alarm is raised before the anomaly occurs, and repairs are made in advance to avoid cloud server faults.
At present, researchers have proposed a number of anomaly prediction methods. The document "Anomaly prediction method, anomaly prediction system and anomaly prediction device (CN106330852B)" proposes recording system commands and constructing corresponding prediction rules to perform anomaly prediction. The document "A method and apparatus for anomaly prediction of a drive system (CN110426634A)" proposes constructing a state distribution map of a system from its current response parameters and historical states and quantifying the anomaly risk of each system region so as to predict system anomalies. However, cloud server KPIs data exhibit complex interdependencies and are imbalanced and unlabeled, which makes constructing such a state distribution map particularly difficult, so an anomaly prediction model is hard to build this way. Therefore, how to construct a method suitable for cloud server anomaly prediction remains a challenge for intelligent cloud server operation and maintenance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a two-stage cloud server anomaly prediction method that predicts future KPIs data from historical KPIs data and performs anomaly detection on the predicted data, thereby realizing anomaly prediction: an alarm is raised before the anomaly occurs, and repairs are made in advance to avoid cloud server faults.
The purpose of the invention is realized by at least one of the following technical solutions.
A two-stage cloud server unsupervised anomaly prediction method comprises the following steps:
s1, performing missing value processing, data formatting and data normalization processing on the collected historical KPIs data of the cloud server;
s2, training a many-to-many time sequence prediction model and a multivariate abnormality detection model according to the preprocessed historical KPIs time sequence data of the cloud server;
s3, predicting KPIs data of the cloud server at a future moment by using a trained many-to-many time sequence prediction model;
s4, carrying out anomaly detection on the KPIs data at the predicted future time by using the trained multivariate anomaly detection model to obtain the anomaly probability of each future time data point;
S5, setting an anomaly probability threshold and judging each data point by its anomaly probability: data points whose anomaly probability exceeds the threshold are considered abnormal, the rest normal, yielding the anomaly prediction result.
Further, in step S1, missing value processing means that interval missing values, i.e., runs of no more than 5 consecutive missing values in a row or column, are repaired by completion with the mean of the 24 most recent non-missing values, while runs of more than 5 consecutive missing values in a row or column are removed directly;
the data formatting refers to converting a categorical time series of a given dimension into a numerical time series by enumeration;
and the data normalization refers to normalizing the cloud server KPIs data after missing value processing and data formatting, so that the values are distributed in [0, 1].
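The three preprocessing steps above can be sketched in plain Python. The helper names and the pure-list data representation are assumptions for illustration only; the patent does not prescribe an implementation (note the gap repair here averages over values already retained, which may include earlier fills, a simplification of "the 24 most recent non-missing values").

```python
def repair_missing(series, max_gap=5, window=24):
    """Fill gaps of at most max_gap consecutive None values with the mean of
    up to the last `window` values retained so far; drop longer gaps outright."""
    out, i, n = [], 0, len(series)
    while i < n:
        if series[i] is not None:
            out.append(series[i])
            i += 1
            continue
        j = i
        while j < n and series[j] is None:
            j += 1
        gap = j - i
        if gap <= max_gap:
            recent = out[-window:]
            fill = sum(recent) / len(recent) if recent else 0.0
            out.extend([fill] * gap)
        # gaps longer than max_gap are removed directly
        i = j
    return out

def encode_categories(series):
    """Enumerate a categorical series into numeric codes, by first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in series]

def min_max_normalize(series):
    """Scale values into [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]
```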
Further, in step S2, the many-to-many time series prediction model comprises an encoder and a decoder; the encoder fully extracts the features of the KPIs time series data, and the decoder decodes the extracted features and outputs the predicted values at future times. The input time series is

$$X = \{X_1, X_2, \dots, X_T\}, \quad X_i = \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\},$$

and the model predicts the P-step KPIs time series at future times

$$Y = \{Y_1, Y_2, \dots, Y_P\}.$$

The input sequence X is a multivariate time series, T denotes the length of X, and m denotes the dimension of X; the prediction target time series Y is likewise multivariate, and P denotes the number of prediction steps;
the encoder is realized by adopting a multi-layer stacked LSTM neural network, aims to fully extract time sequence characteristics of time sequence data of KPIs (Key Performance indicators) of a plurality of cloud servers, and provides characteristic input with weight, namely an encoding vector, for training of a prediction model by adopting an attention mechanism, and specifically comprises the following steps:
the state of the LSTM unit of the encoder needs to be transferred while considering the hidden state of the previous LSTM unit and the hidden state of the next layer of LSTM unit, that is:
Figure BDA0002526266680000031
wherein
Figure BDA0002526266680000032
And
Figure BDA0002526266680000033
respectively represent the k-th layer, the state of the memory cell at time T and the state of the hidden layer, L represents the number of layers of the stacked LSTMs, and T represents the length of the input time sequence. Through the feature extraction of the multi-layer LSTM, the hidden layer vectors of each moment are finally obtained
Figure BDA0002526266680000034
For convenience of description these are written $h_1, h_2, \dots, h_T$. The correlation between each encoder LSTM unit's hidden vector $h_1, h_2, \dots, h_T$ and the decoder state $D_t = \mathrm{Concatenate}(S_{t-1}, Y'_{t-1})$ is computed, where $S_{t-1}$ denotes the hidden-layer state of the decoder at the previous time step and $Y'_{t-1}$ denotes the true value at the previous time step of the decoder's LSTM unit; $D_t$ is the vector obtained by concatenating $S_{t-1}$ and $Y'_{t-1}$ by columns. Concatenating $S_{t-1}$ and $Y'_{t-1}$ before computing the correlation lets the true value of the previous time step guide the correlation computation. Finally, softmax normalization yields the normalized weight $a_{ij}$ of each hidden-layer vector, computed as follows:
$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})},$$

$$e_{ij} = V_a^{\top} \tanh\big(W_a D_i + U_a h_j\big),$$
where $e_{ij}$ denotes the correlation between the decoder state $D_i$ at the i-th time step and the j-th encoder hidden-layer state $h_j$; $e_{ij}$ can be learned by a neural network, in which $V_a$, $W_a$ and $U_a$ are weight parameters to be learned. Considering the usefulness of the historical time series information, a Soft Attention mechanism is adopted to assign weights to the hidden-layer states of all encoder time steps;
The encoder hidden-layer states $h_1, h_2, \dots, h_T$ are weighted and summed to obtain the encoding vector $C_i$ corresponding to the i-th time step of the decoder:

$$C_i = \sum_{j=1}^{T} a_{ij} h_j.$$
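The soft-attention step above can be sketched numerically. Scalar stand-ins replace the learned parameters $V_a$, $W_a$, $U_a$ (real models use matrices; all names here are illustrative assumptions):

```python
import math

def soft_attention(decoder_state, encoder_states, Wa, Ua, Va):
    """Score each encoder hidden state h_j against a decoder state D_i,
    softmax-normalize the scores into weights a_ij, and return the
    weighted-sum encoding vector C_i. Scalars keep the sketch minimal."""
    scores = [Va * math.tanh(Wa * decoder_state + Ua * h) for h in encoder_states]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]   # a_ij, summing to 1
    context = sum(a * h for a, h in zip(weights, encoder_states))  # C_i
    return weights, context
```

With zero attention parameters the scores are equal, so the weights are uniform and the context is the mean of the encoder states.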
Further, the decoder is implemented with a single-layer LSTM neural network: the encoded features are decoded by the single-layer LSTM, real data are taken as input, and the values of the KPIs data of multiple cloud servers at future times are output, completing the construction of the many-to-many time series prediction model. Specifically:
The state update of the decoder's LSTM is as follows:

$$S_t = \mathrm{LSTM}\big(S_{t-1},\ [\,Y'_{t-1};\ C\,]\big),$$

where $S_{t-1}$ denotes the hidden-layer state at the previous time step and $Y'_{t-1}$ the true value at the previous time step; $S_{t-1}$ is composed of $(h_{t-1}, c_{t-1})$, and C denotes the encoding vector output by the encoder. At the first time step of the decoder, the encoding vector C initializes the hidden-layer state and the memory-unit state, and the input is 0. At later time steps the hidden-layer state is updated as in an ordinary LSTM, but during training the input comprises the encoding vector and the true value of the previous time step (the predicted value of the previous time step is not fed back as input, to avoid accumulating errors), while during validation and testing the input comprises the encoding vector and the predicted value of the previous time step. From the decoder output $h_t$ at time t, the predicted value at time t can be obtained; a linear layer is usually added to adjust the dimension of the target output sequence, as shown in formula (7):
$$Y_t = \mathrm{linear}(h_t). \tag{7}$$
By outputting the predicted values one by one, the P-step prediction is finally obtained:

$$\hat{Y} = \{\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_P\}, \quad \hat{Y}_i = \{\hat{y}^{(1)}, \hat{y}^{(2)}, \dots, \hat{y}^{(m)}\}.$$
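The step-by-step decoding described above, with true values fed back during training and predictions fed back at test time, can be sketched abstractly. Here `step` is a hypothetical stand-in for the decoder LSTM cell plus the linear output layer:

```python
def decode(step, C, targets=None, P=3):
    """Decode P values one by one. During training (targets given) the true
    value of the previous step is fed back (teacher forcing); at validation
    and test time (targets=None) the previous prediction is fed back.
    `step` is any callable (state, prev_input, C) -> (new_state, y)."""
    state, prev, outputs = C, 0.0, []  # first-step input is 0, state seeded by C
    for t in range(P):
        state, y = step(state, prev, C)
        outputs.append(y)
        prev = targets[t] if targets is not None else y
    return outputs
```

A toy `step` that echoes its previous input plus one makes the two feedback regimes visible: with targets the outputs track the true values, without them the prediction compounds.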
Further, the many-to-many time series prediction model is trained with a Bagging ensemble algorithm based on random sampling of sliding-window subsequences, as follows:
The input multi-dimensional historical cloud server KPIs time series data are segmented with a sliding window to obtain a set of subsequences. The subsequence set is then sampled by random sampling without replacement; to guarantee the diversity of the individual predictors, each training sample randomly takes two thirds of the subsequence set, finally yielding M subsequence sets. Each subsequence set is used as a training set to train the constructed many-to-many time series prediction model, giving an individual predictor, and the individual predictors are finally integrated into an ensemble predictor, i.e., the trained many-to-many time series prediction model.
Further, in step S2, the multivariate anomaly detection model is an anomaly detection model based on generated outliers, whose generator and discriminator are both built from multilayer fully connected neural networks. Its construction and training combine a pseudo-label generation method based on stacked autoencoders and K-Means++ clustering with an unsupervised multivariate anomaly detection training method based on a multi-objective generative adversarial network, as follows:
The dimensionality of the cloud server KPIs time series data comprising multiple variables is reduced with stacked autoencoders, and the reduced data are clustered with the K-Means++ algorithm into K clusters; the cluster containing the fewest data points is removed as a suspected-anomaly cluster, and the data of the other clusters are given a pseudo label marking them as normal data.
Several distinct sub-generators generate potential anomaly data; the generated data, combined with the pseudo-labeled normal data, are fed to the discriminator for training. The multi-objective generator tries to generate anomaly data as similar to real data as possible, while the discriminator tries to distinguish generated anomaly data from normal data as well as possible; the finally obtained discriminator serves as a binary anomaly detection classifier, i.e., the trained multivariate anomaly detection model.
Further, the stacked autoencoder used for dimensionality reduction of the complex time series consists of several single-layer autoencoders. A single-layer autoencoder is an unsupervised artificial neural network implemented with a fully connected network whose output target is its own input. The autoencoder works in two stages:
The first stage is encoding, in which the autoencoder (AE) obtains the hidden state a(X) by encoding the input data, as shown in formula (8):

$$a(X) = f(W_1 X + b_1), \tag{8}$$

where X is the input, $W_1$ is a weight vector, $b_1$ is a bias unit, and f is a nonlinear activation function;
The second stage is decoding, in which the AE reconstructs the encoded a(X), yielding an X′ close to the input, as shown in formula (9):

$$X' = f(W_2\, a(X) + b_2), \tag{9}$$

where $W_2$ is a weight vector and $b_2$ is a bias unit. When the output X′ equals or is close to the input X, the hidden state a(X) is regarded as an abstract feature representation of the input data;
the method comprises the steps that a Stacking Automatic Encoder (SAEs) learns input data layer by simulating a multilayer expression mode of brain perception data, and extracts deeper and more abstract features; the Stacking Automatic Encoders (SAEs) comprise a plurality of single-layer automatic encoders, the single-layer Automatic Encoders (AE) map input to a hidden layer, after the first Automatic Encoder (AE) is trained, a reconstruction layer of the first Automatic Encoder (AE) is discarded, the hidden layer becomes an input layer of a second single-layer Automatic Encoder (AE), and subsequent layers are also the same, and the same objective function and optimization algorithm are adopted for training of each layer of Automatic Encoders (AE); and the complex time sequence is subjected to dimensionality reduction through a stacking automatic encoder to obtain a characteristic representation after dimensionality reduction.
Further, the reduced-dimension complex time series data are clustered with the K-Means++ clustering algorithm. K-Means is one of the classic clustering algorithms; the core idea of K-Means++ is that the first cluster center is chosen at random, and when the i-th cluster center (i > 1) is chosen, points farther from the previously chosen i-1 cluster centers are selected with higher probability. The distance between a sample and a cluster center, and the probability that sample point X is selected as a cluster center during K-Means++ clustering, are given by formulas (10) and (11), respectively:
$$d(X, C) = \sqrt{\sum_{i=1}^{m} (x_i - c_i)^2}, \tag{10}$$

$$P(X) = \frac{d(X, C)^2}{\sum_{X'} d(X', C)^2}, \tag{11}$$

where m is the dimension of a data point, and $x_i$ and $c_i$ respectively denote the i-th component of sample point X and cluster center C.
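The D²-weighted seeding just described can be sketched directly (function name assumed; `d2` holds each point's squared distance to its nearest already-chosen center):

```python
import random

def kmeans_pp_centers(points, k, seed=0):
    """K-Means++ seeding: the first center is uniform at random; each next
    center is drawn with probability proportional to the squared distance
    d(X, C)^2 to the nearest already-chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sum((xi - ci) ** 2 for xi, ci in zip(p, c)) for c in centers)
              for p in points]
        r = rng.uniform(0.0, sum(d2))      # roulette-wheel draw over the weights
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

With two coincident points at the origin and one distant point, the distant point is always picked as the second center, since the coincident points carry zero weight.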
Further, the generator and discriminator both adopt multilayer fully connected neural networks. Anomaly data are generated by k sub-generators $\{G_1(z_1), G_2(z_2), G_3(z_3), \dots, G_k(z_k)\}$, where $z_i$ is the input random distribution, and the anomaly data produced by the sub-generators are combined. Finally, the generated anomaly data and the data pseudo-labeled as normal are fed to the discriminator for training. When training begins, the generator G cannot yet produce diverse outliers, so the discriminator D obtains only a rough boundary; but over many iterations of the mini-max game between G and D, the generator G gradually learns the generation mechanism of normal data and produces diverse anomaly data. When the mini-max game reaches Nash equilibrium, i.e., the discriminator can hardly distinguish normal data from generated anomaly data, training of the generator is stopped while the discriminator continues to be trained, further reducing its discrimination error. Finally, the discriminator can correctly describe the boundary between normal and anomalous data, and for a given data point it outputs the probability that the point is normal data.
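The full adversarial loop is too heavy to show compactly, so the sketch below keeps only its skeleton under strong simplifying assumptions: fixed stand-in sub-generators emit candidate outliers (no adversarial generator updates), and a 1-D logistic-regression "discriminator" replaces the multilayer fully connected network, learning to output the probability that a point is normal.

```python
import math
import random

def train_outlier_discriminator(normal, generators, epochs=200, lr=0.5, seed=0):
    """Train a 1-D logistic regression to separate normal points (label 1)
    from sub-generator outputs (label 0); returns x -> P(normal | x).
    A stand-in for the discriminator half of the adversarial stage."""
    rng = random.Random(seed)
    # each stand-in sub-generator contributes as many candidates as normal points
    fakes = [g(rng) for g in generators for _ in range(len(normal))]
    data = [(x, 1.0) for x in normal] + [(x, 0.0) for x in fakes]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(normal | x)
            w += lr * (y - p) * x                     # SGD on the log-loss
            b += lr * (y - p)
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))
```

With normal data near 0 and two sub-generators emitting values near 5 and 6, the trained function assigns high normality probability near 0 and low probability near the generated outliers.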
Further, in step S5, according to the anomaly probability output by the trained multivariate anomaly detection model for each future cloud server KPIs data point, the anomaly probability threshold is set to 0.5: if the anomaly probability exceeds the threshold, the point is judged an anomalous data point; otherwise, it is a normal data point.
Compared with the prior art, the invention has the following advantages and technical effects:
1. The two-stage unsupervised cloud server anomaly prediction method does not depend on label data in the cloud server environment and can handle imbalanced KPIs data. Conventional anomaly prediction methods are usually supervised, require large amounts of label data, and have difficulty handling imbalanced KPIs data.
2. In the proposed many-to-many ensemble prediction model based on attention-stacked LSTMs, the stacked LSTMs and the attention mechanism fully extract time series features and capture the interdependencies among multiple KPIs time series, and the ensemble prediction further improves prediction accuracy.
3. In the anomaly detection model based on pre-labeling and a multi-objective generative adversarial network, the pre-labeling technique based on stacked autoencoders and K-Means++ provides more reasonable input for anomaly detection and reduces training difficulty; the multi-objective generator can automatically generate diverse potential anomaly data; and the adversarial game between generator and discriminator yields an anomaly detector with high detection accuracy, strong generalization ability, and robustness.
Drawings
Fig. 1 is a schematic diagram of a two-stage cloud server anomaly prediction method in an embodiment of the method of the present invention.
FIG. 2 is a schematic diagram of a many-to-many integrated prediction model based on attention-stacking LSTMs in an embodiment of the method of the present invention.
Fig. 3 is a schematic diagram of an integrated prediction process based on a Bagging algorithm in the embodiment of the method of the present invention.
Fig. 4 is a diagram illustrating a sliding window based sub-sequence random sampling in an embodiment of the method of the present invention.
FIG. 5 is a schematic flow chart of the pre-labeling and multi-objective generative adversarial network in an embodiment of the method of the present invention.
FIG. 6 is a schematic diagram of a stacked auto-encoder according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions and advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings, but the present invention is not limited thereto.
Example:
a two-stage cloud server unsupervised anomaly prediction method comprises the following steps:
s1, performing missing value processing, data formatting and data normalization processing on the collected historical KPIs data of the cloud server;
The missing value processing means that interval missing values, i.e., runs of no more than 5 consecutive missing values in a row or column, are repaired by completion with the mean of the 24 most recent non-missing values, while runs of more than 5 consecutive missing values in a row or column are removed directly;
the data formatting refers to converting a categorical time series of a given dimension into a numerical time series by enumeration;
and the data normalization refers to normalizing the cloud server KPIs data after missing value processing and data formatting, so that the values are distributed in [0, 1].
S2, training a many-to-many time sequence prediction model and a multivariate abnormality detection model according to the preprocessed historical KPIs time sequence data of the cloud server;
As shown in FIG. 2, the many-to-many time series prediction model comprises an encoder and a decoder; the encoder fully extracts the features of the KPIs time series data, and the decoder decodes the extracted features and outputs the predicted values at future times. The input time series is

$$X = \{X_1, X_2, \dots, X_T\}, \quad X_i = \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\},$$

and the model predicts the P-step KPIs time series at future times

$$Y = \{Y_1, Y_2, \dots, Y_P\}.$$

The input sequence X is a multivariate time series, T denotes the length of X, and m denotes the dimension of X; the prediction target time series Y is likewise multivariate, and P denotes the number of prediction steps;
the encoder is realized by adopting a multi-layer stacked LSTM neural network, aims to fully extract time sequence characteristics of time sequence data of KPIs (Key Performance indicators) of a plurality of cloud servers, and provides characteristic input with weight, namely an encoding vector, for training of a prediction model by adopting an attention mechanism, and specifically comprises the following steps:
The state of each LSTM unit in the encoder is propagated while considering both the hidden state of the previous LSTM unit and the hidden state of the LSTM unit in the layer below, that is:

$$c_t^{(k)},\ h_t^{(k)} = \mathrm{LSTM}\big(c_{t-1}^{(k)},\ h_{t-1}^{(k)},\ h_t^{(k-1)}\big), \quad k = 1, \dots, L,\ t = 1, \dots, T,$$

where $c_t^{(k)}$ and $h_t^{(k)}$ respectively denote the memory-cell state and the hidden-layer state of the k-th layer at time t, L denotes the number of stacked LSTM layers, and T denotes the length of the input time series. Through the feature extraction of the multi-layer LSTM, the hidden-layer vectors at each time step, $h_1^{(L)}, h_2^{(L)}, \dots, h_T^{(L)}$, are finally obtained.
For convenience of description these are written $h_1, h_2, \dots, h_T$. The correlation between each encoder LSTM unit's hidden vector $h_1, h_2, \dots, h_T$ and the decoder state $D_t = \mathrm{Concatenate}(S_{t-1}, Y'_{t-1})$ is computed, where $S_{t-1}$ denotes the hidden-layer state of the decoder at the previous time step and $Y'_{t-1}$ denotes the true value at the previous time step of the decoder's LSTM unit; $D_t$ is the vector obtained by concatenating $S_{t-1}$ and $Y'_{t-1}$ by columns. Concatenating $S_{t-1}$ and $Y'_{t-1}$ before computing the correlation lets the true value of the previous time step guide the correlation computation. Finally, softmax normalization yields the normalized weight $a_{ij}$ of each hidden-layer vector, computed as follows:
$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})},$$

$$e_{ij} = V_a^{\top} \tanh\big(W_a D_i + U_a h_j\big),$$
where $e_{ij}$ denotes the correlation between the decoder state $D_i$ at the i-th time step and the j-th encoder hidden-layer state $h_j$; $e_{ij}$ can be learned by a neural network, in which $V_a$, $W_a$ and $U_a$ are weight parameters to be learned. Considering the usefulness of the historical time series information, a Soft Attention mechanism is adopted to assign weights to the hidden-layer states of all encoder time steps;
The encoder hidden-layer states $h_1, h_2, \dots, h_T$ are weighted and summed to obtain the encoding vector $C_i$ corresponding to the i-th time step of the decoder:

$$C_i = \sum_{j=1}^{T} a_{ij} h_j.$$
The decoder is implemented with a single-layer LSTM neural network: the encoded features are decoded by the single-layer LSTM, real data are taken as input, and the values of the KPIs data of multiple cloud servers at future times are output, completing the construction of the many-to-many time series prediction model. Specifically:
The state update of the decoder's LSTM is as follows:

$$S_t = \mathrm{LSTM}\big(S_{t-1},\ [\,Y'_{t-1};\ C\,]\big),$$

where $S_{t-1}$ denotes the hidden-layer state at the previous time step and $Y'_{t-1}$ the true value at the previous time step; $S_{t-1}$ is composed of $(h_{t-1}, c_{t-1})$, and C denotes the encoding vector output by the encoder. At the first time step of the decoder, the encoding vector C initializes the hidden-layer state and the memory-unit state, and the input is 0. At later time steps the hidden-layer state is updated as in an ordinary LSTM, but during training the input comprises the encoding vector and the true value of the previous time step (the predicted value of the previous time step is not fed back as input, to avoid accumulating errors), while during validation and testing the input comprises the encoding vector and the predicted value of the previous time step. From the decoder output $h_t$ at time t, the predicted value at time t can be obtained; a linear layer is usually added to adjust the dimension of the target output sequence, as shown in formula (7):
Y_t = linear(h_t); (7)
finally, by outputting the predicted values one by one, the P-step predicted values are obtained:
Y = {Y_{T+1}, Y_{T+2}, …, Y_{T+P}}
wherein Y_i = {y^(1), y^(2), …, y^(m)}.
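The teacher-forcing scheme described above (the real previous value fed back during training, the model's own previous prediction fed back during validation and testing) can be sketched as follows; `step` is a toy stand-in for the LSTM cell plus the linear layer of formula (7), so the numbers are illustrative only:

```python
import numpy as np

def decode(step_fn, C, Y_true, P, teacher_forcing):
    # first input is 0; state initialization from C is folded into step_fn here
    preds, prev = [], np.zeros_like(C)
    for t in range(P):
        y = step_fn(prev, C)              # one decoder step -> predicted value
        preds.append(y)
        # training: feed back the real value (avoids accumulated error);
        # validation/test: feed back the previous prediction
        prev = Y_true[t] if teacher_forcing else y
    return np.array(preds)

step = lambda prev, C: 0.5 * prev + C     # toy stand-in for LSTM + linear layer
C = np.array([1.0])
Y_true = np.array([[2.0], [2.0], [2.0]])
train_out = decode(step, C, Y_true, P=3, teacher_forcing=True)
free_out = decode(step, C, Y_true, P=3, teacher_forcing=False)
```

The two rollouts diverge after the first step, which is exactly the accumulated-error effect the training-time teacher forcing avoids.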
As shown in fig. 3, the many-to-many time sequence prediction model is trained according to a Bagging ensemble algorithm based on sliding-window subsequence random sampling, specifically as follows:
the input multi-dimensional cloud server historical KPIs time sequence data are segmented in a sliding-window manner to obtain a set of subsequences; the subsequence set is then sampled by random sampling without replacement, and to ensure the diversity of the individual predictors, two thirds of the subsequences are taken at random for each training set, finally obtaining M subsequence sets; each subsequence set is used as a training set to train the constructed many-to-many time sequence prediction model, yielding an individual predictor; finally the individual predictors are combined to obtain the ensemble predictor, namely the trained many-to-many time sequence prediction model.
In this embodiment, the sliding-window-based random sampling of subsequences is shown in fig. 4, where N is the length of the training set, T is the length of the time sequence input to the LSTM, and P is the number of predicted steps. To fully mine the subsequence patterns, the sliding-window stride is set to 1, so the number of subsequences finally obtained is N-(T+P)+1. After the subsequence set is obtained, random sampling without replacement is adopted; to ensure the diversity of the individual predictors, only two thirds of the subsequences are taken at random for each training set, yielding M small data sets in total. On each small data set a many-to-many individual predictor is trained using the attention-based stacked-LSTM seq2seq model, giving M many-to-many individual predictors in total, and all the individual predictors are combined into the ensemble predictor, namely the trained many-to-many time sequence prediction model.
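A sketch of this sampling scheme, assuming a one-dimensional series for brevity (the function names are illustrative): a stride-1 window of length T+P yields N-(T+P)+1 subsequences, and each of the M training sets takes two thirds of them without replacement:

```python
import numpy as np

def subsequences(series, T, P):
    # stride-1 sliding window: N-(T+P)+1 (input, target) pairs
    N = len(series)
    return [(series[i:i + T], series[i + T:i + T + P])
            for i in range(N - (T + P) + 1)]

def bagging_sets(subs, M, rng):
    # each individual predictor trains on 2/3 of the subsequences,
    # sampled without replacement, to keep the predictors diverse
    n = len(subs)
    k = (2 * n) // 3
    return [[subs[j] for j in rng.choice(n, size=k, replace=False)]
            for _ in range(M)]

series = np.arange(100)                       # toy stand-in for one KPI series
subs = subsequences(series, T=20, P=5)
sets = bagging_sets(subs, M=5, rng=np.random.default_rng(0))
```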
The structure of the multivariate anomaly detection model is shown in fig. 5. The multivariate anomaly detection model is an anomaly detection model based on outlier generation, in which the generator and the discriminator are constructed with multi-layer fully-connected neural networks. Its construction and training are completed by a pseudo-label data generation method based on a stacked autoencoder and K-Means++ clustering, together with an unsupervised multivariate anomaly detection training method based on a multi-target generative adversarial network, specifically as follows:
a stacked autoencoder is adopted to reduce the dimensionality of the cloud server KPIs (Key Performance Indicators) time sequence data comprising multiple variables; the dimension-reduced time sequence data are clustered with the K-Means++ algorithm to obtain K clusters; the cluster containing the fewest data points is removed as a suspected abnormal data cluster, and the data of the remaining clusters are given a pseudo label representing normal data;
several sub-generators with mutual differences are adopted to generate potential abnormal data; the generated data are combined with the data pseudo-labelled as normal and input into the discriminator for training; the multi-target generator generates abnormal data as similar to real data as possible, while the discriminator distinguishes the generated abnormal data from normal data as well as possible; the finally obtained discriminator serves as a binary anomaly detection classifier, namely the trained multivariate anomaly detection model.
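The pseudo-labelling step, dropping the smallest cluster as suspected anomalies, can be sketched as follows (the helper name and toy cluster ids are illustrative; in the method the ids would come from K-Means++ on the SAE-reduced features):

```python
import numpy as np

def pseudo_normal(points, cluster_ids, k):
    # the cluster with the fewest members is treated as suspected abnormal
    counts = np.bincount(cluster_ids, minlength=k)
    suspect = int(counts.argmin())
    keep = cluster_ids != suspect           # everything else is pseudo-labelled normal
    return points[keep], keep

pts = np.arange(12).reshape(6, 2)           # toy dimension-reduced points
ids = np.array([0, 0, 0, 1, 1, 2])          # toy cluster assignments, K = 3
normal_pts, mask = pseudo_normal(pts, ids, k=3)
```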
Further, a stacked autoencoder is used to reduce the dimensionality of the complex time sequence; the stacked autoencoder is composed of several single-layer autoencoders, each realized as a fully-connected neural network; a single-layer autoencoder is an unsupervised artificial neural network whose output target is its input, and the autoencoder works in two stages:
the first stage is the encoding stage, in which the autoencoder (AE) encodes the input data to obtain the hidden state a(X), as shown in formula (8):
a(X) = f(W_1 X + b_1); (8)
wherein X is the input, W_1 is a weight vector, b_1 is a bias unit, and f is a nonlinear activation function;
the second stage is the decoding stage, in which the autoencoder (AE) reconstructs the encoded a(X); the decoding yields X′ close to the input, as shown in formula (9):
X′ = f(W_2 a(X) + b_2); (9)
wherein W_2 is a weight vector and b_2 is a bias unit; when the output X′ is equal or close to the input X, the hidden layer state a(X) can be regarded as an abstract feature representation of the input data;
Stacked autoencoders (SAEs) learn the input data layer by layer by simulating the multi-layer way in which the brain perceives data, extracting deeper and more abstract features. FIG. 6 illustrates the training process of stacked autoencoders (SAEs): an SAE comprises several single-layer autoencoders, each mapping its input to a hidden layer; after the first autoencoder (AE) is trained, its reconstruction layer is discarded and its hidden layer becomes the input layer of the second single-layer autoencoder (AE), and likewise for subsequent layers, each layer's autoencoder (AE) being trained with the same objective function and optimization algorithm; the complex time sequence is passed through the stacked autoencoder to obtain the dimension-reduced feature representation.
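Formulas (8) and (9) and the layer-wise stacking can be sketched in NumPy (untrained random weights, sigmoid as the nonlinearity f; names and shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(X, W1, b1, W2, b2):
    a = sigmoid(W1 @ X + b1)        # encoding: a(X) = f(W1 X + b1), formula (8)
    X_rec = sigmoid(W2 @ a + b2)    # decoding: X' = f(W2 a(X) + b2), formula (9)
    return a, X_rec

def stacked_encode(X, encoders):
    # after each AE is trained its reconstruction layer is discarded,
    # so encoding with the stack is just a chain of the kept encoders
    for W1, b1 in encoders:
        X = sigmoid(W1 @ X + b1)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=8)                                  # 8-dim toy input
enc1 = (rng.normal(size=(4, 8)), rng.normal(size=4))    # layer 1: 8 -> 4
enc2 = (rng.normal(size=(2, 4)), rng.normal(size=2))    # layer 2: 4 -> 2
a1, X_rec = ae_forward(X, *enc1, rng.normal(size=(8, 4)), rng.normal(size=8))
code = stacked_encode(X, [enc1, enc2])
```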
Further, the dimension-reduced complex time sequence data are clustered using the K-Means++ clustering algorithm; K-Means is one of the classic clustering algorithms; the core idea of the K-Means++ seeding is that the first cluster centre is selected at random, and when the i-th cluster centre (i > 1) is selected, points farther from the previous i-1 cluster centres have a higher probability of being chosen; in K-Means++ clustering, the distance between a sample and a cluster centre and the probability that a sample point X is selected as a cluster centre are computed as shown in formulas (10) and (11) respectively:
d(X, C) = sqrt( Σ_{i=1}^{m} (x_i - c_i)^2 ); (10)
P(X) = d(X)^2 / Σ_{X′} d(X′)^2; (11)
where m is the dimension of the data points, and x_i and c_i respectively denote the i-th dimension of the sample point X and of the cluster centre C.
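The seeding rule of formulas (10) and (11) can be sketched as follows (a minimal version of the K-Means++ initialization only, not the full clustering; the function name is illustrative):

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    # first centre: uniform at random
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance d(X, C)^2 to the nearest already-chosen centre
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        # P(X) proportional to d(X)^2: far points are likelier to be picked
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)

X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])   # two separated toy blobs
centres = kmeanspp_init(X, k=2, rng=np.random.default_rng(1))
```

Because points at distance 0 from a chosen centre get probability 0, the second centre here is guaranteed to come from the other blob.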
As shown in fig. 5, the generator and the discriminator both adopt multi-layer fully-connected neural networks. Abnormal data {G_1(z_1), G_2(z_2), G_3(z_3), …, G_k(z_k)} are generated by k generators, where z_i is the distribution of the input random data; the abnormal data generated by the sub-generators are pooled, and finally the generated abnormal data together with the data pseudo-labelled as normal are input into the discriminator for training. At the start of training, the generator G cannot generate diverse outliers, so the discriminator D obtains only a rough boundary; but through repeated iterations of the mini-max game between G and D, the generator G gradually learns the generation mechanism of normal data and produces diverse abnormal data. When the mini-max game reaches Nash equilibrium, that is, when the discriminator can hardly distinguish normal data from generated abnormal data, training of the generator is stopped while the discriminator continues to be trained, further reducing its discrimination error. Finally the discriminator can correctly describe the boundary between normal and abnormal data, and for a given data point it outputs the probability that the point is normal data.
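Assembling one discriminator training batch from the k sub-generators and the pseudo-labelled normal data can be sketched as follows (the generators here are plain callables standing in for the fully-connected networks; all names are illustrative):

```python
import numpy as np

def discriminator_batch(normal, generators, zs):
    # pool the outputs G_1(z_1)..G_k(z_k) as generated abnormal data (label 0)
    fake = np.concatenate([G(z) for G, z in zip(generators, zs)])
    # pseudo-labelled normal data get label 1
    X = np.concatenate([normal, fake])
    y = np.concatenate([np.ones(len(normal)), np.zeros(len(fake))])
    return X, y

rng = np.random.default_rng(0)
normal = rng.normal(size=(6, 2))
generators = [lambda z: z + 5.0, lambda z: z - 5.0]   # toy sub-generators, k = 2
zs = [rng.normal(size=(3, 2)), rng.normal(size=(3, 2))]
Xb, yb = discriminator_batch(normal, generators, zs)
```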
S3, predicting KPIs data of the cloud server at a future moment by using a trained many-to-many time sequence prediction model;
S4, anomaly detection is performed on the predicted future-time KPIs data using the trained multivariate anomaly detection model to obtain the abnormal probability of each future-time data point;
and S5, using the abnormal probability of the cloud server's future-time KPIs data points output by the trained multivariate anomaly detection model, the threshold of the abnormal probability is set to 0.5; a data point whose abnormal probability is larger than the threshold is judged to be an abnormal data point, otherwise it is a normal data point.
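Step S5 reduces to a comparison against the threshold; a one-line sketch (the probabilities below are made-up examples):

```python
def classify(anomaly_probs, threshold=0.5):
    # a point whose anomaly probability exceeds the threshold is flagged abnormal
    return [p > threshold for p in anomaly_probs]

flags = classify([0.2, 0.7, 0.5, 0.9])   # 0.5 itself is not above the threshold
```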

Claims (10)

1. A two-stage cloud server unsupervised anomaly prediction method is characterized by comprising the following steps:
S1, performing missing value processing, data formatting and data normalization on the collected historical KPIs data of the cloud server;
S2, training a many-to-many time sequence prediction model and a multivariate anomaly detection model on the preprocessed historical KPIs time sequence data of the cloud server;
S3, predicting the KPIs data of the cloud server at future moments by using the trained many-to-many time sequence prediction model;
S4, performing anomaly detection on the predicted future-time KPIs data by using the trained multivariate anomaly detection model to obtain the abnormal probability of each future-time data point;
S5, setting an abnormal probability threshold and judging from the abnormal probability of each data point whether it is an abnormal data point, wherein a data point whose abnormal probability is larger than the threshold is considered abnormal and otherwise normal, thereby obtaining the anomaly prediction result.
2. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 1, wherein in step S1, the missing value processing means that when the number of consecutive missing values in a certain row or column does not exceed 5, each missing value is filled in with the average of the latest 24 non-missing values; when the number of consecutive missing values in a certain row or column exceeds 5, the missing values are removed directly;
the data formatting means converting a categorical time sequence of a given dimension into a numerical time sequence by enumeration;
and the data normalization means normalizing the cloud server KPIs data after missing value processing and data formatting so that the data are distributed in [0,1].
3. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 1, wherein in step S2, the many-to-many time series prediction model comprises an encoder and a decoder, the encoder is configured to fully extract features of KPIs time series data, and the decoder is configured to decode the extracted features and output predicted values at future time; the input time sequence is
X = {X_1, X_2, …, X_T}
wherein X_i = {x^(1), x^(2), …, x^(m)}; the predicted P-step KPIs time sequence at future moments is
Y = {Y_{T+1}, Y_{T+2}, …, Y_{T+P}}
wherein Y_i = {y^(1), y^(2), …, y^(m)}; the input sequence X is a multivariate time sequence, T represents the length of X, and m represents the dimension of X; the predicted target time sequence Y is a multivariate time sequence, where P represents the number of predicted steps;
the encoder is realized with a multi-layer stacked LSTM neural network, the aim being to fully extract the temporal features of the KPIs time sequence data of multiple cloud servers; an attention mechanism is adopted to provide weighted feature input, namely the coding vector, for the training of the prediction model, specifically as follows:
the state transfer of the encoder's LSTM units needs to consider both the hidden state of the previous LSTM unit and the hidden state of the next-layer LSTM unit, that is:
(c_t^(k), h_t^(k)) = LSTM(c_{t-1}^(k), h_{t-1}^(k), h_t^(k-1))
wherein c_t^(k) and h_t^(k) respectively represent the memory unit state and the hidden layer state of the k-th layer at moment t, L represents the number of stacked LSTM layers, and T represents the length of the input time sequence; through the feature extraction of the multi-layer LSTM, the hidden layer vectors h_1^(L), h_2^(L), …, h_T^(L) of each moment are finally obtained, which for convenience of description are written h_1, h_2, …, h_T;
the degree of correlation between the hidden layer vectors h_1, h_2, …, h_T of the encoder's LSTM units and the decoder state D_t = Concatenate(S_{t-1}, Y′_{t-1}) is computed, where S_{t-1} denotes the hidden layer state of the decoder's LSTM unit at the previous moment, Y′_{t-1} denotes the real value at the previous moment for the decoder's LSTM unit, and D_t denotes the vector obtained by concatenating S_{t-1} and Y′_{t-1} by columns; S_{t-1} and Y′_{t-1} are concatenated before computing the correlation so that the real value of the previous moment guides the correlation calculation; finally the normalized weight a_ij of each hidden layer vector is obtained with softmax normalization; the calculation formulas are as follows:
e_ij = V_a^T tanh(W_a D_i + U_a h_j)
a_ij = exp(e_ij) / Σ_{k=1}^{T} exp(e_ik)
wherein e_ij represents the degree of correlation between the decoder state D_i at the i-th moment and the j-th hidden layer state h_j of the encoder; e_ij can be learned through a neural network, where V_a, W_a and U_a are weight parameters to be learned; considering the effectiveness of historical time sequence information, a soft attention mechanism (Soft Attention) is adopted to assign weights to the states of all encoder hidden layers;
the encoder hidden layer states h_1, h_2, …, h_T are weighted and summed to obtain the coding vector C_i corresponding to the i-th moment of the decoder; the calculation formula is as follows:
C_i = Σ_{j=1}^{T} a_ij h_j
4. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 3, wherein the decoder is implemented with a single-layer LSTM neural network; the encoded features are decoded by the single-layer LSTM with real data as input, and the values of the KPIs data of multiple cloud servers at future moments are output, constructing a many-to-many time sequence prediction model, specifically as follows:
the state update of the decoder's LSTM is as follows:
S_t = LSTM(S_{t-1}, Y′_{t-1}, C)
wherein S_{t-1} represents the hidden layer state at the previous moment, Y′_{t-1} represents the real value at the previous moment, S_{t-1} is composed of (h_{t-1}, c_{t-1}), and C represents the coding vector output by the encoder; at the first moment the decoder uses the coding vector C to initialize the hidden layer state and the memory unit state, and the input is 0; at later moments the hidden layer state is updated as in an ordinary LSTM, but during training the input comprises the coding vector and the real value at the previous moment, while during validation and testing the input comprises the coding vector and the predicted value at the previous moment; from the output h_t of the decoder at time t the predicted value at time t can be obtained, and a linear layer is usually added to adjust the dimension of the target output sequence, as shown in formula (7):
Y_t = linear(h_t); (7)
finally, by outputting the predicted values one by one, the P-step predicted values are obtained:
Y = {Y_{T+1}, Y_{T+2}, …, Y_{T+P}}
wherein Y_i = {y^(1), y^(2), …, y^(m)}.
5. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 1, wherein the many-to-many time sequence prediction model is trained according to a Bagging ensemble algorithm based on sliding-window subsequence random sampling, specifically as follows:
the input multi-dimensional cloud server historical KPIs time sequence data are segmented in a sliding-window manner to obtain a set of subsequences; the subsequence set is then sampled by random sampling without replacement, and to ensure the diversity of the individual predictors, two thirds of the subsequences are taken at random for each training set, finally obtaining M subsequence sets; each subsequence set is used as a training set to train the constructed many-to-many time sequence prediction model, yielding an individual predictor; finally the individual predictors are combined to obtain the ensemble predictor, namely the trained many-to-many time sequence prediction model.
6. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 1, wherein in step S2, the multivariate anomaly detection model is an anomaly detection model based on outlier generation, in which the generator and the discriminator are constructed with multi-layer fully-connected neural networks; its construction and training are completed by a pseudo-label data generation method based on a stacked autoencoder and K-Means++ clustering, together with an unsupervised multivariate anomaly detection training method based on a multi-target generative adversarial network, specifically as follows:
a stacked autoencoder is adopted to reduce the dimensionality of the cloud server KPIs (Key Performance Indicators) time sequence data comprising multiple variables; the dimension-reduced time sequence data are clustered with the K-Means++ algorithm to obtain K clusters; the cluster containing the fewest data points is removed as a suspected abnormal data cluster, and the data of the remaining clusters are given a pseudo label representing normal data;
several sub-generators with mutual differences are adopted to generate potential abnormal data; the generated data are combined with the data pseudo-labelled as normal and input into the discriminator for training; the multi-target generator generates abnormal data as similar to real data as possible, while the discriminator distinguishes the generated abnormal data from normal data as well as possible; the finally obtained discriminator serves as a binary anomaly detection classifier, namely the trained multivariate anomaly detection model.
7. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 6, wherein a stacked autoencoder is used to reduce the dimensionality of the complex time sequence; the stacked autoencoder is composed of several single-layer autoencoders, each realized as a fully-connected neural network; a single-layer autoencoder is an unsupervised artificial neural network whose output target is its input, and the autoencoder works in two stages:
the first stage is the encoding stage, in which the autoencoder (AE) encodes the input data to obtain the hidden state a(X), as shown in formula (8):
a(X) = f(W_1 X + b_1); (8)
wherein X is the input, W_1 is a weight vector, b_1 is a bias unit, and f is a nonlinear activation function;
the second stage is the decoding stage, in which the autoencoder (AE) reconstructs the encoded a(X); the decoding yields X′ close to the input, as shown in formula (9):
X′ = f(W_2 a(X) + b_2); (9)
wherein W_2 is a weight vector and b_2 is a bias unit; when the output X′ is equal or close to the input X, the hidden layer state a(X) can be regarded as an abstract feature representation of the input data;
stacked autoencoders (SAEs) learn the input data layer by layer by simulating the multi-layer way in which the brain perceives data, extracting deeper and more abstract features; the stacked autoencoders (SAEs) comprise several single-layer autoencoders, each mapping its input to a hidden layer; after the first autoencoder (AE) is trained, its reconstruction layer is discarded and its hidden layer becomes the input layer of the second single-layer autoencoder (AE), and likewise for subsequent layers, each layer's autoencoder (AE) being trained with the same objective function and optimization algorithm; the complex time sequence is passed through the stacked autoencoder to obtain the dimension-reduced feature representation.
8. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 6, wherein the dimension-reduced complex time sequence data are clustered using the K-Means++ clustering algorithm; K-Means is one of the classic clustering algorithms; the core idea of the K-Means++ seeding is that the first cluster centre is selected at random, and when the i-th cluster centre (i > 1) is selected, points farther from the previous i-1 cluster centres have a higher probability of being chosen; in K-Means++ clustering, the distance between a sample and a cluster centre and the probability that a sample point X is selected as a cluster centre are computed as shown in formulas (10) and (11) respectively:
d(X, C) = sqrt( Σ_{i=1}^{m} (x_i - c_i)^2 ); (10)
P(X) = d(X)^2 / Σ_{X′} d(X′)^2; (11)
where m is the dimension of the data points, and x_i and c_i respectively denote the i-th dimension of the sample point X and of the cluster centre C.
9. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 6, wherein the generator and the discriminator both adopt multi-layer fully-connected neural networks; abnormal data {G_1(z_1), G_2(z_2), G_3(z_3), …, G_k(z_k)} are generated by k generators, where z_i is the distribution of the input random data; the abnormal data generated by the sub-generators are pooled, and the generated abnormal data together with the data pseudo-labelled as normal are input into the discriminator for training; at the start of training, the generator G cannot generate diverse outliers, so the discriminator D obtains only a rough boundary, but through repeated iterations of the mini-max game between G and D, the generator G gradually learns the generation mechanism of normal data and produces diverse abnormal data; when the mini-max game reaches Nash equilibrium, that is, when the discriminator can hardly distinguish normal data from generated abnormal data, training of the generator is stopped while the discriminator continues to be trained, further reducing its discrimination error; finally the discriminator can correctly describe the boundary between normal and abnormal data, and for a given data point it outputs the probability that the point is normal data.
10. The unsupervised anomaly prediction method for the two-stage cloud server according to claim 1, wherein in step S5, for the abnormal probability of the cloud server's future-time KPIs data points output by the trained multivariate anomaly detection model, the threshold of the abnormal probability is set to 0.5; a data point whose abnormal probability is larger than the threshold is judged to be an abnormal data point, otherwise it is a normal data point.
CN202010505118.5A 2020-06-05 2020-06-05 Two-stage cloud server unsupervised anomaly prediction method Pending CN111914873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010505118.5A CN111914873A (en) 2020-06-05 2020-06-05 Two-stage cloud server unsupervised anomaly prediction method


Publications (1)

Publication Number Publication Date
CN111914873A true CN111914873A (en) 2020-11-10

Family

ID=73237438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010505118.5A Pending CN111914873A (en) 2020-06-05 2020-06-05 Two-stage cloud server unsupervised anomaly prediction method

Country Status (1)

Country Link
CN (1) CN111914873A (en)


Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416662A (en) * 2020-11-26 2021-02-26 清华大学 Multi-time series data anomaly detection method and device
CN112487053A (en) * 2020-11-27 2021-03-12 重庆医药高等专科学校 Abnormal control extraction working method for mass financial data
CN112685950A (en) * 2020-12-02 2021-04-20 山东省计算中心(国家超级计算济南中心) Method, system and equipment for detecting abnormality of ocean time sequence observation data
CN112685950B (en) * 2020-12-02 2022-05-20 山东省计算中心(国家超级计算济南中心) Method, system and equipment for detecting abnormality of ocean time sequence observation data
CN112527860A (en) * 2020-12-05 2021-03-19 东南大学 Method for improving typhoon track prediction
CN112488235A (en) * 2020-12-11 2021-03-12 江苏省特种设备安全监督检验研究院 Elevator time sequence data abnormity diagnosis method based on deep learning
CN112654060A (en) * 2020-12-18 2021-04-13 中国计量大学 Device abnormality detection method and system
CN112597704A (en) * 2020-12-24 2021-04-02 东北大学 Engine abnormity reason analysis method, system, equipment and medium
CN112671768A (en) * 2020-12-24 2021-04-16 四川虹微技术有限公司 Abnormal flow detection method and device, electronic equipment and storage medium
CN112597704B (en) * 2020-12-24 2024-02-06 东北大学 Engine abnormality cause analysis method, system, equipment and medium
CN112783740A (en) * 2020-12-30 2021-05-11 科大国创云网科技有限公司 Server performance prediction method and system based on time series characteristics
CN112800403B (en) * 2021-01-05 2024-05-03 北京小米松果电子有限公司 Method, device and medium for generating prediction model and predicting fingerprint identification abnormality
CN112800403A (en) * 2021-01-05 2021-05-14 北京小米松果电子有限公司 Method, apparatus and medium for generating prediction model and predicting fingerprint recognition abnormality
CN112991579A (en) * 2021-01-14 2021-06-18 北京航空航天大学 Helicopter mobile part abnormity detection method based on generation countermeasure network
CN112784965A (en) * 2021-01-28 2021-05-11 广西大学 Large-scale multi-element time series data abnormity detection method oriented to cloud environment
CN112784965B (en) * 2021-01-28 2022-07-29 广西大学 Large-scale multi-element time series data anomaly detection method oriented to cloud environment
CN113127705A (en) * 2021-04-02 2021-07-16 西华大学 Heterogeneous bidirectional generation countermeasure network model and time sequence anomaly detection method
CN113242207A (en) * 2021-04-02 2021-08-10 河海大学 Iterative clustering network flow anomaly detection method
CN113242207B (en) * 2021-04-02 2022-06-17 河海大学 Iterative clustering network flow anomaly detection method
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113517046A (en) * 2021-04-15 2021-10-19 中南大学 Heterogeneous data feature fusion method in electronic medical record, prediction method and system based on fusion features and readable storage medium
CN113517046B (en) * 2021-04-15 2023-11-07 中南大学 Heterogeneous data feature fusion method in electronic medical record, fusion feature-based prediction method, fusion feature-based prediction system and readable storage medium
CN113298128A (en) * 2021-05-14 2021-08-24 西安理工大学 Cloud server anomaly detection method based on time series clustering
CN113298128B (en) * 2021-05-14 2024-04-02 西安理工大学 Cloud server anomaly detection method based on time series clustering
CN112988550A (en) * 2021-05-21 2021-06-18 神威超算(北京)科技有限公司 Server failure prediction method, device and computer readable medium
CN113037783B (en) * 2021-05-24 2021-08-06 中南大学 Abnormal behavior detection method and system
CN113037783A (en) * 2021-05-24 2021-06-25 中南大学 Abnormal behavior detection method and system
CN113255573A (en) * 2021-06-17 2021-08-13 成都东方天呈智能科技有限公司 Pedestrian re-identification method based on mixed cluster center label learning and storage medium
CN113435656B (en) * 2021-07-06 2023-05-30 郑州航空工业管理学院 Project progress visual management method and system
CN113435656A (en) * 2021-07-06 2021-09-24 郑州航空工业管理学院 Visual management method and system for project progress
CN113568774A (en) * 2021-07-27 2021-10-29 东华大学 Real-time anomaly detection method for multi-dimensional time sequence data by using unsupervised deep neural network
CN113568774B (en) * 2021-07-27 2024-01-16 东华大学 Real-time anomaly detection method for multi-dimensional time series data using unsupervised deep neural network
CN113705981A (en) * 2021-08-03 2021-11-26 彭亮 Big data based anomaly monitoring method and device
CN113705981B (en) * 2021-08-03 2022-08-30 北京鼎信泰德科技有限公司 Big data based anomaly monitoring method and device
WO2023045829A1 (en) * 2021-09-24 2023-03-30 中兴通讯股份有限公司 Service abnormality prediction method and device, storage medium, and electronic device
CN113962142A (en) * 2021-09-26 2022-01-21 西安交通大学 Data center temperature prediction method and system based on two-segment type LSTM
CN113962142B (en) * 2021-09-26 2024-04-19 西安交通大学 Data center temperature prediction method and system based on two-section LSTM
CN114240243B (en) * 2021-12-30 2022-10-11 无锡雪浪数制科技有限公司 Rectifying tower product quality prediction method and device based on dynamic system identification
CN114240243A (en) * 2021-12-30 2022-03-25 无锡雪浪数制科技有限公司 Rectifying tower product quality prediction method and device based on dynamic system identification
WO2023123779A1 (en) * 2021-12-30 2023-07-06 无锡雪浪数制科技有限公司 Dynamic system identification-based product quality prediction method and device for rectification column
CN114358422A (en) * 2022-01-04 2022-04-15 中国工商银行股份有限公司 Research and development progress anomaly prediction method and device, storage medium and electronic equipment
CN114090396A (en) * 2022-01-24 2022-02-25 华南理工大学 Cloud environment multi-index unsupervised anomaly detection and root cause analysis method
CN114615051A (en) * 2022-03-09 2022-06-10 黄河水利职业技术学院 Network security detection method and system
CN114925808B (en) * 2022-04-15 2023-10-10 北京理工大学 Anomaly detection method based on incomplete time sequence in cloud network end resource
CN114925808A (en) * 2022-04-15 2022-08-19 北京理工大学 Anomaly detection method based on incomplete time sequence in cloud network end resource
WO2024013597A1 (en) * 2022-07-14 2024-01-18 International Business Machines Corporation Fully unsupervised pipeline for clustering anomalies detected in computerized systems
US11853019B1 (en) 2022-08-31 2023-12-26 Jinan Winson New Materials Technology Co., Ltd. Intelligent control of spunlace production line using classification of current production state of real-time production line data
CN115328062A (en) * 2022-08-31 2022-11-11 济南永信新材料科技有限公司 Intelligent control system for spunlace production line
CN115269357A (en) * 2022-09-23 2022-11-01 华南理工大学 Micro-service anomaly detection method based on call chain
CN115269357B (en) * 2022-09-23 2023-02-14 华南理工大学 Micro-service anomaly detection method based on call chain
CN116108371A (en) * 2023-04-13 2023-05-12 西华大学 Cloud service anomaly diagnosis method and system based on cascade anomaly generation network
CN116361673B (en) * 2023-06-01 2023-08-11 西南石油大学 Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116361673A (en) * 2023-06-01 2023-06-30 西南石油大学 Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116383096B (en) * 2023-06-06 2023-08-18 安徽思高智能科技有限公司 Micro-service system anomaly detection method and device based on multi-index time sequence prediction
CN116383096A (en) * 2023-06-06 2023-07-04 安徽思高智能科技有限公司 Micro-service system anomaly detection method and device based on multi-index time sequence prediction

Similar Documents

Publication Publication Date Title
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN112784965B (en) Large-scale multi-element time series data anomaly detection method oriented to cloud environment
CN111967571B (en) Abnormality detection method and device based on MHMA
CN113312447B (en) Semi-supervised log anomaly detection method based on probability label estimation
CN113673346B (en) Motor vibration data processing and state identification method based on multiscale SE-Resnet
CN113642754B (en) Complex industrial process fault prediction method based on RF noise reduction self-coding information reconstruction and time convolution network
CN114386521A (en) Method, system, device and storage medium for detecting abnormality of time-series data
CN112257263B (en) Equipment residual life prediction system based on self-attention mechanism
Gu et al. An improved sensor fault diagnosis scheme based on TA-LSSVM and ECOC-SVM
Silva et al. Assets predictive maintenance using convolutional neural networks
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
CN117034143A (en) Distributed system fault diagnosis method and device based on machine learning
CN115905959A (en) Method and device for analyzing relevance fault of power circuit breaker based on defect factor
Tang et al. Prediction of bearing performance degradation with bottleneck feature based on LSTM network
You et al. sBiLSAN: Stacked bidirectional self-attention lstm network for anomaly detection and diagnosis from system logs
Li et al. Source-free domain adaptation framework for fault diagnosis of rotation machinery under data privacy
CN114626426A (en) Industrial equipment behavior detection method based on K-means optimization algorithm
WO2023231374A1 (en) Semi-supervised fault detection and analysis method and apparatus for mechanical device, terminal, and medium
CN116628612A (en) Unsupervised anomaly detection method, device, medium and equipment
CN114580472B (en) Large-scale equipment fault prediction method with repeated cause and effect and attention in industrial internet
Chen et al. A two-stage approach based on Bayesian deep learning for predicting remaining useful life of rolling element bearings
Li et al. Knowledge enhanced ensemble method for remaining useful life prediction under variable working conditions
CN113821401A (en) WT-GA-GRU model-based cloud server fault diagnosis method
Lin Intelligent Fault Diagnosis of Consumer Electronics Sensor in IoE via Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination