CN114647554A - Performance data monitoring method and device of distributed management cluster - Google Patents

Performance data monitoring method and device of distributed management cluster

Info

Publication number: CN114647554A
Application number: CN202210406970.6A
Authority: CN (China)
Prior art keywords: data, determining, target, model, target performance
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 陈壮壮, 钟瑞, 郑重, 高汉
Current Assignee: Industrial and Commercial Bank of China Ltd (ICBC) (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by Industrial and Commercial Bank of China Ltd (ICBC); priority to CN202210406970.6A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/3006: Monitoring arrangements where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a performance data monitoring method for a distributed management cluster, applicable to the technical field of artificial intelligence. The method comprises the following steps: acquiring performance time-series data of a distributed management cluster; classifying the performance time-series data using a classification model; when the performance time-series data are classified as periodic data, determining them to be target performance time-series data; predicting the target performance time-series data using a data prediction model to determine predicted data; and determining target abnormal data according to the predicted data, the actual data and a data anomaly identification model. The disclosure also provides a corresponding performance data monitoring apparatus, device, storage medium and program product.

Description

Performance data monitoring method and device of distributed management cluster
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to automatic operation and maintenance technologies, and more particularly to a method, an apparatus, a device, a storage medium, and a program product for monitoring performance data of a distributed management cluster.
Background
With the development of distributed technology, a distributed management cluster exists within a distributed framework to handle data registration, data sharing, data synchronization, and message publishing and subscribing for each node. When a large number of nodes register with the distributed management cluster, the cluster is prone to slow response and high data synchronization delay, so it is necessary to monitor whether a large number of nodes register suddenly and to perform emergency handling afterwards.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a performance data monitoring method, apparatus, device, medium, and program product for a distributed management cluster.
According to a first aspect of the present disclosure, a performance data monitoring method for a distributed management cluster is provided, including: acquiring performance time-series data of a distributed management cluster;
classifying the performance time-series data using a classification model;
determining the performance time-series data to be target performance time-series data when they are classified as periodic data;
predicting the target performance time-series data using a data prediction model to determine predicted data; and
determining target abnormal data according to the predicted data, the actual data and a data anomaly identification model.
According to an embodiment of the present disclosure, classifying the performance time-series data using a classification model includes:
preprocessing the performance time-series data, where the preprocessing includes dimensionality reduction and normalization;
generating a two-dimensional picture from the preprocessed performance time-series data; and
inputting the two-dimensional picture into the classification model for classification.
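The preprocessing and picture-generation steps above can be sketched in a few lines of NumPy. This is only an illustration under assumed conventions (min-max normalization, one pixel per time step); the function names and the image layout are not from the patent.

```python
import numpy as np

def preprocess(series: np.ndarray) -> np.ndarray:
    """Normalize a 1-D performance time series to [0, 1] (min-max)."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return np.zeros_like(series, dtype=float)
    return (series - lo) / (hi - lo)

def series_to_image(series: np.ndarray, height: int = 32) -> np.ndarray:
    """Rasterize a normalized series into a (height x len) binary image,
    with time on the x-axis and node count on the y-axis."""
    norm = preprocess(series)
    img = np.zeros((height, len(norm)), dtype=np.uint8)
    rows = np.clip(((1.0 - norm) * (height - 1)).round().astype(int), 0, height - 1)
    img[rows, np.arange(len(norm))] = 1  # one pixel per time step
    return img

# The resulting picture would then be fed to the (separately trained) CNN classifier.
ts = np.array([10, 12, 15, 13, 10, 12, 15, 13], dtype=float)
image = series_to_image(ts, height=8)
```

A periodic series produces a repeating pixel pattern across the image, which is exactly what a picture classifier can pick up.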
According to an embodiment of the present disclosure, before the predicted data are determined, the method further includes:
constructing a data prediction model according to the target performance time-series data.
According to an embodiment of the present disclosure, constructing a data prediction model according to the target performance time-series data includes:
determining an autocorrelation coefficient and a partial autocorrelation coefficient according to the target performance time-series data;
determining the type of the data prediction model according to the autocorrelation coefficient and the partial autocorrelation coefficient; and
determining the parameters of the data prediction model using least squares estimation.
According to an embodiment of the present disclosure, determining target abnormal data according to the predicted data, the actual data and the data anomaly identification model includes:
determining initial abnormal data according to the predicted data and the actual data; and
determining target abnormal data according to the initial abnormal data and the data anomaly identification model.
According to an embodiment of the present disclosure, determining initial abnormal data from the predicted data and the actual data includes:
determining the difference between the predicted data and the actual data at the same time point; and
when the difference is larger than a first preset threshold, determining the actual data to be initial abnormal data.
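The threshold test above can be sketched in a few lines of Python, assuming (as seems intended) that the absolute difference between predicted and actual values is compared against the first preset threshold; the names and numbers are illustrative.

```python
import numpy as np

def initial_anomalies(predicted: np.ndarray, actual: np.ndarray, threshold: float) -> np.ndarray:
    """Flag time points where |predicted - actual| exceeds the first preset threshold."""
    return np.abs(predicted - actual) > threshold

# Hypothetical predicted vs. actual registered-node counts at four time points:
pred = np.array([100.0, 102.0, 98.0, 101.0])
act = np.array([101.0, 150.0, 97.0, 60.0])
flags = initial_anomalies(pred, act, threshold=10.0)
```

The flagged points are only candidates: the claims below pass them through the data anomaly identification model before confirming them.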
According to an embodiment of the present disclosure, determining target abnormal data according to the initial abnormal data and the data anomaly identification model includes:
training on the target performance time-series data as a training set to obtain the data anomaly identification model;
determining the path length of the initial abnormal data according to the data anomaly identification model; and
determining target abnormal data according to the path length of the initial abnormal data.
According to an embodiment of the present disclosure, the target performance time-series data includes first, second and third target performance time-series data, where the first target performance time-series data uses a time-series data window of days, the second a window of weeks, and the third a window of months.
According to an embodiment of the present disclosure, constructing a data prediction model according to the target performance time-series data further includes:
determining a first autocorrelation coefficient and a first partial autocorrelation coefficient according to the first target performance time-series data;
determining the type of a first data prediction model according to the first autocorrelation coefficient and the first partial autocorrelation coefficient; and
determining the parameters of the first data prediction model using least squares estimation.
According to an embodiment of the present disclosure, constructing a data prediction model according to the target performance time-series data further includes:
determining a second autocorrelation coefficient and a second partial autocorrelation coefficient according to the second target performance time-series data;
determining the type of a second data prediction model according to the second autocorrelation coefficient and the second partial autocorrelation coefficient; and
determining the parameters of the second data prediction model using least squares estimation.
According to an embodiment of the present disclosure, constructing a data prediction model according to the target performance time-series data further includes:
determining a third autocorrelation coefficient and a third partial autocorrelation coefficient according to the third target performance time-series data;
determining the type of a third data prediction model according to the third autocorrelation coefficient and the third partial autocorrelation coefficient; and
determining the parameters of the third data prediction model using least squares estimation.
According to an embodiment of the present disclosure, predicting the target performance time-series data using a data prediction model to determine predicted data includes:
predicting the first target performance time-series data using the first data prediction model to determine first predicted data;
predicting the second target performance time-series data using the second data prediction model to determine second predicted data; and
predicting the third target performance time-series data using the third data prediction model to determine third predicted data.
According to an embodiment of the present disclosure, training on the target performance time-series data as a training set to obtain a data anomaly identification model includes:
training on the first target performance time-series data as a first training set to obtain a first data anomaly identification model;
training on the second target performance time-series data as a second training set to obtain a second data anomaly identification model; and
training on the third target performance time-series data as a third training set to obtain a third data anomaly identification model.
According to an embodiment of the present disclosure, determining the path length of the initial abnormal data according to the data anomaly identification model includes:
determining a first path length of the initial abnormal data according to the first data anomaly identification model;
determining a second path length of the initial abnormal data according to the second data anomaly identification model; and
determining a third path length of the initial abnormal data according to the third data anomaly identification model.
According to an embodiment of the present disclosure, determining target abnormal data according to the path length of the initial abnormal data includes:
if the first, second and third path lengths of the initial abnormal data are all smaller than a second preset threshold, determining the initial abnormal data to be target abnormal data.
According to an embodiment of the present disclosure, determining target abnormal data according to the path length of the initial abnormal data further includes:
if at least one of the first, second and third path lengths of the initial abnormal data is greater than or equal to the second preset threshold, determining the initial abnormal data to be normal data; and
adding the actual data to the training set and retraining the data anomaly identification model.
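The two branches above reduce to a single predicate: an initial anomaly is confirmed only if its isolation path length is below the second preset threshold in all three window dimensions; otherwise it is treated as normal (and, per the claim, the actual data would be added back to the training set for retraining). The dictionary keys and the threshold value below are illustrative, not from the patent.

```python
def is_target_anomaly(path_lengths: dict, second_threshold: float) -> bool:
    """Confirm an initial anomaly only when the isolation path length is small
    (i.e. the point is easy to isolate) in ALL of the day, week and month
    dimensions; a single long path length means the point looks normal in
    at least one dimension."""
    return all(path_lengths[dim] < second_threshold for dim in ("day", "week", "month"))

# Hypothetical path lengths from the three data anomaly identification models:
confirmed = is_target_anomaly({"day": 3.1, "week": 2.8, "month": 3.5}, second_threshold=5.0)
rejected = is_target_anomaly({"day": 3.1, "week": 2.8, "month": 7.2}, second_threshold=5.0)
```

Requiring agreement across all three dimensions is what suppresses false alarms from legitimate batch jobs that look anomalous at only one time scale.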
According to an embodiment of the present disclosure, after the target abnormal data are determined, the method further includes:
generating alarm information according to the target abnormal data; and
sending the alarm information.
A second aspect of the present disclosure provides a performance data monitoring apparatus for a distributed management cluster, including: an acquisition module for acquiring performance time-series data of the distributed management cluster;
a classification module for classifying the performance time-series data using a classification model;
a first determining module for determining the performance time-series data to be target performance time-series data when they are classified as periodic data;
a data prediction module for predicting the target performance time-series data using a data prediction model to determine predicted data; and
a second determining module for determining target abnormal data according to the predicted data, the actual data and a data anomaly identification model.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the performance data monitoring method of the distributed management cluster described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to perform the performance data monitoring method of the distributed management cluster described above.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the performance data monitoring method of the distributed management cluster described above.
According to the performance data monitoring method of the distributed management cluster provided by the embodiments of the present disclosure, the performance time-series data of the distributed management cluster are acquired and classified by the classification model so that periodic performance time-series data are identified as target performance time-series data; the target performance time-series data are input into the data prediction model for prediction, and the predicted data and the actual data are then input into the data anomaly identification model to finally determine the target abnormal data. Performance time-series data can thus be monitored in real time, and the accuracy of abnormal data detection is improved.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium and program product for performance data monitoring of a distributed management cluster according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of performance data monitoring of a distributed management cluster according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a performance data anomaly schematic for a distributed management cluster according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of determining target performance timing data according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of determining a data prediction model according to an embodiment of the present disclosure;
FIG. 6a schematically illustrates one of the flow charts of a method of determining target anomaly data according to an embodiment of the present disclosure;
FIG. 6b schematically illustrates a flow chart of a method of determining initial anomaly data according to an embodiment of the present disclosure;
FIG. 6c schematically illustrates a second flow chart of a method of determining target anomaly data according to an embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of a performance data monitoring apparatus of a distributed management cluster according to an embodiment of the present disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a method of performance data monitoring of a distributed management cluster according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
The terms appearing in the embodiments of the present disclosure are explained first:
distributed management cluster: the cluster responsible for data storage, data synchronization and data sharing in the field of distributed architecture provides a reliable way to store data accessed by distributed systems or machine clusters, such as commonly used zookeeper, etcd, Consul and other products.
Performance timing data: the performance time sequence data in the embodiment of the disclosure is the change condition of the number of the nodes registered by the distributed cluster to the management cluster along with time.
When a large number of nodes are registered in the distributed management cluster in a short time, the management cluster is easy to have slow response and high data synchronization delay in the cluster, so that the situation that the distributed cluster registers node data in the management cluster is monitored in real time, and the performance problems that the management cluster is slow in response, the cluster data delay is overlarge and the like caused by the fact that a large number of data are registered in the management cluster in a short time due to various abnormal conditions are solved.
Based on the above problem, an embodiment of the present disclosure provides a performance data monitoring method for a distributed management cluster, including: acquiring performance time-series data of a distributed management cluster; classifying the performance time-series data using a classification model; determining the performance time-series data to be target performance time-series data when they are classified as periodic data; predicting the target performance time-series data using a data prediction model to determine predicted data; and determining target abnormal data according to the predicted data, the actual data and a data anomaly identification model.
It should be noted that the performance data monitoring method and apparatus of the distributed management cluster provided by the present disclosure may be used in the financial field, and may also be used in any field other than the financial field.
Fig. 1 schematically illustrates an application scenario diagram of a performance data monitoring method, apparatus, device, medium, and program product of a distributed management cluster according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a performance data monitoring scenario. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server can register nodes to the distributed cluster according to the received data such as the user request and provide various services for the user.
It should be noted that the performance data monitoring method of the distributed management cluster provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the performance data monitoring apparatus of the distributed management cluster provided by the embodiment of the present disclosure may be generally disposed in the server 105. The performance data monitoring method of the distributed management cluster provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the performance data monitoring apparatus of the distributed management cluster provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
The performance data monitoring method of the distributed management cluster according to the disclosed embodiment will be described in detail with reference to fig. 2 to 6 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of a method for performance data monitoring of a distributed management cluster according to an embodiment of the present disclosure. FIG. 3 schematically illustrates a performance data anomaly schematic for a distributed management cluster according to an embodiment of the present disclosure.
As shown in fig. 2, the performance data monitoring method of the distributed management cluster of this embodiment includes operations S210 to S250, and the monitoring method may be executed by a server or other computing device.
Under normal conditions, the number of distributed-cluster registration nodes follows a regular pattern over a historical period; that is, from the historical performance time-series data, the data at the current time or at any other time can be predicted, and whether the number of registered nodes is abnormal can then be judged. As shown in fig. 3, points A and B are abnormal points. It should be noted that the method provided by the embodiments of the present disclosure may also be used for anomaly detection and monitoring of other performance data, for example CPU or energy consumption data.
The performance time-series data are first processed through operations S210 and S220.
In operation S210, the performance time-series data of the distributed management cluster are acquired.
In operation S220, the performance time-series data are classified using a classification model.
In operation S230, when the performance time-series data are classified as periodic data, they are determined to be the target performance time-series data.
According to an embodiment of the disclosure, the target performance time-series data includes first, second and third target performance time-series data, where the first target performance time-series data uses a time-series data window of days, the second a window of weeks, and the third a window of months.
In one example, the performance time-series data include periodic and non-periodic types, and the embodiments of the present disclosure provide a method for periodic data only; the performance time-series data therefore need to be classified so that periodic data can be identified. In actual service there are business requirements for large-batch operations, so at certain times the distributed management cluster registers more nodes than at other times; this is still a normal situation. The embodiments of the present disclosure therefore divide the time-series data window into three dimensions, day, week and month, and monitor and detect abnormal data in each of them in turn. Only when the performance time-series data are detected as abnormal in all three dimensions are they finally determined to be abnormal data, which improves the accuracy of abnormal data detection and reduces false alarms caused by inaccurate detection.
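The three window dimensions can be sketched as simple slices over one long series. The sampling rate (one sample per minute) is an assumption for illustration and is not stated in the patent.

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60  # assuming one sample per minute (illustrative)

def window_views(series: np.ndarray) -> dict:
    """Split one long performance series into the three window dimensions
    the method monitors: the most recent day, week and month of samples."""
    return {
        "day": series[-SAMPLES_PER_DAY:],
        "week": series[-7 * SAMPLES_PER_DAY:],
        "month": series[-30 * SAMPLES_PER_DAY:],
    }

# 60 days of per-minute node counts (here just a ramp for illustration):
series = np.arange(60 * SAMPLES_PER_DAY, dtype=float)
views = window_views(series)
```

Each view is then classified, predicted and checked for anomalies independently, and a point must fail in all three views to be confirmed.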
In one example, the classification model may be one commonly used in machine learning. Since the performance time-series data processed in the embodiments of the present disclosure are rendered as a two-dimensional picture with time and the number of registered nodes as its dimensions, the classification model may be, for example, a CNN (Convolutional Neural Network) model; the CNN, designed as a variant of the multilayer perceptron, is well suited to picture classification. A training set, a validation set and a test set need to be prepared in advance, and each sample carries a 0/1 label, where 0 represents periodic data and 1 represents non-periodic data. A typical CNN consists of three parts: 1. convolutional layers; 2. pooling layers; 3. fully connected layers. The convolutional layers extract local features from the image, the pooling layers greatly reduce the number of parameters, and the fully connected layers output the desired result. Before the classification model is used it needs to be trained, and the training process is as follows:
the CNN network model used in the embodiment of the disclosure is a ResNet network, which refers to a VGG19 network, is modified on the basis, and adds a residual error unit through a short circuit mechanism. The change is mainly reflected in that ResNet directly uses convolution of stride-2 for downsampling, and replaces the full-connection layer with a global average pool layer. An important design principle of ResNet is: when the feature map size is reduced by half, the number of feature maps is doubled, which maintains the complexity of the network layer. Compared with the common network, the ResNet adds a short circuit mechanism between every two layers, and thus residual error learning is formed. For the ResNet of 18-layer and 34-layer, which performs residual learning between two layers, when the network is deeper, it performs residual learning between three layers, the three-layer convolution kernels are 1x1, 3x3 and 1x1, respectively, and it is noted that the feature map number of the hidden layer is relatively small and is 1/4 of the output feature map number. 10000 pictures and corresponding labels are prepared in advance as a training set, wherein 5000 pieces of periodic data correspond to the labels of 0, 5000 pieces of non-periodic data correspond to the labels of 1; and sequentially inputting the training set into a ResNet network model for training, and training a classification model through a convolutional layer, a pooling layer and a full-link layer. After training is finished, testing is carried out by using a test set (1000 pictures, 700 periodic pictures and 300 non-periodic data) and a verification set (500 pictures, 300 periodic pictures and 200 non-periodic pictures), if the verification and testing effects are not good, fine adjustment is carried out on the network, and parameters such as the number of convolutional layers, the number of pooling layers, the learning rate and the like are adjusted. 
After training and testing, a trained model for classification is finally obtained. For the process of determining the target performance time-series data with the classification model, refer to operations S221 to S223.
In operation S240, the target performance timing data is predicted using a data prediction model to determine predicted data.
In operation S250, target abnormal data is determined according to the prediction data, the actual data, and the data abnormality recognition model.
In one example, the classified periodic data is predicted according to a data prediction model, which needs to be determined first. ARIMA is the autoregressive integrated moving average model: AR is the autoregressive component, whose model parameter p is the lag order; I is the integration (differencing) component, whose model parameter d is the number of differences required to convert a non-stationary series into a stationary one; and MA is the moving-average component, whose model parameter q is the lag order of the prediction error. The AR model describes the relationship between current and historical values. The MA model describes the relationship between the current value and the accumulated errors of the autoregressive part. For a non-stationary series, whose variance and mean are unstable, the usual procedure is to convert it into a stationary series and then model it through its lag values and random error values. A data prediction model is constructed from the target performance time series data: specifically, the value of q can be determined from the ACF (autocorrelation function) plot and the value of p from the PACF (partial autocorrelation function) plot; the specific construction process may refer to operations S231 to S233 shown in fig. 5.
The target performance time series data is predicted according to the data prediction model: the target performance time series data is used as the input of the data prediction model to obtain the predicted data, which may be three different values obtained from the three cycle dimensions of day, week and month. The predicted data is then compared with the actual data to further determine the target abnormal data.
In one example, since the predicted data has multiple time dimensions, taking the day dimension as an example, the predicted data obtained with day as the dimension is compared with the actual data value at the corresponding moment to determine whether the actual data value is initial abnormal data. The initial abnormal data is not the final abnormal data; further identification through the data abnormality identification model is needed to determine whether it is abnormal data. When the actual data value is identified as initial abnormal data in comparison with the predicted data under all three time dimensions and is also identified as abnormal by the data abnormality identification model, the actual data value is determined to be target abnormal data. For the specific identification process, refer to fig. 6a to 6c, which are not described again here.
According to the performance data monitoring method of the distributed management cluster provided by the embodiment of the disclosure, the performance time series data of the distributed management cluster is obtained; the performance time series data is classified according to the classification model to determine the periodic performance time series data as target performance time series data; the target performance time series data is input into the data prediction model for prediction; and the predicted data and the actual data are input into the data abnormality identification model to finally determine the target abnormal data. Real-time detection of the performance time series data can thus be realized, and the accuracy of abnormal data detection is improved.
FIG. 4 schematically illustrates a flow chart of a method of determining target performance timing data according to an embodiment of the disclosure.
As shown in fig. 4, operation S220 includes operations S221 through S223.
In operation S221, the performance time series data is preprocessed, where the preprocessing includes a dimensionality reduction process and a normalization process.
In operation S222, a two-dimensional picture is generated according to the pre-processed performance timing data.
In operation S223, the two-dimensional picture is input to the classification model for classification.
In one example, after the trained CNN (convolutional neural network) model is obtained, the input data are two-dimensional time series pictures of time and the number of registered nodes at that time, where the time is natural time and the number of registered nodes can be obtained by traversing the number of keys in the cluster. The two kinds of data are drawn into a two-dimensional picture, with the horizontal axis representing time and the vertical axis representing the number of registered nodes. The output result may be, for example, 0 or 1, where 0 represents periodic data and 1 represents non-periodic data. The specific classification process is as follows. The performance time series data is preprocessed, where preprocessing includes dimensionality reduction and normalization. Because the time data and the registered-node data often differ by several orders of magnitude, the performance time series data needs to be normalized after dimensionality reduction so that the time data and the node data are brought to the same order of magnitude, avoiding an excessive influence of overly large time values on training. After preprocessing, the time series data is converted into two-dimensional pictures, which are input into the trained model for classification; whether an input two-dimensional picture is a target two-dimensional picture is determined by the output result. When the output result is 0, the input picture is a target two-dimensional picture, so the performance time series data corresponding to it is determined to be periodic data; when the output result is 1, the input picture is a non-target two-dimensional picture, so the input performance time series data is determined to be non-periodic data.
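The normalization step above can be sketched as follows. Min-max scaling to [0, 1] is one common choice and is an assumption here, since the text only requires the time series and node-count series to reach the same order of magnitude:

```python
import numpy as np

def preprocess(timestamps, node_counts):
    # Min-max normalize both series to [0, 1] so that time values
    # (e.g. epoch seconds) and node counts share one order of magnitude.
    t = np.asarray(timestamps, dtype=float)
    n = np.asarray(node_counts, dtype=float)
    t_norm = (t - t.min()) / (t.max() - t.min())
    n_norm = (n - n.min()) / (n.max() - n.min())
    # Stack into a 2-row array: row 0 = time axis, row 1 = node counts,
    # ready to be rendered as a two-dimensional picture.
    return np.vstack([t_norm, n_norm])

pic = preprocess([1000, 2000, 3000], [50, 150, 100])
```

After this step both axes lie in [0, 1], so neither the raw time magnitude nor the node-count magnitude dominates training.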
After the classification of the performance time series data is completed to obtain the target performance time series data, the target performance time series data is predicted through the data prediction model. The creation of the data prediction model and the prediction process will be described next with reference to fig. 5.
Fig. 5 schematically shows a flow chart of a method of determining a data prediction model according to an embodiment of the present disclosure. As shown in fig. 5, operation S230 includes operations S231 through S233.
It should be noted that, because the target performance time series data in the embodiment of the present disclosure includes performance data of the three time series windows of day, week, and month, when creating the data prediction model, a corresponding data prediction model is created from the target performance time series data of each time dimension: a first data prediction model is created according to the first target performance timing data; a second data prediction model is created according to the second target performance timing data; and a third data prediction model is created according to the third target performance timing data. Different input target performance time series data yield different model parameters, but the method for creating the data prediction model is the same.
In operation S231, autocorrelation coefficients and partial autocorrelation coefficients are determined according to the target performance timing data.
According to an embodiment of the present disclosure, determining a first autocorrelation coefficient and a first partial autocorrelation coefficient according to the first target performance timing data; determining a second autocorrelation coefficient and a second partial autocorrelation coefficient according to the second target performance time series data; and determining a third autocorrelation coefficient and a third partial autocorrelation coefficient according to the third target performance time series data.
In operation S232, a type of a data prediction model is determined according to the autocorrelation coefficients and the partial autocorrelation coefficients.
According to an embodiment of the present disclosure, determining a type of a first data prediction model according to the first autocorrelation coefficients and the first partial autocorrelation coefficients; determining the type of a second data prediction model according to the second autocorrelation coefficients and the second partial autocorrelation coefficients; determining a type of a third data prediction model according to the third autocorrelation coefficients and the third partial autocorrelation coefficients.
In operation S233, parameters of the data prediction model are determined using least squares estimation.
According to an embodiment of the present disclosure, determining parameters of the first data prediction model using least squares estimation; determining parameters of the second data prediction model using a least squares estimation; determining parameters of the third data prediction model using least squares estimation.
In one example, the time series data is modeled with an ARIMA(p, d, q) data prediction model. ARIMA is the autoregressive integrated moving average model, where p is the lag order, d is the number of differences required to convert a non-stationary series into a stationary one, and q is the lag order of the prediction error. The AR model describes the relationship between current and historical values. The MA model describes the relationship between the current value and the accumulated errors of the autoregressive part. For a non-stationary series, whose variance and mean are unstable, the usual procedure is to convert it into a stationary series and then model it through its lag values and random error values.
The conventional ARIMA algorithm needs to go through the steps of differencing, stationarity testing, white-noise testing and ARMA prediction; since the embodiment of the disclosure uses a CNN model to classify periodic time series, the ARMA prediction step of the ARIMA algorithm is executed directly. ARMA modeling mainly includes three steps: model identification, parameter estimation and model verification. First, a suitable ARMA(p, q) model needs to be selected; this process is also called order determination. For example, the two-dimensional pictures corresponding to the target performance time series data of the day, week and month dimensions are processed with ACF and PACF respectively to obtain several groups of ACF autocorrelation plots and PACF partial autocorrelation plots, and the type of the prediction model for each time dimension is determined accordingly: if the ACF plot tails off and the PACF plot cuts off at order p, the data prediction model is determined to be an AR model; if the ACF plot cuts off at order q and the PACF plot tails off, the data prediction model is determined to be an MA model; and if both the ACF plot and the PACF plot tail off, the data prediction model is determined to be an ARMA model.
Here, cutting off at order d means that the coefficients at the first d lags are significantly beyond twice the standard deviation, while roughly 95.5% or more of the subsequent values fall within twice the standard deviation, and the decay from non-zero values to small fluctuations is very abrupt. Tailing off means that more than 5% of the values fall outside the two-standard-deviation range, or that the decay to small fluctuations is noticeably slow.
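Under the assumption that the two-standard-deviation band for a sample ACF can be approximated as 2/√n (a standard white-noise approximation not stated in the text), the cut-off-versus-tailing decision can be sketched as:

```python
import numpy as np

def acf(x, nlags):
    # Sample autocorrelation coefficients for lags 1..nlags.
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

def cutoff_order(coeffs, n):
    # Band of twice the standard deviation, approximated by 2/sqrt(n);
    # the 95.5% rule from the text is simplified here to "every later
    # lag stays inside the band".
    band = 2.0 / np.sqrt(n)
    outside = np.abs(np.asarray(coeffs)) > band
    if not outside.any():
        return 0                                   # inside the band everywhere
    last = int(np.nonzero(outside)[0].max()) + 1   # 1-based lag index
    return last if last < len(coeffs) else None    # None means tailing off
```

For example, an ACF that drops sharply after lag 2 yields `cutoff_order([0.9, 0.5, 0.05, 0.02, 0.03], n=400) == 2`, while one that decays slowly past the band returns `None` (tailing).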
After model identification, the unknown parameters of the model, including the AR parameters and the MA parameters, are determined using least squares estimation. Whether the model is reasonable is then checked by observing the residual sequence timing diagram. Through these steps, the final ARIMA model is determined and used to predict the target performance time series data.
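The least squares estimation of the AR parameters can be sketched as an ordinary least squares regression of each value on its p predecessors; the solver choice is an assumption, since the text does not fix a specific implementation:

```python
import numpy as np

def fit_ar_least_squares(x, p):
    # Regress x[t] on its p previous values; the OLS solution gives
    # the AR coefficient estimates.
    x = np.asarray(x, dtype=float)
    # Design matrix: column k holds the lag-(k+1) values of the series.
    X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# A series satisfying x[t] = 0.5 * x[t-1] exactly recovers phi = 0.5:
series = [0.5 ** i for i in range(12)]
phi = fit_ar_least_squares(series, 1)
```

The same regression shape extends to MA parameters only indirectly (the errors are unobserved), which is why ARMA estimation in practice iterates between residual estimation and refitting.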
According to the embodiment of the disclosure, after determining each data prediction model, predicting first target performance time series data by using a first data prediction model to determine first prediction data; predicting second target performance time series data by using a second data prediction model to determine second prediction data; the third target performance timing data is predicted using a third data prediction model to determine third predicted data.
In one example, the first prediction data is obtained by taking the target performance time series data with day as the time series window as input and predicting with the first data prediction model; the second prediction data is obtained by taking the target performance time series data with week as the time series window as input and predicting with the second data prediction model; and the third prediction data is obtained by taking the target performance time series data with month as the time series window as input and predicting with the third data prediction model.
Fig. 6a schematically shows one of flowcharts of a method for determining target abnormal data according to an embodiment of the present disclosure, fig. 6b schematically shows a flowchart of a method for determining initial abnormal data according to an embodiment of the present disclosure, and fig. 6c schematically shows the other flowchart of the method for determining target abnormal data according to an embodiment of the present disclosure. As shown in fig. 6a, operation S250 includes operations S251 to S252.
In operation S251, initial abnormal data is determined according to the predicted data and the actual data.
As shown in fig. 6b, the operation S251 includes operations S2511 to S2512.
In operation S2511, a difference value between the predicted data and actual data at the same time as the predicted data is determined.
In operation S2512, initial abnormal data is determined according to the difference value.
According to the embodiment of the disclosure, if the difference value is greater than a first preset threshold value, the actual data is determined to be initial abnormal data.
In one example, the predicted data at a given moment has the three dimensions of day, week and month, that is, the three predicted values may be the same or different. The predicted data is compared with the actual data at the same moment, i.e., with the number of registered nodes actually collected. Assuming the predicted value is a and the true value is b, the difference value is |a − b|/b; if the difference value is greater than a first preset threshold, for example 10%, the actual data is determined to be initial abnormal data. If the difference value is not greater than the first preset threshold, the actual data is determined to be normal data.
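The difference-value check can be sketched as follows; the zero-denominator guard is an added assumption not spelled out in the text:

```python
def is_initial_anomaly(predicted, actual, threshold=0.10):
    # Relative deviation |a - b| / b compared against the first preset
    # threshold (10% in the example above).
    if actual == 0:
        return predicted != 0  # assumed behavior when no nodes are registered
    return abs(predicted - actual) / abs(actual) > threshold
```

For instance, a prediction of 120 nodes against an actual count of 100 gives a deviation of 20%, flagging the value as initial abnormal data, while 105 against 100 (5%) does not.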
In operation S252, target abnormal data is determined according to the initial abnormal data and the data abnormality recognition model.
As shown in fig. 6c, operation S252 includes operations S2521 to S2523.
Similar to the data prediction model, the sample training set of the data anomaly identification model is related to the time series window of the target performance time series data; that is, three data anomaly identification models are obtained by training with the three dimensions of day, week and month as sample training sets.
In operation S2521, the target performance time series data is used as a training set, and a data anomaly identification model is obtained through training.
According to the embodiment of the disclosure, the first target performance time sequence data is used as a first training set, and a first data anomaly identification model is obtained through training; training the second target performance time sequence data serving as a second training set to obtain a second data anomaly identification model; and training to obtain a third data anomaly identification model by taking the third target performance time sequence data as a third training set.
In one example, the data anomaly identification model in the embodiment of the present disclosure performs identification using isolation forests, and the isolation forest model needs to be trained before identification. In an isolation forest, abnormal data are defined as "outliers that are easy to isolate", which can be understood as points that are sparsely distributed and far from any high-density population. In the feature space, a sparsely distributed region indicates that the probability of an event occurring there is extremely low, so data falling in such regions can be considered abnormal. The isolation forest is an unsupervised anomaly detection method suitable for continuous data; that is, it is trained without labeled samples. In an isolation forest, the data set is recursively and randomly partitioned until all sample points are isolated. Under this random-partition strategy, abnormal points usually have shorter paths. Intuitively, a high-density cluster needs many cuts before its points are isolated, while low-density points are easily isolated. The iForest consists of t iTrees, each of which is a binary tree structure.
The training process is as follows:
(1) The target performance time series data is taken as training sample data; ψ sample points are randomly selected from the training data as a sample subset and placed in the root node of the tree.
(2) Since the data in the embodiment of the disclosure has only one dimension (the number of registered nodes), a cut point p is randomly generated within the current node data, between the maximum and minimum values of the specified dimension.
(3) The cut point generates a hyperplane, which divides the data space of the current node into two subspaces: data smaller than p in the specified dimension is placed in the left child node of the current node, and data greater than or equal to p is placed in the right child node.
(4) Steps (2) and (3) are applied recursively in the child nodes, continually constructing new child nodes, until a child node contains only one data point or reaches the preset height limit.
(5) Steps (1) to (4) are repeated until t isolation trees (iTrees) are generated.
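Steps (1) to (5) can be sketched in plain Python for the single-dimension case. The height-limit handling is simplified here (the standard c(n) path-length adjustment for unresolved nodes is omitted), so this is an illustrative sketch rather than a full iForest implementation:

```python
import random

def build_itree(data, height=0, max_height=8):
    # Steps (2)-(4): randomly cut between min and max of the single
    # "registered nodes" dimension until points are isolated or the
    # height limit is reached.
    if len(data) <= 1 or height >= max_height or min(data) == max(data):
        return {"size": len(data)}
    p = random.uniform(min(data), max(data))
    return {
        "p": p,
        "left": build_itree([v for v in data if v < p], height + 1, max_height),
        "right": build_itree([v for v in data if v >= p], height + 1, max_height),
    }

def path_length(tree, x, depth=0):
    # Depth of the external node that x falls into.
    if "p" not in tree:
        return depth
    child = tree["left"] if x < tree["p"] else tree["right"]
    return path_length(child, x, depth + 1)

# Step (5): repeat to obtain a forest of t trees.
random.seed(0)
data = [10.0, 11.0, 12.0, 10.5, 11.5, 100.0]   # 100.0 is the outlier
forest = [build_itree(data) for _ in range(50)]
avg = lambda v: sum(path_length(t, v) for t in forest) / len(forest)
```

Averaged over the forest, the outlier 100.0 is isolated after far fewer cuts than a point inside the dense cluster, which is exactly the "shorter path means more abnormal" property used below.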
In the embodiment of the disclosure, three data anomaly identification models are obtained through training and respectively correspond to target performance time sequence data of three dimensions.
In operation S2522, a path length of the initial anomaly data is determined according to the data anomaly identification model.
According to the embodiment of the disclosure, determining a first path length of the initial abnormal data according to the first data abnormality identification model; determining a second path length of the initial abnormal data according to the second data abnormality identification model; and determining a third path length of the initial abnormal data according to the third data abnormality identification model.
In operation S2523, target abnormal data is determined according to the path length of the initial abnormal data.
According to the embodiment of the disclosure, if the first path length, the second path length and the third path length of the initial abnormal data are all smaller than a second preset threshold, the initial abnormal data are determined to be target abnormal data.
According to the embodiment of the disclosure, if at least one of a first path length, a second path length and a third path length of the initial abnormal data is greater than or equal to a second preset threshold, determining that the initial abnormal data is normal data; and adding the actual data into the training set, and retraining the data anomaly recognition model.
In one example, let the initial abnormal value be xi. The initial abnormal value is input into the above three data anomaly identification models; each isolation tree is traversed and the path length h(xi) of the initial abnormal value is calculated, where a shorter path length indicates that the point is more easily isolated and therefore more abnormal. The target abnormal data is then determined from the calculation results: if all path lengths (the first, second and third path lengths) are smaller than the second preset threshold, the initial abnormal value is determined to be target abnormal data, meaning the actual data is abnormal and requires manual emergency intervention. If at least one path length is greater than or equal to the second preset threshold, the initial abnormal value is determined to be normal data; the value is added to the training set to form new training sample data, and the training process is re-executed to retrain the isolation forest model. In the embodiment of the disclosure, the second preset threshold can be adjusted according to the desired granularity of anomaly identification: the smaller the threshold, the coarser the granularity of abnormal data identification and the looser the identification; the larger the threshold, the finer the granularity and the stricter the identification.
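The final decision rule over the three path lengths can be sketched as follows; the function and variable names are illustrative, not from the patent:

```python
def is_target_anomaly(path_lengths, threshold):
    # The initial abnormal value becomes target abnormal data only when
    # the path lengths from all three (day/week/month) models fall
    # below the second preset threshold.
    return all(h < threshold for h in path_lengths)

def handle_initial_anomaly(path_lengths, threshold, training_set, value):
    # Otherwise the value is treated as normal and fed back into the
    # training set for retraining (a sketch of the feedback loop).
    if is_target_anomaly(path_lengths, threshold):
        return "alert"
    training_set.append(value)
    return "retrain"
```

A single long path in any one time dimension is thus enough to reclassify the point as normal, which makes the detector conservative across the day, week and month views.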
Because the data prediction model and the data abnormity identification model both belong to unsupervised methods, the data does not need to be labeled manually, and the efficiency of monitoring and detecting the abnormal data is further improved.
After the initial abnormal value is determined to be target abnormal data, it is determined that the actual data at the moment corresponding to the predicted data is abnormal. Alarm information is then generated according to the target abnormal data, including the time point at which the target abnormal data occurred and the number of registered nodes, and the alarm information is sent to operation and maintenance personnel.
Based on the performance data monitoring method of the distributed management cluster, the disclosure also provides a performance data monitoring device of the distributed management cluster. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a performance data monitoring apparatus of a distributed management cluster according to an embodiment of the present disclosure.
As shown in fig. 7, the performance data monitoring apparatus 800 of the distributed management cluster of this embodiment includes an obtaining module 810, a classifying module 820, a first determining module 830, a data predicting module 840, and a second determining module 850.
The obtaining module 810 is configured to obtain performance timing data of the distributed management cluster. In an embodiment, the obtaining module 810 may be configured to perform the operation S210 described above, which is not described herein again.
The classification module 820 is configured to classify the performance timing data using a classification model. In an embodiment, the classification module 820 may be configured to perform the operation S220 described above, which is not described herein again.
The first determining module 830 is configured to determine the performance timing data as target performance timing data when the performance timing data is classified as periodic data. In an embodiment, the first determining module 830 may be configured to perform the operation S230 described above, and is not described herein again.
The data prediction module 840 is configured to predict the target performance timing data using a data prediction model to determine predicted data. In an embodiment, the data prediction module 840 may be configured to perform the operation S240 described above, which is not described herein again.
The second determining module 850 is configured to determine target abnormal data according to the predicted data, the actual data, and the data abnormality recognition model. In an embodiment, the second determining module 850 is configured to perform the operation S250 described above, which is not described herein again.
The classification module 820 includes a pre-processing sub-module 821, a generation sub-module 822, and a classification sub-module 823 according to an embodiment of the present disclosure.
The preprocessing submodule 821 is used for preprocessing the performance time series data, and the preprocessing includes dimensionality reduction processing and normalization processing. In an embodiment, the preprocessing submodule 821 may be configured to perform the operation S221 described above, and is not described herein again.
The generating submodule 822 is configured to generate a two-dimensional picture according to the pre-processed performance time series data. In an embodiment, the generating submodule 822 may be configured to perform the operation S222 described above, and details are not described herein again.
The classification sub-module 823 is configured to input the two-dimensional picture into a classification model for classification. In an embodiment, the classification sub-module 823 may be configured to perform the operation S223 described above, and is not described herein again.
According to an embodiment of the present disclosure, any plurality of the obtaining module 810, the classifying module 820, the first determining module 830, the data predicting module 840, and the second determining module 850 may be combined into one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 810, the classifying module 820, the first determining module 830, the data predicting module 840 and the second determining module 850 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of three implementations of software, hardware and firmware, or in a suitable combination of any of them. Alternatively, at least one of the obtaining module 810, the classifying module 820, the first determining module 830, the data predicting module 840 and the second determining module 850 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a method of performance data monitoring of a distributed management cluster according to an embodiment of the present disclosure.
As shown in fig. 8, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, ROM 902, and RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 900 may also include input/output (I/O) interface 905, input/output (I/O) interface 905 also connected to bus 904, according to an embodiment of the present disclosure. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be embodied in the device/apparatus/system described in the above embodiments; or may exist alone without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or RAM 903 described above and/or one or more memories other than the ROM 902 and RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flowchart. When the computer program product is run on a computer system, the program code causes the computer system to implement the performance data monitoring method for a distributed management cluster provided by the embodiments of the present disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed as a signal over a network medium, downloaded and installed through the communication section 909, and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (21)

1. A performance data monitoring method for a distributed management cluster is characterized by comprising the following steps:
acquiring performance time series data of a distributed management cluster;
classifying the performance time series data by using a classification model;
determining the performance time series data as target performance time series data when the performance time series data is classified as periodic data;
predicting the target performance time series data by using a data prediction model to determine predicted data; and
determining target abnormal data according to the predicted data, actual data, and a data anomaly recognition model.
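As an illustration of the method of claim 1 (not part of the claim language), the periodicity gate that selects target data can be sketched in Python. The autocorrelation test below is a hypothetical stand-in for the claimed classification model, which in the patent is an image classifier:

```python
import numpy as np

def is_periodic(series, lag, threshold=0.5):
    """Crude stand-in for the claimed classification model: treat the
    series as periodic when the autocorrelation at the expected lag is
    high. (Illustrative only; the patent uses an image classifier.)"""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    denom = np.dot(s, s)
    if denom == 0.0:
        return False
    return np.dot(s[:-lag], s[lag:]) / denom > threshold

# A noisy daily (24-point) cycle is kept as target data; white noise is not.
t = np.arange(240)
daily = np.sin(2 * np.pi * t / 24) + 0.05 * np.random.default_rng(0).normal(size=240)
noise = np.random.default_rng(1).normal(size=240)
```

Only series that pass this gate proceed to the prediction and anomaly-recognition steps of the claim.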
2. The method of claim 1, wherein classifying the performance time series data by using the classification model comprises:
preprocessing the performance time series data, wherein the preprocessing comprises dimensionality reduction and normalization;
generating a two-dimensional picture according to the preprocessed performance time series data; and
inputting the two-dimensional picture into the classification model for classification.
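Claim 2 leaves the image encoding unspecified; a Gramian Angular Field is one common way to turn a normalized series into a two-dimensional picture for an image classifier. A minimal sketch under that assumption, where piecewise-mean downsampling stands in for the claimed dimensionality reduction:

```python
import numpy as np

def series_to_image(series, size=32):
    """Preprocess a performance series and encode it as a 2-D matrix
    (a Gramian Angular Summation Field; one common encoding, assumed
    here since the claim does not name a specific one)."""
    s = np.asarray(series, dtype=float)
    # Dimensionality reduction: piecewise-mean downsampling to `size` points.
    s = s[: len(s) - len(s) % size].reshape(size, -1).mean(axis=1)
    # Normalization to [-1, 1].
    lo, hi = s.min(), s.max()
    s = 2 * (s - lo) / (hi - lo) - 1 if hi > lo else np.zeros(size)
    # GASF entry (i, j) = cos(phi_i + phi_j), with phi = arccos of the value.
    phi = np.arccos(np.clip(s, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

img = series_to_image(np.sin(np.linspace(0, 6 * np.pi, 320)))
```

The resulting `size x size` matrix can then be fed to an ordinary image classifier.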
3. The method of claim 2, further comprising, before determining the predicted data:
constructing the data prediction model according to the target performance time series data.
4. The method of claim 3, wherein constructing the data prediction model according to the target performance time series data comprises:
determining an autocorrelation coefficient and a partial autocorrelation coefficient according to the target performance time series data;
determining the type of a data prediction model according to the autocorrelation coefficients and the partial autocorrelation coefficients; and
determining parameters of the data prediction model using least squares estimation.
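Claim 4 follows the classical Box-Jenkins identification recipe: the cut-off/tail-off pattern of the autocorrelation (ACF) and partial autocorrelation (PACF) selects an AR, MA, or ARMA model type, after which the coefficients are fitted. A minimal numpy sketch of the sample ACF and a least-squares AR(p) fit (illustrative; the claim fixes neither the model order nor any library):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation coefficients for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    d = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / d for k in range(1, nlags + 1)])

def fit_ar_least_squares(x, p):
    """Estimate AR(p) coefficients by ordinary least squares:
    regress x[t] on x[t-1], ..., x[t-p]."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[p - j : len(x) - j] for j in range(1, p + 1)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulate an AR(2) process and recover its coefficients.
rng = np.random.default_rng(42)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + 0.1 * rng.normal()
coef = fit_ar_least_squares(x, 2)
```

On this simulated series the recovered coefficients are close to the true (0.6, 0.2), which is the behavior the least-squares estimation step relies on.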
5. The method of claim 4, wherein determining the target abnormal data according to the predicted data, the actual data, and the data anomaly recognition model comprises:
determining initial abnormal data according to the predicted data and the actual data; and
determining the target abnormal data according to the initial abnormal data and the data anomaly recognition model.
6. The method of claim 5, wherein determining the initial abnormal data according to the predicted data and the actual data comprises:
determining a difference between the predicted data and the actual data at the same time point; and
determining that the actual data is initial abnormal data when the difference is greater than a first preset threshold.
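The residual test of claim 6 reduces to comparing |predicted - actual| against the first preset threshold; a one-function sketch (the threshold value is hypothetical):

```python
import numpy as np

def initial_anomalies(predicted, actual, threshold):
    """Flag points where |predicted - actual| exceeds the first preset
    threshold (claim 6). Returns the indices of the candidate anomalies."""
    diff = np.abs(np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float))
    return np.flatnonzero(diff > threshold)

# Only the middle point deviates by more than the threshold.
idx = initial_anomalies([1.0, 2.0, 3.0], [1.1, 5.0, 2.9], threshold=0.5)
```

These indices are only candidates; claims 7 and 14-16 refine them with the anomaly recognition model.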
7. The method of claim 5, wherein determining the target abnormal data according to the initial abnormal data and the data anomaly recognition model comprises:
training to obtain a data anomaly recognition model by using the target performance time series data as a training set;
determining a path length of the initial abnormal data according to the data anomaly recognition model; and
determining the target abnormal data according to the path length of the initial abnormal data.
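The "path length" of claims 7 and 14 is the anomaly measure used by isolation forests: points that random splits isolate in few steps are likely anomalies. A minimal one-dimensional sketch (illustrative only; production code would typically use a library implementation such as scikit-learn's IsolationForest):

```python
import numpy as np

def _isolate(value, data, rng, depth=0, max_depth=20):
    """Depth at which `value` is isolated by random splits of `data`
    (a single 1-D isolation tree built on the training sample)."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    lo, hi = data.min(), data.max()
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Follow the branch the test point falls into.
    side = data[data < split] if value < split else data[data >= split]
    return _isolate(value, side, rng, depth + 1, max_depth)

def avg_path_length(value, sample, n_trees=50, seed=0):
    """Average isolation path length over several random trees:
    a short path means the point is easy to isolate, i.e. anomalous."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    return float(np.mean([_isolate(value, sample, rng) for _ in range(n_trees)]))

sample = np.random.default_rng(1).normal(size=256)  # "training set"
outlier_path = avg_path_length(10.0, sample)        # far outside the data
inlier_path = avg_path_length(0.0, sample)          # in the dense center
```

The outlier's average path is markedly shorter than the inlier's, which is exactly the comparison against the second preset threshold in claims 15 and 16.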
8. The method of claim 7, wherein the target performance time series data comprises first target performance time series data, second target performance time series data, and third target performance time series data, wherein the first target performance time series data is the time series data within a day-scale time window, the second target performance time series data is the time series data within a week-scale time window, and the third target performance time series data is the time series data within a month-scale time window.
9. The method of claim 8, wherein constructing the data prediction model according to the target performance time series data further comprises:
determining a first autocorrelation coefficient and a first partial autocorrelation coefficient according to the first target performance time series data;
determining the type of a first data prediction model according to the first autocorrelation coefficients and the first partial autocorrelation coefficients; and
determining parameters of the first data prediction model using least squares estimation.
10. The method of claim 8, wherein constructing the data prediction model according to the target performance time series data further comprises:
determining a second autocorrelation coefficient and a second partial autocorrelation coefficient according to the second target performance time series data;
determining the type of a second data prediction model according to the second autocorrelation coefficients and the second partial autocorrelation coefficients; and
determining parameters of the second data prediction model using least squares estimation.
11. The method of claim 8, wherein constructing the data prediction model according to the target performance time series data further comprises:
determining a third autocorrelation coefficient and a third partial autocorrelation coefficient according to the third target performance time series data;
determining the type of a third data prediction model according to the third autocorrelation coefficient and the third partial autocorrelation coefficient; and
determining parameters of the third data prediction model using least squares estimation.
12. The method of any one of claims 9 to 11, wherein predicting the target performance time series data using the data prediction model to determine the predicted data comprises:
predicting the first target performance time series data by using a first data prediction model to determine first prediction data;
predicting second target performance time series data by using a second data prediction model to determine second prediction data; and
the third target performance timing data is predicted using a third data prediction model to determine third predicted data.
13. The method of claim 8, wherein training the target performance time series data as a training set to obtain the data anomaly recognition model comprises:
training to obtain a first data anomaly recognition model by using the first target performance time series data as a first training set;
training to obtain a second data anomaly recognition model by using the second target performance time series data as a second training set; and
training to obtain a third data anomaly recognition model by using the third target performance time series data as a third training set.
14. The method of claim 13, wherein determining the path length of the initial abnormal data according to the data anomaly recognition model comprises:
determining a first path length of the initial abnormal data according to the first data anomaly recognition model;
determining a second path length of the initial abnormal data according to the second data anomaly recognition model; and
determining a third path length of the initial abnormal data according to the third data anomaly recognition model.
15. The method of claim 14, wherein determining the target abnormal data according to the path length of the initial abnormal data comprises:
determining that the initial abnormal data is target abnormal data if the first path length, the second path length, and the third path length of the initial abnormal data are all smaller than a second preset threshold.
16. The method of claim 15, wherein determining the target abnormal data according to the path length of the initial abnormal data further comprises:
determining that the initial abnormal data is normal data if at least one of the first path length, the second path length, and the third path length of the initial abnormal data is greater than or equal to the second preset threshold; and
adding the actual data to the training set and retraining the data anomaly recognition model.
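Claims 15 and 16 combine the three path lengths with a single rule: a candidate is confirmed as a target anomaly only when the day, week, and month models all agree; otherwise it is treated as normal and fed back into the training set. A sketch of the decision (threshold and path values are hypothetical):

```python
def is_target_anomaly(day_path, week_path, month_path, threshold):
    """Claims 15-16: confirm a candidate as a target anomaly only if the
    path lengths under all three anomaly recognition models fall below
    the second preset threshold."""
    return all(p < threshold for p in (day_path, week_path, month_path))

confirmed = is_target_anomaly(2.1, 1.8, 2.5, threshold=4.0)
rejected = is_target_anomaly(2.1, 6.0, 2.5, threshold=4.0)
```

Requiring unanimity across the three time scales is what suppresses false alarms from a single window's model.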
17. The method of claim 1, further comprising, after determining the target abnormal data:
generating alarm information according to the target abnormal data; and
sending the alarm information.
18. A performance data monitoring apparatus of a distributed management cluster, comprising:
an acquisition module configured to acquire performance time series data of a distributed management cluster;
a classification module configured to classify the performance time series data using a classification model;
a first determining module configured to determine the performance time series data as target performance time series data when the performance time series data is classified as periodic data;
a data prediction module configured to predict the target performance time series data using a data prediction model to determine predicted data; and
a second determining module configured to determine target abnormal data according to the predicted data, actual data, and a data anomaly recognition model.
19. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-17.
20. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 17.
21. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 17.
CN202210406970.6A 2022-04-18 2022-04-18 Performance data monitoring method and device of distributed management cluster Pending CN114647554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210406970.6A CN114647554A (en) 2022-04-18 2022-04-18 Performance data monitoring method and device of distributed management cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210406970.6A CN114647554A (en) 2022-04-18 2022-04-18 Performance data monitoring method and device of distributed management cluster

Publications (1)

Publication Number Publication Date
CN114647554A true CN114647554A (en) 2022-06-21

Family

ID=81996900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210406970.6A Pending CN114647554A (en) 2022-04-18 2022-04-18 Performance data monitoring method and device of distributed management cluster

Country Status (1)

Country Link
CN (1) CN114647554A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022209A (en) * 2022-06-24 2022-09-06 中国电信股份有限公司 Monitoring method, monitoring device and computer-readable storage medium
CN115953738A (en) * 2023-03-02 2023-04-11 上海燧原科技有限公司 Monitoring method, device, equipment and medium for image recognition distributed training


Similar Documents

Publication Publication Date Title
US10417528B2 (en) Analytic system for machine learning prediction model selection
US11675641B2 (en) Failure prediction
CN114647554A (en) Performance data monitoring method and device of distributed management cluster
US10737904B2 (en) Elevator condition monitoring using heterogeneous sources
CN113515399A (en) Data anomaly detection method and device
US11645540B2 (en) Deep graph de-noise by differentiable ranking
CN113986674A (en) Method and device for detecting abnormity of time sequence data and electronic equipment
CN114821063A (en) Semantic segmentation model generation method and device and image processing method
CN114638695A (en) Credit evaluation method, device, equipment and medium
CN114710397B (en) Service link fault root cause positioning method and device, electronic equipment and medium
CN116225848A (en) Log monitoring method, device, equipment and medium
CN116342164A (en) Target user group positioning method and device, electronic equipment and storage medium
US20230075453A1 (en) Generating machine learning based models for time series forecasting
CN113052509B (en) Model evaluation method, model evaluation device, electronic apparatus, and storage medium
WO2022022059A1 (en) Context aware anomaly detection
CN114358024A (en) Log analysis method, apparatus, device, medium, and program product
CN114706856A (en) Fault processing method and device, electronic equipment and computer readable storage medium
CN113961441A (en) Alarm event processing method, auditing method, device, equipment, medium and product
CN114139059A (en) Resource recommendation model training method, resource recommendation method and device
US20240152805A1 (en) Systems, methods, and non-transitory computer-readable storage devices for training deep learning and neural network models using overfitting detection and prevention
CN117610933A (en) Abnormality detection method, abnormality detection device, electronic device, and readable storage medium
EP4399643A1 (en) Self-adapting forecasting for multi-horizon forecasting machine learning models
CN116932269A (en) Fault processing method, device, electronic equipment and computer storage medium
CN118484734A (en) Information detection method, apparatus, device, storage medium, and program product
CN116932326A (en) Server fault monitoring method, device, equipment, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination