CN117725543B

CN117725543B - Multivariate time series anomaly prediction method, electronic device and storage medium

Info

Publication number: CN117725543B
Application number: CN202410179705.8A
Authority: CN
Inventors: 李静; 刘畅; 王静; 丁建立
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2024-02-18
Filing date: 2024-02-18
Publication date: 2024-05-03
Anticipated expiration: 2044-02-18
Also published as: CN117725543A

Abstract

The invention relates to the field of computer technology applications, and in particular to a multivariate time series anomaly prediction method, an electronic device and a storage medium. The method comprises the following steps: inputting the multivariate time series X of the monitored server that currently needs to be predicted into a data embedding module of a multivariate time series anomaly prediction model to obtain a corresponding data embedding result; inputting the data embedding result into an encoder of the model to obtain a corresponding encoding result; inputting the encoding result into a decoder of the model to obtain a corresponding decoding result; and determining, based on at least X and the decoding result, whether the monitored server will be abnormal within the w timestamps after the prediction time and, if so, which indicators are abnormal. The invention can accurately predict whether the monitored server will become abnormal within the w timestamps after the prediction time, and which specific indicators are abnormal.

Description

Multivariate time series anomaly prediction method, electronic device and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to a multivariate time series anomaly prediction method, an electronic device, and a storage medium.

Background

With the rapid development of computing technology, complex systems (e.g., social networks, cloud computing, water quality monitoring) have become more complex and more sensitive. To ensure the reliability of these systems, performance counters and/or sensors are widely used to closely monitor the status of the running objects (e.g., servers, services). The monitoring data are collected at equal time intervals and form a multivariate time series (MTS); each monitoring indicator forms a univariate time series. If an error or failure occurs in the system, such as network overload, an application error, or a hardware failure, the monitoring data become abnormal (e.g., a surge, or a steep rise or fall). System failures can lead to service outages, data loss, and significant economic loss.

Current work focuses mainly on identifying abnormal behavior in the MTS (i.e., anomaly detection), which helps the operator find and recover from a fault after it occurs. But by the time an anomaly is detected, the fault has already occurred and the reliability of the system has already been reduced. In contrast, predicting an anomaly before it actually occurs allows the operator to take action in advance. In addition, locating the set of most abnormal indicators that account for the predicted anomaly helps the operator resolve the underlying problem more efficiently. Thus, with adequate predictive performance, MTS anomaly prediction and interpretation can significantly improve the reliability of complex systems.

Disclosure of Invention

Aiming at the technical problems, the invention adopts the following technical scheme:

An embodiment of the invention provides a multivariate time series anomaly prediction method, comprising the following steps:

S100, inputting the multivariate time series $X = (X_{t-w+1}, X_{t-w+2}, \ldots, X_i, \ldots, X_t)$ of a monitored server to be predicted into a data embedding module of a multivariate time series anomaly prediction model, so as to perform position embedding and spatial embedding on the data in X and obtain a corresponding data embedding result; wherein $X_i$ is the multivariate data corresponding to the i-th timestamp before the prediction time t, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{is}, \ldots, x_{in}\}$, i takes values from t-w+1 to t, t is the prediction time, and w is the prediction window size; $x_{is}$ is the value of the s-th monitoring indicator in $X_i$, s takes values from 1 to n, and n is the number of monitoring indicators of the monitored server.

S200, inputting the data embedding result into an encoder of the multivariate time series anomaly prediction model to obtain a corresponding encoding result; the encoder comprises first to third encoding modules which are connected in sequence and have the same structure, wherein each encoding module comprises at least a multi-head attention module, a horizontal graph attention module and a multi-scale feedforward network module which are connected in sequence; the first encoding module is also connected to the data embedding module.

S300, inputting the encoding result into a decoder of the multivariate time series anomaly prediction model to obtain a corresponding decoding result; the decoder comprises at least a first linear layer, an inter-dimension relation learning module and a second linear layer which are connected in sequence, wherein the first linear layer is connected to the third encoding module; the decoding result comprises a first result obtained by the first linear layer, an inter-dimension dependency matrix obtained by the inter-dimension relation learning module, and a second result obtained by the second linear layer.

S400, determining, based on at least X, the inter-dimension dependency matrix and the second result, whether the monitored server will be abnormal within the w timestamps after the prediction time t and, if so, which indicators are abnormal.

The multivariate time series anomaly prediction model is a model trained on sample data of the monitored server.

The invention has at least the following beneficial effects:

According to the method, the temporal dependence and the inter-dimension dependence in the multivariate time series are learned jointly by adopting a Transformer and graph attention network framework, so that the prediction result can be more accurate.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a multivariate time series anomaly prediction method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

For the multivariate time series MTS, the inventors of the present invention noted that three factors lead to more complex predictions and interpretations of MTS anomalies:

(1) There is a lack of sufficient labeled anomaly samples. In the long-term stable operation of complex systems, failure events are relatively rare, and abnormal samples in the monitoring stream are typically buried in large amounts of normal data. Manually identifying and labeling anomalies is impractical and expensive.

(2) Complex patterns of MTS in modern systems. With the increasing number of monitoring indicators, the multivariate time series has become a high-dimensional time series with complex patterns. This requires the prediction model not only to identify abnormal changes of each monitoring indicator over the long term, but also to mine the relationships between these indicators.

(3) Subtle changes before abnormal behavior. The fault (or anomaly) does not occur suddenly, but gradually affects the monitoring indicator, such as a memory leak fault. In order to accurately identify the warning signal of an impending anomaly in a timely manner, it is necessary to capture subtle changes in the MTS that are not readily detectable.

In view of the above problems, the present invention provides a multivariate time series anomaly prediction scheme, which aims to predict whether anomalies will occur in the monitored server in the upcoming period, and to find the set of most relevant indicators that explain the upcoming anomalies. The scheme learns both the temporal and the inter-dimension dependencies in the MTS through a Transformer and a graph attention network, and adopts a multi-task adversarial training strategy to amplify the differences between normal and abnormal data.

Further, as shown in Fig. 1, the multivariate time series anomaly prediction method provided by the embodiment of the invention may include the following steps:

S100, inputting the multivariate time series $X = (X_{t-w+1}, X_{t-w+2}, \ldots, X_i, \ldots, X_t)$ of a monitored server that currently needs to be predicted into a data embedding module of a multivariate time series anomaly prediction model, so as to perform position embedding and spatial embedding on the data in X and obtain a corresponding data embedding result; wherein $X_i$ is the multivariate data corresponding to the i-th timestamp before the prediction time t, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{is}, \ldots, x_{in}\}$, i takes values from t-w+1 to t, t is the prediction time, and w is the prediction window size; $x_{is}$ is the value of the s-th monitoring indicator in $X_i$, s takes values from 1 to n, and n is the number of monitoring indicators of the monitored server.

In the embodiment of the invention, a monitoring indicator of the monitored server may be a parameter that characterizes the performance of the monitored server. The monitoring indicators give better insight into what is happening inside the monitored server, and together they characterize its overall state. The monitoring indicators may be chosen according to the actual situation; for example, they may include bandwidth usage, CPU usage, network I/O, disk usage, hard disk write volume, and the like, and the present invention is not particularly limited in this respect. The monitoring indicators can be obtained in an existing manner, for example by sampling with a sampler built into the monitored server.

In the embodiment of the present invention, the interval between any two adjacent timestamps is the same, and the specific interval may be set based on actual needs; for example, it may be 5 minutes or 1 hour.

A multivariate time series has two kinds of dependency: temporal dependence, which characterizes the correlation of the data of each indicator over time, and spatial dependence, which characterizes the correlation between the data of different indicators at the same time. An unexpected point or subsequence that violates these dependencies is usually considered an anomaly.

The invention predicts MTS anomalies based on a sliding window. For each timestamp t, the monitoring data collected in the w timestamps up to and including t are constructed as one sample, i.e. $X = (X_{t-w+1}, X_{t-w+2}, \ldots, X_i, \ldots, X_t)$.
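As an illustration of the sliding-window construction described above, the following minimal sketch builds one sample per timestamp from a list of per-timestamp indicator vectors. The function name `sliding_windows` and the toy values are assumptions made for this example and do not come from the patent.

```python
from typing import List

Row = List[float]


def sliding_windows(series: List[Row], w: int) -> List[List[Row]]:
    """Return one sample per timestamp t >= w - 1.

    Each sample holds the w most recent rows (X_{t-w+1}, ..., X_t),
    where a row contains the n indicator values of one timestamp.
    """
    if w <= 0:
        raise ValueError("w must be positive")
    return [series[t - w + 1 : t + 1] for t in range(w - 1, len(series))]


# Toy data (hypothetical): 5 timestamps, n = 2 monitoring indicators.
series = [[0.1, 10.0], [0.2, 11.0], [0.3, 12.0], [0.4, 13.0], [0.5, 14.0]]
samples = sliding_windows(series, w=3)
```

With 5 timestamps and w = 3 this yields 3 samples, each a 3 x 2 window ending at a different prediction time.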

Further, in the embodiment of the present invention, in order to be able to learn the time dependency and the inter-index dependency in the monitored data, the present invention embeds two kinds of information: (a) time embedding along time; (b) monitoring data space embedding between metrics. In addition, to reduce the effects of outlier data that may be mixed into the training data, the present invention introduces an additional causal convolution to smooth the raw input data. Specifically, the data embedding result is obtained based on the embedding layer, and specifically the following conditions can be satisfied:

$$\mathrm{Embedding}(X) = \mathrm{Cov}(\mathrm{PE}(X) + \mathrm{SE}(X) + \mathrm{Cacov}(X))$$

Here Embedding(X) is the data embedding result of X. PE(X) represents the time or order information of each monitoring indicator added along the time dimension of X. SE(X) represents the inter-indicator information added at each timestamp, with $\mathrm{SE}(X) = W_{se} \times \mathrm{CosSim}(X)$, where $W_{se}$ is a learnable weight matrix of size n×w and CosSim(X) is the n×n matrix of cosine similarities between the different monitoring indicators in X; each element of CosSim(X) is the cosine similarity between the two corresponding monitoring indicators. Cacov(X) denotes a causal convolution operation performed on X to smooth the input data, thereby improving the ability of the graph transformer to capture normal patterns in the MTS. Cov() represents a one-dimensional convolution operation that maps the embedded data into a high-dimensional space (i.e., a dimension much larger than n), which enables the graph transformer to learn local patterns that are useful for capturing the underlying structure of the MTS data.
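Two of the ingredients named above, the cosine similarity matrix CosSim(X) and a causal convolution, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the window is stored as w rows of n values, the convolution kernel is fixed rather than learned, and the helper names `cos_sim_matrix` and `causal_conv` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cos_sim_matrix(x: Matrix) -> Matrix:
    """n x n cosine similarities between the indicator columns of a w x n window."""
    n = len(x[0])
    cols = [[row[s] for row in x] for s in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    out = []
    for a in range(n):
        row = []
        for b in range(n):
            dot = sum(p * q for p, q in zip(cols[a], cols[b]))
            denom = norms[a] * norms[b]
            # A constant-zero indicator has no direction; use 0.0 by convention.
            row.append(dot / denom if denom > 0 else 0.0)
        out.append(row)
    return out


def causal_conv(x: Matrix, kernel: List[float]) -> Matrix:
    """Causal 1-D convolution applied to each indicator separately.

    The output at time i only uses inputs at times <= i; kernel[0] weights
    the current value, kernel[1] the previous one, and so on. Missing
    history at the start of the window is treated as zero.
    """
    w, n = len(x), len(x[0])
    out = [[0.0] * n for _ in range(w)]
    for i in range(w):
        for s in range(n):
            out[i][s] = sum(
                kernel[j] * x[i - j][s] for j in range(len(kernel)) if i - j >= 0
            )
    return out
```

Because the causal form never looks at future timestamps, smoothing a window cannot leak information from after the prediction time.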

S200, inputting the data embedding result into an encoder of the multivariate time series anomaly prediction model to obtain a corresponding encoding result; the encoder comprises first to third encoding modules which are connected in sequence and have the same structure, wherein each encoding module comprises at least three sub-layers connected in sequence, namely a multi-head attention module, a horizontal graph attention module and a multi-scale feedforward network module; the first encoding module is also connected to the data embedding module. That is, the input of the first encoding module is the output of the data embedding module, the output of the first encoding module is the input of the second encoding module, and the output of the second encoding module is the input of the third encoding module.

In the embodiment of the invention, a residual connection and layer normalization are applied to each sub-layer of each encoding module. To facilitate the residual connections, the number of dimensions of each sub-layer is set to the same number of dimensions as the embedding layer.

In embodiments of the present invention, the multi-headed attention mechanism module may be an 8-headed multi-headed attention mechanism module to capture the richer time dependence from multiple angles. It is known to those skilled in the art that the specific operation of the multi-head attention mechanism module may be prior art, i.e. dividing the input data into multiple heads, performing self-attention calculation on each head, and finally splicing the results of the multiple heads to obtain the final output. The expression is as follows:

$$\mathrm{MultiHeadAtt}(Q, K, V) = \mathrm{Concat}(H_1, H_2, \ldots, H_8)$$

Here MultiHeadAtt(Q, K, V) is the output of the multi-head attention module, and $H_p$ is the output of the p-th attention head, with p taking values from 1 to 8 and $H_p = \mathrm{softmax}(Q_p K_p^T / \sqrt{d})\, V_p$. Q, K and V are different linear transformations of the input of the multi-head attention module, specifically the query matrix, key matrix and value matrix obtained by convolution with different convolution kernels. $Q_p$, $K_p$ and $V_p$ are the corresponding linear transformations of the input of the p-th attention head, and Concat() represents the concatenation operation. softmax() is an activation function that performs the normalization operation.

In an embodiment of the invention, attention is scaled by the term $\sqrt{d}$ to reduce the variance of the weights and promote stable training.
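The head computation and the concatenation step can be sketched with plain lists as below. This is the standard scaled dot-product form, which the text above refers to as prior art; it is not claimed to match the patented implementation in detail. The per-head projections that would produce Q, K and V are omitted, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_head(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one head; d is the key width."""
    d = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def concat_heads(heads: List[Matrix]) -> Matrix:
    """Concatenate the outputs of all heads along the feature axis."""
    return [[value for h in heads for value in h[i]] for i in range(len(heads[0]))]
```

Each output row is a convex combination of the rows of V, so a query that matches no key in particular simply returns the average value.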

To address the oversimplified weight propagation caused by the dot product in the attention mechanism, the invention adds a new propagation mechanism for learning temporal dependence. Specifically, graph attention is performed on the input time series according to a horizontal relation matrix, and the output HGAT($F_h$) of the horizontal graph attention module may satisfy the following condition:

$$\mathrm{HGAT}(F_h) = \mathrm{GAT}(V, \mathrm{Softmax}(QK^T))$$

Here Q, K and V are respectively the query matrix, key matrix and value matrix obtained by convolving the input $F_h$ of the horizontal graph attention module with different convolution kernels, and $K^T$ is the transpose of K. Softmax() is an activation function, and Softmax($QK^T$) denotes normalizing $QK^T$ so that each row forms a discrete distribution. GAT() denotes executing the graph attention mechanism, and GAT(V, Softmax($QK^T$)) specifically denotes executing the graph attention mechanism on V according to Softmax($QK^T$) to obtain the output of the sub-layer. This includes a dimension conversion performed by the multiple heads, i.e., from d to d/2 and back to d.

In the embodiment of the invention, in order to better extract features from the time series, the feedforward neural network is optimized so that features can be extracted at multiple scales. The invention uses 3 convolutions with different kernel sizes followed by activation functions, and the output MFFN($F_m$) of the multi-scale feedforward network satisfies the following condition:

$$\mathrm{MFFN}(F_m) = \mathrm{Concat}(S_1, S_2, S_3) \times W^o$$

where the j-th extraction result is $S_j = \mathrm{sigmoid}(\mathrm{Conv}_j(F_m)) + \tanh(\mathrm{Conv}_j(F_m))$, with j taking values from 1 to 3. Concat() represents the concatenation operation, and $W^o$ is the projection parameter of a fully connected (Linear) layer. sigmoid() and tanh() are activation functions; the activation distributions of the different sigmoid branches differ from one another, as do those of the different tanh branches. $\mathrm{Conv}_j(F_m)$ denotes convolving the input $F_m$ of the multi-scale feedforward network with the j-th convolution kernel.
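The multi-scale extraction results $S_j$ can be illustrated on a single univariate feature sequence as follows. This is a simplified sketch: the kernels are passed in as fixed lists (in a trained model they would be learned), odd kernel sizes with zero padding are assumed so that every branch keeps the sequence length, and the final projection by $W^o$ is left out. The helper names are hypothetical.

```python
import math
from typing import List


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    """1-D convolution with zero padding; output length equals input length.

    An odd kernel size is assumed so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(seq))
    ]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def multi_scale_features(seq: List[float], kernels: List[List[float]]) -> List[List[float]]:
    """Compute S_j = sigmoid(Conv_j(seq)) + tanh(Conv_j(seq)) for each kernel
    and concatenate the branches per time step."""
    branches = []
    for kernel in kernels:
        conv = conv1d_same(seq, kernel)
        branches.append([sigmoid(v) + math.tanh(v) for v in conv])
    # result[i] = [S_1[i], S_2[i], S_3[i]]
    return [list(step) for step in zip(*branches)]
```

Kernels of size 1, 3 and 5, for example, let the three branches respond to point-wise, short-range and slightly longer-range patterns in the same sequence.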

In the embodiment of the invention, the inter-dimension relation learning module represents the relationships between dimensions with a directed graph, and its output DRLM($F_v$) satisfies the following condition:

$$\mathrm{DRLM}(F_v) = \mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$$

Here Softmax(CosSim(TCN(Q))) is the inter-dimension dependency matrix, of size n×n, which captures the learned dependencies between the different indicators. Softmax() is an activation function, TCN() denotes executing a temporal convolutional network, and Q is the query matrix obtained by convolving the input $F_v$ of the inter-dimension relation learning module with different convolution kernels.
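The last two operations of this module, cosine similarity followed by a row-wise softmax, can be sketched as below. The temporal convolutional network is not reproduced here; the sketch assumes that its output is already available as one feature vector per indicator, and the name `dependency_matrix` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dependency_matrix(features: Matrix) -> Matrix:
    """n x n matrix whose row s is a discrete distribution describing how
    strongly indicator s depends on every indicator.

    `features` holds n rows, one feature vector per indicator; it stands
    in for the TCN output, which is an assumption of this sketch.
    """
    sims = [[cosine(a, b) for b in features] for a in features]
    return [softmax(row) for row in sims]
```

Because each row is normalized on its own, the resulting matrix is generally not symmetric, which is consistent with describing the relationships as a directed graph.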

It is known to those skilled in the art that the structure of the feedforward neural network may be an existing network structure.

S400, determining, based on at least X, the inter-dimension dependency matrix and the second result, whether the monitored server will be abnormal within the w timestamps after the prediction time t and, if so, which indicators are abnormal. Further, S400 may specifically include:

S401, obtaining the anomaly judgment value corresponding to X, $S(X) = \|X - X_2^0\| \cdot KL(M, M^0)$; where $X_2^0$ is the second result and $\|X - X_2^0\|$ is the mean square error between X and $X_2^0$. KL() represents the bidirectional Kullback-Leibler divergence function, M is the matrix of cosine similarities between the different monitoring indicators in X, and $M^0$ is the inter-dimension dependency matrix.

In the embodiment of the invention, because the anomaly judgment value of X uses only the reconstruction error of $X_2^0$, the reconstruction is not required to match the input exactly. Meanwhile, the error of the inter-dimension dependencies is included in the score, so that early change signals caused by contextual anomalies and by anomalies in the inter-dimension dependencies can be captured effectively.

S402, if S(X) > S0, determining that the monitored server will be abnormal within the w timestamps after the prediction time t, and executing S403; otherwise, determining that the monitored server will not be abnormal within the w timestamps after the prediction time t; S0 is a preset anomaly judgment value threshold.
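A small sketch of the anomaly judgment value of S401 follows, read as the mean square reconstruction error multiplied by a bidirectional KL term. Several points are assumptions of this example rather than statements of the patent: the bidirectional KL is taken as KL(P||Q) + KL(Q||P) summed over rows, both matrices are expected to contain positive rows that sum to 1, and a small `eps` guards the logarithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_squared_error(x: Matrix, x_hat: Matrix) -> float:
    """Mean of squared element-wise differences between two equal-sized matrices."""
    diffs = [(a - b) ** 2 for ra, rb in zip(x, x_hat) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def symmetric_kl(p: Matrix, q: Matrix, eps: float = 1e-12) -> float:
    """Sum over rows of KL(p_row || q_row) + KL(q_row || p_row).

    Rows are assumed to be discrete distributions; eps avoids log(0).
    """
    total = 0.0
    for p_row, q_row in zip(p, q):
        for a, b in zip(p_row, q_row):
            a, b = a + eps, b + eps
            total += a * math.log(a / b) + b * math.log(b / a)
    return total


def anomaly_score(x: Matrix, x2: Matrix, m: Matrix, m0: Matrix) -> float:
    """S(X) = MSE(X, X2) * symmetric KL(M, M0)."""
    return mean_squared_error(x, x2) * symmetric_kl(m, m0)
```

The product form means the score stays near zero unless both the reconstruction and the learned dependency structure deviate, which matches the stated aim of combining the two error sources.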

S403, obtaining the anomaly judgment value of the s-th monitoring indicator, $IS_s = k_1 \times \|Y_s - Y_s^0\| + k_2 \times KL(M, M^0)$; where $Y_s$ is the indicator vector corresponding to the s-th monitoring indicator in X, $Y_s = (x_{(t-w+1)s}, \ldots, x_{is}, \ldots, x_{ts})$, and $Y_s^0$ is the result corresponding to $Y_s$ in the second result; $k_1$ and $k_2$ are a first preset weight and a second preset weight, respectively, and may be empirical values; in one exemplary embodiment $k_1 = 0.001$ and $k_2 = 1$.

S404, sorting $IS_1$ to $IS_n$ in descending order, and taking the monitoring indicators corresponding to the first K anomaly judgment values after sorting as the indicators in which the anomaly occurs.

In the embodiment of the present invention, K may be set based on actual needs, for example, k=3.
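The weighting and ranking of S403 and S404 can be sketched as follows. The sketch assumes that a reconstruction error and a KL term have already been computed for each indicator (for instance by restricting the KL term to the row of that indicator, which is an assumption of this example); the function names are hypothetical, while the default weights and K repeat the exemplary values given above.

```python
from typing import List


def indicator_scores(
    recon_errors: List[float],
    kl_terms: List[float],
    k1: float = 0.001,
    k2: float = 1.0,
) -> List[float]:
    """IS_s = k1 * reconstruction error of indicator s + k2 * KL term of indicator s."""
    return [k1 * e + k2 * d for e, d in zip(recon_errors, kl_terms)]


def top_k_indicators(scores: List[float], k: int = 3) -> List[int]:
    """Return the positions of the k largest scores, largest first."""
    order = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return order[:k]


# Hypothetical per-indicator inputs for n = 4 indicators.
scores = indicator_scores([100.0, 5.0, 2000.0, 0.0], [0.1, 0.5, 0.0, 0.05])
suspects = top_k_indicators(scores, k=3)
```

The small default for k1 keeps raw reconstruction errors, which can be large in magnitude, from drowning out the dependency term when the two are added.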

In the embodiment of the invention, the multivariate time series anomaly prediction model is a model trained on sample data of a monitored server, and may comprise an embedding layer, an encoder and a decoder. The specific structure of the embedding layer, encoder and decoder is described above. The structure of the model facilitates learning latent dependencies from deep multi-scale features. Because of its structure (no recurrence), a Transformer applied directly to a time series ignores the position and structure information of the original series. This may reduce its ability to capture sequence information and dependencies in the MTS data. The data embedding module is therefore added to alleviate this problem.

The training process of the multivariate time series anomaly prediction model can be specifically obtained based on the following steps:

S1, acquiring a sample set; the sample data in the sample set may be a historical multi-dimensional time series of the monitored object.

S2, inputting the sample data of the current batch into the current multivariate time sequence anomaly prediction model for training to obtain a corresponding prediction result. The length of the sample data for each batch is w.

Those skilled in the art will appreciate that the specific implementation of S2 may refer to the specific implementations of S100 to S300 described above. The prediction result may be a decoding result.

S3, acquiring a current loss function value of the current multi-element time sequence abnormal prediction model based on the prediction result of the current batch and the corresponding real result, judging whether the current loss function value accords with a preset model training ending condition, if so, executing the step S5, otherwise, executing the step S4.

In the embodiment of the invention, in order to predict anomalies more effectively, the reconstruction error is not used directly; instead, a two-stage multi-task adversarial loss is adopted as the optimization target. In the first stage, $X_1^0$ and $X_2^0$ are driven toward the original input X, and the relationship matrix is driven toward the original matrix. In the second stage, the final output is optimized away from the input and the difference between the relationship matrices is enlarged, so that the relationship matrix is freed from the constraint of the original matrix and the difference between normal and abnormal samples is enlarged. That is, the multivariate time series anomaly prediction model uses a two-stage loss function during training, where the first-stage loss function is

$$L_1 = \|X - X_1^0\| + \alpha \times KL(M, M^0) + \beta \times \|X - X_2^0\|$$

and the second-stage loss function is

$$L_2 = \|X - X_1^0\| - \frac{\alpha}{2} \times KL(M, M^0) - \frac{\beta}{2} \times \|X - X_2^0\|$$

Here $X_1^0$ is the first result and $X_2^0$ is the second result; $\|\cdot\|$ denotes the absolute value function and $\cdot$ denotes the dot product; KL() represents the bidirectional Kullback-Leibler divergence function, and the KL loss measures how well the inter-dimension dependencies learned by the model match the initial relationship. M is the matrix of cosine similarities between the different monitoring indicators in X, $M^0$ is the inter-dimension dependency matrix, and α and β are hyperparameters that adjust the weights of the different losses. When […] < 0, the objective of the optimization is to enlarge the difference between normal and abnormal, so that the difference between normal and abnormal samples lies between 10% and 30%.
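The arithmetic of the two stages can be illustrated with a short function that reads the formulas as $L_1 = e_1 + \alpha \cdot kl + \beta \cdot e_2$ and $L_2 = e_1 - (\alpha/2) \cdot kl - (\beta/2) \cdot e_2$, where $e_1$ and $e_2$ are the reconstruction errors of the first and second result and $kl$ is the KL term. The function name and the numeric values are hypothetical; the patent does not state values for α and β.

```python
from typing import Tuple


def stage_losses(
    err1: float, err2: float, kl: float, alpha: float, beta: float
) -> Tuple[float, float]:
    """Return (L1, L2) for already-computed error terms.

    err1: reconstruction error of the first result
    err2: reconstruction error of the second result
    kl:   bidirectional KL term between M and M0
    """
    l1 = err1 + alpha * kl + beta * err2
    l2 = err1 - (alpha / 2.0) * kl - (beta / 2.0) * err2
    return l1, l2


# Hypothetical values chosen only to show the sign change between stages.
l1, l2 = stage_losses(err1=1.0, err2=2.0, kl=0.5, alpha=0.2, beta=0.1)
```

In the first stage every term is minimized together, whereas in the second stage the KL term and the second reconstruction error enter with a negative sign, so minimizing $L_2$ pushes them up while the first reconstruction error is still pulled down.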

In the embodiment of the present invention, the preset model training ending condition may be set based on actual needs; for example, training may end when L1 or L2 no longer changes over a set period, such as 3 consecutive training rounds.

S4, updating the parameters of the current multivariate time series anomaly prediction model based on the current loss function value, taking the sample data of the next batch as the sample data of the current batch, and executing S2.

And S5, taking the current multivariate time sequence anomaly prediction model as a target multivariate time sequence anomaly prediction model.
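The control flow of S2 to S5 with the example ending condition (the loss no longer changing for 3 consecutive rounds) might be organized as in the sketch below. It is a simplification under stated assumptions: the loss is checked once per round rather than per batch, `step_fn` is a hypothetical callback that runs the forward pass, updates the parameters and returns the loss for one batch, and `tol` and `max_rounds` are illustrative parameters that do not appear in the patent.

```python
from typing import Callable, List, Sequence


def train(
    batches: Sequence[object],
    step_fn: Callable[[object], float],
    patience: int = 3,
    tol: float = 1e-6,
    max_rounds: int = 100,
) -> List[float]:
    """Run training rounds until the mean round loss is unchanged
    (within tol) for `patience` consecutive rounds.

    Returns the list of mean losses, one per round.
    """
    history: List[float] = []
    previous = None
    stable_rounds = 0
    for _ in range(max_rounds):
        # S2 to S4: process every batch and update parameters inside step_fn.
        round_loss = sum(step_fn(batch) for batch in batches) / len(batches)
        history.append(round_loss)
        # S3: check the ending condition.
        if previous is not None and abs(round_loss - previous) <= tol:
            stable_rounds += 1
        else:
            stable_rounds = 0
        if stable_rounds >= patience:
            break  # S5: keep the current model as the target model.
        previous = round_loss
    return history
```

The `max_rounds` cap is a safety net for losses that keep oscillating and never satisfy the stability test.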

In one embodiment of the present invention, S0 may be an empirical value.

In another embodiment of the invention, S0 may be derived from a test set. Specifically, the test set is first processed according to steps S100 to S400 to obtain the anomaly judgment values at all prediction times. The anomaly judgment values are then sorted from high to low, and the smallest value among the top 5% is taken as S0.
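Deriving the threshold from the top 5% of scores can be sketched as follows. The function name and the integer `top_percent` parameter are hypothetical, and rounding the count up (so that at least one score is always selected) is an assumption of this example.

```python
from typing import List


def threshold_from_scores(scores: List[float], top_percent: int = 5) -> float:
    """Return the smallest score within the top `top_percent` percent.

    Scores are sorted from high to low; the number of selected scores is
    rounded up and is at least 1.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # Integer ceiling of len(ranked) * top_percent / 100.
    count = max(1, -((-len(ranked) * top_percent) // 100))
    return ranked[count - 1]


# Hypothetical anomaly judgment values 1.0, 2.0, ..., 100.0.
s0 = threshold_from_scores([float(v) for v in range(1, 101)])
```

Since S402 tests S(X) > S0 with a strict inequality, a window whose score equals the threshold itself is not flagged under this choice.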

In summary, the multivariate time series anomaly prediction model used in the invention employs a Transformer variant, the horizontal graph attention module and the inter-dimension relation learning module to model the temporal dependencies and the inter-dimension relationships simultaneously for anomaly prediction, and adopts a multi-task adversarial training strategy to enlarge the difference between normal and abnormal time points. This improves the performance of the model in anomaly prediction, anomaly detection and anomaly interpretation.

According to embodiments of the present invention, the present invention also provides an electronic device, a readable storage medium and a computer program product.

In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the above embodiments.

In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the above embodiment.

In an exemplary embodiment, the computer program product comprises a computer program which, when executed by a processor, implements the method according to the above embodiments.

Electronic devices are intended to represent various forms of user terminals, various forms of digital computers, such as desktop computers, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

In one exemplary embodiment, the electronic device may include a computing unit that may perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.

Further, a plurality of components in the electronic device are connected to the I/O interface, including: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, Central Processing Units (CPUs), Graphics Processing Units (GPUs), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above, such as the multivariate time series anomaly prediction method. For example, in some embodiments, the multivariate time series anomaly prediction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. When the computer program is loaded into RAM and executed by the computing unit, one or more steps of the multivariate time series anomaly prediction method described above may be performed. Alternatively, in other embodiments, the computing unit may be configured to perform the multivariate time series anomaly prediction method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present invention can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method for multivariate time series anomaly prediction, the method comprising the steps of:

S100, inputting a multivariate time series $X = (X_{t-w+1}, X_{t-w+2}, \ldots, X_i, \ldots, X_t)$ of a monitored server that currently requires prediction into a data embedding module of a multivariate time series anomaly prediction model, so as to perform position embedding and spatial embedding on the data in $X$ and obtain a corresponding data embedding result; wherein $X_i$ is the multivariate time series corresponding to the i-th timestamp before the prediction time $t$, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{is}, \ldots, x_{in}\}$, $i$ ranges from $t-w+1$ to $t$, $t$ is the prediction time, and $w$ is the prediction window size; $x_{is}$ is the value of the s-th monitoring index in $X_i$, $s$ ranges from 1 to $n$, and $n$ is the number of monitoring indexes of the monitored server; a monitoring index of the monitored server is a parameter representing the performance of the monitored server;

S200, inputting the data embedding result into an encoder of the multivariate time series anomaly prediction model to obtain a corresponding encoding result; the encoder comprises first to third encoding modules that are connected in sequence and have the same structure, wherein each encoding module comprises at least a multi-head attention mechanism module, a horizontal graph attention module and a multi-scale feed-forward network module connected in sequence; the first encoding module is further connected to the data embedding module;

S300, inputting the encoding result into a decoder of the multivariate time series anomaly prediction model to obtain a corresponding decoding result; the decoder comprises at least a first linear layer, an inter-dimension relation learning module and a second linear layer connected in sequence, wherein the first linear layer is connected to the third encoding module; the decoding result comprises a first result obtained by the first linear layer, an inter-dimension relation dependency matrix obtained by the inter-dimension relation learning module, and a second result obtained by the second linear layer;

S400, determining, based at least on $X$, the inter-dimension relation dependency matrix and the second result, whether the monitored server will be abnormal within the $w$ timestamps after the prediction time $t$ and which index is abnormal;

wherein the multivariate time series anomaly prediction model is a model obtained by training on sample data of the monitored server;

wherein the output result $\mathrm{HGAT}(F_h)$ of the horizontal graph attention module satisfies the following condition:

$$\mathrm{HGAT}(F_h) = \mathrm{GAT}\left(V, \mathrm{Softmax}(QK^{T})\right);$$

wherein $Q$, $K$ and $V$ are respectively a query matrix, a key matrix and a value matrix obtained by convolving the input $F_h$ of the horizontal graph attention module with different convolution kernels, and $K^{T}$ is the transpose of $K$; Softmax() is an activation function, and GAT() denotes applying a graph attention mechanism;

the output result $\mathrm{MFFN}(F_m)$ of the multi-scale feed-forward network module satisfies the following condition:

$$\mathrm{MFFN}(F_m) = \mathrm{Concat}(S_1, S_2, S_3) \times W^{o},$$ where the j-th extraction result $S_j = \mathrm{sigmoid}(\mathrm{Conv}_j(F_m)) + \tanh(\mathrm{Conv}_j(F_m))$, and $j$ ranges from 1 to 3;

wherein Concat() denotes a concatenation operation, $W^{o}$ is a projection parameter, sigmoid() and tanh() are activation functions, and $\mathrm{Conv}_j(F_m)$ denotes convolving the input $F_m$ of the multi-scale feed-forward network module with the j-th convolution kernel;

the output result $\mathrm{DRLM}(F_v)$ of the inter-dimension relation learning module satisfies the following condition:

$$\mathrm{DRLM}(F_v) = \mathrm{Softmax}\left(\mathrm{CosSim}(\mathrm{TCN}(Q))\right);$$

wherein $\mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$ represents the inter-dimension relation dependency matrix, Softmax() is an activation function, CosSim() computes the cosine similarity between different rows, TCN() denotes applying a temporal convolutional network, and $Q$ is a query matrix obtained by convolving the input $F_v$ of the inter-dimension relation learning module with different convolution kernels.
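
By way of non-limiting illustration, the Softmax(CosSim(·)) step of the inter-dimension relation learning module might be sketched in Python as follows. The sketch assumes that the TCN output is already available as a list of rows (one row per dimension) and that Softmax is applied row by row; neither assumption is stated in the claim, and the function names `cosine_similarity_matrix`, `row_softmax` and `dependency_matrix` are hypothetical.

```python
import math


def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between the rows of a 2-D list."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    sim = []
    for a, norm_a in zip(rows, norms):
        sim_row = []
        for b, norm_b in zip(rows, norms):
            dot = sum(x * y for x, y in zip(a, b))
            # Assumption: a zero vector is given similarity 0 to avoid division by zero.
            sim_row.append(dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0)
        sim.append(sim_row)
    return sim


def row_softmax(matrix):
    """Numerically stable softmax applied independently to each row."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def dependency_matrix(features):
    """Softmax(CosSim(features)); `features` stands in for TCN(Q)."""
    return row_softmax(cosine_similarity_matrix(features))


# Hypothetical features for three dimensions; rows 0 and 1 are parallel.
features = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, -1.0, 0.5]]
dep = dependency_matrix(features)
```

Each row of `dep` sums to 1, so under these assumptions row s can be read as a distribution describing how strongly dimension s depends on every other dimension.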

2. The method of claim 1, wherein the data embedding result satisfies the following condition:

$$\mathrm{Embedding}(X) = \mathrm{Cov}\left(\mathrm{PE}(X) + \mathrm{SE}(X) + \mathrm{Cacov}(X)\right);$$

wherein $\mathrm{Embedding}(X)$ is the data embedding result of $X$; $\mathrm{PE}(X)$ represents the time or order information of each monitoring index added along the time dimension of $X$; $\mathrm{SE}(X)$ represents the inter-index information added at each timestamp, $\mathrm{SE}(X) = W_{se} \times \mathrm{CosSim}(X)$, where $W_{se}$ is a weight matrix of size $n \times w$ and $\mathrm{CosSim}(X)$ is the matrix formed by the cosine similarities between different monitoring indexes in $X$; $\mathrm{Cacov}(X)$ denotes performing a causal convolution operation on $X$; and Cov() denotes a one-dimensional convolution operation.
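
By way of non-limiting illustration, a causal convolution of the kind denoted by Cacov() might be sketched in Python as follows for a single monitoring index. The sketch assumes zero padding on the left so that the output at a timestamp depends only on that timestamp and earlier ones; the function name `causal_conv1d`, the kernel values and the sample series are hypothetical and are not taken from the claim.

```python
def causal_conv1d(series, kernel):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * series[t - k].

    Values before the start of the series are treated as zero (assumption),
    so y[t] never depends on series[t + 1] or later.
    """
    output = []
    for t in range(len(series)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * series[t - k]
        output.append(acc)
    return output


# Hypothetical values of one monitoring index over w = 4 timestamps.
cpu_usage = [1.0, 2.0, 3.0, 4.0]
smoothed = causal_conv1d(cpu_usage, [0.5, 0.5])  # [0.5, 1.5, 2.5, 3.5]
```

Changing the last value of `cpu_usage` changes only the last output, which is the property that distinguishes a causal convolution from an ordinary centered one.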

3. The method according to claim 1, wherein S400 specifically comprises:

S401, obtaining an anomaly judgment value $S(X) = -\|X - X_2^{0}\| \cdot \mathrm{KL}(M, M^{0})$ corresponding to $X$; wherein $X_2^{0}$ is the second result, $\|\;\|$ denotes the absolute value function, $\cdot$ denotes the dot product, KL() denotes the bi-directional Kullback-Leibler divergence function, $M$ is the matrix formed by the cosine similarities between different monitoring indexes in $X$, and $M^{0}$ is the inter-dimension relation dependency matrix;

S402, if $S(X) > S_0$, determining that the monitored server will be abnormal within the $w$ timestamps after the prediction time $t$, and executing S403; otherwise, determining that the monitored server will not be abnormal within the $w$ timestamps after the prediction time $t$; $S_0$ is a preset anomaly judgment value threshold;

S403, obtaining an anomaly judgment value $IS_s = k_1 \times \|Y_s - Y_s^{0}\| + k_2 \times \mathrm{KL}(M, M^{0})$ of the s-th monitoring index; wherein $Y_s$ is the index vector corresponding to the s-th monitoring index in $X$, $Y_s = (x_{(t-w+1)s}, \ldots, x_{is}, \ldots, x_{ts})$, and $Y_s^{0}$ is the result corresponding to $Y_s$ in the second result; $k_1$ and $k_2$ are a first preset weight and a second preset weight, respectively.
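
By way of non-limiting illustration, the per-index judgment value of S403 might be computed as in the following Python sketch. Several points are assumptions rather than statements of the claim: the norm is taken as the sum of absolute differences, the bi-directional Kullback-Leibler divergence is taken as KL(p‖q) + KL(q‖p), and it is evaluated between row s of M and row s of M⁰ after both rows have been normalized into probability distributions. The names `symmetric_kl` and `index_score` and all sample values are hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Bi-directional KL divergence KL(p||q) + KL(q||p) of two distributions."""
    forward = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    backward = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return forward + backward


def index_score(y, y_hat, m_row, m0_row, k1=1.0, k2=1.0):
    """IS_s = k1 * ||Y_s - Y0_s|| + k2 * KL(M, M0) for one monitoring index.

    Assumptions: the norm is the sum of absolute differences, and the KL term
    compares row s of M with row s of M0, both already probability distributions.
    """
    residual = sum(abs(a - b) for a, b in zip(y, y_hat))
    return k1 * residual + k2 * symmetric_kl(m_row, m0_row)


# Hypothetical observed and reconstructed values of one index over w = 3 timestamps.
observed = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.0, 5.0]
same_rows = index_score(observed, reconstructed, [0.5, 0.5], [0.5, 0.5])
shifted_rows = index_score(observed, reconstructed, [0.9, 0.1], [0.5, 0.5])
```

With identical rows the divergence term vanishes and the value reduces to the reconstruction residual; a mismatch between the rows of M and M⁰ raises the value further.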

4. The method of claim 1, wherein the multivariate time series anomaly prediction model employs a two-stage loss function during training, wherein the first-stage loss function is $L_1 = \|X - X_1^{0}\| + \alpha \times \mathrm{KL}(M, M^{0}) + \beta \times \|X - X_2^{0}\|$ and the second-stage loss function is $L_2 = \|X - X_1^{0}\| - \frac{\alpha}{2} \times \mathrm{KL}(M, M^{0}) - \frac{\beta}{2} \times \|X - X_2^{0}\|$;

wherein $X_1^{0}$ is the first result, $X_2^{0}$ is the second result, $\|\;\|$ denotes the absolute value function, $\cdot$ denotes the dot product, KL() denotes the bi-directional Kullback-Leibler divergence function, $M$ is the matrix formed by the cosine similarities between different monitoring indexes in $X$, $M^{0}$ is the inter-dimension relation dependency matrix, and $\alpha$ and $\beta$ are hyperparameters.
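
By way of non-limiting illustration, the two loss stages might be combined as in the following Python sketch, which assumes that the two reconstruction terms and the divergence term have already been reduced to scalars. The function name `two_stage_losses`, its parameter names and the sample values are hypothetical.

```python
def two_stage_losses(rec1, rec2, kl, alpha, beta):
    """Return (L1, L2) from precomputed scalar terms.

    rec1 stands for ||X - X1_0||, rec2 for ||X - X2_0||, and kl for KL(M, M0).
    """
    # First stage: all three terms are added.
    l1 = rec1 + alpha * kl + beta * rec2
    # Second stage: the divergence and second reconstruction terms are
    # subtracted with half the first-stage weights.
    l2 = rec1 - (alpha / 2) * kl - (beta / 2) * rec2
    return l1, l2


# Hypothetical scalar terms and hyperparameters.
l1, l2 = two_stage_losses(rec1=1.0, rec2=2.0, kl=0.5, alpha=2.0, beta=4.0)
# l1 == 10.0, l2 == -3.5
```

The sign change on the second and third terms means that the two stages pull those terms in opposite directions, while the first reconstruction term is minimized in both.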

5. An electronic device comprising a processor and a memory;

the processor is adapted to perform the steps of the method according to any of claims 1 to 4 by invoking a program or instruction stored in the memory.

6. A non-transitory computer-readable storage medium storing a program or instructions that cause a computer to perform the steps of the method of any one of claims 1 to 4.