CN115982235A - Abnormal time sequence data detection method, equipment and medium - Google Patents

Abnormal time sequence data detection method, equipment and medium

Info

Publication number
CN115982235A
Authority
CN
China
Prior art keywords
data
training
module
adopting
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211654333.7A
Other languages
Chinese (zh)
Inventor
李才博
闻金鸿
王迅
吴斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhaotong Liangfengtai Information Technology Co ltd
Original Assignee
Zhaotong Liangfengtai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhaotong Liangfengtai Information Technology Co ltd filed Critical Zhaotong Liangfengtai Information Technology Co ltd
Priority to CN202211654333.7A priority Critical patent/CN115982235A/en
Publication of CN115982235A publication Critical patent/CN115982235A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a device and a medium for detecting abnormal time series data, relating to the field of data processing. The method comprises the following steps: establishing an initial model and training the initial model with a training data set; acquiring any training data in the training data set and, in the initial model, performing time series feature extraction on the training data to obtain a feature representation; performing data reconstruction on the feature representation to obtain first processed data; performing data reconstruction on the first processed data to obtain second processed data; obtaining a prediction error with a prediction module; calculating a maximum mean difference penalty based on the feature representation; performing iterative training, namely adjusting the loss function in the initial model, until training is finished, to obtain a target model; and obtaining target data, processing the target data with the target model and calculating a time series score according to a preset function to obtain a target detection result. The method solves the problems that existing recurrent-neural-network-based abnormal time series detection has a complex network structure, consumes large amounts of memory and time, and is inefficient.

Description

Abnormal time sequence data detection method, equipment and medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, a device, and a medium for detecting abnormal time series data.
Background
Monitoring of an intelligent control system is the supervision of the system's measurable events and outputs, which serve as a reference that characterizes the normal functioning of the system. Deviations from historical normal data are analyzed to determine whether an anomaly exists. In the past, this analysis was done by system monitoring experts who established normal-behavior thresholds for each measured output; if a monitored quantity exceeded the corresponding threshold, the system was considered to be operating abnormally. As industrial control systems become more complex, the number of sensors required to obtain measurement data increases dramatically, and traditional expert-defined, threshold-based methods are no longer applicable. In such a case, automating the monitoring of intelligent control systems has become necessary. Automated intelligent control system monitoring requires methods that observe the different measurements taken by the sensors and infer normal and abnormal behavior from them.
Monitoring over a set of multidimensional time series is referred to as anomaly detection in multivariate time series. Over the past few years, a number of approaches have been developed to address this problem. The most common techniques include distance-based techniques such as k-nearest neighbors, clustering such as k-means, and classification with one-class SVMs. However, these approaches are no longer applicable as intelligent control systems have grown in complexity. Unsupervised anomaly detection methods based on deep learning, which can infer the correlations between time series and thus identify anomalous behavior, have therefore been carried over to abnormal time series detection. Among deep learning methods for detecting anomalies in temporal data, methods based on recurrent neural networks (RNNs) are very popular; however, it is well known that recurrent neural networks consume enormous amounts of memory and time. It is therefore necessary to design an abnormal time series detection method that is efficient and suited to practical situations.
Disclosure of Invention
In order to overcome the above technical defects, the present invention aims to provide a method, a device and a medium for detecting abnormal time series data, which solve the problems that existing recurrent-neural-network-based abnormal time series detection has a complex network structure, consumes large amounts of memory and time, and is inefficient.
The invention discloses an abnormal time sequence data detection method, which comprises the following steps:
establishing an initial model and training the initial model with a training data set, the initial model comprising an encoding module, a first decoding module, a second decoding module, a prediction module and a penalty module;
acquiring any training data in the training data set, and inputting the training data into the initial model;
in the initial model, using the encoding module to perform time series feature extraction on the training data to obtain a feature representation;
using the first decoding module to perform data reconstruction on the feature representation to obtain first processed data;
passing the first processed data sequentially through the encoding module and the second decoding module for data reconstruction to obtain second processed data;
using the prediction module to perform time series prediction based on the feature representation to obtain a prediction error between predicted values and actual values of the training data;
using the penalty module to calculate a maximum mean difference penalty based on the feature representation;
using a calculation module to calculate a training output according to the training data, the first processed data, the second processed data, the prediction error and the maximum mean difference penalty;
performing iterative training, namely adjusting the loss function in the initial model, until training is finished, to obtain a target model;
and obtaining target data, processing the target data with the target model, and calculating a time series score according to a preset function to obtain a target detection result.
Preferably, a second reconstruction error is calculated from the second processed data and the training data;
the training includes a first training phase during which adjustments are made to minimize the second reconstruction error and a second training phase during which adjustments are made to maximize the second reconstruction error.
Preferably, the performing, by using an encoding module, time series feature extraction on the training data to obtain a feature representation includes:
performing feature extraction on the training data in the time dimension by adopting a gated recurrent unit to obtain first output data;
processing the first output data by adopting a multivariate Resnet module to obtain second output data containing the spatial features corresponding to the training data;
and mapping the second output data, which contains the spatial features corresponding to the training data, through a fully connected layer to obtain the feature representation.
Preferably, the processing the first output data by using the multivariate Resnet to obtain second output data including spatial features corresponding to the training data includes:
performing size reshaping based on the first output data to obtain feature maps of a plurality of sizes;
and processing the feature maps of all sizes through the three Resnet modules and the pooling layer respectively, and performing dimension splicing to obtain spatial features corresponding to the training data as second output data.
Preferably, the data reconstruction of the feature representation by using the first decoding module to obtain the first processed data includes:
performing feature mapping on the feature representation by adopting a full connection layer to obtain first intermediate data;
inputting the first intermediate data into a multivariate Resnet module to reconstruct space characteristics to obtain second intermediate data;
and processing the second intermediate data by adopting a gated recurrent unit to obtain the first processed data.
Preferably, the second decoding module is structurally identical to the first decoding module;
the second decoding module is connected in parallel with the first decoding module.
Preferably, the performing time series prediction based on the feature representation to obtain a prediction error between a predicted value and an actual value of the training data includes:
in the prediction module, the feature representation is processed by adopting two fully-connected layers connected in series, so that prediction data at time T+1 are obtained based on the features of T time steps, and the difference between the predicted value and the actual value at each time in the training data is obtained to generate a prediction error.
Preferably, said calculating a training output according to the training data, the first processed data, the second processed data, the prediction error and the maximum mean difference penalty by using the calculating module includes:
calculating a first reconstruction error according to the training data and the first processed data, and calculating a second reconstruction error according to the training data and the second processed data;
weighting and summing the first reconstruction error, the second reconstruction error, the prediction error, and the maximum mean difference penalty by preset weights, respectively, to generate a training output.
The present invention also provides a computer apparatus, comprising:
a memory for storing executable program code; and
and the processor is used for calling the executable program code in the memory to execute steps comprising the abnormal time series data detection method described above.
The present invention also provides a computer-readable storage medium, having stored thereon a computer program,
the computer program, when executed by a processor, implements the steps of the abnormal timing data detection method.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
according to the abnormal time sequence data detection method, a neural network of a self-coding mechanism based on the multivariate Resnet and the GRU is established for the characteristics of the multivariate time sequence data, the model architecture is improved, the spatio-temporal characteristics of the time sequence data are captured, the characteristic constraint is increased, the sensitivity to the abnormal data is improved, the model is trained in an unsupervised mode and then used for detection, the memory consumption is reduced, and the detection efficiency is improved.
Drawings
FIG. 1 is a flowchart of a method for detecting abnormal time series data according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the network structure of the initial model in the first embodiment of the method for detecting abnormal time series data according to the present invention;
FIG. 3 is a schematic block diagram of a computer device according to the present invention.
Reference numerals are as follows:
9-a computer device; 91-a memory; 92-a processor.
Detailed Description
The advantages of the invention are further illustrated by the following detailed description of the preferred embodiments in conjunction with the drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it should be noted that the terms "mounted" and "connected" are to be interpreted broadly; a connection may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection through an intermediate medium, and those skilled in the art will understand the specific meaning of these terms as used in a particular case.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The first embodiment is as follows: the present embodiment provides a method for detecting abnormal time series data, referring to fig. 1 and 2, including the following steps:
s100: establishing an initial model, and training the initial model by adopting a training data set; the initial model comprises a coding module, a first decoding module, a second decoding module, a prediction module and a penalty module; acquiring any training data in the training data set, and inputting the training data to the initial model;
In this embodiment, an encoding module is provided for feature extraction; a first decoding module and a second decoding module are respectively used for feature reconstruction; a prediction module is used for predicting the sequence at the current moment based on data from a past period of time; a penalty module is used for constraining the features; and a calculation module is arranged to calculate, in the actual inference stage, a score from the outputs of the encoding module, the first decoding module, the second decoding module, the prediction module and the penalty module. When the score exceeds a threshold, the data are determined to be abnormal time series data, thereby realizing the detection of abnormal time series data.
It should be noted that the initial model is the model before training; its structure is identical to that of the trained target model, and the training process adjusts the parameters of the initial model and fixes them after training, so that the target model is obtained. The training process of this embodiment is completed in an unsupervised manner. The training data set comprises a plurality of training data; owing to the lack of real abnormal data, this embodiment models normal time series data in an unsupervised manner and captures the inherent relationships of the data. When abnormal data enter the network, outputs that differ markedly from the normal data distribution (embodied as larger errors in the network) are produced; a confidence threshold for normal data is preset, and when the value exceeds the threshold, the current input sequence data are considered to contain abnormal data.
S200: in the initial model, a coding module is adopted to extract time sequence characteristics of the training data to obtain characteristic representation;
Specifically, the encoding module is used for deep time series feature extraction, which is subsequently used to determine whether the time series data are abnormal. Using the encoding module to perform time series feature extraction on the training data to obtain the feature representation comprises the following steps:
S210: performing feature extraction on the training data in the time dimension with a gated recurrent unit to obtain first output data;
Illustratively, the gated recurrent unit (GRU) controls the flow of information through learnable gates so as to capture long-range dependencies in a time series; in this embodiment it is used to capture and output the features of the training data in the time dimension.
S220: processing the first output data with a multivariate Resnet module to obtain second output data containing the spatial features corresponding to the training data;
In this embodiment, a multivariate Resnet module is used to capture spatial features. Capturing the spatio-temporal features of the time series data with the multivariate Resnet and the GRU allows the data to be modeled better and makes the model more sensitive to abnormal data. Specifically, the processing of the first output data with the multivariate Resnet to obtain second output data containing the spatial features corresponding to the training data includes:
s221: performing size reshaping based on the first output data to obtain feature maps of a plurality of sizes;
it should be noted that the above-mentioned output sequence reshape from the gate loop unit (GRU) is applied to a plurality of different shapes, and two-dimensional feature maps with different sizes are obtained, similar to a single-channel image (i.e. feature maps with a plurality of sizes).
S222: and processing the feature maps of all sizes through the three Resnet modules and the pooling layer respectively, and then performing dimension splicing to obtain the spatial features corresponding to the training data as second output data.
Illustratively, the feature maps of the several sizes are passed through three different Resnet modules and pooling layers (three groups of sequentially connected Resnet modules and pooling layers), yielding three feature maps of the same size, which are spliced together along the channel dimension to form a three-channel feature map. The purpose of two-dimensional feature extraction at different sizes is to capture the spatio-temporal features of the multivariate time series data over different time and space windows, so as to better model and encode the data source. It should be noted that this processing procedure also applies to the multivariate Resnet modules in the first decoding module and the second decoding module described below; the multivariate Resnet modules share the same structure, only the objects they process differ.
S230: and mapping the second output data to the corresponding spatial features of the training data through a full connection layer to obtain the feature representation.
Based on the above steps S210-S230, by way of example and not limitation, assume that W (the sample data) has shape [batch size, T, D], where batch size is the amount of input data in a single training batch, T is the time step, i.e., the length of the continuous time series, and D is the dimensionality of the data, i.e., the detection data originate from D sensors. W first passes through a gated recurrent unit (GRU), which captures its features in the time dimension; the output then passes through the multivariate Resnet, which uses the feature extraction capability of Resnet to further capture the spatial features of the normal time series data; finally, a fully connected layer maps these features to a deep-level feature representation. This constitutes the encoder AE (i.e., the processing of the above-mentioned encoding module); a minimal illustrative sketch is given below.
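By way of illustration only and not limitation, a minimal PyTorch sketch of such an encoder (GRU, then multivariate Resnet, then a fully connected layer) follows. The time step T, the hidden size, the reshape sizes, the residual-block design and the latent dimension are assumptions chosen for the example, not values prescribed by the present application.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # a small residual block standing in for one "Resnet module"
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class MultivariateResnet(nn.Module):
    # reshape the GRU output into single-channel 2-D feature maps of several
    # sizes, pass each through a Resnet branch plus pooling, and splice the
    # three equal-sized maps along the channel dimension (a 3-channel map)
    def __init__(self, T=30, hidden=120):
        super().__init__()
        self.shapes = [(60, 60), (40, 90), (30, 120)]   # each product equals T*hidden for the defaults
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), ResBlock(8),
                          nn.Conv2d(8, 1, 1), nn.AdaptiveAvgPool2d((16, 16)))
            for _ in self.shapes)

    def forward(self, h):                                # h: [B, T, hidden]
        flat = h.reshape(h.size(0), -1)                  # [B, T*hidden]
        maps = [branch(flat.view(-1, 1, hh, ww))         # [B, 1, 16, 16] each
                for (hh, ww), branch in zip(self.shapes, self.branches)]
        return torch.cat(maps, dim=1)                    # [B, 3, 16, 16]

class Encoder(nn.Module):
    # GRU for temporal features, multivariate Resnet for spatial features,
    # fully connected layer for the deep feature representation (encoder AE)
    def __init__(self, d_in, T=30, hidden=120, z_dim=64):
        super().__init__()
        self.gru = nn.GRU(d_in, hidden, batch_first=True)
        self.mres = MultivariateResnet(T, hidden)
        self.fc = nn.Linear(3 * 16 * 16, z_dim)

    def forward(self, w):                                # w: [B, T, D]
        h, _ = self.gru(w)
        return self.fc(self.mres(h).flatten(1))          # feature representation [B, z_dim]

# usage: z = Encoder(d_in=D)(W) for W of shape [batch_size, T, D] from D sensors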
S300: adopting a first decoding module to carry out data reconstruction on the feature representation to obtain first processing data;
In this embodiment, assume that the first decoding module (decoder) is DE1 and the second decoding module (decoder) is DE2. Specifically, the data reconstruction of the feature representation by the first decoding module to obtain the first processed data includes:
s310: performing feature mapping on the feature representation by adopting a full connection layer to obtain first intermediate data;
in this process, the decoding (reconstruction) of the signature approximates the inverse operation of the encoding module, so that when the encoding module performs the processing of GRU, multivariate Resnet, full connectivity layer in turn, a full connectivity layer-multivariate Resnet-GRU structure is performed in the decoding module.
S320: inputting the first intermediate data into a multivariate Resnet module to reconstruct space characteristics to obtain second intermediate data;
for illustration, the processing procedure of the multiple Resnet modules may refer to the processing of the multiple Resnet modules in the coding module, and the processing manners of the two modules are similar and will not be described herein again.
S330: and processing the second intermediate data by adopting a gating circulating unit to obtain first processing data.
That is, the above deep feature representation is restored (reconstructed) into the data X1 and X2 through the two decoders respectively, each in particular a fully connected layer - multivariate Resnet - GRU structure, as sketched below.
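Continuing the illustrative sketch above, one decoder of the kind just described (fully connected layer, then multivariate Resnet, then GRU) might look as follows; the sizes and the reuse of the MultivariateResnet class are again assumptions of the example, and the same class serves for both DE1 and DE2.

class Decoder(nn.Module):
    # roughly the inverse of the encoder: a fully connected layer for feature
    # mapping, the multivariate Resnet for spatial reconstruction, and a GRU
    # that emits the reconstructed time series
    def __init__(self, d_out, T=30, hidden=120, z_dim=64):
        super().__init__()
        self.T, self.hidden = T, hidden
        self.fc = nn.Linear(z_dim, T * hidden)
        self.mres = MultivariateResnet(T, hidden)
        self.expand = nn.Linear(3 * 16 * 16, T * hidden)
        self.gru = nn.GRU(hidden, d_out, batch_first=True)

    def forward(self, z):                                # z: [B, z_dim]
        h = self.fc(z).view(-1, self.T, self.hidden)     # first intermediate data
        s = self.mres(h).flatten(1)                      # second intermediate data
        h2 = self.expand(s).view(-1, self.T, self.hidden)
        x, _ = self.gru(h2)                              # reconstructed series [B, T, d_out]
        return x

# X1 = Decoder(d_out=D)(z) plays the role of DE1; an independent instance plays DE2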
S400: the first processing data are subjected to data reconstruction sequentially through the encoding module and the second decoding module to obtain second processing data;
it should be noted that the second decoding module is consistent with the first decoding module in structure; and the second decoding module is connected in parallel with the first decoding module, so the processing procedure of the second decoding module can refer to the above-mentioned S310-S320, and the feature representation in each of the second decoding module is replaced by the feature representation output by the encoding module of the first processing data.
Based on the above S300 to S400, the network is trained on the reconstruction errors of X1, X2 and W (the first reconstruction error and the second reconstruction error described in this embodiment), yielding the two decoders DE1 and DE2. When clearly abnormal data are encountered, they can be predicted well from the reconstruction errors alone. In practice, however, the difference between abnormal and normal data may be small, or an abnormal value from a particular sensor may be small because of the data type and therefore hard to detect. To cope with this situation, the present application adopts an adversarial training method: X1 is fed as input into the whole self-encoder structure (i.e., data reconstruction is performed sequentially through the encoding module and the second decoding module), giving an updated deep feature representation (different from the feature representation obtained from W), and DE2 is then used to identify X1. Here X1 acts as a fake sample that tries to deceive DE2: although the error between X1 and W is already small after reconstruction, DE2 should still treat it as abnormal data. In other words, two self-encoders with strong reconstruction capability are trained in the early stage of the network, and in the later stage the output of one decoder is used to try to deceive the other, while the other decoder tries to expose it. Since the network is unsupervised, in the first training phase, when X1 is used to deceive DE2, the error ||W-DE2(AE(X1))|| is minimized as far as possible; in the second training phase, DE2 should identify the data as abnormal as far as possible, so the error ||W-DE2(AE(X1))|| is maximized, i.e., -||W-DE2(AE(X1))|| is minimized. During the adversarial training, the deception process is optimized first and the identification process afterwards. In this way a stronger encoder is obtained and a very small error can be amplified, giving a better abnormal time series detection effect. Loss1 and loss2 in FIG. 2 denote the reconstruction errors of X1 and X2, respectively. Thus, specifically, a second reconstruction error (i.e., ||W-DE2(AE(X1))||) is calculated from the second processed data and the training data; the training can be divided into two phases, a first training phase in which adjustments are made to minimize the second reconstruction error and a second training phase in which adjustments are made to maximize it.
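A hedged sketch of one adversarial training step is given below, assuming an L1 reconstruction error, Adam optimizers and the epoch-dependent weights 3/n and (1-1/n) that appear in the loss functions listed later; the MMD and prediction terms are omitted here for brevity and shown in the later sketches.

import torch.nn.functional as F

D = 8                                            # illustrative number of sensor channels
enc, de1, de2 = Encoder(d_in=D), Decoder(d_out=D), Decoder(d_out=D)
opt_spoof = torch.optim.Adam(list(enc.parameters()) + list(de1.parameters()), lr=1e-3)
opt_detect = torch.optim.Adam(de2.parameters(), lr=1e-3)

def adversarial_step(w, n):                      # w: [B, T, D]; n: current epoch (>= 1)
    # first training phase: the encoder and DE1 try to spoof DE2, so the
    # error ||W - DE2(AE(X1))|| is minimised
    x1 = de1(enc(w))
    err = F.l1_loss(de2(enc(x1)), w)
    loss_spoof = 3 / n * F.l1_loss(x1, w) + (1 - 1 / n) * err
    opt_spoof.zero_grad(); loss_spoof.backward(); opt_spoof.step()

    # second training phase: DE2 tries to expose X1 as a fake sample, so the
    # same error is maximised, i.e. its negative is minimised
    with torch.no_grad():                        # freeze the spoofing side
        z = enc(w)
        z1 = enc(de1(z))
    x2 = de2(z)
    err = F.l1_loss(de2(z1), w)
    loss_detect = 3 / n * F.l1_loss(x2, w) - (1 - 1 / n) * err
    opt_detect.zero_grad(); loss_detect.backward(); opt_detect.step()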
S500: adopting a prediction module to perform time sequence prediction based on the feature representation to obtain a prediction error between a predicted value and an actual value of the training data;
Illustratively, normal time series data are inherently predictable, i.e., the sequence at the current time can be predicted from data at past times. In the present application, the network architecture of the encoder and decoder has a strong ability to extract spatio-temporal features, and therefore the data at time step T+1 can be predicted from the deep feature representation of W over T time steps.
Specifically, the performing of time series prediction based on the feature representation to obtain a prediction error between the predicted values and the actual values of the training data includes: in the prediction module, processing the feature representation with two fully connected layers connected in series, so that prediction data at time T+1 are obtained based on the features of T time steps, and the difference between the predicted value and the actual value at each time in the training data is taken to generate a prediction error.
Therefore, the deep feature representation of T time steps is input into two fully connected layers connected in series to predict the data at time T+1, and the mean squared error between the predicted value and the true sequence value at time T+1 can be used as a loss function to optimize the network.
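A brief sketch of the prediction module under the same assumptions: two fully connected layers in series (the ReLU between them is an assumption of the example) map the deep feature representation of T time steps to the D-dimensional value at time T+1, and the mean squared error against the true value serves as the prediction loss.

class Predictor(nn.Module):
    # two fully connected layers in series: feature representation -> value at T+1
    def __init__(self, d_out, z_dim=64, mid=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, mid), nn.ReLU(),
                                 nn.Linear(mid, d_out))

    def forward(self, z):
        return self.net(z)                       # predicted sequence value at T+1, [B, D]

predictor = Predictor(d_out=D)
# prediction error used as a loss term, with y_true the real value at time T+1:
# pred_err = F.mse_loss(predictor(enc(w)), y_true)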
S600: calculating a maximum mean difference penalty based on the feature representation by using a penalty module;
In particular, the distribution of the deep features (i.e., of the feature representation in the present application) should not be left unconstrained. To better constrain the deep feature representation and map it into a better feature space, an MMD (maximum mean discrepancy, here referred to as the maximum mean difference) penalty is added, and the MMD loss is used as a training loss to optimize the encoder. The maximum mean discrepancy is one of the most frequently used measures in transfer learning; it is a kernel learning method that measures the distance between two distributions in a reproducing kernel Hilbert space. Put simply, it computes the mean distance between two sets of data; since this is hard to compute directly, the two distributions are mapped into another space in which the distance is computed. In the present application, considering that the data are multivariate and that data from different sensors should follow different distributions, a Gaussian mixture is adopted as the distribution of the space to be mapped to. In actual use, the number of Gaussian kernel functions can be chosen reasonably according to the dimensionality of the data source; generally, about half the dimensionality of the data source is a reasonable number of kernels. After the MMD penalty term is calculated, its value is used as a loss to optimize the encoder module of the network.
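A hedged sketch of the maximum mean difference (MMD) penalty follows: a multi-kernel Gaussian estimate of the distance between the batch of feature representations Z and samples G drawn from an assumed Gaussian mixture. The kernel bandwidths, the mixture parameters and the rule of thumb of roughly half the source dimensionality for the number of components are illustrative choices, not prescriptions of the present application.

def multi_rbf(a, b, sigmas=(1.0, 2.0, 4.0, 8.0)):
    # sum of Gaussian kernels evaluated on all pairs of rows of a and b
    d2 = torch.cdist(a, b).pow(2)
    return sum(torch.exp(-d2 / (2.0 * s * s)) for s in sigmas)

def mmd_loss(z, g):
    # biased MMD^2 estimate between the feature batch z and the mixture samples g
    return multi_rbf(z, z).mean() + multi_rbf(g, g).mean() - 2.0 * multi_rbf(z, g).mean()

def sample_gaussian_mixture(batch, z_dim, n_components):
    # equally weighted components with spread-out means and a small fixed spread
    means = torch.linspace(-1.0, 1.0, n_components).unsqueeze(1).repeat(1, z_dim)
    idx = torch.randint(n_components, (batch,))
    return means[idx] + 0.1 * torch.randn(batch, z_dim)

# g = sample_gaussian_mixture(z.size(0), z.size(1), n_components=max(D // 2, 1))
# penalty = mmd_loss(z, g)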
S700: performing iterative training, namely adjusting the loss function in the initial model until the training is completed to obtain a target model;
Specifically, the whole network (i.e., the initial model) has three loss functions:
Loss_1 = 3/n*||W-X1|| + (1-1/n)*||W-DE2(AE(X1))|| + 3/n*MMD_loss(Z,G)
Loss_2 = 3/n*||W-X2|| - (1-1/n)*||W-DE2(AE(X1))||
Loss_3 = MSE(y_label-y_pred) + 3/n*MMD_loss(Z,G)
where n denotes the training epoch, i.e., n is incremented by 1 after each epoch. In this way the training of the network is divided into two phases (i.e., the first training phase and the second training phase), with the weights shifting from the reconstruction terms towards the adversarial term as n grows.
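For completeness, a compact sketch assembling the three loss functions above from the pieces already defined (the reconstruction errors, mmd_loss and the prediction error); the use of the L1 norm for the reconstruction errors is an assumption of the example, and n is the current epoch, so the 3/n and (1-1/n) weights shift the emphasis from reconstruction towards the adversarial term as training proceeds.

def network_losses(w, y_true, g, n):
    z = enc(w)
    x1, x2 = de1(z), de2(z)
    err2 = F.l1_loss(de2(enc(x1)), w)                    # ||W - DE2(AE(X1))||
    loss_1 = 3/n * F.l1_loss(x1, w) + (1 - 1/n) * err2 + 3/n * mmd_loss(z, g)
    loss_2 = 3/n * F.l1_loss(x2, w) - (1 - 1/n) * err2
    loss_3 = F.mse_loss(predictor(z), y_true) + 3/n * mmd_loss(z, g)
    return loss_1, loss_2, loss_3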
S800: and obtaining target data, and calculating a time sequence score according to a preset function after the target data is processed by adopting the target model so as to obtain a target detection result.
In the foregoing step, specifically, a calculation module is provided to calculate the time series score from the target data, the first processed data, the second processed data, the prediction error and the maximum mean difference penalty obtained with the target model; this includes:
calculating a first reconstruction error from the target data and the first processed data, and calculating a second reconstruction error from the target data and the second processed data; and weighting and summing the first reconstruction error, the second reconstruction error, the prediction error and the maximum mean difference penalty with the respective preset weights to generate the score, as follows.
Illustratively, in the inference phase, the score of sequence data (i.e., the sequence score) is obtained by the following formula:
Score = a*||W-X1|| + b*||W-DE2(AE(X1))|| + c*MSE(y_label-y_pred) + d*MMD_loss(Z,G)
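An illustrative implementation of this scoring formula is sketched below; enc, de1, de2, predictor and mmd_loss are the trained modules and functions from the earlier sketches, and the coefficients a, b, c, d and the threshold are placeholders to be tuned in advance as described next.

@torch.no_grad()
def anomaly_score(w, y_true, g, a=1.0, b=1.0, c=1.0, d=1.0):
    z = enc(w)
    x1 = de1(z)
    return (a * F.l1_loss(x1, w)                         # ||W - X1||
            + b * F.l1_loss(de2(enc(x1)), w)             # ||W - DE2(AE(X1))||
            + c * F.mse_loss(predictor(z), y_true)       # MSE(y_label - y_pred)
            + d * mmd_loss(z, g))                        # MMD_loss(Z, G)

# the input sequence is flagged as containing abnormal data when
# anomaly_score(W, y_true, G) > threshold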
and (3) adjusting the four coefficients of a, b, c and d in advance to obtain the score condition of a certain input sequence, and when score is greater than a set threshold value, the group of data sequences is considered to contain abnormal data.
The abnormal time series data detection method provided by this embodiment, after training and testing, achieves an accuracy of 99.18% and a recall of 66.78% on the SWAT data set, and an accuracy of 99.80% and a recall of 14.3% on the WADI data set, which demonstrates the effectiveness of the method designed in the present application. The model proposed in the present application has a simple structure and fast convergence: it converges within about 30 minutes on the more than 400,000 records of SWAT and achieves an excellent abnormal time series data detection effect. In addition, the unsupervised network structure and the adversarial training method greatly improve the network performance and further advance the automation of control system monitoring.
Example two: in order to achieve the above object, the present invention further provides a computer device 9, as shown in FIG. 3; the computer device may be a smartphone, a tablet computer, a notebook computer, a desktop computer or the like capable of executing a program. The computer device of this embodiment at least includes, but is not limited to, a memory 91 and a processor 92 communicatively connected to each other by a device bus, as shown in FIG. 3. It should be noted that FIG. 3 only shows the computer device with these components, but it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.
In this embodiment, the memory 91 may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device. In other embodiments, the memory 91 may also be an external storage device of the computer device, such as a plug-in hard disk fitted to the computer device. In this embodiment, the memory 91 is generally used to store the operating system and the various application software installed in the computer device, such as the program code of the abnormal time series data detection method according to the first embodiment; training samples and model parameters such as the (first/second) processed data, the first reconstruction error, the second reconstruction error, the prediction error and the maximum mean difference penalty; the target detection results, and so on. Further, the memory 91 can also be used to temporarily store various types of data that have been output or are to be output.
Processor 92 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 92 is typically used to control the overall operation of the computer device. In this embodiment, the processor 92 is configured to run the program code stored in the memory 91 or to process data, for example to execute the abnormal time series data detection method of the first embodiment.
Example three: to achieve the above object, the present invention also provides a computer-readable storage medium, which includes a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store and the like, and on which a computer program is stored that implements corresponding functions when executed by the processor 92. The computer-readable storage medium of this embodiment is used to store a computer program which, when executed by the processor 92, implements the abnormal time series data detection method of the first embodiment.
It should be noted that the present invention has been described above with reference to preferred embodiments and is not limited to those embodiments; those skilled in the art may alter or modify the embodiments disclosed above into equivalent embodiments without departing from the scope of the present invention.

Claims (10)

1. An abnormal time series data detection method is characterized by comprising the following steps:
establishing an initial model, and training the initial model by adopting a training data set; the initial model comprises a coding module, a first decoding module, a second decoding module, a prediction module and a penalty module;
acquiring any training data in the training data set, and inputting the training data into the initial model;
in the initial model, a coding module is adopted to extract time sequence characteristics of the training data to obtain characteristic representation;
adopting a first decoding module to carry out data reconstruction on the feature representation to obtain first processed data;
the first processed data are subjected to data reconstruction sequentially through the encoding module and the second decoding module to obtain second processed data;
adopting a prediction module to perform time sequence prediction based on the feature representation to obtain a prediction error between a predicted value and an actual value of the training data;
calculating a maximum mean difference penalty based on the feature representation by using a penalty module;
performing iterative training, namely adjusting a loss function in the initial model until the training is finished to obtain a target model;
and obtaining target data, and calculating a time sequence score according to a preset function after the target data is processed by adopting the target model so as to obtain a target detection result.
2. The detection method according to claim 1, characterized in that:
calculating a second reconstruction error from the second processed data and the training data;
the training includes a first training phase during which adjustments are made to minimize the second reconstruction error and a second training phase during which adjustments are made to maximize the second reconstruction error.
3. The detection method according to claim 1, characterized in that: the adoption of the coding module to perform time sequence feature extraction on the training data to obtain feature representation comprises the following steps:
performing feature extraction on the training data in a time dimension by adopting a gated recurrent unit to obtain first output data; processing the first output data by adopting a multivariate Resnet module to obtain second output data containing the spatial characteristics corresponding to the training data;
and mapping the second output data to the spatial features corresponding to the training data through a full connection layer to obtain the feature representation.
4. The detection method according to claim 3, wherein the processing the first output data by using multivariate Resnet to obtain second output data containing spatial features corresponding to the training data comprises:
performing size reshaping based on the first output data to obtain feature maps of a plurality of sizes;
and processing the feature maps of all sizes through the three Resnet modules and the pooling layer respectively, and then performing dimension splicing to obtain the spatial features corresponding to the training data as second output data.
5. The detection method according to claim 1, wherein the performing data reconstruction on the feature representation by using a first decoding module to obtain first processed data comprises:
performing feature mapping on the feature representation by adopting a full connection layer to obtain first intermediate data;
inputting the first intermediate data into a multivariate Resnet module to reconstruct space characteristics to obtain second intermediate data;
and processing the second intermediate data by adopting a gated recurrent unit to obtain first processed data.
6. The detection method according to claim 1, characterized in that:
the second decoding module is consistent with the first decoding module in structure;
the second decoding module is connected in parallel with the first decoding module.
7. The detection method according to claim 1, wherein the employing the prediction module to perform time series prediction based on the feature representation to obtain a prediction error between a predicted value and an actual value of the training data comprises:
in the prediction module, the feature representation is processed by adopting two fully-connected layers connected in series, so that prediction data at time T+1 are obtained based on the features of T time steps, and the difference between the predicted value and the actual value at each time in the training data is obtained to generate a prediction error.
8. The method of claim 1, wherein said calculating, with said computing module, a training output based on training data, first processed data, second processed data, prediction error, and a maximum mean difference penalty comprises:
calculating a second reconstruction error according to the training data and the second processed data; calculating a first reconstruction error according to the training data and the first processed data;
weighting and summing the first reconstruction error, the second reconstruction error, the prediction error, and the maximum mean difference penalty by preset weights, respectively, to generate a training output.
9. A computer device, characterized by: the computer device includes:
a memory for storing executable program code; and
a processor for invoking said executable program code in said memory to execute steps comprising the abnormal time series data detection method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that:
the computer program when executed by a processor implements the steps of the abnormal time series data detection method of any one of claims 1 to 8.
CN202211654333.7A 2022-12-19 2022-12-19 Abnormal time sequence data detection method, equipment and medium Pending CN115982235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211654333.7A CN115982235A (en) 2022-12-19 2022-12-19 Abnormal time sequence data detection method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211654333.7A CN115982235A (en) 2022-12-19 2022-12-19 Abnormal time sequence data detection method, equipment and medium

Publications (1)

Publication Number Publication Date
CN115982235A true CN115982235A (en) 2023-04-18

Family

ID=85971714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211654333.7A Pending CN115982235A (en) 2022-12-19 2022-12-19 Abnormal time sequence data detection method, equipment and medium

Country Status (1)

Country Link
CN (1) CN115982235A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204846A (en) * 2023-05-06 2023-06-02 云南星晟电力技术有限公司 Method for rapidly positioning abnormal sensor data of power distribution network based on visible graph
CN116204846B (en) * 2023-05-06 2023-08-01 云南星晟电力技术有限公司 Method for rapidly positioning abnormal sensor data of power distribution network based on visible graph
CN116797826A (en) * 2023-06-09 2023-09-22 深圳市大数据研究院 Data identification method, model training method, device and medium
CN116633705A (en) * 2023-07-26 2023-08-22 山东省计算中心(国家超级计算济南中心) Industrial control system abnormality detection method and system based on composite automatic encoder
CN116633705B (en) * 2023-07-26 2023-10-13 山东省计算中心(国家超级计算济南中心) Industrial control system abnormality detection method and system based on composite automatic encoder
CN116996167A (en) * 2023-09-27 2023-11-03 北京航空航天大学 Low-complexity unknown continuous source-oriented two-stage anomaly detection method
CN116996167B (en) * 2023-09-27 2023-12-22 北京航空航天大学 Low-complexity unknown continuous source-oriented two-stage anomaly detection method

Similar Documents

Publication Publication Date Title
CN115982235A (en) Abnormal time sequence data detection method, equipment and medium
CN110991311B (en) Target detection method based on dense connection deep network
Yu et al. Auto-fas: Searching lightweight networks for face anti-spoofing
CN111352965B (en) Training method of sequence mining model, and processing method and equipment of sequence data
CN113449660B (en) Abnormal event detection method of space-time variation self-coding network based on self-attention enhancement
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
CN112948743B (en) Coal mine gas concentration deficiency value filling method based on space-time fusion
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN113449586A (en) Target detection method, target detection device, computer equipment and storage medium
CN113011322A (en) Detection model training method and detection method for specific abnormal behaviors of monitoring video
Shajihan et al. CNN based data anomaly detection using multi-channel imagery for structural health monitoring
CN117033039A (en) Fault detection method, device, computer equipment and storage medium
CN117676099B (en) Security early warning method and system based on Internet of things
CN115439708A (en) Image data processing method and device
CN117556331B (en) AI-enhancement-based air compressor maintenance decision method and system
CN114818817A (en) Weak fault recognition system and method for capacitive voltage transformer
KR102622895B1 (en) Method and system for determining abnormalities in air quality data using the ensemble structure of supervised and unsupervised learning models
CN110705631B (en) SVM-based bulk cargo ship equipment state detection method
CN117350635A (en) Building material warehouse-in and warehouse-out management system and method
CN116630751A (en) Trusted target detection method integrating information bottleneck and uncertainty perception
CN116030300A (en) Progressive domain self-adaptive recognition method for zero-sample SAR target recognition
KR102462966B1 (en) Performance improving method for device using yolo algorithm
CN111179343B (en) Target detection method, device, computer equipment and storage medium
CN114140246A (en) Model training method, fraud transaction identification method, device and computer equipment
CN115342817B (en) Method, system, equipment and medium for monitoring track tracking state of unmanned tracked vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination