CN115903741A

CN115903741A - Data anomaly detection method for industrial control system

Info

Publication number: CN115903741A
Application number: CN202211445673.9A
Authority: CN
Inventors: 瞿治国; 肖峰
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2022-11-18
Filing date: 2022-11-18
Publication date: 2023-04-04
Anticipated expiration: 2042-11-18
Also published as: CN115903741B

Abstract

The invention discloses an industrial control system data anomaly detection method based on BiGRU-VAE and Deep SVDD, which comprises the following steps: acquiring safety water treatment data and/or water distribution data to be detected, and preprocessing the data; inputting the preprocessed data into a trained BiGRU-VAE-based reconstruction model for reconstruction to obtain a reconstruction error; performing exponential weighted moving average (EMWA) smoothing on the reconstruction error to obtain a smoothed reconstruction error; inputting the reconstructed error after the smoothing treatment into a trained data anomaly detection model based on Deep SVDD; and determining a data anomaly detection result according to the output of the data anomaly detection model.

Description

Data anomaly detection method for industrial control system

Technical Field

The invention belongs to the technical field of industrial control systems, and mainly relates to a data anomaly detection method for an industrial control system based on BiGRU-VAE and Deep SVDD.

Background

Industrial control systems are mainly used for managing and supervising industrial processes in critical infrastructures. Various network attacks are usually performed on various attack points, including physical sensors and actuators, access points of network communication infrastructures, and the like, so that the capability of detecting various sudden abnormalities in Industrial Control Systems (ics for short) in real time is important for ensuring safety during operation and maintaining and repairing the system. Network security in ICSs is still an emerging field, and although experts can well define the required system behaviors of the ICSs, the reasons for the infinite abnormal event outbreaks can be various and are difficult to automatically identify.

Currently, the best technology for detecting anomalies in ICSs is anomaly detection based on machine learning and recent deep learning, which can realize various anomaly event detections in sensor networks through system operating condition monitoring for various intrusion detections.

However, when using the common neural network and Variational Encoder (VAE) model, the following problems often exist in this kind of method: 1) As the size of ICSs and their internal dependencies increase, the complexity often reaches a level where identifying abnormal system behavior is no longer trivial. For example, when multi-mode monitoring sensors are mixed with actuator status, it is common to involve hundreds of sensors and actuators, particularly as it relates to specific water treatment and water dispensing applications. Therefore, it is often difficult to develop an abnormality detection technique capable of coping with data from various heterogeneous sensors and actuators. 2) VAE models are not suitable for time series modeling. For example, in the past VAE-based anomaly detection method, the time series is regarded as a sliding window, and the time relationship between the sliding windows in the encoding process is often ignored, so that the time series modeling is invalid. 3) The use of VAE models typically requires setting thresholds for anomaly detection. VAE detects anomalies primarily by comparing the reconstruction results with the original input (i.e., reconstruction errors), and for a large number of different types of data, it is difficult to set a uniform threshold for reconstruction errors when significant fluctuations occur due to different usage patterns.

Disclosure of Invention

The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides an industrial control system data anomaly detection method based on BiGRU-VAE and Deep SVDD. A model capable of processing the time relation between the sliding windows and having strong robustness is arranged, so that the model has strong abnormality detection capability. The encoder of the VAE is used to learn the distribution of the training data to generate compressed values of the training data, and the decoder is used to reconstruct the compressed values to obtain a more robust model. Then, by using bidirectional GRU (Gate recovery Unit) as the encoder and decoder of VAE. In addition, the difference value between the reconstructed value and the actual value is put into a Deep Support Vector Data Description (Deep SVDD for short) for training, and then an optimal hypersphere classifier capable of detecting abnormal Data is obtained. The hyper-sphere is determined by the sphere center and the radius of the hyper-sphere to judge whether the sample point is abnormal in the feature space. An anomaly detection method for data of an industrial control system sensor is used based on a mixed model of BiGRU-VAE and Deep support vector data description (Deep SVDD). The model can perform anomaly detection on industrial control sensor data typified by SWaT (safe Water treatment data set) and WADI (Water distribution data set).

The present invention aims to solve the following specific problems:

(1) The ability to learn continuously is not strong. When the model has a good abnormal detection effect in the data of one industrial control system sensor, the model is used for directly testing the data of the other industrial control system sensor, and the effect of the model is reduced; the generalization ability of the model is not strong and overfitting is shown.

(2) Long-term dependencies of sequential time series data are tracked. To better capture the temporal correlation of the time series, the encoder and decoder of the VAE are designed as BiGRU. In contrast to LSTM, GRU saves hardware computational power and time cost when tracking long-term dependencies of sequential time-series data, and BiGRU otherwise processes sequences in both the positive and negative directions. This has the advantage that not only past data but also future information is taken into account

(3) Suppressing frequent error peaks. Normal behavior may also lead to sharp false peaks due to unpredictability of system behavior.

(4) Threshold adaptation problem for anomaly detection. And putting the smoothed reconstruction error into Deep SVDD for training. The Deep SVDD determines a threshold value by optimizing the center and the radius of the hyper-sphere, and improves the performance of anomaly detection.

The technical scheme is as follows: in order to solve the technical problems, the technical scheme adopted by the invention is as follows:

in a first aspect, a method for detecting data anomaly of an industrial control system based on BiGRU-VAE and Deep SVDD is provided, which comprises the following steps:

acquiring safe water treatment data and/or water distribution data to be detected,

preprocessing the safe water treatment data and/or the water distribution data to obtain preprocessed data;

inputting the preprocessed data into a trained BiGRU-VAE-based reconstruction model for reconstruction to obtain a reconstruction error;

performing exponential weighted moving average EMWA smoothing processing on the reconstruction error to obtain a smoothed reconstruction error;

inputting the smoothed reconstruction error into a trained data anomaly detection model based on Deep SVDD;

determining a data anomaly detection result according to the output of the data anomaly detection model;

the training method of the data anomaly detection model based on the BiGRU-VAE reconstruction model and the Deep SVDD comprises the following steps:

acquiring a safety water treatment data set and a water distribution data set which are subjected to pretreatment;

inputting the data set into a pre-constructed BiGRU-VAE-based reconstruction model for training, and obtaining a model parameter with the highest accuracy for a current task model to obtain a trained BiGRU-VAE-based reconstruction model and a reconstruction error output by the model;

performing exponential weighted moving average (EMWA) smoothing on the reconstruction error to obtain a smoothed reconstruction error;

inputting the reconstructed error after the smoothing treatment into a pre-constructed data anomaly detection model based on Deep SVDD for training, so as to determine that the threshold finds the center and the radius of the optimal hypersphere through Deep SVDD, and obtain the trained data based on Deep SVDD.

In some embodiments, the performing pre-processing comprises:

carrying out missing value processing on the data by using a linear interpolation method, and carrying out standardization processing to obtain normalized data;

and (3) segmenting each time sequence of the normalized data into a set G of segments by using a preset sliding window.

Further, in some embodiments, the missing value processing of the data by using the linear interpolation method includes:

calculating the slope b according to the data before and after the missing value:

wherein x is _p+1 ，x _p-1 Respectively represent missing values x _p Data before and after, t _p+1 ，t _p-1 Respectively represent missing values x _p Time steps before and after;

calculating a missing value x from the slope b _p ：

x _p ＝x _p-1 +b×(t _p -t _p-1 )

Wherein, t _p Indicates the missing value x _p Time step (c).

Further, in some embodiments, the normalization process results in normalized data, including:

where x denotes the original data, x _max Representing the maximum value of data, x _min Denotes the minimum value of data, x _norm The normalized data is represented.

Further, in some embodiments, segmenting each time series of the normalized data into a set G of segments using a preset sliding window includes:

based on a preset window size Ω, the step size of window sliding is fixed to 1, and a time series X = { X = is given ₁ ,x ₂ ,…,x _n In which x _m Is the ith time stampM-dimensional readings, containing information about m different channels, n representing the length of data in the time series, giving the state of the time series for t segments: x is the number of _t ＝{x _t-Ω+1 ,…,x _t After the sliding window partitions the time series, the time series X is divided into a set of segments G = { X = } _Ω ,…,x _t ,…,x _n }。

In some embodiments, the BiGRU-VAE based reconstruction model comprises:

VAE is a deep generative model aimed at learning the probability distribution p (x) in data x and generating new data x from sampled reconstructions of the data distribution

p(x)；

Firstly, an encoder converts data x into probability distribution of a potential variable z, then randomly samples the potential variable z, and then a decoder decodes the potential variable z into reconstructed output to fit x->Neural networks of z are called inference networks

Fitting>

Called the generating network p _θ (z|x)：

Wherein phi and theta are parameters for identifying the network and generating the network respectively;

representative of encoders, D _θ (. Represents a decoder;

in order to capture the time correlation and the dependency of data input data, a GRU network unit is introduced into a VAE network by utilizing the memory function of a GRU, so that proper features are extracted from a hidden layer and an input sequence is reconstructed;

parameterizing the encoder by a BiGRU with an activation function: the number of samples is n, the input number is d, the number of hidden units is h, and the small batch of input data X of a given time step t _t ∈R ^n×d Hidden state with previous layer forward time step

And hidden status of the next layer of inverted time step>

Forward candidate hidden status +calculating time step t>

And a candidate hidden state in reverse>

/>

Wherein

Is a weight parameter, is->

As deviation parameter, R _t To reset the gate, tanh (-) is the activation function;

updating the door Z according to the current time step _t To hidden state of forward time step

And a forward candidate concealment time step>

Hidden status of a reverse time step>

And a candidate hidden state in reverse>

To generate a sequence of hidden states in two directions: forward → and backward ←; two-sided final encoder hidden state interconnect to produce a vector

Generating, by the encoder, mean and variance μ and σ subject to a Gaussian distribution, μ and ∈ hiding state H from the final encoder using two fully connected layers with linear and Softplus activation, respectively _t The following results are obtained:

wherein

As weight parameter, b _μ ，b _σ Is inclined toA difference parameter, softplus is an activation function;

from an approximate posterior when reconstructing errors

Random sampling is carried out, and an epsilon-N (0,I) auxiliary noise variable, which represents an element multiplied by a product quantity, is taken by adopting a reparameterization technique.

Further, in some embodiments, a loss function is employed in the training of the BiGRU-VAE based reconstruction model

Wherein

Representing the reconstruction error, D _KL (q _φ (z|x)||p _θ (z) is the true distribution p _θ (z) and the selected distribution->

KL divergence in between, encouraging a distribution->

Approximate true distribution p in training _θ (z)；

To ensure that the desired lower limit of the KL divergence distribution is positive, a batch training BN is used to apply to the mean vector of the encoder output, where k is the dimension of the latent variable and b is the size of the batch of samples; mu.s _i,j 、σ _i,j Mean and variance, respectively;

the sample mean of the above samples was substituted with the expectation, resulting in a KL divergence:

wherein mu _i 、σ _i Mean and variance, respectively; var [ mu ] of _i ]Representing an unbiased estimate;

batch normalization is used for constraining the distribution of the mean values, so that the mean values are controlled in a reasonable range;

calculated by the posterior distribution mean:

is mu _i Average value after BN treatment; mu.s _Bi And σ _Bi Is represented by mu _i Mean and standard deviation of; by replacing mu in KL formula _i The following formula is obtained:

beta and gamma are parameters in batch normalization, controlling mu separately _i Variance and mean of the distribution; the beta and gamma parameters are reasonably controlled, and the KL divergence term always keeps a positive lower bound.

In some embodiments, performing exponentially weighted moving average EMWA smoothing on the reconstruction error to obtain a smoothed reconstruction error includes:

at time k, from the original reconstruction error d _k Obtaining the reconstructed error e after smoothing _k By controlling α (0)<α<1) Deciding the ability of the EWMA to track sudden changes in data:

e _k ＝αd _k +(1-α)e _k-1

wherein alpha represents a hyper-parameter, d _k Representing the original reconstruction error, e _k After the representation smoothing processThe reconstruction error of (1).

In some embodiments, the Deep SVDD based data anomaly detection model includes:

mapping original data from an input space to a high-dimensional feature space through a neural network, then finding a hypersphere containing as many normal samples as possible on the feature space, and discharging abnormal samples in the hypersphere as much as possible, when using the hypersphere to identify new samples, if the samples are in the boundary of the hypersphere, then considering the samples as normal samples, otherwise, the samples are abnormal samples:

from the input space

To the characteristic space->

Make->

Is provided with L hidden layers and weight set W = { W ¹ ,…,W ^L Is in which>

Means are>

The weight of the layer;

the target training sample is a reconstructed error after smoothing, i.e.

Expressed as e = { e = { e ₁ ,e ₂ ,e _t ,…,e _n Where n is the number of samples;

the goal of Deep SVDD is to learn the network parameters W together and to expect a certain fault tolerance while minimizing the radius of the hyper-sphere; the boundary is specified by a characteristic space center C and a distance radius R from the center to a data point in the characteristic space; objective function L _{deep svdd} Comprises the following steps:

wherein the penalty factor gamma epsilon (0,1)]Controlling whether the sphere volume and balance violate the boundary, λ being a hyper-parameter of the network parameter W, | · | _F Is the Frobenius norm;

first item

Representing the volume of the maximum decreasing hypersphere, the second term

Is a penalty term for a point outside the hypersphere after passing through the network, and the third term is greater than or equal to>

Represents a weight decay regularizer.

In a second aspect, the invention provides an industrial control system data anomaly detection device based on BiGRU-VAE and Deep SVDD, which comprises a processor and a storage medium;

the storage medium is used for storing instructions;

the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.

In a third aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

Has the advantages that: compared with the prior art, the method has the following advantages:

1) The invention is suitable for the sensor of the industrial control system and the abnormal detection of network data. Because the current industrial control system needs a large amount of material resources and financial resources for collecting and processing the sensor and network abnormity; according to the method, the abnormality detection method based on the hybrid model of the BiGRU-VAE and the support vector data description (Deep SVDD) can achieve a good detection effect on the data of the industrial control sensor, and solves the problem of the current abnormality detection capability for data from various sensors and actuators and the time relation between sliding windows, so that the method can be suitable for the abnormality detection tasks of different industrial control systems.

2) The invention enables the model to have the threshold self-adaption capability of anomaly detection. Nowadays, most deep learning models achieve anomaly detection capability through a fixed threshold, and the method needs actual experience or multiple parameter adjustment, which often increases the cost consumption. The invention adaptively determines the threshold value of the anomaly detection by putting the smooth reconstruction error sequence into Deep SVDD for training. The smoothed reconstruction error can reduce false positive and false negative in detection to a certain extent, and the self-adaptive threshold has more flexibility and effect.

Drawings

FIG. 1 is a flow chart of an industrial control system data anomaly detection method based on BiGRU-VAE and Deep SVDD according to an embodiment of the present invention;

FIG. 2 is a diagram of a BiGRU-VAE-based reconstruction model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the exemplary embodiment of the present invention.

Detailed Description

In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further described with the specific embodiments.

In the description of the present invention, the meaning of a plurality is one or more, the meaning of a plurality is two or more, and the above, below, exceeding, etc. are understood as excluding the present numbers, and the above, below, within, etc. are understood as including the present numbers. If the first and second are described for the purpose of distinguishing technical features, they are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.

In the description of the present invention, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Example 1

A data anomaly detection method for an industrial control system based on BiGRU-VAE and Deep SVDD comprises the following steps:

The data sets used in the present invention are mainly derived from SWaT (safe water treatment data set) and WADI (water distribution data set) provided by network security center iTrust, university of singapore; the SWaT data set is mainly data of sensors, actuators and network data in industrial water treatment plants, and the WADI water distribution data set is mainly a natural extension of SWaT.

In some embodiments, a method for detecting data anomaly of an industrial control system based on BiGRU-VAE and Deep SVDD comprises the following steps:

step 1. Data preprocessing

1.1 missing value processing: the original safe water treatment and water distribution data are collected by a sensor, a small amount of missing reports or noise data are deleted, so that the conditions of losing and breaking points of values in the data are caused, the training of a model is facilitated by supplementing proper data, and because the safe water treatment and water distribution missing values are few, a linear interpolation method is selected in the aspects of performance and simplicity, firstly, the missing values x are used for solving the problem that the missing values x are not used for the conventional safe water treatment and water distribution _p The calculated slope of the previous and subsequent data is shown in equation 1:

wherein x is _p+1 ，x _p-1 Respectively expressed as data before and after the missing value, t _p+1 ，t _p-1 The time steps before and after the missing value are respectively expressed, and b is the slope.

Calculating the missing value x from the slope _p As shown in equation 2:

x _p ＝x _p-1 +b×(t _p -t _p-1 ) (2)

1.2 data normalization is mainly to eliminate the dimensional influence between the indices. After data standardization, all indexes are in the same order of magnitude, and the method is suitable for comprehensive comparison and evaluation. In addition, it can speed up the speed of solving the optimal solution by gradient descent. Because safe water treatment and water distribution data are stable, the maximum and minimum values which do not have extreme exist, normalization is adopted here as shown in formula 3, and the data are compressed in the range of 0,1.

Where x is represented as raw data, x _max Expressed as the maximum value of the data, x _min Expressed as the minimum value of the data, x _norm Expressed as normalized data.

1.3 set the window size to Ω (set to 12), the step size of the window sliding is fixed to 1, given a time series X = { X = { n } ₁ ,x ₂ ,…,x _n In which x _m Is the m-dimensional reading at the ith timestamp, which contains information about m different channels, giving a time series of states: x is the number of _t ＝{x _t-Ω+1 ,…,x _t After the sliding window partitions the time series, the time series is divided into a set of segments G = { x = } _Ω ,…,x _t ,…,x _n And each adjacent L section is organized into a sequence and sent to the model for training.

1.4 partitioning the data set. In the safe water treatment and water distribution data sets, the proportion of the training set and the test set in the whole data set is uniformly set to be 8:2.

and 2, step: building a reconstruction model based on BiGRU-VAE

2.1BiGRU-VAE reconstruction model As shown in FIG. 2, VAE is a deep generation model, which aims to learn the probability distribution p (x) in data x and sample the data distribution to reconstruct to generate new data x

p (x). Firstly, the encoder converts x into the probability distribution of the latent variable z, then the latent variable z is randomly sampled and decoded into the reconstructed output by the decoder, and the x->Neural networks of z are called reasoningNetwork->

As shown in equation 4. Fitting>

Is called a generating network p _θ (z | x), as shown in equation 5:

wherein

θ is expressed as a parameter for identifying the network and generating the network, respectively. />

Represented as an encoder, D _θ (. Cndot.) stands for decoder.

In order to capture the time dependence and dependency of data input data, the invention introduces a GRU network unit into a VAE network by utilizing the memory function of a GRU, and realizes the extraction of proper characteristics in a hidden layer and the reconstruction of an input sequence.

Parameterizing an encoder by a BiGRU with an activation function, specifically, the number of samples is n, the number of input is d, the number of hidden units is h, and a small batch of input X with a given time step t is used _t ∈R ^n×d Hidden state with previous layer forward time step

And hidden status of the next layer of inverted time step>

Forward candidates for computing time step tHidden state>

And a candidate hidden state in reverse>

As shown in equations 6 and 7:

wherein

Expressed as a weight parameter, <' > is selected>

Expressed as deviation parameter, R _t Denoted reset gate, tanh (-) as activation function;

finally updating the gate Z according to the current time step _t To hide the forward time step

And a forward candidate concealment time step>

Hidden status in reverse time step>

And a candidate hidden state in reverse>

To generate a hidden state sequence in two directions, forward → and backward ←. Two-sided final encoder hidden state interconnect to produce a vector

As shown in equations 8 and 9:

generating, by the encoder, mean and variance μ and σ subject to a Gaussian distribution, μ and ∈ hiding state H from the final encoder using two fully connected layers with linear and Softplus activation, respectively _t Is obtained by the following steps. As shown in equations 10 and 11:

wherein

Expressed as a weight parameter, b _μ ，b _σ Expressed as a bias parameter and Softplus as an activation function.

In reconstruction errors, an approximate posterior is needed

Random sampling is carried out, because the sampling process is non-conductive and cannot directly participate in gradient propagation, a re-parameterization technique is adopted, and an epsilon-N (0,I) auxiliary noise variable, which represents an element by a product quantity, is taken. Thus, the random sampling z does not participate in the gradient descent, and only the result of the sampling needs to be updated. The heavy parameter trick is shown in figure 3.

2.2 main hyper-parameter settings. The batch size is set to 64, the number of iterations is 200, the optimizer selects the SGD, the learning rate is set to 0.0005, the GRU unit size is 128, the latent variable dimension is 16, the sliding window length is 12, the F1 score reaches an optimal value, the alarm delay penalty factor is 0.25, and the Gaussian kernel parameter is 0.9.

The loss function of the model is constructed in two parts as shown in equation 12, with the first term representing the reconstruction error. The second term is the true distribution p _θ (z) distribution of choices

KL divergence in between, encouraging a distribution->

Approximate true distribution p in training _θ (z)。

2.3 set the proportion of the validation set to the training set. In order to prevent the phenomenon of overfitting in the training process, the training set is divided into a training set and a verification set again, and the proportion is 8:2.

2.4BN (Batch Normalization) is applied to the mean vector of the encoder output, ensuring that the desired lower bound of the KL divergence (KL vanizing) distribution is positive. In the present invention, batch training is often used, and the training process is shown in equation 13:

k is the dimension of the latent variable and b is the size of the batch sample.

Assuming that the sample mean may be considered to be approximately equal to the overall expectation, the sample mean of the sample may be substituted with the expectation, thus yielding the KL divergence as shown in equation 14:

batch normalization can be used to constrain the distribution of the mean values so that the mean values are controlled to a reasonable range. Calculated by the posterior distribution mean, as shown in equation 15:

is mu _i Average value after BN treatment; mu.s _Bi And σ _Bi Is represented by mu _i Mean and standard deviation of; by replacing mu in KL formula _i To obtain the formula, as shown in equation 16:

beta and gamma are parameters in batch normalization, and mu can be controlled separately _i Variance and mean of distribution. The KL divergence term remains a positive lower bound as long as the beta and gamma parameters are reasonably controlled.

And 2.5, inputting the training set of the safe water treatment data and the water distribution data into the constructed reconstruction model for training to obtain the model parameters with the highest accuracy for the current task model.

2.6 after the training is finished, storing the model parameters with the highest accuracy and the accuracy;

2.7 smoothing the difference sequence by using Exponential Weighted Moving Average (EWMA) to suppress the error peak value frequently appearing, and according to the original difference sequence d at time k _k Obtaining a smoothed sequence e _k By controlling α (0)<α<1) The ability of the EWMA to track sudden changes in data is determined, as shown in equation 17:

e _k ＝αd _k +(1-α)e _k-1 (17)

wherein alpha is represented as a hyperparameter and d _k Is expressed asFirst difference sequence, e _k Expressed as a smooth sequence

And step 3: building an anomaly detection module based on SVDD

3.1 to distinguish between normal and abnormal samples, the basic idea of Deep support vector data description Deep SVDD is to map raw data from an input space onto a high-dimensional feature space through a neural network, then find a hypersphere containing as many normal samples as possible on the feature space, while abnormal samples are excluded as much as possible in the hypersphere, when a new sample is identified using the hypersphere, if the sample is within the boundary of the hypersphere, it is considered as a normal sample, otherwise it is an abnormal sample. In particular from the input space

To the characteristic space->

Make->

Is a layer having L hidden layers and a weight set W = { W = { W = } ¹ ,…,W ^L }, in which +>

Means->

The weight of the layer. In the present invention, the target training sample is a smoothed reconstruction error, i.e. < >>

Expressed as e = { e = { e ₁ ,e ₂ ,e _t ,…,e _n And n is the number of samples. The goal of Deep SVDD is to learn the network parameters W together and to minimize the radius of the hyper-sphere while having a certain fault tolerance. The boundary is specified by the feature space center C and the radius R of the distance between the center to the data points in the feature space. The objective function is shown in equation 18:

the first term represents the maximum reduction of the volume of the hyper-sphere, the second term is a penalty term for points outside the hyper-sphere after passing through the network, and the third term represents the weight decay regularizer.

3.2 after the training is finished, saving the model parameter W with the highest accuracy, the center C of the sphere and the radius R.

And 4, step 4: the model at this time can input a test set of safe water treatment and a test set of water distribution to realize anomaly detection.

Example 2

In a second aspect, the present embodiment provides an apparatus for detecting data anomaly of an industrial control system based on BiGRU-VAE and Deep SVDD, including a processor and a storage medium;

the storage medium is used for storing instructions;

the processor is configured to operate in accordance with the instructions to perform the steps of the method according to embodiment 1.

Example 3

In a third aspect, the present embodiment provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of embodiment 1.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims

1. A data anomaly detection method for an industrial control system based on BiGRU-VAE and Deep SVDD is characterized by comprising the following steps:

inputting the reconstructed error after the smoothing treatment into a pre-constructed data anomaly detection model based on Deep SVDD for training so as to determine the threshold value and find the center and the radius of the optimal hypersphere through the Deep SVDD to obtain the trained data based on the Deep SVDD.

2. The method for detecting data anomaly of industrial control system based on BiGRU-VAE and Deep SVDD as claimed in claim 1, wherein said preprocessing comprises:

3. The method for detecting data abnormality of the industrial control system based on the BiGRU-VAE and Deep SVDD as claimed in claim 2, wherein the processing of missing values of data by using the linear interpolation method comprises:

wherein x is _p+1 ，x _p-1 Respectively represent missing values x _p Data before and after, t _p+1 ，t _p-1 Respectively represent missing values x _p The previous and subsequent time steps;

calculating a missing value x from the slope b _p ：

x _p ＝x _p-1 +b×(t _p -t _p-1 )

Wherein, t _p Indicates the missing value x _p Time step (c).

4. The method for detecting data abnormality of the industrial control system based on the BiGRU-VAE and Deep SVDD as claimed in claim 2, wherein the normalizing process to obtain normalized data comprises:

5. The method for detecting data anomaly of industrial control system based on BiGRU-VAE and Deep SVDD according to claim 2, wherein the step of segmenting each time series of the normalized data into a set G of segments by using a preset sliding window comprises the following steps:

based on a preset window size Ω, the step size of window sliding is fixed to 1, and a time series X = { X = is given ₁ ,x ₂ ,…,x _n In which x is _m Is the m-dimensional reading at the ith timestamp, which contains information about m different channels, n represents the length of data in the time series, and the state given for the t time series is: x is the number of _t ＝{x _t-Ω+1 ,…,x _t After the sliding window partitions the time series, the time series X is divided into a set of segments G = { X = } _Ω ,…,x _t ,…,x _n }。

6. The method for detecting data anomaly of the industrial control system based on the BiGRU-VAE and Deep SVDD according to claim 1, wherein the BiGRU-VAE reconstruction model comprises:

VAE is a deep generative model, which aims to learn the probability distribution p (x) in data x and to generate new data by sampling and reconstructing from the data distribution

p(x)；

Fitting>

Is called a generating network p _θ (z|x)：

representative of encoders, D _θ (. Represents a decoder;

parameterizing an encoder by a BiGRU with an activation function: the number of samples is n, the input number is d, the number of hidden units is h, and the small batch of input data X of a given time step t _t ∈R ^n×d Hidden state with previous layer forward time step

And hidden status of the next layer of inverted time step>

Positive candidate hidden status ^ based on the calculation of time step t>

And a candidate hidden state in reverse>

Wherein

Is a weight parameter, is->

As a deviation parameter, R _t To reset the gate, tanh (-) is the activation function;

And forward candidate concealment time step

Hidden status of a reverse time step>

And a candidate hidden state in reverse>

To generate a sequence of hidden states in two directions: forward → and backward ←; the two sides of the final encoder hidden state are connected to each other to produce a vector @>

/>

By means of an encoderObeying mean and variance μ and σ from the Gaussian distribution, μ and ∈ concealment of state H from the final encoder using two fully-connected layers with linear and Softplus activation, respectively _t The following results are obtained:

wherein

As a weight parameter, b _μ ，b _σ For the deviation parameter, softplus is the activation function;

in reconstructing errors, from an approximate posterior

Random sampling is carried out, and an auxiliary noise variable of epsilon-N (0,I) is taken by adopting a reparameterization technique, which represents that an element multiplies a product quantity.

7. The method for detecting data anomaly of industrial control system based on BiGRU-VAE and Deep SVDD according to claim 1, wherein a loss function is adopted during training based on a BiGRU-VAE reconstruction model

Wherein

KL divergence in between, encouraging profiles>

Approximate true distribution p in training _θ (z)；

calculated by the posterior distribution mean:

/>

is mu _i Average value after BN treatment; mu.s _Bi And σ _Bi Is represented by mu _i Mean and standard deviation of; by replacing mu in KL formula _i To obtain the following formula:

8. The method for detecting data anomaly of industrial control system based on BiGRU-VAE and Deep SVDD as claimed in claim 1, wherein performing EMWA smoothing on the reconstructed error by exponential weighted moving average to obtain the smoothed reconstructed error comprises:

at time k, from the original reconstruction error d _k Obtaining the reconstructed error e after smoothing _k By controlling α (0)<α<1) Ability to determine the abrupt change of EWMA tracking data:

e _k ＝αd _k +(1-α)e _k-1

wherein alpha represents a hyper-parameter, d _k Representing the original reconstruction error, e _k Representing the reconstruction error after the smoothing process.

9. The method for detecting data abnormality of the industrial control system based on the BiGRU-VAE and Deep SVDD according to claim 1, wherein the data abnormality detection model based on Deep SVDD comprises:

from the input space

To the characteristic space->

Make->

M → N is a set of L hidden layers and weight W = { W = { (W) ¹ ,…,W ^L Is in which>

Means are>

The weight of the layer;

the target training sample is a reconstructed error after smoothing, i.e.

the goal of Deep SVDD is to learn the network parameters W together and to expect a certain fault tolerance while minimizing the radius of the hyper-sphere; the boundary is specified by a feature space center C and a distance radius R from the center to a data point in the feature space; objective function L _{deep svdd} Comprises the following steps:

wherein the penalty factor gamma epsilon (0,1)]Controlling whether the sphere volume and balance violate the boundary, λ being the over-parameter of the network parameter W, | · |) _F Is the Frobenius norm;

first item

The representation represents a maximum reduction of the hyper-sphereVolume of body, second term

Represents a weight decay regularizer.

10. A data anomaly detection device of an industrial control system based on BiGRU-VAE and Deep SVDD is characterized by comprising a processor and a storage medium;

the storage medium is used for storing instructions;

the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 9.