CN111275113B - Skew time series anomaly detection method based on cost-sensitive hybrid network - Google Patents
- Publication number: CN111275113B (application CN202010065816.8A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F18/2433: Single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a skewed time-series anomaly detection method based on a cost-sensitive hybrid network. A cost-sensitive hybrid network model consisting of a deep convolutional neural network, a gated recursive network and a cost-sensitive loss function is first established and trained: local features of the time series are learned by the deep convolutional neural network, sequence features of the time series are learned by the gated recursive network, and the two sets of features are then combined for classification. During model training, the cost-sensitive loss function measures the similarity between the output result and the true value, the parameters of the network model are adjusted by the back-propagation algorithm, and different penalty factors are applied to samples of differently sized classes to penalize misdetections by the network model. The method is simple, efficient, accurate and robust, and achieves high detection accuracy on both skewed and non-skewed time-series data sets.
Description
Technical Field
The invention belongs to the technical field of time-series anomaly detection, and relates to a skewed time-series anomaly detection method based on a cost-sensitive hybrid network.
Background
Skewed time-series data refers to data sets in which the number of samples differs greatly between classes. In practical applications, most of the time-series data obtained by engineering measurement lie within a normal range and contain only a very small number of abnormal values, which makes them typical skewed time-series data sets. In a binary classification problem, an ordinary classifier is biased toward the normal class, so its false detection rate on the abnormal class is very high. In practice, however, it is precisely the minority class that matters, for example fault detection in spacecraft, disease diagnosis in medicine, and credit-card fraud detection in finance.
Time-series classification methods based on deep learning operate on the whole series and handle the feature extraction stage and the classification stage jointly. The literature [Y. Zheng, Q. Liu, E. Chen, et al., Exploiting multi-channels deep convolutional neural networks for multivariate time series classification [J], Frontiers of Computer Science, 2016, 10(1): 96-112] proposes classifying univariate time series with a multi-channel convolutional neural network model (MC-CNN). MC-CNN learns time-series features in three separate channels, merges the features learned by the three channels, and finally feeds the merged features to a Softmax layer for classification. The MC-CNN model outperforms traditional algorithms, but it was only evaluated on two UCR benchmark data sets, which is insufficient to demonstrate superior performance. The literature [Z. Cui, W. Chen, Y. Chen, Multi-scale convolutional neural networks for time series classification [J], arXiv preprint arXiv:1603.06995, 2016] proposes MCNN, which sets up multiple branch transformation layers, preprocesses the data in the time domain and the frequency domain respectively, and builds multiple local convolution layers that automatically extract features of different sizes and frequencies, yielding outstanding feature representations. MCNN was evaluated on 44 UCR benchmark data sets and achieved better experimental results on 10 of them, but it requires extensive preprocessing and hyper-parameter settings. Because the MC-CNN and MCNN models need heavy preprocessing before training, the learning rate must be set manually, and a fully connected layer is used to flatten features before the network output layer, the number of parameters the network must learn grows markedly. The network models proposed in the literature [Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, Neural Networks (IJCNN), 2017 International Joint Conference on, IEEE, 2017: 1578-1585] and [F. Karim, S. Majumdar, H. Darabi, et al., LSTM fully convolutional networks for time series classification [J], IEEE Access, 2018, 6: 1662-1669] all replace the fully connected layer before the output layer with a global average pooling layer, reducing network parameters, and train with an adaptive optimizer, avoiding manual learning-rate settings. The former proposes benchmark neural network models in which fully convolutional networks (FCN) and residual networks (ResNet) achieve true end-to-end learning without cumbersome preprocessing, and shows better performance on 18 of the 44 UCR data sets in its experiments.
To further improve network performance, the literature [F. Karim, S. Majumdar, H. Darabi, et al., LSTM fully convolutional networks for time series classification [J], IEEE Access, 2018, 6: 1662-1669] proposes two combined network models that learn time-series features jointly: LSTM-FCN, which combines a long short-term memory network with a fully convolutional network, and ALSTM-FCN, which adds an attention mechanism on top of LSTM-FCN. The ALSTM-FCN model outperforms existing methods on 51 of 85 UCR data sets. However, ALSTM-FCN must learn more parameters in the training phase, which clearly reduces learning efficiency. The prior literature also applies long short-term memory networks to the classification of medical and industrial data, but the improvement in detection accuracy is not significant.
Most existing algorithms target the situation where the minority class is comparable in scale to the majority class. On a skewed time-series data set, i.e. one in which the sample counts of the minority and majority classes differ severely, such algorithms learn the features of minority-class samples insufficiently, so classification accuracy drops sharply. Existing remedies are data-level sampling methods and algorithm-level methods. Data-level sampling methods preprocess the data set, either undersampling the majority class or oversampling the minority class. The former balances the data set by randomly deleting majority-class samples, at the cost of losing valuable information. The latter randomly generates samples that did not originally exist to adjust the data distribution, which usually causes overfitting in the final result because the original structure of the time-series data is changed. The algorithm-level method is threshold moving, i.e. adjusting the decision threshold of the classifier experimentally or by hand, and the cost of finding a suitable threshold is very high. For severely skewed data sets, existing processing methods therefore have many problems and seriously degrade the detection performance of the classifier, which in turn hampers the development of data analysis technology in industry, finance, medicine, the military and other fields.
Disclosure of Invention
The invention aims to provide a skewed time-series anomaly detection method based on a cost-sensitive hybrid network, which solves the prior-art problems that detection accuracy on minority-class samples in a skewed time-series data set is low, and that insufficient feature learning of minority-class samples by existing algorithms severely reduces classification accuracy.
The technical scheme adopted by the invention is a skewed time-series anomaly detection method based on a cost-sensitive hybrid network. First, a cost-sensitive hybrid network model consisting of a deep convolutional neural network (DCNN), a gated recursive network (GRU) and a cost-sensitive loss function is established and trained. Local features of the time series are learned by the DCNN and sequence features by the GRU; the features are then combined and classified by a Softmax classifier. During model training, the cost-sensitive loss function measures the similarity between the output result and the true value, the parameters of the network model are then adjusted by the back-propagation algorithm, and different penalty factors are applied to the skewed classes to penalize misdetections by the network model.
The invention is also characterized in that:
the method specifically comprises the following steps:
Step 1, integrating a deep convolutional neural network DCNN and a gated recursive network GRU containing 128 cell units, introducing a cost-sensitive loss function, and constructing the cost-sensitive hybrid network model CSHN;
Step 1.1, learning local features of the time series with a deep convolutional neural network DCNN composed of three convolutional layers, each comprising a convolution operation and a batch normalization operation; a global average pooling layer is introduced at the output to reduce the feature dimensionality;
Step 1.2, learning sequence features of the time series with the gated recursive network GRU, which is composed of a reset gate p_s and an update gate q_s. X denotes a time-series data sample, g_s denotes the output information at time s, and g̃_s denotes the hidden state at time s; the inputs to the memory unit at time s are g_{s-1} and X. The reset gate p_s controls how much of the previous output g_{s-1} flows into the current hidden state g̃_s; it is mapped into [0,1] by activation function one, while the hidden state g̃_s is mapped into [-1,1] by activation function two. The mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

wherein K_p denotes the weight matrix of the reset gate, [g_{s-1}, X] denotes the concatenation of the two input vectors g_{s-1} and X into one long vector, σ is activation function one, and K_g̃ denotes the weights for computing the hidden state;
The update gate q_s determines how much of the output information g_{s-1} of time s-1 is carried into the output information g_s of time s. The update gate q_s takes values in [0,1]; the larger its value, the less of the previous output information g_{s-1} is carried into the current output information g_s:

q_s = σ(K_q · [g_{s-1}, X_s])    (8)
g_s = (1 - q_s) ⊙ g_{s-1} + q_s ⊙ g̃_s    (9)

wherein K_q denotes the weight matrix of the update gate and [g_{s-1}, X_s] denotes the concatenation of the two input vectors g_{s-1} and X_s into one long vector, σ again being activation function one;
Step 1.3, in the training process of the cost-sensitive hybrid network model, measuring the similarity between the output result and the true value with a cost-sensitive loss function, whose expression is as follows:

f(K, b) = -(1/N) Σ_{j=1}^{N} [ η · l_j · log σ_{K,b}(X_j) + ν · (1 - l_j) · log(1 - σ_{K,b}(X_j)) ]    (11)

wherein l_j denotes the true label of the j-th training sample (taking l_j = 1 for the minority class), X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameters, b denotes the bias, and N denotes the total number of samples. η and ν denote the penalty factors applied when minority-class and majority-class samples are misclassified, respectively: when a minority-class sample is misdetected, the loss is multiplied by the larger penalty factor η, which amplifies the total loss; when a majority-class sample is misdetected, the loss is multiplied by the smaller penalty factor ν. η and ν are computed as follows:

η = N / (n_classes · n_abnormal_total), ν = N / (n_classes · n_normal_total)

where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes; in the present invention n_classes = 2;
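As an illustration only, the following minimal Python sketch computes the two penalty factors from the class counts according to the formulas above; the function name and the example counts are assumptions, not part of the patent.

```python
# Sketch: computing the penalty factors eta and nu from the class counts,
# following eta = N / (n_classes * n_abnormal_total) and
# nu = N / (n_classes * n_normal_total). Names are illustrative.

def penalty_factors(n_normal_total, n_abnormal_total, n_classes=2):
    """Return (eta, nu): eta weighs minority (abnormal) errors, nu majority ones."""
    n = n_normal_total + n_abnormal_total      # total number of samples N
    eta = n / (n_classes * n_abnormal_total)   # large when the abnormal class is rare
    nu = n / (n_classes * n_normal_total)      # small when the normal class dominates
    return eta, nu

# Example: 9500 normal vs 500 abnormal samples
eta, nu = penalty_factors(9500, 500)           # eta = 10.0, nu ≈ 0.526
```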
Step 2, the skewed time-series data anomaly detection algorithm based on the cost-sensitive hybrid network model:
The algorithm is divided into three stages: the first stage is the data preprocessing stage; the second stage is the time-series feature learning stage, mainly comprising local feature learning of the time series by the deep convolutional neural network DCNN of step 1 and sequence feature learning by the gated recursive network GRU of step 1; the third stage is the anomaly detection stage;
Step 2.1, data preprocessing, mainly comprising a normalization operation and a time-slicing operation;
Step 2.2, feature learning of the time series: 80% of the data in the time-series data set, L_train_set, is input as training samples into the cost-sensitive hybrid network model constructed in step 1 to learn the features of the time series; part of the training samples is used for cross-validation, and model parameters are updated by the back-propagation algorithm throughout the training and learning process. The specific feature-learning process comprises: local feature learning of the time series by the deep convolutional neural network DCNN of step 1, sequence feature learning by the gated recursive network GRU of step 1, classification with the Softmax classifier to obtain the probability value output by the cost-sensitive hybrid network model, and parameter updating by measuring the similarity between the predicted value and the true value with the cost-sensitive loss function;
Step 2.3, anomaly detection stage
The test data are classified with the cost-sensitive hybrid network model trained in step 2.2; the remaining 20% of the data in the time-series data set, L_test_set, is used as test samples. Let φ(L_r; K, b) denote the cost-sensitive hybrid network model, with L_r ∈ L_test_set; the mathematical expression is:

P_nclass(L_r) = φ(L_r; K, b), l_r_label = arg max_{nclass} P_nclass(L_r)

wherein P_nclass(L_r) is the predicted probability value of φ(L_r; K, b), l_r_label is the predicted label of the sample, and K and b are the parameters obtained in the learning process.
Step 1.1 specifically comprises the following steps:
step 1.1.1, convolution operation
Define K_{u,v}^d as the convolution kernel between the u-th channel in layer d and the v-th channel in layer d-1, and x_v^{d-1} as the output value of the v-th channel of the sample in layer d-1. The local features of the time series are learned through the convolution operation:

x_u^d = Σ_{v=1}^{V} K_{u,v}^d ⊛ x_v^{d-1} + b_u^d    (1)

wherein x_u^d denotes the output value of the u-th channel of layer d, b_u^d denotes the bias of the u-th channel of layer d, ⊛ denotes the convolution operation, and V denotes the number of convolution kernels in the previous layer;
Step 1.1.2, batch normalization operation
For an input batch of time-series samples X = {x_1, x_2, …, x_z}, the batch normalization operation is expressed as follows:

x̂_i = (x_i - μ_X) / sqrt(σ_X² + τ)    (2)
ŷ_i = γ · x̂_i + β    (3)

wherein x̂_i is the standard normalized value, μ_X and σ_X² are the mean and variance of the input batch, τ is a constant used to ensure that the denominator is greater than 0, γ denotes the data scale change, β denotes the data offset, and ŷ_i denotes the value after the batch normalization operation;
Step 1.1.3, global average pooling layer
The global average pooling layer performs an average pooling operation on the feature vectors obtained from the preceding convolutional layer:

a_u = K_GAP ⊛ X_u    (4)
A = {a_1, a_2, …, a_U}    (5)

wherein X_u denotes the feature vector of the u-th channel after the last convolutional layer, K_GAP denotes the global average pooling matrix, U denotes the dimension of the output feature vector, and A denotes the final output vector, formed by combining the output values a_u of all channels.
In step 1.2, activation function one is the Sigmoid activation function and activation function two is the tanh activation function;
The reset gate p_s is mapped into [0,1] by the Sigmoid activation function, and the hidden state g̃_s is mapped into [-1,1] by the tanh activation function; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

wherein K_p denotes the weight matrix of the reset gate, [g_{s-1}, X] denotes the concatenation of the two input vectors g_{s-1} and X into one long vector, σ is the Sigmoid activation function, and K_g̃ denotes the weights for computing the hidden state.
The specific steps of step 2.1 are as follows:
Step 2.1.1, data normalization processing
X{t_m(x_m, l_m)} (m = 1, 2, …, M) denotes a time-series data set, where t_m(x_m, l_m) denotes a time-series sample, x_m denotes the signal value of the m-th sample, l_m denotes the label of the m-th sample, l_m being 0 or 1, and M denotes the total number of samples; each signal value is normalized so that all samples lie on a common scale, giving the normalized series X̃;
Step 2.1.2, time slicing
A sliding window is used to divide the long time-series data X{t_m(x_m, l_m)} (m = 1, 2, …, M) into short overlapping segments: a window function window(·) of length w is moved with step length h, dividing the data X̃ normalized in step 2.1.1 into R segments L_r, each of length w. The expression is as follows:

L_r = window(X̃, w, h), r = 1, 2, …, R, R = ⌊(M - w) / h⌋ + 1

wherein L_r denotes the r-th segment, w is set to half the period of the time-series data, R is the total number of segments, and M denotes the total number of samples.
The specific steps of step 2.2 are as follows:
Step 2.2.1, deep convolutional neural network feature learning: local feature learning of the time series is performed with the deep convolutional neural network DCNN of step 1. The hidden part of the convolutional network consists of three convolutional layers, each comprising three processing operations (convolution, batch normalization and LeakyReLU activation). The specific flow is as follows.
Conv1 layer: assume that Conv1 layer has e 1 Size of k 1 Of the convolution kernelSuppose to take e 1 =32,k 1 =8, for sample L r (L r ∈L train_set ) And the convolution kernel->Performing convolution operation to obtain e1 characteristic vectors with the length of w-7>Then, the final output of the Conv1 layer is obtained through the BN operation and the activation function LeakyReLUThis process is expressed as follows:
conv2 layer: assume that Conv2 layer has e 2 Size of k 2 Convolution kernel ofSuppose to take e 2 =64,k 2 =5, the characteristic vector obtained by Conv1 layer is quantified =>And convolution kernel->Performing convolution operation to generate e 2 Characteristic vector with length w = 11->Then, the final output ^ of Conv2 layer is obtained through the BN operation and the activation function LeakyReLU>This process is expressed as follows:
conv3 layer: assume that Conv3 layer has e 3 Each size is k 3 Of the convolution kernelSuppose take e 3 =128,k 3 =3, the characteristic vector determined by the Conv2 layer is/is @>And convolution kernel->Performing convolution operation to generate 128 characteristic vectors with the length of w-13>The final output ^ of Conv3 layer is then obtained through the BN operation and the activation function LeakyReLU>This process is expressed as follows:
GAP layer, feature vector for Conv3 layer outputUsing and>convolution kernel K with same dimension GAP Andconvolution operation is carried out to generate a 128-dimensional characteristic vector->
WhereinFeature vector representing the eventual learning of a deep convolutional neural network>Each component value of (a);
Step 2.2.2, gated recursive network feature learning: for an input time-series sample L_r (L_r ∈ L_train_set), the sequence features are learned with the gated recursive network GRU containing 128 cell units, giving the feature vector finally output by the gated recursive network:

A^{GRU} = F_GRU(L_r; K_p, K_q, K_g̃)

wherein K_p and K_q denote the weight matrices of the reset gate and the update gate respectively, and F_GRU denotes the mapping function of the GRU network;
Step 2.2.3, output of the cost-sensitive hybrid network model: for the input time-series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) through the Softmax classifier, where nclass = 0, 1; nclass = 0 indicates that L_r belongs to the majority class and nclass = 1 indicates that L_r belongs to the minority class. This process is expressed as follows:

P_nclass(L_r) = Softmax(concat(A^{DCNN}, A^{GRU}))

wherein A^{DCNN} denotes the feature vector output by the convolutional network, A^{GRU} denotes the feature vector output by the GRU network, and the function concat(·) splices the feature vectors A^{DCNN} and A^{GRU} into one long vector;
Step 2.2.4, updating parameters with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured by the cost-sensitive loss function of equation (11) in step 1.3, whose weight K and bias b are the parameters to be learned; a learning rate of 0.001 with a gradient-descent schedule applied every 200 segments is adopted, 40% of the training samples are used for cross-validation, and the weight K and the bias b are updated through the back-propagation mechanism of the Adam optimization algorithm;
The final weight K and bias b depend on the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η enlarges the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν controls the growth of the total loss;
The proposed cost-sensitive loss function generalizes to the multi-class case, where the penalty factor for multi-class skewed data samples is as shown in equation (28):

η_c = N / (n_classes · n_c_total)    (28)

wherein N is the total number of samples, n_c_total is the total number of samples of class c, and η_c is the penalty factor corresponding to class c, c = {1, 2, …, n_classes}.
The invention has the beneficial effects that:
the invention provides a skew time series data anomaly detection algorithm based on a cost-sensitive hybrid network model, wherein the cost-sensitive hybrid network model integrates the characteristics that a DCNN has strong local feature learning capability and a GRU has good sequence feature learning capability. The cost-sensitive hybrid network model has stronger nonlinear representation performance, is an end-to-end network model, and avoids a complex data preprocessing process. According to the invention, a cost sensitive loss function is introduced into the CSHN network model, and parameters of the network model are adjusted by adopting different penalty loss factors aiming at different types of samples, so that the problem of insufficient feature learning of a few types of samples is solved. The invention solves the problem of insufficient learning of a few types of samples in the prior art, and avoids the problems that the sampling method changes the structure of data, the threshold in the threshold moving method is difficult to determine and the like. The method is simple and efficient, high in precision and strong in robustness. The method has higher detection precision on the skewed time series data set and the non-skewed time series data set.
Drawings
FIG. 1 is a schematic diagram of a time series anomaly detection algorithm in the skew time series anomaly detection method based on the cost-sensitive hybrid network according to the present invention;
FIG. 2 is a schematic diagram of a cost-sensitive hybrid network model in the skew time series anomaly detection method based on the cost-sensitive hybrid network of the present invention;
FIG. 3 is a schematic diagram of a full connection layer in the skew time series anomaly detection method based on a cost-sensitive hybrid network according to the present invention;
FIG. 4 is a schematic diagram of a global average pooling layer in the skew time series anomaly detection method based on the cost-sensitive hybrid network according to the present invention;
FIG. 5 is a schematic diagram of a GRU network structure in the method for detecting the skew time series anomaly based on the cost-sensitive hybrid network according to the present invention;
FIG. 6 is a comparison of the F-measure of different models on data set DataSet1;
FIG. 7 is a comparison of the F-measure of different models on data set DataSet2;
FIG. 8(a) is a comparison of the loss curves of four networks on data set DataSet1;
FIG. 8(b) is a comparison of the loss curves of four networks on data set DataSet2.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIGS. 1 and 2, the skewed time-series anomaly detection method based on a cost-sensitive hybrid network of the present invention first establishes and trains a cost-sensitive hybrid network model composed of a deep convolutional neural network DCNN, a gated recursive network GRU containing 128 cell units, and a cost-sensitive loss function. Local features of the time series are learned by the DCNN and sequence features by the GRU; the features are then combined and classified by a Softmax classifier. During model training, the cost-sensitive loss function measures the similarity between the output result and the true value, the parameters of the network model are adjusted by the back-propagation algorithm, and different penalty factors are applied to samples of differently sized classes to penalize misdetections by the network model. By introducing the BN operation and the GAP operation, the overfitting that the many training parameters of a general neural network would cause is avoided and the training speed of the model is improved.
The skewed time-series anomaly detection method based on a cost-sensitive hybrid network specifically comprises the following steps:
step 1.1, learning local features of a time sequence by using a Deep Convolutional Neural Network (DCNN) composed of three convolutional layers, wherein each convolutional layer comprises convolution operation and batch normalization operation, and a global average pooling layer is introduced into an output layer and is used for reducing feature dimensions;
step 1.1.1, convolution operation
The purpose of the convolution operation is to learn the local features of the sample. Define K_{u,v}^d as the convolution kernel between the u-th channel in layer d and the v-th channel in layer d-1, and x_v^{d-1} as the output value of the v-th channel of the sample in layer d-1. The local features of the time series are learned through the convolution operation:

x_u^d = Σ_{v=1}^{V} K_{u,v}^d ⊛ x_v^{d-1} + b_u^d    (1)

wherein x_u^d denotes the output value of the u-th channel of layer d, b_u^d denotes the bias of the u-th channel of layer d, ⊛ denotes the convolution operation, and V denotes the number of convolution kernels in the previous layer;
step 1.1.2, batch normalization operation
Batch normalization makes the intermediate output values of each layer tend to be stable, thereby alleviating the problem of unstable data distribution during training. The data are therefore normalized before being input to each layer, which improves both the stability and the generalization capability of the network.
For an input batch of time-series samples X = {x_1, x_2, …, x_z}, the batch normalization operation is expressed as follows:

x̂_i = (x_i - μ_X) / sqrt(σ_X² + τ)    (2)
ŷ_i = γ · x̂_i + β    (3)

wherein x̂_i is the standard normalized value, μ_X and σ_X² are the mean and variance of the input batch, τ is a constant used to ensure that the denominator is greater than 0, γ denotes the data scale change, β denotes the data offset, and ŷ_i denotes the value after the batch normalization operation;
step 1.1.3, global average pooling layer
A general convolutional network, in order to reduce the dimensionality of the feature vectors obtained by the last convolution operation, typically includes one or more fully connected layers near the output layer, as shown in FIG. 3. In the method of the present invention, a global average pooling layer GAP is used instead of the fully connected layer, as shown in FIG. 4. This reduces not only the dimensionality of the feature vectors but also the parameters of the network. The global average pooling layer performs an average pooling operation on the feature vectors obtained from the preceding convolutional layer:

a_u = K_GAP ⊛ X_u    (4)
A = {a_1, a_2, …, a_U}    (5)

wherein X_u denotes the feature vector of the u-th channel after the last convolutional layer, K_GAP denotes the global average pooling matrix, U denotes the dimension of the output feature vector, and A denotes the final output vector, formed by combining the output values a_u of all channels.
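As a rough illustration of this parameter saving, the following numpy sketch (with assumed shapes, not the patent's implementation) contrasts the GAP output with the weight count a fully connected layer would need:

```python
import numpy as np

# Sketch: global average pooling (GAP) versus a fully connected layer.
# Assumed shapes: the last convolution layer outputs U = 128 channels,
# each a feature vector of length w' = 51.
w_prime, num_channels = 51, 128
features = np.random.randn(w_prime, num_channels)   # last conv layer output

# GAP: one average per channel -> a 128-dimensional vector, no learned weights.
a = features.mean(axis=0)                           # A = {a_1, ..., a_U}

# A fully connected layer mapping the flattened features to 128 outputs would
# instead need (w' * U) * U weights; the GAP layer avoids all of them.
fc_weight_count = (w_prime * num_channels) * num_channels
print(a.shape, fc_weight_count)                     # (128,) 835584
```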
Step 1.2, learning sequence features of the time series with the gated recursive network GRU, which is composed of a reset gate p_s and an update gate q_s; the update gate corresponds to the combination of the forget gate and the input gate of the LSTM network structure. The structure of the GRU network is shown in FIG. 5: X denotes a time-series data sample, g_s denotes the output information at time s, g̃_s denotes the hidden state at time s, and the inputs to the memory unit at time s are g_{s-1} and X. The reset gate p_s controls how much of the previous output g_{s-1} flows into the current hidden state g̃_s; the reset gate is mapped into [0,1] by activation function one, a smaller value indicating that less information from the previous time flows into the current hidden state, while the hidden state g̃_s is mapped into [-1,1] by activation function two. The mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

wherein K_p denotes the weight matrix of the reset gate, [g_{s-1}, X] denotes the concatenation of the two input vectors g_{s-1} and X into one long vector, σ is activation function one, and K_g̃ denotes the weights for computing the hidden state;
The update gate q_s determines how much of the output information g_{s-1} of time s-1 is carried into the output information g_s of time s. The update gate q_s takes values in [0,1]; the larger its value, the less of the previous output information g_{s-1} is carried into the current output information g_s:

q_s = σ(K_q · [g_{s-1}, X_s])    (8)
g_s = (1 - q_s) ⊙ g_{s-1} + q_s ⊙ g̃_s    (9)

wherein K_q denotes the weight matrix of the update gate and [g_{s-1}, X_s] denotes the concatenation of the two input vectors g_{s-1} and X_s into one long vector, σ again being activation function one;
In step 1.2, activation function one is the Sigmoid activation function and activation function two is the tanh activation function;
The reset gate p_s is mapped into [0,1] by the Sigmoid activation function, and the hidden state g̃_s is mapped into [-1,1] by the tanh activation function; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

wherein K_p denotes the weight matrix of the reset gate, [g_{s-1}, X] denotes the concatenation of the two input vectors g_{s-1} and X into one long vector, σ is the Sigmoid activation function, and K_g̃ denotes the weights for computing the hidden state.
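The gate equations can be made concrete with a small numpy sketch; this is a generic GRU step consistent with equations (6)-(9), with biases omitted and all names illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch of one GRU step implementing equations (6)-(9); K_p, K_q and K_g
# stand for the reset-gate, update-gate and hidden-state weight matrices.
def gru_step(g_prev, x, K_p, K_q, K_g):
    gx = np.concatenate([g_prev, x])                           # [g_{s-1}, X]
    p = sigmoid(K_p @ gx)                                      # reset gate, eq. (6)
    g_tilde = np.tanh(K_g @ np.concatenate([p * g_prev, x]))   # hidden state, eq. (7)
    q = sigmoid(K_q @ gx)                                      # update gate, eq. (8)
    return (1.0 - q) * g_prev + q * g_tilde                    # output, eq. (9)

# Example with 128 cell units and a scalar input per time step:
h, d = 128, 1
rng = np.random.default_rng(0)
K_p, K_q, K_g = (0.1 * rng.normal(size=(h, h + d)) for _ in range(3))
g = np.zeros(h)
for x_s in rng.normal(size=(20, d)):    # one segment of 20 time steps
    g = gru_step(g, x_s, K_p, K_q, K_g) # g is the accumulated sequence feature
```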
Step 1.3: generally, during model training a cross-entropy loss function is used to measure the similarity between the true value and the predicted value; its expression is as follows:

f_1(K, b) = -(1/N) Σ_{j=1}^{N} [ l_j · log σ_{K,b}(X_j) + (1 - l_j) · log(1 - σ_{K,b}(X_j)) ]    (10)

wherein l_j denotes the true label of the j-th training sample, X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameters, b denotes the bias, and N denotes the total number of samples.
In general, the smaller the total loss f_1(K, b), the better the learning effect of the model. When the data distribution is severely skewed, a network model using the general cross-entropy loss function cannot obtain a sufficient feature representation of the minority class, so detection accuracy on the minority class suffers severely. The reason is that the general cross-entropy loss function applies the same penalty factor to the losses of minority-class samples and majority-class samples.
To solve this problem, in the training process of the cost-sensitive hybrid network model the similarity between the output result and the true value is measured with a cost-sensitive loss function, whose expression is as follows:

f(K, b) = -(1/N) Σ_{j=1}^{N} [ η · l_j · log σ_{K,b}(X_j) + ν · (1 - l_j) · log(1 - σ_{K,b}(X_j)) ]    (11)

wherein l_j denotes the true label of the j-th training sample (taking l_j = 1 for the minority class), X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameters, b denotes the bias, and N denotes the total number of samples. η and ν denote the penalty factors applied when minority-class and majority-class samples are misclassified, respectively: when a minority-class sample is misdetected, the loss is multiplied by the larger penalty factor η, which amplifies the total loss; when a majority-class sample is misdetected, the loss is multiplied by the smaller penalty factor ν, since the total number of majority-class samples is larger and their total loss is already large. η and ν are computed as follows:

η = N / (n_classes · n_abnormal_total), ν = N / (n_classes · n_normal_total)

where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes; in the present invention n_classes = 2;
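A minimal sketch of how such a loss could be written for Keras follows, assuming a single sigmoid output with label 1 for the minority class; for the two-class softmax output actually used by the model, the same per-class weighting (ν, η) can be applied to a categorical cross-entropy instead. This is an illustrative sketch, not the patent's implementation.

```python
import tensorflow as tf

# Sketch: cost-sensitive binary cross-entropy following equation (11),
# assuming label 1 marks the abnormal (minority) class; eta and nu are the
# penalty factors computed from the class counts as in the sketch above.
def cost_sensitive_loss(eta, nu):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # Minority-class errors are scaled by the larger factor eta,
        # majority-class errors by the smaller factor nu.
        ce = -(eta * y_true * tf.math.log(y_pred)
               + nu * (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(ce)
    return loss

# e.g. model.compile(optimizer='adam', loss=cost_sensitive_loss(10.0, 0.53))
```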
Step 2, a skew time series data anomaly detection algorithm based on the cost sensitive hybrid network model:
the specific algorithm framework is shown in fig. 1, and the algorithm mainly comprises three stages: the first stage is a data preprocessing stage; the second stage is a time-series local feature learning stage, which mainly comprises the local feature learning of the time series based on the deep convolutional neural network DCNN in the step 1 and the local feature learning of the time series of the gated recursive network GRU; the third stage is an abnormality detection stage;
step 2.1, preprocessing data mainly comprises normalization operation and time slicing operation;
Step 2.1.1, data normalization processing
X{t_m(x_m, l_m)} (m = 1, 2, …, M) denotes a time-series data set, where t_m(x_m, l_m) denotes a time-series sample, x_m denotes the signal value of the m-th sample, l_m denotes the label of the m-th sample, l_m being 0 or 1, and M denotes the total number of samples; each signal value is normalized so that all samples lie on a common scale, giving the normalized series X̃;
Step 2.1.2, time slicing
A sliding window is used to divide the long time-series data X{t_m(x_m, l_m)} (m = 1, 2, …, M) into short overlapping segments: a window function window(·) of length w is moved with step length h, dividing the data X̃ normalized in step 2.1.1 into R segments L_r, each of length w. The expression is as follows:

L_r = window(X̃, w, h), r = 1, 2, …, R, R = ⌊(M - w) / h⌋ + 1

wherein L_r denotes the r-th segment, w is set to half the period of the time-series data, R is the total number of segments, and M denotes the total number of samples.
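The preprocessing stage might look as follows in numpy; the z-score normalization and the toy signal are assumptions made for illustration, since the patent only states that a normalization is applied:

```python
import numpy as np

# Sketch of the preprocessing stage: normalize the raw signal (z-score
# assumed), then slice it into overlapping segments of length w with step h.
def preprocess(x, w, h):
    x = (x - x.mean()) / (x.std() + 1e-8)        # normalization
    n_segments = (len(x) - w) // h + 1           # R = floor((M - w) / h) + 1
    return np.stack([x[r * h : r * h + w] for r in range(n_segments)])

signal = np.sin(np.linspace(0, 40 * np.pi, 4000))  # toy series, period 200 points
segments = preprocess(signal, w=100, h=10)         # w = half the period here
print(segments.shape)                              # (391, 100)
```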
Step 2.2, local feature learning of time series: data of 80% in the time series dataInputting the training samples into the local features of the learning time sequence in the cost-sensitive hybrid network model constructed in the step 1, simultaneously performing cross verification by using part of the training samples, and updating model parameters by adopting a back propagation algorithm in the whole training and learning process; the specific process of feature learning comprises the following steps: based on the local feature learning of the time sequence of the deep convolutional neural network DCNN in the step 1, based on the local feature learning of the time sequence of the gated recursive network GRU in the step 1, obtaining a probability value output by the cost-sensitive hybrid network model by using a Softmax classifier to classify, and measuring the similarity between a predicted value and a true value by using the cost-sensitive loss function to update parameters;
Step 2.2.1, deep convolutional neural network feature learning: local feature learning of the time series is performed with the deep convolutional neural network DCNN of step 1. The hidden part of the convolutional network consists of three convolutional layers, each comprising three processing operations (convolution, batch normalization and LeakyReLU activation). The specific flow is as follows.
Conv1 layer: assume the Conv1 layer has e_1 convolution kernels K^{c1} of size k_1, with e_1 = 32 and k_1 = 8. The sample L_r (L_r ∈ L_train_set) is convolved with the kernels K^{c1}, giving e_1 feature vectors of length w-7; the final output A^{c1} of the Conv1 layer is then obtained through the BN operation and the LeakyReLU activation function. This process is expressed as follows:

A^{c1} = LeakyReLU(BN(K^{c1} ⊛ L_r + b^{c1}))

Conv2 layer: assume the Conv2 layer has e_2 convolution kernels K^{c2} of size k_2, with e_2 = 64 and k_2 = 5. The feature vectors A^{c1} obtained from the Conv1 layer are convolved with the kernels K^{c2}, generating e_2 feature vectors of length w-11; the final output A^{c2} of the Conv2 layer is then obtained through the BN operation and the LeakyReLU activation function. This process is expressed as follows:

A^{c2} = LeakyReLU(BN(K^{c2} ⊛ A^{c1} + b^{c2}))

Conv3 layer: assume the Conv3 layer has e_3 convolution kernels K^{c3} of size k_3, with e_3 = 128 and k_3 = 3. The feature vectors A^{c2} obtained from the Conv2 layer are convolved with the kernels K^{c3}, generating 128 feature vectors of length w-13; the final output A^{c3} of the Conv3 layer is then obtained through the BN operation and the LeakyReLU activation function. This process is expressed as follows:

A^{c3} = LeakyReLU(BN(K^{c3} ⊛ A^{c2} + b^{c3}))

GAP layer: for the feature vectors A^{c3} output by the Conv3 layer, a convolution kernel K_GAP with the same dimension as A^{c3} is used for the average pooling operation, generating the 128-dimensional feature vector

A^{DCNN} = {a_1, a_2, …, a_128}, a_u = K_GAP ⊛ A_u^{c3}

wherein a_u denotes each component of the feature vector A^{DCNN} finally learned by the deep convolutional neural network;
Step 2.2.2, gated recursive network feature learning: for an input time-series sample L_r (L_r ∈ L_train_set), the sequence features are learned with the gated recursive network GRU containing 128 cell units, giving the feature vector finally output by the gated recursive network:

A^{GRU} = F_GRU(L_r; K_p, K_q, K_g̃)

wherein K_p and K_q denote the weight matrices of the reset gate and the update gate respectively, and F_GRU denotes the mapping function of the GRU network;
Step 2.2.3, output of the cost-sensitive hybrid network model: for the input time-series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) through the Softmax classifier, where nclass = 0, 1; nclass = 0 indicates that L_r belongs to the majority class and nclass = 1 indicates that L_r belongs to the minority class. This process is expressed as follows:

P_nclass(L_r) = Softmax(concat(A^{DCNN}, A^{GRU}))

wherein A^{DCNN} denotes the feature vector output by the convolutional network, A^{GRU} denotes the feature vector output by the GRU network, and the function concat(·) splices the feature vectors A^{DCNN} and A^{GRU} into one long vector;
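Putting steps 2.2.1 to 2.2.3 together, a Keras sketch of the architecture could look as follows; the padding mode, input shape and the plain categorical loss in compile() are assumptions, and the cost-sensitive loss of step 2.2.4 would be swapped in for actual training:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the CSHN architecture: a DCNN branch of three Conv1D + BN +
# LeakyReLU blocks followed by GAP, in parallel with a 128-cell GRU branch;
# the two feature vectors are concatenated and classified by a softmax layer.
# Filter counts and kernel sizes follow the text (32/8, 64/5, 128/3);
# 'valid' padding reproduces the w-7 / w-11 / w-13 output lengths.
def build_cshn(w):
    inp = keras.Input(shape=(w, 1))

    x = inp                                    # DCNN branch: local features
    for filters, kernel in [(32, 8), (64, 5), (128, 3)]:
        x = layers.Conv1D(filters, kernel, padding='valid')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
    dcnn_feat = layers.GlobalAveragePooling1D()(x)     # A^DCNN, 128-dim

    gru_feat = layers.GRU(128)(inp)                    # A^GRU, 128-dim

    merged = layers.concatenate([dcnn_feat, gru_feat])
    out = layers.Dense(2, activation='softmax')(merged)  # P_nclass(L_r)
    return keras.Model(inp, out)

model = build_cshn(w=100)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy')  # training would use the
                                                # class-weighted (cost-sensitive) loss
```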
Step 2.2.4, updating parameters with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured by the cost-sensitive loss function of equation (11) in step 1.3, whose weight K and bias b are the parameters to be learned; a learning rate of 0.001 with a gradient-descent schedule applied every 200 segments is adopted, 40% of the training samples are used for cross-validation, and the weight K and the bias b are updated through the back-propagation mechanism of the Adam optimization algorithm;
The final weight K and bias b depend on the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η enlarges the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν controls the growth of the total loss;
The proposed cost-sensitive loss function generalizes to the multi-class case, where the penalty factor for multi-class skewed data samples is as shown in equation (28):

η_c = N / (n_classes · n_c_total)    (28)

wherein N is the total number of samples, n_c_total is the total number of samples of class c, and η_c is the penalty factor corresponding to class c, c = {1, 2, …, n_classes}.
Step 2.3, anomaly detection phase
Testing the test data by using the cost sensitive hybrid network model trained in the step 2.2, and testing the rest 20 percent of data in the time sequence dataAs a test sample, let phi (L) r (ii) a K, b) as a cost-sensitive hybrid network model, L r ∈L test_set The mathematical expression is:
wherein, P nclass (L r ) Is phi (L) r (ii) a Predicted probability value of K, b)/ r_label In order to predict the label of a sample,are parameters obtained in the learning process.
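Continuing the sketches above (so `model` and `test_segments` are the illustrative objects built earlier, not the patent's artifacts), the detection stage reduces to an arg max over the predicted class probabilities:

```python
import numpy as np

# Sketch: the trained model plays the role of phi(L_r; K, b); the predicted
# label is the arg max over the two class probabilities.
probs = model.predict(test_segments[..., np.newaxis])  # shape (R_test, 2)
labels = np.argmax(probs, axis=1)                      # 0 = majority, 1 = anomaly
print("detected anomaly segments:", int(labels.sum()))
```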
The simulation experiment results of the skewed time-series anomaly detection method based on a cost-sensitive hybrid network of the present invention are as follows:
the experiment is carried out on an actual engineering data set and 44 UCR reference data sets, and the actual engineering data comprises a flywheel rotating speed data set (DataSet 1) and a gyroscope temperature data set (DataSet 2) of a certain device. The number of normal and abnormal values in these data sets varies greatly. Assume that most classes represent normal classes and a few classes represent abnormal classes. In the experiments, the FCN _ alsm network, the Resnet network, the FCN network, and the proposed CSHN were implemented using Keras deep learning packages. SVC (Support Vector Classification), adaBoost, RFC (Random Forest Classification) algorithms are implemented using the Scik kit-left package in Python 3.5.
1. Evaluation index
The performance of the method was evaluated using true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP), defined as follows:
TP is the number of positive-class samples detected as positive; FN is the number of positive-class samples detected as negative; FP is the number of negative-class samples detected as positive; TN is the number of negative-class samples detected as negative.
In the experiments, ACC+, ACC-, G-means and F-measure are used to evaluate the performance of the algorithm. ACC+ and ACC- denote the detection rates of normal samples and abnormal samples respectively, and G-means evaluates the overall detection performance of the algorithm; it is defined as follows:

G-means = sqrt(ACC+ × ACC-)

The F-measure is a comprehensive evaluation index measuring the detection performance of the classifier on abnormal samples, defined as the weighted harmonic mean of recall and precision:

F-measure = ((1 + β²) · Recall · Precision) / (β² · Precision + Recall)

where Recall is a measure of completeness (i.e. how many abnormal-class samples are correctly identified), Precision is a measure of exactness, and β adjusts the importance of precision relative to recall (typically β = 1).
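These indices are straightforward to compute from the four counts; a sketch follows (taking the abnormal class as the positive class, as the definitions above imply):

```python
import numpy as np

# Sketch of the evaluation indices: ACC+ / ACC- are the per-class detection
# rates, G-means their geometric mean, F-measure the weighted harmonic mean
# of recall and precision (beta = 1 by default).
def evaluate(tp, fn, tn, fp, beta=1.0):
    acc_pos = tn / (tn + fp)              # ACC+: normal-sample detection rate
    acc_neg = tp / (tp + fn)              # ACC-: abnormal-sample detection rate
    g_means = np.sqrt(acc_pos * acc_neg)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = ((1 + beta ** 2) * recall * precision
                 / (beta ** 2 * precision + recall))
    return acc_pos, acc_neg, g_means, f_measure

# e.g. evaluate(tp=45, fn=5, tn=930, fp=20)
```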
2. CSHN model assessment
In order to verify the effectiveness of the CSHN model, the influence of the cost-sensitive loss function and of the general cross-entropy loss function on detection accuracy was compared. For this purpose, the proposed hybrid network model (DCNN + GRU) was combined with each of the two loss functions; the experiments were performed on DataSet1 and DataSet2 respectively, and the results are shown in Tables 1 and 2.
TABLE 1 test results on DataSet DataSet1
TABLE 2 test results on DataSet DataSet2
As can be seen from Tables 1 and 2, for normal samples the ACC+ values vary little. For abnormal samples, the ACC- value with the general cross-entropy loss function is only around 78%, whereas with the cost-sensitive loss function proposed by the invention the ACC- value increases by about 5% to 10%. The method thus markedly improves detection accuracy on the minority class, which means the proposed cost-sensitive loss function can solve the problem of low minority-class detection accuracy caused by skewed data distribution. In addition, the G-means and F-measure values show that using the cost-sensitive loss function improves the overall detection performance.
3. Performance comparison
3.1 Evaluation and comparison of ACC+, ACC- and G-means
In order to evaluate the detection performance of the method, evaluation indexes on the data sets DataSet1 and DataSet2 were calculated, and the results are shown in tables 3 and 4:
table 3 detection accuracy of different methods on DataSet1
Table 4 detection accuracy of different methods on DataSet2
As can be seen from Tables 3 and 4, the deep-learning-based methods outperform the machine-learning-based algorithms. For normal samples, the ACC+ detection results of all methods exceed 94%. For abnormal samples, the ACC- value of the method of the invention is greater than those of the comparison methods. The G-means evaluation shows that the detection performance of the method is superior to the comparison methods.
3.2 comparison of F-measure detection results
To further investigate the detection performance of the method, FIGS. 6 and 7 show the F-measure results of the different methods on data sets DataSet1 and DataSet2 respectively. In FIGS. 6 and 7, the abscissa denotes the different methods and the ordinate denotes their F-measure values on DataSet1 and DataSet2 respectively.
As can be seen from FIGS. 6 and 7, the deep-learning-based methods outperform the machine-learning-based algorithms, because neural networks represent nonlinear relationships better. Among the compared methods, the F-measure value of the method of the invention is significantly higher than the others, meaning its detection performance is superior to the comparison methods.
3.3 evaluation and comparison of convergence Rate and stability
For deep neural networks, the training loss reflects the convergence speed and stability of the network model. In terms of loss, the CSHN model is compared with the deep-learning-based FCN_ALSTM, ResNet and FCN models; FIGS. 8(a) and 8(b) show the training-loss curves on data sets DataSet1 and DataSet2 respectively, with the abscissa denoting the number of iterations and the ordinate the loss values of the models on the two data sets.
It can be seen that the loss value of the CSHN model converges faster. On DataSet1, when the number of iterations exceeds 250, the loss value of the proposed CSHN model stabilizes and is significantly lower than those of the comparison network models. On DataSet2, when the number of iterations exceeds 120, the loss value of the proposed CSHN model stabilizes and is lower than those of the comparison network models. This means the stability of the model is better than that of the comparison network models.
4. Performance evaluation of UCR public datasets
To further verify the detection performance of the proposed CSHN model, experiments were run on 44 UCR balanced data sets. Since the model is tested over multiple data sets, new metrics must be defined to evaluate overall performance. For this purpose, detection performance is evaluated with the accuracy index and the mean per-class error (MPCE).
Here a data pool G = {g_z} is defined, where g_z denotes the z-th data set and C_z the number of classes in data set g_z. The evaluation index is defined as follows (see the sketch at the end of this subsection):

PCE_z = e_z / C_z, MPCE = (1/Z) Σ_{z=1}^{Z} PCE_z

wherein e_z is the error rate on data set g_z, PCE_z is the per-class error of data set g_z, MPCE is the mean of the per-class errors over the data pool G, and Z is the number of data sets in the data pool G; in the present invention Z = 44. The experimental results are shown in Table 5:
TABLE 5 accuracy of different methods on 44 UCR datasets and MPCE
In Table 5, the first column gives the names of the 44 UCR data sets, Nclasses gives the number of classes in each data set, and Win gives, for each method, the number of the 44 UCR data sets on which it achieves the highest accuracy, i.e. the best experimental accuracy among the different methods on the same data set.
As can be seen from Table 5, the method provided by the present invention has a significant detection effect not only on skewed data sets, but also on non-skewed data sets.
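A sketch of the MPCE computation defined above, with illustrative inputs:

```python
import numpy as np

# Sketch of the MPCE index: the per-class error of data set z is
# PCE_z = e_z / C_z, averaged over the Z data sets in the pool
# (Z = 44 in the UCR experiments). Inputs here are illustrative.
def mpce(error_rates, class_counts):
    pce = np.asarray(error_rates) / np.asarray(class_counts)  # PCE_z = e_z / C_z
    return pce.mean()                                         # (1/Z) * sum PCE_z

print(mpce([0.05, 0.12, 0.02], [2, 4, 2]))  # (0.025 + 0.03 + 0.01) / 3 ≈ 0.0217
```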
Claims (5)
1. A skewed time-series anomaly detection method based on a cost-sensitive hybrid network, characterized in that a cost-sensitive hybrid network model consisting of a deep convolutional neural network DCNN, a gated recursive network GRU and a cost-sensitive loss function is first established; local features of the time series are learned by the deep convolutional neural network DCNN, sequence features of the time series are learned by the gated recursive network GRU, and the features are then combined and classified by a Softmax classifier; in the model training process the similarity between the output result and the true value is measured with the cost-sensitive loss function, the parameters of the network model are then adjusted by the back-propagation algorithm, and different penalty factors are applied to samples of differently sized classes to penalize misdetections by the network model; the method specifically comprises the following steps:
step 1, integrating a deep convolutional neural network DCNN and a gated recursive network GRU containing 128 cell units, introducing a cost sensitive loss function, and constructing a cost sensitive hybrid network model CSHN;
step 1.1, learning local features of a time sequence by using a Deep Convolutional Neural Network (DCNN) composed of three convolutional layers, wherein each convolutional layer comprises convolution operation and batch normalization operation, and a global average pooling layer is introduced into an output layer and is used for reducing feature dimensions;
step 1.2, learning sequence features of the time series with the gated recursive network GRU, which is composed of an update gate q_s and a reset gate p_s; X represents a time-series data sample, g_s denotes the output information at time s, and g̃_s represents the hidden state at time s; the input of the memory unit at time s is g_{s-1} and X; the reset gate p_s controls how much of the output value g_{s-1} of the previous time enters the hidden state g̃_s of the present time; the reset gate is mapped into [0,1] by activation function one, and the hidden state g̃_s is mapped into [-1,1] by activation function two; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)

g̃_s = tanh(K_g · [p_s ∗ g_{s-1}, X])    (7)

where K_p denotes the weight matrix of the reset gate, K_g denotes the weight matrix of the hidden state, ∗ denotes element-wise multiplication, [g_{s-1}, X] denotes the two input vectors g_{s-1} and X connected into one long vector, and σ is activation function one;
the update gate q_s determines the degree to which the output information g_{s-1} at time s-1 is carried into the output information g_s at time s; the value of the update gate q_s lies in [0,1], and the larger the value is, the less of the previous output information g_{s-1} is carried into the current output information g_s; its mathematical expression is as follows:
q_s = σ(K_q · [g_{s-1}, X_s])    (8)
where K_q denotes the weight matrix of the update gate, [g_{s-1}, X_s] denotes the two input vectors g_{s-1} and X_s connected into one long vector, and σ is activation function one;
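For illustration only, a minimal NumPy sketch of the gate equations (6)-(8) above; the weight matrices K_p, K_q, K_g and the dimensions are randomly initialised assumptions of this sketch, not values fixed by the claim.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim_g, dim_x = 128, 64                        # 128 GRU cells, 64-dim input slice
rng = np.random.default_rng(0)
K_p = rng.normal(0, 0.01, (dim_g, dim_g + dim_x))   # reset-gate weights
K_q = rng.normal(0, 0.01, (dim_g, dim_g + dim_x))   # update-gate weights
K_g = rng.normal(0, 0.01, (dim_g, dim_g + dim_x))   # hidden-state weights

g_prev = np.zeros(dim_g)                      # g_{s-1}: previous output information
x = rng.normal(0, 1, dim_x)                   # X: current input

z = np.concatenate([g_prev, x])               # [g_{s-1}, X] as one long vector
p_s = sigmoid(K_p @ z)                        # reset gate, mapped into [0,1]   (6)
q_s = sigmoid(K_q @ z)                        # update gate, mapped into [0,1]  (8)
g_tilde = np.tanh(K_g @ np.concatenate([p_s * g_prev, x]))  # hidden state      (7)
print(p_s.shape, q_s.shape, float(g_tilde.min()) >= -1.0)
```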
step 1.3, in the training process of the cost-sensitive hybrid network model, the similarity between the output result and the true value is measured with a cost-sensitive loss function, whose expression is as follows:

L(K, b) = −(1/N) · Σ_{j=1}^{N} [ η · l_j · log σ_{k,b}(X_j) + ν · (1 − l_j) · log(1 − σ_{k,b}(X_j)) ]    (11)

where l_j represents the true label of the j-th training sample, X_j represents the j-th input time-series sample, σ_{k,b}(X_j) represents the probability value output by the model, K represents a weight parameter, b represents a bias, and N represents the total number of samples; η and ν respectively represent the penalty factors applied when minority-class and majority-class samples are misclassified: when a minority-class sample is misdetected, its loss term is multiplied by the larger penalty factor η, amplifying the total loss; when a majority-class sample is misdetected, it is multiplied by the smaller penalty factor ν; η and ν are calculated as follows:

η = N / (n_classes · n_abnormal_total),    ν = N / (n_classes · n_normal_total)
where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes, n_classes = 2;
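For illustration only, a minimal Python sketch of the binary penalty factors, assuming the class-balanced form implied by the multi-class formula (28) below; the sample counts are hypothetical.

```python
import numpy as np

def penalty_factors(labels):
    labels = np.asarray(labels)              # 1 = abnormal (minority), 0 = normal
    n = labels.size                          # N, total number of samples
    n_abnormal = int(labels.sum())           # n_abnormal_total
    n_normal = n - n_abnormal                # n_normal_total
    n_classes = 2
    eta = n / (n_classes * n_abnormal)       # large factor for minority errors
    nu = n / (n_classes * n_normal)          # small factor for majority errors
    return eta, nu

# With 900 normal and 100 abnormal samples: eta = 5.0, nu ~= 0.56
print(penalty_factors([0] * 900 + [1] * 100))
```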
Step 2, a skew class time sequence data anomaly detection algorithm based on the cost sensitive hybrid network model:
the algorithm is mainly divided into three stages: the first stage is a data preprocessing stage; the second stage is a time-series local feature learning stage, which mainly comprises the local feature learning of the time series based on the deep convolutional neural network DCNN in the step 1 and the local feature learning of the time series of the gated recursive network GRU; the third stage is an abnormality detection stage;
step 2.1, preprocessing data mainly comprises normalization operation and time slicing operation;
step 2.2, feature learning of the time series: 80% of the data in the time series dataset, L_train_set, is input as training samples into the cost-sensitive hybrid network model constructed in step 1 to learn the features of the time series, part of the training samples is used for cross-validation, and the model parameters are updated by a back propagation algorithm throughout the training and learning process; the specific process of feature learning comprises: local feature learning of the time series based on the deep convolutional neural network DCNN of step 1, sequence feature learning of the time series based on the gated recursive network GRU of step 1, classification with a Softmax classifier to obtain the probability value output by the cost-sensitive hybrid network model, and measurement of the similarity between the predicted value and the true value with the cost-sensitive loss function to update the parameters;
step 2.3, anomaly detection phase
the test data are detected with the cost-sensitive hybrid network model trained in step 2.2, the remaining 20% of the data in the time series dataset, L_test_set, serving as test samples; let Φ(L_r; K, b) denote the cost-sensitive hybrid network model, L_r ∈ L_test_set; the mathematical expression is:

l̂_r = Φ(L_r; K, b) = arg max_{nclass} P_{nclass}(L_r)

where l̂_r is the predicted label of the test segment L_r.
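For illustration only, a minimal Python sketch of the detection stage; the arg-max reading of Φ(·; K, b) is an assumption (the formula image is not reproduced in the source), and `dummy_model` is a hypothetical stand-in for the trained CSHN.

```python
import numpy as np

def detect(model, test_segments):
    predictions = []
    for segment in test_segments:
        probs = model(segment)                     # [P_0(L_r), P_1(L_r)] from Soft-max
        predictions.append(int(np.argmax(probs)))  # 1 -> minority / anomalous class
    return predictions

# Usage with a dummy scorer that flags high-variance segments:
dummy_model = lambda s: [1.0, 0.0] if np.var(s) < 2.0 else [0.0, 1.0]
print(detect(dummy_model, [np.random.randn(100), 3 * np.random.randn(100)]))
```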
2. The method for detecting the skew time series anomaly based on the cost-sensitive hybrid network according to claim 1, wherein the step 1.1 specifically comprises the following steps:
step 1.1.1, convolution operation
Definition: K_{u,v}^d represents the convolution kernel between the u-th channel in layer d and the v-th channel in layer d−1, and y_v^{d−1} represents the output value of the v-th channel of the sample in layer d−1; the local features of the time series are learned by the convolution operation:

y_u^d = Σ_{v=1}^{V} K_{u,v}^d ⊗ y_v^{d−1} + b_u^d    (1)

where y_u^d represents the output value of the u-th channel of layer d, b_u^d represents the bias of the u-th channel of layer d, ⊗ represents the convolution operation, and V represents the number of convolution kernels in the previous layer;
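For illustration only, a minimal NumPy sketch of the per-channel 1-D convolution of equation (1), written out explicitly; channel counts and kernel length in the usage line are assumptions matching the Conv1 layer described in claim 5.

```python
import numpy as np

def conv_layer(y_prev, kernels, bias):
    """y_prev: (V, T) outputs of layer d-1; kernels: (U, V, k); bias: (U,)."""
    U, V, k = kernels.shape
    T_out = y_prev.shape[1] - k + 1              # valid-convolution output length
    y = np.zeros((U, T_out))
    for u in range(U):                           # u-th channel of layer d
        for v in range(V):                       # sum over the V input channels
            for t in range(T_out):
                y[u, t] += kernels[u, v] @ y_prev[v, t:t + k]
        y[u] += bias[u]                          # add the channel bias b_u^d
    return y

out = conv_layer(np.random.randn(1, 100), np.random.randn(32, 1, 8), np.zeros(32))
print(out.shape)   # (32, 93) = (e_1, w - 7) for w = 100, k_1 = 8
```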
Step 1.1.2, batch normalization operation
For input time series samples X = {x_1, x_2, …, x_z}, the batch normalization operation is expressed as follows:

x̂_i = (x_i − μ) / √(σ² + τ)    (2)

x̃_i = γ · x̂_i + β    (3)

where μ and σ² are the mean and variance of the batch, x̂_i is the standard normalized value, τ is a constant used to ensure that the denominator is greater than 0, γ represents the data scale change, β represents the data offset, and x̃_i represents the value after the batch normalization operation;
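For illustration only, a minimal NumPy sketch of the batch normalization of equations (2)-(3); γ and β are learnable in the real network but fixed here, and the input is synthetic.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, tau=1e-5):
    mu, var = x.mean(), x.var()
    x_hat = (x - mu) / np.sqrt(var + tau)   # standard normalized value, eq. (2)
    return gamma * x_hat + beta             # scale-and-shift output, eq. (3)

x = np.random.randn(100) * 5 + 3
y = batch_norm(x)
print(round(float(y.mean()), 6), round(float(y.std()), 3))  # ~0.0 and ~1.0
```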
step 1.1.3, global average pooling layer
The global average pooling layer performs an average pooling operation on the feature vectors obtained from the last convolutional layer, yielding:

a_u = K_GAP ⊗ X_u    (4)

A = {a_1, a_2, …, a_U}    (5)

where X_u represents the feature vector of the u-th channel after the last convolutional layer, K_GAP represents the global average pooling matrix, U represents the dimension of the output feature vector, and A combines the output values a_u of all channels into the final output vector.
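For illustration only, a minimal NumPy sketch of the global average pooling of equations (4)-(5): each channel's feature vector X_u collapses to one scalar a_u, giving a U-dimensional output A. The shapes are illustrative.

```python
import numpy as np

def global_average_pool(features):
    """features: (U, T) feature vectors X_u -> U-dim output A = {a_1, ..., a_U}."""
    return features.mean(axis=1)        # each channel collapses to one scalar a_u

print(global_average_pool(np.random.randn(128, 87)).shape)  # (128,)
```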
3. The method for detecting the skew time series anomaly based on the cost-sensitive hybrid network as claimed in claim 1, wherein the first activation function in the step 1.2 is a Sigmoid activation function, and the second activation function is a tanh activation function;
the reset gate p_s is mapped into [0,1] by the Sigmoid activation function, and the hidden state g̃_s is mapped into [−1,1] by the tanh activation function; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)

g̃_s = tanh(K_g · [p_s ∗ g_{s-1}, X])    (7)
4. The method for detecting the skew time series anomaly based on the cost-sensitive hybrid network according to claim 1, wherein the specific steps of the step 2.1 are as follows:
step 2.1.1, data normalization processing
X{t_m(x_m, l_m)} (m = 1, 2, …, M) denotes the time series dataset, where t_m(x_m, l_m) represents a time series sample, x_m represents the signal value of the m-th sample, l_m represents the label of the m-th sample, l_m being 0 or 1, and M represents the total number of samples; the mathematical expression is as follows:

x̄_m = (x_m − μ_X) / σ_X

where μ_X and σ_X are the mean and standard deviation of the signal values;
step 2.1.2, time slicing
a sliding window is used to segment the long time series data X{t_m(x_m, l_m)} (m = 1, 2, …, M) into short overlapping segments: a window function window() of length w, moved with step h, divides the data normalized in step 2.1.1 into segments L = {L_1, L_2, …, L_R}, where each segment L_r has length w; the expression is as follows:

L_r = window(X; w, h),    r = 1, 2, …, R
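For illustration only, a minimal Python sketch of the preprocessing of step 2.1; the z-score normalization form is an assumption (the claim's formula image is not reproduced in the source), and the window length and step are hypothetical.

```python
import numpy as np

def preprocess(x, w, h):
    x = (x - x.mean()) / x.std()                   # step 2.1.1 (z-score form assumed)
    starts = range(0, len(x) - w + 1, h)           # sliding window with moving step h
    return np.stack([x[i:i + w] for i in starts])  # overlapping segments of length w

print(preprocess(np.random.randn(1000), w=100, h=20).shape)  # (46, 100)
```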
5. The method for detecting the skew time series anomaly based on the cost-sensitive hybrid network according to claim 1, wherein the feature learning process of step 2.2 specifically comprises the following steps:
step 2.2.1, deep convolutional neural network feature learning: local feature learning of the time series is performed with the deep convolutional neural network DCNN of step 1; the hidden part of the convolutional network consists of three convolutional layers, each comprising three processing operations; the specific flow is as follows:
Conv1 layer: assume the Conv1 layer has e_1 convolution kernels K^1 of size k_1, with e_1 = 32 and k_1 = 8; the sample L_r (L_r ∈ L_train_set) and the convolution kernels K^1 are convolved to obtain e_1 feature vectors Y^1 of length w−7, and the final output A^1 of the Conv1 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

Y^1 = L_r ⊗ K^1 + b^1,    A^1 = LeakyReLU(BN(Y^1))
Conv2 layer: assume the Conv2 layer has e_2 convolution kernels K^2 of size k_2, with e_2 = 64 and k_2 = 5; the feature vectors A^1 obtained in the Conv1 layer and the convolution kernels K^2 are convolved to generate e_2 feature vectors Y^2 of length w−11, and the final output A^2 of the Conv2 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

Y^2 = A^1 ⊗ K^2 + b^2,    A^2 = LeakyReLU(BN(Y^2))
Conv3 layer: assume the Conv3 layer has e_3 convolution kernels K^3 of size k_3, with e_3 = 128 and k_3 = 3; the feature vectors A^2 obtained in the Conv2 layer and the convolution kernels K^3 are convolved to generate 128 feature vectors Y^3 of length w−13, and the final output A^3 of the Conv3 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

Y^3 = A^2 ⊗ K^3 + b^3,    A^3 = LeakyReLU(BN(Y^3))
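For illustration only, a minimal PyTorch sketch of the three-layer DCNN branch described above, assuming univariate input segments; the kernel counts and sizes follow the stated values, and the printed shape confirms the w−13 output length.

```python
import torch
import torch.nn as nn

# Three valid (unpadded) 1-D convolutions, each followed by BN and LeakyReLU,
# reproducing the stated output lengths w-7, w-11, w-13.
dcnn = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=8),   nn.BatchNorm1d(32),  nn.LeakyReLU(),
    nn.Conv1d(32, 64, kernel_size=5),  nn.BatchNorm1d(64),  nn.LeakyReLU(),
    nn.Conv1d(64, 128, kernel_size=3), nn.BatchNorm1d(128), nn.LeakyReLU(),
)

w = 100
x = torch.randn(4, 1, w)               # a batch of 4 segments of length w
print(dcnn(x).shape)                   # torch.Size([4, 128, 87]) = w - 13
```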
GAP layer: the feature vectors A^3 output by the Conv3 layer are convolved with a convolution kernel K_GAP of the same dimension to generate the 128-dimensional feature vector

F_c = {f_1, f_2, …, f_128}

where f_u represents each component value of the feature vector F_c finally learned by the deep convolutional neural network;
step 2.2.2, gated recursive network feature learning: for the input time series sample L_r (L_r ∈ L_train_set), the sequence features are learned with the gated recursive network GRU containing 128 cell units, obtaining the feature vector finally output by the gated recursive network:

F_g = F_GRU(L_r; K_p, K_q)

where K_p and K_q represent the weight matrices of the reset gate and the update gate, respectively, and F_GRU represents the mapping function of the GRU network;
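For illustration only, a minimal PyTorch sketch of step 2.2.2, taking the final hidden state of a 128-cell GRU as the sequence feature F_g; batch size and segment length are illustrative.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=1, hidden_size=128, batch_first=True)  # 128 cell units
segment = torch.randn(4, 100, 1)      # batch of 4 segments of length w = 100
_, h_n = gru(segment)                 # h_n: (1, 4, 128) final hidden state
f_g = h_n.squeeze(0)                  # F_g = F_GRU(L_r; K_p, K_q)
print(f_g.shape)                      # torch.Size([4, 128])
```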
step 2.2.3, output of the cost-sensitive hybrid network model: for the input time series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) with a Softmax classifier, where nclass = 0, 1; nclass = 0 denotes that L_r belongs to the majority class and nclass = 1 denotes that L_r belongs to the minority class; this process is expressed as follows:

P_nclass(L_r) = Softmax(K · concat(F_c, F_g) + b)

where F_c represents the feature vector output by the convolutional network, F_g represents the feature vector output by the GRU network, and the function concat(·) splices the feature vectors F_c and F_g into one long vector;
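For illustration only, a minimal PyTorch sketch of step 2.2.3: the 128-dimensional DCNN feature F_c and the 128-dimensional GRU feature F_g are concatenated and fed to a Soft-max classifier. The tensors here are random placeholders for the two branch outputs.

```python
import torch
import torch.nn as nn

f_c = torch.randn(4, 128)                     # DCNN branch output (after GAP)
f_g = torch.randn(4, 128)                     # GRU branch output (128 cells)
classifier = nn.Linear(128 + 128, 2)          # nclass = 0 (majority), 1 (minority)

logits = classifier(torch.cat([f_c, f_g], dim=1))  # concat(.) -> one long vector
probs = torch.softmax(logits, dim=1)               # P_nclass(L_r)
print(probs.sum(dim=1))                            # each row sums to 1
```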
step 2.2.4, updating the parameters with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured by the cost-sensitive loss function of formula (11) in step 1.3, with weight K and bias b; a learning rate of 0.001, decayed every 200 rounds, is adopted, 40% of the training samples are used for cross-validation, and the weight K and the bias b are updated through the back propagation mechanism of the Adam optimization algorithm;
the final weight K and bias b are related to the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η amplifies the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν limits the growth of the total loss;
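For illustration only, a minimal PyTorch sketch of the cost-sensitive update of step 2.2.4; `model` and `optimizer` are hypothetical, and the per-class weighting of CrossEntropyLoss is used as a close analogue of formula (11) (it rescales each sample's loss term by its class factor).

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, x, labels, eta, nu):
    # Index 0 (majority) is weighted by nu, index 1 (minority) by eta.
    criterion = nn.CrossEntropyLoss(weight=torch.tensor([nu, eta]))
    optimizer.zero_grad()
    loss = criterion(model(x), labels)
    loss.backward()                 # back propagation of the weighted loss
    optimizer.step()                # Adam update of the weights K and bias b
    return loss.item()

# Hypothetical wiring: optimizer = torch.optim.Adam(model.parameters(), lr=0.001),
# with torch.optim.lr_scheduler.StepLR(optimizer, step_size=200) for the decay.
```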
the proposed cost-sensitive loss function generalizes to the multi-class case, where the penalty factor of multi-class skewed data samples is given by equation (28):

η_c = N / (n_classes · n_c_total)    (28)

where n_c_total is the total number of samples of class c and η_c is the penalty factor corresponding to class c, c = {1, 2, …, n_classes}.
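For illustration only, a minimal Python sketch of the multi-class penalty factors of equation (28); the three-class split below is a hypothetical example.

```python
import numpy as np

def multiclass_penalties(labels):
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n, n_classes = labels.size, classes.size
    # eta_c = N / (n_classes * n_c_total): rarer classes get larger factors
    return {int(c): n / (n_classes * k) for c, k in zip(classes, counts)}

# Three skewed classes of sizes 700 / 250 / 50:
print(multiclass_penalties([0] * 700 + [1] * 250 + [2] * 50))
# {0: ~0.48, 1: ~1.33, 2: ~6.67}
```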
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010065816.8A CN111275113B (en) | 2020-01-20 | 2020-01-20 | Skew time series abnormity detection method based on cost sensitive hybrid network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111275113A CN111275113A (en) | 2020-06-12 |
CN111275113B true CN111275113B (en) | 2023-04-07 |
Family
ID=71003352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010065816.8A Active CN111275113B (en) | 2020-01-20 | 2020-01-20 | Skew time series abnormity detection method based on cost sensitive hybrid network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275113B (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932480B (en) * | 2018-06-08 | 2022-03-15 | 电子科技大学 | Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN |
EP3594861B1 (en) * | 2018-07-09 | 2024-04-03 | Tata Consultancy Services Limited | Systems and methods for classification of multi-dimensional time series of parameters |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846380A (en) * | 2018-04-09 | 2018-11-20 | 北京理工大学 | A kind of facial expression recognizing method based on cost-sensitive convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
Tan Jiefan; Zhu Yan; Chen Tongxiao; Zhang Zhencheng. Imbalanced image classification method based on convolutional neural network and cost sensitivity. Journal of Computer Applications, 2018, (07). *
Also Published As
Publication number | Publication date |
---|---|
CN111275113A (en) | 2020-06-12 |
Legal Events

Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |