CN111275113B - Skew time series anomaly detection method based on cost-sensitive hybrid network - Google Patents

Skew time series anomaly detection method based on cost-sensitive hybrid network

Info

Publication number
CN111275113B
Authority
CN
China
Prior art keywords
representing, cost, samples, layer, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010065816.8A
Other languages
Chinese (zh)
Other versions
CN111275113A (en)
Inventor
王晓峰
张英
李斌
王妍
雷锦锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202010065816.8A priority Critical patent/CN111275113B/en
Publication of CN111275113A publication Critical patent/CN111275113A/en
Application granted granted Critical
Publication of CN111275113B publication Critical patent/CN111275113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/2433 - Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 - Combinations of networks
    • G06N3/048 - Activation functions
    • G06N3/084 - Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a skew time series anomaly detection method based on a cost-sensitive hybrid network. A cost-sensitive hybrid network model consisting of a deep convolutional neural network, a gated recurrent network and a cost-sensitive loss function is first established and trained: the local features of the time series are learned by the deep convolutional neural network, the sequence features of the time series are learned by the gated recurrent network, and the combined features are then used for classification. During model training, the similarity between the output result and the true value is measured with the cost-sensitive loss function, the parameters of the network model are adjusted by the back-propagation algorithm, and different penalty factors are applied to samples of different numbers and categories to penalize misdetections of the network model. The method is simple and efficient, has high precision and strong robustness, and achieves high detection accuracy on both skewed and non-skewed time series data sets.

Description

Skew time series anomaly detection method based on cost-sensitive hybrid network
Technical Field
The invention belongs to the technical field of time series data anomaly detection, and relates to a skew time series anomaly detection method based on a cost-sensitive hybrid network.
Background
Skewed time series data refers to data sets in which the sample sizes of the different classes differ widely. In practical applications, most of the time series data obtained by engineering measurement lies within the normal range and contains only a very small number of abnormal values, which is a typical skewed time series data set. In the binary classification problem, the result of a general classifier is biased toward the normal class, so the false detection rate for the abnormal class is very high. In practice, however, it is the minority class that is of interest, for example fault detection of spacecraft, disease diagnosis in the medical field, and credit card fraud detection in the financial field.
Time series classification methods based on deep learning operate on the whole time series and combine the feature extraction stage and the classification stage into a single process. Classification of univariate time series based on a multi-channel convolutional neural network model (MC-CNN) was proposed in the literature [Y. Zheng, Q. Liu, E. Chen, et al., Exploiting multi-channels deep convolutional neural networks for multivariate time series classification [J], Frontiers of Computer Science, 2016, 10(1): 96-112]. MC-CNN learns time series features with three separate channels, combines the features learned by the three channels, and finally sends the combined features to a Softmax layer for classification. Compared with traditional algorithms, the MC-CNN model performs better, but it was only evaluated on two benchmark UCR data sets and could not demonstrate superior performance. A multi-scale convolutional neural network (MCNN) was proposed in the literature [Z. Cui, W. Chen, Y. Chen, Multi-scale convolutional neural networks for time series classification [J], arXiv preprint arXiv:1603.06995, 2016]; its most notable advantage is that several branch transformation layers are set up to preprocess the data in the time domain and the frequency domain respectively, and several local convolution layers are established to automatically extract features of different sizes and frequencies, giving an outstanding feature representation. MCNN was evaluated on 44 UCR benchmark data sets and showed better experimental results on 10 of them, but it requires more preprocessing and hyper-parameter settings. Because the MC-CNN and MCNN models require a large amount of preprocessing before training, the learning rate must be set manually, and a fully connected layer is used to flatten the features before the network output layer, the number of parameters to be learned increases significantly. The network models proposed in the literature [Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, Neural Networks (IJCNN), 2017 International Joint Conference on, IEEE, 2017, 1578-1585] and [F. Karim, S. Majumdar, H. Darabi, LSTM fully convolutional networks for time series classification [J], IEEE Access, 2018, 6: 1662-1669] all use a global average pooling layer instead of a fully connected layer before the output layer, which reduces the network parameters, and use an adaptive optimizer to train the loss function, which avoids setting the learning rate manually. The former reference proposes benchmark neural network models in which the fully convolutional network (FCN) and the residual network (ResNet) achieve true end-to-end learning without cumbersome preprocessing work, and showed better performance on 18 of the 44 UCR data sets in the experiments.
To further improve network performance, a combined network model (LSTM-FCN) that joins a long short-term memory (LSTM) network and a fully convolutional network to learn time series features was proposed in the literature [F. Karim, S. Majumdar, H. Darabi, LSTM fully convolutional networks for time series classification [J], IEEE Access, 2018, 6: 1662-1669], and an attention mechanism was further introduced on the basis of LSTM-FCN to obtain the ALSTM-FCN model. The ALSTM-FCN model outperforms existing methods on 51 of the 85 UCR data sets. However, the ALSTM-FCN model has to learn more parameters in the training phase, which obviously reduces the learning efficiency. The prior literature also reports the use of long short-term memory networks for the classification of medical and industrial data, but the improvement in detection accuracy is not significant.
Most existing algorithms are aimed at the situation where the scale of the minority class is comparable to that of the majority class. For a skewed time series data set, i.e., a data set in which the numbers of minority-class and majority-class samples differ severely, the features of the minority-class samples are learned insufficiently, so the classification accuracy drops sharply. Existing solutions are data-based sampling methods and algorithm-based methods. Data-based sampling methods usually preprocess the data set, including down-sampling the majority class and over-sampling the minority class. The former balances the data set by randomly deleting samples from the majority class, with the disadvantage of losing valuable information. The latter randomly generates samples that did not originally exist in order to adjust the data distribution; because the original structure of the time series data is changed, this usually causes an over-fitting problem in the final result. The algorithm-based method is the threshold-moving method, in which the decision threshold of the classifier is adjusted by experiment or set manually, and the cost of finding a suitable threshold is very high. For severely skewed data sets the existing processing methods therefore have many problems and seriously affect the detection performance of the classifier, which in turn hinders the development of data analysis technology in fields such as industry, finance, medicine and the military.
Disclosure of Invention
The invention aims to provide a skew time series anomaly detection method based on a cost-sensitive hybrid network, which solves the problem in the prior art that the detection accuracy of minority-class samples in a skewed time series data set is low because existing algorithms learn the features of minority-class samples insufficiently and the classification accuracy is therefore severely reduced.
The technical scheme adopted by the invention is a skew time series anomaly detection method based on a cost-sensitive hybrid network. First, a cost-sensitive hybrid network model consisting of a deep convolutional neural network (DCNN), a gated recurrent network (GRU) and a cost-sensitive loss function is established and trained: the local features of the time series are learned by the deep convolutional neural network DCNN, the sequence features of the time series are learned by the gated recurrent network GRU, and the combined features are classified by a Softmax classifier. During model training, the similarity between the output result and the true value is measured with the cost-sensitive loss function, the parameters of the network model are then adjusted by the back-propagation algorithm, and different penalty factors are used for the skewed classes to penalize misdetections of the network model.
The invention is also characterized in that:
the method specifically comprises the following steps:
step 1, integrating a deep convolutional neural network (DCNN) and a gated recurrent network (GRU) containing 128 cell units, introducing a cost-sensitive loss function, and constructing the cost-sensitive hybrid network model CSHN;
step 1.1, learning local features of a time sequence by using a Deep Convolutional Neural Network (DCNN) composed of three convolutional layers, wherein each convolutional layer comprises convolution operation and batch normalization operation, and a global average pooling layer is introduced into an output layer and used for reducing feature dimensions;
step 1.2, the sequence features of the time series are learned through the gated recurrent network GRU. The gated recurrent network GRU consists of a reset gate p_s and an update gate q_s. X denotes a time-series data sample, g_s denotes the amount of output information at time s, and g̃_s denotes the hidden state at time s; the inputs of the memory unit at time s are g_{s-1} and X. The reset gate p_s controls how much of the output value g_{s-1} of the previous moment flows into the current hidden state g̃_s; the reset gate is mapped into [0,1] by activation function one, and the hidden state g̃_s is mapped into the range [-1,1] by activation function two. The mathematical expressions are as follows:
p_s = σ(K_p · [g_{s-1}, X]) (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X]) (7)
wherein K_p is the weight matrix of the reset gate, [g_{s-1}, X] denotes connecting the two input vectors g_{s-1} and X into one long vector, σ is activation function one, and ⊙ denotes element-wise multiplication;
the update gate q_s determines the degree to which the output information g_{s-1} of time s-1 is brought into the output information g_s at time s. The value of the update gate q_s lies in [0,1]; the larger the value, the less the output information g_{s-1} of the previous moment is brought into the current output information g_s. The mathematical expressions are as follows:
q_s = σ(K_q · [g_{s-1}, X_s]) (8)
g_s = (1 - q_s) ⊙ g_{s-1} + q_s ⊙ g̃_s (9)
wherein K_q is the weight matrix of the update gate, [g_{s-1}, X_s] denotes connecting the two input vectors g_{s-1} and X_s into one long vector, and σ is activation function one;
step 1.3, in the training process of the cost-sensitive hybrid network model, the similarity between the output result and the true value is measured by the cost-sensitive loss function, whose expression is:
f(K, b) = -(1/N) Σ_{j=1}^{N} [ η · l_j · log σ_{K,b}(X_j) + ν · (1 - l_j) · log(1 - σ_{K,b}(X_j)) ] (11)
wherein l_j denotes the true label of the j-th training sample, X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameter, b denotes the bias, and N denotes the total number of samples; η and ν denote the penalty factors for the cases where minority-class samples and majority-class samples are misclassified, respectively. When a minority-class sample is misdetected, the loss is multiplied by the larger penalty factor η, so that the total loss is amplified; when a majority-class sample is misdetected, the loss is multiplied by the smaller penalty factor ν. η and ν are calculated as follows:
η = N / (n_classes · n_abnormal_total),  ν = N / (n_classes · n_normal_total) (12)
where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes; in the present invention n_classes = 2;
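For illustration only, a minimal NumPy sketch of formulas (11) and (12) is given below; the function and variable names are chosen for this sketch and are not part of the claimed method.

```python
import numpy as np

def penalty_factors(labels, n_classes=2):
    """Penalty factors of formula (12): eta for the abnormal (minority) class,
    nu for the normal (majority) class."""
    labels = np.asarray(labels)
    n_total = labels.size
    n_abnormal = int(np.sum(labels == 1))
    n_normal = n_total - n_abnormal
    eta = n_total / (n_classes * n_abnormal)
    nu = n_total / (n_classes * n_normal)
    return eta, nu

def cost_sensitive_loss(y_true, p_pred, eta, nu, eps=1e-12):
    """Cost-sensitive cross-entropy of formula (11).
    y_true: 0/1 labels (1 = abnormal/minority); p_pred: predicted probability of class 1."""
    p_pred = np.clip(p_pred, eps, 1.0 - eps)
    per_sample = eta * y_true * np.log(p_pred) + nu * (1 - y_true) * np.log(1 - p_pred)
    return -np.mean(per_sample)

# example: a skewed batch with 1 abnormal sample out of 10
y = np.array([0] * 9 + [1])
p = np.array([0.1] * 9 + [0.4])
eta, nu = penalty_factors(y)        # eta = 5.0, nu ≈ 0.556
print(cost_sensitive_loss(y, p, eta, nu))
```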
Step 2, a skew time series data anomaly detection algorithm based on the cost sensitive hybrid network model:
the algorithm is mainly divided into three stages: the first stage is the data preprocessing stage; the second stage is the time series feature learning stage, which mainly includes the local feature learning of the time series based on the deep convolutional neural network DCNN of step 1 and the sequence feature learning of the time series based on the gated recurrent network GRU of step 1; the third stage is the anomaly detection stage;
step 2.1, preprocessing data mainly comprises normalization operation and time slicing operation;
step 2.2, feature learning of the time series: 80% of the time-series data, L_train_set, is input as training samples into the cost-sensitive hybrid network model constructed in step 1 to learn the features of the time series; part of the training samples is used for cross-validation, and the model parameters are updated with the back-propagation algorithm throughout the training and learning process. The specific process of feature learning comprises: local feature learning of the time series based on the deep convolutional neural network DCNN of step 1, sequence feature learning of the time series based on the gated recurrent network GRU of step 1, classification with the Softmax classifier to obtain the probability value output by the cost-sensitive hybrid network model, and measuring the similarity between the predicted value and the true value with the cost-sensitive loss function to update the parameters;
step 2.3, anomaly detection phase
The test data are tested with the cost-sensitive hybrid network model trained in step 2.2, taking the remaining 20% of the time-series data, L_test_set, as the test samples. Let Φ(L_r; K, b) denote the cost-sensitive hybrid network model, L_r ∈ L_test_set; the mathematical expressions are:
P_nclass(L_r) = Φ(L_r; K*, b*)
l_r_label = argmax_{nclass ∈ {0,1}} P_nclass(L_r)
wherein P_nclass(L_r) is the probability value predicted by Φ(L_r; K, b), l_r_label is the predicted label of the sample, and K* and b* are the parameters obtained in the learning process.
Step 1.1 specifically comprises the following steps:
step 1.1.1, convolution operation
Define K_{u,v}^d as the convolution kernel between the u-th channel in layer d and the v-th channel in layer d-1, and Z_v^{d-1} as the output value of the v-th channel of the sample in layer d-1. The local features of the time series are learned by the convolution operation:
Z_u^d = Σ_{v=1}^{V} K_{u,v}^d ⊗ Z_v^{d-1} + b_u^d (1)
wherein Z_u^d denotes the output value of the u-th channel in layer d, b_u^d denotes the bias of the u-th channel in layer d, ⊗ denotes the convolution operation, and V denotes the number of convolution kernels in the previous layer;
step 1.1.2, batch normalization operation
For an input time-series sample X = {x_1, x_2, …, x_z}, the batch normalization operation is expressed as follows:
x̂_i = (x_i - μ) / sqrt(σ_B² + τ) (2)
y_i = γ · x̂_i + β (3)
wherein x̂_i is the standard normalized value, μ and σ_B² are the mean and variance of the batch, τ is a constant used to ensure that the denominator is greater than 0, γ represents the data scale change, β represents the data offset, and y_i represents the value after the batch normalization operation;
step 1.1.3, global average pooling layer
The global average pooling layer performs the average pooling operation on the feature vectors obtained from the previous convolutional layer:
a_u = K_GAP ⊗ X_u (4)
A = {a_1, a_2, …, a_U} (5)
wherein X_u denotes the feature vector of the u-th channel after the last convolution layer, K_GAP denotes the global average pooling matrix, U denotes the dimension of the output feature vector, and A combines the output value a_u of each channel as the final output vector.
In step 1.2, activation function one is the Sigmoid activation function and activation function two is the tanh activation function;
the reset gate p_s is mapped into [0,1] by the Sigmoid activation function, and the hidden state g̃_s is mapped into the range [-1,1] by the tanh activation function; the mathematical expressions are as follows:
p_s = σ(K_p · [g_{s-1}, X]) (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X]) (7)
wherein K_p is the weight matrix of the reset gate, [g_{s-1}, X] denotes connecting the two input vectors g_{s-1} and X into one long vector, σ is the Sigmoid activation function, and K_g̃ is the weight matrix used to compute the hidden state.
The specific steps of step 2.1 are as follows:
step 2.1.1, data normalization processing
X{t_m(x_m, l_m)} (m = 1, 2, …, M) denotes the time-series data set, where t_m(x_m, l_m) denotes a time-series sample, x_m denotes the signal value of the m-th sample, l_m denotes the label of the m-th sample, l_m is 0 or 1, and M denotes the total number of samples. The normalization is expressed as follows:
x̃_m = (x_m - x_min) / (x_max - x_min) (15)
wherein x_max and x_min denote the maximum and minimum signal values in the data set, and X̃{t_m(x̃_m, l_m)} denotes the normalized time-series data set;
step 2.1.2, time slicing
A sliding window is used to divide the long time-series data X{t_m(x_m, l_m)} (m = 1, 2, …, M) into short overlapping segments. A window function window(·) of length w is taken and moved with step size h, and the normalized data X̃ obtained in step 2.1.1 is divided into R segments, each segment L_r having length w; the expression is as follows:
L_r = window(X̃, w, h), r = 1, 2, …, R (16)
wherein L_r denotes the r-th segment, w is set to half the period of the time-series data, R denotes the total number of segments, and M denotes the total number of samples.
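The preprocessing of step 2.1 can be sketched as follows; this is a simplified illustration in which the helper names, the example signal, the values of w and h, and the rule that a segment is labeled abnormal when it contains at least one abnormal point are assumptions of the sketch.

```python
import numpy as np

def normalize(x):
    """Min-max normalization of the signal values to [0, 1] (step 2.1.1)."""
    return (x - x.min()) / (x.max() - x.min())

def time_slice(x, labels, w, h):
    """Slide a window of length w with step h over the long series (step 2.1.2).
    A segment is labeled abnormal (1) if it contains at least one abnormal point."""
    segments, seg_labels = [], []
    for start in range(0, len(x) - w + 1, h):
        segments.append(x[start:start + w])
        seg_labels.append(int(labels[start:start + w].max()))
    return np.asarray(segments), np.asarray(seg_labels)

# example: a noisy sine wave with a short injected anomaly
t = np.linspace(0, 40 * np.pi, 4000)
signal = np.sin(t) + 0.05 * np.random.randn(t.size)
labels = np.zeros(t.size, dtype=int)
signal[1500:1510] += 3.0; labels[1500:1510] = 1      # injected anomaly
segments, seg_labels = time_slice(normalize(signal), labels, w=100, h=20)
```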
The specific steps of step 2.2 are as follows:
step 2.2.1, deep convolutional neural network feature learning: the local features of the time series are learned by the deep convolutional neural network DCNN of step 1; the hidden layer of the convolutional network consists of three convolutional layers, each of which comprises three processing operations. The specific flow is as follows.
Conv1 layer: assume that the Conv1 layer has e_1 convolution kernels of size k_1, K^1 = {K^1_1, …, K^1_{e_1}}, and take e_1 = 32, k_1 = 8. The sample L_r (L_r ∈ L_train_set) is convolved with the convolution kernels K^1 to obtain e_1 feature vectors C^1 of length w - 7; the final output Z^1 of the Conv1 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^1 = K^1 ⊗ L_r + b^1
B^1 = BN(C^1)
Z^1 = LeakyReLU(B^1)
wherein b^1 denotes the bias of the Conv1 layer and ⊗ denotes the convolution operation;
Conv2 layer: assume that the Conv2 layer has e_2 convolution kernels of size k_2, K^2 = {K^2_1, …, K^2_{e_2}}, and take e_2 = 64, k_2 = 5. The feature vectors Z^1 obtained from the Conv1 layer are convolved with the convolution kernels K^2 to generate e_2 feature vectors C^2 of length w - 11; the final output Z^2 of the Conv2 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^2 = K^2 ⊗ Z^1 + b^2
B^2 = BN(C^2)
Z^2 = LeakyReLU(B^2)
wherein b^2 denotes the bias of the Conv2 layer;
Conv3 layer: assume that the Conv3 layer has e_3 convolution kernels of size k_3, K^3 = {K^3_1, …, K^3_{e_3}}, and take e_3 = 128, k_3 = 3. The feature vectors Z^2 obtained from the Conv2 layer are convolved with the convolution kernels K^3 to generate 128 feature vectors C^3 of length w - 13; the final output Z^3 of the Conv3 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^3 = K^3 ⊗ Z^2 + b^3
B^3 = BN(C^3)
Z^3 = LeakyReLU(B^3)
wherein b^3 denotes the bias of the Conv3 layer.
GAP layer: for the feature vectors Z^3 output by the Conv3 layer, a convolution kernel K_GAP with the same dimension as Z^3 is used to perform the convolution operation with Z^3, generating a 128-dimensional feature vector F_DCNN:
a_u = K_GAP ⊗ Z^3_u, u = 1, 2, …, 128
F_DCNN = {a_1, a_2, …, a_128}
wherein a_u denotes each component value of the feature vector F_DCNN finally learned by the deep convolutional neural network;
step 2.2.2, gated recurrent network feature learning: for an input time-series sample L_r (L_r ∈ L_train_set), the sequence features are learned with the gated recurrent network GRU containing 128 cell units, and the feature vector F_GRU finally output by the gated recurrent network is obtained:
F_GRU = F_GRU(K_p, K_q; L_r)
wherein K_p and K_q denote the weight matrices of the reset gate and the update gate, respectively, and F_GRU(·) denotes the mapping function of the GRU network;
step 2.2.3, output of the cost-sensitive hybrid network model: for the input time-series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) using a Softmax classifier, where nclass = 0, 1; nclass = 0 indicates that L_r belongs to the majority class, and nclass = 1 indicates that L_r belongs to the minority class. This process is expressed as follows:
P_nclass(L_r) = Softmax(concat(F_DCNN, F_GRU))
wherein F_DCNN denotes the feature vector output by the convolutional network, F_GRU denotes the feature vector output by the GRU network, and the function concat(·) splices the feature vectors F_DCNN and F_GRU into one long vector;
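As a sketch of the architecture described in steps 2.2.1-2.2.3 (three Conv1D blocks with batch normalization, LeakyReLU and global average pooling, in parallel with a 128-unit GRU, concatenated and fed to a Softmax layer), a Keras definition is given below; the kernel counts and sizes follow the description above, while the function name and the example input length are assumptions of this sketch.

```python
from tensorflow.keras import layers, models

def build_cshn(w):
    """Cost-sensitive hybrid network: DCNN branch + GRU branch + Softmax output."""
    inputs = layers.Input(shape=(w, 1))

    # DCNN branch: three convolutional layers, each with convolution, BN and LeakyReLU
    x = layers.Conv1D(32, 8)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv1D(64, 5)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv1D(128, 3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    f_dcnn = layers.GlobalAveragePooling1D()(x)      # 128-dimensional F_DCNN

    # GRU branch: 128 cell units learning the sequence features, F_GRU
    f_gru = layers.GRU(128)(inputs)

    # concatenate both feature vectors and classify with Softmax
    merged = layers.concatenate([f_dcnn, f_gru])
    outputs = layers.Dense(2, activation="softmax")(merged)
    return models.Model(inputs, outputs)

model = build_cshn(w=100)
model.summary()
```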
step 2.2.4, updating the parameters with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured by the cost-sensitive loss function of formula (11) in step 1.3, with weight K and bias b. A learning rate of 0.001 with a gradient-decay mechanism every 200 segments is adopted, 40% of the training samples are used for cross-validation, and the weight K and the bias b are updated through the back-propagation mechanism of the Adam optimization algorithm;
the final weight K and bias b are related to the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η is used to enlarge the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν is used to control the increase of the total loss;
the proposed cost-sensitive loss function is generalized to the multi-class case, where the penalty factor for multi-class skewed data samples is as shown in equation (28):
η_c = N / (n_classes · n_c_total) (28)
wherein N is the total number of samples, n_c_total is the total number of samples of class c, and η_c is the penalty factor corresponding to class c, c = {1, 2, …, n_classes}.
The invention has the beneficial effects that:
the invention provides a skew time series data anomaly detection algorithm based on a cost-sensitive hybrid network model, wherein the cost-sensitive hybrid network model integrates the characteristics that a DCNN has strong local feature learning capability and a GRU has good sequence feature learning capability. The cost-sensitive hybrid network model has stronger nonlinear representation performance, is an end-to-end network model, and avoids a complex data preprocessing process. According to the invention, a cost sensitive loss function is introduced into the CSHN network model, and parameters of the network model are adjusted by adopting different penalty loss factors aiming at different types of samples, so that the problem of insufficient feature learning of a few types of samples is solved. The invention solves the problem of insufficient learning of a few types of samples in the prior art, and avoids the problems that the sampling method changes the structure of data, the threshold in the threshold moving method is difficult to determine and the like. The method is simple and efficient, high in precision and strong in robustness. The method has higher detection precision on the skewed time series data set and the non-skewed time series data set.
Drawings
FIG. 1 is a schematic diagram of a time series anomaly detection algorithm in the skew time series anomaly detection method based on the cost-sensitive hybrid network according to the present invention;
FIG. 2 is a schematic diagram of a cost-sensitive hybrid network model in the skew time series anomaly detection method based on the cost-sensitive hybrid network of the present invention;
FIG. 3 is a schematic diagram of a full connection layer in the skew time series anomaly detection method based on a cost-sensitive hybrid network according to the present invention;
FIG. 4 is a schematic diagram of a global average pooling layer in the skew time series anomaly detection method based on the cost-sensitive hybrid network according to the present invention;
FIG. 5 is a schematic diagram of a GRU network structure in the method for detecting the skew time series anomaly based on the cost-sensitive hybrid network according to the present invention;
FIG. 6 is a comparison of F-measure for different models on Dataset Dataset 1;
FIG. 7 is a comparison of F-measure for different models on Dataset Dataset 2;
FIG. 8 (a) is a graph comparing the loss variation of four networks on Dataset Dataset 1;
fig. 8 (b) is a graph comparing the loss variation of the four networks on Dataset 2.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1 and FIG. 2, in the skew time series anomaly detection method based on a cost-sensitive hybrid network of the present invention, a cost-sensitive hybrid network model consisting of a deep convolutional neural network DCNN, a gated recurrent network GRU containing 128 cell units, and a cost-sensitive loss function is first established and trained. The local features of the time series are learned by the deep convolutional neural network DCNN, the sequence features of the time series are learned by the gated recurrent network GRU, and the combined features are classified by a Softmax classifier. During model training, the similarity between the output result and the true value is measured with the cost-sensitive loss function, the parameters of the network model are then adjusted by the back-propagation algorithm, and different penalty factors are used for samples of different numbers and categories to penalize misdetections of the network model. By introducing the BN operation and the GAP operation, the over-fitting problem that a general neural network suffers from due to its many training parameters is avoided, and the training speed of the model is improved.
The method for detecting the skew time series abnormity based on the cost sensitive hybrid network specifically comprises the following steps:
step 1, integrating a deep convolutional neural network DCNN and a gated recurrent network GRU containing 128 cell units, introducing a cost-sensitive loss function, and constructing the cost-sensitive hybrid network model CSHN;
step 1.1, learning local features of a time sequence by using a Deep Convolutional Neural Network (DCNN) composed of three convolutional layers, wherein each convolutional layer comprises convolution operation and batch normalization operation, and a global average pooling layer is introduced into an output layer and is used for reducing feature dimensions;
step 1.1.1, convolution operation
The purpose of the convolution operation is to learn the local features of the sample. Define K_{u,v}^d as the convolution kernel between the u-th channel in layer d and the v-th channel in layer d-1, and Z_v^{d-1} as the output value of the v-th channel of the sample in layer d-1. The local features of the time series are learned by the convolution operation:
Z_u^d = Σ_{v=1}^{V} K_{u,v}^d ⊗ Z_v^{d-1} + b_u^d (1)
wherein Z_u^d denotes the output value of the u-th channel in layer d, b_u^d denotes the bias of the u-th channel in layer d, ⊗ denotes the convolution operation, and V denotes the number of convolution kernels in the previous layer;
step 1.1.2, batch normalization operation
Batch normalization makes the intermediate output values of each layer tend to be stable and thus solves the problem of unstable data distribution during training. The data are therefore normalized before being input to each layer, which not only improves the stability of the network but also improves its generalization capability.
For an input time-series sample X = {x_1, x_2, …, x_z}, the batch normalization operation is expressed as follows:
x̂_i = (x_i - μ) / sqrt(σ_B² + τ) (2)
y_i = γ · x̂_i + β (3)
wherein x̂_i is the standard normalized value, μ and σ_B² are the mean and variance of the batch, τ is a constant used to ensure that the denominator is greater than 0, γ represents the data scale change, β represents the data offset, and y_i represents the value after the batch normalization operation;
step 1.1.3, global average pooling layer
To reduce the dimensionality of the feature vectors obtained by the last convolution layer, a general convolutional network usually contains one or more fully connected layers near the output layer, as shown in FIG. 3. In the method of the present invention, a global average pooling layer GAP is used instead of the fully connected layer, as shown in FIG. 4; this reduces not only the dimensionality of the feature vectors but also the parameters of the network. The global average pooling layer performs the average pooling operation on the feature vectors obtained from the previous convolutional layer:
a_u = K_GAP ⊗ X_u (4)
A = {a_1, a_2, …, a_U} (5)
wherein X_u denotes the feature vector of the u-th channel after the last convolution layer, K_GAP denotes the global average pooling matrix, U denotes the dimension of the output feature vector, and A combines the output value a_u of each channel as the final output vector.
Step 1.2, the sequence features of the time series are learned through the gated recurrent network GRU. The gated recurrent network GRU consists of a reset gate p_s and an update gate q_s, where the update gate is obtained by merging the forget gate and the input gate of the LSTM network structure. The structure of the GRU network is shown in FIG. 5: X denotes a time-series data sample, g_s denotes the amount of output information at time s, and g̃_s denotes the hidden state at time s; the inputs of the memory unit at time s are g_{s-1} and X. The reset gate p_s controls how much of the output value g_{s-1} of the previous moment flows into the current hidden state g̃_s; the reset gate is mapped into [0,1] by activation function one, and a smaller value of the reset gate indicates that less information of the previous moment flows into the current hidden state; the hidden state g̃_s is mapped into the range [-1,1] by activation function two. The mathematical expressions are as follows:
p_s = σ(K_p · [g_{s-1}, X]) (6)
g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X]) (7)
wherein K_p is the weight matrix of the reset gate, [g_{s-1}, X] denotes connecting the two input vectors g_{s-1} and X into one long vector, σ is activation function one, and ⊙ denotes element-wise multiplication;
the update gate q_s determines the degree to which the output information g_{s-1} of time s-1 is brought into the output information g_s at time s. The value of the update gate q_s lies in [0,1]; the larger the value, the less the output information g_{s-1} of the previous moment is brought into the current output information g_s. The mathematical expressions are as follows:
q_s = σ(K_q · [g_{s-1}, X_s]) (8)
g_s = (1 - q_s) ⊙ g_{s-1} + q_s ⊙ g̃_s (9)
wherein K_q is the weight matrix of the update gate, [g_{s-1}, X_s] denotes connecting the two input vectors g_{s-1} and X_s into one long vector, and σ is activation function one;
in step 1.2, activation function one is the Sigmoid activation function and activation function two is the tanh activation function; the reset gate p_s is mapped into [0,1] by the Sigmoid activation function, the hidden state g̃_s is mapped into [-1,1] by the tanh activation function, and K_g̃ is the weight matrix used to compute the hidden state.
Step 1.3. In general, during model training a cross-entropy loss function is used to measure the similarity between the true value and the predicted value; its expression is as follows:
f_1(K, b) = -(1/N) Σ_{j=1}^{N} [ l_j · log σ_{K,b}(X_j) + (1 - l_j) · log(1 - σ_{K,b}(X_j)) ] (10)
wherein l_j denotes the true label of the j-th training sample, X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameter, b denotes the bias, and N denotes the total number of samples.
In general, the smaller the total loss f_1(K, b), the better the learning effect of the model. When the data distribution is severely skewed, the network model cannot obtain a sufficient feature representation of the minority class with the general cross-entropy loss function, so the detection accuracy of the minority class is severely affected. The reason is that the general cross-entropy loss function applies the same penalty factor to the losses of minority-class samples and majority-class samples.
To solve this problem, in the training process of the cost-sensitive hybrid network model the similarity between the output result and the true value is measured by a cost-sensitive loss function, whose expression is:
f(K, b) = -(1/N) Σ_{j=1}^{N} [ η · l_j · log σ_{K,b}(X_j) + ν · (1 - l_j) · log(1 - σ_{K,b}(X_j)) ] (11)
wherein l_j denotes the true label of the j-th training sample, X_j denotes the j-th input time-series sample, σ_{K,b}(X_j) denotes the probability value output by the model, K denotes the weight parameter, b denotes the bias, and N denotes the total number of samples; η and ν denote the penalty factors for the cases where minority-class samples and majority-class samples are misclassified, respectively. When a minority-class sample is misdetected, the loss is multiplied by the larger penalty factor η, so that the total loss is amplified; when a majority-class sample is misdetected, the loss is multiplied by the smaller penalty factor ν (since the total number of majority-class samples is larger, their total loss is already large). η and ν are calculated as follows:
η = N / (n_classes · n_abnormal_total),  ν = N / (n_classes · n_normal_total) (12)
where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes; in the present invention n_classes = 2;
Step 2, a skew time series data anomaly detection algorithm based on the cost sensitive hybrid network model:
the specific algorithm framework is shown in FIG. 1, and the algorithm mainly comprises three stages: the first stage is the data preprocessing stage; the second stage is the time series feature learning stage, which mainly includes the local feature learning of the time series based on the deep convolutional neural network DCNN of step 1 and the sequence feature learning of the time series based on the gated recurrent network GRU of step 1; the third stage is the anomaly detection stage;
step 2.1, preprocessing data mainly comprises normalization operation and time slicing operation;
step 2.1.1, data normalization processing
X{t_m(x_m, l_m)} (m = 1, 2, …, M) denotes the time-series data set, where t_m(x_m, l_m) denotes a time-series sample, x_m denotes the signal value of the m-th sample, l_m denotes the label of the m-th sample, l_m is 0 or 1, and M denotes the total number of samples. The normalization is expressed as follows:
x̃_m = (x_m - x_min) / (x_max - x_min) (15)
wherein x_max and x_min denote the maximum and minimum signal values in the data set, and X̃{t_m(x̃_m, l_m)} denotes the normalized time-series data set;
step 2.1.2, time slicing
A sliding window is used to divide the long time-series data X{t_m(x_m, l_m)} (m = 1, 2, …, M) into short overlapping segments. A window function window(·) of length w is taken and moved with step size h, and the normalized data X̃ obtained in step 2.1.1 is divided into R segments, each segment L_r having length w; the expression is as follows:
L_r = window(X̃, w, h), r = 1, 2, …, R (16)
wherein L_r denotes the r-th segment, w is set to half the period of the time-series data, R denotes the total number of segments, and M denotes the total number of samples.
Step 2.2, local feature learning of time series: data of 80% in the time series data
Figure BDA0002375933850000193
Inputting the training samples into the local features of the learning time sequence in the cost-sensitive hybrid network model constructed in the step 1, simultaneously performing cross verification by using part of the training samples, and updating model parameters by adopting a back propagation algorithm in the whole training and learning process; the specific process of feature learning comprises the following steps: based on the local feature learning of the time sequence of the deep convolutional neural network DCNN in the step 1, based on the local feature learning of the time sequence of the gated recursive network GRU in the step 1, obtaining a probability value output by the cost-sensitive hybrid network model by using a Softmax classifier to classify, and measuring the similarity between a predicted value and a true value by using the cost-sensitive loss function to update parameters;
step 2.2.1, deep convolutional neural network feature learning: the local features of the time series are learned by the deep convolutional neural network DCNN of step 1; the hidden layer of the convolutional network consists of three convolutional layers, each of which comprises three processing operations. The specific flow is as follows.
Conv1 layer: assume that the Conv1 layer has e_1 convolution kernels of size k_1, K^1 = {K^1_1, …, K^1_{e_1}}, and take e_1 = 32, k_1 = 8. The sample L_r (L_r ∈ L_train_set) is convolved with the convolution kernels K^1 to obtain e_1 feature vectors C^1 of length w - 7; the final output Z^1 of the Conv1 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^1 = K^1 ⊗ L_r + b^1
B^1 = BN(C^1)
Z^1 = LeakyReLU(B^1)
wherein b^1 denotes the bias of the Conv1 layer and ⊗ denotes the convolution operation;
Conv2 layer: assume that the Conv2 layer has e_2 convolution kernels of size k_2, K^2 = {K^2_1, …, K^2_{e_2}}, and take e_2 = 64, k_2 = 5. The feature vectors Z^1 obtained from the Conv1 layer are convolved with the convolution kernels K^2 to generate e_2 feature vectors C^2 of length w - 11; the final output Z^2 of the Conv2 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^2 = K^2 ⊗ Z^1 + b^2
B^2 = BN(C^2)
Z^2 = LeakyReLU(B^2)
wherein b^2 denotes the bias of the Conv2 layer;
Conv3 layer: assume that the Conv3 layer has e_3 convolution kernels of size k_3, K^3 = {K^3_1, …, K^3_{e_3}}, and take e_3 = 128, k_3 = 3. The feature vectors Z^2 obtained from the Conv2 layer are convolved with the convolution kernels K^3 to generate 128 feature vectors C^3 of length w - 13; the final output Z^3 of the Conv3 layer is then obtained through the BN operation and the activation function LeakyReLU. This process is expressed as follows:
C^3 = K^3 ⊗ Z^2 + b^3
B^3 = BN(C^3)
Z^3 = LeakyReLU(B^3)
wherein b^3 denotes the bias of the Conv3 layer.
GAP layer: for the feature vectors Z^3 output by the Conv3 layer, a convolution kernel K_GAP with the same dimension as Z^3 is used to perform the convolution operation with Z^3, generating a 128-dimensional feature vector F_DCNN:
a_u = K_GAP ⊗ Z^3_u, u = 1, 2, …, 128
F_DCNN = {a_1, a_2, …, a_128}
wherein a_u denotes each component value of the feature vector F_DCNN finally learned by the deep convolutional neural network;
step 2.2.2, gated recurrent network feature learning: for an input time-series sample L_r (L_r ∈ L_train_set), the sequence features are learned with the gated recurrent network GRU containing 128 cell units, and the feature vector F_GRU finally output by the gated recurrent network is obtained:
F_GRU = F_GRU(K_p, K_q; L_r)
wherein K_p and K_q denote the weight matrices of the reset gate and the update gate, respectively, and F_GRU(·) denotes the mapping function of the GRU network;
step 2.2.3, output of the cost-sensitive hybrid network model: for the input time-series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) using a Softmax classifier, where nclass = 0, 1; nclass = 0 indicates that L_r belongs to the majority class, and nclass = 1 indicates that L_r belongs to the minority class. This process is expressed as follows:
P_nclass(L_r) = Softmax(concat(F_DCNN, F_GRU))
wherein F_DCNN denotes the feature vector output by the convolutional network, F_GRU denotes the feature vector output by the GRU network, and the function concat(·) splices the feature vectors F_DCNN and F_GRU into one long vector;
step 2.2.4, updating the parameters with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured by the cost-sensitive loss function of formula (11) in step 1.3, with weight K and bias b. A learning rate of 0.001 with a gradient-decay mechanism every 200 segments is adopted, 40% of the training samples are used for cross-validation, and the weight K and the bias b are updated through the back-propagation mechanism of the Adam optimization algorithm;
the final weight K and bias b are related to the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η is used to enlarge the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν is used to control the increase of the total loss;
the proposed cost-sensitive loss function is generalized to the multi-class case, where the penalty factor for multi-class skewed data samples is as shown in equation (28):
η_c = N / (n_classes · n_c_total) (28)
wherein N is the total number of samples, n_c_total is the total number of samples of class c, and η_c is the penalty factor corresponding to class c, c = {1, 2, …, n_classes}.
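A hedged sketch of the training configuration of step 2.2.4 with Keras is given below, reusing `model` from the architecture sketch, `segments` and `seg_labels` from the preprocessing sketch and the `penalty_factors` helper from the loss sketch; the class-weighted cross-entropy used here realizes the per-class penalty factors in the same spirit as the cost-sensitive loss, the epoch and batch-size values are placeholders, and the exact gradient-decay schedule of the invention is not reproduced.

```python
from tensorflow.keras.optimizers import Adam

# penalty factors of formula (12), computed from the training segment labels
eta, nu = penalty_factors(seg_labels)

# Adam optimizer with a learning rate of 0.001, as in step 2.2.4
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# class 0 (majority/normal) is weighted by nu and class 1 (minority/abnormal) by eta,
# so misdetections of the minority class contribute more to the total loss;
# 40% of the training samples are held out for validation
history = model.fit(segments[..., None], seg_labels,
                    epochs=50,
                    batch_size=32,
                    validation_split=0.4,
                    class_weight={0: nu, 1: eta})
```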
Step 2.3, anomaly detection phase
The test data are tested with the cost-sensitive hybrid network model trained in step 2.2, taking the remaining 20% of the time-series data, L_test_set, as the test samples. Let Φ(L_r; K, b) denote the cost-sensitive hybrid network model, L_r ∈ L_test_set; the mathematical expressions are:
P_nclass(L_r) = Φ(L_r; K*, b*)
l_r_label = argmax_{nclass ∈ {0,1}} P_nclass(L_r)
wherein P_nclass(L_r) is the probability value predicted by Φ(L_r; K, b), l_r_label is the predicted label of the sample, and K* and b* are the parameters obtained in the learning process.
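A short sketch of this detection step, reusing the trained `model` from the sketches above; `test_segments` stands for the 20% held-out segments and is a hypothetical name.

```python
import numpy as np

# predict P_nclass(L_r) for every test segment and take the argmax as the label
probs = model.predict(test_segments[..., None])
pred_labels = np.argmax(probs, axis=1)   # 0 = majority/normal class, 1 = minority/abnormal class
```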
Simulation experiment results of the skew time series anomaly detection method based on a cost-sensitive hybrid network of the invention are as follows:
The experiments were carried out on an actual engineering data set and 44 UCR benchmark data sets. The actual engineering data comprise a flywheel rotation speed data set (DataSet1) and a gyroscope temperature data set (DataSet2) of a certain device. The numbers of normal and abnormal values in these data sets differ greatly; the majority class is taken to represent the normal class and the minority class the abnormal class. In the experiments, the ALSTM-FCN network, the ResNet network, the FCN network and the proposed CSHN were implemented with the Keras deep learning package. The SVC (Support Vector Classification), AdaBoost and RFC (Random Forest Classification) algorithms were implemented with the scikit-learn package in Python 3.5.
1. Evaluation index
The performance of the method was evaluated using true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP), which are defined as follows:
TP is the number of positive-class samples detected by the classifier as positive; FN is the number of positive-class samples detected as negative; FP is the number of negative-class samples detected as positive; TN is the number of negative-class samples detected as negative.
In the experiments, ACC+, ACC-, G-means and F-measure are used to evaluate the performance of the algorithm. ACC+ and ACC- denote the detection rates of normal samples and abnormal samples, respectively, and G-means evaluates the detection performance of the algorithm comprehensively. They are defined as follows:
ACC+ = TP / (TP + FN)
ACC- = TN / (TN + FP)
G-means = sqrt(ACC+ · ACC-)
The F-measure is a comprehensive evaluation index that measures the detection performance of the classifier on abnormal samples and is defined by the weighted harmonic mean of Recall and Precision:
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
F-measure = (1 + β²) · Recall · Precision / (β² · Precision + Recall)
where Recall is a measure of completeness (i.e., how many samples of the abnormal class are correctly identified), Precision is a measure of exactness, and β is used to adjust the importance of Precision relative to Recall (typically β = 1).
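The evaluation indexes above can be computed from the four confusion-matrix counts as in the following sketch (β = 1 is assumed, and the example counts are invented for illustration):

```python
import math

def evaluation_indexes(tp, fn, tn, fp, beta=1.0):
    """ACC+, ACC-, G-means and F-measure from the confusion-matrix counts."""
    acc_pos = tp / (tp + fn)                 # detection rate of the positive class
    acc_neg = tn / (tn + fp)                 # detection rate of the negative class
    g_means = math.sqrt(acc_pos * acc_neg)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = (1 + beta**2) * recall * precision / (beta**2 * precision + recall)
    return acc_pos, acc_neg, g_means, f_measure

print(evaluation_indexes(tp=950, fn=50, tn=40, fp=10))
```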
2. CSHN model assessment
In order to verify the effectiveness of the CSHN model of the method, the influence of a cost sensitive loss function and a general cross entropy loss function on the detection precision is compared. For this purpose, experiments were performed combining the proposed hybrid network model (DCNN + GRU) with cost-sensitive loss functions and general cross-entropy loss functions, which were performed on DataSet1 and DataSet2, respectively, and the results are shown in tables 1 and 2.
TABLE 1 Test results on data set DataSet1
TABLE 2 Test results on data set DataSet2
As can be seen from Tables 1 and 2, for the normal samples the ACC+ values vary only within a small range. For the abnormal samples, the ACC- value is only around 78% when the general cross-entropy loss function is used, whereas with the cost-sensitive loss function proposed by the invention the ACC- value increases by about 5% to 10%. The method therefore obviously improves the detection accuracy of the minority class, which means that the proposed cost-sensitive loss function can solve the problem of low minority-class detection accuracy caused by the skewed data distribution. In addition, the G-means and F-measure values show that using the cost-sensitive loss function improves the detection performance.
3. Performance comparison
3.1 Evaluation and comparison of ACC+, ACC- and G-means
In order to evaluate the detection performance of the method, the evaluation indexes on the data sets DataSet1 and DataSet2 were calculated; the results are shown in Tables 3 and 4:
TABLE 3 Detection accuracy of different methods on DataSet1
TABLE 4 Detection accuracy of different methods on DataSet2
As can be seen from Tables 3 and 4, the deep learning based methods are superior to the machine learning based algorithms. For normal samples, the ACC+ detection results of all methods are above 94%. For abnormal samples, the ACC- value of the method of the invention is greater than the ACC- values of the comparison methods. According to the comprehensive G-means evaluation, the detection performance of the method is superior to that of the comparison methods.
3.2 comparison of F-measure detection results
To further investigate the detection performance of the method, Figs. 6 and 7 show the F-measure results of the different methods on the data sets DataSet1 and DataSet2, respectively. In Figs. 6 and 7, the abscissa represents the names of the different methods and the ordinate represents the F-measure values of the different models on DataSet1 and DataSet2, respectively.
As can be seen from Figs. 6 and 7, the deep-learning-based methods outperform the machine-learning-based algorithms, because neural networks have a better ability to characterize non-linear relationships. Among the compared methods, the F-measure of the method of the invention is significantly higher than that of the other methods, which means that its detection performance is superior to that of the comparison methods.
3.3 Evaluation and comparison of convergence rate and stability
For deep neural networks, the training loss reflects the convergence speed and stability of the network model. In terms of training loss, the CSHN model is compared with the deep-learning-based FCN_ALSM, ResNet and FCN models; Figs. 8(a) and 8(b) show the training-loss curves on the data sets DataSet1 and DataSet2, respectively. In Figs. 8(a) and 8(b), the abscissa represents the number of training iterations and the ordinate represents the loss values of the models on DataSet1 and DataSet2, respectively.
Figs. 8(a) and 8(b) show the trend of the training loss on DataSet1 and DataSet2, respectively. It can be seen that the loss of the CSHN model converges faster. On DataSet1, when the number of iterations exceeds 250, the loss of the proposed CSHN model becomes stable and is significantly lower than that of the comparison network models. On DataSet2, when the number of iterations exceeds 120, the loss of the proposed CSHN model becomes stable and is lower than that of the comparison network models. This means that the stability of the model is better than that of the comparison network models.
4. Performance evaluation of UCR public datasets
To further verify the detection performance of the proposed CSHN model, experiments were carried out on 44 balanced UCR data sets. Since the model is tested over multiple data sets, new metrics are needed to evaluate the overall detection performance. For this purpose, the detection performance is evaluated with the accuracy and the mean per-class error (MPCE).
A data pool G = {g_z} is defined, where g_z denotes the z-th data set and C_z denotes the number of classes in g_z. The evaluation indexes are defined as follows:

PCE_z = e_z / C_z

MPCE = (1/Z) × Σ_z PCE_z

where e_z is the error rate on data set g_z, PCE_z is the per-class error of data set g_z, MPCE is the mean of the per-class errors over the data pool G, and Z is the number of data sets in G; in the invention, Z = 44. The experimental results are shown in Table 5:
Table 5. Accuracy and MPCE of different methods on the 44 UCR data sets (table provided as an image in the original document)
In Table 5, the first column gives the names of the 44 UCR data sets, Nclasses gives the number of classes in each data set, and Win gives the number of data sets on which a method achieves the highest accuracy, where the highest accuracy means the best experimental accuracy among the different methods on the same data set.
As can be seen from Table 5, the method provided by the invention achieves a notable detection effect not only on skewed data sets but also on non-skewed data sets.
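For illustration only (not part of the patent text), the following is a minimal sketch of how the MPCE defined above could be computed over a data pool; the function name and the toy numbers are assumptions.

```python
def mpce(error_rates, num_classes):
    """Mean per-class error over a data pool G:
    error_rates[z] is the error rate e_z on data set g_z,
    num_classes[z] is the number of classes C_z in g_z."""
    assert len(error_rates) == len(num_classes)
    pce = [e / c for e, c in zip(error_rates, num_classes)]  # PCE_z = e_z / C_z
    return sum(pce) / len(pce)                               # average over the Z data sets

# Toy example with three data sets.
print(mpce(error_rates=[0.10, 0.04, 0.20], num_classes=[2, 4, 5]))
```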

Claims (5)

1. A method for detecting anomalies in skewed time series based on a cost-sensitive hybrid network, characterized in that a cost-sensitive hybrid network model consisting of a deep convolutional neural network DCNN, a gated recurrent network GRU and a cost-sensitive loss function is first established, wherein the local features of the time series are learned by the deep convolutional neural network DCNN, the sequential features of the time series are learned by the gated recurrent network GRU, the features are then combined and classified by a Soft-max classifier, the similarity between the output result and the true value is measured by the cost-sensitive loss function during model training, the parameters of the network model are adjusted by a back-propagation algorithm, and different penalty factors are applied to the samples of the differently sized classes to penalize the misdetections of the network model; the method specifically comprises the following steps:
Step 1: integrate a deep convolutional neural network DCNN and a gated recurrent network GRU containing 128 cell units, introduce a cost-sensitive loss function, and construct the cost-sensitive hybrid network model CSHN;
Step 1.1: learn the local features of the time series with a deep convolutional neural network DCNN composed of three convolutional layers, each of which comprises a convolution operation and a batch-normalization operation; a global average pooling layer is introduced at the output layer to reduce the feature dimension;
Step 1.2: learn the sequential features of the time series with the gated recurrent network GRU, which is composed of a reset gate p_s and an update gate q_s; X represents a time-series data sample, g_s represents the output information at time s, and g̃_s represents the hidden state at time s; the input of the memory unit at time s is g_{s-1} and X; the reset gate p_s controls how much of the output value g_{s-1} of the previous moment enters the hidden state g̃_s of the current moment; the reset gate is mapped into [0, 1] by activation function one, and the hidden state g̃_s is mapped into [-1, 1] by activation function two; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)

g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

where K_p represents the weight matrix of the reset gate, [g_{s-1}, X] represents the concatenation of the two input vectors g_{s-1} and X into one long vector, and σ is activation function one;

the update gate q_s determines to what degree the output information g_{s-1} of time s-1 is carried into the output information g_s of time s; the value of the update gate q_s lies in [0, 1], and the larger the value, the less of the previous output information g_{s-1} is carried into the current output information g_s; the mathematical expressions are as follows:

q_s = σ(K_q · [g_{s-1}, X_s])    (8)

g_s = (1 - q_s) ⊙ g_{s-1} + q_s ⊙ g̃_s    (9)

where K_q represents the weight matrix of the update gate, [g_{s-1}, X_s] represents the concatenation of the two input vectors g_{s-1} and X_s into one long vector, and σ is activation function one;
Step 1.3: in the training process of the cost-sensitive hybrid network model, measure the similarity between the output result and the true value with the cost-sensitive loss function, whose expression is as follows:
L(K, b) = -(1/N) × Σ_{j=1}^{N} [ η · l_j · log σ_{k,b}(X_j) + ν · (1 - l_j) · log(1 - σ_{k,b}(X_j)) ]    (11)
where l_j represents the true label of the j-th training sample, X_j represents the j-th input time-series sample, σ_{k,b}(X_j) represents the probability value output by the model, K represents the weight parameters, b represents the bias, and N represents the total number of samples; η and ν respectively represent the penalty factors applied when minority-class samples and majority-class samples are misclassified: when a minority-class sample is misdetected, the corresponding term is multiplied by the larger penalty factor η, so that the total loss is amplified; when a majority-class sample is misdetected, it is multiplied by the smaller penalty factor ν; η and ν are calculated as follows:
η = N / (n_classes × n_abnormal_total),    ν = N / (n_classes × n_normal_total)    (12)
where N is the total number of samples, n_normal_total is the number of normal samples, n_abnormal_total is the number of abnormal samples, and n_classes is the number of sample classes, n_classes = 2;
Step 2: anomaly detection algorithm for skewed time-series data based on the cost-sensitive hybrid network model:
the algorithm is mainly divided into three stages: the first stage is the data preprocessing stage; the second stage is the time-series feature learning stage, which mainly comprises the local feature learning of the time series based on the deep convolutional neural network DCNN of step 1 and the sequential feature learning of the time series based on the gated recurrent network GRU of step 1; the third stage is the anomaly detection stage;
Step 2.1: data preprocessing, which mainly comprises a normalization operation and a time-slicing operation;
Step 2.2: feature learning of the time series: 80% of the time-series data, denoted L_train_set, is input as training samples into the cost-sensitive hybrid network model constructed in step 1 to learn the features of the time series, part of the training samples is used for cross-validation at the same time, and the model parameters are updated with the back-propagation algorithm throughout the training and learning process; the specific feature learning process comprises: local feature learning of the time series based on the deep convolutional neural network DCNN of step 1, sequential feature learning of the time series based on the gated recurrent network GRU of step 1, classification with the Softmax classifier to obtain the probability value output by the cost-sensitive hybrid network model, and measurement of the similarity between the predicted value and the true value with the cost-sensitive loss function in order to update the parameters;
Step 2.3: anomaly detection stage
the test data are detected with the cost-sensitive hybrid network model trained in step 2.2, taking the remaining 20% of the time-series data, denoted L_test_set, as test samples; let φ(L_r; K, b) denote the cost-sensitive hybrid network model, with L_r ∈ L_test_set; the mathematical expressions are:

P_nclass(L_r) = φ(L_r; K*, b*)

l_r_label = argmax_nclass P_nclass(L_r)

where P_nclass(L_r) is the probability value predicted by φ(L_r; K, b), l_r_label is the predicted label of the sample, and K*, b* are the parameters obtained in the learning process.
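For illustration only (not part of the claims): the exact loss of step 1.3 is reproduced only as an image in the source, so the weighted binary cross-entropy below is an assumed form consistent with the description, with the penalty factors η and ν derived from the class counts as in the reconstruction of formula (12) above.

```python
import numpy as np

def penalty_factors(n_normal, n_abnormal, n_classes=2):
    """Class-frequency-based penalty factors (assumed form of formula (12)):
    the minority (abnormal) class receives the larger factor eta."""
    n_total = n_normal + n_abnormal
    eta = n_total / (n_classes * n_abnormal)   # minority-class penalty
    nu = n_total / (n_classes * n_normal)      # majority-class penalty
    return eta, nu

def cost_sensitive_loss(y_true, p_pred, eta, nu, eps=1e-12):
    """Weighted binary cross-entropy over N samples: label 1 = abnormal
    (minority class), label 0 = normal (majority class)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    per_sample = -(eta * y_true * np.log(p)
                   + nu * (1.0 - y_true) * np.log(1.0 - p))
    return per_sample.mean()

# Toy example: a data set with 900 normal and 100 abnormal samples.
eta, nu = penalty_factors(n_normal=900, n_abnormal=100)
y = np.array([1, 0, 1, 0])          # true labels
p = np.array([0.2, 0.1, 0.9, 0.4])  # predicted probability of the abnormal class
print(eta, nu, cost_sensitive_loss(y, p, eta, nu))
```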
2. The method for detecting anomalies in skewed time series based on a cost-sensitive hybrid network according to claim 1, wherein step 1.1 specifically comprises the following steps:
Step 1.1.1: convolution operation
define k_{uv}^d as the convolution kernel between the u-th channel in layer d and the v-th channel in layer d-1, and y_v^{d-1} as the output value of the v-th channel of the sample in layer d-1; the local features of the time series are learned by the convolution operation:

y_u^d = Σ_{v=1}^{V} k_{uv}^d ⊗ y_v^{d-1} + b_u^d    (1)

where y_u^d represents the output value of the u-th channel of layer d, b_u^d represents the bias of the u-th channel of layer d, ⊗ represents the convolution operation, and V represents the number of convolution kernels in the previous layer;
Step 1.1.2, batch normalization operation
For an input time series of samples X = { X 1 ,x 2 ,…,x z And expressing the batch normalization operation as follows:
Figure FDA00040520304000000410
Figure FDA00040520304000000411
wherein the content of the first and second substances,
Figure FDA00040520304000000412
Figure FDA00040520304000000413
is a standard normalization value, τ is a constant used to ensure that the denominator is greater than 0, γ represents a data scale change, β represents a data offset, and->
Figure FDA00040520304000000414
Representing a value after a batch normalization operation;
Step 1.1.3: global average pooling layer
the global average pooling layer performs an average pooling operation on the feature vectors obtained from the last convolutional layer, giving:

a_u = X_u ⊗ K_GAP    (4)

A = {a_1, a_2, …, a_U}    (5)

where X_u represents the feature vector of the u-th channel after the last convolutional layer, K_GAP represents the global average pooling matrix, U represents the dimension of the output feature vector, and A represents the final output vector obtained by combining the output values a_u of the channels.
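For illustration only (not part of the claims), the following numpy sketch covers the three building blocks described in claim 2 — a one-dimensional convolution over channels (eq. (1)), batch normalization (eqs. (2)-(3)) and global average pooling (eqs. (4)-(5)) — with assumed shapes and variable names; global average pooling is written directly as the per-channel mean, which is what convolving each channel with an averaging kernel of full length yields.

```python
import numpy as np

def conv1d_layer(y_prev, kernels, bias):
    """One convolutional layer in the sense of eq. (1): y_prev has shape
    (V, L), kernels has shape (U, V, k), bias has shape (U,); returns
    (U, L - k + 1) feature maps (a 'valid' sliding dot product)."""
    U, V, k = kernels.shape
    L_out = y_prev.shape[1] - k + 1
    out = np.zeros((U, L_out))
    for u in range(U):
        for v in range(V):
            for t in range(L_out):
                out[u, t] += np.dot(kernels[u, v], y_prev[v, t:t + k])
        out[u] += bias[u]
    return out

def batch_norm(x, gamma=1.0, beta=0.0, tau=1e-5):
    """Batch normalization in the sense of eqs. (2)-(3), over the last axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + tau) + beta

def global_average_pooling(feature_maps):
    """Global average pooling: one scalar per channel."""
    return feature_maps.mean(axis=-1)

# Toy forward pass: 1 input channel of length 16, 4 kernels of size 8.
x = np.random.randn(1, 16)
kernels = np.random.randn(4, 1, 8)
bias = np.zeros(4)
z = conv1d_layer(x, kernels, bias)
h = batch_norm(z)
h = np.maximum(0.01 * h, h)                # LeakyReLU
print(global_average_pooling(h).shape)     # (4,)
```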
3. The method for detecting anomalies in skewed time series based on a cost-sensitive hybrid network according to claim 1, wherein activation function one in step 1.2 is the Sigmoid activation function and activation function two is the tanh activation function;
the reset gate p_s is mapped into [0, 1] by the Sigmoid activation function, and the hidden state g̃_s is mapped into [-1, 1] by the tanh activation function; the mathematical expressions are as follows:

p_s = σ(K_p · [g_{s-1}, X])    (6)

g̃_s = tanh(K_g̃ · [p_s ⊙ g_{s-1}, X])    (7)

where K_p represents the weight matrix of the reset gate, [g_{s-1}, X] represents the concatenation of the two input vectors g_{s-1} and X into one long vector, σ is the Sigmoid activation function, and K_g̃ represents the weights used to compute the hidden state.
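For illustration only (not part of the claims): a single GRU step in numpy following eqs. (6)-(9) as reconstructed above, with a reset gate p_s, an update gate q_s and a candidate hidden state; the elementwise combination in the last line is the assumed standard GRU form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(g_prev, x, K_p, K_q, K_h):
    """One GRU step: g_prev is the previous output g_{s-1}, x is the
    current input; each weight matrix acts on the concatenated vector."""
    z = np.concatenate([g_prev, x])                            # [g_{s-1}, X]
    p = sigmoid(K_p @ z)                                       # reset gate, eq. (6)
    q = sigmoid(K_q @ z)                                       # update gate, eq. (8)
    h_tilde = np.tanh(K_h @ np.concatenate([p * g_prev, x]))   # candidate state, eq. (7)
    return (1.0 - q) * g_prev + q * h_tilde                    # new output, eq. (9)

# Toy example: hidden size 4, input size 3.
rng = np.random.default_rng(0)
H, D = 4, 3
g = np.zeros(H)
x = rng.standard_normal(D)
K_p, K_q, K_h = (rng.standard_normal((H, H + D)) for _ in range(3))
print(gru_step(g, x, K_p, K_q, K_h))
```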
4. The method for detecting anomalies in skewed time series based on a cost-sensitive hybrid network according to claim 1, wherein step 2.1 specifically comprises the following steps:
Step 2.1.1: data normalization
let X = {t_m(x_m, l_m)} (m = 1, 2, …, M) denote the time-series data set, where t_m(x_m, l_m) represents a time-series sample, x_m represents the signal value of the m-th sample, l_m represents the label of the m-th sample, l_m is 0 or 1, and M represents the total number of samples; the signal values are normalized, and the normalized time-series data set is denoted X̄;
Step 2.1.2: time slicing
the long time-series data X = {t_m(x_m, l_m)} (m = 1, 2, …, M) is segmented into short overlapping segments with a sliding window: a window function window() of length w is taken, the moving step is h, and the normalized data X̄ obtained in step 2.1.1 is divided into segments, each segment L_r having length w; the expression is as follows:

L_r = window(X̄, w, h),  r = 1, 2, …, R

where L_r denotes the r-th segment, w is set to half the period of the time-series data, R is the total number of segments, and M represents the total number of samples.
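For illustration only (not part of the claims): a sketch of the preprocessing of claim 4, under the assumption that the normalization is a plain z-score over the signal values (the exact formula is not reproduced in the text above) and that the sliding window of length w advances by step h.

```python
import numpy as np

def normalize(x):
    """Assumed z-score normalization of the raw signal values."""
    return (x - x.mean()) / (x.std() + 1e-12)

def time_slices(x, w, h):
    """Cut a long 1-D series into overlapping segments of length w with
    sliding step h (step 2.1.2)."""
    starts = range(0, len(x) - w + 1, h)
    return np.stack([x[s:s + w] for s in starts])

# Toy example: a series of length 20, window w = 8 (e.g. half a period), step h = 4.
series = normalize(np.sin(np.linspace(0, 6 * np.pi, 20)))
print(time_slices(series, w=8, h=4).shape)   # (4, 8)
```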
5. The method for detecting anomalies in skewed time series based on a cost-sensitive hybrid network according to claim 1, wherein the feature learning process of step 2.2 specifically comprises the following steps:
Step 2.2.1: feature learning of the deep convolutional neural network: local feature learning of the time series is carried out with the deep convolutional neural network DCNN of step 1; the hidden layers of the convolutional network consist of three convolutional layers, each of which comprises three processing operations, and the specific flow is as follows:
Conv1 layer: assume the Conv1 layer has e_1 convolution kernels K^{Conv1} of size k_1, with e_1 = 32 and k_1 = 8; the sample L_r (L_r ∈ L_train_set) is convolved with the kernels K^{Conv1} to obtain e_1 feature vectors ỹ^{Conv1} of length w-7, and the final output y^{Conv1} of the Conv1 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

ỹ^{Conv1} = L_r ⊗ K^{Conv1} + b^{Conv1}

ŷ^{Conv1} = BN(ỹ^{Conv1})

y^{Conv1} = LeakyReLU(ŷ^{Conv1})

where b^{Conv1} represents the bias of the Conv1 layer and ⊗ represents the convolution operation;
Conv2 layer: assume the Conv2 layer has e_2 convolution kernels K^{Conv2} of size k_2, with e_2 = 64 and k_2 = 5; the feature vectors y^{Conv1} obtained from the Conv1 layer are convolved with the kernels K^{Conv2} to generate e_2 feature vectors ỹ^{Conv2} of length w-11, and the final output y^{Conv2} of the Conv2 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

ỹ^{Conv2} = y^{Conv1} ⊗ K^{Conv2} + b^{Conv2}

ŷ^{Conv2} = BN(ỹ^{Conv2})

y^{Conv2} = LeakyReLU(ŷ^{Conv2})

where b^{Conv2} represents the bias of the Conv2 layer;
Conv3 layer: assume the Conv3 layer has e_3 convolution kernels K^{Conv3} of size k_3, with e_3 = 128 and k_3 = 3; the feature vectors y^{Conv2} obtained from the Conv2 layer are convolved with the kernels K^{Conv3} to generate 128 feature vectors ỹ^{Conv3} of length w-13, and the final output y^{Conv3} of the Conv3 layer is then obtained through the BN operation and the LeakyReLU activation function; this process is expressed as follows:

ỹ^{Conv3} = y^{Conv2} ⊗ K^{Conv3} + b^{Conv3}

ŷ^{Conv3} = BN(ỹ^{Conv3})

y^{Conv3} = LeakyReLU(ŷ^{Conv3})

where b^{Conv3} represents the bias of the Conv3 layer;
GAP layer: the feature vectors y^{Conv3} output by the Conv3 layer are convolved with a convolution kernel K_GAP of the same dimension as y_u^{Conv3} to generate a 128-dimensional feature vector y^{GAP}:

y_u^{GAP} = y_u^{Conv3} ⊗ K_GAP

y^{GAP} = {y_1^{GAP}, y_2^{GAP}, …, y_128^{GAP}}

where y_u^{GAP} represents each component value of the feature vector y^{GAP} finally learned by the deep convolutional neural network;
Step 2.2.2: feature learning of the gated recurrent network: for the input time-series samples L_r (L_r ∈ L_train_set), the sequential features are learned with the gated recurrent network GRU containing 128 cell units, and the feature vector y^{GRU} finally output by the gated recurrent network is obtained:

y^{GRU} = F_GRU(L_r; K_p, K_q)

where K_p and K_q represent the weight matrices of the reset gate and the update gate, respectively, and F_GRU represents the mapping function of the GRU network;
Step 2.2.3: output of the cost-sensitive hybrid network model: for the input time-series sample L_r (L_r ∈ L_train_set), the cost-sensitive hybrid network model finally outputs a probability value P_nclass(L_r) with the Softmax classifier, where nclass = 0, 1; nclass = 0 indicates that L_r belongs to the majority class and nclass = 1 indicates that L_r belongs to the minority class; this process is expressed as follows:

P_nclass(L_r) = Softmax(K · concat(y^{GAP}, y^{GRU}) + b)

where y^{GAP} represents the feature vector output by the convolutional network, y^{GRU} represents the feature vector output by the GRU network, and the function concat(·) splices the feature vectors y^{GAP} and y^{GRU} into one long vector;
Step 2.2.4: parameter updating with the cost-sensitive loss function: for the probability value output by the cost-sensitive hybrid network model CSHN obtained in step 2.2.3, the similarity between the predicted value and the true value is measured with the cost-sensitive loss function of formula (11) in step 1.3, where the weights are K and the bias is b; a learning rate of 0.001 is adopted with a gradient-descent step performed for every 200 segments, 40% of the training samples are used for cross-validation, and the weights K and the bias b are updated through the back-propagation mechanism of the Adam optimization algorithm;
the final weights K and bias b are related to the penalty factors η and ν: when a minority-class sample is misdetected, the relatively large penalty factor η is used to enlarge the total loss; when a majority-class sample is misdetected, the relatively small penalty factor ν is used to control the increase of the total loss;
the proposed cost-sensitive loss function can be generalized to the multi-class case, in which the penalty factor for multi-class skewed data samples is given by formula (28):

η_c = N / (n_classes × n_c_total)    (28)

where n_c_total is the total number of samples of class c, η_c is the penalty factor corresponding to class c, and c = {1, 2, …, n_classes}.
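To make the overall pipeline of claims 1 and 5 concrete, here is a rough Keras sketch (an interpretation, not the patented implementation): a DCNN branch with the Conv1D/BN/LeakyReLU stack and global average pooling, a 128-unit GRU branch, concatenation, and a softmax output; the cost sensitivity of formula (11) is approximated here with Keras per-class weights rather than the exact custom loss, and all names and hyper-parameters not stated in the claims are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cshn(window_len, eta, nu):
    """Hybrid model sketch: a three-layer 1-D CNN branch and a 128-unit
    GRU branch, concatenated and classified with a softmax layer."""
    inp = layers.Input(shape=(window_len, 1))

    # DCNN branch: Conv1D -> BatchNorm -> LeakyReLU, three times, then GAP.
    x = inp
    for filters, ksize in [(32, 8), (64, 5), (128, 3)]:
        x = layers.Conv1D(filters, ksize, padding="valid")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
    cnn_feat = layers.GlobalAveragePooling1D()(x)

    # GRU branch: sequential features with 128 cell units.
    gru_feat = layers.GRU(128)(inp)

    # Concatenate both feature vectors and classify (0 = majority, 1 = minority).
    merged = layers.Concatenate()([cnn_feat, gru_feat])
    out = layers.Dense(2, activation="softmax")(merged)

    model = Model(inp, out)
    # Cost sensitivity approximated with per-class weights (eta for the
    # minority class, nu for the majority class) instead of the custom loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy")
    return model, {0: nu, 1: eta}

# Usage sketch (X_train: array of shape (n, w, 1); y_train: 0/1 labels):
# model, class_weight = build_cshn(window_len=64, eta=5.0, nu=0.55)
# model.fit(X_train, y_train, epochs=50, class_weight=class_weight)
```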
CN202010065816.8A 2020-01-20 2020-01-20 Skew time series abnormity detection method based on cost sensitive hybrid network Active CN111275113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065816.8A CN111275113B (en) 2020-01-20 2020-01-20 Skew time series abnormity detection method based on cost sensitive hybrid network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010065816.8A CN111275113B (en) 2020-01-20 2020-01-20 Skew time series abnormity detection method based on cost sensitive hybrid network

Publications (2)

Publication Number Publication Date
CN111275113A CN111275113A (en) 2020-06-12
CN111275113B true CN111275113B (en) 2023-04-07

Family

ID=71003352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065816.8A Active CN111275113B (en) 2020-01-20 2020-01-20 Skew time series abnormity detection method based on cost sensitive hybrid network

Country Status (1)

Country Link
CN (1) CN111275113B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112073298B (en) * 2020-08-26 2021-08-17 重庆理工大学 Social network link abnormity prediction system integrating stacked generalization and cost sensitive learning
CN112073227B (en) * 2020-08-26 2021-11-05 重庆理工大学 Social network link abnormity detection method by utilizing cascading generalization and cost sensitive learning
CN112039700B (en) * 2020-08-26 2021-11-23 重庆理工大学 Social network link abnormity prediction method based on stack generalization and cost sensitive learning
CN112364098A (en) * 2020-11-06 2021-02-12 广西电网有限责任公司电力科学研究院 Hadoop-based distributed power system abnormal data identification method and system
CN112836719B (en) * 2020-12-11 2024-01-05 南京富岛信息工程有限公司 Indicator diagram similarity detection method integrating two classifications and triplets
CN112686372A (en) * 2020-12-28 2021-04-20 哈尔滨工业大学(威海) Product performance prediction method based on depth residual GRU neural network
CN113035361A (en) * 2021-02-09 2021-06-25 北京工业大学 Neural network time sequence classification method based on data enhancement
CN113660196A (en) * 2021-07-01 2021-11-16 杭州电子科技大学 Network traffic intrusion detection method and device based on deep learning
CN113705715B (en) * 2021-09-04 2024-04-19 大连钜智信息科技有限公司 Time sequence classification method based on LSTM and multi-scale FCN
CN114881928A (en) * 2022-04-02 2022-08-09 合肥工业大学 Wheat frost disease detection method and system based on deep cost sensitive learning
CN116937758B (en) * 2023-09-19 2023-12-19 广州德姆达光电科技有限公司 Household energy storage power supply system and operation method thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846380A (en) * 2018-04-09 2018-11-20 北京理工大学 Facial expression recognition method based on cost-sensitive convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932480B (en) * 2018-06-08 2022-03-15 电子科技大学 Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN
EP3594861B1 (en) * 2018-07-09 2024-04-03 Tata Consultancy Services Limited Systems and methods for classification of multi-dimensional time series of parameters

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846380A (en) * 2018-04-09 2018-11-20 北京理工大学 Facial expression recognition method based on cost-sensitive convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谭洁帆; 朱焱; 陈同孝; 张真诚. Imbalanced image classification method based on convolutional neural network and cost sensitivity. Journal of Computer Applications, 2018, (07). *

Also Published As

Publication number Publication date
CN111275113A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN111275113B (en) Skew time series abnormity detection method based on cost sensitive hybrid network
Kwon et al. Beta shapley: a unified and noise-reduced data valuation framework for machine learning
He et al. A novel ensemble method for credit scoring: Adaption of different imbalance ratios
CN110472817B (en) XGboost integrated credit evaluation system and method combined with deep neural network
CN110287983B (en) Single-classifier anomaly detection method based on maximum correlation entropy deep neural network
CN109034194B (en) Transaction fraud behavior deep detection method based on feature differentiation
CN109993236A (en) Few sample language of the Manchus matching process based on one-shot Siamese convolutional neural networks
Mienye et al. A deep learning ensemble with data resampling for credit card fraud detection
CN110084609B (en) Transaction fraud behavior deep detection method based on characterization learning
Shang et al. A hybrid method for traffic incident detection using random forest-recursive feature elimination and long short-term memory network with Bayesian optimization algorithm
CN113269647A (en) Graph-based transaction abnormity associated user detection method
CN114547299A (en) Short text sentiment classification method and device based on composite network model
CN113674862A (en) Acute renal function injury onset prediction method based on machine learning
Das et al. Determining attention mechanism for visual sentiment analysis of an image using svm classifier in deep learning based architecture
Dong et al. CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost
Sreedhar et al. An Improved Technique to Identify Fake News on Social Media Network using Supervised Machine Learning Concepts
Jeyakarthic et al. Optimal bidirectional long short term memory based sentiment analysis with sarcasm detection and classification on twitter data
CN113837266A (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
Li et al. An improved adaboost algorithm for imbalanced data based on weighted KNN
Wang et al. Early diagnosis of Parkinson's disease with Speech Pronunciation features based on XGBoost model
Wu et al. From grim reality to practical solution: Malware classification in real-world noise
Zhang et al. Evaluation of judicial imprisonment term prediction model based on text mutation
Liu et al. MRD-NETS: multi-scale residual networks with dilated convolutions for classification and clustering analysis of spacecraft electrical signal
CN116170187A (en) Industrial Internet intrusion monitoring method based on CNN and LSTM fusion network
CN115688101A (en) Deep learning-based file classification method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant