Disclosure of Invention
The invention aims to provide a hybrid anomaly detection method based on an adversarial autoencoder, which improves the accuracy of anomaly detection.
In order to achieve the above object, the present invention provides a hybrid anomaly detection method based on an adversarial autoencoder, comprising the following steps:
improving an adversarial autoencoder model, and extracting features of noise-added input data by using the improved adversarial autoencoder model;
performing weighted fusion on the two groups of extracted feature vectors to obtain fused feature vectors;
taking the fused feature vectors as training data, and jointly training an iForest classifier, an LOF classifier and a K-means classifier in an ensemble learning manner to obtain a detection classifier;
and extracting two groups of feature vectors from a test set by using the improved adversarial autoencoder model, fusing the two groups of feature vectors, and inputting the fused feature vectors into the detection classifier to obtain an anomaly detection result.
Wherein improving the adversarial autoencoder model and extracting the features of the noise-added input data by using the improved adversarial autoencoder model comprises the following steps:
using a LeakyReLU function as the activation function of the first three convolutional layers and a Tanh function as the activation function of the fourth convolutional layer in each of the two encoders of the adversarial autoencoder model, forming an autoencoder from the generator and the first encoder, and improving the activation function and output mapping function of the discriminator;
normalizing the acquired data set, and adding noise to the partitioned training set;
and training the improved adversarial autoencoder model, and extracting two groups of features from the training set.
Training the improved adversarial autoencoder model and extracting two groups of features from the training set comprises:
inputting the noise-added training set into the improved adversarial autoencoder model, performing feature extraction with a first encoder, and reconstructing the training set with the generator, which takes the extracted first group of feature vectors as input;
and performing feature extraction on the reconstructed training set by using a second encoder to obtain a second group of feature vectors.
Wherein performing weighted fusion on the two groups of extracted feature vectors to obtain the fused feature vectors comprises:
multiplying the first group of feature vectors by a weighting coefficient, multiplying the second group of feature vectors by one minus the weighting coefficient, adding the two products, and attaching the corresponding labels to obtain the fused feature vectors.
Taking the fused feature vectors as training data and jointly training the iForest classifier, the LOF classifier and the K-means classifier in an ensemble learning manner to obtain the detection classifier comprises the following steps:
serializing the three classifiers by using the AdaBoost algorithm, inputting the fused feature vectors as a training set into the iForest classifier for training, adjusting the weight distribution to obtain a weight coefficient, and passing the training set to the next classifier until the iForest classifier, the LOF classifier and the K-means classifier have all been trained, then combining all the weight coefficients to obtain the detection classifier.
Wherein extracting two groups of feature vectors from the test set by using the improved adversarial autoencoder model, fusing them and inputting the result into the detection classifier to obtain an anomaly detection result comprises the following steps:
adding noise to the partitioned test set, and extracting two groups of feature vectors by using the first encoder and the second encoder of the improved adversarial autoencoder model;
and performing weighted fusion on the two groups of feature vectors, and inputting the fused feature vectors into the detection classifier to obtain an anomaly detection result for each test sample.
The hybrid anomaly detection method based on an adversarial autoencoder according to the invention first improves an adversarial autoencoder model and extracts features of noise-added input data with the improved model; the two groups of extracted features are then fused by weighting; the fused feature vectors are used as training data to jointly train an iForest classifier, an LOF classifier and a K-means classifier in an ensemble learning manner, yielding a detection classifier; finally, two groups of feature vectors are extracted from the test set with the improved adversarial autoencoder model, fused, and input into the detection classifier to obtain the anomaly detection result. Compared with the prior art, combining the adversarial autoencoder with traditional anomaly detection methods allows anomalies in the data set to be detected more accurately, improving the accuracy of anomaly detection.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.
In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1 to 3, the present invention provides a hybrid anomaly detection method based on an adversarial autoencoder, comprising the following steps:
S101, improving an adversarial autoencoder model, and extracting features of noise-added input data by using the improved adversarial autoencoder model.
Specifically, an improved adversarial autoencoder model is constructed to extract features of the input data. The model is an improved DCGAN comprising two encoders E1 and E2, a generator G and a discriminator D. The two encoders E1 and E2 adopt the same structure: four convolutional layers are used for dimensionality reduction, the second and third layers use Batch Normalization, the activation function of the first three layers is LeakyReLU, and the activation function of the last layer is Tanh. The generator G and the first encoder E1 form an autoencoder; the generator has four transposed-convolution layers, the first three of which use Batch Normalization with LeakyReLU activations, and the last of which uses Tanh as its activation function. The discriminator D has a network structure similar to that of the first encoder E1: a four-layer convolutional neural network in which the second and third layers use Batch Normalization, the activation function of the first three layers is LeakyReLU, and the last layer uses Sigmoid as its output mapping function, as shown in figs. 4 to 6.
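For illustration only, the four-layer structure just described can be sketched in PyTorch as follows; the input size (1 x 32 x 32), the channel widths and the latent dimension are assumptions made for this sketch, not values specified by the invention.

```python
# Minimal sketch of the improved adversarial-autoencoder building blocks:
# two encoders E1/E2, a generator G and a discriminator D.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Four conv layers; BatchNorm on layers 2-3, LeakyReLU on layers 1-3, Tanh on layer 4."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, latent_dim, 4, 1, 0), nn.Tanh(),
        )
    def forward(self, x):                  # (B, 1, 32, 32) -> (B, latent_dim)
        return self.net(x).flatten(1)

class Generator(nn.Module):
    """Four transposed-conv layers; BatchNorm + LeakyReLU on layers 1-3, Tanh output."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, z):                  # (B, latent_dim) -> (B, 1, 32, 32)
        return self.net(z.unsqueeze(-1).unsqueeze(-1))

class Discriminator(nn.Module):
    """Four conv layers; BatchNorm on layers 2-3, LeakyReLU features, Sigmoid output mapping."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
        )
        self.head = nn.Sequential(nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid())
    def forward(self, x):                  # returns (real/fake score, intermediate features)
        f = self.features(x)
        return self.head(f).flatten(1), f.flatten(1)
```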
The data set is normalized into single-channel or three-channel pictures of the same size and split into a training set and a test set at a ratio of 4:1, and the improved adversarial autoencoder model is trained on the training set. Noise is added to the training set before training so that features representing the input data can be extracted better. The parameters of each layer of the network are updated with the Adam algorithm, and the number of iterations is 200 epochs. The generator G and the discriminator D are trained first; after their training is completed, the encoders E1 and E2 are trained using the noise-added data. Three loss functions are involved:
L_enc = ||z_1 - z_2||_2
L_con = ||x - x'||_1
L_adv = ||f(x) - f(x')||_2
The specific training process is as follows: m noise-added single-channel grayscale images or noise-added RGB three-channel images of the same pixel size are input into the adversarial autoencoder model. The noise-added training set is denoted S = {(x_1, y_1), ..., (x_m, y_m)}, where x_i represents noise-added n-dimensional data and y_i is its data label. After convolutional feature extraction by the first encoder E1, each picture is represented as a first group of feature vectors Z_1 = {(z_11, y_1), ..., (z_1m, y_m)}, which is output and stored. The generator G takes these feature vectors as input to reconstruct the training set, outputting the reconstructed data S' = {(x_1', y_1), ..., (x_m', y_m)}. The discriminator D is responsible for ensuring that the reconstructed data is as similar as possible to the original data. The second encoder E2 then re-extracts features from the reconstructed data, giving a second group of feature vectors Z_2 = {(z_21, y_1), ..., (z_2m, y_m)}, which is also stored.
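A minimal sketch of one encoder-training iteration, reusing the classes above, is given below. The Gaussian noise scale, the equal weighting of the three losses, the squared-error proxies for the norms, and the use of the discriminator's intermediate features as f(x) are illustrative assumptions rather than values fixed by the invention.

```python
import torch
import torch.nn.functional as F

E1, E2, G, D = Encoder(), Encoder(), Generator(), Discriminator()
# G and D are assumed to have been trained beforehand; only E1 and E2 are updated here.
opt = torch.optim.Adam(list(E1.parameters()) + list(E2.parameters()), lr=2e-4)

def train_step(x):                            # x: batch of images normalized to [-1, 1]
    x_noisy = x + 0.1 * torch.randn_like(x)   # noise-adding step (assumed Gaussian)
    z1 = E1(x_noisy)                          # first group of feature vectors Z1
    x_rec = G(z1)                             # reconstructed training data x'
    z2 = E2(x_rec)                            # second group of feature vectors Z2
    _, f_x = D(x)                             # intermediate discriminator features f(x)
    _, f_rec = D(x_rec)                       # f(x')
    L_enc = F.mse_loss(z1, z2)                # squared-L2 proxy for ||z1 - z2||_2
    L_con = F.l1_loss(x_rec, x)               # ||x - x'||_1
    L_adv = F.mse_loss(f_rec, f_x)            # squared-L2 proxy for ||f(x) - f(x')||_2
    loss = L_enc + L_con + L_adv
    opt.zero_grad(); loss.backward(); opt.step()
    return z1.detach(), z2.detach()
```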
S102, performing weighted fusion on the two groups of extracted feature vectors to obtain fused feature vectors.
Specifically, the two groups of feature vectors obtained in S101, Z_1 = {(z_11, y_1), ..., (z_1m, y_m)} and Z_2 = {(z_21, y_1), ..., (z_2m, y_m)}, are fused pairwise by weighting:

z_3i = λ·z_1i + (1 - λ)·z_2i, i = 1, 2, ..., m

where λ is a weighting coefficient chosen between 0 and 1 according to the actual situation.

The original labels y_1, ..., y_m are attached to the resulting vectors z_31, ..., z_3m to form a new group of feature vectors Z_3 = {(z_31, y_1), ..., (z_3m, y_m)}.
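A short sketch of this fusion step follows; the variable names and the example value of λ are illustrative.

```python
import numpy as np

def fuse_features(Z1, Z2, lam=0.5):
    """Z1, Z2: arrays of shape (m, d) holding the two groups of feature vectors."""
    return lam * Z1 + (1.0 - lam) * Z2        # z3_i = lam * z1_i + (1 - lam) * z2_i

# Example: the fused vectors keep their original labels y.
# Z3 = fuse_features(Z1, Z2, lam=0.6); training_set = list(zip(Z3, y))
```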
S103, taking the fused feature vectors as training data, and jointly training an iForest classifier, an LOF classifier and a K-means classifier in an ensemble learning manner to obtain a detection classifier.
Specifically, the three classifiers are serialized using the AdaBoost algorithm: the fused feature vectors are input as a training set into the iForest classifier for training, the weight distribution is adjusted to obtain a weight coefficient, and the training set is passed to the next classifier until the iForest, LOF and K-means classifiers have all been trained; all the weight coefficients are then combined to obtain the detection classifier, as shown in fig. 7.

S31: The fused feature data Z_3 produced by the trained model is imported into the AdaBoost detection model. First, the weight distribution of the training data is initialized as W_1 = (w_11, ..., w_1m), with w_1i = 1/m for i = 1, 2, ..., m, so that each training sample is given the same weight; the fused feature vectors serve as the training set.
S32: The training set is used to train iForest, i.e. classifier h_1(z). The number of iTrees is tuned by cross-validation and the anomaly ratio is set according to the sample labels so that the loss function e_1 is minimized; the error e_1 of h_1(z) is then computed. If e_1 > 0.5, this classifier is skipped and training proceeds directly to the next classifier; otherwise a weight coefficient α_1 is calculated and the weight distribution is updated to W_2 = (w_21, ..., w_2m), with i = 1, 2, ..., m, where f(z) denotes the distribution function of the raw data used in the update.
S33: Training proceeds to the next classifier, LOF, i.e. classifier h_2(z), which is trained on the training set. The value of K in the LOF algorithm is tuned by cross-validation to minimize the loss function e_2, and the LOF threshold t_1 is obtained as (t_1', k) = argmin_(t_1', k) e_2, i.e. the t_1' that minimizes the loss function e_2 is recorded as t_1. A sample is judged abnormal by comparing its anomaly score with the threshold t_1, and the error e_2 of h_2(z) is computed. If e_2 > 0.5, this classifier is skipped and training proceeds directly to the next classifier; otherwise a weight coefficient α_2 is calculated and the weight distribution is updated to W_3 = (w_31, ..., w_3m), with i = 1, 2, ..., m, where f(z) denotes the distribution function of the raw data used in the update.
S34: Training proceeds to the last classifier, K-means, i.e. classifier h_3(z). The number of clusters k in the K-means algorithm is tuned by cross-validation, and the relative distance D of each sample to its cluster center is used as the anomaly score so as to minimize the error e_3. K-means is trained to obtain a threshold t_2 as (D, k) = argmin_(D, k) e_3, i.e. the D that minimizes the loss function e_3 is recorded as t_2. A sample is judged abnormal by comparing its relative distance D with the threshold t_2, and the error e_3 of h_3(z) is computed. If e_3 > 0.5, this classifier is skipped; otherwise a weight coefficient α_3 is calculated. The three classifiers are then linearly combined to obtain the final detection classifier, i.e. the final strong classifier h(z).
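The serialized training of S31 to S34 might look roughly like the following sketch, which is not the patented implementation. It assumes binary labels y in {0, 1} (1 = anomaly) for the fused features Z_3, uses the standard AdaBoost weight formula α = 0.5·ln((1 - e)/e), replaces cross-validated hyperparameters with fixed example values, thresholds the K-means distance at a quantile, and fits each detector unweighted (only the errors and the coefficients α use the boosting weights).

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.cluster import KMeans

def train_detection_classifier(Z3, y, contamination=0.1):
    m = len(Z3)
    w = np.full(m, 1.0 / m)                  # S31: uniform initial sample weights
    members = []                             # list of (alpha, predict_fn)

    def add(pred, predict_fn):
        nonlocal w
        err = float(np.sum(w[pred != y]))    # weighted training error of this detector
        if err > 0.5:                        # skip detectors that do worse than chance
            return
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w = w * np.exp(np.where(pred == y, -alpha, alpha))   # AdaBoost-style weight update
        w /= w.sum()
        members.append((alpha, predict_fn))

    # S32: iForest, anomaly ratio taken from the labels via `contamination`
    ifo = IsolationForest(n_estimators=100, contamination=contamination).fit(Z3)
    add((ifo.predict(Z3) == -1).astype(int),
        lambda X: (ifo.predict(X) == -1).astype(int))

    # S33: LOF in novelty mode so it can score unseen samples
    # (training-set predictions are used here for simplicity)
    lof = LocalOutlierFactor(n_neighbors=20, novelty=True,
                             contamination=contamination).fit(Z3)
    add((lof.predict(Z3) == -1).astype(int),
        lambda X: (lof.predict(X) == -1).astype(int))

    # S34: K-means, distance to the nearest cluster centre as the anomaly score
    km = KMeans(n_clusters=5, n_init=10).fit(Z3)
    d_train = km.transform(Z3).min(axis=1)
    t2 = np.quantile(d_train, 1.0 - contamination)            # distance threshold t_2
    add((d_train > t2).astype(int),
        lambda X: (km.transform(X).min(axis=1) > t2).astype(int))

    def detect(X):                           # weighted vote of the serialized detectors
        score = sum(a * f(X) for a, f in members)
        return (score > 0.5 * sum(a for a, _ in members)).astype(int)
    return detect
```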
S104, extracting two groups of feature vectors from the test set by using the improved adversarial autoencoder model, fusing the two groups of feature vectors, and inputting the fused feature vectors into the detection classifier to obtain an anomaly detection result.
Specifically, noise is first added to the test data set, which is then fed into the trained improved adversarial autoencoder model; two groups of feature vectors of the test set, Z_1' and Z_2', are extracted through the encoders E1 and E2, and the weighted fusion of step S102 is applied to obtain a fused feature vector Z'.
The fused feature vector Z' is taken as input data and sent to the detection classifier h(z) obtained in step S103; the strong classifier outputs the anomaly score of each test sample, thereby obtaining the anomaly detection result.
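Putting the pieces together at test time, reusing the sketches above (variable names such as X_test, lam and detect are hypothetical and carried over from the earlier examples):

```python
# Noise-add the test set, extract Z1' and Z2' with E1/E2, fuse as in S102,
# and score with the detection classifier obtained in S103.
X_test_noisy = X_test + 0.1 * torch.randn_like(X_test)
Z1_t = E1(X_test_noisy).detach().numpy()
Z2_t = E2(G(E1(X_test_noisy)).detach()).detach().numpy()
Z_fused = fuse_features(Z1_t, Z2_t, lam=0.6)
anomalies = detect(Z_fused)                  # 1 = anomalous test sample, 0 = normal
```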
In order to verify the effectiveness of the hybrid anomaly detection model based on the adversarial autoencoder, the method of the invention is compared with three traditional anomaly detection algorithms on the MNIST data set; the comparison results are shown in Table 1. Compared with the three traditional anomaly detection methods, the accuracy and the AUC of the proposed method are greatly improved, demonstrating that the method has higher reliability.
TABLE 1 Comparison of the four detection methods

                    iForest    LOF      OCSVM    Proposed method
  Accuracy (%)      89.81      84.57    70.82    92.38
  AUC               0.79       0.83     0.80     0.95
Compared with the prior art, the invention has the following remarkable advantages:
1. Compared with features extracted by traditional machine learning methods, the features extracted by this method are more abstract and representative, which effectively improves the accuracy of anomaly detection.
2. Ensemble learning is introduced: the three traditional anomaly detection algorithms iForest, LOF and K-means are integrated through the AdaBoost algorithm. Compared with using the traditional anomaly detection algorithms directly, this is more accurate, can handle high-dimensional data, and does not require feature selection.
3. The two groups of feature vectors produced by the adversarial autoencoder model are fused by weighting, so the extracted features of the input data are more representative. Compared with extracting features with an autoencoder alone, the method is more reliable and achieves a higher anomaly detection accuracy.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.