CN116318877A - Adversarial sample defense method for intrusion detection systems using multiple feature manifold vectors - Google Patents

Publication number
CN116318877A
Authority
CN
China
Prior art keywords
sample
features
feature
manifold
intrusion detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310123302.7A
Other languages
Chinese (zh)
Inventor
罗森林
邵思源
潘丽敏
巩锟
沈宇辉
王琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN202310123302.7A
Publication of CN116318877A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to an adversarial sample defense method for intrusion detection systems that uses multiple feature manifold vectors, and belongs to the field of computer and information science. The method first extracts multiple kinds of features from a network traffic sample: it computes the class prediction probability of the sample with a radial basis function neural network, computes feature weights with an attention mechanism and extracts the high-weight features, and computes the correlation between sample features and selects the high-correlation features with a proposed random recursive feature elimination algorithm. Second, it maps each kind of feature to a low-dimensional manifold vector and computes its similarity to the manifold vectors of benign and malicious traffic samples. Finally, according to the manifold similarity, it generates a traffic sample with reconstructed features using a denoising autoencoder and passes it to the intrusion detection system for classification. Because the sample features are reconstructed on the basis of feature manifold similarity, the influence of adversarial perturbations in the features on detection accuracy is reduced, and the defense capability of the intrusion detection system against adversarial samples is improved.

Description

Adversarial sample defense method for intrusion detection systems using multiple feature manifold vectors
Technical Field
The invention relates to an adversarial sample defense method for intrusion detection systems that uses multiple feature manifold vectors, and belongs to the field of computer and information science.
Background
Intrusion detection is an active defense scheme based on network traffic analysis that identifies network attacks by matching captured data packets against feature evidence. Intrusion detection algorithms based on machine learning and deep learning can model and recognize network traffic automatically, which improves the efficiency and quality of intrusion detection. However, the introduction of machine learning and deep learning algorithms also brings new challenges. Since Szegedy et al. first proposed the concept of adversarial samples for deep neural networks in 2013, researchers have proposed further adversarial sample generation methods such as FGSM, BIM, PGD, C&W, and GAN-based methods. In intrusion detection, these methods are used to craft adversarial samples of malicious traffic that cause an intrusion detection model based on machine learning or deep learning to misclassify malicious traffic as benign, thereby evading the intrusion detection system, concealing malicious attack behavior, and seriously threatening the security of information systems. Research on an effective defense against adversarial traffic samples therefore has theoretical significance and practical value for keeping intrusion detection systems effective and information systems secure.
According to the object targeted in the defense process, recent general adversarial sample defense methods can be divided into methods that improve the model architecture and methods based on data augmentation.
1. Methods that improve the model architecture
These methods work at the level of architecture optimization. On top of the intrusion detection model, an additional adversarial sample classifier is built to identify adversarial samples, and ensemble voting is used to decide jointly whether a sample is malicious, which lowers the success rate of adversarial deception. Such methods focus on improving the classification fault tolerance of the model architecture but cannot effectively limit the influence of adversarial perturbations on intrusion detection accuracy. They incur additional model overhead and cannot fundamentally prevent effective adversarial samples from being generated and from deceiving the intrusion detection system.
2. Methods based on data augmentation
These methods work at the level of sample data. Known adversarial sample generation methods are used to produce a large number of adversarial samples that augment the sample dataset, and training on this dataset yields an intrusion detection model with smooth decision boundaries and strong robustness to adversarial perturbations. Such methods focus on increasing the diversity of the data but do not consider the underlying principles of adversarial samples and lack an analysis of the differences between benign and malicious samples. As a result, the models cannot resist unknown attacks and perform poorly at recognizing highly concealed perturbations.
In summary, existing methods that improve the model architecture do not limit adversarial perturbations, and methods based on data augmentation have difficulty mining and amplifying sample differences, so both defend poorly against adversarial samples. To address these problems, the invention provides an adversarial sample defense method for intrusion detection systems that uses multiple feature manifold vectors.
Disclosure of Invention
The invention aims to meet the adversarial sample defense requirements of intrusion detection systems based on machine learning and deep learning, to solve the problem that existing defenses which improve the model architecture do not limit adversarial perturbations, and to overcome the difficulty that data augmentation methods have in mining and amplifying the differences between benign and malicious samples.
The design principle of the invention is as follows. First, feature preprocessing is performed on the input traffic sample. Second, multiple kinds of features are extracted from the preprocessed sample. Then, the extracted features are mapped to low-dimensional manifold vectors, the manifold similarity to benign samples and to malicious samples is computed, and the similar class of the current sample is output. Finally, the preprocessed sample is input to the generation model corresponding to its similar class and reconstructed to amplify feature differences, and after inverse processing it is passed to the intrusion detection system for traffic classification.
The technical scheme of the invention is realized by the following steps:
Step 1. Perform feature preprocessing on the input network traffic sample.
Step 1.1. Extract the non-functional features of the network traffic sample.
Step 1.2. Encode the non-numerical features and normalize the numerical features.
Step 2. Extract multiple kinds of features from the network traffic sample preprocessed in step 1.
Step 2.1. Construct a sample class prediction probability calculation network using a radial basis function neural network, and compute the class prediction probability of the traffic sample as the soft label feature.
Step 2.2. Construct a feature weight calculation network based on an attention mechanism, compute the weight of each feature in the model decision, and select the high-weight features as high-contribution features.
Step 2.3. Compute the correlation between sample features, construct a characterization capability calculation network in combination with the proposed random recursive feature elimination algorithm, and select the high-correlation features as high-characterization features.
Step 3. Map the extracted features to low-dimensional manifold vectors, compute their similarity to the benign sample manifold vectors and the malicious sample manifold vectors, and determine the similar class.
Step 3.1. Perform low-dimensional manifold mapping on each kind of extracted feature using the encoder of the trained denoising autoencoder model.
Step 3.2. Compute the benign manifold similarity and the malicious manifold similarity from the average manifold vectors of each kind of feature for benign and malicious samples, and compare the two values to determine the similar class.
Step 4. Input the preprocessed sample into the reconstruction module corresponding to its similar class to reinforce the data distribution characteristics of the sample and amplify sample differences, perform single clustering on all reconstructed samples, and apply inverse processing to the cluster center point to obtain the optimal reconstructed sample.
Step 4.1. Input the feature sample processed in step 1 into the benign sample generation model or the malicious sample generation model corresponding to its similar class to reinforce the data distribution characteristics of the sample and amplify sample differences, and output 50 reconstructed samples.
Step 4.2. Compute the difference value between each of the 50 reconstructed samples and the sample before reconstruction, combine the difference value with the original feature vector of the sample, perform single clustering on the combined vectors, and select the cluster center point as the final optimal reconstructed sample.
Step 4.3. Apply inverse processing to the optimal reconstructed sample and output it.
Step 5. The intrusion detection system classifies the behavior of the optimal reconstructed traffic sample and outputs the result as benign traffic or malicious traffic.
Advantageous effects
Compared with methods that improve the model architecture, the invention does not rely on additional model overhead to improve the traffic classification fault tolerance of the intrusion detection system, and it lowers the evasion rate of adversarial samples through the extraction of multiple kinds of features. Extracting multiple kinds of features from the traffic sample eliminates interference from redundant features and focuses the model on the features to which adversarial perturbations are easily added. This limits the influence of the perturbations, makes it harder to craft adversarial traffic samples that evade the intrusion detection system, and lowers the success rate of adversarial traffic attacks.
Compared with methods based on data augmentation, the invention focuses on amplifying the differences between sample features, and it improves the classification accuracy of the intrusion detection system through manifold similarity calculation and feature reconstruction. Computing manifold similarity over multiple kinds of features and reconstructing benign and malicious sample features highlights the differences between their feature distributions, so the intrusion detection system captures the difference between benign and malicious samples more reliably and defends more effectively against adversarial samples.
Drawings
Fig. 1 is a block diagram of the adversarial sample defense method for intrusion detection systems using multiple feature manifold vectors.
Fig. 2 is a schematic diagram of a sample class prediction probability calculation process.
Fig. 3 is a schematic diagram of a high-weight feature extraction process.
Fig. 4 is a schematic diagram of a high correlation feature extraction process.
Fig. 5 is a schematic diagram of the calculation of feature weight vectors.
Fig. 6 is a low-dimensional manifold mapping schematic.
Detailed Description
For a better illustration of the objects and advantages of the invention, embodiments of the method of the invention are described in further detail below with reference to examples.
The specific flow is as follows:
Step 1. Perform feature preprocessing on the input traffic sample.
Step 1.1. Perform feature screening on the input traffic sample and extract the non-functional features, that is, the features that do not affect the underlying attributes of the traffic. Features are extracted along several dimensions: packet interval, Ethernet header, IP header, TCP header, UDP header, ICMP packet, and packet direction. Detailed feature information is shown in Table 1.
Table 1. Features extracted during preprocessing for each data dimension
[Table 1 is provided as an image in the original publication.]
Step 1.2. Encode the extracted features: each non-numerical feature is converted into an n-dimensional feature vector by one-hot encoding. The numerical feature values are then normalized to avoid the effect of scale differences between features on the results. Min-max normalization compresses each numerical feature value into the interval [0, 1]. For a feature value x to be normalized, the normalized value x' is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where $x_{\min}$ and $x_{\max}$ denote the minimum and maximum values of the corresponding feature in the dataset.
Finally, the preprocessed samples are divided into benign samples and malicious samples, and each group is split into two parts of equal size, used respectively for the subsequent manifold similarity calculation and for training the sample generation models.
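The encoding and scaling in step 1.2 can be illustrated with a minimal Python sketch. The category list, the feature values, and the function names `one_hot` and `min_max` are hypothetical and chosen only for illustration; the handling of a constant feature (mapping it to 0.0) is likewise an assumption, since the text does not cover that case.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value` over an ordered list of categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(x, x_min, x_max):
    """Scale x into [0, 1]; a constant feature (x_max == x_min) maps to 0.0."""
    if x_max == x_min:
        return 0.0
    return (x - x_min) / (x_max - x_min)


# Hypothetical sample with one categorical and one numerical feature.
protocols = ["tcp", "udp", "icmp"]                  # assumed category order
encoded = one_hot("udp", protocols)
scaled = min_max(300.0, x_min=100.0, x_max=500.0)
features = encoded + [scaled]
```

The per-feature minimum and maximum would normally be computed once on the training data and stored, because the inverse processing in step 4.3 needs the same values.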
Step 2. Extract multiple kinds of features from the network traffic sample preprocessed in step 1.
Step 2.1. Construct a sample class prediction probability calculation network using a radial basis function neural network, and compute the class prediction probability of the traffic sample as the soft label feature.
First, for an input feature vector X with n features, compute the radial distance $h_j$ between X and the center point $c_j$ of the class to which the j-th hidden-layer node belongs:
[Formula (2) is provided as an image in the original publication.]
where $r_j$ denotes the variance of the j-th hidden node, m the number of hidden nodes, $n_i$ the dimension of the input feature vector X, and $c_{jk}$ the k-th feature value of the center point $c_j$.
Then, a softmax layer added before the output layer of the radial basis function neural network outputs the prediction probability of each traffic class, that is, the sample class prediction probability vector $(p_0, p_1, \ldots, p_n)$. The output layer is computed as:
[Formula (3) is provided as an image in the original publication.]
where w denotes the weight between a hidden-layer node and an output node, i indexes the traffic classes, and q denotes the number of output nodes.
Finally, the sample class prediction probability $X_{Soft} = (y_0, \ldots, y_i, \ldots, y_n)$ is obtained for the input and used as the soft label feature, where n denotes the number of traffic classes and $y_i$ the probability that the current sample belongs to the i-th class.
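As a rough illustration of step 2.1, the sketch below computes soft labels with a small radial basis function network. The exact hidden-layer and output-layer formulas are not reproduced in this text, so the Gaussian kernel `exp(-||x - c_j||^2 / (2 r_j^2))`, the plain weighted sum before the softmax, and all centers, widths, and weights are assumptions made for the example.

```python
import math


def rbf_hidden(x, centers, widths):
    """Gaussian activation of each hidden node (assumed kernel form)."""
    acts = []
    for c, r in zip(centers, widths):
        sq_dist = sum((xk - ck) ** 2 for xk, ck in zip(x, c))
        acts.append(math.exp(-sq_dist / (2.0 * r ** 2)))
    return acts


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(x, centers, widths, weights):
    """Class probabilities: softmax over weighted sums of hidden activations."""
    h = rbf_hidden(x, centers, widths)
    logits = [sum(w_j * h_j for w_j, h_j in zip(row, h)) for row in weights]
    return softmax(logits)


# Hypothetical two-class network with two hidden nodes.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [[4.0, 0.0], [0.0, 4.0]]      # one row per output class
x_soft = soft_label([0.9, 1.0], centers, widths, weights)
```

In this toy setup the input lies close to the second center, so the second class receives the larger probability.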
Step 2.2. Construct a feature weight calculation network based on an attention mechanism, compute the weight of each feature in the model decision, and select the high-weight features as high-contribution features.
First, the traffic sample feature vector $X = (x_1, \ldots, x_i, \ldots, x_n)$ is input to a deep neural network to obtain the hidden-layer representation $Q = (q_1, \ldots, q_i, \ldots, q_n)$, where n is the feature dimension of the input traffic sample. The hidden layer uses ReLU as its activation function. Its input is the traffic sample feature $x_i$, the neuron weight is $W_i$, and the neuron bias is $b_i$, as shown in formula (4).
$$q_i = \mathrm{ReLU}(W_i x_i + b_i) \tag{4}$$
The attention weight distribution $\alpha_i$ is then obtained from a scoring function $s(\cdot)$ and the softmax function. The attention weight of a feature represents the weight of that feature in the classification result, as shown in formulas (5), (6), and (7).
$$\alpha_i = \mathrm{softmax}(s(x_i, q_i)) \tag{5}$$
$$\mathrm{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{n} \exp(z_j)} \tag{6}$$
$$s(x_i, q_i) = x_i \times q_i \tag{7}$$
The feature vectors $x_i$ are then combined by weighted summation to obtain the feature weight vector $S = (s_1, s_2, \ldots, s_n)$, where $s_i$ is the weight of the i-th feature, as shown in formula (8).
[Formula (8) is provided as an image in the original publication.]
Finally, the 15 features with the highest weights are selected as the high-contribution features.
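The attention-based selection in step 2.2 can be sketched as follows. The score and softmax follow the ReLU, product-score, and softmax relations described above; treating `W_i` and `b_i` as one scalar per feature and averaging the attention weights over samples to obtain the feature weight vector are assumptions, because the aggregation formula is not reproduced in this text. The helper names and data are hypothetical.

```python
import math


def relu(v):
    return v if v > 0.0 else 0.0


def attention_weights(x, w, b):
    """alpha_i = softmax_i(x_i * ReLU(w_i * x_i + b_i)) for one sample."""
    q = [relu(wi * xi + bi) for xi, wi, bi in zip(x, w, b)]
    scores = [xi * qi for xi, qi in zip(x, q)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_features(samples, w, b, k):
    """Average attention weights over samples (an assumption) and keep the top k."""
    n = len(samples[0])
    mean_alpha = [0.0] * n
    for x in samples:
        for i, a in enumerate(attention_weights(x, w, b)):
            mean_alpha[i] += a / len(samples)
    ranked = sorted(range(n), key=lambda i: mean_alpha[i], reverse=True)
    return ranked[:k], mean_alpha


samples = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]
w = [1.0, 2.0, 1.0]     # hypothetical per-feature weights
b = [0.0, 0.0, 0.0]     # hypothetical per-feature biases
selected, weights = top_k_features(samples, w, b, k=2)
```

With the method's setting, `k` would be 15; the toy example uses `k=2` on three features.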
Step 2.3. Compute the correlation between sample features, construct a characterization capability calculation network in combination with the proposed random recursive feature elimination algorithm, and select the high-correlation features as high-characterization features.
For the input traffic sample feature vector $X = (x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n)$, first compute the Pearson correlation coefficient $\mathrm{Pearson}(x_i, x_j)$ between features, as shown in formula (9).
$$\mathrm{Pearson}(x_i, x_j) = \frac{\mathrm{cov}(x_i, x_j)}{\sigma_{x_i}\,\sigma_{x_j}} \tag{9}$$
where cov denotes the covariance and $\sigma$ the standard deviation. For each feature $x_i$, the correlation coefficients between $x_i$ and the other features are sorted by magnitude and the top 20 features are selected, giving the high-correlation feature combination vector of $x_i$. Finally, these vectors are combined into the n×n-dimensional matrix R, which is used to train an intrusion detection model based on a deep neural network. The random recursive feature elimination algorithm then selects the optimal feature combination from each feature combination vector in R.
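The correlation ranking can be sketched in plain Python. The function names and the toy data are hypothetical, and ranking by the absolute value of the coefficient is an assumption; the text only says the coefficients are sorted by magnitude.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def top_correlated(columns, i, k):
    """Indices of the k features most correlated (absolute value) with feature i."""
    others = [j for j in range(len(columns)) if j != i]
    others.sort(key=lambda j: abs(pearson(columns[i], columns[j])), reverse=True)
    return others[:k]


# Hypothetical data: each inner list holds one feature's values across samples.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],   # almost proportional to feature 0
    [4.0, 1.0, 3.0, 2.0],   # weakly related to feature 0
]
neighbours = top_correlated(columns, i=0, k=1)
```

Calling `top_correlated` for every feature index with `k=20` would yield one ranked feature combination per feature, which corresponds to one row of R.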
Specifically, the following procedure is performed for each feature combination vector in R. According to the set random seed, initialize a random value m in the range 0 to 20 and an optimal accuracy $acc_{best}$ in the range 0 to 0.6, and set the corresponding optimal feature combination $r_{best}$ = null. Extract the first m high-correlation features from the feature combination vector to form a new feature subset $s_m$, train a deep neural network on it, and record the accuracy $acc_i$ of the training process. If $acc_i > acc_{best}$, set $acc_{best} = acc_i$ and $r_{best} = s_m$. Repeat until the optimal subset $r_{best}$ has been selected as the optimal high-correlation feature subset of the current feature combination vector.
In this way, the optimal high-correlation feature subset and the optimal accuracy $acc_i$ corresponding to the i-th feature vector in R are obtained. The number of characterization features $num_i$ and the model accuracy $acc_i$ are combined into a characterization capability index for each feature vector:
[Formula (10) is provided as an image in the original publication.]
where n denotes the number of sample features after preprocessing. Finally, the high-correlation features with the largest characterization capability index are output as the high-characterization features.
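A minimal sketch of the random search over feature prefixes is given below. The model training step is replaced by a caller-supplied `evaluate` function, the fixed number of `rounds` is an assumed stopping rule (the text does not state one), and prefix lengths are drawn from 1 up to the length of the ranked list rather than from 0 to 20. All names are hypothetical.

```python
import random


def random_rfe(ranked_features, evaluate, rounds=10, seed=0, acc_floor=0.6):
    """Randomly try prefixes of a correlation-ranked feature list.

    `evaluate` trains a model on a feature subset and returns its accuracy;
    here it is supplied by the caller, so no real training happens.
    """
    rng = random.Random(seed)
    acc_best = rng.uniform(0.0, acc_floor)      # initial threshold in [0, 0.6]
    r_best = None
    for _ in range(rounds):
        m = rng.randint(1, len(ranked_features))    # random prefix length
        subset = ranked_features[:m]
        acc = evaluate(subset)
        if acc > acc_best:
            acc_best, r_best = acc, subset
    return r_best, acc_best


# Stand-in evaluator: pretends accuracy peaks when exactly 3 features are used.
def toy_evaluate(subset):
    return 0.95 - 0.05 * abs(len(subset) - 3)


ranked = ["f7", "f2", "f9", "f4", "f1"]     # hypothetical ranked feature names
best_subset, best_acc = random_rfe(ranked, toy_evaluate, rounds=50, seed=1)
```

Because each evaluation in the real method trains a deep neural network, the number of rounds directly controls the cost of this search.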
Step 3. Perform low-dimensional manifold mapping on each kind of feature extracted in step 2 using a denoising autoencoder, compute the similarity to the benign sample manifold vectors and the malicious sample manifold vectors, and determine the similar class.
Step 3.1. Perform low-dimensional manifold mapping on each kind of extracted feature using the encoder of the trained denoising autoencoder model. Each kind of feature is input to the denoising autoencoder, whose encoder corrupts the features and reduces their dimension, yielding in turn the low-dimensional manifold vector of the high-contribution features, of the high-characterization features, and of the soft label features.
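The encoder side of a denoising autoencoder can be sketched as a corruption step followed by a dimension-reducing layer. The masking noise, the single sigmoid layer, and the weight values below are assumptions for illustration only; the text does not specify the noise type or the network shape, and a trained model would supply the real weights.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input value with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def encode(x, weights, biases):
    """One dense layer with sigmoid activation, mapping x to a lower dimension."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * v for w, v in zip(row, x)) + b
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


rng = random.Random(0)
features = [0.2, 0.9, 0.4, 0.7]             # one extracted feature group
weights = [[0.5, -0.3, 0.8, 0.1],           # hypothetical trained 4 -> 2 encoder
           [-0.2, 0.6, 0.1, 0.4]]
biases = [0.0, 0.1]
noisy = corrupt(features, drop_prob=0.25, rng=rng)
manifold_vector = encode(noisy, weights, biases)
```

One such encoder pass per feature group (high-contribution, high-characterization, soft label) gives the three manifold vectors used in step 3.2.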
Step 3.2. Compute the benign manifold similarity and the malicious manifold similarity from the average manifold vectors of each kind of feature for benign and malicious samples, and compare the two values to determine the similar class.
First, the average manifold vector of each kind of feature is computed for the malicious samples and for the benign samples. The low-dimensional manifold vectors of the input sample are then compared with the malicious average manifold vectors and the benign average manifold vectors, and the similar class label of the input sample is determined.
Specifically, the manifold similarity between the current sample and the benign samples is computed as follows. The manifold similarity between the mapped high-contribution feature manifold vector of the current sample and the average high-contribution feature manifold vector of the benign samples is computed, and the corresponding similarities for the high-characterization features and the soft label features are computed in the same way. Manifold similarity is measured with the mean squared error:
$$\mathrm{MSE}_{Type}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(X_i - Y_i)^2 \tag{11}$$
where MSE denotes the mean squared error, X and Y denote the two manifold vectors being compared, N denotes the dimension of the manifold vectors, and Type denotes one of the feature kinds (high-contribution, high-characterization, or soft label). The three manifold similarity values computed from the three feature kinds are then summed to form the total manifold similarity $\mathrm{MSE}_{Ben}$ between the current sample and the benign samples. The total manifold similarity $\mathrm{MSE}_{Att}$ between the current sample and the malicious samples is computed in the same way.
Finally, $\mathrm{MSE}_{Ben}$ is compared with $\mathrm{MSE}_{Att}$, and the class with the smaller total value is selected as the similar class label of the sample: if $\mathrm{MSE}_{Ben} > \mathrm{MSE}_{Att}$, the malicious similar class is output, and otherwise the benign similar class.
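The similarity comparison in step 3.2 reduces to a few lines of Python. The dictionary keys and the two-dimensional vectors are hypothetical; only the mean squared error, the summation over the three feature kinds, and the comparison rule come from the description above.

```python
def mse(x, y):
    """Mean squared error between two equal-length manifold vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def similar_class(sample_vecs, benign_means, attack_means):
    """Sum per-feature-kind MSEs and return the class with the smaller total."""
    mse_ben = sum(mse(sample_vecs[t], benign_means[t]) for t in sample_vecs)
    mse_att = sum(mse(sample_vecs[t], attack_means[t]) for t in sample_vecs)
    label = "malicious" if mse_ben > mse_att else "benign"
    return label, mse_ben, mse_att


# Hypothetical 2-dimensional manifold vectors for the three feature kinds.
sample = {"contribution": [0.8, 0.1], "characterization": [0.7, 0.2], "soft_label": [0.1, 0.9]}
benign = {"contribution": [0.2, 0.6], "characterization": [0.1, 0.7], "soft_label": [0.9, 0.1]}
attack = {"contribution": [0.9, 0.1], "characterization": [0.6, 0.2], "soft_label": [0.2, 0.8]}
label, mse_ben, mse_att = similar_class(sample, benign, attack)
```

Note that a smaller MSE means a closer match, so the label follows the smaller of the two totals.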
Step 4. Input the preprocessed sample into the generation model corresponding to its similar class for reconstruction, to reinforce the data distribution characteristics of the sample and amplify sample differences. All reconstructed samples are then clustered, and the cluster center point is taken and inverse-processed as the optimal reconstructed sample.
Step 4.1. Input the feature sample processed in step 1 into the benign sample generation model or the malicious sample generation model corresponding to its similar class to reinforce the data distribution characteristics of the sample and amplify sample differences, and output 50 reconstructed samples. Specifically, if the similar class is benign, the sample is input to the benign sample denoising autoencoder for reconstruction; otherwise it is input to the malicious sample denoising autoencoder.
The generation model used for feature reconstruction is a denoising autoencoder trained on a normal traffic dataset (that is, non-adversarial samples comprising benign traffic and malicious traffic). Because the generation model learns the data distribution and latent representation of samples of its class, reconstructing a sample with the benign or malicious sample generation model reinforces the data distribution characteristics of that class and further amplifies the distribution differences between samples of different classes.
Step 4.2. Compute the difference value between each of the 50 reconstructed samples and the sample before reconstruction, combine the difference value with the original feature vector of the sample, perform single clustering on the combined vectors, and select the cluster center point as the final optimal reconstructed sample.
Specifically, for each reconstructed sample, the difference value between it and the input sample before reconstruction is computed using the mean squared error. The difference value is combined with the original sample feature vector to form an (n+1)-dimensional vector M, where n is the number of features of the reconstructed sample. The center point c of the 50 sample points is then computed from M by single clustering. Determining the center point is an iterative minimization process:
[Formula (12) is provided as an image in the original publication.]
where $x_{ij}$ is the j-th feature of the i-th sample and $c_j$ is the j-th feature of the currently selected center point.
Finally, to ensure that the reconstruction is not distorted while the data distribution characteristics of the sample are reinforced, that is, that the reconstruction difference remains small, the cluster center c is selected as the optimal reconstructed sample.
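One way to picture the single-cluster step is to compute the centroid of the combined vectors. This sketch assumes that the minimized quantity is the summed squared distance to the center, for which the arithmetic mean is the closed-form minimizer, and that each combined vector consists of the reconstructed features followed by the reconstruction error. Neither detail is confirmed by the text, and the data are made up.

```python
def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cluster_center(original, reconstructions):
    """Centroid of the (n+1)-dimensional vectors [features..., reconstruction error]."""
    combined = [r + [mse(r, original)] for r in reconstructions]
    dims = len(combined[0])
    count = len(combined)
    # The arithmetic mean minimises the summed squared distance to all points.
    return [sum(vec[j] for vec in combined) / count for j in range(dims)]


original = [0.5, 0.5]
# Three hypothetical reconstructions (the method described above uses 50).
reconstructions = [[0.4, 0.5], [0.6, 0.5], [0.5, 0.8]]
center = cluster_center(original, reconstructions)
best_reconstruction = center[:-1]   # drop the appended error dimension
```

An alternative reading would pick the actual reconstruction closest to the center instead of the center itself; the sketch keeps the center, matching the wording of the step.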
Step 4.3. Apply inverse feature processing to the optimal reconstructed sample and output a complete traffic sample. To keep the traffic sample valid and to ease subsequent classification by the intrusion detection system, the sample features must be restored to their original dimensions, and the non-numerical features must recover their non-numerical representations according to the mapping relationship. First, the feature values compressed into the interval [0, 1] are expanded back to the original scale to obtain the inverse-normalized feature value $x^{*}$:
$$x^{*} = x'(x_{\max} - x_{\min}) + x_{\min} \tag{13}$$
Then, the one-hot codes of the non-numerical features are mapped back to their original values according to the mapping between feature attributes and codes used in preprocessing. Finally, the functional features that were not extracted in step 1 are combined with the reconstructed feature vector to obtain a complete feature sample.
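The inverse processing can be sketched as the mirror image of the preprocessing. The category order and value ranges are hypothetical, and decoding a reconstructed one-hot block by taking its largest entry is an assumption, since reconstructed values are generally not exact zeros and ones.

```python
def inverse_min_max(x_scaled, x_min, x_max):
    """Undo min-max scaling: x* = x'(x_max - x_min) + x_min."""
    return x_scaled * (x_max - x_min) + x_min


def one_hot_decode(vector, categories):
    """Map a (possibly non-binary) reconstructed one-hot block to its category."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return categories[best]


protocols = ["tcp", "udp", "icmp"]              # assumed encoding order
reconstructed = [0.1, 0.7, 0.2, 0.5]            # one-hot block + one scaled value
protocol = one_hot_decode(reconstructed[:3], protocols)
packet_len = inverse_min_max(reconstructed[3], x_min=100.0, x_max=500.0)
```

The minimum and maximum passed to `inverse_min_max` must be the same per-feature values that were used for normalization in step 1.2.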
Step 5. The intrusion detection system classifies the optimal reconstructed traffic sample and outputs the result as benign traffic or malicious traffic.
The effectiveness of the invention was verified experimentally. The intrusion detection system is implemented as an intrusion detection deep neural network model. The experimental data come from the public NSL-KDD and CICIDS2017 datasets. Adversarial sample generation methods such as FGSM, BIM, PGD, C&W, and IDSSAN are used to generate adversarial samples for testing the performance of the defense framework, and normal samples are drawn at random so that the ratio of normal samples to adversarial samples is 1:1. Details of the adversarial sample datasets used are shown in Table 2.
Table 2. Number of samples under each dataset
[Table 2 is provided as an image in the original publication.]
The effectiveness of the defense method is evaluated by adopting accuracy, recall, precision and F1 value in the experiment, and the specific calculation method is as follows:
(1) Accuracy (Accuracy). 1500 samples were randomly drawn in one dataset as benign samples, and the corresponding 1500 samples were drawn again for challenge sample fabrication. N represents the total number of samples and n=3000, tp is the number of samples correctly classified as malicious traffic and against malicious traffic, TN is the number of samples correctly classified as benign traffic, and the correctly identified classified sample ratio is calculated:
Figure BDA0004080749200000092
(2) Recall (Recall). The sample number ratio is consistent with (1), and the malicious traffic and the anti-malicious traffic ratio which are correctly classified in all correctly classified samples are calculated:
Figure BDA0004080749200000093
(3) Precision (Precision). The sample number ratio is consistent with (1), FP refers to the number of samples that a benign traffic is misclassified as a malicious traffic, and the ratio of the number of samples that are correctly classified as a malicious traffic and against a malicious traffic in all non-benign samples is calculated:
Figure BDA0004080749200000094
(4) F1 score. The balanced evaluation value between precision and recall:
Figure BDA0004080749200000095
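The four metrics can be computed from confusion-matrix counts as in the following minimal sketch. It assumes the standard definitions, with FN (malicious or adversarial samples misclassified as benign) introduced here because the text does not name it. The function name and the example counts are hypothetical and are not taken from Tables 3 or 4.

```python
def defense_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts.

    tp: malicious/adversarial samples classified as malicious
    tn: benign samples classified as benign
    fp: benign samples classified as malicious
    fn: malicious/adversarial samples classified as benign (assumed name)
    """
    n = tp + tn + fp + fn                      # total number of samples, N
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Illustrative counts only: 1500 benign + 1500 adversarial samples (N = 3000).
metrics = defense_metrics(tp=1400, tn=1350, fp=150, fn=100)
print(metrics)
```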
The experimental design includes a training process and a testing process. Training process: first, the feature weight calculation network, the sample class prediction probability calculation network, and the intrusion detection deep neural network are trained on benign traffic samples and malicious traffic samples. Second, for the manifold similarity calculation, two denoising autoencoders, trained on benign traffic and malicious traffic respectively, serve as low-dimensional manifold mapping models; each feature type of the benign and malicious traffic is mapped to a low-dimensional manifold, and the average manifold vectors are calculated. Finally, denoising autoencoders are trained on malicious traffic samples and benign traffic samples respectively to serve as generative models that reconstruct the features of input samples and reinforce their data distribution characteristics.
Testing process: first, multiple feature types are extracted from the sample under test, yielding high-contribution features, high-characterization features, and soft-label features. Second, each feature type is mapped to a low-dimensional manifold vector, its manifold similarity to the corresponding average manifold vectors of benign samples and malicious samples is calculated, and a similar-class label is output according to the similarity. Next, the sample is input into the model corresponding to the similar class for reconstruction, which reinforces the sample's data distribution characteristics and amplifies sample differences; the differences between the reconstructed samples and the original sample are calculated and combined with the original feature vector to form new vectors, single clustering is performed on these vectors, and the cluster center is selected as the optimal reconstructed sample. Finally, the intrusion detection deep neural network model classifies the reconstructed traffic sample, the evaluation metrics are calculated from the classification labels, and the defense performance of the method is verified.
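The similar-class decision in the testing process can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not name the similarity measure or how the per-feature similarities are combined, so cosine similarity and a simple sum over feature types are assumed here. The function names, feature-type keys, and vectors are hypothetical, and the low-dimensional vectors are taken as already produced by the manifold mapping models.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (an average manifold vector)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; assumed stand-in for the manifold similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_class(sample, benign_mean, malicious_mean):
    """Compare each feature type's manifold vector with both class averages."""
    benign_score = sum(cosine(sample[k], benign_mean[k]) for k in sample)
    malicious_score = sum(cosine(sample[k], malicious_mean[k]) for k in sample)
    return "benign" if benign_score >= malicious_score else "malicious"


# Hypothetical 2-dimensional manifold vectors for the three feature types.
benign_mean = {
    "high_contribution": mean_vector([[1.0, 0.0], [0.8, 0.2]]),
    "high_characterization": [0.9, 0.1],
    "soft_label": [1.0, 0.0],
}
malicious_mean = {
    "high_contribution": [0.1, 0.9],
    "high_characterization": [0.0, 1.0],
    "soft_label": [0.2, 0.8],
}
sample = {
    "high_contribution": [0.2, 0.7],
    "high_characterization": [0.1, 0.9],
    "soft_label": [0.4, 0.6],
}
label = similar_class(sample, benign_mean, malicious_mean)
print(label)
```

Summing the per-feature similarities is only one way to combine them; a per-feature vote would be an equally plausible reading of the text.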
The experiments were run on a computer and a GPU server. Computer configuration: Intel(R) Core(TM) i7-6700, 3.40 GHz CPU clock, 8 GB RAM, 64-bit Windows 10. Server configuration: GTX 1080Ti, 256 GB RAM, 64-bit Ubuntu Linux.
Experimental results: the results of the proposed defense method and of the comparison methods are shown in Tables 3 and 4.
Table 3. Comparison of defense results against adversarial malicious-traffic samples on the NSL-KDD dataset
Figure BDA0004080749200000101
Table 4. Comparison of defense results against adversarial malicious-traffic samples on the CICIDS2017 dataset
Figure BDA0004080749200000102
The experimental results show that:
(1) The intrusion detection model based on the deep neural network reaches a classification accuracy of 99.7% on normal traffic samples, but its performance drops markedly on the adversarial sample datasets, which shows that adversarial samples can mislead the classifier into misclassification. The invention defends well: it reaches accuracies of 97.3% and 96.4% on the NSL-KDD and CICIDS2017 datasets respectively, with recalls of 98.0% and 97.5%. The method filters out adversarial perturbations so that the model classifies adversarial samples correctly, which effectively limits the impact of adversarial perturbations on the detection accuracy of the intrusion detection system.
(2) On the NSL-KDD dataset, the two comparison methods reach accuracies of 94.0% and 93.5% and recalls of 90.8% and 93.8% respectively, all below the invention's 97.3% accuracy and 98.0% recall. On the CICIDS2017 dataset, the improved-model-architecture method reaches only 69.7% accuracy and the data-augmentation-based method 91.8%, both clearly below the invention's 96.4%. The method therefore defends against adversarial samples better than the comparison methods.
In summary, by extracting multiple feature types from network traffic samples, the invention effectively removes interference from redundant features, reduces the influence of adversarial perturbations, and lowers the attack success rate of adversarial samples. In addition, by calculating the manifold class similarity of the feature types and reconstructing sample features, it effectively addresses the problem of malicious traffic evading detection because the difference between benign and malicious samples is too small for the classifier to recognize. This improves the accuracy and robustness of the intrusion detection system under adversarial sample attack and further strengthens its defense against adversarial samples.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (7)

1. A method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types, characterized in that the method comprises the following steps:
step 1, performing feature preprocessing on an input network traffic sample: extracting the non-functional features, encoding the non-numerical features, and normalizing the numerical features;
step 2, extracting multiple feature types from the preprocessed traffic sample;
step 3, mapping each of the feature types extracted in step 2 to a low-dimensional manifold, calculating its similarity to the benign-sample manifold vectors and the malicious-sample manifold vectors, and determining the similar class;
step 4, inputting the preprocessed sample into the generative model corresponding to the similar class for reconstruction, so as to reinforce the data distribution characteristics of the sample and amplify sample differences, clustering all the reconstructed samples, and taking the cluster center point, after inverse processing, as the optimal reconstructed sample;
step 5, classifying the optimal reconstructed traffic sample with the intrusion detection system, and outputting a classification result of benign traffic or malicious traffic.
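Step 1 above can be illustrated with the following minimal sketch. The claim says only that non-numerical features are encoded and numerical features are normalized, so one-hot encoding and min-max normalization are assumptions made here; the function name, the field names, and the records are hypothetical.

```python
def preprocess(records, categorical, numeric):
    """Encode non-numerical fields and normalize numerical fields.

    records: list of dicts, one per traffic sample
    categorical: field names to one-hot encode (assumed encoding)
    numeric: field names to min-max normalize into [0, 1] (assumed scaling)
    """
    vocab = {c: sorted({r[c] for r in records}) for c in categorical}
    bounds = {c: (min(r[c] for r in records), max(r[c] for r in records))
              for c in numeric}
    rows = []
    for r in records:
        row = []
        for c in numeric:
            lo, hi = bounds[c]
            # A constant column carries no information; map it to 0.0.
            row.append((r[c] - lo) / (hi - lo) if hi > lo else 0.0)
        for c in categorical:
            row.extend(1.0 if r[c] == value else 0.0 for value in vocab[c])
        rows.append(row)
    return rows


# Hypothetical records with one numerical and one non-numerical field.
records = [
    {"duration": 0, "protocol_type": "tcp"},
    {"duration": 10, "protocol_type": "udp"},
]
rows = preprocess(records, categorical=["protocol_type"], numeric=["duration"])
print(rows)
```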
2. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 2, multiple feature types are extracted from the traffic sample, including high-contribution features, high-characterization features, and soft-label features.
3. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 2, a characterization capability calculation network is constructed: first, the Pearson correlation coefficient between each feature and the other features is calculated, and the top 20 features by value are extracted to form a feature matrix; then, a random recursive feature elimination algorithm selects the optimal combination for the feature vector of each dimension in the matrix.
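The correlation ranking in claim 3 can be sketched as below. This is an illustration under assumptions: features are ranked by the absolute value of the Pearson coefficient (the claim says only that they are taken in numerical order), and the function names and toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_correlated(columns, k=20):
    """For each feature, list the k other features most correlated with it.

    columns: dict mapping feature name -> list of values
    Returns a dict mapping feature name -> ranked list of feature names,
    i.e. one row of the feature matrix per feature.
    """
    matrix = {}
    for name, col in columns.items():
        scored = [(abs(pearson(col, other)), other_name)
                  for other_name, other in columns.items()
                  if other_name != name]
        scored.sort(key=lambda item: item[0], reverse=True)
        matrix[name] = [other_name for _, other_name in scored[:k]]
    return matrix


# Toy columns: "b" is a scaled copy of "a", "c" is only weakly related.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
matrix = top_correlated(columns, k=1)
print(matrix)
```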
4. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 2, the random recursive feature elimination algorithm performs the following procedure for the i-th vector (Figure FDA0004080749190000013) in the input feature matrix: first, according to a set random seed, a random value m is initialized in the range 0 to 20 and an optimal accuracy acc_best is initialized in the range 0 to 0.6, with the corresponding optimal feature combination r_best = null; second, the first m features are extracted from r_x to form a new feature subset s_m, a deep neural network is trained, and the accuracy acc_i of the training process is recorded; if acc_i > acc_best, then acc_best = acc_i and r_best = s_m; this loop repeats until the optimal subset r_best is selected as the optimal highly correlated feature set (Figure FDA0004080749190000012) of the current vector (Figure FDA0004080749190000011).
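The loop in claim 4 can be sketched as follows. This is a hedged illustration, not the patented implementation: training the deep neural network is replaced by a hypothetical `evaluate` callback that returns an accuracy, the number of rounds is an assumed parameter because the claim does not state the stopping rule, m is redrawn every round, and m starts at 1 rather than 0 to avoid an empty subset.

```python
import random


def random_rfe(ranked_features, evaluate, rounds=10, seed=0):
    """Random recursive feature elimination over one ranked feature vector.

    ranked_features: non-empty list of feature names, most correlated first
    evaluate: callable(subset) -> accuracy; stand-in for training the network
    """
    rng = random.Random(seed)               # the set random seed
    acc_best = rng.uniform(0.0, 0.6)        # initial optimal accuracy
    r_best = None                           # initial optimal combination (null)
    upper = min(20, len(ranked_features))
    for _ in range(rounds):
        m = rng.randint(1, upper)           # random prefix length
        s_m = ranked_features[:m]           # first m features
        acc_i = evaluate(s_m)
        if acc_i > acc_best:
            acc_best, r_best = acc_i, s_m
    return r_best, acc_best


# Toy stand-in for model training: accuracy grows with the subset size.
def toy_evaluate(subset):
    return 0.7 + 0.01 * len(subset)


features = ["feat_%d" % i for i in range(5)]
r_best, acc_best = random_rfe(features, toy_evaluate)
print(r_best, acc_best)
```

Because the toy accuracy always exceeds 0.6, the first round always replaces the random initial value; with a real model, `r_best` can remain `None` if no subset beats the initial accuracy.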
5. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 2, a characterization capability index is constructed to calculate the characterization capability of the i-th feature subset produced by the recursive feature elimination algorithm, with the formula (num_i/n + acc_i) / (2 × acc_i × num_i/n), where n is the number of sample features after preprocessing, and acc_i and num_i are the optimal accuracy and the number of characterization features corresponding to the i-th feature vector, respectively.
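A worked instance of the characterization capability index is given below. It assumes that the final term of the extracted formula is num_i divided by n, which makes the index the reciprocal of the harmonic mean of acc_i and num_i/n; the function name and the input values are hypothetical.

```python
def characterization_index(num_i, acc_i, n):
    """(num_i/n + acc_i) / (2 * acc_i * num_i/n), under the assumed reading.

    num_i: number of characterization features in the i-th subset
    acc_i: optimal accuracy for the i-th feature vector (must be non-zero)
    n: number of sample features after preprocessing
    """
    ratio = num_i / n
    return (ratio + acc_i) / (2 * acc_i * ratio)


# Illustrative values: 10 of 40 features kept, optimal accuracy 0.8.
index = characterization_index(num_i=10, acc_i=0.8, n=40)
print(index)
```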
6. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 3, manifold similarity is measured on the feature types of the current sample and the similar class is determined; specifically, the low-dimensional manifold of each feature type obtained by manifold mapping is compared, by manifold similarity calculation, with the corresponding average manifold of benign samples and the corresponding average manifold of malicious samples, and the similar-class label of the current sample is output according to the magnitude of the similarity values.
7. The method for defending an intrusion detection system against adversarial samples using manifold vectors of multiple feature types according to claim 1, wherein: in step 4, sample features are reconstructed according to the similar class to reinforce the data distribution characteristics of the sample and amplify sample differences; specifically, the preprocessed sample is first input into the denoising autoencoder corresponding to the similar class for reconstruction, then the differences between the 50 reconstructed traffic samples and the original sample are calculated and combined with the original sample feature vector, single clustering is performed on these vectors, and finally the cluster center point is selected as the optimal reconstructed sample.
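The selection step in claim 7 can be sketched as below. Several points are assumptions: "single clustering" is read as one cluster whose center is the centroid of the combined vectors, the optimal reconstructed sample is taken to be the reconstruction nearest to that centroid, and the reconstructions are supplied as plain lists rather than produced by a denoising autoencoder. The function name and the toy data (3 reconstructions instead of 50) are hypothetical.

```python
def optimal_reconstruction(original, reconstructions):
    """Pick the reconstruction closest to the single-cluster center.

    original: feature vector of the preprocessed sample
    reconstructions: list of reconstructed feature vectors
    """
    # Combine the original features with each reconstruction's difference.
    combined = [
        list(original) + [r - o for r, o in zip(rec, original)]
        for rec in reconstructions
    ]
    dim = len(combined[0])
    center = [sum(vec[j] for vec in combined) / len(combined)
              for j in range(dim)]

    def squared_distance(vec):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    best = min(range(len(combined)),
               key=lambda i: squared_distance(combined[i]))
    return reconstructions[best]


# Toy data: the middle reconstruction lies nearest the centroid.
original = [0.0, 0.0]
candidates = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.9]]
chosen = optimal_reconstruction(original, candidates)
print(chosen)
```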
CN202310123302.7A 2023-02-16 2023-02-16 Method for defending countersamples by using intrusion detection system with various characteristic manifold vectors Pending CN116318877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310123302.7A CN116318877A (en) 2023-02-16 2023-02-16 Method for defending countersamples by using intrusion detection system with various characteristic manifold vectors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310123302.7A CN116318877A (en) 2023-02-16 2023-02-16 Method for defending countersamples by using intrusion detection system with various characteristic manifold vectors

Publications (1)

Publication Number Publication Date
CN116318877A true CN116318877A (en) 2023-06-23

Family

ID=86784320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310123302.7A Pending CN116318877A (en) 2023-02-16 2023-02-16 Method for defending countersamples by using intrusion detection system with various characteristic manifold vectors

Country Status (1)

Country Link
CN (1) CN116318877A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009970A (en) * 2023-10-07 2023-11-07 华中科技大学 Method for generating malicious software countermeasure sample in blind feature scene and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009970A (en) * 2023-10-07 2023-11-07 华中科技大学 Method for generating malicious software countermeasure sample in blind feature scene and electronic equipment
CN117009970B (en) * 2023-10-07 2023-12-29 华中科技大学 Method for generating malicious software countermeasure sample in blind feature scene and electronic equipment

Similar Documents

Publication Publication Date Title
CN112784881B (en) Network abnormal flow detection method, model and system
CN110287983B (en) Single-classifier anomaly detection method based on maximum correlation entropy deep neural network
Mohammadi et al. A new deep learning approach for anomaly base IDS using memetic classifier
CN112217787B (en) Method and system for generating mock domain name training data based on ED-GAN
CN111353373A (en) Correlation alignment domain adaptive fault diagnosis method
CN113489685B (en) Secondary feature extraction and malicious attack identification method based on kernel principal component analysis
CN112560596B (en) Radar interference category identification method and system
CN113571067A (en) Voiceprint recognition countermeasure sample generation method based on boundary attack
CN116318877A (en) Method for defending countersamples by using intrusion detection system with various characteristic manifold vectors
CN113098862A (en) Intrusion detection method based on combination of hybrid sampling and expansion convolution
CN113269228A (en) Method, device and system for training graph network classification model and electronic equipment
CN111695611A (en) Bee colony optimization kernel extreme learning and sparse representation mechanical fault identification method
Guowei et al. Research on network intrusion detection method of power system based on random forest algorithm
Feng et al. A phishing webpage detection method based on stacked autoencoder and correlation coefficients
CN113627543A (en) Anti-attack detection method
Zhao et al. Training DHMMs of mine and clutter to minimize landmine detection errors
Ferrag et al. Generative adversarial networks-driven cyber threat intelligence detection framework for securing internet of things
CN117034112A (en) Malicious network traffic classification method based on sample enhancement and contrast learning
CN116684138A (en) DRSN and LSTM network intrusion detection method based on attention mechanism
CN116647844A (en) Vehicle-mounted network intrusion detection method based on stacking integration algorithm
CN116319033A (en) Network intrusion attack detection method, device, equipment and storage medium
Khan et al. Securing voice biometrics: One-shot learning approach for audio deepfake detection
CN113159181B (en) Industrial control system anomaly detection method and system based on improved deep forest
Wang et al. A hybrid cloud intrusion detection method based on SDAE and SVM
CN114760128A (en) Network abnormal flow detection method based on resampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination