CN116684138A - DRSN and LSTM network intrusion detection method based on attention mechanism - Google Patents
- Publication number
- CN116684138A (application number CN202310657316.7A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
- H04L63/1416—Event detection, e.g. attack signature detection (network security: detecting malicious traffic by monitoring network traffic)
- H04L63/1441—Countermeasures against malicious traffic
- H04L9/40—Network security protocols
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
Abstract
A network intrusion detection method based on an attention mechanism, a deep residual shrinkage network (DRSN) and long short-term memory (LSTM). The intrusion detection data set is preprocessed; a convolution module and an attention mechanism module extract traffic features from the input, yielding a feature matrix weighted by channel and spatial attention; the DRSN then extracts the spatial features. The DRSN allows deeper network structures to be stacked, mines deeper spatial feature information and removes noise through soft thresholding. The spatial features are fed into the LSTM for training to extract the time-sequence features in the data, and the weight parameters are updated continuously to obtain an optimized DRSN-LSTM intrusion detection model. By combining the DRSN's strength at extracting spatial features with the LSTM's strength at extracting time-sequence features, and building on the denoising capability of the DRSN, the method effectively improves the accuracy of intrusion detection, reduces the false alarm rate and strengthens the generalization capability of the model.
Description
Technical Field
The application belongs to the technical field of network security, and particularly relates to a DRSN and LSTM network intrusion detection method based on an attention mechanism.
Background
With the growth of the Internet and the continuous emergence of new network applications, networks have become a necessity of daily life. Massive network traffic data now exist on the Internet, and various network attacks pose a serious threat to cyberspace security. As an effective means of network protection, the intrusion detection system is an important tool for safeguarding networked property.
Most current network intrusion detection technologies adopt feature (signature) detection, whose detection range is limited. Such techniques are suitable only for relatively simple intrusion-behavior detection and basic defense management: facing a complex network environment, feature-based intrusion detection depends on a database of known attack signatures, matching events and traffic against that database to judge whether an attack exists, so unknown attacks cannot be detected. Other intrusion detection work adopts anomaly detection, attempting to learn a rule of normal behavior and to identify everything else as anomalous or intrusive; however, because of the huge data volume, excessive features, noisy data and redundant data, it suffers from low accuracy and a high false alarm rate.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the present application aims to provide a network intrusion detection method based on an attention mechanism (AM), a deep residual shrinkage network (DRSN) and long short-term memory (LSTM). Because features differ in importance, the AM is first used to weight the data features, so that important features are emphasized and unimportant features weakened; this reduces the feature dimensions participating in computation and increases training speed. The DRSN is adopted to extract the spatial features: on the basis of a residual network, it adds an attention mechanism and soft thresholding, so that the model attends better to important features, suppresses the influence of noise on the result as much as possible, and markedly improves the model's feature extraction capability. The DRSN extracts the spatial features of the data, while the LSTM extracts the time-sequence features; combining their advantages, the method uses the DRSN to extract spatial features and eliminate noisy data, and the LSTM to extract time-sequence features, fully learning the spatio-temporal features of the data and improving both model training speed and intrusion detection accuracy.
In order to achieve the above purpose, the technical scheme adopted by the application is as follows:
a network intrusion detection method of DRSN and LSTM based on attention mechanism includes the following steps:
step 1, preprocessing an intrusion detection data set to obtain a data set X;
the intrusion detection data set comprises network traffic data and class identification tags label, wherein the network traffic data comprises numerical data and tag data, and the class identification tags label are used for identifying intrusion categories; the preprocessing encodes intrusion categories into different numbers; the label data are coded and converted into numerical value data, states with different one-bit characteristic values are represented through a group of binary numbers, and then all the numerical value data are scaled to dimension [0,1] through normalization operation;
step 2, extracting the input traffic features with a convolution module and an attention mechanism module to obtain a feature matrix MS carrying channel and spatial attention;
step 3, adjusting the feature matrix MS into a feature matrix X3 so that it can serve as the input of a deep residual shrinkage network;
step 4, extracting the spatial features of the feature matrix X3 with the deep residual shrinkage network;
and 5, inputting the spatial features into the LSTM for training, extracting time sequence features in the data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model.
And 6, performing network intrusion detection by using the optimized DRSN-LSTM intrusion detection model, taking network traffic as input and taking whether intrusion behavior is output.
In one embodiment, the intrusion detection data set is the NSL-KDD data set, using the 41 classes of network traffic data and the class identification tag (label) in each of its connection records; the intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing no intrusion, denial-of-service intrusion, surveillance and other probing, illegal access from a remote machine, and illegal access by an ordinary user to local superuser privileges; the protocol type protocol_type, the service type service and the flag in the 41 classes of network traffic data form the tag data, which adopts one-hot encoding.
In one embodiment, for the binary classification problem, Normal is encoded as 1 and Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded as 1, 2, 3, 4 and 5 in turn.
In one embodiment, the one-dimensional vector data of step 1 are converted into a two-dimensional matrix format, that is, the intrusion detection data are converted into a gray-scale map. The conversion process is as follows: select one sample x from the data set X, expand x with a random normal-distribution function from n' columns to n'+m' columns, and reshape x into a p×q matrix, where p×q = n'+m'; repeat until all samples in X have been traversed. Here n' is the initial number of columns of the sample and m' is the number of expanded columns.
In one embodiment, the convolution module performs a convolution operation on the two-dimensional matrix, that is, takes the inner product of each local image patch and the convolution kernel matrix:

S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)

where W is the convolution kernel matrix, S is the matrix after convolution, and S(i,j) is the convolution value at row i, column j; the local image patch of each convolution operation must match the convolution kernel in size; x(i+m, j+n) is the local image information of sample x at offset (m, n) from position (i, j); and w(m,n) is an element of a convolution kernel of length m and width n.
In one embodiment, the attention mechanism module takes the form of a channel attention mechanism in series with a spatial attention mechanism; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
In one embodiment, in the channel attention mechanism, the input features of width W, height H and channel number C are reduced by width- and height-based average pooling and global maximum pooling to two 1×1×C feature vectors; the vectors are then processed by a shared multi-layer perceptron (MLP), summed and converted into a 1×1×C weight feature vector by a Sigmoid function, and finally multiplied with the input features to obtain MC, with the formula:
MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))
wherein: sigma is a nonlinear activation function Sigmoid, maxPool is maximum pooling, and AvgPool is average pooling;
the spatial attention mechanism respectively carries out global average pooling and global maximum pooling based on channels on MC in channel dimension, stacks the channel numbers of the formed feature graphs, and obtains MS through convolution layers and Sigmoid conversion, and the formula is as follows:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
In one embodiment, in step 3, the feature matrix MS is adjusted by performing two convolution operations on it, followed by batch normalization and ReLU activation, to obtain the feature matrix X3.
In one embodiment, in step 4, within the deep residual shrinkage network, the residual shrinkage module first performs a convolution operation on the feature matrix X3, then applies normalization and a ReLU activation function again to obtain a feature matrix X4; each element r of X4 is soft-thresholded to obtain a new feature matrix, denoted X5; X5 is convolved again, and the extracted abstract high-dimensional features are then reduced in dimension by global average pooling (GAP).
In one embodiment, the soft thresholding is formulated as follows:

y = r − τ, if r > τ;  y = 0, if −τ ≤ r ≤ τ;  y = r + τ, if r < −τ

where r is a feature before soft thresholding, τ is the threshold, and y is the element feature after soft thresholding. The threshold τ is obtained as follows: average the feature matrix X4 and then apply global average pooling to obtain a feature matrix A; feed A into the fully connected network of the residual shrinkage module, whose last layer is a Sigmoid function normalizing the output to between 0 and 1, yielding a weight coefficient α; multiply α by the feature of each channel of A to obtain the final threshold α×A, denoted τ.
Compared with the prior art, the application has the beneficial effects that:
the DRSN is applied to the field of intrusion detection for the first time, and a new method is provided for intrusion detection. The residual error module of the DRSN can be used for superposing a deeper network structure, mining deeper space characteristic information and removing noise through soft thresholding. The application combines the DRSN and the LSTM for the intrusion detection field for the first time, creatively provides a DRSN-LSTM intrusion detection model, combines the advantages of the extraction of the spatial features of the DRSN and the advantages of the extraction of the time sequence features of the LSTM, is based on the denoising capability of the DRSN, can effectively improve the accuracy of intrusion detection, reduce the false alarm rate of intrusion detection and strengthen the generalization capability of the model.
Drawings
Fig. 1 is a flow chart of intrusion detection.
Fig. 2 is a flow chart of an attention mechanism.
Fig. 3 is a residual shrinkage module model diagram.
Fig. 4 is a structural diagram of the LSTM.
Detailed Description
The application will be described in further detail with reference to the drawings and examples.
As shown in fig. 1, the present application is a network intrusion detection method of DRSN and LSTM based on an attention mechanism, including the following steps:
and step 1, preprocessing an intrusion detection data set to obtain a data set X.
The intrusion detection data set used in the application comprises network traffic data and class identification tags label, wherein the class identification tags label are used for identifying intrusion categories, and the network traffic data comprises numerical data and tag data.
The preprocessing of the application encodes the intrusion categories into different numbers for distinction; the tag data are encoded and converted into numerical data so that a group of binary digits represents the states of different one-bit feature values, and then all numerical data are scaled to the range [0,1] by a normalization operation.
In particular embodiments of the present application, the specific type of intrusion data set is not limited; an implementer may choose one or more of DARPA 98, DARPA 99, DARPA 2000, KDD99, NSL-KDD and IDS2018 according to actual needs. This example selects the NSL-KDD data set. In the NSL-KDD data set, each connection record contains 41 classes of network traffic data, a class identification tag and a difficulty-level tag. Among the 41 classes of network traffic data there are 9 classes of TCP-connection basic features, 13 classes of TCP-connection content features, 9 classes of time-related network traffic data and 10 classes of host-related network traffic data, giving 41 feature attributes in total. The difficulty-level tag mainly records how many learners label a given record correctly; it takes no part in the training of the present application and is discarded.
The application uses the 41 classes of network traffic data and the class identification tag (label). The intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing no intrusion, denial-of-service intrusion, surveillance and other probing, illegal access from a remote machine, and illegal access by an ordinary user to local superuser privileges. Among the 41 classes of network traffic data, the protocol type protocol_type, the service type service and the flag are character-type features, and these form the tag data of the application.
The class identification label only serves to check whether the final training result is consistent with the labeled attribute and does not participate in training, so the categories are simply converted into conventional numbers according to the conversion rules of Table 1.
TABLE 1

Intrusion category | Binary-classification code | Multi-classification code
Normal             | 1                          | 1
Dos                | 2                          | 2
Probing            | 2                          | 3
R2L                | 2                          | 4
U2R                | 2                          | 5

That is, for the binary classification problem, Normal is encoded as 1 while Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded as 1, 2, 3, 4 and 5 in turn.
The tag data, namely the protocol type protocol_type, the service type service and the flag, adopt one-hot encoding, whose aim is to let a group of binary digits represent the states of different one-bit feature values.

For one-hot encoding, each value of a discrete feature corresponds to one point of an N-dimensional space; encoding discrete features independently in this way makes distance calculations between features more reasonable. For example, the protocol_type attribute takes the network protocols TCP, UDP and ICMP: 100 may represent the TCP protocol, 010 the UDP protocol and 001 the ICMP protocol, and the remaining character features are treated likewise.
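The one-hot scheme described above can be sketched as follows; `one_hot_encode` and the category lists are illustrative names, not identifiers from the patent.

```python
def one_hot_encode(values, categories):
    """Map each categorical value to a binary indicator vector
    with a single 1 at the category's position (one-hot encoding)."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return encoded

protocols = ["tcp", "udp", "icmp"]
print(one_hot_encode(["tcp", "icmp"], protocols))  # [[1, 0, 0], [0, 0, 1]]
```

The service and flag attributes would be encoded the same way, each against its own category list.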
This implementation does not restrict how the training and test sets are divided; an implementer may split them according to actual needs. This implementation preferably uses 80% of the data for training and 20% for testing.
The normalization of numerical values eliminates the differences between data of different dimensions. To ensure the reliability of the training result, a minimum-maximum (min-max) normalization method is adopted to scale the features into the range [0,1]:

x_norm = (x − x_min) / (x_max − x_min)

where x is the original feature value, x_min is the feature minimum, x_max is the feature maximum, and x_norm is the normalized value.
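A minimal sketch of this min-max normalization (the function name and the constant-feature fallback are illustrative assumptions):

```python
def min_max_normalize(column):
    """Scale a list of numeric feature values into [0, 1] via min-max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant feature: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

print(min_max_normalize([0, 5, 10]))  # [0.0, 0.5, 1.0]
```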
Step 2: convert the one-dimensional vector data of step 1 into the two-dimensional matrix format required by the model, that is, convert the intrusion detection data into a gray-scale map.

Feature expansion is applied to each data sample by data filling, and the expanded data are converted into a two-dimensional matrix format. The conversion process is as follows: select one sample x from the data set X, expand x with a random normal-distribution function from n' columns to n'+m' columns, and reshape x into a p×q matrix, where p×q = n'+m'; repeat until all samples in X have been traversed. Here n' is the initial number of columns of the sample and m' is the number of expanded columns.
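The expand-and-reshape step might look like the sketch below. The use of standard-normal draws for the padding values, and all names, are assumptions: the patent only specifies expanding the sample with a random distribution function and reshaping it into a p×q matrix.

```python
import numpy as np

def to_gray_matrix(x, p, q, rng=None):
    """Pad a 1-D sample to p*q values with random draws, then reshape to p x q
    (i.e. turn one intrusion record into a small gray-scale image)."""
    rng = np.random.default_rng(0) if rng is None else rng
    pad = p * q - x.size
    if pad < 0:
        raise ValueError("p*q must be at least the sample length")
    expanded = np.concatenate([x, rng.standard_normal(pad)])
    return expanded.reshape(p, q)

img = to_gray_matrix(np.arange(41, dtype=float), 7, 7)  # 41 features -> 49 values
print(img.shape)  # (7, 7)
```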
Step 3: extract the input traffic features with the convolution module and the attention mechanism module to obtain a feature matrix MS carrying channel and spatial attention.
As shown in fig. 2, the attention mechanism borrows from the way humans allocate attention: it can selectively focus on part of all the available information while ignoring the rest. The attention mechanism here takes the form of a channel attention mechanism in series with a spatial attention mechanism.
In this step, the convolution module performs a convolution operation on the two-dimensional matrix, that is, takes the inner product of each local image patch and the convolution kernel matrix:

S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)

where W is the convolution kernel matrix, S is the matrix after convolution, and S(i,j) is the convolution value at row i, column j; the local image patch of each convolution operation must match the convolution kernel in size; x(i+m, j+n) is the local image information of sample x at offset (m, n) from position (i, j); and w(m,n) is an element of a convolution kernel of length m and width n.
The convolution operation can be performed multiple times; this implementation preferably performs it once to save computation time. Since the input image is the two-dimensional gray-scale map converted from the intrusion detection data set, the convolution depth is preferably set to 1 and the convolution kernel width to 3; these can be adjusted for different data sets.
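The inner-product convolution described above can be sketched as a "valid" 2-D cross-correlation, matching S(i,j) = Σ_m Σ_n x(i+m, j+n)·w(m,n); the names and the loop-based implementation are illustrative.

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D convolution: slide the kernel over the image and take the
    inner product of each local patch with the kernel matrix."""
    p, q = w.shape
    rows, cols = x.shape[0] - p + 1, x.shape[1] - q + 1
    s = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            s[i, j] = np.sum(x[i:i + p, j:j + q] * w)
    return s

x = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
w = np.array([[1., 0.], [0., 1.]])   # 2x2 kernel summing a diagonal pair
print(conv2d_valid(x, w))  # [[ 6.  8.] [12. 14.]]
```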
The attention mechanism module adopts a mode that a channel attention mechanism and a space attention mechanism are connected in series; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
The channel attention mechanism takes the feature matrix produced by the convolution module, of width W, height H and channel number C, as input. Width- and height-based average pooling and global maximum pooling reduce the features to two 1×1×C feature vectors, which are then passed through a shared multi-layer perceptron (MLP), summed, and converted into a 1×1×C weight feature vector by a Sigmoid function; finally, multiplication with the input features yields the output feature matrix MC of the channel attention mechanism:

MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))

where σ is the nonlinear activation function Sigmoid, MaxPool is maximum pooling, and AvgPool is average pooling.
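A minimal numeric sketch of the MC formula. The two-layer shared MLP with a ReLU hidden layer and the weight shapes `w1`, `w2` are assumptions for illustration; the patent only specifies a shared MLP.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """MC(X) = sigmoid(MLP(MaxPool(X)) + MLP(AvgPool(X))) * X for x of shape (C, H, W).
    Both poolings collapse H and W, leaving one value per channel."""
    max_pool = x.max(axis=(1, 2))        # (C,)
    avg_pool = x.mean(axis=(1, 2))       # (C,)

    def mlp(v):                          # shared two-layer MLP with ReLU hidden layer
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(max_pool) + mlp(avg_pool))  # (C,) channel weights
    return x * weights[:, None, None]

x = np.ones((4, 3, 3))
mc = channel_attention(x, np.zeros((2, 4)), np.zeros((4, 2)))
print(mc[0, 0, 0])  # 0.5 -- zero MLP weights give sigmoid(0) = 0.5 per channel
```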
Then the feature matrix MC is taken as the input of the spatial attention mechanism, and channel-based global average pooling and global maximum pooling are applied to MC in the channel dimension. The resulting feature maps are concatenated along the channel dimension, and the output feature matrix MS of the spatial attention module is finally generated through a convolution layer and a Sigmoid function, with the formula:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
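A simplified sketch of the MS formula. Replacing the convolution-based dimension reduction f with a 1×1 convolution (two scalar weights `k_avg`, `k_max`) is an assumption made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(mc, k_avg, k_max):
    """MS(X) = sigmoid(f[AvgPool(MC); MaxPool(MC)]) * MC for mc of shape (C, H, W).
    The channel-wise average and maximum maps are reduced to one attention map."""
    avg_map = mc.mean(axis=0)            # (H, W) channel-wise average
    max_map = mc.max(axis=0)             # (H, W) channel-wise maximum
    attn = sigmoid(k_avg * avg_map + k_max * max_map)   # 1x1-conv stand-in for f
    return mc * attn[None, :, :]

mc = np.ones((4, 3, 3))
ms = spatial_attention(mc, 0.0, 0.0)
print(ms[0, 0, 0])  # 0.5 -- zero kernel weights give a uniform sigmoid(0) map
```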
Applying the AM to an intrusion detection model amplifies useful feature information, suppresses useless feature information, and reduces the dimensionality and volume of the data, thereby improving detection efficiency and accuracy.
Step 4, adjust the feature matrix MS into a feature matrix X3 so that it can serve as the input of the deep residual shrinkage network.

Denote the feature matrix MS as X1. Two convolution operations yield the transformed feature matrix X2, and batch normalization (BN) followed by ReLU activation of X2 yields the feature matrix X3. The ReLU activation function is f(x) = max(0, x).
In processing the feature matrix X1, the convolution operation may be applied several times as needed to obtain X2, but no more than 4 times, to prevent overfitting from harming the final result.
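The convolution-BN-ReLU adjustment can be illustrated with a simplified batch normalization (without the learned scale and shift parameters, an assumption) and the ReLU defined in the text:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a feature matrix to zero mean and unit variance
    (simplified BN over the whole matrix, no learned scale/shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(x, 0.0)

x2 = np.array([[-2.0, 0.0], [2.0, 4.0]])
x3 = relu(batch_norm(x2))            # BN then ReLU, as in the adjustment step
print(abs(batch_norm(x2).mean()) < 1e-9)  # True -- zero mean after BN
```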
Step 5, extract the spatial features of the feature matrix X3 with the deep residual shrinkage network.

By introducing residual shrinkage modules, the deep residual shrinkage network strengthens the ability of a deep neural network to extract useful features from samples containing noise and redundancy, and is used to remove redundant features and improve the classification accuracy of the neural network model; the identity mapping of the residual network also eases back-propagation, reduces the difficulty of training the neural network and prevents gradient explosion.
In this step, the feature matrix X3 passes through a batch normalization (BN) layer and then through 3 residual shrinkage modules for noise elimination and effective feature extraction; the extracted abstract high-dimensional features are then reduced in dimension by a global average pooling (GAP) layer, which greatly reduces the training parameters and avoids overfitting, and the classification result is output through a fully connected layer.
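The global average pooling used here for dimension reduction simply collapses each channel's H×W feature map to its mean (function name illustrative):

```python
import numpy as np

def global_average_pooling(x):
    """Global average pooling: (C, H, W) -> (C,), one mean value per channel."""
    return x.mean(axis=(1, 2))

features = np.arange(24.0).reshape(2, 3, 4)   # 2 channels of 3x4 feature maps
print(global_average_pooling(features))  # [ 5.5 17.5]
```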
Like a conventional deep convolutional network, the residual shrinkage network includes convolution layers, pooling layers, bias terms, activation functions and a cross-entropy loss function. A bias term needs to be added between the convolution kernel and the feature map; its expression is:

y_j = Σ_{i∈M_j} x_i ∗ k_ij + b_j

where x_i is the input feature map of the i-th channel, y_j is the output feature map of the j-th channel, k is the convolution kernel, b_j is the bias of the j-th channel, and M_j is the set of channels used to compute the j-th output feature map.
Rejecting redundant features relies on soft thresholding, a key step in noise reduction algorithms: features whose absolute values are less than a certain threshold are set to zero, and features whose absolute values are greater than the threshold are shrunk toward zero.
The residual shrinkage module is shown in fig. 3. A two-dimensional convolution slides a filter over the two-dimensional image, multiplying and summing corresponding positions to extract image features. As this is the first layer, the input image size must be given. The convolution effect is mainly determined by the number, width, depth and stride of the convolution kernels. Since the input image is a gray-scale map converted from the intrusion detection data set, the convolution depth is set to 1. For generality of the model, the convolution kernel width is set to 3 to reduce model parameters; that is, in the residual shrinkage module of this step, the convolution is two-dimensional, the convolution depth is 1, and the convolution kernel width is 3. Soft thresholding of the extracted image features is performed by the residual shrinkage module, which is the core of the DRSN. Unlike an ordinary residual module, the residual shrinkage module embeds a sub-network that adaptively generates the threshold.
The specific implementation steps of the residual shrinkage module are as follows:
step 5-1: the residual contraction module first performs a convolution operation on the incoming feature matrix X3. And then carrying out normalization and ReLU activation functions again to obtain a new feature matrix X4.
Step 5-2: Each element r in the feature matrix X4 is soft thresholded to obtain a new feature matrix, denoted X5. The specific method is as follows:
Take the absolute value of the feature matrix X4 and then carry out global average pooling to obtain a feature matrix A; input the feature matrix A into the fully connected network of the residual shrinkage module, with a Sigmoid function as the final layer normalizing the output to between 0 and 1, obtaining a weight coefficient α; multiply α by the feature of each channel of the feature matrix A to obtain the final threshold α × A, denoted τ.
In this way, the threshold is the average of the absolute values of a feature map multiplied by a number between 0 and 1, so the threshold is guaranteed to be positive but not too large, and different samples have different thresholds. With these per-sample thresholds, the model can notice features irrelevant to the current task and set them to zero through soft thresholding, while noticing features relevant to the current task and preserving them.
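The adaptive threshold computation can be sketched in NumPy as follows. This is an illustration under stated assumptions: the scalar weight `w` and bias `b` are hypothetical stand-ins for the module's learned fully connected layer, and the feature sizes are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_threshold(x4, w, b):
    # Per-channel threshold of the residual shrinkage module:
    # GAP over the absolute features gives A; a fully connected stage
    # (here a hypothetical scalar weight w and bias b) with a final
    # Sigmoid gives alpha in (0, 1); the threshold is tau = alpha * A.
    A = np.abs(x4).mean(axis=(1, 2))   # |features| then global average pooling
    alpha = sigmoid(w * A + b)         # weight coefficient between 0 and 1
    return alpha * A                   # positive, and never larger than A

x4 = np.random.default_rng(0).standard_normal((3, 8, 8))  # toy 3-channel features
tau = adaptive_threshold(x4, w=1.0, b=0.0)
```

Note how the construction guarantees the property stated in the text: τ is always positive (A is an average of absolute values) and never exceeds A (α < 1).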
Each element r in the feature matrix X4 is soft thresholded as follows to obtain the feature matrix X5. The soft thresholding formula is:
y = r − τ, if r > τ
y = 0, if −τ ≤ r ≤ τ
y = r + τ, if r < −τ
where r is the feature before soft thresholding, τ is the threshold, and y is the element feature after soft thresholding.
The derivative of the soft thresholded output with respect to the input is:
∂y/∂r = 1, if |r| > τ
∂y/∂r = 0, if |r| ≤ τ
The formula above shows that the derivative of soft thresholding is either 1 or 0, which reduces the risk of the deep learning algorithm encountering gradient vanishing and gradient explosion.
Step 5-3: Convolve the feature matrix X5 again, then reduce the dimension of the extracted abstract high-dimensional features through global average pooling (Global Average Pooling, GAP), greatly reducing training parameters and avoiding overfitting.
Further, X5 may be passed into the two following residual shrinkage modules, which perform the same operations as the first. Finally, the soft-thresholded, spatially extracted feature X6 is obtained.
Step 6: converting the spatial features extracted by the DRSN into one-dimensional features, inputting the one-dimensional features into an LSTM model for training, extracting time sequence features in data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model.
LSTM was proposed to solve the gradient vanishing and gradient explosion problems of the recurrent neural network (Recurrent Neural Network, RNN). As shown in FIG. 4, compared with the RNN, the LSTM uses memory cells in the hidden layer, mainly composed of a forget gate, an input gate, an output gate and a self-connected memory cell. The equations for the LSTM forget gate, input gate, output gate and memory cell are as follows:
Forget gate: f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Input gate: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Candidate memory cell: C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
Memory cell: C_t = f_t * C_{t-1} + i_t * C̃_t
Output gate: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Output: h_t = o_t * tanh(C_t)
wherein f, i, o and C respectively denote the forget gate, the input gate, the output gate and the memory cell output; W_f, W_i, W_c, W_o are weight matrices; b_f, b_i, b_c, b_o are bias vectors; and σ is the Sigmoid function.
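One LSTM time step following these gate equations can be sketched in NumPy as follows. The hidden size, input size and random weights are hypothetical; a trained model would learn W and b.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM time step. W maps [h_{t-1}, x_t] to the stacked
    # pre-activations of the forget, input, candidate and output gates;
    # b is the stacked bias vector.
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])              # forget gate f_t
    i = sigmoid(z[H:2*H])            # input gate i_t
    c_tilde = np.tanh(z[2*H:3*H])    # candidate memory cell
    o = sigmoid(z[3*H:4*H])          # output gate o_t
    c = f * c_prev + i * c_tilde     # memory cell update C_t
    h = o * np.tanh(c)               # hidden state output h_t
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                          # hypothetical hidden and input sizes
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, b)
```

Since h_t = o_t * tanh(C_t) and both factors lie in (-1, 1) componentwise, every entry of the hidden state is bounded in magnitude by 1.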
In this step, the training process is as follows:
step 6-1: the input feature vector is acquired through an input gate in the LSTM, and then whether to forget the data is determined through a forget gate.
Step 6-2: and calculating errors of each layer through the training set classification result, and transmitting the errors to the upper layer.
Step 6-3: and calculating the gradient of each weight according to the calculated error term.
Step 6-4: the weights of the layers are updated according to the gradient. And optimizing and screening out the parameter combination of the DRSN-LSTM intrusion detection model in iteration, and finding out the optimal parameters.
Different iteration times can be selected according to computer configuration and actual needs, in the embodiment, 100 times are iterated in total, and the parameter combination of the DRSN-LSTM intrusion detection model is optimized and screened in the iteration to find the optimal parameter.
Step 7: and performing intrusion detection test on the trained DRSN-LSTM intrusion detection model on the test set to obtain a classification result and evaluate the performance of the model. Finally, the network intrusion detection can be performed by using an optimized DRSN-LSTM intrusion detection model, taking network traffic as input and taking whether intrusion behavior is output.
And performing intrusion detection test on the test set by using the trained DRSN-LSTM intrusion detection model. The evaluation index used in the implementation is the accuracy, the precision, the recall rate and the F1 value.
Recall (Recall): the proportion of abnormal samples correctly identified as abnormal to the total number of abnormal samples, also referred to as the true positive rate (True Positive Rate) or sensitivity (Sensitivity). The formula is: Recall = TP / (TP + FN).
Precision (Precision): the proportion of samples correctly identified as abnormal by the classifier to the number of samples predicted as abnormal. The formula is: Precision = TP / (TP + FP).
Accuracy (Accuracy): the proportion of correctly classified samples to the total number of samples, i.e., the proportion of samples whose predicted result matches the actual result. The formula is: Accuracy = (TP + TN) / (TP + TN + FP + FN).
F1 Score (F1 Score): a comprehensive index that considers both recall and precision; it is the harmonic mean of recall and precision, used to measure the overall performance of the classifier. The formula is: F1 = 2 × Precision × Recall / (Precision + Recall).
wherein: TP is true positive data, i.e., predicted as attack data and actually attack data; TN is true negative data, i.e., predicted as normal data and actually normal data; FP is false positive data, i.e., predicted as attack data but actually normal data; FN is false negative data, i.e., predicted as normal data but actually attack data. In this implementation, the accuracy reaches 97.56%, the precision reaches 74.68%, the recall reaches 99.65%, and the F1 value reaches 85.43%.
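The four metrics above can be computed directly from confusion-matrix counts; the sketch below uses hypothetical counts for illustration, not the embodiment's results.

```python
def recall(tp, fn):
    # proportion of actual attacks that were detected
    return tp / (tp + fn)

def precision(tp, fp):
    # proportion of predicted attacks that really were attacks
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    # proportion of all samples classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(p, r):
    # harmonic mean of precision p and recall r
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts (illustrative only)
tp, tn, fp, fn = 90, 80, 10, 20
p, r = precision(tp, fp), recall(tp, fn)
```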
Compared with the prior art, the present application reasonably and efficiently combines the attention mechanism, the DRSN and the LSTM model, improving the detection accuracy and generalization capability of the model.
The foregoing is only a preferred embodiment of the application, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the application.
Claims (10)
1. A network intrusion detection method based on attention-mechanism DRSN and LSTM, characterized by comprising the following steps:
step 1, preprocessing an intrusion detection data set to obtain a data set X;
the intrusion detection data set comprises network traffic data and class identification labels label, wherein the network traffic data comprises numerical data and tag data, and the class identification labels label are used for identifying intrusion categories; the preprocessing encodes intrusion categories into different numbers; the tag data are encoded and converted into numerical data, with each distinct feature value represented by a group of binary numbers, and then all numerical data are scaled to the interval [0,1] through a normalization operation;
step 2, extracting input flow characteristics by using a convolution module and an attention mechanism module to obtain a characteristic matrix MS with a channel and a spatial attention mechanism;
step 3, adjusting the feature matrix MS into a feature matrix X3 so that it can be used as the input of a deep residual shrinkage network;
step 4, extracting the spatial features of the feature matrix X3 by using the deep residual shrinkage network;
step 5, inputting the spatial features into LSTM for training, extracting time sequence features in the data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model;
and step 6, performing network intrusion detection by using the optimized DRSN-LSTM intrusion detection model, taking network traffic as input and whether there is intrusion behavior as output.
2. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein the intrusion detection data set is the NSL-KDD data set, using the 41 kinds of network traffic data and the one class identification label label in each connection record thereof; the intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing non-intrusive behavior, denial of service attacks, surveillance and other probing, illegal access from a remote machine, and illegal access of an ordinary user to local superuser privileges; the protocol type protocol_type, the service type service and the flag in the 41 kinds of network traffic data form the tag data, and the tag data adopts One-Hot encoding.
3. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism of claim 2, wherein for the two-classification problem, Normal is encoded as 1 and Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded sequentially as 1, 2, 3, 4, 5.
4. The network intrusion detection method based on DRSN and LSTM of claim 1, wherein the one-dimensional vector data in step 1 is converted into a two-dimensional matrix format, that is, the intrusion detection data is converted into a gray-scale map, and the conversion process is as follows: select one sample x from the data set X, expand x from n' columns to n'+m' columns using a random normal distribution function, and reshape x into a p × q matrix, where p × q = n'+m'; repeat until all samples in X are traversed, where n' is the initial number of columns of the sample and m' is the number of expanded columns.
5. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein the convolution module performs a convolution operation on the two-dimensional matrix, i.e., takes the inner product of a local image and the convolution kernel matrix, as follows:
S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)
wherein W is the convolution kernel matrix, S is the matrix after convolution, S(i,j) is the convolution value at row i and column j after convolution, the size of the local image in the convolution operation must be consistent with that of the convolution kernel, x(i+m, j+n) is the local image information of sample x at row i+m and column j+n, and w(m,n) is the convolution kernel of length m and width n.
6. The network intrusion detection method according to claim 1, wherein the attention mechanism module adopts a mode that a channel attention mechanism is connected in series with a spatial attention mechanism; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
7. The network intrusion detection method according to claim 6, wherein in the channel attention mechanism, features of width W, height H and channel number C are reduced, through average pooling and global maximum pooling over width and height respectively, to 2 feature vectors of size 1×1×C; these pass through a shared multi-layer perceptron MLP, are added, and are converted into a 1×1×C weight feature vector through a Sigmoid function, and finally MC is obtained by multiplication with the input feature, with the formula:
MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))
wherein: sigma is a nonlinear activation function Sigmoid, maxPool is maximum pooling, and AvgPool is average pooling;
the spatial attention mechanism performs channel-wise global average pooling and global maximum pooling on MC along the channel dimension, stacks the resulting feature maps along the channel axis, and obtains MS through a convolution layer and Sigmoid conversion, with the formula:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
8. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein in step 3, the feature matrix MS is adjusted by performing the convolution operation twice, followed by batch normalization and ReLU activation, to obtain the feature matrix X3.
9. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1 or 8, wherein in step 4, in the deep residual shrinkage network, the residual shrinkage module first performs a convolution operation on the feature matrix X3, then applies normalization and the ReLU activation function again to obtain a feature matrix X4; each element r in the feature matrix X4 is subjected to soft thresholding to obtain a new feature matrix, denoted X5; the feature matrix X5 is convolved again, and the extracted abstract high-dimensional features are reduced in dimension by global average pooling (Global Average Pooling, GAP).
10. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism of claim 9, wherein the soft thresholding formula is as follows:
y = r − tau, if r > tau; y = 0, if −tau ≤ r ≤ tau; y = r + tau, if r < −tau
wherein r is the feature before soft thresholding, tau is the threshold, and y is the element feature after soft thresholding; the threshold tau is obtained as follows: take the absolute value of the feature matrix X4 and then carry out global average pooling to obtain a feature matrix A; input the feature matrix A into the fully connected network of the residual shrinkage module, with a Sigmoid function as the last layer normalizing the output to between 0 and 1, obtaining a weight coefficient alpha; multiply alpha by the feature of each channel of the feature matrix A to obtain the final threshold alpha × A, denoted tau.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310657316.7A CN116684138A (en) | 2023-06-05 | 2023-06-05 | DRSN and LSTM network intrusion detection method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116684138A true CN116684138A (en) | 2023-09-01 |
Family
ID=87786731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310657316.7A Pending CN116684138A (en) | 2023-06-05 | 2023-06-05 | DRSN and LSTM network intrusion detection method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116684138A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117411684A (en) * | 2023-10-17 | 2024-01-16 | 国网新疆电力有限公司营销服务中心(资金集约中心、计量中心) | Industrial control network intrusion detection method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||