CN116684138A - DRSN and LSTM network intrusion detection method based on attention mechanism - Google Patents
- Publication number
- CN116684138A (application number CN202310657316.7A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
- H04L63/1416—Event detection, e.g. attack signature detection (network security: detecting malicious traffic by monitoring network traffic)
- H04L63/1441—Countermeasures against malicious traffic
- H04L9/40—Network security protocols
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
Abstract
A network intrusion detection method based on an attention mechanism, a deep residual shrinkage network (DRSN) and long short-term memory (LSTM). The intrusion detection data set is preprocessed; a convolution module and an attention mechanism module extract traffic features from the input, yielding a feature matrix weighted by channel and spatial attention; the DRSN then extracts the spatial features. The DRSN allows deeper network structures to be stacked, mines deeper spatial feature information and removes noise through soft thresholding. The spatial features are fed into the LSTM for training to extract the time-sequence features in the data, and the weight parameters are updated continuously to obtain an optimized DRSN-LSTM intrusion detection model. By combining the DRSN's strength at extracting spatial features with the LSTM's strength at extracting time-sequence features, and building on the denoising capability of the DRSN, the method effectively improves the accuracy of intrusion detection, reduces the false alarm rate and strengthens the generalization capability of the model.
Description
Technical Field
The application belongs to the technical field of network security, and particularly relates to a DRSN and LSTM network intrusion detection method based on an attention mechanism.
Background
With the growth of the Internet and the continuous emergence of new network applications, networks have become a necessity of daily life. Massive network traffic data now exist on the Internet, and various network attacks pose a serious threat to cyberspace security. As an effective means of network protection, the intrusion detection system is an important tool for safeguarding networked property.
Most current network intrusion detection technologies adopt feature (signature) detection, whose detection range is limited. Such techniques are suitable only for relatively simple intrusion-behavior detection and basic defense management: facing a complex network environment, feature-based intrusion detection depends on a database of known attack signatures, matching events and traffic against that database to judge whether an attack exists, so unknown attacks cannot be detected. Other intrusion detection work adopts anomaly detection, attempting to learn a rule of normal behavior and to identify everything else as anomalous or intrusive; however, because of the huge data volume, excessive features, noisy data and redundant data, it suffers from low accuracy and a high false alarm rate.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the present application aims to provide a network intrusion detection method based on an attention mechanism (AM), a deep residual shrinkage network (DRSN) and long short-term memory (LSTM). Because features differ in importance, the AM is first used to weight the data features, so that important features are emphasized and unimportant features weakened; this reduces the feature dimensions participating in computation and increases training speed. The DRSN is adopted to extract the spatial features: on the basis of a residual network, it adds an attention mechanism and soft thresholding, so that the model attends better to important features, suppresses the influence of noise on the result as much as possible, and markedly improves the model's feature extraction capability. The DRSN extracts the spatial features of the data, while the LSTM extracts the time-sequence features; combining their advantages, the method uses the DRSN to extract spatial features and eliminate noisy data, and the LSTM to extract time-sequence features, fully learning the spatio-temporal features of the data and improving both model training speed and intrusion detection accuracy.
In order to achieve the above purpose, the technical scheme adopted by the application is as follows:
a network intrusion detection method of DRSN and LSTM based on attention mechanism includes the following steps:
step 1, preprocessing an intrusion detection data set to obtain a data set X;
the intrusion detection data set comprises network traffic data and class identification tags label, wherein the network traffic data comprises numerical data and tag data, and the class identification tags label are used for identifying intrusion categories; the preprocessing encodes intrusion categories into different numbers; the label data are coded and converted into numerical value data, states with different one-bit characteristic values are represented through a group of binary numbers, and then all the numerical value data are scaled to dimension [0,1] through normalization operation;
step 2, extracting the input traffic features with a convolution module and an attention mechanism module to obtain a feature matrix MS carrying channel and spatial attention;
step 3, adjusting the feature matrix MS into a feature matrix X3 so that it can serve as the input of a deep residual shrinkage network;
step 4, extracting the spatial features of the feature matrix X3 with the deep residual shrinkage network;
and 5, inputting the spatial features into the LSTM for training, extracting time sequence features in the data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model.
And 6, performing network intrusion detection by using the optimized DRSN-LSTM intrusion detection model, taking network traffic as input and taking whether intrusion behavior is output.
In one embodiment, the intrusion detection data set is the NSL-KDD data set, using the 41 classes of network traffic data and the class identification tag (label) in each of its connection records; the intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing no intrusion, denial-of-service intrusion, surveillance and other probing, illegal access from a remote machine, and illegal access by an ordinary user to local superuser privileges; the protocol type protocol_type, the service type service and the flag in the 41 classes of network traffic data form the tag data, which adopts one-hot encoding.
In one embodiment, for the binary classification problem, Normal is encoded as 1 and Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded as 1, 2, 3, 4 and 5 in turn.
In one embodiment, the one-dimensional vector data of step 1 are converted into a two-dimensional matrix format, that is, the intrusion detection data are converted into a gray-scale map. The conversion process is as follows: select one sample x from the data set X, expand x with a random normal-distribution function from n' columns to n'+m' columns, and reshape x into a p×q matrix, where p×q = n'+m'; repeat until all samples in X have been traversed. Here n' is the initial number of columns of the sample and m' is the number of expanded columns.
In one embodiment, the convolution module performs a convolution operation on the two-dimensional matrix, that is, takes the inner product of each local image patch and the convolution kernel matrix:

S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)

where W is the convolution kernel matrix, S is the matrix after convolution, and S(i,j) is the convolution value at row i, column j; the local image patch of each convolution operation must match the convolution kernel in size; x(i+m, j+n) is the local image information of sample x at offset (m, n) from position (i, j); and w(m,n) is an element of a convolution kernel of length m and width n.
In one embodiment, the attention mechanism module takes the form of a channel attention mechanism in series with a spatial attention mechanism; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
In one embodiment, in the channel attention mechanism, the input features of width W, height H and channel number C are reduced by width- and height-based average pooling and global maximum pooling to two 1×1×C feature vectors; the vectors are then processed by a shared multi-layer perceptron (MLP), summed and converted into a 1×1×C weight feature vector by a Sigmoid function, and finally multiplied with the input features to obtain MC, with the formula:
MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))
wherein: sigma is a nonlinear activation function Sigmoid, maxPool is maximum pooling, and AvgPool is average pooling;
the spatial attention mechanism respectively carries out global average pooling and global maximum pooling based on channels on MC in channel dimension, stacks the channel numbers of the formed feature graphs, and obtains MS through convolution layers and Sigmoid conversion, and the formula is as follows:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
In one embodiment, in step 3, the feature matrix MS is adjusted by performing two convolution operations on it, followed by batch normalization and ReLU activation, to obtain the feature matrix X3.
In one embodiment, in step 4, within the deep residual shrinkage network, the residual shrinkage module first performs a convolution operation on the feature matrix X3, then applies normalization and a ReLU activation function again to obtain a feature matrix X4; each element r of X4 is soft-thresholded to obtain a new feature matrix, denoted X5; X5 is convolved again, and the extracted abstract high-dimensional features are then reduced in dimension by global average pooling (GAP).
In one embodiment, the soft thresholding is formulated as follows:

y = r − τ, if r > τ;  y = 0, if −τ ≤ r ≤ τ;  y = r + τ, if r < −τ

where r is a feature before soft thresholding, τ is the threshold, and y is the element feature after soft thresholding. The threshold τ is obtained as follows: average the feature matrix X4 and then apply global average pooling to obtain a feature matrix A; feed A into the fully connected network of the residual shrinkage module, whose last layer is a Sigmoid function normalizing the output to between 0 and 1, yielding a weight coefficient α; multiply α by the feature of each channel of A to obtain the final threshold α×A, denoted τ.
Compared with the prior art, the application has the beneficial effects that:
the DRSN is applied to the field of intrusion detection for the first time, and a new method is provided for intrusion detection. The residual error module of the DRSN can be used for superposing a deeper network structure, mining deeper space characteristic information and removing noise through soft thresholding. The application combines the DRSN and the LSTM for the intrusion detection field for the first time, creatively provides a DRSN-LSTM intrusion detection model, combines the advantages of the extraction of the spatial features of the DRSN and the advantages of the extraction of the time sequence features of the LSTM, is based on the denoising capability of the DRSN, can effectively improve the accuracy of intrusion detection, reduce the false alarm rate of intrusion detection and strengthen the generalization capability of the model.
Drawings
Fig. 1 is a flow chart of intrusion detection.
Fig. 2 is a flow chart of an attention mechanism.
Fig. 3 is a residual shrinkage module model diagram.
Fig. 4 is a structural diagram of the LSTM.
Detailed Description
The application will be described in further detail with reference to the drawings and examples.
As shown in fig. 1, the present application is a network intrusion detection method of DRSN and LSTM based on an attention mechanism, including the following steps:
and step 1, preprocessing an intrusion detection data set to obtain a data set X.
The intrusion detection data set used in the application comprises network traffic data and class identification tags label, wherein the class identification tags label are used for identifying intrusion categories, and the network traffic data comprises numerical data and tag data.
The preprocessing of the application encodes the intrusion categories into different numbers for distinction; the tag data are encoded and converted into numerical data so that a group of binary digits represents the states of different one-bit feature values, and then all numerical data are scaled to the range [0,1] by a normalization operation.
In particular embodiments of the present application, the specific type of intrusion data set is not limited; an implementer may choose one or more of DARPA 98, DARPA 99, DARPA 2000, KDD99, NSL-KDD and IDS2018 according to actual needs. This example selects the NSL-KDD data set. In the NSL-KDD data set, each connection record contains 41 classes of network traffic data, a class identification tag and a difficulty-level tag. Among the 41 classes of network traffic data there are 9 classes of TCP-connection basic features, 13 classes of TCP-connection content features, 9 classes of time-related network traffic data and 10 classes of host-related network traffic data, giving 41 feature attributes in total. The difficulty-level tag mainly records how many learners label a given record correctly; it takes no part in the training of the present application and is discarded.
The application uses the 41 classes of network traffic data and the class identification tag (label). The intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing no intrusion, denial-of-service intrusion, surveillance and other probing, illegal access from a remote machine, and illegal access by an ordinary user to local superuser privileges. Among the 41 classes of network traffic data, the protocol type protocol_type, the service type service and the flag are character-type features, and these form the tag data of the application.
The class identification label only serves to check whether the final training result is consistent with the labeled attribute and does not participate in training, so the categories are simply converted into conventional numbers according to the conversion rules of Table 1.
TABLE 1

Intrusion category | Binary-classification code | Multi-classification code
Normal             | 1                          | 1
Dos                | 2                          | 2
Probing            | 2                          | 3
R2L                | 2                          | 4
U2R                | 2                          | 5

That is, for the binary classification problem, Normal is encoded as 1 while Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded as 1, 2, 3, 4 and 5 in turn.
The tag data, namely the protocol type protocol_type, the service type service and the flag, adopt one-hot encoding, whose aim is to let a group of binary digits represent the states of different one-bit feature values.

For one-hot encoding, each value of a discrete feature corresponds to one point of an N-dimensional space; encoding discrete features independently in this way makes distance calculations between features more reasonable. For example, the protocol_type attribute takes the network protocols TCP, UDP and ICMP: 100 may represent the TCP protocol, 010 the UDP protocol and 001 the ICMP protocol, and the remaining character features are treated likewise.
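The one-hot scheme described above can be sketched as follows; `one_hot_encode` and the category lists are illustrative names, not identifiers from the patent.

```python
def one_hot_encode(values, categories):
    """Map each categorical value to a binary indicator vector
    with a single 1 at the category's position (one-hot encoding)."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return encoded

protocols = ["tcp", "udp", "icmp"]
print(one_hot_encode(["tcp", "icmp"], protocols))  # [[1, 0, 0], [0, 0, 1]]
```

The service and flag attributes would be encoded the same way, each against its own category list.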
This implementation does not restrict how the training and test sets are divided; an implementer may split them according to actual needs. This implementation preferably uses 80% of the data for training and 20% for testing.
The normalization of numerical values eliminates the differences between data of different dimensions. To ensure the reliability of the training result, a minimum-maximum (min-max) normalization method is adopted to scale the features into the range [0,1]:

x_norm = (x − x_min) / (x_max − x_min)

where x is the original feature value, x_min is the feature minimum, x_max is the feature maximum, and x_norm is the normalized value.
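A minimal sketch of this min-max normalization (the function name and the constant-feature fallback are illustrative assumptions):

```python
def min_max_normalize(column):
    """Scale a list of numeric feature values into [0, 1] via min-max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant feature: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

print(min_max_normalize([0, 5, 10]))  # [0.0, 0.5, 1.0]
```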
Step 2: convert the one-dimensional vector data of step 1 into the two-dimensional matrix format required by the model, that is, convert the intrusion detection data into a gray-scale map.

Feature expansion is applied to each data sample by data filling, and the expanded data are converted into a two-dimensional matrix format. The conversion process is as follows: select one sample x from the data set X, expand x with a random normal-distribution function from n' columns to n'+m' columns, and reshape x into a p×q matrix, where p×q = n'+m'; repeat until all samples in X have been traversed. Here n' is the initial number of columns of the sample and m' is the number of expanded columns.
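The expand-and-reshape step might look like the sketch below. The use of standard-normal draws for the padding values, and all names, are assumptions: the patent only specifies expanding the sample with a random distribution function and reshaping it into a p×q matrix.

```python
import numpy as np

def to_gray_matrix(x, p, q, rng=None):
    """Pad a 1-D sample to p*q values with random draws, then reshape to p x q
    (i.e. turn one intrusion record into a small gray-scale image)."""
    rng = np.random.default_rng(0) if rng is None else rng
    pad = p * q - x.size
    if pad < 0:
        raise ValueError("p*q must be at least the sample length")
    expanded = np.concatenate([x, rng.standard_normal(pad)])
    return expanded.reshape(p, q)

img = to_gray_matrix(np.arange(41, dtype=float), 7, 7)  # 41 features -> 49 values
print(img.shape)  # (7, 7)
```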
Step 3: extract the input traffic features with the convolution module and the attention mechanism module to obtain a feature matrix MS carrying channel and spatial attention.
As shown in fig. 2, the attention mechanism borrows from the way humans allocate attention: it can selectively focus on part of all the available information while ignoring the rest. The attention mechanism here takes the form of a channel attention mechanism in series with a spatial attention mechanism.
In this step, the convolution module performs a convolution operation on the two-dimensional matrix, that is, takes the inner product of each local image patch and the convolution kernel matrix:

S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)

where W is the convolution kernel matrix, S is the matrix after convolution, and S(i,j) is the convolution value at row i, column j; the local image patch of each convolution operation must match the convolution kernel in size; x(i+m, j+n) is the local image information of sample x at offset (m, n) from position (i, j); and w(m,n) is an element of a convolution kernel of length m and width n.
The convolution operation can be performed multiple times; this implementation preferably performs it once to save computation time. Since the input image is the two-dimensional gray-scale map converted from the intrusion detection data set, the convolution depth is preferably set to 1 and the convolution kernel width to 3; these can be adjusted for different data sets.
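The inner-product convolution described above can be sketched as a "valid" 2-D cross-correlation, matching S(i,j) = Σ_m Σ_n x(i+m, j+n)·w(m,n); the names and the loop-based implementation are illustrative.

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D convolution: slide the kernel over the image and take the
    inner product of each local patch with the kernel matrix."""
    p, q = w.shape
    rows, cols = x.shape[0] - p + 1, x.shape[1] - q + 1
    s = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            s[i, j] = np.sum(x[i:i + p, j:j + q] * w)
    return s

x = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
w = np.array([[1., 0.], [0., 1.]])   # 2x2 kernel summing a diagonal pair
print(conv2d_valid(x, w))  # [[ 6.  8.] [12. 14.]]
```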
The attention mechanism module adopts a mode that a channel attention mechanism and a space attention mechanism are connected in series; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
The channel attention mechanism takes the feature matrix produced by the convolution module, of width W, height H and channel number C, as input. Width- and height-based average pooling and global maximum pooling reduce the features to two 1×1×C feature vectors, which are then passed through a shared multi-layer perceptron (MLP), summed, and converted into a 1×1×C weight feature vector by a Sigmoid function; finally, multiplication with the input features yields the output feature matrix MC of the channel attention mechanism:

MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))

where σ is the nonlinear activation function Sigmoid, MaxPool is maximum pooling, and AvgPool is average pooling.
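A minimal numeric sketch of the MC formula. The two-layer shared MLP with a ReLU hidden layer and the weight shapes `w1`, `w2` are assumptions for illustration; the patent only specifies a shared MLP.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """MC(X) = sigmoid(MLP(MaxPool(X)) + MLP(AvgPool(X))) * X for x of shape (C, H, W).
    Both poolings collapse H and W, leaving one value per channel."""
    max_pool = x.max(axis=(1, 2))        # (C,)
    avg_pool = x.mean(axis=(1, 2))       # (C,)

    def mlp(v):                          # shared two-layer MLP with ReLU hidden layer
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(max_pool) + mlp(avg_pool))  # (C,) channel weights
    return x * weights[:, None, None]

x = np.ones((4, 3, 3))
mc = channel_attention(x, np.zeros((2, 4)), np.zeros((4, 2)))
print(mc[0, 0, 0])  # 0.5 -- zero MLP weights give sigmoid(0) = 0.5 per channel
```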
Then the feature matrix MC is taken as the input of the spatial attention mechanism, and channel-based global average pooling and global maximum pooling are applied to MC in the channel dimension. The resulting feature maps are concatenated along the channel dimension, and the output feature matrix MS of the spatial attention module is finally generated through a convolution layer and a Sigmoid function, with the formula:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
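A simplified sketch of the MS formula. Replacing the convolution-based dimension reduction f with a 1×1 convolution (two scalar weights `k_avg`, `k_max`) is an assumption made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(mc, k_avg, k_max):
    """MS(X) = sigmoid(f[AvgPool(MC); MaxPool(MC)]) * MC for mc of shape (C, H, W).
    The channel-wise average and maximum maps are reduced to one attention map."""
    avg_map = mc.mean(axis=0)            # (H, W) channel-wise average
    max_map = mc.max(axis=0)             # (H, W) channel-wise maximum
    attn = sigmoid(k_avg * avg_map + k_max * max_map)   # 1x1-conv stand-in for f
    return mc * attn[None, :, :]

mc = np.ones((4, 3, 3))
ms = spatial_attention(mc, 0.0, 0.0)
print(ms[0, 0, 0])  # 0.5 -- zero kernel weights give a uniform sigmoid(0) map
```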
Applying the AM to an intrusion detection model amplifies useful feature information, suppresses useless feature information, and reduces the dimensionality and volume of the data, thereby improving detection efficiency and accuracy.
Step 4, adjust the feature matrix MS into a feature matrix X3 so that it can serve as the input of the deep residual shrinkage network.

Denote the feature matrix MS as X1. Two convolution operations yield the transformed feature matrix X2, and batch normalization (BN) followed by ReLU activation of X2 yields the feature matrix X3. The ReLU activation function is f(x) = max(0, x).
In processing the feature matrix X1, the convolution operation may be applied several times as needed to obtain X2, but no more than 4 times, to prevent overfitting from harming the final result.
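The convolution-BN-ReLU adjustment can be illustrated with a simplified batch normalization (without the learned scale and shift parameters, an assumption) and the ReLU defined in the text:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a feature matrix to zero mean and unit variance
    (simplified BN over the whole matrix, no learned scale/shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(x, 0.0)

x2 = np.array([[-2.0, 0.0], [2.0, 4.0]])
x3 = relu(batch_norm(x2))            # BN then ReLU, as in the adjustment step
print(abs(batch_norm(x2).mean()) < 1e-9)  # True -- zero mean after BN
```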
Step 5, extract the spatial features of the feature matrix X3 with the deep residual shrinkage network.

By introducing residual shrinkage modules, the deep residual shrinkage network strengthens the ability of a deep neural network to extract useful features from samples containing noise and redundancy, and is used to remove redundant features and improve the classification accuracy of the neural network model; the identity mapping of the residual network also eases back-propagation, reduces the difficulty of training the neural network and prevents gradient explosion.
In this step, the feature matrix X3 passes through a batch normalization (BN) layer and then through 3 residual shrinkage modules for noise elimination and effective feature extraction; the extracted abstract high-dimensional features are then reduced in dimension by a global average pooling (GAP) layer, which greatly reduces the training parameters and avoids overfitting, and the classification result is output through a fully connected layer.
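The global average pooling used here for dimension reduction simply collapses each channel's H×W feature map to its mean (function name illustrative):

```python
import numpy as np

def global_average_pooling(x):
    """Global average pooling: (C, H, W) -> (C,), one mean value per channel."""
    return x.mean(axis=(1, 2))

features = np.arange(24.0).reshape(2, 3, 4)   # 2 channels of 3x4 feature maps
print(global_average_pooling(features))  # [ 5.5 17.5]
```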
Like a conventional deep convolutional network, the residual shrinkage network includes convolution layers, pooling layers, bias terms, activation functions and a cross-entropy loss function. A bias term needs to be added between the convolution kernel and the feature map; its expression is:

y_j = Σ_{i∈M_j} x_i ∗ k_ij + b_j

where x_i is the input feature map of the i-th channel, y_j is the output feature map of the j-th channel, k is the convolution kernel, b_j is the bias of the j-th channel, and M_j is the set of channels used to compute the j-th output feature map.
Rejecting redundant features relies on soft thresholding, a key step in noise reduction algorithms: features whose absolute values are less than a certain threshold are set to zero, and features whose absolute values are greater than the threshold are shrunk toward zero.
The residual shrinkage module is shown in fig. 3. A two-dimensional convolution slides a filter over the two-dimensional image, multiplying and summing corresponding positions to extract image features. As this is the first layer, the input image size must be given. The convolution effect is mainly determined by the number, width, depth and stride of the convolution kernels. Since the input image is a gray-scale map converted from the intrusion detection data set, the convolution depth is set to 1. For generality of the model, the convolution kernel width is set to 3 to reduce model parameters; that is, in the residual shrinkage module of this step, the convolution is two-dimensional, the convolution depth is 1, and the convolution kernel width is 3. Soft thresholding of the extracted image features is performed by the residual shrinkage module, which is the core of the DRSN. Unlike an ordinary residual module, the residual shrinkage module embeds a sub-network that adaptively generates the threshold.
The specific implementation steps of the residual shrinkage module are as follows:
step 5-1: the residual contraction module first performs a convolution operation on the incoming feature matrix X3. And then carrying out normalization and ReLU activation functions again to obtain a new feature matrix X4.
Step 5-2: Each element r in the feature matrix X4 is soft thresholded to obtain a new feature matrix, denoted X5. The specific method is as follows:
Take the absolute value of the feature matrix X4 and then carry out global average pooling to obtain a feature matrix A; input the feature matrix A into the fully connected network of the residual shrinkage module, with a Sigmoid function as the final layer normalizing the output to between 0 and 1, obtaining a weight coefficient α; multiply α by the feature of each channel of the feature matrix A to obtain the final threshold α × A, denoted τ.
In this way, the threshold is the average of the absolute values of a feature map multiplied by a number between 0 and 1, so the threshold is guaranteed to be positive but not too large, and different samples have different thresholds. With these per-sample thresholds, the model can notice features irrelevant to the current task and set them to zero through soft thresholding, while noticing features relevant to the current task and preserving them.
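The adaptive threshold computation can be sketched in NumPy as follows. This is an illustration under stated assumptions: the scalar weight `w` and bias `b` are hypothetical stand-ins for the module's learned fully connected layer, and the feature sizes are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_threshold(x4, w, b):
    # Per-channel threshold of the residual shrinkage module:
    # GAP over the absolute features gives A; a fully connected stage
    # (here a hypothetical scalar weight w and bias b) with a final
    # Sigmoid gives alpha in (0, 1); the threshold is tau = alpha * A.
    A = np.abs(x4).mean(axis=(1, 2))   # |features| then global average pooling
    alpha = sigmoid(w * A + b)         # weight coefficient between 0 and 1
    return alpha * A                   # positive, and never larger than A

x4 = np.random.default_rng(0).standard_normal((3, 8, 8))  # toy 3-channel features
tau = adaptive_threshold(x4, w=1.0, b=0.0)
```

Note how the construction guarantees the property stated in the text: τ is always positive (A is an average of absolute values) and never exceeds A (α < 1).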
Each element r in the feature matrix X4 is soft thresholded as follows to obtain the feature matrix X5. The soft thresholding formula is:
y = r − τ, if r > τ
y = 0, if −τ ≤ r ≤ τ
y = r + τ, if r < −τ
where r is the feature before soft thresholding, τ is the threshold, and y is the element feature after soft thresholding.
The derivative of the soft thresholded output with respect to the input is:
∂y/∂r = 1, if |r| > τ
∂y/∂r = 0, if |r| ≤ τ
The formula above shows that the derivative of soft thresholding is either 1 or 0, which reduces the risk of the deep learning algorithm encountering gradient vanishing and gradient explosion.
Step 5-3: Convolve the feature matrix X5 again, then reduce the dimension of the extracted abstract high-dimensional features through global average pooling (Global Average Pooling, GAP), greatly reducing training parameters and avoiding overfitting.
Further, X5 may be passed into the two following residual shrinkage modules, which perform the same operations as the first. Finally, the soft-thresholded, spatially extracted feature X6 is obtained.
Step 6: converting the spatial features extracted by the DRSN into one-dimensional features, inputting the one-dimensional features into an LSTM model for training, extracting time sequence features in data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model.
LSTM was proposed to solve the gradient vanishing and gradient explosion problems of the recurrent neural network (Recurrent Neural Network, RNN). As shown in FIG. 4, compared with the RNN, the LSTM uses memory cells in the hidden layer, mainly composed of a forget gate, an input gate, an output gate and a self-connected memory cell. The equations for the LSTM forget gate, input gate, output gate and memory cell are as follows:
Forget gate: f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
Input gate: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Candidate memory cell: C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
Memory cell: C_t = f_t * C_{t-1} + i_t * C̃_t
Output gate: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Output: h_t = o_t * tanh(C_t)
wherein f, i, o and C respectively denote the forget gate, the input gate, the output gate and the memory cell output; W_f, W_i, W_c, W_o are weight matrices; b_f, b_i, b_c, b_o are bias vectors; and σ is the Sigmoid function.
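One LSTM time step following these gate equations can be sketched in NumPy as follows. The hidden size, input size and random weights are hypothetical; a trained model would learn W and b.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM time step. W maps [h_{t-1}, x_t] to the stacked
    # pre-activations of the forget, input, candidate and output gates;
    # b is the stacked bias vector.
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])              # forget gate f_t
    i = sigmoid(z[H:2*H])            # input gate i_t
    c_tilde = np.tanh(z[2*H:3*H])    # candidate memory cell
    o = sigmoid(z[3*H:4*H])          # output gate o_t
    c = f * c_prev + i * c_tilde     # memory cell update C_t
    h = o * np.tanh(c)               # hidden state output h_t
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3                          # hypothetical hidden and input sizes
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, b)
```

Since h_t = o_t * tanh(C_t) and both factors lie in (-1, 1) componentwise, every entry of the hidden state is bounded in magnitude by 1.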
In this step, the training process is as follows:
step 6-1: the input feature vector is acquired through an input gate in the LSTM, and then whether to forget the data is determined through a forget gate.
Step 6-2: and calculating errors of each layer through the training set classification result, and transmitting the errors to the upper layer.
Step 6-3: and calculating the gradient of each weight according to the calculated error term.
Step 6-4: the weights of the layers are updated according to the gradient. And optimizing and screening out the parameter combination of the DRSN-LSTM intrusion detection model in iteration, and finding out the optimal parameters.
Different iteration times can be selected according to computer configuration and actual needs, in the embodiment, 100 times are iterated in total, and the parameter combination of the DRSN-LSTM intrusion detection model is optimized and screened in the iteration to find the optimal parameter.
Step 7: and performing intrusion detection test on the trained DRSN-LSTM intrusion detection model on the test set to obtain a classification result and evaluate the performance of the model. Finally, the network intrusion detection can be performed by using an optimized DRSN-LSTM intrusion detection model, taking network traffic as input and taking whether intrusion behavior is output.
And performing intrusion detection test on the test set by using the trained DRSN-LSTM intrusion detection model. The evaluation index used in the implementation is the accuracy, the precision, the recall rate and the F1 value.
Recall (Recall): the proportion of abnormal samples correctly identified as abnormal to the total number of abnormal samples, also referred to as the true positive rate (True Positive Rate) or sensitivity (Sensitivity). The formula is: Recall = TP / (TP + FN).
Precision (Precision): the proportion of samples correctly identified as abnormal by the classifier to the number of samples predicted as abnormal. The formula is: Precision = TP / (TP + FP).
Accuracy (Accuracy): the proportion of correctly classified samples to the total number of samples, i.e., the proportion of samples whose predicted result matches the actual result. The formula is: Accuracy = (TP + TN) / (TP + TN + FP + FN).
F1 Score (F1 Score): a comprehensive index that considers both recall and precision; it is the harmonic mean of recall and precision, used to measure the overall performance of the classifier. The formula is: F1 = 2 × Precision × Recall / (Precision + Recall).
wherein: TP is true positive data, i.e., predicted as attack data and actually attack data; TN is true negative data, i.e., predicted as normal data and actually normal data; FP is false positive data, i.e., predicted as attack data but actually normal data; FN is false negative data, i.e., predicted as normal data but actually attack data. In this implementation, the accuracy reaches 97.56%, the precision reaches 74.68%, the recall reaches 99.65%, and the F1 value reaches 85.43%.
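The four metrics above can be computed directly from confusion-matrix counts; the sketch below uses hypothetical counts for illustration, not the embodiment's results.

```python
def recall(tp, fn):
    # proportion of actual attacks that were detected
    return tp / (tp + fn)

def precision(tp, fp):
    # proportion of predicted attacks that really were attacks
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    # proportion of all samples classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(p, r):
    # harmonic mean of precision p and recall r
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts (illustrative only)
tp, tn, fp, fn = 90, 80, 10, 20
p, r = precision(tp, fp), recall(tp, fn)
```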
Compared with the prior art, the present application reasonably and efficiently combines the attention mechanism, the DRSN and the LSTM model, improving the detection accuracy and generalization capability of the model.
The foregoing is only a preferred embodiment of the application, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the application.
Claims (10)
1. A network intrusion detection method based on attention-mechanism DRSN and LSTM, characterized by comprising the following steps:
step 1, preprocessing an intrusion detection data set to obtain a data set X;
the intrusion detection data set comprises network traffic data and class identification labels label, wherein the network traffic data comprises numerical data and tag data, and the class identification labels label are used for identifying intrusion categories; the preprocessing encodes intrusion categories into different numbers; the tag data are encoded and converted into numerical data, with each distinct feature value represented by a group of binary numbers, and then all numerical data are scaled to the interval [0,1] through a normalization operation;
step 2, extracting input flow characteristics by using a convolution module and an attention mechanism module to obtain a characteristic matrix MS with a channel and a spatial attention mechanism;
step 3, adjusting the feature matrix MS into a feature matrix X3 so that it can be used as the input of a deep residual shrinkage network;
step 4, extracting the spatial features of the feature matrix X3 by using the deep residual shrinkage network;
step 5, inputting the spatial features into LSTM for training, extracting time sequence features in the data, and continuously updating weight parameters to obtain an optimized DRSN-LSTM intrusion detection model;
and step 6, performing network intrusion detection by using the optimized DRSN-LSTM intrusion detection model, taking network traffic as input and whether there is intrusion behavior as output.
2. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein the intrusion detection data set is the NSL-KDD data set, using the 41 kinds of network traffic data and the one class identification label label in each connection record thereof; the intrusion categories include Normal, Dos, Probing, R2L and U2R, respectively representing non-intrusive behavior, denial of service attacks, surveillance and other probing, illegal access from a remote machine, and illegal access of an ordinary user to local superuser privileges; the protocol type protocol_type, the service type service and the flag in the 41 kinds of network traffic data form the tag data, and the tag data adopts One-Hot encoding.
3. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism of claim 2, wherein for the two-classification problem, Normal is encoded as 1 and Dos, Probing, R2L and U2R are all encoded as 2; for the multi-classification problem, Normal, Dos, Probing, R2L and U2R are encoded sequentially as 1, 2, 3, 4, 5.
4. The network intrusion detection method based on DRSN and LSTM of claim 1, wherein the one-dimensional vector data in step 1 is converted into a two-dimensional matrix format, that is, the intrusion detection data is converted into a gray-scale map, and the conversion process is as follows: select one sample x from the data set X, expand x from n' columns to n'+m' columns using a random normal distribution function, and reshape x into a p × q matrix, where p × q = n'+m'; repeat until all samples in X are traversed, where n' is the initial number of columns of the sample and m' is the number of expanded columns.
5. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein the convolution module performs a convolution operation on the two-dimensional matrix, i.e., takes the inner product of a local image and the convolution kernel matrix, as follows:
S(i,j) = Σ_m Σ_n x(i+m, j+n) · w(m,n)
wherein W is the convolution kernel matrix, S is the matrix after convolution, S(i,j) is the convolution value at row i and column j after convolution, the size of the local image in the convolution operation must be consistent with that of the convolution kernel, x(i+m, j+n) is the local image information of sample x at row i+m and column j+n, and w(m,n) is the convolution kernel of length m and width n.
6. The network intrusion detection method according to claim 1, wherein the attention mechanism module adopts a mode that a channel attention mechanism is connected in series with a spatial attention mechanism; the feature matrix output by the convolution module firstly passes through a channel attention mechanism and outputs a feature matrix MC with channel attention weight; the MC serves as an input feature of the spatial attention mechanism, generating a feature matrix MS with channels and spatial attention mechanisms.
7. The network intrusion detection method according to claim 6, wherein in the channel attention mechanism, features of width W, height H and channel number C are reduced, through average pooling and global maximum pooling over width and height respectively, to 2 feature vectors of size 1×1×C; these pass through a shared multi-layer perceptron MLP, are added, and are converted into a 1×1×C weight feature vector through a Sigmoid function, and finally MC is obtained by multiplication with the input feature, with the formula:
MC(X)=σ(MLP(MaxPool(X))+MLP(AvgPool(X)))
wherein: sigma is a nonlinear activation function Sigmoid, maxPool is maximum pooling, and AvgPool is average pooling;
the spatial attention mechanism performs channel-wise global average pooling and global maximum pooling on MC along the channel dimension, stacks the resulting feature maps along the channel axis, and obtains MS through a convolution layer and Sigmoid conversion, with the formula:
MS(X)=σ(f[AvgPool(MC(X));MaxPool(MC(X))])
wherein: f is convolution dimension reduction operation.
8. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1, wherein in step 3, the feature matrix MS is adjusted by performing the convolution operation twice, followed by batch normalization and ReLU activation, to obtain the feature matrix X3.
9. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism according to claim 1 or 8, wherein in step 4, in the deep residual shrinkage network, the residual shrinkage module first performs a convolution operation on the feature matrix X3, then applies normalization and the ReLU activation function again to obtain a feature matrix X4; each element r in the feature matrix X4 is subjected to soft thresholding to obtain a new feature matrix, denoted X5; the feature matrix X5 is convolved again, and the extracted abstract high-dimensional features are reduced in dimension by global average pooling (Global Average Pooling, GAP).
10. The method for detecting network intrusion by DRSN and LSTM based on the attention mechanism of claim 9, wherein the soft thresholding formula is as follows:
y = r − tau, if r > tau; y = 0, if −tau ≤ r ≤ tau; y = r + tau, if r < −tau
wherein r is the feature before soft thresholding, tau is the threshold, and y is the element feature after soft thresholding; the threshold tau is obtained as follows: take the absolute value of the feature matrix X4 and then carry out global average pooling to obtain a feature matrix A; input the feature matrix A into the fully connected network of the residual shrinkage module, with a Sigmoid function as the last layer normalizing the output to between 0 and 1, obtaining a weight coefficient alpha; multiply alpha by the feature of each channel of the feature matrix A to obtain the final threshold alpha × A, denoted tau.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310657316.7A CN116684138A (en) | 2023-06-05 | 2023-06-05 | DRSN and LSTM network intrusion detection method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116684138A true CN116684138A (en) | 2023-09-01 |
Family
ID=87786731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310657316.7A Pending CN116684138A (en) | 2023-06-05 | 2023-06-05 | DRSN and LSTM network intrusion detection method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116684138A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117411684A (en) * | 2023-10-17 | 2024-01-16 | 国网新疆电力有限公司营销服务中心(资金集约中心、计量中心) | Industrial control network intrusion detection method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||