CN115086029A

CN115086029A - Network intrusion detection method based on two-channel space-time feature fusion

Info

Publication number: CN115086029A
Application number: CN202210672884.XA
Authority: CN
Inventors: 苏新; 张桂福; 成振
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2022-06-15
Filing date: 2022-06-15
Publication date: 2022-09-20

Abstract

The invention discloses a network intrusion detection method based on two-channel space-time feature fusion, comprising the following steps: collecting network traffic data from a monitored network to generate a sample set to be detected; performing one-hot encoding on the character-type features in the sample set and Min-Max normalization on all features to obtain a preprocessed sample set; inputting the preprocessed samples into a trained two-channel space-time feature fusion network to obtain a detection result for each sample; and, if the detection result is attack traffic, isolating the data source in the monitored network and notifying an administrator, or, if the detection result is normal traffic, allowing the traffic to pass normally. The invention effectively extracts the spatial features and time sequence features of network traffic data, reduces the complexity of the intrusion detection model, and provides better feature representation capability and a higher detection rate.

Description

Network intrusion detection method based on two-channel space-time feature fusion

Technical Field

The invention belongs to the field of network security, relates to a network intrusion detection technology, and particularly relates to a network intrusion detection method based on dual-channel space-time feature fusion.

Background

With the rapid development of the Internet and sensor technologies, human-computer interaction and Device-to-Device (D2D) interaction have made life more convenient. However, network structures are becoming increasingly dynamic and heterogeneous, moving from a single centralized structure to a hybrid of distributed and centralized structures. Moreover, because most sensor devices are inexpensive and networks lack safe and effective defense mechanisms, networks are flooded with many types of attacks, and attackers' techniques continue to evolve. For example, an attacker may intrude into another country's waters by modifying military monitoring data, or may disable nodes by launching a Distributed Denial of Service (DDoS) attack on the monitored area so that node energy is exhausted. An attacker can also gain unauthorized access to a sensor network and tamper with its data, destroying the availability, integrity and reliability of the network. How to effectively detect attacks in the network is therefore an important and urgent problem in the field of network security.

Traditional security mechanisms such as firewalls, user authentication and encryption struggle to identify disguised attacks in the face of today's ever-increasing diversity of attack types; for example, encryption fails completely once the key is exposed and obtained by an attacker.

Intrusion detection is an active defense mechanism that not only resists network attacks by intruders but also enhances the security of the system. By deployment location, Intrusion Detection Systems (IDS) are divided into Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS). A HIDS is deployed on a single host, monitors all activities on that host and detects suspicious behavior; it provides higher security for the monitored host, but is inefficient. A NIDS does not have this problem: it is deployed at critical locations in the network and protects the entire network and its devices by continuously monitoring real-time traffic.

In recent years, Machine Learning (ML) based methods have received great attention in the field of network intrusion detection and have achieved significant results in the Internet of Things (IoT), Wireless Sensor Networks (WSN) and the Internet of Vehicles (IoV). In addition, Deep Learning (DL) based methods are used to address the difficulty ML methods have in processing high-dimensional big data and to improve the detection rate. Kim-Hung Le et al. (Le K H, Nguyen M H, Tran T D, et al. IMIDS: An Intelligent Detection System against Cyber attacks in IoT [J]. Electronics, 2022, 11(4): 524.) adopt a Convolutional Neural Network (CNN) structure to detect attacks in the network, which alleviates the difficulty of detection when attack samples are scarce; however, the proposed CNN model has a very large number of parameters, occupies substantial resources, and cannot easily guarantee real-time performance. Lei et al. (Lei S, Xia C, Li Z, et al. HNN: a novel model to study the intrusion detection based on multi-feature correlation and temporal-spatial analysis [J]. IEEE Transactions on Network Science and Engineering, 2021, 8(4): 3257-3274.) reduce redundancy through feature selection and multi-feature correlation analysis, and use a CNN, a Long Short-Term Memory network (LSTM) and a Deep Neural Network (DNN) as intrusion detection models; they achieve better detection results, but the model complexity is also high and real-time performance is not guaranteed.

At present, most research on deep-learning-based network intrusion detection either uses a single CNN to extract the spatial features of network traffic data, uses a single LSTM or RNN to extract the time sequence features of network traffic, or simply cascades a CNN and an LSTM. None of these approaches effectively extracts both the spatial features and the time sequence features of network traffic, so the detection rate for network intrusions is low. Furthermore, the proposed models do not fully consider the limited resources of scenarios such as IoT and WSN. A lightweight network intrusion detection model that can effectively extract the spatial and temporal features of network traffic data is therefore urgently needed.

Disclosure of Invention

Purpose of the invention: to overcome the defects of the prior art, a network intrusion detection method based on dual-channel space-time feature fusion is provided, which can effectively extract the spatial features and time sequence features of network traffic data, reduce the complexity of the intrusion detection model, and improve the accuracy of network intrusion detection.

The technical scheme is as follows: in order to achieve the above object, the present invention provides a network intrusion detection method based on dual-channel spatio-temporal feature fusion, comprising the following steps:

S1: collecting network traffic data from a monitored network to generate a sample set to be detected;

S2: performing one-hot encoding on the character-type features in the generated sample set to be detected, and performing Min-Max normalization on all features of the samples to obtain a preprocessed sample set to be detected;

S3: inputting the preprocessed samples to be detected into a trained two-channel space-time feature fusion network for detection to obtain a detection result for each sample;

S4: if the detection result is attack traffic data, isolating the data source in the monitored network and notifying an administrator; if the detection result is normal traffic data, allowing the traffic to pass normally.
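The response step S4 can be illustrated with a short Python sketch. The `Monitor` class, its methods, the example addresses and the 0.5 decision threshold are all hypothetical and are not taken from the specification, which only distinguishes attack traffic from normal traffic.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Hypothetical stand-in for the control plane of the monitored network."""
    isolated: set = field(default_factory=set)
    alerts: list = field(default_factory=list)

    def isolate(self, source: str) -> None:
        self.isolated.add(source)

    def notify_admin(self, message: str) -> None:
        self.alerts.append(message)


def respond(monitor: Monitor, source: str, attack_probability: float,
            threshold: float = 0.5) -> str:
    """Apply step S4 to one detection result (threshold is an assumption)."""
    if attack_probability >= threshold:
        monitor.isolate(source)
        monitor.notify_admin(f"attack traffic detected from {source}")
        return "isolated"
    return "allowed"


monitor = Monitor()
results = [respond(monitor, "10.0.0.5", 0.93),
           respond(monitor, "10.0.0.7", 0.02)]
```

In this sketch the first source is isolated and an alert is recorded, while traffic from the second source is allowed to pass.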

Further, the original format of the network traffic data collected in step S1 is a Pcap format, and each piece of network traffic in the Pcap format is analyzed to obtain a corresponding feature vector.

Further, each feature vector in the step S1 includes five types of features, i.e., stream features, basic features, content features, temporal features, and composite features, and these five types of features can be divided into character-type features and numerical-type features.

Further, the one-hot encoding in step S2 is used to convert the character-type features into binary-valued features that can be processed by the network intrusion detection model, and the specific process in step S2 is as follows:

supposing a character-type feature takes four values α₁, α₂, α₃, α₄, these values can be encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1) respectively; the Min-Max normalization method then scales each feature in the feature vector to the range 0 to 1, so as to reduce the influence of differing orders of magnitude among the feature values, as shown in equation (1):

x'＝(x-x_min)/(x_max-x_min) (1)

wherein x and x' are the feature values before and after normalization, and x_max and x_min are the maximum and minimum values of the feature.
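The preprocessing of step S2 can be sketched in Python with NumPy as follows. The helper names, the toy category list and the sample values are illustrative assumptions; mapping constant columns to 0 is also an assumption, since the specification does not say how a feature with x_max equal to x_min is handled.

```python
import numpy as np


def one_hot(values, categories):
    """Encode each character-type value as a binary indicator vector."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=float)
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded


def min_max(features, eps=1e-12):
    """Scale each column to [0, 1] with x' = (x - x_min) / (x_max - x_min)."""
    x_min = features.min(axis=0)
    x_max = features.max(axis=0)
    # Assumption: constant columns (x_max == x_min) are mapped to 0.
    span = np.where(x_max - x_min > eps, x_max - x_min, 1.0)
    return (features - x_min) / span


proto = ["tcp", "udp", "tcp", "arp"]                    # toy character-type feature
numeric = np.array([[10.0, 200.0], [20.0, 400.0],
                    [30.0, 600.0], [40.0, 1000.0]])     # toy numerical features
encoded = one_hot(proto, categories=["tcp", "udp", "arp", "icmp"])
samples = min_max(np.hstack([encoded, numeric]))
```

One-hot columns already lie in the range 0 to 1, so normalizing all columns after encoding leaves them unchanged.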

Further, the two-channel spatio-temporal feature fusion network in step S3 includes a spatial feature extraction module based on one-dimensional convolution, a time sequence feature extraction module based on a BiGRU with a state attention unit, and a classifier module, where the spatial feature extraction module is configured to extract the spatial features of a sample to be detected, the time sequence feature extraction module is configured to extract the time sequence features of the sample to be detected, and the classifier module is configured to output a detection classification result according to the extracted spatial features and time sequence features.

Further, the spatial feature extraction module comprises 3 one-dimensional convolution layers with 16, 32 and 32 convolution kernels respectively, kernel sizes of 3, 3 and 3, and strides of 2, 2 and 2; the activation functions are all rectified linear units (ReLU); a MaxPool1D layer with a pooling size of 2 is added after each one-dimensional convolution layer, followed by a Flatten layer and a fully connected layer with 16 units.

Further, the time sequence feature extraction module comprises 2 BiGRU layers with 36 and 18 units respectively; a Dropout layer with a dropout rate of 0.3 is added after the second BiGRU layer, followed by a state attention unit and a fully connected layer with 6 units.

Further, the classifier module comprises a Concatenate layer with 22 input units and 2 fully connected layers with 32 and 16 units respectively, whose activation function is the rectified linear unit (ReLU); a Dropout layer is added after each fully connected layer, with dropout rates of 0.1 and 0.1 respectively, and an output layer is added at the end; for binary classification, the output layer has 1 unit, the activation function is sigmoid, and the loss function is binary_crossentropy; for ten-class classification, the output layer has 10 units, the activation function is Softmax, and the loss function is sparse_categorical_crossentropy.

Further, the state attention unit in the time sequence feature extraction module is specifically as follows:

consider the hidden state matrix Q output by the last BiGRU layer over all T time steps, whose rows are the forward and backward hidden-state outputs at each time step up to the T-th; the weight vector w obtained by querying the state attention unit with Q is calculated by equation (2):

w＝softmax(tanh(QK+b)V) (2)

wherein, K, V, b are respectively a key matrix, a value matrix and a bias vector; w is a column vector of dimension 2T;

the output o of the state attention unit for the input Q is calculated by equation (3):

o＝w^T Q (3)
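Equations (2) and (3) can be sketched in Python with NumPy as follows. The shapes are assumptions chosen so that w comes out as a vector of dimension 2T, as stated above: Q is taken as 2T × d, K as d × k, b as a length-k vector and V as a length-k vector. The sizes T, d and k in the example are illustrative only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def state_attention(Q, K, V, b):
    """Equations (2) and (3): w = softmax(tanh(QK + b)V), o = w^T Q."""
    scores = np.tanh(Q @ K + b) @ V   # one score per hidden state, shape (2T,)
    w = softmax(scores)               # attention weights, sum to 1
    o = w @ Q                         # weighted sum of hidden states, shape (d,)
    return w, o


rng = np.random.default_rng(0)
T, d, k = 4, 18, 8                    # illustrative sizes, not from the specification
Q = rng.normal(size=(2 * T, d))       # forward and backward states for T time steps
K = rng.normal(size=(d, k))
V = rng.normal(size=k)
b = np.zeros(k)
w, o = state_attention(Q, K, V, b)
```

The output o has the same width as a single hidden state, so the unit reduces the whole sequence of hidden states to one vector.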

Further, the training process of the two-channel spatio-temporal feature fusion network in step S3 is as follows: the parameters of the network are initialized using Xavier initialization, a cross-entropy function is selected as the loss function, and the parameters are updated using the Adam optimizer and the back-propagation algorithm; the network intrusion detection benchmark data set UNSW-NB15 is selected as the training data set, one-hot encoding is applied to its character-type features, and Min-Max normalization is then applied to all of its features to obtain a preprocessed training sample set; the preprocessed training samples are input into the two-channel spatio-temporal feature fusion network in the chronological order in which the network traffic was generated, yielding a trained two-channel spatio-temporal feature fusion network intrusion detection model.

The network intrusion detection method based on two-channel space-time feature fusion fully considers both the spatial relationships among the features of network traffic data and the time sequence relationships among network traffic records. The spatial features are extracted by the spatial feature extraction module, and the time sequence features are extracted by the time sequence feature extraction module. Both are input into a fully connected classifier for detection, which solves two problems: a single CNN structure cannot extract the time sequence relationships among network traffic records, and single LSTM and RNN structures extract the spatial features of network traffic data poorly.

Beneficial effects: compared with the prior art, the invention has the following advantages:

1. The spatial relationships among the features of network traffic data and the time sequence relationships among network traffic records are considered together, which overcomes the inability of a CNN (convolutional neural network) or an RNN (recurrent neural network) used alone to extract space-time features effectively. The designed network intrusion detection model takes full account of the limited resources in IoT and WSN scenarios, offers better feature representation capability and a higher detection rate, and improves the accuracy of network intrusion detection.

2. Throughout the detection method, few parameters need to be trained, which reduces the computational complexity and the resources occupied and greatly reduces the time overhead of detection; the effectiveness of each component is verified through ablation experiments.

Drawings

FIG. 1 is a flow chart of a dual-channel space-time feature fusion network intrusion detection method;

FIG. 2 is a schematic structural diagram of a two-channel spatiotemporal feature fusion network;

FIG. 3 is a schematic diagram of the loss variation during binary classification training;

FIG. 4 is a schematic diagram of the loss variation during ten-class classification training.

Detailed Description

The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.

The invention provides a network intrusion detection method based on dual-channel space-time feature fusion, which comprises the following steps as shown in figure 1:

S1: collecting network traffic data from a monitored network and parsing each network flow in Pcap format to obtain a corresponding feature vector, wherein each feature vector comprises five types of features, namely flow features, basic features, content features, time features and synthetic features, for a total of 49 features, of which proto, service and state are character-type features and the rest are numerical features; a sample set to be detected is generated from these feature vectors;

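S2: performing one-hot encoding on the character-type features in the generated sample set to be detected, and performing Min-Max normalization on all features of the samples to obtain a preprocessed sample set to be detected;

supposing a character-type feature takes four values α₁, α₂, α₃, α₄, these values can be encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1) respectively; the Min-Max normalization method then scales each feature in the feature vector to the range 0 to 1, so as to reduce the influence of differing orders of magnitude among the feature values, as shown in formula (1):

x'＝(x-x_min)/(x_max-x_min) (1)

wherein x and x' are the feature values before and after normalization, and x_max and x_min are the maximum and minimum values of the feature.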

S3: inputting the preprocessed sample to be detected into a trained dual-channel space-time characteristic fusion network shown in FIG. 2 for detection to obtain a detection result of the sample to be detected;

The two-channel space-time feature fusion network comprises a spatial feature extraction module based on one-dimensional convolution, a time sequence feature extraction module based on a BiGRU with a state attention unit, and a classifier module, wherein the spatial feature extraction module is used for extracting the spatial features of a sample to be detected, the time sequence feature extraction module is used for extracting the time sequence features of the sample to be detected, and the classifier module is used for outputting a detection classification result according to the extracted spatial features and time sequence features.

The spatial feature extraction module comprises 3 one-dimensional convolution layers with 16, 32 and 32 convolution kernels respectively, kernel sizes of 3, 3 and 3, and strides of 2, 2 and 2; the activation functions are all rectified linear units (ReLU); a MaxPool1D layer with a pooling size of 2 is added after each one-dimensional convolution layer, followed by a Flatten layer and a fully connected layer with 16 units.
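To show how this convolution and pooling stack shrinks a sample, the following Python sketch traces the sequence length through the three stages. Two points are assumptions not stated in the specification: the convolutions use no padding ("valid" mode), and each preprocessed sample is treated as a sequence of length 196 with one channel.

```python
def conv1d_out(length, kernel, stride):
    """Output length of a 1-D convolution without padding ('valid' mode)."""
    return (length - kernel) // stride + 1


def pool1d_out(length, pool):
    """Output length of max pooling whose stride equals the pool size."""
    return length // pool


def spatial_module_shapes(input_length, filters=(16, 32, 32),
                          kernel=3, stride=2, pool=2):
    """Return (length, channels) after each convolution + pooling stage."""
    shapes = []
    length = input_length
    for n_filters in filters:
        length = conv1d_out(length, kernel, stride)
        length = pool1d_out(length, pool)
        shapes.append((length, n_filters))
    return shapes


shapes = spatial_module_shapes(196)           # [(48, 16), (11, 32), (2, 32)]
flattened = shapes[-1][0] * shapes[-1][1]     # size of the Flatten layer output
```

Under these assumptions the Flatten layer yields 64 values, which the final fully connected layer maps to 16 spatial features. A different padding mode would give different intermediate lengths.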

The time sequence feature extraction module comprises 2 BiGRU layers with 36 and 18 units respectively; a Dropout layer with a dropout rate of 0.3 is added after the second BiGRU layer, followed by a state attention unit and a fully connected layer with 6 units.

The classifier module comprises a Concatenate layer with 22 input units and 2 fully connected layers with 32 and 16 units respectively, whose activation function is the rectified linear unit (ReLU); a Dropout layer is added after each fully connected layer, with dropout rates of 0.1 and 0.1 respectively, and an output layer is added at the end. For binary classification, the output layer has 1 unit, the activation function is sigmoid, and the loss function is binary_crossentropy; for ten-class classification, the output layer has 10 units, the activation function is Softmax, and the loss function is sparse_categorical_crossentropy.
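The fusion performed by the classifier module can be sketched as an inference-time forward pass in Python with NumPy. The weights here are random placeholders, not trained parameters, and `init_head` and `classify` are hypothetical helper names. Dropout is omitted because it is inactive at inference.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_head(rng, n_classes):
    """Random placeholder weights for Dense(32) -> Dense(16) -> output."""
    n_out = 1 if n_classes == 2 else n_classes
    sizes = [(22, 32), (32, 16), (16, n_out)]
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]


def classify(spatial, temporal, params, n_classes):
    """Concatenate 16 spatial and 6 temporal features, then classify."""
    x = np.concatenate([spatial, temporal], axis=-1)   # 16 + 6 = 22 inputs
    (w1, b1), (w2, b2), (w3, b3) = params
    x = relu(x @ w1 + b1)
    x = relu(x @ w2 + b2)
    logits = x @ w3 + b3
    return sigmoid(logits) if n_classes == 2 else softmax(logits)


rng = np.random.default_rng(0)
spatial = rng.normal(size=(5, 16))     # stand-in output of the spatial module
temporal = rng.normal(size=(5, 6))     # stand-in output of the temporal module
binary_out = classify(spatial, temporal, init_head(rng, 2), 2)
multi_out = classify(spatial, temporal, init_head(rng, 10), 10)
```

The 22 input units of the Concatenate layer are the 16 outputs of the spatial module joined with the 6 outputs of the time sequence module.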

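The state attention unit in the time sequence feature extraction module is specifically as follows:

Consider the hidden state matrix Q output by the last BiGRU layer over all T time steps, whose rows are the forward and backward hidden-state outputs at each time step up to the T-th. The weight vector w obtained by querying the state attention unit with Q is calculated by equation (2):

w＝softmax(tanh(QK+b)V) (2)

wherein K, V and b are a key matrix, a value matrix and a bias vector, respectively, and w is a column vector of dimension 2T;

the output o of the state attention unit for the input Q is calculated by equation (3):

o＝w^T Q (3)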

The training process of the two-channel space-time feature fusion network is as follows: the parameters of the network are initialized using Xavier initialization, a cross-entropy function is selected as the loss function, and the parameters are updated using the Adam optimizer and the back-propagation algorithm; the network intrusion detection benchmark data set UNSW-NB15 is selected as the training data set, one-hot encoding is applied to its character-type features, and Min-Max normalization is then applied to all of its features to obtain a preprocessed training sample set; the preprocessed training samples are input into the two-channel space-time feature fusion network in the chronological order in which the network traffic was generated, yielding a trained two-channel space-time feature fusion network intrusion detection model.
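As an illustration of the Xavier initialization mentioned above, the following Python sketch draws a weight matrix from the uniform variant, with limit sqrt(6 / (fan_in + fan_out)). The specification does not say whether the uniform or the normal variant is used, so the uniform form is an assumption, and the 22 × 32 layer size is simply the first fully connected layer of the classifier.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Draw weights from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
weights = xavier_uniform(22, 32, rng)      # e.g. Concatenate(22) -> Dense(32)
limit = np.sqrt(6.0 / (22 + 32))           # about 0.333 for this layer
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations and gradients roughly constant from layer to layer at the start of training.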

In order to verify the effectiveness and actual effect of the network intrusion detection method, the following experimental procedures are performed in this embodiment by using four evaluation criteria, namely Accuracy (Accuracy), Precision (Precision), Recall (Recall), and F1-Score:

Experimental tools: the experiments were run on Windows 10 with an Intel(R) Core(TM) i7-8700K CPU and 16 GB of memory. All techniques were implemented in Python 3.6 using the TensorFlow and Scikit-learn frameworks.

The experiment uses the UNSW-NB15 intrusion detection benchmark data set. UNSW-NB15 was created by the Cyber Range Lab of the Australian Centre for Cyber Security and contains real normal activities and synthetic contemporary attack behaviors. The data set comprises 257,673 samples, of which 175,341 are in the training set and the remaining 82,332 are in the test set. It contains 10 sample types in total: Normal, Generic, Exploits, Fuzzers, DoS, Reconnaissance, Analysis, Backdoor, Shellcode and Worms. The detailed sample counts are shown in Table 1.

The training set and the test set are preprocessed: proto, service and state are encoded with one-hot encoding, which expands the data from 49 dimensions to 196 dimensions, and the values of all features are scaled to between 0 and 1 using Min-Max normalization.

The preprocessed training set is input into the dual-channel space-time feature fusion network for training. The initial network parameters are set by Xavier initialization, the Adam optimizer is used, the learning rate is set to 0.0035, the batch size is set to 1024, and the number of training epochs is set to 120. The loss function is binary_crossentropy for binary classification and sparse_categorical_crossentropy for ten-class classification. 20% of the training data set is randomly extracted as a validation set to observe whether the training process overfits. The specific parameters of the spatial feature extraction module, the time sequence feature extraction module and the fully connected classifier module in the dual-channel space-time feature fusion network are shown in Tables 2, 3 and 4.
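The random 20% hold-out described above can be sketched in Python with NumPy as follows. The function name, the fixed seed and the toy arrays are illustrative assumptions; the specification does not describe how the random extraction was implemented.

```python
import numpy as np


def train_validation_split(features, labels, validation_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the training samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_val = int(round(len(features) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return ((features[train_idx], labels[train_idx]),
            (features[val_idx], labels[val_idx]))


features = np.arange(50, dtype=float).reshape(10, 5)   # 10 toy samples
labels = np.arange(10) % 2
(train_x, train_y), (val_x, val_y) = train_validation_split(features, labels)
```

Comparing the loss on the held-out samples with the training loss after each epoch is what allows overfitting to be observed.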

FIG. 3 and FIG. 4 show how the loss changes in the binary and ten-class cases, respectively. In the binary case, the training-set loss and the validation-set loss drop rapidly in the early stage of training and tend to converge around the 50th epoch, with small oscillation amplitude and without a large gap between the two losses (high variance), which shows that the two-channel spatio-temporal feature fusion network designed by the invention learns effectively from the data in the binary case. As shown in FIG. 4, in the ten-class case the training-set loss and the validation-set loss gradually converge as the number of training epochs increases, the oscillation amplitude is small, no large gap between the two losses appears, and no overfitting occurs, which verifies the strong learning capability of the two-channel spatio-temporal feature fusion network designed by the invention.

Experimental results: the present invention is compared with the Random Forest (RF), Support Vector Machine (SVM), Deep Neural Network (DNN), Convolutional Neural Network (CNN) and bidirectional gated recurrent unit (BiGRU) algorithms. The binary classification results are shown in Table 5 and the ten-class classification results in Table 6:

From the results in Table 5, the network intrusion detection method based on dual-channel spatio-temporal feature fusion provided by the invention achieves the best results on accuracy, recall and F1-Score. Compared with the comparison methods, accuracy improves by 8.76% to 33.76%, and F1-Score improves by at most 33.16% and at least 6.06%. On precision the method achieves the second-best result, but the gap is small: it is 0.74% below the best-performing SVM. These conclusions take all four evaluation metrics into account.

From the results in Table 6, the network intrusion detection method based on two-channel spatio-temporal feature fusion provided by the invention achieves the best results on all four metrics, namely accuracy, precision, recall and F1-Score: accuracy improves by 11.32% to 28.72%, precision by 3.37% to 26.5%, recall by 11.33% to 28.73%, and F1-Score by 12.2% to 35.16%. This verifies that the invention performs well when classifying multiple types of attacks.
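For reference, the four evaluation metrics can be computed from predictions as in the following Python sketch. It covers only the binary case, with label 1 standing for attack traffic, and the toy labels are invented for illustration; the specification does not state how the metrics were averaged in the ten-class case.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-Score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # attacks detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # normal traffic passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # missed attacks
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With one missed attack and one false alarm among eight toy samples, all four metrics equal 0.75 in this example.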

Detecting the 82,332 samples of the preprocessed test set takes 2.149 seconds in total, an average of 38,311 samples processed per second, which shows that the method can meet the real-time processing requirements of practical deployments.
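The throughput figure follows directly from the two reported numbers, as this short Python check shows. The per-sample latency is a derived quantity that is not stated in the specification.

```python
samples = 82_332
elapsed_seconds = 2.149

throughput = samples / elapsed_seconds            # samples per second
latency_us = elapsed_seconds / samples * 1e6      # microseconds per sample
```

The quotient is slightly above 38,311 samples per second, so the reported figure is the truncated value, and the average cost per sample is roughly 26 microseconds.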

The dual-channel space-time feature fusion network designed by the invention fully considers the complexity of the model: the number of parameters to be trained is 64,179 for binary classification and 64,332 for ten-class classification. The model is therefore lightweight, which indicates that the method is suitable for network intrusion detection in resource-limited scenarios such as IoT and WSN.
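The gap between the two parameter counts can be checked against the classifier description: only the output layer differs between the two cases, and it is fed by the 16-unit fully connected layer. The following Python check assumes a standard dense layer with one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


binary_output = dense_params(16, 1)        # sigmoid output layer
ten_class_output = dense_params(16, 10)    # Softmax output layer
difference = ten_class_output - binary_output
reported_difference = 64_332 - 64_179
```

Both differences equal 153, so the two reported totals are consistent with a model that changes only its output layer between binary and ten-class classification.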

Claims

1. A network intrusion detection method based on dual-channel space-time feature fusion, characterized by comprising the following steps:

S1: collecting network traffic data from a monitored network to generate a sample set to be detected;

S2: performing one-hot encoding on the character-type features in the generated sample set to be detected, and performing Min-Max normalization on all features of the samples to obtain a preprocessed sample set to be detected;

S3: inputting the preprocessed samples to be detected into a trained two-channel space-time feature fusion network for detection to obtain a detection result for each sample;

S4: if the detection result is attack traffic data, isolating the data source in the monitored network and notifying an administrator; if the detection result is normal traffic data, allowing the traffic to pass normally.

2. The method according to claim 1, wherein the original format of the network traffic data collected in step S1 is Pcap format, and each piece of network traffic in Pcap format is analyzed to obtain a corresponding feature vector.

3. The method for detecting network intrusion based on two-channel spatio-temporal feature fusion of claim 2, wherein each feature vector in the step S1 includes five types of features including stream feature, basic feature, content feature, temporal feature and composite feature, and the five types of features can be divided into character-type feature and numerical-type feature.

4. The method according to claim 1, wherein the one-hot encoding in step S2 is used to convert character-type features into binary-valued features that can be processed by a network intrusion detection model, and the specific process in step S2 is as follows:

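supposing a character-type feature takes four values α₁, α₂, α₃, α₄, these values can be encoded as (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1) respectively, and the Min-Max normalization method scales each feature in the feature vector to the range 0 to 1, as shown in formula (1):

x'＝(x-x_min)/(x_max-x_min) (1)

wherein x and x' are the feature values before and after normalization, and x_max and x_min are the maximum and minimum values of the feature.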

5. The method according to claim 1, wherein the two-channel spatio-temporal feature fusion network in step S3 includes a spatial feature extraction module based on one-dimensional convolution, a time sequence feature extraction module based on a BiGRU with a state attention unit, and a classifier module, the spatial feature extraction module is configured to extract the spatial features of the sample to be detected, the time sequence feature extraction module is configured to extract the time sequence features of the sample to be detected, and the classifier module is configured to output a detection classification result according to the extracted spatial features and time sequence features.

6. The network intrusion detection method based on two-channel spatio-temporal feature fusion of claim 5, wherein the spatial feature extraction module comprises 3 one-dimensional convolution layers with 16, 32 and 32 convolution kernels respectively, kernel sizes of 3, 3 and 3, and strides of 2, 2 and 2; the activation functions are all rectified linear units (ReLU); a MaxPool1D layer with a pooling size of 2 is added after each one-dimensional convolution layer, followed by a Flatten layer and a fully connected layer with 16 units.

7. The method as claimed in claim 5, wherein the time sequence feature extraction module includes 2 BiGRU layers with 36 and 18 units respectively; a Dropout layer with a dropout rate of 0.3 is added after the second BiGRU layer, followed by a state attention unit and a fully connected layer with 6 units.

8. The method as claimed in claim 5, wherein the classifier module includes a Concatenate layer with 22 input units and 2 fully connected layers with 32 and 16 units respectively, whose activation function is the rectified linear unit (ReLU); a Dropout layer is added after each fully connected layer, with dropout rates of 0.1 and 0.1 respectively, and an output layer is added at the end; for binary classification, the output layer has 1 unit, the activation function is sigmoid, and the loss function is binary_crossentropy; for ten-class classification, the output layer has 10 units, the activation function is Softmax, and the loss function is sparse_categorical_crossentropy.

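9. The method for detecting network intrusion based on two-channel spatio-temporal feature fusion according to claim 7, wherein the state attention unit in the time sequence feature extraction module is specifically as follows:

consider the hidden state matrix Q output by the last BiGRU layer over all T time steps, whose rows are the forward and backward hidden-state outputs at each time step up to the T-th; the weight vector w obtained by querying the state attention unit with Q is calculated by equation (2):

w＝softmax(tanh(QK+b)V) (2)

wherein K, V and b are a key matrix, a value matrix and a bias vector, respectively, and w is a column vector of dimension 2T;

the output o of the state attention unit for the input Q is calculated by equation (3):

o＝w^T Q (3)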

10. The method for detecting network intrusion based on two-channel spatio-temporal feature fusion of claim 1, wherein the training process of the two-channel spatio-temporal feature fusion network in step S3 is as follows: the parameters of the network are initialized using Xavier initialization, a cross-entropy function is selected as the loss function, and the parameters are updated using the Adam optimizer and the back-propagation algorithm; the network intrusion detection benchmark data set UNSW-NB15 is selected as the training data set, one-hot encoding is applied to its character-type features, and Min-Max normalization is then applied to all of its features to obtain a preprocessed training sample set; the preprocessed training samples are input into the two-channel spatio-temporal feature fusion network in the chronological order in which the network traffic was generated, yielding a trained two-channel spatio-temporal feature fusion network intrusion detection model.