Background
The continuous development of network technology has brought great changes to human life, but it also carries considerable hidden dangers. Cyberspace sees a large number of attacks that threaten the security of people's information and property as well as national security. Intrusion detection is an important means of maintaining cyberspace security: it monitors a system or a network to discover malicious activity or policy violations, and thereby effectively detects and resists network attacks.
Intrusion Detection Systems (IDS) have been known for 40 years and can be classified by detection method into signature-based IDS and anomaly-based IDS. Signature-based IDS defines patterns of malicious network activity from known attacks and matches incoming traffic against the defined patterns to discover attacks; it has a low false alarm rate but can only detect known network attacks. Anomaly-based IDS establishes patterns of normal traffic behavior and treats behavior that deviates from normal behavior by more than a threshold as anomalous. This approach can detect unknown attacks but has a high false alarm rate. Researchers have proposed a variety of methods to adapt to different application scenarios and reduce the false alarm rate. Anomaly-based intrusion detection is equivalent to network traffic anomaly detection, and the methods used include statistical learning methods, traditional machine learning methods and deep learning methods. Traditional machine learning methods include support vector machines, naive Bayes, k-nearest neighbors, decision trees, hierarchical clustering and the like. These methods rely on manually designed features; feature design requires strong professional knowledge, some important information is lost or changed during feature extraction, and the performance of such methods is gradually reaching a bottleneck.
In recent years, deep learning has attracted many researchers because it requires no human intervention, learns autonomously, processes high-dimensional data effectively and extracts useful information automatically. In the field of network traffic anomaly detection, unlike traditional machine learning, which relies on complex feature engineering, deep learning can automatically extract deep features from traffic and classify them. Commonly used models are multilayer perceptrons, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and autoencoders. The CNN generalizes well and performs well in image processing. In network traffic anomaly detection, the network traffic is often converted into a grayscale image as the input of the CNN, which is used to extract spatial features of the traffic. The RNN is a network for processing sequence data and is often used to extract the temporal features of network flows. Researchers have also combined several structures, such as CNNs and RNNs, when features need to be extracted from different angles. However, applying these deep learning methods directly to network traffic still raises some problems.
For example, packets in network traffic arrive sequentially in a certain order, and RNNs and their variants are used to extract timing features between packets. However, the RNN design presupposes that the time intervals between elements in a sequence are uniform, or that there are no time intervals between elements at all, as in speech and text. The packets in a network flow clearly arrive at varying rates, and the arrival rate can differ markedly between types of attack. The time intervals between packets are therefore not uniform and do not satisfy the assumption of the RNN and its variants. That is, RNNs and their variants extract only the order-related features between packets and do not take into account temporal features (the frequency of packet arrivals).
Similar problems exist in the prior art, for example: Chinese patent application CN111428789A discloses a deep-learning-based network traffic anomaly detection method, which detects anomalies by combining a long short-term memory network and a convolutional neural network to learn the spatio-temporal features of network traffic, but the method does not consider the time interval between data packets, so its accuracy is not high. Chinese patent application CN108900546A discloses an LSTM-based time series network anomaly detection method, which uses LSTM to predict the value of network traffic and compares it with the actual measured value to detect anomalies; this method considers only timing features, ignores the spatial features of traffic and the time interval between data packets, and therefore has low accuracy. Chinese patent application CN108494746A discloses a method for detecting network port traffic anomalies, which uses LSTM to predict port traffic values and compares them with actual measured values to detect anomalies, and likewise does not take into account spatial features or the time intervals between packets.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a spatio-temporal-aware method for detecting abnormal behavior in network traffic, and an electronic device. The invention improves LSTM by taking the time interval between data packets in network traffic into consideration, provides a time- and length-aware LSTM, and improves the efficiency and accuracy of intrusion detection.
The technical content of the invention comprises:
A spatio-temporal-aware method for detecting abnormal behavior in network traffic, comprising the following steps:
1) extracting flow features, length features and time features of the network traffic, and obtaining spatial features from the flow features;
2) obtaining classification features from the spatial features, the length features and the time features;
3) classifying the network traffic according to the classification features to obtain the abnormal behavior detection result for the network traffic.
Further, before the flow features, length features and time features of the network traffic are extracted, the network traffic is preprocessed; the preprocessing comprises: splitting the original file by five-tuple <source IP address, source port number, destination IP address, destination port number, protocol> and removing invalid data, wherein the original file consists of the data packets of the network traffic.
Further, the original file is split using SplitCap.
Further, the invalid data includes: the MAC addresses and type field of the Ethernet header, and the version and differentiated services fields of the IP header.
Further, the spatial features are obtained by inputting the flow features into a one-dimensional convolutional neural network consisting of two convolutional layers, two pooling layers and a fully connected layer.
Further, the activation function of the one-dimensional convolutional neural network includes ReLU.
Further, over-fitting of the one-dimensional convolutional neural network is prevented by using a dropout operation.
Further, the pooling layers include max pooling layers.
Further, each dimension of the length feature is the length of the current data packet.
Further, each dimension of the time feature is the time interval between the current data packet and the previous data packet, wherein the interval for the first data packet is 0.
Further, the classification features are obtained by inputting the spatial features, the length features and the time features into an improved LSTM network, wherein the improved LSTM network uses a gating structure to control the cell state and the hidden state, comprising the steps of:
1) obtaining an adjusted cell state from the cell state $C_{t-1}$ at time t-1 and the time interval $\Delta_t$, wherein $\Delta_t$ is the time interval between the data packet at time t and the data packet at time t-1, $T_t(\cdot)$ is the time perception function, and $W_d$ and $b_d$ are LSTM network parameters;
2) creating a candidate cell state, wherein $x_t$ is the input of the cell at time t, $h_{t-1}$ is the hidden state at time t-1, $L_t(\cdot)$ is the length perception function, and $W_c$ and $U_c$ are LSTM network parameters;
3) obtaining the cell state $C_t$ at time t by updating from the adjusted cell state and the candidate cell state together, wherein $f_t$ denotes the forget gate and $i_t$ denotes the input gate;
4) calculating the hidden state at time t from the cell state $C_t$ as $h_t = o_t * \tanh(C_t)$, wherein $o_t$ denotes the output gate.
Further, the network traffic is classified according to the classification features using a Softmax classifier.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory and a processor, the memory having a computer program stored therein and the processor being arranged to run the computer program to perform the method described above.
Compared with the prior art, the invention has the following positive effects:
The invention was evaluated experimentally on a real network traffic data set, using overall accuracy, precision, recall and F1 score to measure model performance. The experimental results show that the proposed model outperforms existing baseline methods such as decision trees and random forests, and also outperforms comparison methods that use a CNN or an LSTM alone.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention designs a spatio-temporal-aware method for identifying abnormal behavior in network traffic. It improves LSTM by taking the time intervals between data packets in network traffic into consideration and provides a time- and length-aware LSTM, named TL-LSTM. In addition, the invention provides a new intrusion detection model, a spatio-temporal-aware intrusion detection model named STIDM. Specifically, STIDM extracts the spatial features of network traffic using a CNN, extracts the temporal features using TL-LSTM, and classifies the network traffic using the extracted features, which improves detection performance.
The general idea of the invention is to preprocess the network traffic data and select part of the data as the flow information, time information and length information of the traffic; then to extract the spatial and temporal features of the traffic from this information using the convolutional neural network and the long short-term memory network, respectively; and finally to classify the features with a classifier, thereby detecting attack behavior in the network traffic data.
The overall structure of the invention is shown in Fig. 1, and the steps of the method are described in detail below:
1) Data processing module
The verification dataset used by the invention is the CICIDS2017 dataset, which comes from the University of New Brunswick, Canada. Its author, Sharafaldin, simulated the normal interaction behavior of 25 users and various common attack behaviors in a simulation environment containing a complete network configuration, and captured the network traffic in full. The CICIDS2017 dataset stores all network traffic for each day in pcap format and labels the class of the network traffic based on timestamps and five-tuples.
In order to remove redundant data, improve detection efficiency and reduce time consumption, the invention preprocesses the network traffic data; the preprocessing includes original file segmentation, invalid data removal and data extraction.
a) Segmenting the original file
The CICIDS2017 dataset is split by five-tuple (source IP address, source port number, destination IP address, destination port number, protocol) using SplitCap. In practical applications, the original file must first be captured from the network.
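As a non-limiting illustration of the segmentation step, the following minimal Python sketch groups already-parsed packets into flows keyed by five-tuple. It is a stand-in for what SplitCap does on pcap files, not SplitCap itself; the `Packet` record and the `split_by_five_tuple` function are hypothetical names introduced here, and the sketch assumes each direction of a connection forms its own flow.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, already-parsed packet record (not a SplitCap type).
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


def split_by_five_tuple(packets):
    """Group packets into flows keyed by five-tuple, in arrival order."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    return dict(flows)


packets = [
    Packet(0.00, "10.0.0.1", 40000, "10.0.0.2", 80, "TCP", b"GET /"),
    Packet(0.05, "10.0.0.3", 40001, "10.0.0.2", 53, "UDP", b"\x00\x01"),
    Packet(0.20, "10.0.0.1", 40000, "10.0.0.2", 80, "TCP", b"Host: a"),
]
flows = split_by_five_tuple(packets)
```

In this toy input the first and third packets share a five-tuple and therefore end up in the same flow.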
b) Removing invalid data
When application data is transported in a network, its structure at the data link layer is an Ethernet frame, whose headers are, in order, an Ethernet header, an IP header and a TCP/UDP header. Studies by other scholars have indicated that the MAC addresses and type field of the Ethernet header, and the version and differentiated services fields of the IP header, are not typically used as network flow features, so the corresponding fields in each packet are discarded.
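The text does not say how the fields are discarded. One possible reading, sketched below under stated assumptions, is to drop the 14-byte Ethernet header (two MAC addresses and the type field) and to zero the version bits and the differentiated services byte of the IPv4 header. The sketch assumes untagged Ethernet II frames carrying IPv4; `remove_invalid_fields` is a hypothetical helper name.

```python
ETH_HEADER_LEN = 14  # 6-byte destination MAC + 6-byte source MAC + 2-byte type


def remove_invalid_fields(frame: bytes) -> bytes:
    """Drop the Ethernet header and blank the IPv4 version and DS fields.

    Assumption: the frame is an untagged Ethernet II frame carrying IPv4.
    """
    ip = bytearray(frame[ETH_HEADER_LEN:])
    if len(ip) >= 2:
        ip[0] &= 0x0F  # clear the 4-bit version, keep the header length (IHL)
        ip[1] = 0x00   # clear the differentiated services byte (DSCP and ECN)
    return bytes(ip)


# 12 bytes of MAC addresses, type 0x0800 (IPv4), then the start of an IP header.
frame = bytes(12) + b"\x08\x00" + bytes([0x45, 0xB8, 0x00, 0x28])
cleaned = remove_invalid_fields(frame)
```

Zeroing the bits rather than deleting them keeps every remaining byte at a fixed offset, which is a design choice of this sketch and not something the source states.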
c) Data extraction
Statistics show that most network flows have fewer than 10 packets. So that the inputs to the model have the same size, the invention selects the first 10 packets of each flow. Since packets differ in length, a fixed 160 bytes are taken from each packet: if the packet is longer than 160 bytes, only the first 160 bytes are kept; if it is shorter, it is padded with zeros. The longer a packet is, the more information it may contain. To reduce the impact of truncation and padding, the module records the length of each packet as the length feature. The time feature is the difference between the timestamp of each packet and that of the previous packet, and its value for the first packet is 0. After data extraction, the dimension of the flow features is 10 × 160, and the dimension of the length features and of the time features is 10 × 1.
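The extraction rules above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: `extract_features` is a hypothetical name, a flow is represented as a list of `(timestamp, raw_bytes)` pairs, and completing flows with fewer than 10 packets by all-zero packets is an assumption, since the text does not state how short flows are handled.

```python
MAX_PACKETS = 10  # packets kept per flow (from the text)
MAX_BYTES = 160   # bytes kept per packet (from the text)


def extract_features(flow):
    """Return (flow_features, length_features, time_features) for one flow.

    flow: list of (timestamp, raw_bytes) pairs in arrival order.
    """
    flow_features, lengths, intervals = [], [], []
    prev_ts = None
    for ts, raw in flow[:MAX_PACKETS]:
        row = list(raw[:MAX_BYTES])           # truncate to 160 bytes
        row += [0] * (MAX_BYTES - len(row))   # pad short packets with zeros
        flow_features.append(row)
        lengths.append(len(raw))              # original length, before truncation
        intervals.append(0.0 if prev_ts is None else ts - prev_ts)
        prev_ts = ts
    # Assumption: flows with fewer than 10 packets are completed with zeros.
    missing = MAX_PACKETS - len(flow_features)
    flow_features += [[0] * MAX_BYTES for _ in range(missing)]
    lengths += [0] * missing
    intervals += [0.0] * missing
    return flow_features, lengths, intervals


flow = [(0.0, b"\x01" * 200), (0.5, b"\x02" * 40), (2.0, b"\x03" * 160)]
x, l, d = extract_features(flow)
```

Here `x` is the 10 × 160 flow feature, while `l` and `d` hold the ten packet lengths and the ten inter-arrival intervals.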
2) Spatial feature extraction module
Convolutional neural networks are widely applied in image processing. Their three characteristics of local perception, weight sharing and downsampling reduce the number of parameters the network requires and ease the difficulties brought by a fully connected structure. Two-dimensional convolution is mainly used for processing images; because network flows and data packets are one-dimensional data, the invention uses a one-dimensional convolutional neural network to extract the spatial features of the network traffic.
The spatial feature extraction module is based on LeNet-5, a classical convolutional neural network structure; its structure is shown in Fig. 2. The input of the module is the flow features: the 10 × 160 matrix is flattened into a 1 × 1600 vector and fed to the input layer. The hidden layers consist of two convolutional layers and two pooling layers. The activation function is ReLU, and the pooling layers use max pooling. In addition, dropout is used in this module to prevent overfitting. The last layer is a fully connected layer, whose output is the output of the spatial feature extraction module.
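To make the building blocks of this module concrete, the sketch below implements flattening, a single-channel one-dimensional convolution, ReLU and max pooling in plain Python. It is illustrative only: the kernel, the pooling size and the toy signal are hypothetical, the text does not give kernel sizes or channel counts, and a practical implementation would use a deep learning framework with dropout and a fully connected layer.

```python
def flatten(flow_features):
    """Flatten a 10 x 160 flow feature into one vector of 1600 values."""
    return [b for packet in flow_features for b in packet]


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in CNN layers), stride 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Toy example with a hypothetical difference kernel.
signal = [0, 1, 3, 2, 0, 4, 1, 1]
kernel = [1.0, -1.0]
hidden = max_pool1d(relu(conv1d(signal, kernel)), 2)
```

One convolution and pooling pair shortens the signal from 8 values to 3; the module described above stacks two such pairs before the fully connected layer.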
3) Temporal feature extraction module
The long short-term memory (LSTM) network is the most successful extension of the Recurrent Neural Network (RNN). By introducing a gating structure and a cell state, it addresses the vanishing and exploding gradient problems that an RNN may suffer from, and can therefore handle long-term dependencies effectively. LSTM enables a network to remember valuable information in time series data and has had great success on many timing-related problems. Network traffic is composed of data packets in a certain order and can be regarded as sequence data. However, since LSTM is not fully suited to network traffic, the invention modifies LSTM to take the time interval between packets into account, yielding TL-LSTM.
Data packets in a network flow arrive at the destination address in a certain order and at certain time intervals, so the packets are associated in the time dimension. Researchers often use time-series models to process network flows; commonly used models include RNN and LSTM. LSTM has enjoyed great success in areas such as machine translation. However, the LSTM design presupposes that the time intervals between elements in the sequence are uniform and does not consider the effect of the time dimension. In network traffic anomaly detection, the packets in a network flow do not conform to this assumption. For example, when a DoS attack occurs, a large number of persistent illegitimate requests reach the server. Using LSTM directly to extract features in the time dimension discards an important feature, namely the differing time intervals between packets. Furthermore, the more payload a packet carries, the more information it is considered to contain, and the padding and truncation performed on the packets by the data processing module may cause partial loss of information.
Based on the above observations, the invention provides TL-LSTM, an improved LSTM structure shown in Fig. 3. TL-LSTM uses a gating structure to control the cell state and the hidden state. The input gate, forget gate and output gate at time t are denoted $i_t$, $f_t$ and $o_t$, respectively. The gates are updated in the same way as in the standard LSTM structure, as follows:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$    Formula (1)
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$    Formula (2)
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$    Formula (3)
where $x_t$ is the input of the cell at time t, $W$, $U$ and $b$ are model parameters, $h_{t-1}$ is the hidden state at time t-1, and $\sigma$ denotes the sigmoid activation function, i.e., $\sigma(x) = 1/(1 + e^{-x})$.
Because the time intervals between packets are not uniform, the cell state $C_{t-1}$ at time t-1 cannot be used directly as the input of the next cell. Therefore, the time interval $\Delta_t$ between the packet at time t and the packet at time t-1 is input to the time perception function $T_t(\cdot)$, and the output of the function is used to adjust $C_{t-1}$, giving the adjusted cell state.
The cell state $C_t$ is updated in two steps. First, a candidate cell state is created, which represents the memory acquired by the cell at the current time. To compensate for the effect of truncating and padding the data packets, the length feature $l_t$ at time t is used as the input of the length perception function $L_t(\cdot)$; the output of the length perception function, the hidden state $h_{t-1}$ and the input $x_t$ at this moment together determine the candidate cell state. Then, the cell state at the current time is updated from the adjusted cell state of the previous time and the current candidate state.
The hidden state of the cell at the current time is updated as shown in formula (7).
$h_t = o_t * \tanh(C_t)$    Formula (7)
The TL-LSTM structure is used to extract temporal features. The temporal feature extraction module consists of a single-layer TL-LSTM structure, each cell comprises 256 hidden layer units, and the time step is 10.
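The data flow of one TL-LSTM step can be sketched as below. The gates and the hidden state follow formulas (1) to (3) and (7); the formulas for the adjusted cell state, the candidate state and the cell state are not reproduced in the text, so their forms here are assumptions, marked as such in the comments. The adjusted state borrows the short-term memory decomposition used in time-aware LSTM work, the candidate state is simply scaled by the length perception output, and the cell state uses the standard LSTM combination. The helper names, the parameter dictionary `p`, and the `time_fn` and `length_fn` callables are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    """Multiply a list-of-rows matrix by a list vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gate(W, U, b, x, h_prev):
    """sigma(W x + U h_{t-1} + b), as in formulas (1) to (3)."""
    return [sigmoid(a + c + d)
            for a, c, d in zip(matvec(W, x), matvec(U, h_prev), b)]


def tl_lstm_step(x, h_prev, c_prev, delta, length, p, time_fn, length_fn):
    """One TL-LSTM step under the stated assumptions; returns (h_t, C_t)."""
    i = gate(p["Wi"], p["Ui"], p["bi"], x, h_prev)
    f = gate(p["Wf"], p["Uf"], p["bf"], x, h_prev)
    o = gate(p["Wo"], p["Uo"], p["bo"], x, h_prev)

    # ASSUMPTION: the short-term part of C_{t-1} is tanh(W_d C_{t-1} + b_d),
    # and only that part is discounted by the time perception output.
    short = [math.tanh(a + b)
             for a, b in zip(matvec(p["Wd"], c_prev), p["bd"])]
    t_out = time_fn(delta)
    c_adj = [c - s + s * t_out for c, s in zip(c_prev, short)]

    # ASSUMPTION: the length perception output scales the candidate state.
    l_out = length_fn(length)
    cand = [l_out * math.tanh(a + b)
            for a, b in zip(matvec(p["Wc"], x), matvec(p["Uc"], h_prev))]

    # ASSUMPTION: standard LSTM combination of adjusted and candidate state.
    c = [ft * ca + it * cd for ft, ca, it, cd in zip(f, c_adj, i, cand)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # formula (7)
    return h, c


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


n_in, n_hid = 3, 2  # toy sizes; the text uses 256 hidden units and 10 steps
params = {k: zeros(n_hid, n_in) for k in ("Wi", "Wf", "Wo", "Wc")}
params.update({k: zeros(n_hid, n_hid) for k in ("Ui", "Uf", "Uo", "Uc", "Wd")})
params.update({k: [0.0] * n_hid for k in ("bi", "bf", "bo", "bd")})

h, c = tl_lstm_step(
    x=[0.2, 0.1, 0.0], h_prev=[0.0, 0.0], c_prev=[1.0, -2.0],
    delta=0.5, length=160, p=params,
    time_fn=lambda d: 1.0 / math.log(math.e + d),          # assumed form
    length_fn=lambda n: 1.0 - 1.0 / math.log(math.e + n),  # hypothetical form
)
```

With all-zero parameters every gate equals 0.5 and the candidate state is zero, so the step simply halves the previous cell state, which makes the toy call easy to check by hand. Passing the two perception functions as arguments keeps the sketch independent of their exact form.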
Statistics show that the time intervals between packets in a network flow vary greatly. Adjusting the cell state directly with the raw time interval would introduce large errors into the model, so a time perception function is used to bring the intervals onto a common scale and thereby reduce the error. Intuitively, the larger the time interval, the weaker the connection between packets, and the weaker the effect of the previous element in the sequence on the current element. The time perception function therefore needs to be monotonically non-increasing. Following research on electronic medical records in the field of predictive medicine, the time perception function selected by the invention is given by formula (8), where e ≈ 2.718 is the base of the natural logarithm.
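Formula (8) itself is not reproduced in the text. As a non-limiting illustration, the sketch below assumes the decay $1/\log(e + \Delta)$, a form that appears in time-aware LSTM work on patient records and that is consistent with the mention of e above; whether it matches formula (8) exactly is an assumption. The function name `time_perception` is hypothetical.

```python
import math


def time_perception(delta: float) -> float:
    """ASSUMED time perception function: 1 / log(e + delta), for delta >= 0."""
    return 1.0 / math.log(math.e + delta)


# Intervals in seconds, from back-to-back packets to an hour of silence.
samples = [0.0, 0.001, 1.0, 60.0, 3600.0]
weights = [time_perception(d) for d in samples]
```

Under this assumed form an interval of 0 leaves the previous state essentially untouched (weight close to 1), and the weight shrinks slowly as the interval grows, so widely varying intervals are compressed into the range (0, 1].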
It is generally believed that a longer payload implies more information. The length perception function therefore needs to be monotonically non-decreasing. The length perception function selected by the invention is given by formula (9).
4) Classifier
The classifier is used to detect anomalies or classify network traffic; Softmax is used here as the classifier. It outputs a probability for each class, and the class with the highest probability is the classification result. The probability of each class is calculated as in formula (10), where C is the total number of classes.
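The classification rule can be illustrated with the standard Softmax definition, in which the probability of class j is $e^{z_j} / \sum_{k=1}^{C} e^{z_k}$ for a score vector z of length C. The symbol z and the function names below are introduced here for illustration; the notation of formula (10) is not reproduced in the text. Subtracting the maximum score before exponentiating is a common numerical safeguard and does not change the result.

```python
import math


def softmax(scores):
    """Standard softmax over a list of C class scores."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return (index of the most probable class, class probabilities)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


label, probs = predict([1.0, 3.0, 0.5])
```

In this toy call the second class has the largest score and is therefore returned as the classification result.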
5) Comparison of results
The invention was evaluated experimentally on a real network traffic data set, using overall accuracy, precision, recall and F1 score to measure model performance. To verify the effectiveness of the method, comparison experiments were carried out against both machine learning methods and deep learning methods. Seven commonly used machine learning methods were used for comparison: k-nearest neighbors (KNN), random forest (RF), ID3, AdaBoost, multilayer perceptron (MLP), naive Bayes (Naive-Bayes) and quadratic discriminant analysis (QDA). The baselines for the deep learning comparison were one-dimensional CNN, two-dimensional CNN, LSTM and TL-LSTM.
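For reference, the four metrics can be computed from predicted and true labels as sketched below for the binary case (attack versus normal). The function name `binary_metrics` and the toy labels are hypothetical, and the text does not state how the metrics are averaged over multiple attack classes, so this sketch makes no claim about that.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 1 = attack, 0 = normal (toy labels).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

A low recall, as discussed for naive Bayes in the comparison, corresponds to a large number of false negatives, that is, attack traffic labeled as normal.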
a) Performance comparison with machine learning methods
The results of the performance comparison on the CICIDS2017 dataset are shown in Table 1. The results for the machine learning methods are those obtained by the authors of the dataset, Sharafaldin et al., after manually extracting 80 features from the network traffic. Among the machine learning methods, the KNN, RF and ID3 algorithms perform best, and of these three RF has the shortest execution time. Naive Bayes has a low recall, which means that much attack traffic is misclassified as normal traffic. This algorithm assumes that the attributes are independent of each other; because Sharafaldin used 80 features in the experiment, the large number of correlated attributes degrades its classification performance. In addition, since this dataset was created artificially, the prior probabilities are affected, and with them the classification result. A probability threshold that is set too high may also lead to this situation.
Traditional machine learning methods depend on manually designed features; selecting and extracting the features requires professional knowledge, and some important information can be lost or damaged in the feature extraction process. Deep learning methods can automatically extract more representative features, which is why the experimental results of STIDM are better than those of the traditional machine learning methods.
Table 1. Performance comparison of STIDM with conventional machine learning methods on the CICIDS2017 dataset
Algorithm | Precision | Recall | F1 | Execution time (s)
KNN | 0.96 | 0.96 | 0.96 | 1908.23
RF | 0.98 | 0.97 | 0.97 | 74.39
ID3 | 0.98 | 0.98 | 0.98 | 235.02
AdaBoost | 0.77 | 0.84 | 0.77 | 1126.24
MLP | 0.77 | 0.83 | 0.76 | 575.73
Naive-Bayes | 0.88 | 0.04 | 0.04 | 14.77
QDA | 0.97 | 0.88 | 0.92 | 18.79
The invention | 0.99 | 0.99 | 0.99 | 222.01
b) Performance comparison with deep learning methods
Deep learning methods can learn the intrinsic regularities of the data and are therefore better suited to fitting and predicting network traffic data. To verify the effectiveness of STIDM, the invention uses one-dimensional CNN, two-dimensional CNN, standard LSTM and TL-LSTM as control experiments; the experimental results are shown in Table 2. The results show that every metric of the one-dimensional CNN is higher than that of the two-dimensional CNN, which confirms that one-dimensional CNN is better suited than two-dimensional CNN to processing one-dimensional data such as traffic and data packets. TL-LSTM outperforms LSTM, which shows that taking the time interval between packets into account helps to extract more accurate temporal features. Of the five experiments, STIDM achieved the highest scores in accuracy, precision, recall and F1. STIDM can therefore automatically extract more representative temporal and spatial features and effectively detect abnormal traffic. For the same number of epochs, STIDM takes longer to train than the other models, but the time remains within an acceptable range. Although STIDM improves on the other models only by a small margin, the volume of network traffic in practical applications is huge, so even a small improvement can be of great help to network traffic anomaly detection.
Table 2. Performance comparison of STIDM with deep learning methods on the CICIDS2017 dataset
The performance comparisons with the machine learning and deep learning methods show that the STIDM model provided by the invention detects network traffic anomalies with better accuracy and has potential for practical application.
The embodiments described above are merely specific embodiments of the invention; although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.