CN115811440A

CN115811440A - Real-time flow detection method based on network situation awareness

Info

Publication number: CN115811440A
Application number: CN202310040052.0A
Authority: CN
Inventors: 车洵; 孙捷; 金奎�; 卫英俊
Original assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Current assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Priority date: 2023-01-12
Filing date: 2023-01-12
Publication date: 2023-03-17
Anticipated expiration: 2043-01-12
Also published as: CN115811440B

Abstract

The invention discloses a real-time traffic detection method based on network situation awareness, comprising the following steps: digitizing the character-type (symbolic) fields of network traffic data and normalizing the data; judging, by a computing power scheduling module, the network load according to the actual network situation and allocating computing power accordingly; feeding the preprocessed traffic data into binary-classification detection under the allocated computing power to distinguish normal traffic from abnormal traffic; feeding the abnormal traffic found by the binary classification into multi-class detection under the allocated computing power to determine its attack type; collecting correctly discriminated normal and abnormal traffic, augmenting the abnormal traffic with a GAN, and retraining on the mixed augmented training data to obtain updated parameters; and repeating the above steps to carry out real-time, situation-aware intrusion detection continuously. The method identifies abnormal data in network traffic quickly and accurately and generalizes well in identifying abnormal traffic.

Description

Real-time flow detection method based on network situation awareness

Technical Field

The invention relates to the field of network security, in particular to a real-time flow detection method based on network situation awareness.

Background

In recent years, with the continuous development and wide application of emerging technologies such as cloud computing, distributed systems, big data, 5G communication, the Internet of Things and industrial control networks, the number of devices connected to the Internet has grown rapidly, computer network security incidents occur frequently, and network information security faces enormous challenges. To protect individuals and organizations from attacks by hackers, detecting intruders in time is an important step in guaranteeing network security. Deep-learning-enabled intrusion detection has advantages in processing high-dimensional data and mining information hidden in the data, and has attracted wide attention from academia and industry. However, accurately classifying network traffic is not a simple task.

When a deep-learning-based intrusion detection system processes large-scale data, dimensionality reduction is often adopted to lower the computational complexity. However, reducing the dimensionality of network traffic is likely to remove important information, which greatly reduces the detection accuracy of the model. In addition, existing intrusion detection systems directly or indirectly assume that computational resources are abundant, ignoring the impact of detection time on system utility. As the volume of network data grows rapidly, existing intrusion detection systems increasingly fail to meet the low-latency requirements of network traffic. Moreover, the deep learning models used by intrusion detection systems are typically trained and tested on public datasets. This static-data learning mode makes the systems strongly dependent on those datasets and limits the generalization of the models. As network attacks keep evolving, the ability of existing intrusion detection systems to identify abnormal attack traffic keeps declining.

In recent years, the popularization of deep learning has led to its widespread use in identifying various types of cyber attacks. Because deep learning overcomes the shortcomings of shallow learning and can automatically extract high-level features, its application to network intrusion detection has attracted wide attention from researchers in China and abroad. One publication proposes SCDNN, a hybrid approach to network intrusion detection that combines Spectral Clustering (SC) and Deep Neural Networks (DNN). First, SC divides the original training set into k subsets, which are used to train k sub-DNN classifiers. Next, SC divides the test set into subsets, and each subset is evaluated by the corresponding sub-DNN. Evaluated on six datasets based on KDD-Cup99 and NSL-KDD, SCDNN achieved higher detection precision than SVM, BP neural networks, RF and Bayesian methods. Another publication proposes ID-CVAE, an unsupervised network intrusion detection method based on a conditional variational autoencoder, whose architecture integrates the intrusion labels only inside the decoder layers. ID-CVAE can recover missing features from an incomplete training dataset, reaches an accuracy of 80.10% on NSL-KDD, and classifies better than other common classifiers.

As research has deepened, network security practitioners have designed and validated more deep-learning-enabled intrusion detection methods. For example, one publication proposes RNN-IDS, a deep learning method for network intrusion detection based on a Recurrent Neural Network (RNN), and studies the accuracy and training time of RNNs with different learning rates and numbers of hidden-layer neurons in binary and multi-class experiments. The results show multi-class accuracies of 73.28% and 68.55% on the KDDTest+ and KDDTest-21 test sets, respectively, lower than the binary accuracies of 88.32% and 86.71%. Another publication converts NSL-KDD data into images using existing graph conversion techniques, performs binary intrusion detection with the convolutional neural networks ResNet and GoogLeNet, and evaluates both the feasibility and the detection performance of recasting intrusion detection as image classification. A further publication proposes a two-stage deep learning (TSDL) model: in the first stage, a stacked autoencoder classifies network traffic as normal or abnormal and outputs a probability value; in the second stage, this probability is appended to the original features as an additional feature, and a softmax classifier distinguishes normal traffic from the individual attack types. TSDL reaches detection accuracies of 99.996% on KDD99 and 89.134% on UNSW-NB15, clearly outperforming the baseline methods.

In summary, deep learning methods perform satisfactorily in network intrusion detection systems. However, as network data keeps expanding, large volumes of nonlinear network data pose new challenges to deep-learning-based intrusion detection: detection rates for unknown and low-frequency attacks are low, and it is difficult to balance efficiency, generalization and reliability. A real-time traffic detection method based on network situation awareness is therefore urgently needed to solve these problems.

Disclosure of Invention

Therefore, a real-time traffic detection method based on network situation awareness is needed that can quickly and accurately identify abnormal data in network traffic, continuously learn new traffic characteristics during operation, and generalize strongly in identifying abnormal traffic.

In order to achieve the above object, the inventor provides a real-time traffic detection method based on network situation awareness, including:

S1: digitizing the character-type fields of the network traffic data and normalizing the data;

S2: judging, by the computing power scheduling module, the network load according to the actual network situation, and allocating computing power to the Multi-Class detection Net, the Multi-Label detection Net and the Data augmentation Net accordingly;

S3: feeding the preprocessed traffic data into the Multi-Label detection Net, which performs binary classification under the allocated computing power to distinguish normal traffic from abnormal traffic;

S4: feeding the abnormal traffic detected by the binary classification into the Multi-Class detection Net, which performs multi-class classification under the allocated computing power to determine the attack type of the abnormal traffic;

S5: collecting, by the Data augmentation Net, the correctly discriminated normal and abnormal traffic, augmenting the abnormal traffic with a GAN, and retraining the Multi-Class detection Net and the Multi-Label detection Net on the mixed augmented training data to obtain updated parameters;

S6: repeating steps S1 to S5 to carry out real-time, situation-aware intrusion detection continuously.
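The cascade in S3 and S4 can be sketched as follows. This is a minimal illustration, not the invention's implementation: `detect_stream`, the detector callables and the default threshold of 0.5 are hypothetical placeholders standing in for the binary detection stage and the attack-typing stage.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for the two networks: any callables with these shapes.
BinaryDetector = Callable[[Sequence[float]], float]   # returns P(abnormal)
AttackClassifier = Callable[[Sequence[float]], str]   # returns an attack type


def detect_stream(
    batch: List[Sequence[float]],
    binary_net: BinaryDetector,
    attack_net: AttackClassifier,
    threshold: float = 0.5,  # assumed decision threshold for the binary stage
) -> List[Tuple[str, str]]:
    """S3: binary detection on every record; S4: attack typing only for
    records flagged as abnormal, so the costlier classifier sees less data."""
    results: List[Tuple[str, str]] = []
    for record in batch:
        if binary_net(record) < threshold:
            results.append(("normal", "-"))
        else:
            results.append(("abnormal", attack_net(record)))
    return results


if __name__ == "__main__":
    # Toy detectors for illustration only: feature 0 acts as P(abnormal).
    print(detect_stream(
        [[0.1, 0.9], [0.8, 0.9], [0.7, 0.2]],
        lambda r: r[0],
        lambda r: "DoS" if r[1] > 0.5 else "Probe",
    ))
```

Only records that pass the binary stage reach the second classifier, which is why the binary stage can be kept cheap while the attack-typing stage is allowed to be heavier.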

As a preferred mode of the invention, when allocating computing power, the computing power scheduling module first allocates computing power to the Multi-Class detection Net with priority, and then distributes the remaining computing power to the Multi-Label detection Net and the Data augmentation Net.
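The preferred allocation order can be illustrated with a small priority-then-proportional scheme. The module demands, the units of computing power and the proportional split of the residual between the two remaining modules are assumptions of this sketch; the text only fixes that the Multi-Class detection Net is served first.

```python
from typing import Dict, List, Tuple


def allocate_compute(
    total: float,
    priority: Tuple[str, float],
    others: List[Tuple[str, float]],
) -> Dict[str, float]:
    """Serve the priority module first (up to its demand), then split the
    residual among the other modules in proportion to their demands,
    never giving a module more than it asked for (assumed policy)."""
    name, demand = priority
    allocation = {name: min(demand, total)}
    residual = total - allocation[name]
    other_demand = sum(d for _, d in others)
    for other_name, d in others:
        if other_demand == 0:
            allocation[other_name] = 0.0
        else:
            allocation[other_name] = min(d, residual * d / other_demand)
    return allocation


if __name__ == "__main__":
    # Heavy load: only 80 units available for 100 units of demand.
    print(allocate_compute(
        80.0,
        ("Multi-Class detection Net", 60.0),
        [("Multi-Label detection Net", 30.0), ("Data augmentation Net", 10.0)],
    ))
```

Under light load every module receives its full demand; under heavy load the two lower-priority modules shrink first, which mirrors the stated priority.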

As a preferred mode of the invention, the Multi-Class detection Net uses a Sigmoid layer as the last layer of the network to normalize each output probability to [0, 1], with the label decision threshold set to 0.5. During training, a binary cross-entropy loss is computed for each label, and the error function of the network is the sum of the losses over all labels.

As a preferred mode of the invention, the Data augmentation Net augments network traffic with a generalized learning model. During operation, the generalized learning model continuously collects normal and abnormal traffic and uses a GAN to augment the abnormal data, forming augmented training data in which abnormal data and normal traffic are mixed. During intrusion detection, a situation awareness module keeps a redundant copy of the network weight parameters; once a certain amount of augmented training data has been generated, the redundantly stored network is trained to update the parameters.

In a preferred embodiment of the present invention, the binary detection performed by the Multi-Label detection Net in step S3 under the allocated computing power comprises the following two steps:

S301: extracting traffic data features to capture as many spatio-temporal features of the network traffic as possible;

S302: key feature learning, so that the model focuses on the important features that benefit classification.

As a preferred mode of the present invention, in step S301 the spatio-temporal features are captured by spatio-temporal connection learning, which comprises spatio-temporal blocks and transition blocks. Each spatio-temporal block contains two core feature extraction blocks, Conv and LSTM, implemented by a grouped convolutional layer and a long short-term memory (LSTM) layer, respectively. The grouped convolutional layer uses 3 × 3 filters, its output feature map has twice as many channels as its input, and the number of groups equals the number of data channels input to the model. The transition block reduces dimensionality and contains an LSTM layer that does not change the dimensionality.

In a preferred embodiment of the present invention, a batch normalization layer, a Maxpooling layer and a Dropout layer are added to the spatio-temporal block.

As a preferred mode of the present invention, in step S302 the key feature learning of the Multi-Label detection Net is performed by 1 self-attention layer and 3 fully connected layers. The attention layer computes the attention weights according to the actual number of input channels and matrix-multiplies them with the feature map output by the first stage of situation awareness detection. The FC1 layer uses TanH as its activation function and is followed by a Dropout layer with a drop rate of 0.5; the FC1 and FC2 layers each halve the dimension of their input; and the FC3 layer uses Sigmoid as its activation function and marks the category of the input data by applying a threshold to its output.

As a preferred mode of the present invention, the Multi-Class detection Net performs feature extraction with 3 convolutional layers and 2 self-attention layers, and performs classification with 3 fully connected layers and 2 Dropout layers. The Conv1 layer receives 16-channel image data and outputs a 32-channel feature map using a 3 × 3 convolution kernel and the TanH activation function. The attention layer Atte1 computes the attention weights of the mask according to the actual number of input channels and matrix-multiplies them with the feature map output by Conv1. The Conv2 and Conv3 layers receive 32-channel and 64-channel image data, respectively, and output 64-channel and 128-channel feature maps, respectively, using 2 × 2 convolution kernels and the TanH activation function. The FC1 and FC2 layers contain 512 and 64 neurons, respectively, and each uses the TanH activation function followed by a Dropout layer with a drop rate of 0.5; the FC3 layer contains 2 neurons that mark the type of the input data.

As a preferred embodiment of the present invention, in step S5 the collection of correctly discriminated normal and abnormal traffic by the Data augmentation Net and the GAN-based augmentation of the abnormal traffic comprise: introducing self-attention into the GAN framework so that the generator G and the discriminator D can extract relationships between spatial regions of the data over the global range; convolving the data with a 3 × 3 filter; convolving each feature map with three 1 × 1 filters; computing self-attention scores using the results as queries and keys, respectively; obtaining attention weights through a softmax function; and matrix-multiplying the attention weights with the values to obtain a new feature map.

Compared with the prior art, the above technical solution has the following beneficial effects: the method switches its detection mode according to machine computing power and network load, always identifies abnormal data in network traffic quickly and with high precision, continuously learns new traffic characteristics during operation, and generalizes strongly in identifying abnormal traffic. First, an intrusion detection method that adapts its mode to the network situation is designed, dynamically balancing detection speed and detection precision. Second, a model optimization method for streaming data is provided, improving the model's ability to generalize to abnormal traffic. Finally, a new evaluation index is formulated for mode-adaptive traffic detection to measure the performance of intrusion detection models in real network environments more comprehensively. Experimental results show that the method outperforms existing baseline algorithms on metrics such as detection precision, accuracy and F1 score.

Drawings

FIG. 1 is a block diagram of a framework model for a method according to an embodiment.

Fig. 2 is a schematic diagram of a model structure of the multi-classification task according to the embodiment.

FIG. 3 is a functional block diagram of a Multi-Label detection Net according to an embodiment.

Fig. 4 is a network structure diagram of the space-time block according to the embodiment.

Fig. 5 is a network architecture diagram of the key feature learning phase in accordance with an embodiment.

Fig. 6 is a heat map of the self-attention weights of the features in the NSL-KDD dataset according to an embodiment.

Fig. 7 is a network structure diagram of the Multi-Class detection Net according to the embodiment.

Fig. 8 is a network structure diagram of the Data augmentation Net according to the embodiment.

FIG. 9 is a comparison graph of the detection speeds of the five models according to the embodiment.

Fig. 10 shows how the real-time F1 score of the five models varies with the network data transmission rate according to the embodiment.

Fig. 11 is a comparison diagram of training situations of the online learning mode and the static data mode according to the embodiment.

Detailed Description

In order to explain technical contents, structural features, objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in combination with the embodiments.

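As shown in fig. 1, the present embodiment provides a real-time traffic detection method based on network situation awareness, including:

S1: digitizing the character-type fields of the network traffic data and normalizing the data;

S2: judging, by the computing power scheduling module, the network load according to the actual network situation, and allocating computing power to the Multi-Class detection Net, the Multi-Label detection Net and the Data augmentation Net accordingly;

S3: feeding the preprocessed traffic data into the Multi-Label detection Net, which performs binary classification under the allocated computing power to distinguish normal traffic from abnormal traffic;

S4: feeding the abnormal traffic detected by the binary classification into the Multi-Class detection Net, which performs multi-class classification under the allocated computing power to determine the attack type of the abnormal traffic;

S5: collecting, by the Data augmentation Net, the correctly discriminated normal and abnormal traffic, augmenting the abnormal traffic with the GAN, and retraining the Multi-Class detection Net and the Multi-Label detection Net on the mixed augmented training data to obtain updated parameters;

S6: repeating steps S1 to S5 to carry out real-time, situation-aware intrusion detection continuously.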

In a specific implementation of the foregoing embodiment, an intrusion detection method combining spatial and temporal features is proposed to cope with the dynamics of the actual network situation; its system structure is shown in fig. 1. The embodiment specifically designs a computing power scheduling module to adjust how much computing power each module of the model receives. So that the detection speed of the model can be adjusted dynamically according to the network load and the machine computing power, the computing power scheduling module first allocates computing power to the Multi-Class detection Net with priority, and then distributes the remaining computing power to the Multi-Label detection Net and the Data augmentation Net. On the other hand, the Multi-Label detection Net only performs binary classification of network traffic and cannot classify abnormal traffic finely, so the embodiment introduces the Multi-Class detection Net, which classifies abnormal traffic finely by extracting its features.

As is well known, it is difficult to build an intrusion detection model that combines high detection speed, high detection accuracy and strong generalization, and the type of classification task must be chosen judiciously. The present embodiment converts the original binary classification task into a multi-classification task in which an input sample may carry one or more labels. Specifically, fig. 2 shows the Multi-Class detection Net structure for this task. The model uses a Sigmoid layer as the last layer of the network to normalize each output probability to [0, 1] and sets the label decision threshold to 0.5. During training, a binary cross-entropy loss is computed for each label, so the error function of the network equals the sum of the losses over all labels. To prevent different traffic records from interfering with one another, the detection model uses layer types that keep channels isolated, such as grouped convolutional layers.
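The sigmoid-plus-threshold output and the summed per-label binary cross-entropy described above can be written out in plain Python. This is a framework-free sketch under the assumption that the network's last layer produces one raw score (logit) per label; a deep learning library's built-in loss would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Map each logit to [0, 1] and mark a label as present when p >= threshold."""
    probs = [sigmoid(z) for z in logits]
    return probs, [1 if p >= threshold else 0 for p in probs]


def summed_bce(logits: Sequence[float], targets: Sequence[int], eps: float = 1e-12) -> float:
    """Binary cross-entropy computed per label and summed over all labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total


if __name__ == "__main__":
    print(predict_labels([2.0, -1.0, 0.0])[1])
    print(summed_bce([0.0, 0.0], [1, 0]))  # equals 2 * ln 2 up to eps
```

Because each label contributes its own loss term, one sample can be pushed toward several positive labels at once, which is what lets a record carry more than one label.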

In previous research, deep learning models were often trained on static data, and once the training samples are class-imbalanced, the detection accuracy of the models suffers. Static data also greatly weakens the generalization of the model, so that it can only identify intrusion data with a single set of characteristics and is powerless against malicious attacks that may appear in the future.

A common remedy is to balance the training samples by augmenting the training data with a generative adversarial network (GAN). Most GANs use convolutional layers in both the discriminator D and the generator G, but a convolution only processes information within a local neighborhood, so long-range spatial dependencies can only be captured after many stacked convolutional layers. In general, learning long-range features with a convolutional GAN may be hampered because: (1) a shallow convolutional GAN with small filters cannot extract long-range features; (2) the gradients of the loss function may fail to guide individual filters toward capturing long-range features; and (3) enlarging the filters increases the feature extraction capability of the network but also increases the computation time of the convolutional network.
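Point (1) can be made concrete with the standard receptive-field formula for stacked stride-1 convolutions, r = 1 + n(k - 1). The sketch below assumes undilated, stride-1 layers and treats feature positions as a single axis; it is an illustration, not an analysis taken from the source.

```python
import math


def receptive_field(num_layers: int, kernel: int) -> int:
    """Receptive field along one axis of `num_layers` stacked stride-1,
    undilated convolutions with kernel size `kernel`."""
    return 1 + num_layers * (kernel - 1)


def layers_to_link(distance: int, kernel: int) -> int:
    """Fewest stacked layers (kernel > 1) whose receptive field spans two
    positions that are `distance` apart, i.e. a span of distance + 1."""
    return math.ceil(distance / (kernel - 1))


def weights_per_filter(in_channels: int, kernel: int) -> int:
    """Weights in one 2-D filter; grows quadratically with the kernel size."""
    return in_channels * kernel * kernel


if __name__ == "__main__":
    # Linking two features that are 40 positions apart:
    print(layers_to_link(40, 3))  # 3 x 3 kernels
    print(layers_to_link(40, 5))  # 5 x 5 kernels: fewer layers, more weights each
    print(weights_per_filter(32, 3), weights_per_filter(32, 5))
```

With 3 × 3 kernels, 20 stacked layers are needed before two positions 40 apart interact, and switching to 5 × 5 kernels halves the depth but raises the weights per filter from 288 to 800, which is the trade-off described in points (1) and (3).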

Consequently, a convolutional GAN cannot learn the various features of traffic data comprehensively and therefore cannot perform data augmentation effectively. The present embodiment thus provides the Data augmentation Net of fig. 1, which augments network traffic with a generalized learning model. The generalized learning model continuously collects normal and abnormal traffic while the real-time traffic detection system runs, and augments the abnormal data with a GAN to form augmented training data in which abnormal data and normal traffic are mixed. During intrusion detection, the situation awareness module first keeps a redundant copy of the network weight parameters; after a certain amount of augmented training data has been generated, it trains the redundantly stored network to update the parameters of the situation awareness model, thereby realizing the parameter update function in fig. 1.
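The collect-augment-retrain loop with a redundant weight copy can be sketched as below. Everything here is hypothetical: the class name `OnlineUpdater`, the `augment` callable (a stand-in for the GAN), the `train` callable, the `retrain_size` trigger, and the assumption that a ground-truth label is available to decide whether a sample was discriminated correctly.

```python
import copy
from typing import Any, Callable, List, Tuple

Sample = Tuple[List[float], int]  # (features, label); label 0 means normal


class OnlineUpdater:
    """Buffer correctly discriminated traffic, augment the abnormal part and
    retrain a redundant copy of the weights once enough data has accumulated."""

    def __init__(
        self,
        weights: Any,
        augment: Callable[[List[Sample]], List[Sample]],  # stand-in for the GAN
        train: Callable[[Any, List[Sample]], Any],
        retrain_size: int,
    ) -> None:
        self.live_weights = weights                   # used for detection
        self.shadow_weights = copy.deepcopy(weights)  # redundant copy for retraining
        self.augment = augment
        self.train = train
        self.retrain_size = retrain_size
        self.normal: List[Sample] = []
        self.abnormal: List[Sample] = []

    def collect(self, sample: Sample, predicted: int) -> bool:
        """Keep the sample only if it was discriminated correctly; return True
        when this call triggered a parameter update."""
        if predicted != sample[1]:
            return False
        (self.normal if sample[1] == 0 else self.abnormal).append(sample)
        if len(self.normal) + len(self.abnormal) < self.retrain_size:
            return False
        # Mixed training data: real normal + real abnormal + augmented abnormal.
        mixed = self.normal + self.abnormal + self.augment(self.abnormal)
        self.shadow_weights = self.train(self.shadow_weights, mixed)
        self.live_weights = copy.deepcopy(self.shadow_weights)  # publish update
        self.normal, self.abnormal = [], []
        return True
```

In a deployment the retraining would typically run off the detection path; this synchronous version only shows the bookkeeping around the redundant copy.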

The structure and function of the Multi-Label detection Net, Multi-Class detection Net and Data augmentation Net modules of FIG. 1 are described in detail below.

As shown in fig. 3, the Multi-Label detection Net must combine high-precision detection (model quality) with a very small execution time (model cost). Model quality depends closely on how traffic data features are extracted and how efficiently the extracted features are used for the final detection. The present embodiment therefore proposes the two-stage densely connected network architecture of fig. 3. The first stage extracts traffic data features to capture as many spatio-temporal features of the network traffic as possible; the second stage is key feature learning, which makes the model focus on the important features that benefit classification and improves detection capability and efficiency. The execution time component of model cost is related to the number of trainable parameters: the fewer the trainable parameters, the faster the detection. The design must therefore avoid an excessive number of trainable parameters, which would slow down intrusion detection.

For the spatio-temporal feature extraction stage, specifically: network traffic data exhibits both spatial and temporal correlation, so the embodiment proposes spatio-temporal connection learning, which learns spatio-temporal features at different levels of abstraction from the input traffic as fully as possible and allows deeper neural networks to be built that perform well and are easy to train. Spatio-temporal connection learning arranges spatio-temporal blocks and transition blocks in an interleaved pattern in which there is always one more spatio-temporal block than transition blocks. The spatio-temporal block and the transition block are designed as follows.

Spatio-temporal block: fig. 4 shows the spatio-temporal block, which contains two core feature extraction blocks, Conv and LSTM, implemented by a grouped convolutional layer and a long short-term memory (LSTM) layer, respectively. The grouped convolutional layer uses 3 × 3 filters, its output feature map has twice as many channels as its input, and the number of groups equals the number of data channels input to the model. Grouped convolution effectively reduces the computational complexity of the spatio-temporal block. Moreover, when the number of groups equals the number of input data channels, the filters of each group extract the spatial features of only one traffic record (one channel), so different traffic records do not influence one another and degrade classification accuracy. The number of hidden-layer neurons of the LSTM layer equals the number of data channels input to the model, so data passing through the LSTM layer keeps its dimension. Each feature of the network traffic is fed in as an element of a time sequence, so that temporal features between spatial features can be extracted on top of each spatial feature.
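The channel isolation argument can be checked with a toy grouped convolution. For brevity the sketch is one-dimensional (a length-3 kernel standing in for the 3 × 3 filter), uses valid padding and no bias, and sets the number of groups equal to the number of input channels with twice as many output channels, as the text specifies; `grouped_conv1d` and `conv_weight_count` are illustrative names.

```python
from typing import List

Signal = List[List[float]]  # [channels][length]


def grouped_conv1d(x: Signal, weights: List[List[List[float]]], groups: int) -> Signal:
    """x: [C_in][L]; weights: [C_out][C_in // groups][K]. Stride 1, valid padding.
    Output channel o only reads the input channels of group o // (C_out // groups)."""
    c_in, c_out = len(x), len(weights)
    in_per_group, out_per_group = c_in // groups, c_out // groups
    k = len(weights[0][0])
    out_len = len(x[0]) - k + 1
    y: Signal = []
    for o in range(c_out):
        g = o // out_per_group
        row = []
        for t in range(out_len):
            acc = 0.0
            for j in range(in_per_group):
                channel = g * in_per_group + j
                for s in range(k):
                    acc += weights[o][j][s] * x[channel][t + s]
            row.append(acc)
        y.append(row)
    return y


def conv_weight_count(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weights of a 2-D convolution with k x k kernels (bias excluded)."""
    return c_out * (c_in // groups) * k * k


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
    w = [[[1.0, 1.0, 1.0]] for _ in range(6)]  # 3 groups, 2 outputs per group
    print(grouped_conv1d(x, w, groups=3))
    print(conv_weight_count(16, 32, 3), conv_weight_count(16, 32, 3, groups=16))
```

Changing the second record (channel 1) alters only output channels 2 and 3, and for 16 input channels the grouped layer needs 288 instead of 4608 weights (biases excluded), which is the complexity reduction the text refers to.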

To exploit the feature extraction capability of CNNs and LSTMs on traffic data effectively while limiting the potentially high computation cost of spatio-temporal connection learning, the embodiment adds three auxiliary layers to the spatio-temporal block, which further enhance the model's ability to fit nonlinear relationships and stabilize the training process: (1) Batch Normalization (BN) accelerates training and reduces the final generalization error; (2) the Max-pooling (MP) layer provides basic translation invariance of the internal representation and reduces computation cost; (3) the Dropout layer is a regularization technique that prevents overfitting.

Spatiotemporal connection learning enhances the propagation of features and gradients in the network, while multiple spatiotemporal blocks can be stacked to form a deeper neural network.

Transition block: the curse of dimensionality states that if the number of features of a neural network model (i.e. the dimension of the feature space) grows rapidly, the predictive power of the model decreases significantly. Each spatio-temporal block multiplies the dimension of the feature space. To alleviate this problem while continuing to build a deeper network that captures features at all levels of abstraction, a transition block is inserted between every two spatio-temporal blocks to reduce dimensionality. To preserve spatial and temporal characteristics simultaneously during dimensionality reduction, the embodiment adds a dimension-preserving LSTM layer to the transition block, which keeps the feature space from growing and improves the generalization capability and robustness of the model.
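The interleaving of spatio-temporal (ST) and transition (T) blocks can be traced as a channel budget. The doubling by the grouped convolution and the dimension-preserving LSTM follow the text; the halving factor of the transition block is an assumption of this sketch, since the text only says it reduces dimensionality. `build_layout` is an illustrative name.

```python
from typing import List, Tuple


def build_layout(num_transitions: int, in_channels: int) -> List[Tuple[str, int, int]]:
    """Return (block, c_in, c_out) for ST, T, ST, ..., ST, which always has
    one more ST block than T blocks."""
    layout: List[Tuple[str, int, int]] = []
    c = in_channels
    for i in range(num_transitions + 1):
        layout.append(("ST", c, 2 * c))      # grouped conv doubles channels; LSTM keeps them
        c *= 2
        if i < num_transitions:
            layout.append(("T", c, c // 2))  # assumed halving in the transition block
            c //= 2
    return layout


if __name__ == "__main__":
    for block in build_layout(num_transitions=2, in_channels=16):
        print(block)
```

Without the transition blocks, three stacked ST blocks would grow 16 channels to 128 (16 → 32 → 64 → 128); with them the width stays bounded at 32, which is the growth the transition block is meant to prevent.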

Thus, the first stage of situation awareness detection builds a deeper neural network from spatio-temporal blocks connected by transition blocks, which extracts more spatio-temporal features. To further improve detection capability, the embodiment proposes a second stage of situation awareness detection that focuses on the features most important to the intrusion detection result.

The second stage is the key feature learning phase, which uses a self-attention mechanism to concentrate on the important features that distinguish attacks from normal behavior. During learning, each feature receives an attention score; the higher the score, the more important the feature and the greater its influence on the detection engine.

As shown in FIG. 5, the key feature learning phase of the Multi-Label detection Net is carried out by 1 self-attention layer Atte and 3 fully connected layers, FC1, FC2 and FC3. The Atte layer computes the attention weights according to the actual number of input channels and matrix-multiplies them with the feature map output by the first stage of situation awareness detection. The FC1 layer uses TanH as its activation function and is followed by a Dropout layer with a drop rate of 0.5. The FC1 and FC2 layers each halve the dimension of their input. The FC3 layer uses Sigmoid as its activation function and marks the category of the input data by applying a threshold to its output.
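A framework-free sketch of a self-attention layer over features is given below. It uses standard scaled dot-product attention, softmax(QKᵀ/√d)V; the projection matrices, the √d scaling and the function names are assumptions, since the text does not spell out the exact formula of the Atte layer.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """x: [n_features][d_in]. Returns (attention weights [n][n], output [n][d_v]).
    Row i of the weights says how much feature i attends to every feature."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return weights, matmul(weights, v)
```

Each row of the weight matrix sums to 1, so a row can be read as a per-feature attention distribution of the kind fig. 6 visualizes as a heat map.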

The embodiment applies the self-attention mechanism to the NSL-KDD dataset to compute and visualize the attention weights of the features. Fig. 6 shows the self-attention weight distribution over 40 features of the NSL-KDD dataset. It shows that self-attention extracts long-range feature relationships better than convolution: for example, feature No. 20 is influenced by features No. 2, 9 and 11 in addition to itself. The self-attention mechanism therefore enhances the interpretability of the captured features and narrows the semantic gap between the intrusion detection system and security engineers. It also lets security engineers inspect the attention weights and select important features for correlation analysis, so as to filter false alarms, identify real attacks and respond to them in time. Furthermore, by using self-attention the embodiment better captures the relationship between the global features of the traffic data and the classification result, alleviating gradient vanishing and performance degradation and thereby achieving higher accuracy.
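Selecting important features from an attention map, as suggested for security engineers, can be as simple as averaging the attention each feature receives and ranking the result. The column-mean aggregation, the function names and the toy 3 × 3 matrix below are illustrative assumptions, not values from fig. 6.

```python
from typing import List, Optional, Sequence, Union

Matrix = List[List[float]]


def received_attention(weights: Matrix) -> List[float]:
    """Mean attention each feature (column) receives from all features (rows)."""
    n_rows = len(weights)
    return [sum(row[j] for row in weights) / n_rows for j in range(len(weights[0]))]


def top_k_features(scores: Sequence[float], k: int,
                   names: Optional[Sequence[str]] = None) -> List[Union[int, str]]:
    """Indices (or names) of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return [names[j] for j in order] if names else list(order)


if __name__ == "__main__":
    toy = [[0.7, 0.2, 0.1],
           [0.1, 0.8, 0.1],
           [0.6, 0.3, 0.1]]  # illustrative attention weights, rows sum to 1
    scores = received_attention(toy)
    print(top_k_features(scores, 2, ["duration", "protocol_type", "service"]))
```

Other aggregations (for example, the attention received by one feature of interest only) fit the same pattern; the column mean is just the simplest global ranking.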

For the Multi-Class detection Net, specifically: compared with the Multi-Label detection Net, the Multi-Class detection Net requires higher detection capability (model quality), while its execution time (model cost) may be slightly higher than that of the pre-detection module. As shown in fig. 7, the Multi-Class detection Net performs feature extraction with 3 convolutional layers, Conv1, Conv2 and Conv3, and 2 self-attention layers, Atte1 and Atte2; classification is performed by 3 fully connected layers and 2 Dropout layers. The Conv1 layer receives 16-channel image data and outputs a 32-channel feature map using a 3 × 3 convolution kernel and the TanH activation function. The Atte1 layer computes the attention weights of the mask according to the actual number of input channels and matrix-multiplies them with the feature map output by Conv1. The Conv2 and Conv3 layers receive 32-channel and 64-channel image data, respectively, and output 64-channel and 128-channel feature maps, respectively, using 2 × 2 convolution kernels and the TanH activation function. The Atte2 layer is similar to the Atte1 layer. The FC1 and FC2 layers contain 512 and 64 neurons, respectively, and both use the TanH activation function followed by a Dropout layer with a drop rate of 0.5. The FC3 layer contains 2 neurons and uses no activation function; it labels the class of the input data.
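Since the text ties model cost to the number of trainable parameters, the layer sizes stated for the Multi-Class detection Net can be turned into parameter counts. The sketch assumes every convolutional and fully connected layer has a bias and counts only layers whose input and output sizes are fully specified; FC1 and the attention layers are omitted because their sizes depend on the spatial size of the feature maps, which the text does not give.

```python
from typing import Dict


def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """k x k convolution: c_out * c_in * k * k weights plus c_out biases."""
    return c_out * c_in * k * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out


def multi_class_net_params() -> Dict[str, int]:
    return {
        "Conv1 16->32, 3x3": conv2d_params(16, 32, 3),
        "Conv2 32->64, 2x2": conv2d_params(32, 64, 2),
        "Conv3 64->128, 2x2": conv2d_params(64, 128, 2),
        "FC2 512->64": linear_params(512, 64),
        "FC3 64->2": linear_params(64, 2),
    }


if __name__ == "__main__":
    counts = multi_class_net_params()
    for layer, n in counts.items():
        print(f"{layer}: {n}")
    print("subtotal:", sum(counts.values()))
```

Conv3 and FC2 dominate this subtotal, so they would be the first candidates to shrink if detection latency had to drop.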

For the data augmentation network, the embodiment introduces self-attention into the GAN framework, so that the generator G and the discriminator D can extract relationships between spatial regions of the data across the global scope. As shown in Fig. 8, the Data convolution Net first convolves the data with a larger filter, for example a 3 × 3 filter. It then convolves each feature map with three 1 × 1 filters, uses two of the results as the query and the key to calculate the self-attention scores, and obtains the attention weights through a softmax function. Finally, the attention weights are matrix-multiplied with the value (the third result) to obtain a new feature map.
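The attention step described above follows the self-attention GAN pattern: 1 × 1 convolutions produce query, key and value maps, a softmax over query-key scores gives the attention weights, and those weights mix the value map. The NumPy sketch below shows that step on a single feature map. It treats each 1 × 1 convolution as a channel-mixing matrix, uses random weights in place of learned ones, and omits the preceding 3 × 3 convolution. The reduced channel count `c_qk` (C/8 by default, as in SAGAN) and the function names are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(x, w):
    # A 1x1 convolution is a matrix product over channels at every spatial position.
    # x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.einsum("oc,chw->ohw", w, x)

def self_attention_map(x, c_qk=None, seed=0):
    """SAGAN-style self-attention on one feature map x of shape (C, H, W)."""
    rng = np.random.default_rng(seed)
    C, H, W = x.shape
    c_qk = c_qk or max(1, C // 8)
    w_q = rng.normal(size=(c_qk, C))   # the three 1x1 filters (random stand-ins for learned weights)
    w_k = rng.normal(size=(c_qk, C))
    w_v = rng.normal(size=(C, C))
    q = conv1x1(x, w_q).reshape(c_qk, H * W)   # query: one column per spatial position
    k = conv1x1(x, w_k).reshape(c_qk, H * W)   # key
    v = conv1x1(x, w_v).reshape(C, H * W)      # value
    attn = softmax(q.T @ k, axis=-1)           # (N, N): row i = weights of position i over all positions
    out = v @ attn.T                           # output position i = sum_j attn[i, j] * value[:, j]
    return out.reshape(C, H, W), attn

fmap = np.random.default_rng(1).normal(size=(16, 4, 4))
new_map, attn = self_attention_map(fmap)
print(new_map.shape, attn.shape)   # (16, 4, 4) (16, 16)
```

SAGAN itself also scales the result by a learned factor and adds it back to the input; that residual step is left out here to keep the sketch focused on the query/key/value computation.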

Training instability is a common problem for GANs, because the classification loss function that guides GAN learning does not reflect the quality of the generated data well. Drawing on the loss-function modification of WGAN-GP, the embodiment adds a gradient penalty term to the loss function to prevent vanishing or exploding gradients during network training. In addition to using spectral normalization to stabilize GAN training, the embodiment updates the GAN parameters with the two time-scale update rule (TTUR), which significantly reduces the computational cost of training.
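For reference, the WGAN-GP critic loss that the embodiment draws on is usually written as below, where $x$ is a real sample, $\tilde{x}$ a generated sample, $\hat{x}$ a random interpolation between them, and $\lambda$ the penalty weight (10 in the original WGAN-GP paper). TTUR gives the critic and the generator separate learning rates $\alpha_D$ and $\alpha_G$. The specific values used in this embodiment are not stated, so the symbols are illustrative.

```latex
L_D = \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
    + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim U[0,1]

L_G = -\mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big],
\qquad
\theta_D \leftarrow \theta_D - \alpha_D \nabla_{\theta_D} L_D,
\qquad
\theta_G \leftarrow \theta_G - \alpha_G \nabla_{\theta_G} L_G
```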

The experimental procedure for the above embodiment is as follows:

Dataset selection: all experiments are performed on the NSL-KDD dataset, a statistically improved version of the KDD CUP 99 dataset proposed by Tavallaee et al. NSL-KDD retains the 41 features and 1 class label of the original KDD CUP 99 dataset. Features 1 to 9 are basic features extracted from the TCP/IP connection; features 10 to 22 are content features derived from the payload of network packets; features 23 to 31 are extracted from the time attributes of the traffic; and features 32 to 41 are traffic features of the destination host. Each record carries a class label that identifies it as normal traffic or as one of the attack types DoS, Probe, U2R and R2L.
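The feature grouping above can be captured as a small lookup table, which is handy when checking which group a highly weighted feature belongs to. This is a plain-Python sketch; the group names and the helper `feature_group` are illustrative, and indices are 1-based to match the numbering in the text.

```python
# 1-based, inclusive feature ranges for the 41 NSL-KDD features, as grouped in the text.
FEATURE_GROUPS = {
    "basic (TCP/IP connection)": range(1, 10),    # features 1-9
    "content (packet payload)": range(10, 23),    # features 10-22
    "time-based traffic": range(23, 32),          # features 23-31
    "host-based traffic": range(32, 42),          # features 32-41
}

def feature_group(index):
    """Return the group name of a 1-based feature index in [1, 41]."""
    for name, idx_range in FEATURE_GROUPS.items():
        if index in idx_range:
            return name
    raise ValueError(f"feature index {index} is outside 1..41")

# Sanity check: the four groups partition features 1..41 exactly once each.
assert sorted(i for r in FEATURE_GROUPS.values() for i in r) == list(range(1, 42))
print(feature_group(20))   # content (packet payload)
```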

NSL-KDD consists of four subsets: KDDTrain+, KDDTrain+20%, KDDTest+ and KDDTest-21. KDDTrain+20% is a subset of KDDTrain+, and KDDTest-21 is a subset of KDDTest+ that keeps the traffic records which are harder to detect. To demonstrate both the accuracy and the generality of intrusion detection, the experiments use KDDTrain+, KDDTest+ and KDDTest-21 as the base datasets; the distribution of the different data types in these three datasets is shown in Table 1.

TABLE 1 Number of records of each data type in the NSL-KDD datasets

Data preprocessing: to better extract the flow characteristics, the experiment performed the following preprocessing work on the flow data in the NSL-KDD dataset.

Step 1: character type digitization

In the NSL-KDD dataset, 3 features (protocol_type, service and flag) and the class label are character-type, and one feature is constantly 0. The experiment uses 2 ways of digitizing character-type data. In one-hot encoding mode, each symbolic value is expanded into a block in which the position of its category is 1 and all other positions are 0, so that each record becomes a 121-dimensional vector. In label encoding mode, each symbolic value is mapped to a natural number, so that each record becomes a 40-dimensional vector.
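The two encodings can be sketched in plain Python as below. The dimension counts agree with the commonly reported NSL-KDD setup: 38 numeric features plus one-hot columns for `protocol_type` (3 values), `service` (70) and `flag` (11) give 122 columns, and dropping the constant feature leaves 121; label encoding keeps one column per feature, 41 − 1 = 40. The toy vocabularies, the column layout and the record below are hypothetical and much smaller than the real ones.

```python
# Toy layout: [duration, protocol_type, service, flag, src_bytes, constant_feature]
# Vocabularies are hypothetical subsets of the real NSL-KDD symbol sets.
VOCAB = {1: ["tcp", "udp", "icmp"], 2: ["http", "ftp", "smtp", "private"], 3: ["SF", "S0", "REJ"]}
CONSTANT = {5}   # index of the always-zero feature, which is dropped

def label_encode(record):
    out = []
    for i, value in enumerate(record):
        if i in CONSTANT:
            continue                              # constant feature carries no information
        if i in VOCAB:
            out.append(VOCAB[i].index(value))     # symbol -> natural number
        else:
            out.append(float(value))
    return out

def one_hot_encode(record):
    out = []
    for i, value in enumerate(record):
        if i in CONSTANT:
            continue
        if i in VOCAB:
            block = [0] * len(VOCAB[i])
            block[VOCAB[i].index(value)] = 1      # 1 at the symbol's position, 0 elsewhere
            out.extend(block)
        else:
            out.append(float(value))
    return out

record = [0, "udp", "private", "SF", 105, 0]
print(label_encode(record))     # [0.0, 1, 3, 0, 105.0]
print(one_hot_encode(record))   # [0.0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 105.0]
# Full NSL-KDD: 38 numeric - 1 constant + (3 + 70 + 11) one-hot columns.
print(38 - 1 + 3 + 70 + 11)     # 121
```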

Step 2: normalization process

After the traffic data are digitized, features of different magnitudes contribute unequally to model fitting, and over-emphasizing features with larger orders of magnitude can bias the classification. The experiment therefore applies min-max normalization, $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature, to keep the features numerically comparable and to improve the stability and speed of back-propagation.
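A minimal NumPy sketch of the min-max step follows. Fitting the minimum and maximum on the training split and reusing them on the test split is a common convention assumed here, not something the text specifies. The small `eps` keeps constant columns from causing a division by zero.

```python
import numpy as np

def fit_min_max(X_train):
    """Learn the per-feature minimum and maximum from the training split."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_min_max(X, x_min, x_max, eps=1e-12):
    """x' = (x - x_min) / (x_max - x_min), column-wise.

    eps avoids division by zero for constant columns, which map to 0.
    Test-split values outside the training range fall outside [0, 1].
    """
    X = np.asarray(X, dtype=float)
    return (X - x_min) / np.maximum(x_max - x_min, eps)

X_train = [[0, 100, 7], [5, 300, 7], [10, 200, 7]]   # third column is constant
x_min, x_max = fit_min_max(X_train)
X_scaled = apply_min_max(X_train, x_min, x_max)
print(X_scaled)   # rows: [0, 0, 0], [0.5, 1, 0], [1, 0.5, 0]
```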

Experimental platform: the system configuration used in the experiments is shown in Table 2.

TABLE 2 System parameter settings

Evaluation metrics: the experiment uses classification accuracy, precision, recall and F1 score to evaluate the detection model, where TP is the number of abnormal examples correctly predicted as abnormal, TN is the number of normal examples correctly predicted as normal, FP is the number of normal examples wrongly classified as abnormal, and FN is the number of abnormal examples wrongly classified as normal.

Accuracy is the proportion of correct predictions among all records; the higher the accuracy, the better the traffic classification performance. It is expressed as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision measures the quality of positive predictions: it is the ratio of correctly predicted samples of a given class to all samples predicted as that class, expressed as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

The detection rate (DR), or recall, is the proportion of actual attack traffic that is correctly classified, expressed as follows:

$$DR = \mathrm{Recall} = \frac{TP}{TP + FN}$$

DR is also known as the TPR (true positive rate) or recall; the higher the DR, the better the traffic classification performance.

The F1-Score balances precision and recall; it is their harmonic mean, expressed as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
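The four metrics can be computed directly from confusion-matrix counts. Below is a small Python sketch; abnormal traffic is the positive class as in the definitions above, and the counts in the usage line are made up for illustration.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (DR/TPR) and F1 from confusion-matrix counts.

    Zero denominators return 0.0 instead of raising ZeroDivisionError.
    """
    def ratio(a, b):
        return a / b if b else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = detection_metrics(tp=80, tn=90, fp=10, fn=20)
print(metrics)   # accuracy 0.85, precision 8/9, recall 0.8, F1 = 160/190 (about 0.842)
```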

To better evaluate the intrusion detection model in a real network environment, the experiment defines an evaluation index that accounts for both detection time and detection effect, called the Real Time F1 Score (RTFS). Let $v_t$ be the network data transmission rate and $v_d$ the maximum detection rate of the intrusion detection model. The RTFS is calculated piecewise: when $v_t$ is less than $v_d$, the detection model can perform detection while meeting the latency requirement, and its RTFS equals its F1-Score; when $v_t$ is greater than $v_d$, the detection model cannot complete the detection task within the specified time, and its F1-Score is attenuated according to the difference in detection speed.
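The piecewise definition of RTFS can be sketched as a small Python function, with `v_t` the network data transmission rate and `v_d` the model's maximum detection rate in the same unit. The attenuation branch uses linear scaling by `v_d / v_t`. That factor is an assumption chosen only for illustration and is not the patent's attenuation formula. Treating `v_t == v_d` as "keeps up" is also an assumption.

```python
def real_time_f1(f1, v_t, v_d):
    """Real Time F1 Score (RTFS) sketch.

    f1:  the model's F1-Score
    v_t: network data transmission rate (e.g. MB/s)
    v_d: maximum detection rate of the model (same unit)

    If the model keeps up (v_t <= v_d), RTFS equals F1. Otherwise F1 is attenuated;
    the linear factor v_d / v_t is an ASSUMPTION for illustration only.
    """
    if v_t <= 0 or v_d <= 0:
        raise ValueError("rates must be positive")
    if v_t <= v_d:
        return f1
    return f1 * (v_d / v_t)

print(real_time_f1(0.9, v_t=8, v_d=16))    # 0.9  (model keeps up)
print(real_time_f1(0.9, v_t=16, v_d=8))    # 0.45 (model is twice too slow)
```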

Analysis of experimental results. Experiment 1: comparison of overall performance

In this experiment, binary classification training and testing are carried out on the NSL-KDD dataset, and the binary classification performance of RF, MLP, CNN, AE, CNN+LSTM, BiGAN+MLP, SAGAN+CNN+LSTM and the proposed model is tested; the results are shown in Table 3.

As Table 3 shows, in binary classification both the conventional machine learning models and the deep learning models tend to have low precision and high recall, that is, they are biased toward recognizing traffic as positive (abnormal). The main reason is that the number of training samples varies widely between attack classes, especially for R2L and U2R. The proposed method therefore augments the dataset with a GAN and retrains the model on the balanced dataset, which balances recall and precision and improves the model's F1 score. Although the proposed model performs slightly worse than SAGAN+CNN+LSTM on the conventional metrics, it completes traffic detection much faster at the cost of a little accuracy, and it is far ahead of SAGAN+CNN+LSTM in Real Time F1 Score.
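One simple way to turn "augment until balanced" into numbers is to top every minority class up to a common target size. The counts below are hypothetical placeholders, not the values in Table 1. The target rule (match the largest class unless a target is given) is an assumption, since the text does not say how the class sizes of the augmented set are chosen.

```python
def synthetic_counts(class_counts, target=None):
    """Number of GAN-generated samples each class needs to reach a common target size.

    class_counts: dict mapping class name -> number of real training samples.
    target: desired samples per class; defaults to the size of the largest class.
    Classes already at or above the target need no synthetic samples.
    """
    target = max(class_counts.values()) if target is None else target
    return {c: max(0, target - n) for c, n in class_counts.items()}

# Hypothetical counts illustrating a long tail like R2L and U2R.
counts = {"Normal": 1000, "DoS": 800, "Probe": 200, "R2L": 50, "U2R": 5}
print(synthetic_counts(counts))
# {'Normal': 0, 'DoS': 200, 'Probe': 800, 'R2L': 950, 'U2R': 995}
```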

TABLE 3 Comparison of standard metrics between the proposed model and other models

Experiment 2: effect of the network traffic arrival rate on the F1 score

In this experiment, the balanced KDDTest+ dataset is used for testing, and the overall detection rate of AE, MLP, CNN+LSTM and the proposed model is calculated. Fig. 9 shows that the overall detection rate of the proposed model is far better than that of the other, traditional detection methods.

By using grouped convolution to convert the binary classification of individual network flows into a multi-label classification over multiple flows, the proposed model greatly improves detection speed and far exceeds the other models in Real Time F1 Score. For example, as Fig. 10 shows, at a network data transmission rate of 16 MB/s the proposed model can still complete the intrusion detection task without affecting users, whereas the detection performance of the other methods drops sharply.
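A rough multiply-accumulate (MAC) count shows why grouped convolution keeps multi-flow detection cheap. With one group per flow, each flow is convolved only with its own filters, so the cost grows linearly with the number of flows C, whereas a standard convolution that mixes all channels grows quadratically. The sketch assumes "same" padding, a 3 × 3 kernel and twice as many output channels as input channels (the settings stated in claim 6); the flow count and map size are illustrative.

```python
def conv_macs(c_in, c_out, h, w, k=3, groups=1):
    """Multiply-accumulates for one conv layer with 'same' padding (output is h x w)."""
    return c_out * (c_in // groups) * k * k * h * w

C, H, W = 16, 8, 8                              # 16 flows, each an 8x8 map (illustrative sizes)
standard = conv_macs(C, 2 * C, H, W)            # every output channel mixes all 16 flows
grouped = conv_macs(C, 2 * C, H, W, groups=C)   # one group per flow: flows never mix
print(standard, grouped, standard // grouped)   # 294912 18432 16
```

The ratio equals C, so the saving grows with the number of flows packed into one forward pass.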

Experiment 3: online learning performance gain analysis

This experiment is carried out on the KDDTest+ dataset, with the network parameters updated either by online learning or in static mode. Fig. 11 shows that when attack traffic of an unknown type appears, online learning extracts the traffic characteristics more quickly and achieves a higher F1 score and more stable detection performance, whereas a network trained on static data often loses detection precision because of insufficient generalization capability.

The embodiment provides an AI framework for real-time traffic detection that strikes a good trade-off between the efficiency and the generalization of the intrusion detection model. Its key points are a situation awareness method with adjustable granularity and the integration of a self-attention mechanism into the neural network. On four evaluation metrics (accuracy, precision, recall and F1 score), the model achieves satisfactory binary and multi-class classification results on the NSL-KDD dataset compared with traditional benchmark detection models. The method mainly uses grouped convolution, a self-attention mechanism and generative adversarial network techniques, which let the model effectively balance intrusion detection speed and detection precision. Meanwhile, online learning on streaming data greatly enhances the generalization capability of the model and improves its recognition performance without affecting network performance.

It should be noted that, although the above embodiments have been described herein, the scope of the present invention is not limited thereby. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by changing and modifying the embodiments described herein or by using the equivalent structures or equivalent processes of the content of the present specification and the attached drawings, and are included in the scope of the present invention.

Claims

1. A real-time flow detection method based on network situation awareness is characterized by comprising the following steps:

S1: carrying out character type digitization and normalization processing on the network flow data;

2. The real-time traffic detection method based on network situation awareness according to claim 1, wherein: when allocating computing power, the computing power scheduling module first allocates computing power preferentially to the Multi-Class detection Net, and then allocates the remaining computing power to the Multi-Label detection Net and the Data evaluation Net.
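The claim fixes only the priority order of the allocation. The Python sketch below shows that order; the compute units, the demand figure and the proportional split of the leftover between the other two nets are hypothetical choices for illustration.

```python
def allocate_compute(total, multiclass_demand, other_weights):
    """Priority allocation sketch: Multi-Class detection Net first, remainder shared.

    total: available compute in any unit (e.g. GPU-seconds per time window)
    multiclass_demand: compute requested by the Multi-Class detection Net
    other_weights: dict of remaining nets -> relative share of the leftover
                   (the proportional split is a hypothetical rule)
    """
    first = min(total, multiclass_demand)   # priority net is served first, capped by what exists
    leftover = total - first
    weight_sum = sum(other_weights.values())
    shares = {name: leftover * w / weight_sum if weight_sum else 0.0
              for name, w in other_weights.items()}
    return {"Multi-Class detection Net": first, **shares}

plan = allocate_compute(100.0, 60.0, {"Multi-Label detection Net": 3, "Data evaluation Net": 1})
print(plan)
# {'Multi-Class detection Net': 60.0, 'Multi-Label detection Net': 30.0, 'Data evaluation Net': 10.0}
```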

3. The real-time traffic detection method based on network situation awareness according to claim 1, wherein: for the Multi-Label detection Net, a Sigmoid layer is used as the last layer of the network to normalize the output probability values to [0, 1], and the threshold for a correct label is set to 0.5; when the network is trained, the error on each label is calculated with a binary cross-entropy loss function, and the error function of the neural network equals the sum of the loss functions of all labels.
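The sigmoid output, per-label binary cross-entropy summed over labels, and 0.5 threshold described in this claim can be sketched in NumPy as follows. Treating a probability exactly equal to 0.5 as positive is an assumption, as is the clipping constant `eps`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_loss_and_labels(logits, targets, threshold=0.5, eps=1e-12):
    """Per-label binary cross-entropy summed over labels, plus thresholded predictions.

    logits:  (n_labels,) raw outputs of the last layer, one per flow/label
    targets: (n_labels,) 0/1 ground truth (1 = abnormal)
    """
    p = sigmoid(np.asarray(logits, dtype=float))   # probabilities in [0, 1]
    y = np.asarray(targets, dtype=float)
    p = np.clip(p, eps, 1.0 - eps)                 # avoid log(0)
    per_label = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return per_label.sum(), (p >= threshold).astype(int)

loss, labels = multilabel_loss_and_labels([2.0, -1.0, 0.0], [1, 0, 1])
print(round(float(loss), 4), labels)   # 1.1333 [1 0 1]
```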

4. The real-time traffic detection method based on network situation awareness according to claim 1, wherein: the Data augmentation Net uses a generalized learning model to augment network traffic; the generalized learning model continuously collects normal traffic and abnormal traffic during operation and uses a GAN to augment the abnormal data, forming augmented training data that mixes the abnormal data with the normal traffic; during intrusion detection, the situation awareness module first stores a redundant copy of the network weight parameters, and after augmented training data of a certain scale have been generated, the redundantly stored network is trained to update the parameters.
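The redundant-storage update in this claim resembles a shadow-copy pattern: train a copy of the weights while the serving copy keeps detecting, then swap. Below is a minimal Python sketch, assuming a caller-supplied training function `train_fn` and a sample-count threshold `batch_threshold` (both hypothetical). Training runs synchronously here for simplicity, whereas a deployed system would more likely train the copy in the background.

```python
import copy

class OnlineUpdater:
    """Redundant-copy online updating (class and hook names are hypothetical).

    The serving weights keep answering detection requests while a redundant copy
    is trained once enough augmented samples have been buffered; the trained copy
    then replaces the serving weights in a single swap.
    """

    def __init__(self, weights, train_fn, batch_threshold):
        self.serving = weights                  # weights used for live detection
        self.shadow = copy.deepcopy(weights)    # redundantly stored copy
        self.train_fn = train_fn                # train_fn(weights, samples) -> new weights
        self.batch_threshold = batch_threshold
        self.buffer = []

    def add_samples(self, samples):
        """Buffer mixed normal + augmented abnormal samples; retrain and swap when enough arrive."""
        self.buffer.extend(samples)
        if len(self.buffer) < self.batch_threshold:
            return False
        self.shadow = self.train_fn(self.shadow, self.buffer)
        self.serving = copy.deepcopy(self.shadow)   # swap in the updated parameters
        self.buffer = []
        return True

# Toy training function: the "weight" just counts samples seen.
updater = OnlineUpdater({"w": 0.0}, lambda w, s: {"w": w["w"] + len(s)}, batch_threshold=3)
print(updater.add_samples([1, 2]), updater.serving)   # False {'w': 0.0}
print(updater.add_samples([3]), updater.serving)      # True {'w': 3.0}
```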

5. The real-time traffic detection method based on network situation awareness, characterized in that, in step S3, performing binary classification detection with the Multi-Label detection Net under the constraint of the allocated computing power comprises the following two steps:

6. The real-time traffic detection method based on network situation awareness according to claim 5, wherein: in step S301, spatio-temporal features are captured by spatio-temporal connection learning, which comprises a spatio-temporal block and a transition block; the spatio-temporal block contains two core feature extraction blocks, a convolution block Conv and a long short-term memory layer LSTM, implemented respectively by a grouped CNN and an LSTM layer, wherein the grouped convolutional layer uses a 3 × 3 filter, the number of channels of the output feature map is twice the number of input channels, and the number of groups equals the number of data channels input to the model; the transition block is used to reduce dimensionality, and an LSTM layer that does not change dimensionality is added to the transition block.
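The grouped convolution in this claim (3 × 3 filters, twice as many output channels as input channels, one group per input data channel) can be sketched as a naive NumPy loop. Stride 1 with no padding is an assumption, since the claim does not state padding or stride. The second half of the usage shows the property that lets each channel act as an independent flow: changing one input channel affects only its own two output channels.

```python
import numpy as np

def grouped_conv2d(x, w, groups):
    """Naive grouped 2-D convolution (cross-correlation, stride 1, no padding).

    x: (C_in, H, W) input, w: (C_out, C_in // groups, k, k) filters.
    Output channels are split into `groups` equal blocks; block g sees only
    the input channels of group g.
    """
    c_in, h, wd = x.shape
    c_out, c_per_group, k, _ = w.shape
    assert c_in % groups == 0 and c_out % groups == 0 and c_per_group == c_in // groups
    out_per_group = c_out // groups
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for o in range(c_out):
        g = o // out_per_group                                # group of this output channel
        xs = x[g * c_per_group:(g + 1) * c_per_group]         # that group's input channels only
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[o, i, j] = np.sum(xs[:, i:i + k, j:j + k] * w[o])
    return out

rng = np.random.default_rng(0)
C = 4                                     # input data channels = number of groups
x = rng.normal(size=(C, 6, 6))
w = rng.normal(size=(2 * C, 1, 3, 3))     # 2x output channels, 3x3 filters
y = grouped_conv2d(x, w, groups=C)
print(y.shape)                            # (8, 4, 4)

# Perturbing channel 0 changes only output channels 0-1; the other flows are untouched.
x2 = x.copy()
x2[0] += 1.0
y2 = grouped_conv2d(x2, w, groups=C)
print(np.allclose(y[2:], y2[2:]), np.allclose(y[:2], y2[:2]))   # True False
```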

7. The method according to claim 6, wherein a batch normalization layer, a max-pooling layer and a Dropout layer are added to the spatio-temporal block.

8. The real-time traffic detection method based on network situation awareness according to claim 5, wherein: in step S302, the Multi-Label detection Net completes feature extraction with 1 self-attention layer and 3 fully connected layers; the self-attention layer calculates the attention weights according to the actual number of input channels and matrix-multiplies them with the feature map output by the first stage of situation awareness detection; the FC1 layer uses TanH as its activation function and is followed by a Dropout layer with a drop rate of 0.5; the FC1 and FC2 layers each halve the dimension of their input data; and the FC3 layer uses Sigmoid as its activation function and outputs the classification result, labeling the class of the input data by means of a threshold.

9. The real-time traffic detection method based on network situation awareness according to claim 1, wherein: the Multi-Class detection Net completes feature extraction with 3 convolutional layers and 2 self-attention layers, and classification with 3 fully connected layers and 2 Dropout layers; the Conv1 layer receives 16-channel image data, outputs a 32-channel feature image, and uses a 3 × 3 convolution kernel with the TanH activation function; the Atte1 layer calculates the mask attention weight according to the actual number of input channels and matrix-multiplies it with the feature map output by the Conv1 layer; the Conv2 and Conv3 layers receive 32-channel and 64-channel image data, respectively, output 64-channel and 128-channel feature images, respectively, and use 2 × 2 convolution kernels with the TanH activation function; the Atte2 layer is similar to the Atte1 layer; the FC1 and FC2 layers contain 512 and 64 neurons, respectively, and use the TanH activation function and Dropout layers with a drop rate of 0.5; and the FC3 layer contains 2 neurons and is used to label the class of the input data.

10. The real-time traffic detection method based on network situation awareness according to claim 1, wherein in step S5, the Data augmentation Net collecting the correctly distinguished normal traffic and abnormal traffic and using a GAN to perform data augmentation on the abnormal traffic comprises: the Data evaluation Net first convolves the data with a 3 × 3 filter, then convolves each feature map with three 1 × 1 filters, uses the results as the query and the key, respectively, to calculate the self-attention score, obtains the attention weight through a softmax function, and then matrix-multiplies the attention weight with the value to obtain a new feature map.