CN116647391A - Network intrusion detection method and system based on parallel self-encoder and weight discarding - Google Patents


Info

Publication number: CN116647391A
Application number: CN202310641389.7A
Authority: CN (China)
Prior art keywords: network, encoder, intrusion detection, self, weight
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion)
Inventors: 欧毓毅, 周仲潇
Original and current assignee: Guangdong University of Technology
Other languages: Chinese (zh)
Application filed by Guangdong University of Technology; priority to CN202310641389.7A; publication of CN116647391A


Classifications

    • H04L63/1416 Event detection, e.g. attack signature detection (network security; detecting malicious traffic by monitoring network traffic)
    • G06N3/0455 Auto-encoder networks; encoder-decoder networks
    • G06N3/048 Activation functions
    • G06N3/0495 Quantised networks; sparse networks; compressed networks
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L41/147 Network analysis or design for predicting network behaviour
    • Y02D30/50 Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate


Abstract

The application discloses a network intrusion detection method and system based on a parallel self-encoder and weight discarding. The method comprises the following steps: acquiring a network traffic data set and preprocessing it to obtain a corresponding training set and test set; constructing a PAEN-WDGRU intrusion detection model based on a sparse self-encoder and a depth self-encoder with the weight-drop technique introduced; iteratively training the PAEN-WDGRU intrusion detection model on the training set until a preset iteration threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model; and inputting the test set into the trained PAEN-WDGRU intrusion detection model for prediction and comparison. The method and system improve the robustness of the model and reduce the risk of overfitting without destroying the dependency relationships among the traffic features. The network intrusion detection method and system based on a parallel self-encoder and weight discarding can be widely applied in the technical field of intrusion detection.

Description

Network intrusion detection method and system based on parallel self-encoder and weight discarding
Technical Field
The application relates to the technical field of intrusion detection, in particular to a network intrusion detection method and system based on parallel self-encoder and weight discarding.
Background
With the emergence of new Internet technologies such as file sharing, mobile payment and instant messaging, the network security environment has become increasingly complex and variable. At the same time, network attackers have become more covert and the cost of mounting an attack has fallen further; all of these factors seriously threaten the network security environment.
in order to resist malicious traffic attacks in a network, an intrusion detection technology is rapidly developed, intrusion detection is used as a network barrier to well detect malicious traffic, but as network attack means are increasingly abundant and network traffic is rapidly increased, a detector of an intrusion detection system needs to be continuously updated, higher requirements are also provided for the detection rate and false alarm rate of the intrusion detection system, as machine learning is continuously developed and widely applied, the intrusion detection system based on a machine learning model is greatly improved in performance, the stronger characteristic automatic extraction capability of the machine learning model can well cope with a large number of network traffic and various network attacks, more importantly, deep learning is used as a more advanced machine learning method, the characteristics in high-dimensional network traffic data can be automatically extracted, the characteristics of malicious traffic are recorded and identified, and as the characteristics of the network traffic are based on time sequence, RNN can have stronger malicious traffic detection capability, has gradient and vanishing gradient, and is easier to store a better performance of a network (such as a network explosion model) in a short-cycle mode although the RNN has the disadvantage of being better developed, and the network is easier to be more suitable for a network explosion model (more than a network is easily applied). 
To mitigate the risk of overfitting, the dropout technique is widely applied in many existing recurrent neural network models. However, dropout discards neurons directly with some probability; because a discarded neuron's hidden state and memory state are also lost at the next time step, the information flow along the time sequence is damaged. In addition, the self-encoder, as an unsupervised neural network model, can perform non-linear feature dimension reduction on network traffic and extract more effective new features. However, a single self-encoder learns only one feature representation when reducing the dimensionality of the traffic features, and may fail to capture richer and more complex traffic characteristics.
Disclosure of Invention
To solve the above technical problems, the application aims to provide a network intrusion detection method and system based on parallel self-encoders and weight discarding, which improve the robustness of the model and reduce the risk of overfitting without destroying the dependency relationships among the traffic features.
The first technical scheme adopted by the application is as follows: the network intrusion detection method based on parallel self-encoder and weight discarding comprises the following steps:
acquiring a network flow data set and preprocessing data to obtain a corresponding training set and a corresponding testing set;
based on a sparse self-encoder and a depth self-encoder, a weight-drop technology is introduced, and a PAEN-WDGRU intrusion detection model is constructed;
performing iterative training on the PAEN-WDGRU intrusion detection model through the training set until a preset iterative threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model;
inputting the test set into the trained PAEN-WDGRU intrusion detection model for prediction to obtain a predicted value;
and comparing the predicted value with the network flow data set to obtain a comparison result, and judging whether the network flow data set comprises intrusion network flow data according to the comparison result.
Further, the step of acquiring the network traffic data set and performing data preprocessing to obtain the corresponding training set and test set specifically comprises:
acquiring a network traffic data set;
converting character feature data in the network traffic data set into numerical values to obtain network traffic feature values;
performing one-hot encoding on the network traffic feature values to obtain encoded network traffic feature values;
normalizing the encoded network traffic feature values to obtain normalized network traffic feature values;
and dividing and labeling the normalized network traffic feature values according to a preset proportion to obtain the training set and the test set.
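The encoding and normalization steps above can be sketched as follows; the toy feature columns and their values are illustrative assumptions, not data from the patent:

```python
import numpy as np

def one_hot(values, categories):
    """Map each categorical value to a one-hot row vector."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def min_max_normalize(x):
    """Scale each feature column into [0, 1]."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

# Toy traffic records: Protocol plus two numeric features (duration, bytes)
protocols = ["tcp", "udp", "tcp", "udp"]
numeric = np.array([[1.0, 200.0], [3.0, 50.0], [2.0, 125.0], [4.0, 300.0]])

# 'tcp' -> [1, 0], 'udp' -> [0, 1], then normalized numeric columns
features = np.hstack([one_hot(protocols, ["tcp", "udp"]),
                      min_max_normalize(numeric)])
```

Each row is then a fixed-length numeric vector suitable as input for the parallel self-encoder network.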
Further, the step of constructing a PAEN-WDGRU intrusion detection model specifically comprises the following steps:
combining a parallel self-encoder network model, a splicing layer, a weight discarding gating cycle unit neural network model and a full connection layer to construct a PAEN-WDGRU intrusion detection model;
the parallel self-encoder network model comprises a sparse self-encoder network and a depth self-encoder network, wherein the sparse self-encoder network and the depth self-encoder network are in parallel connection, the sparse self-encoder network comprises an encoding layer, two hidden layers and a decoding layer, a sparsity penalty term is introduced to the output of the encoding layer, and the depth self-encoder network comprises a plurality of encoding layers, a potential feature representation layer and a plurality of decoding layers;
the weight discarding gating cycle unit neural network model comprises a reset gate of a gating cycle unit, an update gate of the gating cycle unit, a candidate hidden layer and an output layer, wherein the weight-drop technique is introduced into the candidate hidden layer.
Further, the step of iteratively training the PAEN-WDGRU intrusion detection model through the training set until reaching a preset iteration threshold and outputting the trained PAEN-WDGRU intrusion detection model specifically comprises the following steps:
inputting the training set into a PAEN-WDGRU intrusion detection model for training;
the sparse self-encoder network and the deep self-encoder network based on the PAEN-WDGRU intrusion detection model respectively perform feature dimension reduction processing on the training set to obtain a training data set after corresponding dimension reduction;
splicing the training data sets after dimension reduction based on a splicing layer of the PAEN-WDGRU intrusion detection model to obtain spliced training data sets;
performing feature extraction processing on the spliced training data set based on a weight discarding gating circulating unit neural network model of the PAEN-WDGRU intrusion detection model to obtain a feature training data set;
classifying the feature training data set based on a full connection layer of the PAEN-WDGRU intrusion detection model to obtain a classification result, wherein the classification result comprises abnormal intrusion flow data and normal flow data;
and (3) circulating the step of training the PAEN-WDGRU intrusion detection model until the preset iterative training times are reached, and obtaining the trained PAEN-WDGRU intrusion detection model.
Further, the training set is trained by the depth self-encoder network in the PAEN-WDGRU intrusion detection model, which specifically comprises the following steps:
using a relu function as an activation function for the coding layer of the depth self-encoder network;
for the decoding layers of the depth self-encoder network, other decoding layers use a relu function as an activation function except that the last decoding layer uses a sigmoid function;
the potential feature representation layer of the depth self-encoder network is preceded by hidden layers with a decreasing number of neurons; the number of hidden layer neurons is set to 32, and an adam optimizer is selected to minimize the loss function of the depth self-encoder network.
Further, the expression of the loss function of the depth self-encoder network is:

$$A_{AE}(a, x_i, x'_i) = \frac{1}{a}\sum_{i=1}^{a}\left(x_i - x'_i\right)^2$$

In the above, $x_i$ represents the real data, $x'_i$ represents the reconstructed data, and $a$ represents the number of input layer neurons.
Further, the step of training the training set by the sparse self-encoder network in the PAEN-WDGRU intrusion detection model specifically comprises the following steps:
for the coding layer of the sparse self-encoder network, the activation function uses a relu function, the decoding layer uses a sigmoid function, and the number of neurons of the hidden layer is set to be 32;
adding a sparsity penalty term to the output of the encoding layer of the sparse self-encoder network, introducing a degree of sparsity of the neurons, a desired level of activation of the neurons, and an average level of activation of the neurons, and selecting an adam optimizer to minimize a loss function of the sparse self-encoder network.
Further, the expression of the loss function of the sparse self-encoder network is:

$$A_{SAE}(a, x_i, x'_i) = A_{AE}(a, x_i, x'_i) + \varepsilon \sum_{j=1}^{b}\left[\zeta \log\frac{\zeta}{\zeta_j} + (1-\zeta)\log\frac{1-\zeta}{1-\zeta_j}\right]$$

In the above, $A_{SAE}(a, x_i, x'_i)$ represents the loss function of the sparse self-encoder network, $A_{AE}(a, x_i, x'_i)$ represents the loss function of the depth self-encoder network, $\varepsilon$ represents the sparseness of the neurons, $\zeta$ represents the desired activation level of the neurons, $\zeta_j$ represents the average activation level of the neurons, $x_i$ represents the real data, $x'_i$ represents the reconstructed data, and $a$ and $b$ represent the numbers of input-layer and hidden-layer neurons, respectively.
Further, the step of performing feature extraction on the spliced training data set by the weight discarding gating cycle unit neural network model of the PAEN-WDGRU intrusion detection model to obtain the feature training data set specifically comprises:
inputting the spliced training data set into the weight discarding gating cycle unit neural network model;
performing reset-and-forget processing on the state information of the current time step based on the reset gate of the weight discarding gating cycle unit neural network model, to obtain a reset data set;
controlling, based on the update gate of the weight discarding gating cycle unit neural network model, how the reset data set is combined with the current input information at the current time step, so as to update the state information of the current time step and obtain updated data information;
calculating the weight matrix of the candidate hidden layer of the weight discarding gating cycle unit neural network model using the weight-drop technique;
and combining the updated data information with the candidate hidden layer state at the current moment, and calculating the output at the current moment to obtain the feature training data set.
The second technical scheme adopted by the application is as follows: a network intrusion detection system based on parallel self-encoder and weight dropping, comprising:
the preprocessing module is used for acquiring a network flow data set and preprocessing data to obtain a corresponding training set and a corresponding testing set;
the construction module is used for constructing a PAEN-WDGRU intrusion detection model based on the sparse self-encoder and the depth self-encoder and by introducing weight-drop technology;
the training module is used for carrying out iterative training on the PAEN-WDGRU intrusion detection model through the training set until a preset iteration threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model;
the test module is used for inputting the test set into the trained PAEN-WDGRU intrusion detection model to predict, so as to obtain a predicted value;
and the comparison module is used for comparing the predicted value with the network flow data set, obtaining a comparison result and judging whether the network flow data set comprises intrusion network flow data according to the comparison result.
Beneficial effects of the method and system: the application acquires a network traffic data set and performs data preprocessing, then introduces the weight-drop technique on the basis of a sparse self-encoder and a depth self-encoder to construct a PAEN-WDGRU intrusion detection model. By extracting features through the parallel self-encoder network model, more salient and expressive features are obtained and the robustness of the model is improved. The weight discarding gating cycle unit neural network model applies the weight-drop technique to the hidden layer of the GRU, randomly discarding rows of the weight matrix. Compared with dropout, which discards neurons directly and thereby destroys the information flow in the time sequence (because a discarded neuron's hidden state and memory state are lost at the next time step), the weight discarding gating cycle unit neural network model better reduces the risk of model overfitting and improves the generalization capability of the model.
Drawings
FIG. 1 is a flow chart of steps of a network intrusion detection method based on parallel self-encoder and weight discarding of the present application;
FIG. 2 is a block diagram of a network intrusion detection system based on parallel self-encoders and weight dropping in accordance with the present application;
FIG. 3 is a flow chart of the overall framework of the PAEN-WDGRU intrusion detection model of the present application;
FIG. 4 is a schematic diagram of a prior art self-encoder network architecture;
FIG. 5 is a schematic diagram of a sparse self-encoder network architecture of the present application;
FIG. 6 is a schematic diagram of the deep self-encoder network architecture of the present application;
FIG. 7 is a schematic diagram of the structure of the gated loop cell of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1, the present application provides a network intrusion detection method based on parallel self-encoder and weight discarding, the method comprising the steps of:
s1, preprocessing a data set by utilizing coding and data standardization;
specifically, network flow data is used as a data set, character characteristic data of the data set is converted into a numerical value, and then one-hot coding is carried out on the characteristic numerical value; for example, when the character values 'tcp' and 'udp' are included in the Protocol field, the 'tcp' is mapped to 0, the 'udp' is mapped to 1, and then one-hot encoding is performed, the 'tcp' is changed to [1,0], and the 'udp' is changed to [0,1], so that the discrete feature values are mapped to the European space.
Further, each feature value is normalized separately according to the formula:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

In the above, $x_{min}$ is the minimum value of the feature and $x_{max}$ is the maximum value of the feature;
the data set is then divided into a training set and a test set in the proportion 8:2. The Label fields of the training and test sets are taken out and the traffic data are judged piece by piece: if the Label field value is 'Benign', the sample is marked 0, otherwise 1; that is, normal samples are labelled 0 and abnormal samples 1. The training data is stored as train_X and the training labels as train_Y; the test data is stored as test_X and the test labels as test_Y.
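The labelling and 8:2 splitting just described can be sketched as follows; only the 'Benign' label value comes from the description, while the other label values and the single numeric feature are hypothetical:

```python
# Hypothetical traffic records; 'DoS' and 'Scan' are illustrative attack labels
records = [
    {"Label": "Benign", "dur": 0.1},
    {"Label": "DoS",    "dur": 0.9},
    {"Label": "Benign", "dur": 0.2},
    {"Label": "Benign", "dur": 0.3},
    {"Label": "Scan",   "dur": 0.8},
]

# Normal samples -> 0, abnormal samples -> 1
labels = [0 if r["Label"] == "Benign" else 1 for r in records]
feats = [[r["dur"]] for r in records]

# 8:2 split into training and test portions
split = int(0.8 * len(records))
train_X, train_Y = feats[:split], labels[:split]
test_X, test_Y = feats[split:], labels[split:]
```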
S2, constructing a parallel self-encoder network (PAEN) model;
specifically, referring to fig. 3, the parallel self-encoder network model consists of two models, a sparse self-encoder network (SAE) and a deep self-encoder network, respectively.
S21, constructing a depth self-encoder network model;
specifically, a DAE (depth self-encoder network model) is formed by adding a plurality of encoding layers and a plurality of decoding layers on the basis of a self-encoder (shown in fig. 4), as shown in fig. 6, wherein the number of neurons of each hidden layer is smaller than that of neurons of the previous layer, and finally a potential feature representation is obtained, and the weights are updated by using a mean square error loss function in the DAE, wherein the weights are as follows:
in the above, x i Representing real data, x' i Representing the reconstructed data, and a represents the number of input layer neurons.
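A minimal numeric check of the mean-square-error reconstruction loss, with a toy three-neuron input (the sample values are illustrative):

```python
import numpy as np

def dae_loss(x, x_rec):
    """Mean squared reconstruction error over the a input neurons."""
    a = x.shape[-1]
    return np.sum((x - x_rec) ** 2) / a

x = np.array([0.0, 0.5, 1.0])       # "real" input sample
x_rec = np.array([0.1, 0.5, 0.8])   # reconstruction from the decoder
loss = dae_loss(x, x_rec)           # (0.01 + 0.00 + 0.04) / 3
```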
S22, building a sparse self-encoder network model;
specifically, referring to fig. 5, sae (sparse self encoder network model) consists of one encoding layer, two concealment layers and one decoding layer. The input of the coding layer is the characteristic dimension of the data, the first hidden layer is the output of the coding layer, the second hidden layer is the dropout layer, the decoding layer is mainly used for reconstructing the data, in SAE, a sparsity penalty term is added to the output of the coder layer, and in the training process of SAE, the sparsity epsilon of the neuron, the expected activation level zeta of the neuron and the average activation level zeta of the neuron are introduced j The final loss function to be minimized is as follows:
in the above, x i Is the real data, x' i Is the reconstructed data; a and b refer to the number of input layer neurons and hidden layer neurons, respectively, which means that only a few elements in the output vector of the encoder layer are non-zero, which can force the network to learn a more robust feature representation.
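Assuming the sparsity penalty takes the usual KL-divergence form between the desired activation level and each neuron's average activation, the loss can be sketched as below; the penalty weight and target values are illustrative:

```python
import numpy as np

def kl_sparsity(zeta, zeta_hat):
    """KL divergence between the desired and the measured mean activation."""
    return (zeta * np.log(zeta / zeta_hat)
            + (1 - zeta) * np.log((1 - zeta) / (1 - zeta_hat)))

def sae_loss(x, x_rec, mean_act, eps=0.1, zeta=0.05):
    """Reconstruction MSE plus the sparsity penalty weighted by eps."""
    a = x.shape[-1]
    mse = np.sum((x - x_rec) ** 2) / a
    return mse + eps * np.sum(kl_sparsity(zeta, mean_act))

x = np.array([0.2, 0.4])
act_on_target = np.full(32, 0.05)   # mean activations exactly at the target
act_too_dense = np.full(32, 0.50)   # neurons firing far too often
loss_sparse = sae_loss(x, x, act_on_target)  # perfect reconstruction, on target
loss_dense = sae_loss(x, x, act_too_dense)   # penalty kicks in
```

The penalty is zero only when every hidden neuron's average activation equals the target, which is what drives most encoder outputs toward zero.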
S3, building a weight discarding gating cycle unit (WDGRU) neural network model;
specifically, referring to fig. 7, a reset gate r of the gated loop cell is designed t For controlling how much previous state information should be forgotten or reset at the current time step to better accommodate the current input information, the reset gate calculation formula is as follows:
r t =σ(W rh h t-1 +W rx x t +b r )
update gate for designing a gated loop unitz t For controlling how much previous state information should be kept and combined with the current input information at the current time step to update the state information of the current time step. The update gate calculation formula is as follows:
z t =σ(W zh h t-1 +W zx x t +b z )
the candidate hidden layer state at the current moment is calculated, and the calculation formula is as follows:
calculating the output h at the current moment t The expression is:
wherein σ is a sigmoid function, +. rh ,W rx And b r Is reset gate r t Weight and bias, W zh ,W zx And b z Is to update z t Weight and bias of W hh ,W hx And b h The weights and biases of the candidate hidden layers;
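As a sanity check on the gate equations above, one GRU time step can be sketched in numpy; the weight shapes and random initialization are illustrative assumptions, not trained parameters:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W, b):
    """One GRU time step following the reset/update/candidate equations."""
    r = sigmoid(W["rh"] @ h_prev + W["rx"] @ x_t + b["r"])   # reset gate
    z = sigmoid(W["zh"] @ h_prev + W["zx"] @ x_t + b["z"])   # update gate
    h_cand = np.tanh(W["hh"] @ (r * h_prev) + W["hx"] @ x_t + b["h"])
    return (1 - z) * h_prev + z * h_cand                     # new hidden state

rng = np.random.default_rng(0)
n_h, n_x = 4, 3  # hidden size and input size, chosen for illustration
W = {k: rng.standard_normal((n_h, n_h if k.endswith("h") else n_x))
     for k in ("rh", "rx", "zh", "zx", "hh", "hx")}
b = {k: np.zeros(n_h) for k in ("r", "z", "h")}
h = gru_step(rng.standard_normal(n_x), np.zeros(n_h), W, b)
```

Starting from a zero hidden state, the new state is a gated blend of the previous state and the tanh candidate, so every component stays inside (-1, 1).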
On the basis of the GRU, the weight-drop technique is applied to the weight matrix of the hidden layer, yielding the WDGRU model. Weight-drop is a regularization technique in which each row of the weight matrix between hidden layers is discarded with probability 1-p; in other words, it introduces dynamic sparsity into the weight matrix $W$. During the training phase, a binary mask matrix $M_W$ with the same shape as $W$ is generated, each element of which takes the value 0 or 1 according to a given probability distribution; the Hadamard product of $W$ and $M_W$ gives the weight matrix with some rows randomly discarded:

$$\widetilde{W} = W \odot M_W$$

In the above, $\widetilde{W}$ represents the weight matrix after randomly discarding some rows, $W$ represents the weight matrix, and $M_W$ represents the binary mask matrix.
S4, training a PAEN model;
specifically, firstly defining the finally required feature dimension, defining the node number of an encoding layer, a potential representation layer and a decoding layer, initializing the weight of a network, wherein the weight and bias of the node number satisfy Gaussian distribution, using a relu function as an activation function for the encoding layer of the DAE, using a relu function as an activation function for the decoding layer except for the last layer, setting the neuron number of a hidden layer to be 32; an optimizer is used to select an adam optimizer to minimize the loss function. The parameter updating of the optimizer is not affected by the gradient expansion transformation, the parameter does not need to be manually adjusted, the learning rate can be automatically adjusted, the phenomenon that the parameter falls into a local minimum value is effectively avoided, and the method is suitable for being applied to large-scale data and parameter scenes. Setting the training round number of the model to be 10, namely epochs=10, for the coding layer of SAE, the activating function uses a relu function, and the decoding layer uses a sigmoid function; the number of neurons of the hidden layer is set to 32; selecting an adam optimizer to minimize a loss function by using the optimizer, setting the training round number of the model to be 10, namely epochs=10, taking out the coding layers of the two networks after training the two networks, inputting data to finish feature dimension reduction of the data, and combining the outputs of the coding layers and the data to be used as the output of PAEN;
the data in DAE is first input into the encoder, then subjected to a series of hidden layers to perform feature extraction and dimension reduction operations, and finally output to the encoding layer. And the output of the coding layer is reconstructed layer by layer through a series of hidden layers of the decoder, and finally an output result is obtained. In the whole process, the network aims to minimize the error between the input data after being encoded and decoded and the original data, and network parameters are optimized through a back propagation algorithm, so that good characteristic extraction and reconstruction effects are achieved; in SAE, the coding layer maps the input data to a smaller coding vector by activating the sigmoid function, and the decoding layer maps the coding vector back to the original input space by the sigmoid function, so that the self-encoder reduces the loss function added with the sparse penalty term as much as possible during training, thereby ensuring that the coding layer can effectively retain the key information of the input data.
S5, training a WDGRU model;
specifically, firstly, defining the input, output and hidden layer structure of a model, taking the output of PAEN as the input of WDGRU, initializing the weights and the biases of the model, setting the number of neurons of the output layer to 64, further extracting the characteristics by the WDGRU, and inputting the characteristics into a fully connected layer for classification by using a softmax function, wherein the softmax function is as follows:
In the WDGRU model, optimization is performed using the cross entropy loss function:

$$L = -\left(u \log \hat{u} + (1-u)\log(1-\hat{u})\right)$$

In the above formula, $u$ is the true class label and $\hat{u}$ is the predicted probability output by the model. The number of training rounds of the model is set to 10; after the 10 rounds of training are completed, the trained model is obtained;
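The softmax and cross entropy expressions can be checked numerically; the two-class scores below are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(u, u_hat, eps=1e-12):
    """Binary cross entropy between true label u and predicted probability."""
    return -(u * np.log(u_hat + eps) + (1 - u) * np.log(1 - u_hat + eps))

p = softmax(np.array([2.0, 0.0]))   # two-class scores from the dense layer
loss = cross_entropy(1, p[0])       # loss when the first class is the true one
```

The probabilities sum to 1 by construction, and the loss shrinks toward 0 as the predicted probability of the true class approaches 1.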
in the training process, weight matrix of GRU is regularized by using Weight-Drop technology. The method of processing the data is the same as that of a standard GRU, namely, long-term dependence in sequence data is captured through circular connection. The difference is that the WDGRU randomly discards the weight matrix during training to prevent overfitting. Thus, at each time step, the input data is processed through the round robin connection of the GRU and some of the weight matrices will be randomly discarded for regularization.
S6, after the training stage is completed, the test set is input into the trained model to obtain predicted values; the predicted values are compared with the true labels of the traffic to obtain the corresponding comparison result.
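The comparison in step S6 amounts to matching the predicted labels against the true traffic labels; a minimal sketch with made-up labels (1 denoting intrusion traffic, 0 denoting normal traffic):

```python
import numpy as np

y_true = np.array([0, 1, 0, 0, 1, 1])   # true labels of the test traffic
y_pred = np.array([0, 1, 0, 1, 1, 1])   # hypothetical model predictions

accuracy = float(np.mean(y_pred == y_true))
intrusions_detected = bool(np.any(y_pred == 1))
print(accuracy, intrusions_detected)
```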
Referring to fig. 2, a network intrusion detection system based on parallel self-encoder and weight dropping, comprising:
the preprocessing module is used for acquiring a network flow data set and preprocessing data to obtain a corresponding training set and a corresponding testing set;
the construction module is used for constructing a PAEN-WDGRU intrusion detection model based on the sparse self-encoder and the depth self-encoder and by introducing weight-drop technology;
the training module is used for carrying out iterative training on the PAEN-WDGRU intrusion detection model through the training set until a preset iteration threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model;
the test module is used for inputting the test set into the trained PAEN-WDGRU intrusion detection model to predict, so as to obtain a predicted value;
and the comparison module is used for comparing the predicted value with the network flow data set, obtaining a comparison result and judging whether the network flow data set comprises intrusion network flow data according to the comparison result.
The content of the method embodiment applies equally to the system embodiment: the functions realized by the system embodiment are the same as those of the method embodiment, and so are the beneficial effects achieved.
While the preferred embodiment of the present application has been described in detail, the application is not limited to this embodiment. Those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the application, and such equivalent modifications and substitutions are intended to fall within the scope of the application as defined by the appended claims.

Claims (10)

1. The network intrusion detection method based on parallel self-encoder and weight discarding is characterized by comprising the following steps:
acquiring a network flow data set and preprocessing data to obtain a corresponding training set and a corresponding testing set;
based on a sparse self-encoder and a depth self-encoder, a weight-drop technology is introduced, and a PAEN-WDGRU intrusion detection model is constructed;
performing iterative training on the PAEN-WDGRU intrusion detection model through the training set until a preset iterative threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model;
inputting the test set into the trained PAEN-WDGRU intrusion detection model for prediction to obtain a predicted value;
and comparing the predicted value with the network flow data set to obtain a comparison result, and judging whether the network flow data set comprises intrusion network flow data according to the comparison result.
2. The network intrusion detection method based on parallel self-encoder and weight discarding according to claim 1, wherein the step of obtaining a network traffic data set and performing data preprocessing to obtain a corresponding training set and test set specifically comprises:
acquiring a network traffic data set;
converting character characteristic data in the network flow data set into numerical values to obtain network flow characteristic numerical values;
performing one-hot coding on the network traffic characteristic value to obtain a coded network traffic characteristic value;
normalizing the encoded network flow characteristic value to obtain a normalized network flow characteristic value;
and dividing and labeling the normalized network flow characteristic values according to a preset proportion to obtain a training set and a testing set.
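By way of illustration only (not part of the claimed method), the preprocessing steps of claim 2 — converting character features to numeric values, one-hot encoding, min-max normalization and splitting by a preset ratio — can be sketched as follows; the feature names and values are hypothetical:

```python
import numpy as np

# Hypothetical traffic records: a character feature plus two numeric features.
protocols = np.array(["tcp", "udp", "tcp", "icmp"])
numeric = np.array([[1.0, 200.0], [4.0, 50.0], [2.0, 300.0], [0.5, 10.0]])

# Character features -> numeric indices, then one-hot encoding.
cats = sorted(set(protocols))                     # ['icmp', 'tcp', 'udp']
idx = np.array([cats.index(p) for p in protocols])
one_hot = np.eye(len(cats))[idx]

# Min-max normalization of the numeric features to [0, 1].
mn, mx = numeric.min(axis=0), numeric.max(axis=0)
normalized = (numeric - mn) / (mx - mn)

features = np.hstack([one_hot, normalized])

# Split by a preset ratio (here 75 % train / 25 % test).
split = int(0.75 * len(features))
train, test = features[:split], features[split:]
print(features.shape, train.shape, test.shape)
```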
3. The network intrusion detection method based on parallel self-encoder and weight dropping according to claim 2, wherein the step of constructing a PAEN-WDGRU intrusion detection model specifically comprises:
combining a parallel self-encoder network model, a splicing layer, a weight discarding gating cycle unit neural network model and a full connection layer to construct a PAEN-WDGRU intrusion detection model;
the parallel self-encoder network model comprises a sparse self-encoder network and a depth self-encoder network, wherein the sparse self-encoder network and the depth self-encoder network are in parallel connection, the sparse self-encoder network comprises an encoding layer, two hidden layers and a decoding layer, a sparsity penalty term is introduced to the output of the encoding layer, and the depth self-encoder network comprises a plurality of encoding layers, a potential feature representation layer and a plurality of decoding layers;
the weight discarding gating cycle unit neural network model comprises a reset gate of a gating cycle unit, an update gate of the gating cycle unit, a candidate hidden layer and an output layer, wherein the candidate hidden layer is introduced into the weight-drop technology.
4. The network intrusion detection method based on parallel self-encoder and weight discarding according to claim 3, wherein the step of iteratively training the PAEN-WDGRU intrusion detection model through a training set until a preset iteration threshold is reached and outputting the trained PAEN-WDGRU intrusion detection model specifically comprises the steps of:
inputting the training set into a PAEN-WDGRU intrusion detection model for training;
the sparse self-encoder network and the deep self-encoder network based on the PAEN-WDGRU intrusion detection model respectively perform feature dimension reduction processing on the training set to obtain a training data set after corresponding dimension reduction;
splicing the training data sets after dimension reduction based on a splicing layer of the PAEN-WDGRU intrusion detection model to obtain spliced training data sets;
performing feature extraction processing on the spliced training data set based on a weight discarding gating circulating unit neural network model of the PAEN-WDGRU intrusion detection model to obtain a feature training data set;
classifying the feature training data set based on a full connection layer of the PAEN-WDGRU intrusion detection model to obtain a classification result, wherein the classification result comprises abnormal intrusion flow data and normal flow data;
and (3) circulating the step of training the PAEN-WDGRU intrusion detection model until the preset iterative training times are reached, and obtaining the trained PAEN-WDGRU intrusion detection model.
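The splicing step of claim 4 — concatenating the two dimension-reduced representations produced by the parallel encoders — can be sketched as follows; the encoder outputs here are hypothetical stand-ins, not trained networks:

```python
import numpy as np

rng = np.random.default_rng(2)
batch = rng.random((4, 16))                 # preprocessed traffic features

# Hypothetical stand-ins for the two parallel encoders' reduced outputs.
sae_code = np.tanh(batch @ rng.normal(0, 0.1, (16, 32)))   # sparse AE branch
dae_code = np.tanh(batch @ rng.normal(0, 0.1, (16, 32)))   # deep AE branch

# Splicing layer: concatenate the two representations along the feature axis.
spliced = np.hstack([sae_code, dae_code])
print(spliced.shape)
```

The spliced tensor is what would then be fed to the weight-dropped GRU for further feature extraction.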
5. The parallel self-encoder and weight discard based network intrusion detection method of claim 4, wherein the training set is trained by the depth self-encoder network in the PAEN-WDGRU intrusion detection model, comprising the steps of:
using a relu function as an activation function for the coding layer of the depth self-encoder network;
for the decoding layers of the depth self-encoder network, the last decoding layer uses a sigmoid function as its activation function while all other decoding layers use a relu function;
the potential feature representation layer of the depth self-encoder network is composed of a plurality of hidden layers with a decreasing number of neurons; the number of hidden-layer neurons is set to 32, and an adam optimizer is selected to minimize the loss function of the depth self-encoder network.
6. The parallel self-encoder and weight discard based network intrusion detection method of claim 5, wherein the expression of the loss function of the depth self-encoder network is:

A_AE(a, x_i, x'_i) = (1/a) · Σ_{i=1}^{a} (x_i − x'_i)²

in the above formula, x_i represents the real data, x'_i represents the reconstructed data, and a represents the number of input-layer neurons.
7. The parallel self-encoder and weight discard based network intrusion detection method of claim 6, wherein the step of training the training set by the sparse self-encoder network in the PAEN-WDGRU intrusion detection model specifically comprises:
for the coding layer of the sparse self-encoder network, the activation function uses a relu function, the decoding layer uses a sigmoid function, and the number of neurons of the hidden layer is set to be 32;
adding a sparsity penalty term to the output of the encoding layer of the sparse self-encoder network, introducing a degree of sparsity of the neurons, a desired level of activation of the neurons, and an average level of activation of the neurons, and selecting an adam optimizer to minimize a loss function of the sparse self-encoder network.
8. The parallel self-encoder and weight discard based network intrusion detection method of claim 7, wherein the loss function of the sparse self-encoder network is expressed as:

A_SAE(a, x_i, x'_i) = A_AE(a, x_i, x'_i) + ε · Σ_{j=1}^{b} [ ζ·log(ζ/ζ_j) + (1−ζ)·log((1−ζ)/(1−ζ_j)) ]

in the above formula, A_SAE(a, x_i, x'_i) represents the loss function of the sparse self-encoder network, A_AE(a, x_i, x'_i) represents the loss function of the depth self-encoder network, ε represents the degree of sparsity of the neurons, ζ represents the desired activation level of the neurons, ζ_j represents the average activation level of the j-th hidden-layer neuron, x_i represents the real data, x'_i represents the reconstructed data, and a and b represent the numbers of input-layer neurons and hidden-layer neurons, respectively.
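Under the standard sparse-autoencoder formulation (an assumption here, since the patent's formula image did not survive extraction), the sparsity penalty is a KL divergence between the desired activation level and each hidden neuron's observed mean activation; the values below are illustrative:

```python
import numpy as np

def kl_sparsity(zeta, zeta_hat):
    # KL divergence between the desired activation zeta and the observed
    # mean activation zeta_hat of each hidden neuron.
    return (zeta * np.log(zeta / zeta_hat)
            + (1 - zeta) * np.log((1 - zeta) / (1 - zeta_hat)))

zeta = 0.05                                   # desired (sparse) activation level
zeta_hat = np.array([0.05, 0.2, 0.5])         # mean activations of 3 hidden units
eps = 0.1                                     # sparsity penalty weight

penalty = eps * np.sum(kl_sparsity(zeta, zeta_hat))
print(penalty)
```

Neurons whose mean activation matches the target contribute zero penalty; the more active a neuron is on average, the larger its contribution, which pushes the coding layer toward sparse activations.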
9. The network intrusion detection method based on parallel self-encoder and weight discarding according to claim 8, wherein the step of performing feature extraction processing on the spliced training data set by using a weight discarding gating cyclic unit neural network model based on the PAEN-WDGRU intrusion detection model to obtain a feature training data set specifically comprises the following steps:
inputting the spliced training data set into a weight discarding gating circulating unit neural network model;
resetting and forgetting the state information of the current time step based on the reset gate of the weight discarding gating circulating unit neural network model to obtain a reset data set;
controlling, based on the update gate of the weight discarding gating circulating unit neural network model, how the reset data set is combined with the current input information at the current time step, so as to update the state information of the current time step and obtain updated data information;
calculating the weight matrix of the candidate hidden layer of the weight discarding gating circulating unit neural network model by using the weight-drop technology;
and combining the updated data information with the candidate hidden layer state at the current moment, calculating the output at the current moment, and obtaining the feature training data set.
10. The network intrusion detection system based on parallel self-encoder and weight discarding is characterized by comprising the following modules:
the preprocessing module is used for acquiring a network flow data set and preprocessing data to obtain a corresponding training set and a corresponding testing set;
the construction module is used for constructing a PAEN-WDGRU intrusion detection model based on the sparse self-encoder and the depth self-encoder and by introducing weight-drop technology;
the training module is used for carrying out iterative training on the PAEN-WDGRU intrusion detection model through the training set until a preset iteration threshold is reached, and outputting the trained PAEN-WDGRU intrusion detection model;
the test module is used for inputting the test set into the trained PAEN-WDGRU intrusion detection model to predict, so as to obtain a predicted value;
and the comparison module is used for comparing the predicted value with the network flow data set, obtaining a comparison result and judging whether the network flow data set comprises intrusion network flow data according to the comparison result.
CN202310641389.7A 2023-06-01 2023-06-01 Network intrusion detection method and system based on parallel self-encoder and weight discarding Withdrawn CN116647391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310641389.7A CN116647391A (en) 2023-06-01 2023-06-01 Network intrusion detection method and system based on parallel self-encoder and weight discarding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310641389.7A CN116647391A (en) 2023-06-01 2023-06-01 Network intrusion detection method and system based on parallel self-encoder and weight discarding

Publications (1)

Publication Number Publication Date
CN116647391A true CN116647391A (en) 2023-08-25

Family

ID=87639636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310641389.7A Withdrawn CN116647391A (en) 2023-06-01 2023-06-01 Network intrusion detection method and system based on parallel self-encoder and weight discarding

Country Status (1)

Country Link
CN (1) CN116647391A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117375893A (en) * 2023-09-22 2024-01-09 南京中新赛克科技有限责任公司 Industrial Internet cross-domain access request potential risk judging method and system based on r-GRU network
CN117375893B (en) * 2023-09-22 2024-05-24 南京中新赛克科技有限责任公司 Industrial Internet cross-domain access request potential risk judging method and system based on r-GRU network

Similar Documents

Publication Publication Date Title
Bi et al. APDC-Net: Attention pooling-based convolutional network for aerial scene classification
Liang et al. Stacked denoising autoencoder and dropout together to prevent overfitting in deep neural network
Mirza Computer network intrusion detection using various classifiers and ensemble learning
CN111585948B (en) Intelligent network security situation prediction method based on power grid big data
CN109698836A (en) A kind of method for wireless lan intrusion detection and system based on deep learning
CN111723368A (en) Bi-LSTM and self-attention based malicious code detection method and system
CN116996272A (en) Network security situation prediction method based on improved sparrow search algorithm
CN116647391A (en) Network intrusion detection method and system based on parallel self-encoder and weight discarding
CN113179276B (en) Intelligent intrusion detection method and system based on explicit and implicit feature learning
Ding et al. Efficient BiSRU combined with feature dimensionality reduction for abnormal traffic detection
CN113556319A (en) Intrusion detection method based on long-short term memory self-coding classifier under internet of things
CN115913643A (en) Network intrusion detection method, system and medium based on countermeasure self-encoder
CN114513337B (en) Privacy protection link prediction method and system based on mail data
Macas et al. Deep Learning Methods for Cybersecurity and Intrusion Detection Systems
CN116599683A (en) Malicious traffic detection method, system, device and storage medium
CN117993002A (en) Data security protection method based on artificial intelligence
CN117411684A (en) Industrial control network intrusion detection method and system based on deep learning
CN116306780B (en) Dynamic graph link generation method
CN117375983A (en) Power grid false data injection identification method based on improved CNN-LSTM
CN114915496B (en) Network intrusion detection method and device based on time weight and deep neural network
CN115795035A (en) Science and technology service resource classification method and system based on evolutionary neural network and computer readable storage medium thereof
CN116047901A (en) Robust space-time trajectory modeling method based on automatic gating circulating unit
Liu et al. Scene‐specialized multitarget detector with an SMC‐PHD filter and a YOLO network
Li et al. Cyberspace attack detection based on advanced initialized recurrent neural network
CN114662143B (en) Sensitive link privacy protection method based on graph embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230825
