CN113572783A - Network intrusion detection method based on attack sharing loss and deep neural network - Google Patents
Network intrusion detection method based on attack sharing loss and deep neural network Download PDFInfo
- Publication number
- CN113572783A CN113572783A CN202110869744.7A CN202110869744A CN113572783A CN 113572783 A CN113572783 A CN 113572783A CN 202110869744 A CN202110869744 A CN 202110869744A CN 113572783 A CN113572783 A CN 113572783A
- Authority
- CN
- China
- Prior art keywords
- network
- attack
- layer
- deep neural
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a network intrusion detection method based on an attack-sharing loss and a deep neural network, comprising the following steps. S1: collect data of a plurality of network attack types, input the data into the constructed deep neural network, and output a predicted value for each network attack type. S2: construct an attack-sharing loss function from the predicted value of each network attack type, classify the network attack types with the attack-sharing loss function, and take the resulting attack type as the intrusion detection result. S3: update the deep neural network by gradient optimization according to the network attack data, and proceed to the next round of network intrusion detection. The invention targets intrusion detection across diverse networks and achieves high detection accuracy; the attack-sharing loss function is defined to address the class imbalance problem, and an Adam optimizer that adaptively updates the learning rate accelerates model training.
Description
Technical Field
The invention belongs to the technical field of network monitoring, and particularly relates to a network intrusion detection method based on attack sharing loss and a deep neural network.
Background
Cyber attacks pose a serious threat to computer system security and expose digital assets to significant risk. There is a strong need for an effective intrusion detection system that can identify intrusion attacks with high accuracy. Classifying intrusion events is challenging because of the wide variety of attacks. Furthermore, in a normal network environment most connections are initiated by benign behavior, so the class imbalance problem in intrusion detection biases the classifier toward the majority (benign) class and leaves many attack events undetected. Traditional signature-based intrusion detection relies heavily on a signature database constructed by security experts and therefore cannot detect new attacks. Various data mining and machine learning models, such as decision trees, Support Vector Machines (SVMs), and graph mining algorithms, have also been used to discover anomalies in network monitoring data, but they are ill-suited to representing intrusion detection classification functions with many complex variations. Deep learning, which avoids the manual feature extraction required by conventional machine learning models, can handle complex input-output mappings, yet existing applications only extract inherent or generic features from the data and do not exploit its capacity to learn complex classification functions.
Disclosure of Invention
The invention aims to solve the problem that, in practice, network connection instances follow a long-tailed class distribution and the different types of intrusion attacks are unevenly represented, and provides a network intrusion detection method based on an attack-sharing loss and a deep neural network.
The technical scheme of the invention is as follows: a network intrusion detection method based on attack sharing loss and a deep neural network comprises the following steps:
s1: collecting data of a plurality of network attack types, inputting the data into the constructed deep neural network, and outputting a predicted value for each network attack type;
s2: constructing an attack-sharing loss function from the predicted value of each network attack type, classifying the network attack types with the attack-sharing loss function, and taking the resulting attack type as the intrusion detection result;
s3: updating the deep neural network by gradient optimization according to the network attack data, and proceeding to the next round of network intrusion detection.
Further, in step S1, the deep neural network includes an input layer, a plurality of hidden layers, and an output layer, which are connected in sequence;
in step S1, the specific method for outputting the predicted value of each network attack type is as follows: the network attack data are fed through the input layer into the hidden layers in sequence, where weights and bias vectors are applied by the hidden-layer neurons to obtain feature values of the data; the output layer then applies its activation function and computes the predicted value of each network attack type.
Further, in the hidden layers of the deep neural network, the output h_i^(l) of the i-th neuron of the l-th hidden layer is calculated as:

h_i^(l) = g(W_i^(l) · h^(l-1) + b_i^(l))

wherein g(·) represents the rectified linear unit activation function, g(x) = max{0, x}; W_i^(l) represents the i-th row of the weight matrix connecting the (l-1)-th hidden layer and the l-th hidden layer; and b_i^(l) represents the corresponding component of the bias vector of the l-th hidden layer;

the sparse output of the (l-1)-th hidden layer is obtained by dropout as:

h^(l-1) = h^(l-1) * r

where r denotes a mask vector composed of Bernoulli random variables and h^(l-1) denotes the output of the (l-1)-th hidden layer.
Further, the output layer of the deep neural network comprises c activation units, where c represents the number of classes;

the predicted value ŷ_j that a network attack data sample belongs to the j-th class of network attack is calculated as:

ŷ_j = softmax(z)_j = exp(z_j) / Σ_{k=1}^{c} exp(z_k)

where softmax(·) represents the activation function, exp(·) represents the exponential function, z_j represents the j-th component of the linear activation vector z of the output layer, and z_k its k-th component.
Further, in step S2, the attack-sharing loss function J_AS is expressed as:

J_AS(θ) = J_CE(θ) − (λ/N) · Σ_{i=1}^{N} I(y^(i) ≠ 1) · log( Σ_{j=2}^{c} ŷ_j^(i) )

wherein J_CE represents the cross-entropy loss, N represents the number of training samples, λ(·) represents the control parameter function, I(·) represents the indicator function, ŷ_j^(i) represents the predicted value that the i-th sample belongs to the j-th class of network attack, c represents the number of classes, and log(·) represents the logarithmic function.
Further, step S3 includes the following sub-steps:
s31: computing a stochastic gradient g in a deep neural networktThe calculation formula is as follows:
wherein, thetat-1Denotes the t-1 th parameter vector, J (theta)t-1) Representing a vector theta with respect to a parametert-1The loss value of (d);
s32: according to a random gradient gtUpdating the first moment s used to store the random gradienttAnd a second moment r for storing the mean of the exponential decay of the squared gradienttThe update formula is as follows:
st=ρ1st-1+(1-ρ1)gt
where ρ is1Representing a hyperparameter in a first moment, st-1First order matrix, p, representing the t-1 st random gradient2Representing a hyperparameter in a second moment, rt-1Represents the t-1 th mask vector;
s33: according to a first moment stAnd second moment rtUpdating the parameter vector thetatAnd finishing gradient updating, wherein the updating formula is as follows:
where ξ denotes the step size, δ denotes the stability factor, θt-1Representing the t-1 th parameter vector.
The invention has the following beneficial effects: it targets intrusion detection across diverse networks and achieves high detection accuracy; the attack-sharing loss function is defined to address the class imbalance problem, and an Adam optimizer that adaptively updates the learning rate accelerates model training.
Drawings
FIG. 1 is a flow chart of a method of network intrusion detection;
FIG. 2 is a block diagram of a deep neural network;
fig. 3 is a flow chart of a training method.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a network intrusion detection method based on attack sharing loss and deep neural network, comprising the following steps:
s1: collecting data of a plurality of network attack types, inputting the data into the constructed deep neural network, and outputting a predicted value for each network attack type;
s2: constructing an attack-sharing loss function from the predicted value of each network attack type, classifying the network attack types with the attack-sharing loss function, and taking the resulting attack type as the intrusion detection result;
s3: updating the deep neural network by gradient optimization according to the network attack data, and proceeding to the next round of network intrusion detection.
In the embodiment of the present invention, as shown in fig. 2, in step S1, the deep neural network includes an input layer, a plurality of hidden layers, and an output layer, which are connected in sequence;
in step S1, the specific method for outputting the predicted value of each network attack type is as follows: the network attack data are fed through the input layer into the hidden layers in sequence, where weights and bias vectors are applied by the hidden-layer neurons to obtain feature values of the data; the output layer then applies its activation function and computes the predicted value of each network attack type.
The input layer of the deep neural network consists of d units, each unit representing an input element. Specifically, each input data point is represented as (x, y), where x is the feature set and y is the label.
In an embodiment of the invention, in the hidden layers of the deep neural network, the output h_i^(l) of the i-th neuron of the l-th hidden layer is calculated as:

h_i^(l) = g(W_i^(l) · h^(l-1) + b_i^(l))

wherein g(·) represents the rectified linear unit activation function, g(x) = max{0, x}; W_i^(l) represents the i-th row of the weight matrix connecting the (l-1)-th hidden layer and the l-th hidden layer; and b_i^(l) represents the corresponding component of the bias vector of the l-th hidden layer;

the sparse output of the (l-1)-th hidden layer is obtained by dropout as:

h^(l-1) = h^(l-1) * r

where r denotes a mask vector composed of Bernoulli random variables and h^(l-1) denotes the output of the (l-1)-th hidden layer.
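The per-layer computation above can be sketched in NumPy as follows. This is a minimal illustration only; the function name, the array shapes, and the placement of the dropout mask are assumptions of this sketch, not the patent's implementation.

```python
import numpy as np

def hidden_layer_forward(h_prev, W, b, keep_prob=1.0, rng=None):
    """One hidden layer: ReLU(W @ h_prev + b), optionally applying a
    Bernoulli dropout mask r to the incoming activations h_prev."""
    if rng is None:
        rng = np.random.default_rng(0)
    if keep_prob < 1.0:
        r = rng.binomial(1, keep_prob, size=h_prev.shape)  # mask vector r
        h_prev = h_prev * r
    z = W @ h_prev + b           # linear activation of the layer
    return np.maximum(0.0, z)    # g(x) = max{0, x}

# toy check: 3 inputs -> 2 hidden units, no dropout
h0 = np.array([1.0, -2.0, 0.5])
W1 = np.array([[0.2, 0.1, -0.3],
               [0.4, -0.5, 0.6]])
b1 = np.array([0.1, -0.2])
h1 = hidden_layer_forward(h0, W1, b1)
print(h1)  # the negative pre-activation is clipped to 0, giving [0, 1.5]
```

Stacking several such calls, each feeding the previous layer's output forward, reproduces the multi-hidden-layer structure described above.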
In an embodiment of the present invention, the output layer of the deep neural network comprises c activation units, where c represents the number of classes;

the predicted value ŷ_j that a network attack data sample belongs to the j-th class of network attack is calculated as:

ŷ_j = softmax(z)_j = exp(z_j) / Σ_{k=1}^{c} exp(z_k)

where softmax(·) represents the activation function, exp(·) represents the exponential function, z_j represents the j-th component of the linear activation vector z of the output layer, and z_k its k-th component.
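The softmax computation at the output layer can be sketched as follows (a minimal illustration; the max-subtraction for numerical stability is an implementation detail added here, not stated in the patent):

```python
import numpy as np

def softmax(z):
    """exp(z_j) / sum_k exp(z_k), shifted by max(z) for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # linear activations of the c output units
y_hat = softmax(z)
print(y_hat.sum())  # the predicted values form a probability distribution, summing to 1
```

The class with the largest linear activation receives the largest predicted value, which is taken as the detected attack type.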
In the embodiment of the present invention, in step S2, the attack-sharing loss function J_AS is expressed as:

J_AS(θ) = J_CE(θ) − (λ/N) · Σ_{i=1}^{N} I(y^(i) ≠ 1) · log( Σ_{j=2}^{c} ŷ_j^(i) )

wherein J_CE represents the cross-entropy loss, N represents the number of training samples, λ(·) represents the control parameter function, I(·) represents the indicator function, ŷ_j^(i) represents the predicted value that the i-th sample belongs to the j-th class of network attack, c represents the number of classes, and log(·) represents the logarithmic function.
For any instance (x^(i), y^(i)), if it is a benign event then y^(i) = 1; otherwise y^(i) ∈ {2, ..., c}.
Most modern neural networks use the cross-entropy loss J_CE to describe the difference between the ground-truth labels and the model predictions. Specifically, the J_CE loss is calculated as follows:

J_CE(θ) = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{c} I(y^(i) = j) · log p(y^(i) = j | x^(i); θ)

where θ is composed of the weight matrices between successive layers of the neural network, the training samples follow the empirical data distribution p_data of the training set, p(y^(i) | x^(i); θ) is the probability with which the neural network classifies input x^(i) correctly, N is the number of training samples, c is the number of classes, and I(y^(i) = j) is an indicator function.

The parameter θ of the network is optimized to minimize J_CE(θ), so that the desired classification accuracy is obtained.
One drawback of the cross-entropy loss function is that it does not take the type of misclassification into account and therefore penalizes all classification errors equally. An intrusion detection system faces two types of misclassification: first, intrusion misclassification, in which an intrusion attack is misclassified as a benign event; second, attack misclassification, in which a class-A intrusion attack (e.g., a denial-of-service attack) is wrongly classified as a class-B intrusion attack (e.g., a probe attack).

In practice, intrusion misclassification should be penalized more heavily than attack misclassification: an attack misclassification still raises an alarm to the information technology security team and allows further inspection of the event, whereas an intrusion misclassification lets the attack event bypass the security check entirely and cause potentially serious damage. The attack-sharing loss is designed precisely to impose such differentiated penalties on the different types of misclassification.
Thus, the second step builds the attack-sharing loss function by improving the basic cross-entropy loss function. For any instance (x^(i), y^(i)), if it is a benign event then y^(i) = 1; otherwise y^(i) ∈ {2, ..., c}. The invention designs the attack-sharing loss function J_AS with an additional regularization term that penalizes intrusion misclassification, i.e., confusion between the benign label and the attack labels.

In the attack-sharing loss, the loss term carries this additional regularization penalty. When the control parameter λ is small, J_AS behaves like the vanilla cross-entropy loss; when λ is large, J_AS approaches an objective function for the binary classification problem of benign versus attack, so that the penalty on the benign label is reduced while the penalty on mislabeling an attack as benign is increased, which addresses the class imbalance problem.
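A loss with the behavior just described can be sketched as cross-entropy plus a λ-weighted regularizer that extra-penalizes probability mass placed on the benign class for true attack samples. The function name, the benign class index 0, and the exact regularizer form are assumptions of this sketch for illustration; with λ = 0 it reduces to the plain cross-entropy.

```python
import numpy as np

def attack_sharing_loss(y_hat, y, lam):
    """Sketch of an attack-sharing-style loss J_AS = J_CE + lam * J_reg.
    y_hat: (N, c) predicted class probabilities; y: (N,) labels, 0 = benign.
    J_reg rewards the total probability mass on the attack classes
    (1 - y_hat[:, 0]) for samples whose true label is an attack."""
    N = y_hat.shape[0]
    eps = 1e-12  # guard against log(0)
    j_ce = -np.mean(np.log(y_hat[np.arange(N), y] + eps))        # cross-entropy
    is_attack = (y != 0)                                         # I(y != benign)
    j_reg = -np.mean(is_attack * np.log(1.0 - y_hat[:, 0] + eps))
    return j_ce + lam * j_reg

# with lam = 0 the loss reduces to plain cross-entropy
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.6, 0.1]])
y = np.array([0, 1])
print(attack_sharing_loss(y_hat, y, 0.0))
print(attack_sharing_loss(y_hat, y, 2.0))  # larger: the attack sample keeps 0.3 mass on benign
```

Increasing λ shifts the penalty toward benign/attack confusion, pushing the decision boundary toward the attack classes as the description states.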
In the embodiment of the present invention, step S3 includes the following sub-steps:
s31: computing a stochastic gradient g in a deep neural networktThe calculation formula is as follows:
wherein, thetat-1Denotes the t-1 th parameter vector, J (theta)t-1) Representing a vector theta with respect to a parametert-1The loss value of (d);
s32: according to a random gradient gtUpdating the first moment s used to store the random gradienttAnd a second moment r for storing the mean of the exponential decay of the squared gradienttThe update formula is as follows:
st=ρ1st-1+(1-ρ1)gt
where ρ is1Representing a hyperparameter in a first moment, st-1First order matrix, p, representing the t-1 st random gradient2Representing a hyperparameter in a second moment, rt-1Represents the t-1 th mask vector;
ρ1,ρ2e (0,1) is a hyperparameter that determines decay rate;
s33: according to a first moment stAnd second moment rtUpdate the parameter thetatAnd finishing gradient updating, wherein the updating formula is as follows:
where ξ denotes the step size, δ denotes the stability factor, θt-1Representing the t-1 th parameter vector.
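The sub-steps S31-S33 can be sketched as a single Adam update applied in a loop. This is a minimal illustration with the standard bias correction; the function name, the toy quadratic objective, and the default hyperparameter values are assumptions of this sketch.

```python
import numpy as np

def adam_step(theta, grad, s, r, t, xi=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    """One Adam update: decaying averages of the gradient (s, first moment)
    and squared gradient (r, second moment), bias-corrected, then a scaled step."""
    s = rho1 * s + (1 - rho1) * grad              # first-moment update (S32)
    r = rho2 * r + (1 - rho2) * grad * grad       # second-moment update (S32)
    s_hat = s / (1 - rho1 ** t)                   # bias correction
    r_hat = r / (1 - rho2 ** t)
    theta = theta - xi * s_hat / (np.sqrt(r_hat) + delta)  # parameter update (S33)
    return theta, s, r

# minimize J(theta) = theta^2, whose gradient is 2 * theta (stands in for S31)
theta, s, r = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, s, r = adam_step(theta, 2 * theta, s, r, t, xi=0.05)
print(theta)  # approaches the minimizer 0
```

Because the step is normalized by the square root of the second moment, flat directions of the loss surface receive proportionally larger updates, which is the acceleration property the description relies on.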
By the above formulas, larger update steps are taken along the more gently sloped directions of the parameter space, which contributes to faster convergence than SGD; another attractive property of the Adam optimizer is that it is robust to the choice of hyperparameters.
In deep learning, the most widely used optimization algorithm is stochastic gradient descent (SGD). In each round it estimates the gradient from a small batch of samples and updates the parameters. Although simple, SGD converges slowly, especially near saddle points (points where one dimension slopes up and another slopes down) and in flat regions (regions where the gradient stays consistently small). Saddle points and flat regions are widespread because of the complexity of intrusion detection classification boundaries.
In the third step, gradient optimization, an Adam optimizer that adaptively updates the learning rate is adopted to accelerate the learning process. Two variables, s and r, store the exponentially decaying averages of the historical gradients and squared gradients, respectively. Initially s = 0 and r = 0; in the t-th round of forward and backward propagation, a mini-batch of m samples is drawn from the training set and the stochastic gradient is computed.
The attack-sharing deep network training process is completed through these three steps, as shown in fig. 3: the hidden layers of the network extract features from the data, the features are passed through Softmax to produce outputs, the attack-sharing loss is computed against the data labels, and finally the Adam optimizer updates the gradients, completing one batch of the iterative training process.
In an embodiment of the invention, the tests use three datasets: KDD99, CICIDS17 and CICIDS18. In the KDD99 dataset, each connection record is described by 41 features and 1 label; the features cover three kinds of information: basic connection information (e.g., duration, protocol type (tcp, udp, icmp), number of erroneous segments, number of urgent packets), content information (e.g., number of failed login attempts, number of shell prompts, number of operations on access-control files) and traffic information (e.g., number of connections to the host in the last two seconds, proportion of connections with synchronization errors). Attacks in the dataset fall into 4 types: denial of service, probing, U2R (an ordinary user illegally obtains root access to the system) and R2L (a remote attacker exploits certain vulnerabilities to obtain local access to the host). The CICIDS17 dataset contains 2.83 million network connection instances, each described by 81 features, with attack types including denial of service, penetration and brute-force attacks. The CICIDS18 dataset contains 6.3 million network connection instances, each with 77 features, with attack types including denial of service, infiltration and botnets. To evaluate the class imbalance level of each dataset, we also report a class imbalance metric Ω_imb on the training set, defined as:

Ω_imb = (1/2) · Σ_{i=1}^{c} | n_i / n − 1/c |

where n denotes the number of instances in the dataset and n_i denotes the number of instances belonging to class i. Ω_imb measures the minimum fraction of data samples that would have to be reassigned across classes to form an overall balanced, evenly distributed dataset; larger Ω_imb values indicate a higher level of class imbalance.
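Read as the minimum fraction of samples that must be reassigned to balance all classes, the imbalance metric can be computed from per-class counts as follows (a sketch; the helper name and the total-variation form are this description's interpretation):

```python
import numpy as np

def imbalance_metric(counts):
    """Omega_imb as interpreted here: half the sum of absolute deviations of
    each class's share n_i/n from the uniform share 1/c, i.e. the minimum
    fraction of samples to reassign to make the classes evenly distributed."""
    counts = np.asarray(counts, dtype=float)
    n, c = counts.sum(), len(counts)
    return 0.5 * np.abs(counts / n - 1.0 / c).sum()

print(imbalance_metric([50, 50]))    # perfectly balanced: 0.0
print(imbalance_metric([98, 1, 1]))  # heavily skewed toward one (benign) class
```

The value lies in [0, 1): 0 means every class is equally represented, and values near 1 mean almost all samples sit in one class, the regime the attack-sharing loss is designed for.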
On the KDD99 dataset, the classification accuracy of the invention and the baseline approaches was derived. The invention produced the highest CBA among the test results, which shows that it is effective in detecting intrusion attack events in imbalanced datasets. KNN, DT and MLP+CE focus only on the majority classes and produce unsatisfactory performance on the minority classes; for example, KNN and DT fail to capture any U2R or R2L attack instances. In contrast, the cost-sensitive classifier assigns too much cost to the U2R and R2L classes because they are extremely under-represented; this makes the classifier overly biased toward these classes and degrades its performance on the KDD99 dataset. Oversampling and undersampling mitigate the side effects of the class imbalance problem, but the improvement is not comparable to that achieved by the invention.
On the CICIDS17 dataset, the accuracy of all classifiers was tested. Except for brute-force attacks, the invention produces similar and satisfactory precision and recall on every class, and yields the best CBA among all classifiers. Brute-force attack instances account for only around 0.5% of the dataset, and although the attack-sharing loss function is intended to move the decision boundary closer to the attack classes, it helps little for this class. KNN and MLP+CE perform close to the invention; further investigation showed that most attack instances appear in both the training set and the test set. DT simply labels every test case as a benign connection; similarly, CNN recognizes almost every connection as a DoS attack, and the cost-sensitive classifier focuses only on the benign and DoS classes, so the CBA of these three baselines is naturally lower.
On the CICIDS18 dataset, the accuracy of the invention was compared with the baselines. The invention achieves the second-best performance in handling the class imbalance problem, demonstrating the effectiveness of the attack-sharing loss function. No baseline method produced a CBA above 30%; likewise, the cost-sensitive baseline devotes all of its attention to the least-represented classes.
The working principle and process of the invention are as follows: deep learning is used to extract features and learn the classification boundary, and a new loss function, the attack-sharing loss, is designed to eliminate the bias toward the majority (benign) class by moving the decision boundary toward the attack classes. Specifically, first, a deep feedforward network is constructed to learn the complex patterns of benign communication and malicious connections from the training data; to accelerate learning on big data, a new optimization algorithm dynamically adjusts the learning rate by tracking exponentially decaying averages of the first and second moments of past gradients. Second, to address the class imbalance problem in intrusion detection, the attack-sharing loss is designed for the deep feedforward network; it applies differentiated penalties to the different types of misclassification, so that intrusion misclassification is penalized more heavily than attack misclassification.
The invention has the following beneficial effects: it targets intrusion detection across diverse networks and achieves high detection accuracy; the attack-sharing loss function is defined to address the class imbalance problem, and an Adam optimizer that adaptively updates the learning rate accelerates model training.
It will be appreciated by those of ordinary skill in the art that the embodiments of the network intrusion detection method based on attack sharing loss and deep neural networks described herein are intended to assist the reader in understanding the principles of the present invention, and it is to be understood that the scope of the invention is not limited to such specific statements and embodiments. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (6)
1. A network intrusion detection method based on attack sharing loss and a deep neural network is characterized by comprising the following steps:
s1: collecting data of a plurality of network attack types, inputting the data into the constructed deep neural network, and outputting a predicted value for each network attack type;
s2: constructing an attack-sharing loss function from the predicted value of each network attack type, classifying the network attack types with the attack-sharing loss function, and taking the resulting attack type as the intrusion detection result;
s3: updating the deep neural network by gradient optimization according to the network attack data, and proceeding to the next round of network intrusion detection.
2. The method for detecting network intrusion based on attack sharing loss and deep neural network of claim 1, wherein in step S1, the deep neural network comprises an input layer, a plurality of hidden layers and an output layer which are connected in sequence;
in step S1, the specific method for outputting the predicted value of each network attack type is as follows: the network attack data are fed through the input layer into the hidden layers in sequence, where weights and bias vectors are applied by the hidden-layer neurons to obtain feature values of the data; the output layer then applies its activation function and computes the predicted value of each network attack type.
3. The method of claim 2, wherein, among the hidden layers of the deep neural network, the output h_i^(l) of the i-th hidden layer neuron in the l-th hidden layer is calculated as:

h_i^(l) = g(W_i^(l) · h^(l-1) + b_i^(l))

wherein g(·) represents the rectified linear unit activation function, g(x) = max{0, x}; W_i^(l) represents the i-th row of the weight matrix connecting the (l-1)-th hidden layer and the l-th hidden layer; and b_i^(l) represents the bias vector of the l-th hidden layer;

the sparse output h^(l-1) of the (l-1)-th hidden layer is calculated as:

h^(l-1) = h^(l-1) * r

where r denotes a mask vector composed of Bernoulli random variables and h^(l-1) denotes the output of the (l-1)-th hidden layer.
4. The method according to claim 2, wherein the output layer of the deep neural network comprises c activation units, where c represents the number of classes;

the predicted value ŷ_j that a network attack data sample belongs to the j-th class of network attack is calculated as:

ŷ_j = softmax(z)_j = exp(z_j) / Σ_{k=1}^{c} exp(z_k)

wherein softmax(·) represents the activation function, exp(·) represents the exponential function, z_j represents the j-th component of the linear activation vector z of the output layer, and z_k its k-th component.
5. The method for detecting network intrusion based on attack-sharing loss and deep neural network of claim 1, wherein in step S2 the attack-sharing loss function J_AS is expressed as:

J_AS(θ) = J_CE(θ) − (λ/N) · Σ_{i=1}^{N} I(y^(i) ≠ 1) · log( Σ_{j=2}^{c} ŷ_j^(i) )

wherein J_CE represents the cross-entropy loss, N represents the number of training samples, λ(·) represents the control parameter function, I(·) represents the indicator function, ŷ_j^(i) represents the predicted value that the i-th sample belongs to the j-th class of network attack, c represents the number of classes, and log(·) represents the logarithmic function.
6. The method for detecting network intrusion based on attack sharing loss and deep neural network of claim 1, wherein the step S3 includes the following sub-steps:
s31: computing a stochastic gradient g in a deep neural networktThe calculation formula is as follows:
wherein, thetat-1Denotes the t-1 th parameter vector, J (theta)t-1) Representing a vector theta with respect to a parametert-1The loss value of (d);
S32: according to the stochastic gradient g_t, update the first moment s_t, which stores the gradient, and the second moment r_t, which stores the exponentially decaying average of the squared gradient, as follows:

s_t = ρ1 · s_{t-1} + (1 − ρ1) · g_t

r_t = ρ2 · r_{t-1} + (1 − ρ2) · g_t ⊙ g_t

where ρ1 represents the hyperparameter of the first moment, s_{t-1} represents the first moment at step t-1, ρ2 represents the hyperparameter of the second moment, and r_{t-1} represents the second moment at step t-1;
S33: according to the first moment s_t and the second moment r_t, update the parameter vector θ_t to complete the gradient update:

θ_t = θ_{t-1} − ξ · s_t / (√r_t + δ)

where ξ denotes the step size, δ denotes the stability factor, and θ_{t-1} denotes the parameter vector at step t-1.
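Steps S31–S33 follow the structure of the Adam optimiser. A minimal sketch, assuming standard Adam bias correction of the moments (not spelled out in this excerpt) and illustrative hyperparameter values; the toy objective J(θ) = θ² stands in for the network loss:

```python
import numpy as np

def adam_step(theta, grad, s, r, t, rho1=0.9, rho2=0.999, xi=1e-3, delta=1e-8):
    """One update of S31-S33: accumulate moments s_t and r_t, then step theta."""
    s = rho1 * s + (1 - rho1) * grad            # first moment (S32)
    r = rho2 * r + (1 - rho2) * grad * grad     # second moment (S32)
    s_hat = s / (1 - rho1 ** t)                 # bias correction (assumption)
    r_hat = r / (1 - rho2 ** t)
    theta = theta - xi * s_hat / (np.sqrt(r_hat) + delta)  # update (S33)
    return theta, s, r

# Toy objective J(theta) = theta^2, so g_t = 2 * theta
theta = np.array([1.0])
s = np.zeros(1)
r = np.zeros(1)
for t in range(1, 501):
    theta, s, r = adam_step(theta, 2 * theta, s, r, t)
```

Because the normalised step size is roughly ξ per iteration, θ moves steadily toward the minimum at zero.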
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110869744.7A CN113572783A (en) | 2021-07-30 | 2021-07-30 | Network intrusion detection method based on attack sharing loss and deep neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113572783A true CN113572783A (en) | 2021-10-29 |
Family
ID=78169324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110869744.7A Pending CN113572783A (en) | 2021-07-30 | 2021-07-30 | Network intrusion detection method based on attack sharing loss and deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113572783A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200106788A1 (en) * | 2018-01-23 | 2020-04-02 | Hangzhou Dianzi University | Method for detecting malicious attacks based on deep learning in traffic cyber physical system |
CN112491854A (en) * | 2020-11-19 | 2021-03-12 | 郑州迪维勒普科技有限公司 | Multi-azimuth security intrusion detection method and system based on FCNN |
Non-Patent Citations (1)
Title |
---|
BOXIANG DONG: "Cyber Intrusion Detection by Using Deep Neural Networks with Attack-sharing Loss", arXiv *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670302B (en) | SVM-based classification method for false data injection attacks | |
CN111598179B (en) | Power monitoring system user abnormal behavior analysis method, storage medium and equipment | |
Jia et al. | Network intrusion detection based on IE-DBN model | |
Anil et al. | A hybrid method based on genetic algorithm, self-organised feature map, and support vector machine for better network anomaly detection | |
CN114139155A (en) | Malicious software detection model and generation method of enhanced countermeasure sample thereof | |
Musa et al. | A review on intrusion detection system using machine learning techniques | |
Almarshdi et al. | Hybrid Deep Learning Based Attack Detection for Imbalanced Data Classification. | |
Borisenko et al. | Intrusion detection using multilayer perceptron and neural networks with long short-term memory | |
Zhang et al. | An improved LSTM network intrusion detection method | |
CN116633682B (en) | Intelligent identification method and system based on security product risk threat | |
Sandhya et al. | Enhancing the Performance of an Intrusion Detection System Using Spider Monkey Optimization in IoT. | |
Habib et al. | Performance evaluation of machine learning models for distributed denial of service attack detection using improved feature selection and hyper‐parameter optimization techniques | |
Huynh et al. | On the performance of intrusion detection systems with hidden multilayer neural network using DSD training | |
Yang et al. | Dualnet: Locate then detect effective payload with deep attention network | |
CN116996272A (en) | Network security situation prediction method based on improved sparrow search algorithm | |
CN113572783A (en) | Network intrusion detection method based on attack sharing loss and deep neural network | |
Dong et al. | Cyber intrusion detection by using deep neural networks with attack-sharing loss | |
Gohari et al. | DEEP LEARNING-BASED INTRUSION DETECTION SYSTEMS: A COMPREHENSIVE SURVEY OF FOUR MAIN FIELDS OF CYBER SECURITY. | |
Papadopoulos | Thornewill von Essen | |
Vibhute et al. | Deep learning-based network anomaly detection and classification in an imbalanced cloud environment | |
Mohammed | Hybrid CNN-SMOTE-BGMM deep learning framework for network intrusion detection using unbalanced dataset | |
Vibhute et al. | An LSTM‐based novel near‐real‐time multiclass network intrusion detection system for complex cloud environments | |
Babu et al. | Improved Monarchy Butterfly Optimization Algorithm (IMBO): Intrusion Detection Using Mapreduce Framework Based Optimized ANU-Net. | |
Farahnakian et al. | Anomaly-based intrusion detection using deep neural networks | |
Zenden et al. | On the Resilience of Machine Learning-Based IDS for Automotive Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20211029