CN113055384A - SSDDQN network abnormal flow detection method

Info

Publication number: CN113055384A
Application number: CN202110271456.1A
Authority: CN
Inventors: 董仕; 夏元俊
Original assignee: Zhoukou Normal University
Current assignee: Zhoukou Normal University
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2021-06-29

Abstract

The invention discloses a semi-supervised Double Deep Q-Network (SSDDQN) method for detecting abnormal network traffic, based on the deep reinforcement learning algorithm Double Deep Q-Network, and relates to the technical field of computer network security. The method comprises the following steps: obtaining training samples from computer network traffic data, establishing a neural network, training the neural network on the training samples in the SSDDQN manner and updating its parameters, and finally using the trained neural network to perform anomaly detection on network traffic data. The SSDDQN approach of the invention reduces manual labeling cost and improves learning performance, keeps the neural network simpler and faster so that it is easier to deploy in demanding network environments, and improves the detection accuracy for unknown attacks.

Description

SSDDQN network abnormal flow detection method

Technical Field

The invention relates to the technical field of computer network security, and in particular to a semi-supervised Double Deep Q-Network (SSDDQN) method for detecting abnormal network traffic.

Background

At present, the Internet carries a large number of abnormal behaviors with malicious consequences, such as network failures, abuse, and attacks. These behaviors are usually reflected in network traffic; anomalies such as worms, port scans, denial-of-service attacks, and distributed denial-of-service attacks are particularly common. Such anomalies waste network resources, degrade the performance of network devices and end hosts, and can create security problems for large numbers of network users.

At present, network traffic intrusion detection relies mainly on misuse detection or on machine learning algorithms. Misuse detection uses known signatures to distinguish abnormal behavior from normal behavior and thereby detect intrusions; however, it performs very poorly on unknown attack types, has a high false alarm rate, and its signatures are mostly maintained by hand. Traditional machine learning detection algorithms depend on manually extracted and manually labeled traffic features, so they require heavy manual intervention, and their classification performance depends largely on the quality of the hand-extracted features. Both approaches rely on manual labor, which is extremely costly.

Deep learning, an advanced branch of machine learning, can process complex data and learn data features automatically through training alone. However, because its networks are complex, training and prediction are slow, so it cannot be deployed in demanding environments that require real-time response.

Disclosure of Invention

The SSDDQN method for detecting abnormal network traffic provided by the invention can solve the above problems in the prior art.

The invention provides an SSDDQN method for detecting abnormal network traffic, which comprises the following steps:

Step 1: obtain sample data from computer network traffic data, the sample data comprising training samples;

Step 2: establish a neural network; input the traffic feature $s_t$ of the current traffic data in the training samples, together with the labels $A$ of all traffic features, into the neural network; predict, for each label, the Q function value under the current traffic feature; obtain the maximum Q function value through a greedy strategy algorithm; and obtain from the maximum Q function value the predicted traffic label $a'_t$ under the current traffic feature;

Step 3: compare the predicted traffic label $a'_t$ with the true label $a^*_t$ in the training samples; if they agree, the reward $r_t$ is 1; if not, the reward $r_t$ is 0;

Step 4: receive the next-stage traffic feature $s_{t+1}$ in the training samples, predict its label $A'$ with an unsupervised learning algorithm, calculate all Q function values with a target network and obtain the maximum, and calculate the target Q function value from that maximum and the reward $r_t$ in the experience replay set;

Step 5: calculate the current Q function value of the neural network being trained from the current traffic feature $s_t$ and the predicted traffic label $a'_t$; obtain a loss function from the current Q function value and the target Q function value; update the parameters of the current neural network through the back-propagation algorithm; and periodically copy the network parameters to the target network, yielding the trained neural network;

Step 6: input the traffic data to be detected into the trained neural network for abnormal traffic detection.

Preferably, the values $s_t$, $r_t$, $a'_t$ and $s_{t+1}$ obtained in step 3 are placed in the experience replay set, and step 4 draws them at random from the experience replay set when they are needed.

Preferably, step 2 specifically comprises:

the traffic feature $s_t$ of the current traffic data in the training samples and the labels $A = (a_0, a_1, \ldots, a_t, \ldots, a_n)$ of all traffic features are input into the neural network, and all Q function values are calculated:

$$Q(s, a) = E[R_t \mid s_t = s, a_t = a]$$

where $E$ denotes the expectation, $R_t$ is the return at time $t$, $\gamma$ is the discount factor, which keeps $R_t$ from tending to infinity in continuing tasks, $T$ is the time step, and $n$ is the number of labels;

the Q function values under the current traffic feature are predicted for each label:

$$Q(s_t, a) = [Q(s_t, a_0), Q(s_t, a_1), \ldots, Q(s_t, a_t), \ldots, Q(s_t, a_n)]$$

the maximum Q function value is found through the greedy strategy algorithm:

$$\mathrm{Policy}(s_t) = \arg\max_a Q(s_t, a)$$

and the predicted traffic label $a'_t$ under the current traffic feature is obtained from the maximum Q function value.
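
The greedy selection described above can be illustrated with a minimal sketch. It assumes the network has already produced one Q value per label; the function name `greedy_label`, the label names, and the Q values are hypothetical and serve only as an example.

```python
def greedy_label(q_values, labels):
    """Return the label whose Q value is largest (first one wins on ties)."""
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return labels[best_index]

# Hypothetical label set A and hypothetical Q(s_t, a) values, one per label.
labels = ["normal", "dos", "probe"]
q_values = [0.2, 0.7, 0.1]
print(greedy_label(q_values, labels))
```

The returned label plays the role of the predicted traffic label a'_t that step 3 compares with the true label.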

Preferably, step 4 specifically includes:

the next-stage traffic feature $s_{t+1}$ in the training samples is received, its label $A'$ is predicted by an unsupervised learning algorithm, and the predicted label $A'$ and the traffic feature $s_{t+1}$ are used to calculate all Q function values and find the maximum:

$$Q'_{s_{t+1}} = \max Q'(s_{t+1}, A')$$

the target Q function value is then calculated with the reward $r_t$ from the experience replay set:

$$Q^* = r_t + \gamma \cdot Q'_{s_{t+1}}$$
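
As a rough illustration of the target computation, the sketch below assumes the target network has already returned a list of Q' values for the next-stage feature. How the predicted label A' narrows the candidate values is not detailed in the text, so the sketch simply takes the maximum of whatever values are supplied. The function name `target_q` and the numbers are hypothetical.

```python
def target_q(reward, next_q_values, gamma):
    """Compute Q* = r_t + gamma * max(Q' values for s_{t+1})."""
    return reward + gamma * max(next_q_values)

# Hypothetical reward, target-network outputs, and factor gamma.
q_star = target_q(reward=1, next_q_values=[0.3, 0.9, 0.4], gamma=0.9)
print(round(q_star, 2))
```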

Preferably, step 5 specifically comprises:

the current Q function value of the neural network being trained is calculated from the current traffic feature $s_t$ and the predicted traffic label $a'_t$:

$$Q'_{s_t} = Q'(s_t, a'_t)$$

the loss function is obtained from the current Q function value and the target Q function value:

$$L = \frac{1}{n} \sum_n (Q'_{s_t} - Q^*)^2$$

finally, the parameters of the current neural network are updated through the back-propagation algorithm, and the network parameters are periodically copied to the target network.
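
The loss above is a mean squared error between current and target Q function values. A minimal sketch follows, assuming the two sequences hold one value per sample in the batch; the name `mse_loss` and the sample values are hypothetical.

```python
def mse_loss(current_q, target_q):
    """Mean of squared differences between current and target Q values."""
    n = len(current_q)
    return sum((q - q_star) ** 2 for q, q_star in zip(current_q, target_q)) / n

# Hypothetical batch of two samples: one exact match, one off by 2.
print(mse_loss([1.0, 2.0], [1.0, 4.0]))
```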

Preferably, step 1 further samples the training samples to form new training samples, and step 2 inputs the traffic feature $s_t$ of the current traffic data in the new training samples, together with the labels $A$ of all traffic features, into the neural network to calculate all Q function values.

Preferably, the sample data further includes a test sample, and when the trained neural network is obtained in step 5, the test sample is input into the neural network to test the performance of the neural network.

The SSDDQN method for detecting abnormal network traffic of the invention has the following advantages:

1. Lower manual labeling cost. At present, the scale of labeled data falls far short of application needs, and manual labeling is extremely expensive. Algorithms are trained on labeled data, and the higher the labeling quality, the more accurate the learned result. The commonly used CNN, RNN, and DBN models all rely on supervised learning and therefore require a large labeling effort, whereas semi-supervised learning reduces the manual labeling cost and improves learning performance.

2. Easier deployment. Most current traffic anomaly detection works on public data sets, that is, offline. In recent years, however, network data has expanded, the types and volume of attack traffic have grown, and unknown variants have appeared, which makes traffic detection harder and highlights the importance of real-time detection. Because the neural networks that deep reinforcement learning uses for classification (policy, value, or Q-function networks) are simpler and faster, they are easier to deploy in demanding network environments. In addition, the reward function used for detection is very flexible and need not be differentiable.

3. Detection of unknown attacks. Most current work is simulation-based: models are trained on well-known public data sets and can detect only existing attacks, not unknown ones. The invention adopts semi-supervised deep reinforcement learning, in which an unsupervised clustering algorithm predicts the feature labels for the target network and the target network generates the Q function values, thereby improving the accuracy of unknown attack detection.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a network abnormal traffic detection method of SSDDQN in the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, the present invention provides a network abnormal traffic detection method of SSDDQN, which includes the following steps:

Step 1: obtain sample data from computer network traffic data, divide the sample data into training samples and test samples, and draw mini-batches from the training samples to form new training samples.
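
A minimal sketch of this step is given below. It assumes a simple random split followed by one random mini-batch; the split fraction, batch size, seed, and the function name `split_and_sample` are hypothetical, since the text does not specify them.

```python
import random

def split_and_sample(samples, train_fraction, batch_size, seed=0):
    """Shuffle, split into train/test, then draw one mini-batch from train."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    # Sample without replacement; never ask for more than is available.
    batch = rng.sample(train, min(batch_size, len(train)))
    return train, test, batch

# Hypothetical data: 100 sample identifiers standing in for traffic records.
train, test, batch = split_and_sample(list(range(100)), 0.8, 16)
print(len(train), len(test), len(batch))
```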

Step 2: establish a deep neural network with three hidden layers of 100 neurons each. The state received from the sampled new training samples, namely the traffic feature $s_t$ of the current traffic data, and the actions, namely the labels $A = (a_0, a_1, \ldots, a_t, \ldots, a_n)$ of all traffic features, are input into the neural network to calculate all Q function values:

$$Q(s, a) = E[R_t \mid s_t = s, a_t = a]$$

where $E$ denotes the expectation, $R_t$ is the return at time $t$, $\gamma$ is the discount factor, which keeps $R_t$ from tending to infinity in continuing tasks, $T$ is the time step, and $n$ is the number of labels.
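
To illustrate how a factor γ below 1 keeps the return bounded, the sketch below assumes the standard discounted-return definition from reinforcement learning, $R_t = \sum_k \gamma^k r_{t+k}$, which may differ in detail from the formula intended here. The function name `discounted_return` and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] (assumed standard definition)."""
    total = 0.0
    # Work backwards through the rewards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical 0/1 rewards for four consecutive steps.
print(discounted_return([1, 0, 1, 1], gamma=0.5))
```

With rewards of at most 1 and γ < 1, the sum can never exceed 1 / (1 - γ), however long the task runs.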

The Q function values under the current traffic feature are predicted for each label:

$$Q(s_t, a) = [Q(s_t, a_0), Q(s_t, a_1), \ldots, Q(s_t, a_t), \ldots, Q(s_t, a_n)]$$

The maximum Q function value is found through the greedy strategy algorithm:

$$\mathrm{Policy}(s_t) = \arg\max_a Q(s_t, a)$$

A greedy strategy algorithm starts from some initial solution of the problem and approaches the given target step by step, so as to reach a good solution as quickly as possible; it stops when it reaches a step from which it can no longer proceed. In other words, a greedy strategy does not consider the global optimum but makes the choice that looks best at the current moment.
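
The network shape described in this step (three hidden layers of 100 neurons, one output per label) can be sketched in plain Python as follows. The ReLU activation, the weight initialization range, the input size of 41 features, and the 5 output labels are assumptions made for illustration; the text specifies only the hidden-layer layout. All function names are hypothetical.

```python
import random

def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_q_network(n_features, n_labels, hidden=(100, 100, 100), seed=0):
    """Stack dense layers: input -> 100 -> 100 -> 100 -> one Q value per label."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, n_labels]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def q_values(network, state):
    """Forward pass returning Q(s_t, a) for every label a."""
    activation = state
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * x for w, x in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(network) - 1
        # ReLU on hidden layers (assumed); the output layer stays linear.
        activation = z if is_output else [max(0.0, v) for v in z]
    return activation

net = build_q_network(n_features=41, n_labels=5)   # sizes are assumptions
print(len(q_values(net, [0.5] * 41)))
```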

Step 3: compare the predicted traffic label $a'_t$ with the true label $a^*_t$ in the mini-batch of training samples; if they agree, the reward $r_t$ is 1; if not, the reward $r_t$ is 0. The predicted traffic label $a'_t$ and the reward $r_t$ are put into the experience replay set, which is used to address the correlation and non-stationary distribution of the data: the predicted traffic label $a'_t$ and reward $r_t$ obtained at the current moment are stored and then drawn at random when they are needed in the next stage.
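
The reward rule and the replay set of this step can be sketched as below. The bounded capacity, the stored tuple layout, and the names `reward` and `ReplaySet` are assumptions for illustration; the text states only that the values are stored and later drawn at random.

```python
import random
from collections import deque

def reward(predicted_label, true_label):
    """1 if the predicted label matches the true label, otherwise 0."""
    return 1 if predicted_label == true_label else 0

class ReplaySet:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)   # oldest entries drop out first
        self.rng = random.Random(seed)

    def store(self, s_t, a_pred, r_t, s_next):
        self.items.append((s_t, a_pred, r_t, s_next))

    def sample(self, batch_size):
        k = min(batch_size, len(self.items))
        return self.rng.sample(list(self.items), k)

replay = ReplaySet(capacity=3)
for step in range(5):                         # hypothetical transitions
    replay.store([step], "dos", reward("dos", "dos"), [step + 1])
print(len(replay.items), len(replay.sample(2)))
```

Sampling at random, rather than replaying transitions in arrival order, is what breaks the correlation between consecutive samples.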

Step 4: receive the next-stage traffic feature $s_{t+1}$ in the training samples, predict its label $A'$ with an unsupervised learning algorithm, and use the predicted label $A'$ and the traffic feature $s_{t+1}$ to calculate all Q function values and find the maximum:

$$Q'_{s_{t+1}} = \max Q'(s_{t+1}, A')$$

The target Q function value is then calculated with the reward $r_t$ from the experience replay set:

$$Q^* = r_t + \gamma \cdot Q'_{s_{t+1}}$$

The unsupervised learning algorithm may be K-means, hierarchical clustering, a Gaussian mixture model (GMM), or the like.

Taking K-means as an example, the label prediction used in calculating the target Q function value proceeds as follows:

(1) Randomly select $k$ samples from $S_{t+1}$;

(2) Initialize $l$ cluster centers $\{x_1, x_2, \ldots, x_l\}$;

(3) Calculate the Euclidean distance from each sample to each cluster center;

(4) Determine the cluster label of $x_l$ from the nearest mean vector: $\lambda_l = \arg\min_l d_{lk}$;

(5) Assign sample $x_l$ to the corresponding cluster: $C_{\lambda_l} = C_{\lambda_l} \cup \{x_l\}$;

(6) Output the label $a'$ of the final cluster.
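
The clustering steps above follow the general pattern of K-means. The sketch below is a plain textbook K-means (Lloyd's algorithm) written for illustration, not necessarily the exact variant intended here; the function name `kmeans_labels`, the iteration count, and the sample points are hypothetical.

```python
import math
import random

def kmeans_labels(points, n_clusters, n_iters=10, seed=0):
    """Assign each point a cluster index using standard K-means."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)   # random initial centers
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assign each point to the nearest center by Euclidean distance.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of the points assigned to it.
        for c in range(n_clusters):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_labels(points, n_clusters=2))
```

The cluster indices are arbitrary identifiers; mapping them onto traffic labels is a separate step that the text does not spell out.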

Step 5: calculate the current Q function value of the neural network being trained from the current traffic feature $s_t$ and the predicted traffic label $a'_t$:

$$Q'_{s_t} = Q'(s_t, a'_t)$$

The current Q function value predicted during training is calculated in the same way as the target Q function value, so the description is not repeated. The loss function is obtained from the current Q function value and the target Q function value:

$$L = \frac{1}{n} \sum_n (Q'_{s_t} - Q^*)^2$$

Finally, the parameters of the current neural network are updated through the back-propagation algorithm, and the network parameters are periodically copied to the target network to obtain the trained neural network. The parameters include the weights, the biases, and so on, and $\rho$ is the learning rate.
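
The update formulas for the weights and biases are not reproduced in this text. As a hedged sketch, the code below assumes plain gradient descent with learning rate ρ (each parameter moves against its loss gradient) and a fixed copy interval for the target network. The names `sgd_step`, `maybe_sync`, and `sync_every`, as well as all numbers, are hypothetical.

```python
import copy

def sgd_step(params, grads, rho):
    """Assumed update rule: p <- p - rho * dL/dp for every parameter."""
    return [p - rho * g for p, g in zip(params, grads)]

def maybe_sync(step, sync_every, current_params, target_params):
    """Copy current parameters to the target network every sync_every steps."""
    if step % sync_every == 0:
        return copy.deepcopy(current_params)
    return target_params

current = [1.0, -2.0]            # hypothetical weight and bias
target = [0.0, 0.0]
for step in range(1, 21):
    grads = [0.5, -1.0]          # hypothetical gradients from back-propagation
    current = sgd_step(current, grads, rho=0.1)
    target = maybe_sync(step, sync_every=10, current_params=current,
                        target_params=target)
print(current == target)
```

Keeping the target network fixed between copies is the usual reason for the periodic update: the target Q function values do not shift on every training step.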

Step 6: test the performance of the trained neural network with the test samples.

Step 7: input the traffic data to be detected into the neural network for abnormal traffic detection.

To verify the effectiveness of the method of the invention, experiments were carried out on the well-known public data set NSL-KDD. The experimental platform was a Lenovo desktop computer running Windows 10, with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 16 GB of RAM. The training samples of the NSL-KDD data set contain 23 feature labels, whereas the test samples contain 38; this unbalanced label distribution makes it possible to show that the intrusion detection framework reduces labor cost and improves detection accuracy. The experiment mainly comprises the following steps:

1. Data sampling. The original traffic data is sampled and divided into training samples and test samples, each consisting mainly of the current state (current traffic feature), the action (true label), and the next-stage state (next-stage traffic feature).

2. Calculating all Q function values and the reward. The current traffic feature and the labels of all traffic features are input into the current neural network to calculate all Q function values; the greedy strategy algorithm yields the predicted traffic label at which the Q function value is largest; the predicted traffic label is compared with the true label, and the reward is 1 if they agree and 0 if they do not. Finally, the current Q function value of the neural network being trained is calculated from the predicted traffic label and the current traffic feature.

3. Calculating the final Q function value. The next-stage traffic features are input into an unsupervised learning algorithm, which directly predicts their labels $A'$; the predicted labels $A'$ and the traffic features $s_{t+1}$ are used to calculate all Q function values, and the maximum is the final Q function value.

4. Calculating the loss function. The target Q function value is obtained by multiplying the final Q function value by the discount factor and adding the obtained reward; the loss function is calculated from the target Q function value and the current Q function value of the neural network being trained, the neural network parameters are updated accordingly, and the trained model is saved.

5. Evaluating the neural network. The test samples are input, the locally saved neural network model is loaded, and the evaluation results are calculated according to a series of evaluation criteria.
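
The evaluation criteria are not enumerated here, although accuracy and F1-score are the two figures reported for the experiment. The sketch below shows how those two could be computed; treating one label as the positive class (binary F1) is an assumption, as is every name and label value used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0               # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical true and predicted labels for four test samples.
y_true = ["attack", "attack", "normal", "normal"]
y_pred = ["attack", "normal", "normal", "attack"]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred, positive="attack"))
```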

Analysis of the experimental results shows that, relative to traditional machine learning and ordinary deep reinforcement learning, accuracy improves by about 7 percent and the F1-score by about 8 percent.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

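1. An SSDDQN method for detecting abnormal network traffic, characterized by comprising the following steps:

step 1: obtaining sample data from computer network traffic data, the sample data comprising training samples;

step 2: establishing a neural network; inputting the traffic feature $s_t$ of the current traffic data in the training samples, together with the labels $A$ of all traffic features, into the neural network; predicting, for each label, the Q function value under the current traffic feature; obtaining the maximum Q function value through a greedy strategy algorithm; and obtaining from the maximum Q function value the predicted traffic label $a'_t$ under the current traffic feature;

step 3: comparing the predicted traffic label $a'_t$ with the true label $a^*_t$ in the training samples; if they agree, the reward $r_t$ is 1; if not, the reward $r_t$ is 0;

step 4: receiving the next-stage traffic feature $s_{t+1}$ in the training samples, predicting its label $A'$ with an unsupervised learning algorithm, calculating all Q function values with a target network and obtaining the maximum, and calculating the target Q function value from that maximum and the reward $r_t$ in the experience replay set;

step 5: calculating the current Q function value of the neural network being trained from the current traffic feature $s_t$ and the predicted traffic label $a'_t$; obtaining a loss function from the current Q function value and the target Q function value; updating the parameters of the current neural network through the back-propagation algorithm; and periodically copying the network parameters to the target network, yielding the trained neural network;

step 6: inputting the traffic data to be detected into the trained neural network for abnormal traffic detection.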

2. The SSDDQN method for detecting abnormal network traffic according to claim 1, wherein the values $s_t$, $r_t$, $a'_t$ and $s_{t+1}$ obtained in step 3 are placed in the experience replay set, and step 4 draws them at random from the experience replay set when they are needed.

3. The method for detecting network abnormal traffic of SSDDQN according to claim 1, wherein step 2 specifically comprises:

the traffic feature $s_t$ of the current traffic data in the training samples and the labels $A = (a_0, a_1, \ldots, a_t, \ldots, a_n)$ of all traffic features are input into the neural network, and all Q function values are calculated:

$$Q(s, a) = E[R_t \mid s_t = s, a_t = a]$$

where $E$ denotes the expectation, $R_t$ is the return at time $t$, $\gamma$ is the discount factor, which keeps $R_t$ from tending to infinity in continuing tasks, $T$ is the time step, and $n$ is the number of labels;

the Q function values under the current traffic feature are predicted for each label:

$$Q(s_t, a) = [Q(s_t, a_0), Q(s_t, a_1), \ldots, Q(s_t, a_t), \ldots, Q(s_t, a_n)]$$

the maximum Q function value is found through the greedy strategy algorithm:

$$\mathrm{Policy}(s_t) = \arg\max_a Q(s_t, a)$$

and the predicted traffic label $a'_t$ under the current traffic feature is obtained from the maximum Q function value.

4. The method for detecting network abnormal traffic of SSDDQN according to claim 1, wherein step 4 specifically comprises:

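the next-stage traffic feature $s_{t+1}$ in the training samples is received, its label $A'$ is predicted by an unsupervised learning algorithm, and the predicted label $A'$ and the traffic feature $s_{t+1}$ are used to calculate all Q function values and find the maximum:

$$Q'_{s_{t+1}} = \max Q'(s_{t+1}, A')$$

the target Q function value is then calculated with the reward $r_t$ from the experience replay set:

$$Q^* = r_t + \gamma \cdot Q'_{s_{t+1}}$$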

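5. The SSDDQN method for detecting abnormal network traffic according to claim 4, wherein step 5 specifically comprises:

the current Q function value of the neural network being trained is calculated from the current traffic feature $s_t$ and the predicted traffic label $a'_t$:

$$Q'_{s_t} = Q'(s_t, a'_t)$$

the loss function is obtained from the current Q function value and the target Q function value:

$$L = \frac{1}{n} \sum_n (Q'_{s_t} - Q^*)^2$$

finally, the parameters of the current neural network are updated through the back-propagation algorithm, and the network parameters are periodically copied to the target network.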

6. The method according to claim 1, wherein step 1 further samples the training samples to form new training samples, and step 2 inputs the traffic feature $s_t$ of the current traffic data in the new training samples, together with the labels $A$ of all traffic features, into the neural network to calculate all Q function values.

7. The method according to claim 1, wherein the sample data further includes a test sample, and when the trained neural network is obtained in step 5, the test sample is input into the neural network to test the performance of the neural network.