CN114844679A

CN114844679A - Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN

Info

Publication number: CN114844679A
Application number: CN202210367801.6A
Authority: CN
Inventors: 张佳璇; 侯爱琴; 吴昊; 王思明; 肖云; 季于东
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2022-04-08
Filing date: 2022-04-08
Publication date: 2022-08-02

Abstract

The invention provides a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN, which comprises the following steps: step 3, after the flow table information is obtained, calculating the characteristic numerical values in a T period to form a calculated data set, the characteristic numerical values comprising a source port entropy value, a destination IP address entropy value and an average packet size; and step 4, for each switch, using the data set calculated in step 3 as input data to train a clustering algorithm classification model, and obtaining, after t iterations, once the centroids no longer change, a distributed denial of service attack model based on multivariate data analysis and a clustering algorithm. The invention can detect attacks quickly and accurately in real time and locate the attacking host so as to delete the attack data, thereby protecting the security of the software defined network as far as possible, reducing the consumption of network resources, reducing the time required to detect an attack and improving the accuracy of attack detection.

Description

Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN

Technical Field

The invention belongs to the technical field of computer network security, relates to a software defined network, and particularly relates to a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN.

Background

With the continuous development of network technology in the age of big data, a new network architecture, the Software Defined Network (SDN), has emerged in the internet field. It decouples the control plane from the data plane and centralizes network management by means of applications running on the controller. Despite its many advantages, the security of SDN networks remains one of the concerns of the research community.

Distributed Denial of Service (DDoS) is one of the most threatening attacks in SDN network security. In a DDoS attack, multiple attackers at different locations attack one or more targets at the same time. The attack modes are numerous and highly destructive, and forged source IP addresses confuse the defense system and make detection more difficult. Because traditional attack detection methods struggle to detect attack data as the data change, the network system can be paralyzed.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN, so as to solve the technical problem that the accuracy of attack detection in the prior art needs to be further improved.

In order to solve the technical problems, the invention adopts the following technical scheme:

a distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN comprises the following steps:

step 1, acquiring a characteristic data set comprising the source port entropy value in a T period, the destination IP address entropy value in the T period and the average packet size in the T period, performing feature standardization on the data, then obtaining a new data set through multivariate data analysis, training a classification model by using a clustering algorithm, and constructing a software defined network topology architecture;

step 2, a software-defined network topology structure is used for connecting a controller, and data acquisition is carried out by acquiring flow table information in a switch of a data layer on a control plane;

step 3, after the flow table information is obtained in the step 2, calculating to obtain a characteristic numerical value in a T period, and forming a calculated data set;

the characteristic numerical value comprises a source port entropy value, a destination IP address entropy value and an average value of the data packet;

step 4, aiming at each switch, using the data set obtained by calculation in the step 3 as input data, then using the data for training a clustering algorithm classification model, and obtaining a distributed denial of service attack model of multivariate data analysis and clustering algorithm after t iterations until the centroid is not changed any more;

step 5, performing detection and judgment by using the distributed denial of service attack model of the multivariate data analysis and clustering algorithm trained in step 4; if the judgment result is malicious traffic, the controller locates the attack target, deletes the attack flow table entries from the flow table, and returns the current flow table information of the switch.

The invention also has the following technical characteristics:

preferably, in step 3, the characteristic value is calculated according to the following formula:

in the formula:

$P(x_i)$ denotes the number of occurrences of the i-th source port in the T period;

$\mathrm{sum}(P(x))$ denotes the total number of source ports in the T period;

$S(x_i)$ denotes the number of occurrences of the i-th destination IP address in the T period;

$\mathrm{sum}(S(x))$ denotes the total number of destination IP addresses in the T period;

$Packet_i$ denotes the size of the i-th packet in the T period;

SPE denotes the source port entropy value;

DAE denotes the destination IP address entropy value;

APS denotes the average packet size.
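The three characteristic values can be illustrated with a minimal Python sketch. It assumes that the entropy is the standard Shannon entropy with a base-2 logarithm (the base is not stated in the text) and that each flow observed in a period is available as a hypothetical `(src_port, dst_ip, packet_size)` tuple; the function names are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (base 2, an assumption) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def period_features(flows):
    """Compute (SPE, DAE, APS) for one period T.

    `flows` is a list of hypothetical (src_port, dst_ip, packet_size) tuples.
    """
    if not flows:
        return 0.0, 0.0, 0.0
    spe = shannon_entropy(f[0] for f in flows)   # source port entropy
    dae = shannon_entropy(f[1] for f in flows)   # destination IP address entropy
    aps = sum(f[2] for f in flows) / len(flows)  # average packet size
    return spe, dae, aps
```

For example, two flows with different source ports and the same destination give an SPE of 1 bit and a DAE of 0, which is the direction of change expected when many spoofed sources target one host.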

Preferably, in step 5, the specific process of detecting the distributed denial of service attack model of the multivariate data analysis and clustering algorithm includes:

step 501, multivariate data correlation analysis:

extracting a data set $X = \{x_1, x_2, \ldots, x_n\}$ from a public intrusion detection evaluation data set;

In the formula:

$x_i$ denotes the i-th flow table entry record;

the j-th component of $x_i$ is the j-th feature of the i-th flow table entry record;

n denotes the total number of rows of the data set;

performing multivariate data analysis, and using a geometric correlation formula:

calculating to obtain a new data set: $D = \{TAS^1, TAS^2, \ldots, TAS^n\}$,

In the formula:

the geometric correlation formula gives the correlation coefficient between the j-th feature and the k-th feature of the i-th flow table entry record;

$TAS^i$ denotes the i-th row of the new data set D, calculated from the i-th flow table entry record;

n denotes the total number of rows of the data set;

j indexes the j-th feature value;

k indexes the k-th feature value;

i indexes the i-th data row;
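The geometric correlation formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes a triangle-area style correlation, in which the coefficient for features j and k of a record is the area of the right triangle spanned by the two feature values, $|f_j| \cdot |f_k| / 2$. That choice and the function names are assumptions, not the patent's stated formula.

```python
from itertools import combinations


def triangle_area_row(record):
    """Map one feature vector to pairwise coefficients for all feature pairs (j < k).

    Assumed coefficient: |f_j| * |f_k| / 2, the area of the right triangle
    whose legs are the two feature values.
    """
    return [abs(record[j]) * abs(record[k]) / 2.0
            for j, k in combinations(range(len(record)), 2)]


def build_correlated_dataset(records):
    """Build the new data set D, one row of pairwise coefficients per input record."""
    return [triangle_area_row(r) for r in records]
```

With three features per record (for instance SPE, DAE and APS), each row of D holds three pairwise coefficients.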

step 502, the clustering algorithm model comprises the following specific steps:

step 50201, randomly selecting k points from the input data point set as the k cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$; in the formula: $\mu_k$ denotes a randomly selected value in the data set D;

step 50202, initializing cluster partition C to

step 50203, for each point $x_i$ in the data set, calculating its similarity coefficient with the cluster centers, and assigning $x_i$ to the cluster with the minimum similarity value;

step 50204, after one round of calculation is finished, calculating the mean value of each cluster by using the mean value calculation formula, and taking the mean value as the new centroid of that cluster;

step 50205, repeating the operations of step 50203 and step 50204 until, after t iterations, none of the k centroid vectors changes, and then outputting the classification model; the specific formula used is as follows:

In the formula:

SCM denotes the similarity coefficient calculated for two vectors;

$\mu n_j$ denotes the newly calculated cluster centroid;

$\mu_j^T$ denotes the transpose of the centroid vector;

n represents the total number of rows of the data set;

i represents the ith data line;

j denotes the jth centroid.
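Steps 50201 to 50205 follow the usual k-means loop, which can be sketched as below. The SCM formula is not reproduced in this text, so the sketch substitutes the squared Euclidean distance as a stand-in similarity coefficient (smaller means more similar). The stand-in, the `max_iter` cap, the fixed `seed` and the handling of empty clusters are all assumptions of this sketch.

```python
import random


def scm(x, mu):
    """Stand-in similarity coefficient: squared Euclidean distance (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 50201: pick k points of the data set as initial cluster centers.
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 50202: start each round from empty clusters.
        clusters = [[] for _ in range(k)]
        # Step 50203: assign each point to the center with the minimum coefficient.
        for p in points:
            j = min(range(k), key=lambda c: scm(p, centroids[c]))
            clusters[j].append(p)
        # Step 50204: the mean of each cluster becomes its new centroid.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        # Step 50205: stop once no centroid changes.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Swapping in a different `scm` function changes only the assignment step; the rest of the loop is unaffected.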

Preferably, the software defined network topology comprises one RYU controller, 4 switches and 25 hosts.

Preferably, the switches all adopt OpenFlow switches in a software defined network.

Compared with the prior art, the invention has the following technical effects:

(I) In a software defined network environment, the invention trains the detection model by classification using multivariate data analysis and a clustering algorithm, so that the model suits a variety of conditions, can detect attacks quickly and accurately in real time, and can locate the attacking host to delete the attack data, thereby protecting the security of the software defined network as far as possible, reducing the consumption of network resources, reducing the time required to detect an attack and improving the accuracy of attack detection.

(II) The invention establishes the correlations between data in the software defined network, and then uses the algorithm to detect attacks quickly and defend in real time.

(III) The invention also has good applicability: tests carried out on several public data sets show that the results meet the requirements.

Drawings

FIG. 1 is a diagram of a distributed denial of service attack detection system architecture based on multivariate data analysis and clustering algorithm designed by the present invention.

Fig. 2 is a schematic diagram of a software-defined network topology.

FIG. 3 is a schematic diagram of attack detection steps of a distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm.

Fig. 4(a) is a schematic diagram of SPEs calculated in multiple cycles in a flow table of an OpenFlow switch in the topology.

Fig. 4(b) is a schematic diagram of DAE calculated in multiple cycles in the flow table of the OpenFlow switch in the topology.

Fig. 4(c) is a schematic diagram of APS calculated in multiple cycles in the flow table of the OpenFlow switch in the topology.

Fig. 5 is a schematic diagram of the time required to detect attacks by the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, a support vector machine model (SVM) and a decision tree model (DecisionTree).

Fig. 6 is a schematic diagram of the performance evaluation of distributed denial of service attack detection by the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, a support vector machine model (SVM), a neural network classification model (NeuralNetwork), a K-nearest neighbor classification model (KNN), and an entropy detection model (EntropyDetection).

FIG. 7 is a comparison graph of the performance of the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, the MCA analysis method of comparative example 1, and the DDoS attack detection method based on the K-means clustering algorithm of comparative example 2.

FIG. 8 is a comparison graph of the detection rates of the DDoS attack detection method based on the support vector machine algorithm and the distributed denial of service attack detection model based on the multivariate data analysis and clustering algorithm in comparative example 3.

FIG. 9 is a comparison graph of the detection performance of the DDoS attack detection new framework based on the combination of K-Means clustering and K-nearest neighbor algorithm and the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm in comparative example 4.

The present invention will be explained in further detail with reference to examples.

Detailed Description

It is to be understood that all devices, models and algorithms of the present invention, unless otherwise specified, are intended to be implemented using any of the devices, models and algorithms known in the art. For example, the clustering algorithm classification model adopts a known clustering algorithm classification model. For example, the RYU controller and the OpenFlow switch both employ known RYU controllers and OpenFlow switches.

In the present invention, it is to be noted that:

SDN, i.e., Software Defined Network, refers to a software defined network.

The MCA-KMeans algorithm, i.e., Multivariate Correlation Analysis and Improved K-Means clustering algorithm, refers to the multivariate data analysis and clustering algorithm.

DDoS, i.e., Distributed Denial of Service, refers to a distributed denial of service attack.

SPE, i.e., Source Port Entropy, refers to the source port entropy value.

DAE, i.e., Destination IP Address Entropy, refers to the destination IP address entropy value.

APS, i.e., Average Packet Size, refers to the average size of the data packets.

SCM, i.e., Similarity Coefficient Measurement, refers to the similarity coefficient calculated for two vectors.

The overall technical concept of the invention is as follows: the invention provides a distributed denial of service (DDoS) attack detection algorithm based on multivariate data analysis and a clustering algorithm, implemented in a Software Defined Network (SDN) environment. The controller monitors and obtains the flow tables on the switches, several feature values are extracted from the flow table entries, and a detection model is obtained by training with multivariate data analysis and a clustering algorithm. Under the SDN architecture the network state is monitored in real time, DDoS attack detection is performed on suspicious traffic, the attacked OpenFlow switch is located, and the attack flow table entries are deleted, so that secure communication in the SDN network is ensured and problems such as resource exhaustion and network paralysis are avoided. Existing detection methods are not suitable for high-traffic SDN networks; for example, the detection method based on entropy values and the K-nearest neighbor classification algorithm is only suitable for small data sets. The distributed denial of service attack detection algorithm of the invention not only adapts to current network traffic but also has a short detection time and low computational complexity, and so meets the requirements of attack detection in a software defined network.

According to the method, the public data set CIC DoS dataset (2017) is used as training data, the required features are extracted by a feature selection algorithm, a new data set is constructed so that the data are correlated, and a clustering algorithm is then used to train the classification model. In the testing stage, the classification model classifies the test set and it is judged whether each classification result is malicious traffic; if so, the flow table is processed and the result of the processing on the switch is returned, which ensures secure network communication with a short detection time, high accuracy and strong applicability.

The present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention fall within the protection scope of the present invention.

The embodiment is as follows:

the embodiment provides a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN, which comprises the following steps:

step 1, as shown in fig. 1, acquiring a feature data set including the source port entropy (SPE) in a T period, the destination IP address entropy (DAE) in the T period, and the average packet size (APS) in the T period, performing feature standardization on the data, then obtaining a new data set through multivariate data analysis, training a classification model by using a clustering algorithm, and constructing a Software Defined Network (SDN) topology architecture;

In this embodiment, SPE represents the information entropy of the source port in the same period, DAE represents the information entropy of the destination IP address in the same period, and APS represents the average size of the data packet in the same period.

Step 2, as shown in fig. 2, a software-defined network topology structure is used to connect a controller, and data acquisition is performed by acquiring flow table information in a switch of a data layer on a control plane;

in this embodiment, the software-defined network topology includes one RYU controller (an open-source software-defined network controller), 4 switches, and 25 hosts.

In this embodiment, the switches all adopt OpenFlow switches in a software defined network.

The flow table of each switch has a plurality of flow table entries, and each flow table entry is composed of three elements: Header Fields for packet matching, Counters for counting the number of matching packets, and Actions specifying how matching packets are processed. The header fields of a flow table entry form a 12-tuple that includes, among others, the source port, the source MAC (media access control) address, the destination MAC address, the IP protocol, the source IP address, and the destination IP address. The counters of the OpenFlow flow table are maintained for each flow table, each data flow, each device port and each forwarding queue in the switch and record statistics about the traffic. For example, for each flow table they count the number of currently active entries, the number of packet lookups and the number of packet matches; for each data flow they count the number of received packets, the number of bytes and the duration of the flow. The actions of the OpenFlow flow table instruct the switch how to handle a matching packet after it is received.

The invention obtains the source port, the destination IP address and the packet size from the flow table of the switch, counts the occurrences of each value by using the counters, and then feeds the features to the model to detect the changes that indicate an attack; if an attack occurs, the flow table entry is deleted by using the delete action of the flow table.
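The structure of a flow table entry and the deletion of attack entries can be pictured with a small in-memory model. This is only a sketch: `FlowEntry`, its field names and `delete_attack_entries` are hypothetical and do not correspond to the RYU or OpenFlow API. In a real deployment the controller would request the deletion through an OpenFlow flow-modification message rather than by filtering a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    # Header fields used for packet matching (a subset of the 12-tuple).
    src_port: int
    src_ip: str
    dst_ip: str
    # Counters maintained for the entry.
    packet_count: int = 0
    byte_count: int = 0
    # Actions applied to matching packets.
    actions: list = field(default_factory=list)


def delete_attack_entries(flow_table, is_attack):
    """Return the flow table without the entries that `is_attack` flags."""
    return [entry for entry in flow_table if not is_attack(entry)]
```

A caller would pass a predicate built from the detection result, for example one that flags every entry whose destination is the located attack target.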

In step 3, the characteristic numerical value is calculated according to the following formula:

in the formula:

$P(x_i)$ denotes the number of occurrences of the i-th source port in the T period;

$\mathrm{sum}(P(x))$ denotes the total number of source ports in the T period;

$S(x_i)$ denotes the number of occurrences of the i-th destination IP address in the T period;

$\mathrm{sum}(S(x))$ denotes the total number of destination IP addresses in the T period;

$Packet_i$ denotes the size of the i-th packet in the T period;

SPE denotes the source port entropy value;

DAE denotes the destination IP address entropy value;

APS denotes the average packet size.
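The formulas for SPE, DAE and APS are not reproduced in this text. A reconstruction consistent with the symbol definitions above would read as follows. It assumes the standard Shannon entropy (the logarithm base is not stated) and an arithmetic mean over the N packets of the period, where N is a symbol introduced here for illustration.

```latex
\begin{aligned}
SPE &= -\sum_{i} \frac{P(x_i)}{\mathrm{sum}(P(x))} \log \frac{P(x_i)}{\mathrm{sum}(P(x))} \\
DAE &= -\sum_{i} \frac{S(x_i)}{\mathrm{sum}(S(x))} \log \frac{S(x_i)}{\mathrm{sum}(S(x))} \\
APS &= \frac{1}{N} \sum_{i=1}^{N} Packet_i
\end{aligned}
```

As a worked case under these assumptions, if two source ports each account for half of the flows in a period, then with a base-2 logarithm $SPE = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$.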

Step 4, as shown in fig. 3, for each switch, using the data set calculated in step 3 as input data, then using the data in a clustering algorithm classification model for training, and obtaining a distributed denial of service attack model of multivariate data analysis and clustering algorithm after t iterations until the centroid is not changed any more;

In step 5, the specific process of detecting the distributed denial of service attack model of the multivariate data analysis and clustering algorithm comprises the following steps:

step 501, multivariate data correlation analysis:

In the formula:

$x_i$ denotes the i-th flow table entry record;

the j-th component of $x_i$ is the j-th feature of the i-th flow table entry record;

n denotes the total number of rows of the data set;

calculating to obtain a new data set: $D = \{TAS^1, TAS^2, \ldots, TAS^n\}$,

In the formula:

$TAS^i$ (Triangle Area Space) denotes the i-th row of the new data set D, calculated from the i-th flow table entry record;

n denotes the total number of rows of the data set;

j indexes the j-th feature value;

k indexes the k-th feature value;

i indexes the i-th data row;

step 50202, initializing cluster partition C to

in the formula:

SCM denotes the similarity coefficient calculated for two vectors;

$\mu n_j$ denotes the newly calculated cluster centroid;

$\mu_j^T$ denotes the transpose of the centroid vector;

n represents the total number of rows of the data set;

i represents the ith data line;

j denotes the jth centroid.
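Once the centroids have been trained, the judgment in step 5 amounts to finding the centroid closest to a new sample and checking whether that cluster is regarded as attack traffic. The sketch below assumes squared Euclidean distance as a stand-in for the similarity coefficient and a hypothetical `attack_clusters` set naming the cluster indices treated as malicious. How clusters are labelled is not specified in the text.

```python
def nearest_centroid(sample, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to `sample`."""
    distances = [sum((a - b) ** 2 for a, b in zip(sample, c)) for c in centroids]
    return distances.index(min(distances))


def is_malicious(sample, centroids, attack_clusters):
    """True if the sample falls into a cluster labelled as attack traffic."""
    return nearest_centroid(sample, centroids) in attack_clusters
```

A positive result would then trigger the location of the attack target and the deletion of the corresponding flow table entries by the controller.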

Application example 1:

the application example is based on the MCA-KMeans algorithm-based distributed denial of service attack detection method in the SDN provided by the embodiment. As shown in fig. 1, the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm designed for the present invention is applied to test the model detection time.

A characteristic data set comprising SPE, DAE and APS is acquired, where SPE represents the information entropy of the source port in a period, DAE represents the information entropy of the destination IP address in the same period, and APS represents the average size of the data packets in the same period. To observe the change of the data more clearly, the OpenFlow flow table is first inspected periodically in real time, and the SPE, DAE and APS values are obtained through statistics, as shown in fig. 4(a) to 4(c). It can be observed that when the host is attacked, the SPE value (fig. 4(a)) and the APS value (fig. 4(c)) rise rapidly while the DAE value (fig. 4(b)) falls rapidly, so the real-time data clearly distinguish the normal state of the host from the attacked state. Feature standardization is then applied to the data set in order to reduce the feature scale and the data deviation. A new data set is obtained through multivariate data correlation analysis; its advantage is that the correlation among the feature attributes is strengthened, and it is used by the clustering algorithm to train the classification model.
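The feature standardization step can be sketched as a z-score transform, which rescales each feature column to zero mean and unit standard deviation. The text does not say which standardization is used, so the z-score choice and the function name are assumptions.

```python
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column standard deviation.

    Columns with zero standard deviation are mapped to 0.0.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]
```

After this transform a feature with a large numeric range, such as the packet size, no longer dominates the entropy features when coefficients between points and centroids are computed.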

As shown in fig. 2, a Software Defined Network (SDN) topology is constructed by using the Mininet tool, which builds networks of connected end hosts, switches and routers, and each switch in the SDN topology is connected to the controller. The entire network topology is composed of one RYU controller, four switches and 25 hosts, the IP addresses of the hosts are 10.1.1.1-10.1.1.25, and the controller acquires the flow table information of each switch to collect data.
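The layout of this topology can be written down as plain data before it is handed to a network emulator. The sketch below only generates such a plan. The switch and host names and the round-robin attachment of hosts to switches are assumptions, since the text does not state which host connects to which switch.

```python
def plan_topology(n_switches=4, n_hosts=25):
    """Return switch names and a host plan with IP addresses 10.1.1.1 to 10.1.1.<n_hosts>."""
    switches = [f"s{i}" for i in range(1, n_switches + 1)]
    hosts = {}
    for h in range(1, n_hosts + 1):
        hosts[f"h{h}"] = {
            "ip": f"10.1.1.{h}",
            # Round-robin attachment is an assumption of this sketch.
            "switch": switches[(h - 1) % n_switches],
        }
    return switches, hosts
```

Such a plan could then be replayed against an emulator's topology builder, adding each switch, each host with its IP address, and one link per host.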

After the topology has been created and the model trained, a time detection module is embedded into the classification model to obtain the detection time. In addition, two classification attack detection models, SVM and DecisionTree, are equipped with the same time detection module so that they can be compared with the invention to evaluate the model.

As shown in fig. 3, the controller and Mininet are started so that the SDN network communicates normally; the controller detection period is 2 seconds. The network is then attacked at random, the models detect automatically and return their results, the detection time of each model is obtained over multiple detections, and a comparison graph is drawn as shown in fig. 5; the comparison shows that the invention performs better.
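A time detection module of this kind can be as simple as a wrapper that records the wall-clock time of one detection call. The sketch below is illustrative: `timed_detection` and the `detect` callable are hypothetical names, and the real module may measure time differently.

```python
import time


def timed_detection(detect, sample):
    """Run `detect(sample)` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = detect(sample)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Wrapping each compared model with the same function keeps the measurement overhead identical across models, so the averaged times remain comparable.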

Application example 2:

This application example is based on the MCA-KMeans algorithm-based distributed denial of service attack detection method in the SDN provided by the embodiment. To better demonstrate the advantages of the model, more model evaluation parameters are used for comparison. The required characteristic data set is extracted from a public data set, and the data are preprocessed and divided proportionally into a training set and a test set; the training set is used to train the model and determine the model parameters, and the test set is used to evaluate the model. The model is evaluated with a confusion matrix, which reflects the performance of the model more comprehensively and from which many indexes can be derived. Here TP (True Positive) is a true positive, actually positive and predicted positive; FP (False Positive) is a false positive, actually negative but predicted positive; FN (False Negative) is a false negative, actually positive but predicted negative; TN (True Negative) is a true negative, actually negative and predicted negative.

The model is first evaluated for accuracy (Accuracy), defined as the ratio of the number of samples correctly classified by the classification model to the total number of samples of a given test set, calculated as follows. The test results are shown in fig. 6; the comparison shows that the detection accuracy of the method is better than that of the other methods.

The model is then evaluated using the recall rate (Recall) and the false alarm rate (FAR). The recall rate is the proportion of actually positive samples that are correctly predicted to be positive; the false alarm rate is the proportion of actually negative samples that are wrongly predicted to be positive, which reflects how far the positive predictions of the classifier or model are contaminated by negative samples. The calculation formula is as follows.
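The three indexes follow directly from the confusion matrix counts. The sketch below uses the standard definitions, Accuracy = (TP + TN) / (TP + FP + FN + TN), Recall = TP / (TP + FN) and FAR = FP / (FP + TN); the function name and the zero-denominator handling are choices made for this sketch.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, false_alarm_rate) from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, recall, far
```

For instance, with 90 true positives, 5 false positives, 10 false negatives and 95 true negatives, the accuracy is 0.925, the recall is 0.9 and the false alarm rate is 0.05.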

Comparative example 1:

this comparative example presents a DDoS attack detection method based on multivariate data analysis, which steps are similar to the multivariate data analysis of the specific embodiment, except for the classification model (2).

Step 50201, the classification model (2) randomly selects k points from the input data point set to serve as the k cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$, in which $\mu_j$ denotes a randomly selected value in the data set D.

Step 50202, initializing cluster partition C to

Step 50203, for each point $x_i$ in the data set, the similarity coefficient with the cluster centers is calculated, and $x_i$ is assigned to the cluster with the minimum similarity value.

Step 50204, after one round of calculation is finished, the mean value of each cluster is calculated by using the mean value calculation formula and is used as the new centroid of that cluster.

Step 50205, the operations of step 50203 and step 50204 are repeated until, after t iterations, none of the k centroid vectors changes, and the classification model is then output.

Comparative example 2:

the comparison example provides a DDoS attack detection method based on a K-means clustering algorithm, and other steps of the method are similar to those of the improved clustering algorithm of the specific embodiment, except for the multivariate data analysis (1).

Multivariate data analysis (1) extracts the data set $X = \{x_1, x_2, \ldots, x_n\}$ from the public intrusion detection evaluation data set,

where $x_i$ represents the i-th flow table entry record,

and the j-th component of $x_i$ represents the j-th feature of the i-th flow table entry record. Multivariate feature data analysis is performed using a geometric correlation formula:

calculating to obtain a new data set: $D = \{TAS^1, TAS^2, \ldots, TAS^n\}$.

Comparative analysis was performed on example 2, comparative example 1, and comparative example 2, a detection model was constructed according to the proposed algorithm, and comparative tests were performed on the accuracy, the false alarm rate, and the recall rate using the same data set, and the comparative results are shown in fig. 7. As can be seen from the figure, the DDoS attack detection method based on multivariate data analysis and clustering algorithm has better results on the test set, which indicates that the detection model is feasible.

Comparative example 3:

This comparative example provides a DDoS attack detection method based on a support vector machine algorithm, which uses the SVM algorithm to detect DDoS attacks; it differs from the invention in the algorithm used.

Comparative analysis was performed on example 1 and comparative example 3, a model was trained using a public data set, and comparative tests were performed on accuracy, false alarm rate, and recall rate, with the results shown in fig. 8. As can be seen from the figure, the DDoS attack detection method based on the multivariate data analysis and the clustering algorithm has higher accuracy and lower false alarm rate on the test set, and the detection model is feasible.

Comparative example 4:

The method preprocesses the data by using feature standardization and then classifies them by using the K-Means algorithm; if data remain that cannot be classified, they are classified by using the K-nearest neighbor algorithm.

Comparative example 5:

The method standardizes the data, then generates clusters by using a clustering algorithm, and obtains an anomaly value for each time sliding window by using a time sliding window model.

Embodiment 2, comparative example 4 and comparative example 5 were compared and analyzed. What they have in common is that each model is obtained by combining clustering algorithms; they differ in the data preprocessing and in how the methods are combined. The results obtained on the comparison data sets are shown in fig. 9, and the comparison further verifies the accuracy of the method.

Claims

1. A distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN is characterized by comprising the following steps:

step 2, a software defined network topology structure is used for connecting a controller, and data acquisition is carried out by acquiring flow table information in a switch of a data layer on a control plane;

Step 3, after the flow table information is obtained in the step 2, calculating to obtain a characteristic numerical value in a T period, and forming a data set obtained through calculation;

2. The method of claim 1, wherein, in step 3, the characteristic numerical value is calculated according to the following formula:

in the formula:

$P(x_i)$ denotes the number of occurrences of the i-th source port in the T period;

$\mathrm{sum}(P(x))$ denotes the total number of source ports in the T period;

$S(x_i)$ denotes the number of occurrences of the i-th destination IP address in the T period;

$\mathrm{sum}(S(x))$ denotes the total number of destination IP addresses in the T period;

$Packet_i$ denotes the size of the i-th packet in the T period;

SPE denotes the source port entropy value;

DAE denotes the destination IP address entropy value;

APS denotes the average packet size.

3. The method for detecting distributed denial of service attack based on MCA-KMeans algorithm in SDN of claim, wherein the specific process of detecting the distributed denial of service attack model of multivariate data analysis and clustering algorithm in step 5 comprises:

step 501, multivariate data correlation analysis:

extracting a data set $X = \{x_1, x_2, \ldots, x_n\}$ from a public intrusion detection evaluation data set;

In the formula:

$x_i$ denotes the i-th flow table entry record;

the j-th component of $x_i$ is the j-th feature of the i-th flow table entry record;

n denotes the total number of rows of the data set;

calculating to obtain a new data set: $D = \{TAS^1, TAS^2, \ldots, TAS^n\}$,

In the formula:

n denotes the total number of rows of the data set;

j indexes the j-th feature value;

k indexes the k-th feature value;

i indexes the i-th data row;

step 50201, randomly selecting k points from the input data point set as the k cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$; in the formula: $\mu_k$ denotes a randomly selected value in the data set D;

step 50202, initializing cluster partition C to

in the formula:

SCM denotes the similarity coefficient calculated for two vectors;

$\mu n_j$ denotes the newly calculated cluster centroid;

$\mu_j^T$ denotes the transpose of the centroid vector;

n represents the total number of rows of the data set;

i represents the ith data line;

j denotes the jth centroid.

4. The method of claim, wherein the software-defined network topology comprises one RYU controller, 4 switches and 25 hosts.

5. The method of claim, wherein each of the switches is an OpenFlow switch in a software-defined network.