CN114844679A - Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN - Google Patents


Info

Publication number
CN114844679A
Authority
CN
China
Prior art keywords
data
data set
attack
value
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210367801.6A
Other languages
Chinese (zh)
Inventor
张佳璇
侯爱琴
吴昊
王思明
肖云
季于东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University
Priority to CN202210367801.6A
Publication of CN114844679A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a distributed denial of service attack detection method based on the MCA-KMeans algorithm in an SDN, which comprises the following steps: step 3, after the flow table information is obtained, calculating the characteristic values within a period T to form a calculated data set, the characteristic values comprising the source port entropy, the destination IP address entropy and the average packet size; and step 4, for each switch, using the data set calculated in step 3 as input data for training a clustering-algorithm classification model, and iterating t times until the centroids no longer change, to obtain a distributed denial of service attack detection model based on multivariate data analysis and a clustering algorithm. The invention detects attacks quickly and accurately in real time, locates the attacking host and deletes the attack data, thereby protecting the security of the software defined network as far as possible, reducing the consumption of network resources, shortening the time required to detect an attack and improving the accuracy of attack detection.

Description

Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN
Technical Field
The invention belongs to the technical field of computer network security, relates to a software defined network, and particularly relates to a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN.
Background
With the continuous development of network technology in the age of big data, a new network architecture, the Software Defined Network (SDN), has emerged in the Internet field. It decouples the control plane from the data plane and centralizes network management through applications running on the controller. Despite its many advantages, SDN security remains one of the concerns of the research community.
Distributed Denial of Service (DDoS) is one of the most threatening attacks in SDN security. In a DDoS attack, multiple attackers at different locations attack one or more targets at the same time. The attack modes are numerous and highly destructive, and forged source IP addresses confuse the defense system and make detection harder. Traditional detection methods struggle to identify attack data as the traffic changes, so the network system can be paralyzed.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a distributed denial of service attack detection method based on an MCA-KMeans algorithm in an SDN, so as to solve the technical problem that the accuracy of attack detection in the prior art needs to be further improved.
In order to solve the technical problems, the invention adopts the following technical scheme:
a distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN comprises the following steps:
step 1, acquiring a characteristic data set comprising the source port entropy in a period T, the destination IP address entropy in the period T and the average packet size in the period T, standardizing the features, then obtaining a new data set through multivariate data analysis, training a classification model with a clustering algorithm, and constructing a software defined network topology;
step 2, connecting the software defined network topology to a controller, and collecting data on the control plane by acquiring flow table information from the switches of the data layer;
step 3, after the flow table information is obtained in step 2, calculating the characteristic values within the period T to form a calculated data set;
the characteristic values comprise the source port entropy, the destination IP address entropy and the average packet size;
step 4, for each switch, using the data set calculated in step 3 as input data for training the clustering-algorithm classification model, and iterating t times until the centroids no longer change, to obtain a distributed denial of service attack detection model based on multivariate data analysis and the clustering algorithm;
step 5, performing detection and judgment with the distributed denial of service attack detection model trained in step 4; if the judgment result is malicious traffic, the controller locates the attack target, deletes the attack flow entries from the flow table, and returns the current flow table information of the switch.
The invention also has the following technical characteristics:
preferably, in step 3, the characteristic value is calculated according to the following formula:
Figure BDA0003586635070000021
in the formula:
P(x_i) denotes the number of occurrences of the i-th source port in the period T;
sum(P(x)) denotes the total number of source port occurrences in the period T;
S(x_i) denotes the number of occurrences of the i-th destination IP address in the period T;
sum(S(x)) denotes the total number of destination IP address occurrences in the period T;
packet_i denotes the size of the i-th packet in the period T;
SPE denotes the source port entropy;
DAE denotes the destination IP address entropy;
APS denotes the average packet size.
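The formula above survives only as an image reference. A plausible reconstruction from the symbol definitions, assuming standard Shannon entropy over relative frequencies, can be written as follows; the logarithm base and the packet count n are assumptions and do not come from the source.

```latex
\begin{aligned}
\mathrm{SPE} &= -\sum_{i} \frac{P(x_i)}{\operatorname{sum}(P(x))}\,
                \log \frac{P(x_i)}{\operatorname{sum}(P(x))},\\
\mathrm{DAE} &= -\sum_{i} \frac{S(x_i)}{\operatorname{sum}(S(x))}\,
                \log \frac{S(x_i)}{\operatorname{sum}(S(x))},\\
\mathrm{APS} &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{packet}_i .
\end{aligned}
```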
Preferably, in step 5, the specific process of detecting the distributed denial of service attack model of the multivariate data analysis and clustering algorithm includes:
step 501, multivariate data correlation analysis:
extracting a data set X = {x_1, x_2, …, x_n} from a public intrusion detection evaluation data set;
In the formula:
x_i denotes the i-th flow entry record;
Figure BDA0003586635070000031
Figure BDA0003586635070000032
denotes the j-th feature of the i-th flow entry record;
n denotes the total number of rows of the data set;
performing multivariate data analysis, and using a geometric correlation formula:
Figure BDA0003586635070000033
calculating to obtain a new data set: D = {TAS_1, TAS_2, …, TAS_n},
Figure BDA0003586635070000034
In the formula:
Figure BDA0003586635070000035
denotes the correlation coefficient between the j-th feature and the k-th feature of the i-th flow entry record;
TAS_i denotes the i-th row of the new data set D, calculated from the i-th flow entry record;
n denotes the total number of rows of the data set;
j denotes the index of the j-th feature;
k denotes the index of the k-th feature;
i denotes the index of the i-th data row;
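The geometric correlation formula is available only as an image reference. Because the description expands TAS as Triangle Area Space, one possible reading, offered as an assumption rather than as the patent's own formula, is the triangle-area correlation in which the projections of a record onto feature axes j and k span a right triangle with the origin. The symbols f and m (number of features) are assumed notation.

```latex
Tr^{i}_{j,k} = \frac{\lvert f^{i}_{j} \rvert \,\lvert f^{i}_{k} \rvert}{2},
\qquad j \neq k,\quad 1 \le j, k \le m,
\qquad
TAS_i = \bigl( Tr^{i}_{j,k} \bigr)_{1 \le j < k \le m}.
```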
step 502, the clustering algorithm model comprises the following specific steps:
step 50201, randomly selecting k points from the input data point set as k cluster centers {μ_1, μ_2, …, μ_k}; in the formula: μ_k denotes a random value from the data set D;
step 50202, initializing cluster partition C to
Figure BDA0003586635070000041
step 50203, for each point x_i in the data set, calculating its similarity coefficient to each cluster center, and assigning x_i to the cluster with the minimum similarity value;
step 50204, after one round of calculation is finished, calculating the mean value of each cluster with the mean value calculation formula and taking it as the new centroid of that cluster;
step 50205, repeating the operations of step 50203 and step 50204, and outputting the classification model after t iterations, once none of the k centroid vectors changes any more; the specific formula used is as follows:
Figure BDA0003586635070000042
In the formula:
SCM denotes the similarity coefficient calculated for two vectors;
μn_j denotes the newly calculated cluster centroid;
μ_j^T denotes the transpose of the centroid vector;
n denotes the total number of rows of the data set;
i denotes the index of the i-th data row;
j denotes the index of the j-th centroid.
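The SCM formula is available only as an image reference, so it is left abstract here. Treating SCM as a generic score that is minimized (an assumption), steps 50203 and 50204 correspond to the usual K-means assignment and mean update. The symbols c_i and C_j are notational assumptions.

```latex
c_i = \arg\min_{1 \le j \le k} \mathrm{SCM}(x_i, \mu_j),
\qquad
C_j = \{\, x_i : c_i = j \,\},
\qquad
\mu n_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i .
```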
Preferably, the software defined network topology comprises one RYU controller, 4 switches and 25 hosts.
Preferably, the switches all adopt OpenFlow switches in a software defined network.
Compared with the prior art, the invention has the following technical effects:
(I) In the software defined network environment, the invention trains the detection model by classification using multivariate data analysis and a clustering algorithm, so that the model suits a variety of conditions, detects attacks quickly and accurately in real time, and locates the attacking host to delete the attack data, thereby protecting the security of the software defined network as far as possible, reducing the consumption of network resources, shortening the time required to detect an attack and improving the accuracy of attack detection.
(II) The invention establishes the correlation between data in the software defined network, and then uses the algorithm to detect attacks quickly and defend against them in real time.
(III) The invention also has good applicability: tests on several public data sets show that the results meet the requirements.
Drawings
FIG. 1 is a diagram of a distributed denial of service attack detection system architecture based on multivariate data analysis and clustering algorithm designed by the present invention.
Fig. 2 is a schematic diagram of a software-defined network topology.
FIG. 3 is a schematic diagram of attack detection steps of a distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm.
Fig. 4(a) is a schematic diagram of SPEs calculated in multiple cycles in a flow table of an OpenFlow switch in the topology.
Fig. 4(b) is a schematic diagram of DAE calculated in multiple cycles in the flow table of the OpenFlow switch in the topology.
Fig. 4(c) is a schematic diagram of APS calculated in multiple cycles in the flow table of the OpenFlow switch in the topology.
Fig. 5 is a schematic diagram of the time required to detect attacks by the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, a support vector machine model (SVM) and a decision tree model (DecisionTree).
Fig. 6 is a schematic diagram of the performance evaluation of distributed denial of service attack detection by the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, a support vector machine model (SVM), a neural network classification model (NeuralNetwork), a K-nearest neighbor classification model (KNN) and an entropy detection model (EntropyDetection).
Fig. 7 is a comparison graph of the performance of the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm, the MCA analysis method of comparative example 1, and the DDoS attack detection method based on the K-means clustering algorithm of comparative example 2.
Fig. 8 is a comparison graph of the detection rates of the DDoS attack detection method based on the support vector machine algorithm of comparative example 3 and the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm.
Fig. 9 is a comparison graph of the detection performance of the new DDoS attack detection framework combining K-Means clustering with the K-nearest neighbor algorithm of comparative example 4 and the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm.
The present invention will be explained in further detail with reference to examples.
Detailed Description
It is to be understood that all devices, models and algorithms of the present invention, unless otherwise specified, are intended to be implemented using any of the devices, models and algorithms known in the art. For example, the clustering algorithm classification model adopts a known clustering algorithm classification model. For example, the RYU controller and the OpenFlow switch both employ known RYU controllers and OpenFlow switches.
In the present invention, it is to be noted that:
SDN, i.e., Software Defined Network, refers to a software defined network.
The MCA-KMeans algorithm, i.e., Multivariate Correlation Analysis and Improved K-means clustering algorithm, refers to the multivariate data analysis and clustering algorithm.
DDoS, i.e., Distributed Denial of Service, refers to a distributed denial of service attack.
SPE, i.e., Source Port Entropy, refers to the source port entropy value.
DAE, i.e., Destination IP Address Entropy, refers to the destination IP address entropy value.
APS, i.e., Average Packet Size, refers to the average size of the data packets.
SCM, i.e., Similarity Coefficient Measurement, refers to the similarity coefficient calculated for two vectors.
The overall technical concept of the invention is as follows: the invention provides a distributed denial of service (DDoS) attack detection algorithm based on multivariate data analysis and a clustering algorithm, implemented in a Software Defined Network (SDN) environment. The controller monitors and obtains the flow tables on the switches, several features are extracted from the flow entries, and a detection model is trained through multivariate data analysis and the clustering algorithm. Under the SDN architecture the network state is monitored in real time, suspicious traffic is checked for DDoS attacks, the attacked OpenFlow switch is located, and the attacked flow entries are deleted, which keeps SDN communication secure and avoids problems such as resource exhaustion and network paralysis. Existing detection methods are not suited to high-traffic SDN networks; for example, the detection method based on entropy and the K-nearest neighbor classification algorithm is only suitable for small data sets. The proposed algorithm not only adapts to current network traffic but also has a short detection time and low computational complexity, and therefore meets the requirements of attack detection in software defined networks.
The method uses the public data set CIC DoS dataset (2017) as training data, extracts the required features with a feature selection algorithm, constructs a new data set in which the data are correlated, and then trains a classification model with a clustering algorithm. In the testing stage, the classification model classifies the test set and determines whether the result is malicious traffic; if so, the flow table is processed and the result processed by the switch is returned, which keeps network communication secure. The method has a short detection time, high accuracy and strong applicability.
The present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention fall within the protection scope of the present invention.
The embodiment is as follows:
the embodiment provides a distributed denial of service attack detection method based on the MCA-KMeans algorithm in an SDN, which comprises the following steps:
step 1, as shown in fig. 1, acquiring a feature data set comprising the source port entropy (SPE) in a period T, the destination IP address entropy (DAE) in the period T and the average packet size (APS) in the period T, standardizing the features, then obtaining a new data set through multivariate data analysis, training a classification model with a clustering algorithm, and constructing a Software Defined Network (SDN) topology;
In this embodiment, SPE represents the information entropy of the source port in the same period, DAE represents the information entropy of the destination IP address in the same period, and APS represents the average size of the data packets in the same period.
Step 2, as shown in fig. 2, connecting the software defined network topology to a controller, and collecting data on the control plane by acquiring flow table information from the switches of the data layer;
in this embodiment, the software defined network topology includes one RYU controller (an open-source software defined network controller), 4 switches and 25 hosts.
In this embodiment, the switches are all OpenFlow switches in a software defined network.
Step 3, after the flow table information is obtained in step 2, calculating the characteristic values within the period T to form a calculated data set;
the characteristic values comprise the source port entropy, the destination IP address entropy and the average packet size;
the flow table of each switch contains a number of flow entries, and each flow entry consists of three elements: header fields (Header Fields) used for packet matching, counters (Counters) used to count matching packets, and actions (Actions) that specify how matching packets are processed. The header fields of a flow entry form a 12-tuple that includes, among others, the source port, the source MAC (media access control) address, the destination MAC address, the IP protocol, the source IP address and the destination IP address. The counters of the OpenFlow flow table are maintained per flow table, per data flow, per device port and per forwarding queue in the switch, and record statistics about the traffic. For example, for each flow table they record the number of currently active entries, the number of packet lookups and the number of packet matches; for each data flow they record the number of received packets, the number of bytes and the duration of the flow. The actions of the OpenFlow flow table instruct the switch how to handle a matching packet once it is received.
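The three elements of a flow entry can be pictured with a small data structure. The sketch below is illustrative only: the class name `FlowEntry`, its field names and the sample values are hypothetical and do not correspond to the Ryu or OpenFlow APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowEntry:
    """Hypothetical in-memory view of one OpenFlow flow entry."""
    # Header fields used for packet matching (a subset of the 12-tuple).
    match: Dict[str, object]
    # Counters maintained by the switch for matched traffic.
    packet_count: int = 0
    byte_count: int = 0
    duration_sec: int = 0
    # Actions applied to matching packets, e.g. forward to port 2.
    actions: List[str] = field(default_factory=list)


entry = FlowEntry(
    match={"src_port": 51515, "ipv4_src": "10.1.1.3",
           "ipv4_dst": "10.1.1.9", "ip_proto": 6},
    packet_count=120,
    byte_count=7680,
    duration_sec=2,
    actions=["output:2"],
)

# The counters are enough to derive an average packet size for this entry.
avg_size = entry.byte_count / entry.packet_count
```

Keeping match fields, counters and actions in separate attributes mirrors the split described above and makes it easy to read the counters when computing per-period features.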
The invention obtains the source port, the destination IP address and the packet size from the flow table of the switch, uses the counters to count the occurrences of each value, and then lets the model detect changes in these features to decide whether an attack is present; if an attack occurs, the flow entry is deleted using the delete action of the flow table.
In step 3, the characteristic numerical value is calculated according to the following formula:
Figure BDA0003586635070000091
in the formula:
P(x_i) denotes the number of occurrences of the i-th source port in the period T;
sum(P(x)) denotes the total number of source port occurrences in the period T;
S(x_i) denotes the number of occurrences of the i-th destination IP address in the period T;
sum(S(x)) denotes the total number of destination IP address occurrences in the period T;
packet_i denotes the size of the i-th packet in the period T;
SPE denotes the source port entropy;
DAE denotes the destination IP address entropy;
APS denotes the average packet size.
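As a minimal sketch of how the three characteristic values might be computed for one period T, the following code assumes Shannon entropy with a base-2 logarithm and represents each observed packet as a hypothetical `(source_port, destination_ip, size_in_bytes)` tuple; neither the log base nor this input layout is specified in the source.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def shannon_entropy(values: Iterable) -> float:
    """Entropy of the empirical distribution of `values` (base-2 log assumed)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def period_features(packets: List[Tuple[int, str, int]]) -> Tuple[float, float, float]:
    """Return (SPE, DAE, APS) for the packets observed in one period T."""
    spe = shannon_entropy(p[0] for p in packets)   # source port entropy
    dae = shannon_entropy(p[1] for p in packets)   # destination IP entropy
    aps = sum(p[2] for p in packets) / len(packets) if packets else 0.0
    return spe, dae, aps


# Four distinct source ports all aimed at one destination address.
window = [
    (1001, "10.1.1.9", 60),
    (1002, "10.1.1.9", 60),
    (1003, "10.1.1.9", 60),
    (1004, "10.1.1.9", 60),
]
spe, dae, aps = period_features(window)
```

In this toy window the source ports are spread evenly while every packet targets the same address, so the source port entropy is high and the destination IP entropy is zero.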
Step 4, as shown in fig. 3, for each switch, using the data set calculated in step 3 as input data for training the clustering-algorithm classification model, and iterating t times until the centroids no longer change, to obtain a distributed denial of service attack detection model based on multivariate data analysis and the clustering algorithm;
and step 5, performing detection and judgment with the distributed denial of service attack detection model trained in step 4; if the judgment result is malicious traffic, the controller locates the attack target, deletes the attack flow entries from the flow table, and returns the current flow table information of the switch.
In step 5, the specific process of detecting the distributed denial of service attack model of the multivariate data analysis and clustering algorithm comprises the following steps:
step 501, multivariate data correlation analysis:
extracting a data set X = {x_1, x_2, …, x_n} from a public intrusion detection evaluation data set;
In the formula:
x_i denotes the i-th flow entry record;
Figure BDA0003586635070000101
Figure BDA0003586635070000102
denotes the j-th feature of the i-th flow entry record;
n denotes the total number of rows of the data set;
performing multivariate data analysis, and using a geometric correlation formula:
Figure BDA0003586635070000103
calculating to obtain a new data set: D = {TAS_1, TAS_2, …, TAS_n},
Figure BDA0003586635070000104
In the formula:
Figure BDA0003586635070000105
denotes the correlation coefficient between the j-th feature and the k-th feature of the i-th flow entry record;
TAS_i (Triangle Area Space) denotes the i-th row of the new data set D, calculated from the i-th flow entry record;
n denotes the total number of rows of the data set;
j denotes the index of the j-th feature;
k denotes the index of the k-th feature;
i denotes the index of the i-th data row;
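The mapping from X to D can be sketched in code under one explicit assumption: that the geometric correlation between features j and k of a record is the area of the right triangle spanned by the two feature values, |f_j| * |f_k| / 2. The patent's actual formula is only available as an image, so the function names and the formula below are illustrative, not the patented method.

```python
from itertools import combinations
from typing import List, Sequence


def triangle_area_row(record: Sequence[float]) -> List[float]:
    """Map one record to the triangle areas of all feature pairs (j < k)."""
    return [abs(record[j]) * abs(record[k]) / 2.0
            for j, k in combinations(range(len(record)), 2)]


def triangle_area_dataset(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build the new data set D, one TAS row per input record."""
    return [triangle_area_row(x) for x in X]


# Two records with three features each (for example SPE, DAE, APS).
X = [[2.0, 0.0, 4.0], [1.0, 3.0, 2.0]]
D = triangle_area_dataset(X)
```

With m features per record, each row of D has m(m-1)/2 entries, so three input features yield three pairwise correlation values per row.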
step 502, the clustering algorithm model comprises the following specific steps:
step 50201, randomly selecting k points from the input data point set as k cluster centers {μ_1, μ_2, …, μ_k}; in the formula: μ_k denotes a random value from the data set D;
step 50202, initializing cluster partition C to
Figure BDA0003586635070000111
step 50203, for each point x_i in the data set, calculating its similarity coefficient to each cluster center, and assigning x_i to the cluster with the minimum similarity value;
step 50204, after one round of calculation is finished, calculating the mean value of each cluster with the mean value calculation formula and taking it as the new centroid of that cluster;
step 50205, repeating the operations of step 50203 and step 50204, and outputting the classification model after t iterations, once none of the k centroid vectors changes any more; the specific formula used is as follows:
Figure BDA0003586635070000112
in the formula:
SCM denotes the similarity coefficient calculated for two vectors;
μn_j denotes the newly calculated cluster centroid;
μ_j^T denotes the transpose of the centroid vector;
n denotes the total number of rows of the data set;
i denotes the index of the i-th data row;
j denotes the index of the j-th centroid.
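Steps 50201 to 50205 can be sketched as a plain K-means loop with a pluggable scoring function. Because the SCM formula is not reproduced in the text, squared Euclidean distance is swapped in as a placeholder score; the function names, the `seed` parameter and the handling of empty clusters are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Placeholder score; it stands in for the patent's SCM, which is not shown."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, t: int = 100,
           score: Callable[[Vector, Vector], float] = squared_euclidean,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Step 50201: pick k random points as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(t):
        # Step 50203: assign each point to the center with the minimum score.
        labels = [min(range(k), key=lambda j: score(p, centers[j])) for p in points]
        # Step 50204: recompute each centroid as the mean of its cluster.
        new_centers = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # Step 50205: stop once no centroid changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(points, k=2)
```

Passing the score as a parameter keeps the loop unchanged if a different similarity coefficient is substituted for the placeholder.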
Application example 1:
this application example is based on the MCA-KMeans-based distributed denial of service attack detection method in an SDN provided by the embodiment. As shown in fig. 1, the distributed denial of service attack detection model based on multivariate data analysis and clustering algorithm designed in the present invention is applied to test the detection time of the model.
A characteristic data set comprising SPE, DAE and APS is acquired, wherein SPE represents the information entropy of the source port in the same period, DAE represents the information entropy of the destination IP address in the same period, and APS represents the average size of the data packets in the same period. In order to observe the change of the data more clearly, the OpenFlow flow table is first polled periodically in real time and the SPE, DAE and APS values are obtained by statistics, as shown in fig. 4(a) to 4(c). When the host is attacked, the SPE value (fig. 4(a)) and the APS value (fig. 4(c)) rise rapidly while the DAE value (fig. 4(b)) falls rapidly, so the real-time data clearly distinguish the normal state of the host from the attacked state. The data set is then standardized feature by feature in order to reduce the feature scale and the data deviation. A new data set is obtained through multivariate data correlation analysis; its advantage is that the correlation among the feature attributes is strengthened, and it is used by the clustering algorithm to train the classification model;
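The text does not say which standardization is used. As one common choice, the sketch below applies a column-wise z-score (subtract the mean, divide by the population standard deviation); the function name and the zero-variance handling are assumptions.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-score using the population standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


# Columns with very different scales end up on a comparable scale.
rows = [[1.0, 100.0, 5.0], [3.0, 300.0, 5.0]]
Z = standardize(rows)
```

Standardizing before the correlation step keeps a large-valued feature such as packet size from dominating the entropy features.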
As shown in fig. 2, a Software Defined Network (SDN) topology consisting of end hosts, switches and routers is constructed with the Mininet tool, and each switch in the topology is connected to the controller. The whole topology comprises one RYU controller, four switches and 25 hosts, whose IP addresses are 10.1.1.1-10.1.1.25. The controller acquires the flow table information of each switch to collect data;
after the topology is created and the model is trained, a time detection module is embedded in the classification model to obtain the detection time. In addition, two classification-based attack detection models, SVM and DecisionTree, are also embedded with the time detection module for comparison with the invention, in order to evaluate the model.
As shown in fig. 3, the controller and Mininet are started so that the SDN network communicates normally, with a controller detection period of 2 seconds. The network is then attacked at random; the models detect automatically and return their results. The detection time of each model is obtained over multiple detections and plotted in the comparison graph of fig. 5, which shows that the proposed model performs better.
Application example 2:
this application example is based on the MCA-KMeans-based distributed denial of service attack detection method in an SDN provided by the embodiment. To show the advantages of the model more fully, further model evaluation parameters are used for comparison. The required characteristic data set is extracted from a public data set, preprocessed, and divided proportionally into a training set and a test set; the training set is used to train the model and determine its parameters, and the test set is used to evaluate the model. The model is evaluated with a confusion matrix, which reflects the performance of the model more comprehensively and from which many metrics can be derived. Here TP (True Positive) is a true positive, actually positive and predicted positive; FP (False Positive) is a false positive, actually negative but predicted positive; FN (False Negative) is a false negative, actually positive but predicted negative; TN (True Negative) is a true negative, actually negative and predicted negative.
The model is first evaluated by accuracy (Accuracy), defined for a given test set as the ratio of the number of samples correctly classified by the classification model to the total number of samples, and calculated by the formula below. The test results are shown in fig. 6; the comparison shows that the detection accuracy of the method is better than that of the other methods.
The model is also evaluated by recall (Recall) and false alarm rate (FAR). Recall is the proportion of actually positive samples that are correctly predicted as positive. The false alarm rate is the proportion of actually negative samples that are incorrectly predicted as positive, which reflects how well the classifier or model keeps negative samples out of its positive predictions. The calculation formula is as follows.
Figure BDA0003586635070000131
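The formula image itself is not reproduced in this text. As a minimal sketch, assuming the standard confusion-matrix definitions that match the verbal descriptions above (accuracy = (TP+TN)/total, recall = TP/(TP+FN), FAR = FP/(FP+TN)), the three metrics could be computed as follows. The function name `confusion_metrics` is hypothetical and is not taken from the patent.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall and false alarm rate from confusion-matrix counts.

    Assumes the standard definitions; the patent's own formula image
    is not available in this text.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    positives = tp + fn   # samples that are actually positive
    negatives = fp + tn   # samples that are actually negative
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / positives if positives else 0.0,
        "far": fp / negatives if negatives else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the patent.
    print(confusion_metrics(tp=90, fp=5, fn=10, tn=95))
```

A lower FAR together with a higher recall corresponds to the "better" direction used in the comparisons of figs. 7 to 9.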
Comparative example 1:
This comparative example provides a DDoS attack detection method based on multivariate data analysis. Its multivariate data analysis steps are the same as those of the specific embodiment; it differs only in the classification model (2).
Step 50201, the classification model (2) randomly selects k points from the input data point set as k cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$, where $\mu_j$ represents a random value in the data set D.
Step 50202, initializing cluster partition C to
Figure BDA0003586635070000141
Step 50203, for each point $x_i$ in the data set, the similarity coefficient between $x_i$ and the cluster centers is calculated, and $x_i$ is assigned to the cluster with the smallest similarity value.
Step 50204, after one round of calculation is finished, the mean value of each cluster is calculated using a mean value calculation formula and is taken as the new centroid of that cluster.
Step 50205, step 50203 and step 50204 are repeated; after t iterations, once none of the k centroid vectors changes any more, the classification model is output.
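Steps 50201 to 50205 can be sketched as the loop below. This is only an illustration under stated assumptions: the patent's similarity coefficient (SCM) is defined in a formula image that is not reproduced here, so squared Euclidean distance is swapped in as a stand-in "smallest value wins" measure. The names `cluster` and `squared_distance` and the `max_iter` and `seed` parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Stand-in for the patent's similarity coefficient (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster(points, k, max_iter=100, seed=0):
    """Iterate assignment and mean update until the centroids stop changing."""
    rng = random.Random(seed)
    # Step 50201: pick k random points of the data set as initial centers.
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 50202: start each round from an empty partition.
        clusters = [[] for _ in range(k)]
        # Step 50203: assign each point to the center with the smallest value.
        for p in points:
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[j].append(p)
        # Step 50204: the mean of each cluster becomes its new centroid.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's center
        # Step 50205: stop once no centroid vector changes.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, groups = cluster(pts, k=2)
    print(centers, [len(g) for g in groups])
```

Replacing `squared_distance` with the patent's actual similarity coefficient would change only the assignment step; the rest of the loop stays the same.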
Comparative example 2:
This comparative example provides a DDoS attack detection method based on the K-means clustering algorithm. Its other steps are the same as those of the improved clustering algorithm of the specific embodiment; it differs only in the multivariate data analysis (1).
Multivariate data analysis (1) uses a public intrusion detection evaluation data set to extract the data set $X = \{x_1, x_2, \ldots, x_n\}$, where
Figure BDA0003586635070000142
represents the i-th flow table entry record and
Figure BDA0003586635070000143
represents the j-th feature of the i-th flow table entry record. Multivariate feature data analysis is performed using a geometric correlation formula:
Figure BDA0003586635070000144
to calculate a new data set $D = \{TAS_1, TAS_2, \ldots, TAS_n\}$,
Figure BDA0003586635070000145
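The geometric correlation formula is contained in a figure that is not reproduced in this text. As a hedged sketch only, the code below assumes a triangle-area style measure of the kind used in multivariate correlation analysis work, in which the correlation between features j and k of one record is half the product of their absolute values. Whether the patent uses exactly this form is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def triangle_area_features(record):
    """Map one feature vector to its pairwise geometric correlations.

    Assumed form: area of the right triangle spanned by features j and k,
    i.e. |f_j| * |f_k| / 2, for every pair j < k.
    """
    return [
        abs(record[j]) * abs(record[k]) / 2.0
        for j, k in combinations(range(len(record)), 2)
    ]


def build_correlation_dataset(X):
    """Turn data set X (one row per flow table entry record) into data set D."""
    return [triangle_area_features(x) for x in X]


if __name__ == "__main__":
    # Illustrative values for one record with three features.
    print(build_correlation_dataset([[2.0, 3.0, 4.0]]))
```

With three features per record (source port entropy, destination IP address entropy, average packet size), each row of the new data set would hold three pairwise values under this assumption.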
Example 2, comparative example 1, and comparative example 2 were compared: a detection model was constructed according to each proposed algorithm, and the accuracy, false alarm rate, and recall were tested on the same data set. The results are shown in fig. 7. As the figure shows, the DDoS attack detection method based on multivariate data analysis and the clustering algorithm achieves better results on the test set, which indicates that the detection model is feasible.
Comparative example 3:
This comparative example provides a DDoS attack detection method based on the support vector machine algorithm. It uses the SVM algorithm to detect DDoS attacks; the two methods differ in the algorithm used.
Example 1 and comparative example 3 were compared: the models were trained on a public data set, and the accuracy, false alarm rate, and recall were tested, with the results shown in fig. 8. As the figure shows, the DDoS attack detection method based on multivariate data analysis and the clustering algorithm has higher accuracy and a lower false alarm rate on the test set, so the detection model is feasible.
Comparative example 4:
This method preprocesses the data using feature standardization and then classifies it using the K-Means algorithm; if data that cannot be classified exists, that data is classified using the K-Means algorithm.
Comparative example 5:
This method standardizes the data, generates clusters using a clustering algorithm, and uses a time sliding window model to obtain an anomaly value for each time sliding window.
Example 2, comparative example 4, and comparative example 5 were compared. What they have in common is that each obtains its model by combining clustering algorithms; they differ in the data preprocessing and in how the methods are combined. The results obtained on the data sets are shown in fig. 9, and the comparison further verifies the accuracy of the method.

Claims (5)

1. A distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN is characterized by comprising the following steps:
step 1, acquiring a feature data set comprising the entropy value of source ports within period T, the entropy value of destination IP addresses within period T, and the average packet size within period T, performing feature standardization on the data, then obtaining a new data set through multivariate data analysis, training a classification model using a clustering algorithm, and constructing a software-defined network topology architecture;
step 2, connecting a controller using the software-defined network topology, and collecting data on the control plane by acquiring flow table information from the switches of the data layer;
step 3, after the flow table information is obtained in step 2, calculating the characteristic values within period T and forming a data set from the calculated values;
the characteristic values comprise the source port entropy value, the destination IP address entropy value, and the average packet size;
step 4, for each switch, using the data set calculated in step 3 as input data to train the clustering algorithm classification model, and, after t iterations, once the centroids no longer change, obtaining the distributed denial of service attack model of multivariate data analysis and clustering algorithm;
step 5, performing detection and judgment using the distributed denial of service attack model of multivariate data analysis and clustering algorithm trained in step 4; if the judgment result is malicious traffic, the controller locates the attack target, deletes the attack flow table entries from the flow table, and returns the current flow table information of the switch.
2. The method of claim 3, wherein the characteristic value is calculated according to the following formula:
Figure FDA0003586635060000021
in the formula:
$P(x_i)$ represents the count of the i-th source port within period T;
$\mathrm{sum}(P(x))$ represents the total number of source ports within period T;
$S(x_i)$ represents the count of the i-th source IP address within period T;
$\mathrm{sum}(S(x))$ represents the total number of source ports within period T;
$packet_i$ represents the size of the i-th packet within period T;
SPE represents the source port entropy value;
DAE represents the destination IP address entropy value;
APS represents the average packet size.
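The formula for these three values is in an image that is not reproduced in this text. As a minimal sketch, assuming SPE and DAE are Shannon entropies over the relative frequencies of source ports and destination IP addresses, and APS is the arithmetic mean of the packet sizes, the features for one period T could be computed as below. The base-2 logarithm, the `(src_port, dst_ip, size)` record layout, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Entropy of the empirical distribution of the given values (log base 2 assumed)."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_features(records):
    """Compute (SPE, DAE, APS) for the records observed in one period T.

    records: list of (src_port, dst_ip, size) tuples; this layout is hypothetical.
    """
    if not records:
        raise ValueError("no records in this period")
    spe = shannon_entropy(r[0] for r in records)      # source port entropy
    dae = shannon_entropy(r[1] for r in records)      # destination IP address entropy
    aps = sum(r[2] for r in records) / len(records)   # average packet size
    return spe, dae, aps


if __name__ == "__main__":
    # Illustrative window: many source ports aimed at a single destination.
    window = [(1000 + i, "10.0.0.1", 100 * (i + 1)) for i in range(4)]
    print(window_features(window))
```

In this illustration the source port entropy is high while the destination entropy is zero, which is the kind of contrast entropy features are typically meant to expose when many sources target one host.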
3. The method for detecting distributed denial of service attack based on MCA-KMeans algorithm in SDN of claim, wherein the specific process of detecting with the distributed denial of service attack model of multivariate data analysis and clustering algorithm in step 5 comprises:
step 501, analyzing the correlation of the multivariate data:
extracting a data set $X = \{x_1, x_2, \ldots, x_n\}$ using a public intrusion detection evaluation data set;
in the formula:
$x_i$ represents the i-th flow table entry record;
Figure FDA0003586635060000022
Figure FDA0003586635060000023
represents the j-th feature of the i-th flow table entry record;
n represents the total number of rows of the data set;
performing multivariate data analysis, and using a geometric correlation formula:
Figure FDA0003586635060000024
calculating to obtain a new data set: $D = \{TAS_1, TAS_2, \ldots, TAS_n\}$,
Figure FDA0003586635060000025
In the formula:
Figure FDA0003586635060000031
represents the correlation coefficient between the j-th feature and the k-th feature of the i-th flow table entry record;
$TAS_i$ represents the i-th row of the new data set D, calculated from the i-th flow table entry record;
n represents the total number of rows of the data set;
j represents the j-th feature value;
k represents the k-th feature value;
i represents the i-th data row;
step 502, the clustering algorithm model comprises the following specific steps:
step 50201, randomly selecting k points from the input data point set as k cluster centers $\{\mu_1, \mu_2, \ldots, \mu_k\}$; in the formula: $\mu_k$ represents a random value in the data set D;
step 50202, initializing cluster partition C to
Figure FDA0003586635060000032
step 50203, for each point $x_i$ in the data set, calculating the similarity coefficient between $x_i$ and the cluster centers, and assigning $x_i$ to the cluster with the smallest similarity value;
step 50204, after one round of calculation is finished, calculating the mean value of each cluster using a mean value calculation formula and taking it as the new centroid of that cluster;
step 50205, repeating step 50203 and step 50204, and, after t iterations, once none of the k centroid vectors changes any more, outputting the classification model; the specific formula used is as follows:
Figure FDA0003586635060000033
in the formula:
SCM represents the calculated similarity coefficient of two vectors;
$\mu n_j$ represents the newly computed cluster centroid;
$\mu_j^T$ represents the transpose of the centroid vector;
n represents the total number of rows of the data set;
i represents the i-th data row;
j represents the j-th centroid.
4. The method of claim, wherein the software-defined network topology comprises one RYU controller, 4 switches and 25 hosts.
5. The method of claim, wherein each of the switches is an OpenFlow switch in a software-defined network.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210367801.6A CN114844679A (en) 2022-04-08 2022-04-08 Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN

Publications (1)

Publication Number Publication Date
CN114844679A (en) 2022-08-02

Family

ID=82564455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210367801.6A Pending CN114844679A (en) 2022-04-08 2022-04-08 Distributed denial of service attack detection method based on MCA-KMeans algorithm in SDN

Country Status (1)

Country Link
CN (1) CN114844679A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220802