CN113162926B - KNN-based network attack detection attribute weight analysis method - Google Patents

Info

Publication number
CN113162926B
Authority
CN
China
Prior art keywords
sample
knn
training
knn model
attack
Legal status
Active
Application number
CN202110419085.7A
Other languages
Chinese (zh)
Other versions
CN113162926A
Inventor
张留美
邓茜
王一川
Current Assignee
Xian Shiyou University
Original Assignee
Xian Shiyou University
Priority date
2021-04-19
Filing date
2021-04-19
Publication date
2022-08-26
Application filed by Xian Shiyou University
Priority to CN202110419085.7A
Publication of CN113162926A
Application granted
Publication of CN113162926B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a KNN-based network attack detection attribute weight analysis method comprising the following steps. Step 1: download a DDoS dataset and denote it Sample A. Step 2: process Sample A obtained in Step 1, convert it into a file with the .csv extension, and name it Sample 1. Step 3: binarize the label column of Sample 1 obtained in Step 2 into classes 0 and 1, yielding Sample 2. Step 4: preprocess Sample 2 obtained in Step 3 to obtain Sample 5. Step 5: divide Sample 5 obtained in Step 4 into a training set and a test set, train the KNN model on the training set and tune its adjustable parameters to obtain a trained KNN model, then feed the test set into the trained KNN model and check the accuracy. The method addresses two problems of existing methods: slow data processing and the small amount of information they obtain.

Description

A KNN-based network attack detection attribute weight analysis method

Technical Field

The invention belongs to the technical field of network attack detection and relates to a KNN-based network attack detection attribute weight analysis method.

Background Art

With the rapid development and widespread adoption of the Internet, network intrusions are growing in both type and number, and intrusion events occur more and more frequently. In the Internet information age, computers process information ever faster, and a growing number of cyber attacks target the public's personal information. Such attacks cause social and economic losses and personal psychological distress. As a result, individuals, enterprises, and governments are paying increasing attention to network security. A common form of network attack is the distributed denial-of-service attack, known as a DDoS attack.

DDoS is one of the most serious threats to today's Internet. In a DDoS attack, the attacker controls many computers and uses them to send a large, sustained volume of requests to the target. The target then cannot respond to legitimate users' requests for normal access to resources, which causes heavy losses. DDoS attacks mainly target websites and servers by consuming server resources such as CPU, memory, and network bandwidth. DDoS can also attack network infrastructure such as routers and switches. The huge volume of attack traffic can severely degrade, or even paralyze, the network in which the target resides.

In principle, a DDoS attack works as follows. The attacker uses hacking techniques to hijack and control a large number of computers on the network, then launches an attack on the target from them. This is why it is called a distributed attack. There are three common attack methods:

- SYN Flood exploits the TCP three-way handshake. Because the source IP addresses of the requests are forged, the third packet of the handshake never arrives. The server keeps these connections in a half-open state until its waiting queue is full, after which it can no longer provide normal service.
- UDP Flood exploits the connectionless nature of UDP. The attacker sends a large number of small UDP packets so that the target cannot provide normal service.
- CC attacks are generally used against websites. The attacker sends a flood of request packets so that the website cannot be accessed normally.
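The SYN Flood mechanism described above can be illustrated with a toy model of a server's half-open connection queue. This is a hypothetical sketch, not a network implementation. The class `ToyListener`, the backlog size, and the request counts are illustrative assumptions. Real TCP stacks also evict half-open entries on timeout and may use SYN cookies; this model deliberately omits both.

```python
from collections import deque


class ToyListener:
    """Hypothetical model of a TCP listener's half-open (SYN-received) queue.

    Assumption: half-open entries never time out and no SYN cookies are used,
    so a slot is freed only by the client's final ACK.
    """

    def __init__(self, backlog: int) -> None:
        self.backlog = backlog
        self.half_open = deque()   # sources that sent SYN but not the final ACK
        self.established = []      # sources that completed the three-way handshake

    def on_syn(self, src: str) -> bool:
        """Handle a SYN: take a half-open slot, or drop it if the queue is full."""
        if len(self.half_open) >= self.backlog:
            return False
        self.half_open.append(src)
        return True

    def on_ack(self, src: str) -> None:
        """Handle the final ACK: move the source from half-open to established."""
        if src in self.half_open:
            self.half_open.remove(src)
            self.established.append(src)


def legit_client_served(backlog: int, spoofed_syns: int) -> bool:
    """Send spoofed SYNs first, then let one legitimate client try to connect."""
    server = ToyListener(backlog)
    for i in range(spoofed_syns):
        # Forged source addresses never answer the SYN-ACK, so no ACK follows.
        server.on_syn(f"spoofed-{i}")
    if server.on_syn("legit-client"):
        server.on_ack("legit-client")
    return "legit-client" in server.established


print(legit_client_served(backlog=128, spoofed_syns=10))   # True
print(legit_client_served(backlog=128, spoofed_syns=500))  # False
```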

To better safeguard network security, DDoS datasets need to be analyzed. Analyzing how the attributes in a dataset relate to network security makes it possible to determine whether a DDoS attack has occurred in the target network. Common DDoS datasets include CAIDA DDoS Attack 2007, CIC-IDS2018, and KDD. KDD Cup 99, for example, is a dataset for distinguishing abnormal connections from normal ones. It has 41 attributes and one label column. The 41 attributes fall into four groups:

- basic features of TCP connections;
- content features of TCP connections;
- time-based traffic statistics computed over a 2-second time window;
- host-based traffic statistics, which are used to assess attacks lasting longer than two seconds.

Because KDD Cup 99 suffers from class imbalance, NSL-KDD was produced as a resampled version of it. Researchers select suitable attributes to preprocess according to their needs and goals, then choose a suitable algorithm for the analysis. Most datasets used by researchers are small and have few attributes, so selecting some attributes and applying traditional methods meets the requirements. However, technology keeps advancing: 5G has emerged, and the Internet of Things and artificial intelligence continue to develop. Datasets are therefore growing ever larger and carry ever more attributes. Applying traditional data processing methods to them raises a series of problems: slow processing, low efficiency, and little information obtained.
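For reference, the 41 KDD Cup 99 attributes can be written down by group. The names below follow the feature list commonly distributed with the dataset (the `kddcup.names` file). Treat them as an assumption to check against the official description, which Step 4.3 of the embodiment relies on for naming columns. The label column name `attack_type` is the one used in the embodiment.

```python
# KDD Cup 99 feature names grouped into the four categories described above,
# listed in the column order of the data file (assumed from kddcup.names).
KDD99_FEATURE_GROUPS = {
    "basic TCP connection features": [
        "duration", "protocol_type", "service", "flag", "src_bytes",
        "dst_bytes", "land", "wrong_fragment", "urgent",
    ],
    "content features": [
        "hot", "num_failed_logins", "logged_in", "num_compromised",
        "root_shell", "su_attempted", "num_root", "num_file_creations",
        "num_shells", "num_access_files", "num_outbound_cmds",
        "is_host_login", "is_guest_login",
    ],
    "time-based traffic features (2-second window)": [
        "count", "srv_count", "serror_rate", "srv_serror_rate",
        "rerror_rate", "srv_rerror_rate", "same_srv_rate",
        "diff_srv_rate", "srv_diff_host_rate",
    ],
    "host-based traffic features": [
        "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
        "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
        "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
        "dst_host_srv_serror_rate", "dst_host_rerror_rate",
        "dst_host_srv_rerror_rate",
    ],
}

# Flatten in file order and append the label column (the 42nd column).
COLUMN_NAMES = [name for group in KDD99_FEATURE_GROUPS.values() for name in group]
COLUMN_NAMES.append("attack_type")

for group, names in KDD99_FEATURE_GROUPS.items():
    print(f"{group}: {len(names)}")
print(len(COLUMN_NAMES))  # 42
```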

Summary of the Invention

The purpose of the invention is to provide a KNN-based network attack detection attribute weight analysis method. The method addresses two problems of existing methods: slow data processing and the small amount of information they obtain.

The technical solution adopted by the invention is a KNN-based network attack detection attribute weight analysis method, implemented according to the following steps:

Step 1, download a DDoS dataset, denoted Sample A;

Step 2, process Sample A obtained in Step 1, convert it into a file with the .csv extension, and name it Sample 1;

Step 3, binarize the label column of Sample 1 obtained in Step 2 into classes 0 and 1;

Step 4, preprocess Sample 2 obtained in Step 3 to obtain Sample 5;

Step 5, divide Sample 5 obtained in Step 4 into a training set and a test set. Train the KNN model on the training set and tune its adjustable parameters to obtain a trained KNN model. Then feed the test set into the trained KNN model for testing and check the accuracy.

The invention is further characterized as follows.

Step 2 proceeds as follows. Open Sample A obtained in Step 1 in read-only mode and remove the spaces from each line. Then split each string on the delimiter and convert the result into a .csv file named Sample 1.

Step 3 proceeds as follows:

Step 3.1, from Sample 1 obtained in Step 2, select the records whose protocol is tcp and whose attack type is DoS or normal. Set the label of normal records to 1 and the label of DoS attacks to 0, and name the result Sample 2;

Step 3.2, count the normal and abnormal traffic records in Sample 2 obtained in Step 3.1, and display them as a bar chart using matplotlib.pyplot.

Step 4 proceeds as follows:

Step 4.1, read Sample 2 obtained in Step 3 with pandas and remove the delimiters from Step 2, obtaining Sample 3;

Step 4.2, check the number of rows of Sample 3 obtained in Step 4.1 via shape, and take the first 60% of the rows as Sample 4;

Step 4.3, name each attribute of Sample 4 obtained in Step 4.2, and normalize each column;

Step 4.4, from Sample 4 as processed in Step 4.3, select in turn each attribute and the label column, denoted x and y respectively. Convert x into a two-dimensional array named X and y into a one-dimensional array named Y. Concatenate X and Y into Sample 5.
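Steps 4.2 to 4.4 can be sketched without dependencies. This sketch makes several assumptions:

- A plain dict of columns stands in for the pandas DataFrame of Step 4.1.
- Min-max scaling to [0, 1] is used, because the text only says "normalize each column".
- The small `sample3` table is hypothetical.
- Returning the pair `(X, Y)` represents concatenating X and Y into Sample 5.

With pandas, Step 4.2 would correspond to `df.iloc[: df.shape[0] * 60 // 100]`.

```python
def first_rows(table, percent=60):
    """Step 4.2: keep the first `percent`% of rows (in file order, no shuffling)."""
    n_rows = len(next(iter(table.values())))
    keep = n_rows * percent // 100          # integer arithmetic avoids float rounding
    return {name: values[:keep] for name, values in table.items()}


def min_max(values):
    """Scale one numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize(table, label="attack_type"):
    """Step 4.3 (assumed min-max): normalize every column except the label."""
    return {name: values if name == label else min_max(values)
            for name, values in table.items()}


def attribute_xy(table, attribute, label="attack_type"):
    """Step 4.4: X is a 2-D list of shape (n, 1); Y is a 1-D list of labels."""
    X = [[v] for v in table[attribute]]
    Y = list(table[label])
    return X, Y


sample3 = {
    "count": [0, 5, 10, 2, 4],
    "same_srv_rate": [1.0, 0.5, 0.0, 0.2, 0.9],
    "attack_type": [1, 0, 0, 1, 1],
}
sample4 = normalize(first_rows(sample3))
for attribute in ["count", "same_srv_rate"]:     # one attribute at a time
    X, Y = attribute_xy(sample4, attribute)
    print(attribute, X, Y)
```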

Step 5 proceeds as follows:

Step 5.1, fix the values of the weights and leaf_size parameters of the KNN model;

Step 5.2, divide Sample 5 obtained in Step 4 into a training set and a test set, and train the KNN model on the training set. Let n_neighbors take each integer value in a user-defined range in turn. The model with the highest accuracy in that range is the optimal KNN model;

Step 5.3, feed the test set from Step 5.2 into the optimal KNN model for testing and check the accuracy.

In Step 5.2, the ratio of training-set to test-set sample size is 7:3.
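Steps 5.1 to 5.3 can be sketched without dependencies. The patent names the `n_neighbors`, `weights`, and `leaf_size` parameters but not a library. This sketch assumes they refer to scikit-learn's `KNeighborsClassifier`, and stands in for it with `weights='uniform'` and brute-force search. In scikit-learn, `leaf_size` tunes the speed and memory of the tree-based neighbor search rather than which neighbors are found, so a brute-force sketch has no counterpart for it. The seeded shuffle before the 7:3 split, the function names, and the synthetic data are also assumptions.

```python
import random
from collections import Counter


def split_7_3(X, Y, seed=0):
    """Shuffle indices with a fixed seed (assumption) and split 7:3."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [Y[i] for i in train],
            [X[i] for i in test], [Y[i] for i in test])


def knn_predict(X_train, Y_train, x, k):
    """Brute-force KNN with uniform weights: majority label of the k nearest
    training points by squared Euclidean distance. Ties go to the label that
    appears first among the sorted nearest neighbors."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(X_train, Y_train)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(X_train, Y_train, X_test, Y_test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(X_train, Y_train, x, k) == y for x, y in zip(X_test, Y_test))
    return hits / len(Y_test)


def sweep_n_neighbors(X, Y, k_values, seed=0):
    """Step 5.2 sketch: try each k in a user-defined range, keep the most accurate."""
    X_tr, Y_tr, X_te, Y_te = split_7_3(X, Y, seed)
    scores = {k: accuracy(X_tr, Y_tr, X_te, Y_te, k) for k in k_values}
    best_k = max(scores, key=scores.get)   # first k reaching the top score
    return best_k, scores


# Hypothetical 1-D data: one normalized attribute, two well-separated classes.
X = [[i / 100] for i in range(40)] + [[1 + i / 100] for i in range(40)]
Y = [1] * 40 + [0] * 40
best_k, scores = sweep_n_neighbors(X, Y, range(1, 10, 2))
print(best_k, scores[best_k])  # 1 1.0
```

In Step 5.2, n_neighbors is chosen by its accuracy on the same split that Step 5.3 reports, so the reported figure may be optimistic. Choosing k on a separate validation split would avoid that bias.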

The beneficial effects of the invention are as follows. The method performs attack detection on DDoS datasets. It selects a subset of attributes and uses a machine-learning KNN model to analyze the accuracy obtained with each attribute. The attributes with the highest accuracy can therefore be selected quickly, and whether a DDoS attack has occurred can be detected in time. The method offers a strong reference, processes data quickly, and yields comprehensive information.

Brief Description of the Drawings

Fig. 1 is the algorithm flowchart of the KNN-based network attack detection attribute weight analysis method of the invention;

Fig. 2 is a bar chart of the normal and abnormal traffic counted in Sample 2 in the KNN-based network attack detection attribute weight analysis method of the invention.

Detailed Description of the Embodiments

The invention is described in detail below with reference to the drawings and specific embodiments.

The invention provides a KNN-based network attack detection attribute weight analysis method. As shown in Fig. 1, it is implemented according to the following steps:

Step 1, download a DDoS dataset, denoted Sample A;

Step 2, open Sample A obtained in Step 1 in read-only mode and remove the spaces from each line. Then split each string on the delimiter and convert the result into a .csv file named Sample 1;

Step 3, binarize the label column of Sample 1 obtained in Step 2 into classes 0 and 1;

Step 3.1, from Sample 1 obtained in Step 2, select the records whose protocol is tcp and whose attack type is DoS or normal. Set the label of normal records to 1 and the label of DoS attacks to 0, and name the result Sample 2;

Step 3.2, count the normal and abnormal traffic records in Sample 2 obtained in Step 3.1, and display them as a bar chart using matplotlib.pyplot;

Step 4, preprocess Sample 2 obtained in Step 3 to obtain Sample 5;

Step 4.1, read Sample 2 obtained in Step 3 with pandas and remove the delimiters from Step 2, obtaining Sample 3;

Step 4.2, check the number of rows of Sample 3 obtained in Step 4.1 via shape, and take the first 60% of the rows as Sample 4;

Step 4.3, name each attribute of Sample 4 obtained in Step 4.2 according to the official documentation of the DDoS dataset, and normalize each column;

Step 4.4, from Sample 4 as processed in Step 4.3, select in turn each attribute and the label column, denoted x and y respectively. Convert x into a two-dimensional array named X and y into a one-dimensional array named Y. Concatenate X and Y into Sample 5;

Step 5, divide Sample 5 obtained in Step 4 into a training set and a test set with a sample-size ratio of 7:3. Train the KNN model on the training set and tune its adjustable parameters to obtain a trained KNN model. Then feed the test set into the trained KNN model for testing and check the accuracy;

Step 5.1, fix the values of the weights and leaf_size parameters of the KNN model;

Step 5.2, divide Sample 5 obtained in Step 4 into a training set and a test set, and train the KNN model on the training set. Let n_neighbors take each integer value in a user-defined range in turn. The model with the highest accuracy in that range is the optimal KNN model;

Step 5.3, feed the test set from Step 5.2 into the optimal KNN model for testing and check the accuracy. The accuracy shows how well the KNN model predicts, so that whether a DDoS attack has occurred can be detected in time.

Embodiment

Step 1, download the DDoS dataset KDD99 (Data Mining and Knowledge Discovery Cup 1999 Data Set), denoted Sample A;

Step 2, open Sample A obtained in Step 1 in read-only mode and remove the spaces from each line. Then split each string on the delimiter and convert the result into a .csv file named Sample 1;

Step 3.1, from Sample 1 obtained in Step 2, select the records whose protocol is tcp and whose attack type is DoS or normal. Set the label of normal records to 1 and the label of DoS attacks to 0, and name the result Sample 2;

Step 3.2, count the normal and abnormal traffic records in Sample 2 obtained in Step 3.1, and display them as a bar chart using matplotlib.pyplot, as shown in Fig. 2. In Fig. 2, the horizontal axis shows normal traffic, abnormal traffic, and total traffic, and the vertical axis gives the count of each. There are 768670 normal records, 1074241 abnormal records, and 1842911 records in total;

Step 4.1, read Sample 2 obtained in Step 3 with pandas and remove the delimiters from Step 2, obtaining Sample 3;

Step 4.2, check the number of rows of Sample 3 obtained in Step 4.1 via shape, and take the first 60% of the rows as Sample 4;

Step 4.3, name the 41 attributes of Sample 4 obtained in Step 4.2 according to the official documentation of the dataset, and normalize each column. Name the 42nd column attack_type;

Step 4.4, from Sample 4 as processed in Step 4.3, select in turn the four attributes count, same_srv_rate, dst_host_serror_rate, and dst_host_srv_serror_rate as x and the attack_type column as y. Convert x into a two-dimensional array named X and y into a one-dimensional array named Y. Concatenate X and Y into Sample 5;

Step 5.1, fix the values of the weights and leaf_size parameters of the KNN model. The weights parameter can be uniform or distance, and the two settings give very similar prediction accuracy. For leaf_size, the accuracy trend levels off beyond values of 29 and 30. This embodiment uses weights='uniform' and leaf_size=30;

Step 5.2, divide Sample 5 obtained in Step 4 into a training set and a test set at a 7:3 sample-size ratio. Train the KNN model on the training set with n_neighbors=5 to obtain a trained KNN model, then check the accuracy. The accuracy is 0.9981128890282283 and the running time is 1340.890951 seconds;

Step 5.3, feed the test set from Step 5.2 into the trained KNN model for testing and check the accuracy.
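Step 5.1 of the embodiment compares the two values of `weights`. The difference between them can be illustrated with a small voting function. This is a hypothetical sketch of the two rules as scikit-learn documents them, not that library's code:

- `'uniform'`: every neighbor counts equally.
- `'distance'`: each neighbor is weighted by the inverse of its distance.

The handling of zero distances is an assumption.

```python
from collections import Counter, defaultdict


def vote(neighbors, weights="uniform"):
    """Combine (distance, label) pairs of the k nearest neighbors.

    'uniform' gives each neighbor one vote; 'distance' weights each vote by
    1 / distance. Assumption: if any neighbor is at distance 0, only those
    exact matches vote, which avoids dividing by zero.
    """
    if weights == "distance":
        exact = [label for d, label in neighbors if d == 0]
        if exact:
            return Counter(exact).most_common(1)[0][0]
    score = defaultdict(float)
    for d, label in neighbors:
        score[label] += 1.0 if weights == "uniform" else 1.0 / d
    return max(score, key=score.get)


# Hypothetical neighbors: one close normal record (1), two farther DoS records (0).
neighbors = [(0.1, 1), (0.9, 0), (1.0, 0)]
print(vote(neighbors, "uniform"))   # 0  (two votes against one)
print(vote(neighbors, "distance"))  # 1  (10.0 versus about 2.11)
```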

The running time can be analyzed as follows. Suppose Sample 2 obtained in Step 3 contains N samples, and each sample's features form a D-dimensional vector. To make a prediction, KNN must loop over all training samples, which costs O(N). Computing the distance between two samples depends on the feature dimension and costs O(D). Because only one attribute is selected at a time, each distance computation costs O(1). The loop over samples is the outer loop and the distance computation is the inner loop, so the cost per attribute is O(N*1). Each of the D attributes is evaluated in this way, so the total time complexity is O(N*D).
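The cost argument above can be checked by counting operations. The sketch below runs brute-force distance computations for a single query, once per attribute with one-dimensional features, and counts the scalar difference operations. The synthetic training values and the function name are placeholders.

```python
def distance_ops_per_query(n_samples: int, n_attributes: int) -> int:
    """Count scalar operations to score one query against N training samples,
    evaluating each of D attributes on its own (1-D features)."""
    train = [[float(i)] for i in range(n_samples)]   # placeholder 1-D data
    query = [0.5]
    ops = 0
    for _ in range(n_attributes):                    # D single-attribute models
        for point in train:                          # outer loop: N samples, O(N)
            for a, b in zip(point, query):           # inner loop: 1 dimension, O(1)
                _ = (a - b) ** 2                     # one squared difference
                ops += 1
    return ops


print(distance_ops_per_query(1000, 41))  # 41000, i.e. N * D
```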

Comparative Example

Step 4.4, from Sample 4 as processed in Step 4.3, randomly select the four attributes num_shells, num_root, dst_host_srv_diff_host_rate, and srv_diff_host_rate as x and the attack_type column as y. Convert x into a two-dimensional array named X and y into a one-dimensional array named Y. Concatenate X and Y into Sample 6;

Step 5.2, divide Sample 6 obtained in Step 4 into a training set and a test set at a 7:3 sample-size ratio. Train the KNN model on the training set with n_neighbors=5 to obtain a trained KNN model, then check the accuracy. The accuracy is 0.8638808165824601 and the running time is 3248.227169 seconds;

The remaining steps are the same as in the embodiment.

The comparative example used four randomly selected attributes. Relative to it, the accuracy of the embodiment is 15.538% higher and its running time is 58.719% shorter. Using a machine-learning KNN model to measure the prediction accuracy of each DDoS attack attribute lets researchers quickly select the attributes with the highest accuracy and detect in time whether a DDoS attack has occurred. This makes the method a strong reference.
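The two percentages can be recomputed from the reported accuracies and running times. This sketch assumes they are relative changes, with the comparative example as the baseline.

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to the baseline `old`."""
    return (new - old) / old * 100


acc_embodiment, acc_control = 0.9981128890282283, 0.8638808165824601
time_embodiment, time_control = 1340.890951, 3248.227169  # seconds

accuracy_gain = relative_change(acc_embodiment, acc_control)
time_reduction = -relative_change(time_embodiment, time_control)
print(f"accuracy: +{accuracy_gain:.3f}%")   # accuracy: +15.538%
print(f"run time: -{time_reduction:.3f}%")  # run time: -58.719%
```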

Claims (2)

1. A KNN-based network attack detection attribute weight analysis method, characterized in that it is implemented according to the following steps:
Step 1, download a DDoS dataset, denoted Sample A;
Step 2, process Sample A obtained in Step 1, convert it into a file with the .csv extension, and name it Sample 1. Specifically, open Sample A in read-only mode and remove the spaces from each line. Then split each string on the delimiter and convert the result into a .csv file named Sample 1;
Step 3, binarize the label column of Sample 1 obtained in Step 2 into classes 0 and 1, specifically:
Step 3.1, from Sample 1 obtained in Step 2, select the records whose protocol is tcp and whose attack type is DoS or normal. Set the label of normal traffic to 1 and the label of DoS attacks to 0, and name the result Sample 2;
Step 3.2, count the normal and abnormal traffic records in Sample 2 obtained in Step 3.1, and display them as a bar chart using matplotlib.pyplot;
Step 4, preprocess Sample 2 obtained in Step 3 to obtain Sample 5, specifically:
Step 4.1, read Sample 2 obtained in Step 3 with pandas and remove the delimiters from Step 2, obtaining Sample 3;
Step 4.2, check the number of rows of Sample 3 obtained in Step 4.1 via shape, and take the first 60% of the rows as Sample 4;
Step 4.3, name each attribute of Sample 4 obtained in Step 4.2, and normalize each column;
Step 4.4, from Sample 4 as processed in Step 4.3, select in turn each attribute and the label column, denoted x and y respectively. Convert x into a two-dimensional array named X and y into a one-dimensional array named Y. Concatenate X and Y into Sample 5;
Step 5, divide Sample 5 obtained in Step 4 into a training set and a test set. Train the KNN model on the training set and tune its adjustable parameters to obtain a trained KNN model. Then feed the test set into the trained KNN model for testing and check the accuracy, specifically:
Step 5.1, fix the values of the weights and leaf_size parameters of the KNN model;
Step 5.2, divide Sample 5 obtained in Step 4 into a training set and a test set, and train the KNN model on the training set. Let n_neighbors take each integer value in a user-defined range in turn. The model with the highest accuracy in that range is the optimal KNN model;
Step 5.3, feed the test set from Step 5.2 into the optimal KNN model for testing and check the accuracy.

2. The KNN-based network attack detection attribute weight analysis method according to claim 1, characterized in that, in Step 5.2, the ratio of training-set to test-set sample size is 7:3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110419085.7A CN113162926B 2021-04-19 2021-04-19 KNN-based network attack detection attribute weight analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110419085.7A CN113162926B 2021-04-19 2021-04-19 KNN-based network attack detection attribute weight analysis method

Publications (2)

Publication Number Publication Date
CN113162926A 2021-07-23
CN113162926B 2022-08-26

Family

ID=76868851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110419085.7A Active CN113162926B 2021-04-19 2021-04-19 KNN-based network attack detection attribute weight analysis method

Country Status (1)

Country Link
CN (1) CN113162926B

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423580A (en) * 2017-04-01 2017-12-01 吉林大学 Grand genomic fragment attribute reduction and sorting technique based on neighborhood rough set
CN108632279A (en) * 2018-05-08 2018-10-09 北京理工大学 A kind of multilayer method for detecting abnormality based on network flow
CN108769079A (en) * 2018-07-09 2018-11-06 四川大学 A kind of Web Intrusion Detection Techniques based on machine learning
CN109873833A (en) * 2019-03-11 2019-06-11 浙江工业大学 A data injection attack detection method based on chi-square distance KNN
CN110929801A (en) * 2019-12-02 2020-03-27 武汉大学 Improved Euclid distance KNN classification method and system
CN111598163A (en) * 2020-05-14 2020-08-28 中南大学 Radar HRRP target recognition method based on stacking integrated learning method
CN111614576A (en) * 2020-06-02 2020-09-01 国网山西省电力公司电力科学研究院 A method and system for network data traffic identification based on wavelet analysis and support vector machine

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108289104B (en) * 2018-02-05 2020-07-17 Chongqing University of Posts and Telecommunications An industrial SDN network DDoS attack detection and mitigation method
CN110213280A (en) * 2019-06-10 2019-09-06 Xiangtan University DDoS attack detection method based on LDMDBF in an SDN environment
CN112187752A (en) * 2020-09-18 2021-01-05 Hubei University Intrusion detection classification method and device based on random forest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
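Anomalous traffic detection technology based on data augmentation and model updating; Zhang Hao et al.; 《信息网络安全》 (Netinfo Security); 2020-02-10 (No. 02); full text *
Intrusion detection system based on deep learning; Dong Ning et al.; 《网络安全技术与应用》 (Network Security Technology & Application); 2020-10-15 (No. 10); full text *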

Also Published As

Publication number Publication date
CN113162926A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
Ur Rehman et al. DIDDOS: An approach for detection and identification of Distributed Denial of Service (DDoS) cyberattacks using Gated Recurrent Units (GRU)
CN109450842B (en) Network malicious behavior recognition method based on neural network
Qu et al. A survey on the development of self-organizing maps for unsupervised intrusion detection
Li et al. RTVD: A real-time volumetric detection scheme for DDoS in the Internet of Things
Viegas et al. Toward a reliable anomaly-based intrusion detection in real-world environments
WO2022052476A1 (en) Training method for detection model, system, device, and storage medium
Idhammad et al. Detection system of HTTP DDoS attacks in a cloud environment based on information theoretic entropy and random forest
US10375143B2 (en) Learning indicators of compromise with hierarchical models
Singh et al. MLP-GA based algorithm to detect application layer DDoS attack
KR102135024B1 (en) Method and apparatus for identifying category of cyber attack aiming iot devices
Farhan et al. Performance analysis of intrusion detection for deep learning model based on CSE-CIC-IDS2018 dataset
Liu et al. LSTM-CGAN: Towards generating low-rate DDoS adversarial samples for blockchain-based wireless network detection models
CN106713371A (en) Fast Flux botnet detection method based on DNS anomaly mining
CN102638474B (en) Application layer DDOS (distributed denial of service) attack and defense method
Tang et al. Low-rate DoS attack detection based on two-step cluster analysis and UTR analysis
Odusami et al. A survey and meta‐analysis of application‐layer distributed denial‐of‐service attack
Lei et al. Detecting malicious domains with behavioral modeling and graph embedding
Sree et al. HADM: detection of HTTP GET flooding attacks by using Analytical hierarchical process and Dempster–Shafer theory with MapReduce
Shalini et al. DOCUS-DDoS detection in SDN using modified CUSUM with flash traffic discrimination and mitigation
CN117614742A (en) Malicious traffic detection method with enhanced honey point perception
CN111131309A (en) Distributed denial of service detection method and device and model creation method and device
Althobaiti et al. Securing Cloud Computing from Flash Crowd Attack Using Ensemble Intrusion Detection System.
CN113162926B (en) KNN-based network attack detection attribute weight analysis method
Niu et al. Using XGBoost to discover infected hosts based on HTTP traffic
Gocher et al. Impact analysis to detect and mitigate distributed denial of service attacks with Ryu-SDN Controller: A comparative analysis of four different machine learning classification algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant