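WO2021258961A1 - Network traffic classification method and system based on improved K-means algorithm - Google Patents

Network traffic classification method and system based on improved K-means algorithm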

Info

Publication number
WO2021258961A1
WO2021258961A1 PCT/CN2021/095793 CN2021095793W WO2021258961A1 WO 2021258961 A1 WO2021258961 A1 WO 2021258961A1 CN 2021095793 W CN2021095793 W CN 2021095793W WO 2021258961 A1 WO2021258961 A1 WO 2021258961A1
Authority
WO
WIPO (PCT)
Prior art keywords
network traffic
traffic data
data point
density
network
Prior art date
Application number
PCT/CN2021/095793
Other languages
English (en)
French (fr)
Inventor
张登银
蔡岳
肖毅
赵莎莎
Original Assignee
南京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学
Publication of WO2021258961A1
Priority to US17/846,908 (patent US11570069B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0882 Utilisation of link capacity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0888 Throughput
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/062 Generation of reports related to network traffic

Definitions

  • The invention relates to a network traffic classification method and system based on an improved K-means algorithm, and belongs to the technical field of network traffic classification.
  • Network traffic classification is one of the basic means of analyzing network traffic characteristics and enhancing network controllability. Early network traffic classification methods were mainly port-based. However, as the complexity of network data has grown, many protocols no longer follow fixed port conventions. Many P2P applications use ports randomly and concurrently, most of them using several ports at once, and some applications deliberately disguise their ports; for example, a DNS tunnel can bypass ACLs or traffic auditing. Port-based network traffic classification is therefore no longer reliable.
  • The purpose of the present invention is to overcome the deficiencies of the prior art by providing a network traffic classification method and system based on an improved K-means algorithm that can ensure a relatively high network traffic classification accuracy.
  • The present invention provides a network traffic classification method based on an improved K-means algorithm.
  • The method includes the following steps:
  • Step 1: Define the number of network traffic data points as N;
  • Here, "all network traffic data points" includes both the network traffic data point nearest to the i-th network traffic data point and the network traffic data point that is n-th nearest to it;
  • Step 2: Determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k. If it has not, calculate the candidate metric value of each network traffic data point in the high-density network traffic data point set, select the network traffic data point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density network traffic data point set. Repeat Step 2 until NIC reaches k, at which point Step 2 ends.
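  • The n-th density $D_{in}$ of the i-th network traffic data point is calculated as $D_{in} = \frac{n - 0.5}{r + 1}$, where r is the n-th distance of the i-th network traffic data point.
  • The value of n in the n-th distance of the i-th network traffic data point is calculated as $n = \frac{N}{8k}$, that is, 1/8 of the average cluster size N/k.
  • The candidate metric value of the j-th network traffic data point in the high-density network traffic data point set, denoted $cd_j$, is calculated as $cd_j = \min(\langle A_j, ic_1\rangle, \langle A_j, ic_2\rangle, \ldots, \langle A_j, ic_{NIC}\rangle)$, where:
  • $A_j$ is the j-th network traffic data point in the high-density network traffic data point set, j = 1, 2, 3, ..., NHD;
  • NHD is the total number of network traffic data points in the high-density network traffic data point set;
  • $ic_1, ic_2, \ldots, ic_{NIC}$ are the 1st, 2nd, ..., NIC-th network traffic data points in the initial cluster center set;
  • $\langle A_j, ic_1\rangle$ is the Euclidean distance between the j-th network traffic data point in the high-density set and the 1st network traffic data point in the initial cluster center set;
  • and so on, up to $\langle A_j, ic_{NIC}\rangle$, the Euclidean distance between the j-th network traffic data point in the high-density set and the NIC-th network traffic data point in the initial cluster center set.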
  • The present invention provides a network traffic classification system based on an improved K-means algorithm, the system including:
  • a first definition module, used to define the number of network traffic data points as N;
  • a second definition module, used to define the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point as the n-th distance of the i-th network traffic data point, where i = 1, 2, 3, ..., N;
  • a third definition module, used to define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point, within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where there is exactly one network traffic data point at each rank from nearest to n-th nearest, and "all network traffic data points" includes both the nearest and the n-th nearest network traffic data point;
  • an input module, used to input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
  • an n-value determination module, used to determine the specific value of n in the n-th distance of the i-th network traffic data point;
  • an average calculation module, used to calculate the average avg of the n-th densities of all network traffic data points;
  • a first joining module, used to add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
  • a second joining module, used to select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
  • a judgment module, used to determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k; if it has not, to calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set; and to repeat this task until NIC reaches k, at which point the task of the judgment module ends.
  • The present invention provides a network traffic classification system based on an improved K-means algorithm, including a processor and a storage medium;
  • the storage medium is used to store instructions;
  • the processor is configured to operate according to the instructions to execute the steps of any one of the foregoing methods.
  • A computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of any one of the foregoing methods.
  • The present invention, based on the improved K-means algorithm, takes the distribution of network traffic data points fully into account and abandons the random generation of initial cluster centers. It thereby obtains better initial cluster centers and clusters from them, which improves training efficiency and ensures a relatively high network traffic classification accuracy.
  • FIG. 1 is a flowchart of the preparation work performed before the initial cluster centers are selected, provided by an embodiment of the present invention;
  • FIG. 2 is a detailed flowchart, provided by an embodiment of the present invention, of selecting the initial cluster centers, clustering, and establishing the mapping between the network traffic clusters obtained by clustering and network application types.
  • An embodiment of the present invention provides a network traffic classification method based on an improved K-means algorithm.
  • The method includes the following steps:
  • Step 1: Obtain network traffic data packets.
  • Step 2: Collect the information features of the network traffic data packets, including packet size, packet control byte length, and flow duration, and vectorize these features.
  • Step 3: Fill in missing values in the vectorized network traffic data. In this embodiment, mean-value filling is used.
  • Step 4: Derive three additional features from the vectorized information features. The first is the ratio of the number of packets in the C/S direction to the number of packets in the S/C direction. The second is the total number of bytes transmitted in both directions, obtained by adding the total bytes in the C/S direction to the total bytes in the S/C direction. The third is the share of control bytes, obtained by dividing the average number of control bytes per packet by the average number of bytes per packet.
  • Step 5: Separate the training set and the test set. In this embodiment, the training set is the set of network traffic data packets handed to the improved K-means algorithm for clustering, from which the classification model is obtained. The test set is the set of network traffic data packets that are assigned to network application types according to the clustering result (that is, the classification model obtained by clustering), and whose assignment accuracy is used to evaluate the performance of the classification model. The union of the test set and the training set is the entire set of network traffic data packets, that is, the complete set. The test set makes up 20% of the complete set and the training set 80%.
  • Step 6: Select labeled network traffic data packets. Because this embodiment is based on unsupervised learning, all network traffic data packets are treated as unlabeled. However, if no packet carried a label, the clusters obtained by clustering could not be mapped to actual application types. Some packets are therefore randomly chosen and treated as labeled, so that the maximum likelihood method can map clusters to actual application types, which also better matches real conditions. Given a large number of unlabeled packets and a small number of labeled ones, the embodiment only needs to identify the unlabeled packets;
  • Step 7: Transform the vectorized network traffic data as a preprocessing step, applying first a logarithmic transformation, then standardization, and finally normalization. In this embodiment, the logarithmic transformation takes the data x to be transformed and produces $x' = \ln(x + 1)$, which brings the distribution of the information features closer to a normal distribution and thereby improves the accuracy with which the improved K-means algorithm identifies unlabeled packets.
  • As shown in FIG. 1, Steps 1 to 7 make up the preparation workflow before the initial cluster centers are selected.
  • Step 8: Select the initial cluster centers based on the improved K-means algorithm, abstracting network traffic data packets as network traffic data points. The specific steps are as follows:
  • Step (8.3): Define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point (including both the nearest and the n-th nearest), within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where $D_{in} = \frac{n - 0.5}{r + 1}$ and there is exactly one network traffic data point at each rank from nearest to n-th nearest;
  • The hypersphere contains n - 1 points in its interior, and the n-th nearest network traffic data point lies on the shell of the hypersphere and can be counted as 0.5 of a point, so the numerator of $D_{in}$ is n - 1 + 0.5 = n - 0.5. To avoid insufficient numerical precision when the n-th distance of the i-th point is very small, the denominator of $D_{in}$ is taken as r + 1, the radius of the hypersphere plus one.
  • Step (8.6): Calculate the average avg of the n-th densities of all network traffic data points, where $avg = \frac{1}{N}\sum_{i=1}^{N} D_{in}$;
  • Step (8.8): Select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
  • Step (8.9): Determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k. If it has not, calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set. Repeat Step (8.9) until NIC reaches k, at which point Step (8.9) ends and the initial cluster centers are obtained.
  • Step 9: Use the initial cluster centers obtained in Step (8.9) to perform clustering.
  • Step 10: Use the maximum likelihood method to establish the mapping between the network traffic clusters obtained by clustering and network application types. The specific steps are as follows:
  • $n_{ji}$ is the number of network traffic data points in network traffic cluster $C_i$ that are labeled as network application type $R_j$, and $N_i$ is the total number of labeled network traffic data points in cluster $C_i$, so that $P(R_j \mid C_i) = \frac{n_{ji}}{N_i}$;
  • Let $R_t$ be the network application type finally assigned to network traffic cluster $C_i$; then $R_t$ must satisfy the following condition:
  • $P(R_t \mid C_i) = \max[P(R_1 \mid C_i), P(R_2 \mid C_i), \ldots, P(R_l \mid C_i)]$.
  • If a network traffic cluster contains no labeled network traffic data points, the cluster is regarded as an unknown network application type.
  • In this embodiment, the calculation of $P(R_j \mid C_i)$ is simplified: if, among the labeled network traffic data points inside a cluster, one network application type has the most points, the cluster is assigned to that type. As a result, several clusters are often mapped to the same network application type, and clusters and application types do not necessarily correspond one to one.
  • As shown in FIG. 2, Steps 8 to 10 make up the detailed flow of selecting the initial cluster centers, clustering, and establishing the mapping between the network traffic clusters obtained by clustering and network application types.
  • The present invention, based on the improved K-means algorithm, takes the distribution of network traffic data points fully into account and abandons the random generation of initial cluster centers. It thereby obtains better initial cluster centers and clusters from them, which improves training efficiency and ensures a relatively high network traffic classification accuracy.
  • An embodiment of the present invention provides a network traffic classification system based on an improved K-means algorithm, the system including:
  • a first definition module, used to define the number of network traffic data points as N;
  • a second definition module, used to define the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point as the n-th distance of the i-th network traffic data point, where i = 1, 2, 3, ..., N;
  • a third definition module, used to define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point, within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where there is exactly one network traffic data point at each rank from nearest to n-th nearest, and "all network traffic data points" includes both the nearest and the n-th nearest network traffic data point;
  • an input module, used to input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
  • an n-value determination module, used to determine the specific value of n in the n-th distance of the i-th network traffic data point;
  • an average calculation module, used to calculate the average avg of the n-th densities of all network traffic data points;
  • a first joining module, used to add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
  • a second joining module, used to select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
  • a judgment module, used to determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k; if it has not, to calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set; and to repeat this task until NIC reaches k, at which point the task of the judgment module ends.
  • An embodiment of the present invention also provides a network traffic classification system based on the improved K-means algorithm, including a processor and a storage medium;
  • the storage medium is used to store instructions;
  • the processor is configured to operate according to the instructions to execute the steps of the foregoing method.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the foregoing method are implemented.
  • This application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction system. The instruction system implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

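The present invention discloses a network traffic classification method and system based on an improved K-means algorithm. The method includes: determining whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k; if it has not, calculating the candidate metric value of each network traffic data point in the high-density network traffic data point set, selecting the network traffic data point with the largest candidate metric value, adding it to the initial cluster center set, and removing it from the high-density network traffic data point set; and repeating this step until NIC reaches k, at which point the step ends. The present invention can ensure a relatively high network traffic classification accuracy.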

Description

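Network traffic classification method and system based on improved K-means algorithm
Technical Field
The present invention relates to a network traffic classification method and system based on an improved K-means algorithm, and belongs to the technical field of network traffic classification.
Background
Network traffic classification is one of the basic means of analyzing network traffic characteristics and enhancing network controllability. Early network traffic classification methods were mainly port-based. However, as the complexity of network data has grown, many protocols no longer follow fixed port conventions. Many P2P applications use ports randomly and concurrently, most of them using several ports at once, and some applications deliberately disguise their ports; for example, a DNS tunnel can bypass ACLs or traffic auditing. Port-based network traffic classification is therefore no longer reliable.
Another network traffic classification method is based on packet signatures. Some packet signatures correspond to exactly one application, so packets can be distinguished by recognizing the signature; this method is simple and accurate. However, once a protocol changes, the signatures must be readjusted, so the method responds slowly to protocol changes. In addition, identifying signatures depends heavily on manual work, which costs considerable labor and time.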
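Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art by providing a network traffic classification method and system based on an improved K-means algorithm that can ensure a relatively high network traffic classification accuracy.
To achieve the above purpose, the present invention adopts the following technical solutions:
In a first aspect, the present invention provides a network traffic classification method based on an improved K-means algorithm, the method including the following steps:
Step 1: Define the number of network traffic data points as N;
define the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point as the n-th distance of the i-th network traffic data point, where i = 1, 2, 3, ..., N;
define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point, within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where there is exactly one network traffic data point at each rank from nearest to n-th nearest, and "all network traffic data points" includes both the nearest and the n-th nearest network traffic data point;
input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
determine the specific value of n in the n-th distance of the i-th network traffic data point;
calculate the average avg of the n-th densities of all network traffic data points;
add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
Step 2: Determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k. If it has not, calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set. Repeat Step 2 until NIC reaches k, at which point Step 2 ends.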
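With reference to the first aspect, further, the n-th density $D_{in}$ of the i-th network traffic data point is calculated as
$$D_{in} = \frac{n - 0.5}{r + 1}$$
With reference to the first aspect, further, the value of n in the n-th distance of the i-th network traffic data point is calculated as
$$n = \frac{N}{8k}$$
With reference to the first aspect, further, the average avg of the n-th densities of all network traffic data points is calculated as
$$avg = \frac{1}{N}\sum_{i=1}^{N} D_{in}$$
With reference to the first aspect, further, the candidate metric value of the j-th network traffic data point in the high-density network traffic data point set, denoted $cd_j$, is calculated as $cd_j = \min(\langle A_j, ic_1\rangle, \langle A_j, ic_2\rangle, \ldots, \langle A_j, ic_{NIC}\rangle)$,
where $A_j$ is the j-th network traffic data point in the high-density set, j = 1, 2, 3, ..., NHD; NHD is the total number of network traffic data points in the high-density set; $ic_1, ic_2, \ldots, ic_{NIC}$ are the 1st, 2nd, ..., NIC-th network traffic data points in the initial cluster center set; $\langle A_j, ic_1\rangle$ is the Euclidean distance between the j-th point in the high-density set and the 1st point in the initial cluster center set; and so on, up to $\langle A_j, ic_{NIC}\rangle$, the Euclidean distance between the j-th point in the high-density set and the NIC-th point in the initial cluster center set.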
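In a second aspect, the present invention provides a network traffic classification system based on an improved K-means algorithm, the system including:
a first definition module, used to define the number of network traffic data points as N;
a second definition module, used to define the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point as the n-th distance of the i-th network traffic data point, where i = 1, 2, 3, ..., N;
a third definition module, used to define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point, within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where there is exactly one network traffic data point at each rank from nearest to n-th nearest, and "all network traffic data points" includes both the nearest and the n-th nearest network traffic data point;
an input module, used to input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
an n-value determination module, used to determine the specific value of n in the n-th distance of the i-th network traffic data point;
an average calculation module, used to calculate the average avg of the n-th densities of all network traffic data points;
a first joining module, used to add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
a second joining module, used to select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
a judgment module, used to determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k; if it has not, to calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set; and to repeat this task until NIC reaches k, at which point the task of the judgment module ends.
In a third aspect, the present invention provides a network traffic classification system based on an improved K-means algorithm, including a processor and a storage medium;
the storage medium is used to store instructions;
the processor is configured to operate according to the instructions to execute the steps of any one of the foregoing methods.
In a fourth aspect, a computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of any one of the foregoing methods.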
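Compared with the prior art, the present invention achieves the following beneficial effects:
The present invention, based on the improved K-means algorithm, takes the distribution of network traffic data points fully into account and abandons the random generation of initial cluster centers. It thereby obtains better initial cluster centers and clusters from them, which improves training efficiency and ensures a relatively high network traffic classification accuracy.
Brief Description of the Drawings
FIG. 1 is a flowchart of the preparation work performed before the initial cluster centers are selected, provided by an embodiment of the present invention;
FIG. 2 is a detailed flowchart, provided by an embodiment of the present invention, of selecting the initial cluster centers, clustering, and establishing the mapping between the network traffic clusters obtained by clustering and network application types.
Detailed Description
The present invention is further described below with reference to the drawings. The following embodiments serve only to illustrate the technical solutions of the present invention more clearly and must not be used to limit its scope of protection.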
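An embodiment of the present invention provides a network traffic classification method based on an improved K-means algorithm, the method including the following steps:
Step 1: Obtain network traffic data packets.
Step 2: Collect the information features of the network traffic data packets, including packet size, packet control byte length, and flow duration, and vectorize these features.
Step 3: Fill in missing values in the vectorized network traffic data. In this embodiment, mean-value filling is used.
Step 4: Derive three additional features from the vectorized information features, as follows. The first new feature is the ratio of the number of packets in the C/S direction to the number of packets in the S/C direction. The second is the total number of bytes transmitted in both directions, obtained by adding the total bytes in the C/S direction to the total bytes in the S/C direction. The third is the share of control bytes, obtained by dividing the average number of control bytes per packet by the average number of bytes per packet.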
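The three derived features of Step 4 can be sketched as below. This is only an illustration: the function and parameter names are hypothetical, and returning 0.0 for a zero denominator is an assumption, since the text does not say how that case is handled.

```python
def derive_features(cs_packets, sc_packets, cs_bytes, sc_bytes,
                    avg_ctrl_bytes, avg_packet_bytes):
    """Return the three derived features of Step 4 as a tuple.

    All names are hypothetical. A zero denominator yields 0.0, which is
    an assumption; the source does not specify this case.
    """
    # Feature 1: ratio of C/S-direction packets to S/C-direction packets.
    packet_ratio = cs_packets / sc_packets if sc_packets else 0.0
    # Feature 2: total bytes transmitted in both directions.
    total_bytes = cs_bytes + sc_bytes
    # Feature 3: share of control bytes in an average packet.
    ctrl_share = avg_ctrl_bytes / avg_packet_bytes if avg_packet_bytes else 0.0
    return packet_ratio, total_bytes, ctrl_share
```

For a flow with 30 C/S packets and 10 S/C packets, the first feature is 3.0.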
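Step 5: Separate the training set and the test set. In this embodiment, the training set is the set of network traffic data packets handed to the improved K-means algorithm for clustering, from which the classification model is obtained. The test set is the set of network traffic data packets that are assigned to network application types according to the clustering result (that is, the classification model obtained by clustering), and whose assignment accuracy is used to evaluate the performance of the classification model. The union of the test set and the training set is the entire set of network traffic data packets, that is, the complete set. The test set makes up 20% of the complete set and the training set 80%.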
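A minimal sketch of the 80/20 split in Step 5 follows. The text gives only the proportions, so the random shuffle, the fixed seed, and the name `split_dataset` are assumptions made for illustration.

```python
import random


def split_dataset(flows, test_fraction=0.2, seed=0):
    """Split flows into (training set, test set).

    Shuffling with a fixed seed is an assumption; the source only states
    the 80% / 20% proportions.
    """
    shuffled = list(flows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The two parts are disjoint and their union is the complete set.
    return shuffled[n_test:], shuffled[:n_test]
```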
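Step 6: Select labeled network traffic data packets. Because this embodiment is based on unsupervised learning, all network traffic data packets are treated as unlabeled. However, if no packet carried a label, the clusters obtained by clustering could not be mapped to actual application types. Some packets are therefore randomly chosen and treated as labeled, so that the maximum likelihood method can map clusters to actual application types, which also better matches real conditions. Given a large number of unlabeled packets and a small number of labeled ones, the embodiment only needs to identify the unlabeled packets;
Step 7: Transform the vectorized network traffic data as a preprocessing step, applying first a logarithmic transformation, then standardization, and finally normalization. In this embodiment, the logarithmic transformation takes the data x to be transformed and produces the result x′ by the formula x′ = ln(x + 1), which brings the distribution of the information features closer to a normal distribution and thereby improves the accuracy with which the improved K-means algorithm identifies unlabeled packets.
As shown in FIG. 1, Steps 1 to 7 make up the preparation workflow before the initial cluster centers are selected.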
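The transformation chain of Step 7 can be sketched for a single feature column as follows. Only the logarithmic formula x′ = ln(x + 1) comes from the text; reading "standardization" as a z-score and "normalization" as min-max scaling to [0, 1] is an assumption, as are all function names.

```python
import math


def log_transform(column):
    # x' = ln(x + 1), the formula given in Step 7.
    return [math.log(x + 1.0) for x in column]


def standardize(column):
    # Z-score standardization (assumed meaning of "standardization").
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    if std == 0.0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mean) / std for x in column]


def normalize(column):
    # Min-max scaling to [0, 1] (assumed meaning of "normalization").
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(column):
    # Order taken from the text: log transform, standardize, normalize.
    return normalize(standardize(log_transform(column)))
```

Each feature column would be processed independently, so that features with very different ranges contribute comparably to the Euclidean distances used later.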
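Step 8: Select the initial cluster centers based on the improved K-means algorithm, abstracting network traffic data packets as network traffic data points. The specific steps are as follows:
Step (8.1): Define the number of network traffic data points as N;
Step (8.2): Define the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point as the n-th distance of the i-th network traffic data point (i = 1, 2, 3, ..., N);
Step (8.3): Define the n-th density $D_{in}$ of the i-th network traffic data point as the distribution density of all network traffic data points from the nearest to the n-th nearest to the i-th point (including both the nearest and the n-th nearest), within a multidimensional hypersphere centered on the i-th point whose radius r is the n-th distance of the i-th point, where
$$D_{in} = \frac{n - 0.5}{r + 1}$$
and there is exactly one network traffic data point at each rank from nearest to n-th nearest to the i-th point;
The hypersphere contains n - 1 points in its interior, and the n-th nearest network traffic data point lies on the shell of the hypersphere and can be counted as 0.5 of a point, so the numerator of $D_{in}$ is n - 1 + 0.5 = n - 0.5. To avoid insufficient numerical precision when the n-th distance of the i-th point is very small, the denominator of $D_{in}$ is taken as r + 1, the radius of the hypersphere plus one.
Step (8.4): Input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
Step (8.5): Determine the specific value of n in the n-th distance of the i-th network traffic data point, where
$$n = \frac{N}{8k}$$
The value of n depends on the number of network traffic clusters k: n is 1/8 of the average cluster size N/k, which this embodiment found experimentally to be the optimal choice;
Step (8.6): Calculate the average avg of the n-th densities of all network traffic data points, where
$$avg = \frac{1}{N}\sum_{i=1}^{N} D_{in}$$
Step (8.7): Add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
Step (8.8): Select the network traffic data point with the largest n-th density in the high-density set, add it to the initial cluster center set, and remove it from the high-density set;
Step (8.9): Determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k. If it has not, calculate the candidate metric value of each network traffic data point in the high-density set, select the point with the largest candidate metric value, add it to the initial cluster center set, and remove it from the high-density set. Repeat Step (8.9) until NIC reaches k, at which point Step (8.9) ends and the initial cluster centers are obtained,
where the candidate metric value of the j-th network traffic data point in the high-density set, denoted $cd_j$, is calculated as $cd_j = \min(\langle A_j, ic_1\rangle, \langle A_j, ic_2\rangle, \ldots, \langle A_j, ic_{NIC}\rangle)$; $A_j$ is the j-th network traffic data point in the high-density set, j = 1, 2, 3, ..., NHD; NHD is the total number of network traffic data points in the high-density set; $ic_1, ic_2, \ldots, ic_{NIC}$ are the 1st, 2nd, ..., NIC-th network traffic data points in the initial cluster center set; $\langle A_j, ic_1\rangle$ is the Euclidean distance between the j-th point in the high-density set and the 1st point in the initial cluster center set; and so on, up to $\langle A_j, ic_{NIC}\rangle$, the Euclidean distance between the j-th point in the high-density set and the NIC-th point in the initial cluster center set.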
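Steps (8.1) to (8.9) can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the name `select_initial_centers` is hypothetical, rounding n down and clamping it to at least 1 is an assumption (the text only says n is 1/8 of N/k), and the fallback for an empty high-density set is also an assumption. At least two points are required.

```python
import math


def select_initial_centers(points, k):
    """Pick up to k initial cluster centers from a list of coordinate tuples."""
    N = len(points)
    # Step (8.5): n is 1/8 of the average cluster size N/k.
    # Rounding down and clamping to [1, N - 1] are assumptions.
    n = max(1, min(N - 1, N // (8 * k)))

    # Steps (8.2)-(8.3): n-th distance r and n-th density (n - 0.5) / (r + 1).
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r = dists[n - 1]
        densities.append((n - 0.5) / (r + 1.0))

    # Steps (8.6)-(8.7): keep the points whose density exceeds the average.
    avg = sum(densities) / N
    high = [i for i in range(N) if densities[i] > avg]
    if not high:
        # Assumption: if every density equals avg, fall back to all points.
        high = list(range(N))

    # Step (8.8): the densest point becomes the first center.
    first = max(high, key=lambda i: densities[i])
    centers = [first]
    high.remove(first)

    # Step (8.9): repeatedly add the candidate with the largest cd_j, where
    # cd_j is the distance from candidate j to its nearest chosen center.
    # Fewer than k centers are returned if the candidates run out.
    while len(centers) < k and high:
        best = max(
            high,
            key=lambda j: min(math.dist(points[j], points[c]) for c in centers),
        )
        centers.append(best)
        high.remove(best)
    return [points[c] for c in centers]
```

Restricting candidates to the high-density set keeps isolated outliers from becoming centers, while the max-min rule spreads the chosen centers apart.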
Step 9: Use the initial cluster centers obtained in Step (8.9) to perform clustering.
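The text does not detail the clustering iterations of Step 9. The sketch below assumes the standard K-means (Lloyd) loop, started from the supplied initial centers instead of random ones; the name `kmeans`, the `max_iter` cap, and the handling of empty clusters are all assumptions.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Standard Lloyd iterations from given centers (assumed procedure)."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers
```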
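Step 10: Use the maximum likelihood method to establish the mapping between the network traffic clusters obtained by clustering and network application types. The specific steps are as follows:
Let $C = \{C_1, C_2, \ldots, C_k\}$ be the set of network traffic clusters obtained by clustering, where k is the total number of clusters, and let $R = \{R_1, R_2, \ldots, R_l\}$ be the set of network application types of the traffic, where l is the total number of network application types and l ≤ k. Suppose a mapping f: C → R exists between a cluster $C_i$ and a network application type $R_j$. Using the maximum likelihood method, the probability model of the mapping f is defined as
$$P(R_j \mid C_i) = \frac{n_{ji}}{N_i}$$
where $n_{ji}$ is the number of network traffic data points in cluster $C_i$ that are labeled as network application type $R_j$, and $N_i$ is the total number of labeled network traffic data points in cluster $C_i$;
Let $R_t$ be the network application type finally assigned to cluster $C_i$; then $R_t$ must satisfy the following condition:
$P(R_t \mid C_i) = \max[P(R_1 \mid C_i), P(R_2 \mid C_i), \ldots, P(R_l \mid C_i)]$.
If a network traffic cluster contains no labeled network traffic data points, the cluster is regarded as an unknown network application type. In this embodiment, the calculation of $P(R_j \mid C_i)$ is simplified: if, among the labeled network traffic data points inside a cluster, one network application type has the most points, the cluster is assigned to that type. As a result, several clusters are often mapped to the same network application type, and clusters and application types do not necessarily correspond one to one.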
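The simplified mapping rule of Step 10 amounts to a majority vote over the labeled points in each cluster, which can be sketched as below. The function name, the use of `None` for unlabeled points, and the string `"unknown"` are illustrative assumptions.

```python
from collections import Counter


def map_clusters_to_types(cluster_ids, labels, k):
    """Map each cluster index to a network application type.

    cluster_ids[i] is the cluster of point i; labels[i] is its application
    type, or None if the point is unlabeled (hypothetical encoding).
    """
    mapping = {}
    for c in range(k):
        counts = Counter(
            lab for cid, lab in zip(cluster_ids, labels)
            if cid == c and lab is not None
        )
        if counts:
            # N_i is constant within a cluster, so maximizing n_ji / N_i
            # is the same as picking the most frequent label.
            mapping[c] = counts.most_common(1)[0][0]
        else:
            mapping[c] = "unknown"  # no labeled point in this cluster
    return mapping
```

Several clusters may receive the same type, which matches the observation that clusters and application types need not correspond one to one.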
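As shown in FIG. 2, steps 8 to 10 form the detailed flow for selecting the initial cluster centers, clustering, and establishing the mapping between the resulting network traffic clusters and the network application types.
Based on the improved K-means algorithm, the present invention takes full account of the distribution of the network traffic data points and abandons randomly generated initial cluster centers. It thereby obtains better initial cluster centers for clustering, which improves training efficiency and helps ensure high network traffic classification accuracy.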
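An embodiment of the present invention provides a network traffic classification system based on the improved K-means algorithm, the system comprising:
a first definition module, configured to define the number of network traffic data points as N;
a second definition module, configured to define the n-th distance of the i-th network traffic data point as the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point, where i = 1, 2, 3, ..., N;
a third definition module, configured to define the n-th density D_in of the i-th network traffic data point as the distribution density, within a multidimensional hypersphere centered on the i-th network traffic data point with radius r equal to that point's n-th distance, of all network traffic data points from the nearest to the n-th nearest to the i-th point, where exactly one network traffic data point occupies each rank from nearest to n-th nearest, and where "all network traffic data points" includes both the nearest point and the n-th nearest point;
an input module, configured to input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
an n-value determination module, configured to determine the specific value of n used in the n-th distance of the i-th network traffic data point;
an average calculation module, configured to compute the average avg of the n-th densities of all network traffic data points;
a first adding module, configured to add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
a second adding module, configured to select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
a judgment module, configured to determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k and, if it has not, to compute the candidate metric of each network traffic data point in the high-density set, select the point with the largest candidate metric, add it to the initial cluster center set, and remove it from the high-density set, repeating the task of the judgment module until NIC reaches k, at which point the task of the judgment module ends.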
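An embodiment of the present invention further provides a network traffic classification system based on the improved K-means algorithm, comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate according to the instructions to perform the steps of the foregoing method.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the foregoing method.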
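Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a system for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction system that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and variations without departing from the technical principles of the present invention, and such improvements and variations shall also be regarded as falling within the scope of protection of the present invention.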

Claims (8)

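  1. A network traffic classification method based on an improved K-means algorithm, characterized in that the method comprises the following steps:
    Step 1: define the number of network traffic data points as N;
    define the n-th distance of the i-th network traffic data point as the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point, where i = 1, 2, 3, ..., N;
    define the n-th density D_in of the i-th network traffic data point as the distribution density, within a multidimensional hypersphere centered on the i-th network traffic data point with radius r equal to that point's n-th distance, of all network traffic data points from the nearest to the n-th nearest to the i-th point, where exactly one network traffic data point occupies each rank from nearest to n-th nearest, and where "all network traffic data points" includes both the nearest point and the n-th nearest point;
    input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
    determine the specific value of n used in the n-th distance of the i-th network traffic data point;
    compute the average avg of the n-th densities of all network traffic data points;
    add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set, where "high-density" means greater than the density average avg;
    select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
    Step 2: determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k; if it has not, compute the candidate metric of each network traffic data point in the high-density set, select the point with the largest candidate metric, add it to the initial cluster center set, and remove it from the high-density set; then repeat step 2 until NIC reaches k, at which point step 2 ends.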
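  2. The network traffic classification method based on an improved K-means algorithm according to claim 1, characterized in that the n-th density D_in of the i-th network traffic data point is computed as
    $$D_{in} = \frac{n - 0.5}{r + 1}$$
  3. The network traffic classification method based on an improved K-means algorithm according to claim 1, characterized in that the value of n in the n-th distance of the i-th network traffic data point is computed as
    $$n = \frac{N}{8k}$$
  4. The network traffic classification method based on an improved K-means algorithm according to claim 2, characterized in that the average avg of the n-th densities of all network traffic data points is computed as
    $$avg = \frac{1}{N}\sum_{i=1}^{N} D_{in}$$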
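  5. The network traffic classification method based on an improved K-means algorithm according to claim 1, characterized in that the candidate metric of the j-th network traffic data point in the high-density network traffic data point set, denoted cd_j, is computed as cd_j = min(<A_j, ic1>, <A_j, ic2>, ..., <A_j, icNIC>), where A_j is the j-th network traffic data point in the high-density set, j = 1, 2, 3, ..., NHD, and NHD is the total number of network traffic data points in the high-density set; ic1, ic2, ..., icNIC are the 1st, 2nd, ..., NIC-th network traffic data points in the initial cluster center set; <A_j, ic1> is the Euclidean distance between the j-th point of the high-density set and the 1st point of the initial cluster center set, and so on, up to <A_j, icNIC>, the Euclidean distance between the j-th point of the high-density set and the NIC-th point of the initial cluster center set.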
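  6. A network traffic classification system based on an improved K-means algorithm, characterized in that the system comprises:
    a first definition module, configured to define the number of network traffic data points as N;
    a second definition module, configured to define the n-th distance of the i-th network traffic data point as the Euclidean distance between the i-th network traffic data point and its n-th nearest network traffic data point, where i = 1, 2, 3, ..., N;
    a third definition module, configured to define the n-th density D_in of the i-th network traffic data point as the distribution density, within a multidimensional hypersphere centered on the i-th network traffic data point with radius r equal to that point's n-th distance, of all network traffic data points from the nearest to the n-th nearest to the i-th point, where exactly one network traffic data point occupies each rank from nearest to n-th nearest, and where "all network traffic data points" includes both the nearest point and the n-th nearest point;
    an input module, configured to input the set of network traffic data points to be clustered and the expected number of network traffic clusters k;
    an n-value determination module, configured to determine the specific value of n used in the n-th distance of the i-th network traffic data point;
    an average calculation module, configured to compute the average avg of the n-th densities of all network traffic data points;
    a first adding module, configured to add every network traffic data point whose n-th density is greater than avg to the high-density network traffic data point set;
    a second adding module, configured to select the network traffic data point with the largest n-th density in the high-density network traffic data point set, add it to the initial cluster center set, and remove it from the high-density network traffic data point set;
    a judgment module, configured to determine whether the total number NIC of network traffic data points in the initial cluster center set has reached the expected number of network traffic clusters k and, if it has not, to compute the candidate metric of each network traffic data point in the high-density set, select the point with the largest candidate metric, add it to the initial cluster center set, and remove it from the high-density set, repeating the task of the judgment module until NIC reaches k, at which point the task of the judgment module ends.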
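  7. A network traffic classification system based on an improved K-means algorithm, characterized by comprising a processor and a storage medium;
    the storage medium is configured to store instructions;
    the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1 to 5.
  8. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.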
PCT/CN2021/095793 2020-06-22 2021-05-25 Network traffic classification method and system based on improved K-means algorithm WO2021258961A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/846,908 US11570069B2 (en) 2020-06-22 2022-06-22 Network traffic classification method and system based on improved K-means algorithm

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010572022.0A CN111740921A (zh) 2020-06-22 2020-06-22 Network traffic classification method and system based on improved K-means algorithm
CN202010572022.0 2020-06-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/846,908 Continuation US11570069B2 (en) 2020-06-22 2022-06-22 Network traffic classification method and system based on improved K-means algorithm

Publications (1)

Publication Number Publication Date
WO2021258961A1 true WO2021258961A1 (zh) 2021-12-30

Family

ID=72650238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095793 WO2021258961A1 (zh) 2020-06-22 2021-05-25 Network traffic classification method and system based on improved K-means algorithm

Country Status (3)

Country Link
US (1) US11570069B2 (zh)
CN (1) CN111740921A (zh)
WO (1) WO2021258961A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
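CN111740921A (zh) * 2020-06-22 2020-10-02 南京邮电大学 Network traffic classification method and system based on improved K-means algorithm
CN116340830B (zh) * 2023-05-19 2023-08-18 山东通维信息工程有限公司 Fault classification method for expressway electromechanical systems based on a deep memory model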

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015060849A1 (en) * 2013-10-24 2015-04-30 Hewlett-Packard Development Company, L.P. Network traffic classification and redirection
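CN110009005A (zh) * 2019-03-15 2019-07-12 南京邮电大学 Network traffic classification method based on strong feature correlation
CN110365603A (zh) * 2019-06-28 2019-10-22 西安交通大学 Adaptive network traffic classification method based on 5G network capability exposure
CN111211994A (zh) * 2019-11-28 2020-05-29 南京邮电大学 Network traffic classification method based on a fused SOM and K-means algorithm
CN111740921A (zh) * 2020-06-22 2020-10-02 南京邮电大学 Network traffic classification method and system based on improved K-means algorithm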

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8676729B1 (en) * 2011-06-14 2014-03-18 Narus, Inc. Network traffic classification using subspace clustering techniques
US8694630B1 (en) * 2011-11-18 2014-04-08 Narus, Inc. Self-learning classifier for internet traffic
US9686173B1 (en) * 2014-10-27 2017-06-20 Narus, Inc. Unsupervised methodology to unveil content delivery network structures
US9729571B1 (en) * 2015-07-31 2017-08-08 Amdocs Software Systems Limited System, method, and computer program for detecting and measuring changes in network behavior of communication networks utilizing real-time clustering algorithms
CN107846326B (zh) * 2017-11-10 2020-11-10 北京邮电大学 Adaptive semi-supervised network traffic classification method, system and device

Also Published As

Publication number Publication date
US20220329504A1 (en) 2022-10-13
CN111740921A (zh) 2020-10-02
US11570069B2 (en) 2023-01-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21828426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21828426

Country of ref document: EP

Kind code of ref document: A1
