CN117478390A - Network intrusion detection method based on improved density peak clustering algorithm - Google Patents
- Publication number
- CN117478390A (application CN202311461821.0A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- point
- network intrusion
- outlier
- calculating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Probability & Statistics with Applications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to a network intrusion detection method based on an improved density peak clustering algorithm and belongs to the technical fields of machine learning and computer network security. The method comprises: encoding the character-type features in a network intrusion data set as numerical features and standardizing them; extracting features from the data set with principal component analysis to remove redundant data and reduce dimensionality; computing the neighbors of each record and using a natural neighbor search algorithm to find the value of k at which the neighbor relationships reach a stable state; computing the density of each point and selecting local representative points according to density; computing the distances between the local representative points and applying a density peak clustering algorithm to these distances to obtain clusters; and computing a cluster-based outlier factor for each cluster and treating the detected outliers as abnormal attack data. The method addresses two shortcomings of existing approaches: they often ignore clustered outliers, and current cluster-based intrusion detection methods cannot reliably identify manifold clusters.
Description
Technical Field
The invention belongs to the technical field of machine learning and computer network security, and relates to a network intrusion detection method based on an improved density peak clustering algorithm.
Background
With the rapid development of big data and artificial intelligence technology, networks keep growing in scale, which introduces more network security problems. Outlier detection plays an important role in network intrusion detection. Outliers are data points that deviate significantly from the rest of a data set because they were generated by a different mechanism or an unusual process; in network data sets, outliers are often produced by abnormal network attacks. Cluster-based network intrusion detection is usually applied offline: when the data volume is small, cluster-based methods can readily detect abnormal points and effectively identify burst attacks and isolated attacks. Common cluster-based intrusion detection techniques build on clustering methods such as K-means, DBSCAN, and density peak clustering. When these conventional methods are applied to network intrusion data sets that contain manifold clusters, they usually fail to identify the manifold clusters correctly, which reduces the representativeness of the outlier detection results.
Therefore, a new network intrusion detection method is needed to solve the above problems.
Disclosure of Invention
In view of the above, the present invention aims to provide a network intrusion detection method based on an improved density peak clustering algorithm, which uses the accurate clustering result produced by the improved algorithm to make local outliers, and therefore the outlier detection results, more representative. The invention evaluates the degree of outlierness of each resulting cluster and treats a small cluster as a whole; compared with outlier detection algorithms that target single-point outliers, it is more effective at detecting cluster-based outliers.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A network intrusion detection method based on an improved density peak clustering algorithm: obtain a historical network intrusion data set R; standardize the numerical values of R; apply principal component analysis to extract the main features and reduce dimensionality; for each point p_i in the reduced data set R, compute a limited number of neighbors and use a natural neighbor search algorithm to find the number of iterations at which a stable state is reached, denoted k; compute the density of each point p_i to obtain its local representative point (core); compute the shared-neighbor distances between the cores; apply a density peak clustering algorithm to the local representative points to obtain clusters C_1, C_2, …, C_k; compute an outlier factor for each cluster; and sort the clusters by outlier factor and select the lowest n clusters as the outlier detection result.
The method specifically comprises the following steps:
S1: Preprocess the historical network intrusion data set: uniformly encode the character-type features or labels in the data set as numerical values, standardize the values, and reduce the dimensionality of the standardized data;
S2: Build a ball tree based on Euclidean distance to compute a limited number of neighbors of each point p_i in the data set R, and traverse the ball tree to form an ordered k-nearest-neighbor matrix and an ordered distance matrix;
S3: Using the k-nearest-neighbor matrix and distance matrix obtained in step S2, adaptively obtain the iteration count k with a natural neighbor search algorithm;
S4: Compute the density ρ(p_i) of each point p_i according to the density calculation formula, and sort the density values to obtain a descending density matrix and its index matrix;
S5: For each point p_i, take the densest point among its k nearest neighbors as the local representative point (core) of p_i;
S6: Group the points p_i that share the same local representative point into an initial fuzzy sub-cluster;
S7: Compute the distances between the local representative points according to the formula, and from these obtain the shortest paths between the local representative points;
S8: Apply a density peak clustering algorithm to the local representative points: construct a two-dimensional decision graph, select the cluster centers, and assign the non-representative points to the clusters of their representative points, obtaining the final clusters C_1, C_2, …, C_k;
S9: From the chosen outlier proportion a, compute the outlier upper limit u using the formula;
S10: Compute the outlier factor of each cluster C_1, C_2, …, C_k according to the formula, sort the results, select the lowest n clusters as the final outlier detection result, and identify the data in those clusters as abnormal attacks.
In step S1, principal component analysis is applied to reduce the dimensionality of the standardized data.
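Step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the raw records arrive as Python lists, encodes each character-type column with integer codes assigned in order of first appearance (the text only says such features are encoded into numerical values; one-hot encoding would be an equally valid choice), applies z-score standardization, and performs PCA through NumPy's SVD. The function name `preprocess` and its parameters are hypothetical.

```python
import numpy as np

def preprocess(rows, categorical_cols, n_components):
    """Step S1 sketch: encode string columns, z-score standardize, reduce with PCA.

    rows: list of records (lists); categorical_cols: indices of character-type columns.
    Returns the projected data and the explained-variance ratio of each kept component.
    """
    numeric = []
    for j, col in enumerate(zip(*rows)):
        if j in categorical_cols:
            # Integer code per distinct string, assigned in order of first appearance.
            codes = {}
            numeric.append([codes.setdefault(v, len(codes)) for v in col])
        else:
            numeric.append([float(v) for v in col])
    X = np.array(numeric, dtype=float).T  # shape: (n_records, n_features)

    # Z-score standardization; constant columns get std 1 to avoid dividing by zero.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # PCA: Z is already centred, so its right singular vectors are the principal axes.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ vt[:n_components].T, explained[:n_components]
```

For example, `preprocess([["tcp", 1.0, 0.0], ["udp", 2.0, 1.0], ["tcp", 3.0, 0.0], ["icmp", 4.0, 1.0]], {0}, 2)` returns a 4 x 2 projection; the explained-variance ratios help decide how many components to keep.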
Further, in step S4, the density calculation formula is:
where ρ(p_i) denotes the density of point p_i, N_k(p_i) denotes the set of k nearest neighbors of p_i, and eu(p_i, o) denotes the Euclidean distance between point o and point p_i.
Further, in step S7, the formula for calculating the distance between the local representative points core is:
where inset(i, j) denotes the intersection of the fuzzy sub-cluster of representative point i and the fuzzy sub-cluster of representative point j.
Further, in step S7, the shortest path is acquired using the Floyd algorithm.
Further, in step S9, the calculation formula of the outlier upper limit is:
$$\{|C_1|+|C_2|+\dots+|C_{i-1}| \ge |R|\times a\} \cap \{|C_1|+|C_2|+\dots+|C_{i-2}| < |R|\times a\}$$
then the number of points in the cluster C_i corresponding to this i is the outlier upper limit, where |C| denotes the number of points in cluster C.
Further, in step S10, the calculation formula of the outlier factor of the cluster is:
where CBOF(C_i) denotes the outlier factor of cluster C_i, and C_j is the assumed normal cluster adjacent to cluster C_i; d(C_i, C_j) is calculated as:
$$d(C_i, C_j) = \min\{\, eu(p,q) \mid p \in C_i,\ q \in C_j \,\}$$
where d(C_i, C_j) denotes the shortest distance between clusters C_i and C_j.
The invention has the beneficial effects that:
1) The method starts from the observation that mapping a historical network intrusion data set from a high-dimensional space to a low-dimensional one yields a manifold data set: the reduced network intrusion data contain complex manifold clusters that existing cluster-based outlier detection methods struggle to identify accurately. By introducing an improved density peak clustering algorithm, the method achieves higher accuracy on network intrusion data sets.
2) Labeled samples are scarce in network intrusion data sets, while existing machine-learning-based intrusion detection models often require large labeled training sets, which limits their practicality. The invention uses an intrusion detection model based on unsupervised learning that needs no labeled samples for detection, which makes it more practical for network intrusion data sets.
3) Many single-point outliers are associated with sporadic, trivial events, whereas clustered outliers are associated with significant, persistent anomalies, such as network anomalies caused by attacks over a period of time. Compared with methods based on local outliers, the method does not compute the degree of outlierness of every point, only that of each cluster, which reduces the time cost.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a network intrusion detection method based on an improved density peak clustering algorithm of the present invention;
FIG. 2 shows the original outlier distribution on data set 1;
FIG. 3 shows the detection result of the proposed method on data set 1;
FIG. 4 shows the detection result of the isolation forest algorithm on data set 1;
FIG. 5 shows the detection result of the local outlier factor algorithm on data set 1;
FIG. 6 shows the original outlier distribution on data set 2;
FIG. 7 shows the detection result of the proposed method on data set 2;
FIG. 8 shows the detection result of the isolation forest algorithm on data set 2;
FIG. 9 shows the detection result of the local outlier factor algorithm on data set 2.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from this disclosure, which describes embodiments of the invention through specific examples. The invention may also be implemented or applied through other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that the illustrations in the following embodiments only show the basic idea of the invention schematically, and the embodiments and their features may be combined with each other where they do not conflict.
Referring to FIG. 1 to FIG. 9, the present invention provides a network intrusion detection method based on an improved density peak clustering algorithm. The data set is clustered with the improved density peak clustering algorithm to obtain a relatively accurate clustering result; the resulting clusters are evaluated by computing an outlier factor for each cluster; and the clusters with the largest degree of outlierness are selected according to the outlier factor ranking, so as to detect outliers caused by abnormal network attacks.
First, a limited number of neighbors of each point in the data set is computed using Euclidean distance, giving a distance matrix between each point and its neighbors, which is sorted. A natural neighbor search algorithm then finds the value of k at which the neighbor relationships reach a stable state. The density of each point is computed from its k nearest neighbors, representative points are selected, and the distances and shortest paths between the representative points are computed. Next, a density peak clustering algorithm is applied to the representative points to obtain the clustering result. Finally, the outlier upper limit is computed from the resulting clusters, the outlier factor of each cluster is computed, and the outlier factors are sorted to obtain the final outlier result.
(1) Computing the neighbors of each data point:
A ball tree based on Euclidean distance is built and traversed to query a limited number of neighbors of each node, producing the ordered k-nearest-neighbor matrix and distance matrix.
(2) The natural neighbor search algorithm:
Based on the ball-tree query results, k is increased iteratively. A stable state is reached when every node appears in the k-nearest-neighbor list of at least one other node, or when the number of data objects that appear in no other node's k-nearest-neighbor list stops changing. The k nearest neighbors at this point are called natural neighbors.
(3) The density of each node is calculated as follows:
where ρ(p_i) denotes the density of point p_i, p_i is any node in the data set, k is the result of the natural neighbor search iteration, N_k(p_i) denotes the set of k nearest neighbors of p_i, and eu(p_i, o) denotes the Euclidean distance between point o and point p_i.
eu(p_i, o) is the Euclidean distance over the m dimensions retained after dimensionality reduction:
$$eu(p_i, o) = \sqrt{\sum_{d=1}^{m} (p_{i,d} - o_d)^2}$$
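The density formula itself is not reproduced in this text, so the sketch below substitutes an assumed, commonly used k-nearest-neighbor density, ρ(p_i) = k / Σ_{o∈N_k(p_i)} eu(p_i, o), purely for illustration; it is not the patent's formula. It also produces the descending density values and index array required by step S4. `eu` and `knn_density` are hypothetical names.

```python
import numpy as np

def eu(p, o):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(o, dtype=float)))

def knn_density(knn_dist, k):
    """ASSUMED stand-in density: rho(p_i) = k / sum of distances to its k nearest neighbours.

    knn_dist: sorted distance matrix from the neighbor search (one row per point).
    Returns the densities, the densities in descending order, and the matching index array.
    """
    total = knn_dist[:, :k].sum(axis=1)
    rho = k / np.maximum(total, 1e-12)  # epsilon guards against duplicate points
    order = np.argsort(-rho, kind="stable")
    return rho, rho[order], order
```

Any density that grows as the k nearest neighbors get closer could be swapped in; only the descending order feeds the later steps.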
(4) Selecting representative points:
If a node has no representative point yet, the densest point among its k nearest neighbors is selected as its representative point. If it already has a representative point and there is a point that is both denser and closer than the current representative point, the representative point is replaced by that point; any points whose representative point was the replaced point are likewise reassigned to the denser, closer point.
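A possible reading of the representative-point rule, sketched under assumptions: each point first takes the densest point among itself and its k nearest neighbors, and the chain replacement is modelled by repeatedly redirecting a point to its representative's representative until nothing changes. Because a point only ever points to a strictly denser one, the loop always terminates. The explicit "closer distance" test in the text is not modelled separately here, and `local_cores` is a hypothetical name.

```python
import numpy as np

def local_cores(knn_idx, rho, k):
    """Assign each point a local representative point (core)."""
    n = len(rho)
    rep = np.empty(n, dtype=int)
    for i in range(n):
        # Candidates: the point itself plus its k nearest neighbours.
        # argmax returns the first maximum, so a point keeps itself on ties.
        cand = np.concatenate(([i], knn_idx[i, :k]))
        rep[i] = cand[np.argmax(rho[cand])]
    # Chain replacement: follow rep[rep[i]] until every representative is its own core.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if rep[rep[i]] != rep[i]:
                rep[i] = rep[rep[i]]
                changed = True
    return rep
```

With densities 1.0, 2.0, 3.0, 2.5 and first neighbors 1, 2, 1, 2, point 0 first picks point 1, whose own core is point 2, so all four points end up with core 2.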
(5) Dividing the clusters of representative points and calculating the distances between representative points:
Before the calculation, the clusters must be divided: data points that share the same representative point are placed in the same cluster, and the k nearest neighbors of each point in that cluster that are not already in it are also added to the cluster. When two clusters intersect, the distance is calculated as:
where inset(i, j) is the set of data points in the intersection of the cluster of representative point i and the cluster of representative point j. When two clusters do not intersect, the distance between their representative points is set to the maximum value. This yields an adjacency matrix whose nodes are the representative points and whose edge weights are the distances between them.
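The construction of the fuzzy sub-clusters and the adjacency matrix can be sketched as below. Because the intersection-based distance formula is not reproduced in this text, the edge weight is left to a caller-supplied function `weight_fn(core_i, core_j, shared_points)`; the weight used in the usage example is a hypothetical placeholder, not the patent's formula. "No intersection" is encoded as `np.inf`, an assumed stand-in for "the maximum value". All names are hypothetical.

```python
import numpy as np

def fuzzy_subclusters(rep, knn_idx, k):
    """Group points by representative, then add each member's k-NN (clusters may overlap)."""
    clusters = {}
    for i, c in enumerate(rep):
        clusters.setdefault(int(c), set()).add(i)
    for members in clusters.values():
        # Build the extension first, then merge, so the set is not mutated while iterated.
        extra = {int(q) for i in members for q in knn_idx[i, :k]}
        members |= extra
    return clusters

def core_adjacency(clusters, weight_fn, no_edge=np.inf):
    """Adjacency matrix over cores; weight_fn(core_i, core_j, shared_points) is caller-supplied."""
    cores = sorted(clusters)
    m = len(cores)
    W = np.full((m, m), no_edge)
    np.fill_diagonal(W, 0.0)
    for a in range(m):
        for b in range(a + 1, m):
            shared = clusters[cores[a]] & clusters[cores[b]]  # inset(i, j)
            if shared:
                W[a, b] = W[b, a] = weight_fn(cores[a], cores[b], shared)
    return cores, W
```

Usage with a placeholder weight: `core_adjacency(clusters, lambda ci, cj, shared: 1.0 / len(shared))` makes cores that share more points "closer"; the real weight should follow the patent's formula.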
(6) Calculating the shortest paths between representative points:
The shortest paths between representative points are calculated with the Floyd algorithm, a classical algorithm that computes the shortest paths between all pairs of nodes in a weighted graph. Its state transition equation is:
matrix[i,j]=min{matrix[i,k]+matrix[k,j],matrix[i,j]}
where matrix[i, j] denotes the shortest distance from point i to point j, and k is an intermediate point that a path from i to j may pass through. The intermediate nodes on each path are then visited recursively to output the shortest path between each pair of representative points.
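A minimal Floyd-Warshall sketch of the state transition equation above, with a next-hop table for path reconstruction. Reconstruction here is iterative rather than recursive, which yields the same paths. Unconnected pairs are assumed to carry an infinite weight, and the function names are hypothetical.

```python
import math

def floyd_warshall(W):
    """All-pairs shortest paths; W is a square list of lists with math.inf for no edge."""
    n = len(W)
    dist = [row[:] for row in W]
    # nxt[i][j] is the first hop on the current best path from i to j (None if unreachable).
    nxt = [[j if math.isfinite(W[i][j]) else None for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # matrix[i,j] = min(matrix[i,k] + matrix[k,j], matrix[i,j])
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def shortest_path(nxt, i, j):
    """Walk the next-hop table to list the nodes on the shortest path from i to j."""
    if nxt[i][j] is None:
        return []
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path
```

For a chain 0 -(1)- 1 -(2)- 2 with no direct 0-2 edge, the sketch gives a distance of 3 and the path [0, 1, 2]. The triple loop costs O(n^3), which stays manageable because it runs only on the representative points.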
(7) Applying the density peak clustering algorithm:
Calculate the relative distance δ of each representative point p_i, i.e., the distance between p_i and the nearest point whose density is greater than that of p_i. Plot a two-dimensional decision graph with the density ρ(p_i) on the horizontal axis and δ on the vertical axis, and manually select the points with large δ and large ρ as cluster centers. Each remaining (non-center) representative point is then assigned to the cluster of its nearest denser representative point, and the points in each representative point's cluster are assigned to the cluster of the corresponding center, giving the clustering result C_1, C_2, …, C_k.
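A sketch of step S8 on the representative points, assuming `dist` holds the shortest-path distances from the Floyd step and `rho` their densities. The manual choice of centers on the decision graph is replaced here by an assumed automatic rule (keep the densest point plus the points with the largest γ = ρ·δ), a common heuristic in density peak clustering; following the usual convention, δ of the densest point is set to its largest distance to any other point. All names are hypothetical.

```python
def density_peaks(dist, rho, n_centers):
    """Density peak clustering on representative points.

    dist: n x n shortest-path distances; rho: densities; n_centers >= 1.
    Returns delta, the chosen centres, and a cluster label per representative point.
    """
    n = len(rho)
    order = sorted(range(n), key=lambda i: -rho[i])  # descending density
    delta = [0.0] * n
    parent = [-1] * n  # nearest point with higher density
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda q: dist[i][q])
        delta[i], parent[i] = dist[i][j], j
    top = order[0]
    # Convention: delta of the densest point is its largest distance to any other point.
    delta[top] = max((dist[top][q] for q in range(n) if q != top), default=0.0)
    # ASSUMPTION: automatic stand-in for manual selection on the decision graph.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [top] + [i for i in ranked if i != top][: n_centers - 1]
    label = [-1] * n
    for cid, c in enumerate(centers):
        label[c] = cid
    for i in order:  # denser points are labelled first, so parent[i] already has a label
        if label[i] == -1:
            label[i] = label[parent[i]]
    return delta, centers, label

def propagate_to_points(rep, cores, core_label):
    """Give every data point the label of its representative point."""
    pos = {c: p for p, c in enumerate(cores)}
    return [core_label[pos[r]] for r in rep]
```

Plotting `rho` against `delta` reproduces the decision graph, so the automatic rule can be replaced by picking centers by eye, as the text describes.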
(8) The outlier upper limit is calculated as follows:
Let the clustering result obtained by density peak clustering be C_1, C_2, …, C_k, ordered by the number of data points in each cluster so that |C_1| ≥ |C_2| ≥ … ≥ |C_k|. Given the parameter a, find the cluster index i that satisfies:
$$\{|C_1|+|C_2|+\dots+|C_{i-1}| \ge |R|\times a\} \cap \{|C_1|+|C_2|+\dots+|C_{i-2}| < |R|\times a\}$$
The number of data points in cluster C_i is then the outlier upper limit.
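The upper-limit rule can be checked with a short sketch that applies the condition literally to cluster sizes sorted in descending order. It returns `None` when no such C_i exists, an assumption since the text does not cover that case; the function name is hypothetical.

```python
def outlier_upper_limit(cluster_sizes, n_total, a):
    """Outlier upper limit u = |C_i|, where |C_1|+...+|C_{i-1}| is the first prefix >= |R|*a.

    cluster_sizes: sizes of C_1..C_k in any order (sorted descending here); 0 < a <= 1.
    """
    sizes = sorted(cluster_sizes, reverse=True)
    threshold = n_total * a
    prefix = 0
    for m, size in enumerate(sizes, start=1):  # m clusters summed so far
        prefix += size
        if prefix >= threshold:
            # First m with prefix >= |R|*a, so the prefix over m-1 clusters is < |R|*a; i = m + 1.
            i = m + 1
            return sizes[i - 1] if i <= len(sizes) else None
    return None
```

For sizes 50, 30, 10, 5, 5 with |R| = 100 and a = 0.7, the first two clusters reach 80 ≥ 70 while the first one alone (50) does not, so i = 3 and u = |C_3| = 10.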
(9) The outlier factor is calculated as:
where C_j is the assumed normal cluster adjacent to cluster C_i, and d(C_i, C_j) is calculated as:
$$d(C_i, C_j) = \min\{\, eu(p,q) \mid p \in C_i,\ q \in C_j \,\}$$
The higher the CBOF, the more anomalous the cluster-based outlier. All outlier scores are therefore sorted in descending order, and the n clusters whose sizes do not exceed the outlier upper limit are selected as the outlier result.
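The distance d(C_i, C_j) and the final ranking can be sketched as follows. Several parts are assumptions because the CBOF formula is not reproduced in this text: clusters with more than u points are treated as the "normal" clusters, C_j is taken as the normal cluster nearest to C_i, and the outlier factor is a caller-supplied `factor_fn(size, d)` whose default (simply d) is a hypothetical placeholder. Clusters with at most u points are ranked by descending score and the top n are returned.

```python
import numpy as np

def min_cluster_distance(A, B):
    """d(C_i, C_j) = min{ eu(p, q) | p in C_i, q in C_j }, computed by brute force."""
    diff = A[:, None, :] - B[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())

def rank_outlier_clusters(X, labels, u, n, factor_fn=lambda size, d: d):
    """Score clusters with at most u points and return the n highest-scoring cluster ids.

    ASSUMPTIONS: clusters larger than u are 'normal'; C_j is the nearest normal cluster;
    factor_fn(size, d) stands in for the CBOF formula (the default is a placeholder).
    """
    labels = np.asarray(labels)
    ids = [int(c) for c in np.unique(labels)]
    sizes = {c: int(np.sum(labels == c)) for c in ids}
    normal = [c for c in ids if sizes[c] > u]
    if not normal:
        return [], {}
    scores = {}
    for c in ids:
        if sizes[c] <= u:
            d = min(min_cluster_distance(X[labels == c], X[labels == q]) for q in normal)
            scores[c] = factor_fn(sizes[c], d)
    ranked = sorted(scores, key=scores.get, reverse=True)  # higher score = more abnormal
    return ranked[:n], scores
```

The brute-force distance is quadratic in the cluster sizes; since only small candidate clusters are compared against normal ones, this is usually acceptable.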
Comparison experiment:
The experiment uses the KDD Cup 99 data set for verification. This data set is widely used in network intrusion detection research to evaluate the performance and accuracy of intrusion detection systems. It is derived from the 1998 DARPA intrusion detection evaluation traffic, collected by MIT Lincoln Laboratory on a network simulating a U.S. Air Force LAN, and contains normal traffic and attack traffic; the attacks fall into four categories, and the data set has about 490,000 records with 41 features.
Two subsets with different proportions of abnormal attacks were extracted from the KDD Cup 99 data set as test objects by selecting rectangular regions: data set 1 contains 8710 records, of which 2178 are abnormal; data set 2 contains 9818 records, of which 303 are abnormal. The two data sets therefore have different outlier ratios: about 25% (2178/8710) for data set 1 and about 3% (303/9818) for data set 2. The abnormal data include anomalies caused by different attack types, but this embodiment simplifies intrusion detection to a binary classification problem, i.e., it only detects whether a record is a network attack. The local outlier factor algorithm and the isolation forest algorithm are two classical outlier detection methods. The detection results of the proposed method and of these two classical algorithms are visualized with normal points drawn as solid circles and outliers as solid pentagons. The results are shown in FIGS. 2 to 9.
1) In this embodiment, the evaluation metrics are precision, recall, and F1-score. Precision is the proportion of samples detected as attacks that are actually attacks; recall is the proportion of actual attack samples that are detected; and the F1-score is the harmonic mean of precision and recall. Precision reflects the model's ability to avoid flagging normal (negative) samples, recall reflects its ability to identify attack (positive) samples, and the F1-score combines the two: the higher the F1-score, the better the model balances them. Table 1 lists these metrics for the proposed method and for the two classical outlier detection methods, the local outlier factor algorithm and the isolation forest algorithm, on data sets 1 and 2; the data show that the proposed method performs clearly better on the network intrusion data sets than the conventional outlier detection methods.
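For reference, the standard binary definitions (attack = positive class) can be computed as below; accuracy is included alongside precision, recall, and F1. This is a generic sketch, not the authors' evaluation code, and the function name is hypothetical.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary attack/normal labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

For labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1], there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3 while accuracy is 0.6.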
Table 1: Test results
2) As the data in Table 1 show, both classical algorithms perform poorly on the data set with an outlier ratio of about 25%, whereas the proposed method performs well. On the data set with a 3% anomaly ratio, both classical algorithms perform better, while the proposed method shows only a small drop in performance. In general, common intrusion detection methods suit data sets with a low anomaly ratio: their performance declines as the number of abnormal points grows, and they identify anomalous clusters caused by network attacks poorly. The proposed method generalizes better, is largely unaffected by the anomaly ratio of the data set, specifically targets anomalous clusters, and is therefore suitable for network intrusion detection.
The experimental results show that the proposed method detects abnormal attack points well in network intrusion data sets with different anomaly ratios and is robust.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.
Claims (7)
1. A network intrusion detection method based on an improved density peak clustering algorithm, characterized by comprising the following steps:
S1: Preprocess the historical network intrusion data set: uniformly encode the character-type features or labels in the data set as numerical values, standardize the values, and reduce the dimensionality of the standardized data;
S2: Build a ball tree based on Euclidean distance to compute a limited number of neighbors of each point p_i in the data set R, and traverse the ball tree to form an ordered k-nearest-neighbor matrix and an ordered distance matrix;
S3: Using the k-nearest-neighbor matrix and distance matrix obtained in step S2, adaptively obtain the iteration count k with a natural neighbor search algorithm;
S4: Compute the density ρ(p_i) of each point p_i according to the density calculation formula, and sort the density values to obtain a descending density matrix and its index matrix;
S5: For each point p_i, take the densest point among its k nearest neighbors as the local representative point (core) of p_i;
S6: Group the points p_i that share the same local representative point into an initial fuzzy sub-cluster;
S7: Compute the distances between the local representative points according to the formula, and from these obtain the shortest paths between the local representative points;
S8: Apply a density peak clustering algorithm to the local representative points: construct a two-dimensional decision graph, select the cluster centers, and assign the non-representative points to the clusters of their representative points, obtaining the final clusters C_1, C_2, …, C_k;
S9: From the chosen outlier proportion a, compute the outlier upper limit u using the formula;
S10: Compute the outlier factor of each cluster C_1, C_2, …, C_k according to the formula, sort the results, select the lowest n clusters as the final outlier detection result, and identify the data in those clusters as abnormal attacks.
2. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 1, wherein in step S1, a principal component analysis method is applied to perform data dimension reduction on the normalized data.
3. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 1, wherein in step S4, the density calculation formula is:
where ρ(p_i) denotes the density of point p_i, N_k(p_i) denotes the set of k nearest neighbors of p_i, and eu(p_i, o) denotes the Euclidean distance between point o and point p_i.
4. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 3, wherein in step S7, the formula for calculating the distance between the local representative points (cores) is:
where inset(i, j) denotes the intersection of the fuzzy sub-clusters of representative points i and j.
5. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 1, wherein in step S7, the shortest path is obtained by using the Floyd algorithm.
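Claim 5 names the Floyd (Floyd-Warshall) all-pairs shortest-path algorithm. The following is a minimal NumPy sketch. It assumes the direct core-to-core distances from the claim 4 formula, which is not reproduced in this text, have already been placed in a square matrix `W`. In `W`, `np.inf` marks pairs of cores that are not directly connected and the diagonal is 0. The function name is hypothetical.

```python
import numpy as np

def floyd_warshall(W) -> np.ndarray:
    """All-pairs shortest path lengths between cores."""
    D = np.array(W, dtype=float, copy=True)
    n = D.shape[0]
    for m in range(n):
        # relax every pair (i, j) through intermediate vertex m via broadcasting
        D = np.minimum(D, D[:, m:m + 1] + D[m:m + 1, :])
    return D
```

Vectorizing the two inner loops keeps the O(n³) algorithm readable. Because the number of cores is usually much smaller than the number of data points, the cubic cost applies to the reduced set.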
6. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 4, wherein in step S9, the calculation formula of the outlier upper limit is:
if i satisfies

$$\left\{|C_1|+|C_2|+\dots+|C_{i-1}| \ge |R|\times a\right\}\cap\left\{|C_1|+|C_2|+\dots+|C_{i-2}| < |R|\times a\right\}$$

then the number of points in the corresponding cluster C_i is the outlier upper limit u, where |C| denotes the number of points in cluster C.
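The claim 6 condition selects the index i at which the running total |C_1| + … + |C_{i-1}| first reaches |R| × a, and then takes |C_i| as the upper limit. The minimal Python sketch below rests on three assumptions. The clusters are already indexed in the order the patent intends, which is not stated in this section. |R| is passed in as `total_points`. The function name is hypothetical. The sketch returns `None` when no such i exists among the clusters.

```python
from typing import Optional, Sequence

def outlier_upper_limit(cluster_sizes: Sequence[int], total_points: int, a: float) -> Optional[int]:
    """Return |C_i| for the i satisfying the claim 6 condition (1-based i)."""
    threshold = total_points * a                 # |R| x a
    cumulative = 0                               # running |C_1| + ... + |C_{i-2}|
    for i in range(2, len(cluster_sizes) + 1):
        prev = cumulative                        # sum up to index i-2
        cumulative += cluster_sizes[i - 2]       # add |C_{i-1}|
        if cumulative >= threshold and prev < threshold:
            return cluster_sizes[i - 1]          # |C_i|
    return None
```

For example, with cluster sizes 2, 3, 5, 10, 80, |R| = 100 and a = 0.05, the running total first reaches 5 at i - 1 = 2. The function then returns |C_3| = 5.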
7. The network intrusion detection method based on the improved density peak clustering algorithm according to claim 6, wherein in step S10, the calculation formula of the outlier factor of the cluster is:
where CBOF(C_i) denotes the outlier factor of cluster C_i, and C_j is the hypothetical normal cluster nearest to C_i; d(C_i, C_j) is calculated as:
$$d(C_i, C_j) = \min\{\, eu(p, q) \mid p \in C_i,\ q \in C_j \,\}$$
where d(C_i, C_j) denotes the shortest distance between clusters C_i and C_j.
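The cluster distance d(C_i, C_j) is a single-linkage distance, that is, the minimum pairwise Euclidean distance. The sketch below does three things:

* computes d(C_i, C_j);
* finds the nearest normal cluster C_j for a given C_i;
* performs the step S10 ranking that keeps the n lowest scores.

The CBOF formula itself is not reproduced in this text, so the scores are taken as given. Which clusters count as normal is an assumption supplied by the caller, for example clusters larger than the outlier upper limit u. All function names are hypothetical.

```python
import numpy as np

def cluster_distance(Ci: np.ndarray, Cj: np.ndarray) -> float:
    """d(C_i, C_j) = min{ eu(p, q) | p in C_i, q in C_j }; rows are points."""
    diff = Ci[:, None, :] - Cj[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())

def nearest_normal_cluster(i: int, clusters: list, normal_ids: list) -> int:
    """Index j of the normal cluster closest to cluster i (excluding i itself)."""
    candidates = [j for j in normal_ids if j != i]
    return min(candidates, key=lambda j: cluster_distance(clusters[i], clusters[j]))

def lowest_n(scores: dict, n: int) -> list:
    """S10: sort cluster ids by outlier factor ascending and keep the n lowest."""
    return sorted(scores, key=scores.get)[:n]
```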
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311461821.0A CN117478390A (en) | 2023-11-06 | 2023-11-06 | Network intrusion detection method based on improved density peak clustering algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311461821.0A CN117478390A (en) | 2023-11-06 | 2023-11-06 | Network intrusion detection method based on improved density peak clustering algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117478390A (en) | 2024-01-30 |
Family ID: 89630773
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311461821.0A Pending CN117478390A (en) | 2023-11-06 | 2023-11-06 | Network intrusion detection method based on improved density peak clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117478390A (en) |
Timeline: 2023-11-06, CN application CN202311461821.0A (publication CN117478390A), status: active, pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117714215A (en) * | 2024-02-06 | 2024-03-15 | 江苏开博科技有限公司 | Real-time network threat detection method and functional equipment |
CN117714215B (en) * | 2024-02-06 | 2024-04-23 | 江苏开博科技有限公司 | Real-time network threat detection method and functional equipment |
CN118228075A (en) * | 2024-05-23 | 2024-06-21 | 国网山东省电力公司宁津县供电公司 | Single-user power failure automatic alarm monitoring method and system based on big data |
CN118337525A (en) * | 2024-06-07 | 2024-07-12 | 蓝海睿创科技(山东)有限责任公司 | Cloud asset security management system based on big data |
Similar Documents
Publication | Title |
---|---|
CN117478390A (en) | Network intrusion detection method based on improved density peak clustering algorithm | |
CN111211994B (en) | Network traffic classification method based on SOM and K-means fusion algorithm | |
CN110460605B (en) | Abnormal network flow detection method based on automatic coding | |
CN111556016B (en) | Network flow abnormal behavior identification method based on automatic encoder | |
CN107579846B (en) | Cloud computing fault data detection method and system | |
CN109873779A (en) | A kind of grading type wireless identification of signal modulation method based on LSTM | |
WO2018006631A1 (en) | User level automatic segmentation method and system | |
CN112926403A (en) | Unsupervised pedestrian re-identification method based on hierarchical clustering and difficult sample triples | |
CN112529638B (en) | Service demand dynamic prediction method and system based on user classification and deep learning | |
CN115913691A (en) | Network flow abnormity detection method and system | |
CN112115957A (en) | Data stream identification method and device and computer storage medium | |
CN115580445A (en) | Unknown attack intrusion detection method, device and computer readable storage medium | |
CN115374851A (en) | Gas data anomaly detection method and device | |
Jinyin et al. | Fast density clustering algorithm for numerical data and categorical data | |
CN115344693A (en) | Clustering method based on fusion of traditional algorithm and neural network algorithm | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
CN110544047A (en) | Bad data identification method | |
CN117575745A (en) | Course teaching resource individual recommendation method based on AI big data | |
CN115827932A (en) | Data outlier detection method, system, computer device and storage medium | |
CN115830413A (en) | Image feature library updating method, image feature library checking method and related equipment | |
CN114124437B (en) | Encrypted flow identification method based on prototype convolutional network | |
CN115563520A (en) | Semi-supervised learning method based on kmeans clustering and application thereof | |
CN112014821B (en) | Unknown vehicle target identification method based on radar broadband characteristics | |
CN116976574A (en) | Building load curve dimension reduction method based on two-stage hybrid clustering algorithm | |
CN114547601A (en) | Random forest intrusion detection method based on multi-layer classification strategy |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |