CN113780437B - Improved method of DPC clustering algorithm - Google Patents
- Publication number: CN113780437B (application CN202111080561.3A)
- Authority: CN (China)
- Prior art keywords: center, cluster, clustering, density, data points
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention provides an improved method of the DPC clustering algorithm, comprising the following steps: S1, selecting initial clustering centers through the mean distance and the cutoff center; S2, clustering all data points by their Euclidean distance to each initial clustering center using the K-Means assignment strategy; S3, updating the cluster centers, performing center shifts, reassigning memberships to all data points, and repeating; S4, judging whether center fusion is needed between clusters; if center fusion is needed, performing it following the idea of an iterative fusion method to obtain a new clustering result; if not, keeping the final clustering result of S3. The invention provides a new clustering idea: a method that finds initial clustering centers based on the maximum mean distance and fuses clusters based on high-density connection, applying the idea of iterative fusion for center fusion to obtain a better clustering result.
Description
Technical Field
The invention relates to the technical field of data analysis and mining, in particular to an improved method of the DPC (Density Peaks Clustering) algorithm.
Background
Cluster analysis is an active research field that involves many subjects, such as data mining, pattern recognition, machine learning, and data analysis. With the development of technology, the era of big data has arrived, and the information contained in data is highly valuable. Cluster analysis aims to divide objects into groups using only the data that describes the objects and their relationships, so that objects within a group are similar to one another and objects in different groups are dissimilar.
A clustering is essentially a set of clusters that together usually contain all the objects in the data set (some algorithms recognize noise, and noise points are generally considered to belong to a noise cluster). A clustering may also specify the relationship of clusters to each other, for example a hierarchy of clusters embedded in one another. The better-known clustering methods, listed by cluster model, are: 1) connectivity-based cluster models, corresponding to hierarchical clustering: hierarchical clustering algorithms have high time complexity. 2) Centroid-based cluster models, corresponding to partitional clustering: such algorithms generally suffer from the difficulty of determining the number of clusters k and of selecting the initial center points. 3) Distribution-based cluster models, corresponding to model-based clustering: although the theoretical basis of these methods is very good, such algorithms are often prone to overfitting. 4) Grid-based cluster models, corresponding to grid clustering, which involve the following disadvantages: (1) poor clustering effect; (2) low accuracy. 5) Density-based cluster models, corresponding to density clustering, which still have some problems to be solved: (1) the complexity O(n²) is high, making them unsuitable for cluster analysis of large-scale data; (2) the process is not adaptive and intrinsic parameters cannot be adjusted automatically; for example, the density peaks and d_c cannot be selected adaptively; (3) the accuracy is easily affected: when DPC computes the local density, ignoring the local structure of the data can cause clusters to be lost, i.e., the "false peak" and "no peak" phenomena, which harms clustering accuracy; (4) applicability to high-dimensional data is poor, because many dimensions of high-dimensional data are independent of one another, which can cause some clusters to be lost.
Disclosure of Invention
The present invention provides an improved method of DPC clustering algorithms to overcome the above-described problems.
In order to achieve the above object, the present invention provides the following technical solutions:
s1, selecting an initial clustering center through the mean value distance and the cut-off center;
s2, clustering according to Euclidean distances from all data points to each initial clustering center by adopting a K-Means allocation strategy;
s3, updating the cluster center, performing center offset, reassigning attributions to all data points, and repeating the operation; when the Euclidean distance between two points of the new cluster center and the old cluster center is smaller than a set value, stopping updating the cluster center, and taking the last clustering result as a final clustering result;
s4, judging whether center fusion is needed or not between clusters obtained after updating the cluster center; if center fusion is needed, adopting the thought of an iterative fusion method to perform center fusion, and obtaining a new clustering result; if not, the final clustering result in S3 is adopted.
Further, continuously updating the cluster center includes:
adopting the K-Means cluster-center update strategy: computing the mean of all data points in each cluster as the new cluster center of that cluster.
Further, reassigning memberships to all data points as described in S3 includes:
reassigning each data point according to the new cluster centers: using the K-Means assignment strategy, computing the distance from every data point to each new cluster center and re-clustering the data points by that distance.
Further, the center fusion in S4 includes:
s511, traversing all cluster centers and judging pairwise whether two clusters should be fused; when a pair of clusters to be fused is found, stopping the judgment and returning the two cluster centers to be fused;
s512, computing the density of the two cluster centers, taking the label of the denser of the two centers as the fused cluster label, and returning the label assignment after fusion;
s513, re-binding the data set with the labels and marking it; computing the mean center of the points sharing the fused cluster label as the fused cluster center;
s514, returning the new set of cluster centers after fusion;
s515, iterating the pairwise center fusion with the newly obtained cluster centers until no further fusion is possible; center fusion then ends.
Further, determining whether center fusion is required between clusters includes:
s521, taking the straight line distance between two cluster centers as the diameter, taking the midpoint of the straight line distance between the two cluster centers as the center of a circle, and finding out the data points respectively belonging to the two clusters in the circle;
s522, finding out paired pseudo core data points, which specifically comprises the following steps: calculating the distance between data points respectively belonging to two clusters in a circle, and finding out paired points with the distance between the two data points smaller than the truncated radius, namely paired pseudo-core data points; if no paired pseudo core data points exist, the two clusters cannot be fused;
the cutoff radius d_c is calculated as:

d_c = maxDist × distPercent / 100 (1)

wherein maxDist is the maximum value in the distance vector distList; distPercent is a cutoff percentage, whose value is adjusted according to the characteristics of each data set;
s523, finding out paired true core data points, which specifically comprises the following steps: finding two paired data points with the density being greater than the minimum density from paired pseudo-core data points, namely paired true-core data points; if no pair of true core data points exist, the two clusters cannot be fused; the minimum density is a manually set value;
s524, judging whether the paired true core data points and the two cluster centers are high-density connected.
High-density connection includes:
dividing the distance from each true core data point to the cluster center on its side by the cutoff radius to obtain the high-density threshold for that side; for each side, taking the straight-line distance between the true core data point and the cluster center on the same side as a diameter and its midpoint as the circle center, drawing a circle and counting the high-density data points inside it; if the number of high-density points is at least the side's high-density threshold, and the maximum local density in the circle is no more than twice the minimum local density, that side is high-density connected; otherwise it is not;
when both sides are high-density connected, the two clusters can be fused into one cluster; if either side (or both) fails the high-density connection test, the two clusters cannot be fused.
further, S1, selecting an initial clustering center by using the mean distance and the selected point range of the reduced initial clustering center, including:
s11, narrowing the point selection range of the initial clustering center: calculating the density of all data points, setting a minimum density value, and selecting all data points larger than the minimum density value;
s12, selecting a 1 st clustering center: finding the data point with the maximum density as the 1 st clustering center;
s13, selecting a 2 nd clustering center: excluding data points in the truncated radius of the 1 st clustering center, selecting the data point farthest from the 1 st clustering center from the rest data points, judging whether the distance between the data point and the 1 st clustering center is more than twice the truncated radius, and selecting the data point as the 2 nd clustering center if the distance between the data point and the 1 st clustering center is more than twice the truncated radius; otherwise, the cluster is not selected as the 2 nd cluster center, and the initial cluster center is selected;
s14, selecting a 3 rd clustering center: after selecting the 2 nd clustering center, excluding the data points in the 1 st clustering center cutoff radius and the data points in the 2 nd clustering center cutoff radius, finding the data point with the maximum average value of the sum of the distances from the 1 st clustering center and the 2 nd clustering center from the rest data points, judging whether the distances from the data point to the 1 st clustering center and the 2 nd clustering center are both larger than the twice cutoff radius, and if both the distances are larger than the twice cutoff radius, selecting the data point as the 3 rd clustering center; otherwise, the cluster is not selected as the 3 rd cluster center, and the initial cluster center is selected;
and repeating the step S14 until the distance from the new mean value center to the cluster center selected already is smaller than twice the cut-off radius, and stopping selecting the initial cluster center.
Further, the local density is calculated as:

ρ_i = Σ_{j≠i} χ(d_ij − d_c) (2)

wherein ρ_i represents the local density of data point x_i, n represents the number of data points, d_ij = dist(x_i, x_j) represents the distance between data points x_i and x_j, d_c represents the cutoff radius, and χ is an indicator function with χ(x) = 1 when x < 0 and χ(x) = 0 otherwise.
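For illustration, the cutoff radius of formula (1) and the cutoff-kernel density of formula (2) can be computed together. The following is a minimal Python sketch; the function name and the NumPy-based layout are our own choices, not part of the patent:

```python
import numpy as np

def local_density(X, dist_percent):
    """Cutoff-kernel local density of formula (2). X is an (n, d) array;
    dist_percent is distPercent from formula (1)."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise distances d_ij
    d_c = d.max() * dist_percent / 100.0    # formula (1): maxDist * distPercent / 100
    rho = (d < d_c).sum(axis=1) - 1         # points within d_c, excluding the point itself
    return rho, d, d_c
```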
The invention provides a new clustering idea, the MM-HDC (max mean and high density connection) method, which finds initial clustering centers based on the maximum mean distance and fuses clusters based on high-density connection. First, initial clustering centers are selected using the mean distance and the cutoff center; then the K-Means assignment strategy clusters all data points by their distance to each initial cluster center; the cluster centers are then updated continuously, with center shifts, until the position change between the new and old cluster centers is small, at which point updating stops and the last clustering is taken as the final clustering result. Finally, center fusion following the idea of an iterative fusion method yields a better clustering result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a visualization of the 1st clustering result of the invention;
FIG. 3 is a visualization of the 7th clustering result of the invention;
FIG. 4 is a visualization of the 3rd center fusion of the invention;
FIG. 5 is a visualization of the high-density connection of the invention;
FIG. 6 is the 3rd clustering result on the Iris dataset of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, an improved method of DPC clustering algorithm includes:
s1, selecting an initial clustering center through the mean value distance and the cut-off center;
s2, clustering according to Euclidean distances from all data points to each initial clustering center by adopting a K-Means allocation strategy;
s3, updating the cluster center, performing center offset, reassigning attributions to all data points, and repeating the operation; when the Euclidean distance between two points of the new cluster center and the old cluster center is smaller than 1, stopping updating the cluster center, and taking the last clustering result as a final clustering result;
s4, judging whether center fusion is needed between clusters; if center fusion is needed, adopting the thought of an iterative fusion method to perform center fusion, and obtaining a new clustering result; if not, the final clustering result in S3 is adopted.
Preferably, updating the cluster center includes:
adopting the K-Means cluster-center update strategy: computing the mean of all points in each cluster as the new cluster center of that cluster.
Preferably, reassigning memberships to all data points includes:
reassigning each data point according to the new cluster centers: using the K-Means assignment strategy, computing the distance from every data point to each new cluster center and re-clustering the data points by that distance.
Preferably, the center fusion comprises:
s511, traversing all cluster centers and judging pairwise whether clusters should be fused; when a pair of clusters to be fused is found, stopping the search and returning the indexes of the two cluster centers to be fused;
s512, computing the density of the two cluster centers, taking the label of the denser center as the fused cluster label, and returning the label assignment after fusion;
s513, reordering the data set labels, including: re-binding the data set with the labels and marking it; computing the mean center of the points sharing the fused cluster label as the fused cluster center;
s514, visualizing the fusion of the two clusters with Matplotlib, and returning the new set of cluster centers after fusion;
s515, iterating the pairwise center fusion with the newly obtained cluster centers until no further fusion is possible; center fusion then ends.
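The S511-S515 loop can be sketched as below. can_merge(i, j) and merge_pair(i, j) are hypothetical caller-supplied hooks, standing in for the fusion tests of S521-S524 and the label/center updates of S512-S513 described in this patent:

```python
def iterative_fusion(centers, labels, can_merge, merge_pair):
    """S511-S515 as a loop. can_merge(i, j) and merge_pair(i, j) are
    caller-supplied hooks (hypothetical names) implementing the fusion
    tests of S521-S524 and the label/center updates of S512-S513."""
    merged = True
    while merged:                          # S515: repeat until stable
        merged = False
        for i in range(len(centers)):      # S511: pairwise traversal
            for j in range(i + 1, len(centers)):
                if can_merge(i, j):
                    # S512-S514: the denser center's label wins; the fused
                    # center is the mean of the merged clusters' points
                    centers, labels = merge_pair(i, j)
                    merged = True
                    break
            if merged:
                break
    return centers, labels
```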
Preferably, determining whether center fusion is required between clusters includes:
s521, taking the straight-line distance between the two cluster centers as a diameter and its midpoint as the circle center, find the points inside the circle that belong to each of the two clusters.
Specifically, denote the centers of the two clusters to be fused as a and b, and the distance between them as d_ab. Draw a circle with d_ab as diameter: the straight-line distance d_ab between the two cluster centers is the diameter, and the midpoint o of segment ab is the circle center. Within the circle, find the point set A of points belonging to cluster a and the point set B of points belonging to cluster b.
S522, finding paired pseudo core points: computing the pairwise distances between the in-circle points belonging to the two clusters, and finding the pairs whose distance is smaller than the cutoff radius, i.e., the paired pseudo core points; if no paired pseudo core points exist, the two clusters cannot be fused;
preferably, the cutoff radius d_c is calculated as:

d_c = maxDist × distPercent / 100

wherein maxDist is the maximum value in the distance vector distList; distPercent is a cutoff percentage, whose value is adjusted according to the characteristics of each data set;
specifically, find the paired pseudo core points: pairs with d_AB < d_c, where d_AB denotes the Euclidean distance from a point in set A to a point in set B, and d_c denotes the cutoff distance chosen when computing the density of each point in the data set (the density of a point here is the number of points within radius d_c of it; this cutoff distance is the cutoff distance parameter of the DPC algorithm). If some distance d_AB is smaller than d_c, the corresponding point pairs are called paired pseudo core points (A', B'), where the points belonging to set A are denoted A' and those belonging to set B are denoted B'.
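A minimal sketch of this pseudo-core search, assuming d is the precomputed pairwise distance matrix (e.g., a NumPy array) and A_idx, B_idx are the indices of the in-circle points of the two clusters; the function name is ours:

```python
def paired_pseudo_cores(A_idx, B_idx, d, d_c):
    """Paired pseudo core points (A', B'): pairs (i, j) with i in set A,
    j in set B, and d[i, j] < d_c. d is the pairwise distance matrix."""
    pairs = [(i, j) for i in A_idx for j in B_idx if d[i, j] < d_c]
    return pairs   # an empty list means the two clusters cannot be fused
```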
S523, finding paired true core points: from the paired pseudo core points, finding pairs in which both points have density greater than the minimum density, i.e., the paired true core points; if no paired true core points exist, the two clusters cannot be fused; the minimum density is a manually set value;
specifically, find the true core points: from (A', B'), find paired points x ∈ A', y ∈ B' with ρ_i > ρ_min, where x and y are density-connected; ρ_i denotes the density of point i and ρ_min the minimum density value (if the top 70% of points in descending density order are taken as the selection range for initial cluster centers, ρ_min is the density of the last point inside that top 70%). The point pairs (x, y) ∈ (A', B') meeting these conditions are called true core points, i.e., density-connected points.
S524, judging whether the paired true core points and the two cluster centers are high-density connected:
dividing the distance from each true core point to the cluster center on its side by the cutoff radius gives the high-density threshold for that side; for each side, draw a circle whose diameter is the straight-line distance between the true core point and the cluster center on the same side, with its midpoint as the circle center, and count the high-density points inside; if the count is at least the side's high-density threshold, that side is high-density connected, otherwise it is not;
when both sides are high-density connected, the two clusters can be fused into one cluster; if either side (or both) fails, the two clusters cannot be fused.
Specifically, judging high-density connection includes:
(1) connect a and x; draw a circle with d_ax as diameter; the number of high-density points inside must satisfy p_ax ≥ d_ax/d_c;
(2) connect b and y; draw a circle with d_by as diameter; the number of high-density points inside must satisfy p_by ≥ d_by/d_c;
(3) p_ax and p_by must not differ much, i.e., the local density may not differ by more than a factor of two.
If cluster center a is high-density connected to true core point x, and cluster center b is high-density connected to true core point y, then cluster centers a and b are high-density connected, i.e., clusters a and b can be merged into one cluster. In (1) and (2) the circle centers are the midpoints of segments ax and by respectively; d_ax denotes the straight-line distance from a to x, and d_ax/d_c means that on average there must be at least one high-density point per distance d_c (likewise on the other side); a high-density point is a point with ρ_i > ρ_min.
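The one-sided test can be sketched as follows, assuming X holds the data points, rho their densities as a NumPy array, and idx the indices of candidate points; the function name and signature are illustrative, not from the patent:

```python
import numpy as np

def high_density_connected(c, t, X, rho, rho_min, d_c, idx):
    """One-sided high-density connection test between cluster center c and
    true core point t (conditions (1)-(3) above). c and t index into X;
    rho_min is the manually set minimum density."""
    d_ct = np.linalg.norm(X[c] - X[t])      # straight-line distance c-t
    mid = (X[c] + X[t]) / 2.0               # circle center: midpoint of ct
    radius = d_ct / 2.0                     # circle with d_ct as diameter
    inside = [i for i in idx if np.linalg.norm(X[i] - mid) <= radius]
    high = np.array([i for i in inside if rho[i] > rho_min])  # high-density points
    if high.size == 0 or high.size < d_ct / d_c:   # threshold d_ct / d_c
        return False
    # densities inside the circle may not differ by more than a factor of two
    return rho[high].max() <= 2 * rho[high].min()
```

Both sides must pass this test (a with x, and b with y) before the two clusters are merged.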
Preferably, S1 selects initial clustering centers using the mean distance and a narrowed selection range for the initial cluster centers, comprising the following steps:
s11, narrowing the selection range for initial cluster centers: compute the density of all data points, set a minimum density value, and keep all points whose density exceeds it;
s12, selecting the 1st cluster center: find the point of maximum density as the 1st cluster center;
s13, selecting the 2nd cluster center: excluding the points within the cutoff radius of the 1st cluster center, pick the remaining point farthest from the 1st cluster center and judge whether its distance to the 1st cluster center exceeds twice the cutoff radius; if so, select it as the 2nd cluster center; otherwise do not, whereupon initial cluster center selection is finished;
s14, selecting the 3rd cluster center: after selecting the 2nd cluster center, exclude the points within the cutoff radius of the 1st cluster center and of the 2nd cluster center; among the remaining points, find the one whose mean distance to the 1st and 2nd cluster centers is largest, and judge whether its distances to both centers exceed twice the cutoff radius; if both do, select it as the 3rd cluster center; otherwise do not, whereupon initial cluster center selection is finished;
and repeat this step until the distance from the new maximum-mean point to some already-selected cluster center is smaller than twice the cutoff radius; initial cluster center selection is then complete.
Specifically, S11: calculate the density ρ_i of all data points, sort them in descending order, and set Δρ = 70% as the selection range for cluster centers; Δρ denotes the cutoff percentage that restricts candidate cluster centers to the higher-density points after all points are sorted in descending density order, and Δρ has the same meaning in the subsequent steps.
S12, selecting the first cluster center: among the Δρ = 70% points, find the point of maximum density as the 1st cluster center;
s13, selecting the second cluster center: from the Δρ = 70% points, exclude those within radius d_c of the 1st cluster center; among the remaining Δρ = 70% points, pick the one farthest from the 1st cluster center and judge whether its distance to the 1st cluster center is > 2·d_c; if so, select it as the 2nd cluster center.
Specifically, the > 2·d_c requirement ensures that the core-point region of the new cluster center does not intersect that of an already-selected center; otherwise the two core regions ought to be merged into one, which would create fusion points. It also guarantees that the new center is not a core point of the first selected center and is relatively far from it. If the condition is not met, the point is not selected as the second cluster center and initial cluster center selection ends, i.e., S15;
s14, selecting the third cluster center: from the Δρ = 70% points, exclude those within radius d_c of the 1st cluster center and those within radius d_c of the 2nd cluster center; among the remaining Δρ = 70% points, find the one whose mean distance to the 1st and 2nd cluster centers is largest, and judge whether its distances to both centers are > 2·d_c; if both are, select it as the 3rd cluster center;
specifically, the > 2·d_c requirement again ensures that the new center's core-point region does not intersect that of any selected center (which would otherwise have to be merged into one core region, avoiding fusion points), and that the new center is not a core point of the first center and is relatively far from it. If the condition fails for any selected center, i.e., one distance exceeds 2·d_c and another does not, or neither does, the point is not selected as the third cluster center and initial cluster center selection ends, i.e., S15;
s15, repeat the above until the distance from the new maximum-mean point to some already-selected cluster center is < 2·d_c, ending initial cluster center selection.
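Steps S11-S15 admit a compact sketch. The function below assumes the densities rho and the pairwise distance matrix d were computed beforehand; its name and signature are our own:

```python
import numpy as np

def pick_initial_centers(X, rho, d, d_c, delta_rho=0.70):
    """Maximum-mean-distance initial center selection (S11-S15).
    Candidates are the top delta_rho (70%) of points by density."""
    order = np.argsort(-rho)                        # descending density
    cand = list(order[: int(len(X) * delta_rho)])   # S11: candidate range
    centers = [cand[0]]                             # S12: densest point
    while True:
        # exclude candidates within d_c of any already-selected center
        remaining = [i for i in cand
                     if all(d[i, c] > d_c for c in centers)]
        if not remaining:
            break
        # S13/S14: candidate with the largest mean distance to the centers
        means = [np.mean([d[i, c] for c in centers]) for i in remaining]
        nxt = remaining[int(np.argmax(means))]
        # a new center must be > 2*d_c from every selected center
        if all(d[nxt, c] > 2 * d_c for c in centers):
            centers.append(nxt)
        else:
            break                                   # S15: selection ends
    return centers
```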
Preferably, the local density is calculated as:

ρ_i = Σ_{j≠i} χ(d_ij − d_c) (2)

wherein n represents the number of data points, d_ij = dist(x_i, x_j) represents the distance between points x_i and x_j, d_c represents the cutoff radius, χ is the indicator function with χ(x) = 1 for x < 0 and χ(x) = 0 otherwise, and ρ_i represents the local density of data point x_i.
Preferably, the density distance of the current point is calculated as:

δ_i = max_j(d_ij) (3)

δ_i = min_{j: ρ_j > ρ_i}(d_ij) (4)

wherein d_ij = dist(x_i, x_j) represents the distance between points x_i and x_j, ρ_i the local density of data point x_i, ρ_j the local density of data point x_j, and δ_i the density distance of the current point x_i;
if the current point x_i has the maximum local density, δ_i is given by formula (3): the distance from x_i to the data point farthest from it in the data set; otherwise, δ_i is given by formula (4): the distance from x_i to the nearest data point whose local density exceeds that of x_i.
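A sketch of formulas (3) and (4), assuming rho is a NumPy array of densities and d the pairwise distance matrix; the function name is ours:

```python
import numpy as np

def density_distance(rho, d):
    """Density distance delta_i of formulas (3)/(4): for the densest
    point, the maximum distance to any other point; otherwise the
    minimum distance to a denser point."""
    n = len(rho)
    delta = np.zeros(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]    # points denser than x_i
        if denser.size == 0:                  # x_i has the maximum density
            delta[i] = d[i].max()             # formula (3)
        else:
            delta[i] = d[i, denser].min()     # formula (4)
    return delta
```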
Example 2
The implementation steps of the present invention are divided into the following two parts: the first part is the initial cluster and the second part is the center fusion, described below as detailed steps of the improved algorithm herein.
1. Initial clustering: maximum mean distance + K-Means
step1, selecting initial clustering center (see initial clustering center selection strategy)
step2, 1 st clustering
Assign memberships to all data points according to the selected initial cluster centers: using the K-Means assignment strategy, compute the distance from every data point to each initial cluster center and assign each point to the cluster of its nearest initial center;
step3, updating cluster center, and performing center offset
Adopting the K-Means cluster-center update strategy: compute the mean of all points in each cluster as the new cluster center of that cluster.
step4, reassigning memberships of all data points
Reassign memberships to all data points according to the new cluster centers: using the K-Means assignment strategy, compute the distance from every data point to each new cluster center and assign each point to the cluster of its nearest new center;
step5 repeat step3 and step4
step6, setting a stop condition to obtain a final clustering result
Iterate until the position change between the new and old cluster centers is small, i.e., their distance falls below a small predefined value; then stop updating the cluster centers and take the last clustering result as the final clustering result.
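Steps 2-6 together form a K-Means-style loop. A minimal sketch, assuming NumPy arrays and that no cluster empties out during iteration; the function name and the tol default are our own:

```python
import numpy as np

def cluster_until_stable(X, centers, tol=1.0):
    """step2-step6: K-Means-style assignment and center update, stopping
    once every center moves less than tol (the 'small value')."""
    while True:
        # step2/step4: assign each point to its nearest cluster center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # step3: move each center to the mean of its cluster's points
        new_centers = np.array([X[labels == k].mean(axis=0)
                                for k in range(len(centers))])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers
        if shift.max() < tol:        # step6: stop condition reached
            return labels, centers
```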
2. Center fusion: iterative fusion method
Fusing two clusters involves fusing the cluster centers (the mean center of the two clusters' points, obtained as in K-Means) and fusing the cluster labels (keeping the label of the denser center). The specific steps are as follows:
step1, traverse all cluster centers and judge pairwise whether clusters should be fused; when a pair to fuse is found, stop searching and return the indexes of the two cluster centers to be fused
step2, compute the density of the two cluster centers (the number of points within radius d_c); take the label of the denser center as the fused cluster label and return the label assignment after fusion
step3, reorder the data set labels, and find the mean center of the points sharing the fused cluster label as the fused cluster center
step4, visualize the fusion of the two clusters, and return the new set of cluster centers after fusion
step5, iterate the pairwise center fusion with the newly obtained cluster centers until no further fusion is possible; center fusion then ends.
Principle for judging whether to fuse:
(1) Let the centers of the two clusters to be fused be a and b. Draw a circle with d_ab as diameter, and find the point sets A ∈ cluster a and B ∈ cluster b inside the circle.
(2) Find the paired pseudo core points: pairs with d_AB < d_c, denoted (A', B').
(3) Find the true core points: from (A', B'), find paired points x ∈ A', y ∈ B' with ρ_i > ρ_min, where x and y are density-connected.
(4) Judge whether ax and by are high-density connected:
1) draw a circle with d_ax as diameter; the number of high-density points inside must satisfy p_ax ≥ d_ax/d_c;
2) draw a circle with d_by as diameter; the number of high-density points inside must satisfy p_by ≥ d_by/d_c;
3) p_ax and p_by must not differ much, i.e., the local density may not differ by more than a factor of two.
If cluster center a is high-density connected to true core point x, and cluster center b is high-density connected to true core point y, then cluster centers a and b are high-density connected, i.e., clusters a and b can be merged into one cluster.
Note: d_ax denotes the straight-line distance from a to x, and d_ax/d_c means that on average there must be at least one high-density point per distance d_c; a high-density point is a point whose local density exceeds the minimum density.
Initial cluster center selection strategy
(1) Compute the density ρ_i of all data points, sort them in descending order, and set Δρ = 70% as the cluster-center selection range;
(2) first cluster center: among the Δρ = 70% points, find the point of maximum density as the 1st cluster center;
(3) second cluster center: from the Δρ = 70% points, exclude those within radius d_c of the 1st cluster center; among the remaining Δρ = 70% points, pick the one farthest from the 1st cluster center and judge whether its distance to the 1st cluster center is > 2·d_c; if so, select it as the 2nd cluster center;
(4) third cluster center: from the Δρ = 70% points, exclude those within radius d_c of the 1st cluster center and those within radius d_c of the 2nd cluster center; among the remaining Δρ = 70% points, find the one whose mean distance to the 1st and 2nd cluster centers is largest and judge whether its distances to both are > 2·d_c; if both are, select it as the 3rd cluster center;
(5) select the remaining cluster centers analogously: each time, exclude from the Δρ = 70% points those within radius d_c of every currently selected center, then find, among the points remaining in the top 70% by density, the one whose mean distance to all selected centers is largest; stop when the distance from the new maximum-mean point to some already-selected center is < 2·d_c, completing the initial cluster center selection.
Building on existing research results, the invention proposes the MM-HDC (max mean and high density connection) method, which finds initial cluster centers based on the maximum mean distance and fuses clusters based on high-density connection. First, Δρ = 70% is set as the candidate range for initial cluster centers, and the mean distance is introduced until the distance from the new maximum-mean point to some already-selected center is < 2·d_c, at which point initial center selection ends. Then the K-Means assignment strategy clusters all data points by their distance to each initial center; the centers are updated continuously with center shifts until the position change between new and old centers is small (i.e., a small distance), updating stops, and the last clustering is taken as the final result. Finally, center fusion following the idea of an iterative fusion method yields a better clustering result. Experimental results on classical data sets show that the MM-HDC algorithm outperforms the DPC and K-Means algorithms, and the improved density peak clustering algorithm attains higher accuracy. Furthermore, MM-HDC can yield satisfactory results on data sets with special shapes or non-uniform distributions.
Example 3
To effectively address the many shortcomings in the field of existing density peak clustering algorithms, the invention makes technical improvements from the following angles, achieving a marked classification effect, high accuracy, and strong practicality. Specifically:
aiming at the above problems, this patent improves density peak clustering and proposes a new clustering idea, the MM-HDC (max mean and high density connection) method, which finds initial cluster centers based on the maximum mean distance and fuses clusters based on high-density connection.
First, initial cluster centers are selected using the mean distance and the cutoff center;
then the K-Means assignment strategy clusters all data points by their distance to each initial cluster center;
the cluster centers are then updated continuously with center shifts until the position change between the new and old centers is small (i.e., a small distance), updating stops, and the last clustering is taken as the final clustering result. Finally, center fusion following the idea of an iterative fusion method yields a better clustering result.
The local density ρ_i and distance δ_i are defined as follows.
For any data point i, two quantities need to be calculated: the local density ρ_i of data point i and its distance δ_i to the nearest point of higher density:

ρ_i = Σ_{j≠i} χ(d_ij − d_c) (1)

wherein n is the number of data points, d_ij = dist(x_i, x_j) is the distance between x_i and x_j; χ is an indicator function defined by χ(x) = 1 when x < 0 and χ(x) = 0 otherwise; d_c is the cutoff radius. As can be seen, ρ_i equals the number of data points within the d_c-neighborhood of i, i.e., its density. δ_i is then measured as the minimum distance between point i and any point of higher density:

δ_i = min_{j: ρ_j > ρ_i}(d_ij) (2)

For the point of highest density, one can take δ_i = max_j(d_ij). Finally, the data points are sorted in descending order of density.
Gaussian kernel:

ρ_i = Σ_{j≠i} exp(−(d_ij/d_c)²) (3)

Comparing formulas (1) and (3), the Gaussian kernel is a continuous-valued computation, so the latter is less likely to produce collisions (i.e., different data points with the same local density). For data point x_i, the relative distance δ_i can be defined as:

δ_i = min_{j: ρ_j > ρ_i}(d_ij) if some ρ_j > ρ_i exists; otherwise δ_i = max_j(d_ij) (4)

As formula (4) shows, when x_i has the maximum local density, δ_i is the distance from x_i to the data point farthest from it in the data set; otherwise, δ_i is the distance from x_i to the nearest data point (or points) whose local density exceeds that of x_i. For each data point x_i in the data set, the local density ρ_i and relative distance δ_i are computed. Each point is then represented as a pair (ρ_i, δ_i) and drawn on a plane (with ρ_i on the horizontal axis and δ_i on the vertical axis), called the decision graph. The decision graph is the key by which the DPC algorithm selects cluster centers: the points toward the upper right, whose ρ_i and δ_i are both relatively large, satisfy the two characteristics of a cluster center. For data sets with complex decision graphs, selecting the correct cluster centers is difficult, and the DPC algorithm's one-step assignment of non-center points can trigger a chain reaction: once one data point is assigned incorrectly, a series of cluster errors follows in the sample data set. We therefore improve the selection of the initial cluster centers. This patent selects 3 evaluation indexes: the silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index.
Compared with the prior art, the experiments of the invention display the clustering result on the aggregation data set visually, one color per class; the parameters are set to percent = 1.8 and part = 0.7. The initial cluster centers are selected according to the strategy of the find_centers_auto() function, and the first clustering uses the K-Means "assign each point to its nearest center" strategy; FIG. 2 shows the first clustering result.
After each clustering round, the mean center of each cluster is computed as its new cluster center, and clustering is repeated with the K-Means nearest-center assignment strategy; FIG. 3 shows the result of the last clustering round.
After clustering finishes, center fusion is performed: the iterative fusion method iterates according to the run_concat() function strategy; there are 3 fusions in total, and FIG. 4 shows the result of the third fusion.
As can be seen from FIGS. 2-5: on the aggregation data set, the seventh clustering divides the data into ten classes; after center fusion with continuous iterative updating, the data are clustered well into 7 classes, giving a satisfactory clustering result.
The following experiments run the improved DPC algorithm, the K-Means algorithm, and the DPC algorithm on the following 4 data sets, with the corresponding optimal parameters given; bold marks the best result within each algorithm. The clustering results of each data set are displayed visually, one color per class. Table 1 compares the clustering results:
Table 1. Detailed experimental comparison
As the comparison results in Table 1 show, on the aggregation data set MM-HDC scores higher than the K-Means and DPC algorithms on the sc index; both MM-HDC and DPC obtain satisfactory clustering results, while the K-Means clustering result is not ideal. On the pathbased data set, MM-HDC clusters well into the 3 classes, but DPC does not divide the data set into 3 classes well: because DPC computes only local density and relative distance attributes, it cannot identify all classes on data sets with uneven distribution; we can also see that the Davies-Bouldin index of the MM-HDC algorithm is higher than those of the K-Means and DPC algorithms. On the synthetic data set, MM-HDC clusters well into 5 classes, achieving a satisfactory effect. On the flame data set, the DPC algorithm achieves a good effect, the K-Means clustering result is not ideal, and the MM-HDC algorithm is clearly better than K-Means.
Example 4
Example application
As shown in fig. 6, the clustering algorithm is an important unsupervised learning technique in machine learning and has been widely applied in various fields such as business, bioinformatics, image processing, social networking, e-commerce, and many others. The Iris data set is a classical real data set, often used as an example in both statistical learning and machine learning. The data set contains 150 rows of data in total, each row consisting of 4 characteristic values and one target value. The 4 characteristic values are: sepal length, sepal width, petal length, petal width; the target values are three different types of iris: Iris Setosa, Iris Versicolour, Iris Virginica. From these 4 features one can predict which of the 3 varieties an iris flower belongs to.
Selecting sepal length and petal length as features, the Iris data set is clustered with the MM-HDC clustering algorithm model; the final clustering result, obtained after 3 rounds of clustering, is shown in fig. 6. The parameter values of the algorithm are: cutoff distance percentage percent = 2.1, cutoff center percentage part = 0.8. For the improved DPC clustering algorithm proposed by the invention, the evaluation indexes are: sc: 0.5218, chi: 495.0828, dbi: 0.5631. It can be seen that the MM-HDC clustering algorithm proposed here gathers the iris data well into 3 categories, and the 3 classical clustering evaluation indexes, the silhouette coefficient (sc), the Calinski-Harabasz index (chi), and the Davies-Bouldin index (dbi), all behave well. The MM-HDC clustering algorithm, improved from the DPC clustering algorithm, is therefore also applicable to real data sets: in real life it can cluster flower varieties from the data representation of a flower's biological characteristics, which has practical significance.
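A usage sketch for reproducing this setup: the data loading and the evaluation indexes below use scikit-learn, which provides the Iris data and the sc/chi/dbi metrics; mm_hdc is a hypothetical wrapper for the MM-HDC steps sketched earlier, since no public implementation is referenced in the patent:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

# Load Iris and keep sepal length (column 0) and petal length (column 2),
# the two features used in this example.
iris = load_iris()
X = iris.data[:, [0, 2]]

# mm_hdc is a hypothetical wrapper around the steps sketched earlier
# (initial centers -> K-Means-style iteration -> center fusion), with
# parameters mirroring percent=2.1 and part=0.8 from this example:
# labels, centers = mm_hdc(X, dist_percent=2.1, part=0.8)

# With labels in hand, the three evaluation indexes would be computed as:
# sc  = silhouette_score(X, labels)
# chi = calinski_harabasz_score(X, labels)
# dbi = davies_bouldin_score(X, labels)
```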
Beneficial effects:
The invention provides a new clustering idea, the MM-HDC (max mean and high density connection) method, which finds initial clustering centers based on the maximum mean distance and fuses clusters based on high-density connection. First, initial clustering centers are selected using the mean distance and the cutoff center; then the K-Means assignment strategy clusters all data points by their distance to each initial cluster center; the cluster centers are then updated continuously with center shifts until the position change between the new and old cluster centers is small (i.e., a small distance), updating stops, and the last clustering is taken as the final clustering result. Finally, center fusion following the idea of an iterative fusion method yields a better clustering result.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (5)
1. An improved method of the DPC clustering algorithm, characterized in that the improved method performs cluster analysis on the Iris data set, wherein the data set contains 150 rows of data in total, each row consisting of 4 characteristic values and one target value, the 4 characteristic values being: sepal length, sepal width, petal length, petal width; and the target values being three different types of iris: Iris Setosa, Iris Versicolour, Iris Virginica, the method comprising:
s1, selecting an initial clustering center through the mean value distance and the cut-off center;
s2, clustering according to Euclidean distances from all data points to each initial clustering center by adopting a K-Means allocation strategy;
s3, updating the cluster center, performing center offset, reassigning attributions to all data points, and repeating the operation; when the Euclidean distance between two points of the new cluster center and the old cluster center is smaller than a set value, stopping updating the cluster center, and taking the last clustering result as a final clustering result;
s4, judging whether center fusion is needed or not between clusters obtained after updating the cluster center; if center fusion is needed, adopting the thought of an iterative fusion method to perform center fusion, and obtaining a new clustering result; if not, adopting the final clustering result in the S3;
the center fusion in S4 includes:
s511, traversing all cluster centers, judging whether the clusters are fused or not in pairs, stopping judging when two clusters to be fused are encountered, and returning the two cluster centers to be fused;
s512, solving the density of two cluster centers, taking the label of the cluster center with larger density in the two clusters as a fused cluster label, and returning a label distribution result after cluster fusion;
s513, binding the data set and the label again, and marking; solving the average value center of the same points of the cluster labels as a fusion cluster center;
s514, returning to the new cluster center set after fusion;
s515, carrying out pairwise cluster core fusion according to the obtained new cluster core iteration until the new cluster core can not be fused any more and the center fusion is finished;
judging whether center fusion is needed between clusters or not comprises the following steps:
s521, taking the straight line distance between two cluster centers as the diameter, taking the midpoint of the straight line distance between the two cluster centers as the center of a circle, and finding out the data points respectively belonging to the two clusters in the circle;
s522, finding out paired pseudo core data points, which specifically comprises the following steps: calculating the distance between data points respectively belonging to two clusters in a circle, and finding out paired points with the distance between the two data points smaller than the truncated radius, namely paired pseudo-core data points; if no paired pseudo core data points exist, the two clusters cannot be fused;
the cutoff radius d_c is calculated as:

d_c = maxDist × distPercent / 100 (1)

wherein maxDist is the maximum value in the distance vector distList; distPercent is a cutoff percentage, whose value is adjusted according to the characteristics of each data set;
s523, finding out paired true core data points, which specifically comprises the following steps: finding two paired data points with the density being greater than the minimum density from paired pseudo-core data points, namely paired true-core data points; if no pair of true core data points exist, the two clusters cannot be fused; the minimum density is a manually set value;
s524, judging whether the paired true core data points and the two cluster centers are high-density connected,
high-density connection comprising:
dividing the distance from each true core data point to the cluster center on its side by the cutoff radius to obtain the high-density threshold for that side; for each side, taking the straight-line distance between the true core data point and the cluster center on the same side as a diameter and its midpoint as the circle center, drawing a circle and counting the high-density data points inside it; if the number of high-density points is at least the side's high-density threshold, and the maximum local density in the circle is no more than twice the minimum local density, that side is high-density connected, otherwise it is not;
when both sides are high-density connected, the two clusters can be fused into one cluster; if either side (or both) fails the high-density connection test, the two clusters cannot be fused.
2. The improvement of DPC clustering algorithm according to claim 1, wherein the continuously updating cluster center in S3 includes:
adopting the K-Means cluster-center update strategy: computing the mean of all data points in each cluster as the new cluster center of the current cluster.
3. The improvement to a DPC clustering algorithm of claim 1, wherein the reassigning attributes to all data points in S3 comprises:
reassigning attributions to the data points according to the new cluster centers; and calculating the distance from all the data points to each new clustering center by adopting a K-Means distribution strategy, and re-clustering the data points according to the distance.
4. The improved method of the DPC clustering algorithm according to claim 1, wherein selecting the initial cluster centers in S1 by using the mean distance and a narrowed candidate range comprises the following steps (a code sketch follows these steps):
S11, narrowing the candidate range for initial cluster centers: calculating the density of all data points, setting a minimum density value, and keeping only the data points whose density exceeds it;
S12, selecting the 1st cluster center: taking the data point with the maximum density as the 1st cluster center;
S13, selecting the 2nd cluster center: excluding the data points within the cutoff radius of the 1st cluster center, selecting from the remaining data points the one farthest from the 1st cluster center, and judging whether its distance to the 1st cluster center is greater than twice the cutoff radius; if so, it becomes the 2nd cluster center; otherwise it is not selected and initial cluster-center selection ends;
S14, selecting the 3rd cluster center: after the 2nd cluster center is selected, excluding the data points within the cutoff radius of the 1st and 2nd cluster centers, finding among the remaining data points the one with the maximum mean distance to the 1st and 2nd cluster centers, and judging whether its distances to both centers are greater than twice the cutoff radius; if so, it becomes the 3rd cluster center; otherwise it is not selected and initial cluster-center selection ends;
repeating step S14 until the distance from the new mean-distance candidate to an already-selected cluster center is smaller than twice the cutoff radius, at which point initial cluster-center selection stops.
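A compact sketch of the selection loop in S11–S14 (function and variable names are illustrative; the density vector and cutoff radius are assumed to be precomputed):

```python
import numpy as np

def select_initial_centers(points, density, d_c, min_density):
    """Sketch of S11-S14: pick well-separated, dense initial centers."""
    # S11: restrict candidates to points denser than the minimum density
    candidates = set(np.flatnonzero(density > min_density))
    # S12: the densest point becomes the 1st center
    centers = [max(candidates, key=lambda i: density[i])]
    while True:
        # S13/S14: exclude candidates within d_c of any chosen center
        pool = [i for i in candidates
                if all(np.linalg.norm(points[i] - points[c]) > d_c
                       for c in centers)]
        if not pool:
            break
        # candidate with the largest mean distance to the chosen centers
        best = max(pool, key=lambda i: np.mean(
            [np.linalg.norm(points[i] - points[c]) for c in centers]))
        # accept only if it is more than twice d_c from every chosen center
        if all(np.linalg.norm(points[best] - points[c]) > 2 * d_c
               for c in centers):
            centers.append(best)
        else:
            break   # stopping condition of S13/S14
    return [points[c] for c in centers]
```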
5. The improved method of the DPC clustering algorithm according to claim 1, wherein the local density is calculated by formula (2):
ρ_i = Σ_{j=1, j≠i}^{n} χ(d_ij − d_c), with χ(x) = 1 if x < 0 and χ(x) = 0 otherwise  (2)
wherein ρ_i represents the local density of data point x_i; n represents the number of data points; d_ij = dist(x_i, x_j) represents the distance between data points x_i and x_j; and d_c represents the cutoff radius.
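A minimal NumPy sketch of formula (2), the cutoff-kernel density (function name illustrative; Euclidean distances assumed):

```python
import numpy as np

def local_density(points, d_c):
    """Cutoff-kernel density: rho_i counts the neighbors within d_c of x_i."""
    diffs = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    return (d < d_c).sum(axis=1) - 1   # minus 1 excludes the point itself
```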
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111080561.3A (CN113780437B) | 2021-09-15 | 2021-09-15 | Improved method of DPC clustering algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113780437A (en) | 2021-12-10 |
CN113780437B (en) | 2024-04-05 |
Family
ID=78844005
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111080561.3A (CN113780437B, Active) | Improved method of DPC clustering algorithm | 2021-09-15 | 2021-09-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113780437B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114399004A (en) * | 2022-02-25 | 2022-04-26 | 上海图灵智算量子科技有限公司 | Mixed clustering method for simulating bifurcation and brain heuristic cognition |
CN114896393B (en) * | 2022-04-15 | 2023-06-27 | 中国电子科技集团公司第十研究所 | Data-driven text increment clustering method |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8355998B1 (en) * | 2009-02-19 | 2013-01-15 | Amir Averbuch | Clustering and classification via localized diffusion folders |
KR20110096236A (en) * | 2010-02-22 | 2011-08-30 | 숭실대학교산학협력단 | Apparatus and method for clustering using mutual information between clusters |
CN109840558A (en) * | 2019-01-25 | 2019-06-04 | 南京航空航天大学 | Based on density peaks-core integration adaptive clustering scheme |
CN111914930A (en) * | 2020-07-30 | 2020-11-10 | 上海工程技术大学 | Density peak value clustering method based on self-adaptive micro-cluster fusion |
CN113344019A (en) * | 2021-01-20 | 2021-09-03 | 昆明理工大学 | K-means algorithm for improving decision value selection initial clustering center |
Non-Patent Citations (2)
Title |
---|
Density peaks clustering algorithm based on K-nearest neighbors; Zeng Jiahao; China International Finance (Chinese and English); 2018-03-08 (Issue 05); full text *
K-means algorithm based on an improved density peaks algorithm; Du Hongbo; Bai Azhen; Zhu Lijun; Statistics & Decision; 2018-09-30 (Issue 18); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113780437B (en) | Improved method of DPC clustering algorithm | |
CN111814251B (en) | Multi-target multi-modal particle swarm optimization method based on Bayesian adaptive resonance | |
CN105930862A (en) | Density peak clustering algorithm based on density adaptive distance | |
CN104217015B (en) | Based on the hierarchy clustering method for sharing arest neighbors each other | |
CN111553127A (en) | Multi-label text data feature selection method and device | |
CN111626321B (en) | Image data clustering method and device | |
CN113344019A (en) | K-means algorithm for improving decision value selection initial clustering center | |
CN111401785A (en) | Power system equipment fault early warning method based on fuzzy association rule | |
Latifi-Pakdehi et al. | DBHC: A DBSCAN-based hierarchical clustering algorithm | |
CN110348488A (en) | A kind of modal identification method based on local density's peak value cluster | |
CN111079788A (en) | K-means clustering method based on density Canopy | |
CN111291822A (en) | Equipment running state judgment method based on fuzzy clustering optimal k value selection algorithm | |
CN113435108A (en) | Battlefield target grouping method based on improved whale optimization algorithm | |
CN117407732A (en) | Unconventional reservoir gas well yield prediction method based on antagonistic neural network | |
CN107704872A (en) | A kind of K means based on relatively most discrete dimension segmentation cluster initial center choosing method | |
CN111814979B (en) | Fuzzy set automatic dividing method based on dynamic programming | |
Saha et al. | Performance evaluation of some symmetry-based cluster validity indexes | |
CN111914930A (en) | Density peak value clustering method based on self-adaptive micro-cluster fusion | |
CN112801197A (en) | K-means method based on user data distribution | |
Kliegr | Quantitative CBA: Small and Comprehensible Association Rule Classification Models | |
CN112308160A (en) | K-means clustering artificial intelligence optimization algorithm | |
Banka et al. | Feature selection and classification for gene expression data using evolutionary computation | |
CN112951438A (en) | Outlier detection method based on noise threshold distance measurement | |
Saha et al. | A new multiobjective simulated annealing based clustering technique using stability and symmetry | |
CN111046914B (en) | Semi-supervised classification method based on dynamic composition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||