CN107563443A - A kind of adaptive semi-supervised Density Clustering method and system - Google Patents
- Publication number: CN107563443A (application CN201710789195.6A)
- Authority
- CN
- China
- Prior art keywords: density, data, category, cluster, data set
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of data processing and discloses an adaptive semi-supervised density clustering method and system. Density parameters are first extracted automatically from labeled and unlabeled data; initial cluster analysis is then carried out on the target data set with multiple density parameters, yielding local clustering results; finally, the local clustering results are integrated into the final global clustering result. The proposed algorithm does not require the number of clusters to be specified: during cluster analysis, the number of clusters is determined automatically from the density information of the data set. The algorithm can automatically extract multiple groups of density parameters from labeled and unlabeled data and then use these parameters for density-based cluster analysis of the target data set, obtaining excellent clustering results with strong adaptivity and noise immunity.
Description
Technical field
The invention belongs to the technical field of data processing, and more particularly relates to an adaptive semi-supervised density clustering method and system.
Background art
Clustering is one of the most important data-mining tasks in computing: it aims to uncover the cluster structure of target data. A cluster is made up of examples that are "similar" to one another in some respect, while "dissimilar" examples fall into other clusters. Depending on the viewpoint and the criterion adopted, clustering algorithms can be categorized in many ways; one taxonomy, however, is widely accepted. It divides clustering algorithms into hierarchical clustering, partitioning-based clustering, density-based clustering, and model-based clustering. Recently, many researchers have tried to extract constraint information from supervision and use it to guide the clustering process, so as to improve both the efficiency and the accuracy of the clustering results. This has given rise to a new branch of clustering: semi-supervised clustering. As a rule, semi-supervised clustering algorithms are divided into two classes: distance-based and constraint-based semi-supervised clustering.
In distance-based methods, the distance-adjustment procedure is parameterized, with the parameters derived from supervision supplied in advance as restrictive conditions, such as Must-Link and Cannot-Link constraints. Must-Link means that the constrained examples must be assigned to the same cluster; conversely, Cannot-Link means that the constrained examples must be assigned to different clusters. Distance adjustment is usually implemented as a queryable lookup matrix, so that every pair of examples has a corresponding distance value: when the relation between two examples is Must-Link, the distance between them is shortened; conversely, when the relation is Cannot-Link, the distance is lengthened. However, this way of realizing constraints does not strictly guide the clustering process, and the clustering result can sometimes contradict the constraints: for example, a pair of examples may be related by Must-Link yet still lie far apart, causing them to be placed in different clusters. A literature survey shows that most distance-adjustment methods were proposed to solve classification or semi-supervised clustering tasks.
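As an illustration of the lookup-matrix idea described above, the following is a minimal sketch (not taken from the patent); the shrink and stretch factors and the dictionary layout are assumptions chosen only for demonstration.

```python
def adjust_distances(dist, must_link, cannot_link, shrink=0.5, stretch=2.0):
    """Return a copy of the pairwise-distance table with Must-Link pairs
    shortened and Cannot-Link pairs lengthened (factors are assumptions)."""
    adjusted = dict(dist)
    for pair in must_link:
        key = tuple(sorted(pair))
        adjusted[key] = adjusted[key] * shrink   # pull constrained pairs together
    for pair in cannot_link:
        key = tuple(sorted(pair))
        adjusted[key] = adjusted[key] * stretch  # push constrained pairs apart
    return adjusted

dist = {(0, 1): 4.0, (0, 2): 1.0, (1, 2): 3.0}
adj = adjust_distances(dist, must_link=[(0, 1)], cannot_link=[(0, 2)])
# the Must-Link pair (0, 1) is shortened; the Cannot-Link pair (0, 2) is lengthened
```

Note that, exactly as the text observes, nothing forces the adjusted Must-Link distance below the clustering threshold, which is why such a pair can still end up in different clusters.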
Constraint-based methods modify existing clustering algorithms so that user-supplied labels or constraints guide the clustering toward a better result. Concretely, the modification can be realized in different ways. The Constrained COBWEB algorithm embeds the constraints into its stepwise splitting process by optimizing the clustering objective. The Constrained K-means algorithm incorporates the constraint information, expressed as instance-level constraints, into the traditional K-means algorithm. The Seeded-K-means algorithm uses the constraints to set the seed points instead of choosing them randomly as in the conventional method: the initial clusters satisfy the transitive closure of the constraints, the centers of these clusters become the seed points, and after the initialization step the cluster structure is updated iteratively whether or not constraint information is available. Semi-supervised hierarchical clustering introduces constraint information into an agglomerative clustering process based on an ultrametric distance matrix. The C-DBSCAN algorithm partitions clusters using constraints on the data instances; it builds on DBSCAN and shows good robustness on data of irregular shape.
Many clustering algorithms have been adapted for semi-supervised learning, but most of them are partitioning-based or hierarchical algorithms; density-based variants are almost nonexistent. In fact, when the data to be clustered contain clusters of different sizes, shapes, and densities, a density-based algorithm is an ideal choice. Unlike partitioning-based algorithms, it does not have to construct an optimal partition of the whole data space; DBSCAN, for instance, only needs to be locally optimal. Furthermore, a density-based semi-supervised learning algorithm can apply either Must-Link or Cannot-Link constraints to mutually close data instances. Compared with constraint-based partitioning algorithms, it has a natural advantage, because many such partitioning algorithms tend to conflict with Cannot-Link constraints at convergence.
In summary, the prior art has the following problems:
Existing clustering methods have poor adaptivity and noise immunity and cannot identify complex cluster structures in the target data set.
The specific shortcomings of existing semi-supervised clustering algorithms are as follows:
Regarding the choice of the number of clusters: most algorithms require the user to fix the number of clusters (the number of classes) before clustering. In real data, however, this number is unknown, and a suitable value is usually found only through repeated experiments in order to obtain a good clustering result.
Regarding the singleness of algorithm parameters: the choice of algorithm parameters is generally one-size-fits-all, yet for a given algorithm no single group of parameters applies universally to the complex cluster structures exhibited by all kinds of data sets. One group of parameters is typically suitable only for cluster analysis in a particular situation.
Summary of the invention
In view of the problems in the prior art, the invention provides an adaptive semi-supervised density clustering method and system.
The invention is realized as follows. An adaptive semi-supervised density clustering method includes: first, automatically extracting density parameters from labeled and unlabeled data; then carrying out initial cluster analysis on the target data set with multiple density parameters to obtain local clustering results; and finally integrating the local clustering results into the final global clustering result.
Further, the cluster structure of the target data set includes clusters of different sizes, clusters of different shapes, and clusters of different densities.
Further, the multiple-density-parameter learning includes:
Let XU be the unlabeled data and XL the labeled data. Build the data set X', which contains all data whose class label is j together with all unlabeled data.
The specific steps include:
Step 1. First, randomly select a labeled data point x1 as the initial point, x1 ∈ X'. Find its nearest neighbor p1 in X', p1 ∈ X', and take the distance between the two points as r. Build the data set D from these two points and remove them from X': X' = X' − D. If there exists a point p2 in X' whose distance to some point in D is less than or equal to r, move p2 into D.
Step 2. Iterate Step 1 until no new data point is added to D, then verify whether D completely contains the labeled points of class j. If it does, set Epsj = r for class j; if not, increase r appropriately.
Step 3. Repeat Step 2 until D satisfies the condition, obtaining the radius Epsj.
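The radius-growing procedure above can be sketched as follows. This is an illustrative implementation under stated assumptions: the seed point is taken deterministically rather than at random, and the growth factor used to "suitably increase r" is an assumption, since the patent does not fix it.

```python
import math

def learn_eps(labeled_j, unlabeled, grow=1.1):
    """Estimate Eps for class label j by growing a radius r until the
    reachability set D, started from one labeled seed point, covers every
    labeled point of class j.  `grow` is an assumed step size."""
    X = list(labeled_j) + list(unlabeled)   # the set X' of the text
    x1 = labeled_j[0]                       # labeled seed point (taken first, not randomly)
    rest = [p for p in X if p != x1]
    p1 = min(rest, key=lambda p: math.dist(x1, p))  # nearest neighbour of the seed
    r = math.dist(x1, p1)
    while True:
        D = [x1, p1]
        frontier = [p for p in X if p not in D]
        changed = True
        while changed:                      # expansion: absorb points within r of D
            changed = False
            for p in list(frontier):
                if any(math.dist(p, q) <= r for q in D):
                    D.append(p)
                    frontier.remove(p)
                    changed = True
        if all(p in D for p in labeled_j):  # D covers all class-j labels?
            return r
        r *= grow                           # otherwise enlarge r and retry

eps_j = learn_eps([(0, 0), (1, 0), (2, 0)], [(0.5, 0), (1.5, 0), (10, 10)])
# the chain 0 - 0.5 - 1 - 1.5 - 2 is covered at r = 0.5; the far point is not absorbed
```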
Further, after obtaining the value of Epsj, initialize MinPts to 2, then use DBSCAN to cluster the union of the unlabeled data and all data labeled j, with density parameters (Epsj, MinPts). Obtain the clustering result Pj and check it: if all labeled points of class j are assigned to the same cluster, add 1 to MinPts and cluster the data set with DBSCAN again. Iterate these steps, starting from MinPts = 2, until the labeled points of class j are no longer all assigned to the same cluster in the result; then, for class j, MinPtsj = MinPts − 1.
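The MinPts search described above can be sketched as follows. The DBSCAN here is a compact textbook stand-in, not the patent's own code, and treating "some class-j points became noise" as a split of the class is an assumption about the stopping condition.

```python
import math

def dbscan(points, eps, min_pts):
    """Compact textbook DBSCAN: one cluster id per point, -1 marks noise."""
    labels = [None] * len(points)
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1              # provisionally noise (may become border)
            continue
        labels[i] = cid
        seeds = [j for j in nb if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid         # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:    # j is itself core: expand the cluster
                seeds.extend(k for k in nb_j if labels[k] is None)
        cid += 1
    return labels

def learn_min_pts(class_j_idx, points, eps_j):
    """Raise MinPts from 2 until the class-j points no longer all fall into
    one cluster, then back off by one: MinPts_j = MinPts - 1."""
    min_pts = 2
    while True:
        labels = dbscan(points, eps_j, min_pts)
        ids = {labels[i] for i in class_j_idx}
        if len(ids) > 1 or -1 in ids:   # class j split, or partly noise (assumption)
            return min_pts - 1
        min_pts += 1

pts = [(0, 0), (1, 0), (2, 0), (0.5, 0), (1.5, 0), (5, 0), (6, 0)]
min_pts_j = learn_min_pts([0, 1, 2], pts, 1.0)  # class j = the first three points
```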
Further, integrating the local clustering results includes:
After obtaining Epsj and MinPtsj for each class label, compute the set of densities corresponding to the labels, {densityj = MinPtsj / Epsj}, and the set of corresponding local clustering results {Pj}.
Then sort the densities in descending order and find the class label o whose density, densityo, is highest. In the corresponding clustering result Po, any unlabeled point that was assigned to a cluster containing class-o points is given the label o, following the cluster assumption of semi-supervised learning theory. Next, the second-highest density is treated in the same way, and so on for all remaining densities in turn. Any data points that end up without a label are regarded as noise points.
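The integration step can be sketched as follows; the exact bookkeeping (dictionaries keyed by point id, clusters as sets of ids) is an assumption made only for illustration.

```python
def integrate(local_results, densities, labels):
    """Merge the per-class local clusterings into one global labelling.
    local_results[j] is the cluster list from the DBSCAN run for class j,
    densities[j] equals MinPts_j / Eps_j, and labels maps point ids to the
    known class labels."""
    final = dict(labels)
    # Visit classes in descending density, so a point claimed by two classes
    # goes to the denser one (the low-density separation assumption).
    for j in sorted(densities, key=densities.get, reverse=True):
        for cluster in local_results[j]:
            if any(labels.get(p) == j for p in cluster):  # cluster holds class-j points
                for p in cluster:
                    final.setdefault(p, j)   # earlier (denser) assignments win
    return final                             # points missing from `final` are noise

local = {'a': [{0, 1, 2}], 'b': [{2, 3, 5}, {9}]}
final = integrate(local, {'a': 2.0, 'b': 1.0}, {0: 'a', 5: 'b'})
# point 2 lies in clusters of both classes and goes to the denser class 'a';
# point 9's cluster contains no labelled point, so 9 remains noise
```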
Another object of the invention is to provide an adaptive semi-supervised density clustering system.
Advantages and positive effects of the invention:
The invention discloses an adaptive semi-supervised clustering algorithm based on the multiple-density information in the cluster structure of the data. The algorithm has strong adaptivity and noise immunity and can recognize complex cluster structures in the target data set, including clusters of different sizes, different shapes, and different densities. For regions of different density, the algorithm automatically extracts density parameters from labeled and unlabeled data, then carries out initial cluster analysis on the target data set with these multiple density parameters to obtain local clustering results, and finally integrates the local results into the final global clustering result.
In the adaptive semi-supervised density clustering method provided by the invention, when the data to be clustered contain clusters of different sizes and shapes, a density-based algorithm is an ideal choice. Unlike a partitioning-based algorithm, it does not have to construct an optimal partition of the whole data space; DBSCAN, for instance, only needs to be locally optimal. Furthermore, a density-based semi-supervised learning algorithm can apply either Must-Link or Cannot-Link constraints to mutually close data instances. Compared with constraint-based partitioning algorithms, the invention has a natural advantage, because many such algorithms tend to conflict with Cannot-Link constraints at convergence.
Further advantages of the invention:
Regarding the choice of the number of clusters: most algorithms require the user to fix the number of clusters (the number of classes) before clustering, but in real data this number is unknown, and a suitable value is usually found only through repeated experiments. The algorithm proposed by the invention does not need the number of clusters to be set; during cluster analysis it is determined automatically from the density information of the data set.
Regarding the singleness of algorithm parameters: the choice of algorithm parameters is generally one-size-fits-all, yet for a given algorithm no single group of parameters applies universally to the complex cluster structures of all kinds of data sets; one group of parameters is typically suitable only for one situation. The algorithm proposed by the invention automatically extracts multiple groups of density parameters from labeled and unlabeled data and then uses these parameters for density-based cluster analysis of the target data set, obtaining excellent clustering results with strong adaptivity and noise immunity.
The invention requires no user-supplied parameters (such as the number of clusters); the algorithm obtains the density-related parameters automatically by exploring the internal structure of the data.
By integrating regional clustering results, the invention can identify data whose clusters differ in size, shape, and density, and it is insensitive to noise data.
The invention fully satisfies the input restrictive conditions. Moreover, many density-based semi-supervised clustering algorithms produce a large number of clusters consisting of a single data point or very few data points; the algorithm of the invention significantly reduces this side effect.
Brief description of the drawings
Fig. 1 is a flow diagram of the adaptive semi-supervised density clustering method provided by an embodiment of the invention.
Fig. 2 is an example diagram of the process of determining Eps provided by an embodiment of the invention.
In the figure: (a) Let XU be the unlabeled data and XL the labeled data; build the data set X' containing all data labeled j and all unlabeled data. First, randomly select a labeled data point x1 as the initial point, find its nearest neighbor p1 in X', p1 ∈ X', and take the distance between the two points as r. Build the data set D from these two points and remove them from X': X' = X' − D. If there exists a point p2 in X' whose distance to some point in D is less than or equal to r, move p2 into D.
(b) Iterate the previous step until no new data point is added to D; then verify whether D completely contains the labeled points of class j. If it does, Epsj = r for class j; if not, increase r appropriately.
(c) Repeat the above steps until D satisfies the condition, determining the radius Epsj.
Fig. 3 is an example diagram of integrating local clustering results provided by an embodiment of the invention.
In the figure: (a)-(f) show the distribution of clusters during the integration of the local clustering results.
Fig. 4 shows the performance of each algorithm, provided by an embodiment of the invention, on a data set whose clusters have multiple densities and different shapes.
In the figure: (a) Three different clusters consist of 191 data-point examples in total: a spherical cluster of 60 points marked with circles, one crescent cluster of 60 points marked with diamonds, and another crescent cluster of 60 points marked with triangles; in addition, 11 noise points are marked with crosses. (b) The labeled data in the data space. (c)-(g) The three clusters distinguished in the multi-density, multi-shape data set. (h) The highest accuracy obtained, 94.3%.
Fig. 5 shows the performance of each algorithm, provided by an embodiment of the invention, on a data set whose clusters are unbalanced and have manifold structure.
In the figure: (a) The data set consists of 790 data-point examples in total: a spherical cluster of 395 points at the center of the data space, marked with circles; a circular cluster of 363 points, marked with diamonds; and four clusters of 3 points each in the upper-left, upper-right, lower-left, and lower-right corners, marked with asterisks, triangles, and other symbols. There are also 20 noise points in the data set, located between the circular cluster and the four surrounding clusters. (b) The labeled data in the data space, which can also be regarded as the input data set. (c)-(h) The performance on this data set, identifying the unbalanced, manifold-structured clusters intrinsic to the data set.
Embodiment
To make the purpose, technical scheme, and advantages of the invention clearer, the invention is further elaborated below with reference to embodiments. It should be appreciated that the specific embodiments described here are merely illustrative of the invention and are not intended to limit it.
In the adaptive semi-supervised density clustering method provided by the invention, when the data to be clustered contain clusters of different sizes and shapes, a density-based algorithm is an ideal choice. Unlike a partitioning-based algorithm, it does not have to construct an optimal partition of the whole data space; DBSCAN, for instance, only needs to be locally optimal. Furthermore, a density-based semi-supervised learning algorithm can apply either Must-Link or Cannot-Link constraints to mutually close data instances. Compared with constraint-based partitioning algorithms, the invention has a natural advantage, because many such algorithms tend to conflict with Cannot-Link constraints at convergence.
The application principle of the invention is described below with reference to the accompanying drawings.
As shown in Fig. 1, the adaptive semi-supervised density clustering method provided by an embodiment of the invention includes:
S101: first, automatically extract density parameters from labeled and unlabeled data;
S102: then carry out initial cluster analysis on the target data set with multiple density parameters to obtain local clustering results;
S103: finally, integrate the local clustering results to obtain the final global clustering result.
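The three-stage flow S101-S103 can be sketched end to end as follows. This is a structural sketch on one-dimensional data: each stage uses a deliberately simplified stand-in (largest within-class gap for S101, gap-chaining for S102), so it shows the shape of the pipeline, not the patent's full procedure.

```python
def adaptive_ssdc(labeled, unlabeled):
    """Structural sketch of S101-S103 on 1-D data (stand-in stages, assumptions)."""
    # S101: one eps per class label, estimated from the labelled points alone
    eps = {}
    for j, pts in labeled.items():
        s = sorted(pts)
        eps[j] = max(b - a for a, b in zip(s, s[1:])) if len(s) > 1 else 1.0
    # S102: per class, chain labelled + unlabelled points whose gap <= eps[j]
    local = {}
    for j, pts in labeled.items():
        s = sorted(pts + unlabeled)
        clusters, cur = [], [s[0]]
        for a, b in zip(s, s[1:]):
            if b - a <= eps[j]:
                cur.append(b)
            else:
                clusters.append(cur)
                cur = [b]
        clusters.append(cur)
        local[j] = clusters
    # S103: densest class first (smaller eps ~ denser); a cluster containing
    # class-j labels passes label j to its unlabelled members
    out = {}
    for j in sorted(eps, key=eps.get):
        for c in local[j]:
            if any(p in labeled[j] for p in c):
                for p in c:
                    out.setdefault(p, j)
    return out   # points never claimed are left out, i.e. treated as noise

out = adaptive_ssdc({'a': [0.0, 1.0], 'b': [10.0, 13.0]}, [0.5, 11.5, 30.0])
# 0.5 joins class 'a', 11.5 joins class 'b', and the outlier 30.0 stays noise
```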
1. The application principle of the invention is further described with reference to specific embodiments.
The invention proposes an adaptive semi-supervised density clustering method composed of two parts: multiple-density-parameter learning and local-clustering-result integration.
1) Multiple-density-parameter learning:
For a density-based clustering algorithm, different density parameters (the minimum number of points MinPts and the radius Eps) mean different clustering results. The invention assumes that the data subsets corresponding to the class labels all differ in size, shape, and density; it is therefore highly desirable to find a separate density parameter for each class label.
Let XU be the unlabeled data and XL the labeled data; build the data set X' containing all data labeled j and all unlabeled data. First, randomly select a labeled data point x1 as the initial point, x1 ∈ X'. Find its nearest neighbor p1 in X', p1 ∈ X', and take the distance between the two points as r. Build the data set D from these two points and remove them from X': X' = X' − D. If there exists a point p2 in X' whose distance to some point in D is less than or equal to r, move p2 into D.
Iterate the previous step until no new data point is added to D; then verify whether D completely contains the labeled points of class j. If it does, Epsj = r for class j; if not, increase r appropriately.
Repeat the above steps until D satisfies the condition. Fig. 2 demonstrates this process of determining the radius Epsj.
After obtaining the value of Epsj, MinPts can be initialized to 2, and DBSCAN is used to cluster the union of the unlabeled data and all data labeled j, with density parameters (Epsj, MinPts). This yields a clustering result Pj. Check Pj: if all labeled points of class j are assigned to the same cluster, add 1 to MinPts and cluster the data set with DBSCAN again. Iterate the above steps until the labeled points of class j are no longer all assigned to the same cluster in the result; then, for class j, MinPtsj = MinPts − 1.
2) Integration of the local clustering results:
Once Epsj and MinPtsj have been obtained for each class label, the set of densities corresponding to the labels, {densityj = MinPtsj / Epsj}, and the set of corresponding local clustering results {Pj} can be computed. Then the densities are sorted in descending order to find the highest one, densityo, whose class label is o. In the clustering result Po, if there are unlabeled points assigned to a cluster containing class-o points, then, according to the cluster assumption of semi-supervised learning theory (if points are in the same cluster, they are very likely to share the same label), these points are given the label o. Next, the second-highest density is treated in the same way, and so on for all remaining densities in turn. After these steps are complete, any data points that still have no label are regarded as noise points.
In essence, the step of integrating the local clustering results assigns a label to every data point according to the previous division of the data. When a data point can be claimed by different labels, the algorithm always gives it the label with the highest corresponding density. In other words, if two clusters overlap, the algorithm tends to merge the overlapping part into the denser cluster, which also means that the dividing boundary between clusters is placed in a low-density region. This design exactly matches the low-density separation assumption of semi-supervised learning (the dividing boundary should be placed in a region of low density wherever possible). To illustrate the integration step more vividly, the invention uses a group of artificial two-dimensional data with different density characteristics as a sample; Fig. 3 demonstrates the process. In the figure, (a)-(f) show the distribution of clusters during the integration, marked with circles, triangles, and diamonds respectively.
2. The application principle of the invention is further described with reference to a specific embodiment.
The embodiment of the invention provides an adaptive semi-supervised density clustering method composed of two parts, multiple-density-parameter learning and local-clustering-result integration; the specific flow is as described in part 1 above.
3. The application principle of the invention is further described with reference to its positive effects.
The invention requires no user-supplied parameters (such as the number of clusters); the algorithm obtains the density-related parameters automatically by exploring the internal structure of the data.
By integrating regional clustering results, the invention can identify data whose clusters differ in size, shape, and density, and it is insensitive to noise data.
The invention fully satisfies the input restrictive conditions. Moreover, many density-based semi-supervised clustering algorithms produce a large number of clusters consisting of a single data point or very few data points; the algorithm of the invention significantly reduces this side effect.
The invention was comprehensively compared with representative semi-supervised algorithms, including C-DBSCAN, Constrained Clustering via Spectral Regularization (CCSR), Constrained K-means, Constrained Evidential Clustering (CEVCLUS), and Semi-Bayesian. All data in the experimental data sets originally carry class labels; before the experiments, the labels of 10% of the data were retained at random, and the labels of the remaining 90% were removed, turning them into unlabeled data. Although the data whose labels are retained are selected at random, it must be guaranteed that at least two data points of each class keep their labels; the constraint information is obtained only from the data that remain labeled.
First, observe the performance of each algorithm on the first data set (clusters with multiple densities and different shapes). As shown in Fig. 4(a), three different clusters consist of 191 data-point examples in total: a spherical cluster of 60 points marked with circles, one crescent cluster of 60 points marked with diamonds, and another crescent cluster of 60 points marked with triangles; in addition, 11 noise points are marked with crosses. Fig. 4(b) illustrates the labeled data in the data space. As shown in Fig. 4(c-h), the algorithm proposed by the invention (Fig. 4(h)) obtains the highest accuracy, 94.3%. It not only exactly distinguishes the three clusters in the multi-density, multi-shape data set, but also accounts for the influence of the noise data added to it.
Next, observe the performance of each algorithm on the second data set (unbalanced clusters with manifold structure). As shown in Fig. 5(a), the data set consists of 790 data-point examples in total: a spherical cluster of 395 points at the center of the data space, marked with circles; a circular cluster of 363 points, marked with diamonds; and four clusters of 3 points each in the upper-left, upper-right, lower-left, and lower-right corners, marked with asterisks, triangles, and other symbols. There are also 20 noise points in the data set, located between the circular cluster and the four surrounding clusters. Fig. 5(b) illustrates the labeled data in the data space, which can also be regarded as the input data set. Fig. 5(c-h) reflects the performance of the proposed algorithm on this data set: compared with similar algorithms, it well identifies the unbalanced, manifold-structured clusters intrinsic to the data set and achieves an accuracy of 98.6%.
The experiments with different algorithms on artificial two-dimensional data convincingly demonstrate the advantages of the proposed algorithm, and clearly show the importance of multiple-density-parameter learning when designing density-based clustering algorithms.
The foregoing is merely illustrative of the preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall be included in the scope of protection.
Claims (6)
- 1. An adaptive semi-supervised density clustering method, characterized in that the adaptive semi-supervised density clustering method includes: first, automatically extracting density parameters from labeled and unlabeled data; then carrying out initial cluster analysis on the target data set with multiple density parameters to obtain local clustering results; and finally integrating the local clustering results to obtain the final global clustering result.
- 2. The adaptive semi-supervised density clustering method of claim 1, characterized in that the cluster structure of the target data set includes clusters of different sizes, clusters of different shapes, and clusters of different densities.
- 3. The adaptive semi-supervised density clustering method of claim 1, characterized in that learning the multiple density parameters comprises: letting X_U denote the unlabeled data and X_L the labeled data, building a data set X' that contains all data labeled j together with all unlabeled data; the specific steps include: Step 1: randomly select a labeled data point x_1 as the initial point and find its nearest neighbor p_1 in X', p_1 ∈ X', taking the distance between these two points as r; build a data set D from the two points and remove them from X', i.e. X' = X' - D; if there exists a point p_2 in X' whose distance to some point in D is less than or equal to r, move p_2 into D. Step 2: iterate Step 1 until no new data point is added to D, then verify whether D completely contains the set of data labeled j; if it does, then for label j, Eps_j = r; if it does not, increase r appropriately. Step 3: repeat Step 2 until D satisfies the condition, thereby obtaining the radius Eps_j.
- 4. The adaptive semi-supervised density clustering method of claim 3, characterized in that after the value of Eps_j is obtained, MinPts is initialized to 2, and DBSCAN is used to cluster the set consisting of the unlabeled data and all data labeled j, with density parameters Eps_j and MinPts, yielding a clustering result P_j; P_j is checked, and if all data points of the label-j set are assigned to the same cluster, MinPts is increased by 1 and DBSCAN is applied to the data set again; this clustering step is iterated until the points of the label-j set are no longer all assigned to the same cluster in the clustering result; then, for label j, MinPts_j = MinPts - 1.
- 5. The adaptive semi-supervised density clustering method of claim 1, characterized in that integrating the local clustering results comprises: after Eps_j and MinPts_j are obtained for each label, computing the density set {density_j = MinPts_j / Eps_j} and the corresponding set of local clustering results {P_j}; then sorting the label densities in descending order and finding the label o corresponding to the highest density, density_o, in the sequence; in the clustering result P_o, unlabeled points that fall into a cluster containing points labeled o are assigned the label o, following the cluster assumption of semi-supervised learning theory; next, the same operation is performed for the second-highest density, and so on, until all remaining densities have been processed in turn; data points that still carry no label afterwards are regarded as noise points.
- 6. An adaptive semi-supervised density clustering system implementing the adaptive semi-supervised density clustering method of claim 1.
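As an illustration only (not part of the claims), the Eps_j radius search of claim 3 can be sketched in Python roughly as follows; the function and parameter names, the fixed r_step increment, and the use of NumPy are assumptions of this sketch, not specified by the patent:

```python
import numpy as np

def estimate_eps(X_labeled_j, X_unlabeled, r_step=0.05, rng=None):
    """Region-growing search for Eps_j (claim 3 sketch, names hypothetical).

    Grows a set D from a random labeled seed point: any point of
    X' = (label-j data) + (unlabeled data) within distance r of D is
    absorbed. r starts at the seed's nearest-neighbour distance and is
    enlarged by r_step until D covers every label-j point.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    X_prime = np.vstack([X_labeled_j, X_unlabeled])   # X' (label-j rows first)
    n_labeled = len(X_labeled_j)
    seed_idx = int(rng.integers(n_labeled))           # random labeled start point x_1
    # distance from the seed to every other point of X'; the minimum gives r
    d = np.linalg.norm(X_prime - X_prime[seed_idx], axis=1)
    d[seed_idx] = np.inf
    r = float(d.min())
    while True:
        in_D = np.zeros(len(X_prime), dtype=bool)
        in_D[seed_idx] = True
        grew = True
        while grew:                                   # absorb points within r of D
            dist_to_D = np.linalg.norm(
                X_prime[:, None, :] - X_prime[in_D][None, :, :], axis=2).min(axis=1)
            new = (~in_D) & (dist_to_D <= r)
            grew = bool(new.any())
            in_D |= new
        if in_D[:n_labeled].all():                    # D contains all label-j data
            return r                                  # Eps_j = r
        r += r_step                                   # otherwise enlarge r and retry
```

The pairwise-distance growing step is O(n^2) per pass; a production version would use a spatial index, but the sketch keeps the claim's logic visible.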
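Likewise, the MinPts_j search of claim 4 can be sketched with scikit-learn's DBSCAN; the use of scikit-learn and all names here are assumptions of the sketch (the patent specifies only the DBSCAN algorithm itself):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def estimate_min_pts(X_labeled_j, X_unlabeled, eps_j):
    """Claim-4 sketch: starting from MinPts = 2, run DBSCAN with
    (eps_j, MinPts) and increment MinPts until the label-j points no
    longer all fall into one cluster; then MinPts_j = MinPts - 1."""
    X = np.vstack([X_labeled_j, X_unlabeled])         # label-j rows come first
    n_labeled = len(X_labeled_j)
    min_pts = 2
    while True:
        labels = DBSCAN(eps=eps_j, min_samples=min_pts).fit_predict(X)
        labeled_clusters = set(labels[:n_labeled])
        # stop once the labeled points split up (or start falling out as noise)
        if len(labeled_clusters) > 1 or -1 in labeled_clusters:
            return min_pts - 1
        min_pts += 1
```

Note that scikit-learn's `min_samples` counts the point itself, which matches the usual DBSCAN core-point definition.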
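Finally, the claim-5 integration step, assigning labels in descending density order, can be sketched as below; the data representation (a cluster index per point, with -1 for noise, and `None` for unlabeled points) and all names are assumptions of this sketch:

```python
def merge_local_results(local_results, densities, known_labels):
    """Claim-5 merge sketch: visit labels in descending density order
    (density_j = MinPts_j / Eps_j); in each local result P_j, unlabeled
    points sharing a cluster with originally labeled-j points inherit
    label j (the semi-supervised cluster assumption); points left
    unlabeled at the end are marked as noise.

    local_results: {label j: list of cluster indices, one per point}
    densities:     {label j: density_j}
    known_labels:  list with a label or None (unlabeled) per point
    """
    final = list(known_labels)
    for j in sorted(densities, key=densities.get, reverse=True):
        clusters = local_results[j]
        # clusters in P_j that contain at least one point originally labeled j
        j_clusters = {clusters[i] for i, y in enumerate(known_labels)
                      if y == j and clusters[i] != -1}
        for i, y in enumerate(final):
            if y is None and clusters[i] in j_clusters:
                final[i] = j          # assign label j via the cluster assumption
    return ['noise' if y is None else y for y in final]
```

Processing the densest label first means that where two local results disagree about an unlabeled point, the higher-density label wins, which is the tie-breaking behaviour claim 5 describes.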
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710789195.6A CN107563443A (en) | 2017-09-05 | 2017-09-05 | A kind of adaptive semi-supervised Density Clustering method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107563443A true CN107563443A (en) | 2018-01-09 |
Family
ID=60979126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710789195.6A Pending CN107563443A (en) | 2017-09-05 | 2017-09-05 | A kind of adaptive semi-supervised Density Clustering method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107563443A (en) |
Cited By (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112488138A (en) * | 2019-09-11 | 2021-03-12 | 中国移动通信集团广东有限公司 | User category identification method and device, electronic equipment and storage medium |
CN110781920A (en) * | 2019-09-24 | 2020-02-11 | 同济大学 | Method for identifying semantic information of cloud components of indoor scenic spots |
CN112613530A (en) * | 2020-11-23 | 2021-04-06 | 北京思特奇信息技术股份有限公司 | Cell resident identification method and system based on adaptive density clustering algorithm |
CN113744405A (en) * | 2021-08-26 | 2021-12-03 | 武汉理工大学 | Indoor target extraction method based on exponential function density clustering model |
CN113744405B (en) * | 2021-08-26 | 2023-06-06 | 武汉理工大学 | Indoor target extraction method based on exponential function density clustering model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107563443A (en) | A kind of adaptive semi-supervised Density Clustering method and system | |
CN104346620B (en) | To the method and apparatus and image processing system of the pixel classifications in input picture | |
CN107992887A (en) | Classifier generation method, sorting technique, device, electronic equipment and storage medium | |
Abbas et al. | DenMune: Density peak based clustering using mutual nearest neighbors | |
CN109753995B (en) | Optimization method of 3D point cloud target classification and semantic segmentation network based on PointNet + | |
CN108614997B (en) | Remote sensing image identification method based on improved AlexNet | |
CN106096727A (en) | A kind of network model based on machine learning building method and device | |
CN107341447A (en) | A kind of face verification mechanism based on depth convolutional neural networks and evidence k nearest neighbor | |
CN106845510A (en) | Chinese tradition visual culture Symbol Recognition based on depth level Fusion Features | |
CN108648191A (en) | Pest image-recognizing method based on Bayes's width residual error neural network | |
CN109902736A (en) | A kind of Lung neoplasm image classification method indicated based on autocoder construction feature | |
CN105654107A (en) | Visible component classification method based on SVM | |
CN107506786A (en) | A kind of attributive classification recognition methods based on deep learning | |
CN105894044A (en) | Single-plant tree point cloud automatic extraction method based on vehicle-mounted laser scanning data | |
CN105005764A (en) | Multi-direction text detection method of natural scene | |
CN102750551A (en) | Hyperspectral remote sensing classification method based on support vector machine under particle optimization | |
CN109086412A (en) | A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT | |
CN104091038A (en) | Method for weighting multiple example studying features based on master space classifying criterion | |
CN106845536B (en) | Parallel clustering method based on image scaling | |
Nathiya et al. | An analytical study on behavior of clusters using k means, em and k* means algorithm | |
CN105005789A (en) | Vision lexicon based remote sensing image terrain classification method | |
CN103914705A (en) | Hyperspectral image classification and wave band selection method based on multi-target immune cloning | |
CN100416599C (en) | Not supervised classification process of artificial immunity in remote sensing images | |
CN105608443B (en) | A kind of face identification method of multiple features description and local decision weighting | |
CN111931853A (en) | Oversampling method based on hierarchical clustering and improved SMOTE |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180109 |