CN102426631A - High-dimension space mapping-based K harmonic mean clustering method - Google Patents
- Publication number
- CN102426631A (application number CN 201110341012)
- Authority
- CN
- China
- Prior art keywords
- data
- distance
- distance measure
- dimensional space
- clustering method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a K harmonic mean clustering method based on high-dimensional space mapping. Assuming that the sample data is already in space vector form, the method maps the space vector data to a higher-dimensional space and then introduces the K harmonic mean to cluster the data. The method comprises the following steps: (1) processing the data; (2) selecting the initial cluster centers of the data; (3) mapping a distance measure to the high-dimensional space; (4) substituting the mapped distance measure to calculate the harmonic distance of each data sample; (5) performing K-means clustering with the harmonic distance as the distance measure; and (6) outputting the result. The method effectively reduces the sensitivity of the conventional K-means algorithm to its initial values and greatly reduces the clustering errors caused by data aliasing.
Description
Technical field
The present invention relates to the fields of computational science and intelligent information processing, in particular to techniques for clustering data sets, and specifically to a K harmonic mean clustering method based on high-dimensional space mapping.
Background art
As a data preprocessing method, cluster analysis is the basis for further analysis and processing of data, and it has become an indispensable tool for handling large-scale data. At present, the most commonly used data clustering method is K-means clustering. Experiments show that, although this method can meet the clustering needs of intelligent information processing to a certain extent, it is very sensitive to the random choice of the initial cluster centers and cannot resolve the data aliasing problem encountered in practical engineering applications, so it is not suited to current large-scale, complex data clustering. There is therefore an urgent need for a clustering method that is insensitive to the initial cluster centers and can resolve the data aliasing problem.
Summary of the invention
The object of the present invention is to provide a K harmonic mean clustering method based on high-dimensional space mapping that makes the clustering results for large-scale, complex data more stable and more accurate.
To achieve this object, the technical solution of the present invention is a K harmonic mean clustering method based on high-dimensional space mapping, comprising the following steps:
(1) processing the raw data into space vector form, so that each data sample exists as a multidimensional space vector;
(2) selecting the initial cluster centers of the data;
(3) mapping the distance measure to a high-dimensional space;
(4) substituting the mapped distance measure to calculate the harmonic distance of the sample points;
(5) performing K-means clustering with this harmonic distance as the distance measure;
(6) outputting the result.
To better resolve, extract, and amplify useful features, and thereby achieve more accurate clustering, the distance measure in step (3) above is the cosine of the included angle, and a Mercer kernel function is used to map the cosine value to the high-dimensional space.
The present invention has the following advantages:
The K harmonic mean clustering method based on high-dimensional space mapping, which the present invention designs for data clustering in complex settings, can cluster point-like space vector data stably and accurately, grouping the data into its different classes. For the distance measure, a radial basis kernel function maps the cosine measure to a high-dimensional space for computation, which can effectively separate aliased data and offers a considerable advantage over the traditional cosine measure.
Description of the drawings
The accompanying drawing is a flowchart of the method of the present invention.
Embodiment
The steps of the method of the present invention are shown in the accompanying drawing. For clarity, a specific embodiment of the present invention is described step by step below.
(1) Data processing.
The data underlying this method takes the form most widely used in the art, the space vector form; that is, each data sample exists as a multidimensional space vector. Because most real-world data already appears in this form, the specific method of data processing is not part of the present invention; this step only states that the data used by the method should be in space vector form.
(2) Selecting the initial cluster centers of the data.
The field to which the present invention relates is data clustering, so the expected number of classes K of the data should be specified. Given the expected number of classes K, the present invention selects K initial cluster centers. Because the present invention is not sensitive to the initial data, this embodiment randomly draws K data samples as the initial cluster centers. The set of cluster centers is denoted $C_l = [C_{l1}, C_{l2}, \ldots, C_{lm}]$, where $l$ is the iteration number of the cluster centers and $C_{lm}$ is the cluster center of the $m$-th class after the $l$-th round of calculation.
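The random selection of initial centers in step (2) can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `init_centers`, the `seed` parameter, and the use of NumPy are assumptions introduced here.

```python
import numpy as np

def init_centers(X, k, seed=0):
    """Randomly draw k distinct samples (rows of X) as initial cluster centers.

    X    : array of shape (n_samples, n_features), data in space vector form
    k    : expected number of classes K
    seed : hypothetical parameter, used only to make the draw reproducible
    """
    rng = np.random.default_rng(seed)
    # Sample row indices without replacement so no center is picked twice.
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx].copy()

# Small example: four 2-D samples, K = 2.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
centers = init_centers(X, k=2)
```

Each returned center is one of the original samples, which matches the embodiment's choice of drawing K data samples at random.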
(3) Mapping the distance measure to a high-dimensional space.
The distance measure in this embodiment is the cosine of the included angle, and the cosine measure is mapped by a Mercer kernel function. A key property of Mercer kernel functions is that they map low-dimensional data to a high-dimensional space through a nonlinear mapping, which allows useful features to be better resolved, extracted, and amplified, and thus yields more accurate clustering. Without loss of generality, this embodiment uses the Gaussian kernel function, a typical Mercer kernel function, for illustration; the mapped distance measure between two data samples is given by formula (1).
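Formula (1) is not reproduced in this text, so the sketch below is not the patent's formula. It shows one plausible way, assumed here, to combine the cosine measure with a Gaussian kernel: both vectors are normalized to unit length, so their squared Euclidean distance equals 2(1 - cos), the Gaussian kernel becomes exp(-(1 - cos)/sigma^2), and the distance in the kernel-induced feature space is sqrt(2(1 - K)). The function name `kernel_cosine_distance` and the bandwidth `sigma` are hypothetical.

```python
import numpy as np

def kernel_cosine_distance(x, c, sigma=1.0):
    """Distance between x and c in a Gaussian-kernel feature space,
    built from the cosine of their included angle (assumed construction).

    Both vectors must be nonzero.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    # Cosine of the included angle, clipped to guard against rounding.
    cos = np.clip(x @ c / (np.linalg.norm(x) * np.linalg.norm(c)), -1.0, 1.0)
    # Gaussian kernel evaluated on the unit-normalized vectors.
    k = np.exp(-(1.0 - cos) / sigma**2)
    # Feature-space distance: K(x,x) - 2K(x,c) + K(c,c) = 2 - 2k.
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - k))))
```

Under this assumed construction, vectors pointing in the same direction have distance zero regardless of their lengths, and the distance grows as the included angle widens.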
(4) Substituting the mapped distance measure to calculate the harmonic distance between sample points.
In the traditional K-means clustering method, the distance calculation takes the minimum distance between a data point and the cluster centers. In the present invention, the distance calculation uses the harmonic distance instead: the harmonic mean of the distances between a data point and all cluster centers replaces the traditional calculation. This introduces dynamic weighting and softens the hard clustering.
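The contrast between the minimum distance of K-means and the harmonic distance can be illustrated with a short sketch. It assumes a precomputed matrix `d` of distances from each sample to each center; the function names and the small floor `eps`, which avoids division by zero, are assumptions introduced here.

```python
import numpy as np

def min_distance(d):
    """Traditional K-means: distance from each sample to its nearest center."""
    return d.min(axis=1)

def harmonic_distance(d, eps=1e-12):
    """Harmonic mean of each sample's distances to all K centers."""
    d = np.maximum(d, eps)          # floor to avoid dividing by zero
    k = d.shape[1]
    return k / np.sum(1.0 / d, axis=1)

# Rows are samples, columns are centers.
d = np.array([[1.0, 1.0],
              [1.0, 3.0]])
nearest = min_distance(d)        # both samples are 1.0 from their nearest center
harmonic = harmonic_distance(d)  # the second sample's far center raises its value
```

Both samples have the same minimum distance, but their harmonic distances differ because every center contributes; this is the sense in which the harmonic distance softens the hard assignment.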
(5) Performing K-means clustering with this harmonic distance as the distance measure.
Based on the above calculation, the update formula for the cluster center $C_l$ of the $l$-th class in the K-means clustering method is formula (2), and the calculation formula for the clustering objective function $E_{KHM}$ is formula (3). Here $X_i$ is the $i$-th sample point, and $d$ in formulas (2) and (3) is calculated by formula (1). The cluster centers are iterated by formula (2) until the result of formula (3) stabilizes, at which point the clustering process ends.
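The iteration in step (5) can be sketched as follows. Formulas (2) and (3) are not reproduced in this text, so the sketch uses the standard K harmonic means objective and center update from the clustering literature, with an exponent `p`, as an assumption; it may differ from the patent's exact formulas. Plain Euclidean distance stands in for the `d` of formula (1), and the names `khm`, `tol`, `max_iter`, and `eps` are hypothetical.

```python
import numpy as np

def _distances_and_objective(X, C, p, eps):
    # Euclidean distances from every sample to every center, floored at eps.
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), eps)
    s = (d ** (-p)).sum(axis=1)             # per-sample sum of 1 / d^p
    objective = float(np.sum(C.shape[0] / s))
    return d, s, objective

def khm(X, centers, p=2.0, tol=1e-6, max_iter=100, eps=1e-8):
    """Iterate the centers until the objective stabilizes (assumed KHM form)."""
    X = np.asarray(X, dtype=float)
    C = np.array(centers, dtype=float)
    d, s, obj = _distances_and_objective(X, C, p, eps)
    for _ in range(max_iter):
        # Dynamic weight of sample i for center k: d_ik^-(p+2) / (sum_j d_ij^-p)^2
        w = d ** (-(p + 2.0)) / s[:, None] ** 2
        C = (w.T @ X) / w.sum(axis=0)[:, None]
        d, s, new_obj = _distances_and_objective(X, C, p, eps)
        converged = abs(obj - new_obj) <= tol * max(new_obj, 1.0)
        obj = new_obj
        if converged:
            break
    return C, obj

# Two well-separated groups, initialized from one sample of each group.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
final_centers, final_objective = khm(X, X[[0, 2]])
```

To follow the patent more closely, the Euclidean distance inside `_distances_and_objective` would be replaced by the mapped distance of formula (1); the center update in the original vector space would then be a heuristic rather than an exact minimizer.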
(6) Outputting the result.
Many methods of outputting results exist in the art. The present invention does not concern the specific form of the output; it only specifies that this step is one of the necessary steps of the invention.
The above embodiment does not limit the present invention in any way. Any technical solution obtained by equivalent replacement or equivalent transformation falls within the scope of protection of the present invention.
Claims (3)
1. A K harmonic mean clustering method based on high-dimensional space mapping, characterized by comprising the following steps:
(1) processing the raw data into space vector form;
(2) selecting the initial cluster centers of the data;
(3) mapping the distance measure to a high-dimensional space;
(4) substituting the mapped distance measure to calculate the harmonic distance of the sample points;
(5) performing K-means clustering with this harmonic distance as the distance measure;
(6) outputting the result.
2. The K harmonic mean clustering method based on high-dimensional space mapping according to claim 1, characterized in that the distance measure in said step (3) is the cosine of the included angle.
3. The K harmonic mean clustering method based on high-dimensional space mapping according to claim 2, characterized in that a Mercer kernel function is used in said step (3) to map the cosine value to the high-dimensional space.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110341012 CN102426631A (en) | 2011-11-01 | 2011-11-01 | High-dimension space mapping-based K harmonic mean clustering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110341012 CN102426631A (en) | 2011-11-01 | 2011-11-01 | High-dimension space mapping-based K harmonic mean clustering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102426631A (en) | 2012-04-25 |
Family
ID=45960610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110341012 Pending CN102426631A (en) | 2011-11-01 | 2011-11-01 | High-dimension space mapping-based K harmonic mean clustering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102426631A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105574165A (en) * | 2015-12-17 | 2016-05-11 | 国家电网公司 | Power grid operation monitoring information identification and classification method based on clustering |
CN106526450A (en) * | 2016-10-27 | 2017-03-22 | 桂林电子科技大学 | Multi-target NoC testing planning optimization method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105574165A (en) * | 2015-12-17 | 2016-05-11 | 国家电网公司 | Power grid operation monitoring information identification and classification method based on clustering |
CN105574165B (en) * | 2015-12-17 | 2019-11-26 | 国家电网公司 | A kind of grid operating monitoring information identification classification method based on cluster |
CN106526450A (en) * | 2016-10-27 | 2017-03-22 | 桂林电子科技大学 | Multi-target NoC testing planning optimization method |
CN106526450B (en) * | 2016-10-27 | 2018-12-11 | 桂林电子科技大学 | A kind of multiple target NoC test-schedule optimization method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102693452A (en) | Multiple-model soft-measuring method based on semi-supervised regression learning | |
CN104093203A (en) | Access point selection algorithm used for wireless indoor positioning | |
CN101692257A (en) | Method for registering complex curved surface | |
CN103336869A (en) | Multi-objective optimization method based on Gaussian process simultaneous MIMO model | |
CN103744935A (en) | Rapid mass data cluster processing method for computer | |
CN102506805B (en) | Multi-measuring-point planeness evaluation method based on support vector classification | |
CN104616059A (en) | DOA (Direction of Arrival) estimation method based on quantum-behaved particle swarm | |
CN103310463B (en) | Based on the online method for tracking target of Probabilistic Principal Component Analysis and compressed sensing | |
CN104699595A (en) | Software testing method facing to software upgrading | |
Kane et al. | Determining the number of clusters for a k-means clustering algorithm | |
CN103942415A (en) | Automatic data analysis method of flow cytometer | |
Zeng et al. | A note on learning rare events in molecular dynamics using lstm and transformer | |
CN102426631A (en) | High-dimension space mapping-based K harmonic mean clustering method | |
CN103207804B (en) | Based on the MapReduce load simulation method of group operation daily record | |
CN103063233B (en) | A kind of method that adopts multisensor to reduce measure error | |
Coronel-Brizio et al. | The Anderson–Darling test of fit for the power-law distribution from left-censored samples | |
CN102033936A (en) | Method for comparing similarity of time sequences | |
CN103914373A (en) | Method and device for determining priority corresponding to module characteristic information | |
CN104899440A (en) | Magnetic leakage flux defect reconstruction method based on universal gravitation search algorithm | |
CN104268217A (en) | User behavior time relativity determining method and device | |
Martín-Fernández et al. | Indexes to find the optimal number of clusters in a hierarchical clustering | |
CN105488523A (en) | Data clustering analysis method based on Grassmann manifold | |
CN102930158A (en) | Variable selection method based on partial least square | |
CN103020390B (en) | A kind of model for predicting rainfall and run-off similarity | |
CN102637200B (en) | Method for distributing multi-level associated data to same node of cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20120425 |