CN102426631A - High-dimension space mapping-based K harmonic mean clustering method - Google Patents

Info

Publication number
CN102426631A
Authority
CN
China
Prior art keywords
data
distance
distance measure
dimensional space
clustering method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201110341012
Other languages
Chinese (zh)
Inventor
王建宇
康其桔
马鹏飞
孙丽娟
陆源
何新
王凯
田乃鲁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology Changshu Research Institute Co Ltd
Original Assignee
Nanjing University of Science and Technology Changshu Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-11-01
Filing date
2011-11-01
Publication date
2012-04-25
Application filed by Nanjing University of Science and Technology Changshu Research Institute Co Ltd
Priority to CN 201110341012
Publication of CN102426631A
Legal status: Pending

Abstract

The invention discloses a K harmonic mean clustering method based on high-dimensional space mapping. Assuming the sample data are already in space vector form, the method maps the space vector data to a higher-dimensional space and then introduces the K harmonic mean to cluster the data. The method comprises the following steps: (1) processing the data; (2) selecting the initial cluster centers for the data; (3) mapping a distance measure to the high-dimensional space; (4) substituting the mapped distance measure to calculate the harmonic distance of each data sample; (5) performing K-means clustering using the harmonic distance as the distance measure; and (6) outputting the result. The method effectively reduces the sensitivity of the conventional K-means algorithm to its initial values and greatly reduces the clustering error caused by data aliasing.

Description

A K harmonic mean clustering method based on high-dimensional space mapping
Technical field
The present invention relates to the fields of computational science and intelligent information processing, in particular to techniques for clustering data sets, and specifically to a K harmonic mean clustering method based on high-dimensional space mapping.
Background technology
As a data preprocessing method, cluster analysis is the basis for further analysis and processing of data, and it has become an indispensable tool for handling large-scale data. At present, the most commonly used data clustering method is K-means clustering. Experiments show that although this method can meet clustering needs in intelligent information processing to some extent, it is very sensitive to the randomness of the initial cluster centers and cannot solve the data aliasing problem encountered in practical engineering applications, so it is unsuited to current demands for clustering large-scale, complex data. There is therefore an urgent need for a clustering method that is insensitive to the initial cluster centers and can solve the data aliasing problem.
Summary of the invention
The object of the present invention is to provide a K harmonic mean clustering method based on high-dimensional space mapping that makes clustering results for large-scale, complex data stable and more accurate.
To achieve this object, the technical solution of the present invention is a K harmonic mean clustering method based on high-dimensional space mapping, comprising the following steps:
(1) process the raw data into space vector form, i.e., each data sample exists as a multidimensional space vector;
(2) select the initial cluster centers for the data;
(3) map the distance measure to a high-dimensional space;
(4) substitute the mapped distance measure to calculate the harmonic distance of each sample point;
(5) perform K-means clustering using this harmonic distance as the distance measure;
(6) output the result.
To better discriminate, extract, and amplify useful features, and thereby cluster more accurately, the distance measure in step (3) above is the cosine of the included angle, and a Mercer kernel function is used to map this cosine value to the high-dimensional space.
The present invention has the following advantages:
Designed for data clustering in complex settings, the K harmonic mean clustering method based on high-dimensional space mapping clusters point-like space vector data stably and accurately, aggregating the data into distinct classes. For the distance metric, a radial basis kernel function maps the cosine measure to a high-dimensional space for calculation, which effectively separates aliased data and offers a significant advantage over the traditional cosine measure.
Description of drawings
The accompanying drawing is a flowchart of the method of the present invention.
Embodiment
The steps of the method are shown in the accompanying drawing. For clarity, a specific embodiment of the present invention is described step by step below.
(1) Data processing.
This method operates on data in space vector form, the most widely used form in this field; that is, each data sample exists as a multidimensional space vector. Because most real-world data already appear as multidimensional space vectors, the specific data processing method is outside the scope of the present invention; this step merely states that the data used by the method should be in space vector form.
(2) Select the initial cluster centers.
The present invention concerns data clustering, so the expected number of classes K must be specified. Given K, the method selects K initial cluster centers. Because the invention is insensitive to the initial values, this embodiment randomly draws K data samples as the initial cluster centers. The set of cluster centers is denoted C_l = [C_{l1}, C_{l2}, ..., C_{lm}], where l is the iteration number of the cluster centers and C_{lm} is the cluster center of the m-th class after the l-th round of calculation.
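The random initialization described in this step can be sketched as follows. This is only an illustration under stated assumptions: the function name `init_centers`, the `seed` parameter, and the toy `samples` list are hypothetical and do not appear in the patent.

```python
import random

def init_centers(samples, k, seed=None):
    """Randomly draw k distinct data samples to serve as initial cluster centers."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random(seed)
    chosen = rng.sample(range(len(samples)), k)
    # Copy each chosen vector so later center updates do not mutate the data.
    return [list(samples[i]) for i in chosen]

# Hypothetical toy data: four 2-dimensional space vectors.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = init_centers(samples, k=2, seed=0)
```

Drawing indices rather than vectors guarantees that the K centers come from K different samples even when the data contain repeated vectors.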
(3) Map the distance measure to a high-dimensional space.
The distance measure in this embodiment is the cosine of the included angle, to which a Mercer kernel mapping is applied. The key property of a Mercer kernel function is that it maps low-dimensional data to a high-dimensional space through a nonlinear mapping, which makes it possible to better discriminate, extract, and amplify useful features and thereby cluster more accurately. Without loss of generality, this embodiment uses the Gaussian kernel function, a typical Mercer kernel, for illustration. After mapping, the distance measure between two data samples is given by formula (1):
$$d(l_1, l_2) = \exp\left(\frac{\cos^2(l_1, l_2)}{\sigma^2}\right) \tag{1}$$
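As a rough sketch, formula (1) can be computed in two stages: first the cosine of the angle between two sample vectors, then the exponential mapping. The code follows the formula as printed, with no sign added to the exponent. The names `cosine` and `mapped_distance` and the default `sigma=1.0` are assumptions made for illustration; the patent does not specify a value for σ.

```python
import math

def cosine(a, b):
    """Cosine of the included angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def mapped_distance(a, b, sigma=1.0):
    """Formula (1) as printed: exp(cos^2(a, b) / sigma^2)."""
    c = cosine(a, b)
    return math.exp(c * c / (sigma * sigma))

orthogonal = mapped_distance([1.0, 0.0], [0.0, 1.0])  # cosine is 0
parallel = mapped_distance([1.0, 0.0], [2.0, 0.0])    # cosine is 1
```

Because the squared cosine lies in [0, 1], the mapped value computed this way always lies between 1 and exp(1/σ²).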
(4) Substitute the mapped distance measure to calculate the harmonic distance between sample points.
In the traditional K-means clustering method, the distance calculation takes the minimum distance between a data point and the cluster centers. In the present invention, the harmonic distance is used instead: the harmonic mean of the distances from a data point to all cluster centers replaces the traditional calculation, which introduces dynamic weighting and softens the hard clustering.
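The contrast between the two distance calculations can be illustrated with a small sketch. The function name `harmonic_distance` and the example distance values are hypothetical; the harmonic mean of k distances is taken here as k divided by the sum of their reciprocals.

```python
def harmonic_distance(dists):
    """Harmonic mean of the distances from one data point to all cluster centers."""
    if any(d <= 0 for d in dists):
        raise ValueError("distances must be positive")
    return len(dists) / sum(1.0 / d for d in dists)

# Hypothetical distances from one data point to three cluster centers.
dists = [1.0, 2.0, 4.0]
hard = min(dists)                # traditional K-means: nearest center only
soft = harmonic_distance(dists)  # harmonic mean: every center contributes
```

The minimum depends on a single center, whereas the harmonic mean changes whenever any of the distances changes, which is one way to read the "softening" described above.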
(5) Perform K-means clustering using this harmonic distance as the distance measure.
From the above calculation, the update formula for the cluster center C_l of the l-th class (formula (2)) and the formula for the clustering objective function E_KHM (formula (3)) in the K-means clustering method are as follows:
$$C_l = \frac{\sum_{i=1}^{n} \dfrac{1}{\left(\sum_{j=1}^{k} \dfrac{d_{i,l}^2}{d_{i,j}^2}\right)^2} X_i}{\sum_{i=1}^{n} \dfrac{1}{\left(\sum_{j=1}^{k} \dfrac{d_{i,l}^2}{d_{i,j}^2}\right)^2}} \tag{2}$$
$$E_{KHM} = \sum_{i=1}^{n} \frac{k}{\sum_{l=1}^{k} \dfrac{1}{d(X_i, C_l)}} \tag{3}$$
where X_i is the i-th sample point, d_{i,j} denotes d(X_i, C_j), and d in formulas (2) and (3) is calculated by formula (1). The cluster centers are iterated repeatedly with formula (2) until the result of formula (3) stabilizes, at which point the clustering process ends.
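The iteration described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each weight in the numerator of formula (2) multiplies its sample X_i, as in the standard K harmonic means center update, and that "stable" means the change in formula (3) falls below a tolerance. The names `khm`, `update_centers`, `objective`, `tol`, and `max_iter`, the toy data, and the starting centers are all hypothetical.

```python
import math

def mapped_distance(x, c, sigma=1.0):
    """Formula (1) as printed: exp(cos^2(x, c) / sigma^2)."""
    dot = sum(a * b for a, b in zip(x, c))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in c))
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    cos = dot / norm
    return math.exp(cos * cos / (sigma * sigma))

def objective(samples, centers, sigma=1.0):
    """Formula (3): sum over samples of the harmonic mean of distances to all centers."""
    k = len(centers)
    return sum(
        k / sum(1.0 / mapped_distance(x, c, sigma) for c in centers)
        for x in samples
    )

def update_centers(samples, centers, sigma=1.0):
    """Formula (2), assuming each weight multiplies its sample X_i in the numerator."""
    dim = len(samples[0])
    dists = [[mapped_distance(x, c, sigma) for c in centers] for x in samples]
    new_centers = []
    for l in range(len(centers)):
        numerator = [0.0] * dim
        denominator = 0.0
        for x, row in zip(samples, dists):
            # Weight of sample i for center l: 1 / (sum_j d_il^2 / d_ij^2)^2
            weight = 1.0 / sum(row[l] ** 2 / d_ij ** 2 for d_ij in row) ** 2
            denominator += weight
            for t in range(dim):
                numerator[t] += weight * x[t]
        new_centers.append([v / denominator for v in numerator])
    return new_centers

def khm(samples, centers, sigma=1.0, tol=1e-9, max_iter=100):
    """Iterate formula (2) until formula (3) changes by no more than tol."""
    previous = objective(samples, centers, sigma)
    current = previous
    for _ in range(max_iter):
        centers = update_centers(samples, centers, sigma)
        current = objective(samples, centers, sigma)
        if abs(previous - current) <= tol:
            break
        previous = current
    return centers, current

# Hypothetical toy data and starting centers.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers, energy = khm(samples, [[1.0, 0.0], [0.0, 1.0]])
```

Under this assumption each updated center is a weighted average of the samples, so it always stays within the range spanned by the data. The `max_iter` cap is a safeguard added for the sketch; the patent only states that iteration continues until formula (3) is stable.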
(6) Output the result.
Many methods of outputting results exist in this field; the present invention does not concern the specific output form and only specifies that this step is a necessary step of the invention.
The above embodiment does not limit the present invention in any way; any technical solution obtained by equivalent replacement or equivalent transformation falls within the scope of protection of the present invention.

Claims (3)

1. A K harmonic mean clustering method based on high-dimensional space mapping, characterized by comprising the following steps:
(1) process the raw data into space vector form;
(2) select the initial cluster centers for the data;
(3) map the distance measure to a high-dimensional space;
(4) substitute the mapped distance measure to calculate the harmonic distance of each sample point;
(5) perform K-means clustering using this harmonic distance as the distance measure;
(6) output the result.
2. The K harmonic mean clustering method based on high-dimensional space mapping according to claim 1, characterized in that the distance measure in step (3) is the cosine of the included angle.
3. The K harmonic mean clustering method based on high-dimensional space mapping according to claim 2, characterized in that a Mercer kernel function is used in step (3) to map the cosine of the included angle to the high-dimensional space.
CN 201110341012 2011-11-01 2011-11-01 High-dimension space mapping-based K harmonic mean clustering method Pending CN102426631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110341012 CN102426631A (en) 2011-11-01 2011-11-01 High-dimension space mapping-based K harmonic mean clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110341012 CN102426631A (en) 2011-11-01 2011-11-01 High-dimension space mapping-based K harmonic mean clustering method

Publications (1)

Publication Number Publication Date
CN102426631A true CN102426631A (en) 2012-04-25

Family

ID=45960610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110341012 Pending CN102426631A (en) 2011-11-01 2011-11-01 High-dimension space mapping-based K harmonic mean clustering method

Country Status (1)

Country Link
CN (1) CN102426631A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574165A (en) * 2015-12-17 2016-05-11 国家电网公司 Power grid operation monitoring information identification and classification method based on clustering
CN105574165B (en) * 2015-12-17 2019-11-26 国家电网公司 A kind of grid operating monitoring information identification classification method based on cluster
CN106526450A (en) * 2016-10-27 2017-03-22 桂林电子科技大学 Multi-target NoC testing planning optimization method
CN106526450B (en) * 2016-10-27 2018-12-11 桂林电子科技大学 A kind of multiple target NoC test-schedule optimization method

Similar Documents

Publication Publication Date Title
CN102693452A (en) Multiple-model soft-measuring method based on semi-supervised regression learning
CN104093203A (en) Access point selection algorithm used for wireless indoor positioning
CN101692257A (en) Method for registering complex curved surface
CN103336869A (en) Multi-objective optimization method based on Gaussian process simultaneous MIMO model
CN103744935A (en) Rapid mass data cluster processing method for computer
CN102506805B (en) Multi-measuring-point planeness evaluation method based on support vector classification
CN104616059A (en) DOA (Direction of Arrival) estimation method based on quantum-behaved particle swarm
CN103310463B (en) Based on the online method for tracking target of Probabilistic Principal Component Analysis and compressed sensing
CN104699595A (en) Software testing method facing to software upgrading
Kane et al. Determining the number of clusters for a k-means clustering algorithm
CN103942415A (en) Automatic data analysis method of flow cytometer
Zeng et al. A note on learning rare events in molecular dynamics using lstm and transformer
CN102426631A (en) High-dimension space mapping-based K harmonic mean clustering method
CN103207804B (en) Based on the MapReduce load simulation method of group operation daily record
CN103063233B (en) A kind of method that adopts multisensor to reduce measure error
Coronel-Brizio et al. The Anderson–Darling test of fit for the power-law distribution from left-censored samples
CN102033936A (en) Method for comparing similarity of time sequences
CN103914373A (en) Method and device for determining priority corresponding to module characteristic information
CN104899440A (en) Magnetic leakage flux defect reconstruction method based on universal gravitation search algorithm
CN104268217A (en) User behavior time relativity determining method and device
Martín-Fernández et al. Indexes to find the optimal number of clusters in a hierarchical clustering
CN105488523A (en) Data clustering analysis method based on Grassmann manifold
CN102930158A (en) Variable selection method based on partial least square
CN103020390B (en) A kind of model for predicting rainfall and run-off similarity
CN102637200B (en) Method for distributing multi-level associated data to same node of cluster

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120425