CN107330442A - Incremental fuzzy clustering method combining intra-class compactness and inter-class separation - Google Patents


Info

Publication number
CN107330442A
Authority
CN
China
Prior art keywords
data
class
centroid
data block
cluster
Legal status
Pending
Application number
CN201710387502.8A
Other languages
Chinese (zh)
Inventor
刘永利
段天毅
刘静
晁浩
陈敬丽
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
2017-05-25
Filing date
2017-05-25
Publication date
2017-11-07
Application filed by Henan University of Technology
Priority to CN201710387502.8A
Publication of CN107330442A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The present invention proposes an incremental fuzzy clustering method that combines intra-class compactness and inter-class separation. The method splits the data into consecutive data blocks and processes them in order, so it can handle large-scale data and data streams. After the FCS algorithm is weighted, a single-pass incremental scheme can be used. The method significantly improves processing speed without affecting clustering accuracy.

Description

Incremental fuzzy clustering method combining intra-class compactness and inter-class separation
Technical field
The present invention relates to a clustering method, and in particular to an incremental fuzzy clustering method combining intra-class compactness and inter-class separation. It belongs to the field of data mining.
Background art
A clustering algorithm places highly similar data objects in the same cluster and highly dissimilar data objects in different clusters. Research on clustering algorithms has produced a large body of results. According to how data objects are assigned to clusters, these algorithms can be divided into hard clustering and fuzzy clustering. In hard clustering, each data object belongs entirely to exactly one cluster; in fuzzy clustering, each data object belongs to several clusters with different degrees of membership. Each family has its own strengths: hard clustering algorithms are simple and efficient, whereas fuzzy clustering algorithms better match how people perceive the objective world.
Whether hard or fuzzy, most clustering algorithms consider only intra-class compactness and ignore inter-class separation; the FCS (Fuzzy Compactness and Separation) algorithm was proposed to address this. FCS minimizes the within-class scatter (maximizing intra-class compactness) while maximizing inter-class separation, combines characteristics of both hard and fuzzy clustering, and can effectively improve clustering accuracy and efficiency.
However, the FCS algorithm cannot efficiently handle large-scale data or streaming data. The present invention therefore proposes an incremental fuzzy clustering method that combines intra-class compactness and inter-class separation. By splitting the data into consecutive data blocks and processing them in order, the method can handle large-scale data and data streams.
Summary of the invention
To solve the problems in the prior art, the present invention proposes an incremental fuzzy clustering method combining intra-class compactness and inter-class separation.
1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:
(1) Divide the whole data set into D blocks and assign a weight of 1 to each data point in every block;
(2) Perform clustering on each weighted data block;
(3) Cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \dots, U_{1t}, \dots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \dots, a_{1t}, \dots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the c-th class of the 1st data block and $a_{1c}$ denotes the c-th centroid of the 1st data block;
(4) After the (i-1)-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in $[a_{(i-1)1}, a_{(i-1)2}, \dots, a_{(i-1)t}, \dots, a_{(i-1)c}]$ of the (i-1)-th data block, where $w_t$ is the sum of the membership degrees of the data points in the data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained i-th data block; combine the weighted centroids of the (i-1)-th data block and the data points of the i-th data block into a new data block, and cluster this new data block to obtain the clustering result $[U_{i1}, U_{i2}, \dots, U_{it}, \dots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \dots, a_{it}, \dots, a_{ic}]$; then, in order, find the cluster $U_{it}$ that contains each centroid obtained in the (i-1)-th clustering, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) Repeat step (4) until the last data block has been processed, obtaining the final centroids and clustering result.
2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following steps:
1) Initialize β, the maximum error ε, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign each $\eta_c$ a random number between 0 and 1, and set τ = 1;
2) Update $a_c$ from $\eta_c$ and $u_{cj}$ using the centroid update formula;
3) Update $u_{cj}$ from $\eta_c$ and $a_c$ using the membership update formula;
4) Update $\eta_c$ from β and $a_c$ using the $\eta_c$ formula;
5) Set τ = τ + 1;
6) If $\max |u_{cj}(\tau) - u_{cj}(\tau-1)| \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
Here C is the number of classes, N is the number of data points, $\eta_c$ is a parameter that keeps the class of the c-th centroid from overlapping with the classes of the other centroids, $u_{cj}$ is the membership degree of the j-th data point in the c-th class, subject to $\sum_{c=1}^{C} u_{cj} = 1$ and $u_{cj} \in [0,1]$, m > 1 is the fuzzy factor, $w_j$ is the weight of the j-th data point, $a_c$ is the centroid of the c-th class, $x_j$ is the j-th data point, $\bar{x}$ is the data mean, $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the j-th data point to the c-th centroid, 0 ≤ β ≤ 1, and k = 1, ..., C.
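The iteration of steps 1) to 6) above can be sketched in plain Python. This is a minimal sketch, not the patent's implementation: the formula images are not reproduced in this text, so the update rules used here are assumptions modeled on the standard FCS algorithm with each point's contribution multiplied by $w_j$. In particular, the $\eta_c$ formula, the use of a weighted data mean for $\bar{x}$, and the crisp assignment when $\lVert x_j - a_c \rVert^2 \le \eta_c \lVert a_c - \bar{x} \rVert^2$ are assumptions, and all function names are hypothetical. It requires C ≥ 2.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance ||p - q||^2."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def weighted_mean(X, w):
    """Data mean x_bar; weighting it by w_j is an assumption."""
    total = sum(w)
    return [sum(wj * x[d] for x, wj in zip(X, w)) / total for d in range(len(X[0]))]

def update_eta(A, xbar, beta):
    """Assumed FCS form: beta/4 * min_{c'!=c} ||a_c-a_c'||^2 / max_k ||a_k-xbar||^2."""
    denom = max(sqdist(a, xbar) for a in A)
    if denom == 0.0:
        return [0.0] * len(A)
    return [beta / 4 * min(sqdist(a, b) for k, b in enumerate(A) if k != c) / denom
            for c, a in enumerate(A)]

def update_a(X, w, U, eta, xbar, m, A_prev):
    """a_c = sum_j w_j u_cj^m (x_j - eta_c xbar) / ((1 - eta_c) sum_j w_j u_cj^m)."""
    A = []
    for c, row in enumerate(U):
        coef = [wj * ucj ** m for wj, ucj in zip(w, row)]
        s = sum(coef)
        if s == 0.0:                          # empty cluster: keep previous centroid
            A.append(list(A_prev[c]))
            continue
        A.append([sum(cf * (x[d] - eta[c] * xbar[d]) for cf, x in zip(coef, X))
                  / ((1 - eta[c]) * s) for d in range(len(xbar))])
    return A

def update_u(X, A, eta, xbar, m):
    """Fuzzy membership update; crisp when the bracketed term is <= 0."""
    C = len(A)
    U = [[0.0] * len(X) for _ in range(C)]
    for j, x in enumerate(X):
        D = [sqdist(x, A[c]) - eta[c] * sqdist(A[c], xbar) for c in range(C)]
        if min(D) <= 0.0:
            U[D.index(min(D))][j] = 1.0       # hard assignment
            continue
        inv = [d ** (-1.0 / (m - 1)) for d in D]
        s = sum(inv)
        for c in range(C):
            U[c][j] = inv[c] / s
    return U

def weighted_fcs(X, w, C, m=2.0, beta=0.05, eps=1e-5, tau_max=100, seed=0):
    """Steps 1) to 6) on one weighted data block (hypothetical API)."""
    rng = random.Random(seed)
    N = len(X)
    # 1) random memberships whose columns sum to 1; random eta_c in [0, 1)
    U = [[rng.random() for _ in range(N)] for _ in range(C)]
    for j in range(N):
        s = sum(U[c][j] for c in range(C))
        for c in range(C):
            U[c][j] /= s
    eta = [rng.random() for _ in range(C)]
    xbar = weighted_mean(X, w)
    A = [list(xbar) for _ in range(C)]
    for _tau in range(tau_max):               # 5) iteration counter up to tau_max
        A = update_a(X, w, U, eta, xbar, m, A)        # 2)
        U_new = update_u(X, A, eta, xbar, m)          # 3)
        eta = update_eta(A, xbar, beta)               # 4)
        delta = max(abs(U_new[c][j] - U[c][j]) for c in range(C) for j in range(N))
        U = U_new
        if delta <= eps:                              # 6)
            break
    return U, A
```

Because $w_j$ cancels in the membership update, the weights only influence the centroids and the data mean. With β = 0, every $\eta_c$ after the first iteration is 0 and the updates reduce to weighted fuzzy c-means.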
The present invention was proposed to handle large-scale data and data streams. The method significantly improves processing speed without affecting clustering accuracy. Compared with conventional methods, the newly proposed method handles large-scale data and data streams faster and more accurately.
Embodiment
To use the single-pass incremental method, the FCS algorithm must first be weighted. First, the weighted within-class scatter matrix $S_{IFW}$ and the weighted between-class scatter matrix $S_{IFB}$ are defined,
where C is the number of classes, N is the number of data points, $w_j$ is the weight, $u_{cj}$ is the membership degree of the j-th data point in the c-th class, subject to $\sum_{c=1}^{C} u_{cj} = 1$ and $u_{cj} \in [0,1]$, m > 1 is the fuzzy factor, $x_j$ is the j-th data point, $\bar{x}$ is the sample mean, $a_c$ is the centroid of the c-th class, and $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the j-th data point to the c-th centroid.
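The two scatter formulas themselves do not survive in this text. Assuming that weighting simply multiplies each data point's contribution by $w_j$, and that $\bar{x}$ is the weighted mean (both assumptions), they would read as follows, written as scalar quantities to match the squared-distance definitions above:

```latex
\begin{aligned}
S_{IFW} &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert x_j - a_c \rVert^2 \\
S_{IFB} &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert a_c - \bar{x} \rVert^2 \\
\bar{x} &= \frac{\sum_{j=1}^{N} w_j\,x_j}{\sum_{j=1}^{N} w_j}
\end{aligned}
```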
From these two formulas, the objective function of the incremental FCS algorithm is obtained,
subject to the following constraint,
where $\eta_c$ is given by formula (5), with
0 ≤ β ≤ 1 and k = 1, ..., C.
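Under the same weighting assumption, and following the original unweighted FCS formulation, the objective, its constraint and the separation parameter $\eta_c$ (presumably the patent's formula (5)) can be reconstructed as below. This is a sketch, not the patent's own typeset formulas: the objective subtracts, class by class, an $\eta_c$-scaled between-class term from the within-class term.

```latex
\begin{aligned}
J &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert x_j - a_c \rVert^2
   - \sum_{c=1}^{C} \eta_c \sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert a_c - \bar{x} \rVert^2 \\
&\text{s.t.}\quad \sum_{c=1}^{C} u_{cj} = 1,\quad u_{cj} \in [0,1],\quad j = 1,\dots,N \\
\eta_c &= \frac{\beta}{4}\,
   \frac{\min_{c' \neq c} \lVert a_c - a_{c'} \rVert^2}{\max_{k=1,\dots,C} \lVert a_k - \bar{x} \rVert^2},
   \qquad 0 \le \beta \le 1
\end{aligned}
```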
Applying the method of Lagrange multipliers to the objective function under these constraints, the following new objective function is constructed, from which the necessary conditions for the objective function to reach its minimum can be derived.
Taking the partial derivative of the above expression with respect to U and setting it equal to 0 gives an expression for $u_{cj}$;
combining this expression with the constraint yields the membership update formula (8).
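A reconstruction of this Lagrangian step, presumably leading to formula (8), is sketched below under the assumption that each term of the objective J is weighted by $w_j$. The multiplier $\lambda_j$ enforces $\sum_c u_{cj} = 1$, and $w_j$ cancels once the constraint is applied:

```latex
\begin{aligned}
\mathcal{L} &= J - \sum_{j=1}^{N} \lambda_j \Big( \sum_{c=1}^{C} u_{cj} - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial u_{cj}} &= m\,w_j\,u_{cj}^{m-1}
   \big( \lVert x_j - a_c \rVert^2 - \eta_c \lVert a_c - \bar{x} \rVert^2 \big) - \lambda_j = 0 \\
u_{cj} &= \frac{\big( \lVert x_j - a_c \rVert^2 - \eta_c \lVert a_c - \bar{x} \rVert^2 \big)^{-1/(m-1)}}
   {\sum_{k=1}^{C} \big( \lVert x_j - a_k \rVert^2 - \eta_k \lVert a_k - \bar{x} \rVert^2 \big)^{-1/(m-1)}}
\end{aligned}
```

This expression only applies while every bracketed term is positive. In the original FCS algorithm, a point with $\lVert x_j - a_c \rVert^2 \le \eta_c \lVert a_c - \bar{x} \rVert^2$ is instead assigned crisply ($u_{cj} = 1$, and 0 for the other classes), which is where the combination of hard and fuzzy behaviour comes from.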
Likewise, taking the partial derivative of the new objective function with respect to A and setting it equal to 0 gives an expression for $a_c$,
from which the centroid update formula (10) is obtained.
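Differentiating the Lagrangian with respect to $a_c$, while holding $\eta_c$ and $\bar{x}$ fixed and assuming $w_j$-weighted terms, gives the following reconstruction of what is presumably formula (10):

```latex
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial a_c} &= -2 \sum_{j=1}^{N} w_j\,u_{cj}^{m}
   \big[ (x_j - a_c) + \eta_c (a_c - \bar{x}) \big] = 0 \\
a_c &= \frac{\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,(x_j - \eta_c \bar{x})}
   {(1 - \eta_c) \sum_{j=1}^{N} w_j\,u_{cj}^{m}}
\end{aligned}
```

The division requires $\eta_c < 1$. If $\eta_c$ has the standard FCS form ($\beta/4$ times the ratio of the smallest squared inter-centroid distance to the largest squared centroid-to-mean distance), then $\eta_c \le \beta$, because any inter-centroid distance is at most twice the largest centroid-to-mean distance; so $\beta < 1$ suffices.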
The method comprises the following steps:
(1) Divide the whole data set into D blocks and assign a weight of 1 to each data point in every block;
(2) Perform clustering on each weighted data block;
(3) Cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \dots, U_{1t}, \dots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \dots, a_{1t}, \dots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the c-th class of the 1st data block and $a_{1c}$ denotes the c-th centroid of the 1st data block;
(4) After the (i-1)-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in $[a_{(i-1)1}, a_{(i-1)2}, \dots, a_{(i-1)t}, \dots, a_{(i-1)c}]$ of the (i-1)-th data block, where $w_t$ is the sum of the membership degrees of the data points in the data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained i-th data block; combine the weighted centroids of the (i-1)-th data block and the data points of the i-th data block into a new data block, and cluster this new data block to obtain the clustering result $[U_{i1}, U_{i2}, \dots, U_{it}, \dots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \dots, a_{it}, \dots, a_{ic}]$; then, in order, find the cluster $U_{it}$ that contains each centroid obtained in the (i-1)-th clustering, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) Repeat step (4) until the last data block has been processed, obtaining the final centroids and clustering result.
The clustering in step (2) comprises the following steps:
1) Initialize β, the maximum error ε, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign each $\eta_c$ a random number between 0 and 1, and set τ = 1;
2) Update $a_c$ from $\eta_c$ and $u_{cj}$ using formula (10);
3) Update $u_{cj}$ from $\eta_c$ and $a_c$ using formula (8);
4) Update $\eta_c$ from β and $a_c$ using formula (5);
5) Set τ = τ + 1;
6) If $\max |u_{cj}(\tau) - u_{cj}(\tau-1)| \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
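The block-by-block procedure of steps (1) to (5) can be sketched as follows. This is a minimal sketch under several stated assumptions. The per-block clusterer is passed in as `cluster_fn`; to keep the block self-contained, a weighted fuzzy c-means (the special case with every $\eta_c = 0$) stands in for the weighted FCS of step (2). Each point's hard class is taken to be its highest-membership cluster. The centroid weight $w_t$ is read, as in single-pass fuzzy c-means, as the weighted membership sum over every point of the combined block, so weight from earlier blocks carries forward. All names are hypothetical.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance ||p - q||^2."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def weighted_fcm(X, w, C, m=2.0, eps=1e-5, tau_max=100, seed=0):
    """Stand-in block clusterer: weighted fuzzy c-means (all eta_c = 0)."""
    rng = random.Random(seed)
    N, dim = len(X), len(X[0])
    U = [[rng.random() for _ in range(N)] for _ in range(C)]
    for j in range(N):                        # columns sum to 1
        s = sum(U[c][j] for c in range(C))
        for c in range(C):
            U[c][j] /= s
    A = [list(X[0]) for _ in range(C)]
    for _ in range(tau_max):
        for c in range(C):                    # weighted centroid update
            coef = [wj * U[c][j] ** m for j, wj in enumerate(w)]
            s = sum(coef)
            if s > 0.0:
                A[c] = [sum(cf * x[d] for cf, x in zip(coef, X)) / s for d in range(dim)]
        U_new = [[0.0] * N for _ in range(C)]
        for j, x in enumerate(X):             # membership update
            D = [sqdist(x, a) for a in A]
            if min(D) == 0.0:                 # point coincides with a centroid
                U_new[D.index(0.0)][j] = 1.0
                continue
            inv = [d ** (-1.0 / (m - 1)) for d in D]
            s = sum(inv)
            for c in range(C):
                U_new[c][j] = inv[c] / s
        delta = max(abs(U_new[c][j] - U[c][j]) for c in range(C) for j in range(N))
        U = U_new
        if delta <= eps:
            break
    return U, A

def incremental_cluster(data, C, n_blocks, cluster_fn=weighted_fcm):
    """Steps (1) to (5): cluster consecutive blocks, carrying weighted centroids."""
    N = len(data)
    # (1) consecutive blocks; every raw data point gets weight 1
    blocks = [data[k * N // n_blocks:(k + 1) * N // n_blocks] for k in range(n_blocks)]
    centroids, cweights, labels = [], [], []
    for block in blocks:
        X = centroids + block                 # previous weighted centroids come first
        w = cweights + [1.0] * len(block)
        U, A = cluster_fn(X, w, C)            # cluster the (combined) block
        hard = [max(range(C), key=lambda c: U[c][j]) for j in range(len(X))]
        if centroids:
            # (4) points of old class t join the cluster that absorbed centroid t
            labels = [hard[t] for t in labels]
        labels += hard[len(centroids):]
        # centroid weight w_t: weighted membership sum over the block (assumption)
        cweights = [sum(wj * U[c][j] for j, wj in enumerate(w)) for c in range(C)]
        centroids = A
    return centroids, labels                  # (5) final centroids and hard labels
```

Any routine with the same assumed `(X, w, C) -> (U, A)` shape, such as a weighted FCS implementing steps 1) to 6), can be passed as `cluster_fn` in place of the stand-in.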
Take the Statlog Segmentation data set as an example: it has 2310 data points with 19 attributes, divided into 7 classes. The clustering method is applied to this data set as follows:
The data set is divided into 10 blocks of 231 data points each, and each data point is assigned a weight of 1;
The 1st data block is clustered to obtain 7 classes and their 7 corresponding centroids; each centroid is assigned a weight equal to the sum of the membership degrees of the data points in its class. These 7 centroids, as representatives of the 7 classes, are added to the 2nd data block to form a new data block, which is clustered again. Proceeding in the same way until all 10 data blocks have been clustered yields the final result.
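The block arithmetic of this example can be checked directly. The sketch below uses randomly generated memberships purely as stand-ins (they are not results from the patent) to show that, because each point's memberships sum to 1 over the 7 classes, the 7 centroid weights of the first block always add up to the 231 points they summarize.

```python
import random

N_TOTAL, N_BLOCKS, C = 2310, 10, 7
block_size = N_TOTAL // N_BLOCKS      # data points per block
augmented_size = block_size + C       # blocks 2..10 also carry the 7 weighted centroids

# Stand-in memberships for block 1 (random, for illustration only):
# normalise each point's memberships so they sum to 1 over the 7 classes.
rng = random.Random(42)
U = [[rng.random() for _ in range(block_size)] for _ in range(C)]
for j in range(block_size):
    s = sum(U[t][j] for t in range(C))
    for t in range(C):
        U[t][j] /= s

# Centroid weight w_t = sum of the memberships in class t.
weights = [sum(row) for row in U]

print(block_size, augmented_size, round(sum(weights), 6))  # 231 238 231.0
```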
The method processes the data set block by block, which reduces the amount of data handled in each clustering run and the number of clustering iterations, and therefore improves clustering efficiency. Experiments show that the F-Measure of the SPFCS algorithm is 3.9% and 21.4% higher than that of the traditional SPFCM and SPHFCM algorithms, respectively.
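The F-Measure comparison above does not state which F-Measure variant was used. One common definition for clustering, sketched here as an assumption, matches each true class i to its best cluster j by F(i, j) = 2PR/(P + R), with precision P = n_ij/n_j and recall R = n_ij/n_i, and averages the best scores weighted by class size n_i/n. The function name is hypothetical.

```python
from collections import Counter

def clustering_f_measure(true_labels, pred_labels):
    """Class-size-weighted best-match F-Measure (one common definition; assumed)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)                 # n_i
    cluster_sizes = Counter(pred_labels)               # n_j
    overlap = Counter(zip(true_labels, pred_labels))   # n_ij
    total = 0.0
    for i, n_i in class_sizes.items():
        best = 0.0
        for j, n_j in cluster_sizes.items():
            n_ij = overlap[(i, j)]
            if n_ij == 0:
                continue
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total

print(clustering_f_measure([0, 0, 0, 1, 1, 1], [5, 5, 5, 7, 7, 7]))  # 1.0
```

Cluster IDs need not match class IDs: a perfect partition under any relabeling scores 1.0.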

Claims (2)

1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:
(1) Divide the whole data set into D blocks and assign a weight of 1 to each data point in every block;
(2) Perform clustering on each weighted data block;
(3) Cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \dots, U_{1t}, \dots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \dots, a_{1t}, \dots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the c-th class of the 1st data block and $a_{1c}$ denotes the c-th centroid of the 1st data block;
(4) After the (i-1)-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in $[a_{(i-1)1}, a_{(i-1)2}, \dots, a_{(i-1)t}, \dots, a_{(i-1)c}]$ of the (i-1)-th data block, where $w_t$ is the sum of the membership degrees of the data points in the data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained i-th data block; combine the weighted centroids of the (i-1)-th data block and the data points of the i-th data block into a new data block, and cluster this new data block to obtain the clustering result $[U_{i1}, U_{i2}, \dots, U_{it}, \dots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \dots, a_{it}, \dots, a_{ic}]$; then, in order, find the cluster $U_{it}$ that contains each centroid obtained in the (i-1)-th clustering, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) Repeat step (4) until the last data block has been processed, obtaining the final centroids and clustering result.
2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following steps:
1) Initialize β, the maximum error ε, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign each $\eta_c$ a random number between 0 and 1, and set τ = 1;
2) Update $a_c$ from $\eta_c$ and $u_{cj}$ using the centroid update formula;
3) Update $u_{cj}$ from $\eta_c$ and $a_c$ using the membership update formula;
4) Update $\eta_c$ from β and $a_c$ using the $\eta_c$ formula;
5) Set τ = τ + 1;
6) If $\max |u_{cj}(\tau) - u_{cj}(\tau-1)| \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
Here C is the number of classes, N is the number of data points, $\eta_c$ is a parameter that keeps the class of the c-th centroid from overlapping with the classes of the other centroids, $u_{cj}$ is the membership degree of the j-th data point in the c-th class, subject to $\sum_{c=1}^{C} u_{cj} = 1$ and $u_{cj} \in [0,1]$, m > 1 is the fuzzy factor, $w_j$ is the weight of the j-th data point, $a_c$ is the centroid of the c-th class, $x_j$ is the j-th data point, $\bar{x}$ is the data mean, $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the j-th data point to the c-th centroid, 0 ≤ β ≤ 1, and k = 1, ..., C.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710387502.8A CN107330442A (en) 2017-05-25 2017-05-25 Incremental fuzzy clustering method combining intra-class compactness and inter-class separation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710387502.8A CN107330442A (en) 2017-05-25 2017-05-25 Incremental fuzzy clustering method combining intra-class compactness and inter-class separation

Publications (1)

Publication Number Publication Date
CN107330442A (en) 2017-11-07

Family

ID=60193637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710387502.8A Pending CN107330442A (en) 2017-05-25 2017-05-25 Incremental fuzzy clustering method combining intra-class compactness and inter-class separation

Country Status (1)

Country Link
CN (1) CN107330442A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097072A (en) * 2019-03-19 2019-08-06 河南理工大学 A kind of fuzzy clustering evaluation method based on two sub-module degree
CN110097072B (en) * 2019-03-19 2022-10-04 河南理工大学 Fuzzy clustering evaluation method based on two-degree-of-modularity

Similar Documents

Publication Publication Date Title
CN107169504B (en) A kind of hand-written character recognition method based on extension Non-linear Kernel residual error network
CN107730006A (en) A kind of nearly zero energy consumption controller of building based on regenerative resource big data deep learning
CN110245783B (en) Short-term load prediction method based on C-means clustering fuzzy rough set
CN111008504B (en) Wind power prediction error modeling method based on meteorological pattern recognition
CN104217015B (en) Based on the hierarchy clustering method for sharing arest neighbors each other
CN108428024B (en) Emergency resource allocation decision optimization method for irregular emergency under uncertain information
CN106991442A (en) The self-adaptive kernel k means method and systems of shuffled frog leaping algorithm
CN111461921B (en) Load modeling typical user database updating method based on machine learning
CN111523819B (en) Energy-saving potential evaluation method considering uncertainty of output power of distributed power supply
CN116050540A (en) Self-adaptive federal edge learning method based on joint bi-dimensional user scheduling
CN110765582B (en) Self-organization center K-means microgrid scene division method based on Markov chain
CN110909994A (en) Small hydropower station power generation amount prediction method based on big data drive
CN113392877B (en) Daily load curve clustering method based on ant colony algorithm and C-K algorithm
CN107330442A (en) Incremental fuzzy clustering method combining intra-class compactness and inter-class separation
CN110570091A (en) Load identification method based on improved F-score feature selection and particle swarm BP neural network
CN115967952A (en) Resource allocation management system based on FCM clustering algorithm and edge computing industrial Internet of things
CN107093005A (en) The method that tax handling service hall&#39;s automatic classification is realized based on big data mining algorithm
CN104698838B (en) Based on the fuzzy scheduling rule digging method that domain dynamic is divided and learnt
CN105590167A (en) Method and device for analyzing electric field multivariate operating data
CN106778824A (en) A kind of increment fuzzy c central point clustering method towards time series data
CN116226689A (en) Power distribution network typical operation scene generation method based on Gaussian mixture model
CN110852370A (en) Clustering algorithm-based large-industry user segmentation method
CN107229950A (en) Incremental fuzzy clustering method combining intra-class compactness and inter-class separation
CN106897292A (en) A kind of internet data clustering method and system
CN108205721B (en) Spline interpolation typical daily load curve selecting device based on clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171107