CN107330442A - Incremental fuzzy clustering method combining intra-class compactness and inter-class separation
- Publication number
- CN107330442A (application CN201710387502.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- class
- centroid
- data block
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The present invention proposes an incremental fuzzy clustering method that combines intra-class compactness and inter-class separation. The method splits the data into consecutive data blocks and processes them in order, which allows it to handle large-scale data and data streams. Once the FCS algorithm has been weighted, a single-pass incremental scheme can be applied. With this method, processing speed improves markedly without degrading clustering accuracy.
Description
Technical field
The present invention relates to a clustering method, and in particular to an incremental fuzzy clustering method combining intra-class compactness and inter-class separation. It belongs to the field of data mining.
Background art
A clustering algorithm groups highly similar data objects into the same cluster and highly dissimilar data objects into different clusters. Research on clustering algorithms has been fruitful. Depending on how data objects are assigned to clusters, these algorithms can be divided into hard clustering and fuzzy clustering. In hard clustering, each data object belongs entirely to exactly one cluster. In fuzzy clustering, each data object belongs to several clusters with different degrees of membership. Each family has its strengths: hard clustering algorithms are simple and efficient, while fuzzy clustering algorithms better match how people perceive the real world.
Whether hard or fuzzy, most clustering algorithms consider only intra-class compactness and ignore inter-class separation, which motivated the FCS (Fuzzy Compactness and Separation) algorithm. FCS minimizes the within-class scatter (that is, it maximizes compactness) while maximizing the separation between classes. It has the characteristics of both hard and fuzzy clustering, and can effectively improve clustering accuracy and clustering efficiency.
However, the FCS algorithm cannot efficiently handle large-scale data or streaming data. The present invention therefore proposes an incremental fuzzy clustering method combining intra-class compactness and inter-class separation. By splitting the data into consecutive data blocks and processing them in order, the method is able to handle large-scale data and data streams.
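The block-splitting idea can be illustrated with a minimal Python sketch. The function name `split_into_blocks` and the `(point, weight)` tuple layout are assumptions made for illustration and do not appear in the patent; the only behaviour taken from the text is that blocks are consecutive, are handled in order, and that every fresh data point starts with weight 1.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def split_into_blocks(
    data: Sequence[Point], num_blocks: int
) -> Iterator[List[Tuple[Point, float]]]:
    """Yield `num_blocks` consecutive blocks of (point, weight) pairs.

    Every point receives the initial weight 1.0. Block sizes differ by at
    most one when len(data) is not a multiple of num_blocks (an assumption;
    the patent's example uses equally sized blocks).
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    base, extra = divmod(len(data), num_blocks)
    start = 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        yield [(tuple(p), 1.0) for p in data[start:start + size]]
        start += size
```

Because the function is a generator, only one block needs to be materialised at a time, which is the property that makes the scheme usable on a data stream.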
Summary of the invention
To solve the problems in the prior art, the present invention proposes an incremental fuzzy clustering method combining intra-class compactness and inter-class separation.
1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:
(1) divide the whole data set into $D$ blocks, and assign a weight of 1 to each data point in every block;
(2) cluster each weighted data block;
(3) cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \ldots, U_{1t}, \ldots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \ldots, a_{1t}, \ldots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the $c$-th class of the 1st data block, and $a_{1c}$ denotes the $c$-th centroid of the 1st data block;
(4) after the $(i-1)$-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in the centroids $[a_{(i-1)1}, a_{(i-1)2}, \ldots, a_{(i-1)t}, \ldots, a_{(i-1)c}]$ of the $(i-1)$-th data block, where $w_t$ is the sum of the membership degrees of the data points in that data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained $i$-th data block; combine the weighted centroids of the $(i-1)$-th data block with the data points of the $i$-th data block into a new data block; cluster the new data block to obtain the clustering result $[U_{i1}, U_{i2}, \ldots, U_{it}, \ldots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \ldots, a_{it}, \ldots, a_{ic}]$; then, for each centroid obtained in the $(i-1)$-th clustering in turn, find the cluster $U_{it}$ that contains it, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) repeat step (4) until the last data block has been processed, which yields the final centroids and clustering result.
2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following specific steps:
1) initialize $\beta$, the maximum error $\varepsilon$, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign $\eta$ a random number between 0 and 1; set $\tau = 1$;
2) update $a_c$ from $\eta_c$, $u_{cj}$ and the centroid update formula [equation not reproduced];
3) update $u_{cj}$ from $\eta_c$, $a_c$ and the membership update formula [equation not reproduced];
4) update $\eta_c$ from $\beta$, $a_c$ and the update formula for $\eta_c$ [equation not reproduced];
5) set $\tau = \tau + 1$;
6) if $\max(|u_{cj}(\tau) - u_{cj}(\tau - 1)|) \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
Here $C$ is the number of classes, $N$ is the number of data points, $\eta_c$ is the parameter that keeps the class of the $c$-th centroid from overlapping the classes of the other centroids, $u_{cj}$ is the membership degree of the $j$-th data point in the $c$-th class, subject to the constraint $\sum_{c=1}^{C} u_{cj} = 1$, $u_{cj} \in [0,1]$, $m$ is the fuzzy factor with $m > 1$, $w_j$ is the weight of the $j$-th data point, $a_c$ is the centroid of the $c$-th class, $x_j$ is the $j$-th data point, $\bar{x}$ is the data mean, $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the $j$-th data point to the $c$-th centroid, $0 \le \beta \le 1.0$, and $k = 1, \ldots, C$.
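Step (4) above, in which the previous centroids are carried forward as weighted points, can be sketched as follows. The helper names `centroid_weights` and `merge_with_next_block` are hypothetical. One assumption is also made beyond the text: each membership is multiplied by the weight of its point before summing, as is usual in single-pass schemes, so that centroids carried over from earlier rounds keep their accumulated mass. When all weights are 1 this reduces to the plain sum of membership degrees that the patent describes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Weighted = Tuple[Point, float]


def centroid_weights(
    memberships: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """w_t = sum_j u_tj * w_j for every class t (memberships[t][j] = u_tj)."""
    return [sum(u * w for u, w in zip(row, weights)) for row in memberships]


def merge_with_next_block(
    centroids: Sequence[Point],
    memberships: Sequence[Sequence[float]],
    weights: Sequence[float],
    next_block: Sequence[Point],
) -> List[Weighted]:
    """Build the new data block: weighted old centroids + fresh points (weight 1)."""
    w_t = centroid_weights(memberships, weights)
    carried = [(tuple(a), w) for a, w in zip(centroids, w_t)]
    fresh = [(tuple(x), 1.0) for x in next_block]
    return carried + fresh
```

Placing the carried centroids first keeps their positions aligned with the old class indices, which makes it easy to look up which new class each old centroid falls into after re-clustering.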
The present invention is proposed in order to handle large-scale data and data streams. The method markedly increases processing speed without affecting clustering accuracy. Compared with existing methods, the newly proposed method handles large-scale data and data streams faster and more accurately.
Detailed description
To use the single-pass incremental scheme, the FCS algorithm must first be weighted. Define the weighted within-class matrix $S_{IFW}$ and the between-class matrix $S_{IFB}$:
[equations not reproduced]
where $C$ is the number of classes, $N$ is the number of data points, $w_j$ is the weight of the $j$-th data point, $u_{cj}$ is the membership degree of the $j$-th data point in the $c$-th class, subject to the constraint $\sum_{c=1}^{C} u_{cj} = 1$, $u_{cj} \in [0,1]$, $m$ is the fuzzy factor with $m > 1$, $x_j$ is the $j$-th data point, $\bar{x}$ is the sample mean, $a_c$ is the centroid of the $c$-th class, and $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the $j$-th data point to the $c$-th centroid.
From these two formulas, the objective function of the incremental FCS algorithm is obtained:
[equation not reproduced]
subject to the constraint
[equation not reproduced]
where
[equation not reproduced]
$0 \le \beta \le 1.0$, $k = 1, \ldots, C$.
Applying the method of Lagrange multipliers to the objective function under this constraint gives the following new objective function:
[equation not reproduced]
The necessary conditions for the objective function to reach its minimum are derived as follows.
Taking the partial derivative of the expression above with respect to $U$ and setting it to 0 gives
[equation not reproduced]
Combining this with the constraint yields
[equation not reproduced]
Likewise, taking the partial derivative of the new objective function with respect to $A$ and setting it to 0 gives
[equation not reproduced]
from which it follows that
[equation not reproduced]
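The equations of this derivation did not survive in the text. For orientation only, the block below sketches what they would look like if the method follows the standard FCS formulation with a weight $w_j$ attached to each data point. This is an assumption, not a transcription of the patent. The tags (5), (8) and (10) follow the formula numbers that the step list cites for updating $\eta_c$, $u_{cj}$ and $a_c$. Whether $\bar{x}$ is a plain or a weighted mean is also not recoverable from the text.

```latex
% Assumed reconstruction: standard FCS with per-point weights w_j.
\begin{align*}
S_{IFW} &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert x_j - a_c\rVert^{2},
\qquad
S_{IFB} = \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert a_c - \bar{x}\rVert^{2} \\
J &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert x_j - a_c\rVert^{2}
   - \sum_{c=1}^{C}\sum_{j=1}^{N} \eta_c\,w_j\,u_{cj}^{m}\,\lVert a_c - \bar{x}\rVert^{2},
\qquad \sum_{c=1}^{C} u_{cj} = 1 \\
\eta_c &= \frac{\beta}{4}\,
  \frac{\min_{c' \neq c}\lVert a_c - a_{c'}\rVert^{2}}
       {\max_{k}\lVert a_k - \bar{x}\rVert^{2}} \tag{5} \\
u_{cj} &= \frac{\bigl(\lVert x_j - a_c\rVert^{2} - \eta_c\lVert a_c - \bar{x}\rVert^{2}\bigr)^{-1/(m-1)}}
               {\sum_{k=1}^{C}\bigl(\lVert x_j - a_k\rVert^{2} - \eta_k\lVert a_k - \bar{x}\rVert^{2}\bigr)^{-1/(m-1)}} \tag{8} \\
a_c &= \frac{\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,(x_j - \eta_c\,\bar{x})}
            {(1-\eta_c)\sum_{j=1}^{N} w_j\,u_{cj}^{m}} \tag{10}
\end{align*}
```

Under this reading, (8) follows from setting $\partial L / \partial u_{cj} = 0$ for the Lagrangian $L = J - \sum_j \lambda_j (\sum_c u_{cj} - 1)$; the factor $w_j$ cancels when the memberships are normalised. Formula (10) follows from $\partial L / \partial a_c = 0$ with $\eta_c$ held fixed. In the standard FCS algorithm, a point for which $\lVert x_j - a_c\rVert^{2} - \eta_c\lVert a_c - \bar{x}\rVert^{2} \le 0$ is assigned to class $c$ with membership 1, which is where the hard-clustering behaviour comes from.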
The method comprises the following steps:
(1) divide the whole data set into $D$ blocks, and assign a weight of 1 to each data point in every block;
(2) cluster each weighted data block;
(3) cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \ldots, U_{1t}, \ldots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \ldots, a_{1t}, \ldots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the $c$-th class of the 1st data block, and $a_{1c}$ denotes the $c$-th centroid of the 1st data block;
(4) after the $(i-1)$-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in the centroids $[a_{(i-1)1}, a_{(i-1)2}, \ldots, a_{(i-1)t}, \ldots, a_{(i-1)c}]$ of the $(i-1)$-th data block, where $w_t$ is the sum of the membership degrees of the data points in that data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained $i$-th data block; combine the weighted centroids of the $(i-1)$-th data block with the data points of the $i$-th data block into a new data block; cluster the new data block to obtain the clustering result $[U_{i1}, U_{i2}, \ldots, U_{it}, \ldots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \ldots, a_{it}, \ldots, a_{ic}]$; then, for each centroid obtained in the $(i-1)$-th clustering in turn, find the cluster $U_{it}$ that contains it, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) repeat step (4) until the last data block has been processed, which yields the final centroids and clustering result.
The clustering in step (2) comprises the following specific steps:
1) initialize $\beta$, the maximum error $\varepsilon$, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign $\eta$ a random number between 0 and 1; set $\tau = 1$;
2) update $a_c$ from $\eta_c$ and $u_{cj}$ using formula (10);
3) update $u_{cj}$ from $\eta_c$ and $a_c$ using formula (8);
4) update $\eta_c$ from $\beta$ and $a_c$ using formula (5);
5) set $\tau = \tau + 1$;
6) if $\max(|u_{cj}(\tau) - u_{cj}(\tau - 1)|) \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
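The inner iteration of steps 1) to 6) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation. The update rules used for formulas (10), (8) and (5) are the standard FCS rules with per-point weights, since the patent's own equations are not reproduced in this text. The data mean is taken to be the weighted mean. Points that fall inside a class kernel receive a hard membership of 1, as in standard FCS. The function name `weighted_fcs`, its parameters and the numeric guards are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sqdist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_fcs(points, weights, n_clusters, m=2.0, beta=0.5,
                 eps=1e-5, max_iter=100, seed=0) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centroids, u), where u[c][j] is the membership of point j in class c."""
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    total_w = sum(weights)
    # Assumption: the data mean is the weighted mean of the points.
    mean = [sum(w * x[d] for x, w in zip(points, weights)) / total_w for d in range(dim)]
    # Step 1: random memberships (each column normalised to 1), random eta in [0, 1).
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(n_clusters)]
    for j in range(n):
        col = sum(u[c][j] for c in range(n_clusters))
        for c in range(n_clusters):
            u[c][j] /= col
    eta = [rng.random() for _ in range(n_clusters)]
    centroids = [list(mean) for _ in range(n_clusters)]
    for _ in range(max_iter):  # step 5 (tau = tau + 1) is the loop counter
        # Step 2: centroid update, assumed form of formula (10).
        for c in range(n_clusters):
            um = [w * (u[c][j] ** m) for j, w in enumerate(weights)]
            denom = (1.0 - eta[c]) * sum(um)
            if abs(denom) < 1e-12:
                continue  # empty class or eta == 1: keep the previous centroid
            centroids[c] = [
                sum(coef * (x[d] - eta[c] * mean[d]) for coef, x in zip(um, points)) / denom
                for d in range(dim)
            ]
        # Step 3: membership update, assumed form of formula (8).
        spread = [eta[c] * sqdist(centroids[c], mean) for c in range(n_clusters)]
        new_u = [[0.0] * n for _ in range(n_clusters)]
        for j, x in enumerate(points):
            g = [sqdist(x, centroids[c]) - spread[c] for c in range(n_clusters)]
            best = min(range(n_clusters), key=g.__getitem__)
            if g[best] <= 1e-12:
                new_u[best][j] = 1.0  # inside a class kernel: hard assignment
                continue
            inv = [gc ** (-1.0 / (m - 1.0)) for gc in g]
            total = sum(inv)
            for c in range(n_clusters):
                new_u[c][j] = inv[c] / total
        # Step 4: eta update, assumed form of formula (5).
        far = max(sqdist(a, mean) for a in centroids)
        for c in range(n_clusters):
            near = min(sqdist(centroids[c], centroids[k])
                       for k in range(n_clusters) if k != c)
            eta[c] = (beta / 4.0) * near / far if far > 0 else 0.0
        # Step 6: stop when the largest membership change is at most eps.
        change = max(abs(new_u[c][j] - u[c][j])
                     for c in range(n_clusters) for j in range(n))
        u = new_u
        if change <= eps:
            break
    return centroids, u
```

A call such as `weighted_fcs(points, [1.0] * len(points), n_clusters=7)` would correspond to clustering the first data block, where every point has weight 1. For later blocks the weight list would start with the carried centroid weights.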
Take the Statlog Segmentation data set as an example. The data set has 2310 data points with 19 attributes, divided into 7 classes. It is clustered as follows:
The data set is divided into 10 blocks of 231 data points each, and each data point is assigned a weight of 1.
The 1st data block is clustered to obtain 7 classes and their 7 corresponding centroids. The weight assigned to each centroid is the sum of the membership degrees of the data points to that class. These 7 centroids, as representatives of the 7 classes, are added to the 2nd data block to form a new data block, which is clustered again. The process continues in the same way until all 10 data blocks have been clustered, which gives the final result.
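The block-by-block flow of this example can be sketched as a small driver that accepts any weighted clustering routine. With the figures above, every round after the first would cluster 231 new points plus 7 carried centroids, that is 238 weighted items. The names `single_pass` and `toy_two_means` are hypothetical. `toy_two_means` is a deliberately simple weighted 2-means stand-in on 1-D data, used only so that the sketch runs on its own; it is not the FCS clustering of the patent.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def single_pass(blocks: Sequence[Sequence[Point]], cluster_fn: Callable):
    """Cluster blocks in order, carrying weighted centroids between rounds.

    cluster_fn(points, weights) must return (centroids, u) with u[c][j] the
    membership of item j in class c. Returns (final centroids, class label of
    every original data point in input order).
    """
    labels: List[int] = []
    carried: List[Tuple[Point, float]] = []
    centroids: List[Point] = []
    for block in blocks:
        points = [a for a, _ in carried] + list(block)
        weights = [w for _, w in carried] + [1.0] * len(block)
        centroids, u = cluster_fn(points, weights)
        hard = [max(range(len(u)), key=lambda c, j=j: u[c][j])
                for j in range(len(points))]
        k = len(carried)
        # Points of old class t follow the carried centroid t into its new class.
        labels = [hard[t] for t in labels] + hard[k:]
        carried = [
            (centroids[c], sum(u[c][j] * weights[j] for j in range(len(points))))
            for c in range(len(u))
        ]
    return centroids, labels


def toy_two_means(points, weights, iters=10):
    """Stand-in clusterer: weighted 2-means on 1-D points with hard memberships."""
    cents = [min(points)[0], max(points)[0]]
    near = [0] * len(points)
    for _ in range(iters):
        near = [0 if abs(p[0] - cents[0]) <= abs(p[0] - cents[1]) else 1
                for p in points]
        for c in (0, 1):
            mass = sum(w for w, lab in zip(weights, near) if lab == c)
            if mass > 0:
                cents[c] = sum(w * p[0] for p, w, lab in zip(points, weights, near)
                               if lab == c) / mass
    u = [[1.0 if lab == c else 0.0 for lab in near] for c in (0, 1)]
    return [(cents[0],), (cents[1],)], u
```

For instance, feeding the three blocks `[(0.0,), (10.0,)]`, `[(1.0,), (11.0,)]` and `[(2.0,), (12.0,)]` through `single_pass` with `toy_two_means` ends with centroids at 1.0 and 11.0, the same values a single run over all six points would give, because the carried weights preserve the mass of the points already seen.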
By processing the data set block by block, the method reduces the amount of data handled in each clustering run and the number of clustering iterations, and therefore improves clustering efficiency. Experiments show that the F-Measure of the SPFCS algorithm is 3.9% and 21.4% higher than that of the conventional SPFCM and SPHFCM algorithms, respectively.
Claims (2)
1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:
(1) divide the whole data set into $D$ blocks, and assign a weight of 1 to each data point in every block;
(2) cluster each weighted data block;
(3) cluster the 1st data block to obtain the clustering result $[U_{11}, U_{12}, \ldots, U_{1t}, \ldots, U_{1c}]$ and the cluster centroids $[a_{11}, a_{12}, \ldots, a_{1t}, \ldots, a_{1c}]$, where $0 < t \le c$, $U_{1c}$ denotes the $c$-th class of the 1st data block, and $a_{1c}$ denotes the $c$-th centroid of the 1st data block;
(4) after the $(i-1)$-th data block has been processed, where $1 < i \le D$, assign a weight $w_t$ to each centroid in the centroids $[a_{(i-1)1}, a_{(i-1)2}, \ldots, a_{(i-1)t}, \ldots, a_{(i-1)c}]$ of the $(i-1)$-th data block, where $w_t$ is the sum of the membership degrees of the data points in that data block to cluster $U_{(i-1)t}$; assign a weight of 1 to each data point in the newly obtained $i$-th data block; combine the weighted centroids of the $(i-1)$-th data block with the data points of the $i$-th data block into a new data block; cluster the new data block to obtain the clustering result $[U_{i1}, U_{i2}, \ldots, U_{it}, \ldots, U_{ic}]$ and the cluster centroids $[a_{i1}, a_{i2}, \ldots, a_{it}, \ldots, a_{ic}]$; then, for each centroid obtained in the $(i-1)$-th clustering in turn, find the cluster $U_{it}$ that contains it, so that all data points of class $U_{(i-1)t}$ belong to class $U_{it}$;
(5) repeat step (4) until the last data block has been processed, which yields the final centroids and clustering result.
2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following specific steps:
1) initialize $\beta$, the maximum error $\varepsilon$, the maximum number of iterations $\tau_{max}$ and the membership matrix $u_{cj}$; assign $\eta$ a random number between 0 and 1; set $\tau = 1$;
2) update $a_c$ from $\eta_c$, $u_{cj}$ and the centroid update formula [equation not reproduced];
3) update $u_{cj}$ from $\eta_c$, $a_c$ and the membership update formula [equation not reproduced];
4) update $\eta_c$ from $\beta$, $a_c$ and the update formula for $\eta_c$ [equation not reproduced];
5) set $\tau = \tau + 1$;
6) if $\max(|u_{cj}(\tau) - u_{cj}(\tau - 1)|) \le \varepsilon$ or $\tau = \tau_{max}$, stop iterating; otherwise return to step 2).
Here $C$ is the number of classes, $N$ is the number of data points, $\eta_c$ is the parameter that keeps the class of the $c$-th centroid from overlapping the classes of the other centroids, $u_{cj}$ is the membership degree of the $j$-th data point in the $c$-th class, subject to the constraint $\sum_{c=1}^{C} u_{cj} = 1$, $u_{cj} \in [0,1]$, $m$ is the fuzzy factor with $m > 1$, $w_j$ is the weight of the $j$-th data point, $a_c$ is the centroid of the $c$-th class, $x_j$ is the $j$-th data point, $\bar{x}$ is the data mean, $\lVert x_j - a_c \rVert^2$ is the squared Euclidean distance from the $j$-th data point to the $c$-th centroid, $0 \le \beta \le 1.0$, and $k = 1, \ldots, C$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710387502.8A CN107330442A (en) | 2017-05-25 | 2017-05-25 | Incremental fuzzy clustering method combining intra-class compactness and inter-class separation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710387502.8A CN107330442A (en) | 2017-05-25 | 2017-05-25 | Incremental fuzzy clustering method combining intra-class compactness and inter-class separation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107330442A (en) | 2017-11-07 |
Family
ID=60193637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710387502.8A Pending CN107330442A (en) | Incremental fuzzy clustering method combining intra-class compactness and inter-class separation | 2017-05-25 | 2017-05-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107330442A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097072A (en) * | 2019-03-19 | 2019-08-06 | 河南理工大学 | A kind of fuzzy clustering evaluation method based on two sub-module degree |
- 2017-05-25 CN CN201710387502.8A patent/CN107330442A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097072A (en) * | 2019-03-19 | 2019-08-06 | 河南理工大学 | A kind of fuzzy clustering evaluation method based on two sub-module degree |
CN110097072B (en) * | 2019-03-19 | 2022-10-04 | 河南理工大学 | Fuzzy clustering evaluation method based on two-degree-of-modularity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107169504B (en) | A kind of hand-written character recognition method based on extension Non-linear Kernel residual error network | |
CN107730006A (en) | A kind of nearly zero energy consumption controller of building based on regenerative resource big data deep learning | |
CN110245783B (en) | Short-term load prediction method based on C-means clustering fuzzy rough set | |
CN111008504B (en) | Wind power prediction error modeling method based on meteorological pattern recognition | |
CN104217015B (en) | Based on the hierarchy clustering method for sharing arest neighbors each other | |
CN108428024B (en) | Emergency resource allocation decision optimization method for irregular emergency under uncertain information | |
CN106991442A (en) | The self-adaptive kernel k means method and systems of shuffled frog leaping algorithm | |
CN111461921B (en) | Load modeling typical user database updating method based on machine learning | |
CN111523819B (en) | Energy-saving potential evaluation method considering uncertainty of output power of distributed power supply | |
CN116050540A (en) | Self-adaptive federal edge learning method based on joint bi-dimensional user scheduling | |
CN110765582B (en) | Self-organization center K-means microgrid scene division method based on Markov chain | |
CN110909994A (en) | Small hydropower station power generation amount prediction method based on big data drive | |
CN113392877B (en) | Daily load curve clustering method based on ant colony algorithm and C-K algorithm | |
CN107330442A (en) | Incremental fuzzy clustering method combining intra-class compactness and inter-class separation | |
CN110570091A (en) | Load identification method based on improved F-score feature selection and particle swarm BP neural network | |
CN115967952A (en) | Resource allocation management system based on FCM clustering algorithm and edge computing industrial Internet of things | |
CN107093005A (en) | The method that tax handling service hall's automatic classification is realized based on big data mining algorithm | |
CN104698838B (en) | Based on the fuzzy scheduling rule digging method that domain dynamic is divided and learnt | |
CN105590167A (en) | Method and device for analyzing electric field multivariate operating data | |
CN106778824A (en) | A kind of increment fuzzy c central point clustering method towards time series data | |
CN116226689A (en) | Power distribution network typical operation scene generation method based on Gaussian mixture model | |
CN110852370A (en) | Clustering algorithm-based large-industry user segmentation method | |
CN107229950A (en) | Incremental fuzzy clustering method combining intra-class compactness and inter-class separation | |
CN106897292A (en) | A kind of internet data clustering method and system | |
CN108205721B (en) | Spline interpolation typical daily load curve selecting device based on clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171107 |