CN108710914A - An unsupervised data classification method based on a generalized fuzzy clustering algorithm
- Publication number: CN108710914A
- Application number: CN201810495011.XA
- Authority: CN (China)
- Keywords: fuzzy, sample, cluster centre, GFC, clustering algorithm
- Legal status: Pending (the listed status is an assumption and is not a legal conclusion)
Classifications
- G06F18/23213: Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
- G06F18/2155: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
- G06N3/006: Computing arrangements based on biological models; artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
The invention discloses an unsupervised data classification method based on a generalized fuzzy clustering (GFC) algorithm. The steps are: optimally partitioning the sample set according to the principle of minimizing the GFC objective function; initializing the position and velocity values of multiple particles; initializing the cluster centers by mapping the particle position values onto the sample cluster centers; defining an inverse-proportional relationship between the sample-to-cluster-center distance and the fuzzy membership in order to compute each sample's fuzzy membership; obtaining updated cluster centers through the particle swarm iteration formulas; and computing the GFC objective function. The fuzzy clustering algorithm constructed by the invention is not restricted by a normalization constraint and can effectively detect and identify noise data. The inverse-proportional relationship between fuzzy membership and sample-to-center distance can be extended and transformed into multiple forms, which widens the scope of application of the clustering algorithm. The fuzzy index can also be hidden and ignored, which avoids its interference with the clustering algorithm.
Description
Technical field
The invention belongs to the methods of unsupervised data classification in the field of data mining, and in particular relates to an unsupervised data classification method based on a generalized fuzzy clustering algorithm.
Background art
Fuzzy clustering based on an objective function is an important research topic in cluster analysis and is widely used in unsupervised pattern classification, audio and video analysis and processing, machine intelligence learning, and data mining analysis. The fuzzy C-means clustering algorithm (FCM) is a typical fuzzy clustering algorithm derived from a clustering objective function, and it is the most important and most widely used fuzzy clustering method. The FCM model is intuitive in form and easy to understand, its optimization theory is relatively rigorous, it can be computed by a computer program, and its clustering results perform well.
FCM is bound by a normalization constraint and is therefore sensitive to noise data: a noise point far from every cluster center can still receive a high fuzzy membership. The possibilistic C-means clustering algorithm (PCM) drops the normalization constraint of FCM, but each sample's fuzzy membership then depends only on the center of the class concerned, which leads to coincident cluster centers. Algorithms such as PFCM and FPCM combine FCM and PCM in additive and multiplicative form, respectively, to exploit the advantages of both. However, they introduce many combination parameters that must be set from human experience, which makes the clustering algorithm complicated and leaves it without an effective method for parameter optimization.
A fuzzy clustering algorithm has three important elements. The first is the expression of the fuzzy membership. Fuzzy membership captures the relationship between a sample and a cluster center: when the distance between them is large, the clustering algorithm assigns the sample a small fuzzy membership, so fuzzy membership is inversely proportional to the sample-to-center distance. The second is the determination of the cluster centers. To minimize the clustering objective function, a cluster center should be close to the samples with large fuzzy membership; in other words, it should fall where the samples are densest. Cluster centers are mainly computed in two ways: as the fuzzy-membership-weighted average of the samples, or by optimization with a biological evolution algorithm such as a genetic algorithm (GA). The third is the choice of the clustering objective function. The FCM objective function minimizes the weighted within-class sum of squared errors. The hidden-membership fuzzy c-means clustering algorithm (HMFCM) uses an equation transformation to convert the FCM objective function into a form that minimizes the sample-to-center distance. This reflects the essence of clustering: the within-class error is expressed by the sample-to-center distance, and the algorithm seeks to minimize it. Because the sample-to-center distance is inversely proportional to the fuzzy membership, the clustering objective function can also be expressed as the maximization of fuzzy membership.
In addition, ever since FCM was proposed, the method by which Bezdek estimated the fuzzy memberships and cluster centers, using the gradient method and alternating optimization (AO) iteration, has shaped subsequent research. The FCM convergence condition requires the second-order Hessian matrix with respect to the fuzzy membership to be positive definite, which in practice requires the fuzzy index to be greater than 1. Theoretical analysis shows that when a biological evolution algorithm such as particle swarm optimization (PSO) is used to estimate the fuzzy memberships, the convergence restriction of the gradient method no longer applies. The range of the fuzzy index can then be extended to values greater than zero while the clustering algorithm still retains its clustering performance.
Summary of the invention
To overcome the sensitivity to noise data caused by the normalization constraint of the fuzzy C-means clustering algorithm (FCM), the invention proposes a generalized fuzzy clustering algorithm (GFC). GFC defines the relationship between fuzzy membership and cluster center in inverse-proportional form, uses particle swarm optimization to estimate the cluster-center parameters, and takes the maximization of fuzzy membership as the objective function, so that noise in the data set can be suppressed.
To achieve the above object, the invention adopts the following technical solution:
An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:
Step 1: optimally partition the sample set according to the principle of minimizing the GFC objective function;
Step 2: initialize the position and velocity values of multiple particles;
Step 3: initialize the cluster centers by mapping the particle position values onto the sample cluster centers;
Step 4: define an inverse-proportional relationship between the sample-to-cluster-center distance and the fuzzy membership in order to compute the sample fuzzy memberships;
Step 5: obtain updated cluster centers through the particle swarm iteration formulas;
Step 6: compute the GFC objective function.
Further, Step 1 is specifically as follows:
Let $X = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$ denote the given sample set, where $x_j$ is the $j$-th sample, $1 \le j \le n$, and $n$ is the number of samples. The sample set $X$ is optimally partitioned so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1).
In formula (1), $c$ is the number of classes, $1 \le i \le c$, and $u_{ij}$ is the fuzzy membership of the $j$-th sample $x_j$ in the $i$-th class; $U = \{u_{ij},\ i = 1, \ldots, c;\ j = 1, \ldots, n\}$ is the membership matrix, $m$ ($m > 0$) is the fuzzy index, and $u_{ij}^m$ is the $m$-th power of $u_{ij}$.
Further, Step 2 is specifically as follows: the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple particles of dimension $c \times d$ are initialized with random numbers between 0 and 1.
Further, Step 3 is specifically as follows:
Initialize $\lambda = 1$. The cluster center at the $\lambda$-th iteration is $\theta_i^{(\lambda)}$, and the cluster-center matrix is $P^{(\lambda)} = \{\theta_i^{(\lambda)},\ i = 1, \ldots, c\}$. The particle position $X_h^{(\lambda)}$ is split into groups of $d$ components, and the $i$-th group corresponds to the cluster center $\theta_i^{(\lambda)}$ of the $i$-th class, $i = 1, \ldots, c$. The iteration counter is $\lambda$ and the maximum number of iterations is $\lambda_{max}$.
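Steps 2 and 3 can be illustrated with a minimal Python sketch. The function names `init_particles` and `decode_centers`, the swarm size of 30, and the fixed random seed are assumptions made for illustration; the text only specifies random initialization in [0, 1] and the grouping of every $d$ components into one cluster center.

```python
import random

def init_particles(num_particles, c, d, rng):
    """Draw positions and velocities of (c*d)-dimensional particles from [0, 1)."""
    dim = c * d
    positions = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    velocities = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    return positions, velocities

def decode_centers(position, c, d):
    """Split a flat position vector into c cluster centers of d components each."""
    return [position[i * d:(i + 1) * d] for i in range(c)]

rng = random.Random(0)  # fixed seed, for reproducibility only
positions, velocities = init_particles(num_particles=30, c=2, d=2, rng=rng)
centers = decode_centers(positions[0], c=2, d=2)  # two 2-D cluster centers
```

Each particle thus encodes one complete candidate set of cluster centers, so evaluating a particle means evaluating all $c$ centers at once.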
Further, Step 4 is specifically as follows:
The $m$-th power of the fuzzy membership is computed with formula (2).
$\varepsilon$ is a very small positive number that keeps formula (3) well defined; $M$ is a positive constant that sets the level of the inverse-proportional relationship between the fuzzy membership and the sample-to-center distance, and can be taken as 1 without loss of generality; $\|x_j - \theta_i^{(\lambda)}\|$ is the distance between the $j$-th sample $x_j$ and the $i$-th cluster center $\theta_i^{(\lambda)}$.
A fuzzy clustering algorithm requires the sample-to-center distance and the fuzzy membership to be inversely proportional. Many inverse-proportional relationships exist; GFC selects the simple linear-product form, and other inverse-proportional relationships can also be substituted into the GFC algorithm.
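The membership computation of Step 4 can be sketched as follows. Formula (2) is not reproduced in this text, so the form used here, $u_{ij}^m = M / (\|x_j - \theta_i\|^2 + \varepsilon)$, is an assumption inferred from the description of a linear-product inverse-proportional relationship with constants $M$ and $\varepsilon$; the function names are hypothetical.

```python
def gfc_membership_pow(sample, center, M=1.0, eps=0.1):
    """Assumed form of formula (2): u_ij^m = M / (||x_j - theta_i||^2 + eps)."""
    sq_dist = sum((x - t) ** 2 for x, t in zip(sample, center))
    return M / (sq_dist + eps)

def gfc_membership(sample, center, m=1.0, M=1.0, eps=0.1):
    """Recover u_ij from u_ij^m by taking the m-th root (requires m > 0)."""
    return gfc_membership_pow(sample, center, M, eps) ** (1.0 / m)

near = gfc_membership_pow([5.0, 5.0], [5.0, 5.0])  # sample sitting on the center
far = gfc_membership_pow([8.0, 9.0], [5.0, 5.0])   # squared distance 9 + 16 = 25
```

Under this assumed form the value is not bounded by 1: with $M = 1$ and $\varepsilon = 0.1$, a sample lying exactly on a center gets $u_{ij}^m = 10$. This is consistent with the absence of a normalization constraint.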
Further, Step 5 is specifically as follows:
Define the PSO fitness function by formula (4).
Check whether $\|f(U^{(\lambda)}) - f(U^{(\lambda-1)})\| < \varepsilon$ or $\lambda > \lambda_{max}$ holds. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij} = u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set $X$, where $\varepsilon$ and $\lambda_{max}$ are thresholds given in advance. If not, go to Step 6 and continue until the condition is met.
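The stopping test of Step 5 reduces to a single predicate. The sketch below is illustrative only; the name `should_stop` and the scalar fitness values are assumptions, since the fitness function of formula (4) is not reproduced in this text.

```python
def should_stop(f_curr, f_prev, iteration, eps, max_iter):
    """True when the fitness change is below eps or the iteration cap is exceeded."""
    return abs(f_curr - f_prev) < eps or iteration > max_iter

converged = should_stop(1.0, 1.0000001, iteration=5, eps=1e-6, max_iter=300)
still_moving = should_stop(1.0, 2.0, iteration=5, eps=1e-6, max_iter=300)
out_of_budget = should_stop(1.0, 2.0, iteration=301, eps=1e-6, max_iter=300)
```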
Further, Step 6 is specifically as follows:
According to the fitness value $f(U^{(\lambda)})$ of the PSO solutions, record the current-generation individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set $\lambda = \lambda + 1$, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to Step 3:
$$V_h^{(\lambda+1)} = w V_h^{(\lambda)} + c_1 r_1 [P_h^{(\lambda)} - X_h^{(\lambda)}] + c_2 r_2 [g^{(\lambda)} - X_h^{(\lambda)}] \qquad (5)$$
$$X_h^{(\lambda+1)} = X_h^{(\lambda)} + V_h^{(\lambda+1)} \qquad (6)$$
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in $[0, 1]$; and $w$ is the inertia factor.
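The update of formulas (5) and (6) for a single particle can be sketched in Python as below. The values $w = 0.7$ and $c_1 = c_2 = 1.5$ are illustrative assumptions, as the text only requires positive constants; drawing one $r_1$ and one $r_2$ per particle, rather than per component, is also an assumption.

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply formulas (5) and (6) to one particle; returns (new_x, new_v)."""
    r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
    new_v = [w * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
             for xk, vk, pk, gk in zip(x, v, p_best, g_best)]
    new_x = [xk + vk for xk, vk in zip(x, new_v)]
    return new_x, new_v

# When the particle already sits on both best positions, only inertia acts.
x0 = [1.0, 2.0]
new_x, new_v = pso_step(x0, [0.5, -0.5], p_best=x0, g_best=x0, rng=random.Random(0))
```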
Compared with the prior art, the beneficial effects of the invention are as follows:
1. Extension of the fuzzy index to $m > 0$ and omission of the fuzzy index
The clustering objective function, formula (1), and the inverse-proportional relationship, formula (2), determine the properties of the GFC algorithm. The fuzzy index $m$ is extended to $m > 0$. From formula (2), the squared sample-to-center distance $\|x_j - \theta_i\|^2$ is inversely proportional to $u_{ij}^m$. When $m > 0$, $u_{ij}^m$ increases with the fuzzy membership $u_{ij}$, so $\|x_j - \theta_i\|^2$ is inversely related to $u_{ij}$. This satisfies the basic principle of fuzzy clustering that the larger the sample-to-center distance, the smaller the membership. Combining formulas (1) and (2), because $m > 0$, maximizing the GFC objective function is equivalent to minimizing the within-class error, which also agrees with the usual evaluation criterion for clustering algorithms.
In addition, from formula (1), when $m > 0$ the optimization of the GFC objective function is equivalent to minimizing a quantity that depends only on the distances $\|x_j - \theta_i\|^2$, so the GFC objective function can be made independent of the fuzzy index $m$. Combining formulas (1) and (2), the objective function value is determined once the values $\|x_j - \theta_i\|^2$ are known. GFC can therefore evaluate the objective function and make classification decisions without depending on the fuzzy index, which means that the setting of the fuzzy index can be omitted.
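The independence from the fuzzy index can be checked numerically. The sketch below assumes the inverse-proportional form $u_{ij}^m = M / (\|x_j - \theta_i\|^2 + \varepsilon)$ (formula (2) is not reproduced in this text) and assigns a sample to the class with the largest membership; the helper names, the sample, and the center coordinates are hypothetical.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def assign(sample, centers, m, M=1.0, eps=0.1):
    """Index of the class with the largest membership u_ij for a given m > 0."""
    memberships = [(M / (sq_dist(sample, c) + eps)) ** (1.0 / m) for c in centers]
    return max(range(len(centers)), key=lambda i: memberships[i])

centers = [[5.0, 5.0], [10.0, 10.0]]
sample = [6.0, 7.5]
labels = {m: assign(sample, centers, m) for m in (0.5, 1.0, 2.0, 3.0)}
nearest = min(range(len(centers)), key=lambda i: sq_dist(sample, centers[i]))
```

Because the $m$-th root is monotone for $m > 0$, the ranking of memberships, and hence the class decision, is the same for every $m$ and coincides with the nearest center.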
2. The noise immunity of the GFC algorithm can be analyzed intuitively from diagrams
3. Extensibility of the GFC inverse-proportional relationship
The GFC inverse-proportional relationship can be extended to several forms of expression:
Formula (7) is the inverse-proportional relationship in exponential form.
Formula (8) is the inverse-proportional relationship in logarithmic form.
Inverse-proportional relationships in polynomial form, as well as combined inverse-proportional relationships, can also be constructed. Extending the inverse-proportional relationship enriches the forms of expression and the scope of application of GFC and yields a family of GFC algorithms.
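The idea of interchangeable inverse-proportional relationships can be illustrated with three decreasing functions of the squared distance. Formulas (7) and (8) are not reproduced in this text, so the exponential and logarithmic expressions below are hypothetical stand-ins chosen only because they decrease with distance; they should not be read as the patent's formulas. The linear form is likewise an assumed reading of formula (2).

```python
import math

def linear_form(sq_dist, M=1.0, eps=0.1):
    """Assumed linear-product form: M / (d^2 + eps)."""
    return M / (sq_dist + eps)

def exponential_form(sq_dist, M=1.0):
    """Hypothetical exponential form: M * exp(-d^2)."""
    return M * math.exp(-sq_dist)

def logarithmic_form(sq_dist, M=1.0, eps=0.1):
    """Hypothetical logarithmic form: M / ln(1 + d^2 + eps)."""
    return M / math.log(1.0 + sq_dist + eps)

grid = [0.0, 1.0, 4.0, 9.0]  # squared distances
curves = {f.__name__: [f(s) for s in grid]
          for f in (linear_form, exponential_form, logarithmic_form)}
```

Any such function could be passed to the clustering routine in place of the linear form, since the algorithm only relies on membership decreasing as distance grows.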
4. The GFC algorithm has good immunity to noise data
FCM is sensitive to noise data because of its normalization constraint, shown in formula (9):
$$\sum_{i=1}^{c} u_{ij} = 1 \qquad (9)$$
That is, the memberships of a sample $x_j$ over all classes sum to 1. When a noise point $x_k$ is far from the data of every class, its fuzzy memberships still obey the normalization constraint, so FCM still assigns it a high fuzzy membership and cannot reject it.
The GFC fuzzy membership is determined by formula (10):
From formula (10), when a noise point $x_j$ is far from all cluster centers $\theta_i$, its fuzzy membership $u_{ij}$ is very small and is not subject to a normalization constraint, so the noise point can be distinguished from normal data. The GFC algorithm therefore has a certain noise rejection capability.
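The contrast can be made concrete with a short Python sketch. The FCM membership used here is the standard closed form $u_{ij} = 1 / \sum_k (d_{ij}/d_{kj})^{2/(m-1)}$; the GFC membership assumes the form $M / (\|x_j - \theta_i\|^2 + \varepsilon)$ with $M = 1$ and $\varepsilon = 0.1$, since formula (10) is not reproduced in this text. The center coordinates are nominal values, not clustering results.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def fcm_memberships(sample, centers, m=2.0):
    """Standard FCM memberships; they always sum to 1 (assumes no zero distance)."""
    dists = [euclid(sample, c) for c in centers]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

def gfc_memberships(sample, centers, M=1.0, eps=0.1):
    """Assumed GFC form: M / (squared distance + eps), with no normalization."""
    return [M / (euclid(sample, c) ** 2 + eps) for c in centers]

centers = [[5.0, 5.0], [10.0, 10.0]]
outlier = [500.0, 500.0]
fcm_u = fcm_memberships(outlier, centers)  # close to 0.5 each
gfc_u = gfc_memberships(outlier, centers)  # on the order of 1e-6 each
```

Under FCM the outlier is split roughly evenly between the two classes because the memberships are forced to sum to 1, whereas under the assumed GFC form both memberships are close to zero.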
Description of the drawings
Fig. 1 shows the Gaussian data set with (5, 5) taken as the cluster center;
Fig. 2 shows the Gaussian data set with (10, 10) taken as the cluster center.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
The unsupervised data classification method based on a generalized fuzzy clustering algorithm (GFC algorithm) of the invention abandons the modeling format of the traditional FCM algorithm. It sets an inverse-proportional relationship between the sample-to-cluster-center distance and the fuzzy membership, uses particle swarm optimization (PSO) to search the solution space for good cluster centers, and takes the maximization of fuzzy membership as the clustering objective function. The GFC algorithm is not restricted by a normalization constraint and can effectively detect and identify noise data. The constructed inverse-proportional relationship can be extended and transformed into multiple forms, which widens the scope of application of the clustering algorithm. The GFC algorithm can also hide and ignore the fuzzy index, which avoids its interference with the clustering algorithm.
The method of the invention is carried out as follows:
Step 1: Let $X = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$ denote the given sample set, where $x_j$ is the $j$-th sample, $1 \le j \le n$, and $n$ is the number of samples. The sample set $X$ is optimally partitioned so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1).
In formula (1), $c$ is the number of classes, $1 \le i \le c$, and $u_{ij}$ is the fuzzy membership of the $j$-th sample $x_j$ in the $i$-th class. $U = \{u_{ij},\ i = 1, \ldots, c;\ j = 1, \ldots, n\}$ is the membership matrix, $m$ ($m > 0$) is the fuzzy index, and $u_{ij}^m$ is the $m$-th power of $u_{ij}$.
Step 2: Initialize the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple particles of dimension $c \times d$ with random numbers between 0 and 1.
Step 3: Initialize $\lambda = 1$. The cluster center at the $\lambda$-th iteration is $\theta_i^{(\lambda)}$, and the cluster-center matrix is $P^{(\lambda)} = \{\theta_i^{(\lambda)},\ i = 1, \ldots, c\}$. The particle position $X_h^{(\lambda)}$ is split into groups of $d$ components, and the $i$-th group corresponds to the cluster center $\theta_i^{(\lambda)}$ of the $i$-th class, $i = 1, \ldots, c$. The iteration counter is $\lambda$ and the maximum number of iterations is $\lambda_{max}$.
Step 4: Compute the $m$-th power of the fuzzy membership with formula (2).
$\varepsilon$ is a very small positive number that keeps formula (3) well defined; $M$ is a positive constant that sets the level of the inverse-proportional relationship between the fuzzy membership and the sample-to-center distance, and can be taken as 1 without loss of generality. $\|x_j - \theta_i^{(\lambda)}\|$ is the distance between the $j$-th sample $x_j$ and the $i$-th cluster center $\theta_i^{(\lambda)}$.
A fuzzy clustering algorithm requires the sample-to-center distance and the fuzzy membership to be inversely proportional. Many inverse-proportional relationships exist; GFC selects the simple linear-product form, and other inverse-proportional relationships can also be substituted into the GFC algorithm.
Step 5: Define the PSO fitness function by formula (4).
Check whether $\|f(U^{(\lambda)}) - f(U^{(\lambda-1)})\| < \varepsilon$ or $\lambda > \lambda_{max}$ holds. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij} = u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set $X$, where $\varepsilon$ and $\lambda_{max}$ are thresholds given in advance. If not, go to Step 6 and continue until the condition is met.
Step 6: According to the fitness value $f(U^{(\lambda)})$ of the PSO solutions, record the current-generation individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set $\lambda = \lambda + 1$, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to Step 3.
$$V_h^{(\lambda+1)} = w V_h^{(\lambda)} + c_1 r_1 [P_h^{(\lambda)} - X_h^{(\lambda)}] + c_2 r_2 [g^{(\lambda)} - X_h^{(\lambda)}] \qquad (5)$$
$$X_h^{(\lambda+1)} = X_h^{(\lambda)} + V_h^{(\lambda+1)} \qquad (6)$$
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in $[0, 1]$; and $w$ is the inertia factor.
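Steps 1 to 6 can be combined into a compact end-to-end sketch. Several parts are assumptions rather than the patent's definitions: the fitness is taken as the sum of $u_{ij}^m = M / (\|x_j - \theta_i\|^2 + \varepsilon)$ and is maximized (formulas (1), (2) and (4) are not reproduced in this text), the PSO constants are illustrative, and the names `fitness` and `gfc_pso` as well as the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fitness(position, samples, c, d, M=1.0, eps=0.1):
    """Assumed fitness: sum over classes and samples of M / (dist^2 + eps)."""
    centers = [position[i * d:(i + 1) * d] for i in range(c)]
    return sum(M / (sq_dist(x, th) + eps) for th in centers for x in samples)

def gfc_pso(samples, c, seed_position=None, num_particles=30, max_iter=300,
            tol=1e-6, w=0.7, c1=1.5, c2=1.5, rng=None):
    rng = rng or random.Random(0)
    d = len(samples[0])
    dim = c * d
    # Step 2: random initial positions and velocities in [0, 1).
    X = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    V = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    if seed_position is not None:
        X[0] = list(seed_position)  # optional seeded particle
    # Individual bests P and swarm best g.
    P = [x[:] for x in X]
    P_fit = [fitness(x, samples, c, d) for x in X]
    best = max(range(num_particles), key=lambda h: P_fit[h])
    g, g_fit = P[best][:], P_fit[best]
    prev = g_fit
    for _ in range(max_iter):
        for h in range(num_particles):
            r1, r2 = rng.random(), rng.random()
            # Formulas (5) and (6).
            V[h] = [w * v + c1 * r1 * (p - x) + c2 * r2 * (gk - x)
                    for x, v, p, gk in zip(X[h], V[h], P[h], g)]
            X[h] = [x + v for x, v in zip(X[h], V[h])]
            f = fitness(X[h], samples, c, d)
            if f > P_fit[h]:
                P[h], P_fit[h] = X[h][:], f
                if f > g_fit:
                    g, g_fit = X[h][:], f
        # Step 5: stop when the best fitness changes by less than tol.
        if abs(g_fit - prev) < tol:
            break
        prev = g_fit
    return [g[i * d:(i + 1) * d] for i in range(c)], g_fit

data_rng = random.Random(7)
samples = ([[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(20)] +
           [[data_rng.gauss(10, 1), data_rng.gauss(10, 1)] for _ in range(20)])
seed = [5.0, 5.0, 10.0, 10.0]
centers, best_fit = gfc_pso(samples, c=2, seed_position=seed, max_iter=50)
```

The stopping rule follows Step 5 literally, so the loop may end early whenever the swarm best stalls for one iteration. Under the assumed fitness the classes are not coupled to each other, so nothing prevents two centers from drifting to the same dense region.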
Embodiment 1:
In this embodiment, the PCM algorithm repeatedly produced coincident cluster centers in the simulation tests, which made its clustering results invalid. Therefore, to verify the validity and feasibility of the GFC algorithm, GFC is compared with the FCM algorithm.
The generalized fuzzy clustering (GFC) algorithm is carried out by Steps 1 to 6 of the method described above, beginning with Step 1:
Step 1: Let $X = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$ denote the given sample set, where $x_j$ is the $j$-th sample, $1 \le j \le n$, and $n$ is the number of samples. The sample set $X$ is optimally partitioned so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1).
In formula (1), $c$ is the number of classes, $1 \le i \le c$, and $u_{ij}$ is the fuzzy membership of the $j$-th sample $x_j$ in the $i$-th class. $U = \{u_{ij},\ i = 1, \ldots, c;\ j = 1, \ldots, n\}$ is the membership matrix, $m$ ($m > 0$) is the fuzzy index, and $u_{ij}^m$ is the $m$-th power of $u_{ij}$.
I. Test on a two-dimensional Gaussian data set
The test covers two aspects. The first is the clustering validity of the algorithm, reflected mainly in its test accuracy. The second is the immunity of the algorithm to noise data: the clustering algorithm is required to assign noise data a lower fuzzy membership, that is, to distinguish noise data from normal data.
1) Validity test
An artificially generated two-dimensional Gaussian data set is tested. The number of clusters is 2. The test data set is generated from two Gaussian random distributions with class centers (5, 5) and (10, 10), 100 samples per class, and covariance matrix [5 0; 0 5] for both classes.
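A data set of this kind can be generated with the Python standard library alone. Because the covariance matrix is diagonal, each coordinate can be drawn independently with standard deviation $\sqrt{5}$. The function name `make_gaussian_class` and the seed are assumptions made for illustration.

```python
import math
import random

def make_gaussian_class(center, variance, n, rng):
    """Draw n points whose coordinates are independent N(center_k, variance)."""
    std = math.sqrt(variance)
    return [[rng.gauss(mu, std) for mu in center] for _ in range(n)]

rng = random.Random(42)  # fixed seed, for reproducibility only
class_a = make_gaussian_class((5.0, 5.0), 5.0, 100, rng)
class_b = make_gaussian_class((10.0, 10.0), 5.0, 100, rng)
samples = class_a + class_b
true_labels = [0] * 100 + [1] * 100  # kept only for scoring, not for clustering
```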
Particle swarm optimization provides the means of solving the GFC algorithm. Every component of a particle's position vector and velocity vector is a real number, and each particle position vector is one feasible solution. The position vector has dimension $c \times d$, where $c$ is the number of clusters and $d$ is the dimension of a sample, so it corresponds to the $d$-dimensional coordinates of the $c$ cluster centers. The swarm size is 30, the number of iterations is set to 300, and each component of the particle position vector ranges over [0, 20]; every $d$ components of the position vector correspond to the $d$ components of one cluster center. To keep the particle swarm optimization from being trapped in a local optimum with very poor clustering quality, the cluster centers trained by the FCM algorithm are concatenated and used as the initial position of one particle in the swarm, which improves the clustering performance of GFC:
$$\theta_i(0) = \theta_i^* \qquad (11)$$
where $\theta_i(0)$ corresponds to the concatenated position value $X_h^{(0)}$ at the initialization of the particle swarm algorithm, and $\theta_i^*$ is the solution obtained from the FCM clustering result. The purpose is to use FCM to guide GFC out of poor local extrema. For the parameters in formula (2) of the GFC algorithm, $\varepsilon = 0.1$ and $M = 1$ are used.
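The seeding of formula (11) can be sketched as follows. The swarm size of 30 and the range [0, 20] come from the text; the function name `init_swarm` is hypothetical, and the `fcm_centers` values are placeholders standing in for an FCM result, not measured output.

```python
import random

def init_swarm(num_particles, c, d, low, high, fcm_centers, rng):
    """Random positions in [low, high]; particle 0 holds the concatenated FCM centers."""
    dim = c * d
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(num_particles)]
    positions[0] = [coord for center in fcm_centers for coord in center]
    return positions

fcm_centers = [[5.1, 4.9], [9.8, 10.2]]  # placeholder values, not real FCM output
swarm = init_swarm(30, c=2, d=2, low=0.0, high=20.0,
                   fcm_centers=fcm_centers, rng=random.Random(0))
```

Only one particle is seeded, so the remaining particles still explore the search range while the swarm best starts from a solution that is already reasonable.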
The test records the accuracy for each class and saves the final cluster-center coordinates of the two classes after iteration. Table 1 gives the test accuracy and the cluster-center coordinates.
Table 1. Test results on the two-dimensional Gaussian data set
As Table 1 shows, for clustered data sets with good class margins, both FCM and GFC achieve good classification results, and the difference in clustering accuracy is small. GFC can omit the selection of the fuzzy index $m$, which simplifies the setting of algorithm parameters.
2) Noise immunity test
This test examines how well the two algorithms suppress noise data; that is, the fuzzy membership that a clustering algorithm assigns to noise data should be as small as possible. One noise sample with coordinates (500, 500) is added to the original two-dimensional Gaussian data set. The recorded results include the class centers, the fuzzy memberships of the noise sample in each class, and the clustering quality on the normal data. The test results are shown in Table 2.
Table 2. Test results on the two-dimensional Gaussian data set with noise sample (500, 500)
As Table 2 shows, the noise sample (500, 500) has a considerable influence on the clustering quality of both FCM and GFC. As analyzed above, because of the normalization constraint FCM assigns the noise sample (500, 500) a high fuzzy membership, so FCM cannot reject it. The influence of the noise sample on GFC is that the two cluster centers coincide, which reduces the clustering validity. However, owing to the noise-immune design of the algorithm, GFC is still better than FCM in clustering accuracy, and the noise sample obtains only a minimal fuzzy membership relative to normal data. The fuzzy memberships of the noise sample in the different classes differ very little, so this property can be used to construct a rejection method: noise data are rejected with the fuzzy membership difference threshold of formula (12).
$$\max_i(u_{ij}) - \min_i(u_{ij}) < \delta_1 \qquad (12)$$
Formula (12) is the fuzzy membership difference threshold rejection rule. For any sample $x_j$ with fuzzy memberships $u_{ij}$ ($i \in 1, \ldots, c$), if these $u_{ij}$ satisfy formula (12), the sample $x_j$ can be regarded as noise data. In the clustering test on the simulated two-dimensional Gaussian data with noise sample (500, 500), $\delta_1 = 0.00001$ is sufficient to reject the noise sample, because the noise sample is far from the cluster centers and obtains minimal fuzzy memberships in all classes. When applying GFC, the algorithm should first be used to reject noise data, and the cluster analysis should then be carried out again to obtain a better clustering result.
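The rejection rule of formula (12) can be sketched in a few lines of Python. The membership form $M / (\|x_j - \theta_i\|^2 + \varepsilon)$ with $M = 1$, $\varepsilon = 0.1$ is an assumption (formula (2) is not reproduced in this text), the centers are the nominal class centers rather than clustering results, and the function names are hypothetical.

```python
def gfc_memberships(sample, centers, M=1.0, eps=0.1):
    """Assumed GFC membership of one sample in every class."""
    return [M / (sum((x - t) ** 2 for x, t in zip(sample, c)) + eps)
            for c in centers]

def is_noise(sample, centers, delta=1e-5):
    """Formula (12): reject when max and min membership differ by less than delta."""
    u = gfc_memberships(sample, centers)
    return max(u) - min(u) < delta

centers = [[5.0, 5.0], [10.0, 10.0]]  # nominal class centers
flag_outlier = is_noise([500.0, 500.0], centers)
flag_normal = is_noise([5.5, 4.5], centers)
```

One caveat of this sketch is that a sample exactly equidistant from all centers also has a membership difference of zero and would be flagged by the same rule.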
II. Test on a UCI data set
The algorithms are tested on the iris data set from the UCI machine learning repository; the properties of the iris data set are shown in Table 3. The test settings are similar to those of the Gaussian data set test, with each particle component ranging over [0, 50]. In the GFC tests the values of the parameters $M$ and $\varepsilon$ are varied in order to study the stability of the algorithm with respect to its parameters. Each clustering algorithm is run 10 times for each parameter setting and data set, and the average clustering accuracy is computed. Table 4 gives the test results on the iris data.
Table 3. Attributes of the experimental data set
As Table 4 shows, with $M = 1$, $\varepsilon = 0.1$ GFC obtains its lowest average clustering accuracy, 90.60, and with $M = 3$, $\varepsilon = 0.3$ it obtains its highest average clustering accuracy, 92.20. Both the lowest and the highest average accuracy of GFC are above the average clustering accuracy of FCM. The overall average accuracy of GFC over all parameter settings is 91.47938, which is also better than that of FCM. The two simulation tests show that GFC has better clustering performance than FCM, which demonstrates the clustering validity of the proposed algorithm.
Table 4. Test results on the iris data set
As shown in Fig. 1 and Fig. 2, consider a Gaussian data set of 50 samples with center (5,5) and covariance matrix [3,0;0,3], where the covariance matrix expresses the dispersion of the data. In Fig. 1, (5,5) is taken as the first cluster center θ1 of the Gaussian data set; in Fig. 2, (10,10) is taken as the second cluster center θ2. Clearly, the cluster center θ1 in Fig. 1 meets the needs of a practical clustering problem better than θ2 in Fig. 2. The distances from θ1 to the samples in Fig. 1 are smaller than the corresponding distances in Fig. 2, so by the basic principle of fuzzy clustering the fuzzy membership of each sample in Fig. 1 is higher than in Fig. 2. If Fig. 2 is regarded as the initial state of the clustering algorithm and Fig. 1 as the final state after optimization, then evolving the cluster center from θ2 in Fig. 2 to θ1 in Fig. 1 is equivalent to maximizing formula (1). The maximization of formula (1) and the inverse relationship of formula (2) in the GFC algorithm therefore comply with the objective of fuzzy clustering.
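The comparison between Fig. 1 and Fig. 2 can be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes the membership to be u = M / (distance + ε) with m = 1, which is only one plausible reading of the "linear product" inverse relationship of formula (2), and it draws its own 50 random samples rather than using the data behind the figures. The name `membership_sum` is hypothetical.

```python
import math
import random

def membership_sum(samples, center, M=1.0, eps=1e-6):
    """Sum of memberships u = M / (distance + eps) over all samples.
    Assumed form of formula (2) with m = 1."""
    return sum(M / (math.dist(x, center) + eps) for x in samples)

rng = random.Random(0)
sigma = math.sqrt(3.0)  # covariance [3,0;0,3] gives standard deviation sqrt(3) per axis
samples = [(rng.gauss(5.0, sigma), rng.gauss(5.0, sigma)) for _ in range(50)]

j_theta1 = membership_sum(samples, (5.0, 5.0))    # center used in Fig. 1
j_theta2 = membership_sum(samples, (10.0, 10.0))  # center used in Fig. 2
print(j_theta1 > j_theta2)
```

Because the memberships grow as the center moves toward the bulk of the samples, the sum for θ1 exceeds the sum for θ2, which is the sense in which moving from Fig. 2 to Fig. 1 increases the objective.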
Simulation experiments can show that the proposed GFC algorithm performs well in both clustering validity and noise resistance.
The GFC algorithm constructs an inverse relationship between the sample-to-cluster-center distance and the fuzzy membership, takes the maximization of the sum of the m-th powers of the fuzzy memberships as its clustering objective function, and uses the PSO (particle swarm optimization) evolutionary algorithm to search the solution space for good cluster-center solutions. Because the GFC algorithm has no normalization constraint, it is not sensitive to noisy data and can identify such data effectively. In addition, the fuzzy index can be ignored and omitted, and the inverse relationship can be replaced by a variety of other inverse relationships, which further enhances the adaptability of the GFC algorithm to different kinds of data.
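The effect of dropping the normalization constraint can be illustrated with a single outlier. The sketch below compares the standard fuzzy c-means (FCM) membership, which is normalized to sum to 1 over the clusters, with an unnormalized membership u = M / (distance + ε). The latter is an assumed form of formula (2), and the function names are hypothetical.

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Standard FCM memberships; they sum to 1 over the clusters.
    Assumes x does not coincide with any center."""
    d = [math.dist(x, c) for c in centers]
    return [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
            for i in range(len(d))]

def gfc_memberships(x, centers, M=1.0, eps=1e-6):
    """Unnormalized memberships u = M / (distance + eps); assumed form of formula (2)."""
    return [M / (math.dist(x, c) + eps) for c in centers]

centers = [(0.0, 0.0), (10.0, 0.0)]
outlier = (5.0, 100.0)  # far from both centers and equidistant to them

print(fcm_memberships(outlier, centers))  # forced to share a total membership of 1
print(gfc_memberships(outlier, centers))  # both values stay small
```

Under normalization the outlier receives a membership of one half in each cluster even though it belongs to neither, so it still pulls on the cluster centers. Without normalization its membership is small in every cluster, which gives a simple way to flag it as noise, for example by thresholding the largest membership.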
In conclusion the invention discloses a kind of unsupervised data based on generalized fuzzy clustering algorithm (GFC algorithms) point
Class method, characteristic information are shown as follows:Its characteristic information is shown as follows:1. pair sample set is according to GFC mesh
Scalar functions minimization principle carries out optimization division;2. initializing the position and speed value of multiple particles;3. by particle position value
Realization cluster centre initialization corresponding with sample clustering center;4. defining the distance and fuzzy membership between sample, cluster centre
Inversely proportional relationship is to calculate sample fuzzy membership;5. obtaining newer cluster centre by particle cluster algorithm iterative formula;
6. GFC object functions are calculated.The fuzzy clustering algorithm that the present invention is constructed is not limited by normalization constraint, can be to making an uproar
Sound data are made effectively to excavate and identify.The fuzzy membership constructed can expand deformation with cluster centre inversely prroportional relationship form
For diversified forms, the scope of application of clustering algorithm is improved, also fuzzy indicator can be made to hide and ignore, so as to avoid fuzzy finger
Mark the interference to clustering algorithm.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:
Step 1: Optimally partition the sample set according to the principle of minimizing the GFC objective function;
Step 2: Initialize the position and velocity values of multiple particles;
Step 3: Initialize the cluster centers by mapping the particle position values to the sample cluster centers;
Step 4: Define an inverse relationship between the sample-to-cluster-center distance and the fuzzy membership, and use it to compute the fuzzy membership of each sample;
Step 5: Obtain the updated cluster centers by the iterative formulas of the particle swarm algorithm;
Step 6: Compute the GFC objective function.
2. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 1 specifically comprises:
Let $X = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$ denote the given sample set, where $x_j$ is the j-th sample, $1 \le j \le n$, and n is the number of samples. The sample set X is optimally partitioned so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1);
In formula (1), c denotes the number of classes in the partition, $1 \le i \le c$, and $u_{ij}$ denotes the fuzzy membership of the j-th sample $x_j$ in the i-th class; $U = \{u_{ij},\ i = 1, \ldots, c;\ j = 1, \ldots, n\}$ denotes the membership matrix; m (m > 0) is the fuzzy index; and $u_{ij}^m$ is the m-th power of $u_{ij}$.
3. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 2 specifically comprises: initializing the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple c × d-dimensional particles with random numbers between 0 and 1.
4. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 3 specifically comprises:
Initialize λ = 1; the cluster center of the λ-th iteration is then $\theta_i^{(\lambda)}$, and the cluster center matrix is $P^{(\lambda)} = \{\theta_i^{(\lambda)},\ i = 1, \ldots, c\}$. The components of the particle position $X_h^{(\lambda)}$ are grouped d at a time, and the i-th group corresponds to the cluster center $\theta_i^{(\lambda)}$ of the i-th class, i = 1, …, c. The iteration counter is λ and the maximum number of iterations is $\lambda_{max}$.
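The particle initialization of claim 3 and the grouping of position components in claim 4 can be sketched as follows. The function names `init_particles` and `decode_centers` and the flat-list representation of a particle are assumptions made for illustration.

```python
import random
from typing import List, Tuple

def init_particles(num_particles: int, c: int, d: int, rng: random.Random
                   ) -> Tuple[List[List[float]], List[List[float]]]:
    """Positions X_h^(0) and velocities V_h^(0): c*d components, each drawn from [0, 1)."""
    positions = [[rng.random() for _ in range(c * d)] for _ in range(num_particles)]
    velocities = [[rng.random() for _ in range(c * d)] for _ in range(num_particles)]
    return positions, velocities

def decode_centers(position: List[float], c: int, d: int) -> List[List[float]]:
    """Split a flat particle position into c cluster centers of d components each."""
    if len(position) != c * d:
        raise ValueError("position must have c * d components")
    # The i-th group of d consecutive components is the center of class i
    return [position[i * d:(i + 1) * d] for i in range(c)]
```

With c = 3 classes and d = 2 features, for example, each particle carries six numbers, and `decode_centers` reads them as three two-dimensional centers.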
5. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 4 specifically comprises:
The m-th power of the fuzzy membership is computed by formula (2)
ε denotes a very small positive number, introduced to overcome the incompleteness of formula (3); M is a positive constant that represents the level of the inverse relationship between the fuzzy membership and the sample-to-cluster-center distance, and it can be taken as 1 without loss of generality; $\|x_j - \theta_i^{(\lambda)}\|$ denotes the distance between the j-th sample $x_j$ and the i-th class cluster center $\theta_i^{(\lambda)}$,
Fuzzy clustering algorithms require an inverse relationship between the sample-to-cluster-center distance and the fuzzy membership. Many such inverse relationships exist; the GFC algorithm here adopts the simple linear-product inverse relationship, and other inverse relationships can also be introduced into the GFC algorithm in its place.
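The interchangeability of inverse relationships described in claim 5 can be sketched by making the relationship a pluggable function. Both forms below are assumptions for illustration: `linear_inverse` is one reading of the "linear product" relationship (u^m multiplied by the distance is approximately M), and `squared_inverse` is a hypothetical alternative. Neither is taken verbatim from formula (2).

```python
from typing import Callable, Dict

def linear_inverse(dist: float, M: float = 1.0, eps: float = 1e-6) -> float:
    """u^m = M / (dist + eps): assumed linear-product form."""
    return M / (dist + eps)

def squared_inverse(dist: float, M: float = 1.0, eps: float = 1e-6) -> float:
    """u^m = M / (dist^2 + eps): hypothetical alternative form."""
    return M / (dist * dist + eps)

RELATIONS: Dict[str, Callable[[float], float]] = {
    "linear": linear_inverse,
    "squared": squared_inverse,
}

def membership(dist: float, relation: str = "linear", m: float = 1.0) -> float:
    """Recover u from u^m for the chosen inverse relationship."""
    return RELATIONS[relation](dist) ** (1.0 / m)
```

Swapping the relationship changes how quickly membership falls off with distance while leaving the rest of the procedure unchanged. With m = 1 the fuzzy index drops out of the computation entirely.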
6. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 5 specifically comprises:
Define the fitness function of the PSO algorithm by formula (4)
Check whether $\|f(U^{(\lambda)}) - f(U^{(\lambda-1)})\| < \varepsilon$ or $\lambda > \lambda_{max}$. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij} = u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set X, where $\varepsilon$ and $\lambda_{max}$ are thresholds given in advance; if not, go to step 6 and repeat until the condition is met.
7. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 6 specifically comprises:
Based on the fitness value $f(U^{(\lambda)})$ of the PSO algorithm, record the current-generation individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm; set $\lambda = \lambda + 1$, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to step 3;
$V_h^{(\lambda+1)} = w V_h^{(\lambda)} + c_1 r_1 [P_h^{(\lambda)} - X_h^{(\lambda)}] + c_2 r_2 [g^{(\lambda)} - X_h^{(\lambda)}]$ (5)
$X_h^{(\lambda+1)} = X_h^{(\lambda)} + V_h^{(\lambda+1)}$ (6)
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in [0,1]; and $w$ is the inertia factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810495011.XA CN108710914A (en) | 2018-05-22 | 2018-05-22 | A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810495011.XA CN108710914A (en) | 2018-05-22 | 2018-05-22 | A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108710914A (en) | 2018-10-26 |
Family
ID=63868606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810495011.XA Pending CN108710914A (en) | 2018-05-22 | 2018-05-22 | A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108710914A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110310658A (en) * | 2019-06-21 | 2019-10-08 | 桂林电子科技大学 | A kind of speech Separation method based on Speech processing |
CN110929777A (en) * | 2019-11-18 | 2020-03-27 | 济南大学 | Data kernel clustering method based on transfer learning |
CN111561907A (en) * | 2020-03-31 | 2020-08-21 | 华电电力科学研究院有限公司 | Tower drum uneven settlement monitoring method based on plane dip angle measurement |
CN111666981A (en) * | 2020-05-13 | 2020-09-15 | 云南电网有限责任公司信息中心 | System data anomaly detection method based on genetic fuzzy clustering |
CN112215492A (en) * | 2020-10-12 | 2021-01-12 | 国网甘肃省电力公司电力科学研究院 | Aggregation grouping method based on power supply spatial distribution and regulation characteristics |
CN112422546A (en) * | 2020-11-10 | 2021-02-26 | 昆明理工大学 | Network anomaly detection method based on variable neighborhood algorithm and fuzzy clustering |
CN112487552A (en) * | 2020-11-18 | 2021-03-12 | 南京航空航天大学 | Envelope dividing and gain scheduling method of flying wing unmanned aerial vehicle based on fuzzy clustering |
CN112583723A (en) * | 2020-12-15 | 2021-03-30 | 东方红卫星移动通信有限公司 | FCM-based large-scale routing network expression method |
CN113447813A (en) * | 2020-09-03 | 2021-09-28 | 鲁能集团有限公司 | Fault diagnosis method and equipment for offshore wind generating set |
CN114548225A (en) * | 2022-01-19 | 2022-05-27 | 中国人民解放军国防科技大学 | Method, device and equipment for processing situation data outlier samples based on FCM |
CN115952432A (en) * | 2022-12-21 | 2023-04-11 | 四川大学华西医院 | Unsupervised clustering method based on diabetes data |
- 2018-05-22: CN application CN201810495011.XA (patent/CN108710914A/en), active, Pending
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110310658B (en) * | 2019-06-21 | 2021-11-30 | 桂林电子科技大学 | Voice separation method based on voice signal processing |
CN110310658A (en) * | 2019-06-21 | 2019-10-08 | 桂林电子科技大学 | A kind of speech Separation method based on Speech processing |
CN110929777A (en) * | 2019-11-18 | 2020-03-27 | 济南大学 | Data kernel clustering method based on transfer learning |
CN111561907A (en) * | 2020-03-31 | 2020-08-21 | 华电电力科学研究院有限公司 | Tower drum uneven settlement monitoring method based on plane dip angle measurement |
CN111666981A (en) * | 2020-05-13 | 2020-09-15 | 云南电网有限责任公司信息中心 | System data anomaly detection method based on genetic fuzzy clustering |
CN111666981B (en) * | 2020-05-13 | 2023-03-31 | 云南电网有限责任公司信息中心 | System data anomaly detection method based on genetic fuzzy clustering |
CN113447813B (en) * | 2020-09-03 | 2022-09-13 | 中国绿发投资集团有限公司 | Fault diagnosis method and equipment for offshore wind generating set |
CN113447813A (en) * | 2020-09-03 | 2021-09-28 | 鲁能集团有限公司 | Fault diagnosis method and equipment for offshore wind generating set |
CN112215492A (en) * | 2020-10-12 | 2021-01-12 | 国网甘肃省电力公司电力科学研究院 | Aggregation grouping method based on power supply spatial distribution and regulation characteristics |
CN112422546A (en) * | 2020-11-10 | 2021-02-26 | 昆明理工大学 | Network anomaly detection method based on variable neighborhood algorithm and fuzzy clustering |
CN112487552A (en) * | 2020-11-18 | 2021-03-12 | 南京航空航天大学 | Envelope dividing and gain scheduling method of flying wing unmanned aerial vehicle based on fuzzy clustering |
CN112583723A (en) * | 2020-12-15 | 2021-03-30 | 东方红卫星移动通信有限公司 | FCM-based large-scale routing network expression method |
CN112583723B (en) * | 2020-12-15 | 2022-08-26 | 东方红卫星移动通信有限公司 | FCM-based large-scale routing network expression method |
CN114548225A (en) * | 2022-01-19 | 2022-05-27 | 中国人民解放军国防科技大学 | Method, device and equipment for processing situation data outlier samples based on FCM |
CN114548225B (en) * | 2022-01-19 | 2024-02-02 | 中国人民解放军国防科技大学 | Method, device and equipment for processing situation data outlier sample based on FCM |
CN115952432A (en) * | 2022-12-21 | 2023-04-11 | 四川大学华西医院 | Unsupervised clustering method based on diabetes data |
CN115952432B (en) * | 2022-12-21 | 2024-03-12 | 四川大学华西医院 | Unsupervised clustering method based on diabetes data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710914A (en) | A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm | |
CN110163258A (en) | A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention | |
CN108777873A (en) | The wireless sensor network abnormal deviation data examination method of forest is isolated based on weighted blend | |
CN104021255B (en) | Multi-resolution hierarchical presenting and hierarchical matching weighted comparison method for CAD (computer aided design) model | |
Page et al. | Spatial product partition models | |
CN107301328B (en) | Cancer subtype accurate discovery and evolution analysis method based on data flow clustering | |
Lei et al. | Detecting protein complexes from DPINs by density based clustering with Pigeon-Inspired Optimization Algorithm | |
Zhang et al. | Gllpa: A graph layout based label propagation algorithm for community detection | |
Wang et al. | Combination evaluation method of fuzzy c-mean clustering validity based on hybrid weighted strategy | |
CN106250918A (en) | A kind of mixed Gauss model matching process based on the soil-shifting distance improved | |
Fei et al. | An improved BPNN method based on probability density for indoor location | |
Zhang et al. | Chameleon algorithm based on improved natural neighbor graph generating sub-clusters | |
CN102663773A (en) | Dual-core type adaptive fusion tracking method of video object | |
CN108985375A (en) | Consider the multiple features fusion tracking of particle weight spatial distribution | |
CN116597294A (en) | SLAM map topology evaluation method and device, electronic equipment and storage medium | |
Dalton | Optimal ROC-based classification and performance analysis under Bayesian uncertainty models | |
CN112784886B (en) | Brain image classification method based on multi-layer maximum spanning tree graph core | |
CN115442887A (en) | Indoor positioning method based on cellular network RSSI | |
Huang et al. | Optimizing fiducial marker placement for improved visual localization | |
Wang et al. | FCM algorithm and index CS for the signal sorting of radiant points | |
CN115638795B (en) | Indoor multi-source ubiquitous positioning fingerprint database generation and positioning method | |
Song et al. | Kernel-based fuzzy local information clustering algorithm self-integrating non-local information | |
Zhang et al. | A novel fuzzy clustering approach based on breadth-first search algorithm | |
Donghui et al. | A fuzzy similarity-based clustering optimized by particle swarm optimization | |
CN108446736A (en) | It is fused into the Novel semi-supervised to constraint and scale restriction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181026 |