CN108710914A - Unsupervised data classification method based on a generalized fuzzy clustering algorithm

Publication number
CN108710914A
Authority
CN
China
Prior art keywords
fuzzy
sample
cluster centre
gfc
clustering algorithm
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810495011.XA
Other languages
Chinese (zh)
Inventor
文传军
许定亮
刘福燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Institute of Technology
Original Assignee
Changzhou Institute of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention discloses an unsupervised data classification method based on a generalized fuzzy clustering (GFC) algorithm. The steps are: partitioning the sample set by optimization according to the principle of minimizing the GFC objective function; initializing the position and velocity values of multiple particles; initializing the cluster centres by mapping the particle position values to the sample cluster centres; defining an inverse-proportional relationship between the sample-to-centre distance and the fuzzy membership, and using it to compute the sample fuzzy memberships; obtaining updated cluster centres through the particle swarm iteration formulas; and computing the GFC objective function. The fuzzy clustering algorithm constructed by the invention is not subject to the normalization constraint, so it can effectively detect and identify noise data. The inverse-proportional relationship between fuzzy membership and cluster centre can be extended and transformed into several forms, which widens the range of application of the clustering algorithm. The fuzzy index can also be hidden and ignored, which avoids its interference with the clustering algorithm.

Description

Unsupervised data classification method based on a generalized fuzzy clustering algorithm
Technical field
The invention belongs to the field of unsupervised data classification methods in data mining, and in particular relates to an unsupervised data classification method based on a generalized fuzzy clustering algorithm.
Background art
Fuzzy clustering based on an objective function is an important research topic in cluster analysis and is widely used in unsupervised pattern classification, audio and video analysis and processing, machine learning, and data mining. The fuzzy C-means clustering algorithm (FCM) is a typical fuzzy clustering algorithm derived from a clustering objective function, and is the most important and most widely used fuzzy clustering method. The FCM model is intuitive and easy to understand, its optimization theory is comparatively rigorous, it can be implemented as a computer program, and its clustering results are good.
FCM is bound by the normalization constraint and is therefore sensitive to noise data: noise data far from every cluster centre can still receive a high fuzzy membership. The possibilistic C-means clustering algorithm (PCM) drops the normalization constraint of FCM, but the fuzzy membership of a sample then depends only on the centre of the class in question, which leads to coincident cluster centres. Algorithms such as PFCM and FPCM combine FCM and PCM additively or multiplicatively to exploit the advantages of both, but they introduce many combination parameters that must be set from experience, which complicates the clustering algorithm, and there is no effective method for optimizing these parameters.
A fuzzy clustering algorithm has three key elements. The first is the expression of the fuzzy membership. The fuzzy membership describes the relationship between a sample and a cluster centre: when the distance between them is large, the clustering algorithm assigns the sample a small fuzzy membership, so the fuzzy membership is inversely related to the sample-to-centre distance. The second is the determination of the cluster centres. To minimize the clustering objective function, a cluster centre should lie close to the samples with large fuzzy membership; in other words, it should fall where the samples are densest. Cluster centres are mainly computed in two ways: as a fuzzy-membership-weighted average of the samples, or by estimation with an evolutionary optimization algorithm such as the genetic algorithm (GA). The third is the choice of the clustering objective function. The FCM objective function minimizes the weighted within-class sum of squared errors. The hidden-membership fuzzy c-means clustering algorithm (HMFCM) uses an algebraic transformation to rewrite the FCM objective function as the minimization of a function of the sample-to-centre distances. This reflects the essence of a clustering algorithm: the within-class error is expressed through the sample-to-centre distance, and the algorithm seeks to minimize it. Because the sample-to-centre distance is inversely related to the fuzzy membership, the clustering objective can also be expressed as the maximization of the fuzzy membership.
In addition, ever since FCM was proposed, the fuzzy membership and cluster centre estimates that Bezdek derived with the gradient method and alternating optimization (AO) iteration have shaped subsequent research. The convergence of FCM requires the second-order Hessian matrix of the fuzzy membership to be positive definite, which translates into the requirement that the fuzzy index be greater than 1. Theoretical analysis shows that when an evolutionary algorithm such as particle swarm optimization (PSO) is used to estimate the fuzzy membership, the convergence restriction of the gradient method no longer applies, so the range of the fuzzy index can be extended to all values greater than zero while the clustering algorithm retains its clustering performance.
Summary of the invention
To overcome the sensitivity to noise data caused by the normalization constraint of the fuzzy C-means clustering algorithm (FCM), the invention proposes a generalized fuzzy clustering algorithm (GFC). It defines the relationship between fuzzy membership and cluster centre in inverse-proportional form, uses particle swarm optimization to estimate the cluster centre parameters, and takes the maximization of the fuzzy membership as the objective function in order to suppress the influence of noise in the data set.
To achieve the above object, the invention adopts the following technical solution:
An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:
Step 1: Partition the sample set by optimization according to the principle of minimizing the GFC objective function;
Step 2: Initialize the position and velocity values of multiple particles;
Step 3: Initialize the cluster centres by mapping the particle position values to the sample cluster centres;
Step 4: Define an inverse-proportional relationship between the sample-to-centre distance and the fuzzy membership, and use it to compute the sample fuzzy memberships;
Step 5: Obtain updated cluster centres through the particle swarm iteration formulas;
Step 6: Compute the GFC objective function.
Further, step 1 specifically comprises:
Let $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ denote the given sample set, where $x_j$ is the j-th sample, $1\le j\le n$, and n is the number of samples. The sample set X is partitioned by optimization so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1);
In formula (1), c is the number of classes, $1\le i\le c$, and $u_{ij}$ is the fuzzy membership of the j-th sample $x_j$ in the i-th class. $U=\{u_{ij},\ i=1,\ldots,c;\ j=1,\ldots,n\}$ is the membership matrix, m (m > 0) is the fuzzy index, and $u_{ij}^m$ is the m-th power of $u_{ij}$.
Further, step 2 specifically comprises: initializing the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple (c × d)-dimensional particles with random numbers between 0 and 1.
Further, step 3 specifically comprises:
Initialize λ = 1. The cluster centres at the λ-th iteration are $\theta_i^{(\lambda)}$, and the cluster centre matrix is $P^{(\lambda)}=\{\theta_i^{(\lambda)},\ i=1,\ldots,c\}$. The particle position $X_h^{(\lambda)}$ is split into groups of d components, and the i-th group corresponds to the cluster centre $\theta_i^{(\lambda)}$ of the i-th class, i = 1, ..., c. The iteration counter is λ and the maximum number of iterations is $\lambda_{max}$.
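Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names `init_particles` and `position_to_centres`, the swarm size in the usage lines, and the seeded `random.Random` generator are assumptions made for the example.

```python
import random


def init_particles(n_particles, c, d, rng):
    """Step 2: draw positions and velocities of (c*d)-dimensional particles from [0, 1)."""
    dim = c * d
    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    return positions, velocities


def position_to_centres(position, c, d):
    """Step 3: split one particle position into c cluster centres of d components each."""
    if len(position) != c * d:
        raise ValueError("position must have c * d components")
    return [tuple(position[i * d:(i + 1) * d]) for i in range(c)]


rng = random.Random(0)  # fixed seed only so that the example is repeatable
positions, velocities = init_particles(n_particles=30, c=2, d=2, rng=rng)
centres = position_to_centres([5.0, 5.0, 10.0, 10.0], c=2, d=2)
# centres == [(5.0, 5.0), (10.0, 10.0)]
```

Storing each particle as one flat list keeps the particle swarm update independent of the number of classes; only the decoding step needs to know c and d.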
Further, step 4 specifically comprises:
The m-th power of the fuzzy membership is computed with formula (2)
where ε is a very small positive number that overcomes the incompleteness of formula (3); M is a positive constant that expresses the level of the inverse-proportional relationship between the fuzzy membership and the sample-to-centre distance, and can be taken as 1 without loss of generality; and $\|x_j-\theta_i^{(\lambda)}\|$ is the distance between the j-th sample $x_j$ and the i-th cluster centre $\theta_i^{(\lambda)}$.
A fuzzy clustering algorithm requires the sample-to-centre distance to be inversely related to the fuzzy membership. Many inverse relationships are possible; the GFC algorithm here chooses a simple linear product-form inverse proportion, and other inverse relationships can be substituted into the GFC algorithm.
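The membership computation of step 4 can be illustrated with a short Python sketch. Formula (2) is not reproduced in this text, so the sketch assumes the product form $u_{ij}^m\,(\|x_j-\theta_i\|^2+\varepsilon)=M$, which matches the description of M, ε and the "linear product" inverse proportion but is not confirmed by the source. The function name `gfc_membership_powers` is hypothetical.

```python
def gfc_membership_powers(samples, centres, M=1.0, eps=0.1):
    """Return u_pow[i][j], the m-th power of the membership of sample j in class i.

    Assumed form (not confirmed by the source): M / (squared distance + eps).
    """
    u_pow = []
    for centre in centres:
        row = []
        for x in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
            row.append(M / (sq_dist + eps))
        u_pow.append(row)
    return u_pow


samples = [(5.0, 5.0), (6.0, 5.0), (500.0, 500.0)]
centres = [(5.0, 5.0), (10.0, 10.0)]
u_pow = gfc_membership_powers(samples, centres)
# A sample on a centre gets M / eps; the far sample (500, 500) gets a value near 2e-6.
```

Because no row or column of `u_pow` is normalized, a sample far from every centre receives small values in all classes, which is the property the method relies on.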
Further, step 5 specifically comprises:
Define the fitness function of the PSO algorithm by formula (4)
Check whether $\|f(U^{(\lambda)})-f(U^{(\lambda-1)})\|<\varepsilon$ or $\lambda>\lambda_{max}$. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij}=u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set X. Here ε and $\lambda_{max}$ are thresholds given in advance. If not, go to step 6 and continue until the condition is met.
Further, step 6 specifically comprises:
According to the fitness values $f(U^{(\lambda)})$ of the PSO solutions, record the current individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set λ = λ + 1, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to step 3;
$V_h^{(\lambda+1)}=wV_h^{(\lambda)}+c_1r_1[P_h^{(\lambda)}-X_h^{(\lambda)}]+c_2r_2[g^{(\lambda)}-X_h^{(\lambda)}]$ (5)
$X_h^{(\lambda+1)}=X_h^{(\lambda)}+V_h^{(\lambda+1)}$ (6)
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in [0, 1]; and w is the inertia factor.
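Formulas (5) and (6) translate directly into code. The sketch below updates a single particle; the default values of `w`, `c1` and `c2` are assumptions chosen for illustration, since the text only requires positive acceleration factors and names the inertia factor without giving values.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply formulas (5) and (6) to one particle and return (new_x, new_v)."""
    r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
    new_v = [w * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
             for xk, vk, pk, gk in zip(x, v, p_best, g_best)]
    new_x = [xk + vk for xk, vk in zip(x, new_v)]
    return new_x, new_v


x, v = pso_step(x=[1.0, 2.0], v=[0.5, -0.5], p_best=[1.0, 2.0], g_best=[1.0, 2.0])
# Both attraction terms vanish here, so v is [0.35, -0.35] and x is [1.35, 1.65]
# up to floating-point rounding.
```

When a particle already sits on both its individual best and the swarm best, only the inertia term moves it, which is what the usage lines show.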
Compared with the prior art, the beneficial effects of the invention are as follows:
1. Extension of the fuzzy index to m > 0, and omission of the fuzzy index
The clustering objective function, formula (1), and the inverse-proportional relationship, formula (2), determine the properties of the GFC algorithm. The fuzzy index m is extended to m > 0. By formula (2), the squared sample-to-centre distance $\|x_j-\theta_i\|^2$ is inversely proportional to $u_{ij}^m$. When m > 0, $u_{ij}^m$ increases with the fuzzy membership $u_{ij}$, so $\|x_j-\theta_i\|^2$ is inversely related to $u_{ij}$, which satisfies the basic principle of fuzzy clustering that the larger the sample-to-centre distance, the smaller the membership. Combining formulas (1) and (2), and since m > 0, maximizing the GFC objective function is equivalent to minimizing the within-class error, which also agrees with the evaluation criterion for clustering algorithms.
In addition, by formula (1), when m > 0 the optimization of the GFC objective function is equivalent to the minimization of an expression in the distances $\|x_j-\theta_i\|^2$ alone, so the GFC objective function can be made independent of the fuzzy index m. Combining formulas (1) and (2), the value of the objective function is determined once the values of $\|x_j-\theta_i\|^2$ are known. GFC can therefore evaluate its objective function and make classification decisions without depending on the fuzzy index, which amounts to omitting the setting of the fuzzy index.
2. The noise immunity of the GFC algorithm can be analysed intuitively from diagrams
3. Extensibility of the GFC inverse-proportional relationship
The GFC inverse-proportional relationship can be extended to several forms of expression:
where formula (7) is an inverse relationship in exponential form.
where formula (8) is an inverse relationship in logarithmic form.
Inverse relationships in polynomial form, and combinations of inverse relationships, can also be constructed. Extending the inverse-proportional relationship enriches the forms of expression and the range of application of the GFC algorithm and yields a family of GFC algorithms.
4. The GFC algorithm has good immunity to noise data
FCM is sensitive to noise data because of its normalization constraint, shown in formula (9):
$\sum_{i=1}^{c}u_{ij}=1$ (9)
That is, the memberships of a sample $x_j$ in all classes sum to 1. When a noise sample $x_k$ lies far from all classes of data, its fuzzy memberships still obey the normalization constraint, so FCM still assigns the noise sample a high fuzzy membership and cannot reject it.
The GFC fuzzy membership is determined by formula (10):
By formula (10), when a noise sample $x_j$ is far from every cluster centre $\theta_i$, its fuzzy membership $u_{ij}$ is very small and is not subject to the normalization constraint, so the sample can be distinguished from normal data. The GFC algorithm therefore has a certain ability to reject noise.
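The contrast between the two membership definitions can be checked numerically. In the sketch below, `fcm_memberships` uses the standard FCM membership formula with fuzzy index m = 2, while `gfc_membership_powers` assumes the form M / (squared distance + ε) for formula (10), which is not reproduced in this text; both function names and the two example centres are illustrative assumptions.

```python
def sq_dist(x, centre):
    return sum((a - b) ** 2 for a, b in zip(x, centre))


def fcm_memberships(x, centres, m=2.0):
    """Standard FCM memberships of one sample; they always sum to 1."""
    inv = [sq_dist(x, ct) ** (-1.0 / (m - 1.0)) for ct in centres]
    total = sum(inv)
    return [v / total for v in inv]


def gfc_membership_powers(x, centres, M=1.0, eps=0.1):
    """Assumed GFC form: M / (squared distance + eps), with no normalization."""
    return [M / (sq_dist(x, ct) + eps) for ct in centres]


centres = [(5.0, 5.0), (10.0, 10.0)]
noise = (500.0, 500.0)
u_fcm = fcm_memberships(noise, centres)
u_gfc = gfc_membership_powers(noise, centres)
# u_fcm is close to [0.495, 0.505]; both entries of u_gfc are about 2e-6.
```

Under normalization the distant point still receives roughly half of the total membership in each class, whereas the unnormalized values fall toward zero as the distance grows.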
Brief description of the drawings
Fig. 1 shows a Gaussian data set with (5,5) taken as the cluster centre;
Fig. 2 shows a Gaussian data set with (10,10) taken as the cluster centre.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings.
The unsupervised data classification method based on the generalized fuzzy clustering algorithm (GFC) of the invention abandons the modelling format of the traditional FCM algorithm. It sets an inverse-proportional relationship between the sample-to-centre distance and the fuzzy membership, uses particle swarm optimization (PSO) to search the solution space for good cluster centres, and takes the maximization of the fuzzy membership as the clustering objective function. GFC is not subject to the normalization constraint and can effectively detect and identify noise data. The inverse-proportional relationship can be extended and transformed into several forms, which widens the range of application of the clustering algorithm. GFC can also hide and ignore the fuzzy index, which avoids its interference with the clustering algorithm.
The method of the invention proceeds as follows:
Step 1: Let $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ denote the given sample set, where $x_j$ is the j-th sample, $1\le j\le n$, and n is the number of samples. The sample set X is partitioned by optimization so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1).
In formula (1), c is the number of classes, $1\le i\le c$, and $u_{ij}$ is the fuzzy membership of the j-th sample $x_j$ in the i-th class. $U=\{u_{ij},\ i=1,\ldots,c;\ j=1,\ldots,n\}$ is the membership matrix, m (m > 0) is the fuzzy index, and $u_{ij}^m$ is the m-th power of $u_{ij}$.
Step 2: Initialize the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple (c × d)-dimensional particles with random numbers between 0 and 1.
Step 3: Initialize λ = 1. The cluster centres at the λ-th iteration are $\theta_i^{(\lambda)}$, and the cluster centre matrix is $P^{(\lambda)}=\{\theta_i^{(\lambda)},\ i=1,\ldots,c\}$. The particle position $X_h^{(\lambda)}$ is split into groups of d components, and the i-th group corresponds to the cluster centre $\theta_i^{(\lambda)}$ of the i-th class, i = 1, ..., c. The iteration counter is λ and the maximum number of iterations is $\lambda_{max}$.
Step 4: The m-th power of the fuzzy membership is computed with formula (2)
where ε is a very small positive number that overcomes the incompleteness of formula (3); M is a positive constant that expresses the level of the inverse-proportional relationship between the fuzzy membership and the sample-to-centre distance, and can be taken as 1 without loss of generality; and $\|x_j-\theta_i^{(\lambda)}\|$ is the distance between the j-th sample $x_j$ and the i-th cluster centre $\theta_i^{(\lambda)}$.
A fuzzy clustering algorithm requires the sample-to-centre distance to be inversely related to the fuzzy membership. Many inverse relationships are possible; the GFC algorithm here chooses a simple linear product-form inverse proportion, and other inverse relationships can be substituted into the GFC algorithm.
Step 5: Define the fitness function of the PSO algorithm by formula (4)
Check whether $\|f(U^{(\lambda)})-f(U^{(\lambda-1)})\|<\varepsilon$ or $\lambda>\lambda_{max}$. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij}=u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set X. Here ε and $\lambda_{max}$ are thresholds given in advance. If not, go to step 6 and continue until the condition is met.
Step 6: According to the fitness values $f(U^{(\lambda)})$ of the PSO solutions, record the current individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set λ = λ + 1, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to step 3.
$V_h^{(\lambda+1)}=wV_h^{(\lambda)}+c_1r_1[P_h^{(\lambda)}-X_h^{(\lambda)}]+c_2r_2[g^{(\lambda)}-X_h^{(\lambda)}]$ (5)
$X_h^{(\lambda+1)}=X_h^{(\lambda)}+V_h^{(\lambda+1)}$ (6)
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in [0, 1]; and w is the inertia factor.
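The six steps can be put together in a compact Python sketch. Several points are assumptions rather than statements of the source: formulas (1), (2) and (4) are not reproduced in this text, so the fitness is taken to be the sum of M / (squared distance + ε) over all classes and samples and is maximized, following the description of the objective as the sum of the m-th powers of the memberships; the values of `w`, `c1` and `c2` are illustrative; and for brevity the loop stops only at the iteration cap, so the tolerance test of step 5 is not implemented.

```python
import random


def fitness(position, samples, c, d, M=1.0, eps=0.1):
    """Assumed objective: sum over classes and samples of M / (squared distance + eps)."""
    total = 0.0
    for i in range(c):
        centre = position[i * d:(i + 1) * d]
        for x in samples:
            total += M / (sum((a - b) ** 2 for a, b in zip(x, centre)) + eps)
    return total


def gfc_pso(samples, c, n_particles=30, max_iter=300, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    d = len(samples[0])
    dim = c * d
    # Step 2: random positions and velocities in [0, 1).
    X = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    V = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    # Individual bests and swarm best (bookkeeping of step 6).
    P = [x[:] for x in X]
    P_fit = [fitness(x, samples, c, d) for x in X]
    best = max(range(n_particles), key=lambda h: P_fit[h])
    g, g_fit = P[best][:], P_fit[best]
    history = [g_fit]
    for _ in range(max_iter):
        for h in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            for k in range(dim):
                # Formulas (5) and (6).
                V[h][k] = (w * V[h][k] + c1 * r1 * (P[h][k] - X[h][k])
                           + c2 * r2 * (g[k] - X[h][k]))
                X[h][k] += V[h][k]
            f = fitness(X[h], samples, c, d)  # steps 3 to 5 for this particle
            if f > P_fit[h]:
                P[h], P_fit[h] = X[h][:], f
            if f > g_fit:
                g, g_fit = X[h][:], f
        history.append(g_fit)
    centres = [tuple(g[i * d:(i + 1) * d]) for i in range(c)]
    return centres, history


samples = [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5), (10.0, 10.0), (10.5, 9.5), (9.5, 10.5)]
centres, history = gfc_pso(samples, c=2, n_particles=10, max_iter=50)
```

`history` holds the best fitness found so far after each iteration and can only stay level or increase, which gives a simple way to monitor the search.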
Embodiment 1:
In this embodiment, the PCM algorithm repeatedly produced coincident cluster centres in the simulation tests, which invalidated its clustering results. Therefore, to verify the effectiveness and feasibility of the GFC algorithm, GFC is compared with the FCM algorithm.
The generalized fuzzy clustering algorithm (GFC) proceeds as follows:
Step 1: Let $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ denote the given sample set, where $x_j$ is the j-th sample, $1\le j\le n$, and n is the number of samples. The sample set X is partitioned by optimization so that the objective function value $J_{GFC}$ is minimized, where $J_{GFC}$ is determined by formula (1).
In formula (1), c is the number of classes, $1\le i\le c$, and $u_{ij}$ is the fuzzy membership of the j-th sample $x_j$ in the i-th class. $U=\{u_{ij},\ i=1,\ldots,c;\ j=1,\ldots,n\}$ is the membership matrix, m (m > 0) is the fuzzy index, and $u_{ij}^m$ is the m-th power of $u_{ij}$.
I. Tests on a two-dimensional Gaussian data set
The tests cover two aspects. The first is clustering validity, reflected mainly in the test accuracy of the clustering algorithm. The second is immunity to noise data: the clustering algorithm is required to assign noise data a low fuzzy membership, that is, to be able to distinguish noise data from normal data.
1) Validity test
The test uses an artificially generated two-dimensional Gaussian data set with 2 cluster classes. The test data set is generated from two Gaussian random distributions with class centres (5,5) and (10,10), 100 samples per class, and covariance matrix [5 0; 0 5] for both classes.
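A data set of this kind can be generated with the Python standard library. Because the covariance matrix [5 0; 0 5] is diagonal, the two coordinates are independent with variance 5, so each can be drawn separately with standard deviation √5. The function name `make_gaussian_class` and the fixed seed are assumptions for the example; the patent does not say how its data were generated.

```python
import math
import random


def make_gaussian_class(centre, variance, n, rng):
    """Draw n two-dimensional samples with independent coordinates of equal variance."""
    std = math.sqrt(variance)
    return [(rng.gauss(centre[0], std), rng.gauss(centre[1], std)) for _ in range(n)]


rng = random.Random(0)  # fixed seed only so that the example is repeatable
class_1 = make_gaussian_class((5.0, 5.0), 5.0, 100, rng)
class_2 = make_gaussian_class((10.0, 10.0), 5.0, 100, rng)
data = class_1 + class_2
labels = [0] * 100 + [1] * 100  # kept only for computing the test accuracy afterwards
```

The labels are not used by the clustering itself; they serve only to score the resulting partition.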
Particle swarm optimization provides the means of solving the GFC algorithm. Every component of the particle position and velocity vectors is a real number, and each particle position vector is a feasible solution. The position vector has c × d dimensions, where c is the number of cluster classes and d is the dimension of the samples, so that it holds the d-dimensional coordinates of the c cluster centres. The swarm size is 30, the number of iterations is set to 300, and each component of the particle position vector takes values in [0, 20]; each group of d components of the position vector corresponds to the d components of one cluster centre. To keep the particle swarm optimization from being trapped in a local optimum with poor clustering performance, the cluster centres trained by the FCM algorithm are concatenated and used as the initial position of one particle, which improves the clustering performance of the GFC algorithm, that is:
$\theta_i^{(0)}=\theta_i^{*}$ (11)
where the concatenated $\theta_i^{(0)}$ correspond to the position value $X_h^{(0)}$ assigned at the initialization of the particle swarm algorithm, and $\theta_i^{*}$ is the solution obtained in the FCM clustering result; the purpose is to use FCM to guide GFC out of poor local extrema. In addition, the parameters in formula (2) of the GFC algorithm are set to ε = 0.1 and M = 1.
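The seeding of formula (11) can be sketched as follows. The `fcm` function is a plain textbook FCM with fuzzy index m = 2 written for this example; it is not taken from the patent, and its iteration count, its initialization from randomly chosen samples, and the small constant added to the squared distances are assumptions. The resulting centres are concatenated into one flat position vector for a single particle.

```python
import random


def fcm(samples, c, m=2.0, iters=50, seed=0):
    """Textbook fuzzy C-means; returns c cluster centres."""
    rng = random.Random(seed)
    centres = [list(s) for s in rng.sample(samples, c)]
    d = len(samples[0])
    for _ in range(iters):
        # Membership update: normalized inverse distances (each row sums to 1).
        U = []
        for x in samples:
            sq = [sum((a - b) ** 2 for a, b in zip(x, ct)) + 1e-12 for ct in centres]
            inv = [s ** (-1.0 / (m - 1.0)) for s in sq]
            total = sum(inv)
            U.append([v / total for v in inv])
        # Centre update: average of the samples weighted by u^m.
        for i in range(c):
            wts = [U[j][i] ** m for j in range(len(samples))]
            wsum = sum(wts)
            centres[i] = [sum(wt * x[k] for wt, x in zip(wts, samples)) / wsum
                          for k in range(d)]
    return centres


samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
fcm_centres = sorted(fcm(samples, c=2))
# Formula (11): concatenate the FCM centres into the initial position of one particle.
seed_position = [coord for centre in fcm_centres for coord in centre]
```

Only one particle is seeded this way; the remaining particles keep their random initial positions so that the swarm still explores the search range.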
The test records the accuracy for each class and stores the final cluster centre coordinates of the two classes after iteration. Table 1 gives the test accuracy and the cluster centre coordinates.
Table 1. Test results on the two-dimensional Gaussian data set
As Table 1 shows, for blob-shaped data sets with well-separated class margins, both FCM and GFC achieve good classification results with little difference in clustering accuracy, while GFC can omit the choice of the fuzzy index m, which simplifies the parameter settings of the algorithm.
2) Noise immunity test
This test examines how well the two algorithms suppress noise data; the fuzzy membership that a clustering algorithm assigns to noise data should be as small as possible. A noise sample with coordinates (500,500) is added to the original two-dimensional Gaussian data set. The recorded results include the class centres, the fuzzy memberships of the noise sample in each class, and the clustering performance on the normal data; the results are shown in Table 2.
Table 2. Test results on the two-dimensional Gaussian data set with noise sample (500,500)
As Table 2 shows, the noise sample (500,500) strongly affects the clustering performance of both FCM and GFC. As analysed above for the FCM normalization constraint, FCM assigns the noise sample (500,500) a high fuzzy membership and therefore cannot reject it. For GFC, the effect of the noise sample is that the two cluster centres coincide, which reduces the clustering validity; however, owing to the noise-immune design of the algorithm, its clustering accuracy is better than that of FCM, and the noise sample obtains only a very small fuzzy membership relative to normal data. The fuzzy memberships of the noise sample in the different classes differ very little, so this property can be used to construct a rejection method: noise data are rejected with the fuzzy membership difference threshold of formula (12).
$\max_i(u_{ij})-\min_i(u_{ij})<\delta_1$ (12)
Formula (12) is the fuzzy membership difference threshold rejection formula. For any sample $x_j$ with class memberships $u_{ij}$ (i ∈ 1, ..., c), if these $u_{ij}$ satisfy formula (12), the sample $x_j$ can be regarded as noise data. In the clustering test on the simulated two-dimensional Gaussian data with noise sample (500,500), setting $\delta_1$ = 0.00001 allows the noise sample to be rejected, because it lies far from the cluster centres and obtains very small memberships in all classes. When using GFC, it should first be used to reject noise data, and the cluster analysis should then be run again, which gives a better clustering result.
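The rejection rule of formula (12) amounts to a few lines of Python. The function name `reject_noise` is hypothetical, and the membership values in the usage lines are made up for illustration; they are not results from the patent.

```python
def reject_noise(u, delta=1e-5):
    """u[i][j] is the membership of sample j in class i; return indices of rejected samples."""
    n = len(u[0])
    rejected = []
    for j in range(n):
        column = [row[j] for row in u]
        if max(column) - min(column) < delta:  # formula (12)
            rejected.append(j)
    return rejected


# Memberships of three samples in two classes (illustrative values only).
u = [[10.0, 0.9, 2.04e-6],
     [0.02, 0.024, 2.08e-6]]
print(reject_noise(u))  # prints [2]
```

After the flagged samples are removed, the clustering is run again on the remaining data, as the text recommends.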
II. Tests on a UCI data set
The algorithms are tested on the iris data set from the UCI machine learning repository; the characteristics of the iris data set are shown in Table 3. The test settings are similar to those of the Gaussian data set test; each particle component takes values in [0, 50], and the parameters M and ε are varied in the GFC tests to study the stability of the algorithm with respect to its parameters. Each clustering algorithm is run 10 times for each parameter setting and data set, and the average clustering accuracy is computed. Table 4 gives the test results on the iris data.
Table 3. Attributes of the experimental data set
As Table 4 shows, GFC obtains its lowest average clustering accuracy, 90.60, with M = 1 and ε = 0.1, and its highest, 92.20, with M = 3 and ε = 0.3. Both the lowest and the highest average clustering accuracy of GFC are above the average clustering accuracy of FCM. In addition, the overall average accuracy of GFC over all parameter values is 91.47938, which is better than the clustering performance of FCM. The two simulation tests show that GFC has better clustering performance than FCM, which demonstrates the clustering validity of the proposed algorithm.
Table 4. Test results on the iris data set
Step 2: Initialize the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple (c × d)-dimensional particles with random numbers between 0 and 1.
Step 3: Initialize λ = 1. The cluster centres at the λ-th iteration are $\theta_i^{(\lambda)}$, and the cluster centre matrix is $P^{(\lambda)}=\{\theta_i^{(\lambda)},\ i=1,\ldots,c\}$. The particle position $X_h^{(\lambda)}$ is split into groups of d components, and the i-th group corresponds to the cluster centre $\theta_i^{(\lambda)}$ of the i-th class, i = 1, ..., c. The iteration counter is λ and the maximum number of iterations is $\lambda_{max}$.
Step 4: The m-th power of the fuzzy membership is computed with formula (2)
where ε is a very small positive number that overcomes the incompleteness of formula (3); M is a constant that expresses the level of the inverse-proportional relationship between the fuzzy membership and the sample-to-centre distance, and can be taken as 1 without loss of generality; and $\|x_j-\theta_i^{(\lambda)}\|$ is the distance between the j-th sample $x_j$ and the i-th cluster centre $\theta_i^{(\lambda)}$.
A fuzzy clustering algorithm requires the sample-to-centre distance to be inversely related to the fuzzy membership. Many inverse relationships are possible; the GFC algorithm here chooses a simple linear product-form inverse proportion, and other inverse relationships can be substituted into the GFC algorithm.
Step 5: Define the fitness function of the PSO algorithm by formula (4)
Check whether $\|f(U^{(\lambda)})-f(U^{(\lambda-1)})\|<\varepsilon$ or $\lambda>\lambda_{max}$. If so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij}=u_{ij}^{(\lambda)}$ and substitute it into formula (1) to obtain the optimal partition of the sample set X. Here ε and $\lambda_{max}$ are thresholds given in advance. If not, go to step 6 and continue until the condition is met.
Step 6: According to the fitness values $f(U^{(\lambda)})$ of the PSO solutions, record the current individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set λ = λ + 1, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to step 3.
$V_h^{(\lambda+1)}=wV_h^{(\lambda)}+c_1r_1[P_h^{(\lambda)}-X_h^{(\lambda)}]+c_2r_2[g^{(\lambda)}-X_h^{(\lambda)}]$ (5)
$X_h^{(\lambda+1)}=X_h^{(\lambda)}+V_h^{(\lambda+1)}$ (6)
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in [0, 1]; and w is the inertia factor.
As shown in Fig. 1 and Fig. 2, consider a single-class Gaussian dataset of 50 samples with center (5, 5) and covariance matrix [3, 0; 0, 3]; the covariance matrix expresses the dispersion of the data. In Fig. 1, (5, 5) is taken as the first cluster center $\theta_1$ of the Gaussian dataset; in Fig. 2, (10, 10) is taken as the second cluster center $\theta_2$. Clearly, $\theta_1$ in Fig. 1 meets the needs of a practical clustering problem better than $\theta_2$ in Fig. 2. The distances from $\theta_1$ to the samples in Fig. 1 are smaller than those from $\theta_2$ in Fig. 2, so by the basic principle of fuzzy clustering the fuzzy memberships of the samples in Fig. 1 are higher than those in Fig. 2. If Fig. 2 is regarded as the initial state of the clustering algorithm and Fig. 1 as the final optimized state, then the evolution of the cluster center from $\theta_2$ in Fig. 2 to $\theta_1$ in Fig. 1 is equivalent to maximizing formula (1). The maximization of formula (1) in the GFC algorithm, together with the inverse relationship of formula (2), therefore satisfies the objective of a fuzzy clustering algorithm.
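The comparison between the two candidate centers can be reproduced numerically. The sketch below draws 50 samples with the stated center and covariance and compares the mean sample distance to (5, 5) and to (10, 10); the random seed and the helper name `mean_distance` are assumptions, and the samples are not the ones plotted in the figures.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
samples = rng.multivariate_normal(
    mean=[5.0, 5.0], cov=[[3.0, 0.0], [0.0, 3.0]], size=50
)

def mean_distance(samples, center):
    """Average Euclidean distance from the samples to a candidate cluster center."""
    return float(np.linalg.norm(samples - np.asarray(center), axis=1).mean())

d_fig1 = mean_distance(samples, (5.0, 5.0))    # center used in Fig. 1
d_fig2 = mean_distance(samples, (10.0, 10.0))  # center used in Fig. 2
```

Because `d_fig1` is smaller than `d_fig2`, any membership that decreases with distance is larger on average for the center of Fig. 1, which is the direction in which maximizing formula (1) moves the center.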
Simulation experiments can show that the proposed GFC algorithm performs well in both cluster validity and noise robustness. The GFC algorithm constructs an inverse relationship between the sample-to-center distance and the fuzzy membership, takes the maximization of the sum of the m-th powers of the fuzzy memberships as its clustering objective function, and uses the PSO particle swarm evolutionary algorithm to search the solution space for good cluster centers. Because the GFC algorithm has no normalization constraint, it is not sensitive to noise data and can effectively reject noise data. The fuzzy index can also be hidden and omitted, and the inverse relationship can be replaced by a variety of other inverse relationships, which further enhances the adaptability of the GFC algorithm to different kinds of data.
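The effect of dropping the normalization constraint can be illustrated by comparing standard fuzzy c-means (FCM) memberships, which sum to 1 over the classes, with an unnormalized inverse-distance membership. The unnormalized form `1 / (distance + eps)` is an assumed stand-in, since the patent's formula (2) is not reproduced in this text, and all function names are hypothetical.

```python
import numpy as np

def distances(X, centers):
    """dist[i, j] is the Euclidean distance between center i and sample j."""
    return np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)

def fcm_membership(X, centers, m=2.0, eps=1e-12):
    """Standard FCM memberships; each column (sample) sums to 1 over the classes."""
    d = distances(X, centers) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def unnormalized_membership(X, centers, eps=1e-6):
    """Assumed inverse-distance membership without a normalization constraint."""
    return 1.0 / (distances(X, centers) + eps)

centers = np.array([[0.0, 0.0], [10.0, 0.0]])
X = np.array([[0.5, 0.0],      # ordinary sample near the first center
              [5.0, 100.0]])   # noise point, far from both centers
u_fcm = fcm_membership(X, centers)
u_free = unnormalized_membership(X, centers)
```

Under the normalization constraint the noise point still receives a membership of 0.5 in each class, whereas without it the noise point receives a small value in both classes and can be rejected by a threshold.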
In conclusion the invention discloses a kind of unsupervised data based on generalized fuzzy clustering algorithm (GFC algorithms) point Class method, characteristic information are shown as follows:Its characteristic information is shown as follows:1. pair sample set is according to GFC mesh Scalar functions minimization principle carries out optimization division;2. initializing the position and speed value of multiple particles;3. by particle position value Realization cluster centre initialization corresponding with sample clustering center;4. defining the distance and fuzzy membership between sample, cluster centre Inversely proportional relationship is to calculate sample fuzzy membership;5. obtaining newer cluster centre by particle cluster algorithm iterative formula; 6. GFC object functions are calculated.The fuzzy clustering algorithm that the present invention is constructed is not limited by normalization constraint, can be to making an uproar Sound data are made effectively to excavate and identify.The fuzzy membership constructed can expand deformation with cluster centre inversely prroportional relationship form For diversified forms, the scope of application of clustering algorithm is improved, also fuzzy indicator can be made to hide and ignore, so as to avoid fuzzy finger Mark the interference to clustering algorithm.
The foregoing is merely illustrative of the preferred embodiments of the present invention, is not intended to limit the invention.All essences in the present invention All any modification, equivalent and improvement etc., should all be included in the protection scope of the present invention made by within refreshing and principle.

Claims (7)

1. An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:
Step 1: Optimally partition the sample set according to the GFC objective function minimization principle;
Step 2: Initialize the positions and velocities of multiple particles;
Step 3: Map the particle position values to the sample cluster centers to initialize the cluster centers;
Step 4: Define the distance between a sample and a cluster center to be inversely related to the fuzzy membership in order to calculate the sample fuzzy memberships;
Step 5: Obtain updated cluster centers by the particle swarm algorithm iteration formulas;
Step 6: Calculate the GFC objective function.
2. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 1 specifically comprises:
Let $X = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$ denote the given sample set, where $x_j$ denotes the $j$-th sample, $1 \le j \le n$, and $n$ is the number of samples; the sample set $X$ is optimally partitioned so that the objective function value $J_{GFC}$ is minimal, where $J_{GFC}$ is determined by formula (1);
In formula (1), $c$ denotes the number of classes, $1 \le i \le c$, and $u_{ij}$ denotes the fuzzy membership of the $j$-th sample $x_j$ in the $i$-th class; $U = \{u_{ij},\ i = 1, \ldots, c;\ j = 1, \ldots, n\}$ denotes the membership matrix; $m$ ($m > 0$) is the fuzzy index, and $u_{ij}^m$ is the $m$-th power of $u_{ij}$.
3. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 2 specifically comprises: initializing the positions $X_h^{(0)}$ and velocities $V_h^{(0)}$ of multiple $c \times d$-dimensional particles with random numbers between 0 and 1.
4. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 3 specifically comprises:
Initialize $\lambda = 1$; the cluster center of the $\lambda$-th iteration is $\theta_i^{(\lambda)}$, and the cluster center matrix is $P^{(\lambda)} = \{\theta_i^{(\lambda)},\ i = 1, \ldots, c\}$; the particle position $X_h^{(\lambda)}$ is divided into groups of $d$ components each, and the $i$-th group corresponds to the cluster center $\theta_i^{(\lambda)}$ of the $i$-th class, $i = 1, \ldots, c$. The iteration count is denoted $\lambda$ and the maximum number of iterations is $\lambda_{\max}$.
5. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 4 specifically comprises:
Compute the m-th power of the fuzzy membership with formula (2).
$\varepsilon$ denotes a very small positive number, introduced to overcome the incompleteness of formula (3); $m$ is a positive constant that expresses the degree of the inverse relationship between the fuzzy membership and the distance from the sample to the cluster center, and without loss of generality it can be taken as 1; $\|x_j - \theta_i^{(\lambda)}\|$ denotes the distance between the $j$-th sample $x_j$ and the cluster center $\theta_i^{(\lambda)}$ of the $i$-th class.
A fuzzy clustering algorithm requires the distance between a sample and a cluster center to be inversely related to the fuzzy membership. Many inverse relationships exist; the GFC algorithm selects a simple linear-product inverse relationship here, and other inverse relationships can be introduced into the GFC algorithm in its place.
6. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 5 specifically comprises:
Define the PSO fitness function by formula (4).
Check whether $\|f(U^{(\lambda)}) - f(U^{(\lambda-1)})\| < \varepsilon$ or $\lambda > \lambda_{\max}$; if so, $u_{ij}^{(\lambda)}$ is the optimal fuzzy membership estimated by the iterative algorithm; set $u_{ij} = u_{ij}^{(\lambda)}$ and substitute it into formula (1), which yields the optimal partition of the sample set $X$, where $\varepsilon$ and $\lambda_{\max}$ are thresholds given in advance; if not, go to step 6 and continue until the condition is met.
7. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that step 6 specifically comprises:
Based on the PSO fitness function value $f(U^{(\lambda)})$, record the current-generation individual best solution $P_h^{(\lambda)}$ and the swarm best solution $g^{(\lambda)}$ of the particle swarm algorithm, set $\lambda = \lambda + 1$, update the particle velocity $V_h^{(\lambda+1)}$ and position $X_h^{(\lambda+1)}$ by formulas (5) and (6), and go to step 3;
$$V_h^{(\lambda+1)} = w V_h^{(\lambda)} + c_1 r_1 \left[ P_h^{(\lambda)} - X_h^{(\lambda)} \right] + c_2 r_2 \left[ g^{(\lambda)} - X_h^{(\lambda)} \right] \tag{5}$$
$$X_h^{(\lambda+1)} = X_h^{(\lambda)} + V_h^{(\lambda+1)} \tag{6}$$
In formulas (5) and (6), $c_1$ and $c_2$ are acceleration factors, taken as positive constants; $r_1$ and $r_2$ are random numbers in $[0, 1]$; and $w$ is called the inertia factor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810495011.XA CN108710914A (en) 2018-05-22 2018-05-22 A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810495011.XA CN108710914A (en) 2018-05-22 2018-05-22 A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm

Publications (1)

Publication Number Publication Date
CN108710914A true CN108710914A (en) 2018-10-26

Family

ID=63868606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810495011.XA Pending CN108710914A (en) 2018-05-22 2018-05-22 A kind of unsupervised data classification method based on generalized fuzzy clustering algorithm

Country Status (1)

Country Link
CN (1) CN108710914A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110310658A (en) * 2019-06-21 2019-10-08 桂林电子科技大学 A kind of speech Separation method based on Speech processing
CN110929777A (en) * 2019-11-18 2020-03-27 济南大学 Data kernel clustering method based on transfer learning
CN111561907A (en) * 2020-03-31 2020-08-21 华电电力科学研究院有限公司 Tower drum uneven settlement monitoring method based on plane dip angle measurement
CN111666981A (en) * 2020-05-13 2020-09-15 云南电网有限责任公司信息中心 System data anomaly detection method based on genetic fuzzy clustering
CN112215492A (en) * 2020-10-12 2021-01-12 国网甘肃省电力公司电力科学研究院 Aggregation grouping method based on power supply spatial distribution and regulation characteristics
CN112422546A (en) * 2020-11-10 2021-02-26 昆明理工大学 Network anomaly detection method based on variable neighborhood algorithm and fuzzy clustering
CN112487552A (en) * 2020-11-18 2021-03-12 南京航空航天大学 Envelope dividing and gain scheduling method of flying wing unmanned aerial vehicle based on fuzzy clustering
CN112583723A (en) * 2020-12-15 2021-03-30 东方红卫星移动通信有限公司 FCM-based large-scale routing network expression method
CN113447813A (en) * 2020-09-03 2021-09-28 鲁能集团有限公司 Fault diagnosis method and equipment for offshore wind generating set
CN114548225A (en) * 2022-01-19 2022-05-27 中国人民解放军国防科技大学 Method, device and equipment for processing situation data outlier samples based on FCM
CN115952432A (en) * 2022-12-21 2023-04-11 四川大学华西医院 Unsupervised clustering method based on diabetes data

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110310658B (en) * 2019-06-21 2021-11-30 桂林电子科技大学 Voice separation method based on voice signal processing
CN110310658A (en) * 2019-06-21 2019-10-08 桂林电子科技大学 A kind of speech Separation method based on Speech processing
CN110929777A (en) * 2019-11-18 2020-03-27 济南大学 Data kernel clustering method based on transfer learning
CN111561907A (en) * 2020-03-31 2020-08-21 华电电力科学研究院有限公司 Tower drum uneven settlement monitoring method based on plane dip angle measurement
CN111666981A (en) * 2020-05-13 2020-09-15 云南电网有限责任公司信息中心 System data anomaly detection method based on genetic fuzzy clustering
CN111666981B (en) * 2020-05-13 2023-03-31 云南电网有限责任公司信息中心 System data anomaly detection method based on genetic fuzzy clustering
CN113447813B (en) * 2020-09-03 2022-09-13 中国绿发投资集团有限公司 Fault diagnosis method and equipment for offshore wind generating set
CN113447813A (en) * 2020-09-03 2021-09-28 鲁能集团有限公司 Fault diagnosis method and equipment for offshore wind generating set
CN112215492A (en) * 2020-10-12 2021-01-12 国网甘肃省电力公司电力科学研究院 Aggregation grouping method based on power supply spatial distribution and regulation characteristics
CN112422546A (en) * 2020-11-10 2021-02-26 昆明理工大学 Network anomaly detection method based on variable neighborhood algorithm and fuzzy clustering
CN112487552A (en) * 2020-11-18 2021-03-12 南京航空航天大学 Envelope dividing and gain scheduling method of flying wing unmanned aerial vehicle based on fuzzy clustering
CN112583723A (en) * 2020-12-15 2021-03-30 东方红卫星移动通信有限公司 FCM-based large-scale routing network expression method
CN112583723B (en) * 2020-12-15 2022-08-26 东方红卫星移动通信有限公司 FCM-based large-scale routing network expression method
CN114548225A (en) * 2022-01-19 2022-05-27 中国人民解放军国防科技大学 Method, device and equipment for processing situation data outlier samples based on FCM
CN114548225B (en) * 2022-01-19 2024-02-02 中国人民解放军国防科技大学 Method, device and equipment for processing situation data outlier sample based on FCM
CN115952432A (en) * 2022-12-21 2023-04-11 四川大学华西医院 Unsupervised clustering method based on diabetes data
CN115952432B (en) * 2022-12-21 2024-03-12 四川大学华西医院 Unsupervised clustering method based on diabetes data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181026