CN108710914A

CN108710914A - Unsupervised data classification method based on a generalized fuzzy clustering algorithm

Info

Publication number: CN108710914A
Application number: CN201810495011.XA
Authority: CN
Inventors: 文传军; 许定亮; 刘福燕
Original assignee: Changzhou Institute of Technology
Current assignee: Changzhou Institute of Technology
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2018-10-26

Abstract

The invention discloses an unsupervised data classification method based on a generalized fuzzy clustering (GFC) algorithm. The steps include: optimally partitioning the sample set according to the principle of minimizing the GFC objective function; initializing the position and velocity values of multiple particles; initializing the cluster centers by mapping each particle's position values to the sample cluster centers; defining an inversely proportional relationship between the sample-to-cluster-center distance and the fuzzy membership in order to compute the fuzzy membership of each sample; obtaining updated cluster centers through the particle swarm iteration formulas; and computing the GFC objective function. The fuzzy clustering algorithm constructed by the invention is not subject to the normalization constraint and can effectively detect and identify noisy data. The inverse-proportion relationship between fuzzy membership and cluster center can be extended and transformed into a variety of forms, which widens the scope of application of the clustering algorithm. The fuzzy index can also be hidden and ignored, which avoids its interference with the clustering algorithm.

Description

Unsupervised data classification method based on a generalized fuzzy clustering algorithm

Technical field

The invention belongs to the field of unsupervised data classification methods in data mining, and in particular relates to an unsupervised data classification method based on a generalized fuzzy clustering algorithm.

Background art

Fuzzy clustering based on an objective function is an important research topic in cluster analysis and is widely used in unsupervised pattern classification, audio and video analysis and processing, machine learning, and data mining. The fuzzy C-means clustering algorithm (FCM) is a typical fuzzy clustering algorithm derived from a clustering objective function, and it is the most important and most widely used fuzzy clustering method. The model of the FCM algorithm is intuitive and easy to understand, its optimization theory is relatively rigorous, it can be computed by a computer program, and its clustering results are good.

The FCM algorithm is bound by the normalization condition and is therefore sensitive to noisy data: noise points far from every cluster center can still receive high fuzzy memberships. The possibilistic C-means clustering algorithm (PCM) drops the normalization constraint of FCM, but the fuzzy membership of a sample then depends only on the cluster center of its own class, which leads to coincident cluster centers. Algorithms such as PFCM and FPCM combine FCM and PCM through additive and multiplicative combinations respectively in order to exploit the advantages of both, but they introduce many combination parameters that must be set from experience, which makes the clustering algorithm more complicated and leaves it without an effective method for parameter optimization.

A fuzzy clustering algorithm has three key elements. The first is the expression of the fuzzy membership. The fuzzy membership reflects the relationship between a sample and a cluster center: when the distance between a sample and a cluster center is large, the clustering algorithm assigns the sample a small fuzzy membership, so the fuzzy membership is inversely related to the sample-to-cluster-center distance. The second is the determination of the cluster centers. To minimize the clustering objective function, a cluster center should lie close to the samples with large fuzzy memberships; in other words, it should fall where the samples are densest. Cluster centers are mainly computed in one of two ways: as the membership-weighted average of the samples, or by a search with a biological evolution algorithm such as the genetic algorithm (GA). The third is the determination of the clustering objective function. The clustering objective function of FCM minimizes the weighted within-class sum of squared errors. The hidden-membership fuzzy c-means clustering algorithm (HMFCM) uses an algebraic transformation to convert the FCM objective function into a form that minimizes the sample-to-cluster-center distances. This reflects the essence of a clustering algorithm: the within-class error is expressed through the sample-to-cluster-center distance, and the algorithm seeks to minimize it. Because the sample-to-cluster-center distance is inversely related to the fuzzy membership, the clustering objective function can also be expressed as the maximization of the fuzzy membership.

In addition, ever since the FCM algorithm was proposed, the fuzzy membership and cluster center estimates that Bezdek derived with the gradient method and alternating optimization (AO) iteration have shaped subsequent research. The convergence condition of FCM requires the Hessian matrix with respect to the fuzzy membership to be positive definite, which in practice requires the fuzzy index to be greater than 1. Theoretical analysis shows that when a biological evolution algorithm such as particle swarm optimization (PSO) is used to estimate the fuzzy membership, the algorithm is freed from the convergence restriction of the gradient method, so the range of the fuzzy index can be extended to all values greater than zero while the clustering algorithm still retains its clustering performance.

Summary of the invention

To overcome the sensitivity to noisy data caused by the normalization constraint of the fuzzy C-means clustering algorithm (FCM), the present invention proposes a generalized fuzzy clustering algorithm (GFC). The algorithm defines the relationship between fuzzy membership and cluster center in inverse-proportion form, uses the particle swarm algorithm to estimate the cluster center parameters, and takes the maximization of fuzzy membership as its objective function, so that it can suppress the noise in noisy data sets.

To achieve the above object, the present invention adopts the following technical solution:

An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:

Step 1: Optimally partition the sample set according to the principle of minimizing the GFC objective function;

Step 2: Initialize the position and velocity values of multiple particles;

Step 3: Initialize the cluster centers by mapping the particle position values to the sample cluster centers;

Step 4: Define an inversely proportional relationship between the sample-to-cluster-center distance and the fuzzy membership in order to compute the fuzzy membership of each sample;

Step 5: Obtain updated cluster centers through the particle swarm iteration formulas;

Step 6: Compute the GFC objective function.

Further, the specific procedure of step 1 is:

Let X = {x₁, x₂, …, x_j, …, x_n} denote the given sample set, where x_j is the j-th sample, 1 ≤ j ≤ n, and n is the number of samples. The sample set X is optimally partitioned so that the objective function value J_GFC is minimized, where J_GFC is determined by formula (1);

In formula (1), c is the number of classes, 1 ≤ i ≤ c, and u_ij is the fuzzy membership of the j-th sample x_j in the i-th class; U = {u_ij, i = 1, …, c; j = 1, …, n} is the membership matrix, m (m > 0) is the fuzzy index, and u_ij^m is the m-th power of u_ij.

Further, the specific procedure of step 2 is: initialize the positions X_h^(0) and velocities V_h^(0) of multiple (c × d)-dimensional particles with random numbers between 0 and 1.
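A minimal sketch of this initialization, not the patent's own code: every component of each particle's position and velocity is drawn from [0, 1). The function name `init_swarm`, the swarm size of 30, and the seeded generator are assumptions made for illustration.

```python
import random

def init_swarm(num_particles, c, d, rng=None):
    """Return (positions, velocities); each particle is a flat vector of c*d values in [0, 1)."""
    rng = rng or random.Random()
    dim = c * d  # one d-dimensional block per cluster center
    positions = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    velocities = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    return positions, velocities

# Hypothetical usage: 30 particles, c = 2 clusters, d = 2 features.
positions, velocities = init_swarm(num_particles=30, c=2, d=2, rng=random.Random(0))
```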

Further, the specific procedure of step 3 is:

Initialize λ = 1. The cluster center of the i-th class at the λ-th iteration is θ_i^(λ), and the cluster center matrix is P^(λ) = {θ_i^(λ), i = 1, …, c}. The particle position X_h^(λ) is split into groups of d components, and the i-th group corresponds to the cluster center θ_i^(λ) of the i-th class, i = 1, …, c. The iteration counter is λ and the maximum number of iterations is λ_max.
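The mapping from a particle position to cluster centers can be sketched as a simple slicing of the flat position vector into c blocks of d components. The helper name `decode_centers` is hypothetical.

```python
def decode_centers(position, c, d):
    """Split a flat particle position of length c*d into c cluster centers of d coordinates."""
    if len(position) != c * d:
        raise ValueError("position length must equal c * d")
    return [position[i * d:(i + 1) * d] for i in range(c)]

# A 4-component particle read as two 2-dimensional cluster centers.
centers = decode_centers([5.0, 5.0, 10.0, 10.0], c=2, d=2)
```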

Further, the specific procedure of step 4 is:

The m-th power of the fuzzy membership is computed with formula (2).

ε is a very small positive number used to overcome the incompleteness of formula (3); M is a positive constant that sets the scale of the inverse-proportion relationship between the fuzzy membership and the sample-to-cluster-center distance, and can be taken as 1 without loss of generality; ||x_j-θ_i^(λ)|| is the distance between the j-th sample x_j and the i-th cluster center θ_i^(λ).

A fuzzy clustering algorithm requires the sample-to-cluster-center distance to be inversely related to the fuzzy membership. Many such inverse relationships exist; the GFC algorithm here adopts the simple linear-product form, but other inverse relationships can also be substituted into the GFC algorithm.
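Formulas (2) and (3) are not reproduced in this text, so the following is only a sketch under an assumed form: u_ij^m = M / (||x_j-θ_i||² + ε), which matches the description of a linear-product inverse relationship with ε guarding against a zero distance. The function names and the default m = 1 are assumptions; ε = 0.1 and M = 1 are the values used in the embodiment.

```python
def membership_power(x, theta, M=1.0, eps=0.1):
    """Assumed form of formula (2): u^m = M / (squared distance + eps)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, theta))
    return M / (d2 + eps)

def membership(x, theta, m=1.0, M=1.0, eps=0.1):
    """Recover u from u^m by taking the m-th root (m > 0)."""
    return membership_power(x, theta, M, eps) ** (1.0 / m)

# A sample on the center versus a far-away point.
u_near = membership((5.0, 5.0), (5.0, 5.0))
u_far = membership((500.0, 500.0), (5.0, 5.0))
```

Under this assumed form the values are not confined to [0, 1], since no normalization is applied; only their relative size matters for the comparison between near and far samples.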

Further, the specific procedure of step 5 is:

Define the PSO fitness function by formula (4).

Check whether ||f(U^(λ))-f(U^(λ-1))|| < ε or λ > λ_max. If so, u_ij^(λ) is the optimal fuzzy membership estimated by the iterative algorithm; set u_ij = u_ij^(λ) and substitute it into formula (1) to obtain the optimal partition of the sample set X, where ε and λ_max are thresholds given in advance. If not, go to step 6 and continue until the condition is met.
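The stopping test can be sketched as follows. Formula (4) is not reproduced in this text, so the fitness below is an assumption: the negative sum of u_ij^m with u_ij^m = M / (||x_j-θ_i||² + ε), so that a lower fitness means a larger total membership. The names `fitness` and `should_stop` are hypothetical.

```python
def fitness(samples, centers, M=1.0, eps=0.1):
    """Assumed fitness: negative total of u_ij^m over all samples and centers."""
    total = 0.0
    for x in samples:
        for theta in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(x, theta))
            total += M / (d2 + eps)
    return -total

def should_stop(f_curr, f_prev, iteration, tol, max_iter):
    """Stop when the fitness change is below tol or the iteration limit is exceeded."""
    return abs(f_curr - f_prev) < tol or iteration > max_iter

f0 = fitness([(0.0, 0.0)], [(0.0, 0.0)])
```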

Further, the specific procedure of step 6 is:

Based on the fitness values f(U^(λ)) of the PSO solutions, record the current individual best solution P_h^(λ) and the global best solution g^(λ) of the particle swarm. Set λ = λ + 1, update the particle velocity V_h^(λ+1) and position X_h^(λ+1) by formulas (5) and (6), and go to step 3;

V_h^(λ+1) = w·V_h^(λ) + c₁r₁[P_h^(λ) - X_h^(λ)] + c₂r₂[g^(λ) - X_h^(λ)] (5)

X_h^(λ+1) = X_h^(λ) + V_h^(λ+1) (6)

In formulas (5) and (6), c₁ and c₂ are acceleration factors, taken as positive constants; r₁ and r₂ are random numbers in [0,1]; and w is the inertia factor.
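Formulas (5) and (6) translate directly into a per-particle update. In the sketch below the values w = 0.7 and c₁ = c₂ = 1.5 are assumptions (the text only requires positive constants), and r₁, r₂ are drawn once per update as scalars, which is a literal reading of formula (5).

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Apply formula (5) to the velocity and formula (6) to the position of one particle."""
    rng = rng or random
    r1, r2 = rng.random(), rng.random()  # scalar random factors in [0, 1)
    v_new = [w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
             for vi, xi, pi, gi in zip(v, x, p_best, g_best)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new

# Hypothetical usage on a 4-component particle.
x1, v1 = pso_step([1.0, 1.0, 9.0, 9.0], [0.1, 0.1, -0.1, -0.1],
                  p_best=[2.0, 2.0, 9.5, 9.5], g_best=[5.0, 5.0, 10.0, 10.0],
                  rng=random.Random(0))
```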

Compared with the prior art, the beneficial effects of the present invention are as follows:

1. Extension of the fuzzy index to m > 0, and omission of the fuzzy index

The clustering objective function, formula (1), and the inverse-proportion relationship, formula (2), determine the properties of the GFC algorithm. The fuzzy index is extended to m > 0. By formula (2), the sample-to-cluster-center distance ||x_j-θ_i||² is inversely proportional to u_ij^m. When m > 0, u_ij^m increases with the fuzzy membership u_ij, so ||x_j-θ_i||² is also inversely related to u_ij. This satisfies the basic principle of fuzzy clustering that the smaller the sample-to-cluster-center distance, the larger the membership. Combining formulas (1) and (2), since m > 0, maximizing the fuzzy membership in the GFC objective function is equivalent to minimizing the within-class error, which also meets the criterion by which clustering algorithms are evaluated.

In addition, by formula (1), when m > 0 the minimization of the GFC objective function can be restated in a form that does not involve the fuzzy index m. Combining formulas (1) and (2), the objective function value is determined once ||x_j-θ_i||² is known, so the GFC algorithm can evaluate its objective function and make classification decisions without relying on the fuzzy index. In effect, the GFC algorithm can omit the setting of the fuzzy index.
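The independence from m can be illustrated with a short substitution. Formulas (1) and (2) are not reproduced in this text, so the forms below are assumptions chosen to agree with the description: the objective is taken as the negative sum of u_ij^m (so that minimizing it maximizes the total membership), and the membership power is taken as the linear-product inverse relationship with the constants M and ε.

```latex
% Assumed forms, not taken from the source:
%   (1)  J_{GFC} = -\sum_{i}\sum_{j} u_{ij}^{m}
%   (2)  u_{ij}^{m} = M / (\lVert x_j - \theta_i \rVert^{2} + \varepsilon)
\[
J_{GFC}
  = -\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m}
  = -\sum_{i=1}^{c}\sum_{j=1}^{n}
      \frac{M}{\lVert x_j - \theta_i \rVert^{2} + \varepsilon}
\]
```

Under these assumptions the right-hand side contains only the distances, M, and ε, so the value of m never has to be chosen in order to evaluate or minimize the objective.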

2. The noise immunity of the GFC algorithm can be analyzed intuitively with the aid of diagrams

3. Extensibility of the inverse-proportion relationship of the GFC algorithm

The inverse-proportion relationship of the GFC algorithm can be extended to a variety of expressions:

Formula (7) is the inverse-proportion relationship in exponential form.

Formula (8) is the inverse-proportion relationship in logarithmic form.

Inverse-proportion relationships in polynomial form, as well as combined inverse-proportion relationships, can also be constructed. Extending the inverse-proportion relationship enriches the forms and the scope of application of the GFC algorithm and yields a family of GFC algorithms.
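One way to picture this extensibility is to treat the inverse relationship as a pluggable function of the squared distance. Formulas (7) and (8) are not reproduced in this text, so the exponential and logarithmic forms below are purely hypothetical stand-ins; only the property that each decreases as the distance grows is taken from the description.

```python
import math

def linear_relation(d2, M=1.0, eps=0.1):
    """Linear-product form (assumed): M / (d2 + eps)."""
    return M / (d2 + eps)

def exponential_relation(d2, M=1.0):
    """Hypothetical exponential form: M * exp(-d2)."""
    return M * math.exp(-d2)

def logarithmic_relation(d2, M=1.0, eps=0.1):
    """Hypothetical logarithmic form: M / ln(1 + d2 + eps)."""
    return M / math.log(1.0 + d2 + eps)

def membership_power(x, theta, relation=linear_relation):
    """Compute u^m from the squared distance using the chosen inverse relationship."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, theta))
    return relation(d2)

values = [membership_power((1.0, 1.0), (0.0, 0.0), relation=r)
          for r in (linear_relation, exponential_relation, logarithmic_relation)]
```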

4. The GFC algorithm has good immunity to noisy data

The FCM algorithm is sensitive to noisy data because of its normalization constraint, shown in formula (9):

Σ_{i=1}^{c} u_ij = 1, j = 1, …, n (9)

That is, the memberships of sample x_j in all classes sum to 1. When a noise point x_k is far from the data of every class, its fuzzy memberships still obey the normalization constraint, so the FCM algorithm still assigns the noise point high fuzzy memberships and cannot reject it.
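The effect of the normalization constraint can be checked numerically. The sketch below uses the standard FCM membership update, which is not given in this text, with an assumed fuzzy index m = 2, and evaluates it for a far-away point (500, 500) against the two class centers (5, 5) and (10, 10) used in the embodiment.

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of x in each class; the values always sum to 1."""
    dists = [math.dist(x, c) for c in centers]
    if 0.0 in dists:
        # The sample coincides with a center: give that class full membership.
        hit = dists.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]

u_noise = fcm_memberships((500.0, 500.0), [(5.0, 5.0), (10.0, 10.0)])
```

For this point the two memberships come out at roughly 0.495 and 0.505: although the point is far from both centers, normalization forces its memberships to share a total of 1.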

The fuzzy membership of the GFC algorithm is determined by formula (10):

By formula (10), when a noise point x_j is far from all cluster centers θ_i, its fuzzy memberships u_ij are very small and are not subject to a normalization constraint, which distinguishes it from normal data. The GFC algorithm therefore has a certain ability to suppress noise.

Description of the drawings

Fig. 1 shows the Gaussian data set with (5,5) taken as the cluster center;

Fig. 2 shows the Gaussian data set with (10,10) taken as the cluster center.

Detailed description of the embodiments

The invention is described in further detail below with reference to the accompanying drawings.

The unsupervised data classification method based on the generalized fuzzy clustering algorithm (GFC algorithm) of the invention abandons the modeling approach of the traditional FCM algorithm. It sets an inversely proportional relationship between the sample-to-cluster-center distance and the fuzzy membership, uses the particle swarm algorithm (PSO) to search the solution space for good cluster centers, and takes the maximization of fuzzy membership as the clustering objective function. The GFC algorithm is not subject to the normalization constraint and can effectively detect and identify noisy data. The inverse-proportion relationship it constructs can be extended and transformed into a variety of forms, which widens the scope of application of the clustering algorithm. The GFC algorithm can also hide and ignore the fuzzy index, which avoids the interference of the fuzzy index with the clustering algorithm.

The method of the present invention proceeds as follows:

Step 1: Let X = {x₁, x₂, …, x_j, …, x_n} denote the given sample set, where x_j is the j-th sample, 1 ≤ j ≤ n, and n is the number of samples. The sample set X is optimally partitioned so that the objective function value J_GFC is minimized, where J_GFC is determined by formula (1).

In formula (1), c is the number of classes, 1 ≤ i ≤ c, and u_ij is the fuzzy membership of the j-th sample x_j in the i-th class. U = {u_ij, i = 1, …, c; j = 1, …, n} is the membership matrix, m (m > 0) is the fuzzy index, and u_ij^m is the m-th power of u_ij.

Step 2: Initialize the positions X_h^(0) and velocities V_h^(0) of multiple (c × d)-dimensional particles with random numbers between 0 and 1.

Step 3: Initialize λ = 1. The cluster center of the i-th class at the λ-th iteration is θ_i^(λ), and the cluster center matrix is P^(λ) = {θ_i^(λ), i = 1, …, c}. The particle position X_h^(λ) is split into groups of d components, and the i-th group corresponds to the cluster center θ_i^(λ) of the i-th class, i = 1, …, c. The iteration counter is λ and the maximum number of iterations is λ_max.

Step 4: The m-th power of the fuzzy membership is computed with formula (2).

ε is a very small positive number used to overcome the incompleteness of formula (3); M is a positive constant that sets the scale of the inverse-proportion relationship between the fuzzy membership and the sample-to-cluster-center distance, and can be taken as 1 without loss of generality. ||x_j-θ_i^(λ)|| is the distance between the j-th sample x_j and the i-th cluster center θ_i^(λ).

Step 5: Define the PSO fitness function by formula (4).

Check whether ||f(U^(λ))-f(U^(λ-1))|| < ε or λ > λ_max. If so, u_ij^(λ) is the optimal fuzzy membership estimated by the iterative algorithm; set u_ij = u_ij^(λ) and substitute it into formula (1) to obtain the optimal partition of the sample set X, where ε and λ_max are thresholds given in advance. If not, go to step 6 and continue until the condition is met.

Step 6: Based on the fitness values f(U^(λ)) of the PSO solutions, record the current individual best solution P_h^(λ) and the global best solution g^(λ) of the particle swarm. Set λ = λ + 1, update the particle velocity V_h^(λ+1) and position X_h^(λ+1) by formulas (5) and (6), and go to step 3.

V_h^(λ+1) = w·V_h^(λ) + c₁r₁[P_h^(λ) - X_h^(λ)] + c₂r₂[g^(λ) - X_h^(λ)] (5)

X_h^(λ+1) = X_h^(λ) + V_h^(λ+1) (6)
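Steps 2 to 6 can be put together in a compact loop. This is only a sketch under stated assumptions, not the patented implementation: formulas (1), (2) and (4) are not reproduced in this text, so the fitness is assumed to be the negative sum of u_ij^m with u_ij^m = M / (||x_j-θ_i||² + ε); the values of w, c₁, c₂ and the tolerance are illustrative; and the stopping test is applied to the best fitness found in each iteration.

```python
import random

def sq_dist(x, theta):
    """Squared Euclidean distance between a sample and a cluster center."""
    return sum((a - b) ** 2 for a, b in zip(x, theta))

def decode(position, c, d):
    """Step 3: split a flat c*d position vector into c centers of d coordinates."""
    return [position[i * d:(i + 1) * d] for i in range(c)]

def fitness(position, samples, c, d, M=1.0, eps=0.1):
    """Steps 4-5 (assumed forms): u_ij^m = M / (dist^2 + eps); fitness = -sum of u_ij^m."""
    centers = decode(position, c, d)
    return -sum(M / (sq_dist(x, th) + eps) for x in samples for th in centers)

def gfc_pso(samples, c, num_particles=30, max_iter=300, tol=1e-6,
            w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    d = len(samples[0])
    dim = c * d
    # Step 2: random initial positions and velocities in [0, 1).
    X = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    V = [[rng.random() for _ in range(dim)] for _ in range(num_particles)]
    P = [x[:] for x in X]                        # individual best positions
    Pf = [fitness(x, samples, c, d) for x in X]  # individual best fitness values
    best = min(range(num_particles), key=Pf.__getitem__)
    g, gf = P[best][:], Pf[best]                 # global best position and fitness
    prev_iter_best = gf
    for _ in range(max_iter):
        # Step 6: velocity and position update, formulas (5) and (6).
        for h in range(num_particles):
            r1, r2 = rng.random(), rng.random()
            V[h] = [w * v + c1 * r1 * (p - x) + c2 * r2 * (gi - x)
                    for v, p, x, gi in zip(V[h], P[h], X[h], g)]
            X[h] = [x + v for x, v in zip(X[h], V[h])]
        # Steps 3-5: decode, evaluate, and record individual and global bests.
        f = [fitness(x, samples, c, d) for x in X]
        for h in range(num_particles):
            if f[h] < Pf[h]:
                P[h], Pf[h] = X[h][:], f[h]
        best = min(range(num_particles), key=Pf.__getitem__)
        if Pf[best] < gf:
            g, gf = P[best][:], Pf[best]
        # Stopping test of step 5 on the best fitness of this iteration.
        iter_best = min(f)
        if abs(iter_best - prev_iter_best) < tol:
            break
        prev_iter_best = iter_best
    return decode(g, c, d), gf

# Hypothetical usage on four 2-dimensional samples with c = 2.
samples = [(4.8, 5.1), (5.2, 4.9), (9.9, 10.2), (10.1, 9.8)]
centers, best_f = gfc_pso(samples, c=2, max_iter=50)
```

Under the assumed fitness the centers are not coupled to one another, so a run may return coincident centers; seeding one particle with externally computed centers, as the embodiment does with FCM, is one way to steer the search.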

Embodiment 1:

In this embodiment, the PCM algorithm repeatedly produced coincident cluster centers in the simulation tests, which made its clustering results invalid. Therefore, to verify the validity and feasibility of the GFC algorithm, the GFC algorithm is compared with the FCM algorithm.

The generalized fuzzy clustering algorithm (GFC) is tested as follows:

I. Test on a two-dimensional Gaussian data set

The test covers two aspects. The first is the clustering validity of the algorithm, reflected mainly in its test accuracy. The second is the immunity of the algorithm to noisy data, which requires the algorithm to assign low fuzzy memberships to noise, that is, to distinguish noise from normal data.

1) Validity test

The test uses an artificially generated two-dimensional Gaussian data set with 2 cluster classes. The test data set is generated by combining two Gaussian random samples whose class centers are set to (5,5) and (10,10). Each class contains 100 samples, and both covariance matrices are taken as [5 0; 0 5].
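A data set of this kind can be generated as follows. Because the covariance matrix [5 0; 0 5] is diagonal, each coordinate is drawn independently with variance 5, that is, standard deviation √5. The function name and the fixed seed are assumptions for illustration.

```python
import math
import random

def make_gaussian_blobs(centers=((5.0, 5.0), (10.0, 10.0)), n_per_class=100,
                        variance=5.0, seed=0):
    """Draw n_per_class 2-D points around each center with covariance variance * I."""
    rng = random.Random(seed)
    std = math.sqrt(variance)
    samples, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            samples.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            labels.append(label)
    return samples, labels

samples, labels = make_gaussian_blobs()
```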

The particle swarm algorithm provides the means of solving the GFC algorithm. Every component of a particle's position vector and velocity vector is a real number, and each particle position vector is one feasible solution. The position vector has c × d dimensions, where c is the number of cluster classes and d is the dimension of a sample, so that it holds the d-dimensional coordinates of c cluster centers. The swarm size is 30 and the number of iterations is limited to 300. Each component of the particle position vector takes values in [0,20], and each group of d components of the position vector corresponds to the d coordinates of one cluster center. To keep the particle swarm optimization from becoming trapped in a local optimum with very poor clustering performance, the cluster centers trained by the FCM algorithm are concatenated and used as the initial position of one particle, which improves the clustering performance of the GFC algorithm. That is:

θ_i(0) = θ_i^* (11)

where θ_i(0) corresponds to the concatenated position value X_h^(0) assigned when the particle swarm is initialized, and θ_i^* is the solution from the FCM clustering result. The purpose is to use the FCM algorithm to guide the GFC algorithm out of poor local extrema. In addition, the parameters in formula (2) of the GFC algorithm are set to ε = 0.1 and M = 1.
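Formula (11) amounts to overwriting one particle's initial position with the concatenated FCM centers. In the sketch below the FCM centers are passed in as plain values (hypothetical numbers, not results from the source), and the remaining particles are drawn uniformly from the component range [0, 20] stated for this test, which is an assumption about how the range is applied.

```python
import random

def init_swarm_with_seed(seed_centers, num_particles=30, low=0.0, high=20.0, rng=None):
    """Random swarm in [low, high] whose first particle is the concatenated seed centers."""
    rng = rng or random.Random()
    flat = [coord for center in seed_centers for coord in center]
    dim = len(flat)
    swarm = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_particles)]
    swarm[0] = flat[:]  # formula (11): one particle starts at the FCM solution
    return swarm

# Hypothetical FCM centers used only to illustrate the seeding.
swarm = init_swarm_with_seed([(5.1, 4.9), (10.2, 9.8)], rng=random.Random(0))
```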

The test records the test accuracy of each class and the final cluster center coordinates of the two classes after iteration. Table 1 gives the test accuracy and the cluster center coordinates.

Table 1: Test results on the two-dimensional Gaussian data set

As Table 1 shows, for blob-shaped data sets with good class separation, both the FCM algorithm and the GFC algorithm achieve good classification results, and their clustering accuracies differ little. The GFC algorithm, however, does not require a fuzzy index m to be chosen, which simplifies parameter setting.

2) Noise immunity test

This test examines how well the two algorithms suppress noisy data; that is, the fuzzy memberships that a clustering algorithm assigns to noise should be as small as possible. One noise sample with coordinates (500,500) is added to the original two-dimensional Gaussian data set. The noise immunity test records the class centers, the fuzzy memberships of the noise sample in each class, and the clustering performance on the normal data. The test results are shown in Table 2.

Table 2: Test results on the two-dimensional Gaussian data set with noise sample (500,500)

As Table 2 shows, the noise sample (500,500) has a large effect on the clustering performance of both the FCM algorithm and the GFC algorithm. As analyzed above for the normalization constraint of FCM, the FCM algorithm assigns the noise sample (500,500) high fuzzy memberships and therefore cannot reject it. For the GFC algorithm, the effect of the noise sample is that the two cluster centers coincide, which reduces the clustering validity. Because of its noise-immune design, however, the GFC algorithm still achieves better clustering accuracy than the FCM algorithm, and the noise sample receives only very small fuzzy memberships relative to normal data. The fuzzy memberships of the noise sample in the different classes differ very little, so this property can be used to construct a rejection method: the fuzzy membership difference threshold of formula (12) is defined to reject noisy data.

max(u_ij)-min(u_ij) < δ₁ (12)

In formula (12), max(u_ij) - min(u_ij) < δ₁ is the rejection rule based on the fuzzy membership difference threshold. For any sample x_j with fuzzy memberships u_ij (i ∈ 1, …, c) in the c classes, if these u_ij satisfy formula (12), then x_j can be regarded as noise. In the simulated clustering test on the two-dimensional Gaussian data with the noise sample (500,500), taking δ₁ = 0.00001 allows the noise sample to be rejected, because the noise sample is far from the cluster centers and therefore receives very small fuzzy memberships in every class. When the GFC algorithm is used, it should first be applied to reject noisy data, and cluster analysis should then be performed again to obtain better clustering results.
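The rejection rule of formula (12) can be sketched as a one-line test on the memberships of a sample. The membership form used here, u_ij^m = M / (||x_j-θ_i||² + ε) with m = 1, is an assumption, since the membership formula is not reproduced in this text; δ₁ = 0.00001, ε = 0.1 and M = 1 are the values stated for this test.

```python
def gfc_memberships(x, centers, m=1.0, M=1.0, eps=0.1):
    """Memberships of x in each class under the assumed inverse-distance form."""
    out = []
    for theta in centers:
        d2 = sum((a - b) ** 2 for a, b in zip(x, theta))
        out.append((M / (d2 + eps)) ** (1.0 / m))
    return out

def is_noise(memberships, delta=1e-5):
    """Formula (12): reject when max(u_ij) - min(u_ij) < delta."""
    return max(memberships) - min(memberships) < delta

centers = [(5.0, 5.0), (10.0, 10.0)]
noise_flag = is_noise(gfc_memberships((500.0, 500.0), centers))
normal_flag = is_noise(gfc_memberships((5.5, 4.5), centers))
```

Note that a sample exactly equidistant from all centers would also satisfy formula (12) in this sketch, whatever its distance, so an implementation might additionally check that the largest membership is small.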

II. Test on a UCI data set

The algorithms are tested on the iris data set from the UCI machine learning repository; the characteristics of the iris data set are shown in Table 3. The test settings are similar to those of the Gaussian data set test. Each particle component takes values in [0,50], and the parameters M and ε are varied in the GFC tests to study the stability of the algorithm with respect to its parameters. Each clustering algorithm is run 10 times for each parameter setting and data set, and the average clustering accuracy is computed. Table 4 gives the test results on the iris data.

Table 3: Attributes of the experimental data set

As Table 4 shows, the GFC algorithm obtains its lowest average clustering accuracy, 90.60, at M = 1, ε = 0.1, and its highest, 92.20, at M = 3, ε = 0.3. Both the lowest and the highest average clustering accuracy of the GFC algorithm exceed the average clustering accuracy of the FCM algorithm. In addition, the overall average accuracy of the GFC algorithm across all parameter values is 91.47938, which is better than that of the FCM algorithm. The two simulation tests show that the GFC algorithm has better clustering performance than the FCM algorithm, which demonstrates the clustering validity of the proposed algorithm.

Table 4: Test results on the iris data set

As shown in Fig. 1 and Fig. 2, consider a Gaussian data set of 50 samples in one class, with center (5,5) and covariance matrix [3,0; 0,3]; the covariance matrix expresses the dispersion of the data. First, (5,5) is taken as the cluster center θ₁ of the Gaussian data set, as shown in Fig. 1; then (10,10) is taken as a second cluster center θ₂ of the same data set, as shown in Fig. 2. Clearly, the cluster center θ₁ in Fig. 1 suits the actual clustering problem better than θ₂ in Fig. 2. The distances from θ₁ in Fig. 1 to the samples are smaller than the corresponding distances in Fig. 2, so by the basic principle of fuzzy clustering the fuzzy membership of each sample in Fig. 1 is higher than in Fig. 2. If Fig. 2 is regarded as the initial state of the clustering algorithm and Fig. 1 as its optimized final state, then evolving the cluster center from θ₂ in Fig. 2 to θ₁ in Fig. 1 is equivalent to maximizing formula (1). The maximization of formula (1) and the inverse-proportion relationship of formula (2) in the GFC algorithm therefore conform to the objective of fuzzy clustering.
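The comparison between the two candidate centers can be reproduced numerically. The sketch below generates 50 samples around (5, 5) with variance 3 per coordinate and sums the membership powers for each candidate center, assuming the form u^m = M / (||x-θ||² + ε) with M = 1 and ε = 0.1; the seed and the function name are illustrative.

```python
import math
import random

def total_membership(samples, center, M=1.0, eps=0.1):
    """Sum of assumed membership powers M / (dist^2 + eps) over all samples."""
    return sum(M / (sum((a - b) ** 2 for a, b in zip(x, center)) + eps)
               for x in samples)

rng = random.Random(0)
std = math.sqrt(3.0)  # covariance [3,0; 0,3] means variance 3 per coordinate
samples = [(rng.gauss(5.0, std), rng.gauss(5.0, std)) for _ in range(50)]

near = total_membership(samples, (5.0, 5.0))    # center as in Fig. 1
far = total_membership(samples, (10.0, 10.0))   # center as in Fig. 2
print(near, far)
```

The total for the center at (5, 5) is the larger of the two, which is the sense in which moving the center from the Fig. 2 position to the Fig. 1 position increases the total membership.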

Simulation experiments show that the proposed GFC algorithm performs well in both clustering validity and noise immunity. The GFC algorithm constructs an inverse-proportion relationship between the sample-to-cluster-center distance and the fuzzy membership, takes the maximization of the sum of the m-th powers of the fuzzy memberships as its clustering objective function, and uses the PSO particle swarm biological evolution algorithm to search the solution space for good cluster centers. Because the GFC algorithm has no normalization constraint, it is not sensitive to noisy data and can effectively reject it. The fuzzy index can be omitted, and the inverse-proportion relationship can be transformed into a variety of forms, which further strengthens the adaptability of the GFC algorithm to different kinds of data.

In conclusion the invention discloses a kind of unsupervised data based on generalized fuzzy clustering algorithm (GFC algorithms) point Class method, characteristic information are shown as follows：Its characteristic information is shown as follows：1. pair sample set is according to GFC mesh Scalar functions minimization principle carries out optimization division；2. initializing the position and speed value of multiple particles；3. by particle position value Realization cluster centre initialization corresponding with sample clustering center；4. defining the distance and fuzzy membership between sample, cluster centre Inversely proportional relationship is to calculate sample fuzzy membership；5. obtaining newer cluster centre by particle cluster algorithm iterative formula； 6. GFC object functions are calculated.The fuzzy clustering algorithm that the present invention is constructed is not limited by normalization constraint, can be to making an uproar Sound data are made effectively to excavate and identify.The fuzzy membership constructed can expand deformation with cluster centre inversely prroportional relationship form For diversified forms, the scope of application of clustering algorithm is improved, also fuzzy indicator can be made to hide and ignore, so as to avoid fuzzy finger Mark the interference to clustering algorithm.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

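1. An unsupervised data classification method based on a generalized fuzzy clustering algorithm, comprising the following steps:

Step 1: Optimally partition the sample set according to the principle of minimizing the GFC objective function;

Step 2: Initialize the position and velocity values of multiple particles;

Step 3: Initialize the cluster centers by mapping the particle position values to the sample cluster centers;

Step 4: Define an inversely proportional relationship between the sample-to-cluster-center distance and the fuzzy membership in order to compute the fuzzy membership of each sample;

Step 5: Obtain updated cluster centers through the particle swarm iteration formulas;

Step 6: Compute the GFC objective function.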

2. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 1 is:

Let X = {x₁, x₂, …, x_j, …, x_n} denote the given sample set, where x_j is the j-th sample, 1 ≤ j ≤ n, and n is the number of samples. The sample set X is optimally partitioned so that the objective function value J_GFC is minimized, where J_GFC is determined by formula (1);

In formula (1), c is the number of classes, 1 ≤ i ≤ c, and u_ij is the fuzzy membership of the j-th sample x_j in the i-th class; U = {u_ij, i = 1, …, c; j = 1, …, n} is the membership matrix, m (m > 0) is the fuzzy index, and u_ij^m is the m-th power of u_ij.

3. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 2 is: initialize the positions X_h^(0) and velocities V_h^(0) of multiple (c × d)-dimensional particles with random numbers between 0 and 1.

4. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 3 is:

Initialize λ = 1. The cluster center of the i-th class at the λ-th iteration is θ_i^(λ), and the cluster center matrix is P^(λ) = {θ_i^(λ), i = 1, …, c}. The particle position X_h^(λ) is split into groups of d components, and the i-th group corresponds to the cluster center θ_i^(λ) of the i-th class, i = 1, …, c. The iteration counter is λ and the maximum number of iterations is λ_max.

5. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 4 is:

The m-th power of the fuzzy membership is computed with formula (2).

ε is a very small positive number used to overcome the incompleteness of formula (3); M is a positive constant that sets the scale of the inverse-proportion relationship between the fuzzy membership and the sample-to-cluster-center distance, and can be taken as 1 without loss of generality; ||x_j-θ_i^(λ)|| is the distance between the j-th sample x_j and the i-th cluster center θ_i^(λ).

A fuzzy clustering algorithm requires the sample-to-cluster-center distance to be inversely related to the fuzzy membership. Many such inverse relationships exist; the GFC algorithm here adopts the simple linear-product form, but other inverse relationships can also be substituted into the GFC algorithm.

6. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 5 is:

Define the PSO fitness function by formula (4).

Check whether ||f(U^(λ))-f(U^(λ-1))|| < ε or λ > λ_max. If so, u_ij^(λ) is the optimal fuzzy membership estimated by the iterative algorithm; set u_ij = u_ij^(λ) and substitute it into formula (1) to obtain the optimal partition of the sample set X, where ε and λ_max are thresholds given in advance. If not, go to step 6 and continue until the condition is met.

7. The unsupervised data classification method based on a generalized fuzzy clustering algorithm according to claim 1, characterized in that the specific procedure of step 6 is:

Based on the fitness values f(U^(λ)) of the PSO solutions, record the current individual best solution P_h^(λ) and the global best solution g^(λ) of the particle swarm. Set λ = λ + 1, update the particle velocity V_h^(λ+1) and position X_h^(λ+1) by formulas (5) and (6), and go to step 3;

V_h^(λ+1) = w·V_h^(λ) + c₁r₁[P_h^(λ) - X_h^(λ)] + c₂r₂[g^(λ) - X_h^(λ)] (5)

X_h^(λ+1) = X_h^(λ) + V_h^(λ+1) (6)

In formulas (5) and (6), c₁ and c₂ are acceleration factors, taken as positive constants; r₁ and r₂ are random numbers in [0,1]; and w is the inertia factor.