CN104182511A - Cluster-feature-weighted fuzzy compactness and separation clustering method


Info

Publication number
CN104182511A
Authority
CN
China
Legal status: Granted
Application number
CN201410413719.8A
Other languages
Chinese (zh)
Other versions
CN104182511B (en)
Inventor
周媛
王丽娜
何军
Current Assignee
Nanjing ditavi Data Technology Co., Ltd
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201410413719.8A
Publication of CN104182511A
Application granted
Publication of CN104182511B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques


Abstract

The invention discloses a cluster-feature-weighted fuzzy compactness and separation clustering method. It addresses the following problems: the existing WFCM algorithm does not account for the hard partitioning of samples that actually occurs and clusters data with unbalanced sample distributions poorly, while the FCS (fuzzy compactness and separation) algorithm neither handles sample points on the hard-partition boundary nor considers the influence of sample feature parameters on each cluster. By adjusting the sample memberships and feature weights, the method respects hard partitioning where it occurs, fully accounts for the influence of each sample feature parameter on each cluster, makes samples as compact as possible within a class and as dispersed as possible between classes, resolves the membership of samples lying on the hard-partition boundary, and partitions noise data and abnormal data more effectively when the sample distribution is unbalanced. The method offers good clustering performance, fast convergence and efficient iteration, and is suitable for industrial-control applications with unbalanced sample distributions and high real-time and accuracy requirements.

Description

A cluster-feature-weighted fuzzy compactness and separation clustering method
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a cluster-feature-weighted fuzzy compactness and separation clustering method.
Background art
Classification problems abound in both the natural and social sciences. Clustering is a statistical technique for studying such problems (over samples or indices) and is also an important data-mining algorithm with very wide application. The fuzzy C-means (FCM) clustering algorithm is a commonly used unsupervised pattern-recognition method, and many researchers have continued to improve it; these variants account for the influence of each sample feature parameter on the cluster centers and mitigate the effects of noise and abnormal data. However, all of these FCM-based algorithms essentially consider only the compactness of samples within a class (within-class scatter) and ignore the dispersion between classes (between-class scatter), so they cannot handle clustering of data with unbalanced sample distributions well. The FCS (Fuzzy Compactness and Separation) algorithm proposed by Kuo-Lung Wu et al. considers both within-class compactness and between-class separation and accommodates both hard and fuzzy partitioning of samples, which better matches reality. In China, Song Fengxi et al. proposed a classification method based on the maximum scatter difference criterion, which weighs between-class scatter against within-class scatter to find the optimal projection vector for classifying samples; Gao Jun et al. introduced fuzziness into the maximum scatter difference criterion and proposed the FMSDC (fuzzy maximum scatter difference discriminant criterion) algorithm, performing dimensionality reduction within fuzzy clustering; Zhi Xiaobin et al. pointed out an error in Gao's algorithm and proposed the FMSDC-FCS clustering algorithm, a corrected version that initializes the memberships and sample means with FCM, reduces dimensionality with FMSDC, and then clusters the reduced data with FCS, so its clustering core is still the FCS algorithm.
In applying the above algorithms to data classification, we found that some real data fall inside the hard-partition region of a class, where their memberships need no fuzzification, and that effectively partitioning data with unbalanced sample distributions is a problem the FCM algorithm and its extensions cannot solve. Although the FCS algorithm considers the hard partitioning of samples, it does not consider samples lying exactly on the hard-partition boundary, so the algorithm fails when such boundary data are encountered in real classification tasks.
Summary of the invention
To address the facts that the existing WFCM algorithm ignores the hard partitioning of samples that actually occurs during clustering and cannot handle unbalanced sample distributions well, and that the FCS algorithm neither considers points on the hard-partition boundary nor the influence of sample feature parameters on each cluster, the invention discloses a cluster-feature-weighted fuzzy compactness and separation clustering method.
To achieve the above object, the invention provides the following technical scheme:
The cluster-feature-weighted fuzzy compactness and separation clustering method comprises the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2}

where \bar{X} is the sample mean;
Step 3: update the sample memberships μ_ij according to:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

When a sample point x_j falls on the hard-partition boundary, Δ_ij = 0; keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i)
Step 4: compute the feature weights ω_ik according to:

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}

Denote

\Delta_{ik} = \sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

If Δ_ik < 0, then because ω_ik ∈ [0, 1], Δ_ik must be projected onto an interval greater than 0 while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted with:

\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik}>0)

After the adjustment, compute the new ω_ik with the feature-weight formula;
Step 5: compute the cluster centers a_ik according to:

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)}

Step 6: set the iteration count p = p + 1; stop when the change between successive iterations falls below ε, otherwise return to step 2;
Step 7: output the μ_ij obtained in the p-th iteration, and assign each sample j to the class i for which μ_ij is largest.
Further, the sample memberships μ_ij and feature weights ω_ik are derived as follows:
Establish the objective function:

J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2

The cluster-feature-weighted FCS clustering problem is then expressed as:

\min J_{CWFCS} \quad \text{s.t.}\quad \sum_{i=1}^{c}\mu_{ij}=1, \quad \sum_{k=1}^{s}\omega_{ik}=1

Applying the method of Lagrange multipliers gives:

L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\Bigl(\sum_{i=1}^{c}\mu_{ij}-1\Bigr) - \sum_{i=1}^{c}\lambda_i\Bigl(\sum_{k=1}^{s}\omega_{ik}-1\Bigr)

where λ_j and λ_i are the Lagrange multipliers;
taking the partial derivatives of L with respect to μ_ij, ω_ik, λ_j and λ_i and setting them to zero yields μ_ij and ω_ik.
The invention also provides an industrial data classification method based on the cluster-feature-weighted fuzzy compactness and separation clustering method, comprising: acquiring the data collected by sensors, classifying the collected data with the CWFCS method provided by the invention (steps 1 to 7), and then judging the current state of the industrial equipment or process from the classification result.
Further, the sensors collect aero-engine status data, and what is judged is the health status of the aero-engine.
Beneficial effects:
The invention respects the hard partitioning of samples that occurs in practice and fully accounts for the influence of the sample feature parameters on the partitioning, making samples as compact as possible within a class and as dispersed as possible between classes. It resolves the membership of samples lying on the hard-partition boundary and partitions noise data and abnormal data more effectively when the sample distribution is unbalanced. Experiments show that the algorithm clusters well, converges quickly and iterates efficiently. Compared with conventional methods, the invention achieves high clustering accuracy with markedly less time consumed, making it suitable for industrial-control settings with unbalanced sample distributions and high real-time requirements.
Description of the drawings
Fig. 1 is a flow chart of the cluster-feature-weighted fuzzy compactness and separation clustering method;
Fig. 2 shows the data distribution of the Iris data set together with the clustering effects and cluster centers of the CWFCS, FCS and WFCM algorithms;
Fig. 3 shows the CWFCS clustering result, hard-partition result and convergence for β = 1;
Fig. 4 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.5;
Fig. 5 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.05;
Fig. 6 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.005;
Fig. 7 shows how different values of the parameters α, β and m affect the clustering result.
Embodiments
The technical scheme provided by the invention is described in detail below with reference to specific embodiments. It should be understood that the following embodiments only illustrate the invention and do not limit its scope.
We observe that in unsupervised clustering of real-life data, samples may be hard-partitioned to a cluster center; a sample on the hard-partition boundary should have the largest membership to that class compared with samples outside the hard-partition region, yet remain slightly fuzzier than samples inside it; and each feature parameter of a sample influences the clustering result of each class differently. Based on this line of thought, the invention proposes an improved fuzzy compactness and separation clustering method.
First, define the cluster-feature-weighted within-class scatter and between-class scatter as follows:

S_{CWFW} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 \qquad (1)

S_{CWFB} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 \qquad (2)

where the feature-weighting exponent α ∈ [-10, 0) ∪ (1, 10];
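As an illustration, the two weighted scatters can be evaluated directly from definitions (1) and (2); the following NumPy sketch (variable names are illustrative, not from the patent) computes both quantities for given memberships, feature weights and centers:

```python
import numpy as np

def scatters(X, A, U, W, eta, m=2.0, alpha=2.0):
    """Cluster-feature-weighted within-class scatter S_CWFW (Eq. 1) and
    between-class scatter S_CWFB (Eq. 2).
    X: (n, s) samples; A: (c, s) cluster centers; U: (c, n) memberships;
    W: (c, s) feature weights; eta: (c,) coefficients."""
    Xbar = X.mean(axis=0)                                  # sample mean, per feature
    d2 = (X[None, :, :] - A[:, None, :]) ** 2              # ||x_jk - a_ik||^2, (c, n, s)
    b2 = (A - Xbar[None, :]) ** 2                          # ||a_ik - Xbar_k||^2, (c, s)
    coef = (U ** m)[:, :, None] * (W ** alpha)[:, None, :] # mu_ij^m * w_ik^alpha
    S_within = float((coef * d2).sum())                    # Eq. (1)
    S_between = float((coef * (eta[:, None, None] * b2[:, None, :])).sum())  # Eq. (2)
    return S_within, S_between
```

The objective J_CWFCS defined next is then simply S_within minus S_between.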
Establish the objective function:

J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2

The cluster-feature-weighted FCS clustering problem is then expressed as:

\min J_{CWFCS} \quad \text{s.t.}\quad \sum_{i=1}^{c}\mu_{ij}=1, \quad \sum_{k=1}^{s}\omega_{ik}=1

Applying the method of Lagrange multipliers gives:

L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\Bigl(\sum_{i=1}^{c}\mu_{ij}-1\Bigr) - \sum_{i=1}^{c}\lambda_i\Bigl(\sum_{k=1}^{s}\omega_{ik}-1\Bigr)

where λ_j and λ_i are the Lagrange multipliers;
Taking the partial derivatives of the above expression with respect to μ_ij, ω_ik, λ_j and λ_i and setting them to zero yields:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}} \qquad (3)

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}} \qquad (4)

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)} \qquad (5)
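As a sketch of how formula (3) follows (the intermediate steps are not spelled out in the patent text), setting the partial derivative of L with respect to μ_ij to zero gives:

```latex
\frac{\partial L}{\partial \mu_{ij}}
  = m\,\mu_{ij}^{m-1}\sum_{k=1}^{s}\omega_{ik}^{\alpha}
      \bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)-\lambda_j
  = m\,\mu_{ij}^{m-1}\,\Delta_{ij}-\lambda_j = 0
\;\Rightarrow\;
\mu_{ij} = \Bigl(\tfrac{\lambda_j}{m}\Bigr)^{\frac{1}{m-1}}\,\Delta_{ij}^{\frac{1}{1-m}}
```

Substituting this into the constraint Σ_{i=1}^{c} μ_ij = 1 eliminates λ_j and yields formula (3); the derivation of (4) from ∂L/∂ω_ik = 0 is analogous.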
The cluster-feature-weighted fuzzy compactness and separation clustering method, as shown in Fig. 1, comprises the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2} \qquad (6)

where \bar{X} is the sample mean;
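Formula (6) can be evaluated directly; the following sketch (function and variable names are illustrative) computes all η_i for a set of centers:

```python
import numpy as np

def eta_coeffs(A, Xbar, beta):
    """Coefficients eta_i of Eq. (6):
    eta_i = (beta / 4) * min_{i' != i} ||a_i - a_i'||^2 / max_t ||a_t - Xbar||^2.
    A: (c, s) cluster centers; Xbar: (s,) sample mean."""
    c = A.shape[0]
    denom = max(float(((a - Xbar) ** 2).sum()) for a in A)  # max_t ||a_t - Xbar||^2
    eta = np.empty(c)
    for i in range(c):
        # squared distances from center i to every other center
        d2 = [float(((A[i] - A[j]) ** 2).sum()) for j in range(c) if j != i]
        eta[i] = beta / 4.0 * min(d2) / denom
    return eta
```

Larger β thus directly scales up η_i and, through the objective, the degree of hard partitioning, which matches the behaviour reported for Figs. 3 to 6.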
Step 3: update the sample memberships μ_ij according to formula (3):

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}} \qquad (3)

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr) \qquad (7)

Consider a sample point x_j falling on the hard-partition boundary: applying formula (3) directly would make μ_ij infinite and the algorithm would fail. A sample point on the hard-partition boundary of class i is inherently ambiguous, so hard-partitioning it would not match reality; compared with sample points outside the hard-partition region, however, it should have a larger fuzzy membership to class i. Therefore, keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with the adjustment function P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n \qquad (8)

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}} \qquad (9)

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i) \qquad (10)
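Step 3 can be sketched in NumPy as follows. This is an illustrative reading of Eqs. (7) to (10), not the inventors' code; it assumes "rand" in Eq. (8) is a random number in (0, 1), that at least one Δ_ij is positive, and a membership exponent (here m = 2) for which Eq. (9) is well defined on negative Δ_ij:

```python
import numpy as np

def update_memberships(Delta, m=2.0, rng=None):
    """Step 3: memberships from a (c, n) matrix of Delta_ij (Eq. 7).
    Boundary points (Delta_ij == 0) are nudged by Eq. (8); points inside a
    hard-partition region (Delta_ij < 0) are hard-assigned by Eq. (10)."""
    rng = np.random.default_rng() if rng is None else rng
    D = np.array(Delta, dtype=float)
    c, n = D.shape
    hard = D < 0                              # inside a hard-partition region
    pos_min = D[D > 0].min()                  # min_j (Delta_ij > 0), Eq. (8)
    boundary = D == 0
    D[boundary] += rng.random(int(boundary.sum())) * pos_min
    U = D ** (1.0 / (1.0 - m))                # Eq. (9) numerator
    U /= U.sum(axis=0, keepdims=True)         # Eq. (9) normalisation
    for j in range(n):                        # Eq. (10): hard assignment
        winners = np.flatnonzero(hard[:, j])
        if winners.size:
            U[:, j] = 0.0
            U[winners[0], j] = 1.0
    return U
```

For m = 2 the exponent 1/(1 - m) is -1, so the soft memberships are simply inverse-distance weights normalised over the classes.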
Step 4: compute the feature weights ω_ik according to:

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}} \qquad (4)

Denote

\Delta_{ik} = \sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr) \qquad (11)

When Δ_ik = 0, the k-th feature parameter exerts no distinguishing influence on the clustering of class i, so ω_ik = 0.

If the sample distribution is extremely unbalanced, Δ_ik < 0 can occur. Because ω_ik ∈ [0, 1], Δ_ik must be projected onto an interval greater than 0 while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted with:

\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik}>0) \qquad (12)

After the adjustment, compute the new ω_ik with the feature-weight formula (4);
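Step 4 can be sketched as follows. The per-cluster normalisation over features reflects the constraint Σ_k ω_ik = 1; the Eq. (12) shift preserves the spacing between features within a row while making every entry positive. This assumes each row has at least one positive Δ_ik, as Eq. (12) requires; names are illustrative:

```python
import numpy as np

def feature_weights(Delta, alpha=2.0):
    """Feature weights from Eq. (4), with the Eq. (12) shift applied to any
    row of Delta_ik (shape (c, s)) that contains non-positive entries."""
    D = np.array(Delta, dtype=float)
    for i in range(D.shape[0]):
        if (D[i] <= 0).any():                          # Eq. (12) projection
            D[i] = D[i] - D[i].min() + D[i][D[i] > 0].min()
    W = D ** (1.0 / (1.0 - alpha))                     # Eq. (4) numerator
    W /= W.sum(axis=1, keepdims=True)                  # normalise over features
    return W
```

With α = 2 the exponent is -1, so features with small Δ_ik (tightly clustered relative to the between-class term) receive the largest weights.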
Step 5: compute the cluster centers a_ik according to:

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)} \qquad (13)

Step 6: set the iteration count p = p + 1; stop when the change between successive iterations falls below ε, otherwise return to step 2;
Step 7: output the μ_ij obtained in the p-th iteration, and assign each sample j to the class i for which μ_ij is largest.
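The steps above can be assembled into a compact end-to-end sketch. This is an illustrative reimplementation under stated assumptions (m = 2 so Eq. (9) is defined on negative Δ; convergence in step 6 measured by the change in cluster centers; "rand" a number in (0, 1)), not the inventors' code:

```python
import numpy as np

def cwfcs(X, c, m=2.0, alpha=2.0, beta=0.005, eps=1e-6, max_iter=100,
          init=None, seed=0):
    """Sketch of CWFCS steps 1-7. X: (n, s) data; c: number of clusters.
    Returns memberships U (c, n), feature weights W (c, s), centers A (c, s)."""
    rng = np.random.default_rng(seed)
    n, s = X.shape
    Xbar = X.mean(axis=0)
    A = (np.array(init, dtype=float) if init is not None
         else X[rng.choice(n, size=c, replace=False)].astype(float))  # step 1
    W = np.full((c, s), 1.0 / s)
    U = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        A_old = A.copy()
        # step 2: eta_i (Eq. 6)
        denom = ((A - Xbar) ** 2).sum(axis=1).max()
        pair = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(pair, np.inf)
        eta = beta / 4.0 * pair.min(axis=1) / denom
        # step 3: memberships (Eqs. 3, 7-10)
        core = ((X[None, :, :] - A[:, None, :]) ** 2
                - eta[:, None, None] * ((A - Xbar) ** 2)[:, None, :])
        Delta = ((W ** alpha)[:, None, :] * core).sum(axis=2)        # Eq. (7)
        if (Delta > 0).any():                                        # Eq. (8)
            Delta = np.where(Delta == 0.0,
                             rng.random() * Delta[Delta > 0].min(), Delta)
        for j in range(n):
            neg = np.flatnonzero(Delta[:, j] < 0)
            if neg.size:                         # Eq. (10): hard assignment
                U[:, j] = 0.0
                U[neg[0], j] = 1.0
            else:                                # Eq. (9)
                u = Delta[:, j] ** (1.0 / (1.0 - m))
                U[:, j] = u / u.sum()
        # step 4: feature weights (Eqs. 4, 11, 12)
        Dk = ((U ** m)[:, :, None] * core).sum(axis=1)
        for i in range(c):
            pos = Dk[i][Dk[i] > 0]
            if pos.size and (Dk[i] <= 0).any():  # Eq. (12), needs a positive entry
                Dk[i] = Dk[i] - Dk[i].min() + pos.min()
        W = Dk ** (1.0 / (1.0 - alpha))
        W /= W.sum(axis=1, keepdims=True)
        # step 5: centers (Eq. 13)
        um = U ** m
        A = ((um @ X - um.sum(axis=1)[:, None] * eta[:, None] * Xbar)
             / (um.sum(axis=1)[:, None] * (1.0 - eta[:, None])))
        if ((A - A_old) ** 2).sum() < eps:       # step 6: convergence
            break
    return U, W, A                               # step 7: argmax of U gives labels
```

On well-separated data the loop typically stops after a handful of iterations, since the centers stabilise quickly.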
Through the above steps, the hard partitioning of samples that occurs in practice is respected, and the influence of the sample feature parameters on each partition is fully considered, making samples as compact as possible within a class and as dispersed as possible between classes; the membership of samples lying on the hard-partition boundary is resolved, and noise data and abnormal data are partitioned more effectively when the sample distribution is unbalanced.
Embodiment 1:
To better illustrate the performance of the invention, we run a classification experiment with the inventive method on a real data set from the UCI machine-learning repository, the Iris data set. The fuzzy exponent m is set to 1.5, 2, 2.5, 3 and 3.5 in turn, the iteration error precision is 10^-6, and the parameter β of the inventive cluster-feature-weighted CWFCS algorithm is set to 0.005, 0.05, 0.5 and 1 in turn. To represent an unbalanced sample distribution, the Iris data set keeps all data of the first and second classes and 10 randomly chosen samples of the third class, for a total of 110 samples in 3 classes, where the second and third classes overlap. The clustering results of the inventive algorithm (CWFCS for short) are shown in Figs. 2 to 6. Fig. 2 shows that the algorithm possesses the basic convergence property: the clustering result roughly matches the raw data distribution shown in Fig. 2(a). Figs. 3 to 6 show that the distances between the three cluster centers change with β. As β decreases from 1 to 0.05, the fuzziness of the system increases, which appears as the three cluster centers gradually drawing together. Because the third class has far fewer samples than the first and second classes and also overlaps the second, keeping each class internally compact while maximising the separation between classes means that at β = 0.005 the distance between the first and third class centers, and between the second and third, actually increases slightly relative to β = 0.05. Figs. 3(b) to 6(b) show the hard-partition effect: as β decreases from 1 to 0.005, the number of hard-partitioned samples among the 110 is 79, 64, 42 and 0 respectively, showing that the algorithm retains the hard-partition property of the FCS algorithm and that a larger β yields a higher degree of hard partitioning. Figs. 3(c) to 6(c) plot the change in the cluster centers, showing that the algorithm converges quickly and iterates efficiently. The algorithm makes the cluster-feature-weighted within-class scatter of the samples as small as possible and the cluster-feature-weighted between-class scatter as large as possible; the more dispersed the cluster centers are, the more fuzzily the samples with small between-class weighted scatter are partitioned. These experimental results show that the algorithm clusters well, converges quickly and iterates efficiently.
Fig. 7 shows the effect of different values of the parameters α, β and m on the clustering. The smaller β is, the larger the misclassification rate. Whatever value β takes, for a given β the average misclassification rate is smallest at α = 2 with m ∈ {1.5, 2}, and for β < 0.5 the algorithm is more sensitive to the values of α and m. In Fig. 7(a) (β = 1), for α > 3 and integer m (2, 3) the misclassification rate decreases as α grows, while otherwise it decreases as α shrinks; for α < 0 the misclassification rate decreases as α decreases, and m has little influence. Figs. 7(b) to 7(d) show that the influence of α and m on the algorithm follows a basically consistent trend: for β < 1 and a given α, a larger m gives a larger misclassification rate; for a given m (setting aside the optimal case α = 2), a larger α gives a smaller misclassification rate when α > 0, and a smaller α gives a smaller rate when α < 0.
Embodiment 2:
To verify the superiority of the invention, we test the Iris data set with the FCS and WFCM methods and the inventive CWFCS method.
In the experiments, the fuzzy exponent m is set to 1.5, 2, 2.5, 3 and 3.5 in turn, the iteration error precision is 10^-6, and the parameter β in the CWFCS algorithm is set to 0.005, 0.05, 0.5 and 1 in turn; each experiment is repeated 100 times, and the best and average results are taken. Best-case performance is measured by three indices, accuracy (Accuracy), iteration count (Iter) and execution time (Time); overall performance is measured by average accuracy (avg_Accuracy, correctly partitioned samples / total samples), average iteration count (avg_Iter) and average execution time (avg_Time). The best and average clustering results of the three algorithms are shown in Table 1:
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.754545   28     0.028236   0.689091       35         0.193956
WFCM        0.854545   30     0.103216   0.852424       29         0.090867
CWFCS       0.981818   48     0.055334   0.966364       55         0.063656

Table 1
Table 1 shows that, on the Iris data set, both the best accuracy and the average accuracy of the CWFCS algorithm are higher than those of the other two algorithms, and the average execution time of CWFCS is the shortest: compared with the FCS algorithm, its average execution time is about 67% shorter and its average accuracy 40% higher; compared with the WFCM algorithm, its time is 21% shorter and its average accuracy 23% higher.
The above experimental results are obtained on the noise-free Iris data set. We also test a noise-added Iris data set with the FCS, WFCM and inventive CWFCS methods, with the same experimental parameters and environment as for the noise-free Iris data set. The best and average clustering results of the three algorithms are shown in Table 2:
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.754545   40     0.386212   0.720606       62         0.468495
WFCM        0.845455   26     0.109535   0.845455       29         0.101066
CWFCS       0.972727   29     0.031420   0.887879       43         0.049336

Table 2
Table 2 shows that, on the noise-added Iris data set, both the best accuracy and the average accuracy of the CWFCS algorithm are again clearly higher than those of the other two algorithms.
Embodiment 3:
We further test the Breast Cancer data set with the FCS, WFCM and inventive CWFCS methods. The Breast Cancer data set has 30 attributes; to represent an unbalanced sample distribution, 10 samples of the first class are chosen at random, while the second class has 367 samples. The results are shown in Table 3, from which it can be seen that the CWFCS algorithm performs most stably: its iteration count is only slightly higher than that of the WFCM algorithm, its execution time stays within 0.1 s, and its clustering accuracy is higher than that of the other two algorithms.
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.737401   45     0.827577   0.737401       43         0.533281
WFCM        0.819629   11     0.026210   0.767109       11         0.030475
CWFCS       0.965517   13     0.074786   0.960212       12         0.075808

Table 3
Embodiment 4:
We also test an aero-engine gas-path simulation data set (noise-added) with the FCS, WFCM and inventive CWFCS methods; the results are shown in Table 4. The GasPath data set contains aero-engine gas-path data with three feature parameters, DEGT, DNH and DFF; there are 200 healthy samples, and 5 fault samples are chosen at random.
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.614634   24     0.290102   0.614634       24         0.181671
WFCM        0.6        19     0.046147   0.6            21         0.052607
CWFCS       0.917073   15     0.023733   0.86878        23         0.033184

Table 4
Table 4 shows that, on the GasPath data set, the inventive method is robust to the noise-polluted data encountered in engineering applications and partitions the data more accurately; for such data, an algorithm that exploits both within-class compactness and between-class separation achieves a higher accuracy than the WFCM algorithm, which considers within-class compactness only.
Embodiment 5:
The invention also provides a concrete application procedure of the invention in industrial control:
First, the key design parameters of the industrial process are placed under status monitoring (several kinds of sensors usually need to be deployed to obtain comprehensive data). After the data collected by the sensors are acquired, the collected data are classified with the CWFCS method provided by the invention (steps 1 to 7), and the current state of the industrial equipment or process is judged from the classification result. For example, the status of an aero-engine is monitored by sensors, and the collected data are classified with the CWFCS method of the invention (steps 1 to 7) to judge whether the aero-engine is currently in an unhealthy state.
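The judgment stage of the monitoring loop above can be sketched as a small post-processing step. Function and variable names are illustrative, not from the patent; it assumes the healthy cluster has already been identified, e.g. from labelled reference data:

```python
import numpy as np

def flag_engine_state(U, healthy_cluster):
    """Map CWFCS memberships to per-sample health verdicts.
    U: (c, n) membership matrix from step 7; healthy_cluster: index of the
    cluster taken to represent the healthy operating regime."""
    labels = U.argmax(axis=0)   # step 7: class with the largest membership
    return ["healthy" if k == healthy_cluster else "unhealthy" for k in labels]
```

In a deployed loop this would run on each new batch of sensor samples, with an alert raised whenever "unhealthy" verdicts appear.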
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes composed of any combination of the above technical features. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims (4)

1. A cluster-feature-weighted fuzzy compactness and separation clustering method, characterized by comprising the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2}

where \bar{X} is the sample mean;
Step 3: update the sample memberships μ_ij according to:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

When a sample point x_j falls on the hard-partition boundary, Δ_ij = 0; keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i)
Step 4: compute the feature weight ω_ik according to:

$$\omega_{ik} = \frac{\left(\sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jk} - a_{ik}\|^2 - \eta_i \|a_{ik} - \bar{X}_k\|^2\right)\right)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\left(\sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jt} - a_{it}\|^2 - \eta_i \|a_{it} - \bar{X}_t\|^2\right)\right)^{\frac{1}{1-\alpha}}}$$

Denote

$$\Delta_{ik} = \sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jk} - a_{ik}\|^2 - \eta_i \|a_{ik} - \bar{X}_k\|^2\right)$$

If Δ_ik < 0, then since ω_ik ∈ [0, 1], Δ_ik must be projected onto a positive interval while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted by:

$$\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik} > 0)$$

After the adjustment, compute the new ω_ik from the feature-weight formula;
Step 5: compute the cluster centre a_ik according to:

$$a_{ik} = \frac{\sum_{j=1}^{n} \mu_{ij}^{m}\left(x_{jk} - \eta_i \bar{X}_k\right)}{\sum_{j=1}^{n} \mu_{ij}^{m}\left(1 - \eta_i\right)}$$
Step 6: set the iteration count p = p + 1, and repeat from Step 2 until the change between successive iterations is smaller than the iteration error ε;
Step 7: output the μ_ij obtained at the p-th iteration, and assign each sample j to the class i for which its membership μ_ij is largest.
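Steps 1-7 of claim 1 can be sketched in Python. This is an illustrative, unofficial reconstruction, not the patented implementation: the boundary adjustments of Δ_ij and Δ_ik in Steps 3-4 are simplified to clipping at a small positive value, convergence is tested on the movement of the cluster centres, and a deterministic farthest-point initialization replaces the random one for reproducibility.

```python
import numpy as np

def cwfcs(X, c, m=2.0, alpha=2.0, beta=0.05, eps=1e-5, max_iter=100):
    """Sketch of the Step 1-7 loop of claim 1 (illustrative, simplified)."""
    n, s = X.shape
    xbar = X.mean(axis=0)                      # sample mean (X-bar)
    # Step 1 (simplified): deterministic farthest-point initial centres
    idx = [0]
    for _ in range(c - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d.argmax()))
    a = X[idx].astype(float)
    w = np.full((c, s), 1.0 / s)               # feature weights, each row sums to 1
    for _ in range(max_iter):
        # Step 2: eta_i = (beta/4) * min_{i'!=i} ||a_i - a_i'||^2 / max_t ||a_t - xbar||^2
        d2 = ((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        eta = (beta / 4.0) * d2.min(axis=1) / ((a - xbar) ** 2).sum(axis=1).max()
        # per-cluster, per-sample, per-feature compactness-minus-scatter terms
        term = (X[None, :, :] - a[:, None, :]) ** 2 \
             - eta[:, None, None] * ((a - xbar) ** 2)[:, None, :]
        # Step 3: membership update; the claim's Delta_ij boundary adjustment is
        # simplified here to clipping at a small positive value
        delta_ij = np.clip(np.einsum('ck,cjk->cj', w ** alpha, term), 1e-12, None)
        u = delta_ij ** (1.0 / (1.0 - m))
        u /= u.sum(axis=0, keepdims=True)
        # Step 4: feature-weight update with the same simplified adjustment
        delta_ik = np.clip(np.einsum('cj,cjk->ck', u ** m, term), 1e-12, None)
        w = delta_ik ** (1.0 / (1.0 - alpha))
        w /= w.sum(axis=1, keepdims=True)
        # Step 5: a_ik = sum_j u_ij^m (x_jk - eta_i xbar_k) / sum_j u_ij^m (1 - eta_i)
        um = u ** m
        shifted = X[None, :, :] - eta[:, None, None] * xbar[None, None, :]
        a_new = np.einsum('cj,cjk->ck', um, shifted) \
              / ((1.0 - eta)[:, None] * um.sum(axis=1, keepdims=True))
        # Step 6: stop once the centres move less than eps, else iterate again
        moved = np.abs(a_new - a).max()
        a = a_new
        if moved < eps:
            break
    # Step 7: assign each sample to its maximum-membership cluster
    return u, w, a, u.argmax(axis=0)
```

On two well-separated Gaussian blobs this converges in a few iterations, with memberships summing to 1 per sample and feature weights summing to 1 per cluster, as the constraints of claim 2 require.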
2. The cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 1, characterized in that the sample membership μ_ij and the feature weight ω_ik are derived as follows:
Establish the objective function:

$$J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk} - a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \eta_i \mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik} - \bar{X}_k\|^2$$

The cluster-feature-weighted FCS clustering problem is then expressed as:

$$\min J_{CWFCS} \quad \text{s.t.} \quad \sum_{i=1}^{c}\mu_{ij} = 1, \quad \sum_{k=1}^{s}\omega_{ik} = 1$$

Applying the method of Lagrange multipliers gives:

$$L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk} - a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \eta_i \mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik} - \bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\left(\sum_{i=1}^{c}\mu_{ij} - 1\right) - \sum_{i=1}^{c}\lambda_i\left(\sum_{k=1}^{s}\omega_{ik} - 1\right)$$

where λ_i and λ_j are the Lagrange multipliers;
Taking the partial derivatives of L with respect to μ_ij, ω_ik, λ_i and λ_j and setting each partial derivative to zero yields μ_ij and ω_ik.
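The final step of claim 2 is left implicit; as a hedged reconstruction consistent with the Δ_ij notation of claim 1, setting ∂L/∂μ_ij = 0 gives:

```latex
\frac{\partial L}{\partial \mu_{ij}}
  = m\,\mu_{ij}^{\,m-1}\underbrace{\sum_{k=1}^{s}\omega_{ik}^{\alpha}
      \left(\|x_{jk}-a_{ik}\|^{2}-\eta_{i}\|a_{ik}-\bar{X}_{k}\|^{2}\right)}_{\Delta_{ij}}
    -\lambda_{j} = 0
\;\Longrightarrow\;
\mu_{ij}=\left(\frac{\lambda_{j}}{m}\right)^{\frac{1}{m-1}}\Delta_{ij}^{\frac{1}{1-m}}

% Substituting into the constraint \sum_{i=1}^{c}\mu_{ij}=1 eliminates \lambda_j:
\mu_{ij}=\frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}
```

which is exactly the membership formula of claim 1; the derivation of ω_ik proceeds analogously from ∂L/∂ω_ik = 0 under the constraint Σ_k ω_ik = 1.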
3. An unbalanced industrial-data classification method based on the cluster-feature-weighted fuzzy compactness and scatter clustering method, comprising the steps of: after obtaining the data collected by sensors, classifying the collected data with the cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 1 or 2, and then judging the current state of the industrial equipment or process from the classification result.
4. The unbalanced industrial-data classification method based on the cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 3, wherein the data collected by the sensors are aero-engine status data, and what is judged is the health status of the aero-engine.
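Claims 3-4 leave the "judge the current state from the classification result" step abstract. One simple, hypothetical realization (the function name `judge_states` and the state strings are illustrative, not taken from the patent) labels each cluster by majority vote over a few reference samples whose condition is known, then reads off the state of any new sample from its cluster:

```python
from collections import Counter

def judge_states(labels, reference, new_idx):
    """Map cluster labels to equipment states via a few reference samples.

    labels    -- cluster index per sample (output of any clustering run,
                 e.g. the method of claim 1)
    reference -- {sample index: known state string} for a handful of samples
    new_idx   -- indices of the samples whose state should be judged
    """
    # collect the known states falling inside each cluster
    votes = {}
    for idx, state in reference.items():
        votes.setdefault(labels[idx], []).append(state)
    # majority vote: each cluster inherits its most common reference state
    cluster_state = {c: Counter(v).most_common(1)[0][0] for c, v in votes.items()}
    return [cluster_state.get(labels[i], "unknown") for i in new_idx]
```

For aero-engine monitoring as in claim 4, `reference` would hold indices of samples with a confirmed health state, and `labels` would come from clustering the sensor features; the state of a new sample is then the label of its cluster.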
CN201410413719.8A 2014-08-20 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method Active CN104182511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410413719.8A CN104182511B (en) 2014-08-20 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method


Publications (2)

Publication Number Publication Date
CN104182511A true CN104182511A (en) 2014-12-03
CN104182511B CN104182511B (en) 2017-09-26

Family

ID=51963550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410413719.8A Active CN104182511B (en) 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method

Country Status (1)

Country Link
CN (1) CN104182511B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127232A (en) * 2016-06-16 2016-11-16 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106599618A (en) * 2016-12-23 2017-04-26 Jilin University Unsupervised classification method for metagenome contigs
CN108628971A (en) * 2018-04-24 2018-10-09 Shenzhen Qianhai WeBank Co., Ltd. Text classification method, text classifier and storage medium for imbalanced data sets
CN113345225A (en) * 2021-05-24 2021-09-03 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN114073625A (en) * 2021-12-13 2022-02-22 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680158A (en) * 2013-10-14 2014-03-26 Changsha University of Science and Technology Dynamic division method for control subareas based on C-means fuzzy clustering analysis
CN104008197A (en) * 2014-06-13 2014-08-27 Nanjing University of Information Science and Technology Feature-weighted fuzzy compactness and scatter clustering method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680158A (en) * 2013-10-14 2014-03-26 Changsha University of Science and Technology Dynamic division method for control subareas based on C-means fuzzy clustering analysis
CN104008197A (en) * 2014-06-13 2014-08-27 Nanjing University of Information Science and Technology Feature-weighted fuzzy compactness and scatter clustering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gao Jun et al., "Bidirectional two-dimensional unsupervised feature extraction method with fuzzy clustering function", Acta Automatica Sinica *
Chen Duo et al., "A clustering validity function based on fuzziness", Pattern Recognition and Artificial Intelligence *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127232A (en) * 2016-06-16 2016-11-16 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106127232B (en) * 2016-06-16 2020-01-14 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106599618A (en) * 2016-12-23 2017-04-26 Jilin University Unsupervised classification method for metagenome contigs
CN108628971A (en) * 2018-04-24 2018-10-09 Shenzhen Qianhai WeBank Co., Ltd. Text classification method, text classifier and storage medium for imbalanced data sets
CN113345225A (en) * 2021-05-24 2021-09-03 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN113345225B (en) * 2021-05-24 2023-04-11 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN114073625A (en) * 2021-12-13 2022-02-22 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation
CN114073625B (en) * 2021-12-13 2023-12-08 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation

Also Published As

Publication number Publication date
CN104182511B (en) 2017-09-26

Similar Documents

Publication Publication Date Title
CN104182511A (en) Cluster-feature-weighted fuzzy compact scattering and clustering method
CN111062425B (en) Unbalanced data set processing method based on C-K-SMOTE algorithm
Yao et al. A modified multi-objective sorting particle swarm optimization and its application to the design of the nose shape of a high-speed train
CN108520325A (en) A kind of integral life prediction technique based on acceleration degraded data under changeable environment
CN108549904A (en) Difference secret protection K-means clustering methods based on silhouette coefficient
CN103489046A (en) Method for predicting wind power plant short-term power
CN104732545A (en) Texture image segmentation method combined with sparse neighbor propagation and rapid spectral clustering
CN106792749B (en) wireless sensor network node deployment method based on CFD and clustering algorithm
Zhao et al. Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals
CN108898273A (en) A kind of user side load characteristic clustering evaluation method based on morphological analysis
Pei et al. The clustering algorithm based on particle swarm optimization algorithm
CN106357458B (en) Network element method for detecting abnormality and device
CN108519760A (en) A kind of Primary Processing stable state recognition methods based on detection of change-point theory
CN105956318A (en) Improved splitting H-K clustering method-based wind power plant fleet division method
CN106569981A (en) Statistic parameter determination method and system applicable to large-scale data set
CN101702172A (en) Data discretization method based on category-attribute relation dependency
CN104008197B (en) A kind of fuzzy distribution clustering method that compacts of characteristic weighing
CN116244612B (en) HTTP traffic clustering method and device based on self-learning parameter measurement
CN107577896A (en) Equivalence method is polymerize based on the theoretical wind power plant multimachines of mixing Copula
Shuai et al. Integrated parallel forecasting model based on modified fuzzy time series and SVM
CN107169522A (en) A kind of improvement Fuzzy C means clustering algorithm based on rough set and particle cluster algorithm
CN105760478A (en) Large-scale distributed data clustering method based on machine learning
CN108717444A (en) A kind of big data clustering method and device based on distributed frame
CN104268564A (en) Sparse gene expression data analysis method based on truncated power
CN107463528A (en) The gauss hybrid models split-and-merge algorithm examined based on KS

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200601

Address after: 210000 room 602, 6th floor, building 02, No.180, software Avenue, Yuhuatai District, Nanjing City, Jiangsu Province

Patentee after: Nanjing ditavi Data Technology Co., Ltd

Address before: 210044 Nanjing Ning Road, Jiangsu, No. six, No. 219

Patentee before: NANJING University OF INFORMATION SCIENCE & TECHNOLOGY
