CN104182511A - Cluster-feature-weighted fuzzy compactness and separation clustering method


Info

Publication number
CN104182511A
Authority
CN
China
Legal status: Granted
Application number
CN201410413719.8A
Other languages
Chinese (zh)
Other versions
CN104182511B (en)
Inventor
周媛
王丽娜
何军
Current Assignee
Nanjing ditavi Data Technology Co., Ltd
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201410413719.8A
Publication of CN104182511A
Application granted
Publication of CN104182511B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques


Abstract

The invention discloses a cluster-feature-weighted fuzzy compactness and separation clustering method. It addresses the following problems: the existing WFCM algorithm does not account for the hard partitioning of samples that actually occurs and clusters data with unbalanced sample distributions poorly, while the FCS (fuzzy compactness and separation) algorithm neither handles sample points on the hard-partition boundary nor considers the influence of sample feature parameters on each cluster. By adjusting the sample memberships and feature weights, the method respects hard partitioning where it occurs, fully accounts for the influence of each sample feature parameter on each cluster, makes samples as compact as possible within a class and as dispersed as possible between classes, resolves the membership of samples lying on the hard-partition boundary, and partitions noise data and abnormal data more effectively when the sample distribution is unbalanced. The method offers good clustering performance, fast convergence and efficient iteration, and is suitable for industrial-control applications with unbalanced sample distributions and high real-time and accuracy requirements.

Description

A cluster-feature-weighted fuzzy compactness and separation clustering method
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a cluster-feature-weighted fuzzy compactness and separation clustering method.
Background art
Classification problems abound in both the natural and social sciences. Clustering is a statistical technique for studying such problems (over samples or indices) and is also an important data-mining algorithm with very wide application. The fuzzy C-means (FCM) clustering algorithm is a commonly used unsupervised pattern-recognition method, and many researchers have continued to improve it; these variants account for the influence of each sample feature parameter on the cluster centers and mitigate the effects of noise and abnormal data. However, all of these FCM-based algorithms essentially consider only the compactness of samples within a class (within-class scatter) and ignore the dispersion between classes (between-class scatter), so they cannot handle clustering of data with unbalanced sample distributions well. The FCS (Fuzzy Compactness and Separation) algorithm proposed by Kuo-Lung Wu et al. considers both within-class compactness and between-class separation and accommodates both hard and fuzzy partitioning of samples, which better matches reality. In China, Song Fengxi et al. proposed a classification method based on the maximum scatter difference criterion, which weighs between-class scatter against within-class scatter to find the optimal projection vector for classifying samples; Gao Jun et al. introduced fuzziness into the maximum scatter difference criterion and proposed the FMSDC (fuzzy maximum scatter difference discriminant criterion) algorithm, performing dimensionality reduction within fuzzy clustering; Zhi Xiaobin et al. pointed out an error in Gao's algorithm and proposed the FMSDC-FCS clustering algorithm, a corrected version that initializes the memberships and sample means with FCM, reduces dimensionality with FMSDC, and then clusters the reduced data with FCS, so its clustering core is still the FCS algorithm.
In applying the above algorithms to data classification, we found that some real data fall inside the hard-partition region of a class, where their memberships need no fuzzification, and that effectively partitioning data with unbalanced sample distributions is a problem the FCM algorithm and its extensions cannot solve. Although the FCS algorithm considers the hard partitioning of samples, it does not consider samples lying exactly on the hard-partition boundary, so the algorithm fails when such boundary data are encountered in real classification tasks.
Summary of the invention
To address the facts that the existing WFCM algorithm ignores the hard partitioning of samples that actually occurs during clustering and cannot handle unbalanced sample distributions well, and that the FCS algorithm neither considers points on the hard-partition boundary nor the influence of sample feature parameters on each cluster, the invention discloses a cluster-feature-weighted fuzzy compactness and separation clustering method.
To achieve the above object, the invention provides the following technical scheme:
The cluster-feature-weighted fuzzy compactness and separation clustering method comprises the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2}

where \bar{X} is the sample mean;
Step 3: update the sample memberships μ_ij according to:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

When a sample point x_j falls on the hard-partition boundary, Δ_ij = 0; keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i)
Step 4: compute the feature weights ω_ik according to:

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}

Denote

\Delta_{ik} = \sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

If Δ_ik < 0, then because ω_ik ∈ [0, 1], Δ_ik must be projected onto an interval greater than 0 while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted with:

\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik}>0)

After the adjustment, compute the new ω_ik with the feature-weight formula;
Step 5: compute the cluster centers a_ik according to:

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)}

Step 6: set the iteration count p = p + 1; stop when the change between successive iterations falls below ε, otherwise return to step 2;
Step 7: output the μ_ij obtained in the p-th iteration, and assign each sample j to the class i for which μ_ij is largest.
Further, the sample memberships μ_ij and feature weights ω_ik are derived as follows:
Establish the objective function:

J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2

The cluster-feature-weighted FCS clustering problem is then expressed as:

\min J_{CWFCS} \quad \text{s.t.}\quad \sum_{i=1}^{c}\mu_{ij}=1, \quad \sum_{k=1}^{s}\omega_{ik}=1

Applying the method of Lagrange multipliers gives:

L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\Bigl(\sum_{i=1}^{c}\mu_{ij}-1\Bigr) - \sum_{i=1}^{c}\lambda_i\Bigl(\sum_{k=1}^{s}\omega_{ik}-1\Bigr)

where λ_j and λ_i are the Lagrange multipliers;
taking the partial derivatives of L with respect to μ_ij, ω_ik, λ_j and λ_i and setting them to zero yields μ_ij and ω_ik.
The invention also provides an industrial data classification method based on the cluster-feature-weighted fuzzy compactness and separation clustering method, comprising: acquiring the data collected by sensors, classifying the collected data with the CWFCS method provided by the invention (steps 1 to 7), and then judging the current state of the industrial equipment or process from the classification result.
Further, the sensors collect aero-engine status data, and what is judged is the health status of the aero-engine.
Beneficial effects:
The invention respects the hard partitioning of samples that occurs in practice and fully accounts for the influence of the sample feature parameters on the partitioning, making samples as compact as possible within a class and as dispersed as possible between classes. It resolves the membership of samples lying on the hard-partition boundary and partitions noise data and abnormal data more effectively when the sample distribution is unbalanced. Experiments show that the algorithm clusters well, converges quickly and iterates efficiently. Compared with conventional methods, the invention achieves high clustering accuracy with markedly less time consumed, making it suitable for industrial-control settings with unbalanced sample distributions and high real-time requirements.
Description of the drawings
Fig. 1 is a flow chart of the cluster-feature-weighted fuzzy compactness and separation clustering method;
Fig. 2 shows the data distribution of the Iris data set together with the clustering effects and cluster centers of the CWFCS, FCS and WFCM algorithms;
Fig. 3 shows the CWFCS clustering result, hard-partition result and convergence for β = 1;
Fig. 4 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.5;
Fig. 5 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.05;
Fig. 6 shows the CWFCS clustering result, hard-partition result and convergence for β = 0.005;
Fig. 7 shows how different values of the parameters α, β and m affect the clustering result.
Embodiments
The technical scheme provided by the invention is described in detail below with reference to specific embodiments. It should be understood that the following embodiments only illustrate the invention and do not limit its scope.
We observe that in unsupervised clustering of real-life data, samples may be hard-partitioned to a cluster center; a sample on the hard-partition boundary should have the largest membership to that class compared with samples outside the hard-partition region, yet remain slightly fuzzier than samples inside it; and each feature parameter of a sample influences the clustering result of each class differently. Based on this line of thought, the invention proposes an improved fuzzy compactness and separation clustering method.
First, define the cluster-feature-weighted within-class scatter and between-class scatter as follows:

S_{CWFW} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 \qquad (1)

S_{CWFB} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 \qquad (2)

where the feature-weighting exponent α ∈ [-10, 0) ∪ (1, 10];
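As an illustration, the two weighted scatters can be evaluated directly from definitions (1) and (2); the following NumPy sketch (variable names are illustrative, not from the patent) computes both quantities for given memberships, feature weights and centers:

```python
import numpy as np

def scatters(X, A, U, W, eta, m=2.0, alpha=2.0):
    """Cluster-feature-weighted within-class scatter S_CWFW (Eq. 1) and
    between-class scatter S_CWFB (Eq. 2).
    X: (n, s) samples; A: (c, s) cluster centers; U: (c, n) memberships;
    W: (c, s) feature weights; eta: (c,) coefficients."""
    Xbar = X.mean(axis=0)                                  # sample mean, per feature
    d2 = (X[None, :, :] - A[:, None, :]) ** 2              # ||x_jk - a_ik||^2, (c, n, s)
    b2 = (A - Xbar[None, :]) ** 2                          # ||a_ik - Xbar_k||^2, (c, s)
    coef = (U ** m)[:, :, None] * (W ** alpha)[:, None, :] # mu_ij^m * w_ik^alpha
    S_within = float((coef * d2).sum())                    # Eq. (1)
    S_between = float((coef * (eta[:, None, None] * b2[:, None, :])).sum())  # Eq. (2)
    return S_within, S_between
```

The objective J_CWFCS defined next is then simply S_within minus S_between.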
Establish the objective function:

J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2

The cluster-feature-weighted FCS clustering problem is then expressed as:

\min J_{CWFCS} \quad \text{s.t.}\quad \sum_{i=1}^{c}\mu_{ij}=1, \quad \sum_{k=1}^{s}\omega_{ik}=1

Applying the method of Lagrange multipliers gives:

L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk}-a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s}\eta_i\mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik}-\bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\Bigl(\sum_{i=1}^{c}\mu_{ij}-1\Bigr) - \sum_{i=1}^{c}\lambda_i\Bigl(\sum_{k=1}^{s}\omega_{ik}-1\Bigr)

where λ_j and λ_i are the Lagrange multipliers;
Taking the partial derivatives of the above expression with respect to μ_ij, ω_ik, λ_j and λ_i and setting them to zero yields:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}} \qquad (3)

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}} \qquad (4)

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)} \qquad (5)
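As a sketch of how formula (3) follows (the intermediate steps are not spelled out in the patent text), setting the partial derivative of L with respect to μ_ij to zero gives:

```latex
\frac{\partial L}{\partial \mu_{ij}}
  = m\,\mu_{ij}^{m-1}\sum_{k=1}^{s}\omega_{ik}^{\alpha}
      \bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)-\lambda_j
  = m\,\mu_{ij}^{m-1}\,\Delta_{ij}-\lambda_j = 0
\;\Rightarrow\;
\mu_{ij} = \Bigl(\tfrac{\lambda_j}{m}\Bigr)^{\frac{1}{m-1}}\,\Delta_{ij}^{\frac{1}{1-m}}
```

Substituting this into the constraint Σ_{i=1}^{c} μ_ij = 1 eliminates λ_j and yields formula (3); the derivation of (4) from ∂L/∂ω_ik = 0 is analogous.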
The cluster-feature-weighted fuzzy compactness and separation clustering method, as shown in Fig. 1, comprises the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2} \qquad (6)

where \bar{X} is the sample mean;
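Formula (6) can be evaluated directly; the following sketch (function and variable names are illustrative) computes all η_i for a set of centers:

```python
import numpy as np

def eta_coeffs(A, Xbar, beta):
    """Coefficients eta_i of Eq. (6):
    eta_i = (beta / 4) * min_{i' != i} ||a_i - a_i'||^2 / max_t ||a_t - Xbar||^2.
    A: (c, s) cluster centers; Xbar: (s,) sample mean."""
    c = A.shape[0]
    denom = max(float(((a - Xbar) ** 2).sum()) for a in A)  # max_t ||a_t - Xbar||^2
    eta = np.empty(c)
    for i in range(c):
        # squared distances from center i to every other center
        d2 = [float(((A[i] - A[j]) ** 2).sum()) for j in range(c) if j != i]
        eta[i] = beta / 4.0 * min(d2) / denom
    return eta
```

Larger β thus directly scales up η_i and, through the objective, the degree of hard partitioning, which matches the behaviour reported for Figs. 3 to 6.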
Step 3: update the sample memberships μ_ij according to formula (3):

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}} \qquad (3)

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr) \qquad (7)

Consider a sample point x_j falling on the hard-partition boundary: applying formula (3) directly would make μ_ij infinite and the algorithm would fail. A sample point on the hard-partition boundary of class i is inherently ambiguous, so hard-partitioning it would not match reality; compared with sample points outside the hard-partition region, however, it should have a larger fuzzy membership to class i. Therefore, keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with the adjustment function P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n \qquad (8)

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}} \qquad (9)

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i) \qquad (10)
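Step 3 can be sketched in NumPy as follows. This is an illustrative reading of Eqs. (7) to (10), not the inventors' code; it assumes "rand" in Eq. (8) is a random number in (0, 1), that at least one Δ_ij is positive, and a membership exponent (here m = 2) for which Eq. (9) is well defined on negative Δ_ij:

```python
import numpy as np

def update_memberships(Delta, m=2.0, rng=None):
    """Step 3: memberships from a (c, n) matrix of Delta_ij (Eq. 7).
    Boundary points (Delta_ij == 0) are nudged by Eq. (8); points inside a
    hard-partition region (Delta_ij < 0) are hard-assigned by Eq. (10)."""
    rng = np.random.default_rng() if rng is None else rng
    D = np.array(Delta, dtype=float)
    c, n = D.shape
    hard = D < 0                              # inside a hard-partition region
    pos_min = D[D > 0].min()                  # min_j (Delta_ij > 0), Eq. (8)
    boundary = D == 0
    D[boundary] += rng.random(int(boundary.sum())) * pos_min
    U = D ** (1.0 / (1.0 - m))                # Eq. (9) numerator
    U /= U.sum(axis=0, keepdims=True)         # Eq. (9) normalisation
    for j in range(n):                        # Eq. (10): hard assignment
        winners = np.flatnonzero(hard[:, j])
        if winners.size:
            U[:, j] = 0.0
            U[winners[0], j] = 1.0
    return U
```

For m = 2 the exponent 1/(1 - m) is -1, so the soft memberships are simply inverse-distance weights normalised over the classes.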
Step 4: compute the feature weights ω_ik according to:

\omega_{ik} = \frac{\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\Bigl(\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jt}-a_{it}\|^2-\eta_i\|a_{it}-\bar{X}_t\|^2\bigr)\Bigr)^{\frac{1}{1-\alpha}}} \qquad (4)

Denote

\Delta_{ik} = \sum_{j=1}^{n}\mu_{ij}^{m}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr) \qquad (11)

When Δ_ik = 0, the k-th feature parameter exerts no distinguishing influence on the clustering of class i, so ω_ik = 0.

If the sample distribution is extremely unbalanced, Δ_ik < 0 can occur. Because ω_ik ∈ [0, 1], Δ_ik must be projected onto an interval greater than 0 while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted with:

\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik}>0) \qquad (12)

After the adjustment, compute the new ω_ik with the feature-weight formula (4);
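Step 4 can be sketched as follows. The per-cluster normalisation over features reflects the constraint Σ_k ω_ik = 1; the Eq. (12) shift preserves the spacing between features within a row while making every entry positive. This assumes each row has at least one positive Δ_ik, as Eq. (12) requires; names are illustrative:

```python
import numpy as np

def feature_weights(Delta, alpha=2.0):
    """Feature weights from Eq. (4), with the Eq. (12) shift applied to any
    row of Delta_ik (shape (c, s)) that contains non-positive entries."""
    D = np.array(Delta, dtype=float)
    for i in range(D.shape[0]):
        if (D[i] <= 0).any():                          # Eq. (12) projection
            D[i] = D[i] - D[i].min() + D[i][D[i] > 0].min()
    W = D ** (1.0 / (1.0 - alpha))                     # Eq. (4) numerator
    W /= W.sum(axis=1, keepdims=True)                  # normalise over features
    return W
```

With α = 2 the exponent is -1, so features with small Δ_ik (tightly clustered relative to the between-class term) receive the largest weights.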
Step 5: compute the cluster centers a_ik according to:

a_{ik} = \frac{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(x_{jk}-\eta_i\bar{X}_k\bigr)}{\sum_{j=1}^{n}\mu_{ij}^{m}\bigl(1-\eta_i\bigr)} \qquad (13)

Step 6: set the iteration count p = p + 1; stop when the change between successive iterations falls below ε, otherwise return to step 2;
Step 7: output the μ_ij obtained in the p-th iteration, and assign each sample j to the class i for which μ_ij is largest.
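The steps above can be assembled into a compact end-to-end sketch. This is an illustrative reimplementation under stated assumptions (m = 2 so Eq. (9) is defined on negative Δ; convergence in step 6 measured by the change in cluster centers; "rand" a number in (0, 1)), not the inventors' code:

```python
import numpy as np

def cwfcs(X, c, m=2.0, alpha=2.0, beta=0.005, eps=1e-6, max_iter=100,
          init=None, seed=0):
    """Sketch of CWFCS steps 1-7. X: (n, s) data; c: number of clusters.
    Returns memberships U (c, n), feature weights W (c, s), centers A (c, s)."""
    rng = np.random.default_rng(seed)
    n, s = X.shape
    Xbar = X.mean(axis=0)
    A = (np.array(init, dtype=float) if init is not None
         else X[rng.choice(n, size=c, replace=False)].astype(float))  # step 1
    W = np.full((c, s), 1.0 / s)
    U = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        A_old = A.copy()
        # step 2: eta_i (Eq. 6)
        denom = ((A - Xbar) ** 2).sum(axis=1).max()
        pair = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(pair, np.inf)
        eta = beta / 4.0 * pair.min(axis=1) / denom
        # step 3: memberships (Eqs. 3, 7-10)
        core = ((X[None, :, :] - A[:, None, :]) ** 2
                - eta[:, None, None] * ((A - Xbar) ** 2)[:, None, :])
        Delta = ((W ** alpha)[:, None, :] * core).sum(axis=2)        # Eq. (7)
        if (Delta > 0).any():                                        # Eq. (8)
            Delta = np.where(Delta == 0.0,
                             rng.random() * Delta[Delta > 0].min(), Delta)
        for j in range(n):
            neg = np.flatnonzero(Delta[:, j] < 0)
            if neg.size:                         # Eq. (10): hard assignment
                U[:, j] = 0.0
                U[neg[0], j] = 1.0
            else:                                # Eq. (9)
                u = Delta[:, j] ** (1.0 / (1.0 - m))
                U[:, j] = u / u.sum()
        # step 4: feature weights (Eqs. 4, 11, 12)
        Dk = ((U ** m)[:, :, None] * core).sum(axis=1)
        for i in range(c):
            pos = Dk[i][Dk[i] > 0]
            if pos.size and (Dk[i] <= 0).any():  # Eq. (12), needs a positive entry
                Dk[i] = Dk[i] - Dk[i].min() + pos.min()
        W = Dk ** (1.0 / (1.0 - alpha))
        W /= W.sum(axis=1, keepdims=True)
        # step 5: centers (Eq. 13)
        um = U ** m
        A = ((um @ X - um.sum(axis=1)[:, None] * eta[:, None] * Xbar)
             / (um.sum(axis=1)[:, None] * (1.0 - eta[:, None])))
        if ((A - A_old) ** 2).sum() < eps:       # step 6: convergence
            break
    return U, W, A                               # step 7: argmax of U gives labels
```

On well-separated data the loop typically stops after a handful of iterations, since the centers stabilise quickly.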
Through the above steps, the hard partitioning of samples that occurs in practice is respected, and the influence of the sample feature parameters on each partition is fully considered, making samples as compact as possible within a class and as dispersed as possible between classes; the membership of samples lying on the hard-partition boundary is resolved, and noise data and abnormal data are partitioned more effectively when the sample distribution is unbalanced.
Embodiment 1:
To better illustrate the performance of the invention, we run a classification experiment with the inventive method on a real data set from the UCI machine-learning repository, the Iris data set. The fuzzy exponent m is set to 1.5, 2, 2.5, 3 and 3.5 in turn, the iteration error precision is 10^-6, and the parameter β of the inventive cluster-feature-weighted CWFCS algorithm is set to 0.005, 0.05, 0.5 and 1 in turn. To represent an unbalanced sample distribution, the Iris data set keeps all data of the first and second classes and 10 randomly chosen samples of the third class, for a total of 110 samples in 3 classes, where the second and third classes overlap. The clustering results of the inventive algorithm (CWFCS for short) are shown in Figs. 2 to 6. Fig. 2 shows that the algorithm possesses the basic convergence property: the clustering result roughly matches the raw data distribution shown in Fig. 2(a). Figs. 3 to 6 show that the distances between the three cluster centers change with β. As β decreases from 1 to 0.05, the fuzziness of the system increases, which appears as the three cluster centers gradually drawing together. Because the third class has far fewer samples than the first and second classes and also overlaps the second, keeping each class internally compact while maximising the separation between classes means that at β = 0.005 the distance between the first and third class centers, and between the second and third, actually increases slightly relative to β = 0.05. Figs. 3(b) to 6(b) show the hard-partition effect: as β decreases from 1 to 0.005, the number of hard-partitioned samples among the 110 is 79, 64, 42 and 0 respectively, showing that the algorithm retains the hard-partition property of the FCS algorithm and that a larger β yields a higher degree of hard partitioning. Figs. 3(c) to 6(c) plot the change in the cluster centers, showing that the algorithm converges quickly and iterates efficiently. The algorithm makes the cluster-feature-weighted within-class scatter of the samples as small as possible and the cluster-feature-weighted between-class scatter as large as possible; the more dispersed the cluster centers are, the more fuzzily the samples with small between-class weighted scatter are partitioned. These experimental results show that the algorithm clusters well, converges quickly and iterates efficiently.
Fig. 7 shows the effect of different values of the parameters α, β and m on the clustering. The smaller β is, the larger the misclassification rate. Whatever value β takes, for a given β the average misclassification rate is smallest at α = 2 with m ∈ {1.5, 2}, and for β < 0.5 the algorithm is more sensitive to the values of α and m. In Fig. 7(a) (β = 1), for α > 3 and integer m (2, 3) the misclassification rate decreases as α grows, while otherwise it decreases as α shrinks; for α < 0 the misclassification rate decreases as α decreases, and m has little influence. Figs. 7(b) to 7(d) show that the influence of α and m on the algorithm follows a basically consistent trend: for β < 1 and a given α, a larger m gives a larger misclassification rate; for a given m (setting aside the optimal case α = 2), a larger α gives a smaller misclassification rate when α > 0, and a smaller α gives a smaller rate when α < 0.
Embodiment 2:
To verify the superiority of the invention, we test the Iris data set with the FCS and WFCM methods and the inventive CWFCS method.
In the experiments, the fuzzy exponent m is set to 1.5, 2, 2.5, 3 and 3.5 in turn, the iteration error precision is 10^-6, and the parameter β in the CWFCS algorithm is set to 0.005, 0.05, 0.5 and 1 in turn; each experiment is repeated 100 times, and the best and average results are taken. Best-case performance is measured by three indices, accuracy (Accuracy), iteration count (Iter) and execution time (Time); overall performance is measured by average accuracy (avg_Accuracy, correctly partitioned samples / total samples), average iteration count (avg_Iter) and average execution time (avg_Time). The best and average clustering results of the three algorithms are shown in Table 1:
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.754545   28     0.028236   0.689091       35         0.193956
WFCM        0.854545   30     0.103216   0.852424       29         0.090867
CWFCS       0.981818   48     0.055334   0.966364       55         0.063656

Table 1
Table 1 shows that, on the Iris data set, both the best accuracy and the average accuracy of the CWFCS algorithm are higher than those of the other two algorithms, and the average execution time of CWFCS is the shortest: compared with the FCS algorithm, its average execution time is about 67% shorter and its average accuracy 40% higher; compared with the WFCM algorithm, its time is 21% shorter and its average accuracy 23% higher.
The above experimental results are obtained on the noise-free Iris data set. We also test a noise-added Iris data set with the FCS, WFCM and inventive CWFCS methods, with the same experimental parameters and environment as for the noise-free Iris data set. The best and average clustering results of the three algorithms are shown in Table 2:
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.754545   40     0.386212   0.720606       62         0.468495
WFCM        0.845455   26     0.109535   0.845455       29         0.101066
CWFCS       0.972727   29     0.031420   0.887879       43         0.049336

Table 2
Table 2 shows that, on the noise-added Iris data set, both the best accuracy and the average accuracy of the CWFCS algorithm are again clearly higher than those of the other two algorithms.
Embodiment 3:
We further test the Breast Cancer data set with the FCS, WFCM and inventive CWFCS methods. The Breast Cancer data set has 30 attributes; to represent an unbalanced sample distribution, 10 samples of the first class are chosen at random, while the second class has 367 samples. The results are shown in Table 3, from which it can be seen that the CWFCS algorithm performs most stably: its iteration count is only slightly higher than that of the WFCM algorithm, its execution time stays within 0.1 s, and its clustering accuracy is higher than that of the other two algorithms.
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.737401   45     0.827577   0.737401       43         0.533281
WFCM        0.819629   11     0.026210   0.767109       11         0.030475
CWFCS       0.965517   13     0.074786   0.960212       12         0.075808

Table 3
Embodiment 4:
We also test an aero-engine gas-path simulation data set (noise-added) with the FCS, WFCM and inventive CWFCS methods; the results are shown in Table 4. The GasPath data set contains aero-engine gas-path data with three feature parameters, DEGT, DNH and DFF; there are 200 healthy samples, and 5 fault samples are chosen at random.
Algorithm   Accuracy   Iter   Time       avg_Accuracy   avg_Iter   avg_Time
FCS         0.614634   24     0.290102   0.614634       24         0.181671
WFCM        0.6        19     0.046147   0.6            21         0.052607
CWFCS       0.917073   15     0.023733   0.86878        23         0.033184

Table 4
Table 4 shows that, on the GasPath data set, the inventive method is robust to the noise-polluted data encountered in engineering applications and partitions the data more accurately; for such data, an algorithm that exploits both within-class compactness and between-class separation achieves a higher accuracy than the WFCM algorithm, which considers within-class compactness only.
Embodiment 5:
The invention also provides a concrete application procedure of the invention in industrial control:
First, the key design parameters of the industrial process are placed under status monitoring (several kinds of sensors usually need to be deployed to obtain comprehensive data). After the data collected by the sensors are acquired, the collected data are classified with the CWFCS method provided by the invention (steps 1 to 7), and the current state of the industrial equipment or process is judged from the classification result. For example, the status of an aero-engine is monitored by sensors, and the collected data are classified with the CWFCS method of the invention (steps 1 to 7) to judge whether the aero-engine is currently in an unhealthy state.
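The judgment stage of the monitoring loop above can be sketched as a small post-processing step. Function and variable names are illustrative, not from the patent; it assumes the healthy cluster has already been identified, e.g. from labelled reference data:

```python
import numpy as np

def flag_engine_state(U, healthy_cluster):
    """Map CWFCS memberships to per-sample health verdicts.
    U: (c, n) membership matrix from step 7; healthy_cluster: index of the
    cluster taken to represent the healthy operating regime."""
    labels = U.argmax(axis=0)   # step 7: class with the largest membership
    return ["healthy" if k == healthy_cluster else "unhealthy" for k in labels]
```

In a deployed loop this would run on each new batch of sensor samples, with an alert raised whenever "unhealthy" verdicts appear.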
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes composed of any combination of the above technical features. It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims (4)

1. A cluster-feature-weighted fuzzy compactness and separation clustering method, characterized by comprising the following steps:
Step 1: set the membership exponent m, the feature-weighting exponent α ∈ [-10, -1) ∪ (1, 10], β ∈ {0.005, 0.05, 0.5, 1}, the initial iteration count p = 0 and the iteration error ε > 0, and randomly generate initial cluster centers a_i ∈ R^s, i = 1, ..., c (s is the number of feature parameters);
Step 2: compute the coefficient η_i according to:

\eta_i = \frac{\beta}{4}\cdot\frac{\min_{i'\neq i}\|a_i-a_{i'}\|^2}{\max_t\|a_t-\bar{X}\|^2}

where \bar{X} is the sample mean;
Step 3: update the sample memberships μ_ij according to:

\mu_{ij} = \frac{\Bigl(\sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Bigl(\sum_{k=1}^{s}\omega_{tk}^{\alpha}\bigl(\|x_{jk}-a_{tk}\|^2-\eta_t\|a_{tk}-\bar{X}_k\|^2\bigr)\Bigr)^{\frac{1}{1-m}}}

Denote

\Delta_{ij} = \sum_{k=1}^{s}\omega_{ik}^{\alpha}\bigl(\|x_{jk}-a_{ik}\|^2-\eta_i\|a_{ik}-\bar{X}_k\|^2\bigr)

When a sample point x_j falls on the hard-partition boundary, Δ_ij = 0; keeping the distance scale of each sample point relative to class i unchanged, all Δ_ij ≥ 0 are adjusted with P(Δ_ij):

\Delta_{ij} = P(\Delta_{ij}\ge 0) = \Delta_{ij} + \mathrm{rand}\cdot\min_j(\Delta_{ij}>0), \quad j=1,\ldots,n

After the adjustment, compute the new μ_ij with:

\mu_{ij} = \frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}

Since some sample points x_j fall inside the hard-partition region of class i, μ_ij < 0 can occur; such μ_ij are therefore hard-partitioned:

\mu_{ij} = 1 \ (\Delta_{ij}<0), \qquad \mu_{i'j} = 0 \ (i'\neq i)
Step 4: compute the feature weight ω_ik according to:

$$\omega_{ik} = \frac{\left(\sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jk} - a_{ik}\|^2 - \eta_i \|a_{ik} - \bar{X}_k\|^2\right)\right)^{\frac{1}{1-\alpha}}}{\sum_{t=1}^{s}\left(\sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jt} - a_{it}\|^2 - \eta_i \|a_{it} - \bar{X}_t\|^2\right)\right)^{\frac{1}{1-\alpha}}}$$

Denote

$$\Delta_{ik} = \sum_{j=1}^{n} \mu_{ij}^{m}\left(\|x_{jk} - a_{ik}\|^2 - \eta_i \|a_{ik} - \bar{X}_k\|^2\right)$$

If Δ_ik < 0, then since ω_ik ∈ [0, 1], Δ_ik must be projected onto a positive interval while keeping the distance scale between the k-th feature parameter of each sample and the hard-partition region of class i unchanged, so Δ_ik is adjusted by:

$$\Delta_{ik} = \Delta_{ik} - \min_k(\Delta_{ik}) + \min_k(\Delta_{ik} > 0)$$

After the adjustment, compute the new ω_ik from the feature-weight formula;
Step 5: compute the cluster centre a_ik according to:

$$a_{ik} = \frac{\sum_{j=1}^{n} \mu_{ij}^{m}\left(x_{jk} - \eta_i \bar{X}_k\right)}{\sum_{j=1}^{n} \mu_{ij}^{m}\left(1 - \eta_i\right)}$$
Step 6: set the iteration count p = p + 1, and repeat from Step 2 until the change between successive iterations is smaller than the iteration error ε;
Step 7: output the μ_ij obtained at the p-th iteration, and assign each sample j to the class i for which its membership μ_ij is largest.
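Steps 1-7 of claim 1 can be sketched in Python. This is an illustrative, unofficial reconstruction, not the patented implementation: the boundary adjustments of Δ_ij and Δ_ik in Steps 3-4 are simplified to clipping at a small positive value, convergence is tested on the movement of the cluster centres, and a deterministic farthest-point initialization replaces the random one for reproducibility.

```python
import numpy as np

def cwfcs(X, c, m=2.0, alpha=2.0, beta=0.05, eps=1e-5, max_iter=100):
    """Sketch of the Step 1-7 loop of claim 1 (illustrative, simplified)."""
    n, s = X.shape
    xbar = X.mean(axis=0)                      # sample mean (X-bar)
    # Step 1 (simplified): deterministic farthest-point initial centres
    idx = [0]
    for _ in range(c - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d.argmax()))
    a = X[idx].astype(float)
    w = np.full((c, s), 1.0 / s)               # feature weights, each row sums to 1
    for _ in range(max_iter):
        # Step 2: eta_i = (beta/4) * min_{i'!=i} ||a_i - a_i'||^2 / max_t ||a_t - xbar||^2
        d2 = ((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        eta = (beta / 4.0) * d2.min(axis=1) / ((a - xbar) ** 2).sum(axis=1).max()
        # per-cluster, per-sample, per-feature compactness-minus-scatter terms
        term = (X[None, :, :] - a[:, None, :]) ** 2 \
             - eta[:, None, None] * ((a - xbar) ** 2)[:, None, :]
        # Step 3: membership update; the claim's Delta_ij boundary adjustment is
        # simplified here to clipping at a small positive value
        delta_ij = np.clip(np.einsum('ck,cjk->cj', w ** alpha, term), 1e-12, None)
        u = delta_ij ** (1.0 / (1.0 - m))
        u /= u.sum(axis=0, keepdims=True)
        # Step 4: feature-weight update with the same simplified adjustment
        delta_ik = np.clip(np.einsum('cj,cjk->ck', u ** m, term), 1e-12, None)
        w = delta_ik ** (1.0 / (1.0 - alpha))
        w /= w.sum(axis=1, keepdims=True)
        # Step 5: a_ik = sum_j u_ij^m (x_jk - eta_i xbar_k) / sum_j u_ij^m (1 - eta_i)
        um = u ** m
        shifted = X[None, :, :] - eta[:, None, None] * xbar[None, None, :]
        a_new = np.einsum('cj,cjk->ck', um, shifted) \
              / ((1.0 - eta)[:, None] * um.sum(axis=1, keepdims=True))
        # Step 6: stop once the centres move less than eps, else iterate again
        moved = np.abs(a_new - a).max()
        a = a_new
        if moved < eps:
            break
    # Step 7: assign each sample to its maximum-membership cluster
    return u, w, a, u.argmax(axis=0)
```

On two well-separated Gaussian blobs this converges in a few iterations, with memberships summing to 1 per sample and feature weights summing to 1 per cluster, as the constraints of claim 2 require.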
2. The cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 1, characterized in that the sample membership μ_ij and the feature weight ω_ik are derived as follows:
Establish the objective function:

$$J_{CWFCS} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk} - a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \eta_i \mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik} - \bar{X}_k\|^2$$

The cluster-feature-weighted FCS clustering problem is then expressed as:

$$\min J_{CWFCS} \quad \text{s.t.} \quad \sum_{i=1}^{c}\mu_{ij} = 1, \quad \sum_{k=1}^{s}\omega_{ik} = 1$$

Applying the method of Lagrange multipliers gives:

$$L = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \mu_{ij}^{m}\omega_{ik}^{\alpha}\|x_{jk} - a_{ik}\|^2 - \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{s} \eta_i \mu_{ij}^{m}\omega_{ik}^{\alpha}\|a_{ik} - \bar{X}_k\|^2 - \sum_{j=1}^{n}\lambda_j\left(\sum_{i=1}^{c}\mu_{ij} - 1\right) - \sum_{i=1}^{c}\lambda_i\left(\sum_{k=1}^{s}\omega_{ik} - 1\right)$$

where λ_i and λ_j are the Lagrange multipliers;
Taking the partial derivatives of L with respect to μ_ij, ω_ik, λ_i and λ_j and setting each partial derivative to zero yields μ_ij and ω_ik.
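The final step of claim 2 is left implicit; as a hedged reconstruction consistent with the Δ_ij notation of claim 1, setting ∂L/∂μ_ij = 0 gives:

```latex
\frac{\partial L}{\partial \mu_{ij}}
  = m\,\mu_{ij}^{\,m-1}\underbrace{\sum_{k=1}^{s}\omega_{ik}^{\alpha}
      \left(\|x_{jk}-a_{ik}\|^{2}-\eta_{i}\|a_{ik}-\bar{X}_{k}\|^{2}\right)}_{\Delta_{ij}}
    -\lambda_{j} = 0
\;\Longrightarrow\;
\mu_{ij}=\left(\frac{\lambda_{j}}{m}\right)^{\frac{1}{m-1}}\Delta_{ij}^{\frac{1}{1-m}}

% Substituting into the constraint \sum_{i=1}^{c}\mu_{ij}=1 eliminates \lambda_j:
\mu_{ij}=\frac{\Delta_{ij}^{\frac{1}{1-m}}}{\sum_{t=1}^{c}\Delta_{tj}^{\frac{1}{1-m}}}
```

which is exactly the membership formula of claim 1; the derivation of ω_ik proceeds analogously from ∂L/∂ω_ik = 0 under the constraint Σ_k ω_ik = 1.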
3. An unbalanced industrial-data classification method based on the cluster-feature-weighted fuzzy compactness and scatter clustering method, comprising the steps of: after obtaining the data collected by sensors, classifying the collected data with the cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 1 or 2, and then judging the current state of the industrial equipment or process from the classification result.
4. The unbalanced industrial-data classification method based on the cluster-feature-weighted fuzzy compactness and scatter clustering method according to claim 3, wherein the data collected by the sensors are aero-engine status data, and what is judged is the health status of the aero-engine.
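Claims 3-4 leave the "judge the current state from the classification result" step abstract. One simple, hypothetical realization (the function name `judge_states` and the state strings are illustrative, not taken from the patent) labels each cluster by majority vote over a few reference samples whose condition is known, then reads off the state of any new sample from its cluster:

```python
from collections import Counter

def judge_states(labels, reference, new_idx):
    """Map cluster labels to equipment states via a few reference samples.

    labels    -- cluster index per sample (output of any clustering run,
                 e.g. the method of claim 1)
    reference -- {sample index: known state string} for a handful of samples
    new_idx   -- indices of the samples whose state should be judged
    """
    # collect the known states falling inside each cluster
    votes = {}
    for idx, state in reference.items():
        votes.setdefault(labels[idx], []).append(state)
    # majority vote: each cluster inherits its most common reference state
    cluster_state = {c: Counter(v).most_common(1)[0][0] for c, v in votes.items()}
    return [cluster_state.get(labels[i], "unknown") for i in new_idx]
```

For aero-engine monitoring as in claim 4, `reference` would hold indices of samples with a confirmed health state, and `labels` would come from clustering the sensor features; the state of a new sample is then the label of its cluster.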
CN201410413719.8A 2014-08-20 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method Active CN104182511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410413719.8A CN104182511B (en) 2014-08-20 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method


Publications (2)

Publication Number Publication Date
CN104182511A true CN104182511A (en) 2014-12-03
CN104182511B CN104182511B (en) 2017-09-26

Family

ID=51963550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410413719.8A Active CN104182511B (en) 2014-08-20 Cluster-feature-weighted fuzzy compactness and scatter clustering method

Country Status (1)

Country Link
CN (1) CN104182511B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127232A (en) * 2016-06-16 2016-11-16 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106599618A (en) * 2016-12-23 2017-04-26 Jilin University Unsupervised classification method for metagenome contigs
CN108628971A (en) * 2018-04-24 2018-10-09 Shenzhen Qianhai WeBank Co., Ltd. Text classification method, text classifier and storage medium for imbalanced data sets
CN113345225A (en) * 2021-05-24 2021-09-03 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN114073625A (en) * 2021-12-13 2022-02-22 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680158A (en) * 2013-10-14 2014-03-26 Changsha University of Science and Technology Dynamic division method for control subareas based on C-means fuzzy clustering analysis
CN104008197A (en) * 2014-06-13 2014-08-27 Nanjing University of Information Science and Technology Feature-weighted fuzzy compactness and scatter clustering method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680158A (en) * 2013-10-14 2014-03-26 Changsha University of Science and Technology Dynamic division method for control subareas based on C-means fuzzy clustering analysis
CN104008197A (en) * 2014-06-13 2014-08-27 Nanjing University of Information Science and Technology Feature-weighted fuzzy compactness and scatter clustering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gao Jun et al., "Bidirectional two-dimensional unsupervised feature extraction method with fuzzy clustering function", Acta Automatica Sinica *
Chen Duo et al., "A clustering validity function based on fuzziness", Pattern Recognition and Artificial Intelligence *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127232A (en) * 2016-06-16 2016-11-16 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106127232B (en) * 2016-06-16 2020-01-14 Beijing SenseTime Technology Development Co., Ltd. Convolutional neural network training method and system, object classification method and classifier
CN106599618A (en) * 2016-12-23 2017-04-26 Jilin University Unsupervised classification method for metagenome contigs
CN108628971A (en) * 2018-04-24 2018-10-09 Shenzhen Qianhai WeBank Co., Ltd. Text classification method, text classifier and storage medium for imbalanced data sets
CN113345225A (en) * 2021-05-24 2021-09-03 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN113345225B (en) * 2021-05-24 2023-04-11 Zhengzhou University of Aeronautics Method and system for predicting real-time road conditions of roads ahead of logistics vehicles based on V2V communication
CN114073625A (en) * 2021-12-13 2022-02-22 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation
CN114073625B (en) * 2021-12-13 2023-12-08 Qufu Normal University Electroencephalogram-controlled electric wheelchair capable of autonomous navigation

Also Published As

Publication number Publication date
CN104182511B (en) 2017-09-26

Similar Documents

Publication Publication Date Title
CN104182511A (en) Cluster-feature-weighted fuzzy compact scattering and clustering method
CN111062425B (en) Unbalanced data set processing method based on C-K-SMOTE algorithm
Yao et al. A modified multi-objective sorting particle swarm optimization and its application to the design of the nose shape of a high-speed train
CN108520325A (en) A kind of integral life prediction technique based on acceleration degraded data under changeable environment
CN108549904A (en) Difference secret protection K-means clustering methods based on silhouette coefficient
CN103489046A (en) Method for predicting wind power plant short-term power
CN104732545A (en) Texture image segmentation method combined with sparse neighbor propagation and rapid spectral clustering
CN106792749B (en) wireless sensor network node deployment method based on CFD and clustering algorithm
Zhao et al. Mutation grey wolf elite PSO balanced XGBoost for radar emitter individual identification based on measured signals
CN108898273A (en) A kind of user side load characteristic clustering evaluation method based on morphological analysis
Pei et al. The clustering algorithm based on particle swarm optimization algorithm
CN106357458B (en) Network element method for detecting abnormality and device
CN108519760A (en) A kind of Primary Processing stable state recognition methods based on detection of change-point theory
CN105956318A (en) Improved splitting H-K clustering method-based wind power plant fleet division method
CN106569981A (en) Statistic parameter determination method and system applicable to large-scale data set
CN101702172A (en) Data discretization method based on category-attribute relation dependency
CN104008197B (en) A kind of fuzzy distribution clustering method that compacts of characteristic weighing
CN116244612B (en) HTTP traffic clustering method and device based on self-learning parameter measurement
CN107577896A (en) Equivalence method is polymerize based on the theoretical wind power plant multimachines of mixing Copula
Shuai et al. Integrated parallel forecasting model based on modified fuzzy time series and SVM
CN107169522A (en) A kind of improvement Fuzzy C means clustering algorithm based on rough set and particle cluster algorithm
CN105760478A (en) Large-scale distributed data clustering method based on machine learning
CN108717444A (en) A kind of big data clustering method and device based on distributed frame
CN104268564A (en) Sparse gene expression data analysis method based on truncated power
CN107463528A (en) The gauss hybrid models split-and-merge algorithm examined based on KS

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200601

Address after: 210000 room 602, 6th floor, building 02, No.180, software Avenue, Yuhuatai District, Nanjing City, Jiangsu Province

Patentee after: Nanjing ditavi Data Technology Co., Ltd

Address before: 210044 Nanjing Ning Road, Jiangsu, No. six, No. 219

Patentee before: NANJING University OF INFORMATION SCIENCE & TECHNOLOGY
