CN110096603A - A kind of multiple view fuzzy clustering method based on FCS - Google Patents

A multi-view fuzzy clustering method based on FCS

Info

Publication number
CN110096603A
Authority
CN
China
Prior art keywords
view
fcs
matrix
clustering
clustering method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910205968.0A
Other languages
Chinese (zh)
Inventor
刘永利
郭倩倩
刘静
郭呈怡
韩秀娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN201910205968.0A
Publication of CN110096603A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a multi-view fuzzy clustering method based on FCS. On the one hand, the method retains the characteristics of multi-view clustering: it clusters collaboratively according to the importance of the different views, which helps improve clustering accuracy. On the other hand, it inherits the FCS method's advantage of jointly considering intra-class compactness and inter-class separation, which enhances robustness. During clustering, the method adjusts the consistency of the clustering results across views through weights, and it learns and updates the weights automatically. To evaluate the method, four multi-view data sets were selected for experiments under two conditions: without added noise data and with added noise data. The experimental results show that the FCS-based multi-view fuzzy clustering method outperforms the comparison methods in both clustering accuracy and robustness.

Description

FCS-based multi-view fuzzy clustering method
Technical Field
The invention relates to a fuzzy clustering method, in particular to a multi-view fuzzy clustering method based on FCS, belonging to the field of computer data mining application.
Background
Clustering can reveal the knowledge and structure hidden in data without supervision, and has therefore attracted wide attention in the era of big data. According to the range of membership values, clustering methods can be divided into hard clustering and fuzzy clustering. In hard clustering, a sample belongs entirely to exactly one cluster, as in the K-Means method; in fuzzy clustering, the membership range is extended from the binary values 0 and 1 to the real interval [0,1], so that a sample may belong to several clusters at once, each with a certain degree of membership. Because a sample often relates to several topics, fuzzy clustering methods, represented by FCM (Fuzzy C-Means), have an advantage in clustering accuracy. However, FCM and its derivatives are sensitive to noisy data, which easily degrades their effectiveness. To improve robustness, a fuzzy clustering method based on intra-class compactness and inter-class separation (FCS, Fuzzy Compactness and Separation) was proposed; it adds the inter-class distance to the objective function as a penalty term and achieves good experimental results.
Fuzzy clustering methods, including FCM and FCS, typically work on a single representation, or view, of the objects to be clustered. Real-world data, however, is increasingly complex and increasingly multi-view in character: the same sample may look quite different when observed from different angles. Multi-view data has multiple representations, and the different representations may complement each other. Although the data can be clustered from any single representation, the result is then one-sided; only collaborative clustering over multiple representations can form a complete understanding of the data.
Integrating clustering results over multiple representations on the basis of traditional clustering methods is the main approach to multi-view clustering. Building on spectral clustering, a multi-view spectral clustering method based on co-regularization has been proposed. The RMKMC (Robust Multi-View K-Means Clustering) method has been proposed to process large-scale data effectively. In addition, researchers have proposed the MinimaxFCM method, which uses a min-max optimization to reduce the weight of high-weight views and thereby balances the clustering results among the views. Multi-view clustering methods produce a collaborative clustering result by combining the different representations of the data, which helps improve clustering accuracy; however, noisy data has received little attention, so when noise is present these methods are easily disturbed when updating the centroids, which degrades clustering effectiveness.
Disclosure of Invention
In order to solve the technical problem, the invention provides a multi-view fuzzy clustering method based on FCS, which comprises the following steps:
(1) inputting a multi-view data set X = {X^(1), X^(2), …, X^(P)}, where P is the number of views and X is the sample matrix composed of all views; the row vector x_i of X is the attribute features of the i-th sample, 1 ≤ i ≤ N, and N is the total number of samples; X^(p) is the sample matrix of the p-th view, 1 ≤ p ≤ P, and the row vector x_i^(p) of X^(p) is the attribute features of the i-th sample in the p-th view; normalizing X; inputting the number of clusters K, the parameters γ, m and β, and the threshold parameter ξ;
(2) initializing the centroids of each view from the data set X obtained in step (1) to obtain the centroid matrix V = {V^(1), V^(2), …, V^(P)}, where the row vector v_c^(p) of V^(p) is the attribute features of the c-th centroid in the p-th view, 1 ≤ c ≤ K; initializing the weight of each view to obtain the weight vector α; and initializing t = 0;
(3) iterating alternately: computing the inter-class separation vector η^(p) and updating the membership matrix U, the weight vector α and the centroid matrix V until the convergence condition is reached;
(4) computing the cluster to which each sample belongs from the membership matrix U obtained in step (3), and representing the clustering result by the vector q.
Further, the step (1) comprises the following steps:
1) inputting the clustering object as a multi-view data set X = {X^(1), X^(2), …, X^(P)};
2) normalizing the data set X input in step 1): X = (X - I × x_min) / (I × (x_max - x_min)), where I is an N-dimensional column vector whose entries are all 1, x_min and x_max are the vectors formed by the minimum and maximum values of each feature dimension in the data set X, and the division is taken element-wise;
3) inputting the number of clusters K, the parameters γ, m and β, and the threshold parameter ξ.
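The normalization in step 2) is a per-feature min-max scaling. A minimal sketch in plain Python follows; the function names and the list-of-rows data layout are illustrative assumptions, and the handling of a constant feature (where x_max equals x_min) is an added assumption, since the formula leaves that case undefined.

```python
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are features


def min_max_normalize(view: Matrix) -> Matrix:
    """Scale every feature (column) of one view into [0, 1]."""
    n_features = len(view[0])
    col_min = [min(row[d] for row in view) for d in range(n_features)]
    col_max = [max(row[d] for row in view) for d in range(n_features)]
    normalized = []
    for row in view:
        new_row = []
        for d in range(n_features):
            span = col_max[d] - col_min[d]
            # Assumption: a constant feature (span == 0) is mapped to 0.0.
            new_row.append((row[d] - col_min[d]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized


def normalize_views(views: List[Matrix]) -> List[Matrix]:
    """Apply the same scaling independently to each view X^(p)."""
    return [min_max_normalize(view) for view in views]
```

Scaling each view separately keeps features with large numeric ranges from dominating the squared distances used later.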
Further, the step (2) comprises the following steps:
1) randomly selecting K samples as the initial centroids from the normalized data set X obtained in step 2) of step (1), to obtain the centroid matrix V;
2) initializing the weight of each view as α^(p) = 1/P, to obtain the weight vector α.
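The initialization can be sketched as below. The function name, the seed parameter and the choice to reuse the same K sample indices in every view are assumptions made for illustration; the text only states that K samples are chosen at random and that each view starts with weight 1/P.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows are samples, columns are features


def initialize(views: List[Matrix], k: int, seed: int = 0) -> Tuple[List[Matrix], List[float]]:
    """Return the initial centroid matrices V^(p) and the weight vector alpha."""
    n_samples = len(views[0])
    rng = random.Random(seed)
    # Assumption: the same K sample indices seed the centroids of every view.
    chosen = rng.sample(range(n_samples), k)
    centroids = [[list(view[i]) for i in chosen] for view in views]
    # Every view starts with the same weight 1/P.
    weights = [1.0 / len(views)] * len(views)
    return centroids, weights
```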
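Further, the step (3) comprises the following steps:
1) to 4) computing the inter-class separation vector η^(p) and updating the membership matrix U, the centroid matrix V and the weight vector α according to formulas (8), (11), (14) and (16);
5) t = t + 1;
6) judging whether the convergence condition ‖V^(t) - V^(t-1)‖ < ξ is satisfied; if so, stopping the iteration; otherwise, continuing the iteration.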
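The stopping test compares the centroids of two consecutive iterations. The sketch below assumes the norm is the Frobenius norm taken over the centroids of all views together; the text does not say which norm is meant, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are centroids, columns are features


def centroid_shift(v_new: List[Matrix], v_old: List[Matrix]) -> float:
    """Frobenius norm of V(t) - V(t-1), accumulated over all views (assumed norm)."""
    total = 0.0
    for view_new, view_old in zip(v_new, v_old):
        for row_new, row_old in zip(view_new, view_old):
            for a, b in zip(row_new, row_old):
                total += (a - b) ** 2
    return math.sqrt(total)


def has_converged(v_new: List[Matrix], v_old: List[Matrix], xi: float) -> bool:
    """True when the centroids moved by less than the threshold xi."""
    return centroid_shift(v_new, v_old) < xi
```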
Further, in step (4), the cluster in which sample x_i has the highest membership is taken as its class label, calculated as: q_i = arg max_{1 ≤ c ≤ K} u_ci.
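Taking the cluster with the highest membership amounts to a column-wise arg max over the K × N membership matrix. A minimal sketch, with U stored as a list of K rows; the 0-based labels and the tie-breaking toward the lowest cluster index are assumptions.

```python
from typing import List


def assign_labels(u: List[List[float]]) -> List[int]:
    """u[c][i] is the membership of sample i in cluster c; returns q as 0-based labels."""
    n_clusters = len(u)
    n_samples = len(u[0])
    # max() returns the first maximal index, so ties go to the lowest cluster index.
    return [max(range(n_clusters), key=lambda c: u[c][i]) for i in range(n_samples)]
```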
The method extends the FCS method to multi-view data and can effectively handle multi-view clustering problems. The FCS-based multi-view fuzzy clustering method (MvFCS) assigns different weights to the views according to their importance and clusters the views collaboratively, which helps improve clustering accuracy; at the same time, it inherits the robustness of the FCS method and reduces the influence of noise on each centroid.
Detailed Description
The FCM method is sensitive to noisy data; the FCS method considers intra-class compactness and inter-class separation at the same time, which enhances robustness. Let the data set X contain N samples, denoted X = {x_1, …, x_N}, with feature dimension D; let v_1, …, v_K be the K centroids, and let U denote the K × N membership matrix. The objective function of FCS and its constraint are given by formula (1) and formula (2), respectively.
Here, m is a parameter controlling the degree of fuzziness, u_ci is the membership of the i-th object in the c-th cluster, and u_ci ∈ [0,1]. Formula (1) consists of two terms: the first represents the intra-class distance and the second the inter-class separation. x̄ is the center of all samples, and η is a K-dimensional vector that is computed during the clustering process.
The FCS method takes both the intra-class and the inter-class distance into account: it optimizes the objective function J_FCS by decreasing the intra-class distance and increasing the inter-class distance, which effectively reduces the influence of noisy data.
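For orientation, the single-view FCS objective is usually written in the literature in the form below. This is the standard formulation, given here on the assumption that formulas (1) and (2) of this document follow it; the document's own formulas are not reproduced in the text, so the exact notation may differ.

```latex
% Standard FCS objective and constraint (assumed to correspond to formulas (1) and (2)).
\begin{align}
J_{FCS} &= \sum_{c=1}^{K}\sum_{i=1}^{N} u_{ci}^{m}\,\lVert x_i - v_c\rVert^{2}
         \;-\; \sum_{c=1}^{K}\sum_{i=1}^{N} \eta_c\,u_{ci}^{m}\,\lVert v_c - \bar{x}\rVert^{2}, \\
\text{s.t.}\quad & \sum_{c=1}^{K} u_{ci} = 1,\qquad u_{ci}\in[0,1],\quad 1\le i\le N .
\end{align}
% Usual choice of the separation coefficient, with 0 <= beta <= 1:
\begin{equation}
\eta_c = \frac{\beta}{4}\cdot
         \frac{\min_{c'\neq c}\lVert v_c - v_{c'}\rVert^{2}}
              {\max_{k}\lVert v_k - \bar{x}\rVert^{2}}
\end{equation}
```

In this form, the first sum is the intra-class term and the second, subtracted, sum is the inter-class term, which matches the two-term description in the text.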
In order to cluster multi-view data while reducing sensitivity to noisy data, the FCS method is combined with multi-view fuzzy clustering, yielding the FCS-based multi-view fuzzy clustering method. The method assigns a weight to each view and combines the results of the views for collaborative clustering. During clustering, the view with the highest weight is optimized by a min-max method, so that the membership matrix is consistent across the views.
For a multi-view data set X = {X^(1), …, X^(P)} with P views and N objects, the objective function of the FCS-based multi-view fuzzy clustering method is given by formula (4), and formulas (6) and (7) are the constraints.
Here, x_j^(p) denotes the j-th sample in the p-th view; V^(p) is the K × D^(p) centroid matrix of the p-th view, where D^(p) is the number of features of the samples in the p-th view; v_c^(p) denotes the c-th centroid of the p-th view; x̄^(p) denotes the center point of all samples of the p-th view; α^(p) ∈ [0,1] is the weight of the p-th view; the parameter γ ∈ [0,1) controls the distribution of weights over the views; and η^(p) is a K-dimensional vector representing the inter-class separation in the p-th view, updated automatically according to the positions of the centroids.
The FCS-based multi-view fuzzy clustering method produces the multi-view clustering result by optimizing the objective function of formula (4) subject to the constraints of formulas (6) and (7). During clustering, the view weights α^(p) are first used to integrate the information of the different views, and the membership matrix U is then used to return a clustering result that is consistent across the views.
Among the variables to be solved, η_c^(p) is given directly by formula (8). The expressions for the membership matrix U, the centroid matrices V^(p) and the weight vector α are derived by the Lagrange multiplier method, and the constructed Lagrangian function is formula (9),
where J_MvFCS is as shown in formula (4), and λ_i and μ are the Lagrange multipliers.
Treating the centroid matrices and the weights as constants and the membership u_ci as the variable, taking the derivative of formula (9) with respect to u_ci and setting it to 0 yields formula (10).
Introducing a shorthand for the terms involved, the update of u_ci is obtained as formula (11).
For an object x_i, if there exists a centroid v_c such that
the numerator of formula (11) is less than or equal to 0, then u_ci = 1, and u_ti = 0 for every t ∈ {1, 2, …, K} with t ≠ c.
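The crisp-assignment rule can be combined with an FCM-style normalization as sketched below. The input `adjusted` is a hypothetical stand-in for the per-cluster quantity tested against 0 in the text (for instance a weighted intra-class distance minus the η-weighted separation term); its exact definition belongs to formula (11), which is not reproduced here, so this is an illustration of the rule rather than the document's own update formula.

```python
from typing import List


def update_memberships(adjusted: List[float], m: float) -> List[float]:
    """Memberships of one sample in K clusters, given its adjusted distances.

    adjusted[c] is an assumed placeholder for the term tested against 0;
    m > 1 is the fuzziness parameter.
    """
    k = len(adjusted)
    non_positive = [c for c in range(k) if adjusted[c] <= 0.0]
    if non_positive:
        # Crisp case: the sample belongs fully to one cluster.
        # Assumption: if several clusters qualify, the smallest value wins.
        winner = min(non_positive, key=lambda c: adjusted[c])
        return [1.0 if c == winner else 0.0 for c in range(k)]
    # Fuzzy case: memberships proportional to adjusted^(-1/(m-1)), summing to 1.
    power = -1.0 / (m - 1.0)
    raw = [a ** power for a in adjusted]
    total = sum(raw)
    return [r / total for r in raw]
```

The crisp branch guards the fuzzy branch: a non-positive value raised to a negative power would be undefined or would break the normalization.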
Treating the weights and the memberships as constants and the centroid v_c^(p) as the variable, taking the derivative of formula (9) with respect to v_c^(p) and setting it to 0 yields formula (13),
from which the update of v_c^(p) follows as shown in formula (14).
Treating the membership matrix and the centroid matrices as constants and the weight α^(p) as the variable, taking the derivative of formula (9) with respect to α^(p) and setting it to 0 yields formula (15).
From formulas (15) and (6), the update formula of the weight α^(p) is obtained as formula (16).
During clustering, formulas (8), (11), (14) and (16) are applied iteratively to obtain the clustering result of the FCS-based multi-view fuzzy clustering method.
The detailed steps of the FCS-based multi-view fuzzy clustering method are as follows. First, input the multi-view data set X = {X^(1), X^(2), …, X^(P)}, the number of clusters K, the parameters γ, m and β, and the threshold parameter ξ. Then initialize the centroid matrix V^(p) of each view and the weight vector α. After clustering, output the clustering result vector (the labels), the membership matrix U and the centroid matrix V^(p) of each view.
The time complexity of the FCS-based multi-view fuzzy clustering method depends mainly on four parts: updating η, the membership matrix, the centroid matrices and the view weight vector. Updating η once costs O(P·K^2), and updating the memberships, the centroid matrices and the view weights once each costs O(P·N·K), so the total time complexity of the method is O(t(3·P·N·K + P·K^2)), where t is the number of iterations.
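To get a feel for the magnitudes, the expression can be evaluated for a concrete data set. The sketch below simply evaluates 3·P·N·K + P·K^2 as a count of elementary update steps; it ignores constant factors and the feature dimension, and the function names and the iteration count of 50 are illustrative assumptions.

```python
def per_iteration_cost(p: int, n: int, k: int) -> int:
    """3*P*N*K for the U, V and alpha updates plus P*K^2 for the eta update."""
    return 3 * p * n * k + p * k * k


def total_cost(p: int, n: int, k: int, t: int) -> int:
    """Cost over t iterations, in the same abstract units."""
    return t * per_iteration_cost(p, n, k)


# Multiple Features (MF) data set from Table 1: P = 6 views, N = 2000 samples, K = 10 clusters.
mf_cost = per_iteration_cost(6, 2000, 10)  # 360000 + 600 = 360600
```

For the MF data set the P·K^2 term contributes 600 of 360600 units, so the cost is dominated by the three O(P·N·K) updates whenever N is much larger than K.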
In order to verify the effectiveness of the FCS-based multi-view fuzzy clustering method, 4 multi-view data sets are selected for experiments, and the information of each data set is shown in Table 1.
For comparison, three methods were selected: FCS, RMKMC and MinimaxFCM. FCS is a single-view clustering method, while RMKMC and MinimaxFCM are multi-view clustering methods. In the experiments, FCS is first run on each view separately; the worst single-view result is recorded as FCS1 and the best as FCS2. FCS is then run on the concatenation of the features of all views, and the result is recorded as FCS. The results are evaluated with three criteria: accuracy, F-Measure and NMI.
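Of the three criteria, NMI (normalized mutual information) is the easiest to compute directly from two label vectors. The sketch below normalizes the mutual information by the geometric mean of the two entropies; this normalization and the handling of single-cluster partitions are assumptions, since the text does not state which NMI variant was used.

```python
import math
from collections import Counter
from typing import Sequence


def nmi(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Normalized mutual information, I(T; C) / sqrt(H(T) * H(C)) (assumed variant)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts: Counter) -> float:
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = 0.0
    for (a, b), c in count_joint.items():
        mutual_info += (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))

    h_true, h_pred = entropy(count_true), entropy(count_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # Assumption: two single-cluster partitions agree fully; otherwise score 0.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

NMI does not depend on how clusters are numbered, so a clustering that matches the ground truth up to a relabeling scores 1.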
In order to evaluate the robustness of the FCS-based multi-view fuzzy clustering method, after clustering on the original data sets is completed, 10%-15% of noise data is added to each data set and the clustering performance of the methods is compared again. The number of noise samples added to each data set is shown in Table 1.
The experimental data comprise data without added noise and data with added noise, and two groups of experiments were organized accordingly. The experiments without added noise compare multi-view clustering with single-view clustering and compare the performance of the three multi-view clustering methods MvFCS, RMKMC and MinimaxFCM; the experiments with added noise verify the robustness of the FCS-based multi-view fuzzy clustering method.
The experimental results on the four data sets IS, MF, MMKSD and Cal are shown in Tables 2-5, respectively; each table contains the results of both groups of experiments, without and with added noise.
TABLE 1 Characteristics of the multi-view data sets
Data set | Number of views | Number of clusters | Number of samples | Number of features | Number of noise samples
Image Segmentation (IS) | 2 | 7 | 210 | 19 | 30
Multiple Features (MF) | 6 | 10 | 2000 | 649 | 200
MEU-Mobile KSD (MMKSD) | 6 | 7 | 357 | 71 | 50
Caltech101 (Cal) | 6 | 7 | 769 | 3766 | 80
TABLE 2 Experimental results on the IS data set
TABLE 3 Experimental results on the MF data set
TABLE 4 Experimental results on the MMKSD data set
TABLE 5 Experimental results on the Cal data set
Table 2 gives the experimental results on the IS data set. Without added noise, the accuracy of FCS2 is 58.09% and that of FCS is 60%, while among the three multi-view clustering methods the lowest accuracy (RMKMC) is 62.38% and the highest (the FCS-based multi-view fuzzy clustering method) is 66.19%. These results show that multi-view clustering generally achieves higher accuracy than single-view clustering, and that the FCS-based multi-view fuzzy clustering method completes the clustering task more effectively than the two multi-view methods RMKMC and MinimaxFCM. With added noise, the FCS method shows excellent noise resistance, and the FCS-based multi-view fuzzy clustering method inherits this advantage: its accuracy changes only slightly, increasing by 0.47%, whereas the accuracy of RMKMC drops by 4.29% and that of MinimaxFCM by 7.14%, so both are affected considerably more.
On the MF data set (Table 3), without added noise, RMKMC is less accurate than MinimaxFCM and the FCS-based multi-view fuzzy clustering method but still better than the single-view clustering methods, while MinimaxFCM and the FCS-based multi-view fuzzy clustering method have similar accuracy. With added noise, the accuracy of MinimaxFCM drops by 11.05%, its F-Measure by 12.50% and its NMI by 4.99%, whereas the corresponding decreases for the FCS-based multi-view fuzzy clustering method are 1.30%, 1.25% and 0.93%.
As shown in Table 4, on the MMKSD data set all three multi-view clustering methods complete the clustering task effectively when no noise is added. After noise is added, the accuracy of RMKMC drops by 15.96%, its F-Measure by 9.84% and its NMI by 10.06%; the corresponding decreases are 17.08%, 12.78% and 9.47% for MinimaxFCM, and 0.82%, 0.26% and 0.26% for the FCS-based multi-view fuzzy clustering method.
Table 5 gives the experimental results on the Cal data set. Without added noise, the NMI of FCS1 is only 0.1252, which indicates that the features of that view have a large negative effect on multi-view clustering and explains why the multi-view results are slightly lower than FCS2. After noise is added, however, the FCS-based multi-view fuzzy clustering method is still less affected than the other multi-view clustering methods, which shows that it is more robust.
From the experimental results on the above data sets, it can be concluded that (1) multi-view clustering methods complete the clustering task more effectively than single-view clustering methods, and (2) compared with the two multi-view clustering methods RMKMC and MinimaxFCM, MvFCS not only achieves higher accuracy when no noise is added but also shows higher robustness after noise is added.

Claims (5)

1. A multi-view fuzzy clustering method based on FCS, characterized by comprising the following steps:
(1) inputting a multi-view data set X = {X^(1), X^(2), …, X^(P)}, where P is the number of views and X is the sample matrix composed of all views; the row vector x_i of X is the attribute features of the i-th sample, 1 ≤ i ≤ N, and N is the total number of samples; X^(p) is the sample matrix of the p-th view, 1 ≤ p ≤ P, and the row vector x_i^(p) of X^(p) is the attribute features of the i-th sample in the p-th view; normalizing X; inputting the number of clusters K, the parameters γ, m and β, and the threshold parameter ξ;
(2) initializing the centroids of each view from the data set X obtained in step (1) to obtain the centroid matrix V = {V^(1), V^(2), …, V^(P)}, where the row vector v_c^(p) of V^(p) is the attribute features of the c-th centroid in the p-th view, 1 ≤ c ≤ K; initializing the weight of each view to obtain the weight vector α; and initializing t = 0;
(3) iterating alternately: computing the inter-class separation vector η^(p) and updating the membership matrix U, the weight vector α and the centroid matrix V until the convergence condition is reached;
(4) computing the cluster to which each sample belongs from the membership matrix U obtained in step (3), and representing the clustering result by the vector q.
2. The FCS-based multi-view fuzzy clustering method of claim 1, wherein: the step (1) comprises the following steps:
1) inputting the clustering object as a multi-view data set X = {X^(1), X^(2), …, X^(P)};
2) normalizing the data set X input in step 1): X = (X - I × x_min) / (I × (x_max - x_min)), where I is an N-dimensional column vector whose entries are all 1, x_min and x_max are the vectors formed by the minimum and maximum values of each feature dimension in the data set X, and the division is taken element-wise;
3) inputting the number of clusters K, the parameters γ, m and β, and the threshold parameter ξ.
3. The FCS-based multi-view fuzzy clustering method of claim 1, wherein: the step (2) comprises the following steps:
1) randomly selecting K samples as the initial centroids from the data set X obtained in step 2) of claim 2, to obtain the centroid matrix V;
2) initializing the weight of each view as α^(p) = 1/P, to obtain the weight vector α.
4. The FCS-based multi-view fuzzy clustering method of claim 1, wherein: the step (3) comprises the following steps:
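1) to 4) computing the inter-class separation vector η^(p) and updating the membership matrix U, the weight vector α and the centroid matrix V;
5) t = t + 1;
6) judging whether the convergence condition ‖V^(t) - V^(t-1)‖ < ξ is satisfied; if so, stopping the iteration; otherwise, continuing the iteration.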
5. The FCS-based multi-view fuzzy clustering method of claim 1, wherein in step (4) the cluster in which sample x_i has the highest membership is taken as its class label, calculated as: q_i = arg max_{1 ≤ c ≤ K} u_ci.
CN201910205968.0A 2019-03-19 2019-03-19 A kind of multiple view fuzzy clustering method based on FCS Pending CN110096603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910205968.0A CN110096603A (en) 2019-03-19 2019-03-19 A kind of multiple view fuzzy clustering method based on FCS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910205968.0A CN110096603A (en) 2019-03-19 2019-03-19 A kind of multiple view fuzzy clustering method based on FCS

Publications (1)

Publication Number Publication Date
CN110096603A true CN110096603A (en) 2019-08-06

Family

ID=67443257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910205968.0A Pending CN110096603A (en) 2019-03-19 2019-03-19 A kind of multiple view fuzzy clustering method based on FCS

Country Status (1)

Country Link
CN (1) CN110096603A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723540A (en) * 2021-09-02 2021-11-30 济南大学 Unmanned scene clustering method and system based on multiple views
CN113723540B (en) * 2021-09-02 2024-04-19 济南大学 Unmanned scene clustering method and system based on multiple views

Similar Documents

Publication Publication Date Title
CN108280491B (en) K-means clustering method for differential privacy protection
Wang et al. Distance metric learning for soft subspace clustering in composite kernel space
CN112116017A (en) Data dimension reduction method based on kernel maintenance
CN113076970A (en) Gaussian mixture model clustering machine learning method under deficiency condition
CN113408610B (en) Image identification method based on adaptive matrix iteration extreme learning machine
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN115840900A (en) Personalized federal learning method and system based on self-adaptive clustering layering
CN110610188A (en) Markov distance-based shadow rough fuzzy clustering method
Zhang et al. Unsupervised EA-based fuzzy clustering for image segmentation
Rodríguez et al. A new fuzzy clustering algorithm for interval-valued data based on City-Block distance
CN111144443A (en) Method for improving ultralimit learning machine to solve classification problem based on intelligent optimization algorithm
CN118351371A (en) Small sample image classification method and system based on countermeasure training and meta learning
CN110096603A (en) A kind of multiple view fuzzy clustering method based on FCS
CN114399653A (en) Fast multi-view discrete clustering method and system based on anchor point diagram
CN118133931A (en) Safe and efficient federal learning system and method based on generation of countermeasure network
CN104881688A (en) Two-stage clustering algorithm based on difference evolution and fuzzy C-means
Honda et al. Item membership fuzzification in fuzzy co-clustering based on multinomial mixture concept
CN117056763A (en) Community discovery method based on variogram embedding
CN116912567A (en) Image classification method based on pseudo tag semi-supervised learning
CN111126467A (en) Remote sensing image space spectrum clustering method based on multi-target sine and cosine algorithm
Coulson et al. Growing hierarchical self-organising representation map (GHSORM)
Kanzawa A maximizing model of Bezdek-like spherical fuzzy c-means clustering
De Castro et al. An evolutionary clustering technique with local search to design RBF neural network classifiers
Shiao et al. Implementation and comparison of SVM-based multi-task learning methods
CN114037931A (en) Multi-view discrimination method of self-adaptive weight

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination