CN115017988A - Competitive clustering method for state anomaly diagnosis - Google Patents


Info

Publication number: CN115017988A
Application number: CN202210619146.9A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 王培红, 徐璐璐, 汤若鑫, 高俊彦, 陈文菲
Current and original assignee: Southeast University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Southeast University
Priority to CN202210619146.9A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Prior art keywords: cluster, sample, class, clustering

Landscapes

  • Image Analysis (AREA)
Abstract

The invention discloses a competitive clustering method for state anomaly diagnosis, relating to the technical field of data mining. It solves the technical problems that conventional clustering methods cannot effectively retain small abnormal sample classes and have poor clustering performance. As iteration proceeds, the cardinality of each cluster is calculated, and spurious clusters smaller than a set threshold are eliminated. Through competition among clusters, the number of clusters gradually decreases until it stabilizes; when the cluster-center positions no longer change or the maximum number of iterations is reached, the algorithm terminates and outputs the result. Clustering of the data set is thus realized, clustering performance is improved, and applications based on the clustering characteristics of data samples are expanded.

Description

Competitive clustering method for state anomaly diagnosis
Technical Field
The application relates to the technical field of data mining, in particular to big data processing technology, and specifically to a competitive clustering method for state anomaly diagnosis.
Background
Clustering is one of the most common techniques in the field of data mining for finding unknown object classes in a data set. Cluster analysis has broad application prospects in customer segmentation, pattern recognition, medical decision-making, anomaly detection, and other fields. Traditional clustering algorithms handle balanced data well, but much real-world data is unbalanced: in fields such as medical diagnosis and fault diagnosis, the amount of data exhibiting normal behavior far exceeds the amount exhibiting abnormal behavior. An unbalanced data set is one in which the number and density of data objects belonging to one category differ greatly from those of the other categories; generally, the class with more data objects is called the large class, and the class with fewer data objects is called the small class. Current clustering methods mainly reflect the clustering characteristics of the balanced sample classes, while the small abnormal (or fault) sample classes are often ignored, or part of the objects of a large class are assigned to the small classes so that the resulting classes have relatively uniform sizes; this limits applications based on the clustering characteristics of data samples.
To solve the clustering problem of unbalanced data, scholars have proposed various methods from different angles, falling into three categories: data preprocessing, multi-center-point methods, and objective-function optimization. The first is data preprocessing, which under-samples or over-samples the data set before clustering. Under-sampling uses only a representative subset of the large class, so a large amount of effective information in the large class is ignored and the clustering effect suffers. Over-sampling increases the number of objects in the small class so that the data set reaches a balanced state, but it may cause overfitting on the one hand and introduce noise into the data set on the other.
The second is the multi-center method, which addresses the "uniform effect" of fuzzy clustering algorithms by using multiple class centers instead of a single center to represent a class. However, for unbalanced clustering problems in which the large class is distributed extremely unevenly, this method cannot fully reflect the data distribution characteristics, which reduces the effectiveness of the algorithm.
The third is objective-function optimization, which proposes new algorithms from the viewpoint of optimizing the objective function and derives the corresponding clusters to alleviate the uniform effect. Compared with earlier clustering algorithms this is a direct new approach with practical value, but it generally involves solving for objective-function parameters, which is a nonlinear optimization problem for which a global optimum is hard to obtain; the clustering result therefore has relatively large randomness, which affects clustering precision.
At present, there is no effective clustering method that can both automatically calculate the number of clusters and effectively retain the small abnormal (or fault) sample classes.
Disclosure of Invention
The application provides a competitive clustering method for state anomaly diagnosis, which aims to effectively retain small abnormal (or fault) sample classes while automatically calculating the number of clusters and improving clustering performance.
The technical purpose of the application is realized by the following technical scheme:
A competitive clustering method for state anomaly diagnosis, comprising:
S1: inputting a data set U; setting the initial number of clusters c = c_max; determining the fuzzy weighting index m, the initial value η_0, the iteration time constant τ, and the cluster cardinality threshold N; randomly generating a first cluster-center set V1; and obtaining the initial sample memberships of the data set U through a fuzzy C-means clustering algorithm; wherein U = {x_j | j = 1, ..., n}, x_j ∈ U denotes a sample in the data set U, and n denotes the total number of samples of U; V1 = {v_i | i = 1, ..., c}, c denotes the total number of cluster centers of the data set U, and v_i denotes the center of the i-th cluster;
S2: calculating the Euclidean distance between a sample x_j and a cluster center v_i; obtaining a scaling coefficient α from the Euclidean distance and the initial sample membership; and constructing the objective function of the competitive clustering algorithm from the Euclidean distance and the scaling coefficient α;
S3: calculating the sample memberships through the objective function;
S4: calculating the cardinality N_i of the i-th cluster; if N_i is less than the cardinality threshold N, eliminating that cluster to obtain the sample memberships and a second cluster-center set V2′ corresponding to the retained clusters;
S5: calculating the compactness C_i of each cluster from the sample memberships and the cluster-center set V2′, and then updating the sample memberships and cluster centers according to the cluster compactness C_i to obtain the final sample memberships and second cluster-center set V2 of this iteration;
S6: when the cluster-center positions no longer change or the maximum number of iterations is reached, outputting the final result to finish clustering; otherwise, repeating steps S2 to S5.
Further, in step S2, the scaling coefficient α is obtained from the Euclidean distance and the initial sample membership, expressed as:

(equation for α rendered as an image in the source; not reproduced)

η(k) = η_0 · exp(−k/τ);

where d_ij denotes the Euclidean distance from sample x_j to cluster center v_i; u_ij denotes the membership of the j-th sample in the i-th cluster; m denotes the fuzzy weighting index, taken as 2; and k denotes the iteration number.

The objective function is then expressed as:

(two equations rendered as images in the source; not reproduced)
Further, in step S3, the sample membership is calculated from the objective function by the Lagrange multiplier method, expressed as:

(three equations rendered as images in the source; not reproduced)
Further, in step S5, the cluster compactness C_i of each cluster is expressed as:

(equation for C_i rendered as an image in the source; not reproduced)

where:

T_i = {x_j | u_ij > u_lj; l = 1, 2, ..., c; l ≠ i};

η_j = ||x_j − v_i||;

(equation for μ_i rendered as an image in the source; not reproduced)

T_i denotes the set of samples assigned to the i-th cluster; |T_i| denotes the number of samples in that set; η_j denotes the filtered value of sample x_j; and μ_i denotes the average distance from the samples of the i-th cluster to the cluster center v_i.
Further, in step S5, the sample memberships and cluster centers are updated according to the cluster compactness C_i, expressed as:

(four equations rendered as images in the source; not reproduced)

where f_i denotes the coefficient assigned to the i-th cluster; S_i is the normalized compactness of the i-th cluster; and S_min is the minimum of S_i.
The beneficial effect of this application lies in the following: on the basis of automatically calculating the number of clusters, the objective function of the competitive clustering algorithm is improved so that sample size plays a role in the clustering cost function, which weakens the interference of sample-size differences on clustering decisions. The resulting new membership-calculation method can adaptively adjust the memberships of the large and small classes, improving the clustering effect of the algorithm on unbalanced data sets. Small abnormal (or fault) sample classes are effectively retained while the number of clusters is calculated automatically, improving clustering performance and expanding applications based on the clustering characteristics of data samples.
Drawings
FIG. 1 is a flow chart of a method described herein;
fig. 2 is a schematic diagram illustrating a comparison between the clustering result and other clustering algorithms in the embodiment of the present application.
Detailed Description
The technical solution of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the method of the present application, a competitive clustering method for state anomaly diagnosis. Three unbalanced classes in the Aggregation data set of the UCI standard data sets are selected as the verification data set U of the present invention. The method includes the following steps:
S1: inputting the data set U; setting the initial number of clusters c = c_max = 10, the fuzzy weighting index m = 2, the initial value η_0 = 1.3, the iteration time constant τ = 10, and the cluster cardinality threshold N = 7; randomly generating c_max cluster centers; and obtaining the initial sample memberships of the data set U through a fuzzy C-means clustering algorithm.
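Step S1 initialises the memberships with the fuzzy C-means algorithm. The sketch below illustrates that initialisation for a fixed set of centers, assuming the standard FCM membership formula (the patent does not spell it out); `init_fcm_membership` and `eps` are illustrative names, not from the source.

```python
import numpy as np

def init_fcm_membership(X, V, m=2.0, eps=1e-12):
    """Sketch of the S1 membership initialisation for fixed centers V.

    Assumes the standard FCM rule u_ij = d_ij^(-2/(m-1)) / sum_l d_lj^(-2/(m-1)).
    Rows of the returned matrix index the c clusters, columns the n samples;
    eps guards against division by zero when a sample coincides with a center.
    """
    # squared Euclidean distances d_ij^2, shape (c, n)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    # normalise over clusters so each sample's memberships sum to 1
    return inv / inv.sum(axis=0, keepdims=True)
```

In a full run these memberships would be refined by the competitive updates of steps S2–S5.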
S2: calculating a sample x j And cluster heart v i And obtaining a proportional coefficient alpha according to the Euclidean distance and the initial sample membership degree, and constructing a target function of a competitive clustering algorithm according to the Euclidean distance and the proportional coefficient alpha.
The Euclidean distance d_ij is calculated as:

d_ij = ||x_j − v_i|| = ( Σ_{p=1}^{P} (x_jp − v_ip)² )^(1/2);

where d_ij denotes the distance from sample x_j to cluster center v_i, and P denotes the dimension of x_j.
From the obtained d_ij and u_ij, the scaling coefficient α is calculated as:

(equation for α rendered as an image in the source; not reproduced)

η(k) = η_0 · exp(−k/τ).
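The distance computation and the decay schedule η(k) = η_0·exp(−k/τ) of step S2 can be sketched as follows. `pairwise_dist2` and `eta` are illustrative names, and the parameter defaults follow the embodiment (η_0 = 1.3, τ = 10); the α formula itself is an equation image in the source and is not reproduced here.

```python
import numpy as np

def pairwise_dist2(X, V):
    """Squared Euclidean distances d_ij^2 = ||x_j - v_i||^2 of step S2.

    Rows of the result index the c cluster centers, columns the n samples.
    """
    return ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

def eta(k, eta0=1.3, tau=10.0):
    """Decay schedule eta(k) = eta0 * exp(-k / tau).

    The competition weight shrinks with the iteration count k, so early
    iterations compete aggressively and later ones refine the partition.
    """
    return eta0 * np.exp(-k / tau)
```

At k = τ the weight has decayed to η_0/e, roughly a third of its initial value.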
The final objective function is expressed as:

(two equations rendered as images in the source; not reproduced)

where u_ij denotes the membership of the j-th sample in the i-th cluster; m denotes the fuzzy weighting index, taken as 2; and k denotes the iteration number.
S3: and calculating the sample membership degree through the target function.
Specifically, the sample membership is calculated as:

(equation rendered as an image in the source; not reproduced)

where N_i denotes the cardinality of the i-th cluster (its defining equation is rendered as an image in the source).
S4: calculating the cardinality N of each class cluster i If N is present i And if the sample membership degree is less than the radix threshold value 7, eliminating the class cluster to obtain a sample membership degree and a second cluster center set V2' corresponding to the reserved class cluster.
S5: besides considering the influence of the class size on the objective function, the influence of the sample distribution of each class on the clustering result must be noted. The present application presents a cluster compactness C i The calculation formula of (2) is used for measuring the distribution state of the samples in the class, so as to obtain the final sample membership and the second cluster center set V2, C of the current iteration i Is expressed as:
Figure BDA0003674412930000054
where:

T_i = {x_j | u_ij > u_lj; l = 1, 2, ..., c; l ≠ i};

η_j = ||x_j − v_i||;

(equation for μ_i rendered as an image in the source; not reproduced)

T_i denotes the set of samples assigned to the i-th cluster; |T_i| denotes the number of samples in that set; η_j denotes the filtered value of sample x_j; and μ_i denotes the average distance from the samples of the i-th cluster to the cluster center v_i.
From the cluster compactness formula it can be seen that the smaller the value of C_i, the more concentrated the class and the higher its compactness; conversely, the larger the value, the more dispersed the class and the lower its compactness.
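As an illustration of the "smaller C_i means tighter cluster" property, the sketch below uses the variance of the sample-to-center distances as a stand-in compactness. The patent's exact C_i formula is an equation image in the source, so this is not the patented formula, only a quantity built from the same ingredients (T_i, η_j = ||x_j − v_i||, and the mean distance μ_i) with the same qualitative behaviour.

```python
import numpy as np

def cluster_compactness(X, V, U):
    """Illustrative stand-in for the compactness C_i of step S5.

    T_i is taken as the samples whose membership is highest for cluster i;
    C_i is the variance of the distances eta_j = ||x_j - v_i|| around their
    mean mu_i, so smaller C_i means a tighter cluster. Empty clusters get inf.
    """
    labels = U.argmax(axis=0)                    # hard assignment T_i
    C = np.full(len(V), np.inf)
    for i in range(len(V)):
        pts = X[labels == i]
        if len(pts) == 0:
            continue
        d = np.linalg.norm(pts - V[i], axis=1)   # eta_j for samples in T_i
        C[i] = ((d - d.mean()) ** 2).mean()      # spread around mu_i
    return C
```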
The sample memberships and cluster centers are updated according to the cluster compactness C_i, expressed as:

(four equations rendered as images in the source; not reproduced)

where f_i denotes the coefficient assigned to the i-th cluster; S_i is the normalized compactness of the i-th cluster; and S_min is the minimum of S_i.
S6: competition among clusters, the number of cluster cores is gradually reduced to be stable, when the position of the cluster core is not changed any more or the number of iterations is reached, a final result is output, and clustering is completed; otherwise, steps S2 to S5 are repeated.
The fuzzy C-means clustering algorithm and the competitive clustering algorithm are selected as comparison algorithms. The competitive clustering algorithm evolved from the fuzzy C-means clustering algorithm; its advantage is that it can automatically calculate the number of clusters, whereas the fuzzy C-means clustering algorithm requires the number of clusters to be set in advance. For fairness, the number of clusters obtained by the proposed competitive clustering method for state anomaly (small sample) diagnosis is used as the preset cluster number for the fuzzy C-means clustering algorithm. For the competitive clustering algorithm, η_0 is set accordingly, and the other parameter settings are the same as those of the proposed method.
Fig. 2 compares the clustering results of the three algorithms on the same data set; the cluster-center positions are shown as "+" symbols superimposed on the data, and the final clusters are circled. Fig. 2(a) is the data set used for verification. As seen in Fig. 2(b), the fuzzy C-means clustering algorithm divides the data into 3 classes given an initial setting of 3 clusters, but cannot effectively identify the differences between the large and small classes. As seen in Fig. 2(c), the competitive clustering algorithm still cannot overcome this drawback of the fuzzy C-means clustering algorithm; moreover, its competition mechanism automatically ignores the small class on the right, so the 3 classes are wrongly divided into 2. Since fault points generally resemble a small class, this indicates that the algorithm cannot effectively identify the fault class in some cases. Fig. 2(d) shows the clustering result of the proposed competitive clustering method for state anomaly (small sample) diagnosis on the same data set: the three classes with large differences in number and density are correctly separated, indicating that the algorithm can effectively identify the fault class while automatically calculating the number of clusters.
According to the method, the traditional membership-calculation method is improved so that the memberships of the large and small classes can be adaptively adjusted; the small abnormal (or fault) sample classes are thereby effectively retained, and the clustering effect of the algorithm on unbalanced data sets is improved.
The above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and are not limited thereto; the technical solutions described in the foregoing embodiments of the present invention can be modified or equivalent replaced by those skilled in the art, without departing from the structure of the present invention or exceeding the scope defined by the claims.

Claims (5)

1. A competitive clustering method for state anomaly diagnosis, comprising:
S1: inputting a data set U; setting the initial number of clusters c = c_max; determining the fuzzy weighting index m, the initial value η_0, the iteration time constant τ, and the cluster cardinality threshold N; randomly generating a first cluster-center set V1; and obtaining the initial sample memberships of the data set U through a fuzzy C-means clustering algorithm; wherein U = {x_j | j = 1, ..., n}, x_j ∈ U denotes a sample in the data set U, and n denotes the total number of samples of U; V1 = {v_i | i = 1, ..., c}, c denotes the total number of cluster centers of the data set U, and v_i denotes the center of the i-th cluster;
S2: calculating the Euclidean distance between a sample x_j and a cluster center v_i; obtaining a scaling coefficient α from the Euclidean distance and the initial sample membership; and constructing the objective function of the competitive clustering algorithm from the Euclidean distance and the scaling coefficient α;
S3: calculating the sample memberships through the objective function;
S4: calculating the cardinality N_i of the i-th cluster; if N_i is less than the cardinality threshold N, eliminating that cluster to obtain the sample memberships and a second cluster-center set V2′ corresponding to the retained clusters;
S5: calculating the compactness C_i of each cluster from the sample memberships and the cluster-center set V2′, and then updating the sample memberships and cluster centers according to the cluster compactness C_i to obtain the final sample memberships and second cluster-center set V2 of this iteration;
S6: when the cluster-center positions no longer change or the maximum number of iterations is reached, outputting the final result to finish clustering; otherwise, repeating steps S2 to S5.
2. The competitive clustering method for state anomaly diagnosis according to claim 1, wherein in step S2 the scaling coefficient α is obtained from the Euclidean distance and the initial sample membership, expressed as:

(equation for α rendered as an image in the source; not reproduced)

η(k) = η_0 · exp(−k/τ);

where d_ij denotes the Euclidean distance from sample x_j to cluster center v_i; u_ij denotes the membership of the j-th sample in the i-th cluster; m denotes the fuzzy weighting index, taken as 2; and k denotes the iteration number;

the objective function is then expressed as:

(two equations rendered as images in the source; not reproduced)
3. The competitive clustering method for state anomaly diagnosis according to claim 1, wherein in step S3 the sample membership is calculated from the objective function by the Lagrange multiplier method, expressed as:

(three equations rendered as images in the source; not reproduced)
4. The competitive clustering method for state anomaly diagnosis according to claim 1, wherein in step S5 the cluster compactness C_i of each cluster is expressed as:

(equation for C_i rendered as an image in the source; not reproduced)

where:

T_i = {x_j | u_ij > u_lj; l = 1, 2, ..., c; l ≠ i};

η_j = ||x_j − v_i||;

(equation for μ_i rendered as an image in the source; not reproduced)

T_i denotes the set of samples assigned to the i-th cluster; |T_i| denotes the number of samples in that set; η_j denotes the filtered value of sample x_j; and μ_i denotes the average distance from the samples of the i-th cluster to the cluster center v_i.
5. The competitive clustering method according to claim 1, wherein in step S5 the sample memberships and cluster centers are updated according to the cluster compactness C_i, expressed as:

(four equations rendered as images in the source; not reproduced)

where f_i denotes the coefficient assigned to the i-th cluster; S_i is the normalized compactness of the i-th cluster; and S_min is the minimum of S_i.
CN202210619146.9A 2022-06-01 2022-06-01 Competitive clustering method for state anomaly diagnosis Pending CN115017988A (en)

Priority Applications (1)

Application Number: CN202210619146.9A — Priority Date: 2022-06-01 — Filing Date: 2022-06-01 — Title: Competitive clustering method for state anomaly diagnosis

Publications (1)

Publication Number: CN115017988A — Publication Date: 2022-09-06

Family ID: 83072562

Country Status (1)

CN (1): CN115017988A (en)
Cited By (2)

* Cited by examiner, † Cited by third party

CN116975672A * — Priority date 2023-09-22 — Publication date 2023-10-31 — Assignee: 山东乐普矿用设备股份有限公司 — Temperature monitoring method and system for coal mine belt conveying motor
CN116975672B * — Priority date 2023-09-22 — Publication date 2023-12-15 — Assignee: 山东乐普矿用设备股份有限公司 — Temperature monitoring method and system for coal mine belt conveying motor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination