CN104102726A - Modified K-means clustering algorithm based on hierarchical clustering - Google Patents

Info

Publication number
CN104102726A
Authority
CN
China
Prior art keywords
cluster
class
clustering
data
mean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410350480.4A
Other languages
Chinese (zh)
Inventor
刘晓波
张明明
袁光前
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)

Abstract

The invention discloses a modified K-means clustering algorithm based on hierarchical clustering. The algorithm includes the steps of: calculating the pairwise distances among n objects; constructing n single-member clusters; obtaining a hierarchical clustering of the data; when the data are divided into K classes, calculating the cluster center of each class; and, with the obtained cluster centers as the initial cluster centers of K-means and the K value as the number of K-means clusters, performing K-means clustering to obtain a clustering result. A comparison of the within-group variance and between-group variance of the traditional K-means clustering result and the modified K-means clustering result shows that the modified K-means clustering is more accurate and more reasonable. The respective advantages of hierarchical clustering and K-means clustering are exploited as far as possible, the respective defects of the two clustering methods are avoided, and clustering quality is greatly improved.

Description

Improved K-means clustering algorithm based on hierarchical clustering
Technical field
The invention belongs to the field of aero-engine technology and specifically relates to an improved K-means clustering algorithm based on hierarchical clustering.
Technical background
The aero-engine, as the core power component of an aircraft, bears directly on the safety and performance of aircraft operation. The reliability of the engine affects its service life, its economic benefit, and the safety of passengers' lives and property. Because the engine is an internal component of the aircraft, technicians can generally judge whether it has a problem only after a flight ends, relying on their own experience and simple instruments, and this cannot guarantee an accurate diagnosis. As aero-engine faults grow more complex and diverse, the technicians' ability to recognize them becomes relatively limited. Condition monitoring and fault diagnosis technology therefore plays an important role in aero-engine maintenance: it can detect fault signals without disassembling the engine and use the monitored fault data to judge the working state and development trend of components, so that the fault and its location can be diagnosed quickly and accurately.
To find engine faults accurately and quickly, sensors are generally installed at multiple positions on the engine to collect engine signals. Because the sensors respond to signals from different positions, and a given sensor sometimes cannot detect a fault signal under certain interference, the measurements from sensors at different positions are exploited for their redundancy, complementarity, and cooperation so that data on the fault signal can still be obtained. Data fusion is performed on the signals collected by these sensors to find the fault features and thereby diagnose the engine. How to improve the accuracy and efficiency of aero-engine data fusion has therefore become a pressing problem.
Clustering, as an unsupervised learning method, is widely applied to data processing in many fields. Traditional clustering algorithms have many shortcomings. The hierarchical clustering algorithm, for example, performs well mainly on small databases; if the database contains too many data units, its scalability deteriorates. In addition, a step taken in hierarchical clustering cannot be undone, and data units cannot be exchanged between classes once they have been processed. The K-means algorithm, for its part, depends heavily on its initial values; if the chosen k is unsuitable, the final result may be unsatisfactory.
Summary of the invention
The object of the present invention is to provide an improved K-means clustering algorithm for aero-engine data fusion. Sensors measure vibration displacement signals, and the improved K-means clustering algorithm based on hierarchical clustering is used to fuse the aero-engine data, improving the accuracy and efficiency of data fusion and providing a basis for aero-engine fault diagnosis.
The present invention adopts the following technical scheme to achieve this object. The steps of the improved K-means clustering algorithm based on hierarchical clustering are:
1) Calculate the pairwise distances among the n objects. The distance formula is the Euclidean distance formula, and the calculation yields a distance matrix;
2) Construct n single-member clusters, find the two nearest clusters, and merge them into one class, which reduces the number of clusters by one. Continue in the same way, calculating the distance between each newly generated cluster and the other clusters;
3) Obtain the hierarchical clustering of the data according to step 2). If merging is left unconstrained, all clusters eventually merge into a single class. When the data are divided into k classes, the function
$$S(k) = \frac{\sum_{i=1}^{k} \sum_{x \in c_i} \left| x - \bar{x}_i \right|^2}{\min_{i,j<k\,(i \neq j)} \left| \bar{x}_i - \bar{x}_j \right|^2}$$
is evaluated, where $c_i$ is the $i$-th class and $\bar{x}_i$ is its mean. The data are divided into the k classes for which S(k) attains its minimum;
4) When the data are divided into K classes, calculate the cluster center of each class, $C_1, C_2, \ldots, C_K$;
5) Use the cluster centers obtained in step 4) as the initial cluster centers of K-means and the K value as the number of K-means clusters, and perform K-means clustering to obtain the clustering result.
On the basis of the above steps, the data are also clustered with the traditional K-means clustering method, and the resulting clustering is compared with the improved K-means clustering result. Comparing the within-group variance and the between-group variance of the two shows that the improved K-means clustering is more accurate and reasonable.
The present invention extracts only part of the data from a large database as a representative sample, which addresses the weak scalability of hierarchical clustering on large numbers of data units. The hierarchical algorithm supplies the initial values and determines the k value, which addresses the main dependence of the K-means algorithm and so reduces the probability that K-means produces an unsatisfactory result. Comparing the within-group variance and between-group variance of the traditional clustering result and the improved K-means clustering result shows that the improved K-means clustering is more accurate and reasonable. The respective advantages of hierarchical clustering and K-means clustering are exploited as far as possible and the shortcomings of each method are avoided, which improves the quality of the clustering as far as possible.
Brief description of the drawings
Fig. 1 is the dendrogram for the misalignment condition in the present invention.
Fig. 2 is the dendrogram for the dynamic unbalance condition in the present invention.
Fig. 3 is the dendrogram for the rub-impact condition in the present invention.
Fig. 4 is the dendrogram for the fault-free condition in the present invention.
Embodiment
The present invention is implemented here on data fusion for several typical aero-engine faults and for the fault-free condition:
The sampled vibration displacement signal data measured by the sensors are given in Table 1:
Table 1: Sample data
1. State pattern: misalignment
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows (for example, the Euclidean distance between the first point and the second point is the entry in the first row, second column of the Euclidean distance matrix):
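The distance matrix described above can be sketched as follows. This is a minimal illustration, assuming each sample is a single vibration displacement value; the function name `distance_matrix` and the sample values are hypothetical and are not the values of Table 1.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            # The matrix is symmetric, so fill both entries at once.
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical samples (assumption): one displacement value per sample,
# wrapped in a 1-tuple so the same function also accepts longer vectors.
samples = [(8.0,), (8.5,), (23.0,), (35.5,)]
D = distance_matrix(samples)
```

With these values, `D[0][1]` is the distance between the first and second points, matching the row and column convention stated above.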
The 24 initial data points under the misalignment condition are numbered 1-24. At the start of clustering, the 24 initial data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on. Fig. 1 shows this merging process until all the data are gathered into a single class.
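The merging process can be sketched as below. The text does not say how the distance between two multi-member classes is measured, so this sketch assumes the distance between class means (centroid linkage); the function name `agglomerate` and the input values are hypothetical.

```python
from statistics import mean

def agglomerate(values):
    """Merge the two closest clusters repeatedly; return {k: partition into k classes}."""
    clusters = [[v] for v in values]              # n single-member clusters
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Assumed linkage: distance between the two cluster means.
                gap = abs(mean(clusters[a]) - mean(clusters[b]))
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)                   # one class fewer after each merge
        levels[len(clusters)] = [sorted(c) for c in clusters]
    return levels

# Hypothetical scalar samples (assumption), not the patent's measurements.
levels = agglomerate([1.0, 2.0, 10.0, 11.0, 30.0])
```

Keeping every intermediate partition in `levels` mirrors what a dendrogram records: the classes that exist at each number of clusters, down to the single final class.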
As can be seen from Fig. 1, if merging is left unconstrained, all clusters eventually merge into a single class. The present invention therefore introduces the following constraint. The distance between each newly generated cluster and the other clusters is calculated, and the function
$$S(k) = \frac{\sum_{i=1}^{k} \sum_{x \in c_i} \left| x - \bar{x}_i \right|^2}{\min_{i,j<k\,(i \neq j)} \left| \bar{x}_i - \bar{x}_j \right|^2}$$
is evaluated. When S attains its minimum value, the algorithm ends.
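A small sketch of evaluating S(k) for candidate partitions is given below. It assumes scalar data and reads the minimum in the denominator as running over every pair of distinct class means; the function name `s_index` and the two candidate partitions are hypothetical.

```python
from itertools import combinations
from statistics import mean

def s_index(partition):
    """Within-class squared deviation over the smallest squared gap between class means."""
    centers = [mean(c) for c in partition]
    within = sum((x - m) ** 2 for c, m in zip(partition, centers) for x in c)
    # Assumption: the minimum is taken over all pairs of distinct class means.
    min_gap = min((a - b) ** 2 for a, b in combinations(centers, 2))
    return within / min_gap

# Hypothetical candidate partitions of the same nine values (assumption).
two = [[1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
three = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
scores = {len(p): s_index(p) for p in (two, three)}
best_k = min(scores, key=scores.get)   # the k with the smallest S(k)
```

A compact, well-separated partition makes the numerator small and the denominator large, so the smaller S(k) marks the preferred number of classes. The function needs at least two classes, since the denominator is undefined for a single class.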
It can clearly be seen from the figure that clustering into three classes or two classes works better. Moreover, S(3) = 1.8936 and S(2) = 2.2507.
The hierarchical clustering therefore yields three classes, and this is taken as the K value for the subsequent K-means clustering.
The mean of all the data objects in each cluster is calculated and used as an initial cluster center for K-means clustering. The cluster centers when the data are clustered into three groups are 8.125, 23.1175, and 35.579.
The cluster centers obtained above are used as the initial cluster centers of K-means, the K value is used as the number of K-means clusters, and K-means clustering is performed to obtain the clustering result.
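Seeding K-means with the class means can be sketched as follows. This is an illustrative Lloyd-style iteration on scalar data, not the patent's own implementation; the function name `kmeans_1d`, the data, and the deliberately poor seeds are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's iteration on scalars, starting from the supplied centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recompute centers; keep the old center if a group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, groups

# Hypothetical data and hierarchical classes (assumption).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
classes = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
seeds = [mean(c) for c in classes]          # class means as initial centers
seeded_centers, seeded_groups = kmeans_1d(values, seeds)
poor_centers, poor_groups = kmeans_1d(values, [1.0, 2.0, 3.0])
```

On this toy data the seeded run stays at the three natural groups, while the run started from three neighbouring points settles on a different partition, which illustrates the dependence of K-means on its initial values.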
The data are also clustered with the traditional K-means clustering method, and the resulting clustering is compared with the improved K-means clustering result.
In traditional K-means clustering the number of clusters is chosen arbitrarily; here 2 clusters are selected, and the initial cluster centers are drawn at random from the data objects to be clustered.
Comparing the within-group variance and between-group variance of the two results shows that the improved K-means clustering is more accurate and reasonable.
The within-group variances and between-group variance of the traditional K-means clustering result are calculated.
S1 denotes the within-group variance of the first group, S2 the within-group variance of the second group, and S the between-group variance:
S1 = 26.83, S2 = 56.107, S = 101.8
The within-group variances and between-group variance of the improved K-means clustering result are calculated:
S1 = 3.33, S2 = 6.98, S3 = 29.83, S = 132.57
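The text does not give the formulas behind these variances, so the following is only one plausible reading: the population variance of each group as its within-group variance, and the size-weighted mean squared deviation of the group means from the grand mean as the between-group variance. The function name `group_variances` and the data are hypothetical.

```python
from statistics import mean, pvariance

def group_variances(groups):
    """Return (list of within-group variances, between-group variance)."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Assumed definition: population variance inside each group.
    within = [pvariance(g) for g in groups]
    # Assumed definition: size-weighted spread of group means around the grand mean.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / n
    return within, between

# Hypothetical clustering result (assumption), not the patent's data.
groups = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
within, between = group_variances(groups)
```

Under these definitions the weighted within-group variances and the between-group variance add up to the total variance of the data, so a result with smaller within-group and larger between-group values splits the same total more cleanly.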
The dynamic unbalance, rub-impact, and fault-free operating conditions are clustered following the same steps.
2. State pattern: dynamic unbalance
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 2, the 18 initial data points under the dynamic unbalance condition are numbered 1-18. At the start of clustering, the 18 initial data objects are treated as 18 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on. Fig. 2 shows this merging process until all the data are gathered into a single class.
S(3) = 2.08, S(4) = 2.04.
The cluster centers for K = 4 are 17.32, 23.91, 33.2, and 41.58.
Variances of the traditional result: S1 = 4.98, S2 = 8.05, S3 = 45.99, S = 117.58.
Variances of the improved result: S1 = 26.50, S2 = 12.30, S3 = 2.43, S4 = 4.08, S = 124.22.
3. State pattern: rub-impact
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 3, the 24 initial data points under the rub-impact condition are numbered 1-24. At the start of clustering, the 24 initial data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on. Fig. 3 shows this merging process until all the data are gathered into a single class.
S(3) = 1.4446, S(2) = 2.8598.
The cluster centers for K-means clustering with K = 3 are 12.14, 23.97, and 35.68.
Variances of the traditional result: S1 = 20.27, S2 = 33.57, S = 32.59.
Variances of the improved result: S1 = 9.77, S2 = 8.42, S3 = 13.64, S = 53.926.
4. State pattern: fault-free
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 4, the 24 initial data points under the fault-free condition are numbered 1-24. At the start of clustering, the 24 initial data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on. Fig. 4 shows this merging process until all the data are gathered into a single class.
S(3) = 1.3994, S(2) = 1.8921.
The three cluster centers for K-means clustering with K = 3 are 13.568, 23.728, and 32.77.
Variances of the traditional result: S1 = 11.67, S2 = 21.17, S = 53.68.
Variances of the improved result: S1 = 8.67, S2 = 3.03, S3 = 3.01, S = 66.39.
Comparative analysis of the four operating conditions above shows that the within-group variances of the improved K-means clustering results are all smaller than those of the traditional K-means clustering results, which indicates that, in terms of the similarity of the data objects within a group, the improved K-means clustering is better than traditional K-means clustering. The between-group variance of the improved K-means clustering results is greater than that of the traditional K-means clustering results, which indicates that, in terms of the separation between groups, the improved K-means clustering is also better than traditional K-means clustering.
The clustering result obtained with the improved K-means clustering is therefore more accurate and reasonable than the clustering result obtained with traditional K-means clustering.

Claims (1)

1. An improved K-means clustering algorithm based on hierarchical clustering, characterized in that its steps are:
1) Calculate the pairwise distances among the n objects. The distance formula is the Euclidean distance formula, and the calculation yields a distance matrix;
2) Construct n single-member clusters, find the two nearest clusters, and merge them into one class, which reduces the number of clusters by one. Continue in the same way, calculating the distance between each newly generated cluster and the other clusters;
3) Obtain the hierarchical clustering of the data according to step 2). If merging is left unconstrained, all clusters eventually merge into a single class. When the data are divided into k classes, the function
$$S(k) = \frac{\sum_{i=1}^{k} \sum_{x \in c_i} \left| x - \bar{x}_i \right|^2}{\min_{i,j<k\,(i \neq j)} \left| \bar{x}_i - \bar{x}_j \right|^2}$$
is evaluated, where $c_i$ is the $i$-th class and $\bar{x}_i$ is its mean. The data are divided into the k classes for which S(k) attains its minimum;
4) When the data are divided into K classes, calculate the cluster center of each class, $C_1, C_2, \ldots, C_K$;
5) Use the cluster centers obtained in step 4) as the initial cluster centers of K-means and the K value as the number of K-means clusters, and perform K-means clustering to obtain the clustering result.
CN201410350480.4A 2014-07-22 2014-07-22 Modified K-means clustering algorithm based on hierarchical clustering Pending CN104102726A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410350480.4A CN104102726A (en) 2014-07-22 2014-07-22 Modified K-means clustering algorithm based on hierarchical clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410350480.4A CN104102726A (en) 2014-07-22 2014-07-22 Modified K-means clustering algorithm based on hierarchical clustering

Publications (1)

Publication Number Publication Date
CN104102726A true CN104102726A (en) 2014-10-15

Family

ID=51670880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410350480.4A Pending CN104102726A (en) 2014-07-22 2014-07-22 Modified K-means clustering algorithm based on hierarchical clustering

Country Status (1)

Country Link
CN (1) CN104102726A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20141015