CN104102726A - Modified K-means clustering algorithm based on hierarchical clustering - Google Patents
- Publication number
- CN104102726A (application number CN201410350480.4A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- class
- clustering
- data
- mean
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)
Abstract
The invention discloses an improved K-means clustering algorithm based on hierarchical clustering. The algorithm comprises: calculating the pairwise distances between n objects; constructing n single-member clusters; obtaining a hierarchical clustering from the data; when the data are divided into K classes, calculating the cluster center of each class; and, taking the obtained cluster centers as the initial cluster centers of K-means and K as the number of K-means clusters, performing K-means clustering to obtain the clustering result. Comparing the within-group and between-group variances of the traditional K-means result with those of the improved K-means result shows that the improved K-means clustering is more accurate and more reasonable. The method exploits the respective strengths of hierarchical clustering and K-means clustering as fully as possible while avoiding the weaknesses of each, and thereby improves clustering quality.
Description
Technical field
The invention belongs to the field of aero-engine technology and specifically relates to an improved K-means clustering algorithm based on hierarchical clustering.
Technical background
As the core power unit of an aircraft, the aero-engine bears directly on the safety and performance of aircraft operation. Engine reliability affects service life, economic benefit, and the safety of passengers' lives and property. Because the engine is an internal component of the aircraft, technicians can generally judge whether it has a problem only after a flight, relying on their own experience and simple instruments, which cannot guarantee an accurate diagnosis. Moreover, as aero-engine faults become more complex and varied, technicians' ability to recognize them is increasingly limited. Condition monitoring and fault diagnosis technology therefore plays an important role in aero-engine maintenance: it can detect fault signals without disassembling the engine, use the monitored fault data to judge the working state and development trend of components, and thus locate and identify faults quickly and accurately.
To find engine faults accurately and quickly, sensors are generally installed at several positions on the engine to collect engine signals. Because the sensors respond to signals from different parts, a fault signal that one sensor cannot detect against certain interference can still be captured by exploiting the redundancy, complementarity, and cooperation of the measurements from sensors at different positions. Fusing the signals collected by these sensors reveals the fault features and thereby enables engine fault diagnosis. How to improve the accuracy and efficiency of aero-engine data fusion has therefore become a pressing problem.
Clustering, as an unsupervised learning method, is widely used in data processing across many industries. Traditional clustering algorithms have many shortcomings. Hierarchical clustering, for example, is largely practical only for small databases: when the database contains too many data units, its scalability deteriorates. In addition, a merge step in hierarchical clustering cannot be undone once performed, and data units cannot afterwards be exchanged between the classes already formed. The K-means algorithm, for its part, depends heavily on its initial values: if k is poorly chosen, the final result may be unsatisfactory.
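The dependence of K-means on its starting point can be illustrated with a small sketch. It assumes scalar samples and a plain Lloyd iteration; the function name `kmeans_1d` and the sample values are hypothetical and are not taken from the patent data.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd iteration on scalar data, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (unchanged if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


points = [1, 2, 3, 11, 12, 13, 21, 22, 23]   # made-up values with three obvious groups
poor_start = kmeans_1d(points, [1, 2, 3])    # all seeds inside the first group
good_start = kmeans_1d(points, [2, 12, 22])  # one seed per group
print(poor_start)  # [1.0, 2.5, 17.0]
print(good_start)  # [2, 12, 22]
```

With the poor seeds the iteration settles on a partition that splits the first group and lumps the other two together, while the well-placed seeds recover the three groups.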
Summary of the invention
The object of the present invention is to provide an improved K-means clustering algorithm for aero-engine data fusion: vibration displacement signals are measured with sensors, and the improved K-means clustering algorithm based on hierarchical clustering is used to fuse the aero-engine data, improving the accuracy and efficiency of data fusion and providing a basis for aero-engine fault diagnosis.
To achieve this object, the present invention adopts the following technical scheme. The improved K-means clustering algorithm based on hierarchical clustering comprises the following steps:
1) calculate the pairwise distances between the n objects, using the Euclidean distance formula, to obtain a distance matrix;
2) construct n single-member clusters, find the two nearest clusters, and merge them into one class, so that the number of clusters decreases by one; continue in the same way, calculating the distance between each newly generated cluster and the other clusters;
3) obtain the hierarchical clustering of the data from the computation in step 2); if the merging is left unconstrained, all data are eventually merged into a single class; when, with the data divided into k classes, the function S [formula not reproduced] attains its minimum, the data are divided into k classes;
4) when the data are divided into K classes, calculate the cluster center of each class, $C_1, C_2, \ldots, C_k$;
5) take the cluster centers obtained in step 4) as the initial cluster centers of K-means and K as the number of K-means clusters, then perform K-means clustering to obtain the clustering result.
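The five steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the samples are scalars, the distance between two clusters is taken as the gap between their means (the text does not name a linkage rule), and k is passed in directly instead of being selected by minimizing the function of step 3. The names `agglomerate`, `kmeans`, and `improved_kmeans` are hypothetical, and the sample values are made up.

```python
def mean(values):
    return sum(values) / len(values)


def agglomerate(points, k):
    """Steps 1-3: start from single-member clusters and merge the two
    closest clusters until k remain.

    Assumption: cluster distance = gap between cluster means.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest gap between means.
        _, i, j = min(
            (abs(mean(clusters[i]) - mean(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Step 5: Lloyd iteration from the supplied initial centers."""
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, groups


def improved_kmeans(points, k):
    seeds = sorted(mean(c) for c in agglomerate(points, k))  # step 4: class means
    return kmeans(points, seeds)                             # step 5: seeded K-means


samples = [7.5, 8.0, 8.9, 22.4, 23.1, 23.8, 34.9, 35.6, 36.2]  # made-up values
centers, groups = improved_kmeans(samples, k=3)
print(centers)
print(groups)
```

Because the seeds already sit at the means of the hierarchical classes, the K-means stage in this example only has to confirm the partition; on less cleanly separated data it would move points between classes, which the hierarchical stage alone cannot do.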
In addition to the above steps, the present invention clusters the data with the traditional K-means method and compares the result with that of the improved K-means clustering. Comparing the within-group and between-group variances of the two results shows that the improved K-means clustering is more accurate and reasonable.
The present invention extracts only part of the data in a large database as a representative sample, which overcomes the poor scalability of hierarchical clustering on large numbers of data units. The hierarchical algorithm supplies the initial values and determines k, which addresses the main dependency of the K-means algorithm and so reduces the probability of an unsatisfactory K-means result. Comparison of the within-group and between-group variances of the traditional and improved K-means results shows that the improved K-means clustering is more accurate and reasonable. The respective strengths of hierarchical clustering and K-means clustering are exploited as fully as possible while the weaknesses of both methods are avoided, improving clustering quality as far as possible.
Brief description of the drawings
Fig. 1 is the dendrogram for the misalignment state in the present invention.
Fig. 2 is the dendrogram for the dynamic unbalance state in the present invention.
Fig. 3 is the dendrogram for the rubbing state in the present invention.
Fig. 4 is the dendrogram for the fault-free state in the present invention.
Embodiment
The present invention is implemented here for data fusion under several typical aero-engine fault states and the fault-free state:
The vibration displacement signal samples measured by the sensors are listed in Table 1:
Table 1: Sample data
1. State: misalignment
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows (for example, the distance between the first and second points is the entry in the first row, second column of the matrix):
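A minimal sketch of how such a distance matrix could be computed is given here. It uses the standard-library function `math.dist` (Python 3.8 or later); the function name `distance_matrix` and the sample points are hypothetical and are not the Table 1 data. Scalar measurements can be passed as one-element tuples.

```python
import math


def distance_matrix(samples):
    """Return the symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return matrix


samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # made-up points
D = distance_matrix(samples)
print(D[0][1])  # first point to second point: row 1, column 2 -> 5.0
```

Only the upper triangle is computed explicitly; the diagonal stays at zero because each point is at distance zero from itself.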
The 24 raw data points under the misalignment state are numbered 1 to 24. At the start of clustering, the 24 data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on until all data are gathered into a single class. Fig. 1 shows this merging process.
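The merge sequence that a dendrogram records can be sketched as follows. The sketch assumes scalar samples and takes the distance between two classes to be the gap between their means, which is an assumption because the text does not specify how class-to-class distance is measured. The function name `merge_history` and the sample values are hypothetical.

```python
def merge_history(values):
    """Agglomerate scalar samples until one class remains, logging each merge.

    Samples are identified by 1-based numbers, as in the dendrograms.
    Assumption: class-to-class distance = gap between class means.
    """
    classes = [[i + 1] for i in range(len(values))]  # one class per sample

    def centre(members):
        return sum(values[m - 1] for m in members) / len(members)

    history = []
    while len(classes) > 1:
        # Find the closest pair of classes.
        gap, i, j = min(
            (abs(centre(classes[i]) - centre(classes[j])), i, j)
            for i in range(len(classes))
            for j in range(i + 1, len(classes))
        )
        history.append((classes[i], classes[j], gap))
        classes[i] = classes[i] + classes[j]
        del classes[j]
    return history


values = [8.0, 8.4, 23.0, 35.0, 35.9]  # made-up measurements
history = merge_history(values)
for left, right, gap in history:
    print(left, right, round(gap, 2))
```

For n samples the log always contains n - 1 merges; undoing the last k - 1 of them leaves the data divided into k classes.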
As Fig. 1 shows, if the merging is left unconstrained, all data are eventually merged into a single class. The present invention therefore introduces the following constraint: the distance between each newly generated cluster and the other clusters is calculated, and the algorithm terminates when the function S [formula not reproduced] attains its minimum value.
The figure shows clearly that clustering into three classes or two classes works well. Further, S(3) = 1.8936 and S(2) = 2.2507.
Since S(3) < S(2), the hierarchical clustering yields three classes, and K = 3 is used as the K value for the subsequent K-means clustering.
The mean of all data objects in each cluster is calculated and used as an initial cluster center for K-means clustering. With three clusters, the cluster centers are 8.125, 23.1175, and 35.579.
With these cluster centers as the initial cluster centers of K-means and the K value as the number of K-means clusters, K-means clustering is performed to obtain the clustering result.
The data are also clustered with the traditional K-means method, and the result is compared with the improved K-means result.
In traditional K-means clustering the number of clusters is chosen arbitrarily; 2 clusters are selected here, and the initial cluster centers are drawn at random from the data objects to be clustered.
Comparing the within-group and between-group variances of the two results shows that the improved K-means clustering is more accurate and reasonable.
The within-group and between-group variances of the traditional K-means result are calculated as follows,
where S1 is the within-group variance of the first group, S2 is the within-group variance of the second group, and S is the between-group variance:
S1 = 26.83, S2 = 56.107, S = 101.8
The within-group and between-group variances of the improved K-means result are:
S1 = 3.33, S2 = 6.98, S3 = 29.83, S = 132.57
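The text does not state the formulas behind these S values. The sketch below shows one common convention, offered purely as an assumption: the within-group variance is the mean squared deviation from the group mean, and the between-group variance is the size-weighted mean squared deviation of the group means from the overall mean. The function names and sample groups are hypothetical, and the sketch is not expected to reproduce the figures quoted above.

```python
def within_group_variance(group):
    """Mean squared deviation of a group's members from the group mean."""
    centre = sum(group) / len(group)
    return sum((x - centre) ** 2 for x in group) / len(group)


def between_group_variance(groups):
    """Size-weighted mean squared deviation of group means from the overall mean."""
    total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / total
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups) / total


tight = [[1.0, 3.0], [9.0, 11.0]]   # made-up partition that follows the data
loose = [[1.0], [3.0, 9.0, 11.0]]   # made-up partition that splits a natural group

for name, partition in (("tight", tight), ("loose", loose)):
    within = [within_group_variance(g) for g in partition]
    print(name, within, between_group_variance(partition))
```

Under this convention a better partition shows smaller within-group variances and a larger between-group variance, which is the direction of comparison used in the text.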
The dynamic unbalance, rubbing, and fault-free operating conditions are clustered by the same procedure.
2. State: dynamic unbalance
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 2, the 18 raw data points under the dynamic unbalance state are numbered 1 to 18. At the start of clustering, the 18 data objects are treated as 18 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on until all data are gathered into a single class. Fig. 2 shows this merging process.
S(3) = 2.08, S(4) = 2.04.
Cluster centers for K = 4: 17.32, 23.91, 33.2, 41.58.
Variances of the traditional result: S1 = 4.98, S2 = 8.05, S3 = 45.99, S = 117.58.
Variances of the improved result: S1 = 26.50, S2 = 12.30, S3 = 2.43, S4 = 4.08, S = 124.22.
3. State: rubbing
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 3, the 24 raw data points under the rubbing state are numbered 1 to 24. At the start of clustering, the 24 data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on until all data are gathered into a single class. Fig. 3 shows this merging process.
S(3) = 1.4446, S(2) = 2.8598.
The cluster centers for K-means clustering with K = 3 are 12.14, 23.97, and 35.68.
Variances of the traditional result: S1 = 20.27, S2 = 33.57, S = 32.59.
Variances of the improved result: S1 = 9.77, S2 = 8.42, S3 = 13.64, S = 53.926.
4. State: fault-free
The calculated Euclidean distance matrix is as follows:
The Euclidean distances between the data points are as follows:
As shown in Fig. 4, the 24 raw data points under the fault-free state are numbered 1 to 24. At the start of clustering, the 24 data objects are treated as 24 initial classes; the two classes with the smallest Euclidean distance between them are then merged into one class, and so on until all data are gathered into a single class. Fig. 4 shows this merging process.
S(3) = 1.3994, S(2) = 1.8921.
The three cluster centers for K-means clustering with K = 3 are 13.568, 23.728, and 32.77.
Variances of the traditional result: S1 = 11.67, S2 = 21.17, S = 53.68.
Variances of the improved result: S1 = 8.67, S2 = 3.03, S3 = 3.01, S = 66.39.
Comparison across the four operating conditions above shows that the within-group variances of the improved K-means results are all smaller than those of the traditional K-means results, indicating that the data objects within each group are more similar to one another under the improved K-means clustering. The between-group variances of the improved K-means results are larger than those of the traditional K-means results, indicating that the improved K-means clustering separates the groups better.
The clustering result obtained with the improved K-means clustering is therefore more accurate and reasonable than that obtained with traditional K-means clustering.
Claims (1)
1. An improved K-means clustering algorithm based on hierarchical clustering, characterized in that it comprises the following steps:
1) calculate the pairwise distances between the n objects, using the Euclidean distance formula, to obtain a distance matrix;
2) construct n single-member clusters, find the two nearest clusters, and merge them into one class, so that the number of clusters decreases by one; continue in the same way, calculating the distance between each newly generated cluster and the other clusters;
3) obtain the hierarchical clustering of the data from the computation in step 2); if the merging is left unconstrained, all data are eventually merged into a single class; when, with the data divided into k classes, the function S [formula not reproduced] attains its minimum, the data are divided into k classes;
4) when the data are divided into K classes, calculate the cluster center of each class, $C_1, C_2, \ldots, C_k$;
5) take the cluster centers obtained in step 4) as the initial cluster centers of K-means and K as the number of K-means clusters, then perform K-means clustering to obtain the clustering result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410350480.4A CN104102726A (en) | 2014-07-22 | 2014-07-22 | Modified K-means clustering algorithm based on hierarchical clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410350480.4A CN104102726A (en) | 2014-07-22 | 2014-07-22 | Modified K-means clustering algorithm based on hierarchical clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104102726A true CN104102726A (en) | 2014-10-15 |
Family
ID=51670880
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410350480.4A Pending CN104102726A (en) | 2014-07-22 | 2014-07-22 | Modified K-means clustering algorithm based on hierarchical clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104102726A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537380A (en) * | 2014-12-30 | 2015-04-22 | 小米科技有限责任公司 | Clustering method and device |
CN104915434A (en) * | 2015-06-24 | 2015-09-16 | 哈尔滨工业大学 | Multi-dimensional time sequence classification method based on mahalanobis distance DTW |
CN105163182A (en) * | 2015-08-24 | 2015-12-16 | Tcl集团股份有限公司 | Smart TV user behavior obtaining method and system based on exceptional mining algorithm |
CN106530132A (en) * | 2016-11-14 | 2017-03-22 | 国家电网公司 | Power load clustering method and device |
CN110486630A (en) * | 2019-08-20 | 2019-11-22 | 西南石油大学 | Natural gas line corrosion default characteristic feature extracting method |
CN110940518A (en) * | 2019-11-27 | 2020-03-31 | 中北大学 | Aerospace transmission mechanism analysis method based on fault data |
CN116662588A (en) * | 2023-08-01 | 2023-08-29 | 山东省大数据中心 | Intelligent searching method and system for mass data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101004761A (en) * | 2007-01-10 | 2007-07-25 | 复旦大学 | Hierarchy clustering method of successive dichotomy for document in large scale |
CN102663100A (en) * | 2012-04-13 | 2012-09-12 | 西安电子科技大学 | Two-stage hybrid particle swarm optimization clustering method |
Non-Patent Citations (3)
Title |
---|
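SUDIPTO GUHA et al.: "CURE: An Efficient Clustering Algorithm for Large Databases", 《INFORMATION SYSTEMS》 * |
李斌: "一种带约束的最小离差平方和系统聚类法及应用" [A constrained minimum sum-of-squared-deviations hierarchical clustering method and its application], 《计算机应用》 [Journal of Computer Applications] * |
邱苏林: "基于Ward’s 方法的k-平均优化算法及其应用" [An optimized k-means algorithm based on Ward's method and its application], 《计算机工程与应用》 [Computer Engineering and Applications] * |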
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537380A (en) * | 2014-12-30 | 2015-04-22 | 小米科技有限责任公司 | Clustering method and device |
CN104915434A (en) * | 2015-06-24 | 2015-09-16 | 哈尔滨工业大学 | Multi-dimensional time sequence classification method based on mahalanobis distance DTW |
CN104915434B (en) * | 2015-06-24 | 2018-03-27 | 哈尔滨工业大学 | A kind of multidimensional time-series sorting technique based on mahalanobis distance DTW |
CN105163182A (en) * | 2015-08-24 | 2015-12-16 | Tcl集团股份有限公司 | Smart TV user behavior obtaining method and system based on exceptional mining algorithm |
CN105163182B (en) * | 2015-08-24 | 2019-06-11 | Tcl集团股份有限公司 | Smart television user behavior acquisition methods and system based on exception mining algorithm |
CN106530132A (en) * | 2016-11-14 | 2017-03-22 | 国家电网公司 | Power load clustering method and device |
CN110486630A (en) * | 2019-08-20 | 2019-11-22 | 西南石油大学 | Natural gas line corrosion default characteristic feature extracting method |
CN110940518A (en) * | 2019-11-27 | 2020-03-31 | 中北大学 | Aerospace transmission mechanism analysis method based on fault data |
CN110940518B (en) * | 2019-11-27 | 2021-08-24 | 中北大学 | Aerospace transmission mechanism analysis method based on fault data |
CN116662588A (en) * | 2023-08-01 | 2023-08-29 | 山东省大数据中心 | Intelligent searching method and system for mass data |
CN116662588B (en) * | 2023-08-01 | 2023-10-10 | 山东省大数据中心 | Intelligent searching method and system for mass data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104102726A (en) | Modified K-means clustering algorithm based on hierarchical clustering | |
CN106323635B (en) | A kind of rolling bearing fault on-line checking and state evaluating method | |
CN110132598B (en) | Fault noise diagnosis algorithm for rolling bearing of rotating equipment | |
CN104897403B (en) | Self-adaption fault diagnosis method based on permutation entropy (PE) and manifold-based dynamic time warping (MDTW) | |
KR101316486B1 (en) | Error detection method and system | |
CN103776654B (en) | The method for diagnosing faults of multi-sensor information fusion | |
CN109781411B (en) | Bearing fault diagnosis method combining improved sparse filter and KELM | |
CN110309886B (en) | Wireless sensor high-dimensional data real-time anomaly detection method based on deep learning | |
CN109827777B (en) | Rolling bearing fault prediction method based on partial least square method extreme learning machine | |
CN105760839A (en) | Bearing fault diagnosis method based on multi-feature manifold learning and support vector machine | |
CN112257530B (en) | Rolling bearing fault diagnosis method based on blind signal separation and support vector machine | |
CN103900824B (en) | Diagnosis Method of Diesel Fault based on transient speed cluster analysis | |
CN104614166B (en) | Method for identifying failure state of rotor vibration signal of aircraft engine | |
Yan et al. | Fault diagnosis of rotating machinery equipped with multiple sensors using space-time fragments | |
CN111678699B (en) | Early fault monitoring and diagnosing method and system for rolling bearing | |
CN105956514A (en) | Helicopter rotor abnormity detecting method driven by vibration data | |
CN105425150A (en) | Motor fault diagnosis method based on RBF and PCA-SVDD | |
Guo et al. | Dynamic time warping using graph similarity guided symplectic geometry mode decomposition to detect bearing faults | |
CN105865784A (en) | Rolling bearing detection method based on LMD (Local Mean Decomposition) and gray correlation | |
CN106124988A (en) | A kind of motor multi-state fault detection method based on RBF, multilamellar FDA and SVDD | |
CN105241665A (en) | Rolling bearing fault diagnosis method based on IRBFNN-AdaBoost classifier | |
CN110108474A (en) | A kind of rotating machinery operation stability on-line monitoring and appraisal procedure and system | |
Yu et al. | Rolling bearing fault feature extraction and diagnosis method based on MODWPT and DBN | |
CN114462480A (en) | Multi-source sensor rolling mill fault diagnosis method based on non-equilibrium data set | |
CN114964776A (en) | Wheel set bearing fault diagnosis method based on MSE and PSO-SVM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20141015 |