CN106372669A

CN106372669A - Double-order adaptive wavelet clustering method

Info

Publication number: CN106372669A
Application number: CN201610799993.2A
Authority: CN
Inventors: 左红艳; 刘晓波; 洪连环
Original assignee: Nanchang Hangkong University
Current assignee: Nanchang Hangkong University
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2017-02-01

Abstract

The invention discloses a double-order adaptive wavelet clustering method applied to data classification and pattern recognition. The method mainly comprises four steps: first, the data space is quantized with a coarse grid, the spatial regions that contain clusters are located, and pre-classification clustering of the data is realized; second, statistics are collected for each sub-cluster, and an optimal quantization value is calculated automatically from the data distribution characteristics of each sub-cluster and stored; third, the quantization value and boundary information of each sub-cluster are extracted, and the data space of each sub-cluster is adaptively finely divided to realize wavelet clustering; finally, the clustering result and an information storage table are output. Compared with the prior art, the double-order adaptive wavelet clustering method eliminates the influence of the quantization value and density threshold settings on clustering precision, improves clustering precision especially for data of non-uniform density, and achieves substantially higher diagnosis precision than the traditional wavelet clustering method.

Description

A double-order adaptive wavelet clustering method

Technical field

The present invention relates mainly to big data, and in particular to classification and pattern recognition for big data with a non-uniform density distribution.

Background technology

Cluster analysis by unsupervised learning is an important branch of machine learning. Grid-based clustering methods quantize the object space into a finite number of grid cells, forming a grid structure, and realize clustering by operating on the significant grid cells; their processing speed is fast. The wavelet clustering algorithm is a typical grid-based clustering algorithm. It combines the wavelet transform with grid clustering into a unified grid-based and density-based algorithm, so wavelet clustering has the advantages of grid clustering: it clusters without supervision, runs fast, handles large data sets effectively, and can find clusters of arbitrary shape. In addition, because it incorporates the wavelet transform, wavelet clustering can denoise the data effectively, so that the clustering result is not affected by noise, and it can find clusters at different scales. However, the clustering precision of wavelet clustering is affected by the quantization value setting, and providing an optimal quantization value is the key to obtaining a high-precision clustering result. For data point sets with a non-uniform density distribution, a reasonable quantization value is difficult to obtain. If the quantization value is too high, one cluster may be split into several groups; if the quantization value is too low, clusters that should be separated may be merged into the same cluster.
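The sensitivity to the quantization value described above can be illustrated with a small one-dimensional sketch. This is only a toy example and not part of the disclosed method: the function name `count_clusters_1d`, the sample data, and the rule that a cell is significant when its count exceeds `w` are assumptions chosen for illustration.

```python
def count_clusters_1d(points, k, w=0):
    """Count clusters after splitting [min, max] into k equal cells."""
    lo, hi = min(points), max(points)
    step = (hi - lo) / k
    counts = [0] * k
    for x in points:
        # The maximum value is clipped into the last cell.
        idx = min(int((x - lo) / step), k - 1)
        counts[idx] += 1
    clusters, inside = 0, False
    for n in counts:
        significant = n > w
        if significant and not inside:
            clusters += 1  # a new run of adjacent significant cells starts
        inside = significant
    return clusters


dense = [i / 10 for i in range(10)]   # 0.0 .. 0.9, spacing 0.1
sparse = [2.0, 2.5, 3.0, 3.5, 4.0]    # spacing 0.5
points = dense + sparse
results = {k: count_clusters_1d(points, k) for k in (2, 4, 16)}
print(results)
```

With this toy data, k = 2 merges the two groups into one cluster, k = 4 separates them, and k = 16 breaks the sparse group into several pieces, which is the behaviour a single global quantization value cannot avoid on data of non-uniform density.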

Summary of the invention

The purpose of the present invention is to propose, for big data point sets with a non-uniform density distribution, a double-order adaptive wavelet clustering analysis method that eliminates the influence of the grid division and the threshold and improves clustering precision.

The present invention is realized as follows: a double-order adaptive wavelet clustering method, characterized in that it is mainly divided into four stages:

First stage: coarse grid division and pre-classification clustering. Each cluster obtained by the coarse grid division is called a sub-cluster;

Its calculation steps are:

Step 1: Input the d-dimensional vector point set x to be analyzed, the coarse-division quantization value parameter k, and the density threshold w;

Step 2: Quantize the grid cells, and store each grid cell in the cell array c;

Step 3: Count the number of data points in each grid cell, and store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c1;

Step 4: Form a distance matrix from the position labels in the second column of c1, and apply breadth-first search to connect adjacent significant grid cells, thereby realizing clustering. The cluster labels of the significant grid cells are stored in the third column of information table c1;

Step 5: Output the coarse-division pre-classification clustering result and the pre-classification clustering information table;
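As a rough illustration of Steps 1 to 3 above, the following Python sketch assigns each point to a coarse grid cell. It is not the disclosed implementation: it assumes each dimension is split into k equal-width intervals between the data minimum and maximum, uses 0-based position labels, and the function name `quantize` is hypothetical.

```python
from collections import defaultdict


def quantize(points, k):
    """Assign each d-dimensional point to one of k**d equal grid cells."""
    d = len(points[0])
    lows = [min(p[i] for p in points) for i in range(d)]
    highs = [max(p[i] for p in points) for i in range(d)]
    # Fall back to a step of 1.0 when a dimension has zero range.
    steps = [(highs[i] - lows[i]) / k or 1.0 for i in range(d)]
    cells = defaultdict(list)
    for p in points:
        # Position label: one interval index per dimension; the maximum
        # value is clipped into the last interval.
        label = tuple(
            min(int((p[i] - lows[i]) / steps[i]), k - 1) for i in range(d)
        )
        cells[label].append(p)
    return dict(cells)


cells = quantize([(0.0, 0.0), (0.1, 0.2), (0.9, 0.8), (1.0, 1.0)], k=2)
print(cells)
```

The dictionary plays the role of the cell array c: each key is a position label and each value is the list of points that fall in that cell, so the point count per cell is simply the length of the list.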

Second stage: compute statistics for each sub-cluster and calculate the quantization values for the adaptive fine division of the sub-clusters.

1) Calculation method of the per-dimension quantization value k_i of a sub-cluster

The fine-division quantization value k_i of the i-th dimension of a sub-cluster can be obtained from formula (1),

where m_i is the number of grid cells of the sub-cluster in the i-th dimension; b is the ratio of the maximum density to the minimum density of the significant grid cells of the sub-cluster; d is the dimension of the data to be classified.

2) Sub-cluster information statistics and storage

In information table c1, count the density d of each significant grid cell and store it in the 4th column of c1; calculate the ratio b of the maximum to the minimum grid density of each sub-cluster and store it in the 5th column of the first row of that sub-cluster in c1; calculate the quantization value k_i of each dimension according to formula (1), calculate the maximum boundary j_imax and the minimum boundary j_imin of each sub-cluster in each dimension, and form a d × 3 matrix as shown in formula (2), which is stored in the 6th column of the first row of that sub-cluster in c1.

\begin{matrix} j_{1\max} & j_{1\min} & k_{1} \\ \vdots & \vdots & \vdots \\ j_{i\max} & j_{i\min} & k_{i} \end{matrix} \qquad (2)

3) Establish the information mapping table c2

From the information table c1 of the coarse clustering, the statistical information belonging to the same sub-cluster is extracted by class and stored by class in a newly created information table c2{n,1}; for example, c2{i,1} stores the statistical information of the i-th sub-cluster. The information in c2{n,1} is extracted for the subsequent adaptive wavelet clustering, and the information produced by the adaptive wavelet clustering is stored in c2{n,2}. The information storage table c2 establishes a mapping between the coarse-grid pre-classification clustering and the adaptive wavelet clustering, and also a mapping between the final clustering result and the original data.
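The second stage can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the disclosed implementation: formula (1) is not reproduced in this text, so the rule for k_i is passed in as a caller-supplied function `fine_k` (hypothetical); cell density is taken as the point count of the cell; m_i is taken as the number of distinct cell indices in dimension i; and the boundaries are taken from the extent of the data. A Python list of two-slot rows stands in for c2{n,1} and c2{n,2}.

```python
def summarise_subclusters(c1_rows, fine_k):
    """Group coarse-table rows by sub-cluster and collect their statistics.

    c1_rows: list of (points, position_label, cluster_id) tuples.
    fine_k:  hypothetical callable (m_i, b, d) -> k_i standing in for
             formula (1), which is not reproduced in this text.
    """
    groups = {}
    for points, label, cluster_id in c1_rows:
        groups.setdefault(cluster_id, []).append((points, label))
    c2 = []
    for cluster_id in sorted(groups):
        rows = groups[cluster_id]
        densities = [len(points) for points, _ in rows]
        b = max(densities) / min(densities)  # max/min density ratio
        data = [p for points, _ in rows for p in points]
        d = len(data[0])
        bounds = []
        for i in range(d):
            j_max = max(p[i] for p in data)
            j_min = min(p[i] for p in data)
            m_i = len({label[i] for _, label in rows})
            bounds.append((j_max, j_min, fine_k(m_i, b, d)))
        stats = {"data": data, "b": b, "bounds": bounds}
        c2.append([stats, None])  # second slot is filled after fine clustering
    return c2


rows = [
    ([(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)], (0, 0), 1),
    ([(0.6, 0.2)], (1, 0), 1),
    ([(3.0, 3.0), (3.2, 3.4)], (6, 6), 2),
]
# Placeholder rule only; it is NOT the patent's formula (1).
c2 = summarise_subclusters(rows, fine_k=lambda m_i, b, d: 2 * m_i)
```

Each row of `c2` keeps the sub-cluster's original points next to its statistics, which is what lets the final clustering result be traced back to the original data.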

Third stage: adaptive fine division of the sub-cluster data space and wavelet clustering;

The steps of the adaptive fine-division wavelet clustering are as follows:

Step 1: Extract from c2{n,1} the original data x of the sub-cluster, the maximum boundary j_imax and minimum boundary j_imin of each dimension, and the quantization value k_i.

Step 2: Quantize the grid cells of the sub-cluster.

Step 3: Store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c3.

Step 4: Apply the wavelet transform to the data in the significant grid cells, extract the wavelet transform coefficients, and store them in the third column of information table c3.

Step 5: Form a distance matrix from the position labels in the second column of c3, and apply breadth-first search to connect adjacent significant grid cells in the feature space after the wavelet transform, thereby realizing clustering. The cluster labels are stored in the 4th column of information table c3.

Step 6: Store the information storage table c3 in the corresponding c2{n,2}.

Step 7: Output the clustering result figure, distinguishing different clusters by color.

Step 8: Repeat Steps 1 to 7 until the adaptive fine-division wavelet clustering has been completed for all sub-clusters.

Fourth stage: output the clustering result of the adaptive fine grid division and the information storage table. According to the information storage table c2, compare against the data, determine the type of each cluster label, and identify the clustering result.

Compared with the prior art, the present invention has the following advantages and notable effects: the number of classes need not be specified, clustering is realized without supervision, and clustering is efficient. In particular, for pattern recognition and classification of big data with a non-uniform density distribution, the method improves clustering precision and reduces the influence of the quantization value and the density threshold on the clustering result.

Brief description of the drawings

Fig. 1 is a flow chart of the double-order adaptive wavelet clustering method of the present invention.

Fig. 2 is a flow chart of the coarse-division pre-classification clustering of the present invention.

Fig. 3 is a flow chart of the quantization of the feature space of the present invention.

Fig. 4 is the storage structure of the significant grid cell information table c1 of the present invention.

Fig. 5 is a flow chart of the adaptive fine-division wavelet clustering of the present invention.

Fig. 6 is the storage structure of the information table c3 of the present invention.

Fig. 7 is an example result of applying the double-order adaptive wavelet clustering of the present invention.

Fig. 8 is an example result of applying single-order adaptive wavelet clustering with k = 10.

Fig. 9 is an example result of applying single-order adaptive wavelet clustering with k = 30.

Fig. 10 is an example result of applying single-order adaptive wavelet clustering with k = 45.

Detailed description of the embodiments

The algorithm of the present invention is further described below with reference to Figs. 1 to 10. The overall flow of the double-order adaptive wavelet clustering algorithm is shown in Fig. 1. The method is mainly divided into four stages: the first stage, coarse grid division and pre-classification clustering, in which each cluster obtained by the coarse grid division is called a sub-cluster; the second stage, collecting statistics and calculating values for each sub-cluster; the third stage, adaptive fine division of the data space of every sub-cluster and wavelet clustering; the fourth stage, output of the clustering result of the adaptive fine grid division and the information storage table. The algorithm of each stage is described in detail below.

1. Coarse-division pre-classification clustering stage

Its flow chart is shown in Fig. 2.

Step 1: Input the d-dimensional vector point set x to be analyzed, the coarse-division quantization value parameter k, and the density threshold w;

Step 2: Calculate the maximum h_i, the minimum l_i, and the step length s_i of the data point set x in each dimension.

The step length s_i is computed as:

s_{i} = \frac{h_{i} - l_{i}}{k} \qquad (1)

where l_i is the minimum of the data point set in dimension d_i; h_i is the maximum of the data point set in dimension d_i; 1 ≤ i ≤ d; and k is the quantization value.

Step 3: Quantize the grid cells, and store each grid cell in the cell array c. Fig. 3 shows the quantization process for two-dimensional data as an example.

Each dimension of the d-dimensional data space is divided into k disjoint left-closed, right-open intervals of equal length, so that the whole data space is divided into k^d disjoint rectangular cells of equal size. The interval range of an arbitrary grid cell c_i is:

c_{i} = [l_{1} + (i - 1) \times s_{1},\ l_{1} + i \times s_{1}) \times [l_{2} + (j - 1) \times s_{2},\ l_{2} + j \times s_{2}) \times \cdots \times [l_{d} + (m - 1) \times s_{d},\ l_{d} + m \times s_{d}) \qquad (2)

Step 4: Count the number of data points den(c_i) in each grid cell. When den(c_i) is greater than the density threshold w, the grid cell is a significant grid cell. Store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c1. The storage structure of c1 is shown in Fig. 4.

Step 5: Form a distance matrix from the position labels in the second column of c1, and apply breadth-first search to connect adjacent significant grid cells, thereby realizing clustering. The cluster labels of the significant grid cells are stored in the third column of information table c1, as shown in Fig. 4.

Step 6: Output the coarse-division pre-classification clustering result and the pre-classification clustering information table.
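Steps 4 and 5 above can be sketched in Python as shown below. The sketch is illustrative only and makes assumptions the text does not settle: two cells are treated as adjacent when their position labels differ by at most one in every dimension (diagonal neighbours included), and a dictionary of labels replaces the distance matrix. The function name `connect_significant_cells` is hypothetical.

```python
from collections import deque
from itertools import product


def connect_significant_cells(cell_counts, w):
    """Label adjacent significant cells with a common cluster id using BFS."""
    # A cell is significant when its point count exceeds the threshold w.
    significant = {c for c, n in cell_counts.items() if n > w}
    labels = {}
    next_id = 0
    for start in sorted(significant):
        if start in labels:
            continue
        next_id += 1
        labels[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit every neighbour whose label differs by at most 1 per dimension.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(a + b for a, b in zip(cell, offset))
                if neighbour in significant and neighbour not in labels:
                    labels[neighbour] = next_id
                    queue.append(neighbour)
    return labels


counts = {(0, 0): 5, (0, 1): 4, (1, 1): 3, (3, 3): 6, (3, 4): 1, (5, 5): 0}
labels = connect_significant_cells(counts, w=2)
print(labels)
```

In this example the three touching cells near the origin receive cluster label 1, the isolated dense cell (3, 3) receives label 2, and the cells whose counts do not exceed w are left unlabeled.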

2. Sub-cluster information statistics and calculation stage

1) The fine-division quantization value k_i of the i-th dimension of a sub-cluster is computed by the following formula

where m_i is the number of grid cells of the sub-cluster in the i-th dimension; b is the ratio of the maximum density to the minimum density of the significant grid cells of the sub-cluster; d is the dimension of the data to be classified.

2) Sub-cluster information statistics and storage

The structure of the sub-cluster information storage table c1 is shown in Fig. 4. In information table c1, count the density d of each significant grid cell and store it in the 4th column of c1; calculate the ratio b of the maximum to the minimum grid density of each sub-cluster and store it in the 5th column of the first row of that sub-cluster in c1; calculate the quantization value k_i of each dimension according to formula (1), calculate the maximum boundary j_imax and the minimum boundary j_imin of each sub-cluster in each dimension, and form a d × 3 matrix as shown in formula (2), which is stored in the 6th column of the first row of that sub-cluster in c1.

\begin{matrix} j_{1\max} & j_{1\min} & k_{1} \\ \vdots & \vdots & \vdots \\ j_{i\max} & j_{i\min} & k_{i} \end{matrix} \qquad (2)

3) Establish the information mapping table c2

3. Adaptive fine division and wavelet clustering stage

The process of the adaptive fine-division wavelet clustering is shown in Fig. 5.

The steps of the adaptive fine-division wavelet clustering are as follows:

Step 1: Extract the original data x of the sub-cluster, the maximum boundary j_imax and minimum boundary j_imin of each dimension, and the quantization value k_i.

Extract the data x in the first column of the c2{n,1} cell and the information in the 7th column of the c2{n,1} cell, namely the maximum boundary coordinate j_imax, the minimum boundary coordinate j_imin, and the quantization value k_i, as the input of the adaptive wavelet clustering.

Step 2: Calculate the step length s_i of each dimension.

The step length s_i is calculated by formula (2) from the quantization value k_i and the grid boundaries j_imax and j_imin of each dimension.

Step 3: Quantize the grid cells of the sub-cluster; the sub-cluster space is quantized into k_1 × k_2 × ... × k_i disjoint grid cells of equal size.

Step 4: Map the data x into the cells c'; each grid cell c'_i stores the set of data points within its spatial range.

Step 5: Count the number of data points den(c'_i) in each grid cell. When den(c'_i) is greater than the density threshold w', the grid cell is a significant grid cell. Store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c3. The storage structure of c3 is shown in Fig. 6.

Step 6: Apply the wavelet transform to the data in the significant grid cells, extract the wavelet transform coefficients, and store them in the third column of information table c3.

Step 7: Form a distance matrix from the position labels in the second column of c3, and apply breadth-first search to connect adjacent significant grid cells in the feature space after the wavelet transform, thereby realizing clustering. The cluster labels are stored in the 4th column of information table c3.

Step 8: Output the clustering result figure, distinguishing different clusters by color. Store the information storage table c3 in the corresponding c2{n,2}.

Step 9: Repeat Steps 1 to 8 until the adaptive fine-division wavelet clustering has been completed for all sub-clusters.
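The wavelet step can be illustrated with the following Python sketch. It is a simplification under stated assumptions and not the disclosed implementation: the text does not name the wavelet, so a one-level two-dimensional Haar transform is assumed; the transform is applied to the grid of cell counts of one sub-cluster; and the threshold is applied after the transform, which differs from the step order above. The function names `haar_approximation` and `significant_cells` are hypothetical.

```python
def haar_approximation(grid):
    """Return the one-level 2-D Haar approximation (LL) sub-band of a grid.

    Each output value is (a + b + c + d) / 2 for a 2x2 block of the input,
    which is the orthonormal Haar low-pass coefficient. Odd-sized grids are
    padded with zeros to an even number of rows and columns.
    """
    rows = [list(r) for r in grid]
    if len(rows) % 2:
        rows.append([0] * len(rows[0]))
    if len(rows[0]) % 2:
        rows = [r + [0] for r in rows]
    out = []
    for r in range(0, len(rows), 2):
        out.append([
            (rows[r][c] + rows[r][c + 1] + rows[r + 1][c] + rows[r + 1][c + 1]) / 2
            for c in range(0, len(rows[0]), 2)
        ])
    return out


def significant_cells(coeffs, w):
    """Return the positions whose coefficient exceeds the threshold w."""
    return {
        (r, c)
        for r, row in enumerate(coeffs)
        for c, value in enumerate(row)
        if value > w
    }


counts = [
    [4, 4, 0, 0],
    [4, 4, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 6, 2],
]
ll = haar_approximation(counts)
kept = significant_cells(ll, w=1)
print(ll, kept)
```

In this example the isolated count of 1 is reduced to a coefficient of 0.5 and falls below the threshold, while the two dense regions survive, which illustrates the denoising effect attributed to the wavelet transform. The surviving positions could then be connected by breadth-first search as in the first stage.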

4. Output of the information table and clustering result

Output the information storage table c2, compare against the data, determine the type of each cluster label, and identify the clusters. Output the clustering result of the adaptive fine grid division and the information storage table.

The method of the present invention is implemented in MATLAB. To verify its advantages, it was tested on acceleration-signal feature vector data collected at the casing of an aero-engine rotor test rig for four faults: rotor unbalance, misalignment, rotor-stator rubbing, and pedestal looseness. The diagnosis result of the double-order adaptive wavelet clustering method is shown in Fig. 7; the fault diagnosis accuracy is 96.82%. To highlight the advantages of the method, it was compared with traditional wavelet clustering, whose results are shown in Fig. 8, Fig. 9, and Fig. 10. Fig. 8 and Fig. 9 show the clustering results for quantization values k of 10 and 30 respectively; because the grid cells are too large, data that lie close together are grouped into one class and cannot be clustered correctly. Fig. 10 shows the traditional wavelet clustering result for k = 45: fault data that lie close together are distinguished, but the loosely distributed looseness fault is split into three classes, and the fault diagnosis accuracy for Fig. 10 is 88.34%.
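The text does not say how the comparison with the data is carried out. One simple possibility, shown here purely as an assumption, is to name each cluster after the most common known type among its points and to report the fraction of points whose cluster name matches their known type. The function name `identify_clusters` and the sample labels are hypothetical.

```python
from collections import Counter


def identify_clusters(cluster_ids, reference_types):
    """Name each cluster after the most common reference type among its points."""
    votes = {}
    for cid, ref in zip(cluster_ids, reference_types):
        votes.setdefault(cid, Counter())[ref] += 1
    # Majority vote per cluster (an assumed rule, not stated in the text).
    names = {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}
    correct = sum(
        names[cid] == ref for cid, ref in zip(cluster_ids, reference_types)
    )
    return names, correct / len(cluster_ids)


cluster_ids = [1, 1, 1, 2, 2, 3]
reference = ["unbalance", "unbalance", "misalignment",
             "rubbing", "rubbing", "looseness"]
names, accuracy = identify_clusters(cluster_ids, reference)
print(names, accuracy)
```

Because every row of c2 keeps the original points of its sub-cluster, the cluster labels produced in the third stage can be lined up with the reference types in this way.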

Claims

1. A double-order adaptive wavelet clustering method, characterized in that it is mainly divided into four stages:

Second stage: compute statistics for each sub-cluster and calculate the quantization values for the adaptive fine division of the sub-clusters;

Fourth stage: output the clustering result of the adaptive fine grid division and the information storage table. According to the information storage table, compare against the data, determine the type of each cluster label, and identify the clustering result.

2. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the calculation steps of the first stage are:

Step 1: Input the d-dimensional vector data point set x to be analyzed, the coarse-division quantization value parameter k, and the density threshold w;

Step 2: Quantize the grid cells, and store each grid cell in the cell array c;

Step 3: Count the number of data points in each grid cell, and store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c1;

Step 4: Form a distance matrix from the position labels in the second column of c1, and apply breadth-first search to connect adjacent significant grid cells, thereby realizing clustering. The cluster labels of the significant grid cells are stored in the third column of information table c1;

Step 5: Output the coarse-division pre-classification clustering result and the pre-classification clustering information table.

3. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the information statistics and calculation methods of the second stage are:

1) Calculation method of the per-dimension quantization value k_i of a sub-cluster;

where m_i is the number of grid cells of the sub-cluster in the i-th dimension; b is the ratio of the maximum density to the minimum density of the significant grid cells of the sub-cluster; d is the dimension of the data to be classified;

2) Sub-cluster information statistics and storage;

In information table c1, count the density d of each significant grid cell and store it in the 4th column of c1; calculate the ratio b of the maximum to the minimum grid density of each sub-cluster and store it in the 5th column of the first row of that sub-cluster in c1; calculate the quantization value k_i of each dimension according to formula (1), calculate the maximum boundary j_imax and the minimum boundary j_imin of each sub-cluster in each dimension, and form a d × 3 matrix as shown in formula (2), which is stored in the 6th column of the first row of that sub-cluster in c1;

\begin{matrix} j_{1\max} & j_{1\min} & k_{1} \\ \vdots & \vdots & \vdots \\ j_{i\max} & j_{i\min} & k_{i} \end{matrix} \qquad (2)

3) Establish the information mapping table c2;

From the information table c1 of the coarse clustering, the statistical information belonging to the same sub-cluster is extracted by class and stored by class in a newly created information table c2{n,1}; for example, c2{i,1} stores the statistical information of the i-th sub-cluster. The information in c2{n,1} is extracted for the subsequent adaptive wavelet clustering, and the information produced by the adaptive wavelet clustering is stored in c2{n,2}; the information storage table c2 establishes a mapping between the coarse-grid pre-classification clustering and the adaptive wavelet clustering, and also a mapping between the final clustering result and the original data.

4. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the steps of the third-stage adaptive fine-division wavelet clustering are as follows:

Step 1: Extract from c2{n,1} the original data x of the sub-cluster, the maximum boundary j_imax and minimum boundary j_imin of each dimension, and the quantization value k_i;

Step 2: Quantize the grid cells of the sub-cluster;

Step 3: Store the data in the significant grid cells and the position labels of the significant grid cells in the first and second columns of a newly created information table c3;

Step 4: Apply the wavelet transform to the data in the significant grid cells, extract the wavelet transform coefficients, and store them in the third column of information table c3;

Step 5: Form a distance matrix from the position labels in the second column of c3, and apply breadth-first search to connect adjacent significant grid cells in the feature space after the wavelet transform, thereby realizing clustering. The cluster labels are stored in the 4th column of information table c3;

Step 6: Store the information storage table c3 in the corresponding c2{n,2};

Step 7: Output the clustering result figure, distinguishing different clusters by color;