CN106372669A - Double-order adaptive wavelet clustering method - Google Patents
- Publication number
- CN106372669A
- Authority
- CN
- China
- Prior art keywords
- cluster
- clustering
- information
- data
- grid cell
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23211—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a double-order adaptive wavelet clustering method for data classification and pattern recognition. The method has four main steps. First, the data space is quantized with a coarse grid, the regions of the space that contain clusters are located, and a preliminary (pre-sorting) clustering of the data is obtained. Second, information about each sub-cluster is collected, and an optimal quantization value is computed automatically from the data distribution of each sub-cluster and stored. Third, the quantization value and boundary information of each sub-cluster are extracted, and the data space of each sub-cluster is adaptively subdivided with a fine grid to perform wavelet clustering. Finally, the clustering result and an information storage table are output. Compared with the prior art, the double-order adaptive wavelet clustering method can eliminate the influence of the chosen quantization value and density threshold on clustering accuracy, improves clustering accuracy especially for data with non-uniform density, and achieves substantially higher diagnostic accuracy than the traditional wavelet clustering method.
Description
Technical field
The present invention relates generally to big data, and in particular to the classification and pattern recognition of big data whose density distribution is non-uniform.
Background art
Cluster analysis, a form of unsupervised learning, is an important branch of machine learning. Grid-based clustering methods quantize the object space into a finite number of grid cells that form a grid structure, and perform clustering by operating on the significant cells, so their processing speed is very high. WaveCluster is a typical grid-based clustering algorithm: it combines the wavelet transform with grid clustering into a unified grid- and density-based algorithm. WaveCluster therefore inherits the advantages of grid clustering: it clusters without supervision, runs fast, handles large data sets effectively, and can find clusters of arbitrary shape. Because it incorporates the wavelet transform, WaveCluster can also denoise the data effectively, so the clustering result is not affected by noise, and it can find clusters at different scales. However, the clustering accuracy of WaveCluster depends on the chosen quantization value, and providing an optimal quantization value is the key to obtaining accurate clustering results. For point sets with a non-uniform density distribution it is difficult to choose a reasonable quantization value: if the quantization value is too high, one cluster may be split into several groups; if it is too low, clusters that should be separate may be merged into one.
Summary of the invention
The purpose of the present invention is to propose a double-order adaptive wavelet clustering method for big point sets with a non-uniform density distribution, in order to eliminate the influence of the grid division and the density threshold and to improve clustering accuracy.
The present invention is realized as follows: a double-order adaptive wavelet clustering method, characterized in that it is divided into four main stages:
First stage: coarse grid division and pre-sorting clustering. Each cluster obtained from the coarse grid division is called a sub-cluster.
The calculation steps are:
Step 1: input the d-dimensional vector point set x to be analyzed, the coarse quantization parameter k, and the density threshold w;
Step 2: quantize the space into grid cells and store each grid cell in the cell array c;
Step 3: count the data points in each grid cell, and store the data in each significant grid cell and the location tag of that cell in the first and second columns of a new information table c1;
Step 4: build a distance matrix from the location tags in the second column of c1, and apply breadth-first search (BFS) to connect adjacent significant grid cells into clusters. The cluster label of each significant grid cell is stored in the third column of c1;
Step 5: output the coarse pre-sorting clustering result and the pre-sorting clustering information table.
Second stage: collect the information of each sub-cluster and calculate the quantization value used for the adaptive fine division of each sub-cluster.
1) Calculating the quantization value k_i of each dimension of a sub-cluster
The fine-division quantization value k_i of the i-th dimension of a sub-cluster is obtained from formula (1), where m_i is the number of grid cells of the sub-cluster along the i-th dimension, b is the ratio of the maximum density to the minimum density of the sub-cluster's significant grid cells, and d is the dimension of the data to be classified.
2) Sub-cluster information statistics and storage
In information table c1, compute the density of each significant grid cell and store it in the fourth column of c1. For each sub-cluster, compute the ratio b of its maximum grid density to its minimum grid density and store it in the fifth column of the sub-cluster's first row in c1. Compute the quantization value k_i of each dimension according to formula (1), and compute the maximum boundary j_i^max and minimum boundary j_i^min of each sub-cluster in each dimension. Together these form a d × 3 matrix, shown as formula (2), which is stored in the sixth column of the sub-cluster's first row in c1.
3) Building the information mapping table c2
From the coarse-clustering information table c1, the statistics belonging to the same sub-cluster are extracted by class and stored in a new information table c2{n,1}; for example, c2{i,1} stores the statistics of the i-th sub-cluster. The information in c2{n,1} is extracted for the subsequent adaptive wavelet clustering, and the information produced by the adaptive wavelet clustering is stored in c2{n,2}. The information storage table c2 establishes a mapping between the coarse-grid pre-sorting clustering and the adaptive wavelet clustering, and between the final clustering result and the original data.
Third stage: adaptive fine division of the data space of each sub-cluster and wavelet clustering.
The steps of the adaptive fine-division wavelet clustering are as follows:
Step 1: extract from c2{n,1} the original data x of the sub-cluster, its maximum boundary j_i^max and minimum boundary j_i^min in each dimension, and the quantization value k_i.
Step 2: quantize the sub-cluster into grid cells.
Step 3: store the data in each significant grid cell and the location tag of that cell in the first and second columns of a new information table c3.
Step 4: apply the wavelet transform to the data in the significant grid cells and store the extracted wavelet transform coefficients in the third column of c3.
Step 5: build a distance matrix from the location tags in the second column of c3 and apply BFS to connect adjacent significant grid cells in the wavelet-transformed feature space into clusters. The cluster labels are stored in the fourth column of c3.
Step 6: store the information table c3 in the corresponding c2{n,2}.
Step 7: output the clustering result figure, with different clusters distinguished by color.
Step 8: repeat Steps 1 to 7 until the adaptive fine-division wavelet clustering has been completed for all sub-clusters.
Fourth stage: output the clustering result of the adaptive fine grid division and the information storage table. Using the information storage table c2, the data are compared, the type of each cluster label is determined, and the identified clustering result is obtained.
Compared with the prior art, the present invention has the following advantages and notable effects: the number of classes does not need to be specified, clustering is unsupervised, and clustering is efficient. In particular, for the pattern recognition and classification of big data with a non-uniform density distribution, the method improves clustering accuracy and reduces the influence of the quantization value and the density threshold on the clustering result.
Brief description of the drawings
Fig. 1 is a flow chart of the double-order adaptive wavelet clustering method of the present invention.
Fig. 2 is the flow of the coarse-division pre-sorting clustering of the present invention.
Fig. 3 is a flow chart of the quantization of the feature space in the present invention.
Fig. 4 is the storage structure of the significant-grid-cell information table c1 of the present invention.
Fig. 5 is a flow chart of the adaptive fine-division wavelet clustering of the present invention.
Fig. 6 is the storage structure of the information table c3 of the present invention.
Fig. 7 is an example clustering result obtained with the double-order adaptive wavelet clustering of the present invention.
Fig. 8 is an example single-order (traditional) wavelet clustering result with k = 10, for comparison.
Fig. 9 is an example single-order (traditional) wavelet clustering result with k = 30, for comparison.
Fig. 10 is an example single-order (traditional) wavelet clustering result with k = 45, for comparison.
Specific embodiment
The algorithm of the present invention is further described below with reference to Figs. 1 to 10. The main flow of the double-order adaptive wavelet clustering algorithm is shown in Fig. 1. The method is divided into four main stages. First order: coarse grid division and pre-sorting clustering; each cluster obtained by the coarse grid division is called a sub-cluster. Second order: statistics and calculation of the information of each sub-cluster. Third order: adaptive fine division and wavelet clustering of the data space of every sub-cluster. Fourth order: output of the clustering result of the adaptive fine grid division and of the information storage table. The detailed procedure of each stage is described below.
1. Coarse-division pre-sorting clustering stage
Its flow chart is shown in Fig. 2.
Step 1: input the d-dimensional vector point set x to be analyzed, the coarse quantization parameter k, and the density threshold w.
Step 2: calculate the maximum h_i, the minimum l_i, and the step length s_i of the point set x in each dimension. The step length is

$$s_i = \frac{h_i - l_i}{k}$$

where l_i is the minimum of the point set in dimension i, h_i is the maximum of the point set in dimension i, 1 ≤ i ≤ d, and k is the quantization value.
Step 3: quantize the space into grid cells and store each grid cell in the cell array c. Fig. 3 shows the quantization of two-dimensional data as an example.
Each dimension of the d-dimensional data space is divided into k disjoint, left-closed, right-open intervals of equal length, so the whole data space is divided into k^d disjoint rectangular cells of equal size. A grid cell c_i whose position along dimension j is p_j ∈ {1, …, k} covers the range

$$c_i = \prod_{j=1}^{d} \left[\, l_j + (p_j - 1)\, s_j,\; l_j + p_j\, s_j \right)$$
Step 4: count the number of data points den(c_i) in each grid cell. When den(c_i) exceeds the density threshold w, the grid cell is a significant grid cell; the data in each significant grid cell and the location tag of that cell are stored in the first and second columns of a new information table c1. The structure of c1 is shown in Fig. 4.
Step 5: build a distance matrix from the location tags in the second column of c1 and apply BFS to connect adjacent significant grid cells into clusters. The cluster label of each significant grid cell is stored in the third column of c1, as shown in Fig. 4.
Step 6: output the coarse pre-sorting clustering result and the pre-sorting clustering information table.
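Steps 2 to 5 of the coarse stage can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented MATLAB implementation: cell positions are 0-based, a point lying exactly on the upper bound h_i (which would fall just past the last right-open interval) is clamped into the last cell, and "adjacent" cells are assumed to be those whose location tags differ by at most one in every dimension (Chebyshev distance 1). The function names `coarse_grid_positions`, `significant_cells` and `bfs_labels` are hypothetical.

```python
from collections import defaultdict, deque


def coarse_grid_positions(points, k):
    """Steps 2-3: step lengths s_i = (h_i - l_i) / k and 0-based cell positions.

    Intervals are left-closed, right-open, so a point equal to h_i would land
    one past the last interval; it is clamped into cell k - 1 (assumption).
    """
    d = len(points[0])
    l = [min(p[i] for p in points) for i in range(d)]
    h = [max(p[i] for p in points) for i in range(d)]
    s = [(h[i] - l[i]) / k for i in range(d)]
    positions = []
    for p in points:
        pos = []
        for i in range(d):
            idx = int((p[i] - l[i]) // s[i]) if s[i] > 0 else 0
            pos.append(min(max(idx, 0), k - 1))
        positions.append(tuple(pos))
    return l, s, positions


def significant_cells(points, positions, w):
    """Step 4: keep cells holding more than w points as [points, tag] rows (c1 columns 1-2)."""
    cells = defaultdict(list)
    for pt, pos in zip(points, positions):
        cells[pos].append(pt)
    return [[pts, pos] for pos, pts in sorted(cells.items()) if len(pts) > w]


def bfs_labels(tags):
    """Step 5: label connected significant cells with breadth-first search.

    "Adjacent" is taken to mean Chebyshev distance 1 between location tags
    (face, edge or corner neighbours); this is an assumption.
    Returns 1-based cluster labels aligned with tags (c1 column 3).
    """
    n = len(tags)
    # Pairwise distance matrix built from the location tags, as in Step 5.
    dist = [[max(abs(a - b) for a, b in zip(tags[i], tags[j])) for j in range(n)]
            for i in range(n)]
    labels = [0] * n
    current = 0
    for start in range(n):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if not labels[v] and dist[u][v] == 1:
                    labels[v] = current
                    queue.append(v)
    return labels


points = [(0.0, 0.0), (0.5, 0.5), (1.5, 0.5), (1.2, 0.8), (9.5, 9.5), (10.0, 10.0), (5.5, 0.5)]
l, s, pos = coarse_grid_positions(points, 10)
rows = significant_cells(points, pos, 1)
print(bfs_labels([tag for _, tag in rows]))  # cells (0,0), (1,0), (9,9) -> [1, 1, 2]
```

The full distance matrix mirrors the wording of Step 5 but costs O(n²) in the number of significant cells; looking up the 3^d - 1 neighbour offsets in a hash map would scale better on large grids.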
2. Sub-cluster information statistics and calculation stage
1) The fine-division quantization value k_i of the i-th dimension of a sub-cluster is calculated by formula (1), where m_i is the number of grid cells of the sub-cluster along the i-th dimension, b is the ratio of the maximum density to the minimum density of the sub-cluster's significant grid cells, and d is the dimension of the data to be classified.
2) Sub-cluster information statistics and storage
The structure of the sub-cluster information table c1 is shown in Fig. 4. In c1, compute the density of each significant grid cell and store it in the fourth column of c1. For each sub-cluster, compute the ratio b of its maximum grid density to its minimum grid density and store it in the fifth column of the sub-cluster's first row in c1. Compute the quantization value k_i of each dimension according to formula (1), and compute the maximum boundary j_i^max and minimum boundary j_i^min of each sub-cluster in each dimension. Together these form a d × 3 matrix, shown as formula (2), which is stored in the sixth column of the sub-cluster's first row in c1.
3) Building the information mapping table c2
From the coarse-clustering information table c1, the statistics belonging to the same sub-cluster are extracted by class and stored in a new information table c2{n,1}; for example, c2{i,1} stores the statistics of the i-th sub-cluster. The information in c2{n,1} is extracted for the subsequent adaptive wavelet clustering, and the information produced by the adaptive wavelet clustering is stored in c2{n,2}. The information storage table c2 establishes a mapping between the coarse-grid pre-sorting clustering and the adaptive wavelet clustering, and between the final clustering result and the original data.
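The sub-cluster statistics of item 2) and the mapping table of item 3) can be sketched as follows. Formula (1) for k_i is not reproduced in this text, so the sketch takes the k_i values as an input; it also assumes that j_i^max and j_i^min are the outer edges of the sub-cluster's coarse cells and that each row of the d × 3 matrix is ordered [k_i, j_i^max, j_i^min]. `subcluster_summary`, `SubClusterEntry` and `build_c2` are hypothetical Python stand-ins for the MATLAB cell array c2, in which c2{i,1} holds the inputs and c2{i,2} the results for sub-cluster i.

```python
from dataclasses import dataclass, field


def subcluster_summary(rows, l, s, k_values):
    """Density ratio b and the d x 3 matrix [k_i, j_i^max, j_i^min] for one sub-cluster.

    rows: [cell_points, position_tag] rows of the sub-cluster's significant
    cells (0-based coarse positions). Formula (1) for k_i is not reproduced
    here, so k_values must be supplied by the caller. Boundaries are taken
    as the outer edges of the sub-cluster's cells (assumption).
    """
    densities = [len(pts) for pts, _ in rows]
    b = max(densities) / min(densities)
    matrix = []
    for i in range(len(l)):
        lo = min(tag[i] for _, tag in rows)
        hi = max(tag[i] for _, tag in rows)
        matrix.append([k_values[i], l[i] + (hi + 1) * s[i], l[i] + lo * s[i]])
    return b, matrix


@dataclass
class SubClusterEntry:
    """Hypothetical analogue of one c2 row: c2{i,1} inputs and c2{i,2} results."""
    point_indices: list                              # links back to the original data x
    stats: dict                                      # c2{i,1}: b, d x 3 matrix, ...
    fine_result: dict = field(default_factory=dict)  # c2{i,2}: filled in stage 3


def build_c2(coarse_labels, stats_by_label):
    """Group original point indices by coarse sub-cluster label (0 = unassigned, assumed)."""
    c2 = {}
    for idx, lab in enumerate(coarse_labels):
        if lab == 0:
            continue
        if lab not in c2:
            c2[lab] = SubClusterEntry([], stats_by_label.get(lab, {}))
        c2[lab].point_indices.append(idx)
    return c2
```

Keeping the original point indices in each entry is what lets the later stages map fine-clustering labels back onto the raw data, which is the role the text assigns to c2.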
3. Adaptive fine-division and wavelet clustering stage
The process of the adaptive fine-division wavelet clustering is shown in Fig. 5.
Its steps are as follows:
Step 1: extract the original data x of the sub-cluster, its maximum boundary j_i^max and minimum boundary j_i^min in each dimension, and the quantization value k_i.
The data x are taken from the first column of cell c2{n,1}, and the maximum boundary coordinates j_i^max, the minimum boundary coordinates j_i^min and the quantization values k_i are taken from the seventh column of c2{n,1}; together they form the input of the adaptive wavelet clustering.
Step 2: calculate the step length s_i in each dimension.
The step length s_i is calculated by formula (2) from the quantization value k_i of each dimension and the grid boundaries j_i^max and j_i^min.
Step 3: quantize the sub-cluster into grid cells: the sub-cluster space is quantized into k_1 × k_2 × … × k_d disjoint grid cells of equal size.
Step 4: map the data x into the cell array c'; each grid cell c'_i stores the data points that lie within its range.
Step 5: count the number of data points den(c'_i) in each grid cell. When den(c'_i) exceeds the density threshold w', the grid cell is a significant grid cell; the data in each significant grid cell and the location tag of that cell are stored in the first and second columns of a new information table c3. The structure of c3 is shown in Fig. 6.
Step 6: apply the wavelet transform to the data in the significant grid cells and store the extracted wavelet transform coefficients in the third column of c3.
Step 7: build a distance matrix from the location tags in the second column of c3 and apply BFS to connect adjacent significant grid cells in the wavelet-transformed feature space into clusters. The cluster labels are stored in the fourth column of c3.
Step 8: output the clustering result figure, with different clusters distinguished by color, and store the information table c3 in the corresponding c2{n,2}.
Step 9: repeat Steps 1 to 8 until the adaptive fine-division wavelet clustering has been completed for all sub-clusters.
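Steps 2 to 6 of this stage can be sketched as below. The sketch assumes that the fine step length is s_i = (j_i^max - j_i^min) / k_i, by analogy with the coarse step (h_i - l_i) / k, and it uses a one-level orthonormal Haar wavelet chosen for simplicity; the source does not name the wavelet it uses. Both function names are hypothetical, and this is an illustration rather than the patented implementation.

```python
import math


def fine_density_grid(points, j_min, j_max, k):
    """Steps 2-5: quantize a sub-cluster into k_1 x ... x k_d cells and count points.

    The step length is assumed to be s_i = (j_i^max - j_i^min) / k_i. Positions
    are 0-based, and points on the upper edge are clamped into the last cell.
    """
    steps = [(hi - lo) / ki for lo, hi, ki in zip(j_min, j_max, k)]
    density = {}
    for pt in points:
        pos = []
        for x, lo, st, ki in zip(pt, j_min, steps, k):
            idx = int((x - lo) // st) if st > 0 else 0
            pos.append(min(max(idx, 0), ki - 1))
        density[tuple(pos)] = density.get(tuple(pos), 0) + 1
    return density


def haar_approximation(density):
    """Step 6: one level of a separable orthonormal Haar transform, low-pass part only.

    Applying the filter (1/sqrt(2), 1/sqrt(2)) along each of the d dimensions
    sends count / sqrt(2)**d from every cell to the coefficient at position // 2.
    Detail (high-pass) coefficients are discarded. The Haar choice is an assumption.
    """
    if not density:
        return {}
    d = len(next(iter(density)))
    scale = 1.0 / math.sqrt(2.0) ** d
    approx = {}
    for pos, count in density.items():
        parent = tuple(p // 2 for p in pos)
        approx[parent] = approx.get(parent, 0.0) + count * scale
    return approx


dens = fine_density_grid([(0.1, 0.1), (0.2, 0.2), (0.9, 0.1), (1.9, 0.9)],
                         [0.0, 0.0], [2.0, 1.0], [4, 2])
print(dens)                      # {(0, 0): 2, (1, 0): 1, (3, 1): 1}
print(haar_approximation(dens))  # approximately {(0, 0): 1.5, (1, 0): 0.5}
```

In a full pipeline the smoothed coefficients would then be thresholded with w' and connected with the same BFS procedure as in the coarse stage, each point inheriting the label of its cell's parent position.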
4. Output of the information table and the clustering result
The information storage table c2 is output, the data are compared, the type of each cluster label is determined, and the clusters are identified. The clustering result of the adaptive fine grid division and the information storage table are output.
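The fourth stage, which uses c2 to relate the fine-clustering labels back to the original data and to decide what each cluster label means, can be sketched as follows. Two assumptions are made: every (sub-cluster, local label) pair becomes a separate global cluster, and the type of each cluster is decided by majority vote over known sample types, which is also one possible way to compute a diagnostic accuracy; the source does not state how its accuracy figures were obtained. `merge_fine_labels` and `label_clusters` are hypothetical names.

```python
from collections import Counter


def merge_fine_labels(fine_results):
    """Give every (sub-cluster, local label) pair its own global cluster id.

    fine_results: {sub_cluster_id: {point_index: local_label}}, a hypothetical
    stand-in for the c2{n,2} entries; local_label 0 means unassigned.
    Returns {point_index: global_label}, keeping 0 for unassigned points.
    """
    global_ids, assignment = {}, {}
    for sub_id in sorted(fine_results):
        for point_index, local in sorted(fine_results[sub_id].items()):
            if local == 0:
                assignment[point_index] = 0
                continue
            key = (sub_id, local)
            if key not in global_ids:
                global_ids[key] = len(global_ids) + 1
            assignment[point_index] = global_ids[key]
    return assignment


def label_clusters(cluster_labels, true_types):
    """Name each cluster by the most common known type among its points.

    Majority voting is an assumed reading of "compare the data and judge the
    type of each cluster label". Returns (cluster_to_type, accuracy), where
    accuracy is the fraction of points whose cluster type matches their own.
    """
    members = {}
    for c, t in zip(cluster_labels, true_types):
        members.setdefault(c, []).append(t)
    cluster_to_type = {c: Counter(ts).most_common(1)[0][0] for c, ts in members.items()}
    correct = sum(cluster_to_type[c] == t for c, t in zip(cluster_labels, true_types))
    return cluster_to_type, correct / len(true_types)
```

For example, `merge_fine_labels({1: {0: 1, 1: 1, 2: 2}, 2: {3: 1, 4: 0}})` gives `{0: 1, 1: 1, 2: 2, 3: 3, 4: 0}`: local label 1 in sub-cluster 2 becomes global cluster 3, so labels never collide across sub-clusters.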
The method of the invention was implemented in MATLAB. To verify its advantages, it was tested on feature vectors extracted from acceleration signals collected at the casing measuring point of an aero-engine rotor test rig under four fault conditions: rotor unbalance, misalignment, rotor-stator rubbing, and pedestal looseness. The diagnostic result of the double-order adaptive wavelet clustering method is shown in Fig. 7, with a fault diagnosis accuracy of 96.82%. To highlight the advantages of the method, it was compared with traditional wavelet clustering, whose results are shown in Figs. 8, 9 and 10. Figs. 8 and 9 show the results for quantization values k = 10 and k = 30: because the grid cells are too large, data lying close together are grouped into one class and cannot be clustered correctly. Fig. 10 shows the traditional wavelet clustering result for k = 45: fault data lying close together are separated, but the loosely distributed looseness-fault data are split into three classes, and the fault diagnosis accuracy in Fig. 10 is 88.34%.
Claims (4)
1. A double-order adaptive wavelet clustering method, characterized in that it is divided into four main stages:
First stage: coarse grid division and pre-sorting clustering; each cluster obtained from the coarse grid division is called a sub-cluster;
Second stage: collecting the information of each sub-cluster and calculating the quantization value for the adaptive fine division of each sub-cluster;
Third stage: adaptive fine division of the data space of each sub-cluster and wavelet clustering;
Fourth stage: output of the clustering result of the adaptive fine grid division and of the information storage table. According to the information storage table, the data are compared, the type of each cluster label is determined, and the identified clustering result is obtained.
2. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the calculation steps of the first stage are:
Step 1: input the d-dimensional vector data point set x to be analyzed, the coarse quantization parameter k, and the density threshold w;
Step 2: quantize the space into grid cells and store each grid cell in the cell array c;
Step 3: count the data points in each grid cell, and store the data in each significant grid cell and the location tag of that cell in the first and second columns of a new information table c1;
Step 4: build a distance matrix from the location tags in the second column of c1, and apply BFS to connect adjacent significant grid cells into clusters; the cluster label of each significant grid cell is stored in the third column of c1;
Step 5: output the coarse pre-sorting clustering result and the pre-sorting clustering information table.
3. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the information statistics and calculation method of the second stage is:
1) method of calculating the quantization value k_i of each dimension of a sub-cluster:
the fine-division quantization value k_i of the i-th dimension of a sub-cluster is obtained by formula (1), where m_i is the number of grid cells of the sub-cluster along the i-th dimension, b is the ratio of the maximum density to the minimum density of the sub-cluster's significant grid cells, and d is the dimension of the data to be classified;
2) sub-cluster information statistics and storage:
in information table c1, the density of each significant grid cell is computed and stored in the fourth column of c1; the ratio b of the maximum to the minimum grid density of each sub-cluster is computed and stored in the fifth column of the sub-cluster's first row in c1; the quantization value k_i of each dimension is computed according to formula (1), and the maximum boundary j_i^max and minimum boundary j_i^min of each sub-cluster in each dimension are computed, forming a d × 3 matrix shown as formula (2), which is stored in the sixth column of the sub-cluster's first row in c1;
3) building the information mapping table c2:
from the coarse-clustering information table c1, the statistics of each sub-cluster are extracted by class and stored in a new information table c2{n,1}, for example c2{i,1} stores the statistics of the i-th sub-cluster; the information in c2{n,1} is extracted for the subsequent adaptive wavelet clustering, and the information produced by the adaptive wavelet clustering is stored in c2{n,2}; the information storage table c2 establishes a mapping between the coarse-grid pre-sorting clustering and the adaptive wavelet clustering, and between the final clustering result and the original data.
4. The double-order adaptive wavelet clustering method according to claim 1, characterized in that the steps of the adaptive fine-division wavelet clustering of the third stage are as follows:
Step 1: extract from c2{n,1} the original data x of the sub-cluster, its maximum boundary j_i^max and minimum boundary j_i^min in each dimension, and the quantization value k_i;
Step 2: quantize the sub-cluster into grid cells;
Step 3: store the data in each significant grid cell and the location tag of that cell in the first and second columns of a new information table c3;
Step 4: apply the wavelet transform to the data in the significant grid cells and store the extracted wavelet transform coefficients in the third column of c3;
Step 5: build a distance matrix from the location tags in the second column of c3 and apply BFS to connect adjacent significant grid cells in the wavelet-transformed feature space into clusters; the cluster labels are stored in the fourth column of c3;
Step 6: store the information table c3 in the corresponding c2{n,2};
Step 7: output the clustering result figure, with different clusters distinguished by color;
Step 8: repeat Steps 1 to 7 until the adaptive fine-division wavelet clustering has been completed for all sub-clusters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610799993.2A CN106372669A (en) | 2016-08-31 | 2016-08-31 | Double-order adaptive wavelet clustering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610799993.2A CN106372669A (en) | 2016-08-31 | 2016-08-31 | Double-order adaptive wavelet clustering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106372669A true CN106372669A (en) | 2017-02-01 |
Family
ID=57899320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610799993.2A Pending CN106372669A (en) | 2016-08-31 | 2016-08-31 | Double-order adaptive wavelet clustering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106372669A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292039A (en) * | 2017-06-27 | 2017-10-24 | 哈尔滨工程大学 | A kind of UUV based on Wave Cluster patrols bank profile construction method |
CN107292039B (en) * | 2017-06-27 | 2020-12-29 | 哈尔滨工程大学 | UUV bank patrolling profile construction method based on wavelet clustering |
CN110633719A (en) * | 2018-06-21 | 2019-12-31 | 北京新羿生物科技有限公司 | Micro-droplet data classification method |
CN110633719B (en) * | 2018-06-21 | 2022-05-20 | 清华大学 | Micro-droplet data classification method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170201 |