CN110390358A - Deep learning method based on feature clustering - Google Patents
Deep learning method based on feature clustering
- Publication number
- CN110390358A CN110390358A CN201910665812.0A CN201910665812A CN110390358A CN 110390358 A CN110390358 A CN 110390358A CN 201910665812 A CN201910665812 A CN 201910665812A CN 110390358 A CN110390358 A CN 110390358A
- Authority
- CN
- China
- Prior art keywords
- feature variable
- clustering
- data
- correlation coefficient
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a deep learning method based on feature clustering, comprising the following steps: selecting feature variables from a specific data set; preprocessing the selected feature variables; computing the correlation coefficients between the feature variables and screening out the highly correlated ones with a custom function; extracting the principal components of the feature variables; building a network graph structure from the extracted principal components; clustering that structure; and guiding the configuration of a neural network with the clustering result. By preprocessing the selected feature variables, data scaling removes differences between samples in attributes such as characteristics and order of magnitude, while dimensionality reduction maps the samples into a low-dimensional space where they can be displayed, so that the most suitable clustering method can later be chosen from the observed shape of the data, improving the accuracy of feature clustering. Screening the highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected in cluster analysis.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a deep learning method based on feature clustering.
Background art
In the deep learning field, the mainstream architectures are DNN, RNN and CNN. A DNN is a fully connected neural network and a general-purpose deep learning method. An RNN is a recurrent neural network, likewise a fully connected structure, used mainly where there is temporal context, such as in NLP. A CNN is a convolutional neural network, characterised by local connectivity based on spatial correlation, and is used mainly in image processing. The strengths and weaknesses of these three mainstream architectures are by now clear: the locally correlated connectivity of a CNN saves a large amount of parameter storage and computation, whereas a DNN ignores feature correlation and fully connects all features, creating heavy computation and storage pressure and wasting calculation on connections between many unrelated features; an RNN has similar problems.
For images and other data sets whose features are spatially correlated, a CNN can be applied directly. For data without image-like local spatial correlation, however, applying a CNN directly does not work well, while applying a DNN directly incurs the cost of connecting and storing parameters for a large number of uncorrelated features. The present invention therefore proposes a deep learning method based on feature clustering to remedy these shortcomings of the prior art.
Summary of the invention
In view of the above problems, the present invention proposes a deep learning method based on feature clustering. Preprocessing the selected feature variables with data scaling removes differences between samples in attributes such as characteristics and order of magnitude; dimensionality reduction maps the samples into a low-dimensional space where they can be displayed, so that the most suitable clustering method can later be chosen from the observed shape of the data, improving the accuracy of feature clustering; and screening the highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected in cluster analysis.
The present invention proposes a deep learning method based on feature clustering, comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction;
Step 3: compute the correlation coefficients between the feature variables, take the correlation coefficient as the similarity measure, and screen out the highly correlated feature variables with a custom function;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure, grouping the highly correlated feature variables into the same cluster, to obtain the clustering result;
Step 7: guide the configuration of the neural network with the obtained clustering result.
In a further refinement, the feature variables in step 1 may be chosen by any one of: correlation, the Gini coefficient, information entropy, statistical tests, or random forests.
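The correlation option above can be sketched in a few lines. A minimal NumPy-only illustration (the function name `select_top_features` and the synthetic data are my own, not the patent's):

```python
import numpy as np

def select_top_features(X, y, k):
    """Rank features by absolute Pearson correlation with the target, keep top k."""
    Xc = X - X.mean(axis=0)            # centre each feature column
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    order = np.argsort(-np.abs(corr))  # most correlated first
    return order[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)  # feature 2 drives the target
top = select_top_features(X, y, 2)
```

Any of the other listed criteria (Gini coefficient, information entropy, statistical tests, random forest importance) could replace the correlation score in the ranking.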
In a further refinement, the data scaling in step 2 proceeds as follows: the acquired feature variables are rescaled proportionally, compressing the converted values into the interval (0, 1).
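A minimal sketch of this proportional rescaling (standard min-max scaling; the small `eps` that keeps the top value strictly below 1 is my assumption about how the open interval (0, 1) is enforced):

```python
import numpy as np

def min_max_scale(X, eps=1e-9):
    """Proportionally rescale each feature column into (0, 1)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)   # eps keeps the maximum just under 1

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
Xs = min_max_scale(X)
```

After scaling, every feature lies on the same order of magnitude regardless of its original units, which is the stated purpose of this step.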
In a further refinement, the data transformation in step 2 uses either the discrete Fourier transform or the discrete wavelet transform.
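For the discrete-Fourier-transform option, a sketch using NumPy's FFT. Representing each sample by the magnitudes of its leading spectral coefficients is one plausible reading of "data transformation"; the specifics here are illustrative, not the patent's:

```python
import numpy as np

def dft_features(signal, n_coeffs=8):
    """Describe a 1-D signal by the magnitudes of its leading DFT coefficients."""
    spectrum = np.fft.rfft(signal)
    return np.abs(spectrum[:n_coeffs])

t = np.linspace(0.0, 1.0, 64, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t)   # a pure 3 Hz tone
feats = dft_features(signal)
```

The dominant coefficient lands in the bin matching the tone's frequency, so the transformed representation captures the signal's structure in far fewer variables.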
In a further refinement, the process of screening highly correlated feature variables with a custom function in step 3 is: first compute the correlation coefficient matrix and pick out the feature variables whose correlation coefficients exceed a preset value; set those correlation coefficients to 1 and mark the variables as target feature variables, setting non-target feature variables to 0; then find the qualifying feature variables and delete the highly correlated ones.
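The patent leaves the custom function itself unspecified. One plausible reading, which binarises the correlation matrix at the preset value (1 for a highly correlated pair, 0 otherwise) and greedily deletes the later member of each flagged pair, is sketched below; the threshold and data are illustrative:

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Binarise the correlation matrix at `threshold` and drop the later
    column of every flagged (highly correlated) pair."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    flags = (corr > threshold).astype(int)   # 1 = target pair, 0 = non-target
    keep = []
    for j in range(corr.shape[1]):
        if all(flags[j, i] == 0 for i in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = a + 0.01 * rng.normal(size=300)   # near-duplicate of a
c = rng.normal(size=300)
keep = drop_correlated(np.column_stack([a, b, c]))
```

Here the near-duplicate column is removed while the two genuinely distinct columns survive.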
In a further refinement, the process of extracting the principal components of the feature variables in step 4 is:
S1: input the feature-variable data set and mean-centre it;
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the largest k, and take the corresponding k eigenvectors as row vectors to form the eigenvector matrix;
S5: finally, transform the data into the new space spanned by the k eigenvectors.
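Steps S1 to S5 describe classical principal component analysis via eigendecomposition. A direct NumPy transcription (k = 2 and the random data are illustrative):

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)             # S1: mean-centre the data set
    cov = np.cov(Xc, rowvar=False)      # S2: covariance matrix
    vals, vecs = np.linalg.eigh(cov)    # S3: eigenvalues and eigenvectors
    order = np.argsort(vals)[::-1][:k]  # S4: largest k eigenvalues ...
    W = vecs[:, order].T                # ... their eigenvectors as row vectors
    return Xc @ W.T                     # S5: project into the new k-D space

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
```

The projected components are mutually uncorrelated, since distinct eigenvectors of the sample covariance matrix are orthogonal.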
In a further refinement, the clustering of the network graph structure in step 6 may use either hierarchical clustering or the K-means clustering algorithm.
In a further refinement, when hierarchical clustering is used in step 6, the previous clustering result serves as the node for the next round of clustering.
In a further refinement, when the K-means clustering algorithm is used in step 6, the detailed process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centres; compute the Euclidean distance from each cluster centre to the other samples and assign each sample to the class of its nearest cluster centre, obtaining an initial clustering result;
T2: compute the mean of all samples in each cluster of the initial result, take these means as the new cluster centres, and repeat the operation of T1;
T3: repeat until the cluster centres no longer move, completing the clustering.
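Steps T1 to T3 are the standard Lloyd iteration. A compact NumPy sketch (the `init_idx` parameter, added so the demonstration below is deterministic, is my addition; by default the centres are random samples as T1 prescribes):

```python
import numpy as np

def k_means(X, k, init_idx=None, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    if init_idx is None:
        # T1: choose K samples from the data set as initial cluster centres
        init_idx = rng.choice(len(X), size=k, replace=False)
    centres = X[np.asarray(init_idx)]
    for _ in range(n_iter):
        # T1 (cont.): Euclidean distance to each centre; nearest centre wins
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # T2: the mean of each cluster's samples becomes the new centre
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # T3: stop once the centres no longer move
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(20, 2)),          # blob around (0, 0)
               rng.normal(size=(20, 2)) + 10.0])  # blob around (10, 10)
labels, centres = k_means(X, 2, init_idx=[0, 20])
```

With one seed point drawn from each blob, the iteration recovers the two well-separated groups.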
The invention has the following benefits: by preprocessing the selected feature variables, data scaling removes differences between samples in attributes such as characteristics and order of magnitude, ensuring that every sample's feature values lie on the same order of magnitude; dimensionality reduction maps the samples into a low-dimensional space where they can be displayed, so that the most suitable clustering method can later be chosen from the observed shape of the data, improving the accuracy of feature clustering; screening the highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected in cluster analysis; and the method's computation and storage pressure are low, reducing the training time of the deep learning model and raising efficiency.
Specific embodiment
To deepen understanding, the invention is further described below in conjunction with an embodiment. The embodiment serves only to explain the invention and does not limit its scope.
A deep learning method based on feature clustering, comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it; in this embodiment the feature variables are chosen by the correlation method;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction; the data scaling rescales the acquired feature variables proportionally, compressing the converted values into the interval (0, 1), and the data transformation uses the discrete wavelet transform;
Step 3: compute the correlation coefficients between the feature variables, take the correlation coefficient as the similarity measure, and screen out the highly correlated feature variables with a custom function: first compute the correlation coefficient matrix and pick out the feature variables whose correlation coefficients exceed a preset value; set those correlation coefficients to 1 and mark the variables as target feature variables, setting non-target feature variables to 0; then find the qualifying feature variables and delete the highly correlated ones;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables, as follows:
S1: input the feature-variable data set and mean-centre it;
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the largest k, and take the corresponding k eigenvectors as row vectors to form the eigenvector matrix;
S5: finally, transform the data into the new space spanned by the k eigenvectors;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure with the K-means clustering algorithm, grouping the highly correlated feature variables into the same cluster to obtain the clustering result; the detailed process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centres; compute the Euclidean distance from each cluster centre to the other samples and assign each sample to the class of its nearest cluster centre, obtaining an initial clustering result;
T2: compute the mean of all samples in each cluster of the initial result, take these means as the new cluster centres, and repeat the operation of T1;
T3: repeat until the cluster centres no longer move, completing the clustering;
Step 7: guide the configuration of the neural network with the obtained clustering result.
As noted in the summary, the clustering of the network graph structure in step 6 may use either hierarchical clustering or the K-means clustering algorithm; this embodiment uses K-means.
By preprocessing the selected feature variables, data scaling removes differences between samples in attributes such as characteristics and order of magnitude, ensuring that every sample's feature values lie on the same order of magnitude; dimensionality reduction maps the samples into a low-dimensional space where they can be displayed, so that the most suitable clustering method can later be chosen from the observed shape of the data, improving the accuracy of feature clustering; screening the highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected in cluster analysis; and the method's computation and storage pressure are low, reducing the training time of the deep learning model and raising efficiency.
The basic principles, main features and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiment, which, together with the description, merely illustrates its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the protection scope of the claimed invention, which is defined by the appended claims and their equivalents.
Claims (9)
1. A deep learning method based on feature clustering, characterised by comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction;
Step 3: compute the correlation coefficients between the feature variables, take the correlation coefficient as the similarity measure, and screen out the highly correlated feature variables with a custom function;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure, grouping the highly correlated feature variables into the same cluster, to obtain the clustering result;
Step 7: guide the configuration of the neural network with the obtained clustering result.
2. The deep learning method based on feature clustering according to claim 1, characterised in that the feature variables in step 1 may be chosen by any one of: correlation, the Gini coefficient, information entropy, statistical tests, or random forests.
3. The deep learning method based on feature clustering according to claim 1, characterised in that the data scaling in step 2 proceeds as follows: the acquired feature variables are rescaled proportionally, compressing the converted values into the interval (0, 1).
4. The deep learning method based on feature clustering according to claim 1, characterised in that the data transformation in step 2 uses either the discrete Fourier transform or the discrete wavelet transform.
5. The deep learning method based on feature clustering according to claim 1, characterised in that the process of screening highly correlated feature variables with a custom function in step 3 is: first compute the correlation coefficient matrix and pick out the feature variables whose correlation coefficients exceed a preset value; set those correlation coefficients to 1 and mark the variables as target feature variables, setting non-target feature variables to 0; then find the qualifying feature variables and delete the highly correlated ones.
6. The deep learning method based on feature clustering according to claim 1, characterised in that the process of extracting the principal components of the feature variables in step 4 is:
S1: input the feature-variable data set and mean-centre it;
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the largest k, and take the corresponding k eigenvectors as row vectors to form the eigenvector matrix;
S5: finally, transform the data into the new space spanned by the k eigenvectors.
7. The deep learning method based on feature clustering according to claim 1, characterised in that the clustering of the network graph structure in step 6 may use either hierarchical clustering or the K-means clustering algorithm.
8. The deep learning method based on feature clustering according to claim 7, characterised in that when hierarchical clustering is used in step 6, the previous clustering result serves as the node for the next round of clustering.
9. The deep learning method based on feature clustering according to claim 6, characterised in that when the K-means clustering algorithm is used in step 6, the detailed process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centres; compute the Euclidean distance from each cluster centre to the other samples and assign each sample to the class of its nearest cluster centre, obtaining an initial clustering result;
T2: compute the mean of all samples in each cluster of the initial result, take these means as the new cluster centres, and repeat the operation of T1;
T3: repeat until the cluster centres no longer move, completing the clustering.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910665812.0A CN110390358A (en) | 2019-07-23 | 2019-07-23 | A kind of deep learning method based on feature clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910665812.0A CN110390358A (en) | 2019-07-23 | 2019-07-23 | A kind of deep learning method based on feature clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390358A (en) | 2019-10-29 |
Family
ID=68287222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910665812.0A | A kind of deep learning method based on feature clustering (pending) | 2019-07-23 | 2019-07-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390358A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111338897A (en) * | 2020-02-24 | 2020-06-26 | 京东数字科技控股有限公司 | Identification method of abnormal node in application host, monitoring equipment and electronic equipment |
CN112043252A (en) * | 2020-10-10 | 2020-12-08 | 山东大学 | Emotion recognition system and method based on respiratory component in pulse signal |
CN113257365A (en) * | 2021-05-26 | 2021-08-13 | 南开大学 | Clustering method and system for non-standardized single cell transcriptome sequencing data |
TWI752485B (en) * | 2019-11-14 | 2022-01-11 | 大陸商支付寶(杭州)信息技術有限公司 | User clustering and feature learning method, device, and computer-readable medium |
CN116955117A (en) * | 2023-09-18 | 2023-10-27 | 深圳市艺高智慧科技有限公司 | Computer radiator performance analysis system based on data visualization enhancement |
CN116955117B (en) * | 2023-09-18 | 2023-12-22 | 深圳市艺高智慧科技有限公司 | Computer radiator performance analysis system based on data visualization enhancement |
- 2019-07-23: application CN201910665812.0A filed in China; published as CN110390358A; legal status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390358A (en) | A kind of deep learning method based on feature clustering | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
Yu et al. | Mixed pooling for convolutional neural networks | |
CN106960214A (en) | Object identification method based on image | |
CN109993100B (en) | Method for realizing facial expression recognition based on deep feature clustering | |
JP4618098B2 (en) | Image processing system | |
CN109948647A (en) | A kind of electrocardiogram classification method and system based on depth residual error network | |
CN110353694B (en) | Motion recognition method based on feature selection | |
CN104392250A (en) | Image classification method based on MapReduce | |
CN111105160A (en) | Steel quality prediction method based on tendency heterogeneous bagging algorithm | |
CN109711401A (en) | A kind of Method for text detection in natural scene image based on Faster Rcnn | |
CN109325510B (en) | Image feature point matching method based on grid statistics | |
CN110334777A (en) | A kind of unsupervised attribute selection method of weighting multi-angle of view | |
CN101833667A (en) | Pattern recognition classification method expressed based on grouping sparsity | |
CN109344898A (en) | Convolutional neural networks image classification method based on sparse coding pre-training | |
CN109711442A (en) | Unsupervised layer-by-layer generation confrontation feature representation learning method | |
CN109447153A (en) | Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification | |
Guan et al. | Defect detection and classification for plain woven fabric based on deep learning | |
Andrews et al. | Fast scalable and accurate discovery of dags using the best order score search and grow shrink trees | |
CN111666999A (en) | Remote sensing image classification method | |
CN116776245A (en) | Three-phase inverter equipment fault diagnosis method based on machine learning | |
Çakmak | Grapevine Leaves Classification Using Transfer Learning and Fine Tuning | |
CN116051924B (en) | Divide-and-conquer defense method for image countermeasure sample | |
CN110222553A (en) | A kind of recognition methods again of the Multi-shot pedestrian based on rarefaction representation | |
CN113723281A (en) | High-resolution image classification method based on local adaptive scale ensemble learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-10-29 |