CN110390358A - Deep learning method based on feature clustering - Google Patents

Deep learning method based on feature clustering

Info

Publication number
CN110390358A
CN110390358A
Authority
CN
China
Prior art keywords
feature variable
clustering
data
correlation coefficient
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910665812.0A
Other languages
Chinese (zh)
Inventor
杨勇 (Yang Yong)
黄淑英 (Huang Shuying)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910665812.0A
Publication of CN110390358A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G06F18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 - Feature extraction based on approximation criteria, e.g. principal component analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a deep learning method based on feature clustering, comprising the following steps: selecting feature variables from a specific data set; preprocessing the data; computing the correlation coefficients between the selected feature variables and using a custom function to screen out highly correlated feature variables; extracting the principal components of the feature variables; building a network graph structure from the extracted principal components; clustering the graph; and using the clustering result to guide the configuration of a neural network. By preprocessing the selected feature variables, data scaling removes differences in feature attributes such as magnitude and order of magnitude between samples, and dimensionality reduction maps the samples into a low-dimensional space for visualization, so that the most suitable clustering method can later be chosen by inspecting the shape of the data, improving the accuracy of feature clustering. Screening highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected for cluster analysis.

Description

Deep learning method based on feature clustering
Technical field
The present invention relates to the field of machine learning technology, and in particular to a deep learning method based on feature clustering.
Background art
In the field of deep learning, the mainstream architectures are DNNs, RNNs and CNNs. A DNN is a fully connected neural network and a general-purpose deep learning method. An RNN is a recurrent neural network, also a fully connected structure, mainly used for data with temporal context, such as the field of NLP. A CNN is a convolutional neural network, characterized by local connections based on spatial correlation, and is mainly used in image processing. The strengths and weaknesses of these three mainstream architectures are by now clear: the locally correlated connections of a CNN greatly reduce parameter storage and computation, whereas a DNN ignores feature correlation and fully connects all features, causing heavy computation and storage costs; many unrelated features are needlessly connected, producing interference and unnecessary connection computation. RNNs have similar problems.
For images and other data sets with spatially correlated features, a CNN can be used directly. For data without image-like spatial local correlation, however, using a CNN directly performs poorly, while using a DNN directly incurs the connection computation and parameter storage cost of many uncorrelated features. The present invention therefore proposes a deep learning method based on feature clustering to address these shortcomings of the prior art.
Summary of the invention
In view of the above problems, the present invention proposes a deep learning method based on feature clustering. By preprocessing the selected feature variables, data scaling removes differences in feature attributes such as magnitude and order of magnitude between samples; dimensionality reduction maps the samples into a low-dimensional space for visualization, so that the most suitable clustering method can later be chosen by inspecting the shape of the data, improving the accuracy of feature clustering; and screening highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected for cluster analysis.
The present invention proposes a deep learning method based on feature clustering, comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction;
Step 3: compute the correlation coefficients between the feature variables, use the correlation coefficient as the similarity measure, and use a custom function to screen out highly correlated feature variables;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure, grouping highly correlated feature variables into the same cluster, to obtain the clustering result;
Step 7: use the obtained clustering result to guide the configuration of the neural network.
A further improvement is that the feature variables in step 1 can be selected by any one of correlation, Gini coefficient, information entropy, statistical tests or random forests.
A further improvement is that the data scaling process in step 2 is: the acquired feature variables are rescaled proportionally so that the converted values are compressed into the interval (0, 1).
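The proportional rescaling described above is ordinary min-max normalization. A minimal NumPy sketch (the function and array names are illustrative, not from the patent; a small epsilon is one way to keep the interval open, which the patent does not specify):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) of X proportionally into the open interval (0, 1)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0          # guard constant features against /0
    eps = 1e-9                                # keeps the endpoints strictly open
    return (X - col_min) / (col_range + eps) * (1 - 2 * eps) + eps

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_scaled = min_max_scale(X)
# every value now lies strictly inside (0, 1), on the same order of magnitude
```

After scaling, features measured on very different scales (here, units vs. hundreds) become directly comparable, which is the stated purpose of this step.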
A further improvement is that the data transformation in step 2 uses either the discrete Fourier transform or the discrete wavelet transform.
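As a sketch of the discrete-Fourier-transform option using NumPy's FFT (the wavelet option would need a wavelet library and is not shown; the toy signal is illustrative):

```python
import numpy as np

# Transform one feature variable (a 1-D signal) into the frequency domain.
signal = np.array([0.0, 1.0, 0.0, -1.0] * 4)   # toy periodic feature, period 4
spectrum = np.fft.fft(signal)

# Magnitudes concentrate energy at the signal's dominant frequency,
# which can expose structure the raw values hide.
magnitudes = np.abs(spectrum)
dominant = int(np.argmax(magnitudes[1 : len(signal) // 2])) + 1

# The transform is lossless: the inverse FFT recovers the signal.
recovered = np.fft.ifft(spectrum).real
```

For a length-16 signal of period 4, the dominant frequency bin is 16/4 = 4, so the transformed representation makes the periodic structure explicit.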
A further improvement is that the process of screening highly correlated feature variables with a custom function in step 3 is: first compute the correlation coefficient matrix and select the feature variables whose correlation coefficient exceeds a preset value; set the correlation coefficients above the preset value to 1 and mark the corresponding variables as target feature variables; set non-target feature variables to 0; then find the feature variables that satisfy the condition and delete the redundant highly correlated feature variables.
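A minimal sketch of such a screening step (the threshold value and the rule of dropping the later member of each correlated pair are illustrative assumptions; the patent only specifies the 0/1 target marking above a preset value):

```python
import numpy as np

def screen_correlated(X, threshold=0.9):
    """Return indices of redundant features to delete, per the 0/1 marking scheme."""
    corr = np.corrcoef(X, rowvar=False)             # features are columns
    mask = (np.abs(corr) > threshold).astype(int)   # 1 = target, 0 = non-target
    np.fill_diagonal(mask, 0)                       # ignore self-correlation
    to_delete = set()
    n = mask.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if mask[i, j] and i not in to_delete and j not in to_delete:
                to_delete.add(j)                    # keep i, drop its near-duplicate j
    return sorted(to_delete)

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a * 2 + 0.01 * rng.normal(size=100), rng.normal(size=100)])
redundant = screen_correlated(X)   # feature 1 is a near-copy of feature 0
```

The surviving features are then mutually less correlated, which is the point of this step before clustering.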
A further improvement is that the process of extracting the principal components of the feature variables in step 4 is:
S1: input the feature variable data set, then mean-center it (subtract the mean of each feature);
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the k largest, and stack the corresponding k eigenvectors as rows to form the eigenvector matrix;
S5: finally, project the data into the new space spanned by the k eigenvectors.
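The S1-S5 procedure above is plain eigendecomposition PCA; a compact sketch (variable names are illustrative):

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its top-k principal components, following S1-S5."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # S1: mean-center each feature
    cov = np.cov(Xc, rowvar=False)           # S2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # S3: eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1][:k]    # S4: k largest eigenvalues
    W = eigvecs[:, order].T                  # S4: k eigenvectors stacked as rows
    return Xc @ W.T                          # S5: transform into the new space

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
Z = pca_project(X, k=2)   # 50 samples in a 2-D principal subspace
```

`numpy.linalg.eigh` is used rather than `eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.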
A further improvement is that the clustering of the network graph structure in step 6 can be performed with either hierarchical clustering or the K-means clustering algorithm.
A further improvement is that when hierarchical clustering is used in step 6, the previous clustering result serves as the nodes of the next round of clustering.
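A minimal agglomerative sketch of this idea: each round merges the two closest clusters, and the merged result acts as a node in the next round. The patent does not fix the linkage rule, so single linkage here is an assumption:

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best, best_d = (0, 1), float("inf")
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best = d, (a, b)
        a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merged result is next round's node
        del clusters[b]
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
groups = single_linkage(pts, n_clusters=2)
```

The O(n^3) pairwise search is fine for a sketch; a real implementation would cache the distance matrix between rounds.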
A further improvement is that when the K-means clustering algorithm is used in step 6, the detailed process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centers; compute the Euclidean distance from each cluster center to the other samples, assign each sample to the class of its nearest cluster center, and obtain the initial clustering result;
T2: compute the mean of all samples in each cluster of the current result, take it as the new cluster center, and repeat T1;
T3: repeat until the cluster centers no longer move, completing the clustering.
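The T1-T3 loop above is ordinary K-means; a compact sketch (initializing with the first K samples is an illustrative choice, since the patent only says K samples are chosen):

```python
import numpy as np

def kmeans(X, k, max_iter=100):
    """K-means following T1-T3: assign to the nearest center, recompute means,
    repeat until the centers stop moving."""
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()                             # T1: choose K samples as centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # T1: Euclidean distance from every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                      # assign to nearest center
        # T2: new centers are the means of each cluster's samples
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):          # T3: centers no longer move
            break
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
```

The empty-cluster guard (keeping the old center when a cluster receives no samples) is a common safeguard, not something the patent specifies.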
The beneficial effects of the invention are: by preprocessing the selected feature variables, data scaling removes differences in feature attributes such as magnitude and order of magnitude between samples and guarantees that every sample's feature values lie on the same order of magnitude; dimensionality reduction maps the samples into a low-dimensional space for visualization, so that the most suitable clustering method can later be chosen by inspecting the shape of the data, improving the accuracy of feature clustering; screening highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected for cluster analysis; and the computation and storage costs of the method are low, reducing the training time of the deep learning model and improving efficiency.
Specific embodiments
To deepen understanding of the present invention, the invention is further described below in conjunction with an embodiment. The embodiment is only for explaining the invention and does not limit its scope.
A deep learning method based on feature clustering, comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it; the feature variables can be selected by the correlation method;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction. The data scaling process is: the acquired feature variables are rescaled proportionally so that the converted values are compressed into the interval (0, 1); the data transformation uses the discrete wavelet transform;
Step 3: compute the correlation coefficients between the feature variables, use the correlation coefficient as the similarity measure, and use a custom function to screen out highly correlated feature variables. The screening process is: first compute the correlation coefficient matrix and select the feature variables whose correlation coefficient exceeds a preset value; set the correlation coefficients above the preset value to 1 and mark the corresponding variables as target feature variables; set non-target feature variables to 0; then find the feature variables that satisfy the condition and delete the redundant highly correlated feature variables;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables. The process is:
S1: input the feature variable data set, then mean-center it (subtract the mean of each feature);
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the k largest, and stack the corresponding k eigenvectors as rows to form the eigenvector matrix;
S5: finally, project the data into the new space spanned by the k eigenvectors;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure using the K-means clustering algorithm, grouping highly correlated feature variables into the same cluster, to obtain the clustering result. The detailed clustering process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centers; compute the Euclidean distance from each cluster center to the other samples, assign each sample to the class of its nearest cluster center, and obtain the initial clustering result;
T2: compute the mean of all samples in each cluster of the current result, take it as the new cluster center, and repeat T1;
T3: repeat until the cluster centers no longer move, completing the clustering;
Step 7: use the obtained clustering result to guide the configuration of the neural network.
By preprocessing the selected feature variables, data scaling removes differences in feature attributes such as magnitude and order of magnitude between samples and guarantees that every sample's feature values lie on the same order of magnitude; dimensionality reduction maps the samples into a low-dimensional space for visualization, so that the most suitable clustering method can later be chosen by inspecting the shape of the data, improving the accuracy of feature clustering; screening highly correlated feature variables with a custom function addresses the problem of low correlation between the variables selected for cluster analysis; and the computation and storage costs of the method are low, reducing the training time of the deep learning model and improving efficiency.
The basic principles, main features and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiment; the embodiment and description only illustrate the principles of the invention, and various changes and improvements may be made without departing from its spirit and scope, all of which fall within the claimed scope of the invention. The claimed scope is defined by the appended claims and their equivalents.

Claims (9)

1. A deep learning method based on feature clustering, characterized by comprising the following steps:
Step 1: based on a specific data set, select the most important feature variables from it;
Step 2: preprocess the selected feature variables, including data scaling, data transformation and dimensionality reduction;
Step 3: compute the correlation coefficients between the feature variables, use the correlation coefficient as the similarity measure, and use a custom function to screen out highly correlated feature variables;
Step 4: based on the correlation coefficients between the feature variables, extract the principal components of the feature variables;
Step 5: build a network graph structure from the extracted principal components;
Step 6: cluster the network graph structure, grouping highly correlated feature variables into the same cluster, to obtain the clustering result;
Step 7: use the obtained clustering result to guide the configuration of the neural network.
2. The deep learning method based on feature clustering according to claim 1, characterized in that: the feature variables in step 1 can be selected by any one of correlation, Gini coefficient, information entropy, statistical tests or random forests.
3. The deep learning method based on feature clustering according to claim 1, characterized in that: the data scaling process in step 2 is: the acquired feature variables are rescaled proportionally so that the converted values are compressed into the interval (0, 1).
4. The deep learning method based on feature clustering according to claim 1, characterized in that: the data transformation in step 2 uses either the discrete Fourier transform or the discrete wavelet transform.
5. The deep learning method based on feature clustering according to claim 1, characterized in that: the process of screening highly correlated feature variables with a custom function in step 3 is: first compute the correlation coefficient matrix and select the feature variables whose correlation coefficient exceeds a preset value; set the correlation coefficients above the preset value to 1 and mark the corresponding variables as target feature variables; set non-target feature variables to 0; then find the feature variables that satisfy the condition and delete the redundant highly correlated feature variables.
6. The deep learning method based on feature clustering according to claim 1, characterized in that: the process of extracting the principal components of the feature variables in step 4 is:
S1: input the feature variable data set, then mean-center it;
S2: compute the covariance matrix;
S3: obtain the eigenvalues and eigenvectors of the covariance matrix by eigendecomposition;
S4: sort the eigenvalues in descending order, select the k largest, and stack the corresponding k eigenvectors as rows to form the eigenvector matrix;
S5: finally, project the data into the new space spanned by the k eigenvectors.
7. The deep learning method based on feature clustering according to claim 1, characterized in that: the clustering of the network graph structure in step 6 can be performed with either hierarchical clustering or the K-means clustering algorithm.
8. The deep learning method based on feature clustering according to claim 7, characterized in that: when hierarchical clustering is used in step 6, the previous clustering result serves as the nodes of the next round of clustering.
9. The deep learning method based on feature clustering according to claim 6, characterized in that: when the K-means clustering algorithm is used in step 6, the detailed process is:
T1: first determine the number of clusters, then choose K samples from the data set of the network graph structure as cluster centers; compute the Euclidean distance from each cluster center to the other samples, assign each sample to the class of its nearest cluster center, and obtain the initial clustering result;
T2: compute the mean of all samples in each cluster of the current result, take it as the new cluster center, and repeat T1;
T3: repeat until the cluster centers no longer move, completing the clustering.
CN201910665812.0A 2019-07-23 2019-07-23 Deep learning method based on feature clustering Pending CN110390358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910665812.0A CN110390358A (en) 2019-07-23 2019-07-23 Deep learning method based on feature clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910665812.0A CN110390358A (en) 2019-07-23 2019-07-23 Deep learning method based on feature clustering

Publications (1)

Publication Number Publication Date
CN110390358A (en) 2019-10-29

Family

ID=68287222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910665812.0A Pending CN110390358A (en) 2019-07-23 2019-07-23 A kind of deep learning method based on feature clustering

Country Status (1)

Country Link
CN (1) CN110390358A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI752485B * 2019-11-14 2022-01-11 大陸商支付寶(杭州)信息技術有限公司 User clustering and feature learning method, device, and computer-readable medium
CN111338897A * 2020-02-24 2020-06-26 京东数字科技控股有限公司 Identification method of abnormal node in application host, monitoring equipment and electronic equipment
CN112043252A * 2020-10-10 2020-12-08 山东大学 Emotion recognition system and method based on respiratory component in pulse signal
CN113257365A * 2021-05-26 2021-08-13 南开大学 Clustering method and system for non-standardized single cell transcriptome sequencing data
CN116955117A * 2023-09-18 2023-10-27 深圳市艺高智慧科技有限公司 Computer radiator performance analysis system based on data visualization enhancement
CN116955117B * 2023-09-18 2023-12-22 深圳市艺高智慧科技有限公司 Computer radiator performance analysis system based on data visualization enhancement

Similar Documents

Publication Publication Date Title
CN110390358A (en) Deep learning method based on feature clustering
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Yu et al. Mixed pooling for convolutional neural networks
CN106960214A (en) Object identification method based on image
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
JP4618098B2 (en) Image processing system
CN109948647A (en) Electrocardiogram classification method and system based on deep residual network
CN110353694B (en) Motion recognition method based on feature selection
CN104392250A (en) Image classification method based on MapReduce
CN111105160A (en) Steel quality prediction method based on tendency heterogeneous bagging algorithm
CN109711401A (en) Text detection method in natural scene images based on Faster R-CNN
CN109325510B (en) Image feature point matching method based on grid statistics
CN110334777A (en) Weighted multi-view unsupervised attribute selection method
CN101833667A (en) Pattern recognition classification method based on group-sparse representation
CN109344898A (en) Convolutional neural network image classification method based on sparse coding pre-training
CN109711442A (en) Unsupervised layer-wise generative adversarial feature representation learning method
CN109447153A (en) Divergence-excitation autoencoder and its classification method for imbalanced data classification
Guan et al. Defect detection and classification for plain woven fabric based on deep learning
Andrews et al. Fast scalable and accurate discovery of dags using the best order score search and grow shrink trees
CN111666999A (en) Remote sensing image classification method
CN116776245A (en) Three-phase inverter equipment fault diagnosis method based on machine learning
Çakmak Grapevine Leaves Classification Using Transfer Learning and Fine Tuning
CN116051924B (en) Divide-and-conquer defense method for image countermeasure sample
CN110222553A (en) Multi-shot pedestrian re-identification method based on sparse representation
CN113723281A (en) High-resolution image classification method based on local adaptive scale ensemble learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191029