CN106503731A - An unsupervised feature selection method based on conditional mutual information and K-means - Google Patents
- Publication number: CN106503731A
- Application number: CN201610888945.0A
- Authority: CN (China)
- Prior art keywords: feature, feature subset, cluster, subset
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The present invention provides an unsupervised feature selection method based on conditional mutual information and K-means. Data without class labels are first clustered by multiple runs of the K-means algorithm under different initial conditions. Then, on the basis of each clustering, the method considers both the modularity metric of each feature and the conditional mutual information between different features, and uses a relevance-independence index between features to select a feature subset with high relevance and low redundancy. The feature subsets obtained from the different K-means clustering results are then aggregated into the final feature subset. The present invention applies effectively to unlabeled and imbalanced data sets, and the feature subset it obtains has high relevance and low redundancy.
Description
Technical field
The invention belongs to the field of feature selection in machine learning, and relates in particular to a method that uses conditional mutual information and the K-means algorithm to perform unsupervised feature selection on unlabeled data sets.
Background technology
In practical machine learning applications the number of features is often large; some features may be irrelevant, and dependencies may exist between features. The more features there are, the longer feature analysis and model training take, and the easier it is to run into the "curse of dimensionality", which makes the model more complex and degrades its generalization ability. Feature selection is therefore particularly important.
Feature selection, also called feature subset selection or attribute selection, means choosing a subset of the full feature set so that the model built on it performs better. Feature selection can discard irrelevant or redundant features, thereby reducing the number of features, improving model accuracy, and shortening running time. On the other hand, a model restricted to the most relevant features makes it easier for researchers to understand the process that generated the data.
Depending on how candidate feature subsets are evaluated when searching for the optimal subset, feature selection methods fall roughly into two classes: wrapper feature selection and filter feature selection. Wrapper feature selection repeatedly runs a learning algorithm to evaluate the quality of a feature subset; it is more accurate than filter selection, but the selected subset generalizes poorly to other classifiers. On high-dimensional data sets, because wrapper selection must be tightly coupled with a specific learning algorithm, its computational cost is very high. Filter feature selection needs no specific learning algorithm; it evaluates feature quality quickly with a suitable criterion and is therefore a much more computationally efficient method.
Most existing feature selection methods take classification accuracy as the optimization objective, do not fully consider the distribution of the data samples, and generally pursue performance on the majority class while neglecting performance on minority classes. To address data imbalance at the data level, the majority-class samples of the training set can be subsampled so that the classes are balanced before training (Exploratory under-sampling for class-imbalance learning. Liu X Y, Wu J, Zhou Z H); however, this discards part of the data and can reduce classification accuracy. At the algorithm level, traditional feature selection algorithms can be modified according to the imbalance of the class distribution so that they adapt to imbalanced samples (IM-IG: a new feature selection algorithm for imbalanced class distributions; authors include Chen Yan and Li Guozheng); but such methods are limited to the two-class imbalance problem and do not apply to multi-class imbalance.
For filter feature selection, many supervised methods have been proposed, such as scoring candidate features with mutual information and taking the top-ranked features as the input of a neural network classifier (Using mutual information for selecting features in supervised neural net learning. R. Battiti). However, this approach ignores the redundancy between features, so many redundant features are selected, which hurts the performance of the subsequent classifier. Moreover, it applies only to data with class label information and does not extend to unsupervised feature selection.
In the field of unsupervised feature selection, many methods designed for text have been proposed, but they cannot be applied directly to numeric data. Some methods do target numeric data, such as an unsupervised filter algorithm oriented to categorical features that, based on a one-pass clustering algorithm, uses the importance each feature exhibits across different clusters as the evaluation criterion and selects the feature subset according to how that importance varies (Research on unsupervised feature selection methods for categorical features. Wang Lianxi, Jiang Shengyi). Because this method partitions the data with only a single one-pass clustering, the clustering result is random and the accuracy of the feature selection cannot be guaranteed.
The present invention first clusters the unlabeled data with multiple K-means runs under different initial conditions. Then, on the basis of each clustering, it considers both the modularity metric of each feature and the conditional mutual information between different features to obtain a feature subset with high relevance and low redundancy, and finally aggregates the feature subsets obtained from the different K-means clustering results.
Content of the invention
Purpose: the technical problem to be solved by the present invention is feature selection on unlabeled data sets. An unsupervised feature selection method based on conditional mutual information and K-means is proposed. Data without class labels are clustered by multiple K-means runs with different initial conditions, which eliminates the randomness of performing feature selection on a single clustering result and reduces the impact of data imbalance on feature selection. On the basis of each clustering, the modularity metric of each feature and the conditional mutual information between different features are considered together, and a relevance-independence index between features is used to select feature combinations with high relevance and low redundancy. The feature subsets obtained from the different K-means clustering results are then aggregated into the final feature subset. The present invention applies effectively to unlabeled and imbalanced data sets, and the feature subset it obtains has high relevance and low redundancy.
The technical scheme of the invention is as follows:
An unsupervised feature selection method based on conditional mutual information and K-means, comprising the following steps:
Step 1) run K-means multiple times on the unlabeled data set with different K values and different cluster centers, and record the result of each clustering;
Step 2) for each clustering result obtained in step 1), construct the feature graph of each feature in turn;
Step 3) from the feature graphs constructed in step 2), compute the modularity metric of each feature, and put the feature with the largest modularity into the feature subset;
Step 4) given the initial feature subset from step 3), compute the conditional mutual information of each remaining feature relative to every feature already in the subset, and from it the relevance-independence metric of each remaining feature relative to the subset;
Step 5) add the modularity metric of each remaining feature from step 3) and the relevance-independence metric from step 4) with a chosen weight, and take the result as the score of each remaining feature;
Step 6) put the feature with the highest score from step 5) into the feature subset, then iterate steps 4), 5) and 6) until the number of features in the subset reaches the required number;
Step 7) aggregate the feature subsets formed from the different K-means clustering results in step 6) to obtain the final feature subset.
Further, in the method of the invention based on conditional mutual information and K-means, step 1) runs K-means on the unlabeled data set multiple times with different K values and different cluster centers and records each clustering result. The maximum number of clusters, the minimum number of clusters, and the number of clustering runs of the K-means algorithm are specified manually at initialization. For each run, the K-means algorithm randomly chooses a number k between the minimum and maximum number of clusters as the number of clusters, and randomly chooses k points of the data set as the initial centroids; K-means clustering then yields the result of that run, i.e. the cluster labels C.
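As a sketch of step 1) under stated assumptions (the patent fixes no implementation; the mini K-means below, the parameter ranges, and the toy data are all illustrative), the repeated clustering with a random k per run could look like this:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain K-means: k initial centroids drawn at random from the data,
    then alternating assignment and centroid-update steps."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # distance of every sample to every centroid, then nearest-centroid labels
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

def multiple_kmeans(X, k_min, k_max, n_runs, seed=0):
    """Step 1: cluster the unlabeled data n_runs (= T) times, each run with
    a random number of clusters k in [k_min, k_max] and random centroids."""
    rng = np.random.default_rng(seed)
    return [kmeans(X, int(rng.integers(k_min, k_max + 1)), rng)
            for _ in range(n_runs)]

rng = np.random.default_rng(1)
# two well-separated blobs of 20 samples, 4 features each (toy data)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(6, 1, (20, 4))])
labelings = multiple_kmeans(X, k_min=2, k_max=4, n_runs=3)
```

Each element of `labelings` plays the role of one set of cluster labels C in the steps that follow.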
Further, in the method of the invention based on conditional mutual information and K-means, step 2) constructs, for each clustering result obtained in step 1), the feature graph of each feature in turn. To construct the graph of a given feature of the data set, with the values of that feature and the cluster labels known, each sample is taken as a node; if the cluster containing a sample has x samples, the node of that sample is connected to the x-1 sample nodes whose values of this feature are closest to its own. Performing this operation for all samples under the same feature yields the feature graph of that feature.
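A minimal sketch of this graph construction (the function name and the toy values are illustrative, not from the patent):

```python
import numpy as np

def feature_graph(values, labels):
    """Step 2: build the feature graph for one feature.
    Each sample is a node; a sample in a cluster of size x is linked to the
    x-1 samples whose values of this feature are closest to its own.
    Returns a symmetric 0/1 adjacency matrix."""
    n = len(values)
    sizes = {c: int((labels == c).sum()) for c in np.unique(labels)}
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        x = sizes[labels[i]]
        order = np.argsort(np.abs(values - values[i]))   # nearest by feature value
        neighbours = [j for j in order if j != i][: x - 1]
        A[i, neighbours] = 1
        A[neighbours, i] = 1                             # treat edges as undirected
    return A

vals = np.array([0.1, 0.2, 0.9, 1.0, 0.15])   # one feature's values (toy)
labs = np.array([0, 0, 1, 1, 0])              # cluster labels from one K-means run
A = feature_graph(vals, labs)
```

Sample 0 sits in a cluster of 3, so it is linked to the 2 samples nearest in value (samples 4 and 1).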
Further, in the method of the invention based on conditional mutual information and K-means, step 3) computes the modularity metric of each feature from the feature graph constructed in step 2). The formula is:
Q = (1/(2m)) Σ_{i,j} [ A_ij − k_i·k_j/(2m) ] · δ(C_i, C_j)
In the formula, i and j are two nodes of the feature graph constructed in step 2); A_ij is the adjacency matrix of the feature graph, with A_ij = 1 if there is an edge between i and j and 0 otherwise; m is the total number of edges of the feature graph; k_i and k_j are the degrees of nodes i and j, respectively; and the binary function δ(C_i, C_j) is 1 if nodes i and j belong to the same cluster and 0 otherwise. After the modularity metric of each feature has been computed from its feature graph, all the modularity metrics are normalized to obtain Q', and the feature with the largest Q' is put into the feature subset.
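The modularity metric can be sketched directly in code; the 4-node graph below is an assumed toy example with two clean communities, not data from the patent:

```python
import numpy as np

def modularity(A, labels):
    """Step 3: Newman modularity of a feature graph under a clustering:
    Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(C_i, C_j)."""
    k = A.sum(axis=1)        # node degrees
    two_m = A.sum()          # 2m: each undirected edge counted twice
    if two_m == 0:
        return 0.0
    same = labels[:, None] == labels[None, :]   # delta(C_i, C_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# two disjoint edges, each inside its own cluster -> strongly modular
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
labels = np.array([0, 0, 1, 1])
Q = modularity(A, labels)   # -> 0.5 for this graph
```

When the partition cuts both edges instead (labels `[0, 1, 0, 1]`), Q turns negative, which is what makes the metric a usable relevance score.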
Further, in the method of the invention based on conditional mutual information and K-means, step 4) computes, from the initial feature subset obtained in step 3), the conditional mutual information of each remaining feature relative to each feature in the subset, and from it the relevance-independence metric of each remaining feature relative to the subset. The formula is:
I_ri(f_r) = (1/|S|) Σ_{f_j ∈ S} RI(f_r, f_j)
In the formula, f_r is a remaining feature not yet selected into the subset, f_j is a feature of the subset, and S is the feature subset. RI(f_r, f_j) denotes the relevance independence of the remaining feature f_r relative to the single subset feature f_j, computed as:
RI(f_r, f_j) = [ I(f_r; C | f_j) + I(f_j; C | f_r) ] / (2·H(C))
where H(C) is the entropy of the target variable C, and I(f_r; C | f_j) and I(f_j; C | f_r) are the conditional mutual informations of the features f_r and f_j with C, computed as:
I(X; C | Z) = Σ_{x,c,z} p(x, c, z) · log[ p(x, c, z)·p(z) / (p(x, z)·p(c, z)) ]
with the probabilities estimated from counts over the N samples of the data set and the C classes. After the relevance-independence metric of every remaining feature relative to the subset has been computed, all the metrics are normalized to obtain I'_ri.
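A plug-in estimate of the conditional mutual information, and one plausible form of the RI index built from it, can be sketched as follows; the normalisation by 2·H(C) is an assumption of this sketch, as is the tiny discrete data set:

```python
import math
from collections import Counter

def cond_mutual_info(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) for discrete sequences:
    sum over (x,y,z) of p(x,y,z) * log( p(x,y,z)*p(z) / (p(x,z)*p(y,z)) )."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum((c / n) * math.log(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def relevance_independence(fr, fj, C):
    """Assumed RI(fr, fj): the two conditional mutual informations with the
    cluster label C, normalised by 2*H(C)."""
    return (cond_mutual_info(fr, C, fj) + cond_mutual_info(fj, C, fr)) / (2 * entropy(C))

fr = [0, 0, 1, 1, 0, 1]   # candidate feature (identical to the labels here)
fj = [0, 1, 0, 1, 0, 1]   # feature already in the subset
C  = [0, 0, 1, 1, 0, 1]   # cluster labels from one K-means run
ri = relevance_independence(fr, fj, C)
```

Because fr determines C exactly, I(fj; C | fr) is zero, and the whole index reduces to the normalised I(fr; C | fj) term.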
Further, in the method of the invention based on conditional mutual information and K-means, step 5) adds the normalized modularity metric of each remaining feature from step 3) and its normalized relevance-independence metric from step 4) with a chosen weight, i.e.: s = w·Q' + (1-w)·I'_ri, where w is specified manually and takes values in [0,1]; the result is taken as the score of each remaining feature.
Further, in the method of the invention based on conditional mutual information and K-means, step 6) puts the feature with the largest s from step 5) into the feature subset, then iterates steps 4), 5) and 6) until the number of features in the subset reaches the required number, which is specified manually.
Further, in the method of the invention based on conditional mutual information and K-means, step 7) aggregates the feature subsets formed from the different K-means clustering results in step 6) and, according to the required number of features, selects the features that occur most often to form the final feature subset.
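The aggregation of step 7) is a frequency vote over the per-run subsets; a minimal sketch with assumed toy subsets:

```python
from collections import Counter

def aggregate_subsets(subsets, n_features):
    """Step 7: pool the feature subsets from the T clustering runs and keep
    the n_features features that were selected most often."""
    counts = Counter(f for subset in subsets for f in subset)
    return [f for f, _ in counts.most_common(n_features)]

# feature indices selected in three K-means runs (toy values)
runs = [[0, 3, 5], [0, 5, 7], [0, 3, 2]]
final = aggregate_subsets(runs, 2)   # feature 0 appears in every run
```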
Beneficial effects
Aimed at the problem of feature selection on unlabeled data sets in machine learning, the present invention combines the K-means algorithm with the conditional mutual information between features, which helps select the most important features of an unlabeled data set. By clustering the unlabeled data with multiple K-means runs under different initial conditions, the method eliminates the randomness of feature selection based on a single clustering, reduces the impact of data imbalance on feature selection, and remedies the defects of conventional feature selection methods, which either perform poorly on imbalanced data sets or apply only to labeled data sets. Meanwhile, to obtain a feature subset with high relevance and low redundancy, the method considers, on the basis of each clustering, both the modularity metric of each feature and the conditional mutual information between different features, and uses a relevance-independence index between features to select feature combinations with high relevance and low redundancy; the subsets extracted over the multiple runs are then aggregated into the final feature subset. The combination of the K-means algorithm and conditional mutual information lets this feature selection algorithm handle both balanced and imbalanced unlabeled data sets, raise the relevance of the feature subset, and lower its redundancy, thereby selecting the most important feature set.
Description of the drawings
Fig. 1 is the flow chart of the unsupervised feature selection method based on conditional mutual information and K-means.
Fig. 2 is an example of constructing feature graphs for a data set.
Specific embodiment
The implementation of the technical scheme is described in further detail below with reference to the accompanying drawings. The unsupervised feature selection method based on conditional mutual information and K-means of the present invention is explained with the flow chart and an implementation case. The case performs feature selection on an unlabeled data set using conditional mutual information and the K-means algorithm. As shown in Fig. 1, the method comprises the following steps:
Step 10: run K-means multiple times on the unlabeled data set with different K values and different cluster centers, and record the result of each clustering;
Step 101: the maximum number of clusters MAX and the minimum number of clusters MIN of the K-means algorithm are given in advance in the input phase; before each clustering, a number k is chosen at random in the range [MIN, MAX] as the number of clusters, and k points of the data set are chosen at random as the initial centroids;
Step 102: the total number T of K-means runs is given in advance in the input phase; each execution of the K-means algorithm yields one set of cluster labels C; the K-means clustering is repeated until the preset total number of runs is reached, finally giving T different clustering results;
Step 20: for each clustering result obtained in the previous step, construct the feature graph of each feature in turn;
Step 201: the graph of a given feature is constructed with the values of that feature and the cluster labels of the samples known; each sample is first taken as a node. In the data set with two features shown in Fig. 2, each round dot and each square dot on the right represents one sample, and the number beside a dot indicates the magnitude of the feature value of that sample;
Step 202: if the cluster containing a sample has x samples in total, the node of that sample is connected to the x-1 sample nodes whose feature values are closest to its own. As shown in Fig. 2, sample 1 belongs to cluster C1, which contains 4 samples in total, so the node of sample 1 is connected to the 3 samples whose feature values are closest to its own, namely samples 2, 7 and 6;
Step 203: performing the operation of step 202 for all samples under the same feature yields the feature graph of that feature;
Step 204: performing steps 201-203 for all features of the data set yields the feature graphs of all features. As shown in Fig. 2, the data set on the left contains 2 features; one K-means clustering of step 10 yields the cluster labels C1 and C2, and the right side shows the feature graphs corresponding to feature 1 and feature 2, respectively;
Step 30: from the feature graphs constructed in the previous step, compute the modularity metric of each feature, and put the feature with the largest modularity into the feature subset;
Step 301: compute the modularity metric of each feature according to the formula Q = (1/(2m)) Σ_{i,j} [ A_ij − k_i·k_j/(2m) ] · δ(C_i, C_j);
Step 302: normalize the modularity metrics of all features to obtain Q';
Step 303: put the feature with the largest Q' into the feature subset and delete it from the remaining features;
Step 40: from the feature subset obtained in the previous step, compute the relevance-independence metric of each remaining feature relative to the subset;
Step 401: compute I(f_r; C | f_j) and I(f_j; C | f_r) according to the conditional mutual information formula, i.e. the conditional mutual information between a remaining feature and a selected feature;
Step 402: compute the relevance independence RI(f_r, f_j) of each remaining feature relative to a given feature of the subset;
Step 403: compute the relevance-independence metric of each remaining feature relative to the whole feature subset;
Step 404: normalize the relevance-independence metrics of all remaining features to obtain I'_ri;
Step 50: add the modularity metric Q' of each remaining feature obtained in step 30 and its relevance-independence metric I'_ri obtained in step 40 with a chosen weight, and take the result as the score of each remaining feature;
Step 501: the weight w balancing the modularity metric against the relevance-independence metric is preset in the input phase, with values in [0,1] and a default of 0.3;
Step 502: compute the score of each remaining feature according to the formula s = w·Q' + (1-w)·I'_ri;
Step 60: put the feature with the highest score of the previous step into the feature subset and delete it from the remaining features; repeat steps 40, 50 and 60 until the number of features in the subset reaches the required number, which is preset in the input phase;
Step 70: aggregate the feature subsets formed from the different K-means clustering results obtained in the previous step, select the features that occur most often according to the required number of features, and output the final feature subset.
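The greedy loop of steps 30-60 can be sketched in one function; the scoring callables and the toy numbers below are assumed stand-ins for the modularity and relevance-independence computations:

```python
def greedy_select(features, n_select, mod_score, ri_score, w=0.3):
    """Steps 30-60 as one greedy loop: seed the subset with the highest-
    modularity feature, then repeatedly add the remaining feature with the
    best combined score s = w*Q' + (1-w)*I'ri."""
    subset = [max(features, key=mod_score)]
    remaining = [f for f in features if f not in subset]
    while len(subset) < n_select and remaining:
        best = max(remaining,
                   key=lambda f: w * mod_score(f) + (1 - w) * ri_score(f, subset))
        subset.append(best)
        remaining.remove(best)
    return subset

# Toy scores (assumed, for illustration): feature 2 has the top modularity;
# feature 0 is most independent of whatever is already selected.
mod = {0: 0.2, 1: 0.5, 2: 0.9, 3: 0.1}.get
ri = lambda f, subset: {0: 0.9, 1: 0.1, 2: 0.0, 3: 0.3}[f]
chosen = greedy_select([0, 1, 2, 3], 3, mod, ri)   # -> [2, 0, 3]
```

In the full method this loop runs once per K-means clustering, and step 70 then votes over the resulting subsets.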
Claims (8)
1. An unsupervised feature selection method based on conditional mutual information and K-means, characterised by comprising the following steps:
Step 1) run K-means multiple times on the unlabeled data set with different K values and different cluster centers, and record the result of each clustering;
Step 2) for each clustering result obtained in step 1), construct the feature graph of each feature in turn;
Step 3) from the feature graphs constructed in step 2), compute the modularity metric of each feature, and put the feature with the largest modularity into the feature subset;
Step 4) given the initial feature subset from step 3), compute the conditional mutual information of each remaining feature relative to every feature in the subset, and from it the relevance-independence metric of each remaining feature relative to the subset;
Step 5) add the modularity metric of each remaining feature from step 3) and the relevance-independence metric from step 4) with a chosen weight, and take the result as the score of each remaining feature;
Step 6) put the feature with the highest score from step 5) into the feature subset, then iterate steps 4), 5) and 6) until the number of features in the subset reaches the required number;
Step 7) aggregate the feature subsets formed from the different K-means clustering results in step 6) to obtain the final feature subset.
2. the method for claim 1, it is characterised in that step 1) to without label data collection carry out multiple different K values and
The K-means clusters of different cluster centres, and obtain each cluster result;During initialization, K-means clusters are artificially specified
The maximum cluster number of algorithm and min cluster number, and cluster number of times;When being clustered each time, K-means algorithms exist
A number is randomly choosed as the number k of cluster between maximum cluster number and min cluster number, and is selected in data set at random
K point is selected as initial barycenter, by K-means clustering algorithms, the result for being clustered each time successively, i.e. class label C.
3. the method for claim 1, it is characterised in that further, step 2) according to step 1) difference that obtains gathers
Class result, constructs the feature vector chart of each feature successively for each cluster result;The spy that a certain feature is concentrated to data
The construction of vectogram is levied, is in the case of known to this feature lower eigenvalue and class label, using each sample as a point, vacation
If the class that certain sample is located contains x sample, then by the point corresponding to the sample with and the immediate x-1 of its characteristic value individual
Sample point is connected, and executes above operation to all samples that data are concentrated, you can construct this feature under same feature
Feature vector chart.
4. the method for claim 1, it is characterised in that step 3) according to step 2) feature vector chart that constructs, meter
The modularization metric of each feature is calculated, computing formula is:
In formula, i, j are steps 2) two points in the feature vector chart that constructs;AijIt is the adjacency matrix of feature vector chart,
If there is side, A from i to jij=1, it is otherwise 0;M is the sum for always connecting side in number, i.e. feature vector chart;kiAnd kjPoint
It is not the number of degrees of node i and j;Binary function δ (Ci,Cj) represent that if node i and j belong to same cluster, for 1, it is otherwise 0;
After feature vector chart according to each feature calculates respective modularization metric, all of modularization metric is carried out
Normalization, obtains Q ', the feature corresponding to Q ' maximums is put in character subset.
5. the method for claim 1, it is characterised in that step 4) according to step 3) the initial characteristicses subset that obtains, meter
Conditional mutual information of each residue character relative to each feature in character subset is calculated, relative so as to calculate each residue character
In the correlation independence metric of character subset, computing formula is:
In formula, frIt is the residue character for not being selected into character subset, fjIt is the feature in character subset, S is character subset;Its
Middle RI (fr,fj) represent residue character frRelative to one of feature in character subset fjCorrelation independence, computing formula is:
In formula, H (C) is the entropy of target variable C, I (fr;C|fj) and I (fj;C|fi) it is feature frWith feature fjCondition mutual trust
Cease, computing formula is:
In formula, N is the number of sample in data set, and C is the quantity of class.Each residue character is calculated relative to character subset
Correlation independence metric after, all of correlation independence metric is normalized, I is obtainedri'.
6. the method for claim 1, it is characterised in that step 5) by step 3) specification of each residue character that obtains
Change modularization metric and step 4) the standardization correlation independence metric that obtains each residue character is added with certain weight,
I.e.:S=wQ'+ (1-w) Iri', the w people in formula is specified, and span is [0,1], and result of calculation is remaining special as each
The score that levies.
7. the method for claim 1, it is characterised in that step 6) by step 5) spy corresponding to the s maximums that obtain
Levy and be put in character subset, be then made iteratively step 4), step 5), step 6), the Characteristic Number in character subset
Number required for reaching, Characteristic Number are artificially specified.
8. the method for claim 1, it is characterised in that step 7) by step 6) obtain poly- according to different K-means
The character subset that class result is formed is collected, and selects the most several features of occurrence number according to required Characteristic Number,
Constitute final character subset.
Priority Applications (1)
- CN201610888945.0A (CN), priority date 2016-10-11, filing date 2016-10-11: An unsupervised feature selection method based on conditional mutual information and K-means
Publications (1)
- CN106503731A, published 2017-03-15
Family
- ID=58293652
Country Status (1)
- CN: CN106503731A, pending
Cited By (13)
- CN107239798A (published 2017-10-10, Wuhan University): a feature selection method oriented to software defect number prediction
- CN108363784A (2018-08-03, Northwestern Polytechnical University): a public opinion trend estimation method based on text machine learning
- CN109068180A (2018-12-21, Wuhan Douyu Network Technology): a method and related device for determining a video highlight collection
- CN109255368A (2019-01-22, Ping An Technology (Shenzhen)): method, apparatus, electronic device and storage medium for randomly selecting features
- CN109493929A (2019-03-19, Beijing University of Technology): a low-redundancy feature selection method based on grouped variables
- EP3456673A1 (2019-03-20, Otis Elevator Company): predictive elevator condition monitoring using qualitative and quantitative information
- CN109506761A (2019-03-22, State Grid Sichuan, Leshan Power Supply Company): a method for extracting transformer surface vibration features
- CN109816034A (2019-05-28, Tsinghua University): signal feature combination selection method, device, computer equipment and storage medium
- CN110298398A (2019-10-01, Dalian University): wireless protocol frame feature selection method based on improved mutual information
- CN110426612A (2019-11-08, Fuzhou University): a two-stage method for selecting time-domain dielectric response features of transformer oil-paper insulation
- CN110942149A (2020-03-31, Hohai University): feature variable selection method based on information change rate and conditional mutual information
- CN117076962A (2023-11-17, Tencent Technology (Shenzhen)): data analysis method, device and equipment applied to the artificial intelligence field
- CN117454314A (2024-01-26, Shenzhen Aerospace Science Innovation Ubiquitous Electric): method, device, equipment and storage medium for predicting the operating state of wind turbine components
Events

- 2016-10-11: CN application CN201610888945.0A filed (publication CN106503731A/en); status: active, Pending
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239798B (en) * | 2017-05-24 | 2020-06-09 | 武汉大学 | Feature selection method for predicting number of software defects |
CN107239798A (en) * | 2017-05-24 | 2017-10-10 | 武汉大学 | A feature selection method for software defect count prediction |
EP3456673A1 (en) * | 2017-08-07 | 2019-03-20 | Otis Elevator Company | Predictive elevator condition monitoring using qualitative and quantitative informations |
US10737904B2 (en) | 2017-08-07 | 2020-08-11 | Otis Elevator Company | Elevator condition monitoring using heterogeneous sources |
CN108363784A (en) * | 2018-01-20 | 2018-08-03 | 西北工业大学 | A public opinion trend estimation method based on text machine learning |
CN109506761A (en) * | 2018-06-12 | 2019-03-22 | 国网四川省电力公司乐山供电公司 | A transformer surface vibration feature extraction method |
CN109506761B (en) * | 2018-06-12 | 2021-08-27 | 国网四川省电力公司乐山供电公司 | Transformer surface vibration feature extraction method |
CN109255368A (en) * | 2018-08-07 | 2019-01-22 | 平安科技(深圳)有限公司 | Method, apparatus, electronic device and storage medium for randomly selecting features |
CN109255368B (en) * | 2018-08-07 | 2023-12-22 | 平安科技(深圳)有限公司 | Method, device, electronic equipment and storage medium for randomly selecting characteristics |
CN109493929B (en) * | 2018-09-20 | 2022-03-15 | 北京工业大学 | Low redundancy feature selection method based on grouping variables |
CN109493929A (en) * | 2018-09-20 | 2019-03-19 | 北京工业大学 | Low-redundancy feature selection method based on grouping variables |
CN109068180B (en) * | 2018-09-28 | 2021-02-02 | 武汉斗鱼网络科技有限公司 | Method for determining video fine selection set and related equipment |
CN109068180A (en) * | 2018-09-28 | 2018-12-21 | 武汉斗鱼网络科技有限公司 | A method and related device for determining a video selection set |
CN109816034B (en) * | 2019-01-31 | 2021-08-27 | 清华大学 | Signal characteristic combination selection method and device, computer equipment and storage medium |
CN109816034A (en) * | 2019-01-31 | 2019-05-28 | 清华大学 | Signal feature combination selection method, device, computer equipment and storage medium |
CN110298398B (en) * | 2019-06-25 | 2021-08-03 | 大连大学 | Wireless protocol frame characteristic selection method based on improved mutual information |
CN110298398A (en) * | 2019-06-25 | 2019-10-01 | 大连大学 | Wireless protocol frame feature selection method based on improved mutual information |
CN110426612A (en) * | 2019-08-17 | 2019-11-08 | 福州大学 | A two-stage method for selecting time-domain dielectric response characteristic quantities of transformer oil-paper insulation |
CN110942149B (en) * | 2019-10-31 | 2020-09-22 | 河海大学 | Feature variable selection method based on information change rate and condition mutual information |
CN110942149A (en) * | 2019-10-31 | 2020-03-31 | 河海大学 | Feature variable selection method based on information change rate and conditional mutual information |
CN117076962A (en) * | 2023-10-13 | 2023-11-17 | 腾讯科技(深圳)有限公司 | Data analysis method, device and equipment applied to artificial intelligence field |
CN117076962B (en) * | 2023-10-13 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Data analysis method, device and equipment applied to artificial intelligence field |
CN117454314A (en) * | 2023-12-19 | 2024-01-26 | 深圳航天科创泛在电气有限公司 | Wind turbine component running state prediction method, device, equipment and storage medium |
CN117454314B (en) * | 2023-12-19 | 2024-03-05 | 深圳航天科创泛在电气有限公司 | Wind turbine component running state prediction method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106503731A (en) | An unsupervised feature selection method based on conditional mutual information and K-means | |
US10713597B2 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
CN106021364B (en) | Establishment of an image search relevance prediction model, image search method, and device | |
CN103679132B (en) | A nude image detection method and system | |
CN110674407A (en) | Hybrid recommendation method based on graph convolution neural network | |
CN109840282A (en) | A knowledge graph optimization method based on fuzzy theory | |
CN107203785A (en) | Multi-path Gaussian kernel fuzzy c-means clustering algorithm | |
CN103886330A (en) | Classification method based on semi-supervised SVM ensemble learning | |
CN109543723A (en) | A robust image clustering method | |
CN108596264A (en) | A community discovery method based on deep learning | |
Du et al. | Improving the performance of feature selection and data clustering with novel global search and elite-guided artificial bee colony algorithm | |
CN112949954B (en) | Method for establishing financial fraud recognition model based on recognition learning | |
Ismaili et al. | A supervised methodology to measure the variables contribution to a clustering | |
CN107704872A (en) | A K-means initial cluster center selection method based on segmenting the relatively most discrete dimension | |
Zang et al. | Improved spectral clustering based on density combining DNA genetic algorithm | |
Ganji et al. | Lagrangian constrained community detection | |
CN117235331A (en) | Fair federation learning method for cross-domain social network node classification tasks | |
Bandyopadhyay et al. | Integrating network embedding and community outlier detection via multiclass graph description | |
CN108446740B (en) | A multilayer consistency collaborative method for feature extraction from brain image medical records | |
Ahmed et al. | Improving prediction of plant disease using k-efficient clustering and classification algorithms | |
Bhardwaj et al. | Forecasting GDP per capita of OECD countries using machine learning and deep learning models | |
Sangita et al. | An improved k-means clustering approach for teaching evaluation | |
Dantas et al. | Adaptive batch SOM for multiple dissimilarity data tables | |
Mousavi | A New Clustering Method Using Evolutionary Algorithms for Determining Initial States, and Diverse Pairwise Distances for Clustering | |
Lee et al. | A new artificial bee colony based clustering method and its application to the business failure prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170315 |