CN106503731A - Unsupervised feature selection method based on conditional mutual information and K-means - Google Patents


Info

Publication number
CN106503731A
Authority
CN
China
Prior art keywords
feature
cluster
feature subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610888945.0A
Other languages
Chinese (zh)
Inventor
马廷淮
邵文晔
曹杰
薛羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201610888945.0A
Publication of CN106503731A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an unsupervised feature selection method based on conditional mutual information and K-means. Data without class labels are first clustered by multiple runs of the K-means algorithm under different initial conditions; then, on the basis of each clustering, the modularity value of each feature and the conditional mutual information between different features are jointly considered, and a relevance-independence index between features is used to select feature subsets with high relevance and low redundancy. The feature subsets obtained from the different K-means clustering results are aggregated to give the final feature subset. The invention can be effectively applied to unlabeled and imbalanced data sets, and the feature subset it obtains has high relevance and low redundancy.

Description

Unsupervised feature selection method based on conditional mutual information and K-means
Technical field
The invention belongs to the field of feature selection in machine learning, and in particular relates to a method for performing unsupervised feature selection on unlabeled data sets using conditional mutual information and the K-means algorithm.
Background technology
In practical machine learning applications, the number of features is often large; some features may be irrelevant, and features may also depend on one another. The more features there are, the longer it takes to analyze them and train a model, and the more easily the "curse of dimensionality" arises, making the model more complex and degrading its generalization ability. Feature selection is therefore particularly important.
Feature selection, also called feature subset selection or attribute selection, refers to selecting a subset of all features so that the constructed model performs better. Feature selection can remove irrelevant or redundant features, thereby reducing the number of features, improving model accuracy, and shortening running time. On the other hand, selecting only truly relevant features simplifies the model and helps researchers understand the process by which the data were generated.
Depending on whether the search for an optimal feature subset is combined with building a learning model, feature selection methods can be roughly divided into two classes: wrapper feature selection (Wrapper) and filter feature selection (Filter). Wrapper feature selection repeatedly runs a learning algorithm to evaluate the quality of candidate feature sets; it is more accurate than filter feature selection, but its generalization to other classifiers is poor. On high-dimensional data sets, because wrapper feature selection must be tightly coupled with a specific learning algorithm, its computational complexity is very high. Filter feature selection requires no specific learning algorithm; it quickly evaluates feature quality with a suitable criterion and is therefore a computationally more efficient method.
Most existing traditional feature selection methods take improving classification accuracy as the optimization objective, do not fully consider the distribution of the data samples, and generally pursue learning performance on majority classes while easily ignoring that on minority classes. To address data imbalance at the data level, the majority-class samples of the training set can be subsampled before training so that the positive and negative classes are balanced, and learning is then carried out accordingly (Exploratory under-sampling for class-imbalance learning. Liu X Y, Wu J, Zhou Z H); however, this cannot make use of all the data and can reduce classification accuracy. At the algorithm level, traditional feature selection algorithms can be modified according to the imbalance of the class distribution so that they adapt to imbalanced samples (a new feature selection algorithm for the class-imbalance problem: IM-IG. You Mingyu, Chen Yan, Li Guozheng); however, this approach is limited to two-class imbalance and does not apply to multi-class imbalance.
For filter feature selection, many supervised feature selection methods have been proposed, such as evaluating candidate features with mutual information and selecting the top-ranked features as the input of a neural network classifier (Using mutual information for selecting features in supervised neural net learning. R. Battiti); however, this method ignores the redundancy between features, so many redundant features are selected, which does not help improve the performance of the subsequent classifier. Moreover, this method only applies to data with class label information and does not apply to unsupervised feature selection.
In the field of unsupervised feature selection, many unsupervised feature selection methods for text have been proposed, but these methods cannot be directly applied to numeric data. Some methods do apply to numeric data, such as an unsupervised filter feature selection algorithm oriented to categorical features, which is based on a one-pass clustering algorithm, takes the importance exhibited by each feature across different clusters as the basis for judgment, and finally selects a feature subset according to the variation pattern of that importance (Research on unsupervised feature selection methods oriented to categorical features. Wang Lianxi, Jiang Shengyi); however, this method partitions the data with only a single one-pass clustering, so the clustering result is random and the accuracy of feature selection cannot be guaranteed.
The present invention first clusters the data without class labels by multiple K-means runs under different initial conditions; then, on the basis of each clustering, it jointly considers the modularity value of each feature and the conditional mutual information between different features to obtain feature subsets with high relevance and low redundancy, and finally aggregates the feature subsets obtained from the different K-means clustering results.
Content of the invention
Purpose: The technical problem to be solved by the invention is feature selection on unlabeled data sets, for which an unsupervised feature selection method based on conditional mutual information and K-means is proposed. Data without class labels are clustered by multiple K-means runs with different initial conditions, which removes the randomness of selecting features from a single clustering result and reduces the impact of data imbalance on feature selection. On the basis of each clustering, the modularity value of each feature and the conditional mutual information between different features are jointly considered, and a relevance-independence index between features is used to select feature combinations with high relevance and low redundancy. The feature subsets obtained from the different K-means clustering results are aggregated into the final feature subset. The invention can be effectively applied to unlabeled and imbalanced data sets, and the feature subset it obtains has high relevance and low redundancy.
The technical scheme of the invention is as follows:
An unsupervised feature selection method based on conditional mutual information and K-means, comprising the following steps:
Step 1), perform multiple K-means clusterings with different K values and different cluster centers on the unlabeled data set, and obtain each clustering result;
Step 2), according to the different clustering results obtained in step 1), construct the feature vector graph of each feature for each clustering result in turn;
Step 3), according to the feature vector graphs constructed in step 2), calculate the modularity value of each feature, and put the feature with the largest modularity value into the feature subset;
Step 4), according to the initial feature subset obtained in step 3), calculate the conditional mutual information of each remaining feature relative to each feature in the feature subset, and from it calculate the relevance-independence value of each remaining feature relative to the feature subset;
Step 5), add the modularity value of each remaining feature obtained in step 3) and the relevance-independence value obtained in step 4) with a certain weight, and take the result as the score of each remaining feature;
Step 6), put the feature with the highest score obtained in step 5) into the feature subset, then iterate steps 4), 5), and 6) until the number of features in the feature subset reaches the required number;
Step 7), aggregate the feature subsets formed in step 6) from the different K-means clustering results to obtain the final feature subset.
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 1) performs multiple K-means clusterings with different K values and different cluster centers on the unlabeled data set and obtains each clustering result. The invention first applies the K-means clustering algorithm to the unlabeled data set multiple times with different initial values. At initialization, the maximum cluster number and minimum cluster number of the K-means clustering algorithm, as well as the number of clustering runs, are specified manually. For each clustering run, the K-means algorithm randomly chooses a number between the minimum and maximum cluster numbers as the number of clusters k and randomly selects k points in the data set as initial centroids; the K-means clustering algorithm then yields, run by run, the result of each clustering, i.e., the class labels C.
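As an illustration of this step, the following Python sketch runs K-means several times with a randomly drawn k and random initial centroids. It assumes NumPy and scikit-learn are available; the function name multiple_kmeans and its parameters (min_k, max_k, n_runs) are illustrative choices, not names fixed by the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def multiple_kmeans(X, min_k, max_k, n_runs, seed=0):
    """Cluster X n_runs times, each time with a random k in [min_k, max_k]
    and k randomly chosen data points as initial centroids; return the
    class-label vector C of every run."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_runs):
        k = int(rng.integers(min_k, max_k + 1))
        # init='random' draws k observations from X as the initial
        # centroids, matching the random initialisation described above
        km = KMeans(n_clusters=k, init='random', n_init=1,
                    random_state=int(rng.integers(1 << 31)))
        labelings.append(km.fit_predict(X))
    return labelings
```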
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 2) constructs the feature vector graph of each feature for each clustering result in turn, according to the different clustering results obtained in step 1). To construct the feature vector graph of a given feature in the data set, with the feature values under this feature and the class labels known, each sample is treated as a point; if the class in which a sample lies contains x samples, the point corresponding to that sample is connected to the x-1 sample points whose feature values are closest to its own. Performing this operation on all samples in the data set under the same feature yields the feature vector graph of that feature.
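The construction can be sketched as follows, under the assumption that "closest" means smallest absolute difference in feature value (ties broken by sort order); feature_vector_graph is a hypothetical helper name, not one from the patent.

```python
import numpy as np

def feature_vector_graph(f, labels):
    """Adjacency matrix of the feature vector graph of one feature column
    f (a NumPy array) under the class labels of a single clustering run:
    each sample in a cluster of size x is linked to the x-1 samples whose
    values of f are closest to its own."""
    n = len(f)
    A = np.zeros((n, n), dtype=int)
    values, counts = np.unique(labels, return_counts=True)
    cluster_size = dict(zip(values, counts))
    for i in range(n):
        x = cluster_size[labels[i]]
        order = np.argsort(np.abs(f - f[i]))               # closest values first
        neighbours = [j for j in order if j != i][:x - 1]
        A[i, neighbours] = 1                               # undirected edges
        A[neighbours, i] = 1
    return A
```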
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 3) calculates the modularity value of each feature according to the feature vector graphs constructed in step 2). The computing formula is:

$$Q = \sum_{ij}\left[\frac{A_{ij}}{2M} - \frac{k_i k_j}{(2M)(2M)}\right]\delta(C_i, C_j)$$

In the formula, i and j are two points in the feature vector graph constructed in step 2); $A_{ij}$ is the adjacency matrix of the feature vector graph, with $A_{ij}=1$ if there is an edge from i to j and 0 otherwise; M is the total number of edges in the feature vector graph; $k_i$ and $k_j$ are the degrees of nodes i and j, respectively; the binary function $\delta(C_i,C_j)$ is 1 if nodes i and j belong to the same cluster and 0 otherwise. After the modularity value of each feature is calculated from its feature vector graph, all modularity values are normalized to obtain Q', and the feature corresponding to the largest Q' is put into the feature subset.
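Given such an adjacency matrix, Q can be computed directly, as in this minimal sketch (NumPy arrays assumed; it pairs with the hypothetical feature_vector_graph above):

```python
import numpy as np

def modularity(A, labels):
    """Q of the formula above: the extent to which edges of the feature
    vector graph A fall within the clusters given by 'labels'."""
    M = A.sum() / 2.0                          # total number of edges
    k = A.sum(axis=1)                          # node degrees
    same = labels[:, None] == labels[None, :]  # delta(C_i, C_j)
    Q = A / (2 * M) - np.outer(k, k) / (2 * M) ** 2
    return float((Q * same).sum())
```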
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 4), according to the initial feature subset obtained in step 3), calculates the conditional mutual information of each remaining feature relative to each feature in the feature subset, and from it the relevance-independence value of each remaining feature relative to the feature subset. The computing formula is:

$$I_{ri}(f_r; C \mid S) = \sum_{f_j \in S} RI(f_r, f_j)$$

In the formula, $f_r$ is a remaining feature not yet selected into the feature subset, $f_j$ is a feature in the feature subset, and S is the feature subset. $RI(f_r, f_j)$ denotes the relevance independence of the remaining feature $f_r$ relative to one feature $f_j$ in the feature subset, with the computing formula:

$$RI(f_r, f_j) = \frac{I(f_r; C \mid f_j) + I(f_j; C \mid f_r)}{2H(C)}$$

In the formula, H(C) is the entropy of the target variable C, and $I(f_r; C \mid f_j)$ and $I(f_j; C \mid f_r)$ are the conditional mutual informations between features $f_r$ and $f_j$, computed as:

$$I(X_i; Y \mid X_j) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{C} p(x_i, x_j, y_k)\log\frac{p(x_i, y_k \mid x_j)}{p(x_i \mid x_j)\,p(y_k \mid x_j)}$$

In the formula, N is the number of samples in the data set and C is the number of classes. After the relevance-independence value of each remaining feature relative to the feature subset is calculated, all relevance-independence values are normalized to obtain $I_{ri}'$.
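For discrete (e.g. binned) feature columns, both quantities can be estimated from empirical counts, as in the sketch below; the patent does not fix an estimator for the probabilities, so the discretisation is an assumption.

```python
import numpy as np

def conditional_mutual_information(fr, fj, C, eps=1e-12):
    """Empirical I(fr; C | fj); fr and fj are discretised feature
    columns, C the class labels of one clustering run."""
    cmi = 0.0
    for b in np.unique(fj):
        mask_j = fj == b
        p_j = mask_j.mean()
        for a in np.unique(fr):
            for c in np.unique(C):
                p_abc = np.mean(mask_j & (fr == a) & (C == c))
                if p_abc > eps:
                    p_ab = np.mean(mask_j & (fr == a))
                    p_cb = np.mean(mask_j & (C == c))
                    # p(x,y|z)/(p(x|z)p(y|z)) = p(x,y,z)p(z)/(p(x,z)p(y,z))
                    cmi += p_abc * np.log(p_abc * p_j / (p_ab * p_cb))
    return cmi

def relevance_independence(fr, fj, C):
    """RI(fr, fj): symmetrised conditional mutual information normalised
    by the entropy H(C); C is assumed to hold non-negative integers."""
    p_c = np.bincount(C) / len(C)
    H_C = -np.sum(p_c[p_c > 0] * np.log(p_c[p_c > 0]))
    return (conditional_mutual_information(fr, fj, C) +
            conditional_mutual_information(fj, fr, C)) / (2 * H_C)
```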
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 5) adds the normalized modularity value of each remaining feature obtained in step 3) and the normalized relevance-independence value of each remaining feature obtained in step 4) with a certain weight, i.e., $s = wQ' + (1-w)I_{ri}'$, where w is specified manually with a value range of [0,1]; the result is taken as the score of each remaining feature.
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 6) puts the feature corresponding to the largest s obtained in step 5) into the feature subset, then iterates steps 4), 5), and 6) until the number of features in the feature subset reaches the required number; the required number of features is specified manually.
Further, in the unsupervised feature selection method of the invention based on conditional mutual information and K-means, step 7) aggregates the feature subsets formed in step 6) from the different K-means clustering results and, according to the required number of features, selects the features that occur most often to constitute the final feature subset.
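This aggregation amounts to frequency counting over the per-run subsets, for example:

```python
from collections import Counter

def aggregate_subsets(subsets, n_features):
    """Merge the feature subsets of all clustering runs into the final
    subset by keeping the n_features most frequently selected features."""
    counts = Counter(f for subset in subsets for f in subset)
    return [f for f, _ in counts.most_common(n_features)]
```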
Beneficial effects
Aimed at the problem of feature selection on unlabeled data sets in machine learning, the invention combines the K-means algorithm with the conditional mutual information between features, which helps to select the most important features in unlabeled data sets. By clustering the unlabeled data set with multiple K-means runs under different initial conditions, the method eliminates the randomness of selecting features from a single clustering result, reduces the impact of data imbalance on feature selection, and remedies the defect that conventional feature selection methods either perform poorly on imbalanced data sets or apply only to labeled data sets. Meanwhile, to obtain a feature subset with high relevance and low redundancy, the method jointly considers, on the basis of each clustering, the modularity value of each feature and the conditional mutual information between different features, uses a relevance-independence index between features to select feature combinations with high relevance and low redundancy, and obtains the final feature subset by aggregating the repeatedly extracted feature subsets. The combination of the K-means algorithm and conditional mutual information makes this feature selection algorithm applicable to both balanced and imbalanced unlabeled data sets, improves the relevance of the feature subset, and reduces its redundancy, thereby selecting the most important feature set.
Description of the drawings
Fig. 1 is the flow chart of the unsupervised feature selection method based on conditional mutual information and K-means.
Fig. 2 is an example of constructing feature vector graphs for a data set.
Specific embodiment
The implementation of the technical scheme of the invention is described in further detail below in conjunction with the accompanying drawings:
The unsupervised feature selection method of the invention based on conditional mutual information and K-means is described in further detail in conjunction with the flow chart and an implementation case.
The implementation case performs feature selection on an unlabeled data set using conditional mutual information and the K-means algorithm. As shown in Fig. 1, the method comprises the following steps:
Step 10, perform multiple K-means clusterings with different K values and different cluster centers on the unlabeled data set, and obtain each clustering result;
Step 101, the maximum cluster number MAX and the minimum cluster number MIN of the K-means algorithm are given in advance at the input stage; before each clustering run, a number is randomly chosen in the range [MIN, MAX] as the number of clusters k, and k points are randomly selected in the data set as initial centroids;
Step 102, the total number T of K-means clustering runs is given in advance at the input stage; each execution of the K-means algorithm yields one set of clustering results, i.e., class labels C; the K-means clustering is repeated until the number of runs reaches the preset total, finally yielding T different sets of clustering results;
Step 20, according to the clustering results obtained in the previous step, construct the feature vector graph of each feature for each clustering result in turn;
Step 201, to construct the feature vector graph of a given feature in the data set, with the feature values of the samples under this feature and the class labels known, each sample is first treated as a point; in the data set containing two features shown in Fig. 2, each round dot and square point on the right represents one sample, and the number beside a point indicates the magnitude of the feature value corresponding to that point;
Step 202, if the class in which a sample lies contains x samples in total, the point corresponding to that sample is connected to the x-1 sample points whose feature values are closest to its own; as shown in Fig. 2, sample 1 lies in class C1, which contains 4 samples in total, so the point corresponding to sample 1 is connected to the 3 sample points whose feature values are closest to it, namely samples 2, 7, and 6;
Step 203, performing the operation of step 202 on all samples in the data set under the same feature yields the feature vector graph of that feature;
Step 204, performing the operations of steps 201-203 on all features in the data set yields the feature vector graphs of all features; as shown in Fig. 2, the left side is a data set containing 2 features, the class labels C1 and C2 are obtained after one K-means clustering of step 10, and the right side shows the feature vector graphs corresponding to feature 1 and feature 2, respectively;
Step 30, according to the feature vector graphs constructed in the previous step, calculate the modularity value of each feature, and put the feature with the largest modularity value into the feature subset;
Step 301, calculate the modularity value of each feature according to the formula $Q = \sum_{ij}\left[\frac{A_{ij}}{2M} - \frac{k_i k_j}{(2M)(2M)}\right]\delta(C_i, C_j)$;
Step 302, normalize the modularity values of all features to obtain Q';
Step 303, put the feature corresponding to the largest Q' into the feature subset, and delete it from the remaining features;
Step 40, according to the feature subset obtained in the previous step, calculate the relevance-independence value of each remaining feature relative to the feature subset;
Step 401, calculate the values of $I(f_r; C \mid f_j)$ and $I(f_j; C \mid f_r)$, i.e., the conditional mutual information between each remaining feature and the selected features, according to the conditional mutual information formula $I(X_i; Y \mid X_j) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{C} p(x_i, x_j, y_k)\log\frac{p(x_i, y_k \mid x_j)}{p(x_i \mid x_j)\,p(y_k \mid x_j)}$;
Step 402, calculate the relevance independence of each remaining feature relative to a given feature in the feature subset according to the formula $RI(f_r, f_j) = \frac{I(f_r; C \mid f_j) + I(f_j; C \mid f_r)}{2H(C)}$;
Step 403, calculate the relevance-independence value of each remaining feature relative to the feature subset according to the formula $I_{ri}(f_r; C \mid S) = \sum_{f_j \in S} RI(f_r, f_j)$;
Step 404, normalize the relevance-independence values of all remaining features to obtain $I_{ri}'$;
Step 50, add the modularity value Q' of each remaining feature obtained in step 30 and the relevance-independence value $I_{ri}'$ of each feature obtained in step 40 with a certain weight, and take the result as the score of each remaining feature;
Step 501, the weight w between the modularity value and the relevance-independence value is preset at the input stage, with a value range of [0,1] and a default setting of 0.3;
Step 502, calculate the score of each remaining feature according to the formula $s = wQ' + (1-w)I_{ri}'$;
Step 60, put the feature with the highest score from the previous step into the feature subset and delete it from the remaining features; repeat steps 40, 50, and 60 until the number of features in the feature subset reaches the required number; the required number of features is preset at the input stage;
Step 70, aggregate the feature subsets formed from the different K-means clustering results obtained in the previous step, select the features that occur most often according to the required number of features, and constitute and output the final feature subset.
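Tying the embodiment together, a hypothetical end-to-end driver built from the sketches above (multiple_kmeans, feature_vector_graph, modularity, relevance_independence, aggregate_subsets) might look as follows; the quartile binning for the probability estimates and the small constants guarding the normalisations are illustrative choices, not values fixed by the patent.

```python
import numpy as np

def select_features(X, n_select, min_k=2, max_k=10, n_runs=10, w=0.3):
    """Steps 10-70 end to end: returns the indices of the final subset."""
    d = X.shape[1]
    # discretise each column into quartile bins for the CMI estimates
    Xd = np.column_stack([
        np.digitize(X[:, j], np.unique(np.quantile(X[:, j], [0.25, 0.5, 0.75])))
        for j in range(d)])
    subsets = []
    for labels in multiple_kmeans(X, min_k, max_k, n_runs):    # step 10
        graphs = [feature_vector_graph(X[:, j], labels) for j in range(d)]
        Q = np.array([modularity(g, labels) for g in graphs])  # steps 20-301
        Qn = (Q - Q.min()) / (Q.max() - Q.min() + 1e-12)       # step 302: Q'
        S = [int(np.argmax(Qn))]                               # step 303
        while len(S) < n_select:                               # steps 40-60
            rest = [j for j in range(d) if j not in S]
            Iri = np.array([sum(relevance_independence(Xd[:, r], Xd[:, s], labels)
                                for s in S) for r in rest])
            Irin = (Iri - Iri.min()) / (Iri.max() - Iri.min() + 1e-12)
            scores = w * Qn[rest] + (1 - w) * Irin             # s = wQ' + (1-w)Iri'
            S.append(rest[int(np.argmax(scores))])
        subsets.append(S)
    return aggregate_subsets(subsets, n_select)                # step 70
```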

Claims (8)

1. An unsupervised feature selection method based on conditional mutual information and K-means, characterized in that it comprises the following steps:
Step 1), perform multiple K-means clusterings with different K values and different cluster centers on the unlabeled data set, and obtain each clustering result;
Step 2), according to the different clustering results obtained in step 1), construct the feature vector graph of each feature for each clustering result in turn;
Step 3), according to the feature vector graphs constructed in step 2), calculate the modularity value of each feature, and put the feature with the largest modularity value into the feature subset;
Step 4), according to the initial feature subset obtained in step 3), calculate the conditional mutual information of each remaining feature relative to each feature in the feature subset, and from it calculate the relevance-independence value of each remaining feature relative to the feature subset;
Step 5), add the modularity value of each remaining feature obtained in step 3) and the relevance-independence value obtained in step 4) with a certain weight, and take the result as the score of each remaining feature;
Step 6), put the feature with the highest score obtained in step 5) into the feature subset, then iterate steps 4), 5), and 6) until the number of features in the feature subset reaches the required number;
Step 7), aggregate the feature subsets formed in step 6) from the different K-means clustering results to obtain the final feature subset.
2. The method of claim 1, characterized in that step 1) performs multiple K-means clusterings with different K values and different cluster centers on the unlabeled data set and obtains each clustering result; at initialization, the maximum cluster number and minimum cluster number of the K-means clustering algorithm, as well as the number of clustering runs, are specified manually; for each clustering run, the K-means algorithm randomly chooses a number between the minimum and maximum cluster numbers as the number of clusters k and randomly selects k points in the data set as initial centroids; the K-means clustering algorithm then yields, run by run, the result of each clustering, i.e., the class labels C.
3. The method of claim 1, characterized in that step 2) constructs the feature vector graph of each feature for each clustering result in turn, according to the different clustering results obtained in step 1); to construct the feature vector graph of a given feature in the data set, with the feature values under this feature and the class labels known, each sample is treated as a point; if the class in which a sample lies contains x samples, the point corresponding to that sample is connected to the x-1 sample points whose feature values are closest to its own; performing the above operation on all samples in the data set under the same feature yields the feature vector graph of that feature.
4. The method of claim 1, characterized in that step 3) calculates the modularity value of each feature according to the feature vector graphs constructed in step 2), with the computing formula:

$$Q = \sum_{ij}\left[\frac{A_{ij}}{2M} - \frac{k_i k_j}{(2M)(2M)}\right]\delta(C_i, C_j)$$

In the formula, i and j are two points in the feature vector graph constructed in step 2); $A_{ij}$ is the adjacency matrix of the feature vector graph, with $A_{ij}=1$ if there is an edge from i to j and 0 otherwise; M is the total number of edges in the feature vector graph; $k_i$ and $k_j$ are the degrees of nodes i and j, respectively; the binary function $\delta(C_i,C_j)$ is 1 if nodes i and j belong to the same cluster and 0 otherwise; after the modularity value of each feature is calculated from its feature vector graph, all modularity values are normalized to obtain Q', and the feature corresponding to the largest Q' is put into the feature subset.
5. The method of claim 1, characterized in that step 4), according to the initial feature subset obtained in step 3), calculates the conditional mutual information of each remaining feature relative to each feature in the feature subset, and from it the relevance-independence value of each remaining feature relative to the feature subset, with the computing formula:

$$I_{ri}(f_r; C \mid S) = \sum_{f_j \in S} RI(f_r, f_j)$$

In the formula, $f_r$ is a remaining feature not yet selected into the feature subset, $f_j$ is a feature in the feature subset, and S is the feature subset; $RI(f_r, f_j)$ denotes the relevance independence of the remaining feature $f_r$ relative to one feature $f_j$ in the feature subset, with the computing formula:

$$RI(f_r, f_j) = \frac{I(f_r; C \mid f_j) + I(f_j; C \mid f_r)}{2H(C)}$$

In the formula, H(C) is the entropy of the target variable C, and $I(f_r; C \mid f_j)$ and $I(f_j; C \mid f_r)$ are the conditional mutual informations between features $f_r$ and $f_j$, with the computing formula:

$$I(X_i; Y \mid X_j) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{C} p(x_i, x_j, y_k)\log\frac{p(x_i, y_k \mid x_j)}{p(x_i \mid x_j)\,p(y_k \mid x_j)}$$

In the formula, N is the number of samples in the data set and C is the number of classes; after the relevance-independence value of each remaining feature relative to the feature subset is calculated, all relevance-independence values are normalized to obtain $I_{ri}'$.
6. The method of claim 1, characterized in that step 5) adds the normalized modularity value of each remaining feature obtained in step 3) and the normalized relevance-independence value of each remaining feature obtained in step 4) with a certain weight, i.e., $s = wQ' + (1-w)I_{ri}'$, where w is specified manually with a value range of [0,1], and the result is taken as the score of each remaining feature.
7. The method of claim 1, characterized in that step 6) puts the feature corresponding to the largest s obtained in step 5) into the feature subset, then iterates steps 4), 5), and 6) until the number of features in the feature subset reaches the required number, the required number of features being specified manually.
8. The method of claim 1, characterized in that step 7) aggregates the feature subsets formed in step 6) from the different K-means clustering results and, according to the required number of features, selects the features that occur most often to constitute the final feature subset.
CN201610888945.0A 2016-10-11 2016-10-11 Unsupervised feature selection method based on conditional mutual information and K-means Pending CN106503731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610888945.0A CN106503731A (en) Unsupervised feature selection method based on conditional mutual information and K-means

Publications (1)

Publication Number Publication Date
CN106503731A true CN106503731A (en) 2017-03-15

Family

ID=58293652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610888945.0A Pending CN106503731A (en) Unsupervised feature selection method based on conditional mutual information and K-means

Country Status (1)

Country Link
CN (1) CN106503731A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239798B * 2017-05-24 2020-06-09 武汉大学 Feature selection method for predicting number of software defects
CN107239798A * 2017-05-24 2017-10-10 武汉大学 Feature selection method for software defect number prediction
EP3456673A1 * 2017-08-07 2019-03-20 Otis Elevator Company Predictive elevator condition monitoring using qualitative and quantitative informations
US10737904B2 2017-08-07 2020-08-11 Otis Elevator Company Elevator condition monitoring using heterogeneous sources
CN108363784A * 2018-01-20 2018-08-03 西北工业大学 Public opinion trend estimation method based on text machine learning
CN109506761A * 2018-06-12 2019-03-22 国网四川省电力公司乐山供电公司 Transformer surface vibration feature extraction method
CN109506761B * 2018-06-12 2021-08-27 国网四川省电力公司乐山供电公司 Transformer surface vibration feature extraction method
CN109255368A * 2018-08-07 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, electronic device and storage medium for randomly selecting features
CN109255368B * 2018-08-07 2023-12-22 平安科技(深圳)有限公司 Method, device, electronic equipment and storage medium for randomly selecting characteristics
CN109493929B * 2018-09-20 2022-03-15 北京工业大学 Low redundancy feature selection method based on grouping variables
CN109493929A * 2018-09-20 2019-03-19 北京工业大学 Low redundancy feature selection method based on grouping variables
CN109068180B * 2018-09-28 2021-02-02 武汉斗鱼网络科技有限公司 Method for determining video fine selection set and related equipment
CN109068180A * 2018-09-28 2018-12-21 武汉斗鱼网络科技有限公司 Method and related device for determining a video selection set
CN109816034B * 2019-01-31 2021-08-27 清华大学 Signal characteristic combination selection method and device, computer equipment and storage medium
CN109816034A * 2019-01-31 2019-05-28 清华大学 Signal feature combination selection method and device, computer equipment and storage medium
CN110298398B * 2019-06-25 2021-08-03 大连大学 Wireless protocol frame characteristic selection method based on improved mutual information
CN110298398A * 2019-06-25 2019-10-01 大连大学 Wireless protocol frame feature selection method based on improved mutual information
CN110426612A * 2019-08-17 2019-11-08 福州大学 Two-stage transformer oil-paper insulation time-domain dielectric response feature selection method
CN110942149B * 2019-10-31 2020-09-22 河海大学 Feature variable selection method based on information change rate and conditional mutual information
CN110942149A * 2019-10-31 2020-03-31 河海大学 Feature variable selection method based on information change rate and conditional mutual information
CN117076962A * 2023-10-13 2023-11-17 腾讯科技(深圳)有限公司 Data analysis method, device and equipment applied to artificial intelligence field
CN117076962B * 2023-10-13 2024-01-26 腾讯科技(深圳)有限公司 Data analysis method, device and equipment applied to artificial intelligence field
CN117454314A * 2023-12-19 2024-01-26 深圳航天科创泛在电气有限公司 Wind turbine component running state prediction method, device, equipment and storage medium
CN117454314B * 2023-12-19 2024-03-05 深圳航天科创泛在电气有限公司 Wind turbine component running state prediction method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20170315)