CN106169085A - Feature selection approach based on measure information - Google Patents

Feature selection approach based on measure information

Info

Publication number
CN106169085A
CN106169085A (application CN201610542270.4A)
Authority
CN
China
Prior art keywords
formula
class label
feature
features
make
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610542270.4A
Other languages
Chinese (zh)
Inventor
郭继昌
顾翔元
李重仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201610542270.4A priority Critical patent/CN106169085A/en
Publication of CN106169085A publication Critical patent/CN106169085A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of machine learning and data mining. It proposes a feature selection algorithm based on information measures and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets. The technical solution adopted by the invention is a feature selection method based on information measures, whose steps are as follows: using the symmetrical uncertainty SU(f_i;c) between a feature f_i and the class label c and the three-way interaction information I(f_i;f_t;c) between two features f_i, f_t and the class label c, the objective function is constructed as arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ], where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient. The invention is mainly applied to machine learning and data mining.

Description

Feature selection approach based on measure information
Technical field
The invention belongs to the fields of machine learning and data mining, and relates to a feature selection method based on information measures.
Background technology
As an important approach to dimensionality reduction, feature selection chooses, according to some evaluation criterion, a good subset of the original features as the final feature set, thereby reducing the feature dimensionality. According to the relationship between the subset evaluation criterion and the learning algorithm, feature selection algorithms can be divided into filter (Filter), embedded (Embedded) and wrapper (Wrapper) methods. Comparing the three, embedded and wrapper algorithms achieve good feature selection results but are time-consuming; filter algorithms give somewhat poorer results but are fast, and are therefore better suited to high-dimensional data sets. According to the evaluation criterion used, filter algorithms can be divided into algorithms based on information measures, distance measures, consistency measures and dependency measures. The present invention proposes a feature selection algorithm based on information measures.
For convenience, only feature selection based solely on information measures is analysed here. Research on such methods has mainly followed two lines: the first uses only mutual information, with the mutual information between a feature and the class label measuring relevance and the mutual information between features measuring redundancy; the second combines mutual information with three-way interaction information. Because algorithms of the second kind do not combine mutual information and three-way interaction information effectively, their feature selection results are unsatisfactory; the present invention therefore studies feature selection of the second kind.
Summary of the invention
To overcome the deficiencies of the prior art, the invention aims to propose a feature selection algorithm based on information measures and to verify by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets. The technical solution adopted by the invention is a feature selection method based on information measures, whose steps are as follows: let X, Y, Z be three discrete random variables; the three-way interaction information I(X;Y;Z) of X, Y, Z, the conditional mutual information I(X;Y/Z), and the mutual information I(X;Y) of X and Y satisfy
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
The symmetrical uncertainty SU (Symmetrical Uncertainty) is used to normalise the mutual information; the SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, the objective function is constructed as
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient.
β takes one of the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
The concrete steps are further refined as follows:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method;
Step 2: initialise S, D and X: let S and D be empty sets and let X be all features of the data set;
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X,
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
where p(x) is the probability density function of the variable x;
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label c;
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label,
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
where p(x), p(y) are the probability density functions of X and Y respectively, I(Y;X) is the mutual information of Y and X, and p(x,y) is the joint probability density function of X and Y;
Step 6: use formula (8) to compute the SU value between every feature in X and the class label,
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 8: let β = 0.1;
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c:
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z;
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s;
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s,
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24;
Step 13: put f_s into D;
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 17: use formula (8) to compute the SU value between every feature in X and the class label;
Step 18: use the results of formulas (8) and (9) to evaluate formula (10);
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i;
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 23: use formula (8) to compute the SU value between every feature in X and the class label;
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected, where S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30: when the number of features in the data set exceeds 30, |S| is 30, and for other data sets |S| equals the number of features of the data set; the order in which features are put into S is the selection order of the algorithm;
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
The method also includes step 27: use WEKA software to test the performance of the selected features, concretely:
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S;
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation;
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features;
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
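For illustration only, the evaluation protocol of step 27 can be sketched in Python with scikit-learn stand-ins for the WEKA classifiers (DecisionTreeClassifier in place of C4.5, a 1-nearest-neighbour classifier in place of IB1, GaussianNB in place of naive Bayes; PART has no direct scikit-learn counterpart and is omitted). The names data, y and S below are assumptions: a discretised feature matrix, the class labels and the selection order produced by the algorithm.

```python
# Hypothetical evaluation sketch: scikit-learn stand-ins for the WEKA
# classifiers named in step 27 (not the patent's actual WEKA setup).
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier      # rough analogue of C4.5
from sklearn.neighbors import KNeighborsClassifier   # IB1 = 1-nearest neighbour
from sklearn.naive_bayes import GaussianNB           # naive Bayes

def evaluate_selection(data, y, S, n_repeats=10):
    """Accuracy of the first 1, 2, ..., |S| selected features (steps 27.1-27.3)."""
    classifiers = {
        "tree": DecisionTreeClassifier(),
        "IB1": KNeighborsClassifier(n_neighbors=1),
        "NB": GaussianNB(),
    }
    results = {}
    for name, clf in classifiers.items():
        curve = []
        for k in range(1, len(S) + 1):
            Xk = data[:, S[:k]]               # first k features in selection order
            scores = []
            for rep in range(n_repeats):      # step 27.3: repeat 10 times, average
                cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
                scores.append(cross_val_score(clf, Xk, y, cv=cv).mean())
            curve.append(np.mean(scores))
        best_k = int(np.argmax(curve)) + 1    # feature count with highest accuracy
        results[name] = (best_k, curve)
    return results
```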
The features and benefits of the present invention are:
1) The invention proposes a feature selection algorithm based on information measures; compared with several existing algorithms of the same type, it achieves a certain improvement in feature selection performance;
2) The invention weighs two quantities, the SU value of a feature with the class label and the three-way interaction information of two features with the class label, and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets;
3) The feature selection algorithm proposed by the invention has practical value and can be applied to fields such as digital image processing and computer vision.
Brief description of the drawings:
Fig. 1 gives the block diagram of the proposed feature selection algorithm based on information measures;
Fig. 2 gives the feature selection results of four classifiers on the Vehicle data set with β = 0.2,
where (a) is the result with the C4.5 classifier, (b) with the IB1 classifier, (c) with the PART classifier, and (d) with the naive Bayes classifier;
Fig. 3 gives the feature selection results of four classifiers on the Movement_libras data set with β = 0.2,
where (a) is the result with the C4.5 classifier, (b) with the IB1 classifier, (c) with the PART classifier, and (d) with the naive Bayes classifier.
Detailed description of the invention
The present invention first normalises the mutual information between the class label and a feature with the symmetrical uncertainty (Symmetrical Uncertainty, SU); it then balances the normalised SU value against the three-way interaction information of two features with the class label, proposes a feature selection algorithm based on information measures, and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets.
The invention uses two quantities, the SU value of a feature with the class label and the three-way interaction information of two features with the class label, to propose a feature selection algorithm based on information measures. The concrete technical scheme is detailed as follows:
1.1 Background on information measures
For convenience of exposition, only discrete random variables are treated. Let X be a discrete random variable and p(x) its probability density function. Information entropy is commonly used to express the amount of information obtained; the entropy H(X) can be written as
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
For variables x and y obeying the joint distribution p(x,y), the joint entropy H(X,Y) can be written as
H(X,Y) = - Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log p(x,y)   (2)
Mutual information is commonly used to quantify the information shared by two variables. The mutual information I(X;Y) of X and Y can be written as
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
I(X;Y) = I(Y;X)   (4)
where p(x), p(y) are the probability density functions of X and Y respectively and I(Y;X) is the mutual information of Y and X.
The mutual information I(X;Y) of X and Y is related to the entropy H(X) of X, the entropy H(Y) of Y and the joint entropy H(X,Y) by
I(X;Y) = H(X) + H(Y) - H(X,Y)   (5)
Conditional mutual information quantifies the information shared by two variables when a third variable is known. I(X;Y/Z) can be written as
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z.
Three-way interaction information extends mutual information. The three-way interaction information I(X;Y;Z) of three discrete random variables X, Y, Z is related to the conditional mutual information I(X;Y/Z) and the mutual information I(X;Y) by
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
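All quantities in formulas (1) to (7) can be estimated from discretised data by simple counting. The following Python sketch gives such plug-in estimates under that assumption; the function names (entropy, joint_entropy, mutual_info, cond_mutual_info, interaction_info) are illustrative and not part of the patent.

```python
# Minimal sketch of plug-in estimates of formulas (1)-(7) from discrete data.
# Function and variable names are illustrative, not from the patent.
import numpy as np
from collections import Counter

def entropy(x):
    """H(X) of formula (1), in bits, from a 1-D array of discrete values."""
    p = np.array(list(Counter(x).values()), dtype=float) / len(x)
    return -np.sum(p * np.log2(p))

def joint_entropy(*vars_):
    """H(X,Y,...) of formula (2) for one or more discrete variables."""
    joint = list(zip(*vars_))
    p = np.array(list(Counter(joint).values()), dtype=float) / len(joint)
    return -np.sum(p * np.log2(p))

def mutual_info(x, y):
    """I(X;Y) via formula (5): H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)

def cond_mutual_info(x, y, z):
    """I(X;Y/Z) of formula (6), via H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - entropy(z))

def interaction_info(x, y, z):
    """Three-way interaction information I(X;Y;Z) of formula (7)."""
    return cond_mutual_info(x, y, z) - mutual_info(x, y)
```

For example, interaction_info(f1, f2, c) > 0 indicates that the two feature columns f1 and f2 are synergistic with respect to the class label c, which is exactly the case exploited by the objective function introduced below.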
1.2 Feature selection algorithm based on information measures
The mutual information I(f_i;c) between a feature and the class label expresses the degree of relevance between the feature f_i and the class label c: the larger the mutual information, the more relevant the feature is to the class label.
The three-way interaction information I(f_i;f_s;c) of two features and the class label is the difference between the information about the class label c obtained from the two features f_i and f_s jointly and the sum of the information about the class label obtained from f_i and f_s separately. When I(f_i;f_s;c) > 0, the two features f_i and f_s are synergistic, i.e. the information about the class label obtained from f_i and f_s jointly exceeds the sum of the information obtained from f_i or f_s alone. When I(f_i;f_s;c) < 0, the two features f_i and f_s are redundant, i.e. the information about the class label obtained from f_i and f_s jointly is less than the sum of the information obtained from f_i or f_s alone.
From the above: when the mutual information I(f_i;c) between a feature and the class label is large and the three-way interaction information I(f_i;f_s;c) of the two features with the class label is greater than zero, more information can be obtained from selecting the feature; when I(f_i;c) is small and I(f_i;f_s;c) is less than zero, less information is obtained from selecting the feature. In the other two cases the information obtained from selecting the feature lies between these two situations.
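As a concrete illustration of synergy (not taken from the patent), consider the XOR construction: let f_1 and f_2 be independent uniform binary features and let the class label be c = f_1 XOR f_2. Each feature alone tells nothing about the class, I(f_1;c) = I(f_2;c) = 0, and the features are independent, I(f_1;f_2) = 0, but given the class label each feature determines the other, so I(f_1;f_2/c) = 1 bit. By formula (7) with X = f_1, Y = f_2, Z = c, the three-way interaction information is I(f_1;f_2;c) = 1 - 0 = 1 > 0: neither feature is informative on its own, yet the pair determines the class exactly.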
If mutual information alone were used, the feature selection process would preferentially choose features with large mutual information values, but a feature with a large mutual information value is not necessarily an important feature; the symmetrical uncertainty (SU) is therefore used to normalise the mutual information. The SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c.
Using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t.
Therefore, using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, a feature selection algorithm based on information measures is proposed. The objective function of this algorithm is
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient. Generally speaking, compared with the three-way interaction information of two features with the class label, the SU value of a feature with the class label is more important. For simplicity, β takes one of the ten values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
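As a self-contained Python sketch (an illustration, not the patent's MATLAB/WEKA implementation), formula (8) and the score inside formula (10) for a single candidate feature can be written as follows; the helper names _H, su, interaction_info and objective_score are assumptions, and the inputs are discretised feature columns.

```python
# Self-contained sketch of formula (8) and the per-candidate score of formula (10).
import numpy as np
from collections import Counter

def _H(*cols):
    """Plug-in entropy of one or more discrete columns, in bits."""
    p = np.array(list(Counter(zip(*cols)).values()), dtype=float) / len(cols[0])
    return -np.sum(p * np.log2(p))

def su(f_i, c):
    """Symmetrical uncertainty SU(f_i;c) of formula (8)."""
    mi = _H(f_i) + _H(c) - _H(f_i, c)              # I(f_i;c) via formula (5)
    return 2.0 * mi / (_H(f_i) + _H(c))

def interaction_info(f_i, f_t, c):
    """Three-way interaction I(f_i;f_t;c) of formula (9)."""
    cond_mi = _H(f_i, c) + _H(f_t, c) - _H(f_i, f_t, c) - _H(c)   # I(f_i;f_t/c)
    mi = _H(f_i) + _H(f_t) - _H(f_i, f_t)                          # I(f_i;f_t)
    return cond_mi - mi

def objective_score(f_i, c, D_cols, beta):
    """SU(f_i;c) + beta * sum of I(f_i;f_t;c) over f_t in D, as in formula (10)."""
    return su(f_i, c) + beta * sum(interaction_info(f_i, f_t, c) for f_t in D_cols)
```

A candidate with a high SU value and positive interactions with the already-selected features in D scores highest; β controls how strongly the interaction term is weighted against the SU term.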
The flow chart of this algorithm is as follows:
Input: N: the number of features to select;
Output: S: the ordered selected features.
The present invention is described in detail below in conjunction with the algorithm block diagram and the algorithm flow chart.
As shown in Fig. 1, the present invention provides a feature selection algorithm based on information measures that runs on the MATLAB platform. It comprises the following steps:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method.
Step 2: initialise S, D and X. Let S and D be empty sets and let X be all features of the data set.
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 6: use formula (8) to compute the SU value between every feature in X and the class label.
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i.
Step 8: let β = 0.1.
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c.
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s.
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s.
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24.
Step 13: put f_s into D.
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 17: use formula (8) to compute the SU value between every feature in X and the class label.
Step 18: use the results of formulas (8) and (9) to evaluate formula (10).
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i.
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 23: use formula (8) to compute the SU value between every feature in X and the class label.
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i.
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected. S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30. When the number of features in the data set exceeds 30, |S| is 30; for other data sets, |S| equals the number of features of the data set. The order in which features are put into S is the selection order of the algorithm.
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
Step 27: use WEKA software to test the performance of the selected features.
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S.
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation.
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features.
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (Instance Based 1, IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
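Putting steps 2 to 26 together, the greedy loop can be sketched as follows, assuming a discretised feature matrix data (one column per feature), a label array c, and the su, interaction_info and objective_score helpers sketched in section 1.2 above; the MDL discretisation of step 1 and the WEKA evaluation of step 27 are omitted. This is an illustrative reconstruction, not the patent's MATLAB code.

```python
# Illustrative reconstruction of steps 2-26 (not the patent's MATLAB code).
# Assumes discretised columns and the su / interaction_info / objective_score
# helpers sketched earlier in this text.
def select_features(data, c, beta, n_select=30):
    n_features = data.shape[1]
    n_select = min(n_select, n_features)              # step 25: cap at N = 30
    X = list(range(n_features))                       # unselected feature indices
    S, D = [], []                                     # step 2: S and D start empty

    # Steps 3-7: first feature = largest SU with the class label.
    f_s = max(X, key=lambda i: su(data[:, i], c))
    X.remove(f_s)
    S.append(f_s)

    # Steps 9-26: repeat until |S| features are selected.
    while X and len(S) < n_select:
        # Steps 9-11: three-way interaction of every unselected feature with f_s and c.
        interactions = {i: interaction_info(data[:, i], data[:, f_s], c) for i in X}
        if max(interactions.values()) > 0:            # step 12
            D.append(f_s)                             # step 13
            # Steps 14-19: pick the feature maximising formula (10).
            f_s = max(X, key=lambda i: objective_score(
                data[:, i], c, [data[:, t] for t in D], beta))
        else:
            # Steps 20-24: fall back to the largest SU value.
            f_s = max(X, key=lambda i: su(data[:, i], c))
        X.remove(f_s)
        S.append(f_s)
    return S
```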

Claims (4)

1. A feature selection method based on information measures, characterised in that the steps are as follows: let X, Y, Z be three discrete random variables; the three-way interaction information I(X;Y;Z) of X, Y, Z, the conditional mutual information I(X;Y/Z), and the mutual information I(X;Y) of X and Y satisfy
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
the symmetrical uncertainty SU (Symmetrical Uncertainty) is used to normalise the mutual information; the SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, the objective function is constructed as
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient.
2. The feature selection method based on information measures as claimed in claim 1, characterised in that β takes one of the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
3. The feature selection method based on information measures as claimed in claim 1, characterised in that the concrete steps are further refined as:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method;
Step 2: initialise S, D and X: let S and D be empty sets and let X be all features of the data set;
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X,
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
where p(x) is the probability density function of the variable x;
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label c;
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label,
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
where p(x), p(y) are the probability density functions of X and Y respectively, I(Y;X) is the mutual information of Y and X, and p(x,y) is the joint probability density function of X and Y;
Step 6: use formula (8) to compute the SU value between every feature in X and the class label,
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 8: let β = 0.1;
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c:
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z;
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s;
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s,
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24;
Step 13: put f_s into D;
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 17: use formula (8) to compute the SU value between every feature in X and the class label;
Step 18: use the results of formulas (8) and (9) to evaluate formula (10);
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i;
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 23: use formula (8) to compute the SU value between every feature in X and the class label;
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected, where S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30: when the number of features in the data set exceeds 30, |S| is 30, and for other data sets |S| equals the number of features of the data set; the order in which features are put into S is the selection order of the algorithm;
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
4. The feature selection method based on information measures as claimed in claim 3, characterised in that it also includes step 27: use WEKA software to test the performance of the selected features, concretely:
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S;
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation;
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features;
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
CN201610542270.4A 2016-07-11 2016-07-11 Feature selection approach based on measure information Pending CN106169085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610542270.4A CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610542270.4A CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Publications (1)

Publication Number Publication Date
CN106169085A true CN106169085A (en) 2016-11-30

Family

ID=58064881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610542270.4A Pending CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Country Status (1)

Country Link
CN (1) CN106169085A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135508A (en) * 2019-05-21 2019-08-16 腾讯科技(深圳)有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN110135508B (en) * 2019-05-21 2022-11-29 腾讯科技(深圳)有限公司 Model training method and device, electronic equipment and computer readable storage medium
CN110298398A (en) * 2019-06-25 2019-10-01 大连大学 Wireless protocols frame feature selection approach based on improved mutual imformation
CN110298398B (en) * 2019-06-25 2021-08-03 大连大学 Wireless protocol frame characteristic selection method based on improved mutual information
CN110942149A (en) * 2019-10-31 2020-03-31 河海大学 Feature variable selection method based on information change rate and condition mutual information
CN110942149B (en) * 2019-10-31 2020-09-22 河海大学 Feature variable selection method based on information change rate and condition mutual information


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161130

RJ01 Rejection of invention patent application after publication