CN106169085A - Feature selection approach based on measure information - Google Patents

Feature selection approach based on measure information

Info

Publication number
CN106169085A
CN106169085A (application CN201610542270.4A)
Authority
CN
China
Prior art keywords
formula
class label
feature
features
make
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610542270.4A
Other languages
Chinese (zh)
Inventor
郭继昌
顾翔元
李重仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201610542270.4A priority Critical patent/CN106169085A/en
Publication of CN106169085A publication Critical patent/CN106169085A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of machine learning and data mining. It proposes a feature selection algorithm based on information measures and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets. The technical solution adopted by the invention is a feature selection method based on information measures, whose steps are as follows: using the symmetrical uncertainty SU(f_i;c) between a feature f_i and the class label c and the three-way interaction information I(f_i;f_t;c) between two features f_i, f_t and the class label c, the objective function is constructed as arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ], where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient. The invention is mainly applied to machine learning and data mining.

Description

Feature selection approach based on measure information
Technical field
The invention belongs to the fields of machine learning and data mining, and relates to a feature selection method based on information measures.
Background technology
As an important approach to dimensionality reduction, feature selection chooses, according to some evaluation criterion, a good subset of the original features as the final feature set, thereby reducing the feature dimensionality. According to the relationship between the subset evaluation criterion and the learning algorithm, feature selection algorithms can be divided into filter (Filter), embedded (Embedded) and wrapper (Wrapper) methods. Comparing the three, embedded and wrapper algorithms achieve good feature selection results but are time-consuming; filter algorithms give somewhat poorer results but are fast, and are therefore better suited to high-dimensional data sets. According to the evaluation criterion used, filter algorithms can be divided into algorithms based on information measures, distance measures, consistency measures and dependency measures. The present invention proposes a feature selection algorithm based on information measures.
For convenience, only feature selection based solely on information measures is analysed here. Research on such methods has mainly followed two lines: the first uses only mutual information, with the mutual information between a feature and the class label measuring relevance and the mutual information between features measuring redundancy; the second combines mutual information with three-way interaction information. Because algorithms of the second kind do not combine mutual information and three-way interaction information effectively, their feature selection results are unsatisfactory; the present invention therefore studies feature selection of the second kind.
Summary of the invention
To overcome the deficiencies of the prior art, the invention aims to propose a feature selection algorithm based on information measures and to verify by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets. The technical solution adopted by the invention is a feature selection method based on information measures, whose steps are as follows: let X, Y, Z be three discrete random variables; the three-way interaction information I(X;Y;Z) of X, Y, Z, the conditional mutual information I(X;Y/Z), and the mutual information I(X;Y) of X and Y satisfy
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
The symmetrical uncertainty SU (Symmetrical Uncertainty) is used to normalise the mutual information; the SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, the objective function is constructed as
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient.
β takes one of the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
The concrete steps are further refined as follows:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method;
Step 2: initialise S, D and X: let S and D be empty sets and let X be all features of the data set;
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X,
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
where p(x) is the probability density function of the variable x;
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label c;
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label,
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
where p(x), p(y) are the probability density functions of X and Y respectively, I(Y;X) is the mutual information of Y and X, and p(x,y) is the joint probability density function of X and Y;
Step 6: use formula (8) to compute the SU value between every feature in X and the class label,
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 8: let β = 0.1;
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c:
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z;
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s;
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s,
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24;
Step 13: put f_s into D;
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 17: use formula (8) to compute the SU value between every feature in X and the class label;
Step 18: use the results of formulas (8) and (9) to evaluate formula (10);
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i;
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 23: use formula (8) to compute the SU value between every feature in X and the class label;
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected, where S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30: when the number of features in the data set exceeds 30, |S| is 30, and for other data sets |S| equals the number of features of the data set; the order in which features are put into S is the selection order of the algorithm;
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
The method also includes step 27: use WEKA software to test the performance of the selected features, concretely:
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S;
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation;
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features;
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
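For illustration only, the evaluation protocol of step 27 can be sketched in Python with scikit-learn stand-ins for the WEKA classifiers (DecisionTreeClassifier in place of C4.5, a 1-nearest-neighbour classifier in place of IB1, GaussianNB in place of naive Bayes; PART has no direct scikit-learn counterpart and is omitted). The names data, y and S below are assumptions: a discretised feature matrix, the class labels and the selection order produced by the algorithm.

```python
# Hypothetical evaluation sketch: scikit-learn stand-ins for the WEKA
# classifiers named in step 27 (not the patent's actual WEKA setup).
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier      # rough analogue of C4.5
from sklearn.neighbors import KNeighborsClassifier   # IB1 = 1-nearest neighbour
from sklearn.naive_bayes import GaussianNB           # naive Bayes

def evaluate_selection(data, y, S, n_repeats=10):
    """Accuracy of the first 1, 2, ..., |S| selected features (steps 27.1-27.3)."""
    classifiers = {
        "tree": DecisionTreeClassifier(),
        "IB1": KNeighborsClassifier(n_neighbors=1),
        "NB": GaussianNB(),
    }
    results = {}
    for name, clf in classifiers.items():
        curve = []
        for k in range(1, len(S) + 1):
            Xk = data[:, S[:k]]               # first k features in selection order
            scores = []
            for rep in range(n_repeats):      # step 27.3: repeat 10 times, average
                cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
                scores.append(cross_val_score(clf, Xk, y, cv=cv).mean())
            curve.append(np.mean(scores))
        best_k = int(np.argmax(curve)) + 1    # feature count with highest accuracy
        results[name] = (best_k, curve)
    return results
```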
The features and benefits of the present invention are:
1) The invention proposes a feature selection algorithm based on information measures; compared with several existing algorithms of the same type, it achieves a certain improvement in feature selection performance;
2) The invention weighs two quantities, the SU value of a feature with the class label and the three-way interaction information of two features with the class label, and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets;
3) The feature selection algorithm proposed by the invention has practical value and can be applied to fields such as digital image processing and computer vision.
Brief description of the drawings:
Fig. 1 gives the block diagram of the proposed feature selection algorithm based on information measures;
Fig. 2 gives the feature selection results of four classifiers on the Vehicle data set with β = 0.2,
where (a) is the result with the C4.5 classifier, (b) with the IB1 classifier, (c) with the PART classifier, and (d) with the naive Bayes classifier;
Fig. 3 gives the feature selection results of four classifiers on the Movement_libras data set with β = 0.2,
where (a) is the result with the C4.5 classifier, (b) with the IB1 classifier, (c) with the PART classifier, and (d) with the naive Bayes classifier.
Detailed description of the invention
The present invention first normalises the mutual information between the class label and a feature with the symmetrical uncertainty (Symmetrical Uncertainty, SU); it then balances the normalised SU value against the three-way interaction information of two features with the class label, proposes a feature selection algorithm based on information measures, and verifies by experiment whether there exists a balance coefficient that is generally optimal for the performance on several data sets.
The invention uses two quantities, the SU value of a feature with the class label and the three-way interaction information of two features with the class label, to propose a feature selection algorithm based on information measures. The concrete technical scheme is detailed as follows:
1.1 Background on information measures
For convenience of exposition, only discrete random variables are treated. Let X be a discrete random variable and p(x) its probability density function. Information entropy is commonly used to express the amount of information obtained; the entropy H(X) can be written as
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
For variables x and y obeying the joint distribution p(x,y), the joint entropy H(X,Y) can be written as
H(X,Y) = - Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log p(x,y)   (2)
Mutual information is commonly used to quantify the information shared by two variables. The mutual information I(X;Y) of X and Y can be written as
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
I(X;Y) = I(Y;X)   (4)
where p(x), p(y) are the probability density functions of X and Y respectively and I(Y;X) is the mutual information of Y and X.
The mutual information I(X;Y) of X and Y is related to the entropy H(X) of X, the entropy H(Y) of Y and the joint entropy H(X,Y) by
I(X;Y) = H(X) + H(Y) - H(X,Y)   (5)
Conditional mutual information quantifies the information shared by two variables when a third variable is known. I(X;Y/Z) can be written as
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z.
Three-way interaction information extends mutual information. The three-way interaction information I(X;Y;Z) of three discrete random variables X, Y, Z is related to the conditional mutual information I(X;Y/Z) and the mutual information I(X;Y) by
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
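All quantities in formulas (1) to (7) can be estimated from discretised data by simple counting. The following Python sketch gives such plug-in estimates under that assumption; the function names (entropy, joint_entropy, mutual_info, cond_mutual_info, interaction_info) are illustrative and not part of the patent.

```python
# Minimal sketch of plug-in estimates of formulas (1)-(7) from discrete data.
# Function and variable names are illustrative, not from the patent.
import numpy as np
from collections import Counter

def entropy(x):
    """H(X) of formula (1), in bits, from a 1-D array of discrete values."""
    p = np.array(list(Counter(x).values()), dtype=float) / len(x)
    return -np.sum(p * np.log2(p))

def joint_entropy(*vars_):
    """H(X,Y,...) of formula (2) for one or more discrete variables."""
    joint = list(zip(*vars_))
    p = np.array(list(Counter(joint).values()), dtype=float) / len(joint)
    return -np.sum(p * np.log2(p))

def mutual_info(x, y):
    """I(X;Y) via formula (5): H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)

def cond_mutual_info(x, y, z):
    """I(X;Y/Z) of formula (6), via H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - entropy(z))

def interaction_info(x, y, z):
    """Three-way interaction information I(X;Y;Z) of formula (7)."""
    return cond_mutual_info(x, y, z) - mutual_info(x, y)
```

For example, interaction_info(f1, f2, c) > 0 indicates that the two feature columns f1 and f2 are synergistic with respect to the class label c, which is exactly the case exploited by the objective function introduced below.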
1.2 Feature selection algorithm based on information measures
The mutual information I(f_i;c) between a feature and the class label expresses the degree of relevance between the feature f_i and the class label c: the larger the mutual information, the more relevant the feature is to the class label.
The three-way interaction information I(f_i;f_s;c) of two features and the class label is the difference between the information about the class label c obtained from the two features f_i and f_s jointly and the sum of the information about the class label obtained from f_i and f_s separately. When I(f_i;f_s;c) > 0, the two features f_i and f_s are synergistic, i.e. the information about the class label obtained from f_i and f_s jointly exceeds the sum of the information obtained from f_i or f_s alone. When I(f_i;f_s;c) < 0, the two features f_i and f_s are redundant, i.e. the information about the class label obtained from f_i and f_s jointly is less than the sum of the information obtained from f_i or f_s alone.
From the above: when the mutual information I(f_i;c) between a feature and the class label is large and the three-way interaction information I(f_i;f_s;c) of the two features with the class label is greater than zero, more information can be obtained from selecting the feature; when I(f_i;c) is small and I(f_i;f_s;c) is less than zero, less information is obtained from selecting the feature. In the other two cases the information obtained from selecting the feature lies between these two situations.
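As a concrete illustration of synergy (not taken from the patent), consider the XOR construction: let f_1 and f_2 be independent uniform binary features and let the class label be c = f_1 XOR f_2. Each feature alone tells nothing about the class, I(f_1;c) = I(f_2;c) = 0, and the features are independent, I(f_1;f_2) = 0, but given the class label each feature determines the other, so I(f_1;f_2/c) = 1 bit. By formula (7) with X = f_1, Y = f_2, Z = c, the three-way interaction information is I(f_1;f_2;c) = 1 - 0 = 1 > 0: neither feature is informative on its own, yet the pair determines the class exactly.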
If mutual information alone were used, the feature selection process would preferentially choose features with large mutual information values, but a feature with a large mutual information value is not necessarily an important feature; the symmetrical uncertainty (SU) is therefore used to normalise the mutual information. The SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c.
Using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t.
Therefore, using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, a feature selection algorithm based on information measures is proposed. The objective function of this algorithm is
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient. Generally speaking, compared with the three-way interaction information of two features with the class label, the SU value of a feature with the class label is more important. For simplicity, β takes one of the ten values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
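As a self-contained Python sketch (an illustration, not the patent's MATLAB/WEKA implementation), formula (8) and the score inside formula (10) for a single candidate feature can be written as follows; the helper names _H, su, interaction_info and objective_score are assumptions, and the inputs are discretised feature columns.

```python
# Self-contained sketch of formula (8) and the per-candidate score of formula (10).
import numpy as np
from collections import Counter

def _H(*cols):
    """Plug-in entropy of one or more discrete columns, in bits."""
    p = np.array(list(Counter(zip(*cols)).values()), dtype=float) / len(cols[0])
    return -np.sum(p * np.log2(p))

def su(f_i, c):
    """Symmetrical uncertainty SU(f_i;c) of formula (8)."""
    mi = _H(f_i) + _H(c) - _H(f_i, c)              # I(f_i;c) via formula (5)
    return 2.0 * mi / (_H(f_i) + _H(c))

def interaction_info(f_i, f_t, c):
    """Three-way interaction I(f_i;f_t;c) of formula (9)."""
    cond_mi = _H(f_i, c) + _H(f_t, c) - _H(f_i, f_t, c) - _H(c)   # I(f_i;f_t/c)
    mi = _H(f_i) + _H(f_t) - _H(f_i, f_t)                          # I(f_i;f_t)
    return cond_mi - mi

def objective_score(f_i, c, D_cols, beta):
    """SU(f_i;c) + beta * sum of I(f_i;f_t;c) over f_t in D, as in formula (10)."""
    return su(f_i, c) + beta * sum(interaction_info(f_i, f_t, c) for f_t in D_cols)
```

A candidate with a high SU value and positive interactions with the already-selected features in D scores highest; β controls how strongly the interaction term is weighted against the SU term.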
The flow chart of this algorithm is as follows:
Input: N: the number of features to select;
Output: S: the ordered selected features.
The present invention is described in detail below in conjunction with the algorithm block diagram and the algorithm flow chart.
As shown in Fig. 1, the present invention provides a feature selection algorithm based on information measures that runs on the MATLAB platform. It comprises the following steps:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method.
Step 2: initialise S, D and X. Let S and D be empty sets and let X be all features of the data set.
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 6: use formula (8) to compute the SU value between every feature in X and the class label.
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i.
Step 8: let β = 0.1.
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c.
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s.
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s.
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24.
Step 13: put f_s into D.
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 17: use formula (8) to compute the SU value between every feature in X and the class label.
Step 18: use the results of formulas (8) and (9) to evaluate formula (10).
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i.
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X.
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label.
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label.
Step 23: use formula (8) to compute the SU value between every feature in X and the class label.
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i.
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected. S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30. When the number of features in the data set exceeds 30, |S| is 30; for other data sets, |S| equals the number of features of the data set. The order in which features are put into S is the selection order of the algorithm.
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
Step 27: use WEKA software to test the performance of the selected features.
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S.
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation.
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features.
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (Instance Based 1, IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
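Putting steps 2 to 26 together, the greedy loop can be sketched as follows, assuming a discretised feature matrix data (one column per feature), a label array c, and the su, interaction_info and objective_score helpers sketched in section 1.2 above; the MDL discretisation of step 1 and the WEKA evaluation of step 27 are omitted. This is an illustrative reconstruction, not the patent's MATLAB code.

```python
# Illustrative reconstruction of steps 2-26 (not the patent's MATLAB code).
# Assumes discretised columns and the su / interaction_info / objective_score
# helpers sketched earlier in this text.
def select_features(data, c, beta, n_select=30):
    n_features = data.shape[1]
    n_select = min(n_select, n_features)              # step 25: cap at N = 30
    X = list(range(n_features))                       # unselected feature indices
    S, D = [], []                                     # step 2: S and D start empty

    # Steps 3-7: first feature = largest SU with the class label.
    f_s = max(X, key=lambda i: su(data[:, i], c))
    X.remove(f_s)
    S.append(f_s)

    # Steps 9-26: repeat until |S| features are selected.
    while X and len(S) < n_select:
        # Steps 9-11: three-way interaction of every unselected feature with f_s and c.
        interactions = {i: interaction_info(data[:, i], data[:, f_s], c) for i in X}
        if max(interactions.values()) > 0:            # step 12
            D.append(f_s)                             # step 13
            # Steps 14-19: pick the feature maximising formula (10).
            f_s = max(X, key=lambda i: objective_score(
                data[:, i], c, [data[:, t] for t in D], beta))
        else:
            # Steps 20-24: fall back to the largest SU value.
            f_s = max(X, key=lambda i: su(data[:, i], c))
        X.remove(f_s)
        S.append(f_s)
    return S
```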

Claims (4)

1. A feature selection method based on information measures, characterised in that the steps are as follows: let X, Y, Z be three discrete random variables; the three-way interaction information I(X;Y;Z) of X, Y, Z, the conditional mutual information I(X;Y/Z), and the mutual information I(X;Y) of X and Y satisfy
I(X;Y;Z) = I(X;Y/Z) - I(X;Y)   (7)
the symmetrical uncertainty SU (Symmetrical Uncertainty) is used to normalise the mutual information; the SU value of a feature f_i and the class label c is
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
using formula (7) with X = f_i, Y = f_t, Z = c gives formula (9):
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
using the SU value SU(f_i;c) of a feature f_i with the class label c and the three-way interaction information I(f_i;f_t;c) of two features f_i, f_t with the class label c, the objective function is constructed as
arg max_{f_i ∈ X} [ SU(f_i;c) + β Σ_{f_t ∈ D} I(f_i;f_t;c) ]   (10)
where f_i is a feature not yet selected, X is the set of unselected features, c is the class label, D is the set of already-selected features f_s for which the maximum of I(f_i;f_s;c) is greater than zero, f_s is the feature just selected, f_t is a feature in D, and β is the balance coefficient.
2. The feature selection method based on information measures as claimed in claim 1, characterised in that β takes one of the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
3. The feature selection method based on information measures as claimed in claim 1, characterised in that the concrete steps are further refined as:
Step 1: call WEKA software and discretise the data features with the minimum-description-length discretisation method;
Step 2: initialise S, D and X: let S and D be empty sets and let X be all features of the data set;
Step 3: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X,
H(X) = - Σ_{x ∈ X} p(x) log p(x)   (1)
where p(x) is the probability density function of the variable x;
Step 4: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label c;
Step 5: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label,
I(X;Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]   (3)
where p(x), p(y) are the probability density functions of X and Y respectively, I(Y;X) is the mutual information of Y and X, and p(x,y) is the joint probability density function of X and Y;
Step 6: use formula (8) to compute the SU value between every feature in X and the class label,
SU(f_i;c) = 2 I(f_i;c) / (H(f_i) + H(c))   (8)
where H(f_i) is the entropy of feature f_i, H(c) is the entropy of the class label c, and I(f_i;c) is the mutual information between f_i and c;
Step 7: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 8: let β = 0.1;
Step 9: let X = f_i, Y = f_t, Z = c in formula (6) and use formula (6) to compute the conditional mutual information I(f_i;f_t/c) between every feature in X and f_s given the class label c:
I(X;Y/Z) = Σ_{x ∈ X} Σ_{y ∈ Y} Σ_{z ∈ Z} p(x,y,z) log [ p(x,y/z) / (p(x/z) p(y/z)) ]   (6)
where p(x,y,z) is the joint probability density function of X, Y and Z, p(x,y/z) is the joint probability density function of X and Y given Z = z, p(x/z) is the probability density function of X given Z = z, and p(y/z) is the probability density function of Y given Z = z;
Step 10: let X = f_i, Y = f_t in formula (3) and use formula (3) to compute the mutual information I(f_i;f_t) between every feature in X and f_s;
Step 11: use formula (9) to compute the three-way interaction information between every feature in X, the class label c and f_s,
I(f_i;f_t;c) = I(f_i;f_t/c) - I(f_i;f_t)   (9)
where I(f_i;f_t;c) is the three-way interaction information of the two features f_i, f_t and the class label c, I(f_i;f_t/c) is the mutual information between f_i and f_t given the class label c, and I(f_i;f_t) is the mutual information between f_i and f_t;
Step 12: if the maximum of the three-way interaction information is greater than zero, carry out steps 13, 14, 15, 16, 17, 18 and 19; otherwise, carry out steps 20, 21, 22, 23 and 24;
Step 13: put f_s into D;
Step 14: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 15: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 16: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 17: use formula (8) to compute the SU value between every feature in X and the class label;
Step 18: use the results of formulas (8) and (9) to evaluate formula (10);
Step 19: take from X the feature f_i that maximises formula (10), put it into S, and let f_s = f_i;
Step 20: let X = f_i in formula (1) and use formula (1) to compute the entropy H(f_i) of every feature in X;
Step 21: let X = c in formula (1) and use formula (1) to compute the entropy H(c) of the class label;
Step 22: let X = f_i, Y = c in formula (3) and use formula (3) to compute the mutual information I(f_i;c) between every feature in X and the class label;
Step 23: use formula (8) to compute the SU value between every feature in X and the class label;
Step 24: take from X the feature f_i with the largest SU value with the class label, put it into S, and let f_s = f_i;
Step 25: repeat steps 9, 10, 11 and 12 until |S| features have been selected, where S is the feature subset chosen by the algorithm and |S| is its size; N is set to 30: when the number of features in the data set exceeds 30, |S| is 30, and for other data sets |S| equals the number of features of the data set; the order in which features are put into S is the selection order of the algorithm;
Step 26: let β be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 in turn, and carry out steps 9, 10, 11, 12 and 25.
4. The feature selection method based on information measures as claimed in claim 3, characterised in that it also includes step 27: use WEKA software to test the performance of the selected features, concretely:
Step 27.1: with WEKA software, choose the first 1, first 2, ..., first |S| features in S;
Step 27.2: test the chosen features with the C4.5 classifier and ten-fold cross-validation;
Step 27.3: run each group of experiments 10 times and take the mean as the final result; the number of features corresponding to the highest accuracy within each group of results is the number of finally selected features;
Step 27.4: replace the C4.5 classifier in step 27.2 with the 1-nearest-neighbour instance-based classifier (IB1), the partial decision tree rule learner (PART) and the naive Bayes (Bayesian) classifier, and carry out steps 27.1, 27.2 and 27.3.
CN201610542270.4A 2016-07-11 2016-07-11 Feature selection approach based on measure information Pending CN106169085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610542270.4A CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610542270.4A CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Publications (1)

Publication Number Publication Date
CN106169085A true CN106169085A (en) 2016-11-30

Family

ID=58064881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610542270.4A Pending CN106169085A (en) 2016-07-11 2016-07-11 Feature selection approach based on measure information

Country Status (1)

Country Link
CN (1) CN106169085A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135508A (en) * 2019-05-21 2019-08-16 腾讯科技(深圳)有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN110135508B (en) * 2019-05-21 2022-11-29 腾讯科技(深圳)有限公司 Model training method and device, electronic equipment and computer readable storage medium
CN110298398A (en) * 2019-06-25 2019-10-01 大连大学 Wireless protocols frame feature selection approach based on improved mutual imformation
CN110298398B (en) * 2019-06-25 2021-08-03 大连大学 Wireless protocol frame characteristic selection method based on improved mutual information
CN110942149A (en) * 2019-10-31 2020-03-31 河海大学 Feature variable selection method based on information change rate and condition mutual information
CN110942149B (en) * 2019-10-31 2020-09-22 河海大学 Feature variable selection method based on information change rate and condition mutual information


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161130

RJ01 Rejection of invention patent application after publication