CN106991447A - An embedded multi-label dynamic feature selection algorithm - Google Patents

An embedded multi-label dynamic feature selection algorithm

Info

Publication number
CN106991447A
CN106991447A (application CN201710222600.6A)
Authority
CN
China
Prior art keywords
attribute
feature
correlation
data
characteristic
Prior art date
2017-04-06
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710222600.6A
Other languages
Chinese (zh)
Inventor
黄金杰 (Huang Jinjie)
孔庆达 (Kong Qingda)
潘晓真 (Pan Xiaozhen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2017-04-06
Publication date
2017-07-28
Application filed by Harbin University of Science and Technology
Priority to CN201710222600.6A
Publication of CN106991447A
Legal status: Pending

Classifications

    • G06F18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/2431: Classification techniques relating to the number of classes; multiple classes
    (all under G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/20: Analysing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an embedded multi-label dynamic feature selection method. To remedy the deficiencies of traditional multi-label feature selection algorithms, the proposed method (ML_NIFS) accounts both for the correlations inside the multi-label attribute set and for the fact that the information-entropy estimates used in the feature evaluation criterion change continually during selection. Experimental verification shows that the proposed algorithm effectively reduces the dimensionality of the attribute data and improves the subsequent classification performance.

Description

An embedded multi-label dynamic feature selection algorithm
Technical field
The present invention relates to the field of pattern recognition, and specifically to an embedded multi-label dynamic feature selection method.
Background technology
Traditional mutual-information measures are widely used in feature dimensionality reduction because they are fast and comparatively efficient on high-dimensional attribute data. With the rapid development of science and technology, however, many technical fields, such as computer network communication and biochemical and pharmaceutical engineering, are moving toward multi-label data. The multi-label classification problem is to build a classification model according to the characteristics of multi-label data and to judge the class attributes of unknown data according to a decision criterion, assigning a sample to several class labels at once. The fundamental difference between single-label and multi-label classification is that in the single-label problem a sample can belong to only one class label, whereas in the multi-label problem a sample may belong to several class labels at the same time; this matches the character of today's rapidly growing information data, and the problem has therefore attracted wide attention.
Like traditional single-label classification, multi-label classification faces the "curse of dimensionality", which severely degrades the classification ability of multi-label classifiers. Dimensionality-reduction techniques for feature attributes lower the feature dimension and improve classifier accuracy; the techniques applicable to single-label classification can likewise be applied to multi-label classification to achieve attribute reduction. Feature dimensionality reduction is generally divided into feature selection and feature extraction, and feature selection is further divided, by its evaluation criterion, into filter, wrapper, and embedded methods. The present invention studies multi-label feature selection.
Multi-label feature selection algorithms currently develop along two basic directions: data transformation and algorithm adaptation. Transformation-based research converts the multi-label data into single-label data and applies a single-label feature selection algorithm repeatedly to achieve multi-label feature selection. Adaptation-based research modifies and improves single-label feature selection algorithms so that they fit multi-label data. Common algorithms at this stage include transformation-based SVM feature selection and KNN algorithms, which do not consider the dependencies inside the label attributes; feature selection algorithms based on mutual information, by contrast, can analyse the correlation between attributes well using information theory. The conventional mutual-information criterion for the correlation between two variables is nevertheless inadequate: it considers only the correlation between a feature and the classes and between a feature and the already selected features, while in fact the sample data keep being recognized as features are selected, so the estimate of the information entropy behaves as a continually changing, dynamic quantity.
Based on the above considerations, the present invention proposes an embedded multi-label dynamic feature selection algorithm (ML_NIFS). Computed through mutual information, the algorithm considers not only the correlation between feature attributes and label attributes but also the correlation and redundancy among the feature attributes, and in addition the correlations among the label attributes inside the multi-label set. The proposed embedded dynamic multi-label feature selection algorithm removes the sample data already recognized by an embedded classifier, thereby keeping the information-entropy estimate accurate and current.
Summary of the invention
The object of the present invention is to provide an embedded multi-label dynamic feature selection method that solves the problems raised in the background above. To achieve this object, the present invention provides the following technical scheme; specifically, the embedded multi-label dynamic feature selection method comprises the following steps.
The traditional feature selection method based on mutual information is introduced first.
1. Preprocessing the data set
Real-world databases are highly susceptible to noisy data, missing values, and inconsistent data. Many data-preprocessing techniques exist at this stage, generally divided into data cleaning, data integration, data transformation, and data reduction. Data cleaning removes noise from the data, corrects inconsistencies, and fills in missing sample values; data transformation (normalization) improves the precision and validity of algorithms that involve distance metrics. For example, when one wants the data to follow a specific distribution, or wants each feature mapped into a specific interval, a data transformation is required. Here, preprocessing consists of three parts. First, noisy data, inconsistent data, and missing values in the data set are handled. Second, attribute data completely unrelated to the classes are deleted from the data set. Third, the attribute data are norm-normalized so that each norm equals 1:
\hat{f}_i = f_i / \|f_i\|    (1)
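As a concrete illustration, the following is a minimal sketch of this preprocessing step under stated assumptions: the data sit in a NumPy matrix with samples as rows, and the set of class-irrelevant columns is supplied as input (the text does not fix how it is found).

```python
# A minimal preprocessing sketch; X is assumed to be a NumPy array of
# shape (n_samples, n_features), irrelevant_cols a hypothetical input.
import numpy as np

def preprocess(X, irrelevant_cols=()):
    # Step 2: delete attributes completely unrelated to the classes.
    X = np.delete(X, list(irrelevant_cols), axis=1)
    # Step 3: norm-normalize each feature column f_i (equation (1)).
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # guard: leave all-zero columns unchanged
    return X / norms
```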
2. Background on mutual information
The goal of feature selection is to pick out the feature attributes most valuable for classification. The key problem to be solved is measurement: the measure must account for the correlation between the attribute set and the class labels, the redundancy within the attribute set, and the dependencies inside the label attribute set. To discuss these correlation problems, the mutual information of information theory is chosen as the measurement tool. The relevant theory and rules of calculation are described below.
Information entropy is a vital concept of information theory; it characterizes the degree of uncertainty of a variable, its purpose being to represent the amount of information content:
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)    (2)
where p(x_i) is the probability that variable X takes the value x_i. The uncertainty of X is thus represented by the entropy H(X), whose value depends on the probability distribution of the variable; entropy therefore effectively overcomes the interference of some noisy data.
Conditional entropy is the remaining uncertainty of one variable given that another variable is known, that is, the strength of the dependence of one variable on the other; the dependence of a random variable X on another random variable Y can therefore be characterized by the conditional entropy:
H(X|Y) = -\sum_{j=1}^{m} p(y_j) \sum_{i=1}^{n} p(x_i|y_j) \log_2 p(x_i|y_j)    (3)
where p(x_i) is the prior probability of variable X and p(x_i|y_j) is the posterior probability of X given Y.
Mutual information characterizes the mutual dependence between two random variables, expressing how much information the two variables share. A mutual information of 0, the minimum, means the two variables share no information; a large value means they share a great deal. It is defined as:
I(X;Y) = H(X) - H(X|Y)    (4)
Mutual information reflects the correlation between two random variables very effectively and expresses the closeness of that correlation as a number. However, the growth pattern of information must also be considered when computing the mutual information of two variables: selecting features directly by the magnitude of mutual information favours features with many distinct values. Mutual information is therefore normalized, and the symmetric uncertainty SU is used to measure the correlation between a pair of variables:
SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y))    (5)
From formula (5), the SU correlation value ranges from 0 to 1. If SU is 0, X and Y are uncorrelated, that is, independent; if SU is 1, X and Y are strongly correlated. If X and Y represent attribute information and class information respectively, a larger SU means the feature is more relevant to classification; if X and Y represent two attribute informations, a larger SU means stronger redundancy between the two features.
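The entropy, mutual information, and SU of equations (2) to (5) are straightforward to compute for discrete variables; the sketch below uses plain frequency estimates, and the function names are ours rather than the patent's.

```python
# A sketch of equations (2)-(5) for discrete (or discretized) variables.
import math
from collections import Counter

def entropy(x):                                # equation (2), base-2 logs
    n = len(x)
    return -sum(c / n * math.log2(c / n) for c in Counter(x).values())

def conditional_entropy(x, y):                 # equation (3)
    n = len(y)
    return sum(c_y / n * entropy([xi for xi, yi in zip(x, y) if yi == y_val])
               for y_val, c_y in Counter(y).items())

def mutual_information(x, y):                  # equation (4)
    return entropy(x) - conditional_entropy(x, y)

def su(x, y):                                  # equation (5), value in [0, 1]
    hx, hy = entropy(x), entropy(y)
    return 0.0 if hx + hy == 0 else 2 * mutual_information(x, y) / (hx + hy)
```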
3. Measures based on mutual information
By the theory of mutual information, the redundancy between two single feature attributes, the correlation between a single feature attribute and a single label attribute, and the correlation between two single label attributes can be computed by the following formulas:
Redundancy(X_i;X_j) = SU(X_i,X_j)    (6)
Correlation(X_i;Y_j) = SU(X_i,Y_j)    (7)
Correlation(Y_i;Y_j) = SU(Y_i,Y_j)    (8)
From the formulas above, the redundancy between a single feature attribute and a set of feature attributes can be computed by summing the redundancies between the single attribute and each attribute in the set and taking the average:
Redundancy(X_i;X) = (1/|X|) \sum_{X_j \in X} Redundancy(X_i;X_j)    (9)
where |X| is the number of feature attributes in the feature attribute set and X_j is a feature attribute in the set.
Since the algorithm is applied to multi-label feature selection, the correlation between a single feature attribute and the set of label attributes is defined as:
Correlation(X_i;Y) = (1/|Y|) \sum_{Y_j \in Y} Correlation(X_i;Y_j)    (10)
where |Y| is the number of label attributes in the label attribute set and Y_j is a label attribute in the set.
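A short sketch of the set-level measures (9) and (10), built on the su() function above; `selected` and `labels` are lists of value sequences, one per attribute.

```python
def redundancy_set(x_i, selected):             # equation (9)
    if not selected:                           # empty set: no redundancy
        return 0.0
    return sum(su(x_i, x_j) for x_j in selected) / len(selected)

def correlation_labels(x_i, labels):           # equation (10)
    return sum(su(x_i, y_j) for y_j in labels) / len(labels)
```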
This embedded multi-label dynamic feature selection algorithm considers not only the mutual relations among feature attributes and the correlation between feature attributes and label attributes, but also the influence on feature selection of the correlations inside the set of label attributes. In general, if the class attribute of some label is strongly correlated with the class attributes of other labels, then a feature attribute selected for that label will likewise have good classification performance for the other, strongly correlated label attributes. The correlation among label attributes is therefore obtained from the following formula:
W(Y_i) = (1/(|Y|-1)) \sum_{Y_j \in Y, j \neq i} Correlation(Y_i,Y_j)    (11)
where |Y| is the number of label attributes in the label attribute set, Y_j is a label attribute in the set, and W(Y_i) is the average correlation of Y_i within the multi-label attribute set. The larger the value, the more correlated label attributes this label attribute possesses in the set; a feature attribute beneficial to classifying this label then likewise has a positive influence on the more strongly correlated label attributes.
Based on the above considerations and combining formulas (10) and (11), the correlation measure can be expressed as:
CCorrelation(X_i;Y) = (1/|Y|) \sum_{Y_j \in Y} (Correlation(X_i;Y_j) + W(Y_j))    (12)
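A sketch of equations (11) and (12): the label-label weight W(Y_i) is folded into the combined feature-label measure CCorrelation; both reuse the helpers above.

```python
def label_weight(labels, i):                   # equation (11)
    if len(labels) < 2:                        # a single label has no peers
        return 0.0
    return sum(su(labels[i], y_j)
               for j, y_j in enumerate(labels) if j != i) / (len(labels) - 1)

def ccorrelation(x_i, labels):                 # equation (12)
    return sum(su(x_i, y_j) + label_weight(labels, j)
               for j, y_j in enumerate(labels)) / len(labels)
```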
4. Feature ranking and feature selection
In the ML_NIFS algorithm, the correlation between each feature attribute and the multi-label attributes and the redundancy between the feature attribute and the ranked feature attribute set are computed and combined into the evaluation criterion of the feature, and the features are then ranked by this criterion:
W(X_i) = CCorrelation(X_i;Y) - Redundancy(X_i;H)    (13)
where H is the set of feature attributes already ranked, X_i is a candidate feature attribute, CCorrelation(X_i;Y) is the correlation between the feature attribute and the set of multi-label attributes, and Redundancy(X_i;H) is the redundancy between X_i and the ranked feature attribute set.
Feature selection is the process of choosing among the ranked features. In multi-label feature selection, the common practice is to set a selection threshold according to the subsequent classification algorithm and the feature evaluation criterion, and to select features by that threshold. From the point of view of classification ability, features ranked near the front of the sorted sequence have stronger correlation with the multi-label attributes and lower redundancy with the other feature attributes, and thus contribute more to classification. At the same time, the features should be considered as a whole, taking the set of feature attributes as the object of analysis. The correlation between a subset of the ranked feature attribute set H and the multi-label attribute set can be obtained from formula (10); the correlation is computed as:
Correlation(H;Y) = (1/|H|) \sum_{X_i \in H} (1/|Y|) \sum_{Y_j \in Y} Correlation(X_i;Y_j)    (14)
where H is the ranked candidate feature set, Y is the multi-label attribute set, |Y| is the number of labels in the multi-label attribute set, and |H| is the number of feature attributes in the ranked feature set.
Following the ranked order of the feature attributes, the average value of the correlation over the ranked prefixes is computed by formula (15):
Correlation_avg(H;Y) = (1/|H|) \sum_{j=1}^{|H|} Correlation(H_j;Y)    (15)
where H_j denotes the first j feature attributes in the ranking. If Correlation(H_j;Y) is greater than Correlation_avg(H;Y) and Correlation(H_{j+1};Y) is less than Correlation_avg(H;Y), then these j feature attributes are the selected feature attributes.
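Under the prefix-average reading of (15) just given, the ranking and selection steps might look as follows; rank_features and select_prefix are our names, and the greedy re-scoring at each step is an assumption consistent with the embodiment described later.

```python
# Sketch of criterion (13) and the prefix rule (14)-(15).
def rank_features(features, labels):
    order, remaining = [], list(range(len(features)))
    while remaining:
        scores = {i: ccorrelation(features[i], labels)               # eq. (13)
                     - redundancy_set(features[i],
                                      [features[k] for k in order])
                  for i in remaining}
        best = max(scores, key=scores.get)
        order.append(best)
        remaining.remove(best)
    return order

def correlation_subset(subset, labels):                              # eq. (14)
    return sum(correlation_labels(x, labels) for x in subset) / len(subset)

def select_prefix(order, features, labels):
    corr = [correlation_subset([features[i] for i in order[:j + 1]], labels)
            for j in range(len(order))]
    avg = sum(corr) / len(corr)                                      # eq. (15)
    for j in range(len(corr) - 1):
        if corr[j] > avg and corr[j + 1] < avg:                      # crossing point
            return order[:j + 1]
    return order
```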
5. Embedded dynamic mutual information computation
Measurement based on mutual information starts from a sound estimate of the probability distribution of each feature over the sample data set. Once the sample data are fixed, the distribution of a feature over that set is uniquely determined. But as features keep being selected, more and more of the samples in the set become recognized, so the quantities entering the mutual information computation change; continuing to use the traditional static mutual-information computation then produces a large error, because the already recognized sample data supply "deceptive information" to the evaluation of the features not yet selected.
For the dynamic feature selection proposed in the algorithm, the main research question is how to recognize the samples that can already be identified by the selected features, remove those data from the data set, and recompute the information entropy anew from the remaining samples. While the algorithm runs, an embedded classifier is chosen to perform this recognition; here an embedded KNN classifier recognizes the identifiable samples, and the sample data recognized by the KNN classifier are deleted from the sample data set. Without changing the correlation between features and classes, this reduces the number of samples in the data set and the dimensionality of the features.
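A sketch of the embedded KNN step under stated assumptions: X_sel is a numeric matrix of the selected features, Y a binary label matrix; k and the majority-vote threshold are our choices. A sample every one of whose labels is already predicted correctly counts as recognized and is dropped.

```python
import numpy as np

def prune_recognized(X_sel, Y, k=5):
    keep = []
    for i in range(len(X_sel)):
        d = np.linalg.norm(X_sel - X_sel[i], axis=1)     # Euclidean distance
        nn = np.argsort(d)[1:k + 1]                      # k nearest, excluding i
        votes = (Y[nn].mean(axis=0) >= 0.5).astype(int)  # majority vote per label
        if not np.array_equal(votes, Y[i]):              # some label still wrong,
            keep.append(i)                               # so not yet recognized
    return keep  # indices of the samples to retain
```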
Brief description of the drawings
Fig. 1 The embedded multi-label dynamic feature selection method based on mutual information
Fig. 2 Average precision of classification with the selected features, classifier parameter = 1
Fig. 3 Coverage of classification with the selected features, classifier parameter = 1
Fig. 4 Ranking loss of classification with the selected features, classifier parameter = 1
Fig. 5 Average precision of classification with the selected features, classifier parameter = 0.8
Fig. 6 Coverage of classification with the selected features, classifier parameter = 0.8
Fig. 7 Ranking loss of classification with the selected features, classifier parameter = 0.8
Detailed description of the embodiments
The feature set is divided into two parts, the selected feature set and the candidate feature set, denoted H and X respectively. The multi-label attributes are denoted Y and the sample data set is denoted O.
First, the feature attribute with the highest correlation according to formula (12) is selected and added to the feature set H, and at the same time removed from the candidate feature attribute set X.
Then, by the Euclidean distance d of formula (16), the k nearest samples of the sample to be classified are found; these k nearest samples constitute its neighbour data set, where (Y_NN)_i denotes the multi-label class results of the i-th sample in the neighbour set and N is the number of samples in the data set.
Each label attribute of the sample is then judged from the attribute data in the neighbour data set by the majority-vote criterion. The KNN classifier is applied repeatedly to judge the samples of the data set under each label class and to check whether each sample is classified correctly; if a sample is classified correctly for every label attribute, it is deleted from the data sample set.
Next, the information entropy of the remaining feature attributes in the candidate set X is recomputed anew on the new sample data set, and the feature attribute that maximizes formula (13) is added to the feature set H and simultaneously removed from the candidate feature attribute set X.
Finally, steps (2) and (3) are repeated until all feature attributes have been ranked, or until the number of samples in the data set falls below the k of the KNN classifier.
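Putting the previous sketches together, an end-to-end loop of this embodiment might look as follows; X and Y are NumPy arrays, features are treated as discrete values, and ml_nifs is our name for the driver, not the patent's.

```python
def ml_nifs(X, Y, k=5):
    selected = []
    remaining = list(range(X.shape[1]))
    rows = np.arange(X.shape[0])                 # indices of surviving samples
    while remaining and len(rows) > k:
        labels = [tuple(Y[rows, j]) for j in range(Y.shape[1])]
        scores = {i: ccorrelation(tuple(X[rows, i]), labels)      # eq. (13)
                     - redundancy_set(tuple(X[rows, i]),
                                      [tuple(X[rows, s]) for s in selected])
                  for i in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        # drop the samples the embedded KNN already recognizes
        keep = prune_recognized(X[rows][:, selected], Y[rows], k)
        rows = rows[keep]
    return selected
```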
It is obvious to a person skilled in the art that the invention is not restricted to the details of the exemplary embodiments above, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the description above, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced in the invention. No reference sign in a claim should be construed as limiting the claim concerned.
Moreover, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical scheme; this manner of narration serves clarity only. Those skilled in the art should take the specification as a whole; the technical solutions in the various embodiments may also be combined appropriately to form other embodiments comprehensible to those skilled in the art.

Claims (2)

1. An embedded multi-label dynamic feature selection method, characterized by comprising the following steps. The traditional feature selection method based on mutual information is introduced first.
1. Preprocessing the data set
Real-world databases are highly susceptible to noisy data, missing values, and inconsistent data. Many data-preprocessing techniques exist at this stage, generally divided into data cleaning, data integration, data transformation, and data reduction. Data cleaning removes noise from the data, corrects inconsistencies, and fills in missing sample values; data transformation (normalization) improves the precision and validity of algorithms that involve distance metrics. For example, when one wants the data to follow a specific distribution, or wants each feature mapped into a specific interval, a data transformation is required. Here, preprocessing consists of three parts. First, noisy data, inconsistent data, and missing values in the data set are handled. Second, attribute data completely unrelated to the classes are deleted from the data set. Third, the attribute data are norm-normalized so that each norm equals 1:
\hat{f}_i = f_i / \|f_i\|    (1)
2. Background on mutual information
The goal of feature selection is to pick out the feature attributes most valuable for classification. The key problem to be solved is measurement: the measure must account for the correlation between the attribute set and the class labels, the redundancy within the attribute set, and the dependencies inside the label attribute set. To discuss these correlation problems, the mutual information of information theory is chosen as the measurement tool. The relevant theory and rules of calculation are described below.
Information entropy is a vital concept of information theory; it characterizes the degree of uncertainty of a variable, its purpose being to represent the amount of information content:
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)    (2)
where p(x_i) is the probability that variable X takes the value x_i. The uncertainty of X is thus represented by the entropy H(X), whose value depends on the probability distribution of the variable; entropy therefore effectively overcomes the interference of some noisy data.
Conditional entropy is the remaining uncertainty of one variable given that another variable is known, that is, the strength of the dependence of one variable on the other; the dependence of a random variable X on another random variable Y can therefore be characterized by the conditional entropy:
H(X|Y) = -\sum_{j=1}^{m} p(y_j) \sum_{i=1}^{n} p(x_i|y_j) \log_2 p(x_i|y_j)    (3)
where p(x_i) is the prior probability of variable X and p(x_i|y_j) is the posterior probability of X given Y.
Mutual information characterizes the mutual dependence between two random variables, expressing how much information the two variables share. A mutual information of 0, the minimum, means the two variables share no information; a large value means they share a great deal. It is defined as:
I(X;Y) = H(X) - H(X|Y)    (4)
Mutual information reflects the correlation between two random variables very effectively and expresses the closeness of that correlation as a number. However, the growth pattern of information must also be considered when computing the mutual information of two variables: selecting features directly by the magnitude of mutual information favours features with many distinct values. Mutual information is therefore normalized, and the symmetric uncertainty SU is used to measure the correlation between a pair of variables:
SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y))    (5)
From formula (5), the SU correlation value ranges from 0 to 1. If SU is 0, X and Y are uncorrelated, that is, independent; if SU is 1, X and Y are strongly correlated. If X and Y represent attribute information and class information respectively, a larger SU means the feature is more relevant to classification; if X and Y represent two attribute informations, a larger SU means stronger redundancy between the two features.
3. Measures based on mutual information
By the theory of mutual information, the redundancy between two single feature attributes, the correlation between a single feature attribute and a single label attribute, and the correlation between two single label attributes can be computed by the following formulas:
Redundancy(X_i;X_j) = SU(X_i,X_j)    (6)
Correlation(X_i;Y_j) = SU(X_i,Y_j)    (7)
Correlation(Y_i;Y_j) = SU(Y_i,Y_j)    (8)
From the formulas above, the redundancy between a single feature attribute and a set of feature attributes can be computed by summing the redundancies between the single attribute and each attribute in the set and taking the average:
Redundancy(X_i;X) = (1/|X|) \sum_{X_j \in X} Redundancy(X_i;X_j)    (9)
where |X| is the number of feature attributes in the feature attribute set and X_j is a feature attribute in the set.
Since the algorithm is applied to multi-label feature selection, the correlation between a single feature attribute and the set of label attributes is defined as:
Correlation(X_i;Y) = (1/|Y|) \sum_{Y_j \in Y} Correlation(X_i;Y_j)    (10)
where |Y| is the number of label attributes in the label attribute set and Y_j is a label attribute in the set.
This embedded multi-label dynamic feature selection algorithm considers not only the mutual relations among feature attributes and the correlation between feature attributes and label attributes, but also the influence on feature selection of the correlations inside the set of label attributes. In general, if the class attribute of some label is strongly correlated with the class attributes of other labels, then a feature attribute selected for that label will likewise have good classification performance for the other, strongly correlated label attributes. The correlation among label attributes is therefore obtained from the following formula:
W(Y_i) = (1/(|Y|-1)) \sum_{Y_j \in Y, j \neq i} Correlation(Y_i,Y_j)    (11)
where |Y| is the number of label attributes in the label attribute set, Y_j is a label attribute in the set, and W(Y_i) is the average correlation of Y_i within the multi-label attribute set. The larger the value, the more correlated label attributes this label attribute possesses in the set; a feature attribute beneficial to classifying this label then likewise has a positive influence on the more strongly correlated label attributes.
Based on the above considerations and combining formulas (10) and (11), the correlation measure can be expressed as:
CCorrelation(X_i;Y) = (1/|Y|) \sum_{Y_j \in Y} (Correlation(X_i;Y_j) + W(Y_j))    (12)
4. Feature ranking and feature selection
In the ML_NIFS algorithm, the correlation between each feature attribute and the multi-label attributes and the redundancy between the feature attribute and the ranked feature attribute set are computed and combined into the evaluation criterion of the feature, and the features are then ranked by this criterion:
W(X_i) = CCorrelation(X_i;Y) - Redundancy(X_i;H)    (13)
where H is the set of feature attributes already ranked, X_i is a candidate feature attribute, CCorrelation(X_i;Y) is the correlation between the feature attribute and the set of multi-label attributes, and Redundancy(X_i;H) is the redundancy between X_i and the ranked feature attribute set.
Feature selection is the process of choosing among the ranked features. In multi-label feature selection, the common practice is to set a selection threshold according to the subsequent classification algorithm and the feature evaluation criterion, and to select features by that threshold. From the point of view of classification ability, features ranked near the front of the sorted sequence have stronger correlation with the multi-label attributes and lower redundancy with the other feature attributes, and thus contribute more to classification. At the same time, the features should be considered as a whole, taking the set of feature attributes as the object of analysis. The correlation between a subset of the ranked feature attribute set H and the multi-label attribute set can be obtained from formula (10); the correlation is computed as:
Correlation(H;Y) = (1/|H|) \sum_{X_i \in H} (1/|Y|) \sum_{Y_j \in Y} Correlation(X_i;Y_j)    (14)
where H is the ranked candidate feature set, Y is the multi-label attribute set, |Y| is the number of labels in the multi-label attribute set, and |H| is the number of feature attributes in the ranked feature set.
Following the ranked order of the feature attributes, the average value of the correlation over the ranked prefixes is computed by formula (15):
Correlation_avg(H;Y) = (1/|H|) \sum_{j=1}^{|H|} Correlation(H_j;Y)    (15)
where H_j denotes the first j feature attributes in the ranking. If Correlation(H_j;Y) is greater than Correlation_avg(H;Y) and Correlation(H_{j+1};Y) is less than Correlation_avg(H;Y), then these j feature attributes are the selected feature attributes.
5. Embedded dynamic mutual information computation
Measurement based on mutual information starts from a sound estimate of the probability distribution of each feature over the sample data set. Once the sample data are fixed, the distribution of a feature over that set is uniquely determined. But as features keep being selected, more and more of the samples in the set become recognized, so the quantities entering the mutual information computation change; continuing to use the traditional static mutual-information computation then produces a large error, because the already recognized sample data supply "deceptive information" to the evaluation of the features not yet selected.
For the dynamic feature selection proposed in the algorithm, the main research question is how to recognize the samples that can already be identified by the selected features, remove those data from the data set, and recompute the information entropy anew from the remaining samples. While the algorithm runs, an embedded classifier is chosen to perform this recognition; here an embedded KNN classifier recognizes the identifiable samples, and the sample data recognized by the KNN classifier are deleted from the sample data set. Without changing the correlation between features and classes, this reduces the number of samples in the data set and the dimensionality of the features.
2. The embedded multi-label dynamic feature selection method according to claim 1, characterized in that: the correlation between each feature attribute and the multi-label attributes is computed, the redundancy between the feature attribute and the feature attribute set is computed, and the correlation between the feature attribute and the multi-label attributes is combined with the redundancy between the feature attribute and the feature attribute set into the evaluation criterion of the feature, by which the features are then ranked:
W(X_i) = CCorrelation(X_i;Y) - Redundancy(X_i;H)    (16)
Feature selection is the process of choosing among the ranked features. In multi-label feature selection, the common practice is to set a selection threshold according to the subsequent classification algorithm and the feature evaluation criterion, and to select features by that threshold. From the point of view of classification ability, features ranked near the front of the sorted sequence have stronger correlation with the multi-label attributes and lower redundancy with the other feature attributes, and thus contribute more to classification. At the same time, the features should be considered as a whole, taking the set of feature attributes as the object of analysis. The correlation between a subset of the ranked feature attribute set H and the multi-label attribute set can thus be obtained.
The correlation is computed as:
Correlation(H;Y) = (1/|H|) \sum_{X_i \in H} (1/|Y|) \sum_{Y_j \in Y} Correlation(X_i;Y_j)    (17)
Following the ranked order of the feature attributes, the average value of the correlation over the ranked prefixes is computed by formula (18):
Correlation_avg(H;Y) = (1/|H|) \sum_{j=1}^{|H|} Correlation(H_j;Y)    (18)
where H_j denotes the first j feature attributes in the ranking. If Correlation(H_j;Y) is greater than Correlation_avg(H;Y) and Correlation(H_{j+1};Y) is less than Correlation_avg(H;Y), then these j feature attributes are the selected feature attributes.
With the improved embedded multi-label dynamic feature selection method, through the theory of mutual information in information theory, the mutual-information-based multi-label dynamic feature selection algorithm described by the present invention reasonably analyses the correlation between feature attributes, the correlation between class attributes, and the mutual relation between feature attributes and class attributes, and carries out dynamic feature selection through the dynamic mutual-information computation. The experimental results, analysed by three evaluation criteria, namely classification precision, coverage, and ranking loss, show that the feature selection algorithm obtains a smaller feature subset, reduces the feature dimensionality, steadily improves the classification performance, and has good stability.
CN201710222600.6A 2017-04-06 2017-04-06 An embedded multi-label dynamic feature selection algorithm Pending CN106991447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710222600.6A CN106991447A (en) 2017-04-06 2017-04-06 An embedded multi-label dynamic feature selection algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710222600.6A CN106991447A (en) 2017-04-06 2017-04-06 An embedded multi-label dynamic feature selection algorithm

Publications (1)

Publication Number Publication Date
CN106991447A 2017-07-28

Family

ID=59415377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710222600.6A Pending CN106991447A (en) 2017-04-06 2017-04-06 An embedded multi-label dynamic feature selection algorithm

Country Status (1)

Country Link
CN (1) CN106991447A (en)


Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805162A (en) * 2018-04-25 2018-11-13 河南师范大学 A kind of saccharomycete multiple labeling feature selection approach and device based on particle group optimizing
CN108595381A (en) * 2018-04-27 2018-09-28 厦门尚为科技股份有限公司 Health status evaluation method, device and readable storage medium storing program for executing
CN109754000A (en) * 2018-12-21 2019-05-14 昆明理工大学 A kind of semi-supervised multi-tag classification method based on dependency degree
CN109784668A (en) * 2018-12-21 2019-05-21 国网江苏省电力有限公司南京供电分公司 A kind of sample characteristics dimension-reduction treatment method for electric power monitoring system unusual checking
CN110135469A (en) * 2019-04-24 2019-08-16 北京航空航天大学 It is a kind of to improve the characteristic filter method and device selected based on correlative character
CN110390353B (en) * 2019-06-28 2021-08-06 苏州浪潮智能科技有限公司 Biological identification method and system based on image processing
CN112148764B (en) * 2019-06-28 2024-05-07 北京百度网讯科技有限公司 Feature screening method, device, equipment and storage medium
CN110390353A (en) * 2019-06-28 2019-10-29 苏州浪潮智能科技有限公司 A kind of biometric discrimination method and system based on image procossing
CN112148764A (en) * 2019-06-28 2020-12-29 北京百度网讯科技有限公司 Feature screening method, device, equipment and storage medium
CN110334546A (en) * 2019-07-08 2019-10-15 辽宁工业大学 Difference privacy high dimensional data based on principal component analysis optimization issues guard method
CN110334546B (en) * 2019-07-08 2021-11-23 辽宁工业大学 Difference privacy high-dimensional data release protection method based on principal component analysis optimization
CN110851720A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Information recommendation method and device and electronic equipment
CN111027636B (en) * 2019-12-18 2020-09-29 山东师范大学 Unsupervised feature selection method and system based on multi-label learning
CN111027636A (en) * 2019-12-18 2020-04-17 山东师范大学 Unsupervised feature selection method and system based on multi-label learning
WO2022022683A1 (en) * 2020-07-31 2022-02-03 中兴通讯股份有限公司 Feature selection method and device, network device and computer-readable storage medium
CN112651703A (en) * 2020-12-02 2021-04-13 淮阴工学院 Dynamic reminding method for informing item processing deadline of OA (office automation) system of colleges and universities
CN112632368A (en) * 2020-12-02 2021-04-09 淮阴工学院 Method for notifying, issuing, personalized recommending and attention reminding of OA (office automation) system of colleges and universities
CN112765347A (en) * 2020-12-31 2021-05-07 浙江省方大标准信息有限公司 Mandatory standard automatic identification method, system and device
CN113518063A (en) * 2021-03-01 2021-10-19 广东工业大学 Network intrusion detection method and system based on data enhancement and BilSTM
CN113065428A (en) * 2021-03-21 2021-07-02 北京工业大学 Automatic driving target identification method based on feature selection


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20170728)