CN108805156A - An improved selective naive Bayes method - Google Patents
An improved selective naive Bayes method Download PDF Info
- Publication number
- CN108805156A CN108805156A CN201810291375.6A CN201810291375A CN108805156A CN 108805156 A CN108805156 A CN 108805156A CN 201810291375 A CN201810291375 A CN 201810291375A CN 108805156 A CN108805156 A CN 108805156A
- Authority
- CN
- China
- Prior art keywords
- attribute
- classification
- accuracy
- formula
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an improved selective naive Bayes method comprising the following steps: WoE values and IV values are introduced into attribute selection to obtain an attribute subset highly correlated with the class, on which a naive Bayes classifier is constructed; redundant attributes are then further deleted from this subset to obtain an optimal attribute subset. On the basis of the existing Bayesian algorithm, the improved selective naive Bayes method of the invention introduces the WoE and IV indexes into attribute selection, improving the classification performance of naive Bayes when attributes are redundant while keeping the classification performance of naive Bayes when attributes are not redundant; a first-round attribute subset is obtained by threshold screening, which reduces the traversal space and solves the problem of improving classification accuracy while reducing attribute dimensionality.
Description
Technical field
The invention belongs to the technical field of attribute selection methods, and in particular relates to an improved selective naive Bayes method.
Background technology
For a learning task with a given attribute set, some attributes may be crucial and very useful; we call these "relevant features". Useless attributes are called "irrelevant features". The process of selecting a subset of relevant attributes from a given attribute set is called "feature selection".
In real tasks, an excess of attributes often causes the curse of dimensionality. If the important attributes can be screened out by attribute selection, the efficiency of processing high-dimensional data can be greatly improved. Besides removing "irrelevant attributes", "redundant attributes" should also be removed, i.e., attributes whose information can be deduced from other attributes. Note that the attribute selection process must ensure that no important attribute is lost; otherwise the subsequent learning process cannot achieve good performance because important information is missing. Well-selected attributes can improve the performance of the model and help us better understand the characteristics and underlying structure of the data, which plays an important role in further improving models and algorithms.
Common attribute selection methods can be roughly divided into three classes: filter, wrapper, and embedded. Filter methods compute the degree of correlation between each conditional attribute in the original attribute set and the class attribute using measures such as the correlation coefficient, information gain (InfoGain), gain ratio, and OneR, "filter" the attributes accordingly, and then train the model on the filtered attributes. The filtering criterion is a priority ranking of the original attribute set by degree of correlation.
Unlike filter attribute selection, which need not consider the subsequent learner, wrapper attribute selection directly uses the performance of the final learner as the evaluation criterion for attribute subsets. In general, because wrapper attribute selection is optimized directly for the given learner, it tends to outperform filter attribute selection in terms of final learner performance. On the other hand, because the learner must be trained repeatedly during attribute selection, the computational cost of wrapper selection is usually much larger than that of filter selection.
In filter and wrapper attribute selection methods, the attribute selection process is clearly separated from the learner training process. In contrast, embedded attribute selection fuses the attribute selection process and the learner training process together: both are completed in the same optimization process, i.e., attribute selection is performed automatically during learner training.
Invention content
The object of the present invention is to provide an improved selective naive Bayes method that solves the problem of improving classification accuracy while reducing attribute dimensionality.
The technical solution adopted by the present invention is an improved selective naive Bayes method comprising the following steps:
Step 1: Given a data set T containing n attributes, let S = {A1, A2, ..., An} be the finite set of conditional attribute variables and C = {C1, C2, ..., Cm} the class variable, where m is the number of class values and Cj is the j-th class value.
When a two-class problem is discussed, i.e., m = 2 and C = {C1, C2}, any conditional attribute variable Ai has Si distinct values {ai1, ai2, ..., aiSi}; that is, the k-th value of attribute Ai is denoted aik.
Step 2: Define the WoE index.
The WoE index is an encoding of the original variable; to WoE-encode a variable, the variable must first be grouped, as in formulas (2) and (3):
WoE(aik, C) = ln(P(A=aik|C=C1) / P(A=aik|C=C2))   (2)
P(A=aik|C) = N(A=aik|C) / N(C)   (3)
In formulas (2)-(3): C1 denotes the first class label and C2 the second class label; P(A=aik|C=C1) is the conditional probability that the attribute takes value aik given class C1, and P(A=aik|C=C2) the corresponding probability given class C2; N(C) is the number of samples of class C; N is the total number of data samples; N(A=aik|C) is the number of samples of class C whose attribute value is aik.
Step 3: Define the IV index.
The IV index measures the information value of a variable, i.e., the degree of influence of the independent variable on the target variable, as shown in formula (4):
IV(aik, C) = (P(A=aik|C=C1) - P(A=aik|C=C2)) * WoE(aik, C)   (4)
The IV value of attribute Ai is then the sum of the IV values of its groups, i.e.:
IV(Ai) = Σk IV(aik, C)   (5)
Step 4: Combining step 1, introduce the WoE index of step 2 and the IV index of step 3 into attribute selection and construct a naive Bayes classifier.
Step 5: On the basis of step 4, first filter the original finite set S of conditional attribute variables from step 1 by the IV index to obtain the attribute subset S' meeting the threshold requirement; sort the attributes in S' by IV value from high to low; finally, search the sorted attribute set S' for the attribute subset that optimizes the performance of the classifier.
The invention is further characterized in that the specific operations of step 4 are as follows:
Step 4.1: Filter the attribute subset highly correlated with the class out of the original attribute set by computing IV values.
From the naive Bayes weighting formula, classifying a sample X requires formulas (6) and (7):
P(C1|X) = P(C1) * Πi P(aik|C1)   (6)
P(C2|X) = P(C2) * Πi P(aik|C2)   (7)
In formulas (6)-(7): P(aik|C1), identical to P(A=aik|C=C1), is the conditional probability that the attribute takes value aik given class C1; P(aik|C2), identical to P(A=aik|C=C2), is the corresponding probability given class C2; P(C1) is the prior probability of class C1; P(C2) is the prior probability of class C2; P(C1|X) is the conditional probability of class C1 given attributes X; P(C2|X) is the conditional probability of class C2 given attributes X; X denotes an unlabeled database sample, represented as an n-dimensional feature vector.
Step 4.2: Select a threshold and carry out attribute filtering.
Normalizing formula (6) gives formula (8):
P(C1|X)' = a * Πk β(aik) / (1 + a * Πk β(aik))   (8)
where a = P(C1)/P(C2) is constant for a given data set and β(aik) = P(aik|C1)/P(aik|C2). Similarly, normalizing formula (7) gives formula (9):
P(C2|X)' = 1 / (1 + a * Πk β(aik))   (9)
In formulas (8)-(9): P(C1|X)' denotes the normalized conditional probability of class C1 given attributes X; P(C2|X)' denotes the normalized conditional probability of class C2 given attributes X.
Step 4.3: Construct the naive Bayes classifier on the attribute subset of good classification ability obtained in step 4.2.
The threshold division by which the IV values of step 4.2 measure the degree of correlation between an attribute and the class attribute is as follows:
Degree of correlation | IV value |
No correlation | IV < 0.02 |
Weak correlation | 0.02 ≤ IV < 0.1 |
Medium correlation | 0.1 ≤ IV < 0.3 |
Strong correlation | IV ≥ 0.3 |
The specific operations of step 5 are as follows:
Step 5.1: Input the sample data set T to be classified from step 1, the conditional attribute set, i.e., the finite set of conditional attribute variables S = {A1, A2, ..., An}, and the decision attribute set, i.e., the class variable C = {C1, C2, ..., Cm}; preprocess the data of the sample data set T to be classified.
Step 5.2: Initialize the candidate conditional attribute set S; let S' be the attribute set chosen by attribute selection, S'' the set of attributes not selected, and S''' the attribute set sorted by the IV index of the attributes from high to low; set S', S'', and S''' to empty, the maximum accuracy Accuracymax = 0, and the current accuracy Accuracycur = 0.
Step 5.3: Compute the information value (IV) of all attributes in the conditional attribute set S and perform the first round of screening by threshold: add the attributes whose IV value is greater than or equal to the threshold to the set S''' of step 5.2, add the attributes whose IV value is below the threshold to the set S'', and sort the attributes in S''' by IV value from high to low.
Step 5.4: If S''' is empty, terminate the computation and save the current S' and Accuracymax.
Step 5.5: If S''' is not empty, continue: select the first attribute Ai in the attribute set S''', delete it from S''' and add it to S'; construct a naive Bayes classifier on the updated S' and compute Accuracycur.
Step 5.6: If Accuracycur > Accuracymax in step 5.5, update Accuracymax = Accuracycur and return to step 5.4; if Accuracycur ≤ Accuracymax in step 5.5, remove Ai from S', add it to S'', save the current S' and Accuracymax, and terminate the computation.
The beneficial effects of the invention are as follows: on the basis of the existing Bayesian algorithm, the improved selective naive Bayes method of the invention introduces the WoE and IV indexes into attribute selection, improving the classification performance of naive Bayes when attributes are redundant while keeping the classification performance of naive Bayes when attributes are not redundant; a first round of attribute screening by threshold yields an attribute subset that reduces the traversal space, solving the problem of improving classification accuracy while reducing attribute dimensionality.
Specific implementation mode
The present invention is described in detail below with reference to specific embodiments.
The most common traditional method for computing the correlation between attributes is the linear correlation method, but as an attribute ranking mechanism its major drawback is that it is sensitive only to linear relationships: if a nonlinear relationship exists between attributes, the correlation coefficient may be close to 0 even when they stand in a one-to-one relationship; moreover, the method requires all attributes to have numeric values. To overcome these drawbacks, a series of methods based on information theory has been introduced, such as information gain, gain ratio, and the symmetric uncertainty coefficient. However, when attribute selection is performed by computing the correlation between attributes and the class, these methods have no specific threshold setting and can only rank the attributes by degree of correlation, so their time and space efficiency still needs improvement.
Given two discrete random variables X and Y, the relative entropy, also known as the KL divergence, characterizes the distance between the two random variables, DKL(Y‖X); its definition is given in formula (1):
DKL(Y‖X) = Σi PY(i) log(PY(i) / PX(i))   (1)
In formula (1): PY(i) is the probability that the random variable Y = i; PX(i) is the probability that the random variable X = i.
Relative entropy is a way of describing the difference between two probability distributions; its value is always greater than or equal to 0, and it equals 0 if and only if the two distributions are identical. WoE values and IV values both develop from relative entropy. WoE (Weight of Evidence) is an encoding of the original variable; IV (Information Value) measures the information value of a variable, i.e., the degree of influence of the independent variable on the target variable.
An improved selective naive Bayes method of the present invention comprises the following steps:
Step 1: Given a data set T containing n attributes, let S = {A1, A2, ..., An} be the finite set of conditional attribute variables and C = {C1, C2, ..., Cm} the class variable, where m is the number of class values and Cj is the j-th class value.
When a two-class problem is discussed, i.e., m = 2 and C = {C1, C2}, any conditional attribute variable Ai has Si distinct values {ai1, ai2, ..., aiSi}; that is, the k-th value of attribute Ai is denoted aik.
Step 2: Define the WoE index.
WoE is a quantitative analysis method based on combining instances of particular classes. It was first proposed as a probabilistic-statistical theory in the 1950s and was later applied to medical diagnosis systems; in the 1980s, WoE came to be widely used in geographic information systems. In recent years, with the expansion of banking and other financial business and growing attention to personal credit records, WoE has increasingly become a research hot spot as a method for measuring customer asset quality.
The WoE index is an encoding of the original variable; to WoE-encode a variable, the variable must first be grouped, i.e., discretized or binned, as in formulas (2) and (3):
WoE(aik, C) = ln(P(A=aik|C=C1) / P(A=aik|C=C2))   (2)
P(A=aik|C) = N(A=aik|C) / N(C)   (3)
In formulas (2)-(3): C1 denotes the first class label and C2 the second class label; P(A=aik|C=C1) is the conditional probability that the attribute takes value aik given class C1, and P(A=aik|C=C2) the corresponding probability given class C2; N(C) is the number of samples of class C; N is the total number of data samples; N(A=aik|C) is the number of samples of class C whose attribute value is aik.
The influence that changes in value counts have on WoE is a proportional relationship, and the logarithm damps the effect of individual positive and negative examples: under a given value of an attribute, one more positive example will not change the WoE value dramatically, so the influence on WoE of problems such as sampling error and noise is kept under a certain control. WoE is mainly used to measure the tendency of each value of an attribute toward the classification result under the same attribute. If a value of an attribute occurs only once and points to a positive classification result, this cannot show that the value has an absolutely positive guiding effect on the result. Therefore, when measuring the contribution of the attribute to the classification result within the whole attribute set, the frequency of occurrence of each value must also be taken into account, which gives rise to the IV index.
Step 3: Define the IV index.
The IV index measures the information value of a variable, i.e., the degree of influence of the independent variable on the target variable, as shown in formula (4):
IV(aik, C) = (P(A=aik|C=C1) - P(A=aik|C=C2)) * WoE(aik, C)   (4)
The IV value of attribute Ai is then the sum of the IV values of its groups, i.e.:
IV(Ai) = Σk IV(aik, C)   (5)
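The WoE and IV definitions of formulas (2)-(5) reduce to a few lines of code. This is a minimal sketch that assumes the class-conditional probabilities have already been estimated from counts; the function names are illustrative, not from the patent.

```python
import math

def woe(p_a_c1, p_a_c2):
    """Formula (2): WoE(aik, C) = ln(P(A=aik|C=C1) / P(A=aik|C=C2))."""
    return math.log(p_a_c1 / p_a_c2)

def iv_term(p_a_c1, p_a_c2):
    """Formula (4): IV(aik, C) = (P(.|C1) - P(.|C2)) * WoE(aik, C)."""
    return (p_a_c1 - p_a_c2) * woe(p_a_c1, p_a_c2)

def iv_attribute(value_probs):
    """Formula (5): IV of attribute Ai is the sum of its per-value IV terms.

    value_probs: one (P(aik|C1), P(aik|C2)) pair per grouped value of Ai.
    """
    return sum(iv_term(p1, p2) for p1, p2 in value_probs)

# A value distributed identically in both classes carries no information.
print(iv_term(0.5, 0.5))  # 0.0

# Both directions of imbalance contribute positively, as the sign analysis below notes.
print(round(iv_attribute([(0.7, 0.3), (0.3, 0.7)]), 4))  # 0.6778
```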
Step 4: Combining step 1, introduce the WoE index of step 2 and the IV index of step 3 into attribute selection and construct a naive Bayes classifier.
The IV value is introduced into the naive Bayes classification algorithm as part of attribute selection. To fully consider the influence of the IV value on the naive Bayes classification algorithm, observe from the mathematical formulas (4)-(5) of the IV value that:
When P(Ai=aik|C=C1) > P(Ai=aik|C=C2) > 0, WoE(aik, C) > 0 and IV(aik, C) > 0;
when 0 < P(Ai=aik|C=C1) < P(Ai=aik|C=C2), WoE(aik, C) < 0 and IV(aik, C) > 0;
when P(Ai=aik|C=C1) = P(Ai=aik|C=C2), WoE(aik, C) = 0 and IV(aik, C) = 0.
Step 4.1: Filter the attribute subset highly correlated with the class out of the original attribute set by computing IV values.
From the naive Bayes weighting formula, classifying a sample X requires formulas (6) and (7):
P(C1|X) = P(C1) * Πi P(aik|C1)   (6)
P(C2|X) = P(C2) * Πi P(aik|C2)   (7)
In formulas (6)-(7): P(aik|C1), identical to P(A=aik|C=C1), is the conditional probability that the attribute takes value aik given class C1; P(aik|C2), identical to P(A=aik|C=C2), is the corresponding probability given class C2; P(C1) is the prior probability of class C1; P(C2) is the prior probability of class C2; P(C1|X) is the conditional probability of class C1 given attributes X; P(C2|X) is the conditional probability of class C2 given attributes X; X denotes an unlabeled database sample, represented as an n-dimensional feature vector.
Step 4.2: Select a threshold and carry out attribute filtering.
Normalizing formula (6) gives formula (8):
P(C1|X)' = a * Πk β(aik) / (1 + a * Πk β(aik))   (8)
where a = P(C1)/P(C2) is constant for a given data set and β(aik) = P(aik|C1)/P(aik|C2). Similarly, normalizing formula (7) gives formula (9):
P(C2|X)' = 1 / (1 + a * Πk β(aik))   (9)
In formulas (8)-(9): P(C1|X)' denotes the normalized conditional probability of class C1 given attributes X; P(C2|X)' denotes the normalized conditional probability of class C2 given attributes X.
Analyzing the above formulas:
When β(aik) > 1, i.e., WoE(aik, C) > 0 and IV(aik, C) > 0, the larger β(aik) is, the larger the value of P(C1|X)' and the smaller the value of P(C2|X)';
when 0 < β(aik) < 1, i.e., WoE(aik, C) < 0 and IV(aik, C) > 0, the smaller β(aik) is, the smaller the value of P(C1|X)' and the larger the value of P(C2|X)'.
In summary: the further β(aik) deviates from 1, the larger the difference between the posterior probabilities P(C1|X)' and P(C2|X)'; when β(aik) = 1, β(aik) has no influence on the values of P(C1|X)' or P(C2|X)'.
Since IV(aik, C) grows with the deviation of β(aik) from 1, a larger IV(aik, C) > 0 indicates a larger difference between the class labels on this variable value, that is, a better classification ability of the variable; when IV(aik, C) = 0, the variable has no influence on classification.
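The effect of β(aik) on the normalized posteriors can be checked numerically. The sketch below assumes two classes with pre-estimated priors and likelihoods; it is an illustration of formulas (6)-(9), not the patent's implementation.

```python
def nb_posteriors(prior_c1, prior_c2, likelihoods):
    """Normalized two-class naive Bayes posteriors (formulas (6)-(9)).

    likelihoods: list of (P(aik|C1), P(aik|C2)) pairs, one per attribute of X.
    """
    s1, s2 = prior_c1, prior_c2
    for p1, p2 in likelihoods:
        s1 *= p1  # formula (6), unnormalized
        s2 *= p2  # formula (7), unnormalized
    total = s1 + s2
    return s1 / total, s2 / total  # formulas (8)-(9)

# beta = 0.8 / 0.2 = 4 > 1 pulls the posterior toward C1.
print(nb_posteriors(0.5, 0.5, [(0.8, 0.2)]))  # approximately (0.8, 0.2)

# beta = 1 leaves the priors untouched: the attribute value is uninformative.
print(nb_posteriors(0.5, 0.5, [(0.4, 0.4)]))  # approximately (0.5, 0.5)
```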
Table 1 gives the threshold division by which IV values measure the degree of correlation between an attribute and the class attribute.
Table 1. Threshold division of the degree of correlation between attribute and class attribute measured by IV values
Degree of correlation | IV value |
No correlation | IV < 0.02 |
Weak correlation | 0.02 ≤ IV < 0.1 |
Medium correlation | 0.1 ≤ IV < 0.3 |
Strong correlation | IV ≥ 0.3 |
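The bands of Table 1 translate into a simple screening predicate; the band names and function names below are illustrative, assuming the IV value of each attribute has already been computed.

```python
def iv_band(iv):
    """Correlation band of Table 1 for an attribute's IV value."""
    if iv < 0.02:
        return "no correlation"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"

def first_round_screen(attrs_with_iv, threshold=0.02):
    """Keep attributes whose IV meets the threshold, sorted high to low."""
    return sorted((t for t in attrs_with_iv if t[1] >= threshold),
                  key=lambda t: t[1], reverse=True)

print(first_round_screen([("A1", 0.25), ("A2", 0.01), ("A3", 0.4)]))
# [('A3', 0.4), ('A1', 0.25)]
```

The default threshold of 0.02 corresponds to Table 1's "no correlation" cutoff; a stricter threshold would simply keep fewer attributes.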
Step 4.3, in step 4.2 Naive Bayes Classifier is constructed on the preferable attribute set of classification capacity.
From the above, introducing IV values into attribute selection has the following advantages: (1) IV can measure the size of each attribute's influence on classification; (2) compared with most other feature selection algorithms, IV has a more universal threshold for determining the degree of correlation between an attribute and the class attribute, whereas most other attribute selection indexes can only rank attributes by influence and make it hard to decide how many attributes to select for a relatively satisfactory result; (3) IV is fast to compute, has small space-time overhead, and is well suited to data preprocessing.
However, using the IV index directly for attribute selection raises the following issues. In actual computation, the probabilities required for the IV values of discrete attributes can be obtained by counting, but for continuous attributes the probability distribution of the data is hard to obtain, and the computation of integrals is difficult and incurs a large space-time overhead; therefore, before computing IV, continuous attributes are discretized to preserve the efficiency of the IV index. In addition, in real data it often happens that the frequency of some value under some attribute is 0, i.e., when attribute Ai takes value aik there is no instance of class C. This can make a denominator 0, so that the WoE value tends to infinity, which is clearly unreasonable. A normalization treatment is adopted for this case: as analyzed earlier, slight changes in value counts do not cause large fluctuations of the WoE value, so when the instance count of some attribute value is 0, we treat it as 1.
The purpose of attribute subset selection is to find a concise subset of the original attribute set such that, run on the data restricted to the attributes in this subset, the learning algorithm produces a classifier of the highest possible accuracy. Original attribute sets, such as the data sets from the UCI Repository of machine learning databases shown in Table 2, contain continuous-valued attributes, missing values, and the like. Therefore, the key of attribute subset selection is to find a reduced yet excellent attribute set. An excellent attribute subset exhibits both a strong overall correlation with the class attribute and very little overall redundancy among the attributes within it. To select an optimal subset, both the correlation between attributes and the class attribute and the redundancy between attributes must be considered during attribute subset selection.
Table 2. Data set description
IV-index screening alone can obtain an attribute subset strongly correlated with the class attribute, but it ignores the redundancy between attributes; conversely, although the selective naive Bayes method can solve the attribute redundancy problem, it has no definite criterion when screening attribute subsets, which may lead to exhaustive search, increasing computational complexity and reducing operational efficiency.
Step 5: First filter the original finite set S of conditional attribute variables by the IV index to obtain the attribute subset S' meeting the threshold requirement; sort the attributes in S' by IV value from high to low; finally, search the sorted attribute set S' for the attribute subset that optimizes the performance of the classifier.
The specific operations are as follows:
Step 5.1: Input the sample data set T to be classified from step 1, the conditional attribute set, i.e., the finite set of conditional attribute variables S = {A1, A2, ..., An}, and the decision attribute set, i.e., the class variable C = {C1, C2, ..., Cm}; preprocess the data of the sample data set T to be classified.
Step 5.2: Initialize the candidate conditional attribute set S; let S' be the attribute set chosen by attribute selection, S'' the set of attributes not selected, and S''' the attribute set sorted by the IV index of the attributes from high to low; set S', S'', and S''' to empty, the maximum accuracy Accuracymax = 0, and the current accuracy Accuracycur = 0.
Step 5.3: Compute the information value (IV) of all attributes in the conditional attribute set S and perform the first round of screening by threshold: add the attributes whose IV value is greater than or equal to the threshold to the set S''' of step 5.2, add the attributes whose IV value is below the threshold to the set S'', and sort the attributes in S''' by IV value from high to low.
Step 5.4: If S''' is empty, terminate the computation and save the current S' and Accuracymax.
Step 5.5: If S''' is not empty, continue: select the first attribute Ai in the attribute set S''', delete it from S''' and add it to S'; construct a naive Bayes classifier on the updated S' and compute Accuracycur.
Step 5.6: If Accuracycur > Accuracymax in step 5.5, update Accuracymax = Accuracycur and return to step 5.4; if Accuracycur ≤ Accuracymax in step 5.5, remove Ai from S', add it to S'', save the current S' and Accuracymax, and terminate the computation.
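Steps 5.2-5.6 above can be sketched as a greedy forward sweep. In this sketch, `accuracy_of` is a caller-supplied stand-in for training and scoring the naive Bayes classifier on a candidate subset, and the search stops at the first attribute that fails to improve accuracy, which is one reading of the termination rule in step 5.6; all names are illustrative.

```python
def select_attributes(attrs_with_iv, accuracy_of, threshold=0.02):
    """Two-round selection of step 5: IV screening, then forward search.

    attrs_with_iv: list of (attribute, IV) pairs (the set S with IV values);
    accuracy_of:   callable scoring a classifier built on a list of attributes.
    """
    # Step 5.3: first-round screen by threshold, sorted by IV high to low (S''').
    shortlist = [a for a, iv in sorted(attrs_with_iv, key=lambda t: -t[1])
                 if iv >= threshold]
    selected, best = [], 0.0          # S' and Accuracymax (step 5.2)
    for attr in shortlist:            # steps 5.4-5.5
        acc = accuracy_of(selected + [attr])
        if acc > best:                # step 5.6
            best = acc
            selected = selected + [attr]
        else:
            break                     # first non-improving attribute ends the search
    return selected, best

# Toy scorer: adding "A2" after "A1" hurts accuracy, so the search stops there;
# "A3" never enters the shortlist because its IV is below the threshold.
scores = {("A1",): 0.80, ("A1", "A2"): 0.75}
print(select_attributes([("A1", 0.4), ("A2", 0.2), ("A3", 0.01)],
                        lambda s: scores.get(tuple(s), 0.0)))
# (['A1'], 0.8)
```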
The attributes selected by the present invention are a subset of the naive Bayes attribute set; they improve the classification performance of naive Bayes when attributes are redundant, while keeping the classification performance of naive Bayes when attributes are not redundant. By introducing the IV concept, a first-round attribute subset is obtained by threshold screening to reduce the traversal space, and a second round of selection over the attribute subset is performed by forward search: following the principle of the greedy algorithm, at each step of the search the algorithm assumes that all local changes to the current attribute subset are optimal choices.
The quality of the optimal subset selected by attribute selection is always relative to a specific evaluation criterion; different systems often recognize different "optimal subsets", and the optimal subset obtained under criterion A may not be the best under criterion B. For supervised learning, the ultimate purpose and meaning of attribute selection are to improve the accuracy of the classifier. Therefore, the present invention adopts the most intuitive approach and uses classification accuracy (or error rate) as the evaluation criterion, taking the accuracy of the naive Bayes classifier as the measure of attribute selection quality.
To verify the validity of the method, first, databases from the UCI Repository of machine learning databases are chosen and, with reference to Table 1, the attribute subset sizes and classification accuracies under different thresholds are compared, demonstrating the correctness of introducing the IV index. Then, the IV index is compared with the common attribute selection methods Cor (Correlation), GR (Gain Ratio), IG (InfoGain), and OneR, computing under each method the degree of correlation between conditional attributes and the class attribute as well as the classification accuracy, to establish the validity of the method of the present invention. Finally, the performance of the naive Bayes classifier is compared with that of the naive Bayes classifier after attribute selection, proving that the improved method of the present invention can significantly reduce attribute dimensionality while ensuring classification accuracy.
Claims (4)
1. An improved selective naive Bayes method, characterized by comprising the following steps:
Step 1, a data set T containing n attributes is given; let S = {A1, A2, ..., An} be the finite set of conditional attribute variables and C = {C1, C2, ..., Cm} the class variable, where m is the number of class values and Cj is the j-th class value;
when a two-class problem is discussed, i.e., m = 2 and C = {C1, C2}, any conditional attribute variable Ai has Si distinct values {ai1, ai2, ..., aiSi}; that is, the k-th value of attribute Ai is denoted aik;
Step 2, the WoE index is defined;
the WoE index is an encoding of the original variable; to WoE-encode a variable, the variable must first be grouped, as in formulas (2) and (3):
WoE(aik, C) = ln(P(A=aik|C=C1) / P(A=aik|C=C2))   (2)
P(A=aik|C) = N(A=aik|C) / N(C)   (3)
in formulas (2)-(3): C1 denotes the first class label and C2 the second class label; P(A=aik|C=C1) is the conditional probability that the attribute takes value aik given class C1, and P(A=aik|C=C2) the corresponding probability given class C2; N(C) is the number of samples of class C; N is the total number of data samples; N(A=aik|C) is the number of samples of class C whose attribute value is aik;
Step 3, the IV index is defined;
the IV index measures the information value of a variable, i.e., the degree of influence of the independent variable on the target variable, as shown in formula (4):
IV(aik, C) = (P(A=aik|C=C1) - P(A=aik|C=C2)) * WoE(aik, C)   (4)
the IV value of attribute Ai is then the sum of the IV values of its groups, i.e.:
IV(Ai) = Σk IV(aik, C)   (5)
Step 4, combining step 1, the WoE index of step 2 and the IV index of step 3 are introduced into attribute selection and a naive Bayes classifier is constructed;
Step 5, on the basis of step 4, the original finite set S of conditional attribute variables of step 1 is first filtered by the IV index to obtain the attribute subset S' meeting the threshold requirement; the attributes in S' are sorted by IV value from high to low; finally, the attribute subset that optimizes the performance of the classifier is searched for on the sorted attribute set S'.
2. The improved selective Naive Bayes method according to claim 1, wherein the concrete operations of step 4 are:
Step 4.1: filter out the attribute set highly correlated with the class from the original attribute set by computing IV values;
according to the naive Bayes formula, classifying a sample X requires formulas (6) and (7):
P(C1|X) = P(C1) * ∏i=1..n P(aik|C1)  (6)
P(C2|X) = P(C2) * ∏i=1..n P(aik|C2)  (7)
In formulas (6)-(7): P(aik|C1) is identical to P(A=aik|C=C1), the conditional probability that the attribute takes value aik given class C1; P(aik|C2) is identical to P(A=aik|C=C2), the corresponding probability given class C2; P(C1) and P(C2) are the prior probabilities of classes C1 and C2; P(C1|X) and P(C2|X) are the posterior probabilities of classes C1 and C2 given sample X; X denotes an unlabeled database sample represented as an n-dimensional feature vector;
Step 4.2: select a threshold and filter the attributes;
normalizing formula (6) yields formula (8):
P(C1|X)' = P(C1|X) / (P(C1|X) + P(C2|X))  (8)
where a = P(C1|X) + P(C2|X) is a constant for a given data set; similarly, normalizing formula (7) yields formula (9):
P(C2|X)' = P(C2|X) / (P(C1|X) + P(C2|X))  (9)
In formulas (8)-(9): P(C1|X)' is the normalized posterior probability of class C1 given sample X, and P(C2|X)' the normalized posterior probability of class C2;
Step 4.3: construct the Naive Bayes classifier on the attribute set with good classification ability obtained in step 4.2.
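A minimal sketch of formulas (6)-(9): the two class scores are each a prior multiplied by per-attribute conditional probabilities, and normalizing them to sum to one leaves their ordering unchanged. The function names and dictionary layout are illustrative assumptions:

```python
def nb_scores(x, priors, cond_probs):
    """Un-normalized naive Bayes scores per formulas (6)-(7).

    x:          feature vector (a1k, ..., ank) of the sample to classify
    priors:     {class: P(C)}
    cond_probs: {class: [per-attribute dict {value: P(Ai=value|C)}]}
    """
    scores = {}
    for c, p in priors.items():
        s = p
        for i, a in enumerate(x):
            s *= cond_probs[c][i].get(a, 0.0)   # P(aik|C) for attribute i
        scores[c] = s
    return scores

def normalize(scores):
    """Formulas (8)-(9): divide each score by the sum of both scores."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores
```

Because both scores are divided by the same constant, the predicted class (the argmax) is unchanged; normalization only turns the scores into probabilities that sum to one.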
3. The improved selective Naive Bayes method according to claim 2, wherein the threshold division and the degree to which the IV value measures the correlation between an attribute and the class attribute are as follows:
4. The improved selective Naive Bayes method according to claim 1, wherein the concrete operations of step 5 are:
Step 5.1: input the sample data set T to be classified from step 1, the conditional attribute set, i.e. the finite set of categorical attribute variables S = {A1, A2, ..., An}, and the decision attribute set, i.e. the class variable C = {C1, C2, ..., Cm}; then preprocess the sample data set T;
Step 5.2: initialize the candidate conditional attribute set S; let S' be the set of attributes chosen by attribute selection, S'' the set of attributes not chosen, and S''' the attribute set sorted by descending IV index; set S', S'', and S''' to be empty, the maximum accuracy Accuracymax = 0, and the current accuracy Accuracycur = 0;
Step 5.3: compute the information value (IV) of every attribute in the conditional attribute set S and perform a first round of screening by threshold: attributes whose IV value is greater than or equal to the threshold are added to the set S''' of step 5.2, attributes whose IV value is below the threshold are added to the set S'', and the attributes in S''' are sorted in descending order of IV value;
Step 5.4: if S''' is empty, terminate the computation and save the current S' and Accuracymax;
Step 5.5: if S''' is not empty, continue: select the first attribute Ai in the attribute set S''', delete it from S''', and add it to S'; construct the Naive Bayes classifier on the updated S' and compute Accuracycur;
Step 5.6: if Accuracycur > Accuracymax in step 5.5, update Accuracymax = Accuracycur and return to step 5.4; if Accuracycur ≤ Accuracymax, remove Ai from S', add it to S'', save the current S' and Accuracymax, and terminate the computation.
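Steps 5.1-5.6 amount to an IV-ordered greedy forward search: screen attributes by an IV threshold, sort the survivors by descending IV, then add them one at a time, keeping each attribute only if it improves classifier accuracy. A sketch under stated assumptions: `iv_of`, `accuracy_of`, and the threshold are placeholders for the patent's own IV computation (formula (5)) and classifier evaluation, and the loop continues through all screened attributes rather than stopping at the first non-improving one (a common selective-NB variant; a literal reading of step 5.6 terminates there):

```python
def select_attributes(attributes, iv_of, accuracy_of, threshold):
    """Greedy forward selection following the spirit of steps 5.1-5.6.

    attributes:  candidate conditional attribute set S
    iv_of:       maps attribute -> IV value (formula (5))
    accuracy_of: maps a selected subset S' -> classifier accuracy
    threshold:   IV cutoff for the first-round screening (step 5.3)
    """
    # Step 5.3: first-round screening by threshold, then descending IV sort -> S'''
    s3 = sorted((a for a in attributes if iv_of(a) >= threshold),
                key=iv_of, reverse=True)
    selected, rejected = [], []                     # S', S''
    acc_max = 0.0                                   # Accuracy_max
    while s3:                                       # Step 5.5: S''' not empty
        a = s3.pop(0)                               # first (highest-IV) attribute
        acc_cur = accuracy_of(selected + [a])       # Accuracy_cur on updated S'
        if acc_cur > acc_max:                       # Step 5.6: keep the attribute
            selected.append(a)
            acc_max = acc_cur
        else:                                       # otherwise move it to S''
            rejected.append(a)
    return selected, acc_max                        # Step 5.4: S''' empty -> save S'
```

Because candidates are tried in descending IV order, the attributes most informative about the class are given the first chance to enter S'.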
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810291375.6A CN108805156A (en) | 2018-04-03 | 2018-04-03 | An improved selective Naive Bayes method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810291375.6A CN108805156A (en) | 2018-04-03 | 2018-04-03 | An improved selective Naive Bayes method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108805156A true CN108805156A (en) | 2018-11-13 |
Family
ID=64094674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810291375.6A Pending CN108805156A (en) | 2018-04-03 | 2018-04-03 | An improved selective Naive Bayes method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805156A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241079A (en) * | 2020-01-08 | 2020-06-05 | 哈尔滨工业大学 | Data cleaning method and device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xiao et al. | Cost-sensitive semi-supervised selective ensemble model for customer credit scoring | |
CN108898479B (en) | Credit evaluation model construction method and device | |
Sabau | Survey of clustering based financial fraud detection research | |
CN108596199A (en) | Unbalanced data classification method based on EasyEnsemble algorithms and SMOTE algorithms | |
Kim et al. | Ordinal classification of imbalanced data with application in emergency and disaster information services | |
CN109739844B (en) | Data classification method based on attenuation weight | |
CN109615014A (en) | A kind of data sorting system and method based on the optimization of KL divergence | |
CN107273387A (en) | Towards higher-dimension and unbalanced data classify it is integrated | |
CN107391772A (en) | A kind of file classification method based on naive Bayesian | |
CN112001788B (en) | Credit card illegal fraud identification method based on RF-DBSCAN algorithm | |
Safitri et al. | Improved accuracy of naive bayes classifier for determination of customer churn uses smote and genetic algorithms | |
Manziuk et al. | Definition of information core for documents classification | |
CN109726918A (en) | The personal credit for fighting network and semi-supervised learning based on production determines method | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
Chern et al. | A decision tree classifier for credit assessment problems in big data environments | |
CN104615789A (en) | Data classifying method and device | |
Pristyanto et al. | The effect of feature selection on classification algorithms in credit approval | |
Tsai | Two‐stage hybrid learning techniques for bankruptcy prediction | |
Chiang et al. | The Chinese text categorization system with association rule and category priority | |
CN114064459A (en) | Software defect prediction method based on generation countermeasure network and ensemble learning | |
CN109716660A (en) | Data compression device and method | |
CN114037001A (en) | Mechanical pump small sample fault diagnosis method based on WGAN-GP-C and metric learning | |
CN108805156A (en) | An improved selective Naive Bayes method | |
CN115271442A (en) | Modeling method and system for evaluating enterprise growth based on natural language | |
CN115618297A (en) | Method and device for identifying abnormal enterprise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181113 |