CN101231702A - Categorizer integration method - Google Patents

Categorizer integration method Download PDF

Info

Publication number
CN101231702A
CN101231702A (application CNA2008100467899A / CN200810046789A)
Authority
CN
China
Prior art keywords
classifier
sub
sample
training
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008100467899A
Other languages
Chinese (zh)
Other versions
CN100587708C (en)
Inventor
高常鑫
桑农
王岳环
唐奇伶
李密
高峻
笪邦友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Streamax Technology Co Ltd
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN200810046789A (granted as CN100587708C)
Publication of CN101231702A
Application granted
Publication of CN100587708C
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a classifier integration method comprising the following steps: first, the weights of the training samples are initialized; second, classifiers are trained on the training samples, and several optimal sub-classifiers are selected using a classifier performance evaluation criterion that describes both error severity and generalization ability; third, the optimal sub-classifiers are combined. The evaluation criterion adopted by the invention selects well-performing sub-classifiers accurately: the better the sub-classifiers perform, the fewer of them are needed to combine into a classifier of a given performance, i.e., the fewer training cycles and the less training time are required. In addition, the invention adjusts the combined classifier through feedback, further improving its performance.

Description

A classifier integration method
Technical field
The invention belongs to the field of pattern recognition methods, and specifically relates to a method for improving classifier performance by integrating sub-classifiers.
Background technology
Classifier integration is a popular technique for improving classifier performance. Classifier performance is central to pattern recognition, but the precision of a single classifier is often limited. Classifier integration methods construct a higher-performance classifier by integrating several classifiers; the most common are bootstrap aggregation (Bagging) and Boosting.
Bagging independently draws a number of samples at random from the training set to form bootstrap data sets; each bootstrap set is used to train one sub-classifier independently, and the final classification is decided by voting over the verdicts of these sub-classifiers. A minimal sketch follows.
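By way of editorial illustration only (not part of the original disclosure), here is a minimal Python sketch of the Bagging procedure just described; it assumes a generic `train` routine (hypothetical) that fits one sub-classifier and returns a predictor with outputs in {-1, +1}.

```python
import numpy as np

def bagging_predict(train, X_train, y_train, X_test, B=25, seed=0):
    # B bootstrap resamples, one independently trained sub-classifier each,
    # and a majority vote over their {-1, +1} verdicts at test time.
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = np.zeros(len(X_test))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # draw n samples with replacement
        clf = train(X_train[idx], y_train[idx])   # independent sub-classifier
        votes += clf(X_test)                      # accumulate {-1, +1} votes
    return np.sign(votes)                         # majority decision
```

Because each bootstrap set is drawn independently, the loop body could equally be run in parallel, one of the contrasts with Boosting noted below.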
The Boosting method produces a series of classifiers over successive training rounds. The training set used by each classifier is a subset drawn from the full training set, and whether a sample appears in this subset depends on the performance of the classifiers produced so far: samples that the existing classifiers misclassify appear in the new training subset with higher probability. Later classifiers therefore concentrate on the samples that the existing classifiers find hardest to distinguish.
Bagging and Boosting differ in several ways. Bagging selects each round's training set at random, and the rounds are mutually independent, whereas Boosting's selection is not independent: each round's training set depends on the learning results of the earlier rounds. Bagging's predictors carry no weights, while Boosting's do. And Bagging's predictors can be generated in parallel, whereas Boosting's can only be generated sequentially.
Both Bagging and Boosting improve classifier performance effectively, but on most data sets Boosting attains higher accuracy than Bagging, and in most applications accuracy matters more than computation speed, since the cost-performance of computers improves quickly and the training is performed offline. Boosting is therefore more widely used in practice than Bagging.
The Boosting method can strengthen the generalization ability of a given algorithm, but it has two drawbacks: it must know a lower bound on the learning accuracy of the sub-classifiers, which is hard to obtain in practical problems; and it may cause later classifiers to over-concentrate on a few especially difficult samples, making performance unstable.
The basic idea of the adaptive Boosting method (AdaBoost: Adaptive Boosting) is to stack several sub-classifiers by some rule into a strong classifier with very strong classification ability. AdaBoost is based on the Boosting algorithm but no longer needs to know the training error rate of the sub-classifiers in advance; it adapts to the training error rate of the base algorithm automatically by dynamically adjusting the weight of each sub-classifier, which has attracted wide attention. Like Boosting, AdaBoost adjusts the probability that each sample appears in the new training subset according to how the existing classifiers classify that sample in the full training set. The difference is that AdaBoost does not require a known range for the sub-classifiers' prediction accuracy; it sets the corresponding weights automatically from the measured prediction accuracy. During AdaBoost training the upper bound on the training error rate is a monotonically decreasing function, so as long as the base classifiers perform stably better than random guessing and the training cycle runs long enough, the empirical error rate can be driven arbitrarily low, and the generalization error rate can be kept below an approximate bound. By building combinations of multiple classifiers it lifts the performance of the sub-classifiers; thanks to its adaptation to classifier performance and its immunity to overfitting, it has attracted great interest in recent years and is widely used in object detection. A sketch of one such round follows.
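To make the adaptation concrete, here is an editorial sketch of one round of the discrete AdaBoost scheme described above, with the weight update written in the form used later in this document ($w \leftarrow w\,\beta^{1-e_i}$, $\beta = \epsilon/(1-\epsilon)$). The matrix layout of `predictions` is an assumption made for compactness, not something the patent prescribes.

```python
import numpy as np

def adaboost_round(predictions, y, w):
    # predictions: (q, n) array, row j = candidate sub-classifier j's {-1,+1}
    # outputs on the n samples; y: labels in {-1,+1}; w: current sample weights.
    # Assumes 0 < eps < 0.5, i.e., the best candidate beats random guessing.
    errs = 0.5 * np.abs(predictions - y).dot(w)   # weighted error of each candidate
    t = int(np.argmin(errs))                      # best candidate this round
    eps = errs[t]
    beta = eps / (1.0 - eps)
    alpha = np.log(1.0 / beta)                    # vote weight of the chosen one
    correct = predictions[t] == y
    w = w * np.where(correct, beta, 1.0)          # w_{t+1,i} = w_{t,i} * beta^(1-e_i)
    return t, alpha, w / w.sum()
```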
AdaBoost is very attractive in theory, but in practical applications two points remain unclear: (1) how to select the best sub-classifier; (2) how best to combine the selected sub-classifiers.
Summary of the invention
The objective of the invention is to propose a classifier integration method that uses a more effective classifier performance evaluation criterion to select well-performing sub-classifiers, reducing classifier training time and the number of training cycles.
A classifier integration method, with the following concrete steps:
(1) initialize the training sample weights;
(2) train classifiers on the training samples to obtain C best sub-classifiers, where C is the number of training rounds;
(2.1) set the training round t = 1;
(2.2) determine the weight of each training sample in round t by weight normalization;
(2.3) generate a candidate sub-classifier for each feature of the training samples by a predetermined classifier design method, select the best sub-classifier among them, and compute the error rate the best sub-classifier produces when classifying all training samples;
(2.4) update the training sample weights according to the error rate;
(2.5) if t < C, set t = t + 1 and return to step (2.2); otherwise go to step (3);
(3) combine the C best sub-classifiers;
characterized in that
said step (2.3) selects the best sub-classifier in one of the following two modes (a code sketch follows the list):
i. For the sub-classifier $h_j$ corresponding to the j-th feature of the training samples, compute the error degree
$em_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i)-y_i|\,|f_j(x_i)-\theta_j|$,
where n is the number of training samples, $x_i$ denotes a training sample, $w_{t,i}$ is the weight of the i-th sample in the round-t training, $f_j(x_i)$ is the classification response of classifier $h_j$ to sample $x_i$, and $\theta_j$ is the classification threshold;
select the sub-classifier with the minimum error degree as the best sub-classifier.
ii. For the sub-classifier $h_j$ corresponding to the j-th feature of the training samples, compute the error degree
$em_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i)-y_i|\,|f_j(x_i)-\theta_j|$,
where n is the number of training samples, $x_i$ denotes a training sample, $y_i \in \{-1, 1\}$ is its class label, $w_{t,i}$ is the weight of the i-th sample in the round-t training, $f_j(x_i)$ is the classification response of classifier $h_j$ to sample $x_i$, and $\theta_j$ is the classification threshold;
compute the generalization ability $G_j$ of the sub-classifier $h_j$ of the j-th feature [the formula appears only as an image in the original and is not reproduced here], where $f_j(x^+)$ denotes the classification response of sub-classifier $h_j$ to the positive sample nearest the classification threshold $\theta_j$ or classification surface, and $f_j(x^-)$ the classification response of $h_j$ to the nearest negative sample;
compute the error degree sum $Error_j = em_j + G_j$ and select the sub-classifier corresponding to the minimum $Error_j$ as the best sub-classifier.
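By way of illustration only (not part of the original disclosure), a minimal Python sketch of mode ii follows. The error degree implements the $em_j$ formula above; the generalization term is a stand-in, since the patent's $G_j$ formula survives only as an image: the sketch assumes, from the surrounding text, that $G_j$ is built from the responses of the positive and negative samples nearest the threshold and that a smaller $G_j$ is better. The names `error_degree`, `generalization_term`, and `pick_best` are editorial.

```python
import numpy as np

def error_degree(w, h_out, y, f_out, theta):
    # em_j: each weighted classification error is further scaled by the
    # distance |f_j(x_i) - theta_j| of the sample's response from the threshold.
    return 0.5 * np.sum(w * np.abs(h_out - y) * np.abs(f_out - theta))

def generalization_term(f_out, y, theta):
    # G_j stand-in (assumption): use the responses of the positive and negative
    # samples nearest the threshold, and treat a wider gap between them
    # (a larger margin) as a smaller, better G_j.
    f_pos, f_neg = f_out[y == 1], f_out[y == -1]
    near_pos = f_pos[np.argmin(np.abs(f_pos - theta))]
    near_neg = f_neg[np.argmin(np.abs(f_neg - theta))]
    return -abs(near_pos - near_neg)

def pick_best(candidates, y, w):
    # Mode ii: minimize Error_j = em_j + G_j over the candidates, given as
    # (h_out, f_out, theta) triples, one per feature.
    scores = [error_degree(w, h, y, f, th) + generalization_term(f, y, th)
              for h, f, th in candidates]
    return int(np.argmin(scores))
```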
The classifier performance evaluation criterion adopted by the invention selects well-performing sub-classifiers accurately; the better the sub-classifiers perform, the fewer of them are needed to combine into a classifier of the same performance, i.e., the fewer training cycles and the less time required. In addition, the invention adjusts the combined classifier through feedback, further strengthening classifier performance.
Description of drawings
Fig. 1 is the flowchart of the invention;
Fig. 2 is a schematic diagram of the hierarchically described classifier performance evaluation criterion;
Fig. 3 shows the distributions of the two groups of experimental data: the positive and negative samples of Fig. 3(a) follow ideal uniform distributions, those of Fig. 3(b) Gaussian distributions;
Fig. 4 describes the classification performance on the experimental data using the classifier error-rate criterion, with Fig. 4(a) describing the data of Fig. 3(a) and Fig. 4(b) the data of Fig. 3(b);
Fig. 5 describes the classifier performance on the experimental data using the evaluation criterion with the error-degree measure, with Fig. 5(a) describing the data of Fig. 3(a) and Fig. 5(b) the data of Fig. 3(b);
Fig. 6 describes the classifier performance on the experimental data using the generalization-ability criterion, with Fig. 6(a) describing the data of Fig. 3(a) and Fig. 6(b) the data of Fig. 3(b);
Fig. 7 describes the classifier performance on the experimental data using the hierarchical evaluation criterion, with Fig. 7(a) describing the data of Fig. 3(a) and Fig. 7(b) the data of Fig. 3(b);
Fig. 8 is a schematic diagram of the distribution of the classification training samples of the embodiment;
Fig. 9 compares the performance of the optimized combined classifier with that of the unoptimized one;
Fig. 10 compares the optimized and unoptimized combined classifiers when the combination consists of more than ten sub-classifiers.
Embodiment
The flow of the steps of the invention, shown in Fig. 1, is now described:
Suppose n training samples $(x_i, y_i)$, i = 1, ..., n, are input, where $x_i$ denotes the feature-value set of a training sample, i.e. $x_i = \{x_{i1}, x_{i2}, \ldots, x_{iq}\}$; each element of the set represents one feature, which may be a scalar value or a vector, and q is the number of features; $y_i \in \{-1, +1\}$ denotes the class label. The training samples are known to contain m negative samples and l positive samples.
(1) Sub-classifier design step. The construction of the sub-classifiers is determined from the known training samples; the sub-classifier generated for the j-th feature of the training samples has the form
$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j \\ -1, & \text{otherwise} \end{cases}$
where $h_j(\cdot)$ is the classification result of the sub-classifier and can only be ±1: $h_j(\cdot) = 1$ means the sub-classifier classifies the sample as positive, and $h_j(\cdot) = -1$ as negative. $f_j(\cdot)$ is the classification response of the sub-classifier to the sample, i.e., the feature value itself when the feature is a scalar, and the distance from the feature to the sub-classifier's classification surface when the feature is a vector. $\theta_j$ is the sub-classifier threshold, obtained together with the sub-classifier itself and differing with the sub-classifier design method. $p_j$ takes the values ±1: when $p_j = 1$, $p_j f_j(\cdot) < p_j \theta_j$ means $f_j(\cdot) < \theta_j$; when $p_j = -1$ it means $f_j(\cdot) > \theta_j$. Different design methods can be used to obtain the classifier threshold or classification surface, such as neural networks, the minimum squared-error method (MSE), support vector machines (SVM), or hidden Markov models (HMM). A sketch of this sub-classifier form follows.
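A minimal editorial sketch of this sub-classifier form (the helper name `make_subclassifier` is not from the patent):

```python
def make_subclassifier(f_j, theta_j, p_j):
    # h_j(x) = +1 when p_j * f_j(x) < p_j * theta_j, else -1.
    # f_j is the classification response: the raw feature value for scalar
    # features, or a signed distance to the separating surface for vectors;
    # p_j in {+1, -1} chooses which side of theta_j counts as positive.
    def h_j(x):
        return 1 if p_j * f_j(x) < p_j * theta_j else -1
    return h_j

# usage: a threshold on a scalar feature, positive when the value is below 0.5
h = make_subclassifier(lambda x: x, 0.5, 1)
assert h(0.2) == 1 and h(0.9) == -1
```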
(2) Training-sample weight initialization step. Initialize the error weights: $w_{1,i}$ denotes the initial weight of the i-th sample, with $w_{1,i} = \frac{1}{2m}$ for samples with $y_i = -1$ and $w_{1,i} = \frac{1}{2l}$ for samples with $y_i = 1$. Rather than giving every sample an identical weight, the positive and negative samples each receive half of the total weight.
(3) Classifier training step. For each t = 1, ..., C, where C is the number of training rounds: the larger C is, the better the classifier performance, and in theory arbitrarily high precision can be reached as C increases; in the typical case about 50 rounds yield accuracy close to 90%.
(3.1) Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{v=1}^{n} w_{t,v}$, so that all weights sum to 1, where $w_{t,i}$ denotes the weight of the i-th sample in the round-t training;
(3.2) For each j = 1, ..., q, generate the sub-classifier $h_j$ for the j-th feature of the feature set by the method of step (1), compute the classifier performance measure under the current weights, and evaluate the quality of each sub-classifier with the hierarchically described classifier performance evaluation criterion;
(3.3) Select the sub-classifier $\bar{h}_t$ with the best performance, add it to the strong classifier, and compute its error rate $\epsilon_t = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|\bar{h}_t(x_i) - y_i|$, where $\bar{h}_t(x_i)$ denotes the classification result of the selected sub-classifier on the i-th sample;
(3.4) Update the weight of each sample:
$w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$
where $w_{t+1,i}$ denotes the weight of the i-th sample after the round-t update; if $\bar{h}_t$ classifies the i-th sample $x_i$ correctly then $e_i = 0$, otherwise $e_i = 1$; and $\beta_t = \frac{\epsilon_t}{1-\epsilon_t}$.
(4) Classifier combination step. The generated strong classifier is
$Z(x) = \begin{cases} 1, & \sum_{t=1}^{C} \alpha_t \bar{h}_t(x) \ge 0 \\ 0, & \text{otherwise} \end{cases}$
where the weight of the t-th best sub-classifier is $\alpha_t = \log\frac{1}{\beta_t}$. A sketch of steps (2)-(4) follows.
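The following editorial sketch assembles steps (2)-(4) above: the half-and-half weight initialization, the per-round normalization, selection, and update, and the final combination $Z$. The `select_best` argument stands in for the hierarchical criterion of step (3.2) (cf. the sketch under the Summary); its signature and the candidate handling are assumptions.

```python
import numpy as np

def train_strong_classifier(candidates, X, y, C, select_best):
    # candidates: per-feature sub-classifier descriptions; select_best(candidates,
    # X, y, w) returns the chosen h, with h(X) -> {-1,+1} predictions.
    m, l = np.sum(y == -1), np.sum(y == 1)
    w = np.where(y == -1, 1.0 / (2 * m), 1.0 / (2 * l))   # step (2): init weights
    chosen, alphas = [], []
    for t in range(C):
        w = w / w.sum()                                   # (3.1) normalize
        h = select_best(candidates, X, y, w)              # (3.2)-(3.3) best sub-classifier
        pred = h(X)
        eps = 0.5 * np.sum(w * np.abs(pred - y))          # (3.3) weighted error
        beta = eps / (1.0 - eps)
        w = w * np.where(pred == y, beta, 1.0)            # (3.4) w * beta^(1-e_i)
        chosen.append(h)
        alphas.append(np.log(1.0 / beta))                 # (4) alpha_t = log(1/beta_t)
    def Z(x):                                             # (4) strong classifier
        return 1 if sum(a * h(x) for a, h in zip(alphas, chosen)) >= 0 else 0
    return Z, chosen, np.array(alphas)
```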
(5) Classifier optimization step. The classifier coefficients are adjusted by feedback to strengthen classifier performance. The invention uses a feed-forward neural network with feedback: the training error of the network is measured, and the network weights are adjusted according to the error until the classifier performance is optimal.
The form of the sub-classifier designed in the invention is described as follows:
Classifier design is in fact an optimization strategy: the minimum squared-error method seeks the classifier with minimum squared error, the Fisher linear discriminant seeks the classifier maximizing the ratio of between-class scatter to within-class scatter, a neural network seeks the classifier minimizing an error function, and a support vector machine seeks the classifier maximizing the class margin. These designs, however, treat every sample equally, whereas under the AdaBoost framework the sample weights are closely tied to the performance of the preceding classifiers: the design should concentrate more on the misclassified samples, i.e., the samples with larger weights, since misclassifying samples of different weights incurs different costs. The invention therefore uses the minimum squared-error method, i.e., the sub-classifier $h_j$ for the j-th feature is generated by the minimum squared-error method, in which the classification function is described by the weight vector d over the components of the classification surface. Let $m_t$ denote the number of samples drawn in the current training round; the j-th feature of the current samples is $x_{rj}^{s_j}$ with $0 < r < m_t$ and $0 < s_j < g_j$, where $g_j$ is the dimension of the j-th feature, and $d = \{d^{s_j}\}$, $0 < s_j < g_j$. The sub-classifier for the j-th feature is then obtained as follows:
(1) Set the initial classification function $d_0$ to small nonzero random numbers; $b = \{b_r\}$, $0 < r < m_t$, denotes the margins of the samples, all of which the invention sets to $b_r = 1$;
(2) input a new sample;
(3) the u-th input sample is $x_{uj} = \{x_{uj}^{s_j}\}$, $0 < s_j < g_j$;
(4) correct the classification function according to
$d \leftarrow d + \eta(u)\,\omega_u\,(b_u - d^T x_{uj})\,x_{uj}$;
(5) if the classification error of the classification function falls below a certain threshold, or the number of iterations reaches a set value, the learning process ends; otherwise go to (2).
Here $\eta(u)$ controls the adjustment magnitude in the u-th iteration, with $\eta(u) = \eta(0)/(u+1)$ and $\eta(0) = 0.002$ in the invention; $\omega_u$ is the weight of the u-th input sample; $d^T$ denotes the transpose of d. The invention terminates learning by an iteration count, stopping when the iterations reach 10000. A sketch of this update loop follows.
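An editorial sketch of this update loop under stated assumptions: negative samples are sign-normalized so that a single margin $b_r = 1$ serves both classes (the translated text does not spell this out), and the error-threshold stop of step (5) is omitted in favor of the fixed iteration budget.

```python
import numpy as np

def design_mse_subclassifier(Xj, y, omega, eta0=0.002, max_iter=10000):
    # Xj: (m_t, g_j) vectors of the j-th feature; y: labels in {-1,+1};
    # omega: the current per-sample training weights.
    rng = np.random.default_rng(0)
    Xn = Xj * y[:, None]                        # sign-normalize negatives (assumption)
    d = rng.uniform(-0.01, 0.01, Xj.shape[1])   # step (1): small nonzero start
    b = np.ones(len(y))                         # step (1): margins all set to 1
    for u in range(max_iter):                   # steps (2)-(3): feed samples in turn
        i = u % len(y)
        eta = eta0 / (u + 1)                    # eta(u) = eta(0) / (u + 1)
        d += eta * omega[i] * (b[i] - d @ Xn[i]) * Xn[i]   # step (4) correction
    return d                                    # h_j(x) = sign(d @ x), theta_j = 0
```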
The algorithm above yields the sub-classifier for the j-th feature, corresponding to the sub-classifier design formula of step (1) with $\theta_j = 0$: $d^T x_{uj} > 0$ means the sub-classifier of the j-th feature classifies the u-th sample as positive, and $d^T x_{uj} < 0$ means it classifies the u-th sample as negative.
This guarantees that identical samples can receive different classifications when their corresponding weights differ, which in turn guarantees that the same classifier is not always selected; and the algorithm suits both scalar-valued and vector-valued features.
The hierarchically described classifier performance evaluation criterion proposed by the invention is as follows:
The quality of a classifier is paramount in pattern classification, so how classifier performance is described bears directly on the quality of classification. In the invention, the better the description criterion, the better the sub-classifier obtained in each cycle, the fewer sub-classifiers are needed to combine into a classifier of the same performance, the fewer the cycles, and the shorter the training time.
The structure of the hierarchically described classifier performance evaluation criterion is shown in Fig. 2.
In an actual implementation of the invention it suffices to find the best classifier; the performance of every classifier need not be described in detail. Here, in order to explain the evaluation criterion, we describe the performance of each classifier in detail rather than merely finding the best sub-classifier. The classifier performance criterion for the different situations is as follows:
(1) In many cases different sub-classifiers have different error rates, and the error-rate criterion alone suffices to describe classifier performance;
(2) when several sub-classifiers share the same error rate, the original error-rate criterion can no longer distinguish classifier quality well, and the performance evaluation criterion augmented with the error degree is used to describe classifier performance;
(3) when the error-degree-augmented criterion is also identical for several sub-classifiers, generalization ability is added to describe classifier performance, i.e., the classifier evaluation criterion combining the error-degree measure with the generalization description is used.
In this way the last layer of the hierarchical classifier performance evaluation criterion describes classifier performance completely.
The superiority of the classifier performance criterion proposed in the invention is illustrated below with simple data distributions. Fig. 3 shows the distributions of two groups of data: the positive and negative samples of Fig. 3(a) follow ideal uniform distributions, those of Fig. 3(b) Gaussian distributions. In Fig. 3(a) the negative samples are uniformly distributed from 0.1 to 1.1 and the positive samples from 0.9 to 1.9; in Fig. 3(b) the negative samples follow a Gaussian distribution with mean 0.6 and variance 0.1, the positive samples a Gaussian distribution with mean 1.4 and variance 0.1.
In the original algorithm the classifier performance criterion is the classification error rate: the sub-classifier $h_j$ of the j-th feature is described by its error rate $\epsilon_j$ over all samples,
$\epsilon_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i) - y_i|$,
where n is the number of samples, $x_i$ is the feature value of a training sample, $y_i \in \{-1, 1\}$ indicates a negative or positive sample, $w_{t,i}$ is the weight of the i-th sample in the round-t training, and $h_j(x_i)$ is the classification result of the j-th feature's classifier on the i-th sample. The weights of the samples are fixed within one cycle and do not depend on the feature. Fig. 4 shows the description of the classifiers' performance on the above data under this criterion, Fig. 4(a) and Fig. 4(b) corresponding to Fig. 3(a) and Fig. 3(b) respectively.
Fig. 4 shows that the classifier performance described by the error rate alone is poor: with equal error rates the best classifier may not be found; several classifiers attain the minimum error rate in both Fig. 4(a) and Fig. 4(b), and in Fig. 4(b) several classifiers also share non-minimal error rates. The invention therefore adds an error-degree measure when accumulating the samples' mistakes. The error-degree measure $em_j$ of the sub-classifier $h_j$ of the j-th feature has the form
$em_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i) - y_i|\,|f_j(x_i) - \theta_j|$,
where n is the number of samples, $x_i$ denotes a training sample, $y_i \in \{-1, 1\}$ indicates a negative or positive sample, $w_{t,i}$ is the weight of the i-th sample in the round-t training, $f_j(x_i)$ is the classification response of the sub-classifier $h_j$ to sample $x_i$, and $\theta_j$ is the classification threshold; a larger $|f_j(x_i) - \theta_j|$ means the sample lies farther from the classification threshold or classification surface, hence a larger error degree. Fig. 5 shows the description of the classifiers' performance on the raw data under this criterion, Fig. 5(a) and Fig. 5(b) corresponding to Fig. 3(a) and Fig. 3(b).
Fig. 5(a) shows that the ideal classifier is found, but Fig. 5(b) still fails to yield the best classifier. Augmenting with the error-degree measure is therefore feasible when the sample error rates are equal but nonzero, yet it fails when the sample error rates are all zero. The invention therefore considers a description of the classifier's generalization ability, adding the generalization measure when the sample error rate is zero; when the sample error rate is nonzero, the generalization description is a constant. The judgment criterion has the following form [the formula appears only as an image in the original and is not reproduced here], where $G_j$ denotes the generalization description of the sub-classifier $h_j$ of the j-th feature; $f_j(x^+)$ denotes the response of the current sub-classifier to the positive sample nearest the classification threshold or classification surface, $f_j(x^-)$ its response to the nearest negative sample, and $\theta_j$ the classification threshold. The smaller the generalization description $G_j$, the better the classifier performance. Fig. 6 shows the description of the classifiers' performance on the raw data under this criterion, Fig. 6(a) and Fig. 6(b) corresponding to Fig. 3(a) and Fig. 3(b).
The last layer of the whole hierarchical classifier performance evaluation criterion combines the error-degree measure with the generalization description:
$Error_j = em_j + G_j$
where $Error_j$ denotes the last-layer hierarchical performance description of the sub-classifier $h_j$ of the j-th feature, $em_j$ the performance description with the error-degree measure, and $G_j$ the generalization description of this classifier. The smaller $Error_j$ is, the better the corresponding sub-classifier performs. The description under the whole hierarchical criterion is shown in Fig. 7, Fig. 7(a) and Fig. 7(b) corresponding to Fig. 3(a) and Fig. 3(b).
The above experiments show that the hierarchically described classifier performance evaluation criterion describes classifier performance better.
Finally, the coefficients of the sub-classifiers are adjusted by feedback to improve the classification performance of the classifier as a whole. The principle of the feedback-based improvement of the sub-classifier combination proposed by the invention is as follows:
No theory proves that the combination method of classical AdaBoost, which is derived from the sample error rates of the classifiers, is optimal, and an optimal combination method can clearly reduce the number of sub-classifiers. An optimal combination method is therefore needed: the invention uses a feed-forward neural network with feedback, measuring the network error and adjusting the network weights according to the error until the classifier performance is optimal.
The neural network used by the invention adopts supervised learning, i.e., the classes of the samples used for learning are known. As the learning samples are input in turn, the network iteratively corrects the weights using the deviation between the neurons' actual outputs and the desired outputs, finally obtaining the desired weights. In the actual implementation a single-layer perceptron network is used, with each selected sub-classifier serving as one neuron of the perceptron. Before optimization the combined classifier of the invention has the form
$Z(x) = \begin{cases} 1, & \sum_{t=1}^{C} \alpha_t \bar{h}_t(x) \ge 0 \\ 0, & \text{otherwise} \end{cases}$
with combination weights $\alpha = \{\alpha_t\}$, t = 1, ..., C, where $\alpha_t$ denotes the weight of an optimal classifier used in the combination. The single-layer perceptron network used by the invention has C neurons in total; the algorithm for updating the weights α is as follows:
(1) Obtain the initial weights $\alpha = \{\alpha_t\}$, t = 1, ..., C, and set k = 1;
(2) input the learning sample $x_k$;
(3) compute the neurons' actual outputs: if the learning sample of the k-th input is $x_k$ and the weight connected to the t-th neuron is $\alpha_t$, the actual output of the t-th neuron is $\bar{h}_t(x_k)$, the classification result on sample $x_k$ of the t-th best sub-classifier selected in the classifier training step;
(4) update the weights: since each sample is either a positive or a negative sample, the desired output of the C neurons is $y_k$, and the weights are updated according to
$\alpha = \alpha + \lambda \sum_{t=1}^{C} (y_k - \bar{h}_t(x_k))\,x_k$;
(5) if the weights α yield the correct classification results, or k is greater than or equal to the iteration limit, stop; otherwise set k = k + 1 and go to step (2).
The adjustment parameter $\lambda \in (0, 1)$ controls the size of the weight adjustment: if λ is too large the algorithm may oscillate, and if λ is too small convergence is very slow. In the invention learning ends when k exceeds 2000 iterations. A sketch of this feedback adjustment follows.
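An editorial sketch of the feedback adjustment. The printed update sums $(y_k - \bar{h}_t(x_k))\,x_k$; since the perceptron's inputs on sample k are the sub-classifier outputs $\bar{h}_t(x_k)$ rather than the raw $x_k$, the sketch applies the standard delta rule on those outputs, which is one interpretation of the ambiguous translated formula, not a verbatim transcription of it.

```python
import numpy as np

def adjust_combination_weights(alpha, H, y, lam=0.002, max_iter=2000):
    # H: (n, C) array with H[k, t] = h_t(x_k) in {-1,+1}; y: labels in {-1,+1};
    # lam in (0,1): too large oscillates, too small converges slowly.
    alpha = alpha.copy()
    n = len(y)
    for k in range(max_iter):
        i = k % n
        out = 1.0 if H[i] @ alpha >= 0 else -1.0   # combined verdict on sample i
        alpha += lam * (y[i] - out) * H[i]         # delta rule; zero when correct
        if all((H[j] @ alpha >= 0) == (y[j] == 1) for j in range(n)):
            break                                  # every sample correct: stop early
    return alpha
```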
The experimental data used by the invention are as follows: 5000 samples in total, 2500 positive and 2500 negative, distributed as shown in Fig. 8; each sample consists of two features, each of dimension 1. Following the flow of the invention, the sub-classifiers were designed with the minimum squared-error method and the best classifiers were selected with the hierarchical classifier performance evaluation criterion; the invention then compared the optimized classifier with the unoptimized one. Fig. 9 compares the combined-classifier performance after optimization by the feedback neural network (λ was set to 0.002 in this experiment) with the combined-classifier performance without optimization, on the sample data of Fig. 8. Fig. 10 shows the comparison when more than ten sub-classifiers are used.
The experimental results of Fig. 9 and Fig. 10 show that after optimization the classifier misclassifies fewer samples than before optimization; moreover, the growth of errors as sub-classifiers are added is quite evident in the unoptimized classifier but much slower in the optimized one, and in theory optimization can eliminate this growth entirely. Adjusting the classifier coefficients by feedback thus improves classifier performance.

Claims (2)

1. A classifier integration method, comprising the following steps:
(1) initialize the training sample weights;
(2) train classifiers on the training samples to obtain C best sub-classifiers, where C is the number of training rounds;
(2.1) set the training round t = 1;
(2.2) determine the weight of each training sample in round t by weight normalization;
(2.3) generate a candidate sub-classifier for each feature of the training samples by a predetermined classifier design method, select the best sub-classifier among them, and compute the error rate the best sub-classifier produces when classifying all training samples;
(2.4) update the training sample weights according to the error rate;
(2.5) if t < C, set t = t + 1 and return to step (2.2); otherwise go to step (3);
(3) combine the C best sub-classifiers;
characterized in that
said step (2.3) selects the best sub-classifier in one of the following two modes:
i. For the sub-classifier $h_j$ corresponding to the j-th feature of the training samples, compute the error degree
$em_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i)-y_i|\,|f_j(x_i)-\theta_j|$,
where n is the number of training samples, $x_i$ denotes a training sample, $y_i \in \{-1, 1\}$ is its class label, $w_{t,i}$ is the weight of the i-th sample in the round-t training, $f_j(x_i)$ is the classification response of classifier $h_j$ to sample $x_i$, and $\theta_j$ is the classification threshold;
select the sub-classifier with the minimum error degree as the best sub-classifier.
ii. For the sub-classifier $h_j$ corresponding to the j-th feature of the training samples, compute the error degree
$em_j = \frac{1}{2}\sum_{i=1}^{n} w_{t,i}\,|h_j(x_i)-y_i|\,|f_j(x_i)-\theta_j|$,
where n is the number of training samples, $x_i$ denotes a training sample, $y_i \in \{-1, 1\}$ is its class label, $w_{t,i}$ is the weight of the i-th sample in the round-t training, $f_j(x_i)$ is the classification response of classifier $h_j$ to sample $x_i$, and $\theta_j$ is the classification threshold;
compute the generalization ability $G_j$ of the sub-classifier $h_j$ of the j-th feature [the formula appears only as an image in the original and is not reproduced here], where $f_j(x^+)$ denotes the classification response of sub-classifier $h_j$ to the positive sample nearest the classification threshold $\theta_j$ or classification surface, and $f_j(x^-)$ the classification response of $h_j$ to the nearest negative sample;
compute the error degree sum $Error_j = em_j + G_j$ and select the sub-classifier corresponding to the minimum $Error_j$ as the best sub-classifier.
2. The classifier integration method according to claim 1, characterized in that the method further comprises step (4): adjusting the combined-classifier weights, specifically:
(4.1) obtain the combined-classifier weights $\alpha = \{\alpha_t\}$, t = 1, ..., C, where $\alpha_t$ denotes the weight of a best sub-classifier used in the combination, and set k = 1;
(4.2) input a training sample $x_k$ of known class;
(4.3) obtain the classification result $Z(x_k)$ of the combined classifier on the training sample $x_k$;
(4.4) update the weights $\alpha = \alpha + \lambda \sum_{t=1}^{C} (y_k - \bar{h}_t(x_k))\,x_k$, where λ is the adjustment parameter, $\bar{h}_t(x_k)$ is the classification result on sample $x_k$ of the best sub-classifier selected in the t-th training round of said step (2), and $y_k$ is the desired classification result for sample $x_k$;
(4.5) if the weights α yield the correct classification results, or k is greater than or equal to the iteration limit, stop; otherwise set k = k + 1 and go to step (4.2).
CN200810046789A 2008-01-25 2008-01-25 Categorizer integration method Active CN100587708C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810046789A CN100587708C (en) 2008-01-25 2008-01-25 Categorizer integration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810046789A CN100587708C (en) 2008-01-25 2008-01-25 Categorizer integration method

Publications (2)

Publication Number Publication Date
CN101231702A (en) 2008-07-30
CN100587708C CN100587708C (en) 2010-02-03

Family

ID=39898161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810046789A Active CN100587708C (en) 2008-01-25 2008-01-25 Categorizer integration method

Country Status (1)

Country Link
CN (1) CN100587708C (en)


Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799875A (en) * 2010-02-10 2010-08-11 华中科技大学 Target detection method
CN101799875B (en) * 2010-02-10 2011-11-30 华中科技大学 Target detection method
CN101799925A (en) * 2010-03-05 2010-08-11 华中科技大学 Performance analysis method for automatic segmentation result of image
CN102231144A (en) * 2011-06-03 2011-11-02 中国电力科学研究院 Method for predicting theoretical line loss of power distribution network based on Boosting algorithm
CN102231144B (en) * 2011-06-03 2016-02-24 中国电力科学研究院 A kind of power distribution network method for predicting theoretical line loss based on Boosting algorithm
CN102270279A (en) * 2011-07-27 2011-12-07 华北电力大学 Short-term power load predicting method
CN104598981A (en) * 2013-10-31 2015-05-06 国际商业机器公司 Generating device, generating method and program
CN104102917A (en) * 2014-07-03 2014-10-15 中国石油大学(北京) Construction method of domain self-adaptive classifier, construction device for domain self-adaptive classifier, data classification method and data classification device
CN104102917B (en) * 2014-07-03 2017-05-10 中国石油大学(北京) Construction method of domain self-adaptive classifier, construction device for domain self-adaptive classifier, data classification method and data classification device
CN104462409B (en) * 2014-12-12 2017-08-25 重庆理工大学 Across language affection resources data identification method based on AdaBoost
CN104462409A (en) * 2014-12-12 2015-03-25 重庆理工大学 Cross-language emotional resource data identification method based on AdaBoost
CN107092878A (en) * 2017-04-13 2017-08-25 中国地质大学(武汉) It is a kind of based on hybrid classifer can autonomous learning multi-target detection method
WO2018219016A1 (en) * 2017-06-02 2018-12-06 腾讯科技(深圳)有限公司 Facial detection training method, apparatus and electronic device
US10929644B2 (en) 2017-06-02 2021-02-23 Tencent Technology (Shenzhen) Company Limited Face detection training method and apparatus, and electronic device
US11594070B2 (en) 2017-06-02 2023-02-28 Tencent Technology (Shenzhen) Company Limited Face detection training method and apparatus, and electronic device
CN107463798A (en) * 2017-08-02 2017-12-12 南京高新生物医药公共服务平台有限公司 Predict the 12 gene expressions classification device and its construction method of adenocarcinoma of colon prognosis
CN109945900A (en) * 2019-03-11 2019-06-28 南京智慧基础设施技术研究院有限公司 A kind of distributed optical fiber sensing method
CN109948680A (en) * 2019-03-11 2019-06-28 合肥工业大学 The classification method and system of medical record data
CN109948680B (en) * 2019-03-11 2021-06-11 合肥工业大学 Classification method and system for medical record data
CN112612897A (en) * 2020-12-30 2021-04-06 湖北大学 Wikipedia concept dependency relationship identification method
CN114202524A (en) * 2021-12-10 2022-03-18 中国人民解放军陆军特色医学中心 Performance evaluation method and system of multi-modal medical image
CN116226744A (en) * 2023-03-16 2023-06-06 中金同盛数字科技有限公司 User classification method, device and equipment

Also Published As

Publication number Publication date
CN100587708C (en) 2010-02-03

Similar Documents

Publication Publication Date Title
CN100587708C (en) Categorizer integration method
WO2022007376A1 (en) Multi-objective multimodal particle swarm optimization method based on bayesian adaptive resonance
CN105589806A (en) SMOTE+Boosting algorithm based software defect tendency prediction method
CN104361393A (en) Method for using improved neural network model based on particle swarm optimization for data prediction
CN102567742A (en) Automatic classification method of support vector machine based on selection of self-adapting kernel function
CN106919951A (en) A kind of Weakly supervised bilinearity deep learning method merged with vision based on click
CN105868775A (en) Imbalance sample classification method based on PSO (Particle Swarm Optimization) algorithm
Wu et al. Some analysis and research of the AdaBoost algorithm
CN107451651A (en) A kind of driving fatigue detection method of the H ELM based on particle group optimizing
Schinas et al. CERTH@ MediaEval 2012 Social Event Detection Task.
CN104809230A (en) Cigarette sensory quality evaluation method based on multi-classifier integration
CN105117770A (en) Surface cooler control method based on improved fuzzy neural network
CN113762387B (en) Multi-element load prediction method for data center station based on hybrid model prediction
CN104809502A (en) Dynamic adjusting method for nodes in hidden layers of restricted Boltzmann machines
Quek et al. A novel approach to the derivation of fuzzy membership functions using the Falcon-MART architecture
CN113991751A (en) Automatic power generation control scheduling method based on hybrid algorithm
CN109934351A (en) A kind of quantum learning aid algorithm and the modified fuzzy sliding mode controlling method based on quantum learning aid algorithm
CN107578101B (en) Data stream load prediction method
CN107273922A (en) A kind of screening sample and weighing computation method learnt towards multi-source instance migration
CN105184486A (en) Power grid business classification method based on directed acyclic graphs support vector machine
CN102103691A (en) Identification method for analyzing face based on principal component
CN107341479A (en) A kind of method for tracking target based on the sparse coordination model of weighting
Li et al. Deep learning for energy efficient beamforming in MU-MISO networks: A GAT-based approach
CN105550711A (en) Firefly algorithm based selective ensemble learning method
Zhang et al. Intelligent recognition of mixture control chart pattern based on quadratic feature extraction and SVM with AMPSO

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN RUIMING TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: HUAZHONG SCINECE AND TECHNOLOGY UNIV

Effective date: 20100804

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 430074 NO.1037, LUOYU ROAD, HONGSHAN DISTRICT, WUHAN CITY, HUBEI PROVINCE TO: 518057 5/F, BUILDING 3, SHENZHEN SOFTWARE PARK, KEJIZHONGER ROAD, NEW + HIGH-TECHNOLOGY ZONE, NANSHAN DISTRICT, SHENZHEN CITY, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20100804

Address after: 518057, five floor, building three, Shenzhen Software Park, two road, Nanshan District hi tech Zone, Shenzhen, Guangdong

Patentee after: Shenzhen Streaming Video Technology Co., Ltd.

Address before: 430074 Hubei Province, Wuhan city Hongshan District Luoyu Road No. 1037

Patentee before: Huazhong University of Science and Technology

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Categorizer integration method

Effective date of registration: 20130110

Granted publication date: 20100203

Pledgee: Shenzhen SME credit financing guarantee Group Co., Ltd.

Pledgor: Shenzhen Streaming Video Technology Co., Ltd.

Registration number: 2013990000024

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20140318

Granted publication date: 20100203

Pledgee: Shenzhen SME credit financing guarantee Group Co., Ltd.

Pledgor: Shenzhen Streaming Video Technology Co., Ltd.

Registration number: 2013990000024

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Categorizer integration method

Effective date of registration: 20140318

Granted publication date: 20100203

Pledgee: Shenzhen SME credit financing guarantee Group Co., Ltd.

Pledgor: Shenzhen Streaming Video Technology Co., Ltd.

Registration number: 2014990000174

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20150528

Granted publication date: 20100203

Pledgee: Shenzhen SME credit financing guarantee Group Co., Ltd.

Pledgor: Shenzhen Streaming Video Technology Co., Ltd.

Registration number: 2014990000174

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Categorizer integration method

Effective date of registration: 20150603

Granted publication date: 20100203

Pledgee: Shenzhen SME financing Company limited by guarantee

Pledgor: Shenzhen Streaming Video Technology Co., Ltd.

Registration number: 2015990000430

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
C56 Change in the name or address of the patentee

Owner name: SHENZHEN STREAMAX TECHNOLOGY CO., LTD.

Free format text: FORMER NAME: SHENZHEN RUIMING TECHNOLOGY CO., LTD.

CP03 Change of name, title or address

Address after: Nanshan District Xueyuan Road in Shenzhen city of Guangdong province 518000 No. 1001 Nanshan Chi Park building B1 building 21-23

Patentee after: STREAMAX TECHNOLOGY CO., LTD.

Address before: 518057, five floor, building three, Shenzhen Software Park, two road, Nanshan District hi tech Zone, Shenzhen, Guangdong

Patentee before: Shenzhen Streaming Video Technology Co., Ltd.

DD01 Delivery of document by public notice

Addressee: Chen Dan

Document name: Notification of Passing Examination on Formalities

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20160718

Granted publication date: 20100203

Pledgee: Shenzhen SME financing Company limited by guarantee

Pledgor: STREAMAX TECHNOLOGY CO., LTD.

Registration number: 2015990000430

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PM01 Change of the registration of the contract for pledge of patent right

Change date: 20160718

Registration number: 2015990000430

Pledgor after: STREAMAX TECHNOLOGY CO., LTD.

Pledgor before: Shenzhen Streaming Video Technology Co., Ltd.