CN106529579A - Improved construction method of robust AdaBoost classifier based on Ransac algorithm - Google Patents

Improved construction method of robust AdaBoost classifier based on Ransac algorithm

Info

Publication number
CN106529579A
CN106529579A
Authority
CN
China
Prior art keywords
model
sample
adaboost
classification
sorter model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610917782.4A
Other languages
Chinese (zh)
Inventor
曹万鹏 (Cao Wanpeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201610917782.4A
Publication of CN106529579A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an improved construction method for a robust AdaBoost classifier based on the RANSAC algorithm, comprising: initializing the parameter configuration, randomly selecting a fixed number of samples, adding the samples that fit the model, updating the classifier model, iterating to construct the next classifier model, calculating the classification accuracy, and finally selecting a classification model. According to the invention, the RANSAC algorithm is introduced into the AdaBoost classifier model construction process, and correct (inlier) samples are searched for through iterative model building. At the same time, the best classification model is selected from among classifier models built entirely with the AdaBoost algorithm. Through this strategy, the invention realizes the construction of a robust AdaBoost classifier based on the RANSAC algorithm, and the design of the resulting classifier model is completely unaffected by outliers. Finally, an AdaBoost classifier model built with this method is used to verify handwriting samples containing a certain proportion of outliers. The experimental results show that the proposed method outperforms two other AdaBoost classifier construction methods and achieves higher classification accuracy.

Description

An improved robust AdaBoost classifier construction method based on the RANSAC algorithm
Technical field
The present invention relates to pattern recognition methods, and more particularly to an improved robust AdaBoost classifier construction method based on the RANSAC algorithm.
Background art
A classification algorithm selects, on the basis of a classifier model, the optimal class hypothesis for a sample under test from the available candidates. It belongs to the machine-learning branch of artificial intelligence and has attracted great attention from researchers in the field. Substantial time and effort has been invested in classification algorithms such as C4.5, support vector machines, Bayesian algorithms, AdaBoost and k-nearest-neighbour classification, and in applying them to fields as different as face recognition, handwriting verification, data analysis and medical applications.
The word AdaBoost is an abbreviation of Adaptive Boosting; it is a machine-learning meta-algorithm proposed by Yoav Freund and Robert Schapire. Its guiding design principle is to guarantee the highest classification accuracy on the current training samples. By reasonably combining different weak classifiers (a weak classifier here being one whose accuracy is only slightly better than random guessing), a strong classifier is formed; although the accuracy of each weak classifier is not high, the final strong classifier gains enormously in classification performance. AdaBoost is adaptive in the sense that, by adjusting the weights of the samples misclassified by the preceding weak classifiers, it raises the attention that subsequent weak classifiers pay to those samples, and in this way completes the design of the classifier model. On this basis, an appropriately designed group of weak classifiers can be combined into a strong classifier that achieves a satisfactory overall accuracy. However, as everything has two sides, AdaBoost, despite its many advantages, is sensitive to outliers, and in some cases this is enough to degrade or even wreck the overall performance of the classifier. The reason is that misclassified samples are weighted again and again, and outliers in particular are continually up-weighted, so that their weights grow rapidly. Excessive outlier weights make the algorithm drift steadily towards the outliers and away from the majority of normal samples, which inevitably degrades the designed classifier model.
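For reference, the weight update of the standard discrete AdaBoost algorithm, which produces exactly this behaviour (stated here from the well-known algorithm rather than from the patent text): a weak classifier h_t with weighted error ε_t receives the vote α_t, and each sample weight is scaled and renormalised according to whether the sample was classified correctly:

```latex
\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\qquad
w_i^{(t+1)} = \frac{w_i^{(t)}\, e^{-\alpha_t\, y_i\, h_t(x_i)}}{Z_t},
\qquad y_i,\ h_t(x_i) \in \{-1, +1\}
```

A sample that is misclassified in every round, as an outlier typically is, is multiplied by e^{α_t} > 1 each time, so its weight grows exponentially relative to the normal samples.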
Finding an effective means of limiting the sustained growth of outlier weights is a problem that urgently needs solving when designing classifier models with the AdaBoost algorithm. In recent years many different methods have been proposed to suppress the ceaseless inflation of the sample weights; most are confined to adjusting the sample weights, and either the algorithm is too simple and does not distinguish outliers from normal samples, or it is complicated and increases the difficulty of operation. Moreover, an improperly set threshold causes inliers and outliers to be misjudged, which hinders correct classification and inevitably degrades the classifier. For this reason, an effective means of removing the adverse influence of outliers from the construction of the classification model must be found. In this patent, the RANSAC algorithm is combined with the AdaBoost algorithm to solve the model-degradation problem that outliers cause during classifier training.
Summary of the invention
In view of the defects of the traditional AdaBoost algorithm and the shortcomings of threshold-setting methods, an improved robust AdaBoost classifier construction method based on the RANSAC algorithm is proposed. Unlike other AdaBoost algorithms, which use the simple means of sample weighting and weight control during weak-classifier construction, this algorithm introduces the RANSAC algorithm into the AdaBoost classifier model construction process and searches for correct samples through iterative model building, removing potential outliers and effectively overcoming the drawbacks of the sample-weighting methods used in existing AdaBoost algorithms.
Meanwhile, by virtue of the strong outlier-removal ability of the RANSAC algorithm, the optimal classification model is chosen from among classifier models built entirely with the AdaBoost algorithm; not only is there no need to consider how to weight the samples, but the classifier model degradation caused by outliers is effectively eliminated. Through the above strategy, the construction of a robust AdaBoost classifier based on the RANSAC algorithm is finally realised, and the design of the obtained classifier model is completely unaffected by outliers.
An improved robust AdaBoost classifier construction method based on the RANSAC algorithm comprises the following steps:
(1) Initial parameter setting
All parameters of the algorithm are initialised: the maximum number of classification models is set to Nmax; the index of the classifier model currently being built is i, initialised to i = 1; the maximum number of iterations for building each classifier model is set to Tmax; the iteration counter for each classifier model is j, initialised to j = 1; and the number of samples used to initially construct each classifier is set to M.
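As a minimal sketch, the parameters of step (1) can be collected in a small configuration object; the field names below are hypothetical and merely mirror Nmax, Tmax and M. Python with scikit-learn is assumed for all sketches in this description.

```python
from dataclasses import dataclass

@dataclass
class RansacAdaBoostConfig:
    n_models_max: int = 100  # Nmax: maximum number of candidate classifier models
    t_max: int = 100         # Tmax: maximum refinement iterations per model
    m_init: int = 50         # M: samples drawn for each initial classifier
```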
(2) Random selection of a fixed number of samples
M samples are selected at random from the sample set.
(3) Construction of an initial AdaBoost classifier model
Based on the selected samples, a strong classifier is trained with the AdaBoost algorithm, i.e. a classifier model that best fits the current M samples.
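A sketch of steps (2)-(3), under the assumption that the strong classifier is scikit-learn's AdaBoostClassifier with its default decision-stump weak learners; the function name and the estimator count are illustrative, not the patent's reference implementation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def fit_initial_model(X, y, m, rng):
    """Steps (2)-(3): draw M random samples and train an AdaBoost
    strong classifier that best fits this subset."""
    idx = rng.choice(len(X), size=m, replace=False)  # random M-sample subset
    model = AdaBoostClassifier(n_estimators=50)
    return model.fit(X[idx], y[idx])
```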
(4) Classifier model update
The classifier model Ci built by the AdaBoost algorithm classifies the remaining samples; all samples correctly classified by the model are then used to build classifier model Ci again on the basis of the AdaBoost algorithm; the new classifier is used to judge afresh the full set of samples that fit the new classifier model, and j is updated to j + 1. This classification-and-retraining process is repeated until the number of samples fitting classifier Ci no longer changes or the number of iterations exceeds Tmax.
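A sketch of the consensus refinement of step (4), assuming that a sample "fits" the model when it is classified correctly; it reuses the imports and conventions of the previous sketch.

```python
def refine_model(model, X, y, t_max):
    """Step (4): refit on the consensus set (all samples the current model
    classifies correctly) until that set stops changing or t_max is reached."""
    consensus = np.flatnonzero(model.predict(X) == y)
    for _ in range(t_max):
        if len(consensus) == 0:  # degenerate candidate; give up on this model
            break
        model = AdaBoostClassifier(n_estimators=50).fit(X[consensus], y[consensus])
        new_consensus = np.flatnonzero(model.predict(X) == y)
        if np.array_equal(new_consensus, consensus):  # consensus set stabilised
            break
        consensus = new_consensus
    return model, consensus
```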
(5) Construction of the next classifier model
i is updated to i + 1; whether the number of classifier models built exceeds Nmax is judged, and if not the method returns to step (2).
(6) Calculation of classification accuracy
The sample classification accuracy of every classifier model is calculated.
(7) Selection of the final classification model
The classification accuracies of all classifier models are compared, and the classifier model with the highest classification accuracy is determined to be the finally selected classifier model.
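Putting steps (1)-(7) together, a minimal end-to-end sketch using the hypothetical helpers defined above, with accuracy over the full sample set as the selection criterion of steps (6)-(7):

```python
def build_robust_adaboost(X, y, cfg, seed=0):
    """Outer RANSAC-style loop: build Nmax candidate AdaBoost models and
    return the one with the highest accuracy on the whole sample set."""
    rng = np.random.default_rng(seed)
    best_model, best_acc = None, -1.0
    for _ in range(cfg.n_models_max):                     # step (5)
        model = fit_initial_model(X, y, cfg.m_init, rng)  # steps (2)-(3)
        model, _ = refine_model(model, X, y, cfg.t_max)   # step (4)
        acc = float((model.predict(X) == y).mean())       # step (6)
        if acc > best_acc:                                # step (7)
            best_model, best_acc = model, acc
    return best_model, best_acc
```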
Compared with the prior art, the present invention has the following obvious advantages and beneficial effects:
(1) The improved robust AdaBoost classifier construction method based on the RANSAC algorithm can effectively remove outlier interference; not only does it dispense with setting a sample-weighting threshold, it also avoids the misjudgement of inliers and outliers caused by an improperly set weighting threshold.
(2) Thanks to the RANSAC algorithm, the invention can estimate the classifier model parameters robustly and without influence from outliers even when the samples contain a large number of them, realising accurate construction of the classification model. To validate the classification algorithm, the method of the invention was applied to a handwriting-verification test in which the writer of a handwriting sample is identified; even when 15% of the samples were outliers, the classification accuracy of the present algorithm still reached 95.67%.
Description of the drawings
Fig. 1 is a functional block diagram of the improved robust AdaBoost classifier construction method based on the RANSAC algorithm proposed by the invention;
Fig. 2 is a flow chart of the method of the invention.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings and a specific embodiment.
As shown in Figs. 1 and 2, the invention discloses an improved robust AdaBoost classifier construction method based on the RANSAC algorithm, comprising the following steps:
Step 1. Initial parameter setting
All parameters of the algorithm are initialised: the maximum number of classification models is set to 100; the index of the classifier model currently being built is i, initialised to i = 1; the maximum number of iterations for building each classifier model is set to 100; the iteration counter for each classifier model is j, initialised to j = 1; and the number of samples used to initially construct each classifier is set to 50. In this experiment, 200 handwriting samples from different writers were drawn at random from the HIT-MW sample library for each run as required, and part of the handwriting samples were deliberately mislabelled in a certain proportion so as to generate outliers artificially.
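A sketch of the artificial outlier generation described above: a chosen fraction of the handwriting labels is flipped. The binary {0, 1} labelling and the function name are assumptions for illustration.

```python
import numpy as np

def inject_label_noise(y, fraction, rng):
    """Mislabel a given fraction of samples to create artificial outliers,
    mirroring the preparation of the HIT-MW handwriting data."""
    y_noisy = y.copy()
    flip_idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]  # assumes binary labels {0, 1}
    return y_noisy
```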
Step 2. Random selection of a fixed number of samples
50 initial samples are selected at random from the handwriting sample set. Based on these samples, handwriting microstructure features are extracted from the current handwriting samples to serve as the sample features of the weak classifiers in the AdaBoost algorithm, and the weak classifiers are built with SVMs. To verify the resistance of the present AdaBoost algorithm and of other AdaBoost algorithms to outlier interference, the experiment was repeated many times and the average classification accuracy of each algorithm under different outlier proportions was obtained.
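The embodiment builds its weak classifiers from SVMs on handwriting microstructure features. One plausible realisation in scikit-learn (version 1.2 or later, where the parameter is named estimator) is to boost a linear SVC with the SAMME variant, since an SVC yields hard decisions rather than probabilities; the kernel and estimator count are illustrative, and the microstructure feature extraction itself is outside the scope of this sketch.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# SVM weak classifiers boosted with AdaBoost; SAMME is chosen because
# SVC produces hard class decisions rather than class probabilities.
svm_adaboost = AdaBoostClassifier(
    estimator=SVC(kernel="linear"),
    n_estimators=20,
    algorithm="SAMME",
)
# svm_adaboost.fit(features_of_50_samples, labels_of_50_samples)  # hypothetical data
```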
Step 3. Construction of an initial AdaBoost classifier model
Based on the randomly selected handwriting samples and their corresponding handwriting microstructure features, weak classifiers are built with the SVM algorithm, and at the same time a strong classifier model is obtained by AdaBoost training, so that the classifier model best fits the judgement of the current 50 handwriting samples.
Step 4. Classifier model update
The classifier model Ci built on the AdaBoost algorithm classifies the remaining samples; all samples correctly classified by the model are used to build classifier model Ci again with the AdaBoost algorithm; the new classifier is used to judge afresh the full set of samples fitting the new classifier model, and j is updated to j + 1. Steps 3 to 4 are repeated until the number of samples fitting classifier Ci no longer changes or the number of iterations exceeds 100.
Step 5. Construction of the next classifier model
i is updated to i + 1; whether the number of classifier models built exceeds 100 is judged, and if not the method returns to step 2.
Step 6. Calculation of classification accuracy
For every classifier model built with the algorithm of this patent, the sample classification accuracy over all handwriting samples in the handwriting sample library is calculated: Accuracy = (number of correctly classified handwriting samples) / (total number of handwriting samples).
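Step 6's accuracy is the plain fraction of correct decisions; as a one-line sketch with hypothetical array names:

```python
accuracy = float((model.predict(X_all) == y_all).mean())  # correct / total samples
```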
Step 7. Selection of the final classification model
The classification accuracies of all classifier models are compared, and the classifier model with the highest classification accuracy Amax is confirmed as the finally selected classifier model.
To verify the effectiveness of the invention, the algorithm was applied to a handwriting-verification experiment and compared with two other methods; the accuracies of the three classification methods are shown in Table 1.
Table 1. Comparison of the method of the invention with two other recognition methods
The data in Table 1 show the classification accuracies of the improved AdaBoost classification method proposed here and of two other AdaBoost methods. The three methods are: the present method, the ordinary AdaBoost algorithm without weight control, and an AdaBoost method with fixed-threshold weight control. As can be seen from Table 1, when the samples contain no outliers or only a few, the accuracies of the three methods are very close. But as the proportion of outliers in the samples increases, the accuracies of the other two methods suffer a large negative impact and decrease monotonically. Compared with those two methods, the present algorithm, thanks to RANSAC's good rejection of outlier interference, keeps a stable accuracy as the outlier proportion keeps rising, giving it an obvious advantage; even when 15% of the samples were outliers, its accuracy still reached 95.67%.

Claims (1)

1. An improved robust AdaBoost classifier construction method based on the RANSAC algorithm, characterised by comprising the following steps:
Step (1) Initial parameter setting
All parameters of the algorithm are initialised: the maximum number of classification models is set to Nmax; the index of the classifier model currently being built is i, initialised to i = 1; the maximum number of iterations for building each classifier model is set to Tmax; the iteration counter for each classifier model is j, initialised to j = 1; and the number of samples used to initially construct each classifier is set to M;
Step (2) Random selection of a fixed number of samples
M samples are selected at random from the sample set;
Step (3) Construction of an initial AdaBoost classifier model
Based on the selected samples, a strong classifier is trained with the AdaBoost algorithm, i.e. a classifier model that best fits the current M samples;
Step (4) Classifier model update
The classifier model Ci built by the AdaBoost algorithm classifies the remaining samples; all samples correctly classified by the model are used to build classifier model Ci again on the basis of the AdaBoost algorithm; the new classifier is used to judge afresh the full set of samples fitting the new classifier model, and j is updated to j + 1; this step is repeated until the number of samples fitting classifier Ci no longer changes or the number of iterations exceeds Tmax;
Step (5) Construction of the next classifier model
i is updated to i + 1; whether the number of classifier models built exceeds Nmax is judged, and if not the method returns to step (2);
Step (6) Calculation of classification accuracy
The sample classification accuracy of every classifier model is calculated;
Step (7) Selection of the final classification model
The classification accuracies of all classifier models are compared, and the classifier model with the highest classification accuracy is determined to be the finally selected classifier model.
CN201610917782.4A 2016-10-20 2016-10-20 Improved construction method of robust AdaBoost classifier based on Ransac algorithm Pending CN106529579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610917782.4A CN106529579A (en) 2016-10-20 2016-10-20 Improved construction method of robust AdaBoost classifier based on Ransac algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610917782.4A CN106529579A (en) 2016-10-20 2016-10-20 Improved construction method of robust AdaBoost classifier based on Ransac algorithm

Publications (1)

Publication Number Publication Date
CN106529579A 2017-03-22

Family

ID=58332882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610917782.4A Pending CN106529579A (en) 2016-10-20 2016-10-20 Improved construction method of robust AdaBoost classifier based on Ransac algorithm

Country Status (1)

Country Link
CN (1) CN106529579A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814880A (en) * 2020-07-10 2020-10-23 北京航空航天大学 Fatigue prediction method based on Bayesian optimization XGboost algorithm


Similar Documents

Publication Publication Date Title
CN109948658B (en) Feature diagram attention mechanism-oriented anti-attack defense method and application
CN107506799B (en) Deep neural network-based mining and expanding method and device for categories of development
CN104778448B (en) A kind of face identification method based on structure adaptive convolutional neural networks
CN111507469B (en) Method and device for optimizing super parameters of automatic labeling device
CN107578061A (en) Based on the imbalanced data classification issue method for minimizing loss study
CN105205448A (en) Character recognition model training method based on deep learning and recognition method thereof
CN102567742A (en) Automatic classification method of support vector machine based on selection of self-adapting kernel function
CN107943856A (en) A kind of file classification method and system based on expansion marker samples
EP1906369A1 (en) Real-time method of determining eye closure state using off-line adaboost-over-genetic programming
CN110929848A (en) Training and tracking method based on multi-challenge perception learning model
CN111047054A (en) Two-stage countermeasure knowledge migration-based countermeasure sample defense method
CN101710382A (en) Gabor human face recognizing method based on simplified intelligent single-particle optimizing algorithm
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN112597993A (en) Confrontation defense model training method based on patch detection
Yang et al. Chaotic maps in binary particle swarm optimization for feature selection
De Almeida et al. Handling concept drifts using dynamic selection of classifiers
CN117940936A (en) Method and apparatus for evaluating robustness against
CN109842614B (en) Network intrusion detection method based on data mining
CN114137967B (en) Driving behavior decision method based on multi-network joint learning
CN107273922A (en) A kind of screening sample and weighing computation method learnt towards multi-source instance migration
CN104537385A (en) Method for evaluating DAGSVM classification accuracy
CN107832722B (en) Face detection classifier construction method based on AdaBoost
JP2020204909A (en) Machine learning device
CN106529579A (en) Improved construction method of robust AdaBoost classifier based on Ransac algorithm
EP1480167A1 (en) Pattern feature selection method, classification method, judgment method, program, and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322