CN102521599A - Mode training method based on ensemble learning and mode identifying method - Google Patents

Mode training method based on ensemble learning and mode identifying method

Info

Publication number
CN102521599A
CN102521599A (application CN201110303362A / CN2011103033624A)
Authority
CN
China
Prior art keywords
training
sparse
submodel
sample
sparse coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103033624A
Other languages
Chinese (zh)
Inventor
唐胜 (Sheng Tang)
韩淇 (Qi Han)
张勇东 (Yong-Dong Zhang)
李锦涛 (Jin-Tao Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN2011103033624A priority Critical patent/CN102521599A/en
Publication of CN102521599A publication Critical patent/CN102521599A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a pattern training method based on ensemble learning and a pattern identification method. The pattern training method comprises the following steps: 1) performing dictionary learning on training samples to generate a redundant dictionary; 2) using the redundant dictionary to sparsely code the training samples, obtaining the sparse coding coefficients of each training sample; 3) partitioning all the training samples into sparse subspaces according to the sparse coding coefficients; and 4) training a submodel on the training samples in each sparse subspace to obtain submodels for classification. The pattern training method and the pattern identification method provided by the invention achieve higher recognition performance while markedly improving both training efficiency and detection efficiency.

Description

A pattern training and recognition method based on ensemble learning
Technical field
The present invention relates to the field of intelligent systems, and more particularly to the fields of pattern recognition and machine learning.
Background technology
For very high-dimensional data, the detection models that classical methods train on small datasets cannot cover the variety of possible samples, so they generalize poorly on open datasets and their detection precision is low. This is especially true for the rapidly growing image/video data on the Internet, which is not only high-dimensional but also broad in coverage, diverse in content and fast-changing. There is therefore an urgent need to study pattern learning methods oriented toward large-scale training datasets, so as to cover as many possible samples as feasible and improve the detection precision of algorithms on open multimedia datasets.
The 2009 article "Monocular Pedestrian Detection: Survey and Experiments" by M. Enzweiler and D. M. Gavrila, published on pages 2179-2195 of IEEE Transactions on Pattern Analysis and Machine Intelligence, showed through a study of pedestrian detection in images that the benefit gained by choosing the best combination of features and pattern classifiers is far less pronounced than the benefit gained by enlarging the training sample set. This again demonstrates the necessity of research on pattern learning methods for large-scale training datasets.
Large-scale training datasets, however, pose new challenges to traditional pattern learning methods: (1) the pattern categories are diverse and the intra-class distances within the same pattern are large, which lowers detection precision; (2) the increase in the number of training samples makes the optimal decision surface very complicated, which slows down detection; (3) since the time complexity of model training is usually between O(n²) and O(n³), where n is the number of training samples, and considering the limits of physical memory, training a single model on the whole training set becomes barely tolerable or even practically impossible once the number of training samples grows to hundreds of thousands or millions.
For a large-scale training sample set, training and detection slow down significantly simply because there are so many samples. To improve efficiency, ensemble learning methods adopt a divide-and-conquer strategy: the large training set is split into different subsets according to some partitioning strategy, and a corresponding submodel is trained on each subset; at detection time, the scores a detection sample obtains on the individual submodels are fused in some way into an integrated score, from which a unified decision is made. Results from the leading international conference Advances in Neural Information Processing Systems in 1995 and 1996, as well as more recent work, show that a good ensemble classifier is more effective than any single classifier: the knowledge of the member classifiers is complementary and their decisions are independent, so the errors of individual classifiers do not propagate unchecked through the ensemble. Moreover, since the number of samples in each subset is much smaller than the total number of training samples, ensemble learning sharply reduces memory cost and improves training efficiency; and because the optimal decision surface on a subset is simpler, detection efficiency improves as well. For example, if a large-scale training dataset is divided into k subsets and a support vector machine (SVM) is trained on each subset, the number of training samples per subset drops to n/k, so the SVM training time complexity on a single subset is only between O(n²/k²) and O(n³/k³), and the total training time complexity of the SVM models for all k subsets is between O(n²/k) and O(n³/k²). Compared with training a single SVM model, training efficiency improves by a factor of k to k². And because the number of training samples on each subset is reduced, the number of support vectors of each subset's SVM model is also reduced, which improves detection speed at the same time.
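The k-fold to k²-fold training saving described above can be checked with a quick worked example; the numbers below (n, k, and a purely cubic training cost) are illustrative assumptions, not figures from the patent.

```python
# Assume the O(n^3) end of the SVM training-cost range and k equal-size subsets.
n = 100_000   # total number of training samples (illustrative)
k = 20        # number of subsets (illustrative)

single_model_cost = n ** 3            # training one SVM on all n samples
per_subset_cost = (n // k) ** 3       # one SVM on n/k samples: O(n^3 / k^3)
ensemble_cost = k * per_subset_cost   # all k submodels together: O(n^3 / k^2)

speedup = single_model_cost / ensemble_cost
print(speedup)  # → 400.0, i.e. k**2, the upper end of the stated k..k^2 range
```

At the O(n²) end of the range the same arithmetic gives a speedup of exactly k.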
Although ensemble learning methods differ, their key differences lie in the subset-partitioning strategy used during training and the fusion method used during detection. The earliest ensemble learning methods include the Bagging method, which partitions subsets randomly, and the Boosting and AdaBoost methods. Fusion across sub-classifier results is mainly done by averaging, i.e., the integrated score of a detection sample is the mean of its scores over all sub-classifiers. Because average fusion requires every submodel to participate in the detection of a sample, it is difficult to improve detection speed further. The patent application "A digital image training and detection method" (application number 200910092710.0, filed on October 9, 2009) describes the following digital image training and detection method: the training sample set is first subjected to cluster analysis and divided into multiple subsets; an SVM submodel is trained on each subset; at detection time, the detection results on the multiple SVM submodels are fused according to the detection sample's weight coefficient on each subset (a coefficient describing the degree to which the sample belongs to the subset) to determine the detection result. However, these weight coefficients are not guaranteed to be sparse (sparsity meaning that only a few coefficients are nonzero), so the efficiency of training and fusion still awaits further improvement.
Summary of the invention
The object of the invention is to provide a pattern training and recognition method based on ensemble learning, so as to improve both the speed and the accuracy of pattern training and recognition.
According to one aspect of the invention, a pattern training method based on ensemble learning is provided, comprising:
1) performing dictionary learning on training samples to generate a redundant dictionary;
2) sparsely coding said training samples with said redundant dictionary to obtain the sparse coding coefficients of each training sample;
3) dividing all training samples into sparse subspaces according to said sparse coding coefficients;
4) training a submodel on the training samples in each sparse subspace to obtain submodels for classification.
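As a rough illustration of steps 1) to 4), the following NumPy sketch runs the four stages end to end on toy data. The dictionary builder, the sparse-coding routine and the "submodel" here are deliberately simplified stand-ins (the patent's preferred embodiment uses Sparse NMF, LARS-Lasso and SVMs), and all sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(40, 10)))   # 40 training samples, 10-dim features

# 1) Dictionary learning: here a crude nonnegative dictionary of k sampled rows.
k = 5
D = X[rng.choice(len(X), size=k, replace=False)]   # k x d "redundant dictionary"

# 2) Sparse coding: least squares, clipped to alpha >= 0, then hard-thresholded
#    to a few nonzeros (a stand-in for the LARS-Lasso solver of the embodiment).
def sparse_code(x, D, n_nonzero=2):
    coef, *_ = np.linalg.lstsq(D.T, x, rcond=None)
    coef = np.clip(coef, 0, None)                  # enforce nonnegativity
    keep = np.argsort(coef)[-n_nonzero:]           # keep the largest entries
    sparse = np.zeros_like(coef)
    sparse[keep] = coef[keep]
    return sparse

alpha = np.array([sparse_code(x, D) for x in X])   # N x k coefficient matrix

# 3) Sparse subspace division: sample i joins subspace j iff alpha[i, j] != 0.
subspaces = {j: np.flatnonzero(alpha[:, j]) for j in range(k)}

# 4) One submodel per non-empty subspace (a mean-vector stub, not a real SVM).
submodels = {j: X[idx].mean(axis=0) for j, idx in subspaces.items() if len(idx)}
```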
According to a further aspect of the invention, a pattern recognition method using the above pattern training method is also provided, comprising:
1) sparsely coding a detection sample with said redundant dictionary to obtain the sparse coding coefficients of the detection sample;
2) selecting submodels according to said sparse coding coefficients, and recognizing the detection sample with the selected submodels;
3) fusing the recognition results of the selected submodels to recognize said detection sample.
Compared with existing methods, the invention achieves higher recognition performance while markedly improving training efficiency and detection efficiency.
Description of drawings
Fig. 1 is a flowchart of pattern training and detection in accordance with a preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of sparse subspace division based on Sparse NMF in accordance with a preferred embodiment of the present invention;
Fig. 3 is a schematic AP comparison between the sparse subspace division of a preferred embodiment of the present invention and the prior art;
Fig. 4.1.a-Fig. 4.4.b are schematic diagrams of the sparse subspace division effect on the TRECVID 2008 training dataset in accordance with a preferred embodiment of the present invention;
Fig. 5 is a schematic AP comparison between the sparse coding fusion of another preferred embodiment of the present invention and the prior art;
Fig. 6 is a schematic AP comparison between the digital image test experiment of another preferred embodiment of the present invention and the prior art.
Embodiment
To make the objects, technical solutions and advantages of the invention clearer, the pattern training and recognition method based on ensemble learning according to embodiments of the invention is further explained below with reference to the accompanying drawings. It should be appreciated that the specific embodiments described herein only illustrate the invention and are not intended to limit it.
The present invention makes full use of the flexibility and the sparsity of sparse coding for pattern training and recognition; sparse coding is briefly introduced below.
Signal sparse coding (or sparse representation) over a redundant dictionary is a new signal-representation theory. It replaces traditional orthogonal basis functions, such as data-independent fixed orthogonal wavelet bases, with an overcomplete redundant function system (the redundant dictionary), and seeks to express the main information of a signal as a linear combination of as few nonzero coefficients as possible over the basis vectors (sparse bases) of the redundant dictionary, thereby simplifying the solution of signal-processing problems. Compared with a fixed orthogonal basis, sparse representation expands a signal adaptively and sparsely, which provides great flexibility: the sparse expansion both enables efficient data compression and, more importantly, exploits the redundancy of the dictionary to capture the physical features of the original signal. In recent years, sparse representation has therefore been applied to image/video denoising, image classification, face recognition and other computer vision and pattern recognition problems, and has become a research focus.
Taking image detection as an example, the concrete implementation of the ensemble-learning-based pattern training and recognition method of the invention is explained below with reference to Fig. 1.
As shown in Fig. 1, the training process mainly comprises the steps of dictionary learning, sparse coding, sparse subspace division and submodel training.
(1) Dictionary learning: dictionary learning is performed on the large-scale training dataset to generate a redundant dictionary, i.e., a basis matrix, adapted to the training data. In the preferred embodiments of the invention, dictionary learning adopts sparse non-negative matrix factorization (Sparse NMF).
Non-negative matrix factorization (NMF), because it uses only additive combinations and no subtraction, has stronger local expressive power than globally based representations such as principal component analysis (PCA) and vector quantization (VQ). Published results show that imposing sparsity constraints on the basis matrix or the coefficient matrix of NMF yields subspaces with even stronger local expressive power. The invention therefore imposes a sparsity constraint on NMF, i.e., adopts sparse NMF, to form sparse subspaces with strong local expressive power, which are used to partition the training set and thereby improve detection precision.
Let X = [x₁ x₂ … x_N]ᵀ denote the data matrix formed from N images, where x_i is the feature vector of the i-th image. Those skilled in the art will appreciate that, because raw image data is very high-dimensional, the image data is preferably not used directly; instead, features are first extracted from each image and the data matrix is formed from these features. According to a preferred embodiment of the invention, a 500-dimensional SIFT-based visual keyword feature is used, taken from the visual keyword features published on the Internet by Columbia University.
Similarly to X, let D = [d₁ d₂ … d_k]ᵀ denote the basis matrix formed from k basis images, e.g., k = 800; this basis matrix D is the redundant dictionary. Let α = [α₁ α₂ … α_N]ᵀ be the sparse coefficient matrix. The NMF decomposition of the training sample images can then be expressed as:
X ≈ αD,  s.t. D > 0, α > 0    (1)
The solution of D and α can then be cast as the optimization problem of minimizing the reconstruction error:
min_{D,α} (1/2)||X − αD||² = min_{D,α_i} Σ_{i=1..N} (1/2)||x_i − α_iD||²    (2)
In general, an image is superposed from only a few basis images, i.e., the number of nonzero entries in the coefficient vector α_i is very small. The l0 norm ||α_i||₀ can therefore be added to formula (2) as a penalty term, so that the solution of Sparse NMF becomes the following optimization problem:
min_{D,α_i} Σ_{i=1..N} ( (1/2)||x_i − α_iD||² + λ||α_i||₀ ),  s.t. D > 0, ∀α_i > 0    (3)
Because solving the l0-norm problem is NP-hard, it is usually relaxed into an l1-norm problem:
min_{D,α_i} Σ_{i=1..N} ( (1/2)||x_i − α_iD||² + λ||α_i||₁ ),  s.t. D > 0, ∀α_i > 0    (4)
Using formula (4), the redundant dictionary can preferably be computed with the online learning method (Online Dictionary Learning) disclosed in the article "Online learning for matrix factorization and sparse coding" by Julien Mairal, Francis Bach, Jean Ponce and Guillermo Sapiro. The advantage of this online learning method is its speed: it can batch-process massive data and can also handle dynamic data in real time. Those skilled in the art will appreciate that, besides the above online learning method, offline learning methods can also be adopted, for example the MOD method disclosed in "Method of optimal directions for frame design" by K. Engan, S. O. Aase and J. Hakon Husoy (ICASSP '99, IEEE Int. Conf. Acoust., Speech, Signal Process.), or the K-SVD method disclosed in "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation" by M. Aharon, M. Elad and A. Bruckstein (IEEE Transactions on Signal Processing, 2006).
In the above sparse NMF solution, the dimension of α_i is large (800 dimensions), so it is preferable to limit the number of nonzero elements in α_i; further preferably, the number of nonzero elements in α_i is limited to at most 20, and if it exceeds 20, the smaller nonzero elements are set to 0.
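A compact way to experiment with the sparse NMF of formulas (1)-(4) is the classic multiplicative-update scheme with an l1 penalty on the coefficients. The NumPy sketch below is generic (sizes are scaled far down from k = 800, and the textbook update rule stands in for whichever solver the patent actually uses), followed by the nonzero-count truncation described above (capped at 3 here instead of 20).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(30, 12)))   # nonnegative data: 30 samples, 12 dims

N, d = X.shape
k, lam, eps = 8, 0.1, 1e-9              # k = 800 in the text; scaled down here
alpha = np.abs(rng.normal(size=(N, k)))
D = np.abs(rng.normal(size=(k, d)))

# Multiplicative updates for min (1/2)||X - alpha @ D||^2 + lam * sum(alpha);
# ratios of nonnegative terms keep alpha and D nonnegative throughout.
for _ in range(200):
    alpha *= (X @ D.T) / (alpha @ D @ D.T + lam + eps)
    D *= (alpha.T @ X) / (alpha.T @ alpha @ D + eps)

# Limit each alpha_i to its largest nonzero entries, zeroing the smaller ones.
def truncate(row, n_nonzero=3):
    out = np.zeros_like(row)
    keep = np.argsort(row)[-n_nonzero:]
    out[keep] = row[keep]
    return out

alpha = np.array([truncate(a) for a in alpha])
```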
(2) Sparse coding: the above redundant dictionary D is used to sparsely code the training sample data, yielding the sparse coding coefficients of the i-th training sample, α_i = {α_i1, α_i2, …, α_ik}, where k is the number of sparse bases and α_ij is the coefficient of the i-th training sample on the j-th sparse basis. Sparse coding can be solved with various soft-thresholding methods or with the LARS-Lasso method.
According to a preferred embodiment of the invention, the sparse coding coefficients of the training samples are solved with the LARS-Lasso method. Specifically, once the solution of the redundant dictionary D is completed, i.e., with D known, formula (4) reduces to a Lasso problem, which can be solved efficiently by the LARS-Lasso method.
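The description allows either soft-thresholding or LARS-Lasso for this step. Below is a minimal projected ISTA (iterative soft-thresholding) solver for the nonnegative Lasso that formula (4) reduces to once D is fixed; it is an illustrative stand-in, with an invented dictionary and sample, rather than the LARS-Lasso solver of the preferred embodiment.

```python
import numpy as np

rng = np.random.default_rng(1)
D = np.abs(rng.normal(size=(6, 10)))   # k = 6 dictionary atoms, 10-dim features
x = 2.0 * D[1] + 0.5 * D[4]            # a sample composed of two known atoms

# Projected ISTA for min_a (1/2)||x - a @ D||^2 + lam * sum(a), a >= 0:
# gradient step, soft-threshold by lam * t, project onto a >= 0. For a
# nonnegative solution the last two steps collapse into one max(0, .) below.
lam = 0.05
t = 1.0 / np.linalg.norm(D @ D.T, 2)   # step size from the Lipschitz constant
a = np.zeros(6)
for _ in range(500):
    grad = (a @ D - x) @ D.T
    a = np.maximum(a - t * grad - lam * t, 0.0)

residual = np.linalg.norm(x - a @ D)   # small: x lies in the dictionary's span
```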
(3) Sparse subspace division: the training sample set is divided into multiple sparse subspaces according to the sparse coding coefficients: if α_ij ≠ 0, the current sample is assigned to the sparse subspace represented by the j-th sparse basis. By the nature of sparse coding, very few elements of each coefficient vector are nonzero, so the number of subspaces each training sample belongs to is very small; this reduces the redundancy of the subspace division and improves the training efficiency of the model on each subspace.
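The division rule is simple enough to state in a few lines; the coefficient matrix below is invented for illustration.

```python
import numpy as np

# Sparse coefficients of 5 training samples over k = 4 sparse bases
# (illustrative values; row i is alpha_i from the sparse-coding step).
alpha = np.array([
    [0.9, 0.0, 0.3, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.5, 0.0],
    [0.6, 0.1, 0.0, 0.0],
])

# Sample i is assigned to subspace j whenever alpha[i, j] != 0, so each
# sample lands in only as many subspaces as it has nonzero coefficients.
subspaces = {j: list(np.flatnonzero(alpha[:, j])) for j in range(alpha.shape[1])}
# subspaces == {0: [0, 2, 4], 1: [1, 4], 2: [0, 3], 3: [2]}
```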
(4) Submodel training: for the training samples of each sparse subspace, a support vector machine (SVM) is trained, generating multiple sub-classifier models. Those skilled in the art will appreciate that other learning methods, for example neural networks or decision trees, can also be adopted for submodel training.
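Step (4) can be sketched with scikit-learn's SVC standing in for the SVM trainer. The subspace assignment and data below are fabricated; in the actual method the subsets would come from the sparse-coding step, and only each subspace's (much smaller) sample set is fed to its SVM.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy setup: 60 samples, binary labels, and a made-up subspace assignment.
X = rng.normal(size=(60, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=60) > 0).astype(int)
subspaces = {0: np.arange(0, 40), 1: np.arange(20, 60)}   # overlapping subsets

# One SVM submodel per sparse subspace; only that subspace's samples are used,
# which is where the O(n^2/k^2)-O(n^3/k^3) training saving comes from.
submodels = {}
for j, idx in subspaces.items():
    if len(np.unique(y[idx])) < 2:   # skip degenerate one-class subspaces
        continue
    clf = SVC(kernel="rbf")
    clf.fit(X[idx], y[idx])
    submodels[j] = clf
```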
Those skilled in the art will also appreciate that, although the preferred embodiment performs dictionary learning by non-negative matrix factorization, this is not the only way of dictionary learning; the above online learning method or offline learning methods can also be used directly for dictionary learning.
As shown in Fig. 1, the recognition process mainly comprises four steps: sparse coding, submodel selection and detection, fusion based on sparse coding, and final recognition.
(1) Sparse coding: the redundant dictionary D generated during training is used to sparsely code the detection sample data, with the same sparse coding method as in training, yielding the detection sample's sparse coding coefficients α_i' = {α_i'1, α_i'2, …, α_i'k}.
(2) Submodel selection and detection: only the submodels corresponding to the nonzero elements of the sparse coding coefficients are selected, i.e., {m_j | α_i'j ≠ 0, where m_j is the submodel on the j-th sparse subspace}, and the detection sample is evaluated on each of these submodels in turn, yielding the scores {s_i'j | α_i'j ≠ 0, where s_i'j is the score of the detection sample on submodel m_j of the j-th sparse subspace}. If there are many nonzero elements, only the submodels corresponding to the larger elements may be selected, for example only the submodels of the 20 largest elements after sorting by magnitude.
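The selection rule of step (2), keeping only submodels with nonzero coefficients and optionally capping them at the largest few, is easy to state in NumPy. The coefficient vector and the per-submodel scores below are invented for illustration; in a real system the scores would come from evaluating only the selected SVMs.

```python
import numpy as np

# Sparse code of one detection sample over k = 6 subspaces (made-up values).
alpha_det = np.array([0.0, 0.8, 0.0, 0.1, 0.0, 0.4])

# Pretend scores the sample would obtain from each submodel m_j (made-up).
all_scores = {j: 0.1 * j for j in range(6)}

selected = np.flatnonzero(alpha_det)                      # j with alpha != 0
order = selected[np.argsort(alpha_det[selected])[::-1]]   # sort by coefficient
top = order[:2]                                           # cap at the 2 largest

scores = {int(j): all_scores[int(j)] for j in top}
# scores == {1: 0.1, 5: 0.5}: only submodels 1 and 5 are ever evaluated
```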
(3) Fusion based on sparse coding: the nonzero sparse coding coefficients {α_i'j | α_i'j ≠ 0} are used as weights to average the corresponding scores {s_i'j | α_i'j ≠ 0}:
S_i' = (1/m) Σ_{j: α_i'j ≠ 0} α_i'j · s_i'j
where m is the number of nonzero entries of α_i'.
(4) Recognition: this mean value S_i' is taken as the integrated score of detection sample i', and the detection sample is recognized according to this integrated score.
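A tiny numeric sketch of the coefficient-weighted fusion follows. All values are invented, and the division by m (the nonzero count named in the text) is our reading of the averaging step, since the fusion formula itself is not fully legible in the source.

```python
# Nonzero sparse-coding coefficients of one detection sample and the scores it
# obtained on the corresponding submodels (made-up values).
alpha = {1: 0.8, 3: 0.1, 5: 0.4}    # {j: alpha_i'j for alpha_i'j != 0}
score = {1: 0.9, 3: -0.2, 5: 0.3}   # {j: s_i'j}

m = len(alpha)   # number of nonzero entries of alpha_i'
S = sum(alpha[j] * score[j] for j in alpha) / m   # coefficient-weighted mean
print(round(S, 4))  # → 0.2733
```

Submodels with larger coefficients (the subspaces the sample truly belongs to) dominate the integrated score, which is the point of fusing by the sparse code rather than by a plain average.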
Those skilled in the art will appreciate that the above recognition process is not the only implementation; other schemes are possible, for example averaging the scores obtained on all submodels.
To verify the beneficial effect of the inventive method, semantic concept detection (i.e., semantic classification) experiments were carried out according to the preferred embodiments of the invention on the large-scale datasets of the international video retrieval evaluations TRECVID 2008 and 2009. TRECVID 2008 and 2009 share the same training set (109 hours of video containing 43,616 keyframes) and have 10 semantic concepts in common. TRECVID 2009 replaced some rarely occurring concepts and 10 overly frequent concepts of TRECVID 2008, and added 180 hours of video to the TRECVID 2008 test set (109 hours of video, 35,766 shots in total), reaching 289 hours of video (97,150 shots in total). The semantic concept names (Concept), their numbers (ID), the number of positive examples of each semantic concept in the training set (#Pos) and the number of positives appearing in the test-set ground-truth file (#Hit) are shown in Table 1, where "*" marks the semantic concepts common to both.
Table 1: Semantic concept sets of TRECVID 2008 (TV08) and 2009 (TV09)
Three groups of experiments were completed on these datasets: (1) validation of the sparse subspace division; (2) validation of the fusion based on sparse coding; (3) comparison with existing methods. In all experiments, TRECVID's standard metric, average precision (AP), is used to evaluate the detection performance on each semantic concept; AP is a comprehensive metric reflecting both recall and precision, and a larger AP value indicates better recognition performance of the system under test. The three groups of experiments are described in turn below:
1. Validation experiment for the sparse subspace division
To verify the effectiveness of the sparse subspace division proposed by the invention, this group of experiments compares the inventive method against a single-SVM method (training only one SVM on the whole training set) and against the Bagging ensemble learning method with randomly divided subsets; for an effective comparison, the subset sizes of the random division in Bagging exactly match the subset sizes of the proposed sparse subspace division. The experimental results, shown in Fig. 3, indicate that: (1) compared with the traditional single-SVM method, the mean AP (MAP) of the inventive method over the detection of 20 semantic concepts on the TRECVID 2008 dataset improved by 8.5%, from 0.130 to 0.141. In particular, for 13 object-class concepts with obvious local features, such as Dog, Bus, Telephone, Hand, Boat_ship and Flower, the AP is clearly higher than with the single-SVM method, because the sparse non-negative matrix factorization adopted by the invention forms sparse subspaces with strong local expressive power. In terms of efficiency, model training is 4.2 times faster than the single-SVM method, and since the inventive method selects only a small number of corresponding sparse subspace models for detection, detection is on average 5.6 times faster.
(2) Compared with the traditional Bagging ensemble learning method, the MAP of the inventive method over the detection of the 20 semantic concepts improved by 6.8%, from 0.132 to 0.141. Similarly to the above results, thanks to the local expressive power gained from sparse non-negative matrix factorization, the AP on object-class concepts with obvious local features is clearly higher than with the Bagging method.
Fig. 4.1.a-Fig. 4.4.b illustrate the effect of the sparse subspace division on this training dataset according to the preferred embodiment of the invention. As Fig. 4.1.a shows, in the division of Boat_ship the 159th sparse subspace mainly reflects small boats (e.g., wooden boats), while Fig. 4.1.b shows the 171st sparse subspace, which mainly contains large ships. In the division of Telephone, Fig. 4.2.a (the 258th sparse subspace) mainly reflects hand-held receivers and mobile phones, while Fig. 4.2.b (the 271st sparse subspace) mainly reflects office desk telephones. In the division of the semantic concept Demonstration-Or-Protest, Fig. 4.3.a (the 32nd sparse subspace) reflects close-up demonstrations and Fig. 4.3.b (the 20th sparse subspace) reflects distant ones. Likewise, in the division of the semantic class Cityscape, Fig. 4.4.a (the 37th sparse subspace) reflects close-up cityscapes and Fig. 4.4.b (the 185th sparse subspace) reflects distant ones. This intuitively illustrates the effectiveness of the sparse subspace division proposed by the invention.
2. Validation experiment for the fusion based on sparse coding
To verify the effectiveness of the fusion based on sparse coding, the inventive method is compared with the commonly used average fusion method and with the classical AdaBoost fusion method. For a fair comparison, all methods use the same Sparse-NMF-based subspace division described above. The experimental results, shown in Fig. 5, indicate that the MAP of the inventive method over the detection of 20 semantic concepts on the TRECVID 2008 dataset improved by 5.2% over the average fusion method (from 0.134 to 0.141), and by 3.7% over the AdaBoost fusion method (from 0.136 to 0.141). For most object-class concepts with obvious local features, for example Dog, Airplane-flying, Hand and Boat_ship, the detection AP of the proposed sparse-coding fusion method clearly exceeds that of the other two fusion methods. This shows that the sparse-coding fusion proposed by the invention is more effective than the average fusion and AdaBoost fusion methods.
3. Comprehensive comparison with an existing method
The 20 semantic concepts were trained and detected on the TRECVID 2009 large-scale dataset and compared with the detection results of Columbia University (whose experimental results are taken from the TRECVID 2009 report submitted by that university). The comparison is shown in Fig. 6: although the invention used only part of Columbia University's visual keyword features, the detection AP of 13 semantic concept classes, including Infant, Traffic-intersection, Doorway, Person-riding-a-Bicycle, Person-playing-a-musical-instrument, Hand, Boat_ship and Singing, is significantly higher than Columbia University's, and the overall mean average precision (MAP, 0.177) is 18.0% higher than Columbia University's (0.150).
In summary, the pattern training and recognition method proposed by the invention makes full use of the sparsity of sparse coding: during training, the large-scale training sample set is divided into multiple sparse subspaces, and during detection, the nonzero sparse coding coefficients of the detection sample are used to fuse the scores obtained on the corresponding submodels. The introduction of sparse coding keeps the number of subspaces each training sample belongs to as small as possible, which reduces the redundancy of the subspace division and improves training efficiency; it likewise keeps the number of submodels involved for each test sample as small as possible, which improves detection efficiency.
It should be noted and understood that various modifications and improvements may be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Accordingly, the scope of the claimed technical solutions is not limited by any particular exemplary teaching given herein.

Claims (10)

1. A pattern training method based on ensemble learning, comprising:
1) performing dictionary learning on training samples to generate a redundant dictionary;
2) sparsely coding said training samples with said redundant dictionary to obtain the sparse coding coefficients of each training sample;
3) dividing all training samples into sparse subspaces according to said sparse coding coefficients;
4) training a submodel on the training samples in each sparse subspace to obtain submodels for classification.
2. The method according to claim 1, characterized in that, before said step 1), the method further comprises the step of extracting features of said training samples; and said steps 1) to 4) all operate on the features of said training samples.
3. The method according to claim 1, characterized in that said dictionary learning of step 1) is performed by sparse non-negative matrix factorization, an online learning method or an offline learning method.
4. The method according to any one of claims 1 to 3, characterized in that said step 2) further comprises setting a threshold on the number of nonzero elements, and setting the smallest elements of said sparse coding coefficients to 0 until the number of nonzero elements equals said threshold.
5. The method according to any one of claims 1 to 3, characterized in that said step 2) employs a soft-thresholding method or the LARS-Lasso method.
6. The method according to any one of claims 1 to 3, characterized in that said submodel training of step 4) employs a support vector machine, a neural network or a decision tree.
7. A pattern recognition method utilizing the submodels of any one of claims 1 to 6.
8. The method according to claim 7, characterized by comprising:
1) sparsely coding a detection sample with said redundant dictionary to obtain the sparse coding coefficients of the detection sample;
2) selecting said submodels according to said sparse coding coefficients, and recognizing the detection sample with the selected submodels;
3) fusing the recognition results of the selected submodels to recognize said detection sample.
9. The method according to claim 8, characterized in that selecting said submodels in said step 2) comprises selecting the submodels corresponding to the nonzero elements of said sparse coding coefficient.
10. The method according to claim 8, characterized in that selecting said submodels in said step 2) further comprises:
sorting the elements of said sparse coding coefficient by magnitude; and
selecting only the submodels corresponding to the one or more top-ranked elements.
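Claims 8 to 10 describe the recognition side: sparse-code the detection sample, select only the submodels whose coefficients survive (claim 9: nonzero elements; claim 10: the top-ranked elements by magnitude), and fuse their results. A sketch under assumed interfaces — `submodels` as a dict of scoring callables and magnitude-weighted averaging as the fusion rule are illustrative choices, not taken from the patent:

```python
import numpy as np

def recognize(x, D, submodels, top_m=2):
    """Code the detection sample against dictionary D, pick the submodels
    of the top_m largest-magnitude nonzero coefficients (claims 9-10),
    and fuse their scores weighted by coefficient magnitude (claim 8)."""
    coeff = D.T @ x                                  # stand-in sparse coding step
    order = np.argsort(np.abs(coeff))[::-1][:top_m]  # claim 10: sort, keep top-m
    chosen = [int(i) for i in order if coeff[i] != 0 and int(i) in submodels]
    if not chosen:
        return 0.0                                   # no submodel selected (design choice)
    w = np.abs(coeff[chosen])
    scores = np.array([submodels[i](x) for i in chosen])
    return float(w @ scores / w.sum())

# Toy usage: identity dictionary, three constant submodels.
D = np.eye(3)
models = {0: lambda x: 1.0, 1: lambda x: -1.0, 2: lambda x: 0.0}
print(recognize(np.array([2.0, 0.5, 0.0]), D, models))  # → 0.6
```

Because only the submodels of the dominant coefficients are evaluated, detection cost scales with the sparsity of the code rather than with the total number of submodels — the source of the detection-efficiency gain claimed in the abstract.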
CN2011103033624A 2011-09-30 2011-09-30 Mode training method based on ensemble learning and mode indentifying method Pending CN102521599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103033624A CN102521599A (en) 2011-09-30 2011-09-30 Mode training method based on ensemble learning and mode indentifying method


Publications (1)

Publication Number Publication Date
CN102521599A true CN102521599A (en) 2012-06-27

Family

ID=46292510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103033624A Pending CN102521599A (en) 2011-09-30 2011-09-30 Mode training method based on ensemble learning and mode indentifying method

Country Status (1)

Country Link
CN (1) CN102521599A (en)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHENG TANG ET AL.: "Sparse Ensemble Learning for Concept Detection", IEEE Transactions on Multimedia, 2011-09-15; Figs. 1 and 2, Chapters 3 and 4; relevant to claims 1-8 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679645A (en) * 2012-09-21 2014-03-26 索尼公司 Signal processing apparatus, signal processing method, output apparatus, output method, and program
CN103685851A (en) * 2012-09-21 2014-03-26 索尼公司 Signal processing apparatus, signal processing method, output apparatus, output method, and program
CN103685851B (en) * 2012-09-21 2019-02-15 索尼公司 Signal processing apparatus, signal processing method, output device, output method and program
CN104980442A (en) * 2015-06-26 2015-10-14 四川长虹电器股份有限公司 Network intrusion detection method based on element sample sparse representation
CN104980442B (en) * 2015-06-26 2018-05-01 四川长虹电器股份有限公司 A kind of network inbreak detection method based on first sample rarefaction representation
CN106250926A (en) * 2016-07-29 2016-12-21 华东师范大学 A kind of compression method of quadric discriminant function grader memory space
CN108256569A (en) * 2018-01-12 2018-07-06 电子科技大学 A kind of object identifying method under complex background and the computer technology used
CN108256569B (en) * 2018-01-12 2022-03-18 电子科技大学 Object identification method under complex background and used computer technology
CN108416374A (en) * 2018-02-13 2018-08-17 中国科学院西安光学精密机械研究所 Based on the non-negative matrix factorization method for differentiating orthogonal subspaces constraint
CN108416374B (en) * 2018-02-13 2020-07-31 中国科学院西安光学精密机械研究所 Non-negative matrix factorization method based on discrimination orthogonal subspace constraint
WO2020056621A1 (en) 2018-09-19 2020-03-26 华为技术有限公司 Learning method and apparatus for intention recognition model, and device
CN109344904A (en) * 2018-10-16 2019-02-15 杭州睿琪软件有限公司 Generate method, system and the storage medium of training sample

Similar Documents

Publication Publication Date Title
CN110413924B (en) Webpage classification method for semi-supervised multi-view learning
CN102521599A (en) Mode training method based on ensemble learning and mode indentifying method
Almazán et al. Segmentation-free word spotting with exemplar SVMs
Christlein et al. Writer identification using GMM supervectors and exemplar-SVMs
Natarajan et al. Multimodal feature fusion for robust event detection in web videos
Sun et al. Combining multimodal features with hierarchical classifier fusion for emotion recognition in the wild
Shao et al. A hierarchical scheme of multiple feature fusion for high-resolution satellite scene categorization
Christlein et al. Writer identification and verification using GMM supervectors
Sun et al. Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild
CN107194378B (en) Face recognition method and device based on mixed dictionary learning
CN107908715A (en) Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN106909946A (en) A kind of picking system of multi-modal fusion
CN104680144A (en) Lip language recognition method and device based on projection extreme learning machine
Stappen et al. Muse 2020 challenge and workshop: Multimodal sentiment analysis, emotion-target engagement and trustworthiness detection in real-life media: Emotional car reviews in-the-wild
Bhowmik et al. Handwritten Bangla word recognition using HOG descriptor
Zhang et al. Automatic discrimination of text and non-text natural images
CN103902964A (en) Face recognition method
Gordo et al. Writer identification in handwritten musical scores with bags of notes
Tran et al. Aggregating image and text quantized correlated components
CN114092742A (en) Small sample image classification device and method based on multiple angles
Nasrollahi et al. Printed persian subword recognition using wavelet packet descriptors
Jin et al. End-to-end language identification using high-order utterance representation with bilinear pooling
Almazán et al. A coarse-to-fine approach for handwritten word spotting in large scale historical documents collection
Liu et al. LIRIS-Imagine at ImageCLEF 2011 Photo Annotation Task.
Gangopadhyay et al. SA-CNN: Dynamic scene classification using convolutional neural networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120627