CN101807256B

CN101807256B - Object recognition and detection method based on a multiresolution framework

Info

Publication number: CN101807256B
Application number: CN 201010134143
Authority: CN
Inventors: 张加万; 付磊; 张怡; 高中杰
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2010-03-29
Filing date: 2010-03-29
Publication date: 2013-03-20
Anticipated expiration: 2030-03-29
Also published as: CN101807256A

Abstract

The invention belongs to the field of computer vision and relates to an object recognition and detection method based on a multiresolution framework. The method combines the Speeded Up Robust Features (SURF) descriptor with the bag-of-words model, a simple model commonly used in text classification, and a support vector machine (SVM) to construct a supervised-learning binary object classifier based on SURF; on this basis, the SURF-based SVM classifier is combined with image multiresolution theory so that objects are detected in spaces of different resolutions. The invention effectively addresses the problems caused by scale change, rotation change, translation change, illumination change, viewing-angle change and the like in object recognition and detection. Under the multiresolution object detection framework, it also effectively addresses the long detection time of the sliding-window method and detects the position of the object accurately and quickly.

Description

An object recognition and detection method based on a multiresolution framework

Technical field

The invention belongs to the field of computer vision and relates to an object recognition and detection method.

Background art

The background art related to the present invention includes:

(1) The Speeded Up Robust Features (SURF) local image feature descriptor (see document [1]): SURF is invariant to scale and rotation and is robust to illumination changes. Compared with other local features, it adopts a different feature extraction scheme that increases extraction speed while reducing the dimension of the descriptor, and it retains good distinctiveness and achieves good discrimination. In recent years it has been applied in many areas of computer vision.

(2) Among the many object recognition and detection algorithms, those based on supervised learning can quickly and accurately detect objects of the desired class in an image while not detecting objects of other classes, that is, they recognize the object while detecting it. Such algorithms can detect objects in a single image and also achieve fairly good results on complex scenes. For example, Agarwal et al. (see document [2]) propose detecting image features with the Förstner interest-point detector and then training a classifier with the SNoW (Sparse Network of Winnows) learning framework; the classifier is used to form a classifier activation map over multi-scale space, and objects are detected by analysing the activation map. Dalal et al. (see document [3]) propose using histograms of oriented gradients together with an SVM classifier to detect humans, with good results. These algorithms nevertheless still face many problems and challenges, such as scale change, rotation change, translation change, illumination change and viewing-angle change.

Summary of the invention

The object of the invention is to overcome the above deficiencies of the prior art and to provide an object recognition and detection method that can accurately and quickly detect the position of an object under complex conditions such as scale change, viewing-angle change, rotation change, brightness change and partial occlusion. To this end, the invention adopts the following technical solution:

An object recognition and detection method based on a multiresolution framework comprises two parts, constructing a multiresolution-framework classifier and detecting objects under the multiresolution framework, with the following steps:

First step: select positive and negative sample images of the object class to be classified, where a positive sample is an image of an object of the desired class and a negative sample is an image of any other object not of this class; let the original image resolution be R, and construct the training set T(R) of the classifier;

Second step: sample the training set T(R) at different sampling rates σ to obtain the image training sets T(r) at each resolution r = 1, 2, 3, ..., R;

Third step: obtain the object classifier C(r) for each resolution according to the following method, finally obtaining a hierarchical classifier H(r) covering all resolutions, where H(r) consists of R independent classifiers C(r):

(a) from the image training set T(r), extract the SURF feature descriptors of all images in the positive samples and in the negative samples, and store the positive and negative descriptors in two separate sets; (b) perform cluster analysis on the feature descriptors of the positive and negative sample sets using the K-means clustering method; (c) combine the clusters of the positive and negative sample sets to form the image feature dictionary; (d) according to the cluster indices of the positive and negative sample sets, obtain the cluster histogram of each image, assign the label corresponding to positive or negative samples, and train a support vector machine on the histogram data and labels to obtain the classifier C(r) at resolution r;

Fourth step: following the construction procedure of the multiresolution classifier, extract images at R different resolutions from the image to be detected, and then, for the test image at each resolution, extract multi-scale images at different scales s, s = 1, 2, 3, ..., S, with scale factor β;

Fifth step: within each scale space of an image at a given resolution, detect using windows of the same size, while different resolutions use different window sizes; the window size at resolution r is $(w_r, h_r) = (w, h)/a^{R-r}$, where w and h are the length and width of the window at the original resolution and a is a fixed constant;

Sixth step: for each resolution r and scale s, initialize the state of all initial detection windows to 1; use the classifier C(r) to examine each scale-space image at resolution r with the same window size; in scale space s, discard the window regions detected as 0, i.e. those that do not contain the object, and keep the window regions detected as 1, i.e. those that contain the object, passing them on to the same scale space s at resolution r+1; proceed in this way until the highest resolution R has been examined, obtaining the regions that contain the object in each scale space s at the highest resolution R;

Seventh step: for each scale space at the original resolution R, obtain the final position of the object by Mean-shift clustering.
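The geometry of the second and fifth steps can be illustrated with a minimal sketch. It assumes that sampling at rate 0.5 means keeping every other row and every other column, and that a = 2; the function names `downsample`, `build_pyramid` and `window_size` and the toy 16 × 16 image are illustrative assumptions, not part of the method as filed.

```python
def downsample(image):
    """Keep every other row and every other column (assumed sampling rate 0.5)."""
    return [row[::2] for row in image[::2]]


def build_pyramid(image, num_levels):
    """Return {r: image at resolution r} for r = 1..R, where r = R is the original."""
    pyramid = {num_levels: image}
    for r in range(num_levels - 1, 0, -1):
        pyramid[r] = downsample(pyramid[r + 1])
    return pyramid


def window_size(w, h, a, num_levels, r):
    """Window size at resolution r: (w, h) / a ** (R - r), rounded down to whole pixels."""
    factor = a ** (num_levels - r)
    return (int(w // factor), int(h // factor))


# Toy 16 x 16 "image" whose pixel values are just coordinates (illustrative data only).
R = 3
original = [[(x, y) for x in range(16)] for y in range(16)]
pyramid = build_pyramid(original, R)

for r in range(1, R + 1):
    rows, cols = len(pyramid[r]), len(pyramid[r][0])
    print(r, (cols, rows), window_size(8, 4, a=2, num_levels=R, r=r))
```

With these assumed values the image shrinks from 16 × 16 at r = 3 to 8 × 8 and 4 × 4, while the window shrinks from (8, 4) to (4, 2) and (2, 1), so a window covers the same fraction of the image at every resolution.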

In a preferred embodiment, step (b) of the third step comprises: using the K-means clustering method, cluster the feature descriptors of the positive sample set and of the negative sample set into K classes each, so that each set forms K keywords; obtain the cluster index of the feature descriptors in every image, i.e. which cluster each descriptor belongs to; and record the value of each cluster centre.
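The clustering step can be sketched in plain Python as follows. This is a basic Lloyd-style K-means under stated assumptions: the descriptors are toy 2-D vectors rather than real SURF descriptors, the initial centres are drawn at random with a fixed seed, and the names `kmeans`, `nearest_center` and `squared_distance` are hypothetical. The function returns both quantities the paragraph above asks for, the cluster centres and the cluster index of every descriptor.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the cluster centre closest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Return (centers, index), where index[i] is the cluster of descriptors[i]."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        index = [nearest_center(d, centers) for d in descriptors]
        for c in range(k):
            members = [d for d, i in zip(descriptors, index) if i == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    index = [nearest_center(d, centers) for d in descriptors]
    return centers, index


# Toy descriptors for the positive sample set (illustrative data only).
positive_descriptors = [[0.0, 0.1], [0.1, 0.0], [4.0, 4.1], [4.1, 4.0]]
centers, index = kmeans(positive_descriptors, k=2)
print(centers, index)
```

The same call would be made once for the positive set and once for the negative set. Like any K-means run, the result depends on the initial centres, so a practical implementation would probably restart several times and keep the best clustering.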

By summarizing the strengths, weaknesses and scope of application of the current mainstream object recognition and detection techniques, and given that local image feature descriptors currently perform comparatively well in object classification and recognition, the invention constructs an object classifier based on local feature descriptors. In view of the problems that existing object detection techniques face in implementation, it further proposes a method that applies the object classifier region by region under an image multiresolution framework to detect objects, achieving good recognition and detection results. The invention effectively addresses the problems caused by scale change, rotation change, translation change, illumination change, viewing-angle change and the like in object recognition and detection; under the multiresolution object detection framework, it effectively addresses the excessive detection time of the sliding-window method and detects the position of the object accurately and quickly.
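The time saving of the region-by-region, coarse-to-fine detection can be illustrated with a small sketch of the sixth step. It rests on several assumptions that the text does not spell out: a = 2 matches a sampling rate of 0.5, so a window that survives at resolution r maps to exactly one window at resolution r+1; the classifiers C(r) are replaced by toy callables that return 1 or 0; and the names `scale_window`, `coarse_to_fine` and `make_toy_classifier` are hypothetical.

```python
def scale_window(window, a):
    """Map a window (x, y, w, h) at resolution r to the same region at resolution r + 1."""
    x, y, w, h = window
    return (x * a, y * a, w * a, h * a)


def coarse_to_fine(initial_windows, classifiers, a=2):
    """Run C(1)..C(R) in order; only windows labelled 1 move up to the next resolution.

    classifiers[r - 1] is a stand-in for C(r): a callable that takes a window
    and returns 1 (contains the object) or 0 (does not).
    """
    windows = list(initial_windows)  # every window starts with state 1
    tested = []                      # number of classifier calls at each resolution
    for level, classify in enumerate(classifiers):
        tested.append(len(windows))
        windows = [w for w in windows if classify(w) == 1]   # discard state-0 windows
        if level < len(classifiers) - 1:
            windows = [scale_window(w, a) for w in windows]  # hand over to r + 1
    return windows, tested


def make_toy_classifier(cx, cy):
    """Toy stand-in for C(r): accept a window if it contains the point (cx, cy)."""
    def classify(window):
        x, y, w, h = window
        return 1 if x <= cx < x + w and y <= cy < y + h else 0
    return classify


# Object centre at (40, 24) in the original image (R = 3),
# which is (20, 12) at r = 2 and (10, 6) at r = 1.
classifiers = [make_toy_classifier(10, 6),
               make_toy_classifier(20, 12),
               make_toy_classifier(40, 24)]
# Non-overlapping 4 x 4 windows tiling the 16 x 16 image at the lowest resolution.
initial = [(x, y, 4, 4) for y in range(0, 16, 4) for x in range(0, 16, 4)]
surviving, tested = coarse_to_fine(initial, classifiers)
print(surviving, tested)
```

In this toy run `tested` is `[16, 1, 1]` and `surviving` is `[(32, 16, 16, 16)]`: all 16 windows are examined only at the cheapest resolution, and a single candidate region reaches the original resolution. An exhaustive scan of the 64 × 64 original with the same 16 × 16 window would need at least 16 classifier calls at full resolution even without overlap.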

Description of drawings

Fig. 1 is the overall flow chart of the object recognition and detection method based on the multiresolution framework;

Fig. 2 (a), (b), (c), (d), (e) and (f) are the multi-scale representations of an image for σ = 0, 1, 4, 16, 64 and 256, respectively;

Fig. 3 shows the multiresolution framework;

Fig. 4 (a) and (b) show, respectively, the multiresolution object detection result and the conventional detection result at a resolution of 640 × 480.

Table 1 compares the detection times of single-resolution and multiresolution object detection.

Embodiment

The invention proposes an object recognition and detection method based on a multiresolution framework. The Speeded Up Robust Features (SURF) descriptor is combined with the bag-of-words model, a simple model commonly used in text classification, and a support vector machine (SVM) to construct a supervised-learning binary object classifier based on SURF features. On the basis of this classifier, the SURF-based SVM classifier is combined with image multiresolution theory, and objects are detected in spaces of different resolutions. The invention effectively addresses the problems caused by scale change, rotation change, translation change, illumination change, viewing-angle change and the like in object recognition and detection; under the multiresolution object detection framework, it effectively addresses the excessive detection time of the sliding-window method and detects the position of the object accurately and quickly.

Fig. 1 is the overall flow chart of the object recognition and detection method of the invention, which specifically comprises the following steps:

1. Construction of the multiresolution-framework classifier:

The flow for constructing the SURF classifier under the multiresolution framework is as follows:

(1) Choose the training set and test set used to construct the classifier. From an image database (the object detection algorithm of the invention mainly uses three databases, ETH-80, Caltech101 and the PASCAL Visual Object Classes Challenge 2006; see references [4], [5] and [6]), select positive and negative sample images of the object class to be classified. For an object detection system, positive samples are generally images that contain only objects of this class, with as little background as possible so as to reduce background interference. The positive sample set may be preprocessed during construction: the positive samples are segmented manually so that only the object in the image is extracted as a positive sample. Negative samples are generally images that do not contain the object, i.e. images containing only background or containing other objects;

(2) Let the image resolutions be r = 1, 2, 3, ..., R, where R is the original image resolution, and let the sampling rate be σ. The training set T(R) at the original resolution R is sampled at different sampling rates σ (simple interlaced sampling may be used, with sampling rate σ = 0.5) to obtain the image sets T(r) at each resolution; these image sets serve as the training sets at the respective resolutions, so that multiple training sets are formed;

(3) At each resolution r, extract the SURF features and feature descriptors of every image in the training set T(r), and construct the object classifier C(r) for that resolution using the bag-of-words model and an SVM. Different numbers of clusters may be used at different resolutions, so the classifier C(r) corresponding to each image resolution may use a different threshold for deciding the object class. Classifiers are trained step by step from the low resolutions of the image to the high resolutions, finally yielding a hierarchical classifier H(r) covering all resolutions, where H(r) consists of R independent classifiers C(r). The construction of C(r) is as follows:

(a) from the image set T(r) at resolution r, extract the SURF feature descriptors of all images in the positive samples and in the negative samples, and store the positive and negative descriptors in two separate sets; (b) using the K-means clustering method, cluster the feature descriptors of the positive sample set and of the negative sample set into K classes each, so that each set forms K keywords, obtain the cluster index of the feature descriptors in every image, i.e. which cluster each descriptor belongs to, and record the value of each cluster centre; (c) combine the clusters of the positive and negative sample sets to form an image feature dictionary of size 2K, which is used to look up and build the feature histogram of a test image; (d) according to the cluster indices of the positive and negative sample sets obtained in (b), obtain the cluster histogram of each image, assign the label corresponding to positive or negative samples, and feed the histogram data and labels to the SVM for training, obtaining the classifier C(r) at resolution r.
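Steps (c) and (d) can be sketched as follows, assuming the 2K-word dictionary has already been obtained by clustering. The descriptors are toy 2-D vectors rather than SURF descriptors, and the SVM is replaced by a small linear SVM trained with hinge-loss sub-gradient descent; the text does not specify a kernel or solver, so this trainer, its hyperparameters and the names `bow_histogram`, `train_linear_svm` and `predict` are illustrative assumptions only.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, dictionary):
    """Count how many descriptors of one image fall on each of the 2K visual words."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        word = min(range(len(dictionary)),
                   key=lambda i: squared_distance(d, dictionary[i]))
        counts[word] += 1
    return counts


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Hinge-loss sub-gradient descent; labels are +1 (positive) or -1 (negative)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:   # inside the margin: move towards the sample
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:               # outside the margin: only apply regularisation
                w = [wi * (1 - lr * reg) for wi in w]
    return w, b


def predict(model, histogram):
    """Return 1 if the window is judged to contain the object, else 0."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, histogram)) + b > 0 else 0


# Dictionary of size 2K with K = 2: two "positive" words followed by two "negative" words.
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
positive_images = [[[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]], [[1.1, 1.0], [0.1, 0.1]]]
negative_images = [[[5.1, 5.0], [6.0, 6.2], [5.9, 6.1]], [[4.9, 5.0], [5.0, 5.2]]]

histograms = [bow_histogram(img, dictionary) for img in positive_images + negative_images]
labels = [1, 1, -1, -1]
model = train_linear_svm(histograms, labels)
print(histograms, [predict(model, h) for h in histograms])
```

One such model would be trained per resolution r to play the role of C(r); in practice an established SVM library would likely replace the hand-written trainer.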

2. Detection of objects under the multiresolution framework:

(1) Following the construction procedure of the multiresolution classifier, extract images at R different resolutions from the test image, and then, for the test image at each resolution, extract multi-scale images at different scales s, s = 1, 2, 3, ..., S, with scale factor β; the larger the scale, the larger the scale factor of the Gaussian filtering applied. This yields the multiresolution images of the test image and the multi-scale images at each resolution;

(2) After the images at every resolution and scale have been obtained, detect within each scale space of an image at a given resolution using windows of the same size, while different resolutions use different window sizes; the window size at resolution r is $(w_r, h_r) = (w, h)/a^{R-r}$, where w and h are the length and width of the window at the original resolution and a is a fixed constant. Detection in the different scale spaces of one resolution can run simultaneously and independently, whereas detection across resolutions must proceed from low to high;

(3) For each resolution r and scale s, initialize the state of all initial detection windows to 1, i.e. each window is assumed by default to contain the object. Use the classifier C(r) to examine each scale-space image at resolution r with the same window size. In scale space s, discard the window regions detected as 0, i.e. those that do not contain the object, and keep the window regions detected as 1, i.e. those that contain the object, passing them on to the same scale space s at the next higher resolution r+1. Proceed in this way until the highest resolution R has been examined, which yields the regions that contain the object in each scale space s at the highest resolution R;

(4) Because this window-classification approach is insensitive to small changes in the scale and position of the object, the classifier may detect the object several times near the same position. To obtain the final position of the object, Mean-shift clustering is therefore applied in each scale space at the original resolution.
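The merging of repeated detections in item (4) can be sketched with a flat-kernel mean shift over window centres. The bandwidth, the merge tolerance, the sample coordinates and the names `mean_shift` and `merge_modes` are assumptions chosen for illustration; the text does not state which kernel or bandwidth is used.

```python
def mean_shift(points, bandwidth, iterations=30):
    """Shift every point to the mean of its neighbours until the modes settle."""
    modes = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            # Flat kernel: all detections within `bandwidth` of the current estimate.
            near = [p for p in points
                    if (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 <= bandwidth ** 2]
            shifted.append((sum(p[0] for p in near) / len(near),
                            sum(p[1] for p in near) / len(near)))
        modes = shifted
    return modes


def merge_modes(modes, tolerance):
    """Collapse modes closer than `tolerance` into one final position each."""
    final = []
    for m in modes:
        if all((m[0] - f[0]) ** 2 + (m[1] - f[1]) ** 2 > tolerance ** 2 for f in final):
            final.append(m)
    return final


# Centres of windows labelled 1 at the original resolution (illustrative data only):
# four overlapping detections of one object and two of another.
centres = [(100, 50), (102, 52), (98, 49), (101, 51), (300, 200), (302, 198)]
positions = merge_modes(mean_shift(centres, bandwidth=10), tolerance=5)
print(positions)
```

Here the six detections collapse to two positions, `(100.25, 50.5)` and `(301.0, 199.0)`, one per object. The procedure would be run separately for each scale space s, as the text describes.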

Fig. 2 shows the multi-scale representation of an image, and Fig. 3 shows the multiresolution framework. Fig. 4 (a) and (b) compare the multiresolution detection result with the standard detection result provided by the database (the tests use the PASCAL Visual Object Classes Challenge 2006 image database); the differences between the two images are marked with rectangles. As can be seen from Table 1, the SURF classifier achieves good detection results under the multiresolution object detection framework: it detects the complete object under test while including as little surrounding background as possible, and achieves a fairly good detection rate.

Table 1. Comparison of detection times for single-resolution and multiresolution object detection

List of references

[1] Bay, H., Tuytelaars, T., Van Gool, L. SURF: Speeded up robust features. In The Ninth European Conference on Computer Vision, 2006.

[2] Agarwal, S., A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004. 26(11): p. 1475-1490.

[3] Dalal, N. and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005.

[4] http://www.mis.informatik.tu-darmstadt.de/Research/Projects/categorization/eth80-db.html

[5] http://www.vision.caltech.edu/ImageDatasets/Caltech101/

[6] http://www.pascal-network.org/challenges/VOC/voc2006/index.html

Claims

1. An object recognition and detection method based on a multiresolution framework, comprising two parts, constructing a multiresolution-framework classifier and detecting objects under the multiresolution framework, with the following steps:

First step: select positive and negative sample images of the object class to be classified, where a positive sample is an image of an object of the desired class and a negative sample is an image of any other object not of this class; let the original image resolution be R, and construct the training set T(R) of the classifier;

(a) from the image training set T(r), extract the SURF feature descriptors of all images in the positive samples and in the negative samples, and store the positive and negative descriptors in two separate sets; (b) perform cluster analysis on the feature descriptors of the positive and negative sample sets using the K-means clustering method; (c) combine the clusters of the positive and negative sample sets to form the image feature dictionary; (d) according to the cluster indices of the positive and negative sample sets, obtain the cluster histogram of each image, assign the label corresponding to positive or negative samples, and train a support vector machine on the histogram data and labels to obtain the classifier C(r) at resolution r;

Sixth step: for each resolution r and scale s, initialize the state of all initial detection windows to 1; use the classifier C(r) to examine each scale-space image at resolution r with the same window size; in scale space s, discard the window regions detected as 0, i.e. those that do not contain the object, and keep the window regions detected as 1, i.e. those that contain the object, passing them on to the same scale space s at resolution r+1; proceed in this way until the highest resolution, i.e. the original image resolution R, has been examined, obtaining the regions that contain the object in each scale space s at the highest resolution;

Seventh step: for each scale space at the original image resolution R, obtain the final position of the object by Mean-shift clustering.

2. The object recognition and detection method based on a multiresolution framework according to claim 1, characterized in that step (b) of the third step comprises: using the K-means clustering method, clustering the feature descriptors of the positive sample set and of the negative sample set into K classes each, so that each set forms K keywords; obtaining the cluster index of the feature descriptors in every image, i.e. which cluster each descriptor belongs to; and recording the value of each cluster centre.