CN102810159B - Human body detection method based on the SURF (Speeded Up Robust Features) efficient matching kernel

Info

Publication number: CN102810159B
Application number: CN201210196526.2A
Authority: CN
Inventors: 韩红; 王瑞; 谢福强; 李晓君; 顾建银; 张红蕾; 韩启强; 刘三军; 郭玉言; 甘露
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2012-06-14
Filing date: 2012-06-14
Publication date: 2014-10-29
Anticipated expiration: 2032-06-14
Also published as: CN102810159A

Abstract

The invention provides a human body detection method based on the SURF efficient matching kernel, which mainly addresses the poor handling of cluttered image backgrounds in existing methods. The method comprises the following steps: negative samples are obtained by bootstrapping from the INRIA (Institut National de Recherche en Informatique et en Automatique) database and, together with the positive samples in the database, form the whole-body training sample set; SURF descriptor feature points are extracted from the training samples at different image scales; feature points are drawn by random sampling to form the initial basis vectors of a visual vocabulary; constrained singular value decomposition is applied to the initial basis vectors to obtain the maximum kernel function feature; the maximum kernel function features at the different image scales are weighted to obtain the feature over all image scales; an SVM (Support Vector Machine) classifier is trained on the resulting features to obtain a detection classifier; and the image to be detected is input to the classifier to obtain the final detection result. The method detects human bodies accurately and can be used in intelligent surveillance, driver assistance systems and virtual video.

Description

Human body detection method based on the SURF efficient matching kernel

Technical field

The invention belongs to the technical field of image processing and relates to a method for detecting humans in still images. It can be used in intelligent surveillance, driver assistance systems, human motion capture, pornographic image filtering and virtual video.

Background technology

In computer vision, human detection is a technology with very broad application prospects across many fields. However, the diversity of human poses, cluttered backgrounds, clothing texture, illumination conditions, self-occlusion and other factors make human detection a very difficult problem. At present, the main approaches to human detection in still images are methods based on motion characteristics, methods based on human body models, and methods based on statistical classification.

Methods based on motion characteristics exploit the fact that, during steady movement, the pose and symmetry of the human body change periodically. A self-similarity matrix is constructed in the time domain, the periodic variation of human motion is used to distinguish it from the motion of other objects, and moving humans are detected through this analysis. However, such algorithms have high complexity and require highly stable human motion.

Methods based on human body models use an explicit body model and recognize the human body from the relations between the model structure and the individual body parts. These methods can handle occlusion and can infer the pose of the human body. Their drawback is that the model is difficult to construct and complex to solve.

Methods based on statistical classification learn a classifier from a set of training data by machine learning, use this classifier to represent the human body, and then use it to classify and recognize the input. The advantage of these methods is that the detection results are stable and of good quality; the drawback is that they need a large amount of training data and struggle with insufficient lighting and cluttered backgrounds. In the human body detection method based on the SURF (Speeded Up Robust Features) efficient matching kernel, the image features fed to the classifier are a local image representation, which avoids the traditional difficulty with backgrounds and yields better human detection results.

Summary of the invention

The object of the present invention is to address the above deficiencies of the prior art by proposing a human body detection method based on the SURF efficient matching kernel, so as to reduce the complexity of image feature extraction, improve the representational power of the features, and effectively improve the accuracy of human detection.

The technical solution of the present invention is realized as follows:

(1) Obtain negative samples from the INRIA (Institut National de Recherche en Informatique et en Automatique) database by bootstrapping, and combine them with the positive samples in the database to form the whole-body training sample set;

(2) Divide each training sample image into a grid of 8 × 8 pixel cells, sample each cell at image scales of 16 and 25 pixels, and extract the SURF descriptor feature points F of all training images;

(3) Randomly sample the SURF descriptor feature points F of all training images to obtain the 350-dimensional visual vocabulary of the whole training set, and use this 350-dimensional visual vocabulary to form the initial basis vectors R;

(4) Perform dictionary learning on the initial basis vectors R using constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature r;

(5) Suppress similar maximum kernel function features r by maximum-eigenvalue retrieval, extract the kernel function feature values in descending order, and delete the elements equal to the maximum value to obtain the feature vector G; then compute a weighted sum of the image features G at the different image scales to obtain the feature G′ over all image scales:

G′ = G × A_l,

where A_l is the weight of the l-th image scale, l ∈ {1, 2}, w_l = 1/p_l, and p is the pixel size of the image scale at which the SURF feature points are extracted, p = {16, 25};

(6) Store the features G′ over all image scales, and select from G′ the low-dimensional features h that approximately follow a Gaussian distribution as the SURF efficient matching kernel feature X of the final image;

(7) Train a support vector machine (SVM) classifier on the resulting SURF efficient matching kernel features X to obtain the final classifier used for detection;

(8) Input the image to be detected and use the trained classifier to determine the final detection result.

Compared with the prior art, the present invention has the following advantages:

1. Because the SURF efficient matching kernel image features used in the present invention avoid the blurring produced by traditional edge-based and contour-based image representation methods, better human detection results can be obtained.

2. Because the extracted image features have lower dimensionality than those of traditional image description methods, the present invention effectively reduces feature extraction time and the amount of computation.

3. Because the present invention is a human body detection method based on local visual feature information, it obtains better results on images with cluttered backgrounds.

Brief description of the drawings

Fig. 1 is a flowchart of the present invention;

Fig. 2 shows some of the positive sample images used in the present invention;

Fig. 3 shows some of the negative sample images used in the present invention;

Fig. 4 compares the detection performance of the present invention with that of an existing method;

Fig. 5 shows the results of detecting humans in an image with the present invention.

Embodiment

With reference to Fig. 1, the specific implementation steps of the present invention are as follows:

Step 1: obtain a large number of negative sample images from the INRIA (Institut National de Recherche en Informatique et en Automatique) database by bootstrapping and, together with the positive sample images in the database, form the training sample set. The positive sample images are shown in Fig. 2 and the negative sample images in Fig. 3.

Step 2: extract the SURF descriptor feature points F of the training sample set.

2a) Divide the j-th training image into a grid of 8 × 8 pixel cells, sample each cell at image scales of 16 and 25 pixels, and obtain the SURF (Speeded Up Robust Features) descriptor feature points F_j of the j-th training image;

2b) Following step 2a), extract the SURF descriptor feature points F of all training images, where F = {F_1, ..., F_j, ..., F_M}, j ∈ [1, M], and M is the number of training samples.
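The dense sampling layout of step 2a) can be sketched as follows. This is only an illustration of where patches might be placed: it assumes one square patch per grid node and per scale, centred on nodes spaced 8 pixels apart, and it skips patches that would cross the image border. None of these placement details is stated in the text, the function name `grid_patches` is hypothetical, and the SURF descriptor computation itself is not shown.

```python
def grid_patches(height, width, step=8, scales=(16, 25)):
    """Enumerate square patches as (top, left, size) on a regular grid.

    Assumptions (not from the source): patches are centred on grid nodes
    spaced `step` pixels apart, and patches that would extend past the
    image border are skipped.
    """
    patches = []
    for size in scales:
        half = size // 2
        for cy in range(0, height, step):
            for cx in range(0, width, step):
                top, left = cy - half, cx - half
                inside = (top >= 0 and left >= 0
                          and top + size <= height and left + size <= width)
                if inside:
                    patches.append((top, left, size))
    return patches


# A 128 x 64 training window, sampled at the two scales named in step 2a).
patches = grid_patches(128, 64)
```

Each returned tuple identifies one image patch from which a descriptor would be computed; a different border policy (padding instead of skipping) would change the number of patches but not the idea.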

Step 3: obtain the initial basis vectors R of the visual vocabulary.

3a) For each training sample image, on the 8 × 8 image grid and at the 16 and 25 pixel scales, randomly sample 15 of the SURF feature points obtained in step (2), denoted F_i′, where i indexes the i-th training image;

3b) Repeat step 3a) to randomly draw SURF feature points from all training samples, denoted F′; cluster the similar SURF feature points in F′ with the k-means clustering method using 350 cluster centres to obtain the 350-dimensional visual vocabulary of the whole training set, which forms the initial basis vectors R of the visual vocabulary.
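The clustering in step 3b) can be illustrated with a plain k-means (Lloyd) loop. The sketch below is a toy under stated assumptions: it uses two-dimensional points and two centres instead of SURF descriptors and 350 centres, initialises the centres by random choice among the input points, and runs a fixed number of iterations. The function name `kmeans` and its parameters are hypothetical, and the text does not say which k-means variant or initialisation is used.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k cluster centres as lists of floats."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Toy stand-in for the sampled descriptors F'; the method itself uses k = 350.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
vocabulary = kmeans(toy, k=2)
```

In the method described above, the rows of the resulting centre list would play the role of the initial basis vectors R.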

Step 4: obtain the maximum kernel function feature vector r of the initial basis vectors R.

4a) Project the initial basis vectors R onto a 350-dimensional space using the projection coefficients v to obtain the projection R′ of R:

R′ = Rv,

v = [v_1, ..., v_i, ..., v_N],

v_i = (R^T R)^{-1} (R^T r_i), i ∈ [1, N],

where r_i is the maximum kernel feature of the i-th feature point extracted from an image, v_i is the low-dimensional projection coefficient of the i-th feature point extracted from an image, and N is the number of feature points randomly selected from an image;

4b) Approximate the maximum kernel function feature vector r by the projection R′ of the initial basis vectors R onto the projection space, giving the approximation function f(r):

f(r) = arg min ‖r − R′‖,

Substituting R′ = Rv into the above:

f(r) = arg min ‖r − Rv‖,

where ‖·‖ denotes the 2-norm and arg min ‖·‖ denotes minimization;

4c) Expand v and r in f(r) = arg min ‖r − Rv‖ to obtain the squared 2-norm approximation function f(v, r) of the maximum kernel function feature vector r with respect to the initial basis vectors R:

f(v, r) = \frac{1}{N} \sum_{i=1}^{N} \| r_{i} - R v_{i} \|^{2},

where r = [r_1, ..., r_i, ..., r_N] denotes the maximum kernel function feature vector;

4d) Solve the approximation function f(v, r) by stochastic gradient descent to obtain the maximum kernel function feature vector r.
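The closed form for v_i in step 4a) agrees with the objective of step 4c): for fixed R and a single r_i, it is the ordinary least-squares minimizer of ‖r_i − R v_i‖². A short derivation follows, assuming R^T R is invertible (a condition the text does not state); g is a name introduced here only for the per-point term of the sum.

```latex
\begin{aligned}
g(v_i) &= \| r_i - R v_i \|^{2} \\
\nabla_{v_i} g &= -2 R^{T} (r_i - R v_i) = 0 \\
R^{T} R \, v_i &= R^{T} r_i \\
v_i &= (R^{T} R)^{-1} R^{T} r_i
\end{aligned}
```

Since g is convex in v_i, this stationary point is a minimum, so each term of f(v, r) is minimized over v_i by the coefficient given in step 4a).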

Step 5: sort the elements of the maximum kernel function feature vector r in descending order and delete the elements of r equal to the maximum value to obtain the feature vector G; then compute a weighted sum of the feature vectors G at the different image scales to obtain the image feature G′ over all image scales:

G′ = G × A_l,

where A_l is the weight of the l-th image scale, l ∈ {1, 2}, w_l = 1/p_l, and p is the pixel size of the image scale at which the SURF descriptor feature points are extracted, p = {16, 25}.
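One possible reading of the scale weighting in step 5 is sketched below. It assumes that the weight of each scale is w_l = 1/p_l and that the "weighted sum" means adding the per-scale feature vectors after multiplying each by its weight; the formula G′ = G × A_l does not spell out the summation, so this interpretation is an assumption. The function name `combine_scales` and the toy vectors are hypothetical.

```python
def combine_scales(features_by_scale):
    """Weighted sum of per-scale feature vectors with weights w_l = 1 / p_l.

    `features_by_scale` maps a patch size p_l (16 or 25 in the text) to the
    feature vector G computed at that scale. Summing over scales is an
    assumed interpretation of the weighted sum described in step 5.
    """
    lengths = {len(g) for g in features_by_scale.values()}
    if len(lengths) != 1:
        raise ValueError("all scales must yield feature vectors of equal length")
    n = lengths.pop()
    combined = [0.0] * n
    for p, g in features_by_scale.items():
        w = 1.0 / p  # larger patches receive a smaller weight
        for k in range(n):
            combined[k] += w * g[k]
    return combined


# Toy vectors standing in for the per-scale feature vectors G.
g_prime = combine_scales({16: [16.0, 32.0], 25: [25.0, 50.0]})
```

With these weights the 16-pixel scale contributes more per unit of feature value than the 25-pixel scale.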

Step 6: store the image features G′ over all image scales, and select from G′ the low-dimensional features h that approximately follow a Gaussian distribution as the SURF efficient matching kernel feature X of the final image.

Step 7: train a support vector machine (SVM) classifier on the SURF efficient matching kernel features X obtained above to obtain the final classifier for human detection.

Step 8: use the classifier for human detection obtained above to determine the final detection result.

(8a) Input the image to be detected. Take the 128 × 64 pixel region at the top-left corner of the image as the first scanning window, and form a new scanning window with each translation of 8 pixels to the right or 16 pixels downward. This yields a set of scanning windows, which are input to the classifier obtained in step (7) to get a classifier score for each scanning window;

(8b) Judge from the classifier scores of the scanning windows whether the image under test contains a human body; if the classifier outputs scanning windows that contain a human body, select from all such windows the one with the highest classifier score as the main window;

(8c) Evaluate the main window jointly with the other human body windows: when another human body window lies near the main window and overlaps it by more than 1/2, combine that window with the main window to obtain a combined human body window;

(8d) Keep the combined human body window, and delete the main window and all human body windows merged into it;

(8e) If human body windows remain, select the one with the highest classifier score among them as the main window and repeat steps (8b)-(8d);

(8f) Mark all detection results on the image under test as its final human detection result. Detection results are represented by rectangular boxes, with each detected human enclosed in a box.
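The window scan of step (8a) and the greedy merging of steps (8b)-(8e) can be sketched as follows. Several details are assumptions, since the text leaves them open: overlap is measured as intersection area divided by the area of the smaller window, and a group of merged windows is replaced by the mean of their coordinates. The names `scan_windows`, `overlap_ratio` and `merge_detections` are hypothetical, and the classifier scores are supplied by the caller.

```python
def scan_windows(img_h, img_w, win_h=128, win_w=64, step_x=8, step_y=16):
    """Windows as (top, left, height, width), starting at the top-left corner."""
    return [(top, left, win_h, win_w)
            for top in range(0, img_h - win_h + 1, step_y)
            for left in range(0, img_w - win_w + 1, step_x)]


def overlap_ratio(a, b):
    """Intersection area over the smaller window's area (assumed measure)."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom = min(a[0] + a[2], b[0] + b[2])
    right = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, bottom - top) * max(0, right - left)
    return inter / min(a[2] * a[3], b[2] * b[3])


def merge_detections(boxes, scores, min_overlap=0.5):
    """Greedy merge: highest-scoring window first, absorb windows overlapping > 1/2."""
    remaining = sorted(zip(scores, boxes), reverse=True)
    merged = []
    while remaining:
        main = remaining[0][1]  # main window: best remaining score
        group = [b for _, b in remaining if overlap_ratio(main, b) > min_overlap]
        # Assumed combination rule: average the coordinates of the group.
        merged.append(tuple(sum(c) / len(group) for c in zip(*group)))
        remaining = [(s, b) for s, b in remaining
                     if overlap_ratio(main, b) <= min_overlap]
    return merged


windows = scan_windows(160, 80)
detections = merge_detections(
    [(0, 0, 128, 64), (16, 8, 128, 64), (0, 200, 128, 64)],
    [0.9, 0.8, 0.7],
)
```

In the example, the first two windows overlap by more than half and collapse into one box, while the third, which lies far away, is kept as a separate detection.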

The effect of the present invention can be verified by the following simulation experiments:

1) Simulation conditions: the simulation experiments of the present invention were implemented in Matlab 2009a and run on an HP workstation under Windows. The positive and negative samples required for the experiments are all taken from the INRIA (Institut National de Recherche en Informatique et en Automatique) database. 2416 positive samples and 13500 negative samples are used as the training set, and 1132 positive samples and 4050 negative samples as the test set. The positive and negative sample images are all 128 × 64 pixels in size. Fig. 2 shows some of the positive sample images and Fig. 3 some of the negative sample images.

2) Simulation content and analysis of results

Simulation 1: the present invention and an existing method are each used to classify image features; the classification performance is shown in Fig. 4. In Fig. 4, the upper curve is the classification performance curve of the present invention and the lower curve is that of the existing method. As Fig. 4 shows, the classification performance of the present invention is higher than that of the existing method.

Simulation 2: the method of the present invention and an existing method are each used to detect humans in the same image from the Massachusetts Institute of Technology (MIT) database; the detection results are shown in Fig. 5. Fig. 5(a) shows the result of human detection with the existing method before window fusion, and Fig. 5(b) the final detection result of the existing method. Fig. 5(c) shows the result of human detection with the present method before window fusion, and Fig. 5(d) the final detection result of the present method. As Fig. 5 shows, the method of the present invention achieves higher human detection accuracy.

In summary, the present invention improves the expressive power of the features while reducing the complexity of image feature extraction, which makes the method well suited to human detection in still images. Compared with existing methods, it also greatly reduces the false alarm rate of human detection.

Claims

1. A human body detection method based on the SURF efficient matching kernel, comprising the following steps:

(2) Divide each training sample image into a grid of 8 × 8 pixel cells, sample each cell at image scales of 16 and 25 pixels, and extract the SURF (Speeded Up Robust Features) descriptor feature points F of all training images;

(5) Suppress similar maximum kernel function features r by the maximum-eigenvalue extraction method, extract the kernel function feature values in descending order, and delete the elements equal to the maximum value to obtain the feature vector G; then compute a weighted sum of the image features G at the different image scales to obtain the feature G′ over all image scales:

G′ = G × A_l,

where A_l is the weight of the l-th image scale, l ∈ {1, 2}, w_l = 1/p_l, p is the pixel size of the image scale at which the SURF feature points are extracted, p = {16, 25}, and p_l is the l-th element of p;

(6) Store the features G′ over all image scales, and select from G′ the low-dimensional features h that approximately follow a Gaussian distribution as the SURF efficient matching kernel feature X of the final image;

2. The method according to claim 1, wherein the extraction of the SURF descriptor feature points F of all training images in step (2) is carried out as follows:

2a) Divide the j-th training image into a grid of 8 × 8 pixel cells, sample each cell at image scales of 16 and 25 pixels, and obtain the SURF (Speeded Up Robust Features) feature points F_j of the j-th training image;

2b) Following step 2a), extract the SURF descriptor feature points F of all training images, where F = {F_1, ..., F_j, ..., F_M}, j ∈ [1, M], and M is the number of training samples.

3. The method according to claim 1, wherein obtaining the 350-dimensional visual vocabulary of the whole training set in step (3) is carried out as follows:

3a) For each training sample image, on the 8 × 8 image grid and at the 16 and 25 pixel scales, randomly sample 15 of the SURF feature points obtained in step (2), denoted F_i′, where i indexes the i-th training image;

3b) Repeat step 3a) to extract SURF feature points from all training samples, denoted F′; define 350 cluster centres and cluster the similar SURF feature points in F′ with the k-means clustering method to obtain the 350-dimensional visual vocabulary of the whole training set.

4. The method according to claim 1, wherein inputting the image to be detected and using the trained classifier to determine the final detection result in step (8) is carried out as follows:

(8a) Input the image to be detected. Take the 128 × 64 pixel region at the top-left corner of the image as the first scanning window, and form a new scanning window with each translation of 8 pixels to the right or 16 pixels downward. This yields a set of scanning windows, which are input to the classifier obtained in step (7) to get a classifier score for each scanning window;

(8f) Mark all detection results on the image under test as its final human detection result. Detection results are represented by rectangular boxes, with each detected human enclosed in a rectangular box.