CN103455826B

CN103455826B - Efficient match kernel human body detection method based on speeded-up robust features

Info

Publication number: CN103455826B
Application number: CN201310405276.3A
Authority: CN
Inventors: 韩红; 焦李成; 郭玉言; 马文萍; 马晶晶; 侯彪; 祝健飞
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2013-09-08
Filing date: 2013-09-08
Publication date: 2017-02-08
Anticipated expiration: 2033-09-08
Also published as: CN103455826A

Abstract

The invention provides an efficient match kernel (EMK) human body detection method based on speeded-up robust features (SURF). The method mainly addresses the poor performance of traditional methods on images with cluttered backgrounds or uneven illumination. The method comprises: (1) selecting training sample set images; (2) extracting the SURF feature points of the images; (3) constructing the initial basis vector of each layer; (4) obtaining the maximum kernel function feature of each sampling layer; (5) obtaining the efficient match kernel feature of the image; (6) training a classifier; (7) scanning the input image; (8) detecting the scanning windows; and (9) outputting the detection results. Local information of the images is extracted layer by layer for feature learning, the features are mapped to a low-dimensional space and aggregated into a feature set, and a linear classifier is trained on the feature set to obtain a human body detection classifier. The method can be used in the field of image processing to accurately detect human body information in natural images.

Description

Efficient match kernel human body detection method based on speeded-up robust features

Technical field

The invention belongs to the technical field of image processing and further relates to an efficient match kernel (EMK) human body detection method based on speeded-up robust features (SURF) in the field of static human body detection. The invention can be used to detect human body information in still images for the purpose of identifying human targets.

Background technology

Human body detection is the process of determining the position of human body information in natural images. In recent years, because of its application value in fields such as intelligent surveillance, driver assistance systems, human motion capture, and pornographic image filtering, it has become a key technology in computer vision. However, the diversity of human poses, cluttered backgrounds, clothing texture, illumination conditions, self-occlusion, and many other factors make human body detection an extremely difficult problem. At present, methods for human body detection in still images fall into two broad categories: methods based on a human body model and methods based on learning.

The first category is human body detection based on a human body model. Such methods need no learning database. They use an explicit human body model and recognize the human body according to the relationship between the parts constructed by the model and the human body.

The patent application "A human body detection method" filed by Beijing Jiaotong University (application number CN201010218630.8, publication number CN101908150A) discloses a detection method based on a human body model. The method builds human detection templates with a certain degree of fuzziness from human samples of multiple body shapes and poses in order to determine candidate human regions. It handles occlusion relatively well, can estimate the pose of the human body, and improves the efficiency and precision of human detection. Its remaining shortcoming is that the matching algorithm is relatively complex, so the computational complexity is high and good detection results are difficult to achieve against complex backgrounds.

The second category is human body detection based on learning. Such methods learn a classifier from a set of training data by machine learning and then use this classifier to classify and recognize input windows.

The patent application "Real-time human body detection method based on the AdaBoost framework and head color" filed by Harbin Engineering University (application number CN201110104892.6, publication number CN102163281A) discloses a human detection method that combines multi-scale histogram of oriented gradients (HOG) features with head color histogram features. While extracting HOG features, the method incorporates feature templates and adds a head-feature discrimination function, which improves the detection rate over traditional methods; it recognizes features particularly well in images whose background varies little. Its remaining shortcoming is that the detection results are disturbed when the background is cluttered or the illumination is uneven.

Summary of the invention

The object of the invention is to overcome the above shortcomings of the prior art by proposing an efficient match kernel human body detection method based on speeded-up robust features. The method is learning-based: it extracts local information of the image layer by layer, maps the features to a low-dimensional space by dictionary learning, aggregates them into a feature set, and trains a linear classifier on the feature set to obtain a human detection classifier, which is then used to detect humans in the image under test.

To achieve the above object, the invention comprises two processes: obtaining a detection classifier, and detecting images with the obtained classifier. The specific steps are as follows:

First process: the specific steps for obtaining the detection classifier are as follows:

(1) Select training sample set images:

1a) Using a bootstrapping operation, obtain a sufficient number of negative sample images from the non-human natural images of the INRIA database;

1b) Combine the obtained negative sample images with the negative sample set in the INRIA database to form a new negative sample set;

1c) Combine the new negative sample set with the positive sample set in the INRIA database to form the human body training sample set.

(2) Extract the SURF feature points of the images:

2a) Divide each image in the human body training sample set into grids of 8*8 pixels and sample each grid at scales of 16, 25, and 36 pixels, each sampling scale forming one sampling layer of the grid;

2b) For each 8*8 pixel grid and each sampling layer, compute the sum of the squares of the horizontal and vertical gradients at every sampled point in the grid, and take the sampled point with the largest sum as the SURF feature point of that pixel grid in that sampling layer;

2c) For each image in the human body training sample set, randomly select 15 feature points from the SURF feature points of all pixel grids in each sampling layer as the SURF feature points of that image in that sampling layer.

(3) Construct the initial basis vector of each layer:

Using k-means clustering, cluster the SURF feature points of all images in the human body training sample set in each sampling layer with 450 cluster centres, obtaining the 450-dimensional visual vocabulary of the whole human body training sample set in that sampling layer, which constitutes the initial basis vector of the sampling layer.

(4) Obtain the maximum kernel function feature of each sampling layer:

For the initial basis vector of each sampling layer, perform dictionary learning with constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature of the sampling layer.

(5) Obtain the efficient match kernel feature of the image:

5a) For each sampling layer, sort the element values of the maximum kernel function feature in descending order and determine whether exactly one element has the greatest value. If so, output the maximum kernel function feature as the feature vector of the sampling layer; otherwise, set every element equal to the maximum value to zero and output the zeroed maximum kernel function feature as the feature vector of the sampling layer;

5b) Compute the weighted sum of the feature vectors of all sampling layers to obtain the all-scale feature, and store it;

5c) Average the elements of each row of the all-scale feature vector, accumulate on the horizontal axis the number of occurrences of each mean value to obtain the distribution of the row means of the all-scale feature vector, and select the features whose row means approximately follow a Gaussian distribution as the final SURF-based efficient match kernel feature of the image.

(6) Train the classifier:

Train a support vector machine (SVM) classifier on the extracted SURF-based efficient match kernel features to obtain the detection classifier.

Second process: the specific steps for detecting an image with the obtained detection classifier are as follows:

(7) Scan the input image:

Input an image under test, scan the whole image with the window scanning method to obtain a set of scanning window images, and feed this set to the detection classifier.

(8) Detect the scanning windows:

8a) Use the detection classifier to determine whether the input scanning window images contain human body information. If none does, classify the image under test as a non-human natural image; otherwise, among all scanning window images judged to contain human body information, take the one with the highest detection classifier score as the main window image;

8b) Among the remaining scanning window images that contain human body information, merge those overlapping the main window image by more than 50% with the main window image, save the merged window as one detection result, and delete all images that took part in the merge;

8c) Determine whether any scanning window images containing human body information remain. If so, take the remaining image with the highest detection classifier score as the main window image and execute step 8b); otherwise, execute step (9).

(9) Output the detection results:

Mark all merged windows on the image under test and output the marked image as the human detection result of the image under test.

Compared with the prior art, the invention has the following advantages:

First, the invention uses speeded-up robust features in the feature extraction stage of human detection. These features gather statistics over image regions by computing local changes in image gradient, so the features of the whole image are statistical in nature. This avoids the ambiguous representations produced by the edge-based and contour-based image representations of the prior art, so the invention obtains better detection results on images with cluttered backgrounds and uneven illumination.

Second, the invention extracts features from the image layer by layer in the feature extraction stage, making effective use of feature point information at different scales and avoiding the local matching errors caused by overly small scales in the prior art, so the invention obtains better detection results.

Third, the invention uses dictionary learning to map the extracted image features to a low-dimensional space and aggregate them into a feature set. Compared with the prior art, this reduces the dimension of the image features and effectively reduces both the computation time of the image features and the amount of data computation.

Brief description of the drawings

Fig. 1 is the flow chart of the invention;

Fig. 2 shows sample images used in the invention;

Fig. 3 compares the classification performance of the classifier of the invention with that of the human detection method based on histogram of oriented gradients (HOG) features;

Fig. 4 shows simulated human detection results on an unevenly illuminated image for the method of the invention and the HOG-based human detection method;

Fig. 5 shows simulated human detection results on an image with a complex background for the method of the invention and the HOG-based human detection method.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1, the specific steps of the invention are as follows:

Step 1, select training sample set images.

Using a bootstrapping operation, obtain a sufficient number of negative sample images from the non-human natural images of the INRIA database.

The specific steps of the bootstrapping operation are as follows:

First step: randomly select m positive sample images and n negative sample images from the INRIA database, where 100 ≤ m ≤ 500, 100 ≤ n ≤ 800, and n ≤ m ≤ 3n. Extract histogram of oriented gradients (HOG) features from all selected positive and negative sample images and train a support vector machine (SVM) classifier on the extracted features to obtain an initial classifier.

Second step: repeatedly select non-human natural images at random from the INRIA database. Using a scanning window of the sample image size, scan each selected non-human natural image from left to right in steps of 8 pixels and from top to bottom in steps of 16 pixels. Feed the images in all scanning windows to the initial classifier and save the scanning window images that the classifier misclassifies. Stop selecting non-human natural images once the number of misclassified scanning window images reaches a, where 200 ≤ a ≤ 500. Randomly pick b of the misclassified scanning window images, where a/5 ≤ b ≤ a/3, and combine them with the current negative sample images to form a new negative sample set.

Third step: on the m randomly selected positive sample images and the new negative sample set, repeat HOG feature extraction, classifier training, detection on non-human natural images, and updating of the negative sample set.

Fourth step: repeat the third step until the final updated training sample set consists of 2416 positive sample images and 13500 negative sample images, each 128 × 64 pixels in size.
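By way of illustration only, the collection of misclassified windows in the second step may be sketched in Python as follows. The function name `mine_hard_negatives`, the `classify` callback, and the choice b = a/4 are assumptions made for this sketch; the patent only requires a/5 ≤ b ≤ a/3 and does not prescribe an implementation.

```python
import random


def mine_hard_negatives(classify, windows, a, b_fraction=0.25, seed=0):
    """Collect misclassified negative windows, then keep a random subset.

    classify:   callable returning True when a window is predicted "human"
                (a stand-in for the HOG + SVM classifier; hypothetical here).
    windows:    iterable of windows cut from non-human images, so every
                "human" prediction is a misclassification.
    a:          stop once this many misclassified windows are collected
                (the patent uses 200 <= a <= 500).
    b_fraction: share of the collected windows that is kept; 0.25 is an
                assumed value inside the patent's range a/5 <= b <= a/3.
    """
    rng = random.Random(seed)
    misclassified = []
    for window in windows:
        if classify(window):
            misclassified.append(window)
            if len(misclassified) >= a:
                break
    b = int(len(misclassified) * b_fraction)
    return rng.sample(misclassified, b)
```

The returned windows would be appended to the current negative sample set before the classifier is retrained in the third step.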

Combine the obtained negative sample images with the negative sample set in the INRIA database to form a new negative sample set.

Combine the new negative sample set with the positive sample set in the INRIA database to form the human body training sample set.

In the embodiment of the invention, the final training sample set consists of 2416 positive samples and 13500 negative samples, the test sample set consists of 1132 positive samples and 4050 negative samples, and all sample images are 128 × 64 pixels.

Fig. 2 shows some of the sample images used in the invention: Fig. 2(a) shows some of the positive sample images and Fig. 2(b) some of the negative sample images.

Step 2, extract the SURF feature points of the images.

Divide each image in the human body training sample set into grids of 8*8 pixels and sample each grid at scales of 16, 25, and 36 pixels, each sampling scale forming one sampling layer of the grid.

For each 8*8 pixel grid and each sampling layer, compute the sum of the squares of the horizontal and vertical gradients at every sampled point in the grid, and take the sampled point with the largest sum as the SURF feature point of that pixel grid in that sampling layer.

For each image in the human body training sample set, randomly select 15 feature points from the SURF feature points of all pixel grids in each sampling layer as the SURF feature points of that image in that sampling layer.
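The selection of the strongest-gradient point in each 8*8 grid may be illustrated, for a single sampling layer, by the following Python sketch. It is a simplified sketch and not the patented implementation: the function name is hypothetical, gradients are taken as central differences with border pixels reusing their nearest neighbour (the patent does not name a gradient operator), and the per-scale sampling is omitted.

```python
def strongest_point_per_cell(image, cell=8):
    """Map each cell index (cell_row, cell_col) to the (row, col) of the
    pixel with the largest squared-gradient sum inside that cell.

    image: list of equal-length rows of grayscale intensities.
    """
    h, w = len(image), len(image[0])

    def grad_energy(r, c):
        # Horizontal and vertical central differences, clamped at the border.
        gx = image[r][min(c + 1, w - 1)] - image[r][max(c - 1, 0)]
        gy = image[min(r + 1, h - 1)][c] - image[max(r - 1, 0)][c]
        return gx * gx + gy * gy

    points = {}
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            candidates = (
                (r, c)
                for r in range(top, top + cell)
                for c in range(left, left + cell)
            )
            best = max(candidates, key=lambda rc: grad_energy(*rc))
            points[(top // cell, left // cell)] = best
    return points
```

Random selection of 15 of the returned points per layer could then be done with `random.sample`.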

Step 3, construct the initial basis vector of each layer.

Using k-means clustering, cluster all SURF feature points of all training sample images in each sampling layer with 450 cluster centres, obtaining the 450-dimensional visual vocabulary of the whole training sample set in that sampling layer, which constitutes the initial basis vector of the sampling layer.

The specific steps of the k-means clustering method are as follows:

First step: for each sampling layer, randomly choose 450 SURF feature points from the SURF feature points of all sample images of the human body training sample set in that layer as the initial cluster centres of the layer, and take the data value of each as the value of the corresponding cluster centre.

Second step: compute the Euclidean distance from every SURF feature point of the human body training sample set in the sampling layer to each cluster centre.

Third step: assign each SURF feature point of the sampling layer to the class of the cluster centre nearest to it.

Fourth step: after assignment, determine whether the mean of the SURF feature points of each class equals its cluster centre value. If so, execute the fifth step; otherwise, take the mean of the feature points of each class as the new cluster centre value and return to the second step.

Fifth step: save the 450 cluster centre values, form a column vector from them, and output this column vector as the initial basis vector of the whole human body training sample set in the sampling layer.
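The five steps above correspond to the standard k-means iteration, which may be sketched in Python as follows. The function name, the `max_iter` safeguard, and the handling of empty clusters are assumptions of this sketch; in the patent k would be 450 and the points would be SURF descriptors.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on tuples of floats; returns the list of k centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # first step: random initial centres
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Second and third steps: squared Euclidean distance gives the
            # same nearest centre as the Euclidean distance itself.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            clusters[dists.index(min(dists))].append(p)
        # Fourth step: the class means become the new centres; an empty
        # class keeps its previous centre (an assumption of this sketch).
        new_centres = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres  # fifth step: output the cluster centres
```

Running the function once per sampling layer would give one set of centres per layer.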

Step 4, obtain the maximum kernel function feature of each sampling layer.

The specific steps of the dictionary learning are as follows:

First step: for each sampling layer, project the initial basis vector of the sampling layer onto a 450-dimensional space and compute its projection vector by the following formula:

R = R_{1} \times [v_{1}, ..., v_{j}, ..., v_{N}]

where R denotes the projection vector of the initial basis vector of the sampling layer; R_{1} denotes the initial basis vector of the sampling layer; v_{j} denotes the vector of projection coefficients of the j-th feature point extracted in the sampling layer from all sample images of the human body training sample set, v_{j} = [v_{1j}, v_{2j}, ..., v_{sj}, ..., v_{Mj}]^{T}; v_{sj} denotes the projection coefficient of the j-th feature point extracted in the sampling layer from the s-th sample image of the human body training sample set; M denotes the number of sample images in the human body training sample set; j = 1, 2, ..., N; and N denotes the number of feature points randomly selected in the sampling layer from each sample image of the human body training sample set.

Second step: construct an approximating function as follows to approximate, in the projection space, the projection vector of the initial basis vector of the sampling layer:

f(r) = \arg\min \| r - R \|

where r denotes the maximum kernel function feature in the sampling layer, R denotes the projection vector of the initial basis vector of the sampling layer, \| \cdot \| denotes the 2-norm, and \arg\min \| \cdot \| denotes minimization.

Substituting R = R_{1} \times [v_{1}, ..., v_{j}, ..., v_{N}] into the above formula and expanding r as r = [r_{1}, ..., r_{j}, ..., r_{N}] gives the 2-norm approximating function f(v, r) of the maximum kernel function feature r to the initial basis vector R_{1}:

f(v, r) = \frac{1}{N} \sum_{j=1}^{N} \| r_{j} - R_{1} v_{j} \|^{2}

where v denotes the vector of low-dimensional projection coefficients of all feature points extracted from all sample images of the human body training sample set, v = [v_{1}, ..., v_{j}, ..., v_{N}]; v_{j} denotes the vector of projection coefficients of the j-th feature point extracted in the sampling layer from all sample images of the human body training sample set; N denotes the number of feature points randomly selected in the sampling layer from each sample image of the human body training sample set; r_{j} denotes the maximum kernel feature vector of the j-th feature point extracted from all sample images of the human body training sample set; and R_{1} denotes the initial basis vector of the sampling layer.

Third step: solve the approximating function by stochastic gradient descent, iteratively updating the maximum kernel function feature in the sampling layer by the following formula to form the low-dimensional image feature representation:

r(k+1) = r(k) - \frac{\eta}{k} \frac{\partial \left( \sum_{j=1}^{N} \| r_{j} - R_{1} (R_{1}^{T} R_{1})^{-1} (R_{1}^{T} r_{j}) \|^{2} \right)}{\partial r}

where r(k+1) denotes the maximum kernel function feature in the sampling layer obtained at iteration k+1; k denotes the iteration number; r(k) denotes the maximum kernel function feature in the sampling layer obtained at iteration k; \eta denotes the learning rate, a constant; \partial(\cdot)/\partial r denotes the derivative with respect to r of the expression in parentheses; r_{j} denotes the maximum kernel feature vector of the j-th feature point extracted in the sampling layer from all sample images of the human body training sample set; R_{1} denotes the initial basis vector in the sampling layer; R_{1}^{T} denotes the transpose of R_{1}; and j = 1, 2, ..., N, where N denotes the number of feature points randomly selected in the sampling layer from each sample image of the human body training sample set. The number of iterations is set to 1000, and r(1000), obtained when the iterations complete, is taken as the final maximum kernel function feature in the sampling layer.
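One possible reading of how the expression inside the derivative relates to the approximating function f(v, r) is given below. It assumes that the matrix R_{1}^{T} R_{1} is invertible, an assumption not stated in the text: for a fixed r_{j}, the coefficient vector that minimizes the j-th term of f(v, r) is the ordinary least-squares solution, and substituting it back leaves a residual that depends on r alone.

```latex
\hat{v}_{j} = \arg\min_{v_{j}} \| r_{j} - R_{1} v_{j} \|^{2}
            = (R_{1}^{T} R_{1})^{-1} R_{1}^{T} r_{j},
\qquad
\| r_{j} - R_{1} \hat{v}_{j} \|^{2}
  = \| r_{j} - R_{1} (R_{1}^{T} R_{1})^{-1} R_{1}^{T} r_{j} \|^{2}
```

Under this reading, the update moves r along the negative gradient of the summed residuals, with the step size \eta / k shrinking as the iteration number k grows.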

Step 5, obtain the efficient match kernel feature of the image.

For each sampling layer, sort the element values of the maximum kernel function feature in descending order and determine whether exactly one element has the greatest value. If so, output the maximum kernel function feature as the feature vector of the sampling layer; otherwise, set every element equal to the maximum value to zero and output the zeroed maximum kernel function feature as the feature vector of the sampling layer.
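The tie rule described above may be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the output keeps the original element order, since the text uses the descending sort only to count the elements that share the greatest value.

```python
def layer_feature_vector(feature):
    """Return the feature vector of one sampling layer.

    If the greatest value occurs exactly once, the feature is returned
    unchanged; otherwise every element equal to the greatest value is
    set to zero.
    """
    top = max(feature)
    if feature.count(top) == 1:
        return list(feature)
    return [0 if x == top else x for x in feature]
```

For example, `layer_feature_vector([3, 3, 1])` zeroes both tied maxima, while `layer_feature_vector([3, 1, 2])` is returned as is.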

Compute the weighted sum of the feature vectors of all sampling layers to obtain the all-scale feature, and store it.

The weighted sum is computed as follows:

G^{*} = \sum_{i=1}^{3} G_{i} \times A_{i}

where G^{*} denotes the all-scale feature; G_{i} denotes the feature vector of the i-th sampling layer, i = 1, 2, 3; A_{i} denotes the weight corresponding to each sampling layer, with w_{i} = 1/p_{i}; and p_{i} denotes the pixel size of the sampling scale of each sampling layer, p_{i} ∈ {16, 25, 36}.
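A minimal Python sketch of the weighted summation follows. The text gives w_i = 1/p_i but the exact expression for the layer weight A_i is not legible in this copy, so the sketch assumes the normalized form A_i = w_i / (w_1 + w_2 + w_3); this normalization and the function name are assumptions, not part of the source.

```python
def combine_layers(layer_features, scales=(16, 25, 36)):
    """Weighted sum of per-layer feature vectors of equal length.

    layer_features: one feature vector (list of floats) per sampling layer.
    scales:         pixel size p_i of each layer's sampling scale.
    """
    raw = [1.0 / p for p in scales]      # w_i = 1 / p_i (from the text)
    total = sum(raw)
    weights = [w / total for w in raw]   # ASSUMED normalization for A_i
    length = len(layer_features[0])
    return [
        sum(a * feat[d] for a, feat in zip(weights, layer_features))
        for d in range(length)
    ]
```

With these weights the finest scale (16 pixels) contributes most and the coarsest (36 pixels) least.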

Average the elements of each row of the all-scale feature vector, accumulate on the horizontal axis the number of occurrences of each mean value to obtain the distribution of the row means of the all-scale feature vector, and select the features whose row means approximately follow a Gaussian distribution as the final SURF-based efficient match kernel feature of the image.

Step 6, train the classifier.

Step 7, scan the input image.

The specific steps of window scanning are as follows:

First step: take the region in the upper-left corner of the input image under test that has the size of a sample image in the human body training sample set as the first scanning window, make it the current scanning window, and save the current scanning window image.

Second step: translate the current scanning window on the image under test 8 pixels to the right or 16 pixels downward to obtain a new scanning window, replace the current scanning window with the new one, and save the current scanning window image.

Third step: keep moving the current scanning window in this way, replacing it with the moved window each time, until the whole image under test has been scanned, and save all scanning window images.
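The scan order described in these three steps may be sketched in Python as a generator of window corners. The function name is hypothetical, and the sketch assumes that the 128 × 64 sample size means 128 rows by 64 columns and that windows extending past the image border are skipped.

```python
def scan_windows(img_h, img_w, win_h=128, win_w=64, step_x=8, step_y=16):
    """Yield the (top, left) corner of every window that fits in the image.

    Windows advance 8 pixels to the right within a row of windows and
    16 pixels downward between rows, starting at the upper-left corner.
    """
    for top in range(0, img_h - win_h + 1, step_y):
        for left in range(0, img_w - win_w + 1, step_x):
            yield top, left
```

A 160 × 80 image, for instance, yields three vertical and three horizontal positions, nine windows in total.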

Step 8, detect the scanning windows.

8a) Use the detection classifier to determine whether the input scanning window images contain human body information. If none does, classify the image under test as a non-human natural image; otherwise, among all scanning window images judged to contain human body information, take the one with the highest detection classifier score as the main window image.

8b) Among the remaining scanning window images that contain human body information, merge those overlapping the main window image by more than 50% with the main window image, save the merged window as one detection result, and delete all images that took part in the merge.

The specific steps of window merging are as follows:

First step: number all images to be merged consecutively, starting from 1;

Second step: for each image to be merged, take the ratio of its classifier score to the sum of the detection classifier scores of all images to be merged as the weight applied to its boundaries;

Third step: weight each boundary of the images to be merged by the following formula:

X = x_{1} \times \frac{m_{1}}{A} + x_{2} \times \frac{m_{2}}{A} + ... + x_{M} \times \frac{m_{M}}{A}

where X denotes the row or column pixel position on the image under test of a window boundary obtained after weighting; x_{1}, x_{2}, ..., x_{M} denote the row or column pixel positions on the image under test of the corresponding boundaries of the images taking part in the merge; m_{1}, m_{2}, ..., m_{M} denote the classifier scores of those images; M denotes the number of images taking part in the merge; A denotes the sum of their classifier scores, A = \sum_{g=1}^{M} m_{g}; g denotes the index of a merged image; and m_{g} denotes the classifier score of the g-th image taking part in the merge.

Fourth step: form a window from the weighted boundaries.

8c) Determine whether any scanning window images containing human body information remain. If so, take the remaining image with the highest detection classifier score as the main window image and execute step 8b); otherwise, execute step 9.
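Steps 8a) to 8c), together with the score-weighted boundary formula, may be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the overlap is measured as the intersection area divided by the area of the main window (the text does not define the ratio), that all scores of detected windows are positive, and that a main window with no overlapping neighbour is saved unchanged.

```python
def overlap_ratio(a, b):
    """Intersection area over the area of box a; boxes are (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((a[2] - a[0]) * (a[3] - a[1]))


def merge_detections(boxes, scores, threshold=0.5):
    """Greedily merge windows that overlap the best-scoring window."""
    remaining = sorted(zip(scores, boxes), reverse=True)
    results = []
    while remaining:
        main = remaining[0]  # highest classifier score among those left
        group = [main] + [
            d for d in remaining[1:]
            if overlap_ratio(main[1], d[1]) > threshold
        ]
        total = sum(s for s, _ in group)  # A: sum of scores in the group
        # Each boundary is the score-weighted mean X = sum_g x_g * m_g / A.
        merged = tuple(
            sum(box[i] * s / total for s, box in group) for i in range(4)
        )
        results.append(merged)
        remaining = [d for d in remaining if d not in group]
    return results
```

Each tuple in the returned list is one merged window, which would then be drawn on the image under test in step 9.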

Step 9, output the detection results.

The effect of the invention is further illustrated by the following simulations:

1. Simulation conditions:

The simulation experiments of the invention were programmed in Matlab 2009a on an HP workstation running Windows. The positive and negative sample images used in the experiments are taken from the INRIA database. The training sample set comprises 2416 positive samples and 13500 negative samples, and the test sample set comprises 1132 positive samples and 4050 negative samples; all positive and negative sample images are 128 × 64 pixels. Fig. 2 shows some of the sample images used in the invention: Fig. 2(a) shows some of the positive sample images and Fig. 2(b) some of the negative sample images.

2. Simulation content and analysis of results:

Simulation 1:

The accuracy obtained after classifier training is one of the important indicators of classifier performance. To obtain a well-performing classifier, we ran a large number of experiments on two parameters used when extracting sample image features: the number of sampling layers applied to the sample images and the projection dimension used in the low-dimensional projection of the initial basis vector. We compared the accuracies of classifiers trained with different numbers of sampling layers and different projection dimensions; the results are shown in Table 1.

As Table 1 shows, for the same projection dimension, the classifier accuracy with 3 sampling layers is higher than with 2 sampling layers; and for the same number of sampling layers, a higher projection dimension does not necessarily give a higher classifier accuracy. The data in the table show that the classifier accuracy is highest, and the classification performance best, when the sample images are sampled in 3 layers and the initial basis vector is projected to 450 dimensions.

Simulation 2:

The invention and the human detection method based on histogram of oriented gradients (HOG) features are each used to extract features from the human body training sample set and train a classifier, and the performance of the resulting classifiers is compared. The comparison is shown in Fig. 3, which evaluates the classifiers by receiver operating characteristic (ROC) curves relating the true positive rate (TPR) to the false positive rate (FPR). The closer a ROC curve lies to the upper-left corner, the better the corresponding classifier.

The horizontal axis in Fig. 3 represents the false positive rate (FPR) and the vertical axis represents the true positive rate (TPR). The curve marked with squares is the ROC curve of the classifier of the invention, and the curve marked with crosses is the ROC curve of the classifier of the HOG-based human detection method. As Fig. 3 shows, the ROC curve of the invention lies closer to the upper-left corner than that of the HOG-based method, which indicates that the classification performance of the invention is better than that of the HOG-based human detection method.
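For reference, the points of such a ROC curve can be computed from classifier scores and ground-truth labels as in the following Python sketch. The function name and the thresholding convention (a window counts as "human" when its score is at least the threshold) are assumptions; the patent does not describe how its curves were produced.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct score used as threshold.

    labels: 1 for human windows, 0 for non-human windows.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

Plotting the returned pairs with FPR on the horizontal axis and TPR on the vertical axis gives a curve of the kind compared in Fig. 3.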

Simulation 3:

The present invention and the human detection method based on histogram of oriented gradients (HOG) features are each used to perform human detection on natural images from the INRIA database; the detection results are shown in Fig. 4 and Fig. 5.

Fig. 4 is an unevenly illuminated image. Fig. 4(a) shows the human detection result of the present invention; the white boxes in the figure are the merged windows obtained after the detection classifier of the present invention has detected human information in the image. Fig. 4(b) shows the human detection result of the HOG-feature-based human detection method; the white boxes in the figure are the merged windows obtained after that method's detection classifier has detected human information in the image. As Fig. 4 shows, under uneven illumination the method of the present invention greatly reduces the false alarm rate compared with the HOG-feature-based human detection method and detects all human information in the image under test more accurately.

Fig. 5 is an image with a complex background and occluded people. Fig. 5(a) shows the human detection result of the present invention; the white boxes in the figure are the merged windows obtained after the detection classifier of the present invention has detected human information in the image. Fig. 5(b) shows the human detection result of the HOG-feature-based human detection method; the white boxes in the figure are the merged windows obtained after that method's detection classifier has detected human information in the image. As Fig. 5 shows, with a complex background and occluded people, the method of the present invention marks human information more accurately, and the window size obtained after window merging is more appropriate than that of the HOG-feature-based human detection method, giving higher human detection accuracy.

In summary, the method of the present invention can detect humans even under uneven illumination, against complex backgrounds and with partial occlusion, which shows that the method is well suited to human detection in natural images.

Claims

1. An efficient matching kernel human detection method based on speeded-up robust features (SURF), comprising two processes, obtaining a detection classifier and detecting an image with the obtained classifier, the specific steps being as follows:

(1) Select the training sample set images:

1c) combine the new negative sample set images obtained with the positive sample set in the INRIA database to form the human training sample set;

(2) Extract the SURF feature points of the images:

2a) divide each image in the human training sample set into grids of 8×8 pixels and, for each grid, sample at scales of 16, 25 and 36 pixels respectively, each sampling scale forming one grid-shaped sampling layer;

2b) for each 8×8 pixel grid, compute in each sampling layer the sum of the squares of the horizontal gradient and the vertical gradient at the sampling points in the grid, and take the sampling point with the largest sum of squared gradients as the speeded-up robust features (SURF) feature point of that pixel grid in that sampling layer;

2c) for each image in the human training sample set, randomly select 15 feature points from the SURF feature points of all pixel grids in each sampling layer, as the SURF feature points of that image in that sampling layer;
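Steps 2b) and 2c) can be illustrated with the following minimal sketch. It assumes that one sampling layer is available as a 2D list of grey values and that gradients are taken as central differences clamped at the image border; neither detail is stated in the claim, and the function names are hypothetical. The multi-scale sampling of step 2a) is not modelled.

```python
import random


def strongest_point_per_cell(img, cell=8):
    """Return, for every full cell x cell grid, the (row, col) of the point
    whose squared horizontal plus squared vertical gradient is largest."""
    h, w = len(img), len(img[0])

    def energy(r, c):
        # Central differences, clamped at the border (an assumption).
        gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
        gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
        return gx * gx + gy * gy

    points = []
    for r0 in range(0, h - cell + 1, cell):
        for c0 in range(0, w - cell + 1, cell):
            candidates = ((r, c)
                          for r in range(r0, r0 + cell)
                          for c in range(c0, c0 + cell))
            points.append(max(candidates, key=lambda rc: energy(*rc)))
    return points


def pick_random(points, count=15, seed=0):
    """Randomly keep `count` feature points (all of them if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(points, min(count, len(points)))
```

The fixed `seed` only makes the sketch reproducible; the claim itself asks for a random selection of 15 points per image and sampling layer.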

(3) Construct the initial basis vector of each layer:

Using the k-means clustering method, cluster the SURF feature points of all images of the human training sample set in each sampling layer, defining 450 cluster centers, to obtain the 450-dimensional visual vocabulary of the whole human training sample set in that sampling layer, which constitutes the initial basis vector of the sampling layer;

The specific steps of the k-means clustering method are as follows:

First step, for each sampling layer, randomly choose 450 SURF feature points from the SURF feature points of all sample images of the human training sample set in that sampling layer as the initial cluster centers of the sampling layer, and take the data value of each of the 450 initial cluster centers as its cluster center value;

Second step, compute the Euclidean distance from every SURF feature point of the human training sample set in the sampling layer to each cluster center;

Third step, assign each SURF feature point of the sampling layer to the class of the cluster center nearest to it;

Fourth step, after the assignment, determine whether the mean of the SURF feature points of each class equals the cluster center value; if so, execute the fifth step; otherwise, take the mean of the feature points of each class as the new cluster center value and return to the second step;

Fifth step, save the 450 cluster center values, form a column vector from the 450 cluster center values, and output this column vector as the initial basis vector of the whole human training sample set in the sampling layer;
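The five k-means steps above can be sketched as follows. The iteration cap `max_iter` and the handling of an empty cluster (its center is kept unchanged) are assumptions added so that the sketch always terminates; the claim specifies neither.

```python
import math
import random


def kmeans(points, k=450, max_iter=100, seed=0):
    """Plain k-means over feature points given as equal-length sequences."""
    rng = random.Random(seed)
    # First step: pick k feature points at random as initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        # Second and third steps: assign every point to its nearest center
        # by Euclidean distance.
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Fourth step: compare each class mean with its current center.
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(dim)])
            else:
                new_centers.append(centers[c])  # empty cluster: assumption
        if new_centers == centers:
            break
        centers = new_centers
    # Fifth step: the saved centers form the initial basis of the layer.
    return centers
```

With `k=450` the function needs at least 450 input points, which matches the claim's requirement of drawing 450 initial centers from the extracted feature points.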

(4) Obtain the maximum kernel function feature of each sampling layer:

For the initial basis vector of each sampling layer, perform dictionary learning using constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature of the sampling layer;

The specific steps of the dictionary learning are as follows:

First step, for each sampling layer, project the initial basis vector of the sampling layer onto a 450-dimensional space, and obtain the projection vector of the initial basis vector of the sampling layer by the following formula:

R = R₁ × [v₁, ..., v_j, ..., v_N]

where R denotes the projection vector of the initial basis vector of the sampling layer, R₁ denotes the initial basis vector of the sampling layer, v_j denotes the vector of projection coefficients of the j-th feature point extracted in the sampling layer from all sample images of the human training sample set, v_j = [v_1j, v_2j, ..., v_sj, ..., v_Mj]^T, v_sj denotes the projection coefficient of the j-th feature point extracted in the sampling layer from the s-th sample image of the human training sample set, M denotes the number of sample images in the human training sample set, j = 1, 2, ..., N, and N denotes the number of feature points randomly selected in the sampling layer from each sample image of the human training sample set;

Second step, construct an approximation function according to the following formula, to approximate on the projection space the projection vector of the initial basis vector of the sampling layer:

F(r) = arg min ||r − R||

where r denotes the maximum kernel function feature in the sampling layer, ||·|| denotes the 2-norm, and arg min ||·|| denotes minimization;

Third step, evaluate the approximation function to obtain the following formula, which iteratively updates the maximum kernel function feature in the sampling layer and forms a low-dimensional image feature representation:

where r(k+1) denotes the maximum kernel function feature in the sampling layer obtained at iteration k+1, k denotes the iteration count, r(k) denotes the maximum kernel function feature in the sampling layer obtained at iteration k, η denotes the learning rate, which is a constant, the derivative operator in the formula denotes the derivative with respect to r of the expression in brackets, r_j denotes the maximum kernel feature vector of the j-th feature point extracted in the sampling layer from all sample images of the human training sample set, and R₁^T denotes the transpose of the initial basis vector R₁ in the sampling layer; the number of iterations is set to 1000, and r(1000), obtained when the iterations are complete, is taken as the final maximum kernel function feature in the sampling layer;
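The update formula of the third step is not reproduced in this text, so the following sketch is only a stand-in: it assumes the squared form of the objective of the second step, ||r − R||², and applies an ordinary gradient step with a constant learning rate η for a fixed 1000 iterations. The zero starting point, the value of η and the function name `descend` are all assumptions.

```python
def descend(R, eta=0.01, iterations=1000):
    """Fixed-count gradient descent on the assumed objective ||r - R||^2."""
    r = [0.0] * len(R)  # starting point r(0): an assumption
    for _ in range(iterations):
        # Gradient of ||r - R||^2 with respect to r is 2 (r - R).
        grad = [2.0 * (ri - Ri) for ri, Ri in zip(r, R)]
        # r(k+1) = r(k) - eta * gradient
        r = [ri - eta * gi for ri, gi in zip(r, grad)]
    return r
```

For this simplified objective the iterate simply approaches R; the patent's actual update also involves R₁^T and the per-point vectors r_j, which the sketch does not model.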

(5) Obtain the efficient matching kernel feature of the image:

5a) for each sampling layer, sort the element values of the maximum kernel function feature of the sampling layer in descending order and determine whether exactly one element has the maximum value; if so, output the maximum kernel function feature of the sampling layer as the feature vector of the sampling layer; otherwise, set to zero the elements of the maximum kernel function feature whose values equal the maximum, and output the zeroed maximum kernel function feature as the feature vector of the sampling layer;

5b) compute the weighted sum of the feature vectors of all sampling layers to obtain the all-scale feature, and store the all-scale feature;

The weighted sum is computed as follows:

G*=G_i×A_i

where G* denotes the all-scale feature, G_i denotes the feature vector of each sampling layer, i = 1, 2, 3, A_i denotes the weight corresponding to each sampling layer, w_i = 1/p_i, and p_i denotes the pixel size of the sampling scale of each sampling layer, p_i = {16, 25, 36};

5c) average the elements of each row of the all-scale feature vector, accumulate on the horizontal axis the number of rows falling at each mean value to obtain the distribution of the row means of the all-scale feature vector, and select from the distribution of row means the features that approximately follow a Gaussian distribution as the final efficient matching kernel feature of the SURF features of the image;
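Steps 5a) and 5b) can be sketched as follows. The expression defining A_i is not reproduced in this text; the sketch assumes that A_i is w_i = 1/p_i normalised so that the three weights sum to 1, and that G* is the sum over the three layers of G_i × A_i. Both points, and the function names, are assumptions. Step 5c) is not modelled.

```python
def layer_feature(r):
    """Step 5a: keep r if its maximum is unique, else zero the tied maxima."""
    top = max(r)
    if r.count(top) == 1:
        return list(r)
    return [0.0 if x == top else x for x in r]


def all_scale_feature(features, scales=(16, 25, 36)):
    """Step 5b: weighted sum of the per-layer feature vectors."""
    w = [1.0 / p for p in scales]          # w_i = 1 / p_i, as in the claim
    total = sum(w)
    weights = [wi / total for wi in w]     # normalisation: an ASSUMPTION
    dim = len(features[0])
    return [sum(a * g[d] for a, g in zip(weights, features))
            for d in range(dim)]
```

Because w_i = 1/p_i, the layer sampled at the finest scale (16 pixels) contributes most to the all-scale feature under this reading.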

(6) Classification training:

Using a support vector machine (SVM) classifier, perform classification training on the extracted efficient matching kernel features of the SURF features to obtain the detection classifier;

(7) Scan the input image:

Input an image under test, scan the whole image under test with the window scanning method to obtain a set of scanning window images, and input this set of scanning window images to the detection classifier;

(8) Examine the scanning windows:

8a) use the detection classifier to determine whether the input scanning window images contain human information; if no human information is present, classify the image under test as a non-human natural image; otherwise, from all scanning window images judged to contain human information, find the scanning window image with the highest detection classifier score as the main window image;

8b) among the remaining scanning window images containing human information other than the main window image, merge those whose overlap with the main window image exceeds 50% with the main window image, save the window obtained by merging as one detection result, and delete all images that took part in the merge;

8c) determine whether any scanning window images containing human information remain; if so, find the remaining scanning window image with the highest detection classifier score as the main window image and execute step 8b); otherwise, execute step (9);
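The loop formed by steps 8a) to 8c) can be sketched as a greedy grouping of detections. The claim does not define how overlap is measured; the sketch assumes it is the intersection area divided by the area of the main window. Windows are represented as `(x0, y0, x1, y1)` tuples, which is also an assumption, as are the function names.

```python
def overlap_ratio(main, other):
    """Intersection area over the area of the main window (an assumption)."""
    w = min(main[2], other[2]) - max(main[0], other[0])
    h = min(main[3], other[3]) - max(main[1], other[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((main[2] - main[0]) * (main[3] - main[1]))


def group_windows(detections, threshold=0.5):
    """detections: list of (score, box) judged to contain a human.
    Returns groups of detections to merge, main window first in each."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    groups = []
    while remaining:                      # 8c: repeat while windows remain
        main = remaining.pop(0)           # highest score is the main window
        group, rest = [main], []
        for det in remaining:             # 8b: collect overlapping windows
            if overlap_ratio(main[1], det[1]) > threshold:
                group.append(det)
            else:
                rest.append(det)
        remaining = rest                  # merged windows are removed
        groups.append(group)
    return groups
```

Each returned group corresponds to one detection result; turning a group into a single window is the window merging operation addressed in claim 4.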

(9) Output the detection result:

Mark all windows obtained by window merging on the image under test, and output the marked image as the human detection result of the image under test.

2. The efficient matching kernel human detection method based on speeded-up robust features according to claim 1, characterized in that the specific steps of the bootstrapping operation in step 1a) are as follows:

First step, randomly select m positive sample images and n negative sample images from the INRIA database, where 100 ≤ m ≤ 500, 100 ≤ n ≤ 800 and n ≤ m ≤ 3n; extract features from all selected positive and negative sample images using the histogram of oriented gradients (HOG) feature extraction method, and perform classification training on the extracted features with a support vector machine (SVM) classifier to obtain an initial classifier;

Second step, repeatedly choose non-human natural images at random from the INRIA database and scan each whole non-human natural image with a scanning window of the sample image size, moving 8 pixels at a time from left to right and 16 pixels at a time from top to bottom; input the images in all scanning windows to the initial classifier for detection and save the scanning window images that the classifier misclassifies, until the number of misclassified scanning window images reaches a, where 200 ≤ a ≤ 500, and then stop choosing non-human natural images; randomly pick b images from the misclassified scanning window images, where a/5 ≤ b ≤ a/3, and combine them with the current negative sample images to form the new negative sample set;

Third step, for the m randomly selected positive sample images and the new negative sample set, extract histogram of oriented gradients (HOG) features, train a classifier, detect non-human natural images and update the negative sample set;

Fourth step, repeat the third step until the final updated training sample set consists of 2416 positive sample images and 13500 negative sample images, each of size 128 × 64 pixels.
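The collection of misclassified windows in the second step can be sketched as follows. The classifier is passed in as a hypothetical callable `is_human`, and `windows` stands for the scanning window images cut from non-human natural images; both names are placeholders, and HOG extraction and SVM training are outside the sketch.

```python
import random


def mine_hard_negatives(is_human, windows, a=200, b=50, seed=0):
    """Collect up to `a` false positives, then keep `b` of them at random."""
    if not (200 <= a <= 500 and a / 5 <= b <= a / 3):
        raise ValueError("a and b violate the ranges given in the claim")
    misclassified = []
    for win in windows:
        # Every window comes from a non-human image, so a positive
        # decision is a misclassification.
        if is_human(win):
            misclassified.append(win)
            if len(misclassified) == a:
                break                     # stop choosing further images
    rng = random.Random(seed)
    return rng.sample(misclassified, min(b, len(misclassified)))
```

The returned windows would be added to the current negative samples before the classifier is retrained, as the third step describes.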

3. The efficient matching kernel human detection method based on speeded-up robust features according to claim 1, characterized in that the specific steps of the window scanning method in step (7) are as follows:

First step, take the region in the top-left corner of the input image under test that has the size of a sample image in the human training sample set as the first scanning window, take this scanning window as the current scanning window, and save the current scanning window image;

Second step, translate the current scanning window on the image under test 8 pixels to the right or 16 pixels down to obtain a new scanning window, replace the current scanning window with the new scanning window, and save the current scanning window image;

Third step, keep moving the current scanning window as described above, replacing the current scanning window with the moved scanning window, until the whole image under test has been scanned, and save all scanning window images.
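The window scanning method can be sketched as a raster scan. The window is assumed to be 64 pixels wide and 128 pixels high, following the 128 × 64 sample size stated in claim 2, and the row-by-row visiting order is an assumption; the claim only fixes the 8-pixel horizontal and 16-pixel vertical steps.

```python
def scan_windows(img_w, img_h, win_w=64, win_h=128, step_x=8, step_y=16):
    """Return every scanning window as (x0, y0, x1, y1), starting at the
    top-left corner and moving 8 px right, then 16 px down."""
    windows = []
    for y in range(0, img_h - win_h + 1, step_y):
        for x in range(0, img_w - win_w + 1, step_x):
            windows.append((x, y, x + win_w, y + win_h))
    return windows
```

For an 80 × 160 pixel image this yields three horizontal positions and three vertical positions, that is, nine scanning windows.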

4. The efficient matching kernel human detection method based on speeded-up robust features according to claim 1, characterized in that the specific steps of the window merging operation in step 8b) are as follows:

Second step, for each image to be merged, take the proportion of its detection classifier score in the sum of the detection classifier scores of all images to be merged as the weight for weighting the image boundaries;

where X denotes the row or column pixel position on the image under test of the window boundary obtained after weighting, x₁, x₂, ..., x_E denote the row or column pixel positions on the image under test of the boundaries of the images taking part in the window merge, m₁, m₂, ..., m_E denote the corresponding detection classifier scores of the images taking part in the window merge, E denotes the number of images taking part in the window merge, A denotes the sum of the detection classifier scores of the images taking part in the window merge, g denotes the index of an image taking part in the window merge, and m_g denotes the detection classifier score of the g-th image taking part in the window merge;

Fourth step, form a window from the weighted boundaries.
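The score-weighted merging of boundaries can be sketched as follows. The weighting formula itself is not reproduced in this text; the sketch reads the second step as X = Σ (m_g / A) · x_g applied to each of the four window boundaries, and assumes positive detection scores and windows given as `(x0, y0, x1, y1)`. The function name is hypothetical.

```python
def merge_windows(group):
    """group: list of (score, (x0, y0, x1, y1)) taking part in one merge.
    Returns the merged window with score-weighted boundaries."""
    total = sum(score for score, _ in group)       # A: sum of scores
    merged = []
    for side in range(4):                          # left, top, right, bottom
        merged.append(sum(score / total * box[side]
                          for score, box in group))
    return tuple(merged)
```

A window with a higher detection score pulls the merged boundary towards its own position; for example, a window with score 3 and one with score 1 are combined with weights 0.75 and 0.25.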