CN103455826A - Efficient matching kernel body detection method based on rapid robustness characteristics - Google Patents
- Publication number: CN103455826A
- Application number: CN201310405276A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention provides an efficient match kernel (EMK) human body detection method based on fast robust features. The method mainly addresses the problem that traditional methods cope poorly with cluttered image backgrounds or uneven illumination. The method comprises: a first step of selecting training sample set images; a second step of extracting SURF feature points from the images; a third step of constructing an initial basis vector for each layer; a fourth step of obtaining the maximum kernel function feature of each sampling layer; a fifth step of obtaining the efficient match kernel feature of the image; a sixth step of classification training; a seventh step of scanning the input image; an eighth step of detecting the scanning windows; and a ninth step of outputting the detection result. Local image information is extracted layer by layer for feature learning, the features are mapped to a low-dimensional space and aggregated into a feature set, and a linear classifier is then trained on the feature set to obtain a human body detection classifier. The method can accurately detect human body information in natural images in the field of image processing.
Description
Technical field
The invention belongs to the technical field of image processing, and further relates to an efficient match kernel (Efficient Match Kernel, EMK) human body detection method based on fast robust features, in the field of static human body detection. The invention can be used to detect human body information in still images, so as to identify human body targets.
Background art
Human body detection is the process of locating human body information in natural images. Owing to its application value in fields such as intelligent surveillance, driver assistance systems, human motion capture, and pornographic image filtering, it has become a key technique in computer vision in recent years. However, the diversity of human postures, cluttered backgrounds, clothing texture, illumination conditions, self-occlusion, and other factors make human body detection a very difficult problem. At present, human body detection methods for still images fall into two main classes: methods based on human body models and methods based on learning.
The first class comprises human body detection methods based on human body models. Such a method requires no training database; instead, an explicit human body model is defined, and human bodies are recognized according to the relations between the body parts constructed by the model.
Beijing Jiaotong University discloses a model-based detection method in its patent application "A human body detection method" (application number CN201010218630.8, publication number CN101908150A). The method builds a human detection template with a certain degree of fuzziness from human samples of multiple body shapes and postures, and uses it to determine candidate human body regions. The method handles occlusion well and can infer the human posture, improving the efficiency and precision of detection. Its remaining shortcoming is that the matching algorithm is rather complex and computationally expensive, so good detection results are hard to obtain against complex backgrounds.
The second class comprises human body detection methods based on learning. Such a method obtains a classifier from a series of training data by machine learning, and then uses this classifier to classify and identify input windows.
Harbin Engineering University discloses a human body detection method combining multi-scale histogram-of-oriented-gradients (HOG) features and a head color histogram feature in its patent application "Real-time human body detection method based on the AdaBoost framework and head color" (application number CN201110104892.6, publication number CN102163281A). By combining feature templates when extracting HOG features, the method adds head-feature discrimination and improves the detection rate over classical methods; it works particularly well on images with little background change. Its remaining shortcoming is that the detection result is disturbed when the background is cluttered or the illumination is uneven.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above prior art by proposing an efficient match kernel human body detection method based on fast robust features. Adopting a learning-based approach, the method extracts local image information layer by layer, performs dictionary learning to map the features to a low-dimensional space, aggregates them into a feature set, trains a linear classifier on the feature set to obtain a human body detection classifier, and then uses this classifier to detect human bodies in images.
To achieve the above object, the invention comprises two processes: obtaining the detection classifier, and using the obtained classifier to detect images. The specific implementation steps are as follows:
First process: the steps for obtaining the detection classifier are as follows:
(1) Select the training sample set images:
1a) Using the bootstrapping operation, obtain a sufficient number of negative sample images from the non-human natural images of the INRIA database;
1b) Combine the obtained negative sample images with the negative sample set of the INRIA database to form a new negative sample set;
1c) Combine the new negative sample set with the positive sample set of the INRIA database to form the human body training sample set.
(2) Extract the SURF feature points of the images:
2a) Divide every image of the human body training sample set into grids of 8×8 pixels; sample each grid at scales of 16, 25, and 36 pixels, each sampling scale forming one sampling layer;
2b) For each 8×8 pixel grid, after sampling each layer, compute the sum of squares of the horizontal and vertical gradients of the sampled points in the grid, and take the sampled point with the maximum gradient sum of squares as the fast robust feature (SURF) point of this grid on that sampling layer;
2c) For every image of the human body training sample set, randomly choose 15 feature points from the SURF feature points of all grids on each sampling layer, as the SURF feature points of that image on that sampling layer.
(3) Construct the initial basis vector of each layer:
Using the k-means clustering method, cluster the SURF feature points of all images of the human body training sample set on each sampling layer into 450 cluster centers, obtaining a 450-dimensional visual vocabulary of the whole training set on that sampling layer, which forms the initial basis vector of the sampling layer.
(4) Obtain the maximum kernel function feature of each sampling layer:
For the initial basis vector of each sampling layer, perform dictionary learning with the constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature of the sampling layer.
(5) Obtain the efficient match kernel feature of the image:
5a) For each sampling layer, sort the element values of the layer's maximum kernel function feature in descending order and check whether exactly one element attains the maximum value; if so, output the maximum kernel function feature as the feature vector of the layer; otherwise, set the elements equal to the maximum value to zero and output the zeroed feature as the feature vector of the layer;
5b) Compute a weighted sum of the feature vectors of all sampling layers to obtain the all-layer feature, and store it;
5c) Average each row element of the all-layer feature vectors, accumulate the counts of the averages on the horizontal axis to obtain the distribution of the row means, and select the features whose mean distribution is approximately Gaussian as the final efficient match kernel feature of the fast robust characteristics of the image.
(6) Classification training:
Train a support vector machine (SVM) classifier on the extracted efficient match kernel features to obtain the detection classifier.
Second process: the steps for detecting an image with the obtained detection classifier are as follows:
(7) Scan the input image:
Input an image to be detected, scan the whole image with the window scanning method to obtain a group of scanning window images, and input this group of scanning window images to the detection classifier.
(8) Detect the scanning windows:
8a) Use the detection classifier to judge whether each input scanning window image contains human body information; if none does, the detected image is classified as a non-human natural image; otherwise, among all scanning window images judged to contain human body information, find the one with the highest classifier score as the main window image;
8b) Among the remaining scanning window images containing human body information, take those whose overlap with the main window image exceeds 50% and merge them with the main window image by the window combination operation; save the combined window as one detection result and delete all images that participated in the combination;
8c) Judge whether any scanning window images with human body information remain; if so, find the remaining image with the highest classifier score as the new main window image and go to step 8b); otherwise, go to step (9).
(9) Output the detection result:
Mark all windows obtained by window combination on the detected image and output the marked image as the human body detection result of the detected image.
Compared with the prior art, the present invention has the following advantages:
First, the invention adopts fast robust features in the feature extraction stage of human body detection. The fast robust features collect statistics over local image regions by computing local gradient changes, forming a statistical representation of the whole image. This avoids the fuzzy edge-based representations and the contour-based representations of the prior art, so the invention obtains better detection results on images with cluttered backgrounds or uneven illumination.
Second, the invention extracts features from the image layer by layer, effectively exploiting the feature point information at different scales and avoiding the local matching errors caused by too small a scale in the prior art, so the invention obtains better detection results.
Third, the invention uses dictionary learning to map the extracted image features to a low-dimensional space and aggregate them into a feature set. Compared with the prior art, this reduces the feature dimensionality and effectively reduces the computation time and the amount of data involved.
Brief description of the drawings
Fig. 1 is the flow chart of the invention;
Fig. 2 shows sample images used in the invention;
Fig. 3 compares the classification performance of the classifier of the invention with that of the HOG-feature-based human body detection method;
Fig. 4 shows simulation results of the method of the invention and the HOG-feature-based method on images with uneven illumination;
Fig. 5 shows simulation results of the method of the invention and the HOG-feature-based method on images with complex backgrounds.
Embodiment
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, the specific steps of the invention are as follows:
Step 1, select the training sample set images.
Utilize the bootstrapping operation to obtain a sufficient number of negative sample images from the non-human natural images of the INRIA database.
The specific steps of the bootstrapping operation are as follows:
The first step, randomly choose m positive sample images and n negative sample images from the INRIA database, where 100≤m≤500, 100≤n≤800, and n≤m≤3n. Extract histogram-of-oriented-gradients (HOG) features from all chosen positive and negative sample images, and train a support vector machine (SVM) classifier on the extracted features to obtain a preliminary classifier.
The second step, repeatedly choose at random non-human natural images from the INRIA database. Scan each whole image with a window of the sample image size, moving 8 pixels at a time from left to right and 16 pixels at a time from top to bottom. Input the image in every scanning window to the preliminary classifier for detection; save the scanning window images the classifier misclassifies, and stop choosing non-human natural images once the number of misclassified scanning window images reaches a, where 200≤a≤500. From the misclassified scanning window images, randomly choose b images, where a/5≤b≤a/3, and combine them with the current negative sample images to form a new negative sample set.
The third step, extract HOG features from the m randomly chosen positive sample images and the new negative sample set, train the classifier, detect non-human natural images, and update the negative sample set.
The fourth step, repeat the third step until the final updated training sample set consists of 2416 positive sample images and 13500 negative sample images, each of size 128 × 64 pixels.
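The bootstrapping (hard-negative mining) loop above can be sketched as follows. This is an illustrative sketch only, with hypothetical helper names (`classifier`, `windows_of`): the classifier and window scanner are stand-ins for the preliminary HOG/SVM classifier and the window scanning method described in the patent.

```python
import random

def bootstrap_negatives(classifier, nonhuman_images, windows_of,
                        a=200, keep_fraction=0.25, seed=0):
    """Scan non-human images with a preliminary classifier and keep a
    random subset of its false alarms as new hard negatives.
    `classifier(win)` returns True when it (wrongly) reports a human;
    `windows_of(img)` yields the scanning windows of one image."""
    rng = random.Random(seed)
    false_alarms = []
    for img in nonhuman_images:
        for win in windows_of(img):
            if classifier(win):           # every hit here is a mistake
                false_alarms.append(win)
            if len(false_alarms) >= a:    # stop once `a` hard negatives found
                b = max(1, int(len(false_alarms) * keep_fraction))
                return rng.sample(false_alarms, b)   # keep roughly a/5..a/3
    b = max(1, int(len(false_alarms) * keep_fraction))
    return rng.sample(false_alarms, b) if false_alarms else []
```

The kept windows would then be appended to the negative set and the classifier retrained, repeating until the target sample counts are reached.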
Combine the obtained negative sample images with the negative sample set of the INRIA database to form a new negative sample set.
Combine the new negative sample set with the positive sample set of the INRIA database to form the human body training sample set.
In the embodiment of the invention, the final human body training sample set consists of 2416 positive samples and 13500 negative samples, the test sample set consists of 1132 positive samples and 4050 negative samples, and the size of each sample image is 128 × 64 pixels.
Fig. 2 shows part of the sample images used in the invention: Fig. 2(a) shows some of the positive sample images and Fig. 2(b) some of the negative sample images.
Step 2, extract the SURF feature points of the images.
Divide every image of the human body training sample set into grids of 8×8 pixels; sample each grid at scales of 16, 25, and 36 pixels, each sampling scale forming one sampling layer.
For each 8×8 pixel grid, after sampling each layer, compute the sum of squares of the horizontal and vertical gradients of the sampled points in the grid, and take the sampled point with the maximum gradient sum of squares as the fast robust feature (SURF) point of this grid on that sampling layer.
For every image of the human body training sample set, randomly choose 15 feature points from the SURF feature points of all grids on each sampling layer, as the SURF feature points of that image on that sampling layer.
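The per-grid "strongest gradient point" selection can be sketched as follows. This is a minimal illustration, not the patented SURF descriptor: for brevity the per-scale resampling is omitted and all layers share the full-resolution gradients, so the scale only labels the layer.

```python
import numpy as np

def layer_feature_points(image, cell=8, scales=(16, 25, 36)):
    """For each 8x8 cell, pick the pixel whose squared horizontal plus
    vertical gradient is largest; one point list per sampling layer."""
    gy, gx = np.gradient(image.astype(float))   # vertical, horizontal gradients
    energy = gx ** 2 + gy ** 2                  # gradient sum of squares
    h, w = image.shape
    layers = {}
    for s in scales:                            # one sampling layer per scale
        points = []
        for y in range(0, h - cell + 1, cell):
            for x in range(0, w - cell + 1, cell):
                block = energy[y:y + cell, x:x + cell]
                dy, dx = np.unravel_index(np.argmax(block), block.shape)
                points.append((y + dy, x + dx)) # strongest point of the cell
        layers[s] = points
    return layers
```

In the patent, 15 of these points per layer per image would then be drawn at random as that image's SURF feature points.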
Step 3, construct the initial basis vector of each layer.
Using the k-means clustering method, cluster all SURF feature points of all training sample images on each sampling layer into 450 cluster centers, obtaining a 450-dimensional visual vocabulary of the whole training sample set on that sampling layer, which forms the initial basis vector of the sampling layer.
The specific steps of the k-means clustering method are as follows:
The first step, for each sampling layer, randomly choose 450 SURF feature points from the SURF feature points of all sample images of the human body training sample set on that layer as the initial cluster centers of the layer, and take the data value of each initial cluster center as its cluster center value.
The second step, compute the Euclidean distance from every SURF feature point of the human body training sample set on the layer to each cluster center.
The third step, assign each SURF feature point of the layer to the class of its nearest cluster center.
The fourth step, judge whether the data mean of the SURF feature points of each class equals the cluster center value; if so, go to the fifth step; otherwise, take the data mean of each class's feature points as the new cluster center value and return to the second step.
The fifth step, save the 450 cluster center values and form them into a column vector, which is output as the initial basis vector of the whole human body training sample set on the sampling layer.
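The five steps above are Lloyd's k-means algorithm. A minimal sketch (illustrative, not the patented implementation; a small k is used here in place of 450):

```python
import numpy as np

def kmeans_vocabulary(points, k, iters=20, seed=0):
    """Cluster one layer's feature points with k-means; the k cluster
    centers, stacked, form the layer's initial basis."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    centers = pts[rng.choice(len(pts), size=k, replace=False)]  # step 1
    for _ in range(iters):
        # steps 2-3: assign each point to its nearest center (Euclidean)
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 4: recompute each class mean (keep old center if class empty)
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):    # means equal the centers: converged
            break
        centers = new
    return centers                       # step 5: the visual vocabulary
```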
Step 4, obtain the maximum kernel function feature of each sampling layer.
For the initial basis vector of each sampling layer, perform dictionary learning with the constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature of the sampling layer.
The specific steps of the dictionary learning operation are as follows:
The first step, for each sampling layer, project the initial basis vector of the layer onto a 450-dimensional space, computing the projection of the initial basis vector by the following formula:

R = R_1 × [v_1, ..., v_j, ..., v_N]

where R denotes the projection of the initial basis vector of the sampling layer, R_1 denotes the initial basis vector of the sampling layer, v_j denotes the vector of projection coefficients of the j-th feature point extracted on the layer from all sample images of the human body training sample set, v_j = [v_1j, v_2j, ..., v_ij, ..., v_Mj]^T, v_ij denotes the projection coefficient of the j-th feature point extracted from the i-th sample image on the layer, M denotes the number of sample images of the human body training sample set, j = 1, 2, ..., N, and N denotes the number of feature points randomly chosen on the layer from each sample image of the training set.
The second step, construct an approximating function by the following formula, which approximates the projection of the initial basis vector of the sampling layer on the projection space:

f(r) = argmin ||r − R||

where r denotes the maximum kernel function feature on the sampling layer, R denotes the projection of the initial basis vector R_1 of the layer, ||·|| denotes the 2-norm, and argmin denotes minimization.
Substituting R = R_1 × [v_1, ..., v_j, ..., v_N] into the above formula and expanding r as r = [r_1, ..., r_j, ..., r_N] yields the 2-norm approximating function f(v, r) of the maximum kernel function feature r to the initial basis vector R_1:

f(v, r) = argmin Σ_{j=1..N} ||r_j − R_1 × v_j||²

where v denotes the low-dimensional projection coefficient vectors of all feature points extracted from all sample images of the human body training sample set, v = [v_1, ..., v_j, ..., v_N], v_j denotes the vector of projection coefficients of the j-th feature point extracted on the layer from all sample images, N denotes the number of feature points randomly chosen on the layer from each sample image, r_j denotes the maximum kernel feature vector of the j-th feature point extracted from all sample images, and R_1 denotes the initial basis vector of the sampling layer.
The third step, use the stochastic gradient descent method to solve the approximating function, iteratively updating the maximum kernel function feature on the sampling layer by the following formula, to form the low-dimensional image feature representation:

r(k+1) = r(k) − η × d/dr [ Σ_{j=1..N} ||r_j(k) − R_1 × v_j||² ]

where r(k+1) denotes the maximum kernel function feature on the sampling layer obtained at the (k+1)-th iteration, k denotes the iteration count, r(k) denotes the maximum kernel function feature obtained at the k-th iteration, η denotes the learning rate, a constant, d/dr[·] denotes the derivative of the bracketed formula with respect to r, r_j denotes the maximum kernel feature vector of the j-th feature point extracted on the layer from all sample images of the human body training sample set, R_1 denotes the initial basis vector on the layer, and R_1^T denotes the transpose of R_1. The number of iterations is set to 1000, and the r(1000) obtained after the iterations complete is taken as the final maximum kernel function feature on the sampling layer, j = 1, 2, ..., N, where N denotes the number of feature points randomly chosen on the layer from each sample image of the training set.
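The iterative update can be sketched as follows, under one plain reading of the approximating function as a least-squares reconstruction objective. This is an assumption-laden illustration: the scalars `c` stand in for the patent's per-feature-point projection coefficients, and the constrained CKSVD update itself is not reproduced.

```python
import numpy as np

def learn_max_kernel_feature(R1, c, eta=0.05, iters=1000, seed=0):
    """Gradient descent on f(r) = sum_j ||r_j - R1 * c_j||^2.
    R1: (d,) initial basis vector; c: (N,) illustrative projection
    coefficients, one per feature point. Returns r as a (d, N) array."""
    rng = np.random.default_rng(seed)
    target = np.outer(R1, c)            # column j is R1 * c_j
    r = rng.normal(size=target.shape)   # random start for the kernel feature
    for _ in range(iters):
        grad = 2.0 * (r - target)       # d f / d r, term by term
        r = r - eta * grad              # fixed learning rate, as in the patent
    return r
```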
Step 5, obtain the efficient match kernel feature of the image.
For each sampling layer, sort the element values of the layer's maximum kernel function feature in descending order and check whether exactly one element attains the maximum value; if so, output the maximum kernel function feature as the feature vector of the layer; otherwise, set the elements equal to the maximum value to zero and output the zeroed feature as the feature vector of the layer.
Compute a weighted sum of the feature vectors of all sampling layers to obtain the all-layer feature, and store it.
The weighted sum is computed as follows:

G* = Σ_{i=1..3} G_i × A_i

where G* denotes the all-layer feature, G_i denotes the feature vector of the i-th sampling layer, i = 1, 2, 3, A_i denotes the weight of the i-th sampling layer, A_i = w_i = 1/p_i, and p_i denotes the pixel size of the sampling scale of the i-th layer, p_i ∈ {16, 25, 36}.
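The weighted sum is straightforward to sketch; the weights 1/16, 1/25, 1/36 give smaller sampling scales more influence:

```python
def combine_layer_features(layer_features, scales=(16, 25, 36)):
    """Weighted sum of per-layer feature vectors with weights
    A_i = 1/p_i, p_i the pixel size of the layer's sampling scale."""
    weights = [1.0 / p for p in scales]
    dim = len(layer_features[0])
    combined = [0.0] * dim
    for feat, w in zip(layer_features, weights):
        for k in range(dim):
            combined[k] += w * feat[k]   # G* = sum_i A_i * G_i
    return combined
```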
Average each row element of the all-layer feature vectors, accumulate the counts of the averages on the horizontal axis to obtain the distribution of the row means, and select the features whose mean distribution is approximately Gaussian as the final efficient match kernel feature of the fast robust characteristics of the image.
Step 6, classification training.
Train a support vector machine (SVM) classifier on the extracted efficient match kernel features to obtain the detection classifier.
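The linear classifier training can be sketched as follows; this uses sub-gradient descent on the hinge loss as a minimal stand-in for the SVM the patent uses, with illustrative hyperparameters (not from the patent).

```python
import numpy as np

def train_linear_classifier(X, y, lr=0.01, epochs=200, lam=0.01, seed=0):
    """Hinge-loss sub-gradient descent. X: (n, d) feature rows;
    y: labels in {-1, +1}. Returns the weight vector and bias."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                      # point inside the margin
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:
                w -= lr * lam * w               # regularisation only
    return w, b

def classifier_score(w, b, x):
    """Signed score; used as the 'classifier score' in step 8."""
    return float(x @ w + b)
```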
Step 7, scan the input image.
Input an image to be detected, scan the whole image with the window scanning method to obtain a group of scanning window images, and input this group of scanning window images to the detection classifier.
The specific steps of window scanning are as follows:
The first step, take the region of the sample image size (of the human body training sample set) at the upper-left corner of the input detected image as the first scanning window, make it the current scanning window, and save the current scanning window image.
The second step, translate the current scanning window 8 pixels to the right or 16 pixels downward on the detected image to obtain a new scanning window, replace the current scanning window with the new one, and save the current scanning window image.
The third step, continue moving the current scanning window in this way, replacing the current window with the moved one, until the whole detected image has been scanned; save all scanning window images.
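The window scanning steps above can be sketched as an enumeration of window corners; the 128 × 64 window matches the sample image size, and the 8- and 16-pixel strides match the patent's horizontal and vertical moving units.

```python
def scan_windows(img_h, img_w, win_h=128, win_w=64, step_x=8, step_y=16):
    """Generate the (top, left) corner of every scanning window,
    moving 8 pixels rightward and 16 pixels downward across the image."""
    corners = []
    for y in range(0, img_h - win_h + 1, step_y):    # top-to-bottom moves
        for x in range(0, img_w - win_w + 1, step_x):  # left-to-right moves
            corners.append((y, x))
    return corners
```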
Step 8, detect the scanning windows.
8a) Use the detection classifier to judge whether each input scanning window image contains human body information; if none does, the detected image is classified as a non-human natural image; otherwise, among all scanning window images judged to contain human body information, find the one with the highest classifier score as the main window image.
8b) Among the remaining scanning window images containing human body information, take those whose overlap with the main window image exceeds 50% and merge them with the main window image by the window combination operation; save the combined window as one detection result and delete all images that participated in the combination.
The specific steps of window combination are as follows:
The first step, number all images needing window combination sequentially from 1.
The second step, take the proportion of each image's classifier score in the sum of the classifier scores of all images needing window combination as the weight of that image's boundary in the weighting.
The third step, weight each boundary of the images needing window combination by the following formula:

X = Σ_{i=1..N} (m_i / A) × x_i

where X denotes the row or column pixel position, on the detected image, of the weighted window boundary, x_1, x_2, ..., x_N denote the row or column pixel positions of the boundaries of the images participating in the window combination, m_1, m_2, ..., m_N denote the classifier scores of the images participating in the window combination, N denotes the number of images participating in the window combination, A denotes the sum of the classifier scores of the participating images, A = Σ_{i=1..N} m_i, and m_i denotes the classifier score of the i-th participating image.
The fourth step, form a window from the weighted boundaries.
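The score-weighted boundary fusion can be sketched as follows; each output coordinate is X = Σ_i (m_i / A) x_i applied to one side of the boxes.

```python
def combine_windows(boxes, scores):
    """Fuse overlapping detections into one window whose boundaries are
    the classifier-score-weighted average of the member boundaries.
    boxes: list of (top, left, bottom, right); scores: classifier scores."""
    total = float(sum(scores))           # A = sum of classifier scores
    fused = [0.0, 0.0, 0.0, 0.0]
    for box, m in zip(boxes, scores):
        for k in range(4):
            fused[k] += (m / total) * box[k]   # weight by score share m_i / A
    return tuple(fused)
```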
8c) Judge whether any scanning window images with human body information remain; if so, find the remaining image with the highest classifier score as the new main window image and go to step 8b); otherwise, go to step 9.
Step 9, output the detection result.
Mark all windows obtained by window combination on the detected image and output the marked image as the human body detection result of the detected image.
The effect of the present invention can be further illustrated by the following simulations:
1. Simulation experiment conditions:
The simulation experiments of the invention were run on Matlab 2009a; the simulation environment is an HP workstation under the Windows framework. All positive and negative sample images required by the experiments are taken from the INRIA database: the training sample set comprises 2416 positive samples and 13500 negative samples, the test sample set comprises 1132 positive samples and 4050 negative samples, and the positive and negative sample images are 128 × 64 pixels. Fig. 2 shows part of the sample images used in the invention, where Fig. 2(a) shows some of the positive sample images and Fig. 2(b) some of the negative sample images.
2. Simulation contents and analysis of results:
Simulation 1:
The accuracy obtained after the classifier is trained is one of the important indicators of classifier performance. To obtain a better-performing classifier, a large number of experiments were done on the choice of two parameters when extracting sample image features: the number of sampling layers of the sample images and the projection dimension of the low-dimensional projection of the initial basis vectors. The classifier accuracies obtained with different numbers of sampling layers and different projection dimensions were compared; the comparison results are shown in Table 1.
As can be seen from Table 1, for the same projection dimension, the classifier accuracy obtained with 3 sampling layers is higher than that obtained with 2 sampling layers; and for the same number of sampling layers, a higher projection dimension does not necessarily give a higher classifier accuracy. The data show that 3 sampling layers with a 450-dimensional projection of the initial basis vectors gives the highest classifier accuracy and the best classification performance.
Simulation 2:
The present invention and the human body detection method based on the histogram of oriented gradients (HOG) feature were each used to extract features from the human body training sample set and train a classifier, and the performance of the resulting classifiers was compared. The classifier performance comparison is shown in Fig. 3, which evaluates classifier performance with the receiver operating characteristic (ROC) curve relating the true positive rate (TPR) to the false positive rate (FPR). The closer an ROC curve lies to the upper-left corner, the better the corresponding classifier.
In Fig. 3 the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). The curve marked with squares is the ROC curve of the true positive rate versus false positive rate of the classifier of the present invention, and the curve marked with crosses is the ROC curve of the classifier of the HOG-based human body detection method. As can be seen from Fig. 3, the ROC curve obtained by the present invention lies closer to the upper-left corner than that obtained by the HOG-based method, showing that the classification performance of the present invention is better than that of the HOG-based human body detection method.
Simulation 3:
The present invention and the HOG-based human body detection method were used to perform human body detection on natural images from the INRIA database; the detection results are shown in Fig. 4 and Fig. 5.
Fig. 4 shows an image with uneven illumination. Fig. 4(a) shows the human body detection result of the present invention; the white boxes mark the windows obtained by merging after the classifier of the present invention detected human body information in the image. Fig. 4(b) shows the human body detection result of the HOG-based method, with the white boxes likewise marking its merged detection windows. As can be seen from Fig. 4, under uneven illumination the method of the present invention greatly reduces the false alarm rate compared with the HOG-based method and detects all human body information in the image to be detected more accurately.
Fig. 5 shows an image with a complex background and partially occluded persons. Fig. 5(a) shows the human body detection result of the present invention; the white boxes mark the windows obtained by merging after the classifier detected human body information. Fig. 5(b) shows the human body detection result of the HOG-based method, with the white boxes marking its merged detection windows. As can be seen from Fig. 5, with a complex background and partial occlusion the method of the present invention marks human body information more accurately, the window sizes obtained after window merging are more suitable than those of the HOG-based method, and the human body detection accuracy is higher.
In summary, the method of the present invention can detect human bodies under uneven illumination, complex backgrounds, and partial occlusion, which shows that it is well suited to human body detection in natural images.
Claims (7)
1. An efficient matching kernel human body detection method based on the speeded-up robust feature (SURF), comprising two processes: obtaining a detection classifier, and detecting an image with the obtained classifier, the specific implementation steps being as follows:
First process: the specific steps for obtaining the detection classifier are as follows:
(1) Select the training sample set images:
1a) use the bootstrapping operation to obtain a sufficient number of negative sample images from the non-human natural images of the INRIA database;
1b) combine the obtained negative sample images with the negative sample set of the INRIA database to form a new negative sample set;
1c) combine the images of the new negative sample set with the positive sample set of the INRIA database to form the human body training sample set;
(2) Extract the SURF feature points of the images:
2a) divide every image in the human body training sample set into grids of 8×8 pixels, and sample each grid at scales of 16, 25, and 36 pixels respectively, the sampling of each grid at each scale forming one sample layer;
2b) for each 8×8 pixel grid, after sampling each layer compute the sum of the squares of the horizontal-direction and vertical-direction gradients of the sampled points in the grid, and take the sampled point with the maximum gradient sum of squares as the speeded-up robust feature (SURF) feature point of the pixel grid on that sample layer;
2c) for every image in the human body training sample set, randomly choose 15 feature points from the SURF feature points of all pixel grids on each sample layer as the SURF feature points of that image of the human body training sample set on that sample layer;
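Step 2b) keeps, per 8×8 grid and per sample layer, the sampled point whose squared horizontal gradient plus squared vertical gradient is largest. The sketch below illustrates that selection on a single grayscale layer with NumPy, treating every grid pixel as a sampled point for simplicity (the patent's per-scale sampling pattern is not reproduced here):

```python
import numpy as np

def strongest_point_per_grid(image, grid=8):
    """For each grid x grid block, return the coordinates of the pixel whose
    squared horizontal plus squared vertical gradient is largest (step 2b)."""
    gy, gx = np.gradient(image.astype(float))      # vertical, horizontal gradients
    strength = gx ** 2 + gy ** 2                   # gradient sum of squares
    h, w = image.shape
    points = []
    for r in range(0, h - grid + 1, grid):
        for c in range(0, w - grid + 1, grid):
            block = strength[r:r + grid, c:c + grid]
            dr, dc = np.unravel_index(np.argmax(block), block.shape)
            points.append((r + dr, c + dc))
    return points
```

On a 128×64 sample image this yields one candidate point per 8×8 grid, from which step 2c) would draw its 15 random feature points per layer.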
(3) Construct the initial vector basis of each layer:
Using the k-means clustering method, cluster the SURF feature points of all images of the human body training sample set on each sample layer, with 450 cluster centers, obtaining a 450-dimensional visual vocabulary of the whole human body training sample set on the sample layer; this forms the initial basis vectors of the sample layer;
(4) Obtain the maximum kernel function feature of each sample layer:
For the initial basis vectors of each sample layer, perform dictionary learning with the constrained kernel singular value decomposition (CKSVD) to obtain the maximum kernel function feature of the sample layer;
(5) Obtain the efficient matching kernel feature of the image:
5a) for each sample layer, sort the element values of the maximum kernel function feature of the layer in descending order and judge whether the number of elements attaining the maximum value is 1; if so, output the maximum kernel function feature as the feature vector of the sample layer; otherwise, set the elements of the maximum kernel function feature whose values equal the maximum to zero, and output the zeroed maximum kernel function feature as the feature vector of the sample layer;
5b) compute the weighted sum of the feature vectors of all sample layers to obtain the all-scale feature, and store the all-scale feature;
5c) average every row element of the all-scale feature vectors, accumulate the counts of the averages at the corresponding points on the horizontal axis to obtain the distribution of the element averages of all rows of the all-scale feature vectors, and select the features whose element-average distribution resembles a Gaussian distribution as the final efficient matching kernel feature of the SURF characteristics of the image;
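Steps 5a) and 5b) can be sketched directly: ties at the maximum element value are zeroed out before a layer's vector is emitted, and the per-layer vectors are then combined by a weighted sum. The layer weights are assumed inputs here, since the patent does not specify their values at this point:

```python
import numpy as np

def layer_feature(max_kernel_feature):
    """Step 5a): if the maximum element value is attained by exactly one
    element, keep the vector unchanged; otherwise zero out all elements
    tied at the maximum before outputting the layer feature vector."""
    v = np.asarray(max_kernel_feature, dtype=float).copy()
    tied = v == v.max()
    if tied.sum() > 1:
        v[tied] = 0.0
    return v

def all_scale_feature(layer_vectors, weights):
    """Step 5b): weighted sum of the per-layer feature vectors."""
    return sum(w * layer_feature(v) for v, w in zip(layer_vectors, weights))
```

For example, `layer_feature([1, 3, 3, 2])` zeroes both tied maxima, while `layer_feature([1, 3, 2])` is returned unchanged.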
(6) Classification training:
Perform classification training on the extracted efficient matching kernel features of the SURF characteristics with the support vector machine (SVM) classifier, obtaining the detection classifier;
Second process: the specific steps for detecting an image with the obtained detection classifier are as follows:
(7) Scan the input image:
Input an image to be detected, scan the whole image with the window scanning method to obtain a group of scanning-window images, and input this group of scanning-window images to the detection classifier;
(8) Detect the scanning windows:
8a) use the detection classifier to judge whether the input scanning-window images contain human body information; if none contains human body information, classify the image to be detected as a non-human natural image; otherwise, among all scanning-window images judged to contain human body information, find the one with the highest detection classifier score and take it as the main window image;
8b) among the remaining scanning-window images containing human body information other than the main window image, perform the window combination operation on the main window image and every scanning-window image whose overlap with the main window image exceeds 50%; save the window obtained by the combination as one detection result, and delete all images that participated in the combination;
8c) judge whether any scanning-window images containing human body information remain; if so, find the remaining scanning-window image with the highest detection classifier score, take it as the main window image, and execute step 8b); otherwise, execute step (9);
(9) Output the detection result:
Mark all windows obtained by window combination on the image to be detected, and output the marked image as the human body detection result of the image to be detected.
2. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the specific steps of the bootstrapping operation in step 1a) are as follows:
The first step: randomly choose m positive sample images and n negative sample images from the INRIA database, where 100 ≤ m ≤ 500, 100 ≤ n ≤ 800, and n ≤ m ≤ 3n; extract features from all chosen positive and negative sample images with the histogram of oriented gradients (HOG) feature extraction method, and perform classification training on the extracted features with the support vector machine (SVM) classifier to obtain a preliminary classifier;
The second step: repeatedly choose at random non-human natural images from the INRIA database; scan each whole image with a scanning window of the sample image size, moving 8 pixels at a time from left to right and 16 pixels at a time from top to bottom; input the images in all scanning windows to the preliminary classifier for detection and save the misclassified scanning-window images; stop choosing non-human natural images once the number of misclassified scanning-window images reaches a, where 200 ≤ a ≤ 500; randomly choose b images from the misclassified scanning-window images, where a/5 ≤ b ≤ a/3, and combine them with the current negative sample images to form a new negative sample set;
The third step: perform HOG feature extraction on the m randomly chosen positive sample images and the new negative sample set, train the classifier, detect non-human natural images, and update the negative sample set;
The fourth step: repeat the third step until the final updated training sample set consists of 2,416 positive sample images and 13,500 negative sample images, each of size 128 × 64 pixels.
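The bootstrapping of claim 2 is hard-negative mining: train a preliminary classifier, scan non-human images, harvest the false positives, and fold a random subset back into the negative set. A schematic sketch follows, with the SVM training and window scanning replaced by caller-supplied stand-in functions (`train` and `scan_windows` are hypothetical placeholders, not the patent's components):

```python
import random

def bootstrap_negatives(train, scan_windows, initial_negatives, a=200):
    """One round of the bootstrapping loop of claim 2: train a preliminary
    classifier on the current negatives, scan windows of non-human images,
    collect misclassified windows (false positives) until `a` accumulate,
    then fold a random subset of b of them, a/5 <= b <= a/3, back into
    the negative set."""
    negatives = list(initial_negatives)
    classifier = train(negatives)
    hard = []
    while len(hard) < a:
        for window in scan_windows():
            if classifier(window):          # predicted "human" on a non-human image
                hard.append(window)
                if len(hard) >= a:
                    break
    b = random.randint(a // 5, a // 3)      # 1/5*a <= b <= 1/3*a
    negatives.extend(random.sample(hard, b))
    return negatives
```

Repeating this round, as the fourth step prescribes, grows the negative set with exactly the windows the current classifier gets wrong.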
3. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the specific steps of the k-means clustering method of step (3) are as follows:
The first step: for each sample layer, randomly choose 450 SURF feature points from the SURF feature points, on that sample layer, of all sample images of the human body training sample set as the initial cluster centers of the layer, and take the data value of each of the 450 initial cluster centers as the cluster center value of that center;
The second step: compute the Euclidean distance from every SURF feature point of the human body training sample set on the sample layer to each cluster center;
The third step: assign each SURF feature point of the sample layer to the class of the cluster center nearest to it;
The fourth step: judge whether the data mean of the SURF feature points of each class after assignment equals the cluster center value; if so, execute the fifth step; otherwise, take the data mean of the feature points of each class as the new cluster center value and return to the second step;
The fifth step: save the 450 cluster center values, form a column vector from them, and output this column vector as the initial basis vector of the whole human body training sample set on the sample layer.
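The five steps of claim 3 are plain Lloyd-style k-means: random initial centers drawn from the feature points, nearest-center assignment by Euclidean distance, and mean updates until the centers stop moving. A compact NumPy sketch (k = 2 in the example below; the patent uses k = 450):

```python
import numpy as np

def kmeans(points, k, rng=None):
    """K-means as in claim 3: initial centers drawn at random from the
    feature points, assignment to the nearest center by Euclidean distance,
    and center updates until the class means stop changing."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # distance of every point to every center, then nearest-center labels
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            return new_centers, labels
        centers = new_centers
```

The returned centers, stacked into a column vector, would play the role of the layer's initial basis vector.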
4. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the specific steps of the dictionary learning of step (4) are as follows:
The first step: for each sample layer, project the initial basis vectors of the layer onto a 450-dimensional space, computing the projection of the initial basis vectors of the sample layer by the following formula:
R = R1 × [v1, …, vj, …, vN]
where R denotes the projection of the initial basis vectors of the sample layer, R1 denotes the initial basis vectors of the sample layer, vj denotes the vector of projection coefficients of the j-th feature point extracted on the sample layer from all sample images of the human body training sample set, vj = [v1j, v2j, …, vij, …, vMj]^T, vij denotes the projection coefficient of the j-th feature point extracted on the sample layer from the i-th sample image of the human body training sample set, M denotes the number of sample images in the human body training sample set, and j = 1, 2, …, N, where N denotes the number of feature points chosen at random on the sample layer from each sample image of the human body training sample set;
The second step: construct an approximating function by the following formula to approximate, on the projection space, the projection of the initial basis vectors of the sample layer:
f(r) = argmin ||r − R||
where r denotes the maximum kernel function feature on the sample layer, R denotes the projection of the initial basis vectors R1 of the sample layer, || · || denotes the 2-norm, and argmin || · || denotes minimization;
The third step: evaluate the approximating function and update the maximum kernel function feature on the sample layer iteratively by the following formula, forming the low-dimensional feature representation of the image:
r(k+1) = r(k) − η · ∂/∂r ||rj − R1 · r(k)||²
where r(k+1) denotes the maximum kernel function feature on the sample layer obtained at the (k+1)-th iteration, k denotes the iteration count, r(k) denotes the maximum kernel function feature on the sample layer obtained at the k-th iteration, η denotes the learning rate, a constant, ∂/∂r denotes the derivative of the bracketed expression with respect to r, rj denotes the maximum kernel feature vector of the j-th feature point extracted on the sample layer from all sample images of the human body training sample set, R1 denotes the initial basis vectors on the sample layer, and R1ᵀ, the transpose of R1, appears in the derivative; the iteration count is set to 1000, and the r(1000) obtained when iteration completes is taken as the final maximum kernel function feature on the sample layer, j = 1, 2, …, N, where N denotes the number of feature points chosen at random on each sample layer from each sample image of the human body training sample set.
5. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the weighted summation of step 5b) is performed as follows:
G* = Σi Gi × Ai
where Gi denotes the feature vector of the i-th sample layer and Ai denotes its weight.
6. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the specific steps of the window scanning method of step (7) are as follows:
The first step: take the region, of the size of a human body training sample set image, at the upper-left corner of the input image to be detected as the first scanning window; take this scanning window as the current scanning window and save the current scanning-window image;
The second step: translate the current scanning window 8 pixels to the right, or move it down 16 pixels, on the image to be detected to obtain a new scanning window; replace the current scanning window with the new scanning window and save the current scanning-window image;
The third step: continue moving the current scanning window in the above manner, replacing the current scanning window with the moved window, until the whole image to be detected has been scanned; save all scanning-window images.
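The window scanning of claim 6 is a standard sliding window with an 8-pixel horizontal and a 16-pixel vertical stride. A minimal generator sketch (the 128×64 default window size is taken from the sample image size stated elsewhere in the document):

```python
def scan_windows(img_h, img_w, win_h=128, win_w=64, dx=8, dy=16):
    """Window scanning of claim 6: start at the upper-left corner and slide a
    sample-sized window 8 pixels at a time to the right and 16 pixels at a
    time downward, yielding the top-left corner of every window."""
    for top in range(0, img_h - win_h + 1, dy):
        for left in range(0, img_w - win_w + 1, dx):
            yield top, left
```

Each yielded (top, left) pair identifies one scanning-window image to be cropped and passed to the detection classifier.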
7. The efficient matching kernel human body detection method based on SURF according to claim 1, characterized in that the specific steps of the window combination operation of step 8b) are as follows:
The first step: number all images that need window combination sequentially starting from 1;
The second step: for each image that needs window combination, take the proportion of its detection classifier score in the sum of the detection classifier scores of all images needing window combination as the weight for weighting its image boundary;
The third step: weight every boundary of the images that need window combination by the following formula:
X = (1/A) · Σi=1..N mi · xi,  with A = Σi=1..N mi
where X denotes the row or column pixel value, on the image to be detected, of the window boundary obtained after weighting; x1, x2, …, xN denote the row or column pixel values, on the image to be detected, of the boundaries of the images participating in the window combination; m1, m2, …, mN denote the detection classifier scores of the corresponding images participating in the window combination; N denotes the number of images participating in the window combination; A denotes the sum of the detection classifier scores of the images participating in the window combination; i denotes the index of an image in the window combination; and mi denotes the detection classifier score of the i-th image participating in the window combination;
The fourth step: form a window from the weighted boundaries.
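The boundary weighting of claim 7 makes each coordinate of the merged window the classifier-score-weighted average of the corresponding coordinates of the participating windows. A minimal sketch; the (left, top, right, bottom) tuple format is an illustrative assumption:

```python
def merge_windows(boundaries, scores):
    """Claim 7 weighting: each boundary coordinate of the merged window is
    the detection-classifier-score-weighted average of the corresponding
    boundaries of the participating windows, X = (1/A) * sum_i m_i * x_i
    with A = sum_i m_i.  `boundaries` holds (left, top, right, bottom)
    pixel coordinates; `scores` holds the classifier scores m_i."""
    A = sum(scores)
    return tuple(sum(m * b[k] for m, b in zip(scores, boundaries)) / A
                 for k in range(4))
```

With equal scores the merged window is simply the coordinate-wise mean of the participating windows; higher-scoring windows pull the boundary toward themselves.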
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310405276.3A CN103455826B (en) | 2013-09-08 | 2013-09-08 | Efficient matching kernel body detection method based on rapid robustness characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103455826A true CN103455826A (en) | 2013-12-18 |
CN103455826B CN103455826B (en) | 2017-02-08 |
Family
ID=49738167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310405276.3A Expired - Fee Related CN103455826B (en) | 2013-09-08 | 2013-09-08 | Efficient matching kernel body detection method based on rapid robustness characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103455826B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103985102A (en) * | 2014-05-29 | 2014-08-13 | 宇龙计算机通信科技(深圳)有限公司 | Image processing method and system |
CN104573646A (en) * | 2014-12-29 | 2015-04-29 | 长安大学 | Detection method and system, based on laser radar and binocular camera, for pedestrian in front of vehicle |
CN105139390A (en) * | 2015-08-14 | 2015-12-09 | 四川大学 | Image processing method for detecting pulmonary tuberculosis focus in chest X-ray DR film |
CN106462772A (en) * | 2014-02-19 | 2017-02-22 | 河谷控股Ip有限责任公司 | Invariant-based dimensional reduction of object recognition features, systems and methods |
CN107025436A (en) * | 2017-03-13 | 2017-08-08 | 西安电子科技大学 | A kind of self refresh human intrusion detection method based on confidence level |
CN107945145A (en) * | 2017-11-17 | 2018-04-20 | 西安电子科技大学 | Infrared image fusion Enhancement Method based on gradient confidence Variation Model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663369A (en) * | 2012-04-20 | 2012-09-12 | 西安电子科技大学 | Human motion tracking method on basis of SURF (Speed Up Robust Feature) high efficiency matching kernel |
CN102810159A (en) * | 2012-06-14 | 2012-12-05 | 西安电子科技大学 | Human body detecting method based on SURF (Speed Up Robust Feature) efficient matching kernel |
US20130004018A1 (en) * | 2011-06-29 | 2013-01-03 | Postech Academy - Industry Foundation | Method and apparatus for detecting object using volumetric feature vector and 3d haar-like filters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170208 |
|