CN107886066A - Pedestrian detection method based on improved HOG-SSLBP - Google Patents
- Publication number: CN107886066A (application CN201711084863A)
- Authority
- CN
- China
- Prior art keywords: scale, image, features, SSLBP, HOG
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The present invention relates to a pedestrian detection method based on improved HOG-SSLBP, comprising the following steps: acquiring an image sequence to be detected; extracting the SSLBP features and HOG features of the images and fusing them to obtain a fused feature; and classifying the fused feature with a trained HIK-SVM classifier to obtain the pedestrian detection result. The pedestrian detection method of the present invention is robust to illumination, noise, and rotation while retaining the texture and edge information of the original image; it solves the scale problem simply and effectively, improves the scale invariance of the local binary pattern, obtains scale-invariant features, and improves detection accuracy, while feature extraction is fast enough for real-time use.
Description
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, and in particular to a pedestrian detection method based on improved HOG-SSLBP.
Background
Pedestrian detection, as a branch of object detection, is widely used in driver assistance, video surveillance systems, and content-based video retrieval. It can be viewed as feature extraction combined with classifier design, aimed at automatically analyzing an unknown video or image and detecting whether pedestrians are present. Intelligent image-analysis technology such as pedestrian detection can effectively handle the uncertainty problems of image understanding and pattern recognition that arise in target detection.
At present, classical pedestrian detection methods include the combination of the HOG descriptor with an SVM classifier, the deformable part model (DPM) for detecting heavily overlapping pedestrians, and methods based on fused HOG-LBP features. HOG and its refinements have been very successful. Fusing HOG and LBP as the detection feature overcomes the loss of edge-direction information that occurs when LBP alone is used for image description. Using HOF together with local self-similarity features for image description and an HIK-SVM classifier for detection improves both the accuracy and the real-time performance of the algorithm.
The local binary pattern (LBP) is an operator that describes local texture features of an image. When used for texture extraction it offers notable advantages such as rotation invariance and gray-scale invariance, and the extracted features capture the local texture of the image. Its drawbacks, however, include high feature-histogram dimensionality and poor robustness to illumination, rotation, and scale.
The HOG descriptor takes long to generate, so it is slow, has poor real-time performance, and handles occlusion badly; owing to the nature of the gradient, it is also quite sensitive to noise. LBP and its improved operators cannot cope with scale change and likewise suffer from high feature-histogram dimensionality and poor noise robustness.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a pedestrian detection method based on improved HOG-SSLBP, so as to solve the problem that the existing pedestrian detection method cannot solve the scale change problem or has poor real-time performance.
The purpose of the invention is mainly realized by the following technical scheme:
a pedestrian detection method based on improved HOG-SSLBP specifically comprises the following steps:
acquiring an image sequence to be detected;
extracting and fusing the SSLBP characteristic and the HOG characteristic of the image sequence to obtain a fusion characteristic;
and classifying the fusion characteristics by using a trained HIK SVM classifier to obtain a pedestrian detection result.
The invention has the following beneficial effects:
the invention adopts a feature extraction method that fuses the scale selection local binary pattern (SSLBP) features of the original image with its HOG features. The method is robust to illumination, noise, and rotation, and retains the texture and edge information of the original image; it solves the scale problem simply and effectively, improves the scale invariance of the local binary pattern, obtains scale-invariant features, and improves detection accuracy, while feature extraction is fast enough for real-time use.
On the basis of the scheme, the invention is further improved as follows:
further, the extracting the SSLBP features of the image sequence specifically includes the following steps:
extracting scale sensitive features by using CLBP;
and processing the obtained scale sensitive characteristics by applying a scale selection scheme to obtain the SSLBP characteristics.
The beneficial effect of adopting the further scheme is that:
using CLBP to extract the scale-sensitive features eliminates the influence of rotation.
Further, the scale selection scheme specifically comprises the following steps:
obtaining a scale space of an image using a gaussian filter;
constructing a local mode histogram of each image only containing the dominant mode in the scale space;
for each dominant mode, selecting a maximum frequency of occurrence between different scales;
and constructing and obtaining a new feature histogram by using the selected maximum occurrence frequency, wherein the new feature histogram is used as the scale-invariant feature of the image.
Further, the dominant mode is determined by:
obtaining a scale space of a training sample set through a two-dimensional Gaussian filter;
constructing a local mode histogram for the image in each scale space;
for each mode, selecting a maximum frequency of occurrence between different scales;
constructing and obtaining a new feature histogram by using the selected maximum occurrence frequency;
and sorting the obtained new feature histogram in descending order, and selecting the modes with high average frequency as the dominant modes.
Further, the training process of the HIK SVM classifier specifically comprises the following steps:
inputting a training sample set;
extracting and fusing the SSLBP characteristic and the HOG characteristic of the training sample to obtain a fusion characteristic;
and training the HIK SVM classifier by using the fusion characteristics.
Further, the extracting the SSLBP feature of the training sample specifically includes the following steps:
extracting scale sensitive features of the training sample by using CLBP;
and processing the obtained scale sensitive characteristics by using a scale selection scheme to obtain the SSLBP characteristics of the training sample.
Further, CLBP is used to extract the scale-sensitive features; specifically, a two-dimensional joint histogram "CLBP_S/C" or "CLBP_M/C" is established as the scale-sensitive feature.
Further, the extracting the HOG features specifically includes the following steps:
carrying out normalization processing on an image to be detected;
calculating the gradient of the image after the normalization processing by using first order differential;
performing direction weight projection based on the gradient amplitude obtained by calculation to obtain a feature vector;
and carrying out normalization processing on the feature vectors to obtain the HOG features.
Further, the fusion characteristics are obtained, and specifically, the extracted SSLBP characteristics and HOG characteristics of the image to be detected are fused in series.
The beneficial effect of adopting the further scheme is that:
the two image features are fused, so that any texture information and edge information are not lost, and robustness is provided for illumination, rotation, noise and scale.
Further, for the obtained fusion features, the principal component analysis method is used for dimensionality reduction.
The beneficial effect of adopting the further scheme is that:
the characteristic dimension can be greatly reduced, the operation efficiency is improved, and the real-time performance of pedestrian detection is improved.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
Fig. 1 shows a flow chart of a pedestrian detection method based on improved HOG-SSLBP.
Fig. 2 shows a schematic diagram of the HOG feature generation process.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The invention discloses a pedestrian detection method based on improved HOG-SSLBP. The method comprises the following steps:
and step S1, acquiring the image sequence to be detected.
And acquiring an image sequence to be detected, and preprocessing the image to improve the quality of the image to be detected.
Step S2, extract image features using the improved SSLBP.
Specifically, CLBP is used to extract scale-sensitive features, and a scale selection scheme is applied to obtain the SSLBP features.
Step S201, using CLBP to extract scale sensitive features.
CLBP is a gray-scale texture operator that characterizes the local spatial structure of image texture. Given a central pixel g_c and P circularly, uniformly spaced neighbors g_p (p = 0, 1, ..., P-1) on a circle of radius R, the local difference d_p = g_p - g_c is easily computed. Each d_p can be further decomposed into two components:
d_p = s_p · m_p, with s_p = sign(d_p) and m_p = |d_p|,
where s_p is the sign component and m_p is the magnitude of d_p.
Thus, three operators are defined as:
CLBP_S_{P,R} = Σ_{p=0}^{P-1} s(d_p) · 2^p, where s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise;
CLBP_M_{P,R} = Σ_{p=0}^{P-1} t(m_p, c_m) · 2^p;
CLBP_C_{P,R} = t(g_c, c_I);
where t(x, c) = 1 if x ≥ c and t(x, c) = 0 otherwise is a threshold function, c_m is the mean of m_p over the image, and c_I is the average gray level of the image. CLBP_S_{P,R} and CLBP_M_{P,R} can each take 2^P different values. Since CLBP_C contains only one bit, it can take two values and is rotation invariant by construction.
To eliminate the effect of rotation, i.e., to assign a unique identifier to each rotation-invariant pattern, the rotation-invariant quantities CLBP_S^{ri} and CLBP_M^{ri} are defined as:
CLBP_S^{ri}_{P,R} = min{ ROR(CLBP_S_{P,R}, i) | i = 0, 1, ..., P-1 },
CLBP_M^{ri}_{P,R} = min{ ROR(CLBP_M_{P,R}, i) | i = 0, 1, ..., P-1 },
where ROR(x, i) performs a circular right shift of the P-bit number x by i bits. The number of distinct rotation-invariant values is much smaller than the 2^P values of CLBP_S_{P,R} (respectively CLBP_M_{P,R}).
Since CLBP_S, CLBP_M, and CLBP_C generate binary strings and contain complementary information, the scale-sensitive features are extracted by creating a two-dimensional joint histogram "CLBP_S/C" or "CLBP_M/C", obtaining more information while avoiding an enormous dimensionality.
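As an illustration, the three CLBP codes above can be sketched in NumPy. This is a minimal sketch under stated assumptions — nearest-pixel sampling on the circle instead of bilinear interpolation, and no rotation-invariant mapping — not the patent's exact implementation; the function name `clbp` is our own.

```python
import numpy as np

def clbp(img, radius=1, n_points=8):
    """Sketch of CLBP_S, CLBP_M and CLBP_C for a grayscale image.

    Neighbour sampling is nearest-pixel on a circle; a production
    implementation would use bilinear interpolation and a
    rotation-invariant label mapping (ROR minimisation).
    """
    img = img.astype(np.float64)
    h, w = img.shape
    c_I = img.mean()  # average gray level of the image (threshold for CLBP_C)
    angles = 2 * np.pi * np.arange(n_points) / n_points
    dy = -np.round(radius * np.sin(angles)).astype(int)
    dx = np.round(radius * np.cos(angles)).astype(int)

    core = img[radius:h - radius, radius:w - radius]   # central pixels g_c
    s_code = np.zeros_like(core, dtype=np.int64)       # CLBP_S code
    m_parts = []
    for p in range(n_points):
        neigh = img[radius + dy[p]:h - radius + dy[p],
                    radius + dx[p]:w - radius + dx[p]]
        d = neigh - core                               # d_p = g_p - g_c
        s_code += (d >= 0).astype(np.int64) << p       # sign component s_p
        m_parts.append(np.abs(d))                      # magnitude m_p
    m = np.stack(m_parts)
    c_m = m.mean()                                     # mean of m_p over the image
    m_code = ((m >= c_m).astype(np.int64)
              << np.arange(n_points)[:, None, None]).sum(0)  # CLBP_M code
    c_code = (core >= c_I).astype(np.int64)            # CLBP_C code (one bit)
    return s_code, m_code, c_code
```

A joint "CLBP_S/C" histogram is then simply a 2-D histogram of `(s_code, c_code)` pairs.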
Step S202, a scale selection scheme is applied to obtain SSLBP, and scale invariant features are extracted.
When an image is zoomed in or out, a dominant pattern still exists, but it occupies a larger or smaller area of the image; that is, the percentage of such a pattern in the image does not change, and neither does the characteristic scale of the pattern. Thus, by finding the characteristic scale of a pattern and extracting its percentage at that scale, scale invariance can be achieved. In the embodiment of the invention, a novel and simple scale-invariant feature extraction method is designed for LBP by determining the dominant patterns of the scale space. The method comprises: finding the dominant patterns in a training stage by analyzing the scale space of the training set; and extracting scale-invariant features of the training and test sets based on the dominant patterns.
The dominant patterns are found in a training stage by analyzing the scale space of the training set. The method specifically comprises: given a training sample set, obtaining a scale space through a two-dimensional Gaussian filter, thereby constructing multiple scales; constructing a local pattern histogram for the image at each scale; for each pattern, preserving its maximum frequency of occurrence across the different scales to construct a new histogram that serves as a scale-invariant feature of the given training sample; and determining the dominant patterns by selecting those with a high average frequency over the entire training set.
The scale-invariant feature extraction method extracts frequency information from the learned, fixed patterns, and thus preserves the pattern types that contain the important information. For two images I_1 and I_2, the features are obtained according to the above method, where P_A and P_B are the dominant patterns of the two types (CLBP_S/C and CLBP_M/C).
Specifically, for the CLBP_S/C joint histogram obtained above, the dominant patterns at different scales are determined by the following steps:
Step S20201, initialize to zero a pattern histogram H_sum of size HS for the training sample set, where HS is the size of the rotation-invariant histogram. The training sample set is T = {f_i | i = 1, 2, ..., N}, where f_i is a training image and N is the training set size.
Step S20202, for an image f_i in the training sample set, obtain a scale space through a two-dimensional Gaussian filter.
Specifically, let g_σ be a two-dimensional Gaussian filter with standard deviation σ used to construct the scale space. Based on g_σ, the scale-space images of f_i are obtained as s_l = g_{σ_l} * f_i, l = 1, 2, ..., L, where L is the size of the scale space for a given image.
step S20203, for the scale space S obtained abovelEstablishing a mode histogram based on CLBP _ S/C
Step S20204, selecting the maximum frequency of occurrence of the mode in the mode histogram, and constructing an image fiA new histogram of (2):
in the formula, k is more than or equal to 0 and less than or equal to HS-1, and HS is the size of the rotation invariant histogram.
Step S20205, adding the value of the new histogram to
In the formula, k is more than or equal to 0 and less than or equal to HS-1, and HS is the size of the rotation invariant histogram.
Step S20206, repeating steps S20202 to S20205 until all images in the training sample are processed.
Step S20207, sort the obtained H_sum in descending order and take the patterns with the K largest frequencies as the dominant patterns, where K is the number of dominant patterns learned for CLBP_S/C.
For the CLBP_M/C joint histogram, the specific steps for determining the dominant patterns at different scales are similar to the above and are not repeated here.
Step S203, after the dominant patterns are determined, extract the scale-invariant features of the test image sequence. First, given a sample, a scale space is obtained by a two-dimensional Gaussian filter. Then, for each image in the scale space, a local pattern histogram containing only the pre-learned dominant patterns is constructed. Finally, for each dominant pattern, the maximum frequency of occurrence across the different scales is selected to construct a new feature histogram, thereby extracting the scale-selected local binary pattern features.
Specifically, the scale-selected local binary pattern features based on CLBP_S/C are extracted through the following steps:
Step S20301, obtain the scale space of the given sample by a two-dimensional Gaussian filter: based on g_σ, a two-dimensional Gaussian filter with standard deviation σ used to construct the scale space, the scale-space images of the image I are obtained as s_l = g_{σ_l} * I.
Step S20302, for the above scale space s_l, establish a pattern histogram H_l based on CLBP_S/C. Specifically, for each pixel, a pattern label is first computed based on CLBP_S/C; if its label does not belong to the learned dominant patterns, the pixel does not contribute to H_l.
Step S20303, establish the scale-selection feature of the image I:
F(k) = max_{1≤l≤L} H_l(p_k), 1 ≤ k ≤ K,
where p_k is the k-th dominant pattern.
The specific method and process for extracting the scale-selection features based on CLBP_M/C are similar to the steps described above and are not repeated here.
Concatenating the two features so obtained yields the scale-invariant feature of the image.
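The dominant-pattern learning (steps S20201 to S20207) and the scale-selection feature extraction (steps S20301 to S20303) can be sketched as follows. This is a hedged illustration: a plain 8-bit LBP sign code stands in for the CLBP_S/C joint label, the Gaussian scale space is built with a simple separable filter, and all names (`gauss_blur`, `pattern_labels`, and so on) are ours, not the patent's.

```python
import numpy as np

def gauss_blur(img, sigma):
    """Separable Gaussian filter; edge padding keeps the image size."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img, r, mode='edge')
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 0, tmp)

def pattern_labels(img):
    """Stand-in pattern labeller: a plain 8-bit LBP sign code.
    The patent uses the rotation-invariant CLBP_S/C joint labels instead."""
    c = img[1:-1, 1:-1]
    code = np.zeros(c.shape, dtype=np.int64)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offs):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (n >= c).astype(np.int64) << p
    return code

def scale_max_histogram(img, sigmas, n_bins=256):
    """H(k) = max over scales l of the normalised frequency of pattern k."""
    hists = []
    for s in sigmas:
        lab = pattern_labels(gauss_blur(img, s))
        h = np.bincount(lab.ravel(), minlength=n_bins).astype(float)
        hists.append(h / h.sum())
    return np.max(hists, axis=0)

def learn_dominant_patterns(train_imgs, sigmas=(0.5, 1.0, 2.0), K=8):
    """Steps S20201-S20207: accumulate per-image max-over-scale histograms,
    then keep the K pattern labels with the largest accumulated frequency."""
    acc = np.zeros(256)
    for f in train_imgs:
        acc += scale_max_histogram(f, sigmas)
    return np.argsort(acc)[::-1][:K]

def sslbp_feature(img, dominant, sigmas=(0.5, 1.0, 2.0)):
    """Steps S20301-S20303: max-over-scale frequency of each dominant pattern."""
    return scale_max_histogram(img, sigmas)[dominant]
```

The same procedure would be repeated for the CLBP_M/C labels, and the two resulting feature vectors concatenated.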
in the traditional method, most of the invariant features of local or global scale are extracted. From an implementation point of view, the invention is proposed to belong to an intermediate path. The invention firstly extracts local scale change characteristics and then applies global transformation to realize scale invariance. On the other hand, from the perspective of scale space, the present invention proposes a new method.
Step S3, extracting features using histogram of oriented gradients
HOG (Histogram of Oriented Gradients) is an image descriptor for human target detection. In this implementation, HOG features are used to represent the human body: appearance and motion information of the human body are extracted to form a rich feature set. As shown in fig. 2, the extraction specifically comprises the following steps:
step S301, normalization processing is performed on the input image. The normalization of the image aims to improve the robustness of the detector to illumination, the normalization of the color space is to normalize the color information of the whole image so as to reduce the influence of different illumination and background, and the Gamma and the normalization of the color space of the image are introduced as the preprocessing means of feature extraction in order to improve the robustness of detection.
Step S302, calculate the image gradient using first-order differences. Image edges are caused by abrupt changes in local image features, including gray level, color, and texture. Where the change between adjacent pixels in an image is small, the region is flat and the gradient magnitude is small; otherwise the gradient magnitude is large. The gradient corresponds to the first derivative of the image. The gradient of any pixel (x, y) in the image f(x, y) is:
∇f(x, y) = [G_x, G_y]^T = [∂f/∂x, ∂f/∂y]^T,
where G_x is the gradient along the x direction, G_y is the gradient along the y direction, and x, y are the horizontal and vertical coordinates of the pixel. The magnitude and direction angle of the gradient are respectively:
G(x, y) = sqrt(G_x^2 + G_y^2), α(x, y) = arctan(G_y / G_x).
Because computing the modulus in this way is expensive, the following approximation is generally used:
G(x, y) ≈ |G_x| + |G_y|.
In this embodiment, the gradient and direction of the image are computed with the one-dimensional template [-1, 0, 1]; the horizontal and vertical gradients are:
G_x(x, y) = H(x+1, y) - H(x-1, y),
G_y(x, y) = H(x, y+1) - H(x, y-1),
where G_x, G_y denote the gradients of the pixel (x, y) in the horizontal and vertical directions and H(x, y) denotes the gray value of the pixel (x, y). The magnitude and direction of the gradient are then:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2), α(x, y) = arctan(G_y(x, y) / G_x(x, y)).
The gradient direction is restricted to the unsigned range, so it can be expressed as α ∈ [0°, 180°).
and step S303, projecting direction weight based on the gradient amplitude of the pixel.
Dividing the whole target window into cell units (cells) with different sizes, and calculating the gradient information of each cell respectively, wherein the cell units comprise gradient sizes and gradient directions. The gradient directions of the pixels are evenly divided into 9 bins within the interval of 0-180 degrees, when the gradient directions exceed 9 bins, the detection performance is not obviously improved, but the detection operand is increased, the pixels in each cell perform weighted voting for the histogram of the gradient direction in which the pixels are located, and the weighted weight is the gradient amplitude of the pixels.
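The weighted voting into 9 unsigned orientation bins per cell can be sketched as below. Hard bin assignment is used for brevity; the classical HOG formulation additionally interpolates each vote between neighbouring bins.

```python
import numpy as np

def cell_histograms(mag, ang, cell=6, n_bins=9):
    """Per-cell orientation histograms: each pixel votes into the bin
    containing its gradient direction (0-180 degrees, 9 bins), weighted
    by its gradient magnitude."""
    h, w = mag.shape
    cy, cx = h // cell, w // cell
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    out = np.zeros((cy, cx, n_bins))
    for i in range(cy):
        for j in range(cx):
            b = bins[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            out[i, j] = np.bincount(b, weights=m, minlength=n_bins)
    return out
```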
Step S304, normalize the feature vectors.
Specifically, the local cell units are normalized. The absolute values of the gradient magnitudes are easily influenced by the contrast between foreground and background and by local illumination; reducing this influence through normalization yields a more accurate detection result.
Several cell units (cells) are combined into a larger block. The whole image can be regarded as the window to be detected and a block as a sliding window, sliding from left to right and from top to bottom in turn; this produces blocks with overlapping cell units, so the gradient information of the same cell appears in different blocks.
The block information is then normalized. The sizes of the cell units and blocks affect the final detection result. In this embodiment, the detection works best with a block size of 3 × 3 cells and a cell size of 6 × 6 pixels. If the block is too large, the effect of the normalization is diminished and the error rate increases; if the block is too small, useful information is filtered out instead.
This embodiment uses the L2 norm, v → v / sqrt(||v||^2 + ε^2) (where ε is a small constant that avoids a zero denominator), to normalize the HOG feature vectors within each block, so that the feature vector space is robust to illumination, shadow, and edge changes.
A high-dimensional vector of β × ζ × η values is obtained — the HOG description vector of the image — where β is the number of direction units (bins) in each cell, and ζ and η are the number of blocks and the number of cells in one block, respectively.
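The block grouping and L2 normalization described above can be sketched as follows, assuming a sliding stride of one cell (the text does not state the stride explicitly).

```python
import numpy as np

def block_descriptor(cell_hists, block=3, eps=1e-6):
    """Slide a block of block x block cells one cell at a time, concatenate
    the cell histograms, and L2-normalise each block:
    v -> v / sqrt(||v||^2 + eps^2)."""
    cy, cx, nb = cell_hists.shape
    blocks = []
    for i in range(cy - block + 1):
        for j in range(cx - block + 1):
            v = cell_hists[i:i + block, j:j + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)   # final HOG descriptor of the window
```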
Step S4, fuse the obtained SSLBP features and HOG features.
Let the SSLBP feature vector of the image be X_1 = [X_11, X_12, ..., X_1n] and the HOG feature vector be X_2 = [X_21, X_22, ..., X_2m]. The serially fused feature vector is then X = [X_11, X_12, ..., X_1n, X_21, X_22, ..., X_2m].
Fusing the SSLBP feature of the original image with its HOG feature in this serial manner yields the final fused feature vector of the image.
Step S5, reduce the dimensionality of the obtained fused feature vector using PCA.
After the final fused feature vector of the image is obtained, principal component analysis (PCA) is used for dimensionality reduction, so as to reduce the amount of computation and improve the real-time performance of detection. Specifically:
Given m d-dimensional samples x_1, x_2, ..., x_m, compute the covariance matrix:
S = (1/m) Σ_{i=1}^{m} (x_i - x̄)(x_i - x̄)^T,
where m is the number of samples and x̄ is the sample mean.
Decompose the covariance matrix S to obtain the matrix U_k = [u_1, u_2, ..., u_k] composed of the eigenvectors corresponding to the k largest eigenvalues, and obtain the reduced-dimension features y_i = U_k^T (x_i - x̄).
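The serial fusion and the PCA reduction can be sketched as follows; `fuse` and `pca_reduce` are illustrative names, not the patent's.

```python
import numpy as np

def fuse(sslbp_vec, hog_vec):
    """Serial fusion: X = [X_11, ..., X_1n, X_21, ..., X_2m]."""
    return np.concatenate([sslbp_vec, hog_vec])

def pca_reduce(X, k):
    """PCA via the covariance matrix S = (1/m) sum (x_i - mean)(x_i - mean)^T:
    keep the eigenvectors of the k largest eigenvalues and project."""
    mean = X.mean(axis=0)
    Xc = X - mean
    S = Xc.T @ Xc / X.shape[0]
    vals, vecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    Uk = vecs[:, ::-1][:, :k]        # eigenvectors of the k largest eigenvalues
    return Xc @ Uk, Uk, mean         # reduced features y_i = Uk^T (x_i - mean)
```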
and step S6, classifying the features after dimensionality reduction by using a trained HIKSVM classifier to obtain a pedestrian detection result.
The training process of the HIKSVM classifier comprises the following steps: inputting a training sample set; extracting and fusing the SSLBP characteristic and the HOG characteristic of the training sample to obtain a fusion characteristic; and training the HIK SVM classifier by using the fusion characteristics. The extracting and fusing the SSLBP feature and the HOG feature of the training sample is similar to the extracting and fusing the SSLBP feature and the HOG feature of the test image in the above-mentioned content, and specific content is not described again here. And using the trained HIKSVM classifier for the feature classification obtained after the dimensionality reduction to obtain a pedestrian detection result of the image.
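The histogram intersection kernel (HIK) underlying the HIK-SVM can be sketched as below; the scikit-learn usage in the comment is an assumption about tooling, not part of the patent.

```python
import numpy as np

def hik(X, Y):
    """Histogram intersection kernel matrix:
    K[i, j] = sum_d min(X[i, d], Y[j, d])."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# With scikit-learn (assumed available), training and prediction on the
# precomputed kernel matrix would look like:
#   from sklearn.svm import SVC
#   clf = SVC(kernel='precomputed').fit(hik(X_train, X_train), y_train)
#   pred = clf.predict(hik(X_test, X_train))
```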
In summary, the embodiment of the present invention provides a pedestrian detection method based on improved HOG-SSLBP. It uses an image descriptor that fuses HOG and SSLBP, which enhances scale and rotation invariance, gives good robustness to illumination, rotation, noise, and scale, and improves detection accuracy. Meanwhile, PCA-based dimensionality reduction greatly reduces the feature dimensionality while retaining the texture and edge information of the original image, improving computational efficiency and the real-time performance of pedestrian detection.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (10)
1. A pedestrian detection method based on improved HOG-SSLBP is characterized by comprising the following steps:
acquiring an image sequence to be detected;
extracting and fusing the SSLBP characteristic and the HOG characteristic of the image sequence to obtain a fusion characteristic;
and classifying the fusion characteristics by using a trained HIK SVM classifier to obtain a pedestrian detection result.
2. The method according to claim 1, wherein said extracting SSLBP features of the sequence of images comprises in particular the steps of:
extracting scale sensitive features by using CLBP;
and processing the obtained scale sensitive characteristics by applying a scale selection scheme to obtain the SSLBP characteristics.
3. The method according to claim 2, wherein the scale selection scheme specifically comprises the following steps:
obtaining a scale space of the image using a Gaussian filter;
constructing, for each image in the scale space, a local pattern histogram containing only the dominant patterns;
for each dominant pattern, selecting its maximum frequency of occurrence across the different scales;
and constructing a new feature histogram from the selected maximum frequencies, the new feature histogram serving as the scale-invariant feature of the image.
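For illustration only (not part of the claims), the claim-3 scale selection scheme can be sketched as follows. The local pattern operator `pattern_fn` and the list of dominant patterns are hypothetical inputs; the Gaussian filter is a minimal separable implementation:

```python
import numpy as np

def gaussian_blur(img, sigma):
    # separable Gaussian filter used to build the scale space
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

def scale_invariant_histogram(image, dominant_patterns, pattern_fn,
                              sigmas=(1.0, 2.0, 4.0)):
    # for every dominant pattern, keep its maximum normalized frequency
    # observed across the Gaussian scale space (the claim-3 scheme)
    per_scale = []
    for sigma in sigmas:
        codes = pattern_fn(gaussian_blur(image.astype(float), sigma))
        per_scale.append([np.mean(codes == p) for p in dominant_patterns])
    return np.max(np.array(per_scale), axis=0)
```

The max-over-scales step is what makes the resulting histogram approximately scale-invariant: a pattern that is frequent at any single scale keeps that frequency in the final descriptor.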
4. The method of claim 3, wherein the dominant patterns are determined by:
obtaining a scale space of a training sample set through a two-dimensional Gaussian filter;
constructing a local pattern histogram for the image at each scale;
for each pattern, selecting its maximum frequency of occurrence across the different scales;
constructing a new feature histogram from the selected maximum frequencies;
and sorting the obtained new feature histogram in descending order, and selecting the patterns with the highest average frequencies as the dominant patterns.
5. The method according to any one of claims 1 to 4, wherein the training process of the HIK SVM classifier specifically comprises the following steps:
inputting a training sample set;
extracting the SSLBP features and the HOG features of each training sample and fusing them to obtain the fused features of the sample;
and training the HIK SVM classifier with the fused features.
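For illustration only, a minimal HIK (histogram intersection kernel) SVM training sketch is shown below, assuming scikit-learn is available; `SVC` accepts a callable kernel that returns the Gram matrix between two feature sets:

```python
import numpy as np
from sklearn.svm import SVC

def hik(X, Y):
    # histogram intersection kernel: K(x, y) = sum_i min(x_i, y_i)
    X, Y = np.asarray(X), np.asarray(Y)
    return np.array([[np.minimum(x, y).sum() for y in Y] for x in X])

def train_hik_svm(features, labels):
    # train an SVM whose kernel is the histogram intersection kernel
    return SVC(kernel=hik).fit(features, labels)
```

The intersection kernel is a natural fit for the fused histogram-style features (SSLBP and HOG are both normalized histograms), since it measures the overlap between two histograms directly.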
6. The method of claim 5, wherein extracting the SSLBP features of the training samples comprises the following steps:
extracting scale-sensitive features of the training samples using CLBP;
and processing the obtained scale-sensitive features with the scale selection scheme to obtain the SSLBP features of the training samples.
7. The method according to claim 4, wherein extracting scale-sensitive features using CLBP specifically comprises building a two-dimensional joint histogram CLBP_S/C or CLBP_M/C as the scale-sensitive features.
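For illustration only, a heavily simplified sketch of the CLBP_S/C joint histogram follows. It combines the LBP sign code (CLBP_S) with the binary centre-pixel code (CLBP_C, centre vs. global mean) into a 2-D histogram; neighbour positions are sampled by rounding rather than interpolation, which is a simplification of the full CLBP operator:

```python
import numpy as np

def clbp_s_c_histogram(img, P=8, R=1):
    """Joint 2-D histogram of the LBP sign code (CLBP_S, 2**P bins)
    and the centre-pixel binary code (CLBP_C, 2 bins)."""
    img = img.astype(float)
    h, w = img.shape
    c_thresh = img.mean()                      # global threshold for CLBP_C
    hist = np.zeros((2 ** P, 2))
    angles = 2 * np.pi * np.arange(P) / P
    for i in range(R, h - R):
        for j in range(R, w - R):
            center = img[i, j]
            code = 0
            for p, a in enumerate(angles):
                ni = int(round(i + R * np.sin(a)))   # rounded neighbour sampling
                nj = int(round(j + R * np.cos(a)))
                if img[ni, nj] >= center:
                    code |= 1 << p
            hist[code, int(center >= c_thresh)] += 1
    return hist / hist.sum()                   # normalized joint histogram
```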
8. The method according to claim 7, wherein extracting the HOG features specifically comprises the following steps:
normalizing the image to be detected;
calculating the gradient of the normalized image using first-order differences;
performing orientation-weighted projection based on the calculated gradient magnitudes to obtain a feature vector;
and normalizing the feature vector to obtain the HOG features.
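For illustration only, the four claim-8 steps can be sketched as a minimal HOG extractor (no block normalization, a simplification of the standard descriptor; cell size and bin count are hypothetical defaults):

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Minimal HOG sketch following claim 8: intensity normalization,
    first-order gradients, orientation-weighted projection per cell,
    then L2 normalization of the final vector."""
    img = img.astype(float)
    img = np.sqrt(img / (img.max() + 1e-12))          # step 1: normalization
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]            # step 2: first-order differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180        # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist = np.zeros(bins)
            idx = np.minimum((a / (180 / bins)).astype(int), bins - 1)
            np.add.at(hist, idx, m)                   # step 3: magnitude-weighted vote
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)            # step 4: L2 normalization
```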
9. The method according to claim 8, wherein obtaining the fused features specifically comprises concatenating (serially fusing) the extracted SSLBP features and HOG features of the image to be detected.
10. The method of claim 9, wherein principal component analysis is applied to the obtained fused features for dimensionality reduction.
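For illustration only, claims 9 and 10 together can be sketched as serial fusion followed by PCA via SVD; the component count is a hypothetical parameter:

```python
import numpy as np

def fuse_and_reduce(sslbp, hog, n_components=64):
    """Concatenate per-sample SSLBP and HOG features (claim 9), then
    reduce dimensionality with PCA computed via SVD (claim 10)."""
    X = np.hstack([sslbp, hog])              # serial (concatenated) feature fusion
    Xc = X - X.mean(axis=0)                  # centre the data before PCA
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return Xc @ Vt[:k].T                     # project onto the top-k principal axes
```

Because `np.linalg.svd` returns singular values in descending order, the retained components are those carrying the most variance, which is what lets PCA shrink the fused descriptor while preserving most of the texture and edge information.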
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711084863.1A CN107886066A (en) | 2017-11-07 | 2017-11-07 | A kind of pedestrian detection method based on improvement HOG SSLBP |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107886066A true CN107886066A (en) | 2018-04-06 |
Family
ID=61778994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711084863.1A Pending CN107886066A (en) | 2017-11-07 | 2017-11-07 | A kind of pedestrian detection method based on improvement HOG SSLBP |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107886066A (en) |
Worldwide Applications (1)
2017-11-07 | CN | CN201711084863.1A (published as CN107886066A) | Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663421A (en) * | 2012-03-23 | 2012-09-12 | 天津天地伟业物联网技术有限公司 | Vehicle detection method |
CN103617426A (en) * | 2013-12-04 | 2014-03-05 | 东北大学 | Pedestrian target detection method under interference by natural environment and shelter |
US20160180158A1 (en) * | 2014-12-18 | 2016-06-23 | Magna Electronics Inc. | Vehicle vision system with pedestrian detection |
CN105447503A (en) * | 2015-11-05 | 2016-03-30 | 长春工业大学 | Sparse-representation-LBP-and-HOG-integration-based pedestrian detection method |
Non-Patent Citations (5)
Title |
---|
SONGZHI SU, SHUYUAN CHEN: "Analysis of Feature Fusion Based on HIK SVM and Its Application for Pedestrian Detection", Abstract and Applied Analysis *
ZHENHUA GUO, XINGZHENG WANG, JIE ZHOU, JANE YOU: "Robust Texture Image Representation by Scale Selective Local Binary Patterns", IEEE Transactions on Image Processing *
张笑钦 et al.: "Theory and Methods of Object Tracking in Complex Scenes" (《复杂场景下目标跟踪的理论与方法》), 31 May 2017 *
章霄 et al.: "Digital Image Processing Technology" (《数字图像处理技术》), 31 July 2005 *
贾永红 et al.: "Digital Image Processing Techniques" (《数字图像处理技巧》), 31 January 2017 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549868A (en) * | 2018-04-12 | 2018-09-18 | 中国矿业大学 | A kind of pedestrian detection method |
CN108734139A (en) * | 2018-05-24 | 2018-11-02 | 辽宁工程技术大学 | Feature based merges and the newer correlation filtering tracking of SVD adaptive models |
CN108734139B (en) * | 2018-05-24 | 2021-12-14 | 辽宁工程技术大学 | Correlation filtering tracking method based on feature fusion and SVD self-adaptive model updating |
CN109919960A (en) * | 2019-02-22 | 2019-06-21 | 西安工程大学 | A kind of image continuous boundary detection method based on Multiscale Gabor Filters device |
CN110163161A (en) * | 2019-05-24 | 2019-08-23 | 西安电子科技大学 | Multiple features fusion pedestrian detection method based on Scale invariant |
CN113627520A (en) * | 2021-08-09 | 2021-11-09 | 中南大学 | Scale selection and noise robustness improved local binary pattern texture description method |
CN113627520B (en) * | 2021-08-09 | 2023-09-05 | 中南大学 | Texture description method for improved local binary mode of scale selection and noise robustness |
CN114387619A (en) * | 2021-12-31 | 2022-04-22 | 歌尔科技有限公司 | Pedestrian detection method, device, electronic equipment and computer-readable storage medium |
CN118196479A (en) * | 2024-02-26 | 2024-06-14 | 国网江苏省电力有限公司南京供电分公司 | GIS equipment partial discharge type determining method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tong et al. | Salient object detection via bootstrap learning | |
CN107886066A (en) | A kind of pedestrian detection method based on improvement HOG SSLBP | |
Al-Shemarry et al. | Ensemble of adaboost cascades of 3L-LBPs classifiers for license plates detection with low quality images | |
Ranzato et al. | Automatic recognition of biological particles in microscopic images | |
Vishwakarma et al. | A unified model for human activity recognition using spatial distribution of gradients and difference of Gaussian kernel | |
Briggman et al. | Maximin affinity learning of image segmentation | |
Kim et al. | Color–texture segmentation using unsupervised graph cuts | |
Jia et al. | Visual tracking via coarse and fine structural local sparse appearance models | |
CN105488809A (en) | Indoor scene meaning segmentation method based on RGBD descriptor | |
Soni et al. | Text detection and localization in natural scene images based on text awareness score | |
US10810433B2 (en) | Method and system for tracking objects | |
CN112784722B (en) | Behavior identification method based on YOLOv3 and bag-of-words model | |
CN107886067A (en) | A kind of pedestrian detection method of the multiple features fusion based on HIKSVM graders | |
CN110738672A (en) | image segmentation method based on hierarchical high-order conditional random field | |
Zhou et al. | Fast circle detection using spatial decomposition of Hough transform | |
Araar et al. | Traffic sign recognition using a synthetic data training approach | |
CN108549868A (en) | A kind of pedestrian detection method | |
Perrotton et al. | Automatic object detection on aerial images using local descriptors and image synthesis | |
Wang et al. | Cascading classifier with discriminative multi-features for a specific 3D object real-time detection | |
Neycharan et al. | Edge color transform: a new operator for natural scene text localization | |
Lu et al. | Superpixel level object recognition under local learning framework | |
Zhang et al. | Research on vehicle object detection method based on convolutional neural network | |
Fan et al. | Robust visual tracking via bag of superpixels | |
Kosala et al. | MSER-Vertical Sobel for Vehicle Logo Detection | |
CN113128487A (en) | Dual-gradient-based weak supervision target positioning method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 2018-04-06 |