CN112163499A - Small target pedestrian detection method based on fusion features - Google Patents

Small target pedestrian detection method based on fusion features

Info

Publication number
CN112163499A
CN112163499A (application CN202011007245.9A)
Authority
CN
China
Prior art keywords
pedestrian detection
small
target
pedestrian
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011007245.9A
Other languages
Chinese (zh)
Inventor
邹腾涛 (Zou Tengtao)
杨尚明 (Yang Shangming)
朱俊林 (Zhu Junlin)
邓翔文 (Deng Xiangwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011007245.9A
Publication of CN112163499A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a small-target pedestrian detection method based on fused features. After analyzing where the SSD algorithm falls short in pedestrian detection, the method fuses multi-scale convolutional features so that feature information is shared between convolutional layers: the input of each layer contains both the output of the previous layer and part of the convolutional information of earlier convolutional layers, which allows the center point of a pedestrian to be regressed more accurately. To address the difficulty of detecting small-target pedestrians on low-resolution feature maps, the size of the finally generated mask map is enlarged, overcoming the insufficient resolution of small feature maps. By fusing the convolutional features of each stage, the method enlarges the final feature map, enriches the feature information of small targets, recognizes small targets well, and improves the recognition accuracy of small-target pedestrian detection, achieving a higher recall rate and recognition rate than the SSD algorithm.

Description

Small target pedestrian detection method based on fusion features
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a small-target pedestrian detection method based on fusion features.
Background
Pedestrian detection has wide application in fields such as intelligent monitoring, image retrieval and environmental perception, and has therefore become an important research topic in computer vision. Accurate pedestrian detection provides a better precondition for subsequent analysis, such as pedestrian tracking, people counting, person re-identification, pose estimation, face recognition and the like.
Pedestrian detection has been studied extensively for many years. Most traditional detection methods reach this goal by manually designing features and constructing a corresponding detection model. Research on this type of target detection focuses on the careful design of suitable features and powerful classifiers, such as HoG + SVM, HoG + DPM, DOT + RF, etc. With the rapid development of deep learning, many studies combine feature extraction and feature classification using deep neural networks, resulting in one-stage target detection algorithms. A one-stage algorithm feeds the image directly into a neural network and performs recognition by predicting the position coordinates of targets. Representative one-stage algorithms are YOLO, YOLO v2, SSD, etc. SSD, the most commonly used one-stage detection algorithm in the field of target detection, offers a high detection rate and detection speed, and is also widely used for target detection in industry.
The SSD (Single Shot MultiBox Detector) consists of two parts: a feature extraction network and a recognition network. The feature extraction network is a fully convolutional network with two common variants, one based on VGG and one on an improved VGG. The extracted feature maps are fed into the recognition network, which is also fully convolutional, to predict the exact position and category of each target. Unlike YOLO, which predicts targets only on the last feature map, SSD extracts the feature maps of Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2, and constructs 6 default boxes of different scales at every point of each feature map. The detection boxes generated on the different feature maps are then fine-tuned and classified, and boxes recognized as background are deleted, producing a set of preliminary candidate boxes. Finally, the boxes from the different feature maps are merged, and overlapping or incorrect boxes are suppressed with NMS (the non-maximum suppression algorithm), producing the final detection box set, i.e., the detection result. Although SSD is a general-purpose target detection algorithm, it is widely used in industry for pedestrian detection and safety monitoring, where it achieves good results. However, SSD-based pedestrian detection algorithms do not share features between feature maps of different scales, even though predicting pedestrians on several feature maps improves the detection rate for small-target pedestrians.
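The multi-scale default-box idea described above can be sketched as follows. This is a simplified illustration, not the exact SSD configuration: real SSD places several boxes of different aspect ratios at each cell and derives its scales from a scale formula, both of which are omitted here, and the feature sizes and scales used below are invented for illustration.

```python
def default_boxes(feature_sizes, scales, img_size=300):
    """One centered square default box per cell of each feature map.

    feature_sizes: side lengths of the square feature maps (e.g. 38, 19, ...).
    scales: box side length as a fraction of the input image size.
    Returns boxes as (x1, y1, x2, y2) in input-image pixels.
    """
    boxes = []
    for fsize, scale in zip(feature_sizes, scales):
        step = img_size / fsize          # pixel stride between cell centers
        half = scale * img_size / 2      # half the side of the default box
        for row in range(fsize):
            for col in range(fsize):
                cx, cy = (col + 0.5) * step, (row + 0.5) * step
                boxes.append((cx - half, cy - half, cx + half, cy + half))
    return boxes
```

A coarser feature map contributes fewer, larger boxes; this is how SSD covers multiple object sizes with a single forward pass.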
In shallow feature maps, the convolution features of a small-target pedestrian are rich, but there is much redundant information, which makes recognition difficult; in deep feature maps, few features of the small target survive, which also makes recognition difficult. Moreover, predicting target boxes of multiple sizes on multiple convolution feature maps requires position fine-tuning and category prediction for every box; the amount of computation is large, and the large number of qualifying detection boxes increases the running time of the NMS algorithm, making the path to the final recognition result cumbersome.
Disclosure of Invention
Aiming at the above defects in the prior art, the fusion-feature-based small-target pedestrian detection method provided by the invention solves the problem, noted in the background art, that small-target pedestrians cannot be identified or are identified at a low rate.
In order to achieve the above purpose, the invention adopts the following technical scheme: a small-target pedestrian detection method based on fusion features, comprising the following steps:
S1, extracting features of the image to be detected through a Unet full convolution neural network to generate a corresponding feature map;
S2, inputting the feature map into a center prediction network and a position prediction network respectively to obtain corresponding prediction results, and constructing a pedestrian detection frame set according to the prediction results;
S3, deleting the detection frames whose overlap rate exceeds a set threshold from the pedestrian detection frame set by using a non-maximum suppression algorithm, to obtain the final pedestrian detection result and realize small-target pedestrian detection.
Further, the sizes of the images to be detected input into the Unet full convolution neural network in the step S1 are the same;
the feature map generated by the Unet full convolution neural network in the step S1
Figure BDA0002696353490000031
Comprises the following steps:
Figure BDA0002696353490000032
where f (-) is the RELU activation function;
Figure BDA0002696353490000033
is the ith characteristic diagram of the l-1 layer;
Figure BDA0002696353490000034
a convolution kernel of the first convolution layer in the Unet full convolution neural network;
Figure BDA0002696353490000035
the translation parameter of the first layer convolution layer in the Unet full convolution neural network.
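The convolution step above can be sketched as follows — a minimal NumPy implementation, assuming valid padding, stride 1, and cross-correlation (choices not fixed by the text):

```python
import numpy as np

def relu(x):
    """ReLU activation f(x) = max(0, x)."""
    return np.maximum(x, 0.0)

def conv_layer(x_prev, kernels, biases):
    """One convolutional layer: x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l).

    x_prev:  (I, H, W)      feature maps x_i^{l-1} of layer l-1
    kernels: (I, J, kh, kw) kernels k_ij^l of layer l
    biases:  (J,)           translation (bias) parameters b_j^l
    """
    I, H, W = x_prev.shape
    _, J, kh, kw = kernels.shape
    oh, ow = H - kh + 1, W - kw + 1          # valid-padding output size
    out = np.zeros((J, oh, ow))
    for j in range(J):
        acc = np.zeros((oh, ow))
        for i in range(I):                   # sum over input maps i
            for y in range(oh):
                for x in range(ow):
                    acc[y, x] += np.sum(x_prev[i, y:y + kh, x:x + kw]
                                        * kernels[i, j])
        out[j] = relu(acc + biases[j])       # add bias, apply f(.)
    return out
```

The explicit loops mirror the summation in the formula; a real implementation would use a vectorized convolution routine instead.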
Further, in step S2, the center prediction network is a single convolutional layer with a 1 × 1 convolution kernel whose activation is the Sigmoid function; it is used to fuse the extracted feature maps and generate a pedestrian position mask map M.
Further, in step S2, the position prediction network is a single convolutional layer with a 1 × 1 convolution kernel; it is used to reconstruct the extracted feature map and obtain a four-channel position feature map N.
Further, the position mask map M is:

out_cen = f(o)

where out_cen denotes the position mask map M; f(o) is the Sigmoid function applied to the network output o, the Sigmoid function being f(o) = 1 / (1 + e^(-o)).
Further, the position feature map N contains a plurality of pedestrian detection frames, each labeled with an upper-left corner coordinate (x1, y1) and a lower-right corner coordinate (x2, y2).
Further, the step S2 is specifically:
S21, for any position in the position mask map M whose pixel value is larger than 0.5, confirming that the pedestrian detection frame at the corresponding position in the position feature map N is valid;
S22, constructing the pedestrian detection frame set with all pedestrian detection frames confirmed as valid as its elements.
Further, the step S3 is specifically:
S31, taking the pixel value of each pedestrian detection frame of the pedestrian detection frame set B at its corresponding position in the mask map M as the confidence score of that frame, thereby constructing the score set S of all pedestrian detection frames;
S32, removing from the set B the pedestrian detection frame corresponding to the highest confidence score s in the score set S, and keeping it as part of the detection result;
S33, removing from the set B every other pedestrian detection frame whose overlap with the frame of highest confidence score s is larger than 0.5;
S34, repeating the steps S31-S33 until the pedestrian detection frame set B is empty, and taking the kept frames as the pedestrian detection result, realizing small-target pedestrian detection.
The invention has the beneficial effects that:
the invention provides a small-target pedestrian detection method based on fusion features, wherein fusion multi-scale convolution features are adopted to detect that pedestrians share feature information among convolution layers when analyzing that an SSD algorithm is applied to a place with defects in pedestrian detection, and the input of each layer comprises the output of the previous layer and partial convolution information of the previous convolution layer, so that the central point position of the pedestrian is better regressed; the defect that a small-resolution feature map is not easy to detect a small-target pedestrian is considered, the size of the finally generated mask map is increased, and the problem that the resolution of the small-area feature map is insufficient is solved. The method expands the size of the final feature map by fusing the convolution features of each stage, enriches the feature information of the small target, well realizes the identification of the small target, improves the identification accuracy of pedestrian detection of the small target, and has higher recall rate and identification rate compared with the SDD party.
Drawings
Fig. 1 is a flow chart of a small-target pedestrian detection method based on fusion features provided by the invention.
Fig. 2 is a schematic diagram of the calculation composition of the Unet full convolution neural network provided by the present invention.
Fig. 3 is a structural diagram of the Unet full convolution neural network provided by the present invention.
Fig. 4 is a schematic diagram of the convolution operation in the Unet full convolution neural network provided by the present invention.
Fig. 5 is a schematic diagram of Max pool operation process in the Unet full convolution neural network provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. For those skilled in the art, various changes that remain within the spirit and scope of the invention as defined in the appended claims are obvious, and everything created by making use of the inventive concept is protected.
As shown in fig. 1, a method for detecting a small target pedestrian based on fusion features includes the following steps:
S1, extracting features of the image to be detected through a Unet full convolution neural network to generate a corresponding feature map;
S2, inputting the feature map into a center prediction network and a position prediction network respectively to obtain corresponding prediction results, and constructing a pedestrian detection frame set according to the prediction results;
S3, deleting the detection frames whose overlap rate exceeds a set threshold from the pedestrian detection frame set by using a non-maximum suppression algorithm, to obtain the final pedestrian detection result and realize small-target pedestrian detection.
The size of the images to be detected input into the Unet full convolution neural network in the step S1 is the same;
When feature extraction is performed by the Unet full convolution neural network in step S1, the feature map is generated by repeated convolution, down-sampling and up-sampling calculations; the specific composition is shown in fig. 2. The generated feature map x_j^l is:

x_j^l = f( Σ_i x_i^{l-1} * k_{ij}^l + b_j^l )

where f(·) is the ReLU activation function; x_i^{l-1} is the i-th feature map of layer l-1; k_{ij}^l is the convolution kernel of the l-th convolutional layer in the Unet full convolution neural network; b_j^l is the translation (bias) parameter of the l-th convolutional layer in the Unet full convolution neural network.
In the above process, the down-sampling calculation is max pooling, which takes the maximum within each pooling window:

x_{m,n}^l = max_{(p,q)∈Ω(m,n)} x_{p,q}^{l-1}

and the up-sampling calculation enlarges the feature map by linearly interpolating between neighboring feature values.
Fig. 3 is a structural diagram of the Unet full convolution neural network according to the present invention; conv denotes a convolution operation, and copy and crop denotes the operation that concatenates a previously obtained convolution feature with the currently obtained convolution feature and outputs the result to the next layer. The ReLU activation function is expressed as:

f(x) = max(0, x)

In fig. 3, Up denotes feature-size expansion, for which a linear interpolation algorithm is used, and Max pool denotes the down-sampling operation. Fig. 4 is a schematic diagram of the convolution operation in the above process, and fig. 5 is a schematic diagram of the Max pool operation process.
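The Max pool and Up steps can be sketched as follows. The 2 × 2 pooling window and the half-pixel-centered bilinear scheme are assumptions: the text only specifies max pooling for down-sampling and linear interpolation for size expansion.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pool down-sampling with stride 2 (the Max pool step)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Double the spatial size by bilinear interpolation (the Up step)."""
    h, w = x.shape
    # source coordinates of each output pixel (half-pixel-centered)
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # blend the four neighboring feature values
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

In the U-shaped network, max pooling halves the feature size on the contracting path and the interpolation-based Up step doubles it again on the expanding path, where the copy-and-crop connections merge in the earlier features.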
In step S2, the center prediction network is a single convolutional layer with a 1 × 1 convolution kernel whose activation is the Sigmoid function; it fuses the extracted feature maps and generates the pedestrian position mask map M:

out_cen = f(o)

where out_cen denotes the position mask map M; f(o) is the Sigmoid function applied to the network output o, the Sigmoid function being f(o) = 1 / (1 + e^(-o)).

In step S2, the position prediction network is a single convolutional layer with a 1 × 1 convolution kernel; it reconstructs the extracted feature map into a four-channel position feature map N, which contains a plurality of pedestrian detection frames, each labeled with an upper-left corner coordinate (x1, y1) and a lower-right corner coordinate (x2, y2).
The step S2 is specifically:
S21, for any position in the position mask map M whose pixel value is larger than 0.5, confirming that the pedestrian detection frame at the corresponding position in the position feature map N is valid;
S22, constructing the pedestrian detection frame set with all pedestrian detection frames confirmed as valid as its elements.
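Steps S21-S22 can be sketched as follows. The channel order x1, y1, x2, y2 in the position map N is an assumption; the text only names the four coordinates, not their channel layout.

```python
import numpy as np

def sigmoid(o):
    """Sigmoid f(o) = 1 / (1 + e^(-o))."""
    return 1.0 / (1.0 + np.exp(-o))

def collect_boxes(mask_logits, position_map, thresh=0.5):
    """Build the candidate pedestrian box set from M and N (steps S21-S22).

    mask_logits:  (H, W) pre-activation output of the center prediction
                  network; sigmoid turns it into the mask map M.
    position_map: (4, H, W) position feature map N, channels assumed to
                  hold x1, y1, x2, y2.
    Returns the valid boxes and their mask values (used later as scores).
    """
    M = sigmoid(mask_logits)
    ys, xs = np.where(M > thresh)     # positions deemed valid (S21)
    boxes = [tuple(position_map[:, y, x]) for y, x in zip(ys, xs)]
    scores = [float(M[y, x]) for y, x in zip(ys, xs)]
    return boxes, scores              # the set B with its scores (S22)
```

Because the mask map has the same resolution as the feature map, every above-threshold pixel directly indexes its own predicted box in N.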
The step S3 is specifically:
S31, taking the pixel value of each pedestrian detection frame of the pedestrian detection frame set B at its corresponding position in the mask map M as the confidence score of that frame, thereby constructing the score set S of all pedestrian detection frames;
S32, removing from the set B the pedestrian detection frame corresponding to the highest confidence score s in the score set S, and keeping it as part of the detection result;
S33, removing from the set B every other pedestrian detection frame whose overlap with the frame of highest confidence score s is larger than 0.5;
S34, repeating the steps S31-S33 until the pedestrian detection frame set B is empty, and taking the kept frames as the pedestrian detection result, realizing small-target pedestrian detection.
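Steps S31-S34 amount to greedy non-maximum suppression. Below is a minimal sketch, assuming the overlap measure is intersection-over-union; the text only says "overlapping area larger than 0.5", so IoU is one plausible reading.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, overlap_thresh=0.5):
    """Greedy NMS over the box set B: repeatedly take the highest-scoring
    box out of B, keep it, and drop every remaining box overlapping it by
    more than the threshold, until B is empty (steps S31-S34).
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                    # S32: highest score leaves B
        keep.append(best)
        order = [i for i in order              # S33: suppress heavy overlaps
                 if iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep
```

Since here each box carries a single "pedestrian" score taken from the mask map, no per-class bookkeeping is needed, which keeps this step much simpler than multi-class NMS.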
In one embodiment of the present invention, an experimental comparison of small-target pedestrian recognition between the method of the present invention and the SSD method of the background art is provided:
Compared with the background art, the method fuses multi-scale convolution features to recognize pedestrians and detects them by generating a mask map of the same size as the input image, improving the detection rate of small-target pedestrians. Experiments show that, compared with the original SSD, the proposed method achieves a higher recall rate and recognition rate. We chose two data sets, INRIA and Caltech, to implement the improved scheme and run comparison experiments against the original SSD. As can be seen from tables 1 and 2, the method performs better in recall rate and accuracy rate, which demonstrates its effectiveness.
Table 1 Comparative experimental results of the SSD algorithm and our proposed algorithm on the INRIA pedestrian data set

Method   Recall rate   Accuracy rate
SSD      13.3%         88.6%
Ours     9.8%          92.5%
Table 2 Comparative experimental results of the SSD algorithm and our proposed algorithm on the Caltech pedestrian data set

Method   Recall rate   Accuracy rate
SSD      21.1%         85.3%
Ours     13.7%         90.2%
In addition, to verify that the invention effectively improves the pedestrian detection accuracy on small targets, we ran the improved scheme on the Caltech small-target data set, which divides the pedestrian bounding boxes into 3 subsets according to their pixel height; pedestrians smaller than 30 pixels are regarded as small targets. Comparing our scheme with the SSD algorithm, using the miss rate as the evaluation index, table 3 shows that the method significantly reduces the miss rate on the Caltech small-target data set.
Table 3 Comparative experimental results of the SSD algorithm and our proposed algorithm on the Caltech small-target pedestrian data set

Method   Miss rate
SSD      48.5%
Ours     35.2%
The invention has the beneficial effects that:
the invention provides a small-target pedestrian detection method based on fusion features, wherein fusion multi-scale convolution features are adopted to detect that pedestrians share feature information among convolution layers when analyzing that an SSD algorithm is applied to a place with defects in pedestrian detection, and the input of each layer comprises the output of the previous layer and partial convolution information of the previous convolution layer, so that the central point position of the pedestrian is better regressed; the defect that a small-resolution feature map is not easy to detect a small-target pedestrian is considered, the size of the finally generated mask map is increased, and the problem that the resolution of the small-area feature map is insufficient is solved. The method expands the size of the final feature map by fusing the convolution features of each stage, enriches the feature information of the small target, well realizes the identification of the small target, improves the identification accuracy of pedestrian detection of the small target, and has higher recall rate and identification rate compared with the SDD party.

Claims (8)

1. A small target pedestrian detection method based on fusion features is characterized by comprising the following steps:
S1, extracting features of the image to be detected through a Unet full convolution neural network to generate a corresponding feature map;
S2, inputting the feature map into a center prediction network and a position prediction network respectively to obtain corresponding prediction results, and constructing a pedestrian detection frame set according to the prediction results;
S3, deleting the detection frames whose overlap rate exceeds a set threshold from the pedestrian detection frame set by using a non-maximum suppression algorithm, to obtain the final pedestrian detection result and realize small-target pedestrian detection.
2. The fusion feature-based small-target pedestrian detection method according to claim 1, wherein the images to be detected input into the Unet full convolution neural network in the step S1 are the same in size;
the feature map x_j^l generated by the Unet full convolution neural network in the step S1 is:

x_j^l = f( Σ_i x_i^{l-1} * k_{ij}^l + b_j^l )

where f(·) is the ReLU activation function; x_i^{l-1} is the i-th feature map of layer l-1; k_{ij}^l is the convolution kernel of the l-th convolutional layer in the Unet full convolution neural network; b_j^l is the translation parameter of the l-th convolutional layer in the Unet full convolution neural network.
3. The fusion-feature-based small-target pedestrian detection method according to claim 1, wherein in the step S2, the center prediction network is a single convolutional layer with a 1 × 1 convolution kernel whose activation is the Sigmoid function, and is used to fuse the extracted feature maps and generate the pedestrian position mask map M.
4. The method according to claim 3, wherein in step S2, the position prediction network is an independent convolution layer with convolution kernel of 1 × 1, and is used to reconstruct the extracted feature map and obtain a four-channel position feature map N.
5. The fusion-feature-based small-target pedestrian detection method according to claim 4, wherein the position mask map M is:

out_cen = f(o)

where out_cen denotes the position mask map M; f(o) is the Sigmoid function applied to the network output o, the Sigmoid function being f(o) = 1 / (1 + e^(-o)).
6. The fusion-feature-based small-target pedestrian detection method according to claim 5, wherein the position feature map N contains a plurality of pedestrian detection boxes, each labeled with an upper-left corner coordinate (x1, y1) and a lower-right corner coordinate (x2, y2).
7. The method for detecting small-target pedestrians based on fusion characteristics according to claim 6, wherein the step S2 is specifically that:
S21, for any position in the position mask map M whose pixel value is larger than 0.5, confirming that the pedestrian detection frame at the corresponding position in the position feature map N is valid;
and S22, constructing the pedestrian detection frame set with all pedestrian detection frames confirmed as valid as its elements.
8. The method for detecting small-target pedestrians based on fusion characteristics according to claim 7, wherein the step S3 is specifically that:
S31, taking the pixel value of each pedestrian detection frame of the pedestrian detection frame set B at its corresponding position in the mask map M as the confidence score of that frame, thereby constructing the score set S of all pedestrian detection frames;
S32, removing from the set B the pedestrian detection frame corresponding to the highest confidence score s in the score set S, and keeping it as part of the detection result;
S33, removing from the set B every other pedestrian detection frame whose overlap with the frame of highest confidence score s is larger than 0.5;
and S34, repeating the steps S31-S33 until the pedestrian detection frame set B is empty, and taking the kept frames as the pedestrian detection result, realizing small-target pedestrian detection.
CN202011007245.9A 2020-09-23 2020-09-23 Small target pedestrian detection method based on fusion features Pending CN112163499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011007245.9A CN112163499A (en) 2020-09-23 2020-09-23 Small target pedestrian detection method based on fusion features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011007245.9A CN112163499A (en) 2020-09-23 2020-09-23 Small target pedestrian detection method based on fusion features

Publications (1)

Publication Number Publication Date
CN112163499A true CN112163499A (en) 2021-01-01

Family

ID=73863354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011007245.9A Pending CN112163499A (en) 2020-09-23 2020-09-23 Small target pedestrian detection method based on fusion features

Country Status (1)

Country Link
CN (1) CN112163499A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN107563299A (en) * 2017-08-07 2018-01-09 盐城禅图智能科技有限公司 A kind of pedestrian detection method using ReCNN integrating context informations
CN108830205A (en) * 2018-06-04 2018-11-16 江南大学 Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network
CN109214241A (en) * 2017-07-03 2019-01-15 中国科学院文献情报中心 Pedestrian detection method based on deep learning algorithm
CN109271888A (en) * 2018-08-29 2019-01-25 汉王科技股份有限公司 Personal identification method, device, electronic equipment based on gait
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109784291A (en) * 2019-01-23 2019-05-21 电子科技大学 Pedestrian detection method based on multiple dimensioned convolution feature
CN109886100A (en) * 2019-01-14 2019-06-14 苏州工业园区职业技术学院 A kind of pedestrian detecting system based on Area generation network
CN110008853A (en) * 2019-03-15 2019-07-12 华南理工大学 Pedestrian detection network and model training method, detection method, medium, equipment
CN111539402A (en) * 2020-07-13 2020-08-14 平安国际智慧城市科技股份有限公司 Deep learning-based lane line detection method, device, terminal and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214241A (en) * 2017-07-03 2019-01-15 中国科学院文献情报中心 Pedestrian detection method based on deep learning algorithm
CN107273870A (en) * 2017-07-07 2017-10-20 郑州航空工业管理学院 The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN107563299A (en) * 2017-08-07 2018-01-09 盐城禅图智能科技有限公司 A kind of pedestrian detection method using ReCNN integrating context informations
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN108830205A (en) * 2018-06-04 2018-11-16 江南大学 Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network
CN109271888A (en) * 2018-08-29 2019-01-25 汉王科技股份有限公司 Personal identification method, device, electronic equipment based on gait
CN109886100A (en) * 2019-01-14 2019-06-14 苏州工业园区职业技术学院 A kind of pedestrian detecting system based on Area generation network
CN109784291A (en) * 2019-01-23 2019-05-21 电子科技大学 Pedestrian detection method based on multiple dimensioned convolution feature
CN110008853A (en) * 2019-03-15 2019-07-12 华南理工大学 Pedestrian detection network and model training method, detection method, medium, equipment
CN111539402A (en) * 2020-07-13 2020-08-14 平安国际智慧城市科技股份有限公司 Deep learning-based lane line detection method, device, terminal and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A. NEUBECK; L. VAN GOOL: "Efficient non-maximum suppression", 18th International Conference on Pattern Recognition (ICPR'06) *
SONGTAO LIU; DI HUANG; YUNHONG WANG: "Adaptive NMS: Refining Pedestrian Detection in a Crowd", arXiv.org *
FU XINCHUAN: "Research on Key Technologies of Pedestrian Detection in Images", China Doctoral Dissertations Full-text Database, Information Science and Technology *
ZOU TENGTAO: "Research and Implementation of Several Issues in Pedestrian Detection for Road Traffic", China Master's Theses Full-text Database, Engineering Science and Technology II *

Similar Documents

Publication Publication Date Title
CN109840556B (en) Image classification and identification method based on twin network
CN109583340B (en) Video target detection method based on deep learning
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN105809651B (en) Image significance detection method based on the comparison of edge non-similarity
CN110490913B (en) Image matching method based on feature description operator of corner and single line segment grouping
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
WO2023082784A1 (en) Person re-identification method and apparatus based on local feature attention
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
CN109886159B (en) Face detection method under non-limited condition
CN109299303B (en) Hand-drawn sketch retrieval method based on deformable convolution and depth network
CN111768415A (en) Image instance segmentation method without quantization pooling
CN105069447A (en) Facial expression identification method
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN110263731B (en) Single step human face detection system
Li et al. A complex junction recognition method based on GoogLeNet model
Zang et al. Traffic lane detection using fully convolutional neural network
CN109325407B (en) Optical remote sensing video target detection method based on F-SSD network filtering
CN111126275A (en) Pedestrian re-identification method and device based on multi-granularity feature fusion
CN110807463B (en) Image segmentation method and device, computer equipment and storage medium
Mo et al. PVDet: Towards pedestrian and vehicle detection on gigapixel-level images
CN111881803B (en) Face recognition method based on improved YOLOv3
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN111797704B (en) Action recognition method based on related object perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210101