CN112926500B - Pedestrian detection method combining head and overall information - Google Patents


Info

Publication number
CN112926500B
Authority
CN
China
Prior art keywords
head, bounding box, pedestrian, detection, whole
Legal status
Active
Application number
CN202110302808.5A
Other languages
Chinese (zh)
Other versions
CN112926500A (en)
Inventor
陈勇
谢文阳
刘焕淋
黄美永
黄俊杰
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110302808.5A
Publication of CN112926500A
Application granted
Publication of CN112926500B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a pedestrian detection method combining head and overall information, and belongs to the field of target detection. First, feature information of the target is extracted with a convolutional neural network, yielding several feature maps with different resolutions and activation degrees. Second, these feature maps are used to construct a feature pyramid, and the outputs of different substructures are fused to provide targeted feature information for head detection and whole-body detection of pedestrians respectively. Then, a head-detection branch is added alongside pedestrian detection, and the pedestrian's head and the pedestrian as a whole are predicted from the corresponding feature maps. Finally, the outputs of the two branches are fused with an improved non-maximum suppression algorithm to obtain the final result. The invention makes full use of pedestrian feature information and effectively improves detection accuracy for occluded pedestrians.

Description

Pedestrian detection method combining head and overall information
Technical Field
The invention belongs to the field of target detection, and relates to a pedestrian detection method combining head and overall information.
Background
Pedestrian detection methods can be divided into two categories according to differences in feature extraction methods: the first type is a detection method based on manual characteristics, and the method adopts a pre-designed characteristic extraction operator to obtain characteristic information; the second type is a detection method based on deep learning, and the method adopts a self-learning mode to obtain characteristic information.
Detection methods based on manual feature extraction operators proceed as follows: first, key-point information of targets in the image is obtained with a filter; then the gradient between each key point and its neighbouring pixels is computed and a statistical histogram is generated; finally, a classifier such as an SVM or AdaBoost performs feature classification to obtain the pedestrian information in the image. Such methods capture local features that describe the appearance and shape of the target, are computationally simple, and achieve reasonable accuracy, but they perform poorly on occluded pedestrians, falling far short of practical requirements. Natural scenes contain many interfering factors such as occlusion and illumination change; pedestrians are easily occluded by objects in the scene or by one another, the key points captured by traditional methods mainly describe pedestrian appearance and shape, and pedestrian posture also varies widely, all of which greatly reduce the detection accuracy of these methods.
Detection methods based on deep learning: with the rapid development of computer vision and big-data technology, deep-learning methods represented by convolutional neural networks have shown excellent performance on this task and have gradually replaced traditional methods as the mainstream. Among them, the YOLO and R-CNN model families are widely adopted, and many improved versions continue to emerge. Thanks to the excellent feature-description capability of convolutional neural networks, such methods reach a relatively high level of accuracy at ever-increasing speed, but the drop in accuracy caused by occlusion remains.
Some methods adopt an attention mechanism so that the model focuses on the unoccluded, effective parts of the target and is guided to correct the target position, which alleviates the problem to some extent; however, this increases the computational cost and reduces the real-time performance of the detector. Other methods adopt a feature-pyramid structure and detect on feature maps of different sizes, improving detection accuracy across pedestrian scales, but they fail to fully exploit the fact that shallow network layers respond more strongly to small-scale targets, so the accuracy gain remains limited. Pedestrian detection data sets generally annotate pedestrians with rectangular bounding boxes, which introduce a large number of background pixels and thereby hinder the model's learning of pedestrian features during training. To address this, some methods annotate a pedestrian with a straight line from the head to the bottom, improving detection under occlusion; but this only mitigates pedestrian-to-pedestrian occlusion to a certain extent, and detection of pedestrians occluded by objects in the scene remains unsatisfactory.
Disclosure of Invention
In view of the above, the present invention is directed to a pedestrian detection method combining head and overall information. To improve detection accuracy for occluded pedestrians, the method detects the pedestrian's head and whole body simultaneously and combines the two to enhance pedestrian feature information. In addition, the method improves detection accuracy for small-scale targets by introducing shallow-layer feature maps of the network. The method constructs a feature pyramid with a multi-layer structure and fuses the feature maps output by its different substructures, so as to provide targeted feature information for head detection and whole-body detection respectively.
In order to achieve the purpose, the invention provides the following technical scheme:
a pedestrian detection method incorporating head and ensemble information, the method comprising the steps of:
s1: converting the pedestrian in the data set and the head rectangular boundary frame label thereof into a central point label, and simultaneously carrying out corresponding preprocessing on the image;
s2: constructing a feature extraction module based on a deep convolutional neural network to obtain pedestrian head and overall feature map information for detection;
s3: constructing a detection module comprising two branches of head detection and integral detection, wherein the detection module predicts the position, height and offset information of a central point from a characteristic diagram to generate a head boundary frame and an integral boundary frame;
s4: and for the obtained head bounding box and the whole bounding box, combining the two by using an improved non-maximum suppression algorithm, and filtering the bounding box with lower confidence coefficient to obtain a final detection result.
Optionally, the S1 includes the following steps:
s11: scaling the training image in random proportion, filling the training image by using gray pixel points if the size of the training image is smaller than a preset size, cutting edges if the size of the training image is larger than the preset size, and correcting the position of a boundary frame label;
s12: randomly and horizontally overturning the training image, and correcting the coordinate of the bounding box;
s13: converting the image from RGB color space to HSV or HSL color space, and randomly adjusting the brightness of the image;
S14: calculating the head center position (x_h, y_h) and the whole-body center position (x_b, y_b) according to the label information, and generating the head center-point mask M_head and the whole-body center-point mask M_body respectively using a two-dimensional Gaussian function G(·), with standard deviations (σ_w, σ_h) derived from the bounding-box width and height:

M_head(x, y) = exp(-((x - x_h)² / (2σ_w²) + (y - y_h)² / (2σ_h²)))
M_body(x, y) = exp(-((x - x_b)² / (2σ_w²) + (y - y_b)² / (2σ_h²)))
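As an illustrative sketch (not part of the claimed method), the center-point mask generation of S14 can be written in a few lines of NumPy, assuming an elliptical Gaussian whose per-axis standard deviations come from the box width and height, and taking overlapping peaks by element-wise maximum, a common convention for center-point heatmap targets; the function names and example sizes are hypothetical:

```python
import numpy as np

def gaussian_center_mask(h, w, cx, cy, sigma_w, sigma_h):
    # Elliptical 2-D Gaussian peaking at 1 on the center point (cx, cy),
    # with per-axis standard deviations sigma_w (x) and sigma_h (y).
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 / (2 * sigma_w ** 2)
                    + (ys - cy) ** 2 / (2 * sigma_h ** 2)))

def build_center_mask(h, w, centers, sigmas):
    # One Gaussian per annotated center; overlapping peaks keep the maximum.
    mask = np.zeros((h, w))
    for (cx, cy), (sw, sh) in zip(centers, sigmas):
        np.maximum(mask, gaussian_center_mask(h, w, cx, cy, sw, sh), out=mask)
    return mask

# Two head centers on a 16x16 map (illustrative values only).
M_head = build_center_mask(16, 16, [(4, 4), (12, 10)], [(1.5, 2.0), (1.5, 2.0)])
```

The same routine, applied to the whole-body centers (x_b, y_b), would produce M_body.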
Optionally, in S2, constructing the feature extraction module includes the following steps:
S21: using a backbone network to perform feature extraction on the image, obtaining four feature maps {p_1, p_2, p_3, p_4} with different activation degrees and sizes;
S22: applying convolution to the feature maps {p_1, p_2, p_3, p_4} to obtain the feature maps {P_1, P_2, P_3, P_4}, and applying the same convolution to P_4 to obtain the feature map P_5, thereby forming a feature pyramid with a five-layer structure;
S23: upsampling the feature maps P_2 and P_3 so that their size matches P_1, and fusing the three feature maps to obtain the feature map F_head for head detection; performing the same operation on P_4 and P_5 so that they match P_3, and fusing P_3, P_4 and P_5 to obtain the feature map F_body for whole-body pedestrian detection.
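For illustration only, the upsample-and-fuse step of S23 can be sketched with nearest-neighbour upsampling and element-wise addition. The patent does not fix the upsampling method or the fusion operator, so both choices here (factor-of-two levels, addition) are assumptions, and all names are hypothetical:

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling along the two spatial axes.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_levels(maps):
    # `maps` is ordered from the highest-resolution pyramid level down;
    # each deeper level is assumed to halve the spatial size. Every map
    # is brought to the first map's resolution and summed element-wise.
    fused = maps[0].copy()
    for i, m in enumerate(maps[1:], start=1):
        fused += upsample_nearest(m, 2 ** i)
    return fused

rng = np.random.default_rng(0)
P1 = rng.standard_normal((32, 32, 8))   # shallow, high-resolution level
P2 = rng.standard_normal((16, 16, 8))
P3 = rng.standard_normal((8, 8, 8))
F_head = fuse_levels([P1, P2, P3])      # fused map for head detection
```

F_body would be obtained the same way from P_3, P_4 and P_5 at P_3's resolution.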
Optionally, the S3 includes the following steps:
S31: predicting the head center point C_head, height H_head and position offset O_head from the head feature map F_head, and generating the head bounding box B_head;
S32: predicting the whole-body center point C_body, height H_body and position offset O_body from the whole-body feature map F_body, and generating the whole bounding box B_body.
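A sketch of how one center-point prediction might be decoded into a box (S31/S32). The patent predicts the center, height and offset; here the box width is derived from the height by a fixed aspect ratio, and a feature-map stride of 4 is assumed — the aspect ratio 0.41, the stride, and all names are illustrative choices, not taken from the patent:

```python
def decode_box(cx, cy, height, offset, stride=4, aspect=0.41):
    # (cx, cy): integer center location on the feature map.
    # offset:   sub-stride correction (ox, oy) refining the center.
    # height:   predicted box height in input-image pixels.
    ox, oy = offset
    px = (cx + ox) * stride           # center x in input-image pixels
    py = (cy + oy) * stride           # center y in input-image pixels
    w = aspect * height               # assumed fixed aspect ratio
    return (px - w / 2, py - height / 2, px + w / 2, py + height / 2)

B_body = decode_box(10, 20, height=100.0, offset=(0.5, 0.25))
```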
Optionally, in S4, the screening of the detection result specifically includes the following steps:
S41: for each whole pedestrian bounding box B_body = (x1_b, y1_b, x2_b, y2_b) output by the detection module, where (x1_b, y1_b) and (x2_b, y2_b) are respectively the upper-left and lower-right corners of the bounding box, calculating the pedestrian head region H_region based on the bounding-box height h_b and width w_b;
S42: for each whole bounding box B_body, first determining whether a head bounding box B_head exists within the head region H_region; if so, selecting the head bounding box with the highest confidence s in the region for pairing, obtaining {B_body, B_head, s_body, s_head};
S43: if the confidence of the whole bounding box is higher, the whole bounding box is directly reserved, and if the confidence of the pedestrian bounding box is lower, but a head bounding box matched with the pedestrian bounding box exists, and the confidence of the head bounding box is higher, the whole pedestrian bounding box is still reserved.
The invention has the beneficial effects that: the invention exploits the fact that a pedestrian's head is rarely occluded, and combines head detection with pedestrian detection to improve detection accuracy. Introducing shallow-layer feature information of the network raises the activation degree of heads and small-scale targets in the feature maps. In addition, labeling the head and the whole body with center points reduces the background pixels introduced, improving the network's ability to discriminate pedestrian features.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a network model structure of the present invention;
FIG. 2 is a feature extraction module structure of a model;
FIG. 3 is a detection module structure of the model;
fig. 4 is a schematic view of the pedestrian head region defined by the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it. For a better explanation of the embodiments, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product, and certain well-known structures and their descriptions may be omitted, as will be understood by those skilled in the art.
The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the present invention, terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientations shown in the drawings and are used only for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and are therefore not to be construed as limiting the invention. The specific meaning of such terms can be understood by those skilled in the art according to the specific situation.
A pedestrian detection method combining head and whole information is realized based on a convolutional neural network and uses a central point to mark the head and the whole. Firstly, a characteristic pyramid with a multilayer structure is constructed, and specific characteristic information is provided for head detection and overall detection by fusing characteristic graphs output by different substructures of the characteristic pyramid. Then, the prediction module predicts a target center point, a height and an offset from the feature map and generates a pedestrian head boundary frame and an overall boundary frame, respectively. Finally, an information fusion method is provided for better combining the pedestrian head information and the overall information.
A backbone network extracts head and whole-body feature information, producing four feature maps of different resolutions; convolution is applied to the four maps, and the same convolution applied to the lowest-resolution map yields a fifth map, forming a feature pyramid with a five-layer structure. Feature maps from different substructures of the pyramid are fused to obtain the feature maps used for head detection and whole-body detection respectively. The detection module comprises a head-detection branch and a whole-body-detection branch; each predicts center-point, height and offset information from its own feature map and generates the corresponding head or whole bounding box. Finally, the head information and the whole-body information are combined to screen and output the detection results.
The invention provides a pedestrian detection method combining head and overall information, which is mainly divided into four parts. The first part preprocesses the training images, converting the original rectangular bounding-box labels into center-point mask labels and augmenting the training images by scaling, rotation, horizontal flipping and similar transformations. The second part performs feature extraction with a convolutional neural network, constructs the feature pyramid, and fuses features. The third part predicts the pedestrian's head and whole body from the feature maps and generates the corresponding bounding boxes. The fourth part combines the head bounding boxes with the whole bounding boxes and removes low-confidence boxes to obtain the final detection results.
The network model structure of the pedestrian detection method combining the head and the overall information is shown in fig. 1, and specifically comprises the following steps:
1. training image pre-processing
(1) From the rectangular bounding-box labels in the data set, calculate the head center (x_h, y_h) and the whole-body center (x_b, y_b); using a Gaussian function G(·) with standard deviations {σ_w, σ_h} derived from the bounding-box width and height, generate the head center mask M_head and the whole-body center mask M_body respectively:

M_head(x, y) = exp(-((x - x_h)² / (2σ_w²) + (y - y_h)² / (2σ_h²)))
M_body(x, y) = exp(-((x - x_b)² / (2σ_w²) + (y - y_b)² / (2σ_h²)))
(2) Augment the training images by scaling, rotation, horizontal flipping and similar transformations to improve the effectiveness of the features learned by the model.
2. Pedestrian feature extraction
(1) The structure of the feature extraction module of the invention is shown in fig. 2. A backbone network extracts features from the processed image and outputs four feature maps {p_1, p_2, p_3, p_4} with different resolutions; convolution is applied to the four feature maps to increase the correlation among channels, and the set is expanded to five feature maps, forming the feature-pyramid structure {P_1, P_2, P_3, P_4, P_5}.
(2) Fuse the feature maps {P_1, P_2, P_3} to obtain the feature map F_head for head detection, and fuse {P_3, P_4, P_5} to obtain the feature map F_body for whole-body pedestrian detection.
3. Head and integrity detection
The structure of the detection module of the invention is shown in fig. 3; it comprises a head-detection branch and a whole-body-detection branch. The head center point C_head, height H_head and position offset O_head are predicted from the feature map F_head, and the whole-body center point C_body, height H_body and position offset O_body are predicted from the feature map F_body. From this information, the pedestrian head bounding box B_head and the whole bounding box B_body are generated respectively.
4. Screening of test results
(1) FIG. 4 is a schematic of the pedestrian head region defined in the present invention. For each whole pedestrian bounding box B_body = (x1_b, y1_b, x2_b, y2_b) output by the detection module, where (x1_b, y1_b) and (x2_b, y2_b) are respectively the upper-left and lower-right corners of the bounding box, the pedestrian head region H_region is calculated based on the bounding-box height h_b and width w_b.
(2) The whole pedestrian bounding boxes are screened with a non-maximum suppression algorithm. For each whole bounding box B_body, first determine whether a head bounding box B_head exists within the head region H_region; if so, select the head bounding box with the highest confidence in the region for pairing, obtaining {B_body, B_head, s_body, s_head}, where s denotes confidence. If the confidence of the whole bounding box is high, it is retained directly; if the confidence of the whole pedestrian bounding box is low but a matched head bounding box exists and its confidence is high, the whole pedestrian bounding box is still retained.
The invention relates to a pedestrian detection method combining head and overall information, which mainly comprises two stages of training and testing.
(1) Training phase
The training phase mainly comprises feature extraction and updating of the model weight parameters. The model is trained with the preprocessed images; the values predicted by the model are compared with the ground-truth label values to obtain the individual loss terms, and the model's weight parameters are updated by gradient back-propagation according to the total loss. When the number of training iterations reaches a preset value, training terminates and the weight parameters are saved.
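The predict, compute-loss, back-propagate, update cycle described above can be illustrated with a toy NumPy example: a single linear layer fit with mean-squared error and plain gradient descent. This is only a stand-in for the real detector, its losses and its optimiser, which the patent does not specify at this level of detail:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 16))     # a batch of "feature vectors"
true_w = rng.standard_normal(16)
y = X @ true_w                        # ground-truth target values

w = np.zeros(16)                      # model weight parameters
lr = 0.1
for step in range(500):
    pred = X @ w                      # forward pass: model prediction
    loss = np.mean((pred - y) ** 2)   # total loss value
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of loss w.r.t. w
    w -= lr * grad                    # gradient-descent weight update

final_loss = np.mean((X @ w - y) ** 2)
```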
(2) Testing phase
The testing stage loads the trained model weight parameters and only scales the input image so that its size meets the model's input requirements. At this point the model no longer performs gradient back-propagation; it directly outputs the detection results, detecting the pedestrians in the image and producing the final bounding-box information.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, all of which should be covered by the claims of the present invention.

Claims (1)

1. A pedestrian detection method combining head and overall information, characterized by: the method comprises the following steps:
s1: converting the pedestrian in the data set and the head rectangular boundary frame label thereof into a central point label, and simultaneously carrying out corresponding preprocessing on the image;
s2: constructing a feature extraction module based on a deep convolutional neural network to obtain pedestrian head and overall feature map information for detection;
s3: constructing a detection module comprising two branches of head detection and integral detection, wherein the detection module predicts the position, height and offset information of a central point from a characteristic diagram to generate a head boundary frame and an integral boundary frame;
s4: for the obtained head bounding box and the whole bounding box, combining the two by using an improved non-maximum suppression algorithm, and simultaneously filtering the bounding box with lower confidence coefficient to obtain a final detection result;
the S1 includes the steps of:
s11: scaling the training image in random proportion, filling the training image by using gray pixel points if the size of the training image is smaller than a preset size, cutting edges if the size of the training image is larger than the preset size, and correcting the position of a boundary frame label;
s12: randomly and horizontally overturning the training image, and correcting the coordinate of the bounding box;
s13: converting the image from an RGB color space to an HSV or HSL color space, and simultaneously randomly adjusting the brightness of the image;
S14: calculating the head center position (x_h, y_h) and the whole-body center position (x_b, y_b) according to the label information, and generating the head center mask M_head and the whole-body center-point mask M_body respectively using a two-dimensional Gaussian function G(·):

M_head(x, y) = exp(-((x - x_h)² / (2σ_w²) + (y - y_h)² / (2σ_h²)))
M_body(x, y) = exp(-((x - x_b)² / (2σ_w²) + (y - y_b)² / (2σ_h²)))
In S2, constructing the feature extraction module includes the following steps:
S21: using a backbone network to perform feature extraction on the image, obtaining four feature maps {p_1, p_2, p_3, p_4} with different activation degrees and sizes;
S22: applying convolution to the feature maps {p_1, p_2, p_3, p_4} to obtain the feature maps {P_1, P_2, P_3, P_4}, and applying the same convolution to P_4 to obtain the feature map P_5, thereby forming a feature pyramid with a five-layer structure;
S23: upsampling the feature maps P_2 and P_3 so that their size matches P_1, and fusing the three feature maps to obtain the feature map F_head for head detection; performing the same operation on P_4 and P_5 so that they match P_3, and fusing P_3, P_4 and P_5 to obtain the feature map F_body for whole-body pedestrian detection;
The S3 specifically includes the following steps:
S31: predicting the head center point C_head, height H_head and position offset O_head from the head feature map F_head, and generating the head bounding box B_head;
S32: predicting the whole-body center point C_body, height H_body and position offset O_body from the whole-body feature map F_body, and generating the whole bounding box B_body;
The S4 specifically includes the following steps:
S41: for each whole pedestrian bounding box B_body = (x1_b, y1_b, x2_b, y2_b) output by the detection module, where (x1_b, y1_b) and (x2_b, y2_b) are respectively the upper-left and lower-right corners of the bounding box, calculating the pedestrian head region H_region based on the bounding-box height h_b and width w_b;
S42: for each whole bounding box B_body, first determining whether a head bounding box B_head exists within the head region H_region; if so, selecting the head bounding box with the highest confidence s in the region for pairing, obtaining {B_body, B_head, s_body, s_head};
S43: if the confidence of the whole bounding box is high, retaining it directly; if the confidence of the whole pedestrian bounding box is low but a matched head bounding box exists and its confidence is high, still retaining the whole pedestrian bounding box.
CN202110302808.5A 2021-03-22 2021-03-22 Pedestrian detection method combining head and overall information Active CN112926500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302808.5A CN112926500B (en) 2021-03-22 2021-03-22 Pedestrian detection method combining head and overall information


Publications (2)

Publication Number Publication Date
CN112926500A CN112926500A (en) 2021-06-08
CN112926500B true CN112926500B (en) 2022-09-20

Family

ID=76175361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110302808.5A Active CN112926500B (en) 2021-03-22 2021-03-22 Pedestrian detection method combining head and overall information

Country Status (1)

Country Link
CN (1) CN112926500B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429207B (en) * 2022-01-14 2024-09-06 支付宝(杭州)信息技术有限公司 Convolution processing method, device, equipment and medium for feature map

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN109858389A (en) * 2019-01-10 2019-06-07 浙江新再灵科技股份有限公司 Vertical ladder demographic method and system based on deep learning
CN112287788A (en) * 2020-10-20 2021-01-29 杭州电子科技大学 Pedestrian detection method based on improved YOLOv3 and improved NMS

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3655618B2 (en) * 2003-03-28 2005-06-02 株式会社エヌ・ティ・ティ・データ Pedestrian age determination device, walking state / pedestrian age determination method and program
CN108256404B (en) * 2016-12-29 2021-12-10 北京旷视科技有限公司 Pedestrian detection method and device
CN110443116B (en) * 2019-06-19 2023-06-20 平安科技(深圳)有限公司 Video pedestrian detection method, device, server and storage medium
CN112115862B (en) * 2020-09-18 2023-08-29 广东机场白云信息科技有限公司 Congestion scene pedestrian detection method combined with density estimation
CN112488057A (en) * 2020-12-17 2021-03-12 北京航空航天大学 Single-camera multi-target tracking method utilizing human head point positioning and joint point information
CN112257692B (en) * 2020-12-22 2021-03-12 湖北亿咖通科技有限公司 Pedestrian target detection method, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant