CN112907616B - Pedestrian detection method based on thermal imaging background filtering - Google Patents

Pedestrian detection method based on thermal imaging background filtering

Info

Publication number
CN112907616B
Authority
CN
China
Prior art keywords
thermal imaging
image
background
pedestrian detection
frame
Prior art date
Legal status
Active
Application number
CN202110460457.0A
Other languages
Chinese (zh)
Other versions
CN112907616A (en)
Inventor
张森林
卢晨
刘妹琴
郑荣濠
董山玲
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202110460457.0A
Publication of CN112907616A
Application granted
Publication of CN112907616B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/194 — Image analysis; Segmentation; Edge detection involving foreground-background segmentation
    • G06T 5/40 — Image enhancement or restoration using histogram techniques
    • G06T 7/136 — Image analysis; Segmentation; Edge detection involving thresholding
    • G06T 2207/10048 — Image acquisition modality: Infrared image
    • G06T 2207/20081 — Special algorithmic details: Training; Learning
    • G06T 2207/30196 — Subject of image: Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a pedestrian detection method based on thermal imaging background filtering, which comprises the following steps: first, histogram equalization is applied to the original thermal image acquired by a thermal infrared camera; a suitable threshold is then set for threshold segmentation to obtain preliminary candidate regions for pedestrian detection; meanwhile, based on a Gaussian mixture model, the foreground and background are separated using the relation between consecutive frames to obtain a background subtraction image; the composite image obtained by combining the two is then fed into the subsequent improved Faster R-CNN framework to complete pedestrian detection. The invention solves the temperature-drift problem of thermal camera imaging through normalization, filters the background using threshold segmentation and background subtraction, fully exploits the characteristics of thermal images, and improves the accuracy of pedestrian detection in low-light and no-light environments.

Description

Pedestrian detection method based on thermal imaging background filtering
Technical Field
The invention relates to a pedestrian detection method based on thermal imaging background filtering, and belongs to the field of target detection in image processing.
Background
Vision is the most direct and dominant way for living beings to obtain environmental information, and the amount of information it provides is very rich, so the processing of visual information plays a crucial role in environmental perception. Vision-based target detection is currently a research hotspot in the field of computer vision.
In recent years, with the development of artificial intelligence, deep learning, and related fields, visual target detection has advanced rapidly. Unlike traditional target detection methods based on hand-crafted feature extraction, deep-learning-based methods extract deep image information through deep neural networks and are trained on massive data, which greatly improves both the accuracy and the speed of target detection.
Pedestrian detection is an important component of target detection. It uses computer technology to judge whether pedestrians are present in an image or video and to locate them in the image. Pedestrian detection has important applications in fields such as autonomous driving, unmanned aerial vehicles, and surveillance. Currently mainstream pedestrian detection methods include global detection, part-based detection, motion-based detection, and multi-camera stereo vision detection.
Target detection based on visible-light images has received extensive attention and research because of its low equipment cost and wide range of applications. However, visible-light images are very susceptible to environmental influences: appearance changes, occlusion, and changes in illumination all have a great impact on visible-light target detection. The advent of infrared thermal cameras offers a way to address these problems. Thermal images have a distinct advantage over visible-light images: an object is represented by its temperature and radiated heat, which means thermal images can be used both day and night. In addition, thermal images eliminate the effect of color and illumination changes on object appearance. With the remarkable development of thermal sensors in recent years, much research has been devoted to pedestrian detection and tracking in thermal images.
Disclosure of Invention
The invention aims to overcome the shortcomings of visible-light pedestrian detection methods under low-light and no-light conditions. The invention provides a pedestrian detection method based on thermal imaging background filtering. The method uses a thermal imaging sensor to obtain a thermal image of the environment, and improves pedestrian detection accuracy through a background-filtering preprocessing method and a pedestrian detection model based on improved Faster R-CNN.
The invention adopts the following specific technical scheme:
a pedestrian detection method based on thermal imaging background filtering comprises the following steps:
s1: firstly, processing a thermal imaging image acquired by a thermal imaging camera by using a histogram equalization method, so as to solve the problems of deviation and drift of the thermal imaging image and obtain a histogram equalization enhanced image;
s2: based on a Gaussian mixture model, separating foreground and background from the histogram equalization enhanced image obtained after the processing of S1 according to the relation between the previous frame and the next frame to obtain a binary background subtraction image;
s3: performing double-threshold segmentation on a thermal imaging image acquired by a thermal imaging camera by using upper and lower thresholds of imaging of pedestrians in the thermal imaging image to obtain a binary threshold segmentation image after segmentation of the pedestrians and a background;
s4: superposing the binary background subtraction image obtained in the step S2 and the binary threshold segmentation image obtained in the step S3 to obtain a binary background filtering image distinguishing the foreground from the background, and performing background removal on the histogram equalization enhanced image obtained in the step S1 by using the binary background filtering image to obtain a background-filtered image containing only the foreground;
s5: inputting the background-filtered image obtained in S4 into a pre-constructed and trained pedestrian detection network based on improved Faster R-CNN for human body candidate region extraction and pedestrian detection; in the pedestrian detection process, feature extraction is first performed on the background-filtered image by a convolutional neural network to obtain a feature map, then target suggestion boxes of three proportions, corresponding respectively to a head, a half body and a whole human body, are extracted from the feature map by an improved RPN network, then the target suggestion boxes are projected onto the feature map to obtain the corresponding feature matrices, each feature matrix is passed in turn through an ROI pooling layer and a fully connected layer to obtain the category probability and bounding-box regression parameters, and finally the intersection relations between the three proportions of target suggestion boxes are combined, taking the head as the reference, to obtain the final thermal imaging pedestrian detection result.
Preferably, the specific implementation method in S1 is:
converting the thermal imaging image into a thermal imaging gray image, then counting to obtain a cumulative normalized histogram, and then mapping the thermal imaging gray image pixel by pixel according to a mapping relation to form a histogram equalization enhanced image, wherein the mapping relation is as follows:
p′_i = min{x} + s_i · (max{x} − min{x})
in the formula: p′_i represents the equalized gray value in the histogram equalization enhanced image to which the pixel with gray value i in the thermal imaging grayscale image is mapped; s_i is the cumulative histogram probability of the pixel with gray level i in the thermal imaging grayscale image, obtained from the cumulative normalized histogram; min{x} represents the minimum gray value in the thermal imaging grayscale image, and max{x} represents the maximum gray value in the thermal imaging grayscale image.
Preferably, the specific implementation method of S2 is as follows:
s21: training a Gaussian mixture model by using a plurality of the histogram equalization enhanced images; during training, a basic Gaussian mixture matrix is first initialized with the first frame of enhanced image, then the enhanced images are input frame by frame, and each newly added pixel is compared with the means of the existing Gaussian mixture model: if the pixel lies within three times the variance of a mean, the matrix coefficients are updated; otherwise a new Gaussian distribution is created;
s22: and matching the histogram equalization enhanced image to be segmented pixel by adopting a Gaussian mixture model obtained in the step S21, and if one pixel value can be matched with one Gaussian mixture matrix, considering the pixel as a background, otherwise, considering the pixel as a foreground.
Preferably, the specific implementation method of S3 is as follows:
s31: calibrating a thermal imaging camera for acquiring a thermal imaging image, and determining upper and lower thresholds of pedestrian imaging in the thermal imaging camera;
s32: pixels between the upper threshold value and the lower threshold value in the thermal imaging map are regarded as a pedestrian area, and the rest pixels are regarded as a background area.
Preferably, the specific implementation method of S4 is as follows:
s41: adding the binary background subtraction image obtained in the step S2 and the binary threshold segmentation image obtained in the step S3 to obtain a binary background filtered image with a foreground pixel value of 1 and a background pixel value of 0;
s42: and multiplying the binary background filtering image and the histogram equalization enhanced image obtained in the step S1 by pixel points one by one to obtain a final background filtering image.
Preferably, in S5, the pedestrian detection network based on improved Faster R-CNN includes a convolutional neural network, an improved RPN network, an ROI pooling layer, and a fully connected layer, wherein the thermal imaging pedestrian detection result is obtained as follows:
s51: inputting the background filtered image into a convolutional neural network to obtain a corresponding characteristic map;
s52: feeding the feature map obtained in the step S51 into the improved RPN network to extract target suggestion boxes where targets may possibly exist; for each position in the image, initializing 9 possible candidate boxes from the combination of three sizes of areas and three aspect ratios; the minimum ratio corresponds to the target suggestion box of a human head, the middle ratio corresponds to the target suggestion box of a half body, the half body being an upper half body or a lower half body, and the maximum ratio corresponds to the target suggestion box of a whole human body;
s53: projecting the target suggestion boxes obtained in the step S52 onto the feature map obtained in the step S51 to obtain the corresponding feature matrices, scaling each feature matrix to 7 × 7 through an ROI pooling layer, and then flattening and sending the feature matrices to a fully connected layer to obtain the final class probabilities and bounding-box regression parameters;
s54: taking the human head target box with the highest confidence as the reference: for each human head target box, if a half-body target box or a whole-body target box intersects it, the intersecting boxes are merged together into one human body; if no other target box intersects it, the head is regarded as the head of a person whose body is occluded, and the head target box is taken as the final target box; and for a whole-body target box, if it does not intersect any head target box, it is judged to be a false detection and discarded.
Further, the areas of the three sizes are 128×128, 256×256, and 384×384, respectively.
Further, the ratios of the three sizes are 1:1, 1:2, and 1:3, respectively.
Further, whether two target boxes intersect is judged by the intersection over union between them.
Preferably, the pedestrian detection network based on improved Faster R-CNN is trained in advance using labeled thermal imaging datasets.
The invention solves the temperature-drift problem of thermal camera imaging through histogram equalization, filters the background using threshold segmentation and background subtraction, fully exploits the characteristics of thermal images, and improves pedestrian detection accuracy in low-light and no-light environments.
Drawings
FIG. 1 is an overall flow chart of a thermal imaging background filtering based pedestrian detection algorithm as disclosed in the present invention.
FIG. 2 is a diagram of the neural network architecture of the improved Faster R-CNN.
Fig. 3 is a thermal imaging diagram used as an example.
Fig. 4 is a histogram equalization enhanced image obtained after the histogram equalization method processing.
Fig. 5 is a binary background subtraction image obtained based on a gaussian mixture model.
Fig. 6 is a binarized dual-threshold-segmented image obtained by dual-threshold segmentation.
Fig. 7 is a background-filtered image obtained based on a binary background subtraction image and a binary threshold segmentation image.
Fig. 8 is the head target suggestion box, half-body target suggestion box, and whole-body target suggestion box obtained by the improved RPN network.
Fig. 9 is the final target detection result.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
In a preferred embodiment of the present invention, a pedestrian detection method based on thermal imaging background filtering is implemented with the open-source deep learning framework PyTorch. As shown in fig. 1, the pedestrian detection method based on background filtering disclosed by the invention comprises two parts: background filtering of the thermal image, and construction, training and detection of the deep learning model. The specific implementation process is as follows:
First, background filtering of the thermal image
A thermal camera is first used to acquire a segment of thermal video, which consists of a series of successive thermal image frames. For the image at time t, as shown in fig. 1, histogram equalization is first applied as part of background filtering to mitigate the offset and drift problems of the thermal image; the specific implementation process is as follows:
1) Histogram equalization is performed on the thermal image on which target detection is to be carried out.
First, the original thermal image is converted into a thermal grayscale image, denoted {x}, and the proportion of pixels at each gray level in the image is counted, giving the occurrence probability of a pixel with gray level i in the image as follows:
p_x(i) = n_i / n
in the formula: n_i represents the number of pixels with gray value i, and n represents the number of all pixels in the image;
p_x(i) obtained as described above is the histogram of the grayscale image.
Then, the cumulative normalized histogram of the thermographic image is obtained by accumulation:
s_k = Σ_{i=0}^{k} p_x(i)
in the formula: s_k is the cumulative histogram probability of the pixels with gray level k;
finally, mapping the thermal imaging gray level image pixel by pixel according to a mapping relation to form a histogram equalization enhanced image, wherein the mapping relation is as follows:
p′_i = min{x} + s_i · (max{x} − min{x})
in the formula: p′_i represents the equalized gray value in the histogram equalization enhanced image to which the pixel with gray value i in the thermal imaging grayscale image is mapped, i.e. its pixel value in the histogram equalization enhanced image; s_i is the cumulative histogram probability of the pixel with gray level i in the thermal imaging grayscale image, obtained from the cumulative normalized histogram; min{x} represents the minimum gray value in the thermal imaging grayscale image, and max{x} represents the maximum gray value in the thermal imaging grayscale image.
Taking fig. 3 as an example, after the above operation is performed on the original image, each pixel may be mapped to a new pixel, and a histogram equalization enhanced image may be obtained, as shown in fig. 4.
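As a concrete illustration of the mapping above, the following NumPy sketch applies the same histogram equalization to an 8-bit thermal grayscale image; the function name and the 256-gray-level assumption are illustrative and not specified by the patent.

```python
import numpy as np

def equalize_thermal(gray: np.ndarray) -> np.ndarray:
    """Apply p'_i = min{x} + s_i * (max{x} - min{x}) to an 8-bit grayscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / gray.size                       # p_x(i): probability of gray level i
    s = np.cumsum(p)                           # s_k: cumulative normalized histogram
    lo, hi = int(gray.min()), int(gray.max())  # min{x}, max{x}
    mapping = np.round(lo + s * (hi - lo)).astype(np.uint8)
    return mapping[gray]                       # pixel-wise lookup of the new gray values
```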
2) Based on a Gaussian mixture model, the foreground and background are separated according to the relation between consecutive frames of the histogram equalization enhanced images obtained after step 1), yielding a background subtraction image. In this process, the previous t frames of histogram equalization enhanced images are used to train the Gaussian mixture model, and the specific value of t can be adjusted as needed. The training process of the Gaussian mixture model is as follows:
firstly, a first frame of enhanced image is used for initializing a basic Gaussian mixture matrix, and a Gaussian mixture model is established for each pixel point on an image at the moment t:
P(X_t) = Σ_{i=1}^{k} w_{i,t} · η(X_t; μ_{i,t}, σ²_{i,t})
in the formula: X_t is the pixel value of the pixel point at time t; k is the number of Gaussian distribution functions; w_{i,t}, μ_{i,t} and σ²_{i,t} respectively represent the weight coefficient, the mean and the variance corresponding to the i-th Gaussian model; and η(·) is the Gaussian density function.
Then, subsequent enhanced images are input frame by frame, and each newly added pixel is compared with the means of the existing Gaussian mixture model: if the pixel lies within three times the variance of a mean, the matrix coefficients are updated; otherwise a new Gaussian distribution is created. The model update formulas are as follows:
w_{i,t} = (1 + α) · w_{i,t−1}
μ_{i,t} = ρ · μ_{i,t−1} + (1 − ρ) · X_t
in the formula: α is the model weight update coefficient, and ρ is the model mean update coefficient.
and finally, carrying out background pixel matching on the subsequent histogram equalization enhanced image to be segmented by adopting the mixed Gaussian model obtained in the previous step. If a pixel value can match one of the gaussian mixture matrices, the pixel is considered as a background and is recorded as 0, otherwise, the pixel is considered as a foreground and is recorded as 1, and thus the final binary background subtraction image is shown in fig. 5.
3) Double-threshold segmentation is performed on the thermal image acquired by the thermal camera, using the upper and lower thresholds of pedestrian imaging in the thermal image, to obtain a threshold segmentation image separating pedestrians from the background.
Before segmentation, the thermal camera needs to be calibrated to determine the upper and lower bounds of pedestrian imaging for that camera; the upper and lower thresholds are denoted T_u and T_d, respectively. Based on these two thresholds, the image can be divided into two parts: the first part, with pixel values greater than or equal to T_d and less than or equal to T_u, is marked as 1; the second part, with pixel values smaller than T_d or greater than T_u, is marked as 0. The formula is as follows:
f(x, y) = 1 if T_d ≤ p(x, y) ≤ T_u, and f(x, y) = 0 otherwise
in the formula: p(x, y) represents the pixel value of the point (x, y) in the image, and f(x, y) represents the resulting binary threshold segmentation image, shown in fig. 6 for this embodiment.
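A minimal NumPy sketch of this dual-threshold rule; T_d and T_u must come from the camera calibration described above and are left as parameters here.

```python
import numpy as np

def dual_threshold(thermal: np.ndarray, t_d: int, t_u: int) -> np.ndarray:
    """f(x, y) = 1 where T_d <= p(x, y) <= T_u, otherwise 0."""
    return ((thermal >= t_d) & (thermal <= t_u)).astype(np.uint8)
```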
4) The binary background subtraction image and the binary threshold segmentation image are added to obtain a binary background filtering image distinguishing the foreground from the background, and this binary background filtering image is multiplied pixel by pixel with the histogram equalization enhanced image, thereby removing the background of the histogram equalization enhanced image and obtaining a background-filtered image containing only the foreground, as shown in fig. 7.
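A sketch of this fusion step under one reading of the text: the sum of the two binary masks is clipped back to {0, 1} (i.e. a pixel is foreground if either mask marks it) and then multiplied pixel-wise with the equalized image; the function and argument names are illustrative, and the patent does not spell out how overlapping foreground values are handled.

```python
import numpy as np

def background_filter(equalized: np.ndarray,
                      subtraction_mask: np.ndarray,
                      threshold_mask: np.ndarray) -> np.ndarray:
    """Fuse the two binary masks and keep only the foreground of the equalized image."""
    fused = np.clip(subtraction_mask + threshold_mask, 0, 1)  # foreground = 1, background = 0
    return equalized * fused                                  # pixel-wise multiplication
```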
Second, construction, training and detection of the pedestrian detection network based on improved Faster R-CNN
In this part, the background-filtered image is fed into the trained improved Faster R-CNN framework for inference, realizing human body candidate region extraction and pedestrian detection, and finally obtaining a pedestrian detection result based on the thermal image under low-light or no-light conditions. The structure of the pedestrian detection network based on improved Faster R-CNN is shown in FIG. 2:
First, the background-filtered image obtained in the previous step is input into a convolutional neural network to obtain a feature map of the image. In this embodiment, the convolutional neural network may be a ResNet-101 network.
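A minimal PyTorch sketch of this feature-extraction step, using a torchvision ResNet-101 truncated before its classification head as the backbone; the input resolution, the three-channel replication of the single-channel thermal image, and the use of untrained weights are illustrative assumptions.

```python
import torch
import torchvision

# ResNet-101 backbone with the average-pooling and classification layers removed.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet101(weights=None).children())[:-2]
)
backbone.eval()

with torch.no_grad():
    # A background-filtered frame replicated to 3 channels (shape: N x C x H x W).
    img = torch.rand(1, 3, 480, 640)
    feature_map = backbone(img)   # shape: (1, 2048, 15, 20), i.e. H/32 x W/32
```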
Then, the feature map of the image is fed into an improved RPN (region proposal network) to extract candidate regions where targets may exist. Compared with an ordinary RPN network, the improved RPN network extracts target suggestion boxes of three proportions from the feature map, corresponding respectively to the human head, the half body (which can be the upper half or the lower half), and the whole human body; the three proportions are determined according to the detection target. In this example, for each location in the image, 9 possible candidate boxes are initialized from the combination of three sizes (128×128, 256×256, 384×384) and three ratios (1:1, 1:2, 1:3): 128×128 (1:1), 128×256 (1:2), 128×384 (1:3), 256×256 (1:1), 256×512 (1:2), 256×768 (1:3), 384×384 (1:1), 384×768 (1:2), 384×1152 (1:3). The minimum ratio of 1:1 corresponds to the target suggestion box of the human head, the intermediate ratio of 1:2 corresponds to the target suggestion box of the half body, and the maximum ratio of 1:3 corresponds to the target suggestion box of the whole human body; these boxes are subsequently combined to form a complete human body, as shown in fig. 8.
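A short sketch of how this anchor set could be enumerated; it simply reproduces the nine width × height combinations listed above and is not code from the patent.

```python
from itertools import product

BASES = (128, 256, 384)   # the three anchor sizes
RATIOS = (1, 2, 3)        # 1:1 head, 1:2 half body, 1:3 whole body

def anchor_shapes():
    """Enumerate the nine (width, height) anchor shapes listed above,
    e.g. (128, 128), (128, 256), ..., (384, 1152)."""
    return [(b, b * r) for b, r in product(BASES, RATIOS)]
```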
Then, the target suggestion boxes are projected onto the feature map to obtain the corresponding feature matrices; each feature matrix is scaled to 7 × 7 through an ROI pooling layer in turn, then flattened and fed into the fully connected layers to obtain the class probabilities and bounding-box regression parameters.
Finally, the intersection relations among the three proportions of target suggestion boxes are combined, taking the human head as the reference, to obtain the final thermal imaging pedestrian detection result. Taking the head as the reference means that the head target box with the highest confidence is used as the anchor: for each head target box, if a half-body target box or a whole-body target box intersects it, the boxes are merged together into one human body; if no other target box intersects it, the head is regarded as the head of a person whose body is occluded, and the head target box is taken as the final target box; and for a whole-body target box, if it intersects no head target box, it is judged to be a false detection and discarded.
Specifically, whether or not two target boxes intersect is determined by the intersection over union between them. Each target box is represented by its lower-left corner and upper-right corner coordinates: the head detection box is denoted D_head(x_{head-bl}, y_{head-bl}, x_{head-ur}, y_{head-ur}), the half-body target box is denoted D_half(x_{half-bl}, y_{half-bl}, x_{half-ur}, y_{half-ur}), and the whole-body target box is denoted D_body(x_{body-bl}, y_{body-bl}, x_{body-ur}, y_{body-ur}). Because the head has the most distinctive imaging characteristics in the thermal image, the detected head target boxes have the highest confidence. For each head target box, if a half-body or whole-body target box intersects it, i.e. the Intersection over Union (IoU) between the target boxes is greater than zero, they are merged into one human body; if no other detection box intersects it, it is regarded as the head of a person whose body is occluded, and the final target box is the head target box itself. For a whole-body target box, if it intersects no head box, it is judged to be a false detection and the target box is discarded.
IoU_{1-2} = area(D_1 ∩ D_2) / area(D_1 ∪ D_2)
In the formula: IoU_{1-2} represents the intersection over union between target boxes 1 and 2; D_1 represents target box 1, with coordinates (x_{1-bl}, y_{1-bl}, x_{1-ur}, y_{1-ur}); and D_2 represents target box 2, with coordinates (x_{2-bl}, y_{2-bl}, x_{2-ur}, y_{2-ur}).
The combined overall pedestrian target box D_people has coordinates (x_{p-bl}, y_{p-bl}, x_{p-ur}, y_{p-ur}), where:
x_{p-bl} = min(x_{1-bl}, x_{2-bl})
y_{p-bl} = min(y_{1-bl}, y_{2-bl})
x_{p-ur} = max(x_{1-ur}, x_{2-ur})
y_{p-ur} = max(y_{1-ur}, y_{2-ur})
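A compact sketch of the IoU test and the enclosing-box merge described above, with each box given as an (x_bl, y_bl, x_ur, y_ur) tuple; the helper names are illustrative.

```python
def iou(b1, b2):
    """Intersection over union of two boxes given as (x_bl, y_bl, x_ur, y_ur)."""
    ix = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def merge(head_box, body_box):
    """Enclosing pedestrian box D_people of a head box and a half-/whole-body box."""
    return (min(head_box[0], body_box[0]), min(head_box[1], body_box[1]),
            max(head_box[2], body_box[2]), max(head_box[3], body_box[3]))

# A body box is merged with a head box only when iou(head_box, body_box) > 0.
```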
the final pedestrian detection result is shown in fig. 9.
In addition, before being used for actual detection, the pedestrian detection network based on improved Faster R-CNN needs to be trained in advance with a labeled thermal imaging dataset; the training method itself belongs to the prior art. In this embodiment, the training process can adopt the following specific implementation:
1. Initialize the parameters of the front-end convolutional layers with an ImageNet pre-trained classification model, and train the RPN network;
2. Train the classification and bounding-box regression network using the obtained target suggestion boxes;
3. Fine-tune the RPN network using the trained front-end convolutional layers;
4. Fine-tune the classification and bounding-box regression network using the trained front-end convolutional layers;
5. The RPN network and the classification and bounding-box regression network share the trained front-end convolutional layers, forming the complete network model.
The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims (10)

1. A pedestrian detection method based on thermal imaging background filtering is characterized by comprising the following steps:
s1: firstly, processing a thermal imaging image acquired by a thermal imaging camera by using a histogram equalization method, so as to solve the problems of deviation and drift of the thermal imaging image and obtain a histogram equalization enhanced image;
s2: based on a Gaussian mixture model, separating foreground and background from the histogram equalization enhanced image obtained after the processing of S1 according to the relation between the front and rear frames to obtain a binary background subtraction image;
s3: performing double-threshold segmentation on a thermal imaging image acquired by a thermal imaging camera by using upper and lower thresholds of imaging of pedestrians in the thermal imaging image to obtain a binary threshold segmentation image after segmentation of the pedestrians and a background;
s4: superposing the binary background subtraction image obtained in the step S2 and the binary threshold segmentation image obtained in the step S3 to obtain a binary background filtering image distinguishing the foreground from the background, and performing background removal on the histogram equalization enhanced image obtained in the step S1 by using the binary background filtering image to obtain a background-filtered image containing only the foreground;
s5: inputting the background-filtered image obtained in S4 into a pre-constructed and trained pedestrian detection network based on improved Faster R-CNN for human body candidate region extraction and pedestrian detection; in the pedestrian detection process, feature extraction is first performed on the background-filtered image by a convolutional neural network to obtain a feature map, then target suggestion boxes of three proportions, corresponding respectively to a head, a half body and a whole human body, are extracted from the feature map by an improved RPN network, then the target suggestion boxes are projected onto the feature map to obtain the corresponding feature matrices, each feature matrix is passed in turn through an ROI pooling layer and a fully connected layer to obtain the category probability and bounding-box regression parameters, and finally the intersection relations between the three proportions of target suggestion boxes are combined, taking the head as the reference, to obtain the final thermal imaging pedestrian detection result.
2. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 1, wherein the specific implementation method in S1 is:
converting the thermal imaging image into a thermal imaging gray image, then counting to obtain a cumulative normalized histogram, and then mapping the thermal imaging gray image pixel by pixel according to a mapping relation to form a histogram equalization enhanced image, wherein the mapping relation is as follows:
p′_i = min{x} + s_i · (max{x} − min{x})
in the formula: p′_i represents the equalized gray value in the histogram equalization enhanced image to which the pixel with gray value i in the thermal imaging grayscale image is mapped; s_i is the cumulative histogram probability of the pixel with gray level i in the thermal imaging grayscale image, obtained from the cumulative normalized histogram; min{x} represents the minimum gray value in the thermal imaging grayscale image, and max{x} represents the maximum gray value in the thermal imaging grayscale image.
3. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 1, wherein the specific implementation method of S2 is as follows:
s21: training a Gaussian mixture model by using a plurality of the histogram equalization enhanced images; during training, a basic Gaussian mixture matrix is first initialized with the first frame of enhanced image, then the enhanced images are input frame by frame, and each newly added pixel is compared with the means of the existing Gaussian mixture model: if the pixel lies within three times the variance of a mean, the matrix coefficients are updated; otherwise a new Gaussian distribution is created;
s22: and matching the histogram equalization enhanced image to be segmented pixel by adopting a Gaussian mixture model obtained in the step S21, and if one pixel value can be matched with one Gaussian mixture matrix, considering the pixel as a background, otherwise, considering the pixel as a foreground.
4. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 1, wherein the specific implementation method of S3 is as follows:
s31: calibrating a thermal imaging camera for acquiring a thermal imaging image, and determining upper and lower thresholds of pedestrian imaging in the thermal imaging camera;
s32: pixels between the upper threshold value and the lower threshold value in the thermal imaging map are regarded as a pedestrian area, and the rest pixels are regarded as a background area.
5. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 1, wherein the specific implementation method of S4 is as follows:
s41: adding the binary background subtraction image obtained in the step S2 and the binary threshold segmentation image obtained in the step S3 to obtain a binary background filtered image with a foreground pixel value of 1 and a background pixel value of 0;
s42: and multiplying the binary background filtering image and the histogram equalization enhanced image obtained in the step S1 by pixel points one by one to obtain a final background filtering image.
6. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 1, wherein in S5, the pedestrian detection network based on improved Faster R-CNN comprises a convolutional neural network, an improved RPN network, an ROI pooling layer and a fully connected layer, wherein the thermal imaging pedestrian detection result is obtained as follows:
s51: inputting the background filtered image into a convolutional neural network to obtain a corresponding characteristic map;
s52: feeding the feature map obtained in the step S51 into the improved RPN network to extract target suggestion boxes where targets may possibly exist; for each position in the image, initializing 9 possible candidate boxes from the combination of three sizes of areas and three aspect ratios; the minimum ratio corresponds to the target suggestion box of a human head, the middle ratio corresponds to the target suggestion box of a half body, the half body being an upper half body or a lower half body, and the maximum ratio corresponds to the target suggestion box of a whole human body;
s53: projecting the target suggestion frames obtained in the step S52 on the feature map obtained in the step S51 to obtain corresponding feature matrixes, scaling each feature matrix to 7 × 7 through a ROIPooling layer, flattening and sending the feature matrixes into a full connection layer to obtain final class probability and regression parameters of the boundary frame;
s54: taking the human head target box with the highest confidence as the reference: for each human head target box, if a half-body target box or a whole-body target box intersects it, the intersecting boxes are merged together into one human body; if no other target box intersects it, the head is regarded as the head of a person whose body is occluded, and the head target box is taken as the final target box; and for a whole-body target box, if it does not intersect any head target box, it is judged to be a false detection and discarded.
7. The method of claim 6, wherein the areas of the three sizes are 128×128, 256×256, and 384×384, respectively.
8. The pedestrian detection method based on thermal imaging background filtering according to claim 6, wherein the ratios of the three sizes are 1:1, 1:2, and 1:3, respectively.
9. The pedestrian detection method based on thermal imaging background filtering as claimed in claim 6, wherein whether two target boxes intersect is determined by the intersection over union between them.
10. The pedestrian detection method based on thermal imaging background filtering according to claim 1, wherein the pedestrian detection network based on improved Faster R-CNN is trained in advance using labeled thermal imaging data sets.
CN202110460457.0A 2021-04-27 2021-04-27 Pedestrian detection method based on thermal imaging background filtering Active CN112907616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110460457.0A CN112907616B (en) 2021-04-27 2021-04-27 Pedestrian detection method based on thermal imaging background filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110460457.0A CN112907616B (en) 2021-04-27 2021-04-27 Pedestrian detection method based on thermal imaging background filtering

Publications (2)

Publication Number Publication Date
CN112907616A (en) 2021-06-04
CN112907616B (en) 2022-05-03

Family

ID=76108934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110460457.0A Active CN112907616B (en) 2021-04-27 2021-04-27 Pedestrian detection method based on thermal imaging background filtering

Country Status (1)

Country Link
CN (1) CN112907616B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2264643A1 (en) * 2009-06-19 2010-12-22 Universidad de Castilla-La Mancha Surveillance system and method by thermal camera
CN106504274A (en) * 2016-10-10 2017-03-15 广东技术师范学院 A kind of visual tracking method and system based under infrared camera
CN108710838A (en) * 2018-05-08 2018-10-26 河南工程学院 Thermal infrared facial image recognition method under a kind of overnight sight
KR20180125278A (en) * 2017-05-15 2018-11-23 한국전자통신연구원 Apparatus and method for detecting pedestrian
CN110490877A (en) * 2019-07-04 2019-11-22 西安理工大学 Binocular stereo image based on Graph Cuts is to Target Segmentation method
CN110717393A (en) * 2019-09-06 2020-01-21 北京富吉瑞光电科技有限公司 Forest fire automatic detection method and system based on infrared panoramic system
CN111046880A (en) * 2019-11-28 2020-04-21 中国船舶重工集团公司第七一七研究所 Infrared target image segmentation method and system, electronic device and storage medium
CN111340765A (en) * 2020-02-20 2020-06-26 南京邮电大学 Thermal infrared image reflection detection method based on background separation
CN111461036A (en) * 2020-04-07 2020-07-28 武汉大学 Real-time pedestrian detection method using background modeling enhanced data
CN112200764A (en) * 2020-09-02 2021-01-08 重庆邮电大学 Photovoltaic power station hot spot detection and positioning method based on thermal infrared image
CN112529065A (en) * 2020-12-04 2021-03-19 浙江工业大学 Target detection method based on feature alignment and key point auxiliary excitation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10719727B2 (en) * 2014-10-01 2020-07-21 Apple Inc. Method and system for determining at least one property related to at least part of a real environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zuhaib Ahmed Shaikh et al., "Automatic annotation of pedestrians in thermal images using background/foreground segmentation for training deep neural networks," 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2021-01-05, full text. *
Wu Di, "Research on pedestrian detection algorithms based on infrared images," China Master's Theses Full-text Database (Information Science and Technology), No. 6, 2019-06-30, full text. *

Also Published As

Publication number Publication date
CN112907616A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN111274976B (en) Lane detection method and system based on multi-level fusion of vision and laser radar
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
Maddalena et al. Stopped object detection by learning foreground model in videos
CN105930868B (en) A kind of low resolution airport target detection method based on stratification enhancing study
Kong et al. General road detection from a single image
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
EP3499414B1 (en) Lightweight 3d vision camera with intelligent segmentation engine for machine vision and auto identification
CN111784747B (en) Multi-target vehicle tracking system and method based on key point detection and correction
CN109685045B (en) Moving target video tracking method and system
Zin et al. Fusion of infrared and visible images for robust person detection
CN109919026B (en) Surface unmanned ship local path planning method
CN109272455A (en) Based on the Weakly supervised image defogging method for generating confrontation network
CN104766065B (en) Robustness foreground detection method based on various visual angles study
WO2016165064A1 (en) Robust foreground detection method based on multi-view learning
CN113158943A (en) Cross-domain infrared target detection method
Naufal et al. Preprocessed mask RCNN for parking space detection in smart parking systems
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
Huerta et al. Exploiting multiple cues in motion segmentation based on background subtraction
CN109784216B (en) Vehicle-mounted thermal imaging pedestrian detection Rois extraction method based on probability map
CN115116132B (en) Human behavior analysis method for depth perception in Internet of things edge service environment
Surkutlawar et al. Shadow suppression using rgb and hsv color space in moving object detection
Lu et al. A cross-scale and illumination invariance-based model for robust object detection in traffic surveillance scenarios
Nosheen et al. Efficient Vehicle Detection and Tracking using Blob Detection and Kernelized Filter
CN107103301B (en) Method and system for matching discriminant color regions with maximum video target space-time stability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant