CN114332942A - Night infrared pedestrian detection method and system based on improved YOLOv3 - Google Patents


Info

Publication number
CN114332942A
Authority
CN
China
Prior art keywords
pedestrian
infrared
yolov3
sab
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111662884.3A
Other languages
Chinese (zh)
Inventor
郑庆祥
金积德
田亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN202111662884.3A
Publication of CN114332942A


Abstract

The invention discloses a night infrared pedestrian detection method and system based on improved YOLOv3. The method collects infrared pedestrian detection data sets with a plurality of infrared cameras and applies pixel-value contrast enhancement to them, raising pedestrian pixel values and lowering background pixel values; constructs a night infrared pedestrian detection network model, YOLOv3-SAB, by improving YOLOv3, introducing a stem downsampling module and asymmetric convolution to strengthen the network's feature extraction and feature expression capability, and introducing bottleneck residual modules to reduce the model's computation parameters and raise its pedestrian detection speed; generates task-specific prior anchor boxes with a k-means clustering algorithm, improving the model's target localization accuracy; uses CIoU as the bounding-box regression loss function of the YOLOv3-SAB network, accelerating model convergence and improving the accuracy of the prediction boxes; trains the YOLOv3-SAB network to produce a night infrared pedestrian detection model; and performs real-time infrared pedestrian detection at night with that model. The invention effectively improves both the detection precision and the detection speed for pedestrians at night.

Description

Night infrared pedestrian detection method and system based on improved YOLOv3
Technical Field
The invention belongs to the technical field of deep learning target detection, and in particular relates to a night infrared pedestrian detection method and system based on an improved deep learning target detection model (YOLOv3).
Background
As an important branch of machine vision, deep learning target detection has been widely applied in driver assistance, intelligent monitoring, military strike and other fields, and has become a research hotspot in computer vision. An infrared image is formed by an infrared imaging system from the thermal radiation of objects; infrared images offer strong anti-interference capability and are not easily affected by adverse environments. Compared with target detection under visible light, infrared target detection therefore has higher research and application value.
In recent years, with new developments and breakthroughs of deep convolutional neural networks (CNNs) in deep learning target detection, CNN-based pedestrian detection technology has advanced rapidly. Target detection algorithms based on convolutional neural networks achieve end-to-end detection: a multi-layer network extracts a large amount of feature information, the manual feature-design step is eliminated, and the resulting models have strong robustness and generalization ability. Mainstream CNN-based target detection algorithms currently fall into two categories: two-stage detection models based on candidate regions and one-stage detection models based on bounding-box regression. A two-stage model splits detection into two steps: it first selects candidate regions (Region Proposals) that may contain targets in the input image, then classifies the candidate regions and regresses their positions to obtain the detection result; such algorithms include R-CNN, Fast R-CNN and Faster R-CNN. A one-stage model omits the candidate-region generation stage and places feature extraction, regression and classification in the same convolutional neural network, directly producing target class and position information; such algorithms include the YOLO series, SSD and others.
Two-stage detection models achieve high detection precision but low detection speed, and struggle to meet real-time requirements; one-stage detection models offer good real-time performance and high detection speed but lower detection precision. Pedestrian detection under visible light is strongly affected by illumination conditions and weather changes, and is severely limited at night under weak illumination or in bad weather. Infrared-based pedestrian detection is only slightly affected by illumination and environment, but the lack of color and the weak texture of infrared images leave target detection models with low target feature extraction and feature expression capability.
Disclosure of Invention
In order to solve the above problems in target detection, the invention provides a night infrared pedestrian detection method and system based on an improved deep learning target detection model (YOLOv3), effectively improving the night-time pedestrian detection effect.
The method of the invention adopts the following technical scheme: a night infrared pedestrian detection method based on improved YOLOv3, comprising the following steps:
step 1: constructing a deep learning target detection network YOLOv3-SAB, responsible for extracting pedestrian appearance features and the contour features of the pedestrian head, body and limbs;
the deep learning target detection network YOLOv3-SAB is an improved network based on a YOLOv3 neural network;
wherein a downsampling module (stem) replaces the downsampling convolution with a 3×3 kernel and stride 2 in the feature extraction part of the YOLOv3 neural network; the stem module consists of two parallel branches: one branch applies a 1×1, stride-1 convolution followed by a 3×3, stride-2 convolution to the previous output feature map, while the other branch applies 2×2 max pooling with stride 2 to the same feature map; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map;
a bottleneck residual module replaces the residual module in the feature extraction part of the YOLOv3 neural network, and asymmetric convolution replaces the 3×3, stride-1 convolution inside the bottleneck residual module; the bottleneck residual module consists of two parallel branches: one branch first compresses the number of channels with a 1×1, stride-1 convolution, then extracts features with the asymmetric convolution while keeping the channel count unchanged, and finally restores the number of channels with another 1×1, stride-1 convolution; the other branch is a shortcut connection; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map;
the asymmetric convolution that replaces the 3×3, stride-1 convolution in the bottleneck residual module applies three parallel convolutions with kernel sizes 3×3, 3×1 and 1×3, each followed by batch normalization, to the previous output feature map; the three resulting feature maps are added with the number of channels kept unchanged, and the sum is finally activated with the ReLU activation function;
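The three building blocks described above can be sketched in PyTorch roughly as follows. This is a minimal sketch of the patent's verbal description; the class names, the channel-compression ratio of the bottleneck, and the placement of batch normalization inside the stem branch are illustrative assumptions, not details taken from the patent:

```python
import torch
import torch.nn as nn


class Stem(nn.Module):
    """Two-branch downsampling: (1x1 s1 conv -> 3x3 s2 conv) added to 2x2 s2 max pooling."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 1, stride=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        # element-wise addition; channel count unchanged, spatial size halved
        return self.conv(x) + self.pool(x)


class AsymmetricConv(nn.Module):
    """Parallel 3x3, 3x1 and 1x3 convolutions (each with BN) fused by addition, then ReLU."""

    def __init__(self, channels):
        super().__init__()

        def branch(kernel, pad):
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel, padding=pad, bias=False),
                nn.BatchNorm2d(channels),
            )

        self.k33 = branch(3, 1)
        self.k31 = branch((3, 1), (1, 0))
        self.k13 = branch((1, 3), (0, 1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.k33(x) + self.k31(x) + self.k13(x))


class BottleneckResidual(nn.Module):
    """1x1 compress -> asymmetric conv -> 1x1 restore, with an identity shortcut."""

    def __init__(self, channels, reduction=2):  # reduction ratio is an assumption
        super().__init__()
        hidden = channels // reduction
        self.compress = nn.Conv2d(channels, hidden, 1, stride=1, bias=False)
        self.asym = AsymmetricConv(hidden)
        self.restore = nn.Conv2d(hidden, channels, 1, stride=1, bias=False)

    def forward(self, x):
        return x + self.restore(self.asym(self.compress(x)))
```

Because the asymmetric 3×3/3×1/1×3 branches operate on the compressed channel count, they see fewer channels than a plain 3×3 residual block would, which is where the parameter saving comes from.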
step 2: training a deep learning target detection network YOLOv 3-SAB;
the method specifically comprises the following substeps:
step 2.1: acquiring pedestrian infrared video data under different viewing angles and scenes with a plurality of infrared cameras, extracting the video data frame by frame into infrared pictures, and selecting pictures from them to build an infrared pedestrian detection data set; enhancing the infrared pictures by increasing the gray values of pedestrians and reducing the gray values of the background, and annotating the position information, size information and category information of the pedestrian targets in the pictures, the pedestrian category being assigned on the basis of pedestrian appearance, contour and size information;
step 2.2: clustering the annotated pedestrian sizes with a k-means clustering algorithm to obtain prior anchor box sizes for night infrared pedestrian detection, and replacing the original anchor box sizes in the deep learning target detection network YOLOv3-SAB with the clustered prior anchor box sizes;
step 2.3: splitting the infrared pedestrian detection data set acquired in step 2.1 into a training data set and a test data set according to a preset proportion; training the deep learning target detection network YOLOv3-SAB with the training data set, the network generating and saving a model weight file after each iteration, and stopping training once the network reaches a converged state; using each saved model weight file as pedestrian detection weights to detect all infrared pictures in the test data set, storing all detected pedestrian information, computing the detection precision by comparing the detection results with the real pedestrian information annotated in step 2.1, and selecting the model weight file with the highest detection precision as the model weights of the deep learning target detection network YOLOv3-SAB;
step 3: inputting infrared video images collected in real time into the deep learning target detection network YOLOv3-SAB, and performing real-time pedestrian detection on the video data with the model weight file of step 2.3 as the pedestrian detection weights.
The technical scheme adopted by the system of the invention is as follows: a night infrared pedestrian detection system based on improved YOLOv3 comprises the following modules:
the module 1 is used for constructing a deep learning target detection network YOLOv3-SAB and is responsible for extracting pedestrian appearance characteristics and pedestrian head, body and limb contour characteristics;
the deep learning target detection network YOLOv3-SAB is an improved network based on a YOLOv3 neural network;
wherein a downsampling module (stem) replaces the downsampling convolution with a 3×3 kernel and stride 2 in the feature extraction part of the YOLOv3 neural network; the stem module consists of two parallel branches: one branch applies a 1×1, stride-1 convolution followed by a 3×3, stride-2 convolution to the previous output feature map, while the other branch applies 2×2 max pooling with stride 2 to the same feature map; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map;
a bottleneck residual module replaces the residual module in the feature extraction part of the YOLOv3 neural network, and asymmetric convolution replaces the 3×3, stride-1 convolution inside the bottleneck residual module; the bottleneck residual module consists of two parallel branches: one branch first compresses the number of channels with a 1×1, stride-1 convolution, then extracts features with the asymmetric convolution while keeping the channel count unchanged, and finally restores the number of channels with another 1×1, stride-1 convolution; the other branch is a shortcut connection; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map;
the asymmetric convolution that replaces the 3×3, stride-1 convolution in the bottleneck residual module applies three parallel convolutions with kernel sizes 3×3, 3×1 and 1×3, each followed by batch normalization, to the previous output feature map; the three resulting feature maps are added with the number of channels kept unchanged, and the sum is finally activated with the ReLU activation function;
the module 2 is used for training a deep learning target detection network YOLOv 3-SAB;
the system specifically comprises the following sub-modules:
module 2.1 is used for acquiring pedestrian infrared video data under different viewing angles and scenes with a plurality of infrared cameras, extracting the video data frame by frame into infrared pictures, and selecting pictures from them to build an infrared pedestrian detection data set; enhancing the infrared pictures by increasing the gray values of pedestrians and reducing the gray values of the background, and annotating the position information, size information and category information of the pedestrian targets in the pictures, the pedestrian category being assigned on the basis of pedestrian appearance, contour and size information;
module 2.2 is used for clustering the annotated pedestrian sizes with a k-means clustering algorithm to obtain prior anchor box sizes for night infrared pedestrian detection, and for replacing the original anchor box sizes in the deep learning target detection network YOLOv3-SAB with the clustered prior anchor box sizes;
module 2.3 is used for splitting the infrared pedestrian detection data set acquired in module 2.1 into a training data set and a test data set according to a preset proportion; training the deep learning target detection network YOLOv3-SAB with the training data set, the network generating and saving a model weight file after each iteration, and stopping training once the network reaches a converged state; using each saved model weight file as pedestrian detection weights to detect all infrared pictures in the test data set, storing all detected pedestrian information, computing the detection precision by comparing the detection results with the real pedestrian information annotated in module 2.1, and selecting the model weight file with the highest detection precision as the model weights of the deep learning target detection network YOLOv3-SAB;
module 3 is used for inputting infrared video images collected in real time into the deep learning target detection network YOLOv3-SAB and performing real-time pedestrian detection on the video data with the model weight file of module 2.3 as the pedestrian detection weights of the network.
The invention has the beneficial effects that:
(1) The night infrared pedestrian detection method based on an improved deep learning target detection model (YOLOv3) disclosed by the invention is applicable to conditions that ordinary digital night-vision devices cannot handle, such as nights without visible light, complex backgrounds, and harsh environments and climates, and can achieve real-time, clear, accurate and efficient pedestrian detection in scenes such as automatic driving, intelligent monitoring, military reconnaissance and field rescue.
(2) The method increases infrared image contrast with a piecewise linear transformation, raising pedestrian pixel values and lowering background pixel values, which enriches pedestrian feature information and suppresses background feature information; the stem downsampling module and the asymmetric convolution module improve the feature extraction and feature expression capability of the deep learning target detection model (YOLOv3); and the prior anchor boxes designed for night infrared pedestrian detection, together with the CIoU bounding-box loss function, improve the accuracy of the predicted pedestrian positions.
(3) The method uses the bottleneck residual module to reduce the number of convolution channels and thereby the model's computation parameters; with the bottleneck residual module, the computation of the model is reduced to roughly 1/3.48 of the original, greatly lowering the computational cost. The weight file of the improved model (YOLOv3-SAB) is 133 MB, 113 MB smaller than the YOLOv3 weight file, making it suitable for deployment on embedded systems and edge computing units such as NVIDIA boards or the Raspberry Pi while meeting real-time detection requirements.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a deep learning target detection network YOLOv3-SAB according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a down-sampling module (stem module) according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a bottleneck residual module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an asymmetric convolution structure according to an embodiment of the present invention;
FIG. 6 is a flow chart of the K-means clustering used to design the prior anchor boxes in the embodiment of the present invention;
FIG. 7 is a P-R curve diagram of the test result of the detection accuracy in the embodiment of the present invention;
fig. 8 is a diagram illustrating the detection effect of infrared pedestrians at night in the embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the embodiments described here merely illustrate and explain the invention and do not restrict it.
Referring to fig. 1, the night infrared pedestrian detection method based on improved YOLOv3 provided by the invention comprises the following steps:
step 1: constructing a deep learning target detection network YOLOv 3-SAB;
please refer to fig. 2: in this embodiment the deep learning target detection network YOLOv3-SAB is constructed on the basis of the YOLOv3 neural network algorithm; the pedestrian feature extraction part of YOLOv3 is improved, yielding a new deep learning target detection model (the network YOLOv3-SAB), whose improved feature extraction algorithm is responsible for extracting pedestrian appearance features and the contour features of the pedestrian head, body and limbs.
Please refer to fig. 3: in this embodiment a downsampling module (stem module) replaces the downsampling convolution with a 3×3 kernel and stride 2 in the feature extraction part (the Darknet53 network) of the deep learning target detection model (YOLOv3). The stem module consists of two parallel branches: one branch applies a 1×1, stride-1 convolution followed by a 3×3, stride-2 convolution to the previous output feature map, while the other branch applies 2×2 max pooling with stride 2 to the same feature map; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map.
Referring to fig. 4, in this embodiment a bottleneck residual module replaces the residual module in the feature extraction part (the Darknet53 network) of the deep learning target detection model (YOLOv3) to reduce the model's computation parameters, and asymmetric convolution replaces the 3×3, stride-1 convolution inside the bottleneck residual module. The bottleneck residual module consists of two parallel branches: one branch first compresses the number of channels with a 1×1, stride-1 convolution, then extracts features with the asymmetric convolution while keeping the channel count unchanged, and finally restores the number of channels with another 1×1, stride-1 convolution; the other branch is a shortcut connection; the feature maps output by the two branches are added with the number of channels kept unchanged, giving the fused feature map.
Referring to fig. 5, in this embodiment asymmetric convolution replaces the 3×3, stride-1 convolution in the bottleneck residual module. The asymmetric convolution applies three parallel convolutions with kernel sizes 3×3, 3×1 and 1×3, each followed by batch normalization, to the previous output feature map; the three resulting feature maps are added with the number of channels kept unchanged, and the sum is finally activated with the ReLU activation function.
Step 2: training a deep learning target detection network YOLOv 3-SAB;
the method specifically comprises the following substeps:
step 2.1: acquiring pedestrian infrared video data under different viewing angles and scenes with a plurality of infrared cameras, extracting the video data frame by frame into infrared pictures, and selecting 10000 of those pictures to build the infrared pedestrian detection data set; enhancing the infrared pictures by increasing the gray values of pedestrians and reducing the gray values of the background, then annotating the position information, size information and category information of the pedestrian targets with the LabelImg software, the pedestrian category being assigned on the basis of pedestrian appearance, contour and size information; after annotation, an XML file is generated and saved for each image.
In this embodiment, the gray values of the infrared pictures are transformed according to the piecewise linear function of equation (1):
R = (s1/r1)·r,  0 ≤ r < r1
R = ((s2 − s1)/(r2 − r1))·(r − r1) + s1,  r1 ≤ r < r2
R = ((255 − s2)/(255 − r2))·(r − r2) + s2,  r2 ≤ r ≤ 255   (1)
where r denotes the gray value of an infrared picture before the transformation, R the gray value after the transformation, and (r1, s1) and (r2, s2) the breakpoints of the stretch, chosen so that the pedestrian gray range is expanded and the background range compressed.
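A minimal NumPy sketch of such a three-segment gray-level stretch is given below. The breakpoints (r1, s1) = (80, 40) and (r2, s2) = (160, 220) are illustrative values chosen to widen a mid-gray pedestrian range, not the values of the patent's equation (1):

```python
import numpy as np


def piecewise_linear_stretch(img, r1=80, s1=40, r2=160, s2=220):
    """Three-segment linear gray-level transform of an 8-bit image.

    Gray levels in [r1, r2) are stretched onto the wider range [s1, s2);
    levels outside it (mostly background) are compressed toward the extremes.
    Breakpoint values here are illustrative, not taken from the patent.
    """
    r = img.astype(np.float64)
    out = np.empty_like(r)
    lo, mid, hi = r < r1, (r >= r1) & (r < r2), r >= r2
    out[lo] = (s1 / r1) * r[lo]
    out[mid] = (s2 - s1) / (r2 - r1) * (r[mid] - r1) + s1
    out[hi] = (255.0 - s2) / (255.0 - r2) * (r[hi] - r2) + s2
    return np.clip(out, 0, 255).astype(np.uint8)
```

With these breakpoints the slope on the middle segment is (220 − 40)/(160 − 80) = 2.25, so mid-gray pedestrian pixels gain contrast while dark and bright background ranges are compressed.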
Step 2.2: clustering the annotated pedestrian sizes with a k-means clustering algorithm to obtain prior anchor box sizes for night infrared pedestrian detection, and replacing the original anchor box sizes in the deep learning target detection network YOLOv3-SAB with the clustered prior anchor box sizes;
Referring to fig. 6, in this embodiment, clustering the annotated pedestrian sizes with the k-means algorithm specifically comprises the following sub-steps:
step 2.2.1: randomly selecting K annotation boxes (box) from the annotated data set as initial prior anchor boxes (anchor), these K boxes serving as the K cluster centers;
step 2.2.2: using d = 1 − IoU, where IoU is the intersection-over-union of an annotation box and a prior anchor box, as the distance metric, and assigning each annotation box to the cluster of the anchor box nearest to it;
step 2.2.3: computing the mean width and height of all annotation boxes in each cluster, and updating the width and height of each cluster center to those means;
step 2.2.4: repeating steps 2.2.2 and 2.2.3 until the widths and heights of the cluster centers no longer change; the cluster-center sizes obtained at that point are taken as the prior anchor box sizes for night infrared pedestrian detection.
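Steps 2.2.1 to 2.2.4 can be sketched as below, assuming the annotated pedestrian sizes are given as (width, height) pairs; the function names and the random initialization are illustrative:

```python
import numpy as np


def iou_wh(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, all aligned at the origin."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union


def kmeans_anchors(boxes, k, seed=0, max_iter=300):
    """Cluster annotated (w, h) pairs with the distance d = 1 - IoU (steps 2.2.1-2.2.4)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]  # step 2.2.1
    assign = np.full(len(boxes), -1)
    for _ in range(max_iter):
        new_assign = np.argmin(1.0 - iou_wh(boxes, anchors), axis=1)  # step 2.2.2
        if np.array_equal(new_assign, assign):
            break  # step 2.2.4: cluster centers are stable
        assign = new_assign
        for j in range(k):  # step 2.2.3: move each center to its cluster's mean w, h
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors
```

Using 1 − IoU rather than Euclidean distance keeps large boxes from dominating the clustering, since IoU is scale-relative.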
In this embodiment, the CIoU target detection bounding-box regression loss replaces the original IoU-based bounding-box regression loss of the YOLOv3 neural network and serves as the bounding-box regression loss function of the deep learning target detection network YOLOv3-SAB;
the bounding-box regression loss function of the deep learning target detection network YOLOv3-SAB is:
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αν   (2)
ν = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²   (3)
α = ν/((1 − IoU) + ν)   (4)
In the above formulas, α denotes a weight parameter and ν an aspect-ratio consistency parameter; h^gt and w^gt denote the height and width of the real box; w and h the width and height of the prediction box; b and b^gt the center points of the prediction box and the real box; ρ(·) the Euclidean distance; c the diagonal length of the minimum enclosing rectangle of the prediction box b and the real box b^gt; and IoU the intersection-over-union of the prediction box and the real box.
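A scalar sketch of the CIoU loss of equations (2) to (4), for boxes in corner format (x1, y1, x2, y2); this follows the standard CIoU definition matching the symbols above and is not code from the patent:

```python
import math


def ciou_loss(pred, target):
    """CIoU bounding-box regression loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = target
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # IoU term
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)

    # rho^2(b, b_gt) / c^2: squared center distance over squared enclosing-box diagonal
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2

    # alpha * v: aspect-ratio consistency term, equations (3) and (4)
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (1.0 - iou + v) if iou < 1.0 else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the center-distance and aspect-ratio terms keep the gradient informative even when the prediction and real boxes do not overlap, which is why CIoU accelerates convergence.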
Step 2.3: splitting the infrared pedestrian detection data set acquired in step 2.1 into a training data set and a test data set in a ratio of 8:2; training the deep learning target detection network YOLOv3-SAB with the training data set, the network generating and saving a model weight file after each iteration, and stopping training once the network reaches a converged state; using each saved model weight file as pedestrian detection weights to detect all infrared pictures in the test data set, storing all detected pedestrian information, computing the detection precision by comparing the detection results with the real pedestrian information annotated in step 2.1, and selecting the model weight file with the highest detection precision as the model weights of the deep learning target detection network YOLOv3-SAB;
In this embodiment, the detection precision obtained by comparing the detection results with the real pedestrian information annotated in step 2.1 is computed mainly in terms of: precision (Precision), recall (Recall), average precision (AP) and detection speed in frames per second (FPS);
Precision = TP/(TP + FP)   (5)
Recall = TP/(TP + FN)   (6)
AP = ∫₀¹ P(R) dR   (7)
where TP denotes the number of correctly detected pedestrians in all night infrared pedestrian images of the test data set, FP the number of falsely detected pedestrians, and FN the number of missed pedestrians; in equation (7), P denotes Precision and R denotes Recall.
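Equations (5) to (7) can be evaluated as below; the AP function uses the common all-point interpolation of the P-R curve, which is one conventional way to approximate the integral in equation (7):

```python
def precision(tp, fp):
    """Equation (5): fraction of detections that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Equation (6): fraction of ground-truth pedestrians that are detected."""
    return tp / (tp + fn)


def average_precision(recalls, precisions):
    """Equation (7): area under the P-R curve, all-point interpolation."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):  # make precision non-increasing in recall
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))
```

The monotone-envelope pass removes the sawtooth in the raw P-R curve before integrating, so AP rewards detectors whose precision stays high as recall grows.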
Step 3: the deep learning target detection network YOLOv3-SAB is embedded into an embedded development system such as an NVIDIA board or a Raspberry Pi. In the embedded system, the network uses the model weight file of step 2.3 as the pedestrian detection weights and performs real-time pedestrian detection on the infrared video images acquired in real time by the infrared image acquisition devices: the collected infrared video images are input into YOLOv3-SAB, which detects pedestrians in the video data in real time.
In this embodiment, the deep learning target detection network YOLOv3-SAB uses the model weight file of step 2.3 as the pedestrian detection weights, classifies and regresses the pedestrians in the video data, marks the classified and regressed pedestrian information on the video with bounding boxes, and outputs the annotated video data as the final pedestrian detection result.
The effects of the present embodiment are further described below by way of specific implementations.
The software environment selected in this embodiment is: the Ubuntu 18.04 operating system, the CUDA 11.1 and cuDNN 8.2 GPU acceleration libraries, the PyTorch deep learning framework, and the Python programming language. Hardware environment: an Intel Core i9-10850K processor and an NVIDIA GeForce RTX 3080 graphics card.
In this embodiment, the data set created in step 2.1 is split into a training data set and a test data set at a ratio of 8:2. The initial learning rate is set to 0.001, the momentum to 0.9, and the input image size to 416 × 416 pixels. Mini-batch training is performed with 6 images per batch until every image in the training data set has been trained once, completing one iteration; the improved deep learning target detection network YOLOv3-SAB is trained for 300 iterations in total, and a model weight file is generated and saved after each completed iteration for model testing.
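The training setup of this embodiment can be collected into a small configuration sketch; the dictionary layout and the helper function are illustrative, while the values are those stated above:

```python
import math

# Hyperparameters stated in this embodiment.
config = {
    "train_test_split": (0.8, 0.2),  # training : test data set ratio 8:2
    "initial_lr": 0.001,
    "momentum": 0.9,
    "input_size": (416, 416),        # input pixel size
    "batch_size": 6,                 # 6 images per mini-batch
    "epochs": 300,                   # one weight file saved per iteration
}

def batches_per_epoch(num_train_images, batch_size):
    """Mini-batches needed so every training image is seen once,
    which completes one iteration in the sense used above."""
    return math.ceil(num_train_images / batch_size)
```

For instance, a training set of 1000 images at batch size 6 needs 167 mini-batches per iteration.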
In this embodiment, the test results of the improved deep learning target detection network (YOLOv3-SAB) are compared with currently popular deep learning target detection models (YOLOv3, SSD and Faster R-CNN); the P-R curves of the different models are plotted in Fig. 7, and the performance comparison is given in Table 1. As can be seen from Table 1 and Fig. 7, the detection precision (AP) of the YOLOv3-SAB model reaches 94.78%, higher than that of the YOLOv3, SSD and Faster R-CNN models; its detection speed reaches 44.27 frames/s (FPS), higher than that of the SSD and Faster R-CNN models; and its weight file is 131 MB, roughly half the size of that of the YOLOv3 model, making YOLOv3-SAB suitable for deployment on edge-computing embedded devices such as the NVIDIA Jetson or Raspberry Pi.
TABLE 1 Comparison of model performance
[Table 1: the original table image is not reproduced here; as stated above, YOLOv3-SAB reaches 94.78% AP, 44.27 FPS and a 131 MB weight file, compared against YOLOv3, SSD and Faster R-CNN.]
In this embodiment, the improved deep learning target detection network (YOLOv3-SAB), using the model weight file described in step 2.3 as the pedestrian detection weight, classifies and regresses pedestrians in infrared video data collected in real time in a nighttime scene, marks the classified and regressed pedestrian information in the video data in the form of marking frames, and outputs the detected video data as the pedestrian detection result, shown in Fig. 8. The result shows high detection precision, with no false detections or missed detections.
It should be understood that the embodiments are described in a progressive manner: each embodiment focuses on its differences from the others, and like parts may be cross-referenced. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
This specification explains the principle and implementation of the present invention through specific examples; the above description of the examples is intended only to aid understanding of the method and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, vary the specific implementation and the scope of application. In view of the above, the contents of this specification should not be construed as limiting the invention.

Claims (7)

1. A nighttime infrared pedestrian detection method based on improved YOLOv3 is characterized by comprising the following steps:
step 1: constructing a deep learning target detection network YOLOv3-SAB responsible for extracting pedestrian appearance features and the contour features of the pedestrian's head, body and limbs;
the deep learning target detection network YOLOv3-SAB is an improved network based on a YOLOv3 neural network;
wherein a downsampling module (stem) replaces the downsampling convolution with a 3 × 3 kernel and stride 2 in the feature extraction part of the YOLOv3 neural network; the stem module consists of two parallel branches: one branch applies to the previous output feature a 1 × 1 convolution with stride 1 followed by a 3 × 3 convolution with stride 2, while the other branch applies a 2 × 2 max-pooling operation with stride 2; the feature maps output by the two branches are added, keeping the number of channels unchanged, to obtain the fused feature map;
a bottleneck residual module replaces the residual module in the feature extraction part of the YOLOv3 neural network, and asymmetric convolution replaces the 3 × 3, stride-1 convolution inside the bottleneck residual module; the bottleneck residual module consists of two parallel branches: one branch first compresses the number of channels with a 1 × 1 convolution of stride 1, then extracts features with the asymmetric convolution while keeping the number of channels unchanged, and finally restores the number of channels with another 1 × 1 convolution of stride 1; the other branch is a shortcut connection; the feature maps output by the two branches are added, keeping the number of channels unchanged, to obtain the fused feature map;
the asymmetric convolution that replaces the 3 × 3, stride-1 convolution in the bottleneck residual module applies to the previous output feature map three parallel convolution operations with kernel sizes of 3 × 3, 3 × 1 and 1 × 3, each followed by normalization; the three resulting feature maps are fused by addition, keeping the number of channels unchanged, and finally activated with the ReLU activation function;
step 2: training a deep learning target detection network YOLOv 3-SAB;
the method specifically comprises the following substeps:
step 2.1: acquiring pedestrian infrared video data under different viewing angles and scenes using a plurality of infrared cameras, extracting the video data frame by frame into infrared picture data, and selecting infrared picture data to build an infrared pedestrian detection data set; performing enhancement processing on the infrared picture data to increase the gray value of pedestrians and reduce the gray value of the background; and annotating the position, size and category information of the pedestrian targets in the infrared picture data, wherein the pedestrian category is assigned on the basis of pedestrian appearance, contour and size information;
step 2.2: clustering the annotated pedestrian sizes using a K-means clustering algorithm to obtain prior aiming frame sizes for nighttime infrared pedestrian detection, and replacing the original anchor frame sizes in the deep learning target detection network YOLOv3-SAB with the prior aiming frame sizes obtained by clustering;
step 2.3: splitting the infrared pedestrian detection data set acquired in step 2.1 into a training data set and a test data set according to a preset ratio; training the deep learning target detection network YOLOv3-SAB with the training data set, wherein during training the network generates and saves a model weight file after every iteration, and training stops when the network reaches a convergence state; using each model weight file as the pedestrian detection weight, the network detects all infrared picture data in the test data set, and all detected pedestrian information is saved; the detection precision is obtained by comparing the detection results with the real pedestrian information annotated in step 2.1, and the model weight file with the highest detection precision is selected as the model weight of the deep learning target detection network YOLOv3-SAB;
and step 3: inputting the infrared video images collected in real time into the deep learning target detection network YOLOv3-SAB, and performing real-time pedestrian detection on the video data using the model weight file of step 2.3 as the pedestrian detection weight.
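As an illustrative aid (not part of the claims), the stem downsampling module, the asymmetric convolution and the bottleneck residual module described in claim 1 can be sketched in PyTorch roughly as follows; the channel widths, the placement of batch normalization and ReLU, and the class names are assumptions, since the claim does not fix them:

```python
import torch
import torch.nn as nn

class Stem(nn.Module):
    """Two parallel branches: a 1x1 (stride 1) conv followed by a 3x3
    (stride 2) conv, and a 2x2 (stride 2) max pool; the two outputs are
    added so the channel count is unchanged while resolution halves."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 1, stride=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2, stride=2)

    def forward(self, x):
        return self.conv(x) + self.pool(x)

class AsymConv(nn.Module):
    """Parallel 3x3, 3x1 and 1x3 convolutions, each followed by
    normalization; the three feature maps are fused by addition
    (channels unchanged) and activated with ReLU."""
    def __init__(self, channels):
        super().__init__()
        def branch(kernel, pad):
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel, padding=pad, bias=False),
                nn.BatchNorm2d(channels))
        self.b33 = branch(3, 1)
        self.b31 = branch((3, 1), (1, 0))
        self.b13 = branch((1, 3), (0, 1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.b33(x) + self.b31(x) + self.b13(x))

class BottleneckResidual(nn.Module):
    """A 1x1 conv compresses channels, the asymmetric conv extracts
    features, a 1x1 conv restores channels; the shortcut branch adds
    the input back, keeping the channel count unchanged."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, stride=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            AsymConv(reduced),
            nn.Conv2d(reduced, channels, 1, stride=1, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.body(x)
```

On a 416 × 416 input, the stem halves the spatial resolution to 208 × 208 while preserving the channel count, and the bottleneck residual block preserves both.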
2. The improved YOLOv3-based nighttime infrared pedestrian detection method of claim 1, wherein: in step 2.1, the gray values of the infrared picture data are converted according to the piecewise linear function shown in formula (1) below;
[Formula (1): R = T(r), a piecewise linear gray-level transformation; the original formula image is not reproduced here.]
wherein r represents the gray value of the infrared picture data before conversion, and R represents the gray value after conversion.
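Because the image of formula (1) is not reproduced, the following NumPy sketch only illustrates the general shape of such a piecewise linear gray transform that brightens the pedestrian range and darkens the background; the breakpoints r1, r2, s1, s2 are purely illustrative assumptions, not the patent's actual coefficients:

```python
import numpy as np

def piecewise_linear_enhance(img, r1=70, r2=160, s1=30, s2=220):
    """Map gray value r to R with three linear segments: compress
    [0, r1) toward darker values, stretch [r1, r2] to [s1, s2], and
    map (r2, 255] to (s2, 255]. Breakpoints are illustrative only."""
    img = img.astype(np.float32)
    out = np.empty_like(img)
    low = img < r1
    mid = (img >= r1) & (img <= r2)
    high = img > r2
    out[low] = img[low] * (s1 / r1)
    out[mid] = s1 + (img[mid] - r1) * ((s2 - s1) / (r2 - r1))
    out[high] = s2 + (img[high] - r2) * ((255 - s2) / (255 - r2))
    return np.clip(out, 0, 255).astype(np.uint8)
```

With these example breakpoints, a gray value of 70 maps to 30 (background compressed) while 160 maps to 220 (pedestrian range stretched).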
3. The improved YOLOv3-based nighttime infrared pedestrian detection method of claim 1, wherein the clustering of the annotated pedestrian sizes with a K-means clustering algorithm in step 2.2 is specifically realized by the following substeps:
step 2.2.1: randomly selecting K marking frames from the annotated data set as initial prior aiming frames, the K initial prior aiming frames serving as the K cluster centers;
step 2.2.2: using 1 − IoU, where IoU is the intersection-over-union of a marking frame and a prior aiming frame, as the distance metric, and assigning each marking frame to the cluster of the prior aiming frame closest to it;
step 2.2.3: calculating the mean width and height of all marking frames in each cluster, and updating the width and height of each cluster center accordingly;
step 2.2.4: repeating steps 2.2.2 and 2.2.3 until the width and height of the cluster centers no longer change, and taking the resulting cluster-center sizes as the prior aiming frame sizes for nighttime infrared pedestrian detection.
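Steps 2.2.1 to 2.2.4 can be sketched as follows in NumPy; the shared-corner IoU for (width, height) pairs and the fixed random seed are standard assumptions for anchor clustering, not details fixed by the claim:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming all boxes share a common
    top-left corner, as is usual when clustering anchor sizes."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over annotated (w, h) sizes with 1 - IoU as the distance
    (steps 2.2.1-2.2.4); returns k prior aiming frame sizes."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]  # step 2.2.1
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, anchors), axis=1)  # step 2.2.2
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])       # step 2.2.3
        if np.allclose(new, anchors):                             # step 2.2.4
            break
        anchors = new
    return anchors
```

For two well-separated size clusters the centers converge to the per-cluster mean widths and heights.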
4. The improved YOLOv3-based nighttime infrared pedestrian detection method of claim 1, wherein: in step 2.2, a CIoU target detection bounding-box regression loss function replaces the original IoU bounding-box regression loss function of the YOLOv3 neural network and serves as the bounding-box regression loss function of the deep learning target detection network YOLOv3-SAB;
the bounding box regression loss function of the deep learning target detection network YOLOv3-SAB is as follows:
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + αν    (2)
ν = (4/π²) · (arctan(w^gt / h^gt) − arctan(w / h))²    (3)
α = ν / ((1 − IoU) + ν)    (4)
wherein α represents a weight parameter and ν an aspect-ratio consistency parameter; h^gt and w^gt represent the height and width of the real frame; h and w the height and width of the prediction frame; b and b^gt the center points of the prediction frame and the real frame; ρ(·) the Euclidean distance; c the diagonal length of the minimum enclosing rectangle of the prediction frame b and the real frame b^gt; and IoU the intersection-over-union of the prediction frame and the real frame.
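Formulas (2) to (4) can be sketched in plain Python for two boxes given as (cx, cy, w, h); the guard against the 0/0 case when the boxes coincide is an implementation assumption:

```python
import math

def ciou_loss(box, box_gt):
    """CIoU bounding-box regression loss per formulas (2)-(4).
    Boxes are (center_x, center_y, width, height) tuples."""
    def corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    x1, y1, x2, y2 = corners(box)
    g1, t1, g2, t2 = corners(box_gt)
    # IoU of prediction frame and real frame
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, t2) - max(y1, t1))
    inter = iw * ih
    union = box[2] * box[3] + box_gt[2] * box_gt[3] - inter
    iou = inter / union
    # squared center distance rho^2 and enclosing-rectangle diagonal c^2
    rho2 = (box[0] - box_gt[0]) ** 2 + (box[1] - box_gt[1]) ** 2
    cw = max(x2, g2) - min(x1, g1)
    ch = max(y2, t2) - min(y1, t1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency nu (3) and weight alpha (4)
    nu = (4 / math.pi ** 2) * (math.atan(box_gt[2] / box_gt[3])
                               - math.atan(box[2] / box[3])) ** 2
    alpha = nu / ((1 - iou) + nu) if nu > 0 else 0.0  # guard 0/0 at IoU = 1
    return 1 - iou + rho2 / c2 + alpha * nu           # formula (2)
```

Identical boxes give a loss of exactly 0, and any center offset or shape mismatch increases the loss, which is what accelerates convergence relative to the plain IoU loss.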
5. The improved YOLOv3-based nighttime infrared pedestrian detection method of claim 1, wherein: in step 2.3, the detection precision is obtained by comparing the detection results with the real pedestrian information annotated in step 2.1, and the calculation mainly covers: precision (Precision), recall (Recall), average precision (AP) and frames per second (FPS);
Precision = TP / (TP + FP)    (5)
Recall = TP / (TP + FN)    (6)
AP = ∫_0^1 P(R) dR    (7)
wherein TP represents the number of correctly detected pedestrians across all nighttime infrared pedestrian images in the test data set, FP represents the number of falsely detected pedestrians, and FN represents the number of undetected pedestrians; in formula (7), P represents Precision and R represents Recall.
6. The improved YOLOv3-based nighttime infrared pedestrian detection method of any one of claims 1-5, wherein: in step 3, using the model weight file of step 2.3 as the pedestrian detection weight, the deep learning target detection network YOLOv3-SAB classifies and regresses pedestrians in the video data in real time, marks the classified and regressed pedestrian information in the video data in the form of marking frames, and outputs the detected video data as the final pedestrian detection result.
7. A night infrared pedestrian detection system based on improved YOLOv3 is characterized by comprising the following modules:
module 1 is used for constructing a deep learning target detection network YOLOv3-SAB responsible for extracting pedestrian appearance features and the contour features of the pedestrian's head, body and limbs;
the deep learning target detection network YOLOv3-SAB is an improved network based on a YOLOv3 neural network;
wherein a downsampling module (stem) replaces the downsampling convolution with a 3 × 3 kernel and stride 2 in the feature extraction part of the YOLOv3 neural network; the stem module consists of two parallel branches: one branch applies to the previous output feature a 1 × 1 convolution with stride 1 followed by a 3 × 3 convolution with stride 2, while the other branch applies a 2 × 2 max-pooling operation with stride 2; the feature maps output by the two branches are added, keeping the number of channels unchanged, to obtain the fused feature map;
a bottleneck residual module replaces the residual module in the feature extraction part of the YOLOv3 neural network, and asymmetric convolution replaces the 3 × 3, stride-1 convolution inside the bottleneck residual module; the bottleneck residual module consists of two parallel branches: one branch first compresses the number of channels with a 1 × 1 convolution of stride 1, then extracts features with the asymmetric convolution while keeping the number of channels unchanged, and finally restores the number of channels with another 1 × 1 convolution of stride 1; the other branch is a shortcut connection; the feature maps output by the two branches are added, keeping the number of channels unchanged, to obtain the fused feature map;
the asymmetric convolution that replaces the 3 × 3, stride-1 convolution in the bottleneck residual module applies to the previous output feature map three parallel convolution operations with kernel sizes of 3 × 3, 3 × 1 and 1 × 3, each followed by normalization; the three resulting feature maps are fused by addition, keeping the number of channels unchanged, and finally activated with the ReLU activation function;
module 2 is used for training the deep learning target detection network YOLOv3-SAB;
and specifically comprises the following sub-modules:
module 2.1 is used for acquiring pedestrian infrared video data under different viewing angles and scenes using a plurality of infrared cameras, extracting the video data frame by frame into infrared picture data, and selecting infrared picture data to build an infrared pedestrian detection data set; performing enhancement processing on the infrared picture data to increase the gray value of pedestrians and reduce the gray value of the background; and annotating the position, size and category information of the pedestrian targets in the infrared picture data, wherein the pedestrian category is assigned on the basis of pedestrian appearance, contour and size information;
module 2.2 is configured to cluster the annotated pedestrian sizes using a K-means clustering algorithm, obtain prior aiming frame sizes for nighttime infrared pedestrian detection by clustering, and replace the original anchor frame sizes in the deep learning target detection network YOLOv3-SAB with the clustered prior aiming frame sizes;
module 2.3 is used for splitting the infrared pedestrian detection data set acquired in module 2.1 into a training data set and a test data set according to a preset ratio; training the deep learning target detection network YOLOv3-SAB with the training data set, wherein during training the network generates and saves a model weight file after every iteration, and training stops when the network reaches a convergence state; using each model weight file as the pedestrian detection weight, the network detects all infrared picture data in the test data set, and all detected pedestrian information is saved; the detection precision is obtained by comparing the detection results with the real pedestrian information annotated in module 2.1, and the model weight file with the highest detection precision is selected as the model weight of the deep learning target detection network YOLOv3-SAB;
and module 3 is used for inputting the infrared video images acquired in real time into the deep learning target detection network YOLOv3-SAB, and performing real-time pedestrian detection on the video data using the model weight file of module 2.3 as the pedestrian detection weight of the deep learning target detection network YOLOv3-SAB.
CN202111662884.3A 2021-12-31 2021-12-31 Night infrared pedestrian detection method and system based on improved YOLOv3 Pending CN114332942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111662884.3A CN114332942A (en) 2021-12-31 2021-12-31 Night infrared pedestrian detection method and system based on improved YOLOv3


Publications (1)

Publication Number Publication Date
CN114332942A true CN114332942A (en) 2022-04-12

Family

ID=81020008



Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393892A (en) * 2022-07-20 2022-11-25 东北电力大学 Crowd scene pedestrian detection method based on improved double-candidate-frame cross replacement strategy and loss function
CN115393892B (en) * 2022-07-20 2023-08-04 东北电力大学 Congestion scene pedestrian detection method based on improved double-candidate-frame cross replacement strategy and loss function
CN115690565A (en) * 2022-09-28 2023-02-03 大连海洋大学 Target detection method for cultivated fugu rubripes by fusing knowledge and improving YOLOv5
CN115690565B (en) * 2022-09-28 2024-02-20 大连海洋大学 Method for detecting cultivated takifugu rubripes target by fusing knowledge and improving YOLOv5


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination