CN112488165A - Infrared pedestrian identification method and system based on deep learning model - Google Patents
- Publication number
- CN112488165A (application number CN202011298623.3A)
- Authority
- CN
- China
- Prior art keywords: model, training, infrared, detection, video
- Prior art date
- Legal status
- Pending
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Combinations of networks
- G06T7/11 — Region-based segmentation
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/56 — Extraction of image or video features relating to colour
- G06V20/40 — Scenes; Scene-specific elements in video content
- G06V40/20 — Movements or behaviour, e.g. gesture recognition
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30196 — Human being; Person
- G06T2207/30204 — Marker
Abstract
The invention discloses an infrared pedestrian recognition method and system based on a deep learning model. A data preprocessing module acquires infrared images containing pedestrians and preprocesses them; the preprocessed infrared images are manually annotated and divided into a training set and a test set for the detection model according to a set proportion. A detection network and model training module builds the network model, optimizes the detection method, tunes parameters, exports the model according to its detection performance, and determines the final model. An external interaction module establishes an interactive window that plays the video to be processed and the processed video side by side in synchronization, so that the system can detect pedestrians quickly and accurately in complete darkness.
Description
Technical Field
The invention relates to the technical field of infrared imaging, in particular to an infrared pedestrian identification method and system based on a deep learning model.
Background
The neural network algorithm simulates human thinking: the constructed network model plays the role of the human brain, and the training set is the learning material supplied by people. The training set is fed to the computer repeatedly to train a model, and the model is corrected with the test set until it is as close as possible to the ideal output. Such a model can identify pedestrians in infrared images to the greatest possible extent. Since 2005, the training libraries for pedestrian detection have grown toward large scale, detection accuracy has become practical, and detection speed has approached real time. However, existing pedestrian detection data sets contain few infrared thermal images, and few algorithms identify pedestrians in infrared images. In general, a pedestrian recognizer based on ordinary visible-light images depends on picture quality and has difficulty recognizing pedestrians accurately when light is insufficient at night, so an infrared recognition system is needed to assist.
Pedestrian detection has developed for more than a decade. The current mainstream includes traditional algorithms such as Haar features + AdaBoost and HOG features + SVM, while deep learning is developing rapidly because it better matches the way humans think, benefits from the open-source code model, and rides the recent rapid growth of big data and the continuous optimization of network frameworks. Pedestrian detection is widely applied in the computing field, but traditional pedestrian recognition methods are limited by insufficient illumination, complex backgrounds, and large variation in human body shape, and therefore cannot identify pedestrians accurately and reliably. What is required is fast, accurate pedestrian detection in complete darkness, which calls for combining infrared thermal imaging and pedestrian recognition with an advanced deep learning algorithm.
For example, Chinese patent CN106407948A, published February 15, 2017, describes a pedestrian detection and identification method based on an infrared night vision device, comprising the following steps: collecting and storing video frames through the infrared night vision device, updating the latest three frames of valid data in real time, and preprocessing the collected three video frames; performing area matching on the processed three frames, computing a three-frame difference after image compensation is completed, and applying morphological dilation and erosion to the images; and identifying pedestrians in the image according to geometric features and motion-rate features. The method improves on existing pedestrian detection with a refined three-frame difference method, extracts pedestrian outlines well, and classifies targets by combining the pedestrians' geometric and motion-rate features, i.e., it can identify pedestrians moving on a road. However, it does not consider how to detect pedestrians quickly and accurately in complete darkness, and it cannot identify and detect pedestrians quickly and accurately when visible light is lacking.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing pedestrian recognition systems cannot quickly and accurately detect pedestrians in the absence of light. The invention provides an infrared pedestrian identification method and system based on a deep learning model that can quickly and accurately detect pedestrians with no light at all.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an infrared pedestrian recognition method based on a deep learning model comprises the following steps:
s1: acquiring data and preprocessing the data: acquiring infrared images containing pedestrians, annotating them, and dividing them into a training set and a test set for the detection model according to a set proportion;
s2: building a neural network: segmenting the picture using the target detection network Faster R-CNN, and completing feature extraction and recognition of the target picture using VGG-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. An infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to strengthen the robustness of the network, and since street shooting cannot be recorded well, organized volunteers are filmed in staged shots. 10,000 well-exposed pictures are extracted as the picture set, bounding boxes are manually drawn on them to form the data set, all data are divided into a training set and a test set at a ratio of 3:1, and finally the training set is converted into a suitable format and trained with TensorFlow. Features are extracted from the pictures with a convolutional neural network; the network is built around VGG-16, and once the framework is constructed, the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, recognition speed and accuracy before and after each change are observed, and the effect is repeatedly tested while the network is modified. Training has no natural endpoint, and more iterations are not necessarily better, because the network can both underfit and overfit; therefore loss curves on the training and test sets must be plotted, and a model from a well-performing batch selected for export.
The effect of the model must then be tested, including the type I and type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis. After the model is built and optimized, a basic GUI is produced with PyQt5 and the Tkinter library, and the model's functions and feedback results are realized and displayed through the interactive interface. A menu bar, status bar, and toolbar provide the basic functions, such as importing files, closing files, and capturing images; the interface is then further designed to play the video to be processed and the processed video side by side in synchronization.
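The type I / type II error-rate evaluation mentioned above can be expressed as a small helper. This is an illustrative sketch, not code from the patent; the function name and the per-sample boolean representation are assumptions.

```python
def error_rates(predictions, ground_truth):
    """Type I (false-positive) and type II (false-negative) error rates.

    predictions / ground_truth: parallel lists of booleans per test sample,
    True meaning a pedestrian was detected / is actually present.
    """
    fp = sum(1 for p, g in zip(predictions, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predictions, ground_truth) if not p and g)
    negatives = sum(1 for g in ground_truth if not g)
    positives = sum(1 for g in ground_truth if g)
    type1 = fp / negatives if negatives else 0.0  # false alarms on negatives
    type2 = fn / positives if positives else 0.0  # misses on positives
    return type1, type2
```

In practice these rates, computed on the test set, guide the regularization and structure adjustments described above.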
Preferably, the step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labelling bounding boxes on the pictures using LabelImg;
s13: making a VOC data set and converting it into TFRecord format;
s14: and finishing the pretreatment. The preprocessing of the picture is beneficial to the identification and extraction of the features in the picture later.
Preferably, the step S13 includes the following steps: the annotated pedestrian picture set is organized into a data set, and the annotated data set is converted into TFRecord files. The annotation xml files are converted into csv format; using xml_to_csv.py, all data are divided into a training set and a test set at a ratio of 3:1, producing a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
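The 3:1 split and csv generation described above can be sketched with the standard library alone. The row layout mirrors the usual Pascal-VOC-annotation-to-csv convention and is an assumption, as are the fixed seed and function names:

```python
import csv
import random

def split_annotations(rows, train_ratio=0.75, seed=0):
    """Divide annotation rows 3:1 into training and test sets (step S13)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def write_csv(path, rows):
    """Write one split to disk, e.g. train.csv or eval.csv."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "xmin", "ymin", "xmax", "ymax", "class"])
        writer.writerows(rows)
```

The two csv files would then be fed to a TFRecord-generation script in the usual TensorFlow object-detection workflow.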
Preferably, the step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, the feature extraction network being VGG-16;
s23: applying a 3 × 3 convolution to the shared feature map to obtain a 256 × H × W feature map, and obtaining H × W × 9 anchors after a series of processing steps;
s24: post-processing the anchors to obtain the k boxes with the highest scores;
s25: mapping the k candidate boxes from the original image onto the shared feature map;
s26: obtaining a set of standard-size features from the candidate boxes on the shared feature map via RoI pooling;
s27: re-classifying and regressing the RoI features. VGG-16 is an excellent convolutional neural network framework. VGG-16 recognizes a single target picture, and does so quickly, but it cannot handle several targets appearing in one picture; the picture must therefore first be segmented, i.e., divided into specific regions with distinct properties, from which the targets of interest are proposed. One of the target detection networks, Faster R-CNN, belongs to the R-CNN lineage: it evolved through R-CNN and Fast R-CNN and already has strong recognition capability. It is divided into two parts, an image recognition part (Fast R-CNN) and a candidate-box selection part (the RPN).
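The H × W × 9 anchors of step S23 come from placing 3 scales × 3 aspect ratios at every feature-map position. A minimal sketch follows; the stride, scales, and ratios are common RPN defaults rather than values stated in the patent:

```python
import itertools

def generate_anchors(h, w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return h * w * 9 anchor boxes as (x1, y1, x2, y2) in image coords."""
    anchors = []
    for y, x in itertools.product(range(h), range(w)):
        cx = x * stride + stride / 2  # anchor centre mapped back to the image
        cy = y * stride + stride / 2
        for scale, ratio in itertools.product(scales, ratios):
            aw = scale * ratio ** 0.5  # width grows with the aspect ratio
            ah = scale / ratio ** 0.5  # height shrinks, preserving area
            anchors.append((cx - aw / 2, cy - ah / 2,
                            cx + aw / 2, cy + ah / 2))
    return anchors
```

Each anchor preserves the area of its scale while varying shape, which is how the RPN covers pedestrians of different sizes and postures from a single feature map.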
Preferably, the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] · L_loc(t^u, v)
where t^u represents the predicted regression result for class u, u represents the true category, and v represents the ground-truth box. The loss function combines the classification loss and the regression loss: the classification loss is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss is essentially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K classes plus 1 background class.
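Numerically, the multi-task loss can be sketched as follows. The smooth-L1 form for L_loc follows the Fast R-CNN convention and is an assumption here, as are the function names:

```python
import math

def smooth_l1(x):
    """Fast R-CNN's smooth-L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def detection_loss(p, u, t_u, v, beta=1.0):
    """p: predicted class probabilities; u: true class (0 = background);
    t_u: predicted box offsets for class u; v: ground-truth offsets."""
    l_cls = -math.log(p[u])  # log loss of the true class probability
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    # [u >= 1] gates the regression term: background RoIs get no box loss
    return l_cls + (beta * l_loc if u >= 1 else 0.0)
```

A perfectly classified, perfectly regressed foreground RoI therefore contributes zero loss, and a background RoI contributes only its classification term.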
Preferably, the step S3 includes the following steps:
s31: selecting 25% of the RoIs from each image as foreground;
s32: enlarging the data set by random horizontal flipping;
s33: increasing the RoIs of each image to about 2000 at test time;
s34: calculating loss functions of the training set and the test set;
s35: repeating the above steps, calculating the loss functions of the training set and the test set after each training round. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are picked as foreground, all of them having IoU values greater than 0.5 with a ground-truth box. Only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs are obtained per image at test time.
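The mini-batch sampling just described (64 RoIs per image, roughly 25% foreground at IoU > 0.5) can be sketched as below. The pair representation, seed, and function name are illustrative assumptions:

```python
import random

def sample_rois(rois_with_iou, per_image=64, fg_fraction=0.25, seed=0):
    """rois_with_iou: list of (roi, iou_with_ground_truth) for one image."""
    rng = random.Random(seed)
    fg = [roi for roi, iou in rois_with_iou if iou > 0.5]   # foreground
    bg = [roi for roi, iou in rois_with_iou if iou <= 0.5]  # background
    n_fg = min(int(per_image * fg_fraction), len(fg))       # about 25%
    n_bg = min(per_image - n_fg, len(bg))                   # fill the rest
    return rng.sample(fg, n_fg) + rng.sample(bg, n_bg)
```

Capping the foreground fraction keeps the classifier from being swamped by the far more numerous background proposals.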
Preferably, the step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting up a playback window for the video to be processed and one for the processed video;
s44: playing the video to be processed and the processed video side by side in synchronization. The program's functions can be operated through a visual interactive interface, and its results are displayed visually; for example, the video to be processed and the processed video are compared and played synchronously. The interface is designed to be attractive and concise while displaying the results, and the GUI must respond quickly to the user's operations and feed results back. The design weighs factors such as reducing the user's memory burden, keeping the interface simple but fully functional.
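Stripped of the PyQt5 widgets, the synchronized side-by-side playback of step S44 reduces to stepping the two frame streams in lockstep, stopping at the shorter one so the windows never drift apart. A GUI-free sketch of that pairing logic (names are illustrative):

```python
def paired_frames(raw_frames, processed_frames):
    """Yield (raw, processed) frame pairs for synchronized display."""
    for raw, processed in zip(raw_frames, processed_frames):
        # each pair would be pushed to the two playback windows together
        yield raw, processed
```

In the actual interface, each yielded pair would be rendered into the two video windows on the same timer tick.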
An infrared pedestrian recognition system based on a deep learning model, controlled by applying the method, comprises the following modules:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and dividing them into a training set and a test set for the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection performance, and determining the final model;
an external interaction module: used for establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system achieves rapid and accurate detection of pedestrians in complete darkness, and the external interaction module makes it more convenient for users to operate.
The substantial effects of the invention are as follows: infrared images containing pedestrians are acquired and preprocessed by the data preprocessing module, manually annotated, and divided into a training set and a test set for the detection model according to a set proportion; the network model is built by the detection network and model training module, which optimizes the detection method, adjusts parameters, exports the model according to its detection performance, and determines the final model; and an interactive window is established by the external interaction module to play the video to be processed and the processed video side by side in synchronization, so that the system detects pedestrians quickly and accurately in complete darkness.
Drawings
FIG. 1 is a flow chart of the overall implementation steps of the present invention.
Fig. 2 is an original infrared image.
Fig. 3 is an image after marking.
Fig. 4 is a schematic diagram of the present invention.
Detailed Description
The following provides a more detailed description of the present invention, with reference to the accompanying drawings.
An infrared pedestrian recognition method based on a deep learning model, as shown in fig. 1, includes the following steps:
s1: acquiring data and preprocessing the data: acquiring infrared images containing pedestrians, annotating them, and dividing them into a training set and a test set for the detection model according to a set proportion; step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labelling bounding boxes on the pictures using LabelImg; the original infrared image is shown in fig. 2, and the annotated image is shown in fig. 3;
s13: making a VOC data set and converting it into TFRecord format. Step S13 includes the following steps: the annotated pedestrian picture set is organized into a data set, and the annotated data set is converted into TFRecord files. The annotation xml files are converted into csv format; using xml_to_csv.py, all data are divided into a training set and a test set at a ratio of 3:1, producing a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
S14: and finishing the pretreatment. The preprocessing of the picture is beneficial to the identification and extraction of the features in the picture later.
S2: building a neural network: segmenting the picture using the target detection network Faster R-CNN, and completing feature extraction and recognition of the target picture using VGG-16; step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, the feature extraction network being VGG-16;
s23: applying a 3 × 3 convolution to the shared feature map to obtain a 256 × H × W feature map, and obtaining H × W × 9 anchors after a series of processing steps;
s24: post-processing the anchors to obtain the k boxes with the highest scores;
s25: mapping the k candidate boxes from the original image onto the shared feature map;
s26: obtaining a set of standard-size features from the candidate boxes on the shared feature map via RoI pooling;
s27: re-classifying and regressing the RoI features. VGG-16 is an excellent convolutional neural network framework. VGG-16 recognizes a single target picture, and does so quickly, but it cannot handle several targets appearing in one picture; the picture must therefore first be segmented, i.e., divided into specific regions with distinct properties, from which the targets of interest are proposed. One of the target detection networks, Faster R-CNN, belongs to the R-CNN lineage: it evolved through R-CNN and Fast R-CNN and already has strong recognition capability. It is divided into two parts, an image recognition part (Fast R-CNN) and a candidate-box selection part (the RPN).
S3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set; the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] · L_loc(t^u, v)
where t^u represents the predicted regression result for class u, u represents the true category, and v represents the ground-truth box. The loss function combines the classification loss and the regression loss: the classification loss is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss is essentially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K classes plus 1 background class. Step S3 includes the following steps:
s31: selecting 25% of the RoIs from each image as foreground;
s32: enlarging the data set by random horizontal flipping;
s33: increasing the RoIs of each image to about 2000 at test time;
s34: calculating loss functions of the training set and the test set;
s35: repeating the above steps, calculating the loss functions of the training set and the test set after each training round. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are picked as foreground, all of them having IoU values greater than 0.5 with a ground-truth box. Only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs are obtained per image at test time.
S4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video. Step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting up a playback window for the video to be processed and one for the processed video;
s44: playing the video to be processed and the processed video side by side in synchronization. The program's functions can be operated through a visual interactive interface, and its results are displayed visually; for example, the video to be processed and the processed video are compared and played synchronously. The interface is designed to be attractive and concise while displaying the results, and the GUI must respond quickly to the user's operations and feed results back. The design weighs factors such as reducing the user's memory burden, keeping the interface simple but fully functional.
An infrared pedestrian recognition system based on a deep learning model, controlled by applying the method as shown in fig. 4, comprises a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and dividing them into a training set and a test set for the detection model according to a set proportion; a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection performance, and determining the final model; and an external interaction module: used for establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system achieves rapid and accurate detection of pedestrians in complete darkness, and the external interaction module makes it more convenient for users to operate.
In this embodiment, an infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to strengthen the robustness of the network, and since street shooting cannot be recorded well, organized volunteers are filmed in staged shots. 10,000 well-exposed pictures are extracted as the picture set, bounding boxes are manually drawn on them to form the data set, all data are divided into a training set and a test set at a ratio of 3:1, and finally the training set is converted into a suitable format and trained with TensorFlow. Features are extracted from the pictures with a convolutional neural network; the network is built around VGG-16, and once the framework is constructed, the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, recognition speed and accuracy before and after each change are observed, and the effect is repeatedly tested while the network is modified. Training has no natural endpoint, and more iterations are not necessarily better, because the network can both underfit and overfit; therefore loss curves on the training and test sets must be plotted, and a model from a well-performing batch selected for export. The effect of the model must then be tested, including the type I and type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis.
After the model is built and optimized, a basic GUI is produced with PyQt5 and the Tkinter library. The model's functions and feedback results are exposed through an interactive interface: a menu bar, a status bar and a toolbar are set up so that the interface supports basic operations such as importing a file, closing a file and capturing an image, and a further design step adds synchronized side-by-side playback of the video to be processed and the processed video.
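The synchronized contrast playback reduces to advancing the two frame streams in lockstep. A minimal sketch of that pairing logic, with the GUI widgets omitted and all names illustrative:

```python
# Sketch of lockstep frame pairing for the side-by-side playback windows.
# The two arguments stand in for the raw and processed frame streams.
from itertools import zip_longest

def paired_frames(raw_frames, processed_frames):
    """Yield (raw, processed) frame pairs so both playback windows advance
    in lockstep; the shorter stream is padded with None once exhausted."""
    yield from zip_longest(raw_frames, processed_frames)
```

In the actual interface each pair would be rendered into the two video widgets on a shared timer tick, which is what keeps the playback synchronized.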
The above examples express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention.
Claims (8)
1. An infrared pedestrian recognition method based on a deep learning model is characterized by comprising the following steps:
s1: acquiring data and preprocessing the data: acquiring an infrared image containing pedestrians, marking the infrared image, and dividing the infrared image into a training set and a test set of a detection model according to a set proportion;
s2: building a neural network: carrying out image segmentation on the picture by using the target detection network Faster-RCNN, and completing feature extraction and recognition of the target picture by using Vgg-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video.
2. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S1 includes the steps of:
s11: after acquiring the pedestrian video source data, splitting the video source into frames using OpenCV;
s12: manually drawing and labeling bounding boxes on the pictures using LabelImg;
s13: making a VOC data set and converting it into TFRecord format;
s14: completing the preprocessing.
3. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 2, wherein the step S13 specifically comprises: organizing the annotated pedestrian pictures into a data set and converting the annotated data set into a TFRecord-format file: the annotation xml files are converted into csv format, all data are divided into a training set and a test set at a ratio of 3:1 using xml_to_csv.py, generating a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated.
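A minimal sketch of the xml-to-csv flattening step described in this claim, assuming a standard VOC annotation layout (the function name and row format are illustrative, not taken from the patent):

```python
# Flatten VOC-style LabelImg annotations into csv-ready rows, one row per
# bounding box, as an intermediate step before TFRecord generation.
import xml.etree.ElementTree as ET

def voc_xml_to_rows(xml_text):
    """Return rows of (filename, width, height, class, xmin, ymin, xmax, ymax)
    for every <object> in one VOC annotation file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows
```

Rows in this shape map directly onto the columns a TFRecord-writing script typically expects.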
4. The infrared pedestrian recognition method based on the deep learning model according to claim 1 or 2, wherein the step S2 includes the steps of:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, wherein Vgg_16 is selected as the feature extraction network;
s23: convolving the shared feature map with a 3 × 3 kernel to obtain a 256 × H × W feature map, and deriving H × W × 9 Anchors through a series of processing steps;
s24: post-processing each Anchor to keep the k boxes with the highest scores;
s25: mapping the k candidate boxes back to the original image and onto the shared feature map;
s26: extracting a fixed-size set of features for each candidate box on the shared feature map through RoI pooling;
s27: classifying and regressing the RoI features again.
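The H × W × 9 Anchors of step s23 come from combining 3 scales and 3 aspect ratios at each feature-map cell. A sketch under assumed default values (the stride, scale and ratio constants are illustrative, not fixed by the claim):

```python
# Enumerate Faster-RCNN-style anchors: 9 boxes (3 scales x 3 ratios)
# centred on every cell of an H x W feature map.
import itertools

def generate_anchors(h, w, stride=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return H * W * 9 anchor boxes as (xmin, ymin, xmax, ymax) tuples
    in original-image coordinates."""
    anchors = []
    for y, x in itertools.product(range(h), range(w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for scale, ratio in itertools.product(scales, ratios):
            side = scale * stride          # anchor side length in pixels
            aw = side * (ratio ** 0.5)     # width stretched by sqrt(ratio)
            ah = side / (ratio ** 0.5)     # height shrunk by sqrt(ratio)
            anchors.append((cx - aw / 2, cy - ah / 2, cx + aw / 2, cy + ah / 2))
    return anchors
```

Each (scale, ratio) pair preserves the anchor's area while varying its shape, which is why exactly 9 boxes arise per cell.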
6. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1 or 5, wherein the step S3 comprises the steps of:
s31: selecting 25% of the ROIs from each image;
s32: augmenting the data set by random horizontal flipping;
s33: increasing the ROIs of each image to at least 2000;
s34: calculating loss functions of the training set and the test set;
s35: and repeating the steps, and calculating the loss functions of the training set and the test set after each training.
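The random horizontal flipping of step S32 must also mirror the bounding-box annotations, not just the pixels. A minimal sketch (the helper name and box format are assumptions):

```python
# Mirror bounding boxes to match a horizontally flipped training image.
def hflip_boxes(boxes, image_width):
    """Reflect (xmin, ymin, xmax, ymax) boxes about the vertical centre line
    of an image of the given width."""
    return [(image_width - xmax, ymin, image_width - xmin, ymax)
            for (xmin, ymin, xmax, ymax) in boxes]
```

Note that the flipped xmin comes from the original xmax (and vice versa), so the box corners stay in (min, max) order after the reflection.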
7. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S4 includes the steps of:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting a video playing window to be processed and a processed video playing window;
s44: and synchronously comparing and playing the video to be processed and the processed video.
8. An infrared pedestrian recognition system based on a deep learning model, using the method of any one of claims 1-7, comprising:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and then dividing them into a training set and a test set for the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, tuning parameters, exporting the model according to its detection performance, and determining the final model;
an external interaction module: used for establishing an interactive window to realize synchronized side-by-side playback of the video to be processed and the processed video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011298623.3A CN112488165A (en) | 2020-11-18 | 2020-11-18 | Infrared pedestrian identification method and system based on deep learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112488165A true CN112488165A (en) | 2021-03-12 |
Family
ID=74931765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011298623.3A Pending CN112488165A (en) | 2020-11-18 | 2020-11-18 | Infrared pedestrian identification method and system based on deep learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488165A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113405667A (en) * | 2021-05-20 | 2021-09-17 | 湖南大学 | Infrared thermal human body posture identification method based on deep learning |
CN114299429A (en) * | 2021-12-24 | 2022-04-08 | 宁夏广天夏电子科技有限公司 | Human body recognition method, system and device based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019010147A1 (en) * | 2017-07-05 | 2019-01-10 | Siemens Aktiengesellschaft | Semi-supervised iterative keypoint and viewpoint invariant feature learning for visual recognition |
CN110472542A (en) * | 2019-08-05 | 2019-11-19 | 深圳北斗通信科技有限公司 | A kind of infrared image pedestrian detection method and detection system based on deep learning |
CN111832515A (en) * | 2020-07-21 | 2020-10-27 | 上海有个机器人有限公司 | Dense pedestrian detection method, medium, terminal and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fernandes et al. | Predicting heart rate variations of deepfake videos using neural ode | |
CN105069472B (en) | A kind of vehicle checking method adaptive based on convolutional neural networks | |
CN109284738B (en) | Irregular face correction method and system | |
CN109919977B (en) | Video motion person tracking and identity recognition method based on time characteristics | |
CN109684925B (en) | Depth image-based human face living body detection method and device | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
CN110532970B (en) | Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces | |
CN108830252A (en) | A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic | |
CN110826389B (en) | Gait recognition method based on attention 3D frequency convolution neural network | |
CN108647625A (en) | A kind of expression recognition method and device | |
CN109886153B (en) | Real-time face detection method based on deep convolutional neural network | |
CN109063643B (en) | Facial expression pain degree identification method under condition of partial hiding of facial information | |
CN109902558A (en) | A kind of human health deep learning prediction technique based on CNN-LSTM | |
CN102609724B (en) | Method for prompting ambient environment information by using two cameras | |
CN113762009B (en) | Crowd counting method based on multi-scale feature fusion and double-attention mechanism | |
CN107798279A (en) | Face living body detection method and device | |
CN114333070A (en) | Examinee abnormal behavior detection method based on deep learning | |
CN112488165A (en) | Infrared pedestrian identification method and system based on deep learning model | |
CN111666845A (en) | Small sample deep learning multi-mode sign language recognition method based on key frame sampling | |
CN114170537A (en) | Multi-mode three-dimensional visual attention prediction method and application thereof | |
CN113591825A (en) | Target search reconstruction method and device based on super-resolution network and storage medium | |
CN106529441A (en) | Fuzzy boundary fragmentation-based depth motion map human body action recognition method | |
CN113486712B (en) | Multi-face recognition method, system and medium based on deep learning | |
CN113076860B (en) | Bird detection system under field scene | |
CN111881818B (en) | Medical action fine-grained recognition device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||