CN112488165A - Infrared pedestrian identification method and system based on deep learning model - Google Patents
- Publication number
- CN112488165A (application number CN202011298623.3A)
- Authority
- CN
- China
- Prior art keywords: model, training, infrared, detection, video
- Prior art date
- Legal status
- Pending
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Combinations of networks
- G06T7/11 — Region-based segmentation
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/56 — Extraction of image or video features relating to colour
- G06V20/40 — Scenes; Scene-specific elements in video content
- G06V40/20 — Movements or behaviour, e.g. gesture recognition
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30196 — Human being; Person
- G06T2207/30204 — Marker
Abstract
The invention discloses an infrared pedestrian recognition method and system based on a deep learning model. A data preprocessing module acquires infrared images containing pedestrians and preprocesses them; the preprocessed infrared images are manually annotated and divided into a training set and a test set for the detection model according to a set proportion. A detection network and model training module builds the network model, optimizes the detection method, tunes parameters, exports the model according to its detection performance, and determines the final model. An external interaction module establishes an interactive window that plays the video to be processed and the processed video side by side in synchronization, so that the system can detect pedestrians quickly and accurately in complete darkness.
Description
Technical Field
The invention relates to the technical field of infrared imaging, in particular to an infrared pedestrian identification method and system based on a deep learning model.
Background
The neural network algorithm simulates human thinking: the constructed network model plays the role of the human brain, and the training set is the learning material supplied by people. The training set is fed to the computer repeatedly to train a model, and the model is corrected with the test set until it is as close as possible to the ideal output. Such a model can identify pedestrians in infrared images to the greatest possible extent. Since 2005, the training libraries for pedestrian detection have grown toward large scale, detection accuracy has become practical, and detection speed has approached real time. However, existing pedestrian detection data sets contain few infrared thermal images, and few algorithms identify pedestrians in infrared images. In general, a pedestrian recognizer based on ordinary visible-light images depends on picture quality and has difficulty recognizing pedestrians accurately when light is insufficient at night, so an infrared recognition system is needed to assist.
Pedestrian detection has developed for more than a decade. The current mainstream includes traditional algorithms such as Haar features + AdaBoost and HOG features + SVM, while deep learning is developing rapidly because it better matches the way humans think, benefits from the open-source code model, and rides the recent rapid growth of big data and the continuous optimization of network frameworks. Pedestrian detection is widely applied in the computing field, but traditional pedestrian recognition methods are limited by insufficient illumination, complex backgrounds, and large variation in human body shape, and therefore cannot identify pedestrians accurately and reliably. What is required is fast, accurate pedestrian detection in complete darkness, which calls for combining infrared thermal imaging and pedestrian recognition with an advanced deep learning algorithm.
For example, Chinese patent CN106407948A, published February 15, 2017, describes a pedestrian detection and identification method based on an infrared night vision device, comprising the following steps: collecting and storing video frames through the infrared night vision device, updating the latest three frames of valid data in real time, and preprocessing the collected three video frames; performing area matching on the processed three frames, computing a three-frame difference after image compensation is completed, and applying morphological dilation and erosion to the images; and identifying pedestrians in the image according to geometric features and motion-rate features. The method improves on existing pedestrian detection with a refined three-frame difference method, extracts pedestrian outlines well, and classifies targets by combining the pedestrians' geometric and motion-rate features, i.e., it can identify pedestrians moving on a road. However, it does not consider how to detect pedestrians quickly and accurately in complete darkness, and it cannot identify and detect pedestrians quickly and accurately when visible light is lacking.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing pedestrian recognition systems cannot quickly and accurately detect pedestrians in the absence of light. The invention provides an infrared pedestrian identification method and system based on a deep learning model that can quickly and accurately detect pedestrians with no light at all.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an infrared pedestrian recognition method based on a deep learning model comprises the following steps:
s1: acquiring data and preprocessing the data: acquiring infrared images containing pedestrians, annotating them, and dividing them into a training set and a test set for the detection model according to a set proportion;
s2: building a neural network: segmenting the picture using the target detection network Faster R-CNN, and completing feature extraction and recognition of the target picture using VGG-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. An infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to strengthen the robustness of the network, and since street shooting cannot be recorded well, organized volunteers are filmed in staged shots. 10,000 well-exposed pictures are extracted as the picture set, bounding boxes are manually drawn on them to form the data set, all data are divided into a training set and a test set at a ratio of 3:1, and finally the training set is converted into a suitable format and trained with TensorFlow. Features are extracted from the pictures with a convolutional neural network; the network is built around VGG-16, and once the framework is constructed, the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, recognition speed and accuracy before and after each change are observed, and the effect is repeatedly tested while the network is modified. Training has no natural endpoint, and more iterations are not necessarily better, because the network can both underfit and overfit; therefore loss curves on the training and test sets must be plotted, and a model from a well-performing batch selected for export.
The effect of the model must then be tested, including the type I and type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis. After the model is built and optimized, a basic GUI is produced with PyQt5 and the Tkinter library, and the model's functions and feedback results are realized and displayed through the interactive interface. A menu bar, status bar, and toolbar provide the basic functions, such as importing files, closing files, and capturing images; the interface is then further designed to play the video to be processed and the processed video side by side in synchronization.
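The type I / type II error-rate evaluation mentioned above can be expressed as a small helper. This is an illustrative sketch, not code from the patent; the function name and the per-sample boolean representation are assumptions.

```python
def error_rates(predictions, ground_truth):
    """Type I (false-positive) and type II (false-negative) error rates.

    predictions / ground_truth: parallel lists of booleans per test sample,
    True meaning a pedestrian was detected / is actually present.
    """
    fp = sum(1 for p, g in zip(predictions, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predictions, ground_truth) if not p and g)
    negatives = sum(1 for g in ground_truth if not g)
    positives = sum(1 for g in ground_truth if g)
    type1 = fp / negatives if negatives else 0.0  # false alarms on negatives
    type2 = fn / positives if positives else 0.0  # misses on positives
    return type1, type2
```

In practice these rates, computed on the test set, guide the regularization and structure adjustments described above.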
Preferably, the step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labelling bounding boxes on the pictures using LabelImg;
s13: making a VOC data set and converting it into TFRecord format;
s14: and finishing the pretreatment. The preprocessing of the picture is beneficial to the identification and extraction of the features in the picture later.
Preferably, the step S13 includes the following steps: the annotated pedestrian picture set is organized into a data set, and the annotated data set is converted into TFRecord files. The annotation xml files are converted into csv format; using xml_to_csv.py, all data are divided into a training set and a test set at a ratio of 3:1, producing a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
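The 3:1 split and csv generation described above can be sketched with the standard library alone. The row layout mirrors the usual Pascal-VOC-annotation-to-csv convention and is an assumption, as are the fixed seed and function names:

```python
import csv
import random

def split_annotations(rows, train_ratio=0.75, seed=0):
    """Divide annotation rows 3:1 into training and test sets (step S13)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def write_csv(path, rows):
    """Write one split to disk, e.g. train.csv or eval.csv."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "xmin", "ymin", "xmax", "ymax", "class"])
        writer.writerows(rows)
```

The two csv files would then be fed to a TFRecord-generation script in the usual TensorFlow object-detection workflow.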
Preferably, the step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, the feature extraction network being VGG-16;
s23: applying a 3 × 3 convolution to the shared feature map to obtain a 256 × H × W feature map, and obtaining H × W × 9 anchors after a series of processing steps;
s24: post-processing the anchors to obtain the k boxes with the highest scores;
s25: mapping the k candidate boxes from the original image onto the shared feature map;
s26: obtaining a set of standard-size features from the candidate boxes on the shared feature map via RoI pooling;
s27: re-classifying and regressing the RoI features. VGG-16 is an excellent convolutional neural network framework. VGG-16 recognizes a single target picture, and does so quickly, but it cannot handle several targets appearing in one picture; the picture must therefore first be segmented, i.e., divided into specific regions with distinct properties, from which the targets of interest are proposed. One of the target detection networks, Faster R-CNN, belongs to the R-CNN lineage: it evolved through R-CNN and Fast R-CNN and already has strong recognition capability. It is divided into two parts, an image recognition part (Fast R-CNN) and a candidate-box selection part (the RPN).
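The H × W × 9 anchors of step S23 come from placing 3 scales × 3 aspect ratios at every feature-map position. A minimal sketch follows; the stride, scales, and ratios are common RPN defaults rather than values stated in the patent:

```python
import itertools

def generate_anchors(h, w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return h * w * 9 anchor boxes as (x1, y1, x2, y2) in image coords."""
    anchors = []
    for y, x in itertools.product(range(h), range(w)):
        cx = x * stride + stride / 2  # anchor centre mapped back to the image
        cy = y * stride + stride / 2
        for scale, ratio in itertools.product(scales, ratios):
            aw = scale * ratio ** 0.5  # width grows with the aspect ratio
            ah = scale / ratio ** 0.5  # height shrinks, preserving area
            anchors.append((cx - aw / 2, cy - ah / 2,
                            cx + aw / 2, cy + ah / 2))
    return anchors
```

Each anchor preserves the area of its scale while varying shape, which is how the RPN covers pedestrians of different sizes and postures from a single feature map.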
Preferably, the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] · L_loc(t^u, v)
where t^u represents the predicted regression result for class u, u represents the true category, and v represents the ground-truth box. The loss function combines the classification loss and the regression loss: the classification loss is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss is essentially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K classes plus 1 background class.
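Numerically, the multi-task loss can be sketched as follows. The smooth-L1 form for L_loc follows the Fast R-CNN convention and is an assumption here, as are the function names:

```python
import math

def smooth_l1(x):
    """Fast R-CNN's smooth-L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def detection_loss(p, u, t_u, v, beta=1.0):
    """p: predicted class probabilities; u: true class (0 = background);
    t_u: predicted box offsets for class u; v: ground-truth offsets."""
    l_cls = -math.log(p[u])  # log loss of the true class probability
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    # [u >= 1] gates the regression term: background RoIs get no box loss
    return l_cls + (beta * l_loc if u >= 1 else 0.0)
```

A perfectly classified, perfectly regressed foreground RoI therefore contributes zero loss, and a background RoI contributes only its classification term.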
Preferably, the step S3 includes the following steps:
s31: selecting 25% of the RoIs from each image as foreground;
s32: enlarging the data set by random horizontal flipping;
s33: increasing the RoIs of each image to about 2000 at test time;
s34: calculating loss functions of the training set and the test set;
s35: repeating the above steps, calculating the loss functions of the training set and the test set after each training round. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are picked as foreground, all of them having IoU values greater than 0.5 with a ground-truth box. Only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs are obtained per image at test time.
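The mini-batch sampling just described (64 RoIs per image, roughly 25% foreground at IoU > 0.5) can be sketched as below. The pair representation, seed, and function name are illustrative assumptions:

```python
import random

def sample_rois(rois_with_iou, per_image=64, fg_fraction=0.25, seed=0):
    """rois_with_iou: list of (roi, iou_with_ground_truth) for one image."""
    rng = random.Random(seed)
    fg = [roi for roi, iou in rois_with_iou if iou > 0.5]   # foreground
    bg = [roi for roi, iou in rois_with_iou if iou <= 0.5]  # background
    n_fg = min(int(per_image * fg_fraction), len(fg))       # about 25%
    n_bg = min(per_image - n_fg, len(bg))                   # fill the rest
    return rng.sample(fg, n_fg) + rng.sample(bg, n_bg)
```

Capping the foreground fraction keeps the classifier from being swamped by the far more numerous background proposals.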
Preferably, the step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting up a playback window for the video to be processed and one for the processed video;
s44: playing the video to be processed and the processed video side by side in synchronization. The program's functions can be operated through a visual interactive interface, and its results are displayed visually; for example, the video to be processed and the processed video are compared and played synchronously. The interface is designed to be attractive and concise while displaying the results, and the GUI must respond quickly to the user's operations and feed results back. The design weighs factors such as reducing the user's memory burden, keeping the interface simple but fully functional.
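Stripped of the PyQt5 widgets, the synchronized side-by-side playback of step S44 reduces to stepping the two frame streams in lockstep, stopping at the shorter one so the windows never drift apart. A GUI-free sketch of that pairing logic (names are illustrative):

```python
def paired_frames(raw_frames, processed_frames):
    """Yield (raw, processed) frame pairs for synchronized display."""
    for raw, processed in zip(raw_frames, processed_frames):
        # each pair would be pushed to the two playback windows together
        yield raw, processed
```

In the actual interface, each yielded pair would be rendered into the two video windows on the same timer tick.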
An infrared pedestrian recognition system based on a deep learning model, controlled by applying the method, comprises the following modules:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and dividing them into a training set and a test set for the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection performance, and determining the final model;
an external interaction module: used for establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system achieves rapid and accurate detection of pedestrians in complete darkness, and the external interaction module makes it more convenient for users to operate.
The substantial effects of the invention are as follows: infrared images containing pedestrians are acquired and preprocessed by the data preprocessing module, manually annotated, and divided into a training set and a test set for the detection model according to a set proportion; the network model is built by the detection network and model training module, which optimizes the detection method, adjusts parameters, exports the model according to its detection performance, and determines the final model; and an interactive window is established by the external interaction module to play the video to be processed and the processed video side by side in synchronization, so that the system detects pedestrians quickly and accurately in complete darkness.
Drawings
FIG. 1 is a flow chart of the overall implementation steps of the present invention.
Fig. 2 is an original infrared image.
Fig. 3 is an image after marking.
Fig. 4 is a schematic diagram of the present invention.
Detailed Description
The following provides a more detailed description of the present invention, with reference to the accompanying drawings.
An infrared pedestrian recognition method based on a deep learning model, as shown in fig. 1, includes the following steps:
s1: acquiring data and preprocessing the data: acquiring infrared images containing pedestrians, annotating them, and dividing them into a training set and a test set for the detection model according to a set proportion; step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labelling bounding boxes on the pictures using LabelImg; the original infrared image is shown in fig. 2, and the annotated image is shown in fig. 3;
s13: making a VOC data set and converting it into TFRecord format. Step S13 includes the following steps: the annotated pedestrian picture set is organized into a data set, and the annotated data set is converted into TFRecord files. The annotation xml files are converted into csv format; using xml_to_csv.py, all data are divided into a training set and a test set at a ratio of 3:1, producing a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
S14: and finishing the pretreatment. The preprocessing of the picture is beneficial to the identification and extraction of the features in the picture later.
S2: building a neural network: segmenting the picture using the target detection network Faster R-CNN, and completing feature extraction and recognition of the target picture using VGG-16; step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, the feature extraction network being VGG-16;
s23: applying a 3 × 3 convolution to the shared feature map to obtain a 256 × H × W feature map, and obtaining H × W × 9 anchors after a series of processing steps;
s24: post-processing the anchors to obtain the k boxes with the highest scores;
s25: mapping the k candidate boxes from the original image onto the shared feature map;
s26: obtaining a set of standard-size features from the candidate boxes on the shared feature map via RoI pooling;
s27: re-classifying and regressing the RoI features. VGG-16 is an excellent convolutional neural network framework. VGG-16 recognizes a single target picture, and does so quickly, but it cannot handle several targets appearing in one picture; the picture must therefore first be segmented, i.e., divided into specific regions with distinct properties, from which the targets of interest are proposed. One of the target detection networks, Faster R-CNN, belongs to the R-CNN lineage: it evolved through R-CNN and Fast R-CNN and already has strong recognition capability. It is divided into two parts, an image recognition part (Fast R-CNN) and a candidate-box selection part (the RPN).
S3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set; the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] · L_loc(t^u, v)
where t^u represents the predicted regression result for class u, u represents the true category, and v represents the ground-truth box. The loss function combines the classification loss and the regression loss: the classification loss is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss is essentially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K classes plus 1 background class. Step S3 includes the following steps:
s31: selecting 25% of the RoIs from each image as foreground;
s32: enlarging the data set by random horizontal flipping;
s33: increasing the RoIs of each image to about 2000 at test time;
s34: calculating loss functions of the training set and the test set;
s35: repeating the above steps, calculating the loss functions of the training set and the test set after each training round. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are picked as foreground, all of them having IoU values greater than 0.5 with a ground-truth box. Only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs are obtained per image at test time.
S4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video. Step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting up a playback window for the video to be processed and one for the processed video;
s44: playing the video to be processed and the processed video side by side in synchronization. The program's functions can be operated through a visual interactive interface, and its results are displayed visually; for example, the video to be processed and the processed video are compared and played synchronously. The interface is designed to be attractive and concise while displaying the results, and the GUI must respond quickly to the user's operations and feed results back. The design weighs factors such as reducing the user's memory burden, keeping the interface simple but fully functional.
An infrared pedestrian recognition system based on a deep learning model, controlled by applying the method as shown in fig. 4, comprises a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and dividing them into a training set and a test set for the detection model according to a set proportion; a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection performance, and determining the final model; and an external interaction module: used for establishing an interactive window to play the video to be processed and the processed video side by side in synchronization. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system achieves rapid and accurate detection of pedestrians in complete darkness, and the external interaction module makes it more convenient for users to operate.
In this embodiment, an infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to strengthen the robustness of the network, and since street shooting cannot be recorded well, organized volunteers are filmed in staged shots. 10,000 well-exposed pictures are extracted as the picture set, bounding boxes are manually drawn on them to form the data set, all data are divided into a training set and a test set at a ratio of 3:1, and finally the training set is converted into a suitable format and trained with TensorFlow. Features are extracted from the pictures with a convolutional neural network; the network is built around VGG-16, and once the framework is constructed, the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, recognition speed and accuracy before and after each change are observed, and the effect is repeatedly tested while the network is modified. Training has no natural endpoint, and more iterations are not necessarily better, because the network can both underfit and overfit; therefore loss curves on the training and test sets must be plotted, and a model from a well-performing batch selected for export. The effect of the model must then be tested, including the type I and type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis.
After the model is built and optimized, a basic GUI is produced with PyQt5 and the Tkinter library. The model's functions and feedback results are exposed through an interactive interface: a menu bar, a status bar and a toolbar are set up so that the interface supports basic operations such as importing a file, closing a file and capturing an image, and a further design step adds synchronized side-by-side playback of the video to be processed and the processed video.
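The synchronized contrast playback reduces to advancing the two frame streams in lockstep. A minimal sketch of that pairing logic, with the GUI widgets omitted and all names illustrative:

```python
# Sketch of lockstep frame pairing for the side-by-side playback windows.
# The two arguments stand in for the raw and processed frame streams.
from itertools import zip_longest

def paired_frames(raw_frames, processed_frames):
    """Yield (raw, processed) frame pairs so both playback windows advance
    in lockstep; the shorter stream is padded with None once exhausted."""
    yield from zip_longest(raw_frames, processed_frames)
```

In the actual interface each pair would be rendered into the two video widgets on a shared timer tick, which is what keeps the playback synchronized.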
The above examples express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention.
Claims (8)
1. An infrared pedestrian recognition method based on a deep learning model is characterized by comprising the following steps:
s1: acquiring data and preprocessing the data: acquiring an infrared image containing pedestrians, marking the infrared image, and dividing the infrared image into a training set and a test set of a detection model according to a set proportion;
s2: building a neural network: carrying out image segmentation on the picture by using the target detection network Faster-RCNN, and completing feature extraction and recognition of the target picture by using Vgg-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video.
2. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S1 includes the steps of:
s11: after acquiring the pedestrian video source data, splitting the video source into frames using OpenCV;
s12: manually drawing and labeling bounding boxes on the pictures using LabelImg;
s13: making a VOC data set and converting it into TFRecord format;
s14: completing the preprocessing.
3. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 2, wherein the step S13 specifically comprises: organizing the annotated pedestrian pictures into a data set and converting the annotated data set into a TFRecord-format file: the annotation xml files are converted into csv format, all data are divided into a training set and a test set at a ratio of 3:1 using xml_to_csv.py, generating a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated.
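A minimal sketch of the xml-to-csv flattening step described in this claim, assuming a standard VOC annotation layout (the function name and row format are illustrative, not taken from the patent):

```python
# Flatten VOC-style LabelImg annotations into csv-ready rows, one row per
# bounding box, as an intermediate step before TFRecord generation.
import xml.etree.ElementTree as ET

def voc_xml_to_rows(xml_text):
    """Return rows of (filename, width, height, class, xmin, ymin, xmax, ymax)
    for every <object> in one VOC annotation file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows
```

Rows in this shape map directly onto the columns a TFRecord-writing script typically expects.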
4. The infrared pedestrian recognition method based on the deep learning model according to claim 1 or 2, wherein the step S2 includes the steps of:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, wherein Vgg_16 is selected as the feature extraction network;
s23: convolving the shared feature map with a 3 × 3 kernel to obtain a 256 × H × W feature map, and deriving H × W × 9 Anchors through a series of processing steps;
s24: post-processing each Anchor to keep the k boxes with the highest scores;
s25: mapping the k candidate boxes back to the original image and onto the shared feature map;
s26: extracting a fixed-size set of features for each candidate box on the shared feature map through RoI pooling;
s27: classifying and regressing the RoI features again.
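The H × W × 9 Anchors of step s23 come from combining 3 scales and 3 aspect ratios at each feature-map cell. A sketch under assumed default values (the stride, scale and ratio constants are illustrative, not fixed by the claim):

```python
# Enumerate Faster-RCNN-style anchors: 9 boxes (3 scales x 3 ratios)
# centred on every cell of an H x W feature map.
import itertools

def generate_anchors(h, w, stride=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return H * W * 9 anchor boxes as (xmin, ymin, xmax, ymax) tuples
    in original-image coordinates."""
    anchors = []
    for y, x in itertools.product(range(h), range(w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for scale, ratio in itertools.product(scales, ratios):
            side = scale * stride          # anchor side length in pixels
            aw = side * (ratio ** 0.5)     # width stretched by sqrt(ratio)
            ah = side / (ratio ** 0.5)     # height shrunk by sqrt(ratio)
            anchors.append((cx - aw / 2, cy - ah / 2, cx + aw / 2, cy + ah / 2))
    return anchors
```

Each (scale, ratio) pair preserves the anchor's area while varying its shape, which is why exactly 9 boxes arise per cell.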
6. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1 or 5, wherein the step S3 comprises the steps of:
s31: selecting 25% of the ROIs from each image;
s32: augmenting the data set by random horizontal flipping;
s33: increasing the ROIs of each image to at least 2000;
s34: calculating loss functions of the training set and the test set;
s35: and repeating the steps, and calculating the loss functions of the training set and the test set after each training.
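The random horizontal flipping of step S32 must also mirror the bounding-box annotations, not just the pixels. A minimal sketch (the helper name and box format are assumptions):

```python
# Mirror bounding boxes to match a horizontally flipped training image.
def hflip_boxes(boxes, image_width):
    """Reflect (xmin, ymin, xmax, ymax) boxes about the vertical centre line
    of an image of the given width."""
    return [(image_width - xmax, ymin, image_width - xmin, ymax)
            for (xmin, ymin, xmax, ymax) in boxes]
```

Note that the flipped xmin comes from the original xmax (and vice versa), so the box corners stay in (min, max) order after the reflection.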
7. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S4 includes the steps of:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting a video playing window to be processed and a processed video playing window;
s44: and synchronously comparing and playing the video to be processed and the processed video.
8. An infrared pedestrian recognition system based on a deep learning model, using the method of any one of claims 1-7, comprising:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and then dividing them into a training set and a test set for the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, tuning parameters, exporting the model according to its detection performance, and determining the final model;
an external interaction module: used for establishing an interactive window to realize synchronized side-by-side playback of the video to be processed and the processed video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011298623.3A CN112488165A (en) | 2020-11-18 | 2020-11-18 | Infrared pedestrian identification method and system based on deep learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112488165A true CN112488165A (en) | 2021-03-12 |
Family
ID=74931765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011298623.3A Pending CN112488165A (en) | 2020-11-18 | 2020-11-18 | Infrared pedestrian identification method and system based on deep learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488165A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113405667A (en) * | 2021-05-20 | 2021-09-17 | 湖南大学 | Infrared thermal human body posture identification method based on deep learning |
CN114299429A (en) * | 2021-12-24 | 2022-04-08 | 宁夏广天夏电子科技有限公司 | Human body recognition method, system and device based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019010147A1 (en) * | 2017-07-05 | 2019-01-10 | Siemens Aktiengesellschaft | Semi-supervised iterative keypoint and viewpoint invariant feature learning for visual recognition |
CN110472542A (en) * | 2019-08-05 | 2019-11-19 | 深圳北斗通信科技有限公司 | A kind of infrared image pedestrian detection method and detection system based on deep learning |
CN111832515A (en) * | 2020-07-21 | 2020-10-27 | 上海有个机器人有限公司 | Dense pedestrian detection method, medium, terminal and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fernandes et al. | Predicting heart rate variations of deepfake videos using neural ode | |
CN105069472B (en) | A kind of vehicle checking method adaptive based on convolutional neural networks | |
CN109284738B (en) | Irregular face correction method and system | |
CN109919977B (en) | Video motion person tracking and identity recognition method based on time characteristics | |
CN109684925B (en) | Depth image-based human face living body detection method and device | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
CN110532970B (en) | Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces | |
CN108830252A (en) | A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic | |
CN110826389B (en) | Gait recognition method based on attention 3D frequency convolution neural network | |
CN108647625A (en) | A kind of expression recognition method and device | |
CN109886153B (en) | Real-time face detection method based on deep convolutional neural network | |
CN109063643B (en) | Facial expression pain degree identification method under condition of partial hiding of facial information | |
CN109902558A (en) | A kind of human health deep learning prediction technique based on CNN-LSTM | |
CN102609724B (en) | Method for prompting ambient environment information by using two cameras | |
CN113762009B (en) | Crowd counting method based on multi-scale feature fusion and double-attention mechanism | |
CN107798279A (en) | Face living body detection method and device | |
CN114333070A (en) | Examinee abnormal behavior detection method based on deep learning | |
CN112488165A (en) | Infrared pedestrian identification method and system based on deep learning model | |
CN111666845A (en) | Small sample deep learning multi-mode sign language recognition method based on key frame sampling | |
CN114170537A (en) | Multi-mode three-dimensional visual attention prediction method and application thereof | |
CN113591825A (en) | Target search reconstruction method and device based on super-resolution network and storage medium | |
CN106529441A (en) | Fuzzy boundary fragmentation-based depth motion map human body action recognition method | |
CN113486712B (en) | Multi-face recognition method, system and medium based on deep learning | |
CN113076860B (en) | Bird detection system under field scene | |
CN111881818B (en) | Medical action fine-grained recognition device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||