CN112488165A - Infrared pedestrian identification method and system based on deep learning model - Google Patents

Infrared pedestrian identification method and system based on deep learning model

Info

Publication number
CN112488165A
CN112488165A (application CN202011298623.3A)
Authority
CN
China
Prior art keywords
model
training
infrared
detection
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011298623.3A
Other languages
Chinese (zh)
Inventor
黄绍帅
崔光茫
张家承
蔡斌斌
方德宸
苏展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202011298623.3A priority Critical patent/CN112488165A/en
Publication of CN112488165A publication Critical patent/CN112488165A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an infrared pedestrian recognition method and system based on a deep learning model. A data preprocessing module acquires infrared images containing pedestrians and preprocesses them; the preprocessed infrared images are manually labeled and then divided into a training set and a test set for the detection model according to a set proportion. A detection network and model training module builds the network model, optimizes the detection method, adjusts parameters, exports the model according to its detection effect, and determines the final model. An external interaction module establishes an interactive window that plays the video to be processed and the processed video side by side in synchronization, so that the system rapidly and accurately detects pedestrians in completely dark conditions.

Description

Infrared pedestrian identification method and system based on deep learning model
Technical Field
The invention relates to the technical field of infrared imaging, in particular to an infrared pedestrian identification method and system based on a deep learning model.
Background
A neural network algorithm simulates human thinking: the constructed network model imitates the human brain, and the training set is the learning data provided by people. The training set is repeatedly fed to the computer for training to obtain a model, which is then corrected with the test set until a model closest to the ideal output is obtained. Such a model can identify pedestrians in infrared images to the greatest possible extent. Since 2005, the training libraries of pedestrian detection technology have tended toward large scale, detection accuracy toward practicality, and detection speed toward real time. However, existing pedestrian detection data sets contain few infrared thermal imaging pictures, and few algorithms identify pedestrians in infrared images. Generally, pedestrian recognition based on ordinary visible-light images depends on picture quality, and it is difficult to accurately recognize pedestrians when light is insufficient at night, so an infrared recognition system is required for assistance.
Pedestrian detection has been developing for more than ten years. The traditional mainstream consists of algorithms such as Haar features + AdaBoost and HOG features + SVM, while deep learning has developed rapidly in recent years because it better matches human thinking, benefits from open-source code sharing and the rapid growth of big data, and its network frameworks are continuously optimized. Pedestrian detection is widely applied in the computing field, but traditional pedestrian recognition methods suffer from limitations such as insufficient illumination, complex backgrounds, and large variations in human body shape, and therefore cannot identify pedestrians accurately and reliably. Rapid and accurate pedestrian detection in completely dark conditions is thus required, which calls for combining infrared thermal imaging and pedestrian recognition with an advanced deep learning algorithm.
For example, Chinese patent CN106407948A, published February 15, 2017, discloses a pedestrian detection and identification method based on an infrared night vision device, comprising the following steps: collect and store video frames through an infrared night vision device, update the valid data of the latest three frames in real time, and preprocess the collected three video frames; perform area matching on the processed three video frames, perform three-frame difference calculation after image compensation is completed, and apply morphological dilation and erosion to the images; then identify pedestrians in the image according to geometric features and motion-rate features. Building on existing pedestrian detection, the method provides an improved three-frame difference method that can better extract pedestrian contours, and it can identify and classify targets by combining the geometric and motion-rate features of pedestrians, i.e., it can identify pedestrians moving on the road. However, this method does not consider how to detect pedestrians rapidly and accurately in completely dark conditions, and it cannot do so when visible light is lacking.
Disclosure of Invention
The technical problem to be solved by the invention is that existing pedestrian recognition systems cannot rapidly and accurately detect pedestrians in the absence of light. The invention provides an infrared pedestrian identification method and system based on a deep learning model that can rapidly and accurately detect pedestrians without any light.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: an infrared pedestrian recognition method based on a deep learning model comprises the following steps:
s1: acquiring data and preprocessing the data: acquiring an infrared image containing pedestrians, marking the infrared image, and dividing the infrared image into a training set and a test set of a detection model according to a set proportion;
s2: building a neural network: image segmentation is carried out on the picture by using the target detection network Faster-Rcnn, and feature extraction and identification of the target picture are completed by using Vgg-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: establish an interactive window to realize synchronized comparison playback of the video to be processed and the processed video. An infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to enhance the robustness of the network, and since candid street shooting could not be captured well, organized personnel were chosen for staged shooting. 10000 pictures with good effect are captured as the picture set, which is manually box-labeled to form the data set; all data are divided into a training set and a test set in a 3:1 proportion, and finally the training set is converted into a suitable format and trained with TensorFlow. A convolutional neural network extracts features from the pictures, the network is built around Vgg-16, and once the framework is constructed the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, the recognition speed and accuracy before and after each change are observed, and the effect is tested continuously while the network is modified. Training has no natural endpoint, and more iterations are not always better, because the network can both underfit and overfit; therefore the loss curves on the training set and the test set must be drawn, and a model from a batch with better effect must be selected for export.
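The loss-curve-based model selection described above (training has no natural endpoint, so the exported model is chosen from the batch whose test-set loss is best) can be sketched as follows; the loss values and the helper name `select_best_checkpoint` are illustrative, not taken from the patent:

```python
def select_best_checkpoint(train_losses, test_losses):
    """Return the epoch index whose *test* loss is lowest.

    Training loss usually keeps falling even while the network overfits,
    so the export decision is based on the test-set curve instead.
    """
    assert len(train_losses) == len(test_losses)
    return min(range(len(test_losses)), key=lambda i: test_losses[i])

train = [2.1, 1.4, 0.9, 0.6, 0.4, 0.3]   # keeps decreasing
test = [2.2, 1.5, 1.1, 0.9, 1.0, 1.2]    # rises again: overfitting begins
print(select_best_checkpoint(train, test))  # 3
```

Epoch 3 would be exported here even though later epochs have lower training loss, which is exactly the underfitting/overfitting trade-off the text describes.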
The effect of the model must then be tested, including the Type I and Type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis. After the model is built and optimized, a basic GUI is made with PyQt5 and the Tkinter library, and the model's functions and feedback results are realized and displayed through the interactive interface. A menu bar, status bar, and toolbar are set so that the interface realizes basic functions such as file import, closing, and image capture; further design then realizes synchronized comparison playback of the video to be processed and the processed video.
Preferably, the step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labeling bounding boxes on the pictures by using LabelImg;
s13: making a VOC data set and converting it into TFRecord format;
s14: finishing the preprocessing. Preprocessing the pictures facilitates the later identification and extraction of features.
Preferably, the step S13 includes the following steps: the labeled pedestrian picture set is organized into a data set, and the labeled data set is converted into a TFRecord-format file: the annotation xml files are converted into csv format, all data are divided into a training set and a test set in a 3:1 ratio by using xml_to_csv.py, generating a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
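The 3:1 split into train.csv and eval.csv might be sketched as below; the xml-to-row parsing step is omitted, and the row layout and helper names are assumptions for illustration, not the patent's actual xml_to_csv.py:

```python
import csv
import random

def split_annotations(rows, ratio=3, seed=0):
    """Shuffle annotation rows and split them ratio:1 (here 3:1) into
    training and testing subsets."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    n_train = len(rows) * ratio // (ratio + 1)
    return rows[:n_train], rows[n_train:]

def write_csv(path, rows):
    """Write one annotation row per bounding box, LabelImg-csv style."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["filename", "xmin", "ymin", "xmax", "ymax", "class"])
        w.writerows(rows)

# 10000 labeled pictures, one pedestrian box each (toy data)
rows = [(f"img_{i:05d}.jpg", 10, 20, 60, 120, "pedestrian") for i in range(10000)]
train_rows, eval_rows = split_annotations(rows)
write_csv("train.csv", train_rows)
write_csv("eval.csv", eval_rows)
print(len(train_rows), len(eval_rows))  # 7500 2500
```

The two csv files would then be converted to TFRecord with the usual TensorFlow tooling; that conversion is dataset-format bookkeeping and is not shown here.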
Preferably, the step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: input the preprocessed picture into a convolutional neural network to extract features; the feature extraction network selected is Vgg-16;
s23: a 3 × 3 convolution over the shared feature map yields a 256 × H × W feature map, and H × W × 9 anchors are obtained through a series of processing steps;
s24: each anchor is post-processed to obtain the k candidate boxes with the highest scores;
s25: the k candidate boxes correspond to the original image and are mapped onto the shared feature map;
s26: the candidate boxes on the shared feature map are converted into feature sets of standard size by RoI pooling;
s27: the RoI features are classified and regressed again. Vgg-16 is an excellent convolutional neural network framework for recognizing single-target pictures: it recognizes such pictures quickly, but it cannot handle several targets appearing in one picture at all, so the picture must first be segmented, i.e., divided into several specific regions with unique properties from which the targets of interest are proposed. For this, a target detection network of the Rcnn lineage is used; the lineage evolved through Rcnn and Fast-Rcnn to Faster-Rcnn, which already has a high recognition capability and is divided into two parts, an image recognition part (Fast-Rcnn) and a candidate box selection part (RPN).
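A minimal sketch of the H × W × 9 anchor layout from step s23: at every position of the shared feature map, 9 anchors are generated from 3 scales × 3 aspect ratios. The stride, scale, and ratio values below are common defaults assumed for illustration, not values specified by the patent:

```python
def generate_anchors(H, W, stride=16, scales=(128, 256, 512),
                     ratios=(0.5, 1.0, 2.0)):
    """Generate H*W*9 anchor boxes (x1, y1, x2, y2) in input-image
    coordinates. Each anchor keeps area scale**2; the ratio controls
    its width/height proportion."""
    anchors = []
    for y in range(H):
        for x in range(W):
            # center of this feature-map cell, mapped back to the image
            cx, cy = x * stride + stride // 2, y * stride + stride // 2
            for s in scales:
                for r in ratios:
                    w = s * r ** 0.5
                    h = s / r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

a = generate_anchors(38, 50)  # e.g. a ~600x800 input at stride 16
print(len(a))                  # 38 * 50 * 9 = 17100
```

The RPN then scores each of these anchors and keeps the top-k candidate boxes, as in steps s24–s25.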
Preferably, the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] L_loc(t^u, v)
wherein t^u represents the predicted regression result for class u, u represents the category, and v represents the true regression target. The loss function integrates the classification loss and the regression loss: the classification loss L_cls is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss L_loc is substantially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K object classes and 1 background class.
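The loss can be sketched numerically as follows, assuming the smooth-L1 form of L_loc used in Fast R-CNN; the function names and sample numbers are illustrative:

```python
import math

def smooth_l1(x):
    """Smooth-L1 penalty commonly used for the regression loss L_loc."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multitask_loss(p, u, t_u, v, beta=1.0):
    """L(p, u, t^u, v) = L_cls(p, u) + beta * [u >= 1] * L_loc(t^u, v).

    p   : predicted class probabilities (index 0 = background)
    u   : true class index
    t_u : predicted box offsets for class u
    v   : ground-truth box offsets
    The indicator [u >= 1] switches the regression term off for background.
    """
    l_cls = -math.log(p[u])  # negative log-probability of the true class
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v))
    return l_cls + (beta * l_loc if u >= 1 else 0.0)

# one foreground RoI: true class 1, slightly wrong box offsets
loss = multitask_loss([0.1, 0.9], 1, (0.5, 0.5, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
print(round(loss, 3))  # 0.355
```

For a background RoI (u = 0) only the classification term remains, matching the [u ≥ 1] indicator in the formula.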
Preferably, the step S3 includes the following steps:
s31: select about 25% of the RoIs of each image as foreground;
s32: increase the data set by random horizontal flipping;
s33: grow the RoIs of each image to about 2000 at test time;
s34: calculate the loss functions on the training set and the test set;
s35: repeat the above steps, calculating the loss functions on the training set and the test set after each round of training. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are then picked as foreground, namely those with IoU values greater than 0.5. In addition, only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs per image are obtained at test time.
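The mini-batch RoI sampling described above (64 RoIs per image, roughly 25% foreground by IoU) might look like this sketch; the RoI list and IoU values are toy data, and the helper name is an assumption:

```python
import random

def sample_rois(rois, rois_per_image=64, fg_fraction=0.25, seed=0):
    """Sample 64 RoIs for one image: ~25% foreground (IoU > 0.5 with a
    ground-truth box), the rest background. Each roi is (box, iou)."""
    rng = random.Random(seed)
    fg = [r for r in rois if r[1] > 0.5]
    bg = [r for r in rois if r[1] <= 0.5]
    n_fg = min(int(rois_per_image * fg_fraction), len(fg))
    n_bg = rois_per_image - n_fg
    return rng.sample(fg, n_fg) + rng.sample(bg, n_bg)

# a toy image with 200 candidate RoIs, 40 of them overlapping a pedestrian
rois = [((0, 0, 10, 10), 0.7)] * 40 + [((0, 0, 10, 10), 0.1)] * 160
batch = sample_rois(rois)
print(len(batch))  # 64; a mini-batch of 2 images would give 128
```

With two such images per mini-batch, the 2 × 64 = 128 region proposals of the text are recovered.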
Preferably, the step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting a video playing window to be processed and a processed video playing window;
s44: synchronously compare and play the video to be processed and the processed video. The program's functions can be operated through a visual interactive interface, and its running results are displayed visually, for example by playing the video to be processed and the processed video side by side in synchronization. The interface displays results while remaining attractive and concise, and the GUI must respond quickly to the user's operations and feed back results. The design considers factors such as reducing the user's memory burden, keeping the interface simple while its functions remain complete.
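The synchronized comparison playback can be reduced to a GUI-free sketch: both videos advance in lockstep, one frame pair at a time. In the actual system each pair would be drawn into the two PyQt5 playback windows; the frame lists here are placeholders:

```python
def play_synchronized(raw_frames, processed_frames):
    """Yield (raw, processed) frame pairs in lockstep, so the two playback
    windows always show the same time instant of both videos."""
    assert len(raw_frames) == len(processed_frames)
    for pair in zip(raw_frames, processed_frames):
        yield pair  # in the GUI, each pair is rendered into its own window

raw = [f"frame{i}" for i in range(3)]
boxed = [f"frame{i}+boxes" for i in range(3)]  # detector output per frame
pairs = list(play_synchronized(raw, boxed))
print(pairs[0])  # ('frame0', 'frame0+boxes')
```

Driving both windows from a single frame index (rather than two independent timers) is what keeps the comparison synchronized even if rendering stalls.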
An infrared pedestrian recognition system based on a deep learning model is controlled by applying the method, and comprises the following steps:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually labeling the preprocessed infrared images, and then dividing them into the training set and the test set of the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection effect, and determining the final model;
an external interaction module: establishes an interactive window to realize synchronized comparison playback of the video to be processed and the processed video. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system realizes rapid and accurate detection of pedestrians in completely dark conditions, and the external interaction module makes operation more convenient for users.
The substantial effects of the invention are as follows: a data preprocessing module acquires infrared images containing pedestrians and preprocesses them; the preprocessed infrared images are manually labeled and divided into a training set and a test set for the detection model according to a set proportion; a detection network and model training module builds the network model, optimizes the detection method, adjusts parameters, exports the model according to its detection effect, and determines the final model; an external interaction module establishes an interactive window that realizes synchronized comparison playback of the video to be processed and the processed video, so that the system rapidly and accurately detects pedestrians in completely dark conditions.
Drawings
FIG. 1 is a flow chart of the overall implementation steps of the present invention.
Fig. 2 is an original infrared image.
Fig. 3 is an image after marking.
Fig. 4 is a schematic diagram of the present invention.
Detailed Description
The following provides a more detailed description of the present invention, with reference to the accompanying drawings.
An infrared pedestrian recognition method based on a deep learning model, as shown in fig. 1, includes the following steps:
s1: acquiring data and preprocessing the data: acquiring an infrared image containing pedestrians, marking the infrared image, and dividing the infrared image into a training set and a test set of a detection model according to a set proportion; step S1 includes the following steps:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labeling bounding boxes on the pictures by using LabelImg; the original infrared image is shown in fig. 2, and the labeled image is shown in fig. 3;
s13: making a VOC data set and converting it into TFRecord format. The step S13 includes the following steps: the labeled pedestrian picture set is organized into a data set, and the labeled data set is converted into a TFRecord-format file: the annotation xml files are converted into csv format, all data are divided into a training set and a test set in a 3:1 ratio by using xml_to_csv.py, generating a train.csv training set and an eval.csv test set, and finally the TFRecord files are generated. The 3:1 ratio makes it convenient to evaluate the training effect on the test set after training.
S14: finishing the preprocessing. Preprocessing the pictures facilitates the later identification and extraction of features.
S2: building a neural network: image segmentation is carried out on the picture by using the target detection network Faster-Rcnn, and feature extraction and identification of the target picture are completed by using Vgg-16; step S2 includes the following steps:
s21: carrying out corresponding preprocessing on the image;
s22: input the preprocessed picture into a convolutional neural network to extract features; the feature extraction network selected is Vgg-16;
s23: a 3 × 3 convolution over the shared feature map yields a 256 × H × W feature map, and H × W × 9 anchors are obtained through a series of processing steps;
s24: each anchor is post-processed to obtain the k candidate boxes with the highest scores;
s25: the k candidate boxes correspond to the original image and are mapped onto the shared feature map;
s26: the candidate boxes on the shared feature map are converted into feature sets of standard size by RoI pooling;
s27: the RoI features are classified and regressed again. Vgg-16 is an excellent convolutional neural network framework for recognizing single-target pictures: it recognizes such pictures quickly, but it cannot handle several targets appearing in one picture at all, so the picture must first be segmented, i.e., divided into several specific regions with unique properties from which the targets of interest are proposed. For this, a target detection network of the Rcnn lineage is used; the lineage evolved through Rcnn and Fast-Rcnn to Faster-Rcnn, which already has a high recognition capability and is divided into two parts, an image recognition part (Fast-Rcnn) and a candidate box selection part (RPN).
S3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set; the loss function in step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] L_loc(t^u, v)
wherein t^u represents the predicted regression result for class u, u represents the category, and v represents the true regression target. The loss function integrates the classification loss and the regression loss: the classification loss L_cls is the log loss, i.e., the negative logarithm of the probability assigned to the true class, and the regression loss L_loc is substantially the same as in R-CNN. The classification layer outputs K + 1 dimensions, representing K object classes and 1 background class. Step S3 includes the following steps:
s31: select about 25% of the RoIs of each image as foreground;
s32: increase the data set by random horizontal flipping;
s33: grow the RoIs of each image to about 2000 at test time;
s34: calculate the loss functions on the training set and the test set;
s35: repeat the above steps, calculating the loss functions on the training set and the test set after each round of training. In actual training, each mini-batch contains 2 images and 128 region proposals (RoIs), i.e., 64 RoIs per image. About 25% of these RoIs are then picked as foreground, namely those with IoU values greater than 0.5. In addition, only random horizontal flipping is used to enlarge the data set, and about 2000 RoIs per image are obtained at test time.
S4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video. Step S4 includes the following steps:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting a video playing window to be processed and a processed video playing window;
s44: synchronously compare and play the video to be processed and the processed video. The program's functions can be operated through a visual interactive interface, and its running results are displayed visually, for example by playing the video to be processed and the processed video side by side in synchronization. The interface displays results while remaining attractive and concise, and the GUI must respond quickly to the user's operations and feed back results. The design considers factors such as reducing the user's memory burden, keeping the interface simple while its functions remain complete.
An infrared pedestrian recognition system based on a deep learning model, controlled by applying the above method as shown in fig. 4, comprises a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually labeling the preprocessed infrared images, and then dividing them into the training set and the test set of the detection model according to a set proportion; a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, exporting the model according to its detection effect, and determining the final model; and an external interaction module: establishes an interactive window to realize synchronized comparison playback of the video to be processed and the processed video. Through the data preprocessing module, the detection network and model training module, and the external interaction module, the infrared pedestrian recognition system realizes rapid and accurate detection of pedestrians in completely dark conditions, and the external interaction module makes operation more convenient for users.
In the embodiment, an infrared thermal imager is used to shoot the videos; because the convolutional network automatically extracts information such as color and contrast, the videos are shot at different temperatures. Pedestrians in various postures are needed to enhance the robustness of the network, and since candid street shooting could not be captured well, organized personnel were chosen for staged shooting. 10000 pictures with good effect are captured as the picture set, which is manually box-labeled to form the data set; all data are divided into a training set and a test set in a 3:1 proportion, and finally the training set is converted into a suitable format and trained with TensorFlow. A convolutional neural network extracts features from the pictures, the network is built around Vgg-16, and once the framework is constructed the model is first trained with the most basic parameters and methods. Advanced optimization methods are then added step by step, parameters are adjusted, the recognition speed and accuracy before and after each change are observed, and the effect is tested continuously while the network is modified. Training has no natural endpoint, and more iterations are not always better, because the network can both underfit and overfit; therefore the loss curves on the training set and the test set must be drawn, and a model from a batch with better effect must be selected for export. The effect of the model must then be tested, including the Type I and Type II error rates; the problems that arise are analyzed theoretically, and network parameters such as regularization, or even the network structure, are adjusted according to the analysis.
After the model is built and optimized, a basic GUI is made with PyQt5 and the Tkinter library, and the model's functions and feedback results are realized and displayed through the interactive interface. A menu bar, status bar, and toolbar are set so that the interface realizes basic functions such as file import, closing, and image capture; further design then realizes synchronized comparison playback of the video to be processed and the processed video.
The above examples show only some embodiments of the present invention; their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention.

Claims (8)

1. An infrared pedestrian recognition method based on a deep learning model is characterized by comprising the following steps:
s1: acquiring data and preprocessing the data: acquiring an infrared image containing pedestrians, marking the infrared image, and dividing the infrared image into a training set and a test set of a detection model according to a set proportion;
s2: building a neural network: extracting candidate regions from the picture using the target detection network Faster-RCNN, and completing feature extraction and identification of the target picture using Vgg-16;
s3: model training: carrying out model training on the neural network by using a training set, and calculating loss functions of the training set and a testing set;
s4: designing an interactive interface: and establishing an interactive window to realize synchronous contrast and play of the video to be processed and the processed video.
2. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S1 includes the steps of:
s11: after acquiring pedestrian video source data, framing the video source by using Opencv;
s12: manually drawing and labeling bounding boxes on the pictures using LabelImg;
s13: making a VOC data set, and converting the VOC data set into TFRecord format;
s14: and finishing the pretreatment.
3. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 2, wherein the step S13 specifically comprises: sorting the annotated pedestrian pictures into a data set and converting the annotated data set into a file in TFRecord format; converting the annotation xml files into csv format, and dividing all data into a training set and a test set at a ratio of 3:1 using xml_to_csv.py, generating a train.csv training set and an eval.csv test set; and finally generating the TFRecord files.
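The xml-to-csv step of claim 3 can be sketched as follows. The Pascal VOC annotation layout is standard; the `xml_to_rows` helper name and the sample annotation are illustrative assumptions, not the patent's actual xml_to_csv.py:

```python
import xml.etree.ElementTree as ET

# a minimal VOC-style annotation for one frame (illustrative sample)
VOC_XML = """<annotation>
  <filename>frame_00001.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>120</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""

def xml_to_rows(xml_text):
    """Flatten one VOC annotation into csv-style rows:
    (filename, class, xmin, ymin, xmax, ymax)."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):  # one row per labeled pedestrian
        box = obj.find("bndbox")
        rows.append((fname, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows

print(xml_to_rows(VOC_XML))  # [('frame_00001.jpg', 'person', 48, 30, 120, 210)]
```

Rows in this flat form can then be written out with the csv module and grouped per image when serializing to TFRecord.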
4. The infrared pedestrian recognition method based on the deep learning model according to claim 1 or 2, wherein the step S2 includes the steps of:
s21: carrying out corresponding preprocessing on the image;
s22: inputting the preprocessed picture into a convolutional neural network to extract features, wherein Vgg_16 is selected as the feature extraction network;
s23: obtaining a 256 × H × W feature map by applying a 3 × 3 convolution to the shared feature map, and obtaining H × W × 9 Anchors through a series of processing;
s24: post-processing the Anchors to obtain the k candidate boxes with the highest scores;
s25: mapping the k candidate boxes, via their positions in the original image, onto the shared feature map;
s26: obtaining a set of fixed-size features from each candidate box on the shared feature map through RoI pooling;
s27: reclassifying and regressing the RoI features.
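Steps S23–S26 above can be illustrated with a sketch of the anchor layout: at each of the H × W positions of the shared feature map, 9 anchors (3 scales × 3 aspect ratios) are generated, giving H × W × 9 anchors in total. The scale and ratio values below are the common Faster-RCNN defaults, assumed here rather than taken from the patent:

```python
def generate_anchors(h, w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (cx, cy, width, height) anchor per (position, scale, ratio)."""
    anchors = []
    for y in range(h):
        for x in range(w):
            # anchor center in original-image coordinates
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for s in scales:
                for r in ratios:
                    aw = s * r ** 0.5   # width grows with the aspect ratio
                    ah = s / r ** 0.5   # height shrinks, keeping area ~ s*s
                    anchors.append((cx, cy, aw, ah))
    return anchors

# a 38x50 feature map (600x800 input at stride 16) yields 38*50*9 = 17100 anchors
print(len(generate_anchors(38, 50)))  # 17100
```

The stride links feature-map coordinates back to image coordinates, which is what makes the mapping between candidate boxes and the shared feature map in S25 possible.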
5. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the loss function in the step S3 is:
L(p, u, t^u, v) = L_cls(p, u) + β[u ≥ 1] · L_loc(t^u, v)

wherein t^u represents the predicted result, u represents the category, and v represents the true result.
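The multi-task loss of claim 5 combines a classification term with a localization term that is active only for non-background classes (u ≥ 1). A minimal numeric sketch follows; the cross-entropy form of L_cls and the smooth-L1 form of L_loc are the usual Faster-RCNN choices, assumed here since the patent does not spell them out:

```python
import math

def smooth_l1(x):
    """Smooth-L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multitask_loss(p, u, t_u, v, beta=1.0):
    """L(p,u,t^u,v) = L_cls(p,u) + beta * [u >= 1] * L_loc(t^u,v).
    p: predicted class probabilities, u: true class (0 = background),
    t_u: predicted box offsets for class u, v: ground-truth offsets."""
    l_cls = -math.log(p[u])                        # cross-entropy on the true class
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v))
    indicator = 1 if u >= 1 else 0                 # the [u >= 1] term of the claim
    return l_cls + beta * indicator * l_loc

# background ROI (u = 0): only the classification term contributes
print(multitask_loss([0.9, 0.1], 0, [0, 0, 0, 0], [1, 1, 1, 1]))
```

The indicator term is what prevents background ROIs, which have no ground-truth box, from contributing a meaningless regression loss.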
6. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1 or 5, wherein the step S3 comprises the steps of:
s31: selecting 25% of the ROIs from each image;
s32: augmenting the data set by random horizontal flipping;
s33: increasing the number of ROIs per image to at least 2000;
s34: calculating the loss functions of the training set and the test set;
s35: repeating the above steps, and calculating the loss functions of the training set and the test set after each round of training.
7. The infrared pedestrian recognition method based on the deep learning model as claimed in claim 1, wherein the step S4 includes the steps of:
s41: establishing an interactive window;
s42: setting a menu bar, a status bar and a tool bar;
s43: setting a video playing window to be processed and a processed video playing window;
s44: and synchronously comparing and playing the video to be processed and the processed video.
8. An infrared pedestrian recognition system based on a deep learning model, using the method of any one of claims 1-7, comprising:
a data preprocessing module: used for acquiring infrared images containing pedestrians, preprocessing them, manually annotating the preprocessed images, and then dividing them into a training set and a test set of the detection model according to a set proportion;
a detection network and model training module: used for building the network model, optimizing the detection method, adjusting parameters, evaluating the detection effect of the exported model, and determining the final model;
an external interaction module: used for establishing an interactive window to realize synchronized side-by-side playback of the video to be processed and the processed video.
CN202011298623.3A 2020-11-18 2020-11-18 Infrared pedestrian identification method and system based on deep learning model Pending CN112488165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011298623.3A CN112488165A (en) 2020-11-18 2020-11-18 Infrared pedestrian identification method and system based on deep learning model


Publications (1)

Publication Number Publication Date
CN112488165A true CN112488165A (en) 2021-03-12

Family

ID=74931765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011298623.3A Pending CN112488165A (en) 2020-11-18 2020-11-18 Infrared pedestrian identification method and system based on deep learning model

Country Status (1)

Country Link
CN (1) CN112488165A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113405667A (en) * 2021-05-20 2021-09-17 湖南大学 Infrared thermal human body posture identification method based on deep learning
CN114299429A (en) * 2021-12-24 2022-04-08 宁夏广天夏电子科技有限公司 Human body recognition method, system and device based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019010147A1 (en) * 2017-07-05 2019-01-10 Siemens Aktiengesellschaft Semi-supervised iterative keypoint and viewpoint invariant feature learning for visual recognition
CN110472542A (en) * 2019-08-05 2019-11-19 深圳北斗通信科技有限公司 A kind of infrared image pedestrian detection method and detection system based on deep learning
CN111832515A (en) * 2020-07-21 2020-10-27 上海有个机器人有限公司 Dense pedestrian detection method, medium, terminal and device


Similar Documents

Publication Publication Date Title
Fernandes et al. Predicting heart rate variations of deepfake videos using neural ode
CN105069472B (en) A kind of vehicle checking method adaptive based on convolutional neural networks
CN109284738B (en) Irregular face correction method and system
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
CN109684925B (en) Depth image-based human face living body detection method and device
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
CN108830252A (en) A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic
CN110826389B (en) Gait recognition method based on attention 3D frequency convolution neural network
CN108647625A (en) A kind of expression recognition method and device
CN109886153B (en) Real-time face detection method based on deep convolutional neural network
CN109063643B (en) Facial expression pain degree identification method under condition of partial hiding of facial information
CN109902558A (en) A kind of human health deep learning prediction technique based on CNN-LSTM
CN102609724B (en) Method for prompting ambient environment information by using two cameras
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
CN107798279A (en) Face living body detection method and device
CN114333070A (en) Examinee abnormal behavior detection method based on deep learning
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model
CN111666845A (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
CN113591825A (en) Target search reconstruction method and device based on super-resolution network and storage medium
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN113486712B (en) Multi-face recognition method, system and medium based on deep learning
CN113076860B (en) Bird detection system under field scene
CN111881818B (en) Medical action fine-grained recognition device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination