CN113033478A - Pedestrian detection method based on deep learning - Google Patents

Pedestrian detection method based on deep learning

Info

Publication number
CN113033478A
Authority
CN
China
Prior art keywords
pedestrian detection
ssd
frame
pedestrian
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110420061.3A
Other languages
Chinese (zh)
Inventor
卢立晖
索婕
王化建
张立华
司鹏程
丁明亮
李磊
张正强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rizhao Huilian Zhongchuang Intelligent Technology Research Institute
Qufu Normal University
Original Assignee
Rizhao Huilian Zhongchuang Intelligent Technology Research Institute
Qufu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rizhao Huilian Zhongchuang Intelligent Technology Research Institute, Qufu Normal University
Priority to CN202110420061.3A
Publication of CN113033478A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a pedestrian detection method based on deep learning, belonging to the technical field of deep learning and pedestrian detection. Starting from the traditional SSD pedestrian detection model, the method adopts ResNet, VoVNet and K-means clustering for optimization. This addresses the missed detections and false detections that the SSD algorithm produces when pedestrians are dense, occluded or very small, and improves the accuracy and real-time performance of pedestrian detection as well as the detection of small-target pedestrians.

Description

Pedestrian detection method based on deep learning
Technical Field
The invention relates to the technical field of deep learning and pedestrian detection, in particular to a pedestrian detection method based on deep learning.
Background
Pedestrian detection is an important research branch of computer vision. Its main task is to judge whether a pedestrian appears in an input image or video sequence and to determine the pedestrian's position. Pedestrian detection technology is widely applied in fields such as video surveillance, driver assistance and intelligent robots.
Computer vision is currently developing rapidly, and pedestrian detection, as one of its important research fields, has improved greatly and is moving toward practical application. As deep learning has been studied and applied in pedestrian detection, a series of deep learning pedestrian detection algorithms based on convolutional neural networks has emerged. Compared with traditional detection algorithms, deep learning algorithms are more robust, generalize better, and detect pedestrian targets more quickly and accurately. Thanks to continuous innovation and optimization of pedestrian detection theory, pedestrian detection provides technical support for intelligent monitoring, unmanned driving and similar applications, and has great application value.
However, in real monitoring scenes, current pedestrian detection algorithms still suffer from false detections and missed detections, and are easily affected by factors such as occlusion, pedestrian posture and scale change, so detection performance needs to be further improved.
Therefore, how to provide an effective pedestrian detection method based on deep learning is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the above, the present invention provides a pedestrian detection method based on deep learning. It optimizes the SSD algorithm to address the missed detections, false detections and long run time caused by dense, occluded or undersized pedestrians, so as to improve the accuracy and speed of pedestrian detection and the detection of small-target pedestrians.
To achieve this purpose, the invention adopts the following technical scheme:
A pedestrian detection method based on deep learning comprises the following steps:
S100: acquiring a sample data set with pedestrian targets, and preprocessing the sample data set;
S200: building an SSD pedestrian detection model, and optimizing it to obtain an optimized SSD pedestrian detection model;
S300: sending the sample data set preprocessed in step S100 into the optimized SSD pedestrian detection model for training, generating preselection frames, and processing them to obtain detection frames;
S400: detecting the pedestrian targets in the sample data set by using the detection frames, and outputting and displaying the detection result.
Preferably, when step S300 is performed, K-means clustering is performed on the sample data set to obtain the optimal aspect ratio of the preselection frame, including:
S10: setting k cluster centers with coordinates $(W_i, H_i)$, calculating the distance between each preselection frame and each cluster center, and assigning each preselection frame to the nearest cluster center, where the distance is:
$$d = 1 - \mathrm{IOU}\left[(x_j, y_j, w_j, h_j),\ (x_j, y_j, W_i, H_i)\right]$$
$$j \in \{1, 2, \ldots, N\}, \quad i \in \{1, 2, \ldots, k\}$$
wherein d is the distance to the cluster center, $(x_j, y_j, w_j, h_j)$ are the coordinates of the corresponding real frame, IOU is the intersection over union of two frames, N is the number of preselection frames, and k is the number of cluster centers;
S20: after the preselection frames are assigned, recalculating the center of each cluster, namely calculating the average width and height of all preselection frames in the cluster, where the specific expression is:
Figure BDA0003027477920000021
S30: repeating step S10 and step S20 until the cluster centers no longer change significantly, and taking the average width and height at that point to obtain the corresponding preselection frames;
S40: clustering the sample data set by using the preselection frames, and re-determining the width and height of the preselection frames, where the specific expression is:
Figure BDA0003027477920000022
wherein $m_d$ represents the down-sampling magnification, $w_r$ the width of the preselection frame, $w_k$ the width of the input image, $h_r$ the height of the preselection frame, and $h_k$ the height of the input image;
S50: obtaining the optimal aspect ratio of the preselection frame from its width and height values.
Preferably, step S300 specifically includes:
S310: constructing an SSD network framework based on the ResNet residual network structure, and building the SSD pedestrian detection model on that framework;
S320: adding a VoVNet network to the SSD pedestrian detection model to obtain the optimized SSD pedestrian detection model;
S330: setting the training parameters, creating a training set according to them, training the optimized SSD pedestrian detection model, and stopping when the maximum number of iterations is reached, to obtain the trained optimized SSD pedestrian detection model;
S340: sending the sample data set obtained in step S100 into the trained optimized SSD pedestrian detection model, generating preselection frames, and processing them to obtain detection frames.
Preferably, step S310 specifically includes:
S311: the SSD pedestrian detection model is composed of a plurality of residual block groups, each containing a plurality of residual blocks; the output of the previous residual block is passed through a 1 × 1 convolution to convert it to the same dimension, and is used as the input of the whole residual structure and fed into the first convolution layer;
S312: the first convolution layer is connected to the SSD pedestrian detection model, and its output is used as the input of the next convolution layer;
S313: the output of the next convolution layer, after normalization and a nonlinear function, is combined with the output of the previous residual structure to form the SSD pedestrian detection model.
Preferably, adding the VoVNet network structure in step S320 includes:
connecting the first convolution layer and the residual block groups in series according to the VoVNet network structure, and performing a one-shot aggregation at the end, to obtain the optimized SSD pedestrian detection model.
Preferably, step S100 further includes acquiring a test data set together with the sample data set with pedestrian targets, testing the trained optimized SSD pedestrian detection model on the test data set, and outputting the tested model.
Preferably, the method further includes step S500:
S510: judging whether all preselection frames have been trained; if so, going to step S520;
S520: performing non-maximum suppression on the detection frames, removing redundant detection frames and determining a unique detection frame;
S530: detecting the pedestrian targets in the sample data set according to the unique detection frame;
S540: outputting and displaying the detection result.
Compared with the prior art, the pedestrian detection method based on deep learning has the following beneficial effects:
(1) the network structure that fuses the ResNet and VoVNet models effectively fuses multi-layer feature information and maps it to deeper, more complex feature representations, which improves target detection performance and gives a better result on small targets;
(2) the VoVNet model aggregates the intermediate features of the last layer of each residual block in one shot to form the final feature map, so more shallow features are aggregated at the transition layer and the number of network layers is reduced; because the intermediate layers of the VoVNet model have the same input and output sizes, the network runs faster and uses less energy;
(3) the optimal aspect ratio of the preselection frame is obtained automatically through a K-means clustering experiment, which removes the SSD algorithm's dependence on manual settings and experience, strengthens small-target detection and reduces missed detections.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method provided by the present invention;
fig. 2 is a schematic diagram of an optimized SSD network structure provided in this embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a pedestrian detection method based on deep learning, which comprises the following steps:
S100: acquiring a sample data set with pedestrian targets, and preprocessing the sample data set;
S200: building an SSD pedestrian detection model, and optimizing it to obtain an optimized SSD pedestrian detection model;
S300: sending the sample data set preprocessed in step S100 into the optimized SSD pedestrian detection model for training, generating preselection frames, and processing them to obtain detection frames;
S400: detecting the pedestrian targets in the sample data set by using the detection frames, and outputting and displaying the detection result.
In a specific embodiment, the sample data set with pedestrian targets obtained in step S100 may be locally uploaded data or a public sample data set with pedestrian labels.
Specifically, the public sample data set may be the COCO data set. COCO is a data set for detecting multiple types of targets; it covers 80 categories, not only pedestrian images. It contains about 330,000 pictures and more than 200,000 items of label information, and supports target detection and localization as well as keypoint analysis and semantic understanding. The open release of COCO has driven great progress in image segmentation and semantic understanding, and COCO has almost become the "standard" data set for evaluating image semantic understanding algorithms.
Specifically, if the acquired sample data set is locally uploaded data, the pedestrian data in it also needs to be labeled to generate pedestrian annotation files.
More specifically, the local sample data set is labeled as follows:
First, pictures are extracted from a locally acquired video, named uniformly, saved in jpg format and stored in the corresponding folder. The uniformly named pictures are then annotated with a labeling tool: a labeling frame is drawn around each person in the picture, every pedestrian in a frame is given a label, and the result is saved as a corresponding xml file. The same steps are repeated for the next picture until the pedestrians in all pictures are labeled. During labeling, different pedestrians are simply numbered, and experiments verify whether the pedestrians can be identified.
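As an illustration of how such annotation files might be read back for training, the following minimal sketch parses one xml file into labeled boxes. The element names follow a Pascal VOC-style layout, which is an assumption: the text only states that an xml file is generated per picture and does not name the labeling tool or schema. The sample content and the function name `read_boxes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in a Pascal VOC-style layout (an assumption;
# the source only says that an xml file is saved for each picture).
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>pedestrian_1</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>360</ymax></bndbox>
  </object>
  <object>
    <name>pedestrian_2</name>
    <bndbox><xmin>300</xmin><ymin>90</ymin><xmax>390</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""

def read_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"),) + coords)
    return filename, boxes

filename, boxes = read_boxes(SAMPLE_XML)
```

Each pedestrian keeps its own numbered label here, matching the direct numbering of pedestrians described above.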
In a specific embodiment, preprocessing the sample data set in step S100 includes performing grayscale conversion, filtering and threshold segmentation on the images in the sample data set.
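The three preprocessing operations can be sketched in plain Python as follows. The text does not specify the filter type or the threshold, so the 3 × 3 mean filter, the fixed threshold and the luminance weights (0.299, 0.587, 0.114) are assumptions chosen only for illustration, as are all function names.

```python
def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def mean_filter3(gray):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def threshold(gray, t):
    """Binary segmentation: 1 where the pixel exceeds t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]

def preprocess(rgb, t=100):
    """Grayscale conversion, then filtering, then threshold segmentation."""
    return threshold(mean_filter3(to_gray(rgb)), t)

# Tiny synthetic 3x3 image: a bright right column on a dark background.
image = [[(0, 0, 0), (0, 0, 0), (255, 255, 255)] for _ in range(3)]
mask = preprocess(image)
```

In practice an image library would replace these loops; the pure-Python form is used only to keep the sketch self-contained.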
In a specific embodiment, when step S300 is performed, K-means clustering is performed on the sample data set to obtain the optimal aspect ratio of the preselection frame, including:
S10: setting k cluster centers with coordinates $(W_i, H_i)$, calculating the distance between each preselection frame and each cluster center, and assigning each preselection frame to the nearest cluster center, where the distance is:
$$d = 1 - \mathrm{IOU}\left[(x_j, y_j, w_j, h_j),\ (x_j, y_j, W_i, H_i)\right]$$
$$j \in \{1, 2, \ldots, N\}, \quad i \in \{1, 2, \ldots, k\}$$
wherein d is the distance to the cluster center, $(x_j, y_j, w_j, h_j)$ are the coordinates of the corresponding real frame, IOU is the intersection over union of two frames, N is the number of preselection frames, and k is the number of cluster centers;
S20: after the preselection frames are assigned, recalculating the center of each cluster, namely calculating the average width and height of all preselection frames in the cluster, where the specific expression is:
Figure BDA0003027477920000061
S30: repeating step S10 and step S20 until the cluster centers no longer change significantly, and taking the average width and height at that point to obtain the corresponding preselection frames;
S40: clustering the sample data set by using the preselection frames, and re-determining the width and height of the preselection frames, where the specific expression is:
Figure BDA0003027477920000062
wherein $m_d$ represents the down-sampling magnification, $w_r$ the width of the preselection frame, $w_k$ the width of the input image, $h_r$ the height of the preselection frame, and $h_k$ the height of the input image;
S50: obtaining the optimal aspect ratio of the preselection frame from its width and height values.
These steps address the problem that the proportions of the preselection frames in the SSD algorithm mostly have to be set manually and depend too heavily on experience. The optimized SSD network model uses K-means clustering to obtain the optimal number of clusters k and the corresponding ratio automatically; according to the clustering, the ratio can be set to 0.764, and the aspect ratio of the preselection frame is modified accordingly. The sizes selected by the clustering experiment are closer to the real frame sizes encountered in pedestrian detection, so pedestrian targets can be detected quickly and accurately.
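Steps S10 to S30 and S50 can be sketched as a small K-means loop over (width, height) pairs using the distance d = 1 - IOU. Because both frames in the distance share the same position $(x_j, y_j)$, the IOU depends only on widths and heights. The initialization from the first k boxes, the stopping tolerance, the sample boxes and the reading of the ratio as width divided by height are all assumptions; step S40 is omitted because its expression is not reproduced in the text.

```python
def iou_wh(box, center):
    """IOU of two frames that share the same position, given as (w, h)."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union

def kmeans_boxes(boxes, k, max_iter=100, tol=1e-6):
    """Cluster (w, h) pairs with the distance d = 1 - IOU."""
    centers = list(boxes[:k])  # simple deterministic initialisation (assumption)
    for _ in range(max_iter):
        # S10: assign each frame to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            dists = [1.0 - iou_wh(b, c) for c in centers]
            clusters[dists.index(min(dists))].append(b)
        # S20: recompute each center as the mean width and height of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        # S30: stop once the centers barely move.
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1])
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

# Hypothetical (width, height) pairs of labeled pedestrians, in pixels.
boxes = [(30, 80), (32, 84), (28, 76), (60, 150), (64, 160), (56, 140)]
centers = kmeans_boxes(boxes, k=2)
ratios = [w / h for w, h in centers]  # S50: ratio taken from each center
```

On this toy data the loop settles on one center per size group, and the ratios are read directly from the final centers.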
In a specific embodiment, step S300 further includes:
S310: constructing an SSD network framework based on the ResNet residual network structure, and building the SSD pedestrian detection model on that framework;
S320: adding a VoVNet network to the SSD pedestrian detection model to obtain the optimized SSD pedestrian detection model;
S330: setting the training parameters, creating a training set according to them, training the optimized SSD pedestrian detection model, and stopping when the maximum number of iterations is reached, to obtain the trained optimized SSD pedestrian detection model;
S340: sending the sample data set obtained in step S100 into the trained optimized SSD pedestrian detection model, generating preselection frames, and processing them to obtain detection frames.
In a specific embodiment, step S310 is specifically as follows:
S311: the SSD pedestrian detection model is composed of a plurality of residual block groups, each containing a plurality of residual blocks; the output of the previous residual block is passed through a 1 × 1 convolution to convert it to the same dimension, and is used as the input of the whole residual structure and fed into the first convolution layer;
S312: the first convolution layer is connected to the SSD pedestrian detection model, and its output is used as the input of the next convolution layer;
S313: the output of the next convolution layer, after normalization and a nonlinear function, is combined with the output of the previous residual structure to form the SSD pedestrian detection model.
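A residual block of the kind described in S311 to S313 might look like the following PyTorch sketch. The channel counts, the 3 × 3 kernels and the exact placement of the addition relative to the last ReLU follow the common ResNet layout and are assumptions; the class name `ResidualBlock` is hypothetical and is not taken from the patent.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two conv + BN stages with a shortcut added before the final ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()
        # 1 x 1 convolution that brings the shortcut to the same channel count.
        if in_ch == out_ch:
            self.project = nn.Identity()
        else:
            self.project = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        shortcut = self.project(x)                 # S311: match dimensions
        out = self.relu(self.bn1(self.conv1(x)))   # S312: first conv feeds the next
        out = self.bn2(self.conv2(out))            # S313: normalization ...
        return self.relu(out + shortcut)           # ... and merge with the shortcut

block = ResidualBlock(16, 32).eval()
with torch.no_grad():
    y = block(torch.randn(1, 16, 8, 8))
```

The spatial size is preserved by the padding, so only the channel count changes between input and output.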
In an embodiment, adding the VoVNet network in step S320 specifically includes:
connecting the first convolution layer and the residual block groups in series according to the VoVNet network structure, and performing a one-shot aggregation at the end, to obtain the optimized SSD pedestrian detection model.
Specifically, the optimized SSD network structure is shown in FIG. 2. The ResNet structure comprises three residual block groups, namely residual block group 1, residual block group 2 and residual block group 3; after VoVNet is added, the structure includes a convolution layer, the residual block groups and aggregation modules.
More specifically, the optimized SSD pedestrian detection model consists of: a first convolution layer, residual block group 0_residual block 0 to residual block group 0_residual block 6, a first aggregation module, residual block group 1_residual block 0 to residual block group 1_residual block 6, a second aggregation module, residual block group 2_residual block 0 to residual block group 2_residual block 3, residual block group 2_residual block 4 to residual block group 2_residual block 6, convolution layer 3_2, convolution layer 4_2, convolution layer 5_2 and convolution layer 6_2. These are connected in series in that order, and a one-shot aggregation is performed at the end to obtain the optimized SSD pedestrian detection model.
More specifically, the optimized SSD pedestrian detection model works as follows. An aggregation module is placed after each residual block group, and ResNet is added to the aggregation module after each residual block group. Within a residual block group, features are fused between residual blocks through the nonlinear transformation Conv + BN + ReLU. Features are extracted from the selected residual block group 2_residual block 3 and residual block group 2_residual block 6 and from 4 convolution layers containing 1 × 1 and 3 × 3 convolutions; that is, the optimized model extracts feature information from six feature maps: residual block group 2_residual block 0 to residual block group 2_residual block 3, residual block group 2_residual block 4 to residual block group 2_residual block 6, convolution layer 3_2, convolution layer 4_2, convolution layer 5_2 and convolution layer 6_2. The backbone network uses a Net structure, and the resulting feature maps follow the VoVNet network model: each is connected to the next layer and aggregated once in the final feature map to form the final feature output, giving the optimized SSD pedestrian detection model.
With this structure, the output of each layer of the optimized SSD pedestrian detection network is not connected directly to all subsequent intermediate layers, so the input size of the intermediate layers stays constant. Shallow features are aggregated more at the transition layer, while deep features have little influence on it, so the network parameters and the number of intermediate layers are reduced without affecting feature propagation. The optimized structure combines the advantages of ResNet and VoVNet: while learning residuals, it combines features from different layers to describe a target jointly and continuously merges shallow feature information into the deep network, so that the final feature output fully combines shallow and deep feature information and features are learned better. On the basis of ResNet-SSD, the model further alleviates vanishing gradients and the insufficient precision of small-target detection, making the network easier to train; the number of network parameters is greatly reduced, resource waste is reduced, and target detection performance is improved.
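The one-shot aggregation idea can be sketched in PyTorch as a group of layers that run in series and are concatenated only once at the end, so every intermediate layer sees an input of constant width. This is a simplified illustration rather than the network of FIG. 2: plain Conv + BN + ReLU layers stand in for the residual blocks, and the layer count, channel widths and the names `conv_bn_relu` and `OneShotAggregation` are assumptions.

```python
import torch
from torch import nn

def conv_bn_relu(in_ch, out_ch, kernel_size=3):
    """Conv + BN + ReLU unit; padding keeps the spatial size unchanged."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

class OneShotAggregation(nn.Module):
    """Layers run in series; their outputs are concatenated once at the end."""

    def __init__(self, in_ch, mid_ch, out_ch, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [conv_bn_relu(in_ch if i == 0 else mid_ch, mid_ch) for i in range(num_layers)]
        )
        # The aggregation sees the group input plus every intermediate output.
        self.aggregate = conv_bn_relu(in_ch + num_layers * mid_ch, out_ch, kernel_size=1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            x = layer(x)          # each layer only receives the previous output
            features.append(x)
        return self.aggregate(torch.cat(features, dim=1))

osa = OneShotAggregation(in_ch=16, mid_ch=8, out_ch=32).eval()
with torch.no_grad():
    out = osa(torch.randn(1, 16, 8, 8))
```

Compared with dense connections, where each layer's input grows with depth, the single concatenation keeps the per-layer cost fixed, which is the property the paragraph above attributes to the VoVNet structure.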
In a specific embodiment, step S100 further includes acquiring a test data set together with the sample data set with pedestrian targets, testing the trained optimized SSD pedestrian detection model on the test data set, and outputting the tested model.
Specifically, an SSD detection model is built on the PyTorch deep learning framework and modified into a two-class model suitable for pedestrian detection; the SSD pedestrian detection model is built according to the PyTorch framework and the SSD algorithm framework, and the trained optimized model is tested on the test data set.
In a specific embodiment, obtaining the detection frames through processing in step S340 means obtaining the corresponding detection frames by matching against the preselection frames, specifically:
S341: for each preselection frame, searching for the detection frame with the maximum intersection over union (IoU) and matching them, which ensures that each preselection frame has one corresponding detection frame;
S342: for the detection frames remaining after the matching in S341, trying to match each with any labeling frame, and matching the two if their IoU is greater than a preset value.
Specifically, the intersection over union is calculated as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
wherein A is the preselection frame, B is the detection frame, and J(A, B) is the ratio of the intersection to the union of the preselection frame and the detection frame.
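The intersection over union and a two-stage matching in the spirit of S341 and S342 can be sketched as follows. The sketch follows the matching strategy commonly used with SSD (each labeled frame first claims the frame it overlaps most, then the remaining frames are matched to any labeled frame above a threshold), which is one interpretation of the steps above rather than a confirmed reading. The corner format (xmin, ymin, xmax, ymax), the threshold 0.5 and the function names are assumptions.

```python
def iou(a, b):
    """Jaccard overlap J(A, B) of two frames given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(preselection, labeled, threshold=0.5):
    """Return {preselection index: labeled index} from a two-stage matching."""
    matches = {}
    # Stage 1: each labeled frame claims the preselection frame it overlaps most.
    for j, gt in enumerate(labeled):
        best = max(range(len(preselection)), key=lambda i: iou(preselection[i], gt))
        matches[best] = j
    # Stage 2: remaining preselection frames match any labeled frame above the threshold.
    for i, p in enumerate(preselection):
        if i in matches:
            continue
        j = max(range(len(labeled)), key=lambda idx: iou(p, labeled[idx]))
        if iou(p, labeled[j]) > threshold:
            matches[i] = j
    return matches

# Hypothetical frames: two overlap the labeled pedestrian, one is far away.
preselection = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
labeled = [(0, 0, 10, 10)]
matches = match(preselection, labeled)
```

Frames left unmatched after both stages would be treated as background during training.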
In a specific embodiment, the method further includes step S500:
S510: judging whether all preselection frames have been trained; if so, going to step S520;
S520: performing non-maximum suppression on the detection frames, removing redundant detection frames and determining a unique detection frame;
S530: detecting the pedestrian targets in the sample data set according to the unique detection frame;
S540: outputting and displaying the detection result.
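The non-maximum suppression of step S520 can be sketched as a greedy procedure: detection frames are visited from the highest score down, and a frame is kept only if it does not overlap an already kept frame too strongly. The IoU threshold of 0.45, the corner format (xmin, ymin, xmax, ymax) and the sample scores are assumptions, since the text does not state them.

```python
def iou(a, b):
    """Intersection over union of two frames given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy non-maximum suppression; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a frame only if it does not overlap any kept frame too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections: the first two cover the same pedestrian.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the lower-scoring duplicate is removed, leaving one detection frame per pedestrian.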
According to the above technical scheme, compared with the prior art, the invention provides a pedestrian detection method based on deep learning in which the VoVNet network replaces VGG as the base neural network model of the SSD network, and the original SSD convolution layers are linked by shortcut connections to form a shortcut mechanism. Through these shortcuts, the feature information of the shallow network is passed to the deep network structure, so the deep neural network can fuse and make full use of shallow feature information, express the target's features better, and detect small targets more precisely. Moreover, a batch normalization (BN) operation and a nonlinear ReLU operation are added between the convolution layers of each ResNet residual block and between adjacent residual structures, which keeps the deep neural network stable. However, the ResNet network is deep, which means the residual structure has to be repeated many times; parameter utilization is therefore low, and the network also suffers from slow operation, high memory use, low computational efficiency and insufficient small-target detection precision.
The method has the following specific beneficial effects:
(1) the network structure that fuses the ResNet and VoVNet models effectively fuses multi-layer feature information and maps it to deeper, more complex feature representations, which improves target detection performance and gives a better result on small targets;
(2) the VoVNet model aggregates the intermediate features of the last layer of each residual block in one shot to form the final feature map, so more shallow features are aggregated at the transition layer and the number of network layers is reduced; because the intermediate layers of the VoVNet model have the same input and output sizes, the network runs faster and uses less energy;
(3) the optimal aspect ratio of the preselection frame is obtained automatically through a K-means clustering experiment, which removes the SSD algorithm's dependence on manual settings and experience, strengthens small-target detection and reduces missed detections.
The embodiments in this description are presented progressively: each embodiment focuses on its differences from the others, and identical or similar parts can be cross-referenced among the embodiments. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, it is described only briefly; for the relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A pedestrian detection method based on deep learning, characterized by comprising the following steps:
S100: acquiring a sample data set containing pedestrian targets, and preprocessing the sample data set;
S200: building an SSD pedestrian detection model, and optimizing it to obtain an optimized SSD pedestrian detection model;
S300: feeding the sample data set preprocessed in step S100 into the optimized SSD pedestrian detection model for training, generating preselection frames, and processing them to obtain detection frames;
S400: detecting the pedestrian targets in the sample data set by using the detection frames, and outputting and displaying the detection result.
2. The deep learning-based pedestrian detection method according to claim 1, wherein in step S300 K-means clustering is performed on the sample data set to obtain the optimal aspect ratio of the preselection frame, comprising:
S10: setting k cluster centers with coordinates $(W_i, H_i)$, calculating the distance between each preselection frame and each cluster center, and assigning each preselection frame to the nearest cluster center, according to:
$d = 1 - \mathrm{IOU}[(x_j, y_j, w_j, h_j), (x_j, y_j, W_i, H_i)]$
$j \in \{1, 2, \ldots, N\},\ i \in \{1, 2, \ldots, k\}$
where $d$ is the distance to the cluster center, $(x_j, y_j, w_j, h_j)$ are the coordinates of the corresponding real frame, IOU is the intersection over union of the two frames, $N$ is the number of preselection frames, and $k$ is the number of cluster centers;
S20: after the preselection frames have been assigned, recalculating the center of each cluster, that is, the average width and height of all preselection frames in the cluster, according to:
[Formula image FDA0003027477910000011, not reproduced in this text]
S30: repeating steps S10 and S20 until the cluster centers no longer change appreciably, and taking the average width and height at that point to obtain the corresponding preselection frames;
S40: clustering the sample data set with these preselection frames and re-determining the width and height of the preselection frame, according to:
[Formula image FDA0003027477910000012, not reproduced in this text]
where $m_d$ is the down-sampling ratio, $w_r$ is the width of the preselection frame, $w_k$ is the width of the input image, $h_r$ is the height of the preselection frame, and $h_k$ is the height of the input image;
S50: obtaining the optimal aspect ratio of the preselection frame from its width and height values.
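Steps S10 to S30 and S50 of claim 2 can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the function names `iou_wh` and `kmeans_boxes`, the random initialization, the tolerance `tol`, and the sample boxes are all assumptions made for illustration. Because the distance formula places the cluster center at the same $(x_j, y_j)$ as the frame, the IoU depends only on widths and heights. Step S40 is omitted because its formula is not reproduced in this text.

```python
import random

def iou_wh(box, center):
    """IoU of two boxes anchored at the same corner, each given as (w, h)."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union

def kmeans_boxes(boxes, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster (w, h) pairs using the distance d = 1 - IoU (hypothetical helper)."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # assumption: initial centers drawn from the data
    for _ in range(max_iter):
        # S10: assign every box to the center with the smallest 1 - IoU
        clusters = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda c: 1.0 - iou_wh(b, centers[c]))
            clusters[nearest].append(b)
        # S20: each new center is the mean width and height of its members
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        # S30: stop once the centers no longer move appreciably
        shift = max(abs(n[0] - o[0]) + abs(n[1] - o[1])
                    for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

# Made-up (width, height) pairs: three small and three large pedestrian boxes.
boxes = [(10, 30), (12, 33), (11, 29), (40, 80), (42, 85), (38, 78)]
centers = sorted(kmeans_boxes(boxes, k=2))
# S50: aspect ratio (width / height) of each clustered preselection frame
ratios = [w / h for w, h in centers]
```

On this toy data the two centers settle on the mean of the small boxes and the mean of the large boxes, and both aspect ratios are below 1, as expected for upright pedestrians.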
3. The pedestrian detection method based on deep learning of claim 2, wherein step S300 specifically includes:
S310: constructing an SSD network framework based on the ResNet residual network structure, and building the SSD pedestrian detection model on that framework;
S320: adding a VoVNet network to the SSD pedestrian detection model to obtain the optimized SSD pedestrian detection model;
S330: setting the training parameters, creating a training set according to them to train the optimized SSD pedestrian detection model, and stopping training when the maximum number of iterations is reached, to obtain a trained optimized SSD pedestrian detection model;
S340: feeding the sample data set obtained in step S100 into the trained optimized SSD pedestrian detection model, generating preselection frames, and processing them to obtain detection frames.
4. The pedestrian detection method based on deep learning of claim 3, wherein step S310 comprises:
S311: the SSD pedestrian detection model is composed of a plurality of residual block groups, each containing a plurality of residual blocks; the output of the previous residual block is converted to the same dimension by a 1 × 1 convolution, serves as the input of the whole residual structure, and is fed into the first convolution layer;
S312: the first convolution layer is connected to the SSD pedestrian detection model, and its output serves as the input of the next convolution layer;
S313: the output of the next convolution layer, after normalization and a nonlinear function operation, is combined with the output of the previous residual structure to form the SSD pedestrian detection model.
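The data flow of claim 4 can be sketched on a single pixel, where a 1 × 1 convolution reduces to a matrix-vector product over channels. This is a toy under stated assumptions, not the patented network: the two inner convolutions are modeled as channel-mixing matrices rather than spatial kernels, batch normalization uses fixed statistics (inference style), "combined" is taken to mean element-wise addition, and all function names, weights, and inputs are hypothetical.

```python
import math

def conv1x1(x, weight):
    """1x1 convolution at one pixel: a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def batch_norm(x, mean, var, eps=1e-5):
    """Per-channel normalization with fixed (mean, var) statistics."""
    return [(v - m) / math.sqrt(s + eps) for v, m, s in zip(x, mean, var)]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, w_proj, w1, w2, stats1, stats2):
    # S311: project the previous output to the matching dimension
    shortcut = conv1x1(x, w_proj)
    # S312: first convolution layer, followed by BN and ReLU
    h = relu(batch_norm(conv1x1(shortcut, w1), *stats1))
    # S313: next convolution layer, then normalization and nonlinearity
    h = relu(batch_norm(conv1x1(h, w2), *stats2))
    # S313: combine with the shortcut (assumed to be element-wise addition)
    return [a + b for a, b in zip(h, shortcut)]

identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
unit_stats = ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])  # mean 0, variance 1
w_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # maps 2 channels to 3

out = residual_block([1.0, -2.0], w_proj, identity3, identity3,
                     unit_stats, unit_stats)
```

With these made-up weights, the negative channels are zeroed by ReLU on the residual path and survive only through the shortcut, which shows how the shortcut preserves information that the nonlinear path discards.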
5. The deep learning-based pedestrian detection method according to claim 3, wherein adding the VoVNet network structure in step S320 comprises:
connecting the first convolution layer and the residual block groups in series according to the VoVNet network structure, and performing a single aggregation at the end, to obtain the optimized SSD pedestrian detection model.
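The "series connection followed by a single aggregation" in claim 5 can be pictured with a toy sketch. The assumptions are labeled: layers are plain functions on a channel vector, aggregation is taken to be concatenation, and the module input is included in the aggregation (as in the published VoVNet one-shot aggregation module, although the claim does not state this). The name `one_shot_aggregate` and the example layers are hypothetical.

```python
def one_shot_aggregate(x, layers):
    """Run layers in series, then concatenate all feature vectors once."""
    features = [x]       # assumption: the module input is aggregated too
    h = x
    for layer in layers:
        h = layer(h)     # each layer reads only the previous layer's output
        features.append(h)
    # single aggregation at the end: flatten all collected features
    return [v for f in features for v in f]

# Two stand-in layers that keep the channel count unchanged.
layers = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
]
aggregated = one_shot_aggregate([1.0, 2.0], layers)
```

Because every intermediate layer sees an input of the same size as its output, the per-layer cost stays constant, and the only wide tensor is the one produced by the final concatenation.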
6. The pedestrian detection method based on deep learning of claim 3, wherein step S100 further includes obtaining a test data set together with the sample data set containing pedestrian targets, testing the trained optimized SSD pedestrian detection model on the test data set, and outputting the tested model.
7. The pedestrian detection method based on deep learning of claim 1, further comprising step S500:
S510: judging whether all preselection frames have been trained; if so, proceeding to step S520;
S520: applying non-maximum suppression to the detection frames, removing redundant detection frames and determining a unique detection frame;
S530: detecting the pedestrian target in the sample data set according to the unique detection frame;
S540: outputting and displaying the detection result.
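The non-maximum suppression of step S520 is commonly implemented greedily; the following is a minimal sketch of that common variant, not necessarily the exact procedure used in the patent. The corner-format boxes `(x1, y1, x2, y2)`, the threshold of 0.5, the function names, and the sample detections are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best first."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring frame still in play
        kept.append(best)
        # drop every frame that overlaps the kept frame too strongly
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) <= iou_threshold]
    return kept

# Two overlapping frames on one pedestrian and one frame on another.
detections = [
    ((10, 10, 50, 100), 0.9),
    ((12, 12, 52, 102), 0.8),
    ((200, 20, 240, 110), 0.7),
]
kept = nms(detections)
```

Here the second frame overlaps the first with an IoU of about 0.87 and is removed, so each pedestrian ends up with a single detection frame.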
CN202110420061.3A 2021-04-19 2021-04-19 Pedestrian detection method based on deep learning Pending CN113033478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110420061.3A CN113033478A (en) 2021-04-19 2021-04-19 Pedestrian detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110420061.3A CN113033478A (en) 2021-04-19 2021-04-19 Pedestrian detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN113033478A (en) 2021-06-25

Family

ID=76456851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110420061.3A Pending CN113033478A (en) 2021-04-19 2021-04-19 Pedestrian detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN113033478A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985222A (en) * 2018-07-12 2018-12-11 天津艾思科尔科技有限公司 A kind of deep learning network model and system for making and receiving calls identification
CN109635717A (en) * 2018-12-10 2019-04-16 天津工业大学 A kind of mining pedestrian detection method based on deep learning
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model
CN110119718A (en) * 2019-05-15 2019-08-13 燕山大学 A kind of overboard detection and Survivable Control System based on deep learning
CN111191535A (en) * 2019-12-18 2020-05-22 南京理工大学 Pedestrian detection model construction method based on deep learning and pedestrian detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
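YULUN ZHANG ET AL: "Residual Dense Network for Image Super-Resolution", arXiv *
GAO Pengwei et al.: "Real-time detection system for surface defects of medium-density fiberboard", Forestry and Grassland Machinery (《林业和草原机械》) *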

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625