CN113269038B - Multi-scale-based pedestrian detection method - Google Patents


Info

Publication number
CN113269038B
CN113269038B (application CN202110419108.4A)
Authority
CN
China
Prior art keywords
pedestrian
data set
pedestrian detection
network
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110419108.4A
Other languages
Chinese (zh)
Other versions
CN113269038A (en)
Inventor
任健
邵文泽
李海波
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110419108.4A priority Critical patent/CN113269038B/en
Publication of CN113269038A publication Critical patent/CN113269038A/en
Application granted
Publication of CN113269038B publication Critical patent/CN113269038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a multi-scale-based pedestrian detection method. First, multi-scale feature fusion is used to learn more features at different scales, so that shallow features distinguish simple targets while deep features distinguish complex targets. Second, to further improve the network's detection of multi-scale and especially small targets, sliding windows of different sizes are used, so that the RPN outputs candidate regions generated by sliding windows with different receptive fields. The method improves pedestrian detection accuracy, is more robust than the prior art, and can be used to detect small-target pedestrians.

Description

Multi-scale-based pedestrian detection method
Technical Field
The invention relates to the field of computer vision image processing, in particular to a pedestrian detection method based on multiple scales.
Background
In recent years, computer vision has developed rapidly with the support of deep learning and has attracted a great number of researchers. Although their focuses differ, the ultimate goal is shared by thousands of researchers: to make the technology serve people, whether by liberating productivity or by improving quality of life. Since technology serves people, research related to "people" is essential, and such research plays a leading role in both academia and industry.
Pedestrian detection has received widespread attention over the past decade as the first and most basic step of many real-world tasks such as human behavior analysis, gait recognition, intelligent video surveillance and autonomous driving. Although deep convolutional neural networks (CNNs) have made great progress in detecting general targets and have achieved good results, pedestrian detection, an important branch of general target detection, has long remained difficult to solve. In terms of importance, pedestrian detection is a prerequisite for tasks such as pedestrian tracking, automatic driving and security monitoring. Although "person" is only a single category, many challenges remain, such as the diversity of detection scenes, the complexity of pedestrian poses, and the possibility that the target to be detected is occluded.
Pedestrian detection plays an important role in intelligent monitoring and security: to protect people and property, most public places are equipped with surveillance equipment. However, when the large amount of pedestrian data captured by such equipment is reviewed only by human operators, two problems arise. On the one hand, after monitoring for long periods a person inevitably tires and, compared with a computer, misreads or misses information; on the other hand, a person's limited capacity to process information means the monitored data is not fully exploited. Pedestrian detection technology can well compensate for these shortcomings of manual review, saving manpower and enabling timely warnings in emergencies.
Pedestrian detection is also an important problem to be overcome and improved in the field of unmanned driving. Since the emergence of driverless technology, pedestrian detection has troubled many researchers as a problem urgently needing solution and improvement. Although pedestrian detection has been in a stage of rapid development since 2005, many problems remain to be solved, mainly the still-unresolved trade-off between two aspects: speed and accuracy. In recent years, as companies such as Google actively research and develop automatic-driving technology, an effective and fast pedestrian detection method is urgently needed to ensure that pedestrian safety is not threatened during automatic driving. Solving the pedestrian detection problem can thus fundamentally improve existing unmanned-driving technology.
Pedestrian Detection (Pedestrian Detection) is the use of computer vision techniques to determine whether a Pedestrian is present in an image or video sequence and to provide accurate positioning. The technology can be combined with technologies such as pedestrian tracking, pedestrian re-identification and the like, and is applied to the fields of artificial intelligence systems, vehicle auxiliary driving systems, intelligent robots, intelligent video monitoring, human body behavior analysis, intelligent transportation and the like. Due to the characteristics of rigidity and flexibility of the pedestrian, the appearance is easily influenced by wearing, size, shielding, posture, visual angle and the like, so that the pedestrian detection becomes a hot topic with research value and great challenge in the field of computer vision.
Small objects are a very common problem in pedestrian detection, especially in autonomous driving or surveillance scenarios: when the pedestrian target is far from the camera, detection becomes very challenging for existing algorithms. As a specific problem within general target detection, the existing CNN-based pedestrian detection methods still derive from general target detection methods (e.g., Faster R-CNN, SSD) that work by tiling candidate boxes over the image, the so-called anchor-based approach. However, the anchor-based approach suffers from three problems: first, specific anchor boxes must be manually chosen for each data set to better match pedestrian targets; second, a threshold must be manually set to define positive and negative samples; third, the training process carries a bias from the data-set annotations. Especially for hard samples and small targets, the target information a network model can learn inside a pedestrian box is very sparse, and this bias makes hard samples and small targets even more difficult for the detector.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multi-scale-based pedestrian detection method. Aimed at the detection of weak and small targets, the invention provides a two-stage pedestrian detection method that, from a multi-scale viewpoint, introduces multi-scale feature fusion and a multi-scale receptive-field RPN. A general multi-scale model is constructed, improving the network's ability to learn the features of small-target pedestrians, improving the detection accuracy for small-target pedestrians, and reducing missed detections.
The invention adopts the following technical scheme for solving the technical problems:
the invention provides a pedestrian detection method based on multiple scales, which comprises the following steps:
step 1, obtaining a pedestrian data set, wherein the pedestrian data set comprises a CityPersons pedestrian data set and a Caltech pedestrian data set;
step 2, building a pedestrian detection model, wherein the pedestrian detection model comprises a multi-scale feature fusion model and an RPN (Region Proposal Network), specifically as follows:
(1) constructing a multi-scale feature fusion model, wherein the construction process specifically comprises the following steps:
inputting the pedestrian data set into a first partial convolutional network, wherein the first, second, third, fourth and fifth partial convolutional networks are connected in sequence,
the first partial convolution network is used for extracting a feature map fm1 of the pedestrian data set and outputting the feature map fm1 to the second partial convolution network;
the second partial convolution network is used for extracting the feature map fm2 of the pedestrian data set again and outputting the feature map fm2 to the third partial convolution network;
the third partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on the feature map fm2 to obtain a feature map fm3, and inputting the feature map fm3 into the fourth partial convolutional network;
the fourth partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm3 to obtain a feature map fm4, and inputting the feature map fm4 into the fifth partial convolutional network;
a fifth partial convolutional network, configured to perform 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm4, to obtain a feature map fm5;
performing 1 × 1 convolution on the output fm5 obtained by the fifth part of convolution network, changing the number of channels of fm5, and recording the obtained output as M5;
performing 2 times of upsampling on the M5 to obtain upsampled M5; adding a feature map obtained by performing 1 × 1 convolution on fm4 and the up-sampled M5, and marking the obtained result as M4;
m4 is subjected to 2 times of upsampling to obtain upsampled M4; adding a feature map obtained by convolving fm3 by 1 × 1 to the up-sampled M4 to obtain a result which is recorded as M3;
recording the result obtained by performing 4-time down-sampling on the obtained M3 as M3, and recording the result obtained by performing 2-time down-sampling on the M4 as M4;
finally, the feature map obtained by adding M3, M4 and M5 is sent to the RPN network;
(2) the RPN is used for generating candidate regions through multi-scale receptive-field sliding windows; specifically:
the RPN is used for generating candidate regions by adopting 5 sliding windows with different sizes, the candidate regions are respectively realized by convolution of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9, and finally, results formed by the obtained candidate regions with different receptive field sizes are merged;
step 3, on the basis of the constructed pedestrian detection model, inputting the pedestrian data set into the pedestrian detection model, specifically: the built pedestrian detection model is first pre-trained with the CityPersons pedestrian data set to obtain a model trained on that data set; on the basis of this model, it is fine-tuned on the Caltech pedestrian data set to obtain the trained pedestrian detection model, and pedestrian detection is then carried out with the trained pedestrian detection model.
As a further optimization scheme of the multi-scale-based pedestrian detection method, in the training of the pedestrian detection model in step 3, the pedestrian detection model is built with the PyTorch deep learning framework, the optimization function is set to the Adam algorithm, the base learning rate is set to 5e-3 (i.e., 0.005), the scales of the RPN network are quantized, and the aspect ratios of the RPN network are extended to [0.5, 0.65, 0.8, 0.95, 1.1, 1.25, 1.4, 1.55, 1.7, 1.85, 2], so that more anchor boxes are generated; the parameters of the pedestrian detection model are corrected through error back-propagation until the pedestrian detection model converges, and the parameters after convergence are saved.
As a further optimization scheme of the multi-scale-based pedestrian detection method, in step 1, the CityPersons pedestrian data set is a subset of the Cityscapes data set; the CityPersons pedestrian data set is labeled with the content of the Human category, which comprises pedestrians and riders.
As a further optimization scheme of the multi-scale-based pedestrian detection method, in step 1, a Caltech pedestrian data set is obtained by the following method: and (3) extracting the pedestrian video data set frame by frame and converting the format of the pedestrian video data set to obtain a single-frame image data set, wherein the single-frame image data set is a Caltech pedestrian data set.
In step 1, a CityPersons pedestrian data set and a Caltech pedestrian data set are stored in a VOC format.
As a further optimized solution of the multi-scale-based pedestrian detection method, the number d of channels of fm5 is 256.
As a further optimization scheme of the multi-scale-based pedestrian detection method, adding a feature map obtained by convolving fm4 by 1 × 1 to the up-sampled M5 means: the feature map obtained by 1 × 1 convolution of fm4 and the pixel value at the same position of M5 after upsampling are added, and the scale of the feature map obtained by 1 × 1 convolution of fm4 and the scale of M5 after upsampling are both 16 × 16.
As a further optimization scheme of the multi-scale-based pedestrian detection method, the addition of M3, M4 and M5 means that pixel values at the same positions of M3, M4 and M5 are added, and feature map sizes are all 16 × 16 in the same way.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
the method and the device improve the detection precision of the small target pedestrian, reduce missing detection, and improve the positioning precision and the robustness of the detection model.
Drawings
FIG. 1 is a schematic flow diagram of the framework of the present invention.
FIG. 2 is a structural diagram of the Faster R-CNN target detection method.
FIG. 3 is a schematic diagram of a multi-scale feature fusion module.
Fig. 4 is a diagram of the multi-scale receptive-field RPN.
FIG. 5a is a graph showing the effect of the Adapt Faster R-CNN method.
FIG. 5b shows the effect of the method of the present invention.
FIG. 5c shows the effect of the Adapt Faster R-CNN method.
FIG. 5d shows the effect of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
(1) Obtaining a pedestrian data set
The pedestrian data set employs the CityPersons pedestrian data set and the Caltech pedestrian data set. The CityPersons data set is a subset of the Cityscapes data set, and the annotation files of CityPersons label only the Human categories, i.e., person and rider. The training set comprises 19654 pedestrians in 2975 images, and the validation set 3938 pedestrians in 500 images. The Caltech pedestrian data set is originally a video data set; a single-frame image data set is obtained through frame-by-frame extraction and format conversion, and the training set comprises 122187 images. Both pedestrian data sets are stored in VOC format for convenient training.
(2) Constructing a multi-scale feature fusion model
FIG. 2 shows the network structure of Faster R-CNN, and FIG. 3 is a schematic diagram of the multi-scale feature fusion module. The invention improves on this basis.
The multi-scale feature fusion model comprises the parts of the backbone network in Faster R-CNN, namely the first, second, third, fourth and fifth partial convolutional networks, connected in sequence, wherein:
as shown in fig. 1, the first partial convolution network is used for extracting a feature map fm1 of the pedestrian data set and outputting the feature map fm1 to the second partial convolution network;
the second partial convolution network is used for extracting the feature map fm2 of the pedestrian data set again and outputting the feature map fm2 to the third partial convolution network;
the third partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on the feature map fm2 to obtain a feature map fm3, and inputting the feature map fm3 to the fourth partial convolutional network;
the fourth partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm3 to obtain a feature map fm4, and inputting the feature map fm4 into the fifth partial convolutional network;
a fifth partial convolutional network, configured to perform 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm4, to obtain a feature map fm5;
performing 1 × 1 convolution on the output fm5 obtained by the fifth part of convolution network, changing the number of channels of fm5, and recording the obtained output as M5;
performing 2 times of upsampling on the M5 to obtain upsampled M5; adding a feature map obtained by performing 1 × 1 convolution on fm4 and the up-sampled M5, and marking the obtained result as M4;
performing 2 times of upsampling on the M4 to obtain upsampled M4; adding a feature map obtained by convolving fm3 by 1 × 1 to the up-sampled M4 to obtain a result which is recorded as M3;
recording the result obtained by performing 4-time down-sampling on the obtained M3 as M3, and recording the result obtained by performing 2-time down-sampling on the M4 as M4;
finally, adding the M3, M4 and M5 to output a characteristic diagram of the result, and sending the characteristic diagram to the RPN network;
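The fusion steps above can be sketched in PyTorch. This is a minimal illustration under assumptions, not the patented implementation: the class name is hypothetical, the channel widths follow VGG16's conv3/conv4/conv5 stages, and max-pooling is assumed for the 2× and 4× down-sampling, since the text does not specify the down-sampling operator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuses backbone maps fm3 / fm4 / fm5 (strides 8 / 16 / 32) into one
    d-channel map at fm5's resolution, following the M5, M4, M3 steps above."""
    def __init__(self, c3, c4, c5, d=256):
        super().__init__()
        self.lat3 = nn.Conv2d(c3, d, 1)  # 1x1 convs unify the channel count to d
        self.lat4 = nn.Conv2d(c4, d, 1)
        self.lat5 = nn.Conv2d(c5, d, 1)

    def forward(self, fm3, fm4, fm5):
        m5 = self.lat5(fm5)
        m4 = self.lat4(fm4) + F.interpolate(m5, scale_factor=2, mode="nearest")
        m3 = self.lat3(fm3) + F.interpolate(m4, scale_factor=2, mode="nearest")
        # bring M3 and M4 back down to M5's resolution, then add element-wise
        m3 = F.max_pool2d(m3, kernel_size=4, stride=4)
        m4 = F.max_pool2d(m4, kernel_size=2, stride=2)
        return m3 + m4 + m5
```

Element-wise addition requires all three maps to share the same channel count and spatial size, which is exactly what the lateral 1×1 convolutions and the down-sampling provide.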
(3) Multi-scale receptive-field sliding windows generate candidate regions
In the original RPN, candidate regions are generated from the final-layer feature map output by the backbone network (VGG16, ResNet, Res2Net, etc.) using a sliding window of the fixed size 3 × 3. Considering how variable pedestrian sizes are in real scenes, owing to distance from the camera, occlusion and so on, a sliding window of a single size cannot capture all candidate regions, whereas receptive-field sliding windows of different sizes can capture pedestrian targets of different sizes. As shown in fig. 4, compared with the original detection method, the present invention generates candidate regions at the RPN stage using 5 sliding windows of different sizes, implemented by 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolutions respectively; the receptive field of a pixel on the feature map produced by each sliding-window size is as follows (taking VGG16 as the backbone):
[1] 1 × 1: receptive field 196 × 196
[2] 3 × 3: receptive field 228 × 228
[3] 5 × 5: receptive field 260 × 260
[4] 7 × 7: receptive field 292 × 292
[5] 9 × 9: receptive field 324 × 324
The receptive fields [1]–[5] are calculated in a top-down manner. For ordinary convolutions the receptive field can be derived recursively: assume the receptive field at the output is of size 1; then the receptive field at each layer is linearly related to that of the layer above it. The relation depends on each layer's stride and convolution kernel size, but not on the padding; the receptive field expresses only the mapping between two layers and does not depend on the size of the original image. The formula is as follows:
F(i,j-1)=(F(i,j)-1)*stride+kernelsize
wherein F (i, j) represents the local receptive field of the ith layer to the jth layer, stride refers to the step size, and kernelsize refers to the size of the convolution kernel.
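As a check on the numbers in [1]–[5], the same recursion can be run in plain Python; walking the layers bottom-up while tracking the cumulative stride ("jump") is equivalent to the top-down formula above. The VGG16 layer list (13 convolutions of 3 × 3 and four 2 × 2 pools up to conv5_3) follows the standard configuration:

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) from input to output.
    Tracks rf (receptive field) and jump (cumulative stride)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the RF by (k-1)*jump input pixels
        jump *= s
    return rf

# VGG16 through conv5_3: two blocks of 2 convs + pool, two blocks of 3 convs + pool,
# then 3 convs (no pool before the RPN).
VGG16 = ([(3, 1)] * 2 + [(2, 2)]) * 2 + ([(3, 1)] * 3 + [(2, 2)]) * 2 + [(3, 1)] * 3

# one k x k RPN sliding-window conv on top of conv5_3
windows = {k: receptive_field(VGG16 + [(k, 1)]) for k in (1, 3, 5, 7, 9)}
# reproduces the receptive fields 196, 228, 260, 292, 324 listed in [1]-[5]
```

Each extra unit of sliding-window size adds one cumulative stride (16 pixels at conv5_3), which is why the listed receptive fields step by 32.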
Finally, the results formed by the candidate regions of different receptive-field sizes are merged, which is equivalent to enlarging the original number of candidate regions. The receptive field is the mapping between a pixel of a convolutional layer's output feature map and the original image in a convolutional neural network; with this design, the accuracy of multi-scale pedestrian target detection is improved.
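A sketch of the multi-branch sliding-window head in PyTorch; the class name and channel widths are assumptions (the per-branch objectness and box-regression convolutions that follow in an RPN are omitted):

```python
import torch
import torch.nn as nn

class MultiScaleRPNHead(nn.Module):
    """Five parallel sliding-window convolutions (1x1 .. 9x9) over the fused map.
    Each branch sees a different receptive field; their proposals are merged."""
    def __init__(self, in_ch=256, mid_ch=512):
        super().__init__()
        # padding k//2 keeps every branch at the input's spatial size,
        # so the per-branch anchor grids stay aligned when proposals are merged
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, k, padding=k // 2) for k in (1, 3, 5, 7, 9)
        )

    def forward(self, x):
        return [torch.relu(b(x)) for b in self.branches]
```

Because all branches share one spatial grid, merging their candidate regions is a simple concatenation of proposal lists rather than any re-sampling step.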
(4) Loss function
The loss function used in the invention combines a classification loss and a regression loss, with the following formula:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
The two component losses are introduced below:
1) Classification loss
The anchors generated by the RPN network are divided only into foreground and background, with foreground labeled 1 and background labeled 0. During training, 256 anchors are sampled, corresponding to N_cls in the formula; p_i is the predicted probability that anchor i is a target, and p_i* is its ground-truth (GT) label.
L_cls(p_i, p_i*) = −[p_i* log p_i + (1 − p_i*) log(1 − p_i)]
This is the log-loss over two classes (target and non-target), i.e., the classic binary cross-entropy loss.
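As an illustration, the two-class log-loss averaged over the N_cls sampled anchors can be written directly in plain Python (the function name is illustrative, and the epsilon clamp is a standard numerical guard, not part of the formula):

```python
import math

def rpn_cls_loss(p, p_star):
    """Binary cross-entropy over the sampled anchors.
    p: predicted foreground probabilities; p_star: GT labels (1 fg, 0 bg)."""
    eps = 1e-12  # guard against log(0)
    losses = [-(t * math.log(max(pi, eps)) + (1 - t) * math.log(max(1 - pi, eps)))
              for pi, t in zip(p, p_star)]
    return sum(losses) / len(losses)  # averaged by N_cls
```

A confident, correct prediction (p near its label) contributes a loss near zero, while a confident wrong one is penalized heavily by the logarithm.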
In addition, Fast R-CNN ordinarily uses the multi-class cross-entropy loss (when the number of training classes is greater than 2), but since the invention addresses the target detection problem for the pedestrian class only, a two-class loss function is used.
2) Regression loss
Here t_i = (t_x, t_y, t_w, t_h) is the vector of regression offsets predicted for a generated anchor during the RPN and Fast R-CNN training phases, and t_i* is a vector of the same dimension giving the anchor's actual offsets from the GT box.
L_reg(t_i, t_i*) = Σ_{j ∈ {x, y, w, h}} smooth_L1(t_j − t_j*)
smooth_L1(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise
After L_reg is computed for each anchor, it is multiplied by p_i*: as described above, p_i* is set to 1 when an object is present (positive) and 0 when no object is present (negative), meaning that only the foreground contributes to the regression loss and the background does not.
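A plain-Python sketch of the regression term with the p* mask. The smooth-L1 form is the standard Faster R-CNN choice; the default normalizer here (number of positive anchors) is a simplifying assumption, since the paper's N_reg is the number of anchor locations:

```python
def smooth_l1(x):
    """Smooth-L1: quadratic near zero, linear in the tails."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_reg_loss(t, t_star, p_star, n_reg=None):
    """Sum of smooth-L1 over (tx, ty, tw, th) per anchor; p* zeroes out
    background anchors so only foreground contributes."""
    total = 0.0
    for ti, gi, pi in zip(t, t_star, p_star):
        total += pi * sum(smooth_l1(a - b) for a, b in zip(ti, gi))
    n = n_reg if n_reg else max(sum(p_star), 1)
    return total / n
```

A small offset (|x| < 1) is penalized quadratically, keeping gradients gentle near the optimum, while large offsets grow only linearly, which makes the loss robust to outlier boxes.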
(5) Evaluation index of results
For the pedestrian detection task, the evaluation index is the MR–FPPI curve (Miss Rate versus False Positives Per Image). Basic meaning of FPPI: given a sample set of N images, each with or without detection targets, FPPI is the number of false positives averaged over the N images. Miss Rate is the fraction of positives in the data set that are missed: the number of positives judged as negatives divided by (the number of detected positives + the number of undetected positives, i.e., the number of all GT boxes). It relates to the recall rate as Miss Rate = 1 − Recall.
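The two axes of the MR–FPPI curve reduce to simple counts; a minimal sketch (the function names are illustrative):

```python
def miss_rate(tp, fn):
    """Fraction of ground-truth pedestrians missed: fn / (tp + fn) = 1 - recall."""
    return fn / (tp + fn)

def fppi(fp, num_images):
    """False positives averaged over the N images of the test set."""
    return fp / num_images
```

For example, detecting 8 of 10 GT pedestrians with 5 false alarms over 100 images gives a miss rate of 0.2 at an FPPI of 0.05; sweeping the detection-score threshold traces out the full curve.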
(1) Preparation of data sets required for the experiments
Cityscapes already provides instance-level annotations, but these annotations mark only the pixels of the visible region. Training directly with them has the following problems: 1) the proportions of the candidate regions are irregular, which affects the normalization of the regressed candidate boxes; 2) the boxes are not aligned to each pedestrian and may not be centered on the pedestrian horizontally or vertically; 3) existing data sets mark entire pedestrians, not just the visible region. It was therefore necessary to re-annotate these pedestrians, which is the origin of the CityPersons pedestrian data set. The training set comprises 19654 pedestrians in 2975 images, and the validation set 3938 pedestrians in 500 images.
(2) Training model
The pedestrian detection model is built with the PyTorch deep learning framework, and the relevant training parameters are set through configuration files: the optimization function is set to the Adam algorithm, the base learning rate is set to 5e-3, the scales of the RPN are quantized, and the default aspect ratios [0.5, 1, 2] are extended to [0.5, 0.65, 0.8, 0.95, 1.1, 1.25, 1.4, 1.55, 1.7, 1.85, 2], so that more anchor boxes are generated. The training set prepared in step (1) is fed into the built model; shared features are extracted through the backbone network combined with the multi-feature fusion network, candidate regions are then generated by the RPN module, and the result is finally sent to the prediction module. The model parameters are corrected by error back-propagation until the model converges, and the converged parameters are saved.
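The widened aspect-ratio set can be turned into concrete anchor shapes at each feature-map location. The base size and scale set below are illustrative assumptions (the text says the RPN scales are quantized but does not list them); only the ratio list comes from the configuration above:

```python
def make_anchors(base_size, scales, ratios):
    """Enumerate (w, h) anchor shapes for one feature-map location.
    ratio = h / w; each anchor preserves the area (base_size * scale)^2."""
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = (area / r) ** 0.5
            h = w * r
            anchors.append((round(w, 1), round(h, 1)))
    return anchors

ratios = [0.5, 0.65, 0.8, 0.95, 1.1, 1.25, 1.4, 1.55, 1.7, 1.85, 2]
anchors = make_anchors(16, [8, 16, 32], ratios)  # 3 scales x 11 ratios = 33 shapes
```

The finer ratio grid (steps of 0.15 instead of the default {0.5, 1, 2}) is what produces the "more anchor boxes" the text describes, at the cost of more proposals to score per location.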
(3) Results of the experiment
In order to verify the performance of the pedestrian target detection model, the invention is compared with the article "CityPersons: A Diverse Dataset for Pedestrian Detection", published at IEEE CVPR 2017 by Shanshan Zhang, Rodrigo Benenson and Bernt Schiele, whose model is abbreviated Adapt Faster R-CNN. The difference between the invention and Adapt Faster R-CNN is that the invention introduces multi-scale feature fusion and a multi-scale receptive-field RPN module. The specific experimental results are shown in Table 1.
[Table 1 is rendered as an image in the original document.]
TABLE 1
Note: MR_o represents the miss rate; the lower the value, the better the result. ΔMR represents the change in miss rate.
The results in Table 1 show that, compared with the Adapt Faster R-CNN model, the performance indices of the invention are modestly improved, indicating that the invention both improves the model's detection of small targets and reduces its miss rate. However, the visualized detection boxes in figs. 5a, 5b, 5c and 5d show that some pedestrian targets are still missed, indicating that the invention still has room for improvement in detecting small-target pedestrians.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (8)

1. A pedestrian detection method based on multi-scale is characterized by comprising the following steps:
step 1, acquiring a pedestrian data set, wherein the pedestrian data set comprises a CityPersons pedestrian data set and a Caltech pedestrian data set;
step 2, building a pedestrian detection model, wherein the pedestrian detection model comprises a multi-scale feature fusion model and an RPN (Region Proposal Network), specifically as follows:
(1) constructing a multi-scale feature fusion model, wherein the construction process specifically comprises the following steps:
inputting the pedestrian data set into a first partial convolutional network, wherein the first, second, third, fourth and fifth partial convolutional networks are connected in sequence,
the first partial convolution network is used for extracting a feature map fm1 of the pedestrian data set and outputting the feature map fm1 to the second partial convolution network;
the second partial convolution network is used for extracting the feature map fm2 of the pedestrian data set again and outputting the feature map fm2 to the third partial convolution network;
the third partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on the feature map fm2 to obtain a feature map fm3, and inputting the feature map fm3 to the fourth partial convolutional network;
the fourth partial convolutional network is used for performing 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm3 to obtain a feature map fm4, and inputting the feature map fm4 into the fifth partial convolutional network;
a fifth partial convolutional network, configured to perform 3 convolutional layer operations of 3 × 3 and 1 maximum pooling layer processing of 2 × 2 on fm4, to obtain a feature map fm5;
performing 1 × 1 convolution on the output fm5 obtained by the fifth part of convolution network, changing the number of channels of fm5, and recording the obtained output as M5;
performing 2 times of upsampling on the M5 to obtain upsampled M5; adding a feature map obtained by convolving fm4 by 1 × 1 to the up-sampled M5 to obtain a result which is recorded as M4;
m4 is subjected to 2 times of upsampling to obtain upsampled M4; adding a feature map obtained by convolving fm3 by 1 × 1 to the up-sampled M4 to obtain a result which is recorded as M3;
recording the result obtained by performing 4-time down-sampling on the obtained M3 as M3, and recording the result obtained by performing 2-time down-sampling on the M4 as M4;
finally, adding the M3, M4 and M5 to output a characteristic diagram of the result, and sending the characteristic diagram to the RPN network;
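The top-down fusion and re-alignment described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's implementation: the lateral 1 × 1 convolutions are created ad hoc with random weights, the input channel counts other than d = 256 are assumed, and max-pooling is an assumed choice for the down-sampling step (the patent does not name the operator).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_features(fm3, fm4, fm5, d=256):
    """Top-down fusion of fm3/fm4/fm5, then re-alignment to fm5's scale.

    Shapes are illustrative, e.g. fm3: 64x64, fm4: 32x32, fm5: 16x16.
    In a real model the 1x1 convolutions would be module attributes
    with learned weights; here they are built on the fly for the sketch.
    """
    lat5 = nn.Conv2d(fm5.shape[1], d, kernel_size=1)  # channel change for M5
    lat4 = nn.Conv2d(fm4.shape[1], d, kernel_size=1)
    lat3 = nn.Conv2d(fm3.shape[1], d, kernel_size=1)

    m5 = lat5(fm5)                                      # M5 = 1x1(fm5)
    m4 = F.interpolate(m5, scale_factor=2) + lat4(fm4)  # M4 = up2(M5) + 1x1(fm4)
    m3 = F.interpolate(m4, scale_factor=2) + lat3(fm3)  # M3 = up2(M4) + 1x1(fm3)

    # re-align M3 (4x down) and M4 (2x down) to M5's scale, then add
    m3_down = F.max_pool2d(m3, kernel_size=4, stride=4)
    m4_down = F.max_pool2d(m4, kernel_size=2, stride=2)
    return m3_down + m4_down + m5
```

With the illustrative shapes above, the fused output shares fm5's 16 × 16 spatial size and has d = 256 channels, matching the claim that the three maps are added at the same scale.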
(2) the RPN network is used for generating candidate regions through multi-scale receptive-field sliding windows; the specific steps are as follows:
the RPN network generates candidate regions using 5 sliding windows of different sizes, realized respectively by convolutions of 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9; finally, the candidate-region results obtained at the different receptive-field sizes are merged;
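A sketch of the five parallel sliding windows, assuming each is a padded convolution so the branch outputs stay spatially aligned. The patent only says the per-receptive-field results are "merged"; channel-wise concatenation is one assumed reading (element-wise addition would also fit the claim), and the channel counts are placeholders.

```python
import torch
import torch.nn as nn

class MultiScaleRPNHead(nn.Module):
    """Five parallel sliding windows (1x1 ... 9x9) over the fused feature map."""

    def __init__(self, in_ch=256, mid_ch=256):
        super().__init__()
        # padding = k // 2 keeps every branch at the input's spatial size
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7, 9)
        )

    def forward(self, x):
        # merge the five receptive-field responses along the channel axis
        # (an assumption; the patent does not specify the merge operator)
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```

Because all five branches preserve the spatial size, the merged tensor keeps the input's height and width while its channels grow fivefold.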
step 3, on the basis of the constructed pedestrian detection model, inputting the pedestrian data set into the pedestrian detection model; the specific scheme is as follows: pre-training the constructed pedestrian detection model with the CityPersons pedestrian data set to obtain a pre-trained pedestrian detection model; on the basis of this model, fine-tuning on the Caltech pedestrian data set to obtain the trained pedestrian detection model; and performing pedestrian detection with the trained pedestrian detection model.
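The two-stage scheme (pre-train on CityPersons, then fine-tune on Caltech) can be sketched with one generic training loop reused for both stages. The loop, loss, loader names and checkpoint path are placeholders, not taken from the patent; only the Adam optimizer and the 5e-3 base learning rate come from claim 2.

```python
import torch

def train(model, loader, epochs, lr):
    """One generic supervised training loop, reused for both stages."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam per claim 2
    criterion = torch.nn.CrossEntropyLoss()                  # placeholder loss
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: pre-train on CityPersons (loader is a placeholder):
#   model = train(model, citypersons_loader, epochs=..., lr=5e-3)
#   torch.save(model.state_dict(), "citypersons_pretrained.pth")
# Stage 2: load the pre-trained weights and fine-tune on Caltech:
#   model.load_state_dict(torch.load("citypersons_pretrained.pth"))
#   model = train(model, caltech_loader, epochs=..., lr=...)
```

The fine-tuning epochs and learning rate are left unspecified here because the patent does not state them.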
2. The multi-scale-based pedestrian detection method according to claim 1, wherein in the training of the pedestrian detection model in step 3, the pedestrian detection model is built using the PyTorch deep learning framework, the optimization function is set to the Adam algorithm, and the base learning rate is set to 5e-3 (scientific notation, i.e. 0.005); the anchor scales of the RPN network are refined, and the aspect ratios of the RPN network are set to [0.5, 0.65, 0.8, 0.95, 1.1, 1.25, 1.4, 1.55, 1.7, 1.85, 2], so that more anchor points are generated; the parameters of the pedestrian detection model are corrected by error back-propagation until the model converges, and the converged parameters are stored.
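The denser aspect-ratio list in claim 2 can be turned into anchor widths and heights as sketched below. The area-preserving parameterization (ratio read as height / width, so ratios above 1 give tall, pedestrian-shaped boxes) is the common Faster R-CNN convention and is an assumption here; the patent does not spell out the parameterization.

```python
import math

# aspect ratios taken verbatim from claim 2
ASPECT_RATIOS = [0.5, 0.65, 0.8, 0.95, 1.1, 1.25, 1.4, 1.55, 1.7, 1.85, 2]

def anchor_sizes(scale, ratios=ASPECT_RATIOS):
    """Return one (w, h) pair per aspect ratio, keeping area = scale**2.

    ratio is interpreted as h / w (assumed convention), so each anchor
    has w = scale / sqrt(ratio) and h = scale * sqrt(ratio).
    """
    sizes = []
    for r in ratios:
        w = scale / math.sqrt(r)
        h = scale * math.sqrt(r)
        sizes.append((w, h))
    return sizes
```

For a single scale of 16 this yields 11 anchor shapes per location instead of the usual 3, which is how the finer ratio grid "generates more anchor points".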
3. The method as claimed in claim 1, wherein in step 1, the CityPersons data set is a subset of the Cityscapes data set, and the CityPersons data set is annotated with human-like categories, which include pedestrians and riders.
4. The multi-scale-based pedestrian detection method according to claim 1, wherein in step 1, the Caltech pedestrian data set is obtained by: extracting the pedestrian video data set frame by frame and converting its format to obtain a single-frame image data set, which constitutes the Caltech pedestrian data set.
5. The multi-scale-based pedestrian detection method according to claim 1, wherein in step 1, the CityPersons pedestrian data set and the Caltech pedestrian data set are stored in a VOC format.
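Storing annotations "in VOC format" (claim 5) means one PASCAL VOC XML file per image. A minimal standard-library sketch of building such a file is shown below; the exact field subset written here follows the usual VOC layout and is an assumption, since the patent only names the format.

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """Build a PASCAL VOC XML annotation string.

    boxes: list of (name, xmin, ymin, xmax, ymax) tuples,
    e.g. [("pedestrian", 10, 20, 110, 220)].
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # RGB assumed
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bndbox = ET.SubElement(obj, "bndbox")
        ET.SubElement(bndbox, "xmin").text = str(xmin)
        ET.SubElement(bndbox, "ymin").text = str(ymin)
        ET.SubElement(bndbox, "xmax").text = str(xmax)
        ET.SubElement(bndbox, "ymax").text = str(ymax)
    return ET.tostring(root, encoding="unicode")
```

One such XML file would be written alongside each CityPersons or Caltech frame so that a standard VOC-style data loader can consume both data sets uniformly.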
6. The multi-scale-based pedestrian detection method of claim 1, wherein the number d of channels of fm5 is 256.
7. The method for detecting pedestrians according to claim 1, wherein adding the feature map obtained by 1 × 1 convolution of fm4 to the up-sampled M5 means: the pixel values at the same positions of the two maps are added element-wise; the feature map obtained by 1 × 1 convolution of fm4 and the up-sampled M5 have the same scale of 16 × 16.
8. The method for detecting pedestrians according to claim 1, wherein adding M3, M4 and M5 means that the pixel values at the same positions of M3, M4 and M5 are added element-wise; the three feature maps are all of size 16 × 16.
CN202110419108.4A 2021-04-19 2021-04-19 Multi-scale-based pedestrian detection method Active CN113269038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110419108.4A CN113269038B (en) 2021-04-19 2021-04-19 Multi-scale-based pedestrian detection method


Publications (2)

Publication Number Publication Date
CN113269038A CN113269038A (en) 2021-08-17
CN113269038B true CN113269038B (en) 2022-07-15

Family

ID=77229006


Country Status (1)

Country Link
CN (1) CN113269038B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947144B (en) 2021-10-15 2022-05-17 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for object detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570564A (en) * 2016-11-03 2017-04-19 天津大学 Multi-scale pedestrian detection method based on depth network
CN109284670A (en) * 2018-08-01 2019-01-29 清华大学 A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism
CN110490174A (en) * 2019-08-27 2019-11-22 电子科技大学 Multiple dimensioned pedestrian detection method based on Fusion Features
CN111695430A (en) * 2020-05-18 2020-09-22 电子科技大学 Multi-scale face detection method based on feature fusion and visual receptive field network




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant