CN114821665A - Urban pedestrian flow small target detection method based on convolutional neural network - Google Patents

Urban pedestrian flow small target detection method based on convolutional neural network

Info

Publication number
CN114821665A
CN114821665A (application CN202210574388.0A)
Authority
CN
China
Prior art keywords: feature, feature map, network, convolution, fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210574388.0A
Other languages
Chinese (zh)
Inventor
产思贤
俞敏明
赖周年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210574388.0A priority Critical patent/CN114821665A/en
Publication of CN114821665A publication Critical patent/CN114821665A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional-neural-network-based method for detecting small urban pedestrian flow targets. An image training data set annotated with small portrait target detection boxes is first subjected to Mosaic and MixUp data enhancement and resized to the network input size. Each picture is fed into a backbone network to obtain feature maps at four scales, which are then processed by the feature fusion network BIAFPN. The fused feature maps are sent to their corresponding prediction heads, where the classification and regression branches are convolved separately and then concatenated along the channel dimension; the concatenated maps are flattened to one dimension and joined to form the final feature map. The loss is then computed, back propagation is performed and the network parameters are updated to complete network training. By introducing shallow fine-grained features and detecting targets with a feature fusion network, the invention effectively improves the accuracy of small urban portrait target detection.

Description

Urban pedestrian flow small target detection method based on convolutional neural network
Technical Field
The application belongs to the technical field of deep learning image processing, and particularly relates to a method for detecting urban pedestrian flow small targets based on a convolutional neural network.
Background
Target detection is a fundamental problem in machine vision. It supports visual tasks such as instance segmentation, target tracking and action recognition, and is widely applied in autonomous driving, satellite imagery, surveillance and similar fields. Most existing target detection algorithms use anchor boxes, which often causes an imbalance between positive and negative samples; small target detection is even more difficult, and improving small target detection accuracy remains an open problem.
Current mainstream target detection schemes comprise one-stage and two-stage algorithms. Two-stage algorithms such as the Faster R-CNN series first screen a large number of candidate regions that may contain targets and then perform detection on those regions. One-stage algorithms such as the YOLO series complete the prediction end to end; the model detects faster, but detection accuracy drops to a certain extent.
Disclosure of Invention
The application aims to provide a convolutional-neural-network-based method for detecting small urban pedestrian flow targets. A shallow information layer is added to the multi-scale features of the original YOLOX scheme, and an improved feature fusion structure, BIAFPN, is adopted, so as to solve the problem of low detection accuracy for small urban portrait targets.
In order to achieve the purpose, the technical scheme of the application is as follows:
a method for detecting urban pedestrian flow small targets based on a convolutional neural network comprises the following steps:
acquiring an image training data set with a portrait small target detection frame, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set;
adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring feature maps F1, F2, F3 and F4 of four sizes output by the dark2, dark3, dark4 and dark5 units of the backbone network CSPDarknet-53;
inputting the feature maps F1, F2, F3 and F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain fused feature maps F12, F22, F32 and F42;
sending the fused feature maps F12, F22, F32 and F42 into their corresponding prediction heads, performing the convolutions of the classification branch and the regression branch separately, concatenating the results along the channel dimension, flattening each concatenated feature map to one dimension to obtain flattened feature maps F13, F23, F33 and F43, then joining F13, F23, F33 and F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network;
and inputting the image to be detected into the trained network to obtain a detection result.
Further, the Mosaic data enhancement includes:
taking out 4 images, and splicing the images in a random scaling, random cutting and random arrangement mode;
the MixUp data enhancement comprises: superimposing 2 images together.
Further, inputting the feature maps F1, F2, F3 and F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32 and F42 comprises:
inputting the feature map F1 directly into the feature fusion network BIAFPN; first, along the top-down path, F1 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F2 to obtain a feature map F21; F21 is then passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F3 to obtain a feature map F31; F31 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F4 to obtain a feature map F41; the feature map F41 is output directly as the feature map F42; the bottom-up and cross-scale fusion is then performed: F42 is passed through a 1×1 convolution and downsampling and fused with the earlier F3 and F31 to obtain a feature map F32; F32 is passed through a 1×1 convolution and downsampling and fused with the earlier F2 and F21 to obtain a feature map F22; F22 is passed through a 1×1 convolution and downsampling and fused with the earlier F1 and F11 to obtain a feature map F12; after each feature fusion, a CBAM attention mechanism is applied to enhance the spatial and channel information.
Further, calculating the loss comprises: a classification loss, a bounding-box loss and a target score loss; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss.
In the convolutional-neural-network-based urban pedestrian flow small target detection method of this application, a shallow feature layer with finer-grained characteristics is introduced into the existing YOLOX scheme, and the original PANet is replaced by the improved BIAFPN for target detection, so that the accuracy of urban portrait small target detection can be effectively improved.
Drawings
Fig. 1 is a flow chart of the urban pedestrian flow small target detection method based on the neural network.
Fig. 2 is a diagram of a neural network-based urban pedestrian flow small target detection network model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The neural-network-based urban pedestrian flow small target detection method mainly comprises the following steps: the images are first subjected to data enhancement, and training then proceeds in batches. In each batch, the images are passed through the convolutional neural network to obtain feature maps F1, F2, F3 and F4; the feature maps are fused through BIAFPN to obtain F12, F22, F32 and F42, which are fed into the prediction heads for classification and regression to obtain predicted values. The predicted values are compared with the ground-truth values of the images to calculate the loss; at the end of each batch, back propagation is performed to reduce the loss and the network parameters are updated, completing the training of the network.
In one embodiment, as shown in fig. 1, a method for detecting small urban pedestrian flow targets based on a neural network is provided, including:
and step S1, acquiring an image training data set with a portrait small target detection box, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set.
In this embodiment, the training data set is subjected to Mosaic data enhancement and MixUp data enhancement. Mosaic enhancement takes out 4 images and splices them together with random scaling, random cropping and random arrangement. MixUp enhancement superimposes 2 images together, which can reduce memorization of wrong labels and thereby enhance robustness.
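As an illustrative sketch only, the two enhancements could look roughly like this in Python (the helper names mosaic4/mixup2, the scale and split ranges, and the use of OpenCV for resizing are assumptions; the label boxes would need the same geometric adjustment):

```python
import random
import numpy as np
import cv2  # OpenCV, used here for resizing

def mosaic4(images, out_size=640):
    """Splice 4 images into one out_size x out_size canvas with random
    scaling, cropping and placement. Box adjustment is omitted for brevity."""
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cx = random.randint(out_size // 4, 3 * out_size // 4)   # random split point x
    cy = random.randint(out_size // 4, 3 * out_size // 4)   # random split point y
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        scale = random.uniform(0.5, 1.5)                     # random scaling
        img = cv2.resize(img, None, fx=scale, fy=scale)
        h, w = y2 - y1, x2 - x1
        ih, iw = img.shape[:2]
        top = random.randint(0, max(ih - h, 0))              # random crop offset
        left = random.randint(0, max(iw - w, 0))
        crop = img[top:top + h, left:left + w]
        canvas[y1:y1 + crop.shape[0], x1:x1 + crop.shape[1]] = crop
    return canvas

def mixup2(img_a, img_b, alpha=1.0):
    """Superimpose 2 same-size images with a Beta-sampled mixing ratio."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed.astype(np.uint8)
```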
Step S2, adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring the feature maps F1, F2, F3 and F4 of four sizes output by the dark2, dark3, dark4 and dark5 units of the backbone network CSPDarknet-53.
As shown in fig. 2, the present application uses CSPDarknet-53 as the backbone network for feature extraction. The CSPDarknet-53 is initialized with pre-training weights trained on COCO; batch training is adopted with a batch size of 16 (i.e., each batch processes 16 pictures); the learning rate starts from 0.0025, no learning-rate warm-up is used, and the learning rate is updated with a cosine annealing schedule.
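A minimal PyTorch sketch of this schedule, assuming an SGD optimizer and a total iteration count that are not specified in the text:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_training_schedule(model, epochs=300, iters_per_epoch=1000):
    # Batch size 16 is handled by the data loader; the initial learning rate is 0.0025.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0025,
                                momentum=0.9, weight_decay=5e-4)
    # Cosine annealing over all iterations; no learning-rate warm-up phase.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs * iters_per_epoch)
    return optimizer, scheduler
```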
Because the original picture is large, it is scaled proportionally along the long side to 640×640, and the part where the short side is less than 640 is padded with 0. The scaled picture is input into the backbone network CSPDarknet-53, and after a series of operations such as convolution, feature maps F1, F2, F3 and F4 with the four sizes 20×20, 40×40, 80×80 and 160×160 are output in sequence.
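A hedged sketch of this long-side scaling with zero padding (a generic letterbox resize; the helper name and OpenCV usage are assumptions, not the exact preprocessing of this application):

```python
import numpy as np
import cv2

def letterbox(image, size=640):
    """Scale the long side to `size`, keep the aspect ratio, pad the short side with 0."""
    h, w = image.shape[:2]
    r = size / max(h, w)                                    # scale factor from the long side
    resized = cv2.resize(image, (int(w * r), int(h * r)))
    canvas = np.zeros((size, size, 3), dtype=image.dtype)
    canvas[:resized.shape[0], :resized.shape[1]] = resized  # remaining area stays zero
    return canvas, r
```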
The feature map sizes are determined by the backbone network CSPDarknet-53 and are not detailed here. It should be noted that in YOLOX only the features output by dark3, dark4 and dark5 are normally used for multi-scale fusion. In this embodiment, the features output by dark2, dark3, dark4 and dark5 are used for the multi-scale fusion operation and feature maps of four sizes are output, so that shallow fine-grained information can be fused, which benefits small target detection; correspondingly, one additional prediction head is added, achieving a better detection effect.
Step S3, inputting the feature maps F1, F2, F3 and F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32 and F42.
The feature map F1 (the feature map output by dark5 in fig. 2) is input directly into the feature fusion network BIAFPN. First, along the top-down path, F1 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F2 (the feature map output by dark4 in fig. 2, and so on) to obtain the feature map F21. F21 is then passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F3 to obtain the feature map F31. F31 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F4 to obtain the feature map F41, and F41 is output directly as the feature map F42. The bottom-up and cross-scale fusion is then performed: F42 is passed through a 1×1 convolution and downsampling and fused with the earlier F3 and F31 to obtain the feature map F32; F32 is passed through a 1×1 convolution and downsampling and fused with the earlier F2 and F21 to obtain the feature map F22; F22 is passed through a 1×1 convolution and downsampling and fused with the earlier F1 and F11 to obtain the feature map F12. After each feature fusion, a CBAM attention mechanism is applied to enhance the spatial and channel information.
Specifically, the 20×20 feature map F1 is input directly into the top-down feature pyramid network BIAFPN, convolved by 1×1 to the same channel number, upsampled to 40×40 and adaptively fused (SUM) with the feature map F2, after which the spatial and channel information is enhanced by the CBAM attention mechanism to obtain F21 (40×40). F21 is convolved by 1×1, upsampled to 80×80 and adaptively fused (SUM) with the feature map F3, then passed through the CBAM attention mechanism to obtain F31 (80×80). F31 is convolved by 1×1, upsampled to 160×160 and adaptively fused (SUM) with the feature map F4, then passed through the CBAM attention mechanism to obtain F41 (160×160); F41 is output directly as the feature map F42 (160×160). Next, the bottom-up and cross-scale fusion is performed: F41 is converted to the matching channel number by a 1×1 convolution, downsampled to 80×80 and adaptively fused (SUM) with the feature maps F3 and F31 to obtain the feature map F32 (80×80). F32 is converted to the matching channel number by a 1×1 convolution, downsampled to 40×40 and adaptively fused (SUM) with the feature maps F2 and F21 to obtain the feature map F22 (40×40). F22 is converted to the matching channel number by a 1×1 convolution, downsampled to 20×20 and adaptively fused (SUM) with the feature maps F1 and F11 to obtain the feature map F12 (20×20). At this point, output feature maps of all four sizes are obtained: F12 (20×20), F22 (40×40), F32 (80×80) and F42 (160×160).
It should be noted that in fig. 2 a DWSConv block, i.e. a depthwise (DW) convolution followed by a pointwise (PW) convolution, is also placed between the adaptive feature fusion SUM and the CBAM attention mechanism; it is not described again here. In this embodiment BIAFPN replaces the original PANet: BIAFPN feature fusion builds on the original bidirectional fusion by adding cross-scale fusion and a CBAM attention mechanism (enhancing the features along the spatial and channel dimensions) to fuse features more effectively.
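The fusion order above can be sketched in simplified PyTorch form as follows. The common channel width, the softmax-weighted sum used for "adaptive feature fusion", the minimal CBAM, and pooling-based downsampling are assumptions for illustration; the DWSConv blocks of fig. 2 are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Minimal CBAM: channel attention followed by spatial attention."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // r, 1), nn.ReLU(),
                                 nn.Conv2d(ch // r, ch, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca                                            # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)))
        return x * sa                                         # spatial attention

class AdaptiveSum(nn.Module):
    """Adaptive feature fusion SUM: softmax-normalised learnable weights."""
    def __init__(self, n):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n))

    def forward(self, feats):
        w = torch.softmax(self.w, dim=0)
        return sum(wi * fi for wi, fi in zip(w, feats))

class BiAFPNLite(nn.Module):
    """Top-down then bottom-up fusion over F1..F4 (deepest 20x20 to shallowest 160x160)."""
    def __init__(self, in_chs, ch=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, ch, 1) for c in in_chs)     # 1x1 projections
        self.fuse_td = nn.ModuleList(AdaptiveSum(2) for _ in range(3))
        self.fuse_bu = nn.ModuleList([AdaptiveSum(3), AdaptiveSum(3), AdaptiveSum(2)])
        self.cbam = nn.ModuleList(CBAM(ch) for _ in range(6))              # after each fusion

    def forward(self, f1, f2, f3, f4):
        f1, f2, f3, f4 = (p(f) for p, f in zip(self.proj, (f1, f2, f3, f4)))
        up = lambda x, ref: F.interpolate(x, size=ref.shape[-2:])
        down = lambda x, ref: F.adaptive_max_pool2d(x, ref.shape[-2:])
        # top-down path: F1 -> F21 -> F31 -> F41
        f21 = self.cbam[0](self.fuse_td[0]([up(f1, f2), f2]))
        f31 = self.cbam[1](self.fuse_td[1]([up(f21, f3), f3]))
        f41 = self.cbam[2](self.fuse_td[2]([up(f31, f4), f4]))
        f42 = f41                                   # shallowest level passed through as F42
        # bottom-up path with cross-scale connections to the earlier maps
        f32 = self.cbam[3](self.fuse_bu[0]([down(f42, f3), f3, f31]))
        f22 = self.cbam[4](self.fuse_bu[1]([down(f32, f2), f2, f21]))
        f12 = self.cbam[5](self.fuse_bu[2]([down(f22, f1), f1]))  # text also lists "F11" here
        return f12, f22, f32, f42
```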
Step S4, sending the fused feature maps F12, F22, F32 and F42 into their corresponding prediction heads, performing the convolutions of the classification branch and the regression branch separately, concatenating the results along the channel dimension, flattening each concatenated feature map to one dimension to obtain the flattened feature maps F13, F23, F33 and F43, then joining F13, F23, F33 and F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network.
In this embodiment, in each prediction head the outputs of the classification-branch and regression-branch convolutions are concatenated along the channel dimension, generating four new feature maps of size {W×H×[(cls+reg+obj)]×N}, where W×H is the feature map size, cls is the detection class, reg is the predicted bounding box, obj is the target score prediction, and N is the number of prediction anchor boxes. W and H are multiplied, i.e. the spatial dimensions are flattened to one dimension, giving the feature maps F13, F23, F33 and F43. F13, F23, F33 and F43 are then concatenated along the W×H dimension to obtain the final feature map F.
Finally, the classification loss, bounding-box loss and target score loss are calculated, back propagation is performed to reduce the loss, and the network parameters are updated.
Specifically, after the convolutions of the classification branch and the regression branch, each of F12, F22, F32 and F42 generates 3 new feature maps F_cls ∈ {N×W×H×cls}, F_obj ∈ {N×W×H×1} and F_reg ∈ {N×W×H×4}, which are concatenated along the channel dimension to generate four new tensors of size {N×W×H×[(cls+reg+obj)]}, with W, H ∈ {20, 40, 80, 160}. W and H are then multiplied, flattening the spatial dimensions to one dimension and yielding four tensors of size {N×(cls+reg+obj)×(W×H)}. F13, F23, F33 and F43 are then concatenated along the W×H dimension to obtain the final feature map F ∈ {N×(cls+reg+obj)×34000}.
Here cls is the set of categories in the dataset, the reg prediction bounding box consists of the predicted top-left corner point (x1, y1) and bottom-right corner point (x2, y2), and N is the preset number of anchor boxes, which is 1 in this embodiment.
In this embodiment the prediction head adopts a decoupled-head design, with the classification branch and the regression branch convolved separately, which achieves a better detection effect. The number of predictions per position is reduced from 3 to 1 through the concatenation operation, and the anchor-free design avoids the problem of positive/negative sample imbalance.
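A minimal sketch of this tensor bookkeeping (the single-convolution branches and helper names are simplifying assumptions; in practice each branch contains several convolution layers):

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Per-level decoupled head: separate classification and regression branches."""
    def __init__(self, ch, num_classes):
        super().__init__()
        self.cls_branch = nn.Conv2d(ch, num_classes, 1)  # cls
        self.reg_branch = nn.Conv2d(ch, 4, 1)            # reg: x1, y1, x2, y2
        self.obj_branch = nn.Conv2d(ch, 1, 1)            # obj: target score

    def forward(self, x):
        out = torch.cat([self.cls_branch(x), self.reg_branch(x),
                         self.obj_branch(x)], dim=1)      # concat along the channel dim
        # N x (cls+reg+obj) x H x W -> flatten the spatial dimensions to one dimension
        return out.flatten(start_dim=2)

def gather_predictions(heads, feats):
    """Join the flattened maps of all four levels along the spatial axis.
    For 20x20, 40x40, 80x80 and 160x160 inputs: 400 + 1600 + 6400 + 25600 = 34000 positions."""
    return torch.cat([h(f) for h, f in zip(heads, feats)], dim=2)
```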
Because the output feature values cannot be used directly for the loss calculation, regression must first be performed to obtain actual predicted values. The classification loss, bounding-box loss and target score loss are then computed on the feature map F according to the following formulas; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss:

BCELoss = -(y·log(p(x)) + (1 - y)·log(1 - p(x)))

IOULoss = 1 - IOU
It should be noted that the grid in this application is defined on the finally obtained feature maps; it is an abstract concept intended to facilitate the box regression calculation. For the 20×20, 40×40, 80×80 and 160×160 feature maps there are 20×20, 40×40, 80×80 and 160×160 grid cells, respectively. Dividing a feature map into multiple grid cells is a relatively mature technique in the art and is not described here again. Likewise, the CBAM attention mechanism is a relatively mature technique in the field and is not described here again.
1. The classification loss and the target score loss are calculated using the binary cross-entropy loss function (BCELoss):

BCELoss = -(y·log(p(x)) + (1 - y)·log(1 - p(x)))

where y indicates whether the sample is a target, taking the value 1 or 0, and p(x) is the predicted target score.
2. The bounding-box loss is calculated by taking the predicted box information and the real box information obtained from the labels and computing the IOU (Intersection over Union), i.e. the overlap ratio between the predicted box and the real box; predicted boxes with high IOU values are retained by NMS post-processing:

IOULoss = 1 - IOU

wherein

IOU = |A ∩ B| / |A ∪ B|

A is the real box (ground truth), B is the prediction box, |A ∩ B| is the area where the real box and the prediction box intersect, and |A ∪ B| is the area of the union of the real box and the predicted box. The lower the IOULoss value, the more accurate the prediction.
It should be noted that the calculation of the classification loss, the target score loss and the bounding-box loss is a relatively mature technique in the art and is not described here again.
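A hedged sketch of the two loss terms (the mean reduction and the 1 - IOU form of IOULoss are assumptions):

```python
import torch
import torch.nn.functional as F

def bce_loss(p, y):
    """BCELoss = -(y·log(p) + (1-y)·log(1-p)); p are predicted probabilities in (0, 1)."""
    return F.binary_cross_entropy(p, y)

def iou_loss(pred_boxes, gt_boxes, eps=1e-7):
    """pred_boxes, gt_boxes: (M, 4) tensors of (x1, y1, x2, y2). Returns mean 1 - IOU."""
    x1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    y1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    x2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    y2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)               # |A ∩ B|
    area_p = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_p + area_g - inter + eps                                  # |A ∪ B|
    return (1.0 - inter / union).mean()
```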
The loss between the predicted values and the ground-truth values is thus obtained; before the end of each batch, back propagation is performed to reduce the loss, the network parameters are updated, and training of the next batch begins, until all batches of training data have been processed. The trained weights are finally obtained, and all updated parameters are stored in an output weight file.
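One batch of this loop can be sketched as follows; `loss_fn` stands in for a combined loss routine (cls/obj BCE plus box IOU loss) that is assumed here, not defined in the text:

```python
def train_one_batch(model, images, targets, loss_fn, optimizer, scheduler):
    """One training batch: forward pass, loss, back propagation, parameter update."""
    preds = model(images)
    loss = loss_fn(preds, targets)   # hypothetical combined loss (BCE + IOU terms)
    optimizer.zero_grad()
    loss.backward()                  # back propagation to reduce the loss
    optimizer.step()                 # update the network parameters
    scheduler.step()                 # cosine-annealing learning-rate update
    return loss.item()
```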
And step S5, inputting the image to be detected into the trained network to obtain a detection result.
The image to be detected is likewise scaled to 640×640 and input into the network; feature maps of four sizes are produced by the CSPDarknet-53 backbone network, and after regression on the feature values a prediction comprising the class cls, bounding box reg and target score obj is obtained, giving the final detection result.
The application also adopts the SimOTA positive/negative sample assignment strategy. First, the prediction boxes are screened: only prediction boxes whose center points lie inside the ground-truth box and inside a square of side length 5 around its center are kept. After this preliminary screening, the bounding-box loss between each prediction box and the ground-truth box and the classification loss (binary cross entropy) are calculated, and a cost matrix is computed:

cost = L_cls + λ·L_reg

where L_cls is the classification loss, L_reg is the bounding-box loss and λ is a balancing weight; the cost matrix represents the cost relationship between each ground-truth box and each feature point. For each ground truth, a fixed number k of prediction boxes with the smallest cost are taken as positive samples and the rest as negative samples, avoiding additional hyper-parameters.
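A simplified sketch of this assignment (the λ weight, the fixed k, and masking screened-out predictions with a large constant are assumptions):

```python
import torch

def simota_assign(cls_cost, iou_cost, center_mask, k=10, lam=3.0):
    """cls_cost, iou_cost: (num_gt, num_pred) loss matrices; center_mask: (num_gt, num_pred)
    bool mask from the center-prior screening. Returns a bool matrix of positive matches."""
    cost = cls_cost + lam * iou_cost + 1e5 * (~center_mask)     # exclude screened-out boxes
    pos = torch.zeros_like(center_mask)
    for g in range(cost.shape[0]):
        kk = min(k, cost.shape[1])
        idx = torch.topk(cost[g], kk, largest=False).indices    # k lowest-cost predictions
        pos[g, idx] = True                                      # positives for this ground truth
    return pos
```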
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (4)

1. A method for detecting urban pedestrian flow small targets based on a convolutional neural network is characterized by comprising the following steps:
acquiring an image training data set with a portrait small target detection frame, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set;
adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring feature maps F1, F2, F3 and F4 of four sizes output by the dark2, dark3, dark4 and dark5 units of the backbone network CSPDarknet-53;
inputting the feature maps F1, F2, F3 and F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain fused feature maps F12, F22, F32 and F42;
sending the fused feature maps F12, F22, F32 and F42 into their corresponding prediction heads, performing the convolutions of the classification branch and the regression branch separately, concatenating the results along the channel dimension, flattening each concatenated feature map to one dimension to obtain flattened feature maps F13, F23, F33 and F43, then joining F13, F23, F33 and F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network;
and inputting the image to be detected into the trained network to obtain a detection result.
2. The convolutional neural network-based urban pedestrian flow small target detection method according to claim 1, wherein the Mosaic data enhancement comprises:
taking out 4 images, and splicing the images in a random scaling, random cutting and random arrangement mode;
the MixUp data enhancement comprises: superimposing 2 images together.
3. The convolutional neural network-based urban pedestrian flow small target detection method as claimed in claim 1, wherein inputting the feature maps F1, F2, F3 and F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32 and F42 comprises:
inputting the feature map F1 directly into the feature fusion network BIAFPN; first, along the top-down path, F1 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F2 to obtain a feature map F21; F21 is then passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F3 to obtain a feature map F31; F31 is passed through a 1×1 convolution, upsampled and adaptively fused with the feature map F4 to obtain a feature map F41; the feature map F41 is output directly as the feature map F42; the bottom-up and cross-scale fusion is then performed: F42 is passed through a 1×1 convolution and downsampling and fused with the earlier F3 and F31 to obtain a feature map F32; F32 is passed through a 1×1 convolution and downsampling and fused with the earlier F2 and F21 to obtain a feature map F22; F22 is passed through a 1×1 convolution and downsampling and fused with the earlier F1 and F11 to obtain a feature map F12; after each feature fusion, a CBAM attention mechanism is applied to enhance the spatial and channel information.
4. The convolutional neural network-based urban pedestrian flow small target detection method according to claim 1, wherein calculating the loss comprises: a classification loss, a bounding-box loss and a target score loss; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss.
CN202210574388.0A 2022-05-24 2022-05-24 Urban pedestrian flow small target detection method based on convolutional neural network Pending CN114821665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210574388.0A CN114821665A (en) 2022-05-24 2022-05-24 Urban pedestrian flow small target detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210574388.0A CN114821665A (en) 2022-05-24 2022-05-24 Urban pedestrian flow small target detection method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN114821665A true CN114821665A (en) 2022-07-29

Family

ID=82517232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210574388.0A Pending CN114821665A (en) 2022-05-24 2022-05-24 Urban pedestrian flow small target detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114821665A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035354A (en) * 2022-08-12 2022-09-09 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX
CN115035354B (en) * 2022-08-12 2022-11-08 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX
CN115063795A (en) * 2022-08-17 2022-09-16 西南民族大学 Urinary sediment classification detection method and device, electronic equipment and storage medium
CN115063795B (en) * 2022-08-17 2023-01-24 西南民族大学 Urinary sediment classification detection method and device, electronic equipment and storage medium
CN115546187A (en) * 2022-10-28 2022-12-30 北京市农林科学院 Agricultural pest and disease detection method and device based on YOLO v5
CN115578631A (en) * 2022-11-15 2023-01-06 山东省人工智能研究院 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning
CN115578631B (en) * 2022-11-15 2023-08-18 山东省人工智能研究院 Image tampering detection method based on multi-scale interaction and cross-feature contrast learning
CN115862833A (en) * 2023-02-16 2023-03-28 成都与睿创新科技有限公司 Detection system and method for instrument loss

Similar Documents

Publication Publication Date Title
CN109859190B (en) Target area detection method based on deep learning
CN114821665A (en) Urban pedestrian flow small target detection method based on convolutional neural network
CN111612008B (en) Image segmentation method based on convolution network
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN111882620B (en) Road drivable area segmentation method based on multi-scale information
CN115861772A (en) Multi-scale single-stage target detection method based on RetinaNet
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN110309765B (en) High-efficiency detection method for video moving target
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN113313706A (en) Power equipment defect image detection method based on detection reference point offset analysis
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN113076972A (en) Two-stage Logo image detection method and system based on deep learning
CN112446292A (en) 2D image salient target detection method and system
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN111612802A (en) Re-optimization training method based on existing image semantic segmentation model and application
CN113269119B (en) Night vehicle detection method and device
CN117975218A (en) Small target detection method based on mixed attention and feature centralized multi-scale fusion
CN113920479A (en) Target detection network construction method, target detection device and electronic equipment
CN111582057B (en) Face verification method based on local receptive field
CN113514053B (en) Method and device for generating sample image pair and method for updating high-precision map
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
Li YOLOV5-based traffic sign detection algorithm
CN112131996B (en) Road side image multi-scale pedestrian rapid detection method based on channel separation convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination