CN114821665A - Urban pedestrian flow small target detection method based on convolutional neural network - Google Patents
- Publication number
- CN114821665A CN114821665A CN202210574388.0A CN202210574388A CN114821665A CN 114821665 A CN114821665 A CN 114821665A CN 202210574388 A CN202210574388 A CN 202210574388A CN 114821665 A CN114821665 A CN 114821665A
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- network
- convolution
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an urban pedestrian flow small target detection method based on a convolutional neural network. Mosaic data enhancement and MixUp data enhancement are performed on an image training data set labeled with portrait small target detection boxes; the enhanced training images are adjusted to the input picture size and fed into a backbone network to obtain feature maps of four sizes; these four feature maps are input into the feature fusion network BIAFPN for feature processing; the fused feature maps are passed into the corresponding prediction heads, where the convolutions of the classification branch and the regression branch are performed separately and then concatenated along the channel dimension; the concatenated feature maps are stretched into one dimension and then concatenated to obtain the final feature map; the loss is calculated, back propagation is performed and the network parameters are updated, completing network training. The invention introduces shallow fine-grained features and then adopts a feature fusion network to detect the target, thereby effectively improving the accuracy of detecting small urban portrait targets.
Description
Technical Field
The application belongs to the technical field of deep learning image processing, and particularly relates to a method for detecting urban pedestrian flow small targets based on a convolutional neural network.
Background
Target detection is a fundamental problem in machine vision. It underpins visual tasks such as instance segmentation, target tracking and action recognition, and is widely applied in fields such as autonomous driving, satellite imagery and surveillance. Most existing target detection algorithms adopt anchor boxes, which often causes an imbalance between positive and negative samples; small targets are even harder to detect, and improving small target detection accuracy remains a difficult open problem.
The current mainstream technical schemes for target detection comprise one-stage and two-stage algorithms. Two-stage algorithms such as the Faster R-CNN series first screen a large number of candidate regions that may contain targets, and then perform detection within those candidate regions. One-stage algorithms such as the YOLO series directly perform end-to-end prediction; the model detects faster, but detection accuracy is somewhat reduced.
Disclosure of Invention
The purpose of the application is to provide an urban pedestrian flow small target detection method based on a convolutional neural network: a shallow information layer is added to the multi-scale features of the original YOLOX scheme, and a better feature fusion structure, BIAFPN, is adopted, so as to solve the problem of low detection accuracy for small urban portrait targets.
In order to achieve the purpose, the technical scheme of the application is as follows:
a method for detecting urban pedestrian flow small targets based on a convolutional neural network comprises the following steps:
acquiring an image training data set with a portrait small target detection frame, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set;
adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring the feature maps F1, F2, F3, F4 of four sizes output by the dark2, dark3, dark4 and dark5 units in the backbone network CSPDarknet-53;
inputting the feature maps F1, F2, F3, F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32, F42;
transmitting the fused feature maps F12, F22, F32, F42 into the corresponding prediction heads respectively; performing the convolutions of the classification branch and the regression branch respectively, then concatenating along the channel dimension; stretching the concatenated feature maps into one dimension to obtain the stretched feature maps F13, F23, F33, F43; then concatenating F13, F23, F33, F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network;
and inputting the image to be detected into the trained network to obtain a detection result.
Further, the Mosaic data enhancement includes:
taking out 4 images, and splicing the images in a random scaling, random cutting and random arrangement mode;
the MixUp data enhancement comprises the following steps: the 2 images were superimposed together.
Further, inputting the feature maps F1, F2, F3, F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32, F42 comprises:
The feature map F1 is directly input into the feature fusion network BIAFPN. First, top-down: after a 1×1 convolution, F1 is upsampled and adaptively fused with the feature map F2 to obtain the feature map F21; F21, after 1×1 convolution and upsampling, is adaptively fused with F3 to obtain F31; F31, after 1×1 convolution and upsampling, is adaptively fused with F4 to obtain F41; F41 is output directly as the feature map F42. Then bottom-up and cross-scale fusion is performed: F42, after 1×1 convolution and downsampling, is fused with the earlier F3 and F31 to obtain F32; F32, after 1×1 convolution and downsampling, is fused with the earlier F2 and F21 to obtain F22; F22, after 1×1 convolution and downsampling, is fused with the earlier F1 and F11 to obtain F12. After each feature fusion, a CBAM attention mechanism is applied to enhance spatial and channel information.
Further, the calculated loss comprises a classification loss, a bounding-box loss and a target score loss; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss.
According to the urban pedestrian flow small target detection method based on the convolutional neural network, a shallow layer feature layer with better fine-grained characteristics is introduced into the existing YOLOX technical scheme, and then the original PANet is replaced by the better BIAFPN to detect the target, so that the accuracy of urban portrait small target detection can be effectively improved.
Drawings
Fig. 1 is a flow chart of the urban pedestrian flow small target detection method based on the neural network.
Fig. 2 is a diagram of a neural network-based urban pedestrian flow small target detection network model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for detecting urban pedestrian flow small targets based on a neural network mainly comprises the following steps: first the images undergo data enhancement, then training starts in batches; in each batch the images are passed through the convolutional neural network to obtain the feature maps F1, F2, F3, F4; the feature maps are then fused by BIAFPN to obtain F12, F22, F32, F42; the fused feature maps are put into the prediction heads for classification and regression to obtain predicted values; the predicted values are compared with the image ground truth to calculate the loss; after each batch of training, back propagation is performed to reduce the loss and the network parameters are updated, completing the training of the network.
In one embodiment, as shown in fig. 1, a method for detecting urban pedestrian flow small targets based on a neural network is provided, including:
and step S1, acquiring an image training data set with a portrait small target detection box, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set.
In this embodiment, the training data set undergoes Mosaic data enhancement and MixUp data enhancement. Mosaic data enhancement takes out 4 images and splices them together using random scaling, random cropping and random arrangement. MixUp data enhancement, i.e. superimposing 2 images, reduces memorization of wrong labels and thus improves robustness.
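The two augmentations can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the fixed quadrant layout, the nearest-neighbour resize, and the fixed blend weight `lam` are simplifying assumptions (in practice the splice point and `lam` are randomized).

```python
import numpy as np

def mosaic(imgs, out_size=640):
    """Splice 4 images onto one canvas (simplified: one fixed quadrant each)."""
    assert len(imgs) == 4
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    h = w = out_size // 2
    corners = [(0, 0), (0, w), (h, 0), (h, w)]
    for img, (y, x) in zip(imgs, corners):
        # nearest-neighbour resize of img to (h, w) via index sampling
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y:y + h, x:x + w] = img[ys][:, xs]
    return canvas

def mixup(img_a, img_b, lam=0.5):
    """Blend two equally sized images; labels are blended with the same lam."""
    return (lam * img_a + (1.0 - lam) * img_b).astype(img_a.dtype)
```

In a real pipeline the mosaic center and the MixUp coefficient would be drawn at random per batch, and the bounding-box labels would be transformed with the images.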
Step S2, adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring the feature maps F1, F2, F3, F4 of four sizes output by the dark2, dark3, dark4 and dark5 units in the backbone network CSPDarknet-53.
As shown in fig. 2, the present application uses CSPDarknet-53 as the backbone network for feature extraction. The adopted CSPDarknet-53 is initialized with weights pre-trained on COCO. Batch training is used with a batch size of 16 (i.e., 16 pictures per batch); the learning rate starts at 0.0025; no learning-rate warm-up is used, and the learning rate is updated with a cosine annealing schedule.
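The learning-rate schedule just described (start at 0.0025, no warm-up, cosine annealing) can be written as a small function. `total_steps` and the final rate `lr_min=0` are assumptions, since the text does not state them:

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=0.0025, lr_min=0.0):
    """Cosine-annealed learning rate: decays from lr_max to lr_min
    over total_steps with no warm-up phase."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```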
Because the original picture is large, it is scaled to 640×640 in proportion to the long side, and the part where the short side is less than 640 is padded with 0. The scaled picture is input into the backbone network CSPDarknet-53, and after a series of convolution operations, feature maps F1, F2, F3, F4 of four sizes, 20×20, 40×40, 80×80 and 160×160, are output in sequence.
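The scale-and-pad step can be sketched as a letterbox transform. This is an assumed nearest-neighbour implementation for illustration, not the patent's code:

```python
import numpy as np

def letterbox(img, size=640):
    """Scale the long side to `size`, keeping aspect ratio; pad the rest with 0."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index sampling
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    out = np.zeros((size, size, 3), dtype=img.dtype)
    out[:nh, :nw] = img[ys][:, xs]
    return out, scale
```

The returned `scale` would be kept so that predicted boxes can be mapped back to original-image coordinates.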
The size of the feature maps is determined by the backbone network CSPDarknet-53 and is not described further here. It should be noted that in YOLOX, usually only the features output by dark3, dark4 and dark5 are used for multi-scale fusion. In this embodiment, the features output by dark2, dark3, dark4 and dark5 are used for the multi-scale fusion operation, and feature maps of four sizes are output. This incorporates shallow fine-grained information, which benefits small target detection; correspondingly, one additional prediction head is provided, achieving a better detection effect.
Step S3, inputting the feature maps F1, F2, F3, F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32, F42.
The feature map F1 (the feature map output by Dark5 in FIG. 2) is directly input into the feature fusion network BIAFPN. First, top-down: after a 1×1 convolution, F1 is upsampled and adaptively fused with the feature map F2 (the feature map output by Dark4 in FIG. 2, and so on) to obtain the feature map F21. F21, after 1×1 convolution and upsampling, is adaptively fused with F3 to obtain F31. F31, after 1×1 convolution and upsampling, is adaptively fused with F4 to obtain F41, and F41 is output directly as the feature map F42. Then bottom-up and cross-scale fusion is performed: F42, after 1×1 convolution and downsampling, is fused with the earlier F3 and F31 to obtain F32. F32, after 1×1 convolution and downsampling, is fused with the earlier F2 and F21 to obtain F22. F22, after 1×1 convolution and downsampling, is fused with the earlier F1 and F11 to obtain F12. After each feature fusion, a CBAM attention mechanism is applied to enhance spatial and channel information.
Specifically, the 20×20 feature map F1 is directly input into the top-down feature pyramid network BIAFPN, convolved to the same number of channels by a 1×1 convolution, then upsampled to 40×40 and adaptively fused (SUM) with the feature map F2; the spatial and channel information is then enhanced by a CBAM attention mechanism to obtain F21 (40×40). F21 is then convolved 1×1, upsampled to 80×80 and adaptively fused (SUM) with F3, then passed through CBAM to obtain F31 (80×80). F31 is convolved 1×1, upsampled to 160×160 and adaptively fused (SUM) with F4, then passed through CBAM to obtain F41 (160×160); F41 is output directly as F42 (160×160). Next comes bottom-up and cross-scale fusion: F41 is converted to the matching number of channels by 1×1 convolution, downsampled to 80×80, and adaptively fused (SUM) with the feature maps F3 and F31 to obtain F32 (80×80). F32 is converted to the matching number of channels by 1×1 convolution, downsampled to 40×40, and fused with F2 and F21 to obtain F22 (40×40). F22 is converted to the matching number of channels by 1×1 convolution, downsampled to 20×20, and fused with the feature maps F1 and F11 to obtain F12 (20×20). At this point, output feature maps of all four sizes have been obtained: F12 (20×20), F22 (40×40), F32 (80×80) and F42 (160×160).
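The fusion order above can be checked at the shape level with a toy sketch. This is an illustration only: the 1×1 convolutions, DWSConv and the CBAM blocks are omitted, and the adaptive weighted fusion is replaced with a plain sum.

```python
import numpy as np

def up2(x):    # nearest-neighbour 2x upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def down2(x):  # 2x downsampling by striding
    return x[::2, ::2]

def biafpn(f1, f2, f3, f4):
    """Shape-level sketch of the BIAFPN fusion order (convs/CBAM omitted)."""
    # top-down path
    f21 = up2(f1) + f2            # 40x40
    f31 = up2(f21) + f3           # 80x80
    f41 = up2(f31) + f4           # 160x160
    f42 = f41                     # output as-is, 160x160
    # bottom-up, cross-scale path (re-uses the intermediate nodes)
    f32 = down2(f42) + f3 + f31   # 80x80
    f22 = down2(f32) + f2 + f21   # 40x40
    f12 = down2(f22) + f1         # 20x20
    return f12, f22, f32, f42
```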
It should be noted that in fig. 2, DWSConv (i.e., depthwise DW convolution plus pointwise PW convolution) is also present between the adaptive feature fusion SUM and the CBAM attention mechanism; it is not described again here. In this embodiment, BIAFPN replaces the original PANet: BIAFPN feature fusion adds cross-scale fusion and a CBAM attention mechanism (enhancing features spatially and across channels) on top of the original bidirectional fusion, so that features are fused better.
Step S4, transmitting the fused feature maps F12, F22, F32, F42 into the corresponding prediction heads respectively; performing the convolutions of the classification branch and the regression branch respectively, then concatenating along the channel dimension; stretching the concatenated feature maps into one dimension to obtain the stretched feature maps F13, F23, F33, F43; then concatenating F13, F23, F33, F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network.
In this embodiment, after the convolutions of the prediction head's classification branch and regression branch, the outputs are concatenated along the channel dimension to generate four new feature maps of size {W×H×[(cls+reg+obj)]×N}, where W×H is the feature map size, cls is the detection class, reg is the predicted bounding box, obj is the target score prediction, and N is the number of prediction anchor boxes. W is multiplied by H, stretching the spatial dimensions into one, which yields the feature maps F13, F23, F33, F43. Then F13, F23, F33, F43 are concatenated along W×H to obtain the final feature map F.
And finally, calculating classification loss, frame loss and target score loss, performing back propagation to reduce loss, and updating network parameters.
Specifically, after the convolutions of the classification branch and the regression branch, F12, F22, F32, F42 each generate 3 new feature maps F_cls ∈ {N×W×H×cls}, F_obj ∈ {N×W×H×1}, F_reg ∈ {N×W×H×4}, which are concatenated along the channel dimension to generate four new tensors of size {N×W×H×[(cls+reg+obj)]}, with W, H ∈ {20, 40, 80, 160}. W is then multiplied by H, stretching the spatial dimension into one, giving four tensors of size {N×(cls+reg+obj)×(W×H)}. Then F13, F23, F33, F43 are concatenated along W×H to obtain the final feature map F ∈ {N×(cls+reg+obj)×34000}.
Here cls is the category in the data set, the reg prediction bounding box comprises the predicted top-left corner point (x1, y1) and bottom-right corner point (x2, y2), and N is the preset number of anchor boxes, which is 1 in this embodiment.
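The flatten-and-concatenate arithmetic can be verified with a small sketch (the zero-filled tensors and the function name are illustrative only): the four spatial grids contribute 20² + 40² + 80² + 160² = 400 + 1600 + 6400 + 25600 = 34000 positions, matching F ∈ {N×(cls+reg+obj)×34000}.

```python
import numpy as np

def flatten_heads(sizes=(20, 40, 80, 160), cls=1, reg=4, obj=1, n=1):
    """Stretch each W x H x (cls+reg+obj) head output to (N, C, W*H)
    and concatenate along the spatial axis, as in step S4."""
    c = cls + reg + obj
    flat = [np.zeros((n, c, s * s)) for s in sizes]
    return np.concatenate(flat, axis=2)

print(flatten_heads().shape)  # (1, 6, 34000)
```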
In the embodiment, the prediction head adopts a decoupling head mode, and the classification branch and the regression branch are separately subjected to convolution operation, so that a better detection effect can be achieved. And the prediction of each position is reduced from 3 to 1 through the connection operation, and the problem of imbalance of positive and negative samples is avoided by adopting a mode without an anchor frame.
Because the output feature values cannot be used directly for loss calculation, regression is performed first to obtain the actual predicted values. The classification loss, bounding-box loss and target score loss are then calculated on the feature map F according to the following formulas; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss:
BCELoss = -(y·log(p(x)) + (1-y)·log(1-p(x)))
it should be noted that the grid of the present application is disposed on the finally obtained feature map, is an abstract concept, and is intended to facilitate the frame regression calculation, and for the feature maps of 20 × 20,40 × 40,80 × 80, and 160 × 160, there are 20 × 20,40 × 40,80 × 80, and 160 × 160 grids, respectively, and the division of the feature map into multiple grids is a relatively mature technology in the art, and is not described herein again. Meanwhile, the CBAM attention mechanism is also a relatively mature technology in the field, and is not described herein again.
1. The classification loss and the target score loss are calculated using the binary cross-entropy loss function (Binary Cross Entropy Loss):
BCELoss = -(y·log(p(x)) + (1-y)·log(1-p(x)))
where y indicates whether the position is a target (value 1 or 0) and p(x) is the predicted probability score.
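A direct NumPy transcription of the BCELoss formula above (the `eps` clipping is an added numerical-stability assumption, not part of the formula):

```python
import numpy as np

def bce_loss(y, p, eps=1e-7):
    """BCELoss = -(y*log(p) + (1-y)*log(1-p)), averaged over elements."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))
```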
2. The bounding-box loss is calculated from the predicted box information and the real box information obtained from the labels using the IOU (Intersection over Union), i.e. the ratio of the intersection to the union of the predicted box and the real box; the predicted boxes with high IOU values are obtained by NMS post-processing:
IOULoss = 1 − IoU, where IoU = |A∩B| / |A∪B|, A is the real box (ground truth), B is the prediction box, |A∩B| is the area where the real box and the prediction box intersect, and |A∪B| is the area of their union. The lower the IOULoss value, the more accurate the prediction.
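The IoU computation can be sketched for corner-format boxes (x1, y1, x2, y2); `1 − IoU` is one common form of the loss, consistent with "the lower the value, the more accurate":

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(gt, pred):
    return 1.0 - iou(gt, pred)
```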
It should be noted that the calculation of the classification loss, the target score loss, and the frame loss is already a relatively mature technology in the art, and is not described herein again.
The loss between the predicted value and the true value is obtained, and the loss is reduced by carrying out back propagation before the end of each batch. And simultaneously updating the network parameters, starting the training of the next batch until the training of the training data of all batches is finished, finally obtaining the trained weight, and storing all the updated parameters in an output weight file.
And step S5, inputting the image to be detected into the trained network to obtain a detection result.
The image to be detected is similarly scaled to a 640×640 input for the network; feature maps of four sizes are output through the CSPDarknet-53 backbone network, and after regression on the feature values a predicted value comprising the class cls, bounding box reg and target score obj is obtained, giving the final prediction result.
The application also adopts the SimOTA positive/negative sample assignment strategy. First, the prediction boxes are screened: only those whose center point lies within the ground truth box and within a square of side length 5 are kept. After this preliminary screening, the bounding-box loss between each prediction box and the ground-truth box is calculated, the classification loss is calculated with binary cross-entropy, and a cost matrix is computed:
cost = Loss_cls + λ·Loss_reg
representing the cost relationship between each real box and each feature point. The first k prediction boxes with the smallest cost for each ground truth are taken as positive samples, and the rest as negative samples, which avoids additional hyper-parameters.
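A minimal sketch of top-k selection over a cost matrix. This simplifies SimOTA considerably: the candidate pre-screening, the dynamic choice of k per ground truth, and conflict resolution when one prediction matches several ground truths are all omitted, and any particular cost weighting is an assumption.

```python
import numpy as np

def topk_assign(cost, k=10):
    """Mark the k lowest-cost predictions per ground-truth row as positives.
    cost: (num_gt, num_pred) matrix, e.g. cls_loss + lambda * reg_loss."""
    k = min(k, cost.shape[1])
    pos = np.zeros_like(cost, dtype=bool)
    for g in range(cost.shape[0]):
        idx = np.argsort(cost[g])[:k]  # k smallest costs in row g
        pos[g, idx] = True
    return pos
```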
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (4)
1. A method for detecting urban pedestrian flow small targets based on a convolutional neural network is characterized by comprising the following steps:
acquiring an image training data set with a portrait small target detection frame, and performing Mosaic data enhancement and MixUp data enhancement on the image training data set;
adjusting the enhanced image training data set to the input image size, inputting it into the backbone network CSPDarknet-53, and acquiring the feature maps F1, F2, F3, F4 of four sizes output by the dark2, dark3, dark4 and dark5 units in the backbone network CSPDarknet-53;
inputting the feature maps F1, F2, F3, F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32, F42;
transmitting the fused feature maps F12, F22, F32, F42 into the corresponding prediction heads respectively; performing the convolutions of the classification branch and the regression branch respectively, then concatenating along the channel dimension; stretching the concatenated feature maps into one dimension to obtain the stretched feature maps F13, F23, F33, F43; then concatenating F13, F23, F33, F43 to obtain the final feature map; calculating the loss, performing back propagation to update the network parameters, and completing the training of the network;
and inputting the image to be detected into the trained network to obtain a detection result.
2. The convolutional neural network-based urban pedestrian flow small target detection method according to claim 1, wherein the Mosaic data enhancement comprises:
taking out 4 images, and splicing the images in a random scaling, random cutting and random arrangement mode;
the MixUp data enhancement comprises the following steps: the 2 images were superimposed together.
3. The convolutional neural network-based urban pedestrian flow small target detection method according to claim 1, wherein inputting the feature maps F1, F2, F3, F4 of four sizes into the feature fusion network BIAFPN for feature processing to obtain the fused feature maps F12, F22, F32, F42 comprises:
The feature map F1 is directly input into the feature fusion network BIAFPN. First, top-down: after a 1×1 convolution, F1 is upsampled and adaptively fused with the feature map F2 to obtain the feature map F21; F21, after 1×1 convolution and upsampling, is adaptively fused with F3 to obtain F31; F31, after 1×1 convolution and upsampling, is adaptively fused with F4 to obtain F41; F41 is output directly as the feature map F42. Then bottom-up and cross-scale fusion is performed: F42, after 1×1 convolution and downsampling, is fused with the earlier F3 and F31 to obtain F32; F32, after 1×1 convolution and downsampling, is fused with the earlier F2 and F21 to obtain F22; F22, after 1×1 convolution and downsampling, is fused with the earlier F1 and F11 to obtain F12. After each feature fusion, a CBAM attention mechanism is applied to enhance spatial and channel information.
4. The convolutional neural network-based urban pedestrian flow small target detection method according to claim 1, wherein the calculated loss comprises a classification loss, a bounding-box loss and a target score loss; the classification loss and the target score loss use BCELoss, and the bounding-box loss uses IOULoss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210574388.0A CN114821665A (en) | 2022-05-24 | 2022-05-24 | Urban pedestrian flow small target detection method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210574388.0A CN114821665A (en) | 2022-05-24 | 2022-05-24 | Urban pedestrian flow small target detection method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114821665A true CN114821665A (en) | 2022-07-29 |
Family
ID=82517232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210574388.0A Pending CN114821665A (en) | 2022-05-24 | 2022-05-24 | Urban pedestrian flow small target detection method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821665A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035354A (en) * | 2022-08-12 | 2022-09-09 | 江西省水利科学院 | Reservoir water surface floater target detection method based on improved YOLOX |
CN115035354B (en) * | 2022-08-12 | 2022-11-08 | 江西省水利科学院 | Reservoir water surface floater target detection method based on improved YOLOX |
CN115063795A (en) * | 2022-08-17 | 2022-09-16 | 西南民族大学 | Urinary sediment classification detection method and device, electronic equipment and storage medium |
CN115063795B (en) * | 2022-08-17 | 2023-01-24 | 西南民族大学 | Urinary sediment classification detection method and device, electronic equipment and storage medium |
CN115546187A (en) * | 2022-10-28 | 2022-12-30 | 北京市农林科学院 | Agricultural pest and disease detection method and device based on YOLO v5 |
CN115578631A (en) * | 2022-11-15 | 2023-01-06 | 山东省人工智能研究院 | Image tampering detection method based on multi-scale interaction and cross-feature contrast learning |
CN115578631B (en) * | 2022-11-15 | 2023-08-18 | 山东省人工智能研究院 | Image tampering detection method based on multi-scale interaction and cross-feature contrast learning |
CN115862833A (en) * | 2023-02-16 | 2023-03-28 | 成都与睿创新科技有限公司 | Detection system and method for instrument loss |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109859190B (en) | Target area detection method based on deep learning | |
CN114821665A (en) | Urban pedestrian flow small target detection method based on convolutional neural network | |
CN111612008B (en) | Image segmentation method based on convolution network | |
CN111027493B (en) | Pedestrian detection method based on deep learning multi-network soft fusion | |
WO2021218786A1 (en) | Data processing system, object detection method and apparatus thereof | |
CN111882620B (en) | Road drivable area segmentation method based on multi-scale information | |
CN115861772A (en) | Multi-scale single-stage target detection method based on RetinaNet | |
CN113076871A (en) | Fish shoal automatic detection method based on target shielding compensation | |
CN110309765B (en) | High-efficiency detection method for video moving target | |
CN115035361A (en) | Target detection method and system based on attention mechanism and feature cross fusion | |
CN111553414A (en) | In-vehicle lost object detection method based on improved Faster R-CNN | |
CN113313706A (en) | Power equipment defect image detection method based on detection reference point offset analysis | |
CN110751005B (en) | Pedestrian detection method integrating depth perception features and kernel extreme learning machine | |
CN113076972A (en) | Two-stage Logo image detection method and system based on deep learning | |
CN112446292A (en) | 2D image salient target detection method and system | |
CN114332921A (en) | Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network | |
CN111612802A (en) | Re-optimization training method based on existing image semantic segmentation model and application | |
CN113269119B (en) | Night vehicle detection method and device | |
CN117975218A (en) | Small target detection method based on mixed attention and feature centralized multi-scale fusion | |
CN113920479A (en) | Target detection network construction method, target detection device and electronic equipment | |
CN111582057B (en) | Face verification method based on local receptive field | |
CN113514053B (en) | Method and device for generating sample image pair and method for updating high-precision map | |
CN116912670A (en) | Deep sea fish identification method based on improved YOLO model | |
Li | YOLOV5-based traffic sign detection algorithm | |
CN112131996B (en) | Road side image multi-scale pedestrian rapid detection method based on channel separation convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||