CN117523612A - Dense pedestrian detection method based on Yolov5 network - Google Patents
- Publication number
- CN117523612A (application number CN202311540235.5A)
- Authority
- CN
- China
- Prior art keywords
- detection
- network
- pedestrian
- dense
- dense pedestrian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the technical field of deep-learning image processing, and in particular discloses a dense pedestrian detection method based on a Yolov5 network, comprising the following steps: S1, acquiring a data set and constructing a Yolov5 network model; S2, inputting a dense pedestrian data set picture, extracting picture features of the dense pedestrian images with a trunk feature extraction network, and enhancing pedestrian features with a feature enhancement network; S3, the detection head detects the features output by the feature enhancement module, screens them with a non-maximum suppression algorithm, and outputs the detection result. By adopting a Yolov5-based network, the invention can perform pedestrian detection on dense pedestrian images and alleviate the false-detection and missed-detection problems of existing image detection technology; adding deformable convolution to the model helps capture detail and texture information during feature extraction, and using the DIOU loss function makes the network pay more attention to separating dense pedestrian regions during non-maximum suppression, thereby improving the accuracy of the detected images.
Description
Technical Field
The invention relates to the technical field of deep learning image processing, in particular to a dense pedestrian detection method based on a Yolov5 network.
Background
Pedestrian detection is a special case of object detection and plays a vital role in applications such as autonomous driving, intelligent surveillance systems, robotics, and advanced human-computer interaction. It is also the basis of numerous research topics, such as target tracking, human pose estimation, and pedestrian search. According to incomplete statistics, the number of traffic-accident fatalities in China increases year by year; drivers bear primary responsibility in these accidents, and pedestrians are the most severely affected victim group. An intelligent driving-assistance system that successfully detects obstacles and pedestrians in front of the vehicle can prompt the driver to avoid them while driving, reducing the probability of collision with pedestrians. Video image acquisition devices deployed in streets and alleys can likewise protect people's legal rights and interests around the clock. Pedestrian detection means identifying the pedestrians in an image and framing each pedestrian's position with a rectangular bounding box.
At present, pedestrian detection faces many challenges, and traditional pedestrian detection algorithms struggle to meet the requirements of image-based detection. In recent years, computer hardware has developed rapidly, and the continuous improvement of graphics processing units (GPUs) provides reliable hardware support for processing image data. Meanwhile, researchers have proposed a series of excellent deep-learning-based target detection algorithms that use convolutional neural networks for image detection, making detection fast and accurate. However, as application scenes grow more complex, the poses and occlusion situations of pedestrians in images also become more diverse, making the pedestrian detection task more challenging.
In recent years, various technical methods for dense pedestrian detection are proposed at home and abroad, but the dense pedestrian detection still has the following problems:
1. Large differences in pedestrian appearance, including viewing angle, pose, clothing and accessories, illumination, and imaging distance. The appearance of a pedestrian varies with the viewing angle, and pedestrians in different poses look different. Clothing and carried items, such as hats, scarves, and luggage, also cause large appearance differences, and differences in illumination add further complexity. The imaging distance directly affects the size of a pedestrian sample in the image: a distant human body appears smaller, a nearby one larger, and the resulting appearance difference is obvious;
2. Sample occlusion. In many practical application scenarios, pedestrians are densely distributed; in places such as airports, stations, and pedestrian zones, occlusion is severe and comprises both inter-class and intra-class occlusion. Inter-class occlusion is occlusion between different kinds of objects, i.e., between a pedestrian and other objects; intra-class occlusion is occlusion between objects of the same kind, i.e., between one pedestrian and another. The occlusion problem poses a greater challenge to pedestrian detection;
3. Complex backgrounds. The environmental background in most real-world scenes is neither single nor unchanging: for example, some objects in urban street views resemble pedestrians in appearance, shape, color, and texture, and daytime and nighttime sample images have completely different background environments. A complex background inevitably affects the pedestrian detection result.
Disclosure of Invention
(one) solving the technical problems
Aiming at the defects of the prior art, the invention provides a dense pedestrian detection method based on a Yolov5 network, which addresses the poor results of existing models on dense pedestrian images and the false detections of the prior art.
(II) technical scheme
The invention adopts the following technical scheme for realizing the purposes:
a dense pedestrian detection method based on a Yolov5 network comprises the following steps:
s1, acquiring a data set and constructing a Yolov5 network model;
s2, inputting a dense pedestrian data set picture, extracting picture features of dense pedestrian images by a trunk feature extraction network, and enhancing pedestrian features by a feature enhancement network;
s3, the detection head detects the features output by the feature enhancement module, screens them with a non-maximum suppression algorithm, and outputs the detection result;
and S4, objectively judging the dense pedestrian detection pictures.
Further, the main feature extraction network in S2 extracts the picture features of the dense pedestrian image, and the C3 module in the main feature extraction network uses a deformable convolution module to replace a common convolution module:
the deformable convolution module and the normal convolution module may be expressed as:
because the position of the deformable convolution module after adding the offset is non-integer and does not correspond to the pixel point actually existing on the feature map, interpolation is needed to obtain the offset pixel value, and bilinear interpolation can be generally adopted, and the general convolution and the x of the deformable convolution are calculated through bilinear interpolation, and the formula is expressed as follows:
further, the loss function in S3 uses DIOU instead of GIOU;

incorporating the center-point distance into the penalty term, the DIoU loss can be expressed as:

$L_{DIoU}=1-IoU(P,G)+\frac{\rho^2(P_0,G_0)}{c^2}$

P and G are the predicted box and the ground-truth box, with center points $P_0$ and $G_0$ respectively; c is the diagonal length of the smallest box enclosing both P and G, and $\rho(\cdot,\cdot)$ is the Euclidean distance between the two center points.
Further, the criterion of the step S4 is to measure the difference of image detection of the dense population in terms of accuracy, recall and average accuracy, and quantitatively evaluate the image detection effect;
the quantitative evaluation is to adopt an accuracy rate P, a recall rate R and an average accuracy rate AP respectively,
performing quantitative analysis, wherein the precision P is defined as the accuracy of all detected targets, and can be expressed as:
the recall R-defined as the detection accuracy in all positive samples, can be expressed as:
the average precision AP, defined as the average of the precision at different recall rates, may be expressed as:
TP is positive samples and is predicted to be the number of positive classes; FP is negative sample forecast to be positive number; FN is the number of negative classes predicted by positive samples; TN is the negative sample prediction negative class number.
(III) beneficial effects
Compared with the prior art, the invention provides a dense pedestrian detection method based on a Yolov5 network, which has the following beneficial effects:
the invention adopts the Yolov5 network, can detect pedestrians on dense pedestrian images, can solve the problems of false detection and missing detection of the existing pedestrian detection technology, adds deformable convolution in the model, is helpful for extracting details and texture information of pedestrian characteristics during characteristic extraction, ensures that the network pays more attention to the part with more characteristic information during characteristic learning of pedestrians, and improves the precision of the detected images; the invention can play a great role in a plurality of fields requiring clear images, such as target tracking, image classification, target detection and the like.
Drawings
FIG. 1 is a schematic diagram of a Yolov 5-based network structure according to the present invention;
FIG. 2 is a schematic diagram of a deformable convolution structure employed in the present invention;
FIG. 3 is a schematic diagram of a DIOU module according to the present invention;
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in fig. 1-3, a dense pedestrian detection method based on a Yolov5 network according to an embodiment of the present invention includes the following steps:
s1: acquiring a data set and constructing a Yolov5 model;
the dataset included a training set and a test set, the dataset was a crowing human dataset, there were 15000 images in the training set, 5000 images in the test set, and the validation set contained 4370 images. In the training and validation set, there are 470K instances, approximately 23 individuals per picture, and various occlusions. Each instance of the person with the bounding box, the region bounding box and the person's bounding box is visible in the person's head.
The Yolov5 algorithm mainly comprises an Input module (Input), a trunk feature extraction module (Backbone), a feature enhancement module (Neck) and a detection Head (Head).
The Yolov5 algorithm divides the input image into N×N cells, each of which is used to detect objects whose center coordinates lie within that cell. Each cell predicts P bounding boxes, and each bounding box contains five pieces of information. Therefore, the final predicted value for a single input picture is a tensor of N×N×(P×5+C) (C is the number of classes), and the model predicts N×N×P bounding boxes altogether. When the Yolov5 algorithm performs target detection, it sets a confidence threshold (generally 0.5): boxes whose predicted confidence is below the threshold are screened out first, and the model retains the prediction boxes with relatively high confidence after this preliminary screening; then the NMS (Non-Maximum Suppression) algorithm filters out duplicate prediction boxes for the same target, retaining the optimal prediction box. Owing to the varied occlusion and complicated backgrounds of dense scenes, screening directly by confidence causes missed and false detections. In view of these defects of the prior art, the invention provides a dense pedestrian detection method based on Yolov5 that achieves higher detection precision and greatly reduces false detection.
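The confidence-threshold screening followed by non-maximum suppression described above can be sketched as follows. This is a minimal illustrative sketch, not the Yolov5 implementation (which operates on batched tensors); boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, conf_thr=0.5, iou_thr=0.5):
    """Discard boxes below the confidence threshold, then greedily keep
    the highest-scoring box and suppress any remaining box that overlaps
    a kept box by more than iou_thr. Returns the kept indices."""
    cand = sorted((i for i, s in enumerate(scores) if s >= conf_thr),
                  key=lambda i: scores[i], reverse=True)
    keep = []
    for i in cand:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping predictions of the same pedestrian collapse to the higher-scoring one, while a distant pedestrian's box survives.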
S2: inputting a dense pedestrian data set picture, extracting picture features of dense pedestrian images by a trunk feature extraction network, and enhancing pedestrian features by a feature enhancement network;
the main function of the input end is to preprocess the input picture; the main preprocessing methods include adaptive picture scaling, Mosaic data enhancement, and adaptive anchor-box calculation. First, an input picture of arbitrary size undergoes adaptive picture scaling: the picture is scaled to a fixed size according to its aspect ratio. Compared with ordinary picture scaling, adaptive scaling adds the fewest possible black borders to the original picture, which avoids the information redundancy caused by excessive black borders and improves detection speed. The fixed-size pictures then undergo Mosaic data enhancement: four pictures are randomly selected and spliced by random scaling, random cropping, and random arrangement, which enriches the data set while reducing the GPU computation load. Finally, adaptive anchor-box calculation sets a suitable initial anchor-box size in advance for different data sets, which facilitates more accurate localization of targets in subsequent detection.
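The adaptive scaling step can be sketched as the following computation. This is a simplified sketch: it returns only the scale factor and side padding, whereas YOLOv5's actual letterboxing also rounds the padding to a multiple of the network stride, which is omitted here.

```python
def letterbox_params(w, h, size=640):
    """Aspect-ratio-preserving resize to a size x size canvas with minimal
    black borders: one scale factor for both axes, leftover space split
    evenly between the two sides as padding."""
    r = min(size / w, size / h)            # single factor keeps the aspect ratio
    new_w, new_h = round(w * r), round(h * r)
    pad_w, pad_h = size - new_w, size - new_h
    return r, (pad_w // 2, pad_h // 2)     # per-side padding in pixels
```

A 1280×720 frame, for instance, is scaled by 0.5 to 640×360 and padded with 140 black pixels above and below rather than being stretched.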
The main function of the trunk feature extraction network is to extract the position information and semantic information of the target to be detected; its main structures are Conv, C3, and SPPF. The Conv structure consists of a standard convolution, BN, and the SiLU activation function. The C3 structure is composed of standard convolutions and bottleneck modules; compared with layer-by-layer convolution, it effectively simplifies the network, reducing computation while fully extracting features and shortening the algorithm's inference time. The SPPF applies three successive max-pooling operations with 5×5 kernels and can fuse several features of different resolutions at the same time, thereby obtaining more effective information about dense pedestrians.
The main function of the feature aggregation network is to fuse the position information and semantic information of the feature maps; it adopts the feature pyramid network (FPN) and path aggregation network (PAN) structure. The FPN uses a top-down structure: the rich semantic information contained in the deep feature maps is passed downward through up-sampling and fused with the shallow feature maps. The PAN uses a bottom-up structure: the rich position information contained in the shallow feature maps is passed upward through down-sampling and fused with the deep feature maps. The shallow feature maps thus obtain semantic information from the deep feature maps, and the deep feature maps obtain position information from the shallow ones; fusing feature-map information at multiple depths realizes multi-scale detection and enhances the generalization capability of the network.
The improvement scheme adds a deformable convolution module to the C3 modules of the trunk feature extraction network: the deformable convolution module replaces the standard convolution modules in the last three C3 modules, which improves the trunk network's ability to extract dense pedestrian features and suppresses the interference of complex backgrounds with feature extraction.
The deformable convolution module (DCN) changes the positions of the sampling points during the training phase by learning offsets. As shown in fig. 2, a conventional convolution operation divides the feature map into regions of the same size as the convolution kernel and then convolves them, so the sampled positions on the feature map are fixed. This limits feature extraction and prevents the network from capturing more varied features. To solve this problem, a deformable convolution, which has geometric modeling capability and a larger receptive field, is employed so that as many sampling points as possible fall on the target.
Further, $p_n$ enumerates each position within the convolution kernel range $\mathcal{R}$ relative to $p_0$. The deformable convolution introduces, on top of the conventional convolution, an offset $\Delta p_n$ (typically fractional) for each sampling point; the offsets are generated by applying another convolution to the input feature map. The outputs of the ordinary convolution and the deformable convolution are, respectively:

$y(p_0)=\sum_{p_n\in\mathcal{R}} w(p_n)\,x(p_0+p_n)$

$y(p_0)=\sum_{p_n\in\mathcal{R}} w(p_n)\,x(p_0+p_n+\Delta p_n)$
further, since the position after adding the offset is not an integer and does not correspond to an actual pixel on the feature map, interpolation is required to obtain the offset pixel value; bilinear interpolation is generally used, and the value $x(p)$ for both the ordinary and the deformable convolution is calculated as:

$x(p)=\sum_{q} G(q,p)\,x(q),\quad G(q,p)=\max(0,1-|q_x-p_x|)\cdot\max(0,1-|q_y-p_y|)$

where q ranges over the integer pixel positions neighboring p.
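The bilinear interpolation above can be sketched directly; the sketch below operates on a feature map represented as a list of rows (the real operator works on tensors and is differentiable through the offsets).

```python
import math

def bilinear_sample(fmap, py, px):
    """Bilinearly interpolate the 2-D feature map fmap at the fractional
    position (py, px), as needed when a deformable-convolution offset moves
    a sampling point off the integer pixel grid. Only the (up to) four
    integer neighbors contribute a nonzero weight."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(py), math.floor(px)
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                # G(q, p) = g(q_y, p_y) * g(q_x, p_x) with g(a, b) = max(0, 1 - |a - b|)
                wgt = max(0.0, 1 - abs(py - qy)) * max(0.0, 1 - abs(px - qx))
                val += wgt * fmap[qy][qx]
    return val
```

Sampling a 2×2 map at its center, for example, returns the mean of the four pixels, and sampling exactly on an integer position returns that pixel unchanged.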
s3: the detection head detects the features output by the feature enhancement module, screens them with a non-maximum suppression algorithm, and outputs the detection result;
the main function of the detection layer is to detect on feature maps of different scales through preset anchor boxes, with non-maximum suppression applied to the predictions to finally obtain the target classification and position information. The improvement here replaces the GIOU loss function with the DIOU loss function, which separates dense targets more effectively. The main part of the detection layer is the three-scale detector in the Head component: assuming the network input picture size is 640×640, the three feature-map sizes are, from top to bottom, 80×80, 40×40, and 20×20.
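The three scales follow from the network's downsampling factors; assuming the usual YOLOv5 strides of 8, 16, and 32, the grid sizes above can be reproduced as:

```python
def head_grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Feature-map side lengths of the three detection scales: the input
    is downsampled by each stride, giving the 80x80, 40x40 and 20x20
    grids mentioned above for a 640x640 input."""
    return [input_size // s for s in strides]
```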
The IOU loss function formula is:
IoU(P,G)=(P∩G)/(P∪G)
further, the IOU, expressed in the language of sets, is the intersection of the two regions divided by their union; however, IoU(P,G)=0 whenever the prediction box does not intersect the ground-truth box. To overcome this drawback, YOLOv5 uses the GIOU with penalty term $R=\frac{|C-(P\cup G)|}{|C|}$, where C is the smallest box enclosing both P and G:

$L_{GIoU}=1-IoU(P,G)+\frac{|C-(P\cup G)|}{|C|}$

Although the GIoU solves the problem of two non-intersecting boxes, when one box encloses the other the GIOU degenerates to the IOU.
Further considering the center-point distance in the penalty term R, the DIoU loss is:

$L_{DIoU}=1-IoU(P,G)+\frac{\rho^2(P_0,G_0)}{c^2}$
further, P and G are the predicted box and the ground-truth box, with center points $P_0$ and $G_0$ respectively; c is the diagonal length of the smallest box enclosing both P and G, and $\rho(\cdot,\cdot)$ is the Euclidean distance between the two center points.
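The DIoU loss defined above can be sketched for axis-aligned boxes as follows; this is an illustrative sketch on (x1, y1, x2, y2) tuples, not the tensorized training loss.

```python
def diou_loss(p, g):
    """DIoU loss: 1 - IoU + rho^2 / c^2, where rho is the distance between
    the two box centers and c the diagonal of the smallest enclosing box."""
    # intersection and union
    ix1, iy1 = max(p[0], g[0]), max(p[1], g[1])
    ix2, iy2 = min(p[2], g[2]), min(p[3], g[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    iou = inter / union
    # squared distance between the two center points (rho^2)
    rho2 = (((p[0] + p[2]) / 2 - (g[0] + g[2]) / 2) ** 2
            + ((p[1] + p[3]) / 2 - (g[1] + g[3]) / 2) ** 2)
    # squared diagonal of the smallest box enclosing both P and G (c^2)
    cx1, cy1 = min(p[0], g[0]), min(p[1], g[1])
    cx2, cy2 = max(p[2], g[2]), max(p[3], g[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return 1 - iou + rho2 / c2
```

Unlike GIoU, the distance term stays informative even when one box encloses the other, which is what helps separate the overlapping boxes of dense pedestrians.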
S4: objectively distinguishing the dense pedestrian detection pictures;
further, the criterion of step S4 is to measure dense-crowd image detection in terms of precision, recall, and average precision, and to quantitatively evaluate the image detection effect;

the quantitative evaluation adopts the precision P (Precision), the recall R (Recall), and the average precision AP (Average Precision) for quantitative analysis, where the precision P, defined as the accuracy over all detected targets, can be expressed as:

$P=\frac{TP}{TP+FP}$

the recall R, defined as the detection accuracy over all positive samples, can be expressed as:

$R=\frac{TP}{TP+FN}$

and the average precision AP, defined as the average of the precision at different recall rates, can be expressed as:

$AP=\int_0^1 P(R)\,dR$

TP is the number of positive samples predicted as positive; FP is the number of negative samples predicted as positive; FN is the number of positive samples predicted as negative; TN is the number of negative samples predicted as negative.
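These evaluation formulas can be sketched directly from the counts; the AP here is approximated by rectangular summation over recall increments, a simplification of the integral definition above.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN) from detection counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Approximate AP as the area under the precision-recall curve,
    summing precision over increasing recall increments."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For example, 8 true positives with 2 false positives and 4 missed pedestrians gives P = 0.8 and R = 2/3.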
By adopting the Yolov5 network, the invention can detect pedestrians in dense pedestrian images and alleviate the false-detection and missed-detection problems of existing pedestrian detection technology. Adding deformable convolution to the model helps extract the detail and texture information of pedestrian features during feature extraction, so that the network pays more attention to the parts carrying more feature information when learning pedestrian features, improving the precision of the detected images. The invention can play a significant role in many fields that require clear images, such as target tracking, image classification, and target detection.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and does not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described therein may still be modified and some of their technical features replaced by equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within its scope of protection.
Claims (4)
1. The dense pedestrian detection method based on the Yolov5 network is characterized by comprising the following steps of:
s1, acquiring a data set and constructing a Yolov5 network model;
s2, inputting a dense pedestrian data set picture, extracting picture features of dense pedestrian images by a trunk feature extraction network, and enhancing pedestrian features by a feature enhancement network;
s3, the detection head detects the characteristics output by the characteristic enhancement module, screens a non-maximum suppression algorithm and outputs a detection result;
and S4, objectively judging the dense pedestrian detection pictures.
2. The method for dense pedestrian detection based on the Yolov5 network according to claim 1, wherein the method comprises the following steps: and in the step S2, the main feature extraction network extracts the picture features of the dense pedestrian images, and a deformable convolution module is used for replacing a common convolution module by a C3 module in the main feature extraction network:
the deformable convolution module and the normal convolution module may be expressed as:
because the position of the deformable convolution module after adding the offset is non-integer and does not correspond to the pixel point actually existing on the feature map, interpolation is needed to obtain the offset pixel value, and bilinear interpolation can be generally adopted, and the general convolution and the x of the deformable convolution are calculated through bilinear interpolation, and the formula is expressed as follows:
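The bilinear sampling that claim 2 relies on can be sketched as follows for a single-channel feature map; this is an illustrative helper (the name `bilinear_sample` and the edge clamping are assumptions, not the patent's code), showing how a fractional location produced by an offset is read from the four surrounding integer pixels.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    # Read feature map `feat` (H x W) at a fractional location (y, x):
    # weight the four neighbouring integer pixels by bilinear kernels,
    # as needed when a deformable-convolution offset lands between pixels.
    h, w = feat.shape
    # clamp the query point onto the feature map
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0  # fractional parts = interpolation weights
    return ((1 - ly) * (1 - lx) * feat[y0, x0]
            + (1 - ly) * lx * feat[y0, x1]
            + ly * (1 - lx) * feat[y1, x0]
            + ly * lx * feat[y1, x1])
```

A deformable convolution then sums kernel weights times such sampled values at `p0 + pn + Δpn`; frameworks provide this directly (e.g. `torchvision.ops.deform_conv2d` in PyTorch).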
3. The dense pedestrian detection method based on the Yolov5 network according to claim 1, wherein the loss function in S3 uses DIoU instead of GIoU:
on the basis of the GIoU, the distance between the center points is introduced as a penalty term R, and the DIoU loss can be expressed as:
L_DIoU = 1 − IoU + ρ²(P₀, G₀) / c²
where P and G are the predicted box and the ground-truth box respectively, whose center points are P₀ and G₀; c represents the diagonal length of the smallest box enclosing the two boxes; and ρ represents the Euclidean distance between the two center points.
4. The dense pedestrian detection method based on the Yolov5 network according to claim 1, wherein the judgment standard of step S4 measures the dense-crowd image detection results in terms of precision, recall and average precision, and quantitatively evaluates the detection effect;
the quantitative evaluation is to adopt an accuracy rate P, a recall rate R and an average accuracy rate AP respectively,
performing quantitative analysis, wherein the precision P is defined as the accuracy of all detected targets, and can be expressed as:
the recall R-defined as the detection accuracy in all positive samples, can be expressed as:
the average precision AP, defined as the average of the precision at different recall rates, may be expressed as:
TP is positive samples and is predicted to be the number of positive classes; FP is negative sample forecast to be positive number; FN is the number of negative classes predicted by positive samples; TN is the negative sample prediction negative class number.
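The three metrics of claim 4 reduce to a few lines of arithmetic; the sketch below is an illustrative minimal version (function names are assumptions, and AP is approximated as the area under a given precision-recall curve rather than a dataset-specific protocol).

```python
def precision(tp, fp):
    # P = TP / (TP + FP): fraction of detections that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # R = TP / (TP + FN): fraction of ground-truth pedestrians found.
    return tp / (tp + fn)

def average_precision(recalls, precisions):
    # AP approximated as the area under the precision-recall curve:
    # sum precision times the recall increment over sorted recall points.
    ap, prev_r = 0.0, 0.0
    for r, p in sorted(zip(recalls, precisions)):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

For example, a detector with 8 true positives, 2 false positives and 2 missed pedestrians scores P = R = 0.8; production evaluations typically use an interpolated AP (e.g. the COCO 101-point rule) rather than this raw sum.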
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311540235.5A CN117523612A (en) | 2023-11-20 | 2023-11-20 | Dense pedestrian detection method based on Yolov5 network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117523612A true CN117523612A (en) | 2024-02-06 |
Family
ID=89743373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311540235.5A Pending CN117523612A (en) | 2023-11-20 | 2023-11-20 | Dense pedestrian detection method based on Yolov5 network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117523612A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117893413A (en) * | 2024-03-15 | 2024-04-16 | 博创联动科技股份有限公司 | Vehicle-mounted terminal man-machine interaction method based on image enhancement |
CN117893413B (en) * | 2024-03-15 | 2024-06-11 | 博创联动科技股份有限公司 | Vehicle-mounted terminal man-machine interaction method based on image enhancement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103824070B (en) | A kind of rapid pedestrian detection method based on computer vision | |
CN103020992B (en) | A kind of video image conspicuousness detection method based on motion color-associations | |
CN101980242B (en) | Human face discrimination method and system and public safety system | |
CN105260749B (en) | Real-time target detection method based on direction gradient binary pattern and soft cascade SVM | |
CN110929593A (en) | Real-time significance pedestrian detection method based on detail distinguishing and distinguishing | |
CN101470809A (en) | Moving object detection method based on expansion mixed gauss model | |
CN110119726A (en) | A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model | |
CN102902983B (en) | A kind of taxi identification method based on support vector machine | |
CN106778540B (en) | Parking detection is accurately based on the parking event detecting method of background double layer | |
Cho et al. | Semantic segmentation with low light images by modified CycleGAN-based image enhancement | |
CN117523612A (en) | Dense pedestrian detection method based on Yolov5 network | |
Prasad et al. | HOG, LBP and SVM based traffic density estimation at intersection | |
CN103530640A (en) | Unlicensed vehicle detection method based on AdaBoost and SVM (support vector machine) | |
Dow et al. | A crosswalk pedestrian recognition system by using deep learning and zebra‐crossing recognition techniques | |
CN114049572A (en) | Detection method for identifying small target | |
Bush et al. | Static and dynamic pedestrian detection algorithm for visual based driver assistive system | |
CN105469054A (en) | Model construction method of normal behaviors and detection method of abnormal behaviors | |
CN116935361A (en) | Deep learning-based driver distraction behavior detection method | |
Ma et al. | AVS-YOLO: Object detection in aerial visual scene | |
Wang et al. | Multiscale traffic sign detection method in complex environment based on YOLOv4 | |
Manoharan et al. | Image processing-based framework for continuous lane recognition in mountainous roads for driver assistance system | |
CN111914606A (en) | Smoke detection method based on deep learning of time-space characteristics of transmissivity | |
Yao et al. | A real-time pedestrian counting system based on rgb-d | |
CN102682291B (en) | A kind of scene demographic method, device and system | |
CN117475353A (en) | Video-based abnormal smoke identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||