CN117523612A - Dense pedestrian detection method based on Yolov5 network - Google Patents

Dense pedestrian detection method based on Yolov5 network

Info

Publication number
CN117523612A
CN117523612A
Authority
CN
China
Prior art keywords
detection
network
pedestrian
dense
dense pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311540235.5A
Other languages
Chinese (zh)
Inventor
韦宇宁
胡奇
李鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology
Priority to CN202311540235.5A
Publication of CN117523612A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/54 - Extraction of image or video features relating to texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of deep learning image processing, and in particular relates to a dense pedestrian detection method based on a Yolov5 network, comprising the following steps: S1, acquiring a data set and constructing a Yolov5 network model; S2, inputting dense pedestrian data set pictures, where a backbone feature extraction network extracts the picture features of the dense pedestrian images and a feature enhancement network enhances the pedestrian features; and S3, the detection head detects the features output by the feature enhancement module, applies non-maximum suppression screening, and outputs the detection results. Built on the Yolov5 network, the invention can perform pedestrian detection on dense pedestrian images and alleviate the false detections and missed detections of existing image detection technology; adding deformable convolution to the model helps capture the detail and texture information of the images during feature extraction, and using the DIOU loss function makes the network pay more attention to separating dense pedestrian regions during non-maximum suppression, thereby improving the accuracy of the detected images.

Description

Dense pedestrian detection method based on Yolov5 network
Technical Field
The invention relates to the technical field of deep learning image processing, in particular to a dense pedestrian detection method based on a Yolov5 network.
Background
Pedestrian detection is a special case of object detection and plays a vital role in applications such as autonomous driving, intelligent surveillance systems, robotics, and advanced human-computer interaction. It also underpins numerous research topics, such as target tracking, human pose estimation, and person search. According to incomplete statistics, traffic accidents occur frequently in China and the resulting death toll rises year by year; drivers bear primary responsibility in most traffic accidents, and pedestrians are the most severely affected victim group. An intelligent driving assistance system that reliably detects obstacles and pedestrians in front of the vehicle can prompt the driver to avoid them while the vehicle is moving, reducing the probability of collision with pedestrians. Video capture devices deployed in streets and alleys can likewise protect the public's legal rights and interests around the clock. Pedestrian detection means identifying the pedestrians in an image and framing each pedestrian's position with a rectangular bounding box.
At present, pedestrian detection still faces many challenges, and traditional pedestrian detection algorithms struggle to meet the requirements of image-based detection. In recent years, computer hardware has developed rapidly, and the continual improvement of graphics processing units (GPUs) provides reliable hardware support for processing image data. Meanwhile, researchers have proposed a series of excellent deep-learning-based object detection algorithms that use convolutional neural networks for image detection, making detection fast and accurate. However, as application scenes grow more complex, the poses and occlusion patterns of pedestrians in images also diversify, making the pedestrian detection task more challenging.
In recent years, various technical methods for dense pedestrian detection have been proposed at home and abroad, but the following problems remain:
1. Large variation in pedestrian appearance, including viewing angle, pose, clothing and accessories, illumination, and imaging distance. The appearance of a pedestrian differs with the viewing angle, and pedestrians in different poses look different. Clothing and accessories, such as hats, scarves, and luggage, add further variation, and differences in illumination add complexity. The imaging distance directly determines the size of the pedestrian sample in the image: a distant human body appears smaller and a nearby one larger, so the apparent size differences are obvious;
2. Sample occlusion. In many practical application scenarios, pedestrians are densely distributed; in airports, stations, and pedestrian zones, for example, occlusion is severe. Occlusion divides into inter-class occlusion and intra-class occlusion. Inter-class occlusion is occlusion between different kinds of objects, i.e., between a pedestrian and other objects; intra-class occlusion is occlusion between objects of the same kind, i.e., between pedestrians. The occlusion problem poses a major challenge to pedestrian detection;
3. Complex backgrounds. The environmental background in the real world is rarely uniform and unchanging; for example, some objects in urban street scenes resemble pedestrians in appearance, shape, color, and texture, and daytime and nighttime sample images have completely different background environments. Such complex backgrounds degrade the pedestrian detection effect.
Disclosure of Invention
(I) Technical Problem to Be Solved
Aiming at the deficiencies of the prior art, the invention provides a dense pedestrian detection method based on a Yolov5 network, which addresses the poor results of existing models on dense pedestrian images and the false detections of the prior art.
(II) Technical Scheme
To achieve the above purpose, the invention adopts the following technical scheme:
A dense pedestrian detection method based on a Yolov5 network comprises the following steps:
S1, acquiring a data set and constructing a Yolov5 network model;
S2, inputting dense pedestrian data set pictures, where a backbone feature extraction network extracts the picture features of the dense pedestrian images and a feature enhancement network enhances the pedestrian features;
S3, the detection head detects the features output by the feature enhancement module, applies non-maximum suppression screening, and outputs the detection results;
and S4, objectively evaluating the dense pedestrian detection pictures.
Further, the backbone feature extraction network in S2 extracts the picture features of the dense pedestrian images, and the C3 modules in the backbone replace the ordinary convolution module with a deformable convolution module.
The outputs of the ordinary convolution module and the deformable convolution module can be expressed, respectively, as:
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn)
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)
where R is the regular sampling grid of the kernel, w the kernel weights, and Δpn the learned offsets. Because the sampling positions of the deformable convolution module after adding the offsets are non-integer and do not correspond to pixels actually present on the feature map, interpolation is needed to obtain the offset pixel values; bilinear interpolation is generally adopted. The value x(p) at a fractional position p, for both the ordinary convolution and the deformable convolution, is computed by bilinear interpolation as:
x(p) = Σ_q G(q, p) · x(q)
where q enumerates the integral spatial positions on the feature map and G(·,·) is the bilinear interpolation kernel.
further, the loss function in S3 uses DIOU instead of GIOU,
determining the GIOU center point distance and penalty term R, the DIoU penalty can be expressed as:
p and G are respectively predicted box and real box, and the central point is P respectively 0 And G 0 C represents the maximum area of the two boxes, ρ represents the euclidean distance of the two center points and the distance between the two box diagonals of L.
Further, the criterion of step S4 is to measure dense-crowd image detection in terms of precision, recall, and average precision, and to quantitatively evaluate the image detection effect;
the quantitative evaluation adopts the precision P, the recall R, and the average precision AP for quantitative analysis.
The precision P, defined as the proportion of correct detections among all detected targets, can be expressed as:
P = TP / (TP + FP)
The recall R, defined as the proportion of detected targets among all positive samples, can be expressed as:
R = TP / (TP + FN)
The average precision AP, defined as the average of the precision over different recall rates, can be expressed as:
AP = ∫₀¹ P(R) dR
where TP is the number of positive samples predicted as positive; FP is the number of negative samples predicted as positive; FN is the number of positive samples predicted as negative; and TN is the number of negative samples predicted as negative.
(III) Beneficial Effects
Compared with the prior art, the invention provides a dense pedestrian detection method based on a Yolov5 network with the following beneficial effects:
By adopting the Yolov5 network, the invention can detect pedestrians in dense pedestrian images and alleviate the false detections and missed detections of existing pedestrian detection technology. Adding deformable convolution to the model helps extract the detail and texture information of pedestrian features during feature extraction, so that the network pays more attention to the regions carrying more feature information when learning pedestrian features, improving the precision of the detected images. The invention can play a significant role in many fields requiring clear images, such as target tracking, image classification, and object detection.
Drawings
FIG. 1 is a schematic diagram of a Yolov 5-based network structure according to the present invention;
FIG. 2 is a schematic diagram of a deformable convolution structure employed in the present invention;
FIG. 3 is a schematic diagram of the DIOU module according to the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Examples
As shown in fig. 1-3, a dense pedestrian detection method based on a Yolov5 network according to an embodiment of the present invention includes the following steps:
s1: acquiring a data set and constructing a Yolov5 model;
the dataset included a training set and a test set, the dataset was a crowing human dataset, there were 15000 images in the training set, 5000 images in the test set, and the validation set contained 4370 images. In the training and validation set, there are 470K instances, approximately 23 individuals per picture, and various occlusions. Each instance of the person with the bounding box, the region bounding box and the person's bounding box is visible in the person's head.
The Yolov5 algorithm mainly comprises an input module (Input), a backbone feature extraction module (Backbone), a feature enhancement module (Neck), and a detection head (Head).
The Yolov5 algorithm divides the input image into N×N cells; each cell is responsible for detecting objects whose center coordinates fall within it. A cell predicts P bounding boxes, each carrying five pieces of information. The final prediction for a single input picture is therefore a tensor of size N×N×(P×5+C) (C is the number of classes), and the model predicts N×N×P bounding boxes in total. When the Yolov5 algorithm performs object detection, it sets a confidence threshold (generally 0.5) and first screens out boxes whose predicted confidence falls below the threshold; after this preliminary screening the model keeps the prediction boxes of relatively high confidence, then filters the multiple prediction boxes of the same target with the non-maximum suppression (NMS) algorithm and keeps the optimal prediction box for each target. Owing to the various occlusions and complicated backgrounds of dense scenes, screening directly by confidence causes missed and false detections. Against these defects of the prior art, the invention provides a dense pedestrian detection method based on Yolov5 that achieves higher detection precision and greatly reduces false detections.
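As a sketch of the screening just described (the baseline behavior, not the claimed improvement), confidence filtering followed by standard NMS might look as follows in PyTorch; the 0.5 confidence threshold comes from the text, while the prediction-tensor layout and the 0.45 IoU threshold are illustrative assumptions.

```python
import torch
from torchvision.ops import nms

def postprocess(pred, conf_thres=0.5, iou_thres=0.45):
    """pred: (M, 5 + C) tensor, one row per candidate box:
    [x1, y1, x2, y2, objectness, class scores...]."""
    scores = pred[:, 4] * pred[:, 5:].max(dim=1).values  # obj * best class prob
    keep = scores > conf_thres                           # confidence screening
    boxes, scores = pred[keep, :4], scores[keep]
    kept = nms(boxes, scores, iou_thres)                 # drop duplicate boxes
    return boxes[kept], scores[kept]
```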
S2: inputting a dense pedestrian data set picture, extracting picture features of dense pedestrian images by a trunk feature extraction network, and enhancing pedestrian features by a feature enhancement network;
the main function of the input end is to preprocess the input picture, and the main preprocessing method comprises the steps of self-adaptive picture scaling, mosaic data enhancement and self-adaptive anchor frame calculation. Firstly, an input picture with any size is subjected to self-adaptive picture scaling, the picture is scaled to a fixed size according to the length-width ratio of the picture, and compared with common picture scaling, the self-adaptive picture scaling can adaptively add the least black edge to the original picture, so that the information redundancy caused by excessive black edges is effectively solved, and the detection speed is improved. And then, the obtained fixed-size pictures are subjected to Mosaic data enhancement, four pictures are randomly selected and spliced in a random scaling, random cutting and random arrangement mode, so that the data set is enriched, and meanwhile, the GPU calculation amount is reduced. Finally, self-adaptive anchor frame calculation is used for different data sets, and the proper initial anchor frame size is set in advance, so that more accurate positioning of targets in subsequent detection is facilitated.
The main function of the backbone feature extraction network is to extract the position information and semantic information of the targets to be detected; its main structures are Conv, C3, and SPPF. The Conv structure consists of a standard convolution, BN, and the SiLU activation function. The C3 structure consists of standard convolutions and bottleneck modules; compared with layer-by-layer convolution, it effectively simplifies the network, reducing the computation load while fully extracting features and shortening the inference time of the algorithm. The SPPF consists of three successive max-pooling layers with 5×5 kernels and can fuse features of several different resolutions at once, obtaining more effective information about dense pedestrians.
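The SPPF structure just described can be sketched in PyTorch as follows; the three chained 5×5 max-pools match the text, while the channel widths and the Conv (convolution + BN + SiLU) layout follow common YOLOv5 practice and are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class SPPF(nn.Module):
    """Three chained 5x5 max-pools (equivalent to 5x5, 9x9, 13x13 pooling),
    concatenated with the input and fused by a 1x1 convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_hid = c_in // 2
        self.cv1 = conv_bn_silu(c_in, c_hid)
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.cv2 = conv_bn_silu(c_hid * 4, c_out)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```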
The main function of the feature aggregation network is to fuse the position information and semantic information of the feature maps; it adopts the feature pyramid network (FPN) plus path aggregation network (PAN) structure. The FPN uses a top-down structure: the rich semantic information contained in the deep feature maps is passed downwards through upsampling and fused with the shallow feature maps. The PAN uses a bottom-up structure: the rich position information contained in the shallow feature maps is passed upwards through downsampling and fused with the deep feature maps. The shallow feature maps thereby obtain the semantic information passed down from the deep maps, and the deep feature maps obtain the position information passed up from the shallow maps; fusing feature-map information across depths realizes multi-scale detection and enhances the generalization ability of the network.
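A toy two-level sketch of this FPN top-down plus PAN bottom-up fusion; the layer choices and channel arithmetic are illustrative assumptions, not the exact neck of the patent.

```python
import torch
import torch.nn as nn

class TinyFpnPan(nn.Module):
    """FPN passes deep semantics down via upsampling; PAN passes shallow
    position information back up via a strided convolution."""
    def __init__(self, c_shallow, c_deep, c_out):
        super().__init__()
        self.lat = nn.Conv2d(c_deep, c_shallow, 1)            # align channels
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse_td = nn.Conv2d(c_shallow * 2, c_out, 3, padding=1)
        self.down = nn.Conv2d(c_out, c_out, 3, stride=2, padding=1)
        self.fuse_bu = nn.Conv2d(c_out + c_shallow, c_out, 3, padding=1)

    def forward(self, p_shallow, p_deep):
        # top-down (FPN): upsample deep map, fuse with shallow map
        td = self.fuse_td(torch.cat([p_shallow,
                                     self.up(self.lat(p_deep))], dim=1))
        # bottom-up (PAN): downsample fused map, fuse with deep map
        bu = self.fuse_bu(torch.cat([self.down(td),
                                     self.lat(p_deep)], dim=1))
        return td, bu  # two scales handed to the detection heads
```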
The improvement scheme adds deformable convolution modules to the C3 modules in the backbone feature extraction network: the standard convolution modules in the last three C3 modules of the backbone are replaced with deformable convolution modules, which improves the backbone's ability to extract dense pedestrian features and suppresses the interference of complex backgrounds on feature extraction.
The deformable convolution module (DCN) changes the positions of the sampling points during the training phase by learning offsets. As shown in fig. 2, a conventional convolution operation divides the feature map into regions of the same size as the convolution kernel and then convolves them, so the sampling positions on the feature map are fixed. This limitation constrains feature extraction and prevents capturing more contextual features. To solve this problem, a deformable convolution (DCN) with geometric adaptability and a larger receptive field is employed, so that as many sampling points as possible land on the target.
Further, pn enumerates the offsets within the convolution kernel range relative to the center position p0. On top of the conventional convolution, the deformable convolution introduces an additional offset Δpn (typically fractional) for each sampling point; the offsets are generated by applying another convolution layer to the input feature map. The outputs of the ordinary convolution and the deformable convolution are, respectively:
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn)
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)
Further, since the sampling position after adding the offset is not an integer and does not correspond to a pixel actually present on the feature map, interpolation is required to obtain the offset pixel value; bilinear interpolation is generally used. The value x(p) at a fractional position p, for both the ordinary and the deformable convolution, is computed by bilinear interpolation as:
x(p) = Σ_q G(q, p) · x(q)
where q enumerates the integral spatial positions on the feature map and G(·,·) is the bilinear interpolation kernel.
s3: the detection head detects the characteristics output by the characteristic enhancement module, screens a non-maximum suppression algorithm and outputs a detection result;
the main function of the detection layer is to detect on feature graphs with different scales through preset anchor frames, and the preset tracing frames generally use a non-maximum value inhibition method to finally obtain target classification and position information. The improvement is to replace the GIOU loss function with the DIOU loss function, thereby more efficiently separating dense features. The main part of the detection layer is the three-scale detector in the Head component, assuming that the picture size at the input end of the network is 640 x 640, and the sizes thereof are 80 x 80, 40 x 40, and 20 x 20 from top to bottom, respectively.
The IOU loss function formula is:
IoU(P,G)=(P∩G)/(P∪G)
Further, in the language of sets, the IoU is the intersection of the two regions divided by their union; however, IoU(P, G) = 0 whenever the prediction box does not intersect the ground-truth box. To remedy this defect, YOLOv5 uses GIOU with the penalty term R = |C \ (P ∪ G)| / |C|, where C is the smallest box enclosing both P and G. Although GIoU solves the problem of two non-intersecting boxes, GIOU degenerates to IOU when one box encloses the other.
Further considering the distance between center points as the penalty term, the DIoU loss is:
L_DIoU = 1 - IoU(P, G) + ρ²(P0, G0) / c²
where P and G are the predicted box and the ground-truth box respectively, P0 and G0 are their center points, ρ(·,·) is the Euclidean distance between the two center points, and c is the diagonal length of the smallest box enclosing both boxes.
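A sketch of this DIoU loss for boxes in [x1, y1, x2, y2] format; the small epsilon guard against division by zero is an implementation assumption.

```python
import torch

def diou_loss(p, g, eps=1e-7):
    """p, g: (N, 4) predicted and ground-truth boxes as [x1, y1, x2, y2].
    Implements L_DIoU = 1 - IoU + rho^2(P0, G0) / c^2."""
    # intersection and union for the IoU term
    lt = torch.max(p[:, :2], g[:, :2])
    rb = torch.min(p[:, 2:], g[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_g = (g[:, 2] - g[:, 0]) * (g[:, 3] - g[:, 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared Euclidean distance between the center points P0 and G0
    rho2 = ((p[:, :2] + p[:, 2:]) / 2 - (g[:, :2] + g[:, 2:]) / 2).pow(2).sum(1)
    # squared diagonal length c^2 of the smallest enclosing box
    c2 = (torch.max(p[:, 2:], g[:, 2:]) -
          torch.min(p[:, :2], g[:, :2])).pow(2).sum(1) + eps
    return 1 - iou + rho2 / c2
```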
S4: objectively distinguishing the dense pedestrian detection pictures;
further, the criterion of the step S4 is to measure the difference of image detection of the dense population in terms of accuracy, recall and average accuracy, and quantitatively evaluate the image detection effect;
the quantitative evaluation is to adopt an accuracy rate P (Precision), a Recall rate R (Recall) and an average accuracy rate AP (Average Precision) respectively
Performing quantitative analysis, wherein the precision P is defined as the accuracy of all detected targets, and can be expressed as:
the recall R-defined as the detection accuracy in all positive samples, can be expressed as:
the average precision AP, defined as the average of the precision at different recall rates, may be expressed as:
TP is positive samples and is predicted to be the number of positive classes; FP is negative sample forecast to be positive number; FN is the number of negative classes predicted by positive samples; TN is the negative sample prediction negative class number.
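A sketch of these metrics computed from raw confusion counts, with AP approximated by rectangular-rule integration of the precision-recall curve; real evaluations usually interpolate the curve first, which is omitted here for brevity.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN), guarding empty denominators."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(precisions, recalls):
    """precisions/recalls: paired lists swept over descending confidence
    thresholds (recall non-decreasing). Approximates AP as the area under
    the P(R) curve."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```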
By adopting the Yolov5 network, the invention can detect pedestrians in dense pedestrian images and alleviate the false detections and missed detections of existing pedestrian detection technology. Adding deformable convolution to the model helps extract the detail and texture information of pedestrian features during feature extraction, so that the network pays more attention to the regions carrying more feature information when learning pedestrian features, improving the precision of the detected images. The invention can play a significant role in many fields requiring clear images, such as target tracking, image classification, and object detection.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and does not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (4)

1. A dense pedestrian detection method based on the Yolov5 network, characterized by comprising the following steps:
S1, acquiring a data set and constructing a Yolov5 network model;
S2, inputting dense pedestrian data set pictures, where a backbone feature extraction network extracts the picture features of the dense pedestrian images and a feature enhancement network enhances the pedestrian features;
S3, the detection head detects the features output by the feature enhancement module, applies non-maximum suppression screening, and outputs the detection results;
and S4, objectively evaluating the dense pedestrian detection pictures.
2. The dense pedestrian detection method based on the Yolov5 network according to claim 1, characterized in that: in S2, the backbone feature extraction network extracts the picture features of the dense pedestrian images, and the C3 modules in the backbone replace the ordinary convolution module with a deformable convolution module;
the outputs of the ordinary convolution module and the deformable convolution module can be expressed, respectively, as:
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn)
y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)
because the sampling positions of the deformable convolution module after adding the offsets are non-integer and do not correspond to pixels actually present on the feature map, interpolation is needed to obtain the offset pixel values; bilinear interpolation is generally adopted, and the value x(p) at a fractional position p for both the ordinary convolution and the deformable convolution is computed by bilinear interpolation as:
x(p) = Σ_q G(q, p) · x(q)
3. The dense pedestrian detection method based on the Yolov5 network according to claim 1, characterized in that: the loss function in S3 uses DIOU instead of GIOU;
taking the distance between center points into account as the penalty term R, the DIoU loss can be expressed as:
L_DIoU = 1 - IoU(P, G) + ρ²(P0, G0) / c²
where P and G are the predicted box and the ground-truth box respectively, P0 and G0 are their center points, ρ(·,·) is the Euclidean distance between the two center points, and c is the diagonal length of the smallest box enclosing both P and G.
4. The dense pedestrian detection method based on the Yolov5 network according to claim 1, characterized in that: the criterion of step S4 is to measure dense-crowd image detection in terms of precision, recall, and average precision, and to quantitatively evaluate the image detection effect;
the quantitative evaluation adopts the precision P, the recall R, and the average precision AP for quantitative analysis.
The precision P, defined as the proportion of correct detections among all detected targets, can be expressed as:
P = TP / (TP + FP)
The recall R, defined as the proportion of detected targets among all positive samples, can be expressed as:
R = TP / (TP + FN)
The average precision AP, defined as the average of the precision over different recall rates, can be expressed as:
AP = ∫₀¹ P(R) dR
where TP is the number of positive samples predicted as positive; FP is the number of negative samples predicted as positive; FN is the number of positive samples predicted as negative; and TN is the number of negative samples predicted as negative.
CN202311540235.5A 2023-11-20 2023-11-20 Dense pedestrian detection method based on Yolov5 network Pending CN117523612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311540235.5A CN117523612A (en) 2023-11-20 2023-11-20 Dense pedestrian detection method based on Yolov5 network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311540235.5A CN117523612A (en) 2023-11-20 2023-11-20 Dense pedestrian detection method based on Yolov5 network

Publications (1)

Publication Number Publication Date
CN117523612A 2024-02-06

Family

ID=89743373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311540235.5A Pending CN117523612A (en) 2023-11-20 2023-11-20 Dense pedestrian detection method based on Yolov5 network

Country Status (1)

Country Link
CN (1) CN117523612A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893413A (en) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement
CN117893413B (en) * 2024-03-15 2024-06-11 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination