CN111046787A - Pedestrian detection method based on improved YOLO v3 model

Pedestrian detection method based on improved YOLO v3 model

Info

Publication number
CN111046787A
CN111046787A (application CN201911257993.XA)
Authority
CN
China
Prior art keywords
model
target
yolo
grid
pedestrian
Prior art date
Legal status
Pending
Application number
CN201911257993.XA
Other languages
Chinese (zh)
Inventor
陈健
黄德天
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN201911257993.XA priority Critical patent/CN111046787A/en
Publication of CN111046787A publication Critical patent/CN111046787A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian detection method based on an improved YOLO v3 model, which comprises the following steps: selecting training samples; performing K-means clustering on the samples to obtain new anchor values, and replacing the dataset parameters in the original YOLO v3 model with the new anchor values; introducing an Inception module and performing clipping optimization on it to obtain an improved YOLO v3 model; and detecting pedestrians with the improved YOLO v3 model to obtain detection results. The method solves the problem that the features extracted by the original YOLO v3 model are too limited, and improves pedestrian detection accuracy.

Description

Pedestrian detection method based on improved YOLO v3 model
Technical Field
The invention relates to pedestrian detection methods based on neural networks, and in particular to a pedestrian detection method based on an improved YOLO v3 model.
Background
Pedestrian detection is a branch of the target detection field, and the urgent need for pedestrian detection technology is reflected in many areas, such as intelligent transportation, security video surveillance, and autonomous driving. In the early days, due to the limitations of computer hardware, pedestrian detection was mainly image-based and only needed to determine whether pedestrians existed in an image. Nowadays, with the rapid development of microelectronics and computer technology, the technology is required not only to detect pedestrians against simple backgrounds, but also to detect them accurately under strong interference from the external environment, such as strong light, weak light, and occlusion; meanwhile, detection is no longer limited to still images: real-time detection is required, and functions such as tracking and behavior recognition need to be added on top of detection. In addition, with the rapid development of deep learning in recent years, more and more deep learning models are being widely applied to computer vision technologies, ranging from ubiquitous license plate recognition and pedestrian detection to advanced driver assistance. Compared with traditional pedestrian detection methods, methods based on convolutional neural networks greatly improve detection accuracy and speed; however, the features extracted by the existing YOLO v3 model are too limited, so its recognition accuracy is not high.
Disclosure of Invention
The invention aims to provide a pedestrian detection method based on an improved YOLO v3 model, which solves the problem that the features extracted by the original YOLO v3 model are too limited and improves pedestrian detection accuracy.
In a first aspect, the present invention provides a pedestrian detection method based on an improved YOLO v3 model, including:
step 1, selecting training samples;
step 2, performing K-means clustering on the samples to obtain new anchor values, and replacing the dataset parameters in the original YOLO v3 model with the new anchor values;
step 3, introducing an Inception module and performing clipping optimization on it to obtain an improved YOLO v3 model;
and step 4, detecting pedestrians with the improved YOLO v3 model to obtain the detection result.
Further, the step 1 is specifically: pedestrian images are extracted from the public data sets pascal voc2007 and pascal voc2012, and training samples are selected with a training-set-to-test-set ratio of 2:1.
Further, the step 2 is specifically:
a nonlinear mapping θ is used to map the samples x_i (i = 1, 2, …, l) into a high-dimensional space G; that is, the mapped samples are θ(x_1), θ(x_2), …, θ(x_l);
K-means clustering is then performed in the high-dimensional space with the optimization function
J = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ‖θ(x_i) − m_k‖² (1)
where the sample mean m_k is obtained from
m_k = (1/|C_k|) Σ_{x_i ∈ C_k} θ(x_i) (2)
In the kernel space, the kernel distance between two feature points is computed as
d²(x_i, x_j) = N(x_i, x_i) − 2N(x_i, x_j) + N(x_j, x_j) (3)
where N is the kernel function.
All sample subsets obtained by clustering are merged; the merged set contains K target categories, and the mean of each of the K categories is computed as
x̄_i = (1/n_i) Σ_{x ∈ C_i} x (4)
where n_i is the number of samples in the i-th category and x̄_i is the mean of the i-th category.
The distance between any two category means is then computed:
I = ‖x̄_i − x̄_j‖² (5)
If the distance between the means of two target categories is smaller than a preset threshold, the two categories are merged into one; the inter-mean distance is then recomputed with formula (5). Merging the sample subsets in this way yields the final clustering result;
finally, anchor values matching the model are computed from the final clustering result, and the new anchor values replace the dataset parameters in the original YOLO v3 model.
Further, the step 3 is specifically: an Inception module is introduced and then clipped and optimized. The clipped Inception module is mainly a combination of a 3 × 3 convolution layer and a 5 × 5 convolution layer; the 5 × 5 convolution layer is in turn replaced by two consecutive 3 × 3 convolution layers. The two output branches with different receptive fields are merged by a route layer in the YOLO v3 model into one output layer, which is passed to the next convolutional network for further feature extraction; the clipped Inception module is then inserted into the original YOLO v3 model to obtain the improved YOLO v3 model.
Further, the step 4 is specifically:
step 4a: partition the image to be detected; when an image is input to the model, its size is adaptively adjusted to a square, and the image is then partitioned with an N × N grid;
step 4b: when the center point of a target falls inside a grid cell, that cell is responsible for classifying and locating the target, as follows:
when the center point of a target falls into one of the N × N grid cells, that cell generates B prediction boxes to detect the target; that is, each cell has B bounding boxes generated from the anchor predictions, together with a confidence CS indicating whether the cell contains a target, which jointly reflects the probability that a target exists in the bounding box under the current model and the accuracy of the predicted target position:
CS = Pr(object) × IOU_pred^truth (6)
where Pr(object) indicates whether the center point of a target is contained in the grid cell (1 if so, 0 otherwise), and IOU_pred^truth is the intersection-over-union between the bounding box generated by the cell's prediction and the ground-truth bounding box of the target;
each grid cell generates B predicted bounding boxes to detect the target inside it. Each prediction box comprises 5 parameters [x, y, w, h, confidence], where [x, y] are the coordinates of the target center within the cell, [w, h] are the width and height of the predicted box, and confidence is the intersection-over-union between the predicted box and the ground-truth box of the target. Each cell also produces a predicted value C_i for whether it contains a target of a given class, expressed as
C_i = Pr(Class_i | Object) (7)
step 4c: each grid cell obtained in step 4b thus carries 5 parameters, represented by the vector y_i as follows:
y_i = [b_x, b_y, b_w, b_h, c] (8)
where (b_x, b_y) are the coordinates of the target center, (b_w, b_h) are the width and height of the bounding box generated by the network for the target prediction, and c is the total confidence score of the prediction box;
step 4d: after prediction over all N × N grid cells is completed, the parameters of all cells are sorted and summarized, and the detection result for the whole image is output.
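The intersection-over-union term in equation (6) can be made concrete with a short sketch. The following is a minimal illustration, assuming boxes are given as [x, y, w, h] (center coordinates plus width and height) in the same coordinate frame; the function and variable names are illustrative and not prescribed by the patent.

def iou(box_a, box_b):
    # Intersection-over-union of two boxes given as [x, y, w, h]
    # (center coordinates plus width and height); an assumed convention.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Intersection rectangle; zero if the boxes do not overlap.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0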
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
the embodiment of the application provides an improved pedestrian detection method based on improved YOLOv 3. Firstly, a model training data set is manufactured, and a new anchors value is calculated through K value clustering to replace the original YOLOv3 data set parameter; then, improving the YOLOv3 model by blending different sizes of receptive fields, and cutting the size of the receptive fields of the improved model, namely introducing a tiny-interception module to further enrich the extracted characteristics and reduce the number of network layers; and finally, training by adopting the improved YOLOv3 model, and applying the trained model to a pedestrian detection scene to realize pedestrian detection. The method provided by the invention improves the accuracy and robustness of the pedestrian detection algorithm and obtains better detection effect in the aspects of subjective vision and objective evaluation indexes.
The foregoing description is only an overview of the technical solution of the present invention; the embodiments of the present invention are described below so that the technical means of the present invention can be understood more clearly and the above and other objects, features, and advantages of the present invention become more readily apparent.
Drawings
The invention will be further described with reference to the following examples with reference to the accompanying drawings.
FIG. 1 is a schematic block diagram of the system of the present invention;
FIG. 2 is a schematic diagram of the small network of the present invention that replaces the 5 × 5 convolution.
FIG. 3a is detection result diagram I of the original model.
FIG. 3b is detection result diagram I of the model of the present invention.
FIG. 4a is detection result diagram II of the original model.
FIG. 4b is detection result diagram II of the model of the present invention.
FIG. 5a is detection result diagram III of the original model.
FIG. 5b is detection result diagram III of the model of the present invention.
FIG. 6a is detection result diagram IV of the original model.
FIG. 6b is detection result diagram IV of the model of the present invention.
Detailed Description
As shown in fig. 1, the present invention provides a pedestrian detection method based on an improved YOLO v3 model, comprising:
step 1, selecting training samples;
step 2, performing K-means clustering on the samples to obtain new anchor values, and replacing the dataset parameters in the original YOLO v3 model with the new anchor values;
step 3, introducing an Inception module and performing clipping optimization on it to obtain an improved YOLO v3 model;
and step 4, detecting pedestrians with the improved YOLO v3 model to obtain the detection result.
The step 1 is specifically: pedestrian images are extracted from the public data sets pascal voc2007 and pascal voc2012, and training samples are selected with a training-set-to-test-set ratio of 2:1.
The step 2 is specifically as follows:
a nonlinear mapping θ is used to map the samples x_i (i = 1, 2, …, l) into a high-dimensional space G; that is, the mapped samples are θ(x_1), θ(x_2), …, θ(x_l);
K-means clustering is then performed in the high-dimensional space with the optimization function
J = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ‖θ(x_i) − m_k‖² (1)
where the sample mean m_k is obtained from
m_k = (1/|C_k|) Σ_{x_i ∈ C_k} θ(x_i) (2)
In the kernel space, the kernel distance between two feature points is computed as
d²(x_i, x_j) = N(x_i, x_i) − 2N(x_i, x_j) + N(x_j, x_j) (3)
where N is the kernel function.
All sample subsets obtained by clustering are merged; the merged set contains K target categories, and the mean of each of the K categories is computed as
x̄_i = (1/n_i) Σ_{x ∈ C_i} x (4)
where n_i is the number of samples in the i-th category and x̄_i is the mean of the i-th category.
The distance between any two category means is then computed:
I = ‖x̄_i − x̄_j‖² (5)
If the distance between the means of two target categories is smaller than a preset threshold, the two categories are merged into one; the inter-mean distance is then recomputed with formula (5). Merging the sample subsets in this way yields the final clustering result;
finally, anchor values matching the model are computed from the final clustering result, and the new anchor values replace the dataset parameters in the original YOLO v3 model.
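As a rough sketch of the clustering in this step, the code below runs K-means with the kernel distance of formula (3) and then merges clusters whose means fall within a preset threshold, per formula (5). It is a minimal sketch under assumed choices: an RBF kernel stands in for the unspecified kernel function N, cluster means are compared in the input space, and all names are illustrative.

import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Assumed kernel function N(·, ·); the patent leaves the choice open.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_kmeans_with_merge(X, k, merge_threshold, n_iter=20, seed=0):
    # Kernel K-means (formulas (1)-(3)) followed by merging of close
    # cluster means (formulas (4)-(5)). X: (n, d) samples; k: initial
    # cluster count. Returns a list of index arrays, one per cluster.
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(0, k, size=n)
    # Precompute the kernel matrix N(x_i, x_j).
    K = np.array([[rbf_kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(n_iter):
        # Squared kernel distance from each point to each cluster mean m_c:
        # ||θ(x) − m_c||² = N(x,x) − 2·mean_j N(x, x_j) + mean_{j,j'} N(x_j, x_j')
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.where(labels == c)[0]
            if len(idx) == 0:
                continue
            dist[:, c] = (np.diag(K) - 2 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Merge clusters whose input-space means are closer than the threshold.
    clusters = [np.where(labels == c)[0] for c in range(k) if np.any(labels == c)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.sum((X[clusters[i]].mean(axis=0)
                            - X[clusters[j]].mean(axis=0)) ** 2)
                if d < merge_threshold:  # formula (5) against the threshold
                    clusters[i] = np.concatenate([clusters[i], clusters[j]])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters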
The step 3 is specifically: an Inception module is introduced and then clipped and optimized. The clipped Inception module is mainly a combination of a 3 × 3 convolution layer and a 5 × 5 convolution layer; the 5 × 5 convolution layer is in turn replaced by two consecutive 3 × 3 convolution layers. The outputs of the two different receptive fields (the standalone 3 × 3 convolution layer and the two stacked 3 × 3 convolution layers that replace the 5 × 5 layer) are merged by a route layer in the YOLO v3 model into one output layer, which is passed to the next convolutional network for further feature extraction; the clipped Inception module is then inserted into the original YOLO v3 model to obtain the improved YOLO v3 model.
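A plausible PyTorch rendering of this clipped module is sketched below: one branch is a single 3 × 3 convolution, the other replaces a 5 × 5 convolution with two stacked 3 × 3 convolutions, and the two branches are concatenated along the channel axis in the manner of the YOLO v3 route layer. The channel counts, batch normalization, and LeakyReLU activation are assumptions; the patent does not fix them.

import torch
import torch.nn as nn

class ClippedInception(nn.Module):
    # Two-branch block: a 3x3 conv alongside a 5x5-equivalent path built
    # from two stacked 3x3 convs; outputs are concatenated (the analogue
    # of the YOLO v3 route layer). Hyperparameters here are illustrative.
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        # Branch 1: a single 3x3 convolution (one receptive field).
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(branch_ch),
            nn.LeakyReLU(0.1),
        )
        # Branch 2: two stacked 3x3 convolutions, replacing one 5x5
        # (same 5x5 receptive field at lower computational cost).
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(branch_ch),
            nn.LeakyReLU(0.1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(branch_ch),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        # Concatenate the two receptive fields into one output layer.
        return torch.cat([self.branch3x3(x), self.branch5x5(x)], dim=1)

# Usage: a 1x64x52x52 feature map in, 2 * branch_ch channels out.
feat = torch.randn(1, 64, 52, 52)
out = ClippedInception(64, branch_ch=32)(feat)
print(out.shape)  # torch.Size([1, 64, 52, 52])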
The step 4 is specifically as follows:
step 4a: partition the image to be detected; when an image is input to the model, its size is adaptively adjusted to a square, and the image is then partitioned with an N × N grid;
step 4b: when the center point of a target falls inside a grid cell, that cell is responsible for classifying and locating the target, as follows:
when the center point of a target falls into one of the N × N grid cells, that cell generates B prediction boxes to detect the target; that is, each cell has B bounding boxes generated from the anchor predictions, together with a confidence CS indicating whether the cell contains a target, which jointly reflects the probability that a target exists in the bounding box under the current model and the accuracy of the predicted target position:
CS = Pr(object) × IOU_pred^truth (6)
where Pr(object) indicates whether the center point of a target is contained in the grid cell (1 if so, 0 otherwise), and IOU_pred^truth is the intersection-over-union between the bounding box generated by the cell's prediction and the ground-truth bounding box of the target;
each grid cell generates B predicted bounding boxes to detect the target inside it. Each prediction box comprises 5 parameters [x, y, w, h, confidence], where [x, y] are the coordinates of the target center within the cell, [w, h] are the width and height of the predicted box, and confidence is the intersection-over-union between the predicted box and the ground-truth box of the target. Each cell also produces a predicted value C_i for whether it contains a target of a given class, expressed as
C_i = Pr(Class_i | Object) (7)
step 4c: each grid cell obtained in step 4b thus carries 5 parameters, represented by the vector y_i as follows:
y_i = [b_x, b_y, b_w, b_h, c] (8)
where (b_x, b_y) are the coordinates of the target center, (b_w, b_h) are the width and height of the bounding box generated by the network for the target prediction, and c is the total confidence score of the prediction box;
step 4d: after prediction over all N × N grid cells is completed, the parameters of all cells are sorted and summarized, and the detection result for the whole image is output.
One specific embodiment of the present invention:
in order to further explain the technical scheme of the invention, the invention is explained in detail by the specific embodiment.
Inputting: the image containing the pedestrian to be recognized is a sample image containing the pedestrian for learning.
1. Sample data preparation
Pedestrian images are extracted from the public data sets pascal voc2007 and pascal voc2012; 2094 images in total are extracted and split into a training set and a test set at a 2:1 ratio, recorded respectively as TR = {(x_i, y_i)}, i = 1, …, N_TR, and TE = {(x_i, y_i)}, i = 1, …, N_TE, where TR represents the training set, TE represents the test set, x represents an input sample image, y represents the label corresponding to the sample image, and N represents the number of samples in the data set.
2. K-means clustering on the samples to obtain new anchor values
According to the TR training set in step 1, the widths and heights of the pedestrian boxes are read as the data to be classified and the cluster center points are initialized; the coordinates of a cluster center describe the width and height of a rectangular box. The IOU value between each cluster center and each rectangular box described by the data to be classified is computed, with the distance 1 − IOU as the classification criterion. Finally, 9 groups of anchors are obtained, comprising the predicted center coordinates, the widths and heights of the anchor boxes, and the predicted target class.
IOU(A, B) = S(A ∩ B) / S(A ∪ B)
where S represents the area of a rectangular box.
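A minimal sketch of this anchor computation follows, assuming the pedestrian box widths and heights have been read from the TR annotations. Since only width and height matter, the boxes are compared as if sharing a common center; the function names and the example data are illustrative, not taken from the patent.

import numpy as np

def wh_iou(wh, centers):
    # IOU between co-centered boxes given only widths and heights.
    # wh: (n, 2) array, centers: (k, 2) array -> (n, k) IOU matrix.
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh[:, None, 0] * wh[:, None, 1] + \
            centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=9, n_iter=100, seed=0):
    # K-means over box (w, h) pairs with distance d = 1 - IOU,
    # yielding k anchor sizes as in the embodiment above.
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(n_iter):
        assign = np.argmax(wh_iou(wh, centers), axis=1)  # min of 1 - IOU
        new_centers = np.array([
            wh[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]  # sort by box area

# Usage with hypothetical (w, h) pairs; the real setting uses k=9:
boxes_wh = np.array([[32, 64], [30, 70], [64, 128], [60, 120], [120, 240],
                     [16, 40], [90, 180], [45, 90], [20, 55], [75, 150]])
print(kmeans_anchors(boxes_wh, k=3))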
3. Improving the original YOLO v3 network by introducing a tiny-Inception module
3.1 Introducing convolution kernels of multiple sizes
An Inception module is introduced into the original YOLO v3 framework, and the training set TR is used to construct the new model data.
(1) The input-output relationship is
X = f(x; W, b), Y = h_θ(X)
where X is the abstract or hierarchical feature extracted from the input sample, f(·) is the feature extraction function, x is the input image, W is the convolution kernel, b is the bias value, Y is the predicted value of the sample image, and θ is the softmax logistic regression parameter.
(2) The classifier output is
y(k) = P(y = k | X; T) = exp(T_k X) / Σ_{j=1}^{K} exp(T_j X)
where T is the classifier coefficient and P represents the probability that the sample prediction result is k.
The final output class is y = argmax_k { y(k) } (4)
(3) An objective function is constructed with the cross entropy over the training set:
J(W, b; θ) = −(1/N) Σ_i Σ_k 1{y_i = k} log P(y_i = k | x_i) + λ_1 R(W) + λ_2 R(θ)
where R(W) and R(θ) are regularization terms used to sparsify the parameters and prevent overfitting, and λ_1 and λ_2 are the sparsity coefficients.
(4) The optimization formula for updating the parameters is
W ← W − α · ∂J(W, b; θ)/∂W, b ← b − α · ∂J(W, b; θ)/∂b
where J(W, b; θ) is the objective function constructed with the cross entropy in (3) and α is the learning rate; the remaining parameters are consistent with those of the classifier.
3.2 Clipping the Inception module
In the convolutional layers, each 5 × 5 convolution kernel is replaced by two consecutive 3 × 3 convolution kernels. A 5 × 5 kernel is computationally expensive, requiring 25/9 times the computation of a 3 × 3 kernel; the computation saved by the improvement is 1 − (9 + 9)/25 = 7/25 = 28%. As shown in fig. 2, the improved Inception module combines two different receptive fields, and the two outputs are then merged by the YOLO v3 route layer to form the input data of the next layer.
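The figures quoted above can be checked directly: counting multiply-accumulate operations per output position for a single channel pair, a 5 × 5 kernel costs 25, a 3 × 3 kernel costs 9, and two stacked 3 × 3 kernels cost 18 (channel dimensions are ignored here, as in the text).

cost_5x5 = 5 * 5                    # multiply-accumulates per output element
cost_3x3 = 3 * 3
cost_two_3x3 = 2 * cost_3x3         # two stacked 3x3 convolutions
print(cost_5x5 / cost_3x3)          # ≈ 2.78, the "25/9 times" factor vs one 3x3
print(1 - cost_two_3x3 / cost_5x5)  # 0.28, the 28% saving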
4 pedestrian detection
4.1 image adaptive adjustment and partitioning
The input image of arbitrary size is first adaptively resized to 224 × 224 and then partitioned with an N × N grid to form the new input image. The resolution of the input image domain is 224 × 224, and the output labels satisfy y ∈ {0, 1, 2, …, 1396}.
4.2 pedestrian prediction
After the image to be recognized passes through the improved YOLO v3 network and prediction over the N × N grid cells is completed, the position of the target pedestrian is determined from the network output, and the coordinates and confidence score of the rectangular box around the target pedestrian are output; finally, the parameters of all grid cells are sorted and summarized, and the detection result for the whole image is output.
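The aggregation described above can be illustrated with a short sketch that decodes a hypothetical network output of shape (N, N, B, 5) — B boxes of [x, y, w, h, confidence] per grid cell, per equation (8) — keeps boxes whose confidence clears a threshold, and gathers them into one detection list. The tensor layout, the relative-coordinate convention, and the threshold are assumptions for illustration only.

import numpy as np

def decode_grid_predictions(pred, conf_threshold=0.5, img_size=224):
    # pred: (N, N, B, 5) array of [x, y, w, h, confidence] per cell,
    # with x, y relative to the cell and w, h relative to the image
    # (an assumed convention). Returns (x1, y1, x2, y2, conf) tuples.
    N = pred.shape[0]
    cell = img_size / N
    detections = []
    for row in range(N):
        for col in range(N):
            for b in range(pred.shape[2]):
                x, y, w, h, conf = pred[row, col, b]
                if conf < conf_threshold:
                    continue
                # Center point: cell offset plus within-cell coordinates.
                cx, cy = (col + x) * cell, (row + y) * cell
                bw, bh = w * img_size, h * img_size
                detections.append((cx - bw / 2, cy - bh / 2,
                                   cx + bw / 2, cy + bh / 2, conf))
    # Sort by confidence when summarizing all grid cells for output.
    return sorted(detections, key=lambda d: d[-1], reverse=True)

# Usage: a random 7x7 grid with 2 boxes per cell.
preds = np.random.rand(7, 7, 2, 5)
print(len(decode_grid_predictions(preds, conf_threshold=0.9)))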
5 simulation experiment
The effects of the present invention can be further illustrated by the following simulation experiments. To ensure objectivity, the images are drawn from the public data sets pascal voc2007 and pascal voc2012: 1396 images containing people are used as training samples, and the remaining 698 images are used as the test set. The experiment is compared with the original YOLO v3 model algorithm.
To quantitatively evaluate the performance advantage of the improved algorithm, training and testing are carried out on the same data set, and the test results are then analyzed.
Table 1: Comprehensive performance test results (reproduced as an image in the original publication)
From the data in Table 1, it can be seen that the improved algorithm achieves small improvements in accuracy, recall, and average IOU value over the original YOLO v3 model.
Table 2: Pedestrian AP value comparison (reproduced as an image in the original publication)
The pedestrian samples in the test set are tested, and the comparison of AP (average precision) values for the pedestrian category is shown in Table 2. The improved network architecture demonstrates better detection performance than the original network. In addition, when the improved model is used to detect pedestrian images, as shown in fig. 3a to 6b, the completeness and accuracy of the prediction boxes improve to a certain extent, and missed and false detections are reduced. The improved YOLO model uses anchor scales computed for the pedestrian data set (pascal voc2007 + pascal voc2012), and richer layer structures are obtained by extracting features at different scales. Test results show that precision and recall reach 79% and 74% respectively, a slight improvement over the original YOLO v3; meanwhile, the new model improves the pedestrian AP value by 1.72%.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (5)

1. A pedestrian detection method based on an improved YOLO v3 model, characterized by comprising the following steps:
step 1, selecting training samples;
step 2, performing K-means clustering on the samples to obtain new anchor values, and replacing the dataset parameters in the original YOLO v3 model with the new anchor values;
step 3, introducing an Inception module and performing clipping optimization on it to obtain an improved YOLO v3 model;
and step 4, detecting pedestrians with the improved YOLO v3 model to obtain the detection result.
2. The pedestrian detection method based on the improved YOLO v3 model of claim 1, wherein the step 1 is specifically: pedestrian images are extracted from the public data sets pascal voc2007 and pascal voc2012, and training samples are selected with a training-set-to-test-set ratio of 2:1.
3. The pedestrian detection method based on the improved YOLO v3 model of claim 1, wherein the step 2 is specifically:
a nonlinear mapping θ is used to map the samples x_i (i = 1, 2, …, l) into a high-dimensional space G; that is, the mapped samples are θ(x_1), θ(x_2), …, θ(x_l);
K-means clustering is then performed in the high-dimensional space with the optimization function
J = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ‖θ(x_i) − m_k‖² (1)
where the sample mean m_k is obtained from
m_k = (1/|C_k|) Σ_{x_i ∈ C_k} θ(x_i) (2)
In the kernel space, the kernel distance between two feature points is computed as
d²(x_i, x_j) = N(x_i, x_i) − 2N(x_i, x_j) + N(x_j, x_j) (3)
where N is the kernel function.
All sample subsets obtained by clustering are merged; the merged set contains K target categories, and the mean of each of the K categories is computed as
x̄_i = (1/n_i) Σ_{x ∈ C_i} x (4)
where n_i is the number of samples in the i-th category and x̄_i is the mean of the i-th category.
The distance between any two category means is then computed:
I = ‖x̄_i − x̄_j‖² (5)
If the distance between the means of two target categories is smaller than a preset threshold, the two categories are merged into one; the inter-mean distance is then recomputed with formula (5). Merging the sample subsets in this way yields the final clustering result;
finally, anchor values matching the model are computed from the final clustering result, and the new anchor values replace the dataset parameters in the original YOLO v3 model.
4. The pedestrian detection method based on the improved YOLO v3 model of claim 1, wherein the step 3 is specifically: an Inception module is introduced and then clipped and optimized; the clipped Inception module is mainly a combination of a 3 × 3 convolution layer and a 5 × 5 convolution layer, and the 5 × 5 convolution layer is in turn replaced by two consecutive 3 × 3 convolution layers; the two output branches with different receptive fields are merged by a route layer in the YOLO v3 model into one output layer, which is passed to the next convolutional network for further feature extraction; and the clipped Inception module is inserted into the original YOLO v3 model to obtain the improved YOLO v3 model.
5. The pedestrian detection method based on the improved YOLO v3 model of claim 1, wherein the step 4 is specifically:
step 4a: partition the image to be detected; when an image is input to the model, its size is adaptively adjusted to a square, and the image is then partitioned with an N × N grid;
step 4b: when the center point of a target falls inside a grid cell, that cell is responsible for classifying and locating the target, as follows:
when the center point of a target falls into one of the N × N grid cells, that cell generates B prediction boxes to detect the target; that is, each cell has B bounding boxes generated from the anchor predictions, together with a confidence CS indicating whether the cell contains a target, which jointly reflects the probability that a target exists in the bounding box under the current model and the accuracy of the predicted target position:
CS = Pr(object) × IOU_pred^truth (6)
where Pr(object) indicates whether the center point of a target is contained in the grid cell (1 if so, 0 otherwise), and IOU_pred^truth is the intersection-over-union between the bounding box generated by the cell's prediction and the ground-truth bounding box of the target;
each grid cell generates B predicted bounding boxes to detect the target inside it. Each prediction box comprises 5 parameters [x, y, w, h, confidence], where [x, y] are the coordinates of the target center within the cell, [w, h] are the width and height of the predicted box, and confidence is the intersection-over-union between the predicted box and the ground-truth box of the target. Each cell also produces a predicted value C_i for whether it contains a target of a given class, expressed as
C_i = Pr(Class_i | Object) (7)
step 4c: each grid cell obtained in step 4b thus carries 5 parameters, represented by the vector y_i as follows:
y_i = [b_x, b_y, b_w, b_h, c] (8)
where (b_x, b_y) are the coordinates of the target center, (b_w, b_h) are the width and height of the bounding box generated by the network for the target prediction, and c is the total confidence score of the prediction box;
step 4d: after prediction over all N × N grid cells is completed, the parameters of all cells are sorted and summarized, and the detection result for the whole image is output.
CN201911257993.XA 2019-12-10 2019-12-10 Pedestrian detection method based on improved YOLO v3 model Pending CN111046787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911257993.XA CN111046787A (en) 2019-12-10 2019-12-10 Pedestrian detection method based on improved YOLO v3 model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911257993.XA CN111046787A (en) 2019-12-10 2019-12-10 Pedestrian detection method based on improved YOLO v3 model

Publications (1)

Publication Number Publication Date
CN111046787A true CN111046787A (en) 2020-04-21

Family

ID=70235386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911257993.XA Pending CN111046787A (en) 2019-12-10 2019-12-10 Pedestrian detection method based on improved YOLO v3 model

Country Status (1)

Country Link
CN (1) CN111046787A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814749A (en) * 2020-08-12 2020-10-23 Oppo广东移动通信有限公司 Human body feature point screening method and device, electronic equipment and storage medium
CN111950500A (en) * 2020-08-21 2020-11-17 成都睿芯行科技有限公司 Real-time pedestrian detection method based on improved YOLOv3-tiny in factory environment
CN112347938A (en) * 2020-11-09 2021-02-09 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112418212A (en) * 2020-08-28 2021-02-26 西安电子科技大学 Improved YOLOv3 algorithm based on EIoU
CN112560682A (en) * 2020-12-16 2021-03-26 重庆守愚科技有限公司 Valve automatic detection method based on deep learning
CN112598056A (en) * 2020-12-21 2021-04-02 北京工业大学 Software identification method based on screen monitoring
CN112633299A (en) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 Target detection method, network, device, terminal equipment and storage medium
CN113011390A (en) * 2021-04-23 2021-06-22 电子科技大学 Road pedestrian small target detection method based on image partition
CN113158897A (en) * 2021-04-21 2021-07-23 新疆大学 Pedestrian detection system based on embedded YOLOv3 algorithm
CN113609895A (en) * 2021-06-22 2021-11-05 上海中安电子信息科技有限公司 Road traffic information acquisition method based on improved Yolov3
CN113935410A (en) * 2021-10-13 2022-01-14 甘肃同兴智能科技发展有限责任公司 Electric power customer portrait method based on cross-correlation density clustering

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene
CN110276247A (en) * 2019-05-09 2019-09-24 南京航空航天大学 A kind of driving detection method based on YOLOv3-Tiny

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene
CN110276247A (en) * 2019-05-09 2019-09-24 南京航空航天大学 A kind of driving detection method based on YOLOv3-Tiny

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GE Wen et al.: "Application of the improved YOLOv3 algorithm in pedestrian recognition", Computer Engineering and Applications (计算机工程与应用) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814749A (en) * 2020-08-12 2020-10-23 Oppo广东移动通信有限公司 Human body feature point screening method and device, electronic equipment and storage medium
CN111950500A (en) * 2020-08-21 2020-11-17 成都睿芯行科技有限公司 Real-time pedestrian detection method based on improved YOLOv3-tiny in factory environment
CN112418212A (en) * 2020-08-28 2021-02-26 西安电子科技大学 Improved YOLOv3 algorithm based on EIoU
CN112418212B (en) * 2020-08-28 2024-02-09 西安电子科技大学 YOLOv3 algorithm based on EIoU improvement
CN112347938B (en) * 2020-11-09 2023-09-26 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112347938A (en) * 2020-11-09 2021-02-09 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112560682A (en) * 2020-12-16 2021-03-26 重庆守愚科技有限公司 Valve automatic detection method based on deep learning
CN112598056A (en) * 2020-12-21 2021-04-02 北京工业大学 Software identification method based on screen monitoring
CN112633299A (en) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 Target detection method, network, device, terminal equipment and storage medium
CN112633299B (en) * 2020-12-30 2024-01-16 深圳市优必选科技股份有限公司 Target detection method, network, device, terminal equipment and storage medium
CN113158897A (en) * 2021-04-21 2021-07-23 新疆大学 Pedestrian detection system based on embedded YOLOv3 algorithm
CN113011390A (en) * 2021-04-23 2021-06-22 电子科技大学 Road pedestrian small target detection method based on image partition
CN113609895A (en) * 2021-06-22 2021-11-05 上海中安电子信息科技有限公司 Road traffic information acquisition method based on improved Yolov3
CN113935410A (en) * 2021-10-13 2022-01-14 甘肃同兴智能科技发展有限责任公司 Electric power customer portrait method based on cross-correlation density clustering

Similar Documents

Publication Publication Date Title
CN111046787A (en) Pedestrian detection method based on improved YOLO v3 model
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
Feng et al. A review and comparative study on probabilistic object detection in autonomous driving
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
CN108416250B (en) People counting method and device
CN107563372B (en) License plate positioning method based on deep learning SSD frame
US20210089895A1 (en) Device and method for generating a counterfactual data sample for a neural network
CN103020978B (en) SAR (synthetic aperture radar) image change detection method combining multi-threshold segmentation with fuzzy clustering
CN110309747B (en) Support quick degree of depth pedestrian detection model of multiscale
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN106778687A (en) Method for viewing points detecting based on local evaluation and global optimization
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN112287983B (en) Remote sensing image target extraction system and method based on deep learning
CN113609895A (en) Road traffic information acquisition method based on improved Yolov3
CN110738132A (en) target detection quality blind evaluation method with discriminant perception capability
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
CN112801227A (en) Typhoon identification model generation method, device, equipment and storage medium
CN112149664A (en) Target detection method for optimizing classification and positioning tasks
CN114549909A (en) Pseudo label remote sensing image scene classification method based on self-adaptive threshold
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN111797795A (en) Pedestrian detection algorithm based on YOLOv3 and SSR
CN116030300A (en) Progressive domain self-adaptive recognition method for zero-sample SAR target recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination