CN111626128B - Pedestrian detection method based on improved YOLOv3 in orchard environment - Google Patents

Info

Publication number
CN111626128B
Authority
CN
China
Prior art keywords
network
box
pedestrian
predicted
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010341941.7A
Other languages
Chinese (zh)
Other versions
CN111626128A (en)
Inventor
沈跃
张健
刘慧
张礼帅
吴边
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202010341941.7A
Publication of CN111626128A
Application granted
Publication of CN111626128B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a pedestrian detection method for orchard environments based on an improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment and preprocessing them to build an orchard pedestrian sample set; S2, using a K-means clustering algorithm to generate the anchor boxes from which pedestrian candidate boxes are computed; S3, adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on the large-scale feature layer, to obtain the improved network model YOLO-Z; S4, feeding the training set into the YOLO-Z network for training under multiple environments and saving the resulting weight file; S5, introducing a Kalman filtering algorithm, with corresponding improvements, to make the model more robust, reduce missed detections and increase detection speed. The invention addresses the low speed and low accuracy of real-time pedestrian detection in orchard environments, realizes multitask training, and ensures both the speed and the accuracy of pedestrian detection in the orchard environment.

Description

Pedestrian detection method based on improved YOLOv3 in orchard environment
Technical Field
The invention relates to a pedestrian detection method in an orchard environment based on improved YOLOv3, which aims at pedestrian detection of unmanned agricultural machinery in the orchard environment and belongs to the technical field of deep learning and pedestrian detection.
Background
With the rapid development of artificial intelligence, intelligent agricultural equipment has reached a historic turning point, and unmanned agricultural machinery is one of its most important components. Obstacle detection is the primary problem faced by unmanned agricultural machinery operating in the field, and pedestrian detection is the most critical part of it. Methods commonly used for pedestrian detection at present include methods based on motion characteristics, shape information, pedestrian models, stereoscopic vision, neural networks, and wavelets combined with support vector machines.
Pedestrian detection in an orchard environment faces a series of problems. (1) Multiple poses. A pedestrian is a highly non-rigid target and may take many different poses: stationary or walking, standing or squatting. (2) Scene complexity. Pedestrians blend into the background and are difficult to separate from it. (3) Real-time performance of the detection and tracking system. Practical applications place requirements on the response speed of the detection and tracking system, but pedestrian detection algorithms are often complex, which makes real-time operation harder to achieve. (4) Occlusion. In a practical environment, pedestrians frequently occlude one another. The method combines computer vision with deep learning to detect pedestrians, and provides a research foundation for pedestrian detection.
Disclosure of Invention
To meet the above pedestrian detection requirements of intelligent unmanned agricultural machinery in an orchard environment, the invention provides a pedestrian detection method based on an improved YOLOv3. Detection is treated as a regression problem: a convolutional network processes the whole image directly and predicts the class and position of each detection.
The invention discloses a pedestrian detection method in an orchard environment based on improved YOLOv3, which comprises the following steps:
step 1: collecting pedestrian images in an orchard environment;
capturing images of pedestrians with a depth camera at various positions in the orchard, the captured images covering pedestrians under different occlusion conditions, under different weather conditions, and at short, medium and long distances;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
step 3: feeding the training set processed in step 2 into a convolutional feature extractor to extract pedestrian features, generating anchor boxes with a K-means clustering algorithm in order to produce predicted pedestrian bounding boxes, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding box and class prediction, with the following specific steps:
(3.1): randomly selecting the width and height of one bounding box as the first cluster center;
(3.2): selecting the n-th cluster center such that the larger the similarity distance between a box and the current n-1 cluster centers, the greater the probability that the box is selected;
(3.3): repeating (3.2) until all initial cluster centers are determined;
(3.4): computing the IoU (Intersection over Union) between each remaining box and every cluster center to obtain the similarity distance (IoU loss) between them, and assigning each box to the class of the cluster center with the smallest similarity distance;
(3.5): after all boxes have been traversed, computing the mean width and height of the boxes in each class and using it as the cluster center for the next iteration;
(3.6): repeating (3.4) and (3.5) until the difference in total IoU loss between adjacent iterations is smaller than a threshold or the maximum number of iterations is reached, then stopping the clustering algorithm.
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers, so that the similarity distance between them is as large as possible.
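Steps (3.1) to (3.6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: boxes are compared by width and height only (as if aligned at a common corner), the seeding probability in (3.2) is taken to be proportional to the distance 1 - IoU to the nearest chosen center, and the function names, `tol`, `max_iter` and `seed` are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (N, 2) width/height boxes and (K, 2) centers, aligned at a common corner."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centers[:, 0] * centers[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def init_centers(boxes, k, rng):
    # (3.1) the first center is a randomly chosen box
    centers = [boxes[rng.integers(len(boxes))]]
    # (3.2)-(3.3) boxes farther from the chosen centers are more likely to be picked
    while len(centers) < k:
        d = (1.0 - iou_wh(boxes, np.array(centers))).min(axis=1)
        total = d.sum()
        probs = d / total if total > 0 else None  # uniform fallback (assumption)
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    return np.array(centers)

def kmeans_anchors(boxes, k, max_iter=100, tol=1e-6, seed=0):
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = init_centers(boxes, k, rng)
    prev_loss = np.inf
    for _ in range(max_iter):
        # (3.4) assign each box to the center with the smallest distance 1 - IoU
        dist = 1.0 - iou_wh(boxes, centers)
        labels = dist.argmin(axis=1)
        loss = dist[np.arange(len(boxes)), labels].sum()
        # (3.5) new center = mean width and height of the boxes in the class
        for j in range(k):
            members = boxes[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        # (3.6) stop when the total IoU loss no longer changes
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return centers
```

Calling `kmeans_anchors(boxes, k)` on the annotated widths and heights of the training set returns k anchor sizes; running it for several values of k is one way to study how the number of anchor boxes affects the model.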
Step 4: a finer feature extraction layer is added to the YOLOv3 network and a detection output is added on the large-scale feature layer, giving the improved network model YOLO-Z, specifically as follows:
(4.1): the training set images obtained in step 2 are resized to 608×608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, and each bounding box contains 1 confidence score, 4 coordinate values and C class probabilities, where B is the number of anchor boxes of the output feature layer in which the grid cell lies. For an output feature layer of size $S \times S$, the final output dimension is therefore $S \times S \times B \times (4 + 1 + C)$;
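As a worked illustration of the output dimension described in (4.1), the sketch below multiplies out the per-cell prediction count: B boxes, each with 4 coordinates, 1 confidence score and C class probabilities. The strides 32, 16 and 8 are those of the standard YOLOv3 detection outputs; B = 3 and C = 1 (a single pedestrian class) are assumptions made for the example, and the additional YOLO-Z output is not modeled here.

```python
def head_shape(grid, num_anchors, num_classes):
    """Shape of one detection output: grid x grid cells, B boxes per cell, (4 + 1 + C) values per box."""
    return (grid, grid, num_anchors * (4 + 1 + num_classes))

# 608 x 608 input; strides of the three standard YOLOv3 outputs
for stride in (32, 16, 8):
    print(stride, head_shape(608 // stride, num_anchors=3, num_classes=1))
# prints:
# 32 (19, 19, 18)
# 16 (38, 38, 18)
# 8 (76, 76, 18)
```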
the clustering uses the distance $d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid})$,
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the ratio of the intersection to the union of the two regions. When d(box, centroid) is less than or equal to the metric threshold, the width and height of the anchor box are determined.
The predicted bounding box is given by
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$
where $c_x$ and $c_y$ are the horizontal and vertical offsets of the grid cell from the upper left corner of the image; $p_w$ and $p_h$ are the width and height of the prior (anchor) box before prediction; $t_x$ and $t_y$ are the predicted center offset parameters, and $t_w$ and $t_h$ the predicted size parameters; $\sigma(t_x)$ and $\sigma(t_y)$ are the horizontal and vertical distances from the center of the predicted box to the upper left corner of its grid cell; $b_x$ and $b_y$ are the abscissa and ordinate of the predicted bounding box center; and $b_w$ and $b_h$ are the width and height of the predicted bounding box.
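The bounding box decoding can be illustrated with a short Python sketch. The center equations are the ones given above; the width and height lines use the standard YOLOv3 form $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$, where the raw size outputs `t_w` and `t_h` and the function name `decode_box` are assumptions introduced for the example.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw network outputs into a box center and size, in grid-cell units."""
    b_x = sigmoid(t_x) + c_x   # center x: offset inside the cell plus the cell's column
    b_y = sigmoid(t_y) + c_y   # center y: offset inside the cell plus the cell's row
    b_w = p_w * math.exp(t_w)  # width: prior width scaled (standard YOLOv3 form, assumed)
    b_h = p_h * math.exp(t_h)  # height: prior height scaled (standard YOLOv3 form, assumed)
    return b_x, b_y, b_w, b_h

# Zero raw outputs place the center in the middle of cell (3, 4) and keep the prior size.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0))
# prints: (3.5, 4.5, 2.0, 5.0)
```

Because the sigmoid keeps each offset between 0 and 1, the predicted center always stays inside the grid cell that is responsible for it.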
The confidence of a predicted bounding box is
$\text{Confidence} = \Pr(\text{object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$
where Pr(object) is 0 or 1: 0 indicates that no target is present and 1 indicates that a target is present; $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the ground-truth bounding box. The confidence score reflects whether a target is contained and, if so, how accurate the predicted location is. With the confidence threshold set to 0.5, a predicted bounding box is deleted when its confidence is less than 0.5 and retained when its confidence is greater than 0.5.
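The confidence score and the thresholding rule can be sketched as below. The corner format `(x_min, y_min, x_max, y_max)` and the function names are assumptions for illustration; the text does not say what happens to a box whose confidence is exactly 0.5, and this sketch drops it.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(has_object, pred_box, true_box):
    """Pr(object) is 0 or 1, multiplied by the IoU of prediction and ground truth."""
    return (1.0 if has_object else 0.0) * iou(pred_box, true_box)

def keep_confident(detections, threshold=0.5):
    """detections: list of (box, confidence); keep those strictly above the threshold."""
    return [d for d in detections if d[1] > threshold]
```

For example, `iou((0, 0, 2, 2), (1, 0, 3, 2))` evaluates to 1/3: the overlap has area 2 and the union has area 6.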
(4.2): a finer feature extraction layer is added to the YOLOv3 network and a detection output is added on the large-scale feature layer;
each downsampling stage of the YOLOv3 network is followed by a large number of convolutions. According to the receptive field formula, the receptive field grows as the network gets deeper, and the extracted features fuse information from a larger region; that is, the deeper the layer, the more it attends to global information. Pedestrians occupy a small proportion of the image, so this is small-object detection, and in a deep feature map a small object has little influence on the features and much of its information is lost. A finer feature extraction layer is therefore added: while the original YOLOv3 output layers are kept, the output feature map is up-sampled and combined with a shallow convolutional layer of matching size, and the prediction is output after several further convolutional layers, giving the model YOLO-Z;
(4.3): multi-scale fusion prediction of pedestrians is then performed through the FPN-like network. Because the YOLOv3 algorithm treats target detection as a regression problem, a mean square error loss function is adopted;
the mean square error loss function used for class prediction is expressed in terms of the following quantities:
$S^2$ is the number of cells in the grid of the final feature map of the network; B is the number of boxes predicted by each grid cell; x, y, w and h are the center coordinates, width and height of a box; $C_i$ is the predicted confidence that the box contains a pedestrian and $\hat{C}_i$ is the true confidence that a pedestrian is present in the box; $P_i(c)$ is the predicted pedestrian class probability and $\hat{P}_i(c)$ is the true one; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th bounding box of the i-th grid cell is responsible for the object, that is, whether it is the bounding box with the largest IOU with the object's ground-truth box; $\mathbb{1}_{ij}^{noobj}$ indicates the remaining bounding boxes, which do not have the largest IOU; $\lambda_{coord}$ is the weight coefficient of the bounding box coordinate prediction error; $\lambda_{noobj}$ is the weight coefficient of the error for boxes that contain no object; $\mathbb{1}_{i}^{obj}$ indicates whether the center of an object falls in grid cell i, in which case the grid cell predicts the class probability of that object;
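For reference, one sum-of-squared-errors loss that is consistent with the quantities listed above is the form used in the original YOLO formulation. Whether the invention uses exactly these terms (for example the square roots on width and height) is an assumption, as are the hat notation for ground-truth values and the indicator symbols $\mathbb{1}^{obj}$ and $\mathbb{1}^{noobj}$:

```latex
\begin{aligned}
L ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
       \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
     &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
       \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
     &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( P_i(c) - \hat{P}_i(c) \right)^2
\end{aligned}
```

The first two lines penalize localization error, the next two penalize confidence error for boxes with and without an object, and the last line penalizes class probability error in cells that contain an object center.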
step 5: feeding the training set into the YOLO-Z network for training under multiple environments, and then saving the resulting weight file;
the improved YOLO-Z network adds a convolutional layer to obtain finer feature extraction and detects small targets in a shallow layer, giving a pedestrian detection model for orchards. Using prior knowledge of the data set, the width and height of the candidate boxes are obtained with the K-means clustering algorithm, the influence of the number of candidate boxes on model performance is analyzed, and the best-performing model under limited computing resources is selected; the training parameters are then tuned to improve the localization accuracy of the model.
Step 6: a Kalman filtering algorithm is introduced, with corresponding improvements, to make the model more robust, reduce missed detections and increase detection speed, as follows:
the Kalman filter is an optimal recursive estimation algorithm, and its tracking process has two main steps: prediction and update. Once a state space model and an observation equation have been established for the system, the filter obtains a predicted value of the state variable at the current moment from the system noise and the state variable at the previous moment, and then updates the state variable with the observation at the current moment to arrive at the state estimate.
The state space model and the observation equation, which are the basis for iterative tracking with a Kalman filter, are:
$X_i = A_{i|i-1} X_{i-1} + w_{i-1}$
$Z_i = H X_i + v_i$
where $X_i$ and $X_{i-1}$ are the system states at moments i and i-1; $A_{i|i-1}$ is the state transition matrix, which depends on the state variables of the system and the motion model of the target; $Z_i$ is the observed state of the system at moment i; and H is the observation matrix, which relates the state variables to the observations. $w_{i-1}$ is the system (process) noise and $v_i$ is the measurement noise; both follow normal distributions, with covariances Q and R respectively.
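The prediction and update steps for this state space model can be sketched with NumPy as follows. This is the textbook linear Kalman filter, not the improved variant of the invention, whose modifications are not detailed here. The constant-velocity state `[cx, cy, vx, vy]` for the box center, the noise levels in Q and R, and the names `KalmanFilter` and `make_box_center_filter` are all assumptions chosen for illustration.

```python
import numpy as np

class KalmanFilter:
    """Linear Kalman filter for X_i = A X_{i-1} + w_{i-1}, Z_i = H X_i + v_i."""

    def __init__(self, A, H, Q, R, x0, P0):
        self.A, self.H, self.Q, self.R = A, H, Q, R
        self.x, self.P = x0, P0

    def predict(self):
        # Propagate the state estimate and its covariance one step forward
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x

    def update(self, z):
        # Correct the prediction with the observation z
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x

def make_box_center_filter(cx, cy, dt=1.0):
    """Hypothetical constant-velocity model for a detected box center."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only the position is observed
    Q = 1e-4 * np.eye(4)                         # process noise covariance (assumed)
    R = np.eye(2)                                # measurement noise covariance (assumed)
    x0 = np.array([cx, cy, 0.0, 0.0])
    P0 = 1e3 * np.eye(4)                         # large initial uncertainty
    return KalmanFilter(A, H, Q, R, x0, P0)
```

One way to use such a filter against missed detections is to call `predict()` on every frame and `update(z)` only when the detector returns a box for the tracked pedestrian; on frames where the pedestrian is occluded, the predicted state stands in for the missing detection.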
The invention has the following advantages:
1. the improved K-means clustering algorithm optimizes the selection of the initial cluster centers so that the similarity distance between them is as large as possible, which effectively shortens clustering time and improves the clustering result;
2. a convolutional layer is added to a shallow layer of the network to obtain finer feature extraction, and small targets are detected in the shallow layer, so that the YOLO-Z model detects more accurately and faster and meets the requirement of real-time detection;
3. combining the YOLO-Z model with a Kalman filtering algorithm reduces the missed-detection rate where occlusion is severe and further increases detection speed there.
Drawings
Fig. 1 is a flowchart of the overall implementation of the pedestrian detection method based on improved YOLOv3 in an orchard environment, in an embodiment of the present invention.
Fig. 2 is a diagram of network-based coordinate prediction in multitask training in an embodiment of the present invention.
Fig. 3 shows the convolutional feature extractor with added shallow layers, based on the YOLOv3 network, in an embodiment of the present invention.
Fig. 4 shows detection results of the orchard pedestrian detection method based on improved YOLOv3 in an embodiment of the present invention: (a) stationary; (b) moving; (c) normal posture; (d) abnormal posture; (e) large target; (f) medium target; (g) small target.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the invention provides a pedestrian detection method in an orchard environment based on improved YOLOv3, which comprises the following steps:
step 1: collecting pedestrian images in an orchard environment;
capturing images of pedestrians with a depth camera at various positions in the orchard, the captured images covering pedestrians under different occlusion conditions, under different weather conditions, and at short, medium and long distances;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
as shown in figs. 2-3, step 3: feeding the training set processed in step 2 into a convolutional feature extractor to extract pedestrian features, generating anchor boxes with a K-means clustering algorithm in order to produce predicted pedestrian bounding boxes, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding box and class prediction, with the following specific steps:
(3.1): randomly selecting the width and height of one bounding box as the first cluster center;
(3.2): selecting the n-th cluster center such that the larger the similarity distance between a box and the current n-1 cluster centers, the greater the probability that the box is selected;
(3.3): repeating (3.2) until all initial cluster centers are determined;
(3.4): computing the IoU between each remaining box and every cluster center to obtain the similarity distance (IoU loss) between them, and assigning each box to the class of the cluster center with the smallest similarity distance;
(3.5): after all boxes have been traversed, computing the mean width and height of the boxes in each class and using it as the cluster center for the next iteration;
(3.6): repeating (3.4) and (3.5) until the difference in total IoU loss between adjacent iterations is smaller than a threshold or the maximum number of iterations is reached, then stopping the clustering algorithm.
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers, so that the similarity distance between them is as large as possible.
Step 4: a finer feature extraction layer is added to the YOLOv3 network and a detection output is added on the large-scale feature layer, giving the improved network model YOLO-Z, specifically as follows:
(4.1): the training set images obtained in step 2 are resized to 608×608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, and each bounding box contains 1 confidence score, 4 coordinate values and C class probabilities, where B is the number of anchor boxes of the output feature layer in which the grid cell lies. For an output feature layer of size $S \times S$, the final output dimension is therefore $S \times S \times B \times (4 + 1 + C)$;
the clustering uses the distance $d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid})$,
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the ratio of the intersection to the union of the two regions. When d(box, centroid) is less than or equal to the metric threshold, the width and height of the anchor box are determined.
The predicted bounding box is given by
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$
where $c_x$ and $c_y$ are the horizontal and vertical offsets of the grid cell from the upper left corner of the image; $p_w$ and $p_h$ are the width and height of the prior (anchor) box before prediction; $t_x$ and $t_y$ are the predicted center offset parameters, and $t_w$ and $t_h$ the predicted size parameters; $\sigma(t_x)$ and $\sigma(t_y)$ are the horizontal and vertical distances from the center of the predicted box to the upper left corner of its grid cell; $b_x$ and $b_y$ are the abscissa and ordinate of the predicted bounding box center; and $b_w$ and $b_h$ are the width and height of the predicted bounding box.
The confidence of a predicted bounding box is
$\text{Confidence} = \Pr(\text{object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$
where Pr(object) is 0 or 1: 0 indicates that no target is present and 1 indicates that a target is present; $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the ground-truth bounding box. The confidence score reflects whether a target is contained and, if so, how accurate the predicted location is. With the confidence threshold set to 0.5, a predicted bounding box is deleted when its confidence is less than 0.5 and retained when its confidence is greater than 0.5.
(4.2): a finer feature extraction layer is added to the YOLOv3 network and a detection output is added on the large-scale feature layer;
each downsampling stage of the YOLOv3 network is followed by a large number of convolutions. According to the receptive field formula, the receptive field grows as the network gets deeper, and the extracted features fuse information from a larger region; that is, the deeper the layer, the more it attends to global information. Pedestrians occupy a small proportion of the image, so this is small-object detection, and in a deep feature map a small object has little influence on the features and much of its information is lost. A finer feature extraction layer is therefore added: while the original YOLOv3 output layers are kept, the output feature map is up-sampled and combined with a shallow convolutional layer of matching size, and the prediction is output after several further convolutional layers, giving the model YOLO-Z;
(4.3): multi-scale fusion prediction of pedestrians is then performed through the FPN-like network. Because the YOLOv3 algorithm treats target detection as a regression problem, a mean square error loss function is adopted;
the mean square error loss function used for class prediction is expressed in terms of the following quantities:
$S^2$ is the number of cells in the grid of the final feature map of the network; B is the number of boxes predicted by each grid cell; x, y, w and h are the center coordinates, width and height of a box; $C_i$ is the predicted confidence that the box contains a pedestrian and $\hat{C}_i$ is the true confidence that a pedestrian is present in the box; $P_i(c)$ is the predicted pedestrian class probability and $\hat{P}_i(c)$ is the true one; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th bounding box of the i-th grid cell is responsible for the object, that is, whether it is the bounding box with the largest IOU with the object's ground-truth box; $\mathbb{1}_{ij}^{noobj}$ indicates the remaining bounding boxes, which do not have the largest IOU; $\lambda_{noobj}$ is the weight coefficient of the error for boxes that contain no object; $\mathbb{1}_{i}^{obj}$ indicates whether the center of an object falls in grid cell i, in which case the grid cell predicts the class probability of that object;
step 5: feeding the training set into the YOLO-Z network for training under multiple environments, and then saving the resulting weight file;
the improved YOLO-Z network adds a convolutional layer to obtain finer feature extraction and detects small targets in a shallow layer, giving a pedestrian detection model for orchards. Using prior knowledge of the data set, the width and height of the candidate boxes are obtained with the K-means clustering algorithm, the influence of the number of candidate boxes on model performance is analyzed, and the best-performing model under limited computing resources is selected; the training parameters are then tuned to improve the localization accuracy of the model.
Step 6: a Kalman filtering algorithm is introduced, with corresponding improvements, to make the model more robust, reduce missed detections and increase detection speed, as follows:
the Kalman filter is an optimal recursive estimation algorithm, and its tracking process has two main steps: prediction and update. Once a state space model and an observation equation have been established for the system, the filter obtains a predicted value of the state variable at the current moment from the system noise and the state variable at the previous moment, and then updates the state variable with the observation at the current moment to arrive at the state estimate.
The state space model and the observation equation, which are the basis for iterative tracking with a Kalman filter, are:
$X_i = A_{i|i-1} X_{i-1} + w_{i-1}$
$Z_i = H X_i + v_i$
where $X_i$ and $X_{i-1}$ are the system states at moments i and i-1; $A_{i|i-1}$ is the state transition matrix, which depends on the state variables of the system and the motion model of the target; $Z_i$ is the observed state of the system at moment i; and H is the observation matrix, which relates the state variables to the observations. $w_{i-1}$ is the system (process) noise and $v_i$ is the measurement noise; both follow normal distributions, with covariances Q and R respectively.
As shown in fig. 4, the pedestrian detection method based on improved YOLOv3 for orchard environments builds on YOLOv3 and targets detection difficulties such as illumination and occlusion in the orchard. By improving the training samples and the network structure to obtain the YOLO-Z network, and by improving the K-means clustering algorithm and the Kalman filtering algorithm, it raises the precision and recall of pedestrian detection, meets the requirement of real-time detection, lowers the hardware requirements of the network model, and supports pedestrian detection for intelligent agricultural machinery in orchards.
In summary, the invention provides a pedestrian detection method for orchard environments based on an improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment and preprocessing them to build an orchard pedestrian sample set; S2, using a K-means clustering algorithm to generate the anchor boxes from which pedestrian candidate boxes are computed; S3, adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on the large-scale feature layer, to obtain the improved network model YOLO-Z; S4, feeding the training set into the YOLO-Z network for training under multiple environments and saving the resulting weight file; S5, introducing a Kalman filtering algorithm, with corresponding improvements, to make the model more robust, reduce missed detections and increase detection speed. The invention addresses the low speed and low accuracy of real-time pedestrian detection in orchard environments, realizes multitask training, and ensures both the speed and the accuracy of pedestrian detection in the orchard environment.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (3)

1. The pedestrian detection method based on the improved YOLOv3 in the orchard environment is characterized by comprising the following steps of:
step 1: collecting pedestrian images in an orchard environment;
collecting, with depth cameras, images of pedestrians at various positions in the orchard, the captured images including pedestrians under different occlusion conditions, images under different weather conditions, and pedestrians at different distances, namely short-distance, medium-distance and long-distance images;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
step 3: processing the pedestrian detection data set of step 2 to produce a training set, feeding the training set into a convolutional feature extractor to extract pedestrian features, generating anchor boxes with a K-means clustering algorithm to produce the predicted pedestrian bounding box expansion data, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding box and category prediction;
step 4: adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on the large-scale feature layer, to obtain the improved network model YOLO-Z;
the step 4 is specifically as follows:
step 4.1: first, resizing the training set images obtained in step 2 to 608×608, setting the IoU (Intersection over Union) threshold to 0.45 and the confidence threshold to 0.5, and predicting B bounding boxes for each grid cell, each bounding box comprising 1 confidence score, 4 coordinate values and C category probabilities, where B is the number of bounding boxes predicted per grid cell; for an output feature layer of size $S \times S$, the final output dimension is then $S \times S \times B \times (4 + 1 + C)$;
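As a quick check of the output dimension in step 4.1, the sketch below computes the grid size and tensor shape of a detection head for a 608×608 input, using the fact that each of the B boxes per cell carries 4 coordinates, 1 confidence score and C class probabilities. The strides 32, 16 and 8 are those of the three standard YOLOv3 heads; B = 3 and C = 1 (a single pedestrian class) are assumptions chosen for illustration, not values stated in the claim, and `head_shape` is a hypothetical helper name.

```python
def head_shape(input_size, stride, boxes_per_cell, num_classes):
    """Return (S, S, B, 4 + 1 + C) for one detection head."""
    if input_size % stride != 0:
        raise ValueError("input size must be divisible by the stride")
    s = input_size // stride  # grid cells along each side
    return (s, s, boxes_per_cell, 4 + 1 + num_classes)


INPUT_SIZE = 608  # from step 4.1
B = 3             # assumed number of boxes per grid cell
C = 1             # assumed number of classes (pedestrian only)

# Strides of the three standard YOLOv3 detection heads.
shapes = {stride: head_shape(INPUT_SIZE, stride, B, C) for stride in (32, 16, 8)}
for stride, shape in shapes.items():
    print(f"stride {stride}: {shape}")
```

Under these assumptions the coarsest head has a 19×19 grid and the finest standard head a 76×76 grid; a finer added layer would correspond to a smaller stride and a larger grid.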
the distance used for clustering is
$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$$
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection over union of the two boxes; when d(box, centroid) is less than or equal to the metric threshold, the width and height of the anchor box are determined;
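A minimal sketch of this distance, assuming each box is described only by its width and height and that the two boxes are aligned at a common corner (the usual convention when clustering anchor shapes, but an assumption here). The function names are hypothetical.

```python
def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a common corner."""
    bw, bh = box
    cw, ch = centroid
    inter = min(bw, cw) * min(bh, ch)
    union = bw * bh + cw * ch - inter
    return inter / union


def distance(box, centroid):
    """Clustering distance d = 1 - IoU: 0 for identical shapes, near 1 for very different ones."""
    return 1.0 - iou_wh(box, centroid)


d_same = distance((40, 100), (40, 100))  # identical shapes
d_diff = distance((40, 100), (80, 100))  # same height, twice as wide
print(d_same, d_diff)
```

Unlike Euclidean distance on width and height, this measure does not penalize large boxes more than small ones, which is why it is commonly preferred for anchor clustering.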
the predicted bounding box is given by
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}$$
$$b_h = p_h e^{t_h}$$
where $c_x$ and $c_y$ are the horizontal and vertical distances of the grid cell from the upper-left corner of the image; $p_w$ and $p_h$ are the width and height of the bounding box before prediction; $t_x$ and $t_y$ are the predicted center offset parameters and $t_w$ and $t_h$ the predicted width and height parameters; $\sigma(t_x)$ and $\sigma(t_y)$ are the horizontal and vertical distances from the center of the predicted box to the upper-left corner of the cell in which it lies; $b_x$ and $b_y$ are the abscissa and ordinate of the predicted bounding box center; and $b_w$ and $b_h$ are the width and height of the predicted bounding box;
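The decoding step can be sketched as follows. The center equations come from the claim; the width and height equations (prior size scaled by an exponential) are the standard YOLOv3 form and are an assumption here, as are the parameter names `t_w` and `t_h`. All quantities are in grid-cell units.

```python
import math


def sigmoid(t):
    """Logistic function; squashes the raw offset into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw network outputs into a box center and size."""
    b_x = sigmoid(t_x) + c_x   # center stays inside cell (c_x, c_y)
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # assumed standard YOLOv3 width/height decoding
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Zero raw outputs place the center in the middle of cell (5, 7)
# and leave the prior size unchanged.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=5, c_y=7, p_w=2.0, p_h=4.0)
print(box)
```

Because the sigmoid bounds the offset between 0 and 1, a predicted center can never leave the cell that is responsible for it.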
the confidence of a predicted bounding box is
$$\text{confidence} = \Pr(\text{object}) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}}$$
where Pr(object) is 0 or 1, 0 indicating that there is no object in the image and 1 that there is one; $\mathrm{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the actual bounding box; the confidence score reflects whether a target is contained and, when it is, how accurate the predicted position is; the confidence threshold is set to 0.5: a predicted bounding box is deleted when its confidence is smaller than 0.5 and retained when its confidence is larger than 0.5;
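The sketch below illustrates the confidence definition and the 0.5 threshold on boxes given as corner coordinates. It is illustrative only: at inference time no ground-truth box is available and the network predicts the score directly. The function names are hypothetical, and since the claim does not say what happens to a score exactly equal to the threshold, dropping it is an assumption.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence(has_object, pred_box, truth_box):
    """confidence = Pr(object) * IoU(pred, truth), with Pr(object) in {0, 1}."""
    return (1.0 if has_object else 0.0) * box_iou(pred_box, truth_box)


def keep_confident(scored_boxes, threshold=0.5):
    """Retain boxes whose score is larger than the threshold (ties dropped: assumption)."""
    return [(box, score) for box, score in scored_boxes if score > threshold]


truth = (0, 0, 10, 10)
candidates = [(0, 0, 10, 5), (0, 0, 10, 8), (20, 20, 30, 30)]
scored = [(box, confidence(True, box, truth)) for box in candidates]
kept = keep_confident(scored)
print(kept)
```

Only the second candidate, which covers 80% of the ground-truth box, survives the filter; the half-overlapping box scores exactly 0.5 and the disjoint box scores 0.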
step 4.2: adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on the large-scale feature layer;
according to the receptive field calculation formula, the receptive field grows as the number of network layers increases, and the extracted features fuse more information; that is, the deeper the network, the more it attends to global information; pedestrians occupy only a small proportion of the picture, so detecting them is a small-object detection problem, and in a deep feature map small objects have little influence on the feature map and their information is severely lost; therefore a finer feature extraction layer is added: while keeping the original output layers of YOLOv3, the output feature map is up-sampled to a larger feature map and combined with a shallow convolutional layer of the corresponding size, and the prediction is output after several further convolutional layers, giving the model YOLO-Z;
step 4.3: then performing multi-scale fusion prediction of pedestrians through an FPN-like network; because the YOLOv3 algorithm treats target detection as a regression problem, a mean square error loss function is adopted;
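The growth of the receptive field with depth can be illustrated with the usual recurrence for stacked convolutions: each layer adds (kernel − 1) × jump to the receptive field, where jump is the product of the strides of all earlier layers. The layer stack below is a hypothetical toy example, not the actual YOLOv3 or YOLO-Z backbone.

```python
def receptive_field(layers):
    """Receptive field after each layer, given (kernel_size, stride) pairs."""
    rf, jump = 1, 1
    history = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # widen by the kernel extent, measured in input pixels
        jump *= stride             # later layers step over more input pixels
        history.append(rf)
    return history


# Toy stack: 3x3 convolutions, two of them with stride 2 (down-sampling).
toy_layers = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
fields = receptive_field(toy_layers)
print(fields)
```

In this toy stack the receptive field grows from 3 to 21 input pixels over five layers, so a pedestrian only a few pixels wide contributes a shrinking share of each deep feature, which is the motivation for predicting from a shallower, higher-resolution layer as well.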
the mean square error loss function formula used for category prediction is as follows
where $S^2$ is the grid size of the final feature map of the network; $B$ is the number of boxes predicted per grid cell; $x$, $y$, $w$ and $h$ are the center coordinates, width and height of a box; $C_i$ is the confidence that the predicted box is located on a pedestrian and $\hat{C}_i$ is the confidence that a pedestrian truly exists in the box; $P_i(c)$ is the predicted pedestrian confidence and $\hat{P}_i(c)$ is the true pedestrian confidence; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th bounding box in the i-th grid cell is responsible for the object, the responsible box being the one with the largest IOU with the object's ground-truth box; $\lambda_{coord}$ is the weight coefficient of the bounding box coordinate prediction error; $\lambda_{noobj}$ is the weight of the classification error; $\mathbb{1}_{i}^{obj}$ indicates whether the center of an object falls in grid cell i, in which case that cell predicts the class probability of the object;
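The formula itself did not survive in this text. The symbols defined above match the widely used YOLO sum-of-squared-errors loss, so, as an assumption rather than a transcription of the patent's own formula, it can be written as below. In this standard form $\lambda_{noobj}$ weights the confidence error of boxes that are not responsible for any object, and $\mathbb{1}_{ij}^{noobj}$ is the complement of $\mathbb{1}_{ij}^{obj}$; hatted quantities are ground-truth values.

```latex
$$
\begin{aligned}
\text{Loss} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 & + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 & + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2
   + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
 & + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( P_i(c) - \hat{P}_i(c) \right)^2
\end{aligned}
$$
```

The first two lines penalize localization error, the third penalizes confidence error for boxes with and without an object, and the last penalizes class prediction error in cells that contain an object center.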
step 5: feeding the training set into the YOLO-Z network for training under multiple environments, and then saving the resulting weight file;
step 6: an improved Kalman filtering algorithm is introduced to improve the robustness of the model, solve the problem of missed detection and improve the detection speed.
2. The pedestrian detection method in an orchard environment based on improved YOLOv3 of claim 1, wherein the generating of the predicted pedestrian bounding box expansion data by generating the number of anchor boxes through a K-means clustering algorithm comprises the following specific steps:
step 3.1: randomly selecting the width and height of one coordinate box as the first cluster center;
step 3.2: selecting the n-th cluster center on the principle that the larger the similarity distance between a box and the current n-1 cluster centers, the higher the probability that the box is selected;
step 3.3: repeating step 3.2 until all initial cluster centers are determined;
step 3.4: computing the IoU between each remaining coordinate box and every cluster center to obtain the similarity distance (IoU loss) between them, and assigning each coordinate box to the class of the cluster center with the smallest similarity distance;
step 3.5: after all coordinate boxes have been traversed, computing the mean width and height of the coordinate boxes in each class and taking these means as the cluster centers for the next iteration;
step 3.6: repeating steps 3.4 and 3.5 until the difference in Total IoU loss between adjacent iterations is smaller than a threshold or the maximum number of iterations is reached, at which point the clustering algorithm stops.
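Steps 3.1 to 3.6 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: boxes are (width, height) pairs, the seeding probability is taken proportional to the distance to the nearest existing center (the claim only says that a larger distance means a higher probability), an empty cluster keeps its previous center, and all names and default parameters are hypothetical.

```python
import random


def iou_wh(box, center):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    return inter / (box[0] * box[1] + center[0] * center[1] - inter)


def nearest(box, centers):
    """Index of, and distance to, the closest center under d = 1 - IoU."""
    dists = [1.0 - iou_wh(box, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans_anchors(boxes, k, max_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    # Steps 3.1-3.3: seed the centers, favouring boxes far from existing centers.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        weights = [nearest(b, centers)[1] for b in boxes]
        if sum(weights) == 0:  # every box coincides with a center already
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=weights, k=1)[0])
    prev_loss = None
    for _ in range(max_iter):
        # Step 3.4: assign each box to its closest center, accumulating the total IoU loss.
        clusters = [[] for _ in range(k)]
        loss = 0.0
        for b in boxes:
            idx, d = nearest(b, centers)
            clusters[idx].append(b)
            loss += d
        # Step 3.5: the new center is the mean width and height of its members.
        centers = [
            (sum(w for w, _ in m) / len(m), sum(h for _, h in m) / len(m)) if m else centers[i]
            for i, m in enumerate(clusters)
        ]
        # Step 3.6: stop when the total loss no longer changes noticeably.
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return sorted(centers, key=lambda c: c[0] * c[1])


sample_boxes = [(10, 20), (11, 21), (9, 19), (100, 200), (101, 199), (99, 201)]
anchors = kmeans_anchors(sample_boxes, k=2)
print(anchors)
```

On this toy data the two anchors settle at the mean shapes of the small and the large group of boxes; in practice the input would be the labelled pedestrian box sizes of the training set.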
3. The method for pedestrian detection in an orchard environment based on improved YOLOv3 of claim 1, wherein step 6 is specifically as follows:
the improved Kalman filtering algorithm is an optimal recursive algorithm whose tracking process is divided into two main steps: prediction and update; after a state space model and an observation equation have been established for the system, the filter obtains a predicted value of the state variable at the current moment from the system noise and the state variable at the previous moment, and then updates the state variable with the observed value at the current moment to obtain the final estimated state;
the state space model and the observation equation, which are the basis for iterative tracking by the Kalman filter, are formulated as follows:
$$X_i = A_{i|i-1} X_{i-1} + w_{i-1}$$
$$Z_i = H X_i + v_i$$
where $X_i$ and $X_{i-1}$ are the system states at moments i and i-1; $A_{i|i-1}$ is the state transition matrix, which depends on the state variables of the system and the target's movement mode; $Z_i$ is the observed state of the system at moment i; $H$ is the observation matrix, which relates the system state to the observed value; $w_{i-1}$ is the system noise and $v_i$ the observation noise, both of which follow normal distributions, with covariances Q and R respectively.
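The predict and update steps can be sketched with a standard (not the patent's improved) Kalman filter tracking one coordinate of a box center. The constant-velocity motion model, the state [position, velocity], the observation matrix H = [1, 0], the diagonal process noise q, the measurement noise r, and the class name are all assumptions made for illustration; the claim does not specify them.

```python
class ConstantVelocityKalman:
    """Standard Kalman filter for one coordinate; state is [position, velocity]."""

    def __init__(self, pos, vel=0.0, dt=1.0, q=1e-2, r=1.0):
        self.x = [pos, vel]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.dt, self.q, self.r = dt, q, r

    def predict(self):
        """X = A X with A = [[1, dt], [0, 1]]; P = A P A^T + Q with Q = q * I."""
        dt = self.dt
        (p00, p01), (p10, p11) = self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with observation z, where Z = H X + v and H = [1, 0]."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r               # innovation covariance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        residual = z - self.x[0]
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return self.x[0]


# A detection is missing in one frame (None); the prediction fills the gap.
detections = [2.0, 4.0, 6.0, None, 10.0, 12.0]
kf = ConstantVelocityKalman(pos=0.0)
track = []
for z in detections:
    predicted = kf.predict()
    track.append(kf.update(z) if z is not None else predicted)
print(track)
```

When the detector misses the pedestrian in a frame, the filter still supplies a predicted position from the motion model, which is one way a tracker can compensate for missed detections.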
CN202010341941.7A 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment Active CN111626128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010341941.7A CN111626128B (en) 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment

Publications (2)

Publication Number Publication Date
CN111626128A CN111626128A (en) 2020-09-04
CN111626128B true CN111626128B (en) 2023-07-21

Family

ID=72260566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010341941.7A Active CN111626128B (en) 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment

Country Status (1)

Country Link
CN (1) CN111626128B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307955A (en) * 2020-10-29 2021-02-02 广西科技大学 Optimization method based on SSD infrared image pedestrian detection
CN112347938B (en) * 2020-11-09 2023-09-26 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112329697B (en) * 2020-11-18 2022-04-12 广西师范大学 Improved YOLOv 3-based on-tree fruit identification method
CN112381021B (en) * 2020-11-20 2022-07-12 安徽一视科技有限公司 Personnel detection counting method based on deep learning
CN112381043A (en) * 2020-11-27 2021-02-19 华南理工大学 Flag detection method
CN112733603A (en) * 2020-12-11 2021-04-30 江苏大学 Frequency conversion scroll compressor fault diagnosis method based on improved VMD and SVM
CN112541483A (en) * 2020-12-25 2021-03-23 三峡大学 Dense face detection method combining YOLO and blocking-fusion strategy
CN112668662B (en) * 2020-12-31 2022-12-06 北京理工大学 Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112766188A (en) * 2021-01-25 2021-05-07 浙江科技学院 Small-target pedestrian detection method based on improved YOLO algorithm
CN112911171B (en) * 2021-02-04 2022-04-22 上海航天控制技术研究所 Intelligent photoelectric information processing system and method based on accelerated processing
CN113111703B (en) * 2021-03-02 2023-07-28 郑州大学 Airport pavement disease foreign matter detection method based on fusion of multiple convolutional neural networks
CN113139481B (en) * 2021-04-28 2023-09-01 广州大学 Classroom people counting method based on yolov3
CN113609895A (en) * 2021-06-22 2021-11-05 上海中安电子信息科技有限公司 Road traffic information acquisition method based on improved Yolov3
CN113378753A (en) * 2021-06-23 2021-09-10 华南农业大学 Improved YOLOv 4-based boundary target identification method for rice field in seedling stage
CN113486764B (en) * 2021-06-30 2022-05-03 中南大学 Pothole detection method based on improved YOLOv3
CN113822169B (en) * 2021-08-30 2024-03-19 江苏大学 Orchard tree pedestrian detection method based on improved PP-YOLO

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant