CN111626128A - Improved YOLOv3-based pedestrian detection method in orchard environment - Google Patents

Improved YOLOv3-based pedestrian detection method in orchard environment

Info

Publication number
CN111626128A
CN111626128A (application number CN202010341941.7A)
Authority
CN
China
Prior art keywords
box
pedestrian
network
prediction
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010341941.7A
Other languages
Chinese (zh)
Other versions
CN111626128B (en)
Inventor
沈跃
张健
刘慧
张礼帅
吴边
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202010341941.7A priority Critical patent/CN111626128B/en
Publication of CN111626128A publication Critical patent/CN111626128A/en
Application granted granted Critical
Publication of CN111626128B publication Critical patent/CN111626128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a pedestrian detection method for orchard environments based on improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment, preprocessing the images, and producing an orchard pedestrian sample set; S2, generating anchor boxes with a K-means clustering algorithm to compute pedestrian candidate boxes; S3, adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on a large-scale feature layer to obtain the improved network model YOLO-Z; S4, inputting the training set into the YOLO-Z network for training across multiple environments, and then saving the weight file; S5, introducing a Kalman filtering algorithm, with corresponding improvements, to improve the robustness of the model, address missed detections, and increase detection speed. The invention addresses the low speed and low accuracy of real-time pedestrian detection in orchard environments, realizes multi-task training, and ensures both the detection speed and the detection precision of pedestrians in an orchard environment.

Description

Improved YOLOv3-based pedestrian detection method in orchard environment
Technical Field
The invention relates to a pedestrian detection method for orchard environments based on improved YOLOv3, aimed at pedestrian detection for unmanned agricultural machinery in an orchard environment, and belongs to the technical field of deep learning and pedestrian detection.
Background
With the rapid development of artificial intelligence, intelligent agricultural equipment has entered a period of rapid adoption, with unmanned agricultural machinery at its core. Obstacle detection is the first problem faced when unmanned agricultural machinery operates in the field, and pedestrian detection is its most critical part. Common methods for pedestrian detection include those based on motion characteristics, shape information, pedestrian models, stereo vision, neural networks, and wavelets with support vector machines.
Pedestrian detection in an orchard environment faces a series of problems: (1) the multi-pose problem. The pedestrian target is severely non-rigid and may assume many different poses: still or walking, standing or squatting. (2) The complexity of the detection scene. Pedestrians blend with the background and are difficult to separate from it. (3) The real-time performance of the detection and tracking system. Practical applications place strict requirements on response speed, while pedestrian detection algorithms are often complex, which further strains real-time performance. (4) Occlusion. In real environments, heavy occlusion occurs between people, and between people and objects. Combining computer vision methods with deep learning provides a research basis for realizing pedestrian detection.
Disclosure of Invention
To meet the need of intelligent unmanned agricultural machinery for pedestrian detection in an orchard environment, the invention provides a pedestrian detection method for orchard environments based on improved YOLOv3 that treats detection as a regression problem, processes the whole image directly with a convolutional network, and predicts object class and position simultaneously.
The orchard environment pedestrian detection method based on the improved YOLOv3 comprises the following steps:
step 1: acquiring images of pedestrians in an orchard environment;
images of pedestrians are collected with a depth camera at various positions in the orchard, covering different occlusion conditions, different weather conditions, and pedestrians at different distances, including short, medium, and long range;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
step 3: feeding the training set prepared in step 2 into a convolutional feature extractor to extract pedestrian features, generating anchor boxes with a K-means clustering algorithm to produce predicted pedestrian bounding boxes, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction; the specific steps are as follows:
(3.1): randomly selecting the width and the height of a coordinate frame as a first clustering center;
(3.2): the n-th cluster center is selected so that a box with a larger similarity distance from the current n−1 cluster centers has a larger probability of being selected;
(3.3): looping (3.2) until all initial cluster centers are determined;
(3.4): computing IoU (Intersection over Union) between each remaining coordinate box and the cluster centers one by one to obtain the similarity distance (the IoU loss) between the two boxes, and assigning each coordinate box to the class of the cluster center with the smallest similarity distance;
(3.5): after all the coordinate frames are traversed, calculating the mean values of the width and the height of the coordinate frames in each class to be used as the clustering center of next iteration;
(3.6): repeating (3.4) and (3.5) until the total IoU loss changes by less than the threshold between adjacent iterations or the maximum number of iterations is reached, then stopping the clustering algorithm.
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers so that the similarity distances between them are as large as possible; this effectively shortens clustering time and improves the clustering result.
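For concreteness, the following is a minimal sketch of the IoU-distance K-means procedure described in steps (3.1)-(3.6). It is an illustrative reconstruction rather than the patent's implementation; the function and parameter names (kmeans_anchors, boxes as an array of labeled-box width/height pairs, k as the anchor count) are assumptions:

```python
import numpy as np

def iou_wh(box, centroids):
    """IoU between one (w, h) box and an array of (w, h) centroids,
    treating all boxes as sharing a common top-left corner."""
    inter = np.minimum(box[0], centroids[:, 0]) * np.minimum(box[1], centroids[:, 1])
    union = box[0] * box[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, max_iter=100, tol=1e-6, seed=None):
    """Cluster labeled (w, h) boxes into k anchors with d = 1 - IoU."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [boxes[rng.integers(len(boxes))]]       # (3.1) random first center
    while len(centroids) < k:                           # (3.2)-(3.3) spread-out init
        d = np.array([1 - iou_wh(b, np.asarray(centroids)).max() for b in boxes])
        centroids.append(boxes[rng.choice(len(boxes), p=d / d.sum())])
    centroids = np.asarray(centroids)
    prev_loss = np.inf
    for _ in range(max_iter):
        dists = np.stack([1 - iou_wh(b, centroids) for b in boxes])  # (3.4)
        assign = dists.argmin(axis=1)
        loss = dists[np.arange(len(boxes)), assign].sum()            # total IoU loss
        if abs(prev_loss - loss) < tol:                 # (3.6) stopping criterion
            break
        prev_loss = loss
        for j in range(k):                              # (3.5) per-class mean (w, h)
            if np.any(assign == j):
                centroids[j] = boxes[assign == j].mean(axis=0)
    return centroids
```

The resulting centroids would serve as the anchor-box widths and heights used in step 4.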
step 4: a finer feature extraction layer is added to the YOLOv3 network, and a detection output is added on a large-scale feature layer, yielding the improved network model YOLO-Z, specifically as follows:
(4.1): first, the training-set images obtained in step 2 are resized to 608 × 608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, each containing 1 confidence score, 4 coordinate values, and C class probabilities, where B is the number of anchor boxes of the output feature layer the grid cell belongs to. Then, for an S × S output feature layer, the final output dimension is S × S × B × (5 + C).
The formula used for clustering is
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two areas; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are determined.
The bounding-box prediction formulas are
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where c_x and c_y are the horizontal and vertical offsets of the grid cell from the top-left corner of the image; p_w and p_h are the width and height of the anchor box before prediction; t_x and t_y are the predicted center parameters; σ(t_x) and σ(t_y) are the horizontal and vertical distances of the predicted box center from the top-left corner of its grid cell; b_x and b_y are the abscissa and ordinate of the predicted box center; and b_w and b_h are the width and height of the predicted bounding box.
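A small sketch of these decoding formulas, assuming t_x, t_y, t_w, t_h are the raw network outputs for one box and all quantities are expressed in grid-cell units (the function name is illustrative):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Apply the prediction formulas above: a sigmoid-bounded center offset
    inside the cell at (cx, cy) plus exponential scaling of anchor (pw, ph)."""
    sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))
    bx = sigmoid(tx) + cx      # center abscissa
    by = sigmoid(ty) + cy      # center ordinate
    bw = pw * math.exp(tw)     # predicted width
    bh = ph * math.exp(th)     # predicted height
    return bx, by, bw, bh
```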
The confidence of a predicted bounding box is given by
Confidence = Pr(object) × IOU(pred, truth)
where Pr(object) is 0 or 1: 0 means there is no object in the image and 1 means there is an object; IOU(pred, truth) is the intersection-over-union between the predicted bounding box and the ground-truth bounding box. The confidence score reflects whether the box contains a target and, if it does, how accurate the predicted location is. The confidence threshold is set to 0.5: a predicted bounding box with confidence below 0.5 is deleted, and one with confidence above 0.5 is retained.
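The confidence definition and the 0.5 threshold can likewise be sketched as follows (an illustration under the assumption that boxes are given as (x1, y1, x2, y2) corners; the helper names are not from the patent):

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def confidence(pr_object, pred_box, truth_box):
    """Confidence = Pr(object) * IOU(pred, truth), with Pr(object) in {0, 1}."""
    return pr_object * box_iou(pred_box, truth_box)

def keep_boxes(boxes, scores, threshold=0.5):
    """Delete predicted boxes below the 0.5 confidence threshold."""
    return [b for b, s in zip(boxes, scores) if s > threshold]
```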
(4.2): adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on a large-scale feature layer;
the YOLOv3 network adopts a large amount of convolution every time a downsampling is carried out, and according to a receptive field calculation formula, as the number of layers of the network increases, the receptive field increases, and the extracted features are formed by fusing more information, namely, the deeper the network, the more the network pays attention to the global information. The proportion of the pedestrians in the picture is small, the method belongs to small-size object detection, in the deep characteristic diagram, the influence of the information of the small-size object on the characteristic diagram is small, and the information loss of the small-size object is serious. Therefore, a more detailed feature extraction layer is added, on the basis of reserving the original output layer of YOLOv3, the output feature map is subjected to upsampling to obtain a size feature map, the size feature map is combined with the shallow size convolution layer, and the prediction output is carried out after a plurality of convolution layers to obtain a model YOLO-Z;
(4.3): multi-scale fusion prediction for pedestrians is then performed with an FPN-like network; since the YOLOv3 algorithm treats target detection as a regression problem, a mean-square-error loss function is adopted;
the mean square error loss function (loss function) used for class prediction is formulated as
Figure BDA0002468833560000033
Wherein: s2Representing the mesh size of the final characteristic diagram of the network, B representing each meshThe number of prediction boxes, x, y, w, h, represents the center and width and height of the box, CiRepresenting the confidence that the prediction box is positioned to the pedestrian,
Figure BDA0002468833560000041
confidence, P, that a pedestrian is actually present within the framei(c) The confidence level of the predicted pedestrian is represented,
Figure BDA0002468833560000042
the confidence of the pedestrian really exists;
Figure BDA0002468833560000043
judging whether the jth bounding box in the ith grid is in charge of the object or not and judging the IOU maximum bounding box of the real existing target frame group _ judge _ box of the object;
Figure BDA0002468833560000044
a bounding box representing the IOU maximum; lambda [ alpha ]coordIs a weight coefficient for the bounding box coordinate prediction error; lambda [ alpha ]noobjA weight representing a classification error;
Figure BDA0002468833560000045
judging whether the center of the object falls into the grid i or not, wherein the grid contains the center of the object and is responsible for predicting the class probability of the object;
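A compact sketch of this loss, under the assumptions that the responsibility masks (the largest-IOU anchor for each object) and the target tensors have already been built, and taking λ_coord = 5 and λ_noobj = 0.5 from the original YOLO paper as placeholder values:

```python
import torch

def yolo_mse_loss(pred, target, obj_mask, noobj_mask,
                  lambda_coord=5.0, lambda_noobj=0.5):
    """Mean-square-error loss over (x, y, w, h, C, P(c)...) per cell/anchor.
    obj_mask marks the box responsible for each object; noobj_mask the rest."""
    xy = ((pred[..., 0:2] - target[..., 0:2]) ** 2).sum(-1)       # center terms
    wh = ((pred[..., 2:4].clamp(min=0).sqrt()
           - target[..., 2:4].sqrt()) ** 2).sum(-1)               # sqrt w, h terms
    conf = (pred[..., 4] - target[..., 4]) ** 2                   # confidence term
    cls = ((pred[..., 5:] - target[..., 5:]) ** 2).sum(-1)        # class terms
    return (lambda_coord * (obj_mask * (xy + wh)).sum()
            + (obj_mask * conf).sum()
            + lambda_noobj * (noobj_mask * conf).sum()
            + (obj_mask * cls).sum())
```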
step 5: inputting the training set into the YOLO-Z network for training across multiple environments, and then saving its weight file;
based on the improved YOLO-Z network, the convolution layer is added, more detailed feature extraction is obtained, and a small target is detected in a shallow layer to obtain a pedestrian detection model under an orchard. The method comprises the steps of obtaining the width and the height of candidate frames by using the prior knowledge of a data set and a K-means clustering algorithm, analyzing the influence of different candidate frame numbers on the model performance, obtaining a model with optimal performance under limited computing resources, and adjusting and optimizing training parameters in order to improve the positioning accuracy of the model.
step 6: a Kalman filtering algorithm is introduced and correspondingly improved to increase the robustness of the model, address missed detections, and increase detection speed, specifically as follows:
the Kalman filtering algorithm outputs an optimal recursion algorithm, and the tracking process is mainly divided into two steps: and (4) predicting and updating. After a state space model and an observation equation are established for the system, the filter can obtain a predicted value of the state variable at the current moment according to the noise of the system and the state variable at the previous moment, and the state variable is updated by combining the observation value at the current moment to finally realize the state of prediction estimation.
The state-space model and the observation equation, which are the basis of the Kalman filter's iterative tracking, are as follows:
X_i = A_{i|i−1} X_{i−1} + w_{i−1}
Z_i = H X_i + v_i
where X_i and X_{i−1} are the system states at times i and i−1; A_{i|i−1} is the state-transition matrix, which depends on the system's state variables and the target's motion pattern; Z_i is the observation of the system at time i; H is the observation matrix, relating the system state to the observations; w_{i−1} is the system noise and v_i is the measurement noise, both following normal distributions with covariances Q and R respectively.
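The textbook predict/update recursion implied by these equations can be sketched as follows; the patent's specific improvements to the filter are not detailed in this section, so only the standard form is shown (matrix names follow the equations above, and the caller supplies A, H, Q, R and the initial state):

```python
import numpy as np

class KalmanTracker:
    """Minimal Kalman recursion for X_i = A X_{i-1} + w, Z_i = H X_i + v,
    with process-noise covariance Q and measurement-noise covariance R."""
    def __init__(self, A, H, Q, R, x0, P0):
        self.A, self.H, self.Q, self.R = A, H, Q, R
        self.x, self.P = x0, P0                          # state estimate, covariance

    def predict(self):
        self.x = self.A @ self.x                         # prior state estimate
        self.P = self.A @ self.P @ self.A.T + self.Q     # prior covariance
        return self.x

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)      # correct with observation z
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x
```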
The invention has the following advantages:
firstly, the improved K-means clustering algorithm optimizes the selection of the initial cluster centers so that the similarity distances between them are as large as possible, which effectively shortens clustering time and improves the clustering result;
secondly, convolutional layers are added at a shallow layer of the network to obtain finer feature extraction, and small targets are detected at that shallow layer, so the detection precision of the resulting YOLO-Z model is greatly improved and its detection speed rises markedly, meeting the requirement of real-time detection;
thirdly, the YOLO-Z model is combined with a Kalman filtering algorithm to reduce missed detections in heavily occluded places and to further accelerate detection.
Drawings
Fig. 1 is a flowchart of an overall implementation process of a pedestrian detection method in an orchard environment based on improved YOLOv3 in an embodiment of the present invention.
FIG. 2 is a diagram illustrating network coordinate prediction in multitasking training according to an embodiment of the present invention.
Fig. 3 shows the convolutional feature extractor added at a shallow layer of the YOLOv3 network in an embodiment of the present invention.
Fig. 4 illustrates the detection results of the orchard pedestrian detection method based on improved YOLOv3 in an embodiment of the invention: (a) static state; (b) moving state; (c) normal posture; (d) abnormal posture; (e) large target; (f) medium target; (g) small target.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the invention provides an orchard environment pedestrian detection method based on improved YOLOv3, which comprises the following steps:
step 1: acquiring images of pedestrians in an orchard environment;
images of pedestrians are collected with a depth camera at various positions in the orchard, covering different occlusion conditions, different weather conditions, and pedestrians at different distances, including short, medium, and long range;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
As shown in figs. 2-3, step 3: feeding the training set prepared in step 2 into a convolutional feature extractor to extract pedestrian features, generating anchor boxes with a K-means clustering algorithm to produce predicted pedestrian bounding boxes, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction; the specific steps are as follows:
(3.1): randomly selecting the width and the height of a coordinate frame as a first clustering center;
(3.2): the n-th cluster center is selected so that a box with a larger similarity distance from the current n−1 cluster centers has a larger probability of being selected;
(3.3): looping (3.2) until all initial cluster centers are determined;
(3.4): computing IoU between each remaining coordinate box and the cluster centers one by one to obtain the similarity distance (the IoU loss) between the two boxes, and assigning each coordinate box to the class of the cluster center with the smallest similarity distance;
(3.5): after all the coordinate frames are traversed, calculating the mean values of the width and the height of the coordinate frames in each class to be used as the clustering center of next iteration;
(3.6): repeating (3.4) and (3.5) until the total IoU loss changes by less than the threshold between adjacent iterations or the maximum number of iterations is reached, then stopping the clustering algorithm.
The improved K-means clustering algorithm mainly optimizes the selection of the initial cluster centers so that the similarity distances between them are as large as possible; this effectively shortens clustering time and improves the clustering result.
step 4: a finer feature extraction layer is added to the YOLOv3 network, and a detection output is added on a large-scale feature layer, yielding the improved network model YOLO-Z, specifically as follows:
(4.1): first, the training-set images obtained in step 2 are resized to 608 × 608, the IOU threshold is set to 0.45, and the confidence threshold is set to 0.5. Each grid cell predicts B bounding boxes, each containing 1 confidence score, 4 coordinate values, and C class probabilities, where B is the number of anchor boxes of the output feature layer the grid cell belongs to. Then, for an S × S output feature layer, the final output dimension is S × S × B × (5 + C).
The formula used for clustering is
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two areas; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are determined.
The bounding-box prediction formulas are
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where c_x and c_y are the horizontal and vertical offsets of the grid cell from the top-left corner of the image; p_w and p_h are the width and height of the anchor box before prediction; t_x and t_y are the predicted center parameters; σ(t_x) and σ(t_y) are the horizontal and vertical distances of the predicted box center from the top-left corner of its grid cell; b_x and b_y are the abscissa and ordinate of the predicted box center; and b_w and b_h are the width and height of the predicted bounding box.
The confidence of a predicted bounding box is given by
Confidence = Pr(object) × IOU(pred, truth)
where Pr(object) is 0 or 1: 0 means there is no object in the image and 1 means there is an object; IOU(pred, truth) is the intersection-over-union between the predicted bounding box and the ground-truth bounding box. The confidence score reflects whether the box contains a target and, if it does, how accurate the predicted location is. The confidence threshold is set to 0.5: a predicted bounding box with confidence below 0.5 is deleted, and one with confidence above 0.5 is retained.
(4.2): adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on a large-scale feature layer;
the YOLOv3 network adopts a large amount of convolution every time a downsampling is carried out, and according to a receptive field calculation formula, as the number of layers of the network increases, the receptive field increases, and the extracted features are formed by fusing more information, namely, the deeper the network, the more the network pays attention to the global information. The proportion of the pedestrians in the picture is small, the method belongs to small-size object detection, in the deep characteristic diagram, the influence of the information of the small-size object on the characteristic diagram is small, and the information loss of the small-size object is serious. Therefore, a more detailed feature extraction layer is added, on the basis of reserving the original output layer of YOLOv3, the output feature map is subjected to upsampling to obtain a size feature map, the size feature map is combined with the shallow size convolution layer, and the prediction output is carried out after a plurality of convolution layers to obtain a model YOLO-Z;
(4.3): multi-scale fusion prediction for pedestrians is then performed with an FPN-like network; since the YOLOv3 algorithm treats target detection as a regression problem, a mean-square-error loss function is adopted;
the mean square error loss function (loss function) used for class prediction is formulated as
Figure BDA0002468833560000073
Wherein: s2Representing the mesh size of the final characteristic diagram of the network, B representing the number of prediction boxes of each mesh, x, y, w, h representing the center and width and height of the boxes, CiRepresenting the confidence that the prediction box is positioned to the pedestrian,
Figure BDA0002468833560000081
confidence, P, that a pedestrian is actually present within the framei(c) The confidence level of the predicted pedestrian is represented,
Figure BDA0002468833560000082
the confidence of the pedestrian really exists;
Figure BDA0002468833560000083
judging whether the jth bounding box in the ith grid is responsible for the object or not and judging the largest bounding box of the IOU of the group _ route _ box of the object;
Figure BDA0002468833560000084
a bounding box representing the IOU maximum; lambda [ alpha ]noobjRepresents a weight of the classification error;
Figure BDA0002468833560000085
judging whether the center of the object falls into the grid i or not, wherein the grid contains the center of the object and is responsible for predicting the class probability of the object;
step 5: inputting the training set into the YOLO-Z network for training across multiple environments, and then saving its weight file;
based on the improved YOLO-Z network, the convolution layer is added, more detailed feature extraction is obtained, and a small target is detected in a shallow layer to obtain a pedestrian detection model under an orchard. The method comprises the steps of obtaining the width and the height of candidate frames by using the prior knowledge of a data set and a K-means clustering algorithm, analyzing the influence of different candidate frame numbers on the model performance, obtaining a model with optimal performance under limited computing resources, and adjusting and optimizing training parameters in order to improve the positioning accuracy of the model.
step 6: a Kalman filtering algorithm is introduced and correspondingly improved to increase the robustness of the model, address missed detections, and increase detection speed, specifically as follows:
the Kalman filtering algorithm outputs an optimal recursion algorithm, and the tracking process is mainly divided into two steps: and (4) predicting and updating. After a state space model and an observation equation are established for the system, the filter can obtain a predicted value of the state variable at the current moment according to the noise of the system and the state variable at the previous moment, and the state variable is updated by combining the observation value at the current moment to finally realize the state of prediction estimation.
The state-space model and the observation equation, which are the basis of the Kalman filter's iterative tracking, are as follows:
X_i = A_{i|i−1} X_{i−1} + w_{i−1}
Z_i = H X_i + v_i
where X_i and X_{i−1} are the system states at times i and i−1; A_{i|i−1} is the state-transition matrix, which depends on the system's state variables and the target's motion pattern; Z_i is the observation of the system at time i; H is the observation matrix, relating the system state to the observations; w_{i−1} is the system noise and v_i is the measurement noise, both following normal distributions with covariances Q and R respectively.

As shown in fig. 4, the orchard-environment pedestrian detection method based on improved YOLOv3 starts from YOLOv3 and, addressing detection difficulties in the orchard environment such as illumination and occlusion, proposes the YOLO-Z network through improvements to the training samples and the network structure, and improves the K-means clustering algorithm and the Kalman filtering algorithm. This raises the accuracy and recall of pedestrian detection, meets the requirement of real-time detection, reduces the network model's hardware requirements, and facilitates pedestrian detection by intelligent agricultural machinery in orchards.
In conclusion, the invention discloses a pedestrian detection method for orchard environments based on improved YOLOv3. The method comprises the following steps: S1, acquiring images in an orchard environment, preprocessing the images, and producing an orchard pedestrian sample set; S2, generating anchor boxes with a K-means clustering algorithm to compute pedestrian candidate boxes; S3, adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on a large-scale feature layer to obtain the improved network model YOLO-Z; S4, inputting the training set into the YOLO-Z network for training across multiple environments, and then saving the weight file; S5, introducing a Kalman filtering algorithm, with corresponding improvements, to improve the robustness of the model, address missed detections, and increase detection speed. The invention addresses the low speed and low accuracy of real-time pedestrian detection in orchard environments, realizes multi-task training, and ensures both the detection speed and the detection precision of pedestrians in an orchard environment.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (4)

1. An orchard environment pedestrian detection method based on improved YOLOv3 is characterized by comprising the following steps:
step 1: acquiring images of pedestrians in an orchard environment;
images of pedestrians are collected with a depth camera at various positions in the orchard, covering different occlusion conditions, different weather conditions, and pedestrians at different distances, including short, medium, and long range;
step 2: preprocessing the image acquired in the step 1, and constructing a standard pedestrian detection data set;
step 3: processing the pedestrian detection data set from step 2 to make a training set, feeding the training set into a convolutional feature extractor for pedestrian feature extraction, generating anchor boxes with a K-means clustering algorithm to produce predicted pedestrian bounding-box augmentation data, and performing multi-scale fusion prediction with an FPN-like network to improve the accuracy of bounding-box and class prediction;
step 4: a finer feature extraction layer is added to the YOLOv3 network, and a detection output is added on a large-scale feature layer, obtaining the improved network model YOLO-Z;
step 5: inputting the training set into the YOLO-Z network for training across multiple environments, and then saving its weight file;
step 6: an improved Kalman filtering algorithm is introduced to increase the robustness of the model, address missed detections, and increase detection speed.
2. The orchard-environment pedestrian detection method based on improved YOLOv3 according to claim 1, characterized in that the pedestrian bounding-box augmentation data is generated by producing anchor boxes with a K-means clustering algorithm, with the following specific steps:
step 3.1: randomly selecting the width and the height of a coordinate frame as a first clustering center;
step 3.2: the n-th cluster center is selected so that a box with a larger similarity distance from the current n−1 cluster centers has a larger probability of being selected;
step 3.3: looping step 3.2 until all initial cluster centers are determined;
step 3.4: computing IoU (Intersection over Union) between each remaining coordinate box and the cluster centers one by one to obtain the similarity distance (the IoU loss) between the two boxes, and assigning each coordinate box to the class of the cluster center with the smallest similarity distance;
step 3.5: after all the coordinate frames are traversed, calculating the mean values of the width and the height of the coordinate frames in each class to be used as the clustering center of next iteration;
step 3.6: repeating step 3.4 and step 3.5 until the total IoU loss changes by less than the threshold between adjacent iterations or the maximum number of iterations is reached, then stopping the clustering algorithm.
3. The orchard-environment pedestrian detection method based on improved YOLOv3 according to claim 1, characterized in that step 4 is specifically as follows:
step 4.1: first, the training-set images obtained in step 2 are resized to 608 × 608, the IOU (Intersection over Union) threshold is set to 0.45, and the confidence threshold is set to 0.5; each grid cell predicts B bounding boxes, each containing 1 confidence score value, 4 coordinate values, and C class probabilities, where B is the number of anchor boxes of the output feature layer the grid cell belongs to; then, for an S × S output feature layer, the final output dimension is S × S × B × (5 + C);
the formula used for clustering is
d(box, centroid) = 1 − IOU(box, centroid)
where box is a prior box, centroid is a cluster center, and IOU(box, centroid) is the intersection-over-union of the two areas; when d(box, centroid) is less than or equal to the measurement threshold, the width and height of the anchor box are determined;
the bounding-box prediction formulas are
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where c_x and c_y are the horizontal and vertical offsets of the grid cell from the top-left corner of the image; p_w and p_h are the width and height of the anchor box before prediction; t_x and t_y are the predicted center parameters; σ(t_x) and σ(t_y) are the horizontal and vertical distances of the predicted box center from the top-left corner of its grid cell; b_x and b_y are the abscissa and ordinate of the predicted box center; and b_w and b_h are the width and height of the predicted bounding box;
the confidence of a predicted bounding box is given by
Confidence = Pr(object) × IOU(pred, truth)
where Pr(object) is 0 or 1: 0 means there is no object in the image and 1 means there is an object; IOU(pred, truth) is the intersection-over-union between the predicted bounding box and the ground-truth bounding box; the confidence score reflects whether the box contains a target and, if it does, how accurate the predicted location is; the confidence threshold is set to 0.5: a predicted bounding box with confidence below 0.5 is deleted, and one with confidence above 0.5 is retained;
step 4.2: adding a finer feature extraction layer to the YOLOv3 network and adding a detection output on a large-scale feature layer;
the YOLOv3 network adopts a large amount of convolution every time a downsampling is carried out, and according to a receptive field calculation formula, the receptive field is increased along with the increase of the number of layers of the network, the extracted characteristics are formed by fusion of more information, namely the deeper the network is, the more the network focuses on global information, the smaller the proportion of pedestrians in pictures, the detection belongs to small-size objects, in a deep characteristic diagram, the influence of the information of the small-size objects on the characteristic diagram is smaller, and the information loss of the small-size objects is serious; therefore, a more detailed feature extraction layer is added, on the basis of reserving the original output layer of YOLOv3, the output feature map is subjected to upsampling to obtain a size feature map, the size feature map is combined with the shallow size convolution layer, and the prediction output is carried out after a plurality of convolution layers to obtain a model YOLO-Z;
step 4.3: multi-scale fusion prediction for pedestrians is then performed with an FPN-like network; since the YOLOv3 algorithm treats target detection as a regression problem, a mean-square-error loss function is adopted;
the mean-square-error loss function used for prediction is expressed by

loss = λ_coord Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λ_coord Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     + Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{obj} (C_i − Ĉ_i)²
     + λ_noobj Σ_{i=0..S²} Σ_{j=0..B} 1_{ij}^{noobj} (C_i − Ĉ_i)²
     + Σ_{i=0..S²} 1_i^{obj} Σ_{c∈classes} (P_i(c) − P̂_i(c))²

where S² is the grid size of the network's final feature map; B is the number of prediction boxes per grid cell; x, y, w, h are the center coordinates and the width and height of a box; C_i is the confidence that a prediction box has located a pedestrian, and Ĉ_i is the confidence that a pedestrian is actually present in the box; P_i(c) is the predicted pedestrian class confidence, and P̂_i(c) is the true class confidence; 1_{ij}^{obj} indicates whether the j-th bounding box in grid cell i is responsible for the object, i.e. whether it is the bounding box with the largest IOU with the ground-truth box (ground_truth_box), and 1_{ij}^{noobj} marks the remaining boxes; λ_coord is the weight coefficient for the bounding-box coordinate prediction error; λ_noobj is the weight of the no-object confidence error; 1_i^{obj} indicates whether the center of an object falls in grid cell i, in which case that cell is responsible for predicting the object's class probabilities.
4. The orchard-environment pedestrian detection method based on improved YOLOv3 according to claim 1, characterized in that step 6 is specifically as follows:
the improved Kalman filtering algorithm outputs an optimal recursion algorithm, and the tracking process is mainly divided into two steps: and (4) predicting and updating. After a state space model and an observation equation are established for the system, the filter can obtain a predicted value of the state variable at the current moment according to the noise of the system and the state variable at the previous moment, and the state variable is updated by combining the observation value at the current moment to finally realize the state of prediction estimation;
the state-space model and the observation equation, which are the basis of the Kalman filter's iterative tracking, are as follows:
X_i = A_{i|i−1} X_{i−1} + w_{i−1}
Z_i = H X_i + v_i
where X_i and X_{i−1} are the system states at times i and i−1; A_{i|i−1} is the state-transition matrix, which depends on the system's state variables and the target's motion pattern; Z_i is the observation of the system at time i; H is the observation matrix, relating the system state to the observations; w_{i−1} is the system noise and v_i is the measurement noise, both following normal distributions with covariances Q and R respectively.
CN202010341941.7A 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment Active CN111626128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010341941.7A CN111626128B (en) 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010341941.7A CN111626128B (en) 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment

Publications (2)

Publication Number Publication Date
CN111626128A true CN111626128A (en) 2020-09-04
CN111626128B CN111626128B (en) 2023-07-21

Family

ID=72260566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010341941.7A Active CN111626128B (en) 2020-04-27 2020-04-27 Pedestrian detection method based on improved YOLOv3 in orchard environment

Country Status (1)

Country Link
CN (1) CN111626128B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307955A (en) * 2020-10-29 2021-02-02 广西科技大学 Optimization method based on SSD infrared image pedestrian detection
CN112329697A (en) * 2020-11-18 2021-02-05 广西师范大学 Improved YOLOv3-based on-tree fruit identification method
CN112347938A (en) * 2020-11-09 2021-02-09 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112381021A (en) * 2020-11-20 2021-02-19 安徽一视科技有限公司 Personnel detection counting method based on deep learning
CN112381043A (en) * 2020-11-27 2021-02-19 华南理工大学 Flag detection method
CN112541483A (en) * 2020-12-25 2021-03-23 三峡大学 Dense face detection method combining YOLO and blocking-fusion strategy
CN112668662A (en) * 2020-12-31 2021-04-16 北京理工大学 Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112766188A (en) * 2021-01-25 2021-05-07 浙江科技学院 Small-target pedestrian detection method based on improved YOLO algorithm
CN112911171A (en) * 2021-02-04 2021-06-04 上海航天控制技术研究所 Intelligent photoelectric information processing system and method based on accelerated processing
CN113111703A (en) * 2021-03-02 2021-07-13 郑州大学 Airport pavement disease foreign matter detection method based on fusion of multiple convolutional neural networks
CN113139481A (en) * 2021-04-28 2021-07-20 广州大学 Classroom people counting method based on yolov3
CN113378753A (en) * 2021-06-23 2021-09-10 华南农业大学 Improved YOLOv4-based boundary target identification method for rice field in seedling stage
CN113486764A (en) * 2021-06-30 2021-10-08 中南大学 Pothole detection method based on improved YOLOv3
CN113822169A (en) * 2021-08-30 2021-12-21 江苏大学 Orchard tree pedestrian detection method based on improved PP-YOLO

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110070074A (en) * 2019-05-07 2019-07-30 安徽工业大学 A method of building pedestrian detection model

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307955A (en) * 2020-10-29 2021-02-02 广西科技大学 Optimization method based on SSD infrared image pedestrian detection
CN112347938A (en) * 2020-11-09 2021-02-09 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112347938B (en) * 2020-11-09 2023-09-26 南京机电职业技术学院 People stream detection method based on improved YOLOv3
CN112329697A (en) * 2020-11-18 2021-02-05 广西师范大学 Improved YOLOv3-based on-tree fruit identification method
CN112329697B (en) * 2020-11-18 2022-04-12 广西师范大学 Improved YOLOv3-based on-tree fruit identification method
CN112381021A (en) * 2020-11-20 2021-02-19 安徽一视科技有限公司 Personnel detection counting method based on deep learning
CN112381021B (en) * 2020-11-20 2022-07-12 安徽一视科技有限公司 Personnel detection counting method based on deep learning
CN112381043A (en) * 2020-11-27 2021-02-19 华南理工大学 Flag detection method
CN112541483A (en) * 2020-12-25 2021-03-23 三峡大学 Dense face detection method combining YOLO and blocking-fusion strategy
CN112668662A (en) * 2020-12-31 2021-04-16 北京理工大学 Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112668662B (en) * 2020-12-31 2022-12-06 北京理工大学 Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN112766188A (en) * 2021-01-25 2021-05-07 浙江科技学院 Small-target pedestrian detection method based on improved YOLO algorithm
CN112911171A (en) * 2021-02-04 2021-06-04 上海航天控制技术研究所 Intelligent photoelectric information processing system and method based on accelerated processing
CN113111703A (en) * 2021-03-02 2021-07-13 郑州大学 Airport pavement disease foreign matter detection method based on fusion of multiple convolutional neural networks
CN113139481A (en) * 2021-04-28 2021-07-20 广州大学 Classroom people counting method based on yolov3
CN113139481B (en) * 2021-04-28 2023-09-01 广州大学 Classroom people counting method based on yolov3
CN113378753A (en) * 2021-06-23 2021-09-10 华南农业大学 Improved YOLOv4-based boundary target identification method for rice field in seedling stage
CN113486764B (en) * 2021-06-30 2022-05-03 中南大学 Pothole detection method based on improved YOLOv3
CN113486764A (en) * 2021-06-30 2021-10-08 中南大学 Pothole detection method based on improved YOLOv3
CN113822169A (en) * 2021-08-30 2021-12-21 江苏大学 Orchard tree pedestrian detection method based on improved PP-YOLO
CN113822169B (en) * 2021-08-30 2024-03-19 江苏大学 Orchard tree pedestrian detection method based on improved PP-YOLO

Also Published As

Publication number Publication date
CN111626128B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111626128A (en) Improved YOLOv3-based pedestrian detection method in orchard environment
CN109559320B (en) Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN110135243B (en) Pedestrian detection method and system based on two-stage attention mechanism
CN110991311B (en) Target detection method based on dense connection deep network
CN110222769B (en) Improved target detection method based on YOLOV3-tiny
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN109508675B (en) Pedestrian detection method for complex scene
CN109145836B (en) Ship target video detection method based on deep learning network and Kalman filtering
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN110991444B (en) License plate recognition method and device for complex scene
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN109801297B (en) Image panorama segmentation prediction optimization method based on convolution
CN112884742A (en) Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN109242019A (en) A kind of water surface optics Small object quickly detects and tracking
CN108537825B (en) Target tracking method based on transfer learning regression network
CN113688797A (en) Abnormal behavior identification method and system based on skeleton extraction
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
CN115565130A (en) Unattended system and monitoring method based on optical flow
CN117058235A (en) Visual positioning method crossing various indoor scenes
Ouyang et al. Aerial target detection based on the improved YOLOv3 algorithm
CN111291785A (en) Target detection method, device, equipment and storage medium
CN113793472B (en) Image type fire detector pose estimation method based on feature depth aggregation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant