CN115482489A - Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system - Google Patents

Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system

Info

Publication number
CN115482489A
Authority
CN
China
Prior art keywords
layer
pedestrian
pedestrian detection
image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211141822.2A
Other languages
Chinese (zh)
Inventor
王增煜
陈申宇
陈泽涛
刘秦铭
张攀
黄海波
马灿桂
陈志健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd filed Critical Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202211141822.2A priority Critical patent/CN115482489A/en
Publication of CN115482489A publication Critical patent/CN115482489A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/18Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras

Abstract

The invention discloses a pedestrian detection and trajectory tracking method and system for a power distribution room based on improved YOLOv3, wherein the method comprises the following steps: S1, video frame selection and format conversion: a reasonable frame-selection interval is designed, and each captured single-frame picture is converted into a JPG-format picture that the model can process; S2, image preprocessing and pedestrian detection: the format-converted picture is preprocessed and input into a pedestrian detection model to judge whether a pedestrian is detected; if a pedestrian is detected, step S3 is performed; if not, the method ends; S3, image segmentation, a preprocessing operation performed before image recognition; S4, trajectory tracking, recognition and result acquisition: whether the alarm condition is met is judged from the trajectory tracking result, and if so, an alarm is raised. By using the improved YOLOv3 model as the detector of Deep SORT, the invention solves the inaccurate detection and tracking of traditional models and realizes effective monitoring of pedestrians in the power distribution room.

Description

Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system
Technical Field
The invention belongs to the technical field of pedestrian detection, and particularly relates to a power distribution room pedestrian detection and trajectory tracking method and system based on improved YOLOv3.
Background
As the workload of power distribution room operation and construction grows year by year, power enterprises have formulated thorough safety risk management systems; nevertheless, the distribution room has numerous, widely distributed work sites and a complex working environment, and safety risks cannot be comprehensively prevented by management regulations, on-site work supervisors and safety inspectors performing their duties alone. Technical defense is urgently needed to remedy the shortcomings of human defense: with effective technical means, an on-site safety prevention and control system can ensure that key risk points are effectively covered, issue high-risk early warnings to operation and construction workers in real time, and achieve the ultimate goal of reducing safety accidents.
Traditional pedestrian recognition models detect and track inaccurately, making it difficult to realize effective monitoring of pedestrians in the power distribution room.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provides a power distribution room pedestrian detection and trajectory tracking method and system based on improved YOLOv3.
In order to achieve the purpose, the invention adopts the following technical scheme:
a pedestrian detection and trajectory tracking method for a power distribution room based on improved YOLOv3 comprises the following steps:
S1, video frame selection and format conversion: a reasonable frame-selection interval is designed based on comprehensive consideration of the scene, requirements and performance, and each captured single-frame picture is converted into a JPG-format picture that the model can process;
S2, image preprocessing and pedestrian detection: the format-converted picture is preprocessed and input into a pedestrian detection model to judge whether a pedestrian is detected; if a pedestrian is detected, step S3 is performed; if no pedestrian is detected, the method ends;
S3, image segmentation, a preprocessing operation performed before image recognition;
S4, trajectory tracking, recognition and result acquisition: whether the alarm condition is met is judged from the trajectory tracking result, and if so, an alarm is raised.
Further, the image preprocessing specifically includes image enhancement, sharpening, smoothing, denoising, gray-scale adjustment and image cropping.
Further, the pedestrian detection model is specifically an improved YOLOv3 model; the base YOLOv3 model comprises the feature extraction network Darknet-53 and YOLO multi-scale prediction layers;
the YOLOv3 model takes an input image of size 416 × 416 × 3. The feature extraction network downsamples it 5 times and outputs the resulting feature maps to the YOLO multi-scale prediction layers, where a concat mechanism expands the tensor dimensions and connects upsampled maps with shallow feature maps. Feature maps of sizes 13 × 13, 26 × 26 and 52 × 52 are output, each predicted by a corresponding grid; 3 prediction boxes at each grid point are responsible for predicting one area, and as long as the center of an object falls in that area, the object is predicted by that grid point.
Further, the YOLOv3 model takes a picture of size 416 × 416 × 3. The picture is first convolved, changing the channels to 32; one residual convolution changes the shape to 208 × 208 × 64; two more residual convolutions change it to 104 × 104 × 128; eight more change it to 52 × 52 × 256, and this layer is output as the first feature layer;
eight more residual convolutions change the shape to 26 × 26 × 512, and this layer is output as the second feature layer;
four more residual convolutions change the shape to 13 × 13 × 1024, and this layer is output as the third feature layer. The third feature layer is convolved 5 times; one path is upsampled and combined with the second feature layer, while the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 13 × 13 × B. After combination with the second feature layer, one path is upsampled and combined with the first feature layer, and the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 26 × 26 × B. After combination with the first feature layer, 3 × 3 and 1 × 1 convolutions give an output of shape 52 × 52 × B, where B is the number of predicted classes + 1 + 4.
Further, the YOLOv3 model scales the input image to 416 × 416 for training, then uniformly divides it into S × S grids and predicts bounding boxes in each grid for target detection; each prediction outputs the position and class of the bounding box of each type of target, and the confidence of each bounding box is computed. If the center point of an object falls on a certain grid cell, that cell is responsible for predicting the object, and three anchor boxes are generated for it;
with the three anchor boxes, each grid cell predicts three bounding boxes through dimension clustering and logistic regression. The cell responsible for an object must predict 5 values: its own position and the probability that it contains an object. The position requires 4 values, namely the center-point coordinates and the width and height of the prediction box, denoted t_x, t_y, t_w and t_h respectively, where the latter two enter the decoded box through an exponential term;
if a grid cell is offset from the top-left corner of the image by (c_x, c_y), and the anchor box has width p_w and height p_h, the corrected bounding box is computed as:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where σ(·) denotes the sigmoid function.
Further, the YOLOv3 model includes losses in three aspects, namely the prediction-box loss, the confidence loss and the class loss. The specific loss function is as follows:
L_loc = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)² ]
L_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (C_i - Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (C_i - Ĉ_i)²
L_cls = Σ_{i=0}^{S²} 1_{i}^{obj} Σ_c (p_i(c) - p̂_i(c))²
where L_loc is the prediction-box loss; λ_coord is the weight coefficient; 1_{ij}^{obj} indicates whether the j-th sliding window of cell i contains the detection target; x_i, y_i, w_i and h_i are the predicted center-point coordinates, width and height of the sliding window of cell i, and x̂_i, ŷ_i, ŵ_i and ĥ_i are the corresponding true values; L_conf is the confidence loss, expressing the overlap between the sliding window and the true object region; λ_noobj is the penalty weight for windows containing no object; C_i is the predicted confidence and Ĉ_i the corresponding true value; L_cls is the class loss; p_i(c) is the predicted conditional probability that cell i contains an object of class c, and p̂_i(c) is the corresponding true probability.
Further, the improved YOLOv3 model is specifically an improved YOLOv3 multi-scale detection network formed by replacing Darknet-53 with the novel Wide-Darknet-33 feature extraction network and adding a 104 × 104 detection layer.
Furthermore, Wide-Darknet-33 comprises 13 residual blocks, 32 convolutional layers and 1 fully connected layer; the depth is reduced by removing convolutional layers from Darknet-53 while the network is widened, making feature extraction more accurate in the width dimension;
for multi-scale detection, to reduce the miss rate of small head-and-shoulder targets against complex backgrounds, the two groups of 1 × 1 and 3 × 3 convolutions in front of each of the three YOLO layers of the YOLOv3 model are removed.
Further, when obtaining the prior boxes, clustering is performed with the K-means++ algorithm, specifically:
the initial K cluster centers are selected by the roulette-wheel method; with Q samples in total to be clustered into K classes, the clustering process is:
step one, randomly pick one point in the data set as the first cluster center;
step two, compute the distance D(x) from each point x to its nearest existing center, and sum these distances to obtain Sum(D(x));
step three, normalize each distance as D(x)/Sum(D(x)); take a random value Random from [0, 1], then repeatedly subtract, Random -= D(x)/Sum(D(x)), over the points in turn until Random ≤ 0; the point at which this occurs is the next cluster center;
step four, repeat step two and step three until K cluster centers have been selected;
and step five, run the K-means algorithm with these K initial cluster centers.
The invention also comprises a power distribution room pedestrian detection and trajectory tracking system which adopts the method of the invention and comprises a video frame selection module, an image segmentation and preprocessing module, a pedestrian detection module and a trajectory tracking and recognition module;
the video frame selection module is used for setting a frame-selection interval, based on comprehensive consideration of the scene, requirements and performance, for selecting frames from the video, and for converting each captured single-frame picture into a JPG-format picture that the model can process;
the image segmentation and preprocessing module is used for preprocessing the format-converted picture, and is also used for performing image segmentation processing;
the pedestrian detection module is used for detecting pedestrians in the input preprocessed image based on the pedestrian detection model;
and the trajectory tracking and recognition module is used for performing trajectory tracking and recognition on a pedestrian when the pedestrian detection module detects one, obtaining a result, judging whether the alarm condition is met according to the trajectory tracking result, and raising an alarm if it is.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Using the improved YOLOv3 algorithm model as the detector of Deep SORT solves the inaccurate detection and tracking of traditional models and realizes effective monitoring of pedestrians in the power distribution room. The system offers real-time detection, accurate alarming, automatic pushing of alarm information and other advantages: through video monitoring it achieves 24 × 7 all-weather, omission-free real-time detection in the power distribution room area, and intrusion alarm information for the surrounding area is automatically and intelligently pushed to the attendant on duty.
Drawings
FIG. 1 is a general flow diagram of the process of the present invention;
FIG. 2 is a diagram of the structure of YOLOv3;
FIG. 3a is a schematic diagram of the bounding boxes predicted by YOLOv3 in a single grid cell;
FIG. 3b is a schematic diagram of the bounding boxes predicted by YOLOv3 in a single grid cell;
FIG. 4 is a diagram illustrating the relative position of the modified bounding box;
FIG. 5 is a block diagram of YOLOv3 and improved YOLOv3;
FIG. 6 is a flow chart of the K-means clustering algorithm.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 1, the method for detecting pedestrians and tracking tracks in a power distribution room based on improved YOLOv3 of the present invention includes the following steps:
S1, video frame selection and format conversion: a reasonable frame-selection interval is designed based on comprehensive consideration of the scene, requirements and performance, and each captured single-frame picture is converted into a JPG-format picture that the model can process (a minimal sketch of this step is given after these steps);
S2, image preprocessing and pedestrian detection: the format-converted picture is preprocessed and input into the pedestrian detection model to judge whether a pedestrian is detected; the image preprocessing specifically includes image enhancement, sharpening, smoothing, denoising, gray-scale adjustment, image cropping and other processing;
S3, preprocessing of the picture before recognition, according to the pedestrian detection result;
S4, trajectory tracking, recognition and result acquisition: whether the alarm condition is met is judged from the trajectory tracking result, and if so, an alarm is raised.
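As an illustration of step S1, the following is a minimal sketch of frame selection and JPG conversion using OpenCV; the frame interval and output directory are assumptions chosen for the example, not values fixed by the method.

```python
import os
import cv2  # OpenCV: video decoding and JPG encoding

def select_frames(video_path, out_dir, frame_interval=25):
    """Grab every frame_interval-th frame and save it as a JPG picture."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        if idx % frame_interval == 0:
            # imwrite infers JPG encoding from the .jpg extension
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

At 25 frames per second, frame_interval=25 keeps roughly one picture per second; in practice the interval would be tuned to the scene, requirements and performance as described above.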
In this embodiment, the pedestrian detection model is specifically an improved YOLOv3 model; the base YOLOv3 model includes the feature extraction network Darknet-53 and YOLO multi-scale prediction layers;
the YOLOv3 model takes an input image of size 416 × 416 × 3. The feature extraction network downsamples it 5 times and outputs the resulting feature maps to the YOLO multi-scale prediction layers, where a concat mechanism expands the tensor dimensions and connects upsampled maps with shallow feature maps. Feature maps of sizes 13 × 13, 26 × 26 and 52 × 52 are output, each predicted by a corresponding grid; 3 prediction boxes at each grid point are responsible for predicting one area, and as long as the center of an object falls in that area, the object is predicted by that grid point (a small sketch of this rule follows). This multi-scale approach allows small objects to be detected better. The YOLOv3 model network structure is shown in FIG. 2.
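The grid-responsibility rule can be written down directly. The helper below is an illustrative assumption (not part of the patented method) mapping an object center in pixels to the grid cell that predicts it at one of the three output scales.

```python
def responsible_cell(center_x, center_y, img_size=416, grid_size=13):
    """Return (row, col) of the grid cell containing the object center,
    for one of the 13 x 13, 26 x 26 or 52 x 52 output scales."""
    stride = img_size / grid_size  # 32, 16 or 8 pixels per cell
    return int(center_y // stride), int(center_x // stride)

# An object centered at pixel (100, 200) on the 13 x 13 scale:
print(responsible_cell(100, 200))  # -> (6, 3)
```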
The YOLOv3 model takes a picture of size 416 × 416 × 3. The picture is first convolved, changing the channels to 32; one residual convolution changes the shape to 208 × 208 × 64; two more residual convolutions change it to 104 × 104 × 128; eight more change it to 52 × 52 × 256, and this layer is output as the first feature layer;
eight more residual convolutions change the shape to 26 × 26 × 512, and this layer is output as the second feature layer;
four more residual convolutions change the shape to 13 × 13 × 1024, and this layer is output as the third feature layer. The third feature layer is convolved 5 times; one path is upsampled and combined with the second feature layer, while the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 13 × 13 × B. After combination with the second feature layer, one path is upsampled and combined with the first feature layer, and the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 26 × 26 × B. After combination with the first feature layer, 3 × 3 and 1 × 1 convolutions give an output of shape 52 × 52 × B, where B is the number of predicted classes + 1 + 4. A compact sketch reproducing this shape progression is given below.
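The shape progression above can be checked with a compact backbone sketch. This is a minimal PyTorch reconstruction of the standard Darknet-53 stages, assuming the usual Conv-BN-LeakyReLU unit; it reproduces the three feature-layer shapes and is not the patent's exact implementation.

```python
import torch
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, k, stride=1):
    # Basic Darknet unit: convolution + batch norm + leaky ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )

class Residual(nn.Module):
    # 1x1 bottleneck then 3x3 convolution, with a skip connection
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_leaky(ch, ch // 2, 1),
            conv_bn_leaky(ch // 2, ch, 3),
        )

    def forward(self, x):
        return x + self.body(x)

def stage(in_ch, out_ch, n_blocks):
    # Stride-2 downsampling convolution followed by n residual blocks
    return nn.Sequential(conv_bn_leaky(in_ch, out_ch, 3, stride=2),
                         *[Residual(out_ch) for _ in range(n_blocks)])

class Darknet53Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = conv_bn_leaky(3, 32, 3)  # 416 x 416 x 32
        self.s1 = stage(32, 64, 1)           # 208 x 208 x 64
        self.s2 = stage(64, 128, 2)          # 104 x 104 x 128
        self.s3 = stage(128, 256, 8)         # 52 x 52 x 256  -> first feature layer
        self.s4 = stage(256, 512, 8)         # 26 x 26 x 512  -> second feature layer
        self.s5 = stage(512, 1024, 4)        # 13 x 13 x 1024 -> third feature layer

    def forward(self, x):
        x = self.s2(self.s1(self.stem(x)))
        f1 = self.s3(x)
        f2 = self.s4(f1)
        f3 = self.s5(f2)
        return f1, f2, f3

for f in Darknet53Backbone()(torch.zeros(1, 3, 416, 416)):
    print(tuple(f.shape))  # (1, 256, 52, 52), (1, 512, 26, 26), (1, 1024, 13, 13)
```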
The YOLOv3 model scales the input image to 416 × 416 for training, then uniformly divides it into S × S grids and predicts bounding boxes in each grid for target detection; each prediction outputs the position and class of the bounding box of each type of target, and the confidence of each bounding box is computed. If the center point of an object falls on a certain grid cell, that cell is responsible for predicting the object, and three anchor boxes are generated for it, as shown in FIGS. 3a and 3b.
With the three anchor boxes, each grid cell predicts three bounding boxes through dimension clustering and logistic regression. The cell responsible for an object must predict 5 values: its own position and the probability that it contains an object. The position requires 4 values, namely the center-point coordinates and the width and height of the prediction box, denoted t_x, t_y, t_w and t_h respectively, where the latter two enter the decoded box through an exponential term.
If a grid cell is offset from the top-left corner of the image by (c_x, c_y), and the anchor box has width p_w and height p_h, the corrected bounding box is computed as:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where σ(·) denotes the sigmoid function.
FIG. 4 is a schematic diagram showing the relative position of the corrected bounding box.
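The correction formulas transcribe directly into code. The following NumPy sketch is a hedged illustration; the function names and example values are assumptions.

```python
import numpy as np

def sigmoid(t):
    # Keeps the predicted center offset inside its grid cell
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Apply the corrections: sigmoid offset from the cell corner for the
    center, exponential scaling of the anchor for width and height."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh

# Raw outputs (0.2, -0.1, 0.3, 0.5) in the cell at offset (6, 3), with an
# anchor 3.6 wide and 5.2 high (all in grid units):
print(decode_box(0.2, -0.1, 0.3, 0.5, 6, 3, 3.6, 5.2))
```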
The YOLOv3 model includes losses in three aspects, namely the prediction-box loss, the confidence loss and the class loss. The specific loss function is as follows:
L_loc = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)² ]
L_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (C_i - Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (C_i - Ĉ_i)²
L_cls = Σ_{i=0}^{S²} 1_{i}^{obj} Σ_c (p_i(c) - p̂_i(c))²
where L_loc is the prediction-box loss; λ_coord is the weight coefficient; 1_{ij}^{obj} indicates whether the j-th sliding window of cell i contains the detection target; x_i, y_i, w_i and h_i are the predicted center-point coordinates, width and height of the sliding window of cell i, and x̂_i, ŷ_i, ŵ_i and ĥ_i are the corresponding true values; L_conf is the confidence loss, expressing the overlap between the sliding window and the true object region; λ_noobj is the penalty weight for windows containing no object; C_i is the predicted confidence and Ĉ_i the corresponding true value; L_cls is the class loss; p_i(c) is the predicted conditional probability that cell i contains an object of class c, and p̂_i(c) is the corresponding true probability.
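Read as code, the three terms sum into one scalar. The following PyTorch sketch assumes a (S, S, B, 5 + C) tensor layout and the conventional weights λ_coord = 5 and λ_noobj = 0.5; neither the layout nor the weights are specified here, so both are assumptions.

```python
import torch

def yolo_loss(pred, truth, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum of box, confidence and class losses over an S x S grid with B
    windows per cell. pred and truth have shape (S, S, B, 5 + C) holding
    (x, y, w, h, confidence, class probabilities); obj_mask is a boolean
    (S, S, B) tensor, True where a window is responsible for an object."""
    noobj_mask = ~obj_mask
    # L_loc: squared error on box position and size, responsible windows only
    box_err = (pred[..., :4] - truth[..., :4]) ** 2
    l_loc = lambda_coord * box_err[obj_mask].sum()
    # L_conf: confidence error, with down-weighted no-object windows
    conf_err = (pred[..., 4] - truth[..., 4]) ** 2
    l_conf = conf_err[obj_mask].sum() + lambda_noobj * conf_err[noobj_mask].sum()
    # L_cls: squared error on class probabilities for object windows
    cls_err = (pred[..., 5:] - truth[..., 5:]) ** 2
    l_cls = cls_err[obj_mask].sum()
    return l_loc + l_conf + l_cls
```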
As shown in FIG. 5, the improved YOLOv3 model is specifically an improved YOLOv3 multi-scale detection network formed by replacing Darknet-53 with the novel Wide-Darknet-33 feature extraction network and adding a 104 × 104 detection layer.
Wide-Darknet-33 comprises 13 residual blocks, 32 convolutional layers and 1 fully connected layer; the depth is reduced by removing convolutional layers from Darknet-53 while the network is widened, making feature extraction more accurate in the width dimension;
for multi-scale detection, to reduce the miss rate of small head-and-shoulder targets against complex backgrounds, the two groups of 1 × 1 and 3 × 3 convolutions in front of each of the three YOLO layers of the YOLOv3 model are removed.
This embodiment addresses the problems that, when YOLOv3 obtains the prior boxes, the K-means algorithm depends heavily on the initial values, so the clustering is inaccurate, the obtained anchor boxes match the data characteristics poorly, and detection precision is low; therefore, when obtaining the prior boxes, the K-means++ algorithm is used for clustering. FIG. 6 is a flow chart of the K-means clustering algorithm.
Clustering with the K-means++ algorithm specifically comprises the following steps:
the initial K cluster centers are selected by the roulette-wheel method; with Q samples in total to be clustered into K classes, the clustering process is as follows (a code sketch is given after these steps):
step one, randomly pick one point in the data set as the first cluster center;
step two, compute the distance D(x) from each point x to its nearest existing center, and sum these distances to obtain Sum(D(x));
step three, normalize each distance as D(x)/Sum(D(x)); take a random value Random from [0, 1], then repeatedly subtract, Random -= D(x)/Sum(D(x)), over the points in turn until Random ≤ 0; the point at which this occurs is the next cluster center;
step four, repeat step two and step three until K cluster centers have been selected;
and step five, run the K-means algorithm with these K initial cluster centers.
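A minimal sketch of the roulette-wheel initialization described above. The distance function is left generic as an assumption; for anchor-box clustering it is typically 1 - IoU between a box and a center.

```python
import random

def kmeans_pp_init(points, k, dist):
    """Pick k initial cluster centers by roulette-wheel selection."""
    centers = [random.choice(points)]                 # step one
    while len(centers) < k:                           # step four
        # step two: D(x) = distance to the nearest existing center
        d = [min(dist(p, c) for c in centers) for p in points]
        total = sum(d)                                # Sum(D(x))
        r = random.uniform(0, 1)                      # step three
        for p, dx in zip(points, d):
            r -= dx / total                           # Random -= D(x)/Sum(D(x))
            if r <= 0:
                centers.append(p)
                break
        else:
            centers.append(points[-1])                # floating-point guard
    return centers

# Plain 1-D example; existing centers have D(x) = 0 and are never re-picked:
pts = [1.0, 2.0, 10.0, 11.0, 30.0]
print(kmeans_pp_init(pts, 3, dist=lambda a, b: abs(a - b)))
```

The selected centers then seed the standard K-means iterations (step five).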
In another embodiment, a pedestrian detection and trajectory tracking system for a power distribution room is further provided, wherein the system adopts the method of the above embodiment and comprises a video frame selection module, an image segmentation and preprocessing module, a pedestrian detection module and a trajectory tracking identification module;
the video frame selection module is used for setting a certain frame selection interval to select frames of the video according to comprehensive consideration of scenes, requirements and performance, and converting the intercepted single-frame picture into a JPG format picture which can be processed by a model;
the image segmentation and preprocessing module is used for preprocessing the format-converted picture, and is also used for performing image segmentation processing;
the pedestrian detection module is used for detecting pedestrians on the input preprocessed image based on the pedestrian detection model;
and the trajectory tracking and recognition module is used for performing trajectory tracking and recognition on a pedestrian when the pedestrian detection module detects one, obtaining a result, judging whether the alarm condition is met according to the trajectory tracking result, and raising an alarm if it is. A minimal wiring sketch of the four modules is given below.
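The following sketch shows one possible wiring of the four modules over a video stream; every name here (detector, tracker, alarm and their methods) is an illustrative assumption rather than the patent's implementation.

```python
import cv2

def run_pipeline(video_path, detector, tracker, alarm, frame_interval=25):
    """Frame selection -> preprocessing -> detection -> tracking -> alarm."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_interval == 0:            # video frame-selection module
            img = detector.preprocess(frame)     # segmentation & preprocessing module
            boxes = detector.detect(img)         # improved-YOLOv3 pedestrian detection
            if boxes:                            # track only when pedestrians are found
                tracks = tracker.update(boxes)   # Deep SORT trajectory tracking
                if alarm.condition_met(tracks):  # alarm-condition judgment
                    alarm.push(tracks)           # push alarm information to the attendant
        idx += 1
    cap.release()
```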
It should also be noted that in the present specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A power distribution room pedestrian detection and trajectory tracking method based on improved YOLOv3 is characterized by comprising the following steps:
s1, selecting frames of a video, carrying out format conversion, designing a reasonable video frame selection interval according to comprehensive consideration of scenes, requirements and performance, and converting an intercepted single-frame picture into a JPG format picture which can be processed by a model;
s2, image preprocessing and pedestrian detection, wherein the image preprocessing is carried out on the picture after format conversion, and the picture is input into a pedestrian detection model to judge whether a pedestrian is detected; if a pedestrian is detected, performing step S3, and if no pedestrian is detected, ending the method;
s3, image segmentation, namely preprocessing operation before image identification;
and S4, tracking and identifying the track and obtaining a result, judging whether the alarm condition is met or not according to the track tracking result, and if so, alarming.
2. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 1, wherein the image preprocessing specifically comprises image enhancement, sharpening, smoothing, denoising, gray-scale adjustment and image cropping.
3. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 1, wherein the pedestrian detection model is an improved YOLOv3 model, the base YOLOv3 model comprising the feature extraction network Darknet-53 and YOLO multi-scale prediction layers;
the YOLOv3 model takes an input image of size 416 × 416 × 3. The feature extraction network downsamples it 5 times and outputs the resulting feature maps to the YOLO multi-scale prediction layers, where a concat mechanism expands the tensor dimensions and connects upsampled maps with shallow feature maps. Feature maps of sizes 13 × 13, 26 × 26 and 52 × 52 are output, each predicted by a corresponding grid; 3 prediction boxes at each grid point are responsible for predicting one area, and as long as the center of an object falls in that area, the object is predicted by that grid point.
4. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 3, characterized in that the YOLOv3 model takes a picture of size 416 × 416 × 3; the picture is first convolved, changing the channels to 32; one residual convolution changes the shape to 208 × 208 × 64; two more residual convolutions change it to 104 × 104 × 128; eight more change it to 52 × 52 × 256, and this layer is output as the first feature layer;
eight more residual convolutions change the shape to 26 × 26 × 512, and this layer is output as the second feature layer;
four more residual convolutions change the shape to 13 × 13 × 1024, and this layer is output as the third feature layer. The third feature layer is convolved 5 times; one path is upsampled and combined with the second feature layer, while the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 13 × 13 × B. After combination with the second feature layer, one path is upsampled and combined with the first feature layer, and the other path undergoes 3 × 3 and 1 × 1 convolutions to output a result of shape 26 × 26 × B. After combination with the first feature layer, 3 × 3 and 1 × 1 convolutions give an output of shape 52 × 52 × B, where B is the number of predicted classes + 1 + 4.
5. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 1, wherein the YOLOv3 model scales the input image to 416 × 416 for training, then uniformly divides it into S × S grids and predicts bounding boxes in each grid for target detection; each prediction outputs the position and class of the bounding box of each type of target, and the confidence of each bounding box is computed; if the center point of an object falls on a certain grid cell, that cell is responsible for predicting the object, and three anchor boxes are generated for it;
with the three anchor boxes, each grid cell predicts three bounding boxes through dimension clustering and logistic regression; the cell responsible for an object must predict 5 values, namely its own position and the probability that it contains an object; the position requires 4 values, namely the center-point coordinates and the width and height of the prediction box, denoted t_x, t_y, t_w and t_h respectively,
where the latter two enter the decoded box through an exponential term;
if a grid cell is offset from the top-left corner of the image by (c_x, c_y), and the anchor box has width p_w and height p_h, the corrected bounding box is computed as:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where σ(·) denotes the sigmoid function.
6. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 5, wherein the YOLOv3 model includes losses in three aspects, namely the prediction-box loss, the confidence loss and the class loss, and the specific loss function is as follows:
L_loc = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i - x̂_i)² + (y_i - ŷ_i)² + (w_i - ŵ_i)² + (h_i - ĥ_i)² ]
L_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (C_i - Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (C_i - Ĉ_i)²
L_cls = Σ_{i=0}^{S²} 1_{i}^{obj} Σ_c (p_i(c) - p̂_i(c))²
where L_loc is the prediction-box loss; λ_coord is the weight coefficient; 1_{ij}^{obj} indicates whether the j-th sliding window of cell i contains the detection target; x_i, y_i, w_i and h_i are the predicted center-point coordinates, width and height of the sliding window of cell i, and x̂_i, ŷ_i, ŵ_i and ĥ_i are the corresponding true values; L_conf is the confidence loss, expressing the overlap between the sliding window and the true object region; λ_noobj is the penalty weight for windows containing no object; C_i is the predicted confidence and Ĉ_i the corresponding true value; L_cls is the class loss; p_i(c) is the predicted conditional probability that cell i contains an object of class c, and p̂_i(c) is the corresponding true probability.
7. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 3, wherein the improved YOLOv3 model is specifically an improved YOLOv3 multi-scale detection network formed by replacing Darknet-53 with the novel Wide-Darknet-33 feature extraction network and adding a 104 × 104 detection layer.
8. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 7, wherein Wide-Darknet-33 comprises 13 residual blocks, 32 convolutional layers and 1 fully connected layer; the depth is reduced by removing convolutional layers from Darknet-53 while the network is widened, making feature extraction more accurate in the width dimension;
for multi-scale detection, to reduce the miss rate of small head-and-shoulder targets against complex backgrounds, the two groups of 1 × 1 and 3 × 3 convolutions in front of each of the three YOLO layers of the YOLOv3 model are removed.
9. The improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method according to claim 3, wherein, when the prior boxes are obtained, clustering is performed with the K-means++ algorithm, specifically:
the initial K cluster centers are selected by the roulette-wheel method; with Q samples in total to be clustered into K classes, the clustering process is:
step one, randomly pick one point in the data set as the first cluster center;
step two, compute the distance D(x) from each point x to its nearest existing center, and sum these distances to obtain Sum(D(x));
step three, normalize each distance as D(x)/Sum(D(x)); take a random value Random from [0, 1], then repeatedly subtract, Random -= D(x)/Sum(D(x)), over the points in turn until Random ≤ 0; the point at which this occurs is the next cluster center;
step four, repeat step two and step three until K cluster centers have been selected;
and step five, run the K-means algorithm with these K initial cluster centers.
10. A power distribution room pedestrian detection and trajectory tracking system, wherein the system adopts the method of any one of claims 1 to 9 and comprises a video frame selection module, an image segmentation and preprocessing module, a pedestrian detection module and a trajectory tracking and recognition module;
the video frame selection module is used for setting a frame-selection interval, based on comprehensive consideration of the scene, requirements and performance, for selecting frames from the video, and for converting each captured single-frame picture into a JPG-format picture that the model can process;
the image segmentation and preprocessing module is used for preprocessing the format-converted picture, and is also used for performing image segmentation processing;
the pedestrian detection module is used for detecting pedestrians in the input preprocessed image based on the pedestrian detection model;
and the trajectory tracking and recognition module is used for performing trajectory tracking and recognition on a pedestrian when the pedestrian detection module detects one, obtaining a result, judging whether the alarm condition is met according to the trajectory tracking result, and raising an alarm if it is.
CN202211141822.2A 2022-09-20 2022-09-20 Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system Pending CN115482489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211141822.2A CN115482489A (en) 2022-09-20 2022-09-20 Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211141822.2A CN115482489A (en) 2022-09-20 2022-09-20 Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system

Publications (1)

Publication Number Publication Date
CN115482489A true CN115482489A (en) 2022-12-16

Family

ID=84423268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211141822.2A Pending CN115482489A (en) 2022-09-20 2022-09-20 Improved YOLOv3-based power distribution room pedestrian detection and trajectory tracking method and system

Country Status (1)

Country Link
CN (1) CN115482489A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486290A (en) * 2023-06-21 2023-07-25 成都庆龙航空科技有限公司 Unmanned aerial vehicle monitoring and tracking method and device, electronic equipment and storage medium
CN116486290B (en) * 2023-06-21 2023-09-05 成都庆龙航空科技有限公司 Unmanned aerial vehicle monitoring and tracking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination