CN114140750A - Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny - Google Patents

Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Info

Publication number
CN114140750A
Authority
CN
China
Prior art keywords
tiny
yolov4
network
training
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111495511.1A
Other languages
Chinese (zh)
Inventor
范庆来
周君良
倪勇龙
陈义
钱至远
王豆
杨杰
王崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Zhejiang Energy Group Research Institute Co Ltd
Original Assignee
Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Zhejiang Energy Group Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zheyou Comprehensive Energy Sales Co ltd, Zhejiang Energy Group Research Institute Co Ltd filed Critical Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Priority to CN202111495511.1A priority Critical patent/CN114140750A/en
Publication of CN114140750A publication Critical patent/CN114140750A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time detection method for safety helmet wearing at gas stations based on YOLOv4-Tiny, which comprises the following steps: S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples; S2, generating a first training set from the adversarial samples and the original samples; S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights; S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture; and S5, synthesizing the classified pictures into a video in real time and outputting it in real time. The method achieves higher detection accuracy and stronger real-time performance, and is suited to the hardware equipment available in gas stations.

Description

Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny
Technical Field
The invention belongs to the technical field of intelligent recognition of safety images at gas stations, and particularly relates to a YOLOv4-Tiny-based method for real-time detection of safety helmet wearing at gas stations.
Background
With the increasing complexity of industrial systems and the continuous development of artificial intelligence, many industrial problems can be solved by applying artificial intelligence technology to improve work efficiency. As the technology improves, worker safety has also become a major concern. The safety helmet is an important piece of equipment for protecting the personal safety of workers; in actual industrial production it effectively protects workers and greatly reduces production safety accidents.
Most existing safety helmet detection methods are based on deep learning: a classical object detection algorithm is retrained on a dedicated data set so that it specifically detects safety helmets, for example the two-stage R-CNN series or the single-stage YOLO series. In practice, however, when an existing object detection model is applied to actual helmet detection, the harsh detection environment and the very small size of the helmet mean that existing methods place high demands on the detection environment and achieve low detection precision. In addition, the hardware in gas stations is updated slowly and has limited computing power, so the complexity of the object detection model must be kept low, to suit the hardware available in gas stations, while the detection accuracy is improved.
It is therefore necessary to improve the existing detection model so that it suits the hardware available in gas stations while its detection accuracy is improved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a YOLOv4-Tiny-based method for real-time detection of safety helmet wearing at gas stations, which suits the hardware available in gas stations and achieves higher detection accuracy.
The invention adopts the following technical scheme: a gas station safety helmet wearing real-time detection method based on YOLOv4-Tiny comprises the following steps: S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
As a preferred scheme, the attention mechanism module is an SENet network. The SENet network is a channel attention network: the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
Preferably, step S1 includes the steps of:
S1.1, selecting gas station monitoring video over a period of time and performing frame extraction to obtain monitoring pictures containing persons wearing and not wearing safety helmets;
S1.2, marking the target positions in the monitoring pictures and forming labels to obtain training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations to form the first training set.
Preferably, step S1.3 includes the steps of:
S1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected proportion of training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change during back propagation and only the pixels of the picture can be modified;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample.
Preferably, in step S1.3.3, back propagation proceeds in the direction that increases the confidence loss.
Preferably, step S1.3.1 includes the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
Preferably, in step c, the loss is calculated by the following formula:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
Preferably, the CSPDarknet53_tiny network includes a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence.
As a preferred scheme, the multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
Preferably, each DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function.
The invention has the beneficial effects that:
the detection is carried out based on a YOLOv4-Tiny model, and the YOLOv4-Tiny model belongs to a light weight version of a YOLOv4 model so as to adapt to the hardware equipment foundation in the gas station and improve the detection speed.
The original YOLOv4-Tiny model has only two detection scales, which is not accurate enough for safety helmet detection, so the invention improves the model by adding a third scale for detection at more scales, improving the detection accuracy. An attention mechanism module is also inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network, which further improves the detection accuracy; at the same time, the CSPDarknet53_tiny network and the attention network remain simple, so the model still suits the hardware available in gas stations.
Based on the original YOLOv4-Tiny network, adversarial samples are generated with a targeted objectness gradient (TOG) attack method, and the improved YOLOv4-Tiny model is trained on both the adversarial samples and the original samples to obtain the final detection model, which improves the robustness of the detection model and gives it strong generalization ability.
The monitoring video is decomposed into frames, each frame is detected and classified to obtain a classified picture, and the classified pictures are finally synthesized into an output video, so that the wearing of personal safety helmets is classified in real time and detection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of the YOLOv4-Tiny-based method for real-time detection of gas station safety helmet wearing according to the invention;
FIG. 2 is a schematic structural diagram of the improved YOLOv4-Tiny model;
FIG. 3 is a schematic diagram of the structure of the SENet network.
Detailed Description
The following description of the embodiments of the present invention is provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1, the embodiment provides a filling station helmet wearing real-time detection method based on YOLOv4-Tiny, which includes the steps:
S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples; in this embodiment a proportion of 1/3 is selected, and the proportion can be set according to the actual situation;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
In the invention, firstly, detection is performed with a YOLOv4-Tiny model, which is a lightweight version of the YOLOv4 model, so as to suit the hardware available in gas stations and to improve detection speed.
Secondly, the existing YOLOv4-Tiny model has only two detection scales and is not accurate enough for safety helmet detection, so the model is improved: the feature extraction backbone module adopts a CSPDarknet53_tiny network and a third scale is added for detection at more scales, improving the detection accuracy. An attention mechanism module is also inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network, which further improves the detection accuracy; at the same time, the CSPDarknet53_tiny network and the attention network remain simple, so the model still suits the hardware available in gas stations.
Thirdly, the monitoring video is decomposed into frames, each frame is detected and classified to obtain a classified picture, and the classified pictures are finally synthesized into an output video, so that the wearing of personal safety helmets is classified in real time and detection is improved.
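The frame decomposition and re-synthesis just described can be illustrated with a short sketch. The following is a minimal, illustrative sketch only, assuming OpenCV is used for video handling; the detect_and_draw helper, which would wrap the trained improved YOLOv4-Tiny model and draw the classification results, and the stream address are assumptions for illustration rather than parts of the patented implementation.

```python
import cv2

def run_realtime(stream_url, detect_and_draw, out_path="annotated.mp4"):
    """Decompose the monitoring stream into frames, classify helmet wearing per
    frame, overlay the result on the frame and re-synthesize a video in real time."""
    cap = cv2.VideoCapture(stream_url)             # access the gas-station camera
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()                     # frame decomposition
        if not ok:
            break
        annotated = detect_and_draw(frame)         # classification result overlaid on the frame
        writer.write(annotated)                    # real-time synthesis of the output video
        cv2.imshow("helmet monitor", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    writer.release()
    cv2.destroyAllWindows()
```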
In addition, the improved YOLOv4-Tiny model is trained on both the adversarial samples and the original samples to obtain the final detection model, which improves the robustness of the detection model and gives it strong generalization ability.
Specifically, the method comprises the following steps:
in step S1, the method includes the steps of:
S1.1, selecting gas station monitoring video over a period of time, performing frame extraction, randomly selecting a number of pictures that include workers wearing safety helmets and workers not wearing safety helmets, and uniformly cropping the pictures to 416 x 416 to obtain the monitoring pictures;
S1.2, manually marking the target positions in the monitoring pictures with the labelme software to form labels, generating the corresponding xml, json and png format files, and thereby obtaining the training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations, including random cropping, flipping, scaling and the like, to form the first training set; a data-preparation sketch is given below.
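The sketch below is a minimal illustration of steps S1.1 and S1.4, assuming OpenCV for frame extraction; the sampling interval, directory layout and function names are illustrative assumptions, while the 1/3 adversarial proportion mirrors the embodiment.

```python
import random
import cv2

def extract_frames(video_path, out_dir, every_n=25):
    """Frame extraction: save one monitoring picture every `every_n` frames,
    resized to the 416 x 416 network input size (resizing used here for simplicity)."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frame = cv2.resize(frame, (416, 416))
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()

def split_for_adversarial(pictures, adv_ratio=1/3):
    """Shuffle the labelled training pictures and split off the proportion used
    to generate adversarial samples; the rest are kept as original samples."""
    random.shuffle(pictures)
    n_adv = int(len(pictures) * adv_ratio)
    return pictures[:n_adv], pictures[n_adv:]
```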
In step S1.3, the method comprises the steps of:
s1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected 1/3 of the training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change and only the pixels of the picture can be modified; during back propagation the confidence loss is maximized (unlike the conventional practice of minimizing the loss), which causes the grid regions containing targets in the picture to be classified as background, thereby producing adversarial samples with an attack effect;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample. A minimal sketch of this procedure is given below.
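The following PyTorch sketch illustrates the pixel-level attack of steps S1.3.2 to S1.3.4 under stated assumptions: confidence_loss and detects_target are hypothetical helpers standing in for the confidence-loss computation of the original YOLOv4-Tiny network and for the check that the target position is still detected, and the sign-gradient update is one common way of increasing the loss, chosen here for illustration rather than prescribed by the patent.

```python
import torch

def generate_adversarial_sample(model, image, confidence_loss, detects_target,
                                step=0.01, max_iter=50):
    """Freeze the trained YOLOv4-Tiny weights and modify only the picture's pixels
    so that the confidence loss increases, until the target is no longer detected."""
    model.eval()
    for p in model.parameters():                 # network parameters are frozen
        p.requires_grad_(False)
    adv = image.clone().detach().requires_grad_(True)
    for _ in range(max_iter):
        loss = confidence_loss(model(adv))       # forward propagation -> confidence loss
        loss.backward()                          # back-propagate to the pixels only
        with torch.no_grad():
            adv += step * adv.grad.sign()        # move in the direction that increases the loss
            adv.clamp_(0.0, 1.0)                 # keep pixel values in a valid range
        adv.grad.zero_()
        if not detects_target(model, adv):       # stop once the target position is missed
            break
    return adv.detach()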
In step S1.3.1, the method includes the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
In step c, the formula for calculating loss is as follows:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
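A minimal sketch of step c, training the original YOLOv4-Tiny network with this loss, might look as follows; the Adam optimizer, the epoch count and the loss_fn helper (assumed to implement the loss defined above) are illustrative assumptions rather than choices fixed by the patent.

```python
import torch

def train_original_yolov4_tiny(model, loader, loss_fn, epochs=100, lr=1e-3):
    """Step c: compute the loss on the second training set and back-propagate
    to obtain the corresponding network weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), targets)   # loss as defined above
            loss.backward()                          # back propagation
            opt.step()
    return model
```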
Therefore, the invention uses the TOG method to generate corresponding adversarial samples for the YOLOv4-Tiny model, so that the result of adversarial training is better, that is, the robustness and generalization of the model are improved to a greater extent.
With reference to FIG. 2 and FIG. 3, the improved YOLOv4-Tiny model and its detection process are described in more detail as follows:
the CSPdark net53_ tiny network outputs three feature maps with different sizes, the sizes are respectively: 52, 26, 13, and the three different feature maps are input into the multi-scale feature fusion module for feature fusion. The multi-scale feature fusion module fuses the three feature maps to obtain three fused feature maps with the sizes of 52 × 52, 26 × 26 and 13 × 13. And the classification prediction module predicts the input feature maps with different scales and outputs a final detection result.
The attention mechanism module is an SENet network. SENet is a channel attention network: the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
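A minimal PyTorch sketch of such an SE channel-attention block is shown below; the reduction ratio of 16 is a common default and an assumption here, not a value specified by the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel attention: global average pooling -> two fully connected layers ->
    Sigmoid weights, multiplied back onto the input feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)       # global average pooling per channel
        w = self.fc(w).view(b, c, 1, 1)   # two FC layers + Sigmoid -> channel weights
        return x * w                      # reweight the input feature map
```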
The CSPDarknet53_tiny network comprises a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence. The DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function. The Resblock_body module performs residual computation on the input feature map through DarknetConv2D_BN_Leaky modules, then splices the residual output with the module's input and outputs the result.
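The building blocks just described could be sketched as follows, reusing the SEBlock from the previous sketch. This is a simplified illustration of the patent's own description rather than the exact CSPDarknet53_tiny layout: the channel counts and the strided convolution used for downsampling are assumptions, and the fusion module would additionally tap the outputs of the first and second Resblock_body modules.

```python
import torch
import torch.nn as nn

def conv_bn_leaky(c_in, c_out, k=3, s=1):
    """DarknetConv2D_BN_Leaky: convolution + batch normalization + Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResblockBody(nn.Module):
    """Simplified Resblock_body: a residual branch built from DarknetConv2D_BN_Leaky
    blocks, with the SE attention module inserted on the residual branch, spliced
    with the block input and then downsampled."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch = nn.Sequential(conv_bn_leaky(c_in, c_in), conv_bn_leaky(c_in, c_in))
        self.att = SEBlock(c_in)                        # SEBlock from the previous sketch
        self.down = conv_bn_leaky(2 * c_in, c_out, s=2)

    def forward(self, x):
        r = self.att(self.branch(x))                    # residual computation + channel attention
        return self.down(torch.cat([x, r], dim=1))      # splice with the input, then downsample

# Illustrative chaining of the backbone (channel counts are assumptions):
backbone = nn.Sequential(
    conv_bn_leaky(3, 32, s=2),     # first DarknetConv2D_BN_Leaky
    conv_bn_leaky(32, 64, s=2),    # second DarknetConv2D_BN_Leaky
    ResblockBody(64, 128),         # first Resblock_body
    ResblockBody(128, 256),        # second Resblock_body
    ResblockBody(256, 512),        # third Resblock_body
    conv_bn_leaky(512, 512),       # third DarknetConv2D_BN_Leaky
)
```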
The multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
The first feature map is 13 × 13, the second feature map is 26 × 26, the third feature map is 52 × 52, the first fused feature map is 26 × 26 and the second fused feature map is 52 × 52.
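A sketch of this three-scale fusion path is given below; the channel counts are illustrative assumptions, and the Yolo Head networks themselves are omitted.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the fusion path: the 13x13 map feeds one head directly and is also
    upsampled and spliced with the 26x26 and 52x52 maps to feed the other two heads."""
    def __init__(self, c13=512, c26=256, c52=128):
        super().__init__()
        self.conv1 = nn.Conv2d(c13, c13, 1)         # first convolution layer -> first Yolo Head
        self.conv2 = nn.Conv2d(c13, c26, 1)         # second convolution layer before upsampling
        self.up1 = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv3 = nn.Conv2d(c26 + c26, c52, 1)   # third convolution layer before upsampling
        self.up2 = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, f52, f26, f13):
        p13 = self.conv1(f13)                                       # 13x13 branch
        m26 = torch.cat([self.up1(self.conv2(f13)), f26], dim=1)    # first fused feature map, 26x26
        m52 = torch.cat([self.up2(self.conv3(m26)), f52], dim=1)    # second fused feature map, 52x52
        return p13, m26, m52                                        # inputs to the three Yolo Heads
```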
The Yolo Head classification networks have the same structure as the Yolo Head of the existing YOLOv4-Tiny algorithm, so no redundant explanation is given here.
The improved YOLOv4-Tiny model detects on fused feature layers at three scales, 13 × 13, 26 × 26 and 52 × 52; compared with the original model, which has only two scales, this greatly improves the detector's ability to detect tiny objects. The existing YOLOv4-Tiny has only two scales, and its ability to detect fine details and tiny objects cannot meet current production requirements, so more scales are needed in the model's inference. Taking the 13 × 13 scale as an example, the Yolo Head prediction process is as follows: the input picture is divided into 13 × 13 cells, and if the centre of an object falls in a certain cell, that cell serves as the prediction cell for the object. Each cell generates three anchor boxes, so a total of 13 × 13 × 3 = 507 anchor boxes are generated for prediction. The anchor boxes whose object confidence exceeds the threshold are kept, and non-maximum suppression is used to screen out the best anchor box as the final prediction box for the object. Three-scale prediction can predict more objects than two-scale prediction, and fused feature layers of different scales are suited to detecting objects of different sizes.
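The confidence filtering and non-maximum suppression step just described might be sketched as follows, assuming the 507 anchor boxes of the 13 × 13 scale have already been decoded into corner-format boxes and confidence scores; the threshold values are illustrative assumptions.

```python
import torch
import torchvision

def filter_predictions(pred_boxes, pred_scores, conf_thresh=0.5, iou_thresh=0.45):
    """Keep anchor boxes whose confidence exceeds the threshold, then apply
    non-maximum suppression to retain the best box per object.
    pred_boxes: (N, 4) in (x1, y1, x2, y2); pred_scores: (N,) confidences,
    e.g. N = 13 * 13 * 3 = 507 for the 13x13 scale."""
    keep = pred_scores > conf_thresh                        # confidence filtering
    boxes, scores = pred_boxes[keep], pred_scores[keep]
    best = torchvision.ops.nms(boxes, scores, iou_thresh)   # non-maximum suppression
    return boxes[best], scores[best]
```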
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention by those skilled in the art should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (10)

1. A filling station helmet wearing real-time detection method based on YOLOv4-Tiny is characterized by comprising the following steps:
S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
2. The method as claimed in claim 1, wherein the attention mechanism module is an SENet network, the SENet network being a channel attention network in which the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
3. The method for detecting the wearing of the safety helmet of the gasoline station based on YOLOv4-Tiny as claimed in claim 1, wherein the step S1 comprises the steps of:
S1.1, selecting gas station monitoring video over a period of time and performing frame extraction to obtain monitoring pictures containing persons wearing and not wearing safety helmets;
S1.2, marking the target positions in the monitoring pictures and forming labels to obtain training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations to form the first training set.
4. The method for detecting the wearing condition of the gas station safety helmet based on YOLOv4-Tiny as claimed in claim 3, wherein the step S1.3 comprises the steps of:
s1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected proportion of training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change during back propagation and only the pixels of the picture can be modified;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample.
5. The method for detecting wearing of safety helmets of gas stations based on YOLOv4-Tiny as claimed in claim 4, wherein in step S1.3.3 back propagation proceeds in the direction that increases the confidence loss.
6. The method for detecting the wearing condition of the safety helmet of the gasoline station based on YOLOv4-Tiny as claimed in claim 4, wherein the step S1.3.1 comprises the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
7. The method for detecting the wearing condition of the gas station safety helmet based on YOLOv4-Tiny as claimed in claim 6, wherein in the step c, the loss is calculated by the formula:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
8. The method of claim 1, wherein the CSPDarknet53_tiny network comprises a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence.
9. The method for detecting wearing of a gas station safety helmet based on YOLOv4-Tiny according to claim 8, wherein the multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
10. The method for detecting wearing of safety helmets of gas stations based on YOLOv4-Tiny as claimed in claim 8, wherein each DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function.
CN202111495511.1A 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny Pending CN114140750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111495511.1A CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111495511.1A CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Publications (1)

Publication Number Publication Date
CN114140750A true CN114140750A (en) 2022-03-04

Family

ID=80385385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111495511.1A Pending CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Country Status (1)

Country Link
CN (1) CN114140750A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115409818A (en) * 2022-09-05 2022-11-29 江苏济远医疗科技有限公司 Enhanced training method applied to endoscope image target detection model
CN115409818B (en) * 2022-09-05 2023-10-27 江苏济远医疗科技有限公司 Enhanced training method applied to endoscope image target detection model
CN116229522A (en) * 2023-05-10 2023-06-06 广东电网有限责任公司湛江供电局 Substation operator safety protection equipment detection method and system
CN116977919A (en) * 2023-06-21 2023-10-31 北京卓视智通科技有限责任公司 Method and system for identifying dressing specification, storage medium and electronic equipment
CN116977919B (en) * 2023-06-21 2024-01-26 北京卓视智通科技有限责任公司 Method and system for identifying dressing specification, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN114140750A (en) Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny
CN111723786B (en) Method and device for detecting wearing of safety helmet based on single model prediction
CN112434672B (en) Marine human body target detection method based on improved YOLOv3
CN111079739B (en) Multi-scale attention feature detection method
CN112733749A (en) Real-time pedestrian detection method integrating attention mechanism
CN113011319A (en) Multi-scale fire target identification method and system
CN110378222A (en) A kind of vibration damper on power transmission line target detection and defect identification method and device
CN112149591B (en) SSD-AEFF automatic bridge detection method and system for SAR image
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN113052834A (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN114565891A (en) Smoke and fire monitoring method and system based on graph generation technology
Park et al. Advanced wildfire detection using generative adversarial network-based augmented datasets and weakly supervised object localization
CN112861646A (en) Cascade detection method for oil unloading worker safety helmet in complex environment small target recognition scene
CN115512387A (en) Construction site safety helmet wearing detection method based on improved YOLOV5 model
CN116543346A (en) Deep learning-based transmission line video mountain fire detection method
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
Yandouzi et al. Investigation of combining deep learning object recognition with drones for forest fire detection and monitoring
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113936299A (en) Method for detecting dangerous area in construction site
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN112364864A (en) License plate recognition method and device, electronic equipment and storage medium
CN115131826B (en) Article detection and identification method, and network model training method and device
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination