CN114140750A - Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny - Google Patents

Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Info

Publication number
CN114140750A
Authority
CN
China
Prior art keywords
tiny
yolov4
network
training
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111495511.1A
Other languages
Chinese (zh)
Inventor
范庆来
周君良
倪勇龙
陈义
钱至远
王豆
杨杰
王崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Zhejiang Energy Group Research Institute Co Ltd
Original Assignee
Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Zhejiang Energy Group Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zheyou Comprehensive Energy Sales Co ltd, Zhejiang Energy Group Research Institute Co Ltd filed Critical Zhejiang Zheyou Comprehensive Energy Sales Co ltd
Priority to CN202111495511.1A priority Critical patent/CN114140750A/en
Publication of CN114140750A publication Critical patent/CN114140750A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time detection method for safety helmet wearing at gas stations based on YOLOv4-Tiny, which comprises the following steps: S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples; S2, generating a first training set from the adversarial samples and the original samples; S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights; S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture; and S5, synthesizing the classified pictures into a video in real time and outputting it in real time. The method achieves higher detection accuracy and stronger real-time performance, and is suited to the hardware equipment available in gas stations.

Description

Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny
Technical Field
The invention belongs to the technical field of intelligent recognition of safety images at gas stations, and particularly relates to a YOLOv4-Tiny-based method for real-time detection of safety helmet wearing at gas stations.
Background
With the increasing complexity of industrial systems and the continuous development of artificial intelligence, many industrial problems can be solved by applying artificial intelligence technology to improve work efficiency. As the technology improves, worker safety has also become a major concern. The safety helmet is an important piece of equipment for protecting the personal safety of workers; in actual industrial production it effectively protects workers and greatly reduces production safety accidents.
Most existing safety helmet detection methods are based on deep learning: a classical object detection algorithm is retrained on a dedicated data set so that it specifically detects safety helmets, for example the two-stage R-CNN series or the single-stage YOLO series. In practice, however, when an existing object detection model is applied to actual helmet detection, the harsh detection environment and the very small size of the helmet mean that existing methods place high demands on the detection environment and achieve low detection precision. In addition, the hardware in gas stations is updated slowly and has limited computing power, so the complexity of the object detection model must be kept low, to suit the hardware available in gas stations, while the detection accuracy is improved.
It is therefore necessary to improve the existing detection model so that it suits the hardware available in gas stations while its detection accuracy is improved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a YOLOv4-Tiny-based method for real-time detection of safety helmet wearing at gas stations, which suits the hardware available in gas stations and achieves higher detection accuracy.
The invention adopts the following technical scheme: a gas station safety helmet wearing real-time detection method based on YOLOv4-Tiny comprises the following steps: S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
As a preferred scheme, the attention mechanism module is an SENet network. The SENet network is a channel attention network: the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
Preferably, step S1 includes the steps of:
S1.1, selecting gas station monitoring video over a period of time and performing frame extraction to obtain monitoring pictures containing persons wearing and not wearing safety helmets;
S1.2, marking the target positions in the monitoring pictures and forming labels to obtain training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations to form the first training set.
Preferably, step S1.3 includes the steps of:
S1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected proportion of training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change during back propagation and only the pixels of the picture can be modified;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample.
Preferably, in step S1.3.3, back propagation proceeds in the direction that increases the confidence loss.
Preferably, step S1.3.1 includes the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
Preferably, in step c, the loss is calculated by the following formula:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
Preferably, the CSPDarknet53_tiny network includes a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence.
As a preferred scheme, the multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
Preferably, each DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function.
The invention has the beneficial effects that:
the detection is carried out based on a YOLOv4-Tiny model, and the YOLOv4-Tiny model belongs to a light weight version of a YOLOv4 model so as to adapt to the hardware equipment foundation in the gas station and improve the detection speed.
The original YOLOv4-Tiny model has only two detection scales, which is not accurate enough for safety helmet detection, so the invention improves the model by adding a third scale for detection at more scales, improving the detection accuracy. An attention mechanism module is also inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network, which further improves the detection accuracy; at the same time, the CSPDarknet53_tiny network and the attention network remain simple, so the model still suits the hardware available in gas stations.
Based on the original YOLOv4-Tiny network, adversarial samples are generated with a targeted objectness gradient (TOG) attack method, and the improved YOLOv4-Tiny model is trained on both the adversarial samples and the original samples to obtain the final detection model, which improves the robustness of the detection model and gives it strong generalization ability.
The monitoring video is decomposed into frames, each frame is detected and classified to obtain a classified picture, and the classified pictures are finally synthesized into an output video, so that the wearing of personal safety helmets is classified in real time and detection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of the YOLOv4-Tiny-based method for real-time detection of gas station safety helmet wearing according to the invention;
FIG. 2 is a schematic structural diagram of the improved YOLOv4-Tiny model;
FIG. 3 is a schematic diagram of the structure of the SENet network.
Detailed Description
The following description of the embodiments of the present invention is provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1, the embodiment provides a filling station helmet wearing real-time detection method based on YOLOv4-Tiny, which includes the steps:
S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples; in this embodiment a proportion of 1/3 is selected, and the proportion can be set according to the actual situation;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
In the invention, firstly, detection is performed with a YOLOv4-Tiny model, which is a lightweight version of the YOLOv4 model, so as to suit the hardware available in gas stations and to improve detection speed.
Secondly, the existing YOLOv4-Tiny model has only two detection scales and is not accurate enough for safety helmet detection, so the model is improved: the feature extraction backbone module adopts a CSPDarknet53_tiny network and a third scale is added for detection at more scales, improving the detection accuracy. An attention mechanism module is also inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network, which further improves the detection accuracy; at the same time, the CSPDarknet53_tiny network and the attention network remain simple, so the model still suits the hardware available in gas stations.
Thirdly, the monitoring video is decomposed into frames, each frame is detected and classified to obtain a classified picture, and the classified pictures are finally synthesized into an output video, so that the wearing of personal safety helmets is classified in real time and detection is improved.
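The frame decomposition and re-synthesis just described can be illustrated with a short sketch. The following is a minimal, illustrative sketch only, assuming OpenCV is used for video handling; the detect_and_draw helper, which would wrap the trained improved YOLOv4-Tiny model and draw the classification results, and the stream address are assumptions for illustration rather than parts of the patented implementation.

```python
import cv2

def run_realtime(stream_url, detect_and_draw, out_path="annotated.mp4"):
    """Decompose the monitoring stream into frames, classify helmet wearing per
    frame, overlay the result on the frame and re-synthesize a video in real time."""
    cap = cv2.VideoCapture(stream_url)             # access the gas-station camera
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()                     # frame decomposition
        if not ok:
            break
        annotated = detect_and_draw(frame)         # classification result overlaid on the frame
        writer.write(annotated)                    # real-time synthesis of the output video
        cv2.imshow("helmet monitor", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    writer.release()
    cv2.destroyAllWindows()
```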
In addition, the improved YOLOv4-Tiny model is trained on both the adversarial samples and the original samples to obtain the final detection model, which improves the robustness of the detection model and gives it strong generalization ability.
Specifically, the method comprises the following steps:
in step S1, the method includes the steps of:
S1.1, selecting gas station monitoring video over a period of time, performing frame extraction, randomly selecting a number of pictures that include workers wearing safety helmets and workers not wearing safety helmets, and uniformly cropping the pictures to 416 x 416 to obtain the monitoring pictures;
S1.2, manually marking the target positions in the monitoring pictures with the labelme software to form labels, generating the corresponding xml, json and png format files, and thereby obtaining the training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations, including random cropping, flipping, scaling and the like, to form the first training set; a data-preparation sketch is given below.
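The sketch below is a minimal illustration of steps S1.1 and S1.4, assuming OpenCV for frame extraction; the sampling interval, directory layout and function names are illustrative assumptions, while the 1/3 adversarial proportion mirrors the embodiment.

```python
import random
import cv2

def extract_frames(video_path, out_dir, every_n=25):
    """Frame extraction: save one monitoring picture every `every_n` frames,
    resized to the 416 x 416 network input size (resizing used here for simplicity)."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frame = cv2.resize(frame, (416, 416))
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()

def split_for_adversarial(pictures, adv_ratio=1/3):
    """Shuffle the labelled training pictures and split off the proportion used
    to generate adversarial samples; the rest are kept as original samples."""
    random.shuffle(pictures)
    n_adv = int(len(pictures) * adv_ratio)
    return pictures[:n_adv], pictures[n_adv:]
```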
In step S1.3, the method comprises the steps of:
s1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected 1/3 of the training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change and only the pixels of the picture can be modified; during back propagation the confidence loss is maximized (unlike the conventional practice of minimizing the loss), which causes the grid regions containing targets in the picture to be classified as background, thereby producing adversarial samples with an attack effect;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample. A minimal sketch of this procedure is given below.
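The following PyTorch sketch illustrates the pixel-level attack of steps S1.3.2 to S1.3.4 under stated assumptions: confidence_loss and detects_target are hypothetical helpers standing in for the confidence-loss computation of the original YOLOv4-Tiny network and for the check that the target position is still detected, and the sign-gradient update is one common way of increasing the loss, chosen here for illustration rather than prescribed by the patent.

```python
import torch

def generate_adversarial_sample(model, image, confidence_loss, detects_target,
                                step=0.01, max_iter=50):
    """Freeze the trained YOLOv4-Tiny weights and modify only the picture's pixels
    so that the confidence loss increases, until the target is no longer detected."""
    model.eval()
    for p in model.parameters():                 # network parameters are frozen
        p.requires_grad_(False)
    adv = image.clone().detach().requires_grad_(True)
    for _ in range(max_iter):
        loss = confidence_loss(model(adv))       # forward propagation -> confidence loss
        loss.backward()                          # back-propagate to the pixels only
        with torch.no_grad():
            adv += step * adv.grad.sign()        # move in the direction that increases the loss
            adv.clamp_(0.0, 1.0)                 # keep pixel values in a valid range
        adv.grad.zero_()
        if not detects_target(model, adv):       # stop once the target position is missed
            break
    return adv.detach()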
In step S1.3.1, the method includes the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
In step c, the formula for calculating loss is as follows:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
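A minimal sketch of step c, training the original YOLOv4-Tiny network with this loss, might look as follows; the Adam optimizer, the epoch count and the loss_fn helper (assumed to implement the loss defined above) are illustrative assumptions rather than choices fixed by the patent.

```python
import torch

def train_original_yolov4_tiny(model, loader, loss_fn, epochs=100, lr=1e-3):
    """Step c: compute the loss on the second training set and back-propagate
    to obtain the corresponding network weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), targets)   # loss as defined above
            loss.backward()                          # back propagation
            opt.step()
    return model
```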
Therefore, the invention uses the TOG method to generate corresponding adversarial samples for the YOLOv4-Tiny model, so that the result of adversarial training is better, that is, the robustness and generalization of the model are improved to a greater extent.
With reference to FIG. 2 and FIG. 3, the improved YOLOv4-Tiny model and its detection process are described in more detail as follows:
the CSPdark net53_ tiny network outputs three feature maps with different sizes, the sizes are respectively: 52, 26, 13, and the three different feature maps are input into the multi-scale feature fusion module for feature fusion. The multi-scale feature fusion module fuses the three feature maps to obtain three fused feature maps with the sizes of 52 × 52, 26 × 26 and 13 × 13. And the classification prediction module predicts the input feature maps with different scales and outputs a final detection result.
The attention mechanism module is an SENet network. SENet is a channel attention network: the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
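A minimal PyTorch sketch of such an SE channel-attention block is shown below; the reduction ratio of 16 is a common default and an assumption here, not a value specified by the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel attention: global average pooling -> two fully connected layers ->
    Sigmoid weights, multiplied back onto the input feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)       # global average pooling per channel
        w = self.fc(w).view(b, c, 1, 1)   # two FC layers + Sigmoid -> channel weights
        return x * w                      # reweight the input feature map
```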
The CSPDarknet53_tiny network comprises a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence. The DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function. The Resblock_body module performs residual computation on the input feature map through DarknetConv2D_BN_Leaky modules, then splices the residual output with the module's input and outputs the result.
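The building blocks just described could be sketched as follows, reusing the SEBlock from the previous sketch. This is a simplified illustration of the patent's own description rather than the exact CSPDarknet53_tiny layout: the channel counts and the strided convolution used for downsampling are assumptions, and the fusion module would additionally tap the outputs of the first and second Resblock_body modules.

```python
import torch
import torch.nn as nn

def conv_bn_leaky(c_in, c_out, k=3, s=1):
    """DarknetConv2D_BN_Leaky: convolution + batch normalization + Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResblockBody(nn.Module):
    """Simplified Resblock_body: a residual branch built from DarknetConv2D_BN_Leaky
    blocks, with the SE attention module inserted on the residual branch, spliced
    with the block input and then downsampled."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch = nn.Sequential(conv_bn_leaky(c_in, c_in), conv_bn_leaky(c_in, c_in))
        self.att = SEBlock(c_in)                        # SEBlock from the previous sketch
        self.down = conv_bn_leaky(2 * c_in, c_out, s=2)

    def forward(self, x):
        r = self.att(self.branch(x))                    # residual computation + channel attention
        return self.down(torch.cat([x, r], dim=1))      # splice with the input, then downsample

# Illustrative chaining of the backbone (channel counts are assumptions):
backbone = nn.Sequential(
    conv_bn_leaky(3, 32, s=2),     # first DarknetConv2D_BN_Leaky
    conv_bn_leaky(32, 64, s=2),    # second DarknetConv2D_BN_Leaky
    ResblockBody(64, 128),         # first Resblock_body
    ResblockBody(128, 256),        # second Resblock_body
    ResblockBody(256, 512),        # third Resblock_body
    conv_bn_leaky(512, 512),       # third DarknetConv2D_BN_Leaky
)
```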
The multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
The first feature map is 13 × 13, the second feature map is 26 × 26, the third feature map is 52 × 52, the first fused feature map is 26 × 26 and the second fused feature map is 52 × 52.
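A sketch of this three-scale fusion path is given below; the channel counts are illustrative assumptions, and the Yolo Head networks themselves are omitted.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the fusion path: the 13x13 map feeds one head directly and is also
    upsampled and spliced with the 26x26 and 52x52 maps to feed the other two heads."""
    def __init__(self, c13=512, c26=256, c52=128):
        super().__init__()
        self.conv1 = nn.Conv2d(c13, c13, 1)         # first convolution layer -> first Yolo Head
        self.conv2 = nn.Conv2d(c13, c26, 1)         # second convolution layer before upsampling
        self.up1 = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv3 = nn.Conv2d(c26 + c26, c52, 1)   # third convolution layer before upsampling
        self.up2 = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, f52, f26, f13):
        p13 = self.conv1(f13)                                       # 13x13 branch
        m26 = torch.cat([self.up1(self.conv2(f13)), f26], dim=1)    # first fused feature map, 26x26
        m52 = torch.cat([self.up2(self.conv3(m26)), f52], dim=1)    # second fused feature map, 52x52
        return p13, m26, m52                                        # inputs to the three Yolo Heads
```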
The Yolo Head classification networks have the same structure as the Yolo Head of the existing YOLOv4-Tiny algorithm, so no redundant explanation is given here.
The improved YOLOv4-Tiny model detects on fused feature layers at three scales, 13 × 13, 26 × 26 and 52 × 52; compared with the original model, which has only two scales, this greatly improves the detector's ability to detect tiny objects. The existing YOLOv4-Tiny has only two scales, and its ability to detect fine details and tiny objects cannot meet current production requirements, so more scales are needed in the model's inference. Taking the 13 × 13 scale as an example, the Yolo Head prediction process is as follows: the input picture is divided into 13 × 13 cells, and if the centre of an object falls in a certain cell, that cell serves as the prediction cell for the object. Each cell generates three anchor boxes, so a total of 13 × 13 × 3 = 507 anchor boxes are generated for prediction. The anchor boxes whose object confidence exceeds the threshold are kept, and non-maximum suppression is used to screen out the best anchor box as the final prediction box for the object. Three-scale prediction can predict more objects than two-scale prediction, and fused feature layers of different scales are suited to detecting objects of different sizes.
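The confidence filtering and non-maximum suppression step just described might be sketched as follows, assuming the 507 anchor boxes of the 13 × 13 scale have already been decoded into corner-format boxes and confidence scores; the threshold values are illustrative assumptions.

```python
import torch
import torchvision

def filter_predictions(pred_boxes, pred_scores, conf_thresh=0.5, iou_thresh=0.45):
    """Keep anchor boxes whose confidence exceeds the threshold, then apply
    non-maximum suppression to retain the best box per object.
    pred_boxes: (N, 4) in (x1, y1, x2, y2); pred_scores: (N,) confidences,
    e.g. N = 13 * 13 * 3 = 507 for the 13x13 scale."""
    keep = pred_scores > conf_thresh                        # confidence filtering
    boxes, scores = pred_boxes[keep], pred_scores[keep]
    best = torchvision.ops.nms(boxes, scores, iou_thresh)   # non-maximum suppression
    return boxes[best], scores[best]
```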
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention by those skilled in the art should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (10)

1. A filling station helmet wearing real-time detection method based on YOLOv4-Tiny is characterized by comprising the following steps:
S1, performing frame extraction on the original monitoring video of the gas station to obtain a plurality of monitoring pictures, preprocessing the monitoring pictures to obtain training pictures, selecting a corresponding proportion of the training pictures to generate adversarial samples, and taking the remaining training pictures as original samples;
S2, generating a first training set from the adversarial samples and the original samples;
S3, training the improved YOLOv4-Tiny model with the first training set to obtain the corresponding training weights;
S4, accessing the gas station's real-time monitoring stream, decomposing it into frames, feeding each frame in real time to the trained improved YOLOv4-Tiny model to obtain the helmet-wearing classification results for the persons in the frame, and overlaying the classification results onto the frame to obtain a classified picture;
S5, synthesizing the classified pictures into a video in real time and outputting it in real time;
the improved YOLOv4-Tiny model comprises a feature extraction backbone network module, a multi-scale feature fusion module and a classification prediction module which are connected in sequence, wherein the feature extraction backbone network module is a CSPDarknet53_tiny network;
the improved YOLOv4-Tiny model also includes an attention mechanism module inserted into the residual network of the Resblock_body modules of the CSPDarknet53_tiny network.
2. The method as claimed in claim 1, wherein the attention mechanism module is an SENet network, the SENet network being a channel attention network in which the input feature map first undergoes global average pooling, then passes through two fully connected layers, and finally a Sigmoid activation function outputs the corresponding weights, which are multiplied by the input feature map to obtain the output.
3. The method for detecting the wearing of the safety helmet of the gasoline station based on YOLOv4-Tiny as claimed in claim 1, wherein the step S1 comprises the steps of:
S1.1, selecting gas station monitoring video over a period of time and performing frame extraction to obtain monitoring pictures containing persons wearing and not wearing safety helmets;
S1.2, marking the target positions in the monitoring pictures and forming labels to obtain training pictures;
S1.3, selecting a corresponding proportion of the training pictures and generating adversarial samples with a targeted objectness gradient (TOG) attack method;
S1.4, shuffling the adversarial samples together with the remaining training pictures and applying data enhancement operations to form the first training set.
4. The method for detecting the wearing condition of the gas station safety helmet based on YOLOv4-Tiny as claimed in claim 3, wherein the step S1.3 comprises the steps of:
s1.3.1, training to obtain a trained original YOLOv4-Tiny network;
S1.3.2, inputting the selected proportion of training pictures into the trained original YOLOv4-Tiny network and performing forward propagation to obtain the confidence loss;
S1.3.3, back-propagating the confidence loss, with the network parameters frozen so that they cannot change during back propagation and only the pixels of the picture can be modified;
S1.3.4, after each iteration, inputting the modified picture into the trained original YOLOv4-Tiny network; if the target position can no longer be detected correctly, the iteration stops and the picture becomes an adversarial sample.
5. The method for detecting wearing of safety helmets of gas stations based on YOLOv4-Tiny as claimed in claim 4, wherein in step S1.3.3 back propagation proceeds in the direction that increases the confidence loss.
6. The method for detecting the wearing condition of the safety helmet of the gasoline station based on YOLOv4-Tiny as claimed in claim 4, wherein the step S1.3.1 comprises the following steps:
a. taking all the training pictures as a second training set;
b. changing the training labels to three classes, person, 50% hat and 100% hat, used respectively to detect no helmet worn, helmet worn incorrectly and helmet worn correctly;
c. inputting the second training set into the original YOLOv4-Tiny model to calculate the loss, and back-propagating to obtain the corresponding network weights, thereby obtaining the trained original YOLOv4-Tiny network.
7. The method for detecting the wearing condition of the gas station safety helmet based on YOLOv4-Tiny as claimed in claim 6, wherein in the step c, the loss is calculated by the formula:
Loss = λ_coord · Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^obj · (C_i − Ĉ_i)²
     + Σ_{i=1..S²} Σ_{j=1..B} I_ij^noobj · (C_i − Ĉ_i)²
     + λ_class · Σ_{i=1..S²} I_i^obj · Σ_{c ∈ classes} ( p_i(c) − p̂_i(c) )²
wherein λ_coord represents the weight of the coordinate loss; λ_class represents the weight of the grid prediction class loss; S² represents the number of grid cells the picture is divided into; B represents the number of prediction boxes contained in each grid cell; I_ij^obj indicates whether the j-th prediction box of the i-th grid cell is the responsible prediction box, taking the value 1 if it is and 0 otherwise; I_ij^noobj indicates whether the j-th prediction box of the i-th grid cell is not the responsible prediction box, taking the value 1 if it is not and 0 otherwise; x_i, y_i represent the true marked centre-point coordinates of the target object that the i-th grid cell is responsible for; x̂_i, ŷ_i represent the centre-point coordinates of the prediction box for that target object; w_i, h_i represent the true marked width and height of that target object; ŵ_i, ĥ_i represent the width and height of its prediction box; C_i represents the true classification result of the target object that the i-th grid cell is responsible for; Ĉ_i represents its predicted classification result; p_i(c) represents the true probability that the object the i-th grid cell is responsible for belongs to class c; p̂_i(c) represents the predicted probability that it belongs to class c; and classes represents the set of class labels.
8. The method of claim 1, wherein the CSPDarknet53_tiny network comprises a first DarknetConv2D_BN_Leaky module, a second DarknetConv2D_BN_Leaky module, a first Resblock_body module, a second Resblock_body module, a third Resblock_body module and a third DarknetConv2D_BN_Leaky module, connected in sequence.
9. The method for detecting wearing of a gas station safety helmet based on YOLOv4-Tiny according to claim 8, wherein the multi-scale feature fusion module comprises a first convolution layer, a second convolution layer, a first upsampling layer, a first splicing layer, a third convolution layer, a second upsampling layer and a second splicing layer which are connected in sequence from bottom to top;
the classification prediction module comprises a first Yolo Head classification network, a second Yolo Head classification network and a third Yolo Head classification network which are connected in sequence from bottom to top;
the first feature map output by the third DarknetConv2D_BN_Leaky module is, on the one hand, input to the first Yolo Head classification network through the first convolution layer and, on the other hand, passed through the second convolution layer and the first upsampling layer and input to the first splicing layer together with the second feature map output by the second Resblock_body module, where they are spliced to obtain a first fused feature map; the first fused feature map is, on the one hand, input to the second Yolo Head classification network and, on the other hand, passed through the third convolution layer and the second upsampling layer and input to the second splicing layer together with the third feature map output by the first Resblock_body module to obtain a second fused feature map, and the second fused feature map is input to the third Yolo Head classification network.
10. The method for detecting wearing of safety helmets of gas stations based on YOLOv4-Tiny as claimed in claim 8, wherein each DarknetConv2D_BN_Leaky module consists of a Conv2D convolution, batch normalization (BN) and a Leaky ReLU activation function.
CN202111495511.1A 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny Pending CN114140750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111495511.1A CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111495511.1A CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Publications (1)

Publication Number Publication Date
CN114140750A true CN114140750A (en) 2022-03-04

Family

ID=80385385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111495511.1A Pending CN114140750A (en) 2021-12-09 2021-12-09 Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny

Country Status (1)

Country Link
CN (1) CN114140750A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115409818A (en) * 2022-09-05 2022-11-29 江苏济远医疗科技有限公司 Enhanced training method applied to endoscope image target detection model
CN115409818B (en) * 2022-09-05 2023-10-27 江苏济远医疗科技有限公司 Enhanced training method applied to endoscope image target detection model
CN116229522A (en) * 2023-05-10 2023-06-06 广东电网有限责任公司湛江供电局 Substation operator safety protection equipment detection method and system
CN116977919A (en) * 2023-06-21 2023-10-31 北京卓视智通科技有限责任公司 Method and system for identifying dressing specification, storage medium and electronic equipment
CN116977919B (en) * 2023-06-21 2024-01-26 北京卓视智通科技有限责任公司 Method and system for identifying dressing specification, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN114140750A (en) Filling station safety helmet wearing real-time detection method based on YOLOv4-Tiny
CN111723786B (en) Method and device for detecting wearing of safety helmet based on single model prediction
CN112434672B (en) Marine human body target detection method based on improved YOLOv3
CN111079739B (en) Multi-scale attention feature detection method
CN112733749A (en) Real-time pedestrian detection method integrating attention mechanism
CN113011319A (en) Multi-scale fire target identification method and system
CN110378222A (en) A kind of vibration damper on power transmission line target detection and defect identification method and device
CN112149591B (en) SSD-AEFF automatic bridge detection method and system for SAR image
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN113052834A (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN114565891A (en) Smoke and fire monitoring method and system based on graph generation technology
Park et al. Advanced wildfire detection using generative adversarial network-based augmented datasets and weakly supervised object localization
CN112861646A (en) Cascade detection method for oil unloading worker safety helmet in complex environment small target recognition scene
CN115512387A (en) Construction site safety helmet wearing detection method based on improved YOLOV5 model
CN116543346A (en) Deep learning-based transmission line video mountain fire detection method
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
Yandouzi et al. Investigation of combining deep learning object recognition with drones for forest fire detection and monitoring
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113936299A (en) Method for detecting dangerous area in construction site
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN112364864A (en) License plate recognition method and device, electronic equipment and storage medium
CN115131826B (en) Article detection and identification method, and network model training method and device
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination