CN117423062B - Construction site safety helmet detection method based on improved YOLOv5

Construction site safety helmet detection method based on improved YOLOv5

Info

Publication number
CN117423062B
Authority
CN
China
Prior art keywords
loss function
safety helmet
features
samples
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311509438.8A
Other languages
Chinese (zh)
Other versions
CN117423062A (en)
Inventor
李跃华
张月月
吴赛林
王金凤
姚章燕
魏浩宇
胡彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University
Priority to CN202311509438.8A
Publication of CN117423062A
Application granted
Publication of CN117423062B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Helmets And Other Head Coverings (AREA)

Abstract

The invention provides a construction site safety helmet detection method based on improved YOLOv5, belonging to the technical field of computer vision target detection. It addresses the problems of low detection accuracy, missed and false detection of small targets, and difficulty in detecting occluded targets in the prior art. The technical scheme comprises the following steps: S1, sample data acquisition; S2, replacing the FPN structure in the Neck network with an AFPN structure; S3, replacing BCE Loss with Slide Loss; S4, replacing the bounding-box regression loss function; S5, obtaining the result of whether a worker wears a safety helmet. The beneficial effects of the invention are as follows: by improving the YOLOv5 network structure, more attention is allocated to difficult samples, more target feature information is extracted, and the regression loss function is optimized to improve detection accuracy and speed, reducing dangerous accidents on construction sites.

Description

Construction site safety helmet detection method based on improved YOLOv5
Technical Field
The invention relates to the technical field of computer vision target detection, in particular to a construction site safety helmet detection method based on improved YOLOv5.
Background
The building industry is one of the important material production sectors and pillar industries of China's national economy, playing an important role in improving living conditions, perfecting infrastructure, absorbing labor employment and promoting economic growth. With the continuous advance of urbanization in China, the scale of construction engineering keeps expanding, yet construction safety and quality problems persist, and operational risks such as falls from height and collapses seriously threaten workers' lives. How to strengthen construction site safety management, reduce the frequency of accidents, stop illegal operations and uncivilized construction, and improve construction quality remains an important research subject for government departments at all levels, industry practitioners and scholars.
Early construction site supervision relied on manual inspection, which is time-consuming and labor-intensive and cannot monitor workers' unsafe behavior at all times, leading to frequent site accidents. With the continuous development of deep learning in the field of target detection, new methods have emerged for helmet-wearing detection. Mainstream target detection algorithms currently fall into two categories: two-stage algorithms first extract candidate boxes and then perform regression and localization (e.g., R-CNN and Faster R-CNN), achieving high accuracy at the cost of complicated training; one-stage algorithms directly predict target class scores and position coordinates (e.g., SSD and YOLO), running fast but with slightly lower accuracy than two-stage algorithms.
Many scholars have applied deep learning to helmet detection. For example, Hou Gongyu et al. reconstructed the YOLOv5 backbone, replaced the feature pyramid to strengthen feature fusion, and introduced an attention mechanism for helmet detection; Xu Shoukun et al., building on the original Faster R-CNN, used methods such as multi-scale training to enhance the robustness of the network to targets of different sizes and to address sample imbalance. Although these methods improve helmet detection to some extent, under complex backgrounds they still suffer from low accuracy, missed and false detection of small targets, and difficulty in detecting occluded targets.
Disclosure of Invention
Aiming at the current technical problems of low detection accuracy, missed and false detection of small targets, and difficulty in detecting occluded targets, the invention provides a construction site safety helmet detection method based on improved YOLOv5: the YOLOv5 network structure is improved, more attention is allocated to difficult samples, more target feature information is extracted, and the regression loss function is optimized to improve detection accuracy and speed, reducing dangerous accidents on construction sites.
In order to achieve the aim of the invention, the following technical scheme is adopted: the construction site safety helmet detection method based on improved YOLOv5 comprises the following steps.
S1, sample data acquisition
(1) A safety helmet dataset is produced by downloading public datasets, collecting helmet-related data from building construction sites with tools such as web crawlers, and capturing video frames of valid samples from surveillance video.
(2) The dataset pictures are labeled with the LabelImg target detection labeling tool; the label classes comprise a hat class and a person class, and the helmet labeling area covers the head above the neck together with the helmet. After labeling, xml-format tag files are produced and stored in an Annotations folder, forming a VOC-format (xml file) dataset.
(3) The tag files are converted from the VOC format to the YOLO dataset label format, and the data are split 9:1 into a training set and a validation set, as sketched below.
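A minimal sketch of this conversion and split, assuming the folder layout above (the helper name voc_to_yolo and the output paths are illustrative, not part of the patent):

    import glob
    import os
    import random
    import xml.etree.ElementTree as ET

    CLASSES = ["hat", "person"]  # label classes from step (2)

    def voc_to_yolo(xml_path, txt_dir):
        """Convert one VOC xml annotation into a YOLO txt label file."""
        root = ET.parse(xml_path).getroot()
        w = float(root.find("size/width").text)
        h = float(root.find("size/height").text)
        lines = []
        for obj in root.findall("object"):
            cls = CLASSES.index(obj.find("name").text)
            b = obj.find("bndbox")
            x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
            x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
            # YOLO format: class x_center y_center width height, normalised to [0, 1]
            lines.append(f"{cls} {(x1 + x2) / 2 / w:.6f} {(y1 + y2) / 2 / h:.6f} "
                         f"{(x2 - x1) / w:.6f} {(y2 - y1) / h:.6f}")
        name = os.path.splitext(os.path.basename(xml_path))[0]
        with open(os.path.join(txt_dir, name + ".txt"), "w") as f:
            f.write("\n".join(lines))

    os.makedirs("labels", exist_ok=True)
    xml_files = glob.glob("Annotations/*.xml")
    for p in xml_files:
        voc_to_yolo(p, "labels")

    random.shuffle(xml_files)          # 9:1 split into training and validation sets
    split = int(0.9 * len(xml_files))
    train_files, val_files = xml_files[:split], xml_files[split:]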
S2, using AFPN structure to replace FPN structure in Neck network
The FPN in the original YOLOv5 transfers high-level features to low-level features in a top-down manner to fuse features of different levels. However, low-level features are not fused back into high-level features, which widens the semantic gap between non-adjacent levels; the progressive feature pyramid network AFPN is therefore used in place of FPN to solve this problem.
AFPN is progressive: during the bottom-up feature extraction of the backbone, the fusion process starts in the initial stage by combining two low-level features of different resolutions, then fuses deeper features, and finally fuses the highest-level features of the backbone, avoiding large semantic gaps between non-adjacent levels and enhancing the representation of the feature pyramid. In the multi-level feature fusion process, the adaptive spatial fusion operation ASFF assigns different spatial weights to the features of different levels, strengthening the importance of key levels and suppressing contradictory information between levels. The feature fusion expression is:

y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)

where x_ij^(n→l) denotes the feature vector at position (i, j) transferred from level n to level l, the number of fused levels being set to 3, i.e. n takes the values 1, 2 and 3; α_ij^l, β_ij^l and γ_ij^l denote the spatial weights of the three levels' features at level l, subject to the constraint α_ij^l + β_ij^l + γ_ij^l = 1; y_ij^l denotes the resulting fused feature vector.
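For illustration, a minimal PyTorch sketch of this spatially weighted fusion, with weight maps produced by 1×1 convolutions and softmax-normalised so they sum to 1 at each position (module and parameter names are ours, not the patent's exact implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ASFFFusion(nn.Module):
        """Adaptively fuse three same-resolution feature maps with spatial
        weights that sum to 1 at every position (i, j)."""
        def __init__(self, channels):
            super().__init__()
            # one 1x1 convolution per level produces a single-channel weight map
            self.weight_convs = nn.ModuleList(
                nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3))

        def forward(self, x1, x2, x3):
            w = torch.cat([conv(x) for conv, x in
                           zip(self.weight_convs, (x1, x2, x3))], dim=1)
            w = F.softmax(w, dim=1)          # alpha + beta + gamma = 1 per position
            a, b, g = w[:, 0:1], w[:, 1:2], w[:, 2:3]
            return a * x1 + b * x2 + g * x3  # y = a·x1 + b·x2 + g·x3

    # usage: the three inputs must first be resized to a common resolution
    fuse = ASFFFusion(channels=256)
    y = fuse(torch.rand(1, 256, 40, 40), torch.rand(1, 256, 40, 40),
             torch.rand(1, 256, 40, 40))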
S3, replacing BCE-Loss by Slide Loss
Because the helmet dataset contains a large number of sample pictures, easy samples vastly outnumber the relatively sparse difficult samples. To address this sample imbalance, a Slide Loss function is adopted in place of the cross-entropy loss function. Slide Loss is an improvement built on the cross-entropy loss function L:

L = -[x·ln x' + (1 - x)·ln(1 - x')]

where x is the true value of the sample class, x' is the output of the activation function, x' ∈ (0, 1), and ln is the logarithm to the base of the irrational number e.
Here x' represents the predicted probability of a foreground target: the lower the predicted probability of a foreground target, the greater the loss. However, this loss iterates slowly and does not reach the optimum. The Slide Loss function f(x) first divides the samples into positives and negatives by a parameter μ: samples below μ are negative and samples above μ are positive. A weighting function then emphasizes the samples near this boundary, reassigning higher weights to difficult samples so that the trained model concentrates more attention on hard and misclassified samples, which improves detection of small and occluded targets. The expression of the Slide Loss function is:

f(x) = 1, if x ≤ μ - 0.1
f(x) = e^(1-μ), if μ - 0.1 < x < μ
f(x) = e^(1-x), if x ≥ μ

where x is the IoU of the bounding box, the threshold μ is the average IoU of all bounding boxes, f(x) is the Slide Loss weight, and e is the base of the natural logarithm.
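A minimal PyTorch sketch of this piecewise weighting (the function name slide_weight is ours; the 0.1 transition band follows the expression above):

    import math
    import torch

    def slide_weight(iou: torch.Tensor, mu: float) -> torch.Tensor:
        """Slide Loss weight: 1 for samples well below the threshold mu,
        boosted to e^(1-mu) just below mu, decaying as e^(1-x) above mu."""
        w = torch.ones_like(iou)
        w = torch.where((iou > mu - 0.1) & (iou < mu),
                        torch.full_like(iou, math.exp(1.0 - mu)), w)
        w = torch.where(iou >= mu, torch.exp(1.0 - iou), w)
        return w

    box_iou = torch.tensor([0.10, 0.45, 0.52, 0.90])
    # samples near the threshold receive the largest weights
    print(slide_weight(box_iou, mu=float(box_iou.mean())))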
S4, replacing boundary box regression loss function
The existing bounding-box regression loss (CIoU) cannot be optimized when the predicted box and the ground-truth box share the same aspect ratio but have completely different width and height values. To solve this problem, MPDIoU is used in place of CIoU: it covers all the factors considered in existing loss functions, such as overlapping or non-overlapping area, center point distance, and deviations of height and width, while simplifying the computation, yielding faster convergence and more accurate regression results. The MPDIoU regression loss function is expressed as:

d1^2 = (x1^prd - x1^gt)^2 + (y1^prd - y1^gt)^2
d2^2 = (x2^prd - x2^gt)^2 + (y2^prd - y2^gt)^2
MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2)
L_MPDIoU = 1 - MPDIoU

where B^prd and B^gt are the prediction box and the ground-truth box; (x1^prd, y1^prd) and (x2^prd, y2^prd) are the top-left and bottom-right corners of the prediction box, with x2^prd > x1^prd and y2^prd > y1^prd guaranteed; w and h are the width and height of the input picture; d1^2 and d2^2 are the squared distances between the corresponding top-left and bottom-right corners of the prediction box and the ground-truth box; A^prd is the area of the prediction box and A^gt the area of the ground-truth box; (x1^I, y1^I) and (x2^I, y2^I) are the corner coordinates of the overlap rectangle, and I is the overlap area; IoU is the anchor-box overlap term, obtained by dividing the overlap I of the two boxes by their union U = A^gt + A^prd - I.
S5, the improved YOLOv5 network model is trained with the safety helmet dataset to obtain a trained model. Images and videos to be detected are input into the trained model for inference and prediction, yielding the result of whether each worker wears a safety helmet.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention improves the YOLOv5 network, replacing the FPN in the Neck with the AFPN progressive pyramid: progressive connection paths are introduced, direct interaction between non-adjacent levels is enhanced, feature maps of different levels are fused, feature expression capability is strengthened, and target detection accuracy and inference speed are improved.
2. The invention replaces the original YOLOv5 classification loss function with the Slide Loss function, alleviating the imbalance between difficult and easy samples and improving detection of small targets and occluded objects. MPDIoU is used instead of CIoU, an IoU loss based on minimum point distance that minimizes the distances between the top-left and bottom-right corners of the predicted box and the ground-truth box, giving faster convergence and more accurate regression results.
3. The invention accurately detects whether construction workers wear safety helmets, improves small-target detection accuracy, reduces missed and false detections, and effectively lowers the frequency of construction site safety accidents.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a flowchart of the safety helmet detection method using the improved YOLOv5 network structure of the present invention.
FIG. 2 is a schematic diagram illustrating the operation of AFPN in the present invention.
FIG. 3 is a graph of the Slide Loss function of the present invention.
FIG. 4 is a graph comparing detection results before and after the improvement of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. Of course, the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
Example 1
Referring to FIG. 1 to FIG. 4, the technical scheme provided by this embodiment is a construction site safety helmet detection method based on improved YOLOv5, comprising the following steps:
S1, sample data acquisition
First, web pictures are collected with a Python crawler, public datasets such as SHWD are downloaded, and pictures of unsuitable scenes (for example, frames captured by classroom surveillance) are removed; some construction site videos are also collected as test samples (a frame-capture sketch follows this list).
Secondly, the collected pictures are organized and converted into JPG format.
Finally, the LabelImg tool is used to label the safety helmet (hat) and person (person) classes, the helmet labeling area covering the head above the neck together with the helmet; the files are placed in an Annotations folder in xml format. Since YOLO labels use the txt format, the annotations are converted to YOLO format with Python code, and the training and validation sets are divided in a certain proportion.
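A hedged OpenCV sketch of harvesting frames from the collected site videos (the sampling stride and paths are our assumptions):

    import os
    import cv2

    def extract_frames(video_path, out_dir, stride=30):
        """Save every stride-th frame of a surveillance video as a JPG sample."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % stride == 0:
                cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
                saved += 1
            idx += 1
        cap.release()
        return saved

    extract_frames("site_video.mp4", "images/site", stride=30)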
S2, using AFPN structure to replace FPN structure in Neck network
The FPN in the original YOLOv5 transfers high-level features to low-level features in a top-down manner to fuse features of different levels. However, low-level features are not fused back into high-level features; in addition, since targets of different scales respond in feature maps of different levels, predicting at a single level degrades performance. A progressive feature pyramid AFPN is therefore used in place of FPN, extracting multi-scale semantic information by fusing feature maps of different levels. AFPN consists mainly of two parts: progressive connection and feature fusion. The progressive connection enables direct information transfer between non-adjacent levels, avoiding redundancy and loss of information and improving feature representation capability.
Specifically, AFPN up-samples the lower-resolution feature map to a higher resolution and adds it element-wise to the feature map of the adjacent level to obtain a fused feature map; this is iterated until the highest-level feature map is reached. The up-sampling operation can use a 1×1 convolution together with bilinear interpolation (a sketch of this progressive step follows the formula below). Feature fusion adopts the adaptive spatial fusion operation ASFF, which assigns different spatial weights to the features of different levels so that they are fused in suitable proportion, better balancing the contributions of low-level and high-level features and improving target detection performance. The feature fusion expression is:

y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)

where x_ij^(n→l) denotes the feature vector at position (i, j) transferred from level n to level l, the number of fused levels being set to 3, i.e. n takes the values 1, 2 and 3; α_ij^l, β_ij^l and γ_ij^l denote the spatial weights of the three levels' features at level l, subject to the constraint α_ij^l + β_ij^l + γ_ij^l = 1; y_ij^l denotes the resulting fused feature vector.
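A minimal PyTorch sketch of one such progressive step (the module name is ours; out_ch must match the shallower map's channel count so the element-wise addition is valid):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ProgressiveUpFuse(nn.Module):
        """Align channels with a 1x1 convolution, up-sample the deeper
        (lower-resolution) map bilinearly, then add it element-wise to the
        adjacent shallower map; iterated level by level up the pyramid."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.align = nn.Conv2d(in_ch, out_ch, kernel_size=1)

        def forward(self, deep, shallow):
            deep = self.align(deep)
            deep = F.interpolate(deep, size=shallow.shape[-2:],
                                 mode="bilinear", align_corners=False)
            return deep + shallow

    step = ProgressiveUpFuse(in_ch=512, out_ch=256)
    fused = step(torch.rand(1, 512, 20, 20), torch.rand(1, 256, 40, 40))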
S3, replacing BCE-Loss by Slide Loss
In most cases, easy samples greatly outnumber the relatively sparse difficult samples, and Slide Loss is used to address this imbalance. Simple and difficult samples are distinguished by the IoU between the predicted and ground-truth boxes; to reduce hyperparameters, the average IoU of all bounding boxes is taken as the threshold μ, with samples below μ treated as negative and samples above μ as positive.
Since classification ambiguity causes the small number of samples near this boundary to incur large losses, higher weights are assigned to these difficult samples: Slide Loss emphasizes the samples at the boundary so that the model learns to optimize them and makes full use of them in training the network. The Slide Loss function is expressed as:

f(x) = 1, if x ≤ μ - 0.1
f(x) = e^(1-μ), if μ - 0.1 < x < μ
f(x) = e^(1-x), if x ≥ μ

where x is the IoU of the bounding box, the threshold μ is the average IoU of all bounding boxes, f(x) is the Slide Loss weight, and e is the base of the natural logarithm.
Among existing target detection classification functions the cross-entropy loss remains mainstream, but it distinguishes difficult from easy samples poorly; Slide Loss has a clear advantage in this respect and can improve detection of small and occluded targets.
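As an illustration of plugging the Slide weight into the classification loss, a hedged PyTorch sketch of a per-sample weighted BCE (the function name and reduction choice are ours):

    import math
    import torch
    import torch.nn.functional as F

    def slide_bce(logits, targets, iou):
        """BCE classification loss re-weighted per sample by the Slide weight,
        so difficult samples near the IoU threshold contribute more."""
        mu = float(iou.mean())                      # threshold: mean IoU of all boxes
        w = torch.ones_like(iou)                    # easy samples keep weight 1
        w = torch.where((iou > mu - 0.1) & (iou < mu),
                        torch.full_like(iou, math.exp(1.0 - mu)), w)
        w = torch.where(iou >= mu, torch.exp(1.0 - iou), w)
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        return (w * bce).mean()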
S4, replacing boundary box regression loss function
The existing bounding-box regression loss function cannot be optimized when the predicted box and the ground-truth box have the same aspect ratio but completely different width and height values; the MPDIoU regression loss function is therefore adopted instead, effectively improving bounding-box regression and raising convergence speed and regression accuracy. MPDIoU is a bounding-box similarity measure based on minimum point distance: it directly minimizes the distances between the top-left and bottom-right corner points of the predicted box and the ground-truth box, fully covers the factors considered by existing loss functions, and simplifies the calculation.
Firstly, the squared distances between the corresponding top-left and bottom-right corners of the prediction box and the ground-truth box are calculated:
d1^2 = (x1^prd - x1^gt)^2 + (y1^prd - y1^gt)^2
d2^2 = (x2^prd - x2^gt)^2 + (y2^prd - y2^gt)^2
Secondly, the areas of the prediction box and the ground-truth box are calculated in preparation for the IoU value:
A^prd = (x2^prd - x1^prd)·(y2^prd - y1^prd)
A^gt = (x2^gt - x1^gt)·(y2^gt - y1^gt)
Then, the corner coordinates of the overlap between the prediction box and the ground-truth box are calculated, and from them the overlap area:
x1^I = max(x1^prd, x1^gt), x2^I = min(x2^prd, x2^gt)
y1^I = max(y1^prd, y1^gt), y2^I = min(y2^prd, y2^gt)
I = (x2^I - x1^I)·(y2^I - y1^I) if the boxes overlap, otherwise I = 0
IoU = I / U, with U = A^gt + A^prd - I
Finally, MPDIoU is calculated according to the formula, and the value of L_MPDIoU is obtained:
MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2)
L_MPDIoU = 1 - MPDIoU
where B^prd and B^gt are the prediction box and the ground-truth box; (x1^prd, y1^prd) and (x2^prd, y2^prd) are the top-left and bottom-right corners of the prediction box, with x2^prd > x1^prd and y2^prd > y1^prd guaranteed; w and h are the width and height of the input picture; d1^2 and d2^2 are the squared corner distances; A^prd and A^gt are the box areas; (x1^I, y1^I) and (x2^I, y2^I) are the corners of the overlap rectangle and I its area; IoU divides the overlap I by the union U = A^gt + A^prd - I.
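Following the four steps above, a hedged PyTorch sketch of the MPDIoU loss (the (x1, y1, x2, y2) box layout and the function name are our assumptions):

    import torch

    def mpdiou_loss(pred, gt, w_img, h_img, eps=1e-7):
        """MPDIoU bounding-box regression loss for boxes given as
        (x1, y1, x2, y2); w_img, h_img are the input image width and height."""
        # step 1: squared distances between matching top-left / bottom-right corners
        d1 = (pred[:, 0] - gt[:, 0])**2 + (pred[:, 1] - gt[:, 1])**2
        d2 = (pred[:, 2] - gt[:, 2])**2 + (pred[:, 3] - gt[:, 3])**2
        # step 2: areas of the prediction box and the ground-truth box
        a_prd = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        a_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
        # step 3: overlap rectangle and IoU
        x1 = torch.max(pred[:, 0], gt[:, 0]); y1 = torch.max(pred[:, 1], gt[:, 1])
        x2 = torch.min(pred[:, 2], gt[:, 2]); y2 = torch.min(pred[:, 3], gt[:, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        iou = inter / (a_prd + a_gt - inter + eps)
        # step 4: MPDIoU and the final loss value
        mpdiou = iou - d1 / (w_img**2 + h_img**2) - d2 / (w_img**2 + h_img**2)
        return (1.0 - mpdiou).mean()

    loss = mpdiou_loss(torch.tensor([[10., 10., 60., 80.]]),
                       torch.tensor([[12., 15., 65., 85.]]), w_img=640, h_img=640)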
S5, the improved YOLOv5 network model is trained with the safety helmet dataset to obtain a trained model.
Parameters of the improved YOLOv5 network structure are set: the YOLOv5s model is selected, the yolov5s.pt file is used as the initial weights, the learning rate is 0.01, epochs are set to 300 and batch size to 16. The prepared dataset is used for training, and the network parameters are continuously optimized according to the training results until the improved model with the best training effect is obtained (a configuration sketch follows).
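A configuration sketch, assuming training is run from the Ultralytics YOLOv5 repository, whose train.py module exposes a run() helper; the dataset file helmet.yaml is a hypothetical name pointing at the 9:1 split from S1 (the initial learning rate 0.01 is that repository's default hyperparameter):

    # run from the yolov5 repository root
    import train

    train.run(
        weights="yolov5s.pt",   # pretrained YOLOv5s weights
        data="helmet.yaml",     # hat / person classes, train and val image paths
        epochs=300,
        batch_size=16,
        imgsz=640,
    )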
Finally, the images and videos to be detected are input into the trained model for inference and prediction, and the result of whether each worker wears a safety helmet is obtained and output.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the invention are intended to be included within its protection scope.

Claims (2)

1. The construction site safety helmet detection method based on improved YOLOv5 is characterized by comprising the following steps:
S1, sample data acquisition
(1) A safety helmet dataset is produced by downloading public datasets, acquiring helmet-related data from building construction sites with web crawler tools, and capturing video frames of valid samples from surveillance videos;
(2) The dataset pictures are labeled with the LabelImg target detection labeling tool, the label classes comprising a safety helmet class and a person class; the helmet labeling area covers the head above the neck together with the helmet; after labeling, xml-format tag files are formed and stored in an Annotations folder, forming a VOC-format (xml file) dataset;
(3) The tag files are converted from the VOC format to the YOLO dataset label file format, and the data are split 9:1 into a training set and a validation set;
s2, using AFPN structure to replace FPN structure in Neck network
The FPN in the original YOLOv5 transfers high-level features to low-level features in a top-down manner to fuse features of different levels; since low-level features are not fused back into high-level features, the semantic gap between non-adjacent levels widens, and the progressive feature pyramid network AFPN is used in place of FPN;
S3, replacing BCE-Loss by Slide Loss
Because the helmet dataset contains a large number of sample pictures, easy samples vastly outnumber the relatively sparse difficult samples, and a Slide Loss function is adopted in place of the cross-entropy loss function;
S4, replacing boundary box regression loss function
The existing bounding-box regression loss function cannot be optimized when the predicted box and the labeled ground-truth box have the same aspect ratio but completely different width and height values; MPDIoU is used in place of CIoU, covering all factors considered by existing loss functions, including overlapping or non-overlapping area, center point distance, and deviations of height and width;
S5, training the improved YOLOv5 network model with the safety helmet dataset to obtain a trained model, and inputting images and videos to be detected into the trained model for inference and prediction to obtain the result of whether a worker wears a safety helmet;
In the step S2, the framework of AFPN is as follows: during the bottom-up feature extraction of the backbone, the fusion process starts in the initial stage by combining two low-level features of different resolutions, then fuses deeper features, and finally fuses the highest-level features of the backbone; in the multi-level feature fusion process, the adaptive spatial fusion operation ASFF assigns different spatial weights to the features of different levels, strengthening the importance of key levels and suppressing contradictory information between levels, the feature fusion expression being:

y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l)

where x_ij^(n→l) denotes the feature vector at position (i, j) transferred from level n to level l, the number of fused levels being set to 3, i.e. n takes the values 1, 2 and 3; α_ij^l, β_ij^l and γ_ij^l denote the spatial weights of the three levels' features at level l, subject to the constraint α_ij^l + β_ij^l + γ_ij^l = 1; y_ij^l denotes the resulting fused feature vector;
in the step S3, x' represents the predicted probability of a foreground target; the Slide Loss function f(x) divides the samples into positives and negatives by a parameter μ, samples below μ being negative and samples above μ positive; a weighting function then emphasizes the samples at the boundary, reassigning higher weights to difficult samples so that the trained model focuses more attention on hard and misclassified samples, the Slide Loss function being expressed as:

f(x) = 1, if x ≤ μ - 0.1
f(x) = e^(1-μ), if μ - 0.1 < x < μ
f(x) = e^(1-x), if x ≥ μ

where x is the IoU of the bounding box, the threshold μ is the average IoU of all bounding boxes, f(x) is the Slide Loss value, and e is the base of the natural logarithm;
in the step S4, the expression of the MPDIoU regression loss function is:

d1^2 = (x1^prd - x1^gt)^2 + (y1^prd - y1^gt)^2
d2^2 = (x2^prd - x2^gt)^2 + (y2^prd - y2^gt)^2
MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2)
L_MPDIoU = 1 - MPDIoU

where B^prd and B^gt are the prediction box and the ground-truth box; (x1^prd, y1^prd) and (x2^prd, y2^prd) are the top-left and bottom-right corners of the prediction box, with x2^prd > x1^prd and y2^prd > y1^prd guaranteed; w and h are the width and height of the input picture; d1^2 and d2^2 are the squared distances between the corresponding top-left and bottom-right corners of the prediction box and the ground-truth box; A^prd is the area of the prediction box and A^gt the area of the ground-truth box; (x1^I, y1^I) and (x2^I, y2^I) are the corner coordinates of the overlap rectangle, and I is the overlap area; IoU is the anchor-box overlap term, obtained by dividing the overlap I of the two boxes by their union U = A^gt + A^prd - I.
2. The construction site safety helmet detection method based on improved YOLOv5 according to claim 1, wherein in the step S3 the Slide Loss function is improved on the basis of the cross-entropy loss function, the cross-entropy loss function L being:

L = -[x·ln x' + (1 - x)·ln(1 - x')]

where x is the true value of the sample class, x' is the output of the activation function, x' ∈ (0, 1), and ln is the logarithm to the base of the irrational number e.
CN202311509438.8A 2023-11-13 2023-11-13 Construction site safety helmet detection method based on improved YOLOv5 Active CN117423062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311509438.8A CN117423062B (en) 2023-11-13 2023-11-13 Construction site safety helmet detection method based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311509438.8A CN117423062B (en) 2023-11-13 2023-11-13 Construction site safety helmet detection method based on improved YOLOv5

Publications (2)

Publication Number Publication Date
CN117423062A CN117423062A (en) 2024-01-19
CN117423062B true CN117423062B (en) 2024-07-19

Family

ID=89522882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311509438.8A Active CN117423062B (en) 2023-11-13 2023-11-13 Construction site safety helmet detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN117423062B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882389A (en) * 2022-01-24 2022-08-09 华东交通大学 Helmet wearing detection method based on extrusion excitation residual error network improved YOLOv3
CN115527095A (en) * 2022-10-29 2022-12-27 西安电子科技大学 Multi-scale target detection method based on combined recursive feature pyramid

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210370993A1 (en) * 2020-05-27 2021-12-02 University Of South Carolina Computer vision based real-time pixel-level railroad track components detection system
CN112884064B (en) * 2021-03-12 2022-07-29 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
CN114782759B (en) * 2022-06-22 2022-09-13 鲁东大学 Method for detecting densely-occluded fish based on YOLOv5 network
CN116580349A (en) * 2023-04-12 2023-08-11 沈阳化工大学 Method for detecting wearing of safety helmet based on yolov7 building site
CN116453186A (en) * 2023-04-14 2023-07-18 淮阴工学院 Improved mask wearing detection method based on YOLOv5
CN116630737A (en) * 2023-04-26 2023-08-22 吉林化工学院 Safety equipment wearing detection method based on improved YOLOv5
CN117036654A (en) * 2023-07-04 2023-11-10 浙江工业大学 Attention mechanism-based tray pose estimation method
CN116758376A (en) * 2023-07-06 2023-09-15 北京理工大学 Detection method of X-ray dangerous objects
CN117037119A (en) * 2023-08-28 2023-11-10 中国科学技术大学 Road target detection method and system based on improved YOLOv8


Also Published As

Publication number Publication date
CN117423062A (en) 2024-01-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant