CN115546614B - Safety helmet wearing detection method based on improved YOLOV5 model - Google Patents


Info

Publication number
CN115546614B
Authority
CN
China
Prior art keywords
feature
feature map
module
image
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211534970.0A
Other languages
Chinese (zh)
Other versions
CN115546614A (en)
Inventor
张艳
梁化民
刘业辉
孙晶雪
Current Assignee
Tianjin Chengjian University
Original Assignee
Tianjin Chengjian University
Priority date
Filing date
Publication date
Application filed by Tianjin Chengjian University filed Critical Tianjin Chengjian University
Priority to CN202211534970.0A priority Critical patent/CN115546614B/en
Publication of CN115546614A publication Critical patent/CN115546614A/en
Application granted granted Critical
Publication of CN115546614B publication Critical patent/CN115546614B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The invention discloses a safety helmet wearing detection method based on an improved YOLOV5 model, which comprises the following steps: randomly selecting images from a helmet wearing image dataset and performing image data enhancement to obtain data-enhanced images; inputting the data-enhanced images into an improved YOLOV5 model for training to obtain a trained improved YOLOV5 model, wherein the improved YOLOV5 model embeds an inverted residual module and an inverted residual attention module in the feature extraction part to extract image features, designs a multi-scale feature fusion module in the feature fusion part for feature fusion and generates four detection heads with different receptive fields, and optimizes the prediction box regression loss function; and finally, inputting the image to be detected into the trained improved YOLOV5 model to obtain a detection result of whether the relevant personnel wear safety helmets. The invention effectively alleviates the missed detection and false detection of small targets in construction site video surveillance images and improves the precision of safety helmet wearing detection.

Description

Helmet wearing detection method based on improved YOLOV5 model
Technical Field
The invention relates to the technical field of image processing, in particular to a helmet wearing detection method based on an improved YOLOV5 model.
Background
At present, the construction industry of China is still developing continuously and the number of construction workers increases every year. In construction site safety management, the safety helmet is a protective article that effectively prevents head injury accidents: it absorbs the impact force of falling objects on a constructor's head and avoids or reduces the resulting injury, and it is a personal protective article that must be worn in production and construction activities as stipulated by the work safety law. Wearing safety helmets correctly on construction sites effectively reduces the casualty rate in production accidents and is of great significance for guaranteeing safe production.
At present, whether workers wear safety helmets is mostly judged by manual supervision on construction sites. This method easily wastes manpower and material resources, and its supervision effect is poor because of the large working range and the fatigue that manual operation easily causes.
In recent years, with the continuous development of object detection technology, certain results have been obtained in safety helmet detection research. Compared with traditional manual inspection, which is time-consuming and labor-intensive, machine-vision-based methods offer a high degree of automation and are easy to extend, and have therefore become an urgent current need.
However, existing detection methods based on traditional machine learning mainly identify the shape and color features of the safety helmet, for example locating the human face by skin-color detection and then detecting the helmet with a support vector machine. Although such traditional machine-learning helmet detection algorithms are fast, features and classifiers must be designed and trained for each specific detection object. Moreover, owing to their poor generalization ability and reliance on single features, they cannot detect targets effectively in complex construction environments; missed detection and false detection of small targets easily occur, and helmet wearing detection precision in complex environments is low.
Therefore, how to avoid missed detection and false detection of small targets in helmet wearing detection in complex environments, and how to improve helmet wearing detection accuracy, is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the invention provides a safety helmet wearing detection method based on an improved YOLOV5 model, which solves at least some of the above technical problems. The method embeds an inverted residual module and an inverted residual attention module in the feature extraction part of the YOLOV5 model so as to obtain rich small-target spatial information and deep semantic information and improve the detection accuracy of small targets, and designs a multi-scale feature fusion module in the feature fusion part to improve the model's ability to recognize small-size targets and reduce missed detection of small targets. The method effectively detects helmet wearing in complex environments, avoids missed and false detection of small targets, and improves helmet wearing detection precision in complex environments.
In order to achieve the purpose, the invention adopts the technical scheme that:
the embodiment of the invention provides a safety helmet wearing detection method based on an improved YOLOV5 model, which comprises the following steps:
s1, acquiring a helmet wearing image data set, and randomly selecting N images from the helmet wearing image data set to perform image data enhancement to obtain data-enhanced images;
s2, inputting the image subjected to data enhancement into an improved YOLOV5 model for training to obtain a trained improved YOLOV5 model; the improved YOLOV5 model comprises: embedding an inverted residual error module and an inverted residual error attention module in the feature extraction part to extract image features; designing a multi-scale feature fusion module in the feature fusion part for feature fusion, and generating four detection heads with different receptive fields; optimizing a prediction box regression loss function;
and S3, inputting the image to be detected into the trained improved YOLOV5 model to obtain a detection result of whether the related person wears the safety helmet or not.
Further, in step S1, N images are randomly selected from the helmet wearing image dataset to perform image data enhancement, where the image data enhancement includes:
turning, scaling and color gamut transformation are carried out on the image;
and randomly cutting the image after the turning, the scaling and the color gamut conversion according to a preset template and splicing.
Further, the scaling of the images specifically includes: randomly selecting N images from the helmet wearing image dataset and, using the width and height of each image as boundary values, zooming the image by magnifications t_x and t_y:
t_x = f_r(t_w, t_w + Δt_w)
t_y = f_r(t_h, t_h + Δt_h)
where t_w and t_h are the minimum width and height magnifications respectively, Δt_w and Δt_h are the lengths of the random intervals of the width and height magnifications respectively, and f_r denotes a random-value function.
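As an illustration, the random-value function f_r above can be realized with a uniform draw over the stated interval. The following is a minimal sketch; the function name and the use of `random.uniform` are assumptions for illustration, not part of the patent:

```python
import random

def random_scale_factors(t_w, dt_w, t_h, dt_h, rng=random):
    """Draw the width/height zoom magnifications t_x and t_y from the
    random intervals [t_w, t_w + dt_w] and [t_h, t_h + dt_h],
    mirroring t_x = f_r(t_w, t_w + Δt_w) and t_y = f_r(t_h, t_h + Δt_h)."""
    t_x = rng.uniform(t_w, t_w + dt_w)  # f_r: uniform random value in the interval
    t_y = rng.uniform(t_h, t_h + dt_h)
    return t_x, t_y
```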
Further, splicing the zoomed images after randomly cutting them according to a preset template specifically comprises:
determining an image template with height h and width w as the size of the output image; randomly generating four dividing lines in the width and height directions for cutting; splicing the nine cut images and cropping away any portion that overflows the template; performing secondary cutting on the internally overlapped parts to obtain the spliced image; and using this image as the input layer data of the YOLOV5 convolutional neural network.
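The template-based nine-image splicing described above can be sketched as follows. This is a simplified, hedged illustration: the patent's exact cutting and secondary-cutting rules are not reproduced, and the 3x3 grid-cell filling and seed handling are assumptions:

```python
import numpy as np

def mosaic9(images, h, w, seed=0):
    """Assemble nine images onto one h-by-w template: two random vertical
    and two random horizontal dividing lines (four lines in total) cut the
    template into a 3x3 grid, and each cell is filled with a crop of one
    source image; anything overflowing a cell is simply cut off."""
    rng = np.random.default_rng(seed)
    xs = sorted(rng.integers(1, w, size=2).tolist())   # vertical dividing lines
    ys = sorted(rng.integers(1, h, size=2).tolist())   # horizontal dividing lines
    x_edges = [0, xs[0], xs[1], w]
    y_edges = [0, ys[0], ys[1], h]
    out = np.zeros((h, w, 3), dtype=np.uint8)
    k = 0
    for r in range(3):
        for c in range(3):
            y0, y1 = y_edges[r], y_edges[r + 1]
            x0, x1 = x_edges[c], x_edges[c + 1]
            # crop the top-left corner of the k-th source image to fit its cell
            out[y0:y1, x0:x1] = images[k][: y1 - y0, : x1 - x0]
            k += 1
    return out
```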
Further, in step S2, image feature extraction is performed by the inverted residual module and the inverted residual attention module embedded in the feature extraction part, which specifically comprises the following steps:
a. The data-enhanced image is input into the feature extraction module and first passes through a Focus slice-convolution layer. Specifically, one value is taken from every other pixel of the picture, similar to down-sampling, dividing the image into four similar pictures with no information loss; this concentrates the spatial information into the channel dimension and expands the input channels by a factor of 4, i.e. the spliced picture has 12 channels. The result is then convolved to obtain a feature map, and after the Focus convolution and a 3×3 convolution layer the feature map Feature_C0 is obtained;
b. The feature map Feature_C0 is input into the first inverted residual module, which amplifies shallow features by channel expansion: the input features are channel-expanded, a linear transformation then maps the high-dimensional features back to a low-dimensional space to obtain rich shallow information, features are extracted by convolution, and residual connections allow the features to be learned repeatedly; the module outputs the feature map Feature_C1;
c. The feature map Feature_C1 passes through a convolution layer and the second inverted residual module to obtain the feature map Feature_C2, which then passes through a convolution layer and is input into the first inverted residual attention module to obtain the feature map Feature_C3; after a convolution with kernel size 3×3 and spatial pyramid pooling, Feature_C3 enters the second inverted residual attention module, yielding the feature map Feature_C4, which serves as the input of the multi-scale feature fusion module.
Further, in step S2, the designing a multi-scale feature fusion module in the feature fusion part to perform feature fusion and generate four detection heads with different receptive fields specifically includes the following steps:
1) The feature map Feature_C4 is convolved with a 3×3 kernel and 512 channels to obtain the feature map Feature_d1, and an up-sampling operation yields the feature map Feature_Up1;
2) The feature map Feature_Up1 is concatenated with the feature map Feature_C3 of the feature extraction module to obtain the feature map Feature_Fuse1, which passes through a C3 module and a 3×3 convolution with 256 channels to obtain the feature map Feature_d2; an up-sampling operation then yields the feature map Feature_Up2;
3) The feature map Feature_Up2 is concatenated with the feature map Feature_C2 to obtain the feature map Feature_Fuse2, which passes through a C3 module and a convolution to obtain the feature map Feature_d3; an up-sampling operation then yields the feature map Feature_Up3;
4) The feature map Feature_Up3 is concatenated with the feature map Feature_C1 to obtain the feature map Feature_Fuse3; feature extraction through a C3 module and a 1×1 convolution then yields the feature map F4, whose feature size is 1/4 of the original image and which is used for detecting the smallest targets;
5) The feature map Feature_Fuse3 passes through a C3 module and a 3×3 convolution and is concatenated with the feature map Feature_d3 to obtain the feature map Feature_Fuse4; feature extraction through a C3 module and a 1×1 convolution then yields the feature map F3, whose feature size is 1/8 of the original image and which is used for detecting small targets;
6) The feature map Feature_Fuse4 passes through a C3 module and a 3×3 convolution and is concatenated with the feature map Feature_d2 to obtain the feature map Feature_Fuse5; feature extraction through a C3 module and a 1×1 convolution layer then yields the feature map F2, whose feature size is 1/16 of the original image and which is used for detecting medium targets;
7) The feature map Feature_Fuse5 passes through a C3 module and a 3×3 convolution and is concatenated with the feature map Feature_d1 to obtain the feature map Feature_Fuse6; feature extraction through a C3 module and a 1×1 convolution layer then yields the feature map F1, whose feature size is 1/32 of the original image and which is used for detecting large targets.
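Since each up-sampling in the steps above doubles the feature-map side, the resolutions at which the concatenations happen can be checked with a small shape-only walk-through. This is a sketch assuming a square input whose side is divisible by 32:

```python
def fusion_head_strides(s):
    """Shape-only sketch of the fusion pathway: every up-sampling doubles
    the feature-map side so it matches the next shallower backbone map,
    and the four heads F4, F3, F2, F1 end up at strides 4, 8, 16, 32."""
    c1, c2, c3, c4 = s // 4, s // 8, s // 16, s // 32  # backbone map sides
    up1 = c4 * 2                 # upsample Feature_d1
    assert up1 == c3             # concatenate with Feature_C3
    up2 = c3 * 2                 # upsample Feature_d2
    assert up2 == c2             # concatenate with Feature_C2
    up3 = c2 * 2                 # upsample Feature_d3
    assert up3 == c1             # concatenate with Feature_C1
    return {"F4": s // c1, "F3": s // c2, "F2": s // c3, "F1": s // c4}
```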
Further, in the optimized prediction box regression loss function, the CIoU loss is adopted as the prediction box regression loss L_CIoU of the improved YOLOV5 model algorithm. It is defined as:
CIoU = IoU - ρ²(b, b^gt)/c² - αv
L_CIoU = 1 - CIoU
where IoU denotes the intersection-over-union of the prediction box and the ground-truth box, b denotes the center point of the prediction box, b^gt denotes the center point of the ground-truth box, ρ²(b, b^gt) denotes the squared Euclidean distance between the two center points, c denotes the diagonal length of the smallest enclosing rectangle containing both the prediction box and the ground-truth box, α is a trade-off parameter, and v measures the consistency of the aspect ratios.
The expressions for α and v are:
α = v / ((1 - IoU) + v)
v = (4/π²) · (arctan(w^gt/h^gt) - arctan(w/h))²
where w and h are the width and height of the prediction box, and w^gt and h^gt are the width and height of the ground-truth box.
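A plain-Python sketch of the CIoU regression loss defined above, for axis-aligned boxes given as (x1, y1, x2, y2); the box representation is an assumption for illustration:

```python
import math

def ciou_loss(box_p, box_g):
    """L_CIoU = 1 - CIoU, where CIoU = IoU - rho^2(b, b_gt)/c^2 - alpha*v,
    matching the formulas above. Boxes are (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # squared center distance rho^2 over squared enclosing-box diagonal c^2
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term v and trade-off parameter alpha
    w_p, h_p = px2 - px1, py2 - py1
    w_g, h_g = gx2 - gx1, gy2 - gy1
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou
```

For identical boxes the loss is 0; for disjoint boxes the center-distance penalty pushes the loss above 1, which is what gives CIoU its gradient even when the boxes do not overlap.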
Compared with the prior art, the invention has the following beneficial effects:
1. according to the helmet wearing detection method based on the improved YOLOV5 model, the inverted residual error module and the inverted residual error attention module are embedded in the feature extraction part, so that abundant small target space information and deep semantic information can be obtained conveniently, and the detection precision of small and medium targets in helmet wearing detection is improved;
2. according to the safety helmet wearing detection method based on the improved YOLOV5 model, the multi-scale feature fusion module is designed in the feature fusion part for feature fusion, and four detection heads with different receptive fields are generated, so that the recognition capability of the model on small-size targets is improved, and the missing detection of the small targets in the safety helmet wearing detection is reduced;
3. The helmet wearing detection method based on the improved YOLOV5 model provided by the embodiment of the invention designs an improved mosaic mixed data enhancement method, which establishes linear relations between data samples and increases the background complexity of the images, improving the robustness of the algorithm so that helmet wearing can be effectively detected in complex environments.
Drawings
Fig. 1 is a flowchart of a method for detecting wearing of a safety helmet based on an improved YOLOV5 model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an improved YOLOV5 convolutional neural network provided in an embodiment of the present invention.
Fig. 3 is a schematic model diagram of an inverted residual error module according to an embodiment of the present invention.
Fig. 4 is a model diagram of an inverted residual attention module according to an embodiment of the present invention.
Fig. 5 is a graph of the mean average precision of the improved YOLOV5 model provided by the embodiment of the present invention over 100 rounds of training.
Fig. 6 is a comparison graph of the results of the detection of the wearing of the helmet before and after the improvement of the YOLOV5 model provided by the embodiment of the present invention.
Fig. 7 is another comparison graph of the results of the detection of the wearing of the helmet before and after the improvement of the YOLOV5 model provided by the embodiment of the present invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further described with the specific embodiments.
In the description of the present invention, it should be noted that the terms "upper", "lower", "left", "right", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be configured in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Furthermore, the ordinal numbers "(1)", "(2)", etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention applies the target detection algorithm based on the convolutional neural network to the wearing detection of the safety helmet, and performs optimization and improvement by combining the task characteristics of the wearing detection of the safety helmet on the basis of the Yolov5 model, thereby realizing a more accurate and intelligent detection scheme.
Referring to fig. 1, the invention provides a method for detecting wearing of a safety helmet based on an improved YOLOV5 model, which comprises the following steps:
s1, acquiring a helmet wearing image data set, and randomly selecting N images from the helmet wearing image data set to perform image data enhancement to obtain data-enhanced images;
s2, inputting the image subjected to data enhancement into an improved YOLOV5 model for training to obtain a trained improved YOLOV5 model; the improved YOLOV5 model comprises: embedding an inverted residual error module and an inverted residual error attention module in the feature extraction part to extract image features; designing a multi-scale feature fusion module in the feature fusion part for feature fusion, and generating four detection heads with different receptive fields; optimizing a prediction box regression loss function;
and S3, inputting the image to be detected into the trained improved YOLOV5 model to obtain a detection result of whether the related person wears the safety helmet or not.
The above steps are described in detail below:
In step S1, a helmet wearing image dataset is first obtained and data enhancement is performed on it. The image data enhancement includes: flipping, scaling and color gamut transformation of the images; and random cutting and splicing of the transformed images according to a preset template. In this embodiment, an improved mosaic data enhancement method is adopted, which specifically includes: randomly selecting 9 images from the helmet wearing image dataset and, using the width and height of each image as boundary values, zooming the images by magnifications t_x and t_y:
t_x = f_r(t_w, t_w + Δt_w)
t_y = f_r(t_h, t_h + Δt_h)
where t_w and t_h are the minimum width and height magnifications respectively, Δt_w and Δt_h are the lengths of the random intervals of the width and height magnifications respectively, and f_r denotes a random-value function.
Further, an image template with height h and width w is determined as the size of the output image; four dividing lines are randomly generated in the width and height directions; the nine cut images are spliced and any portion overflowing the frame is cut off; the internally overlapped parts are cut a second time; and the spliced image obtained after cutting is used as the input layer data of the YOLOV5 model convolutional neural network.
In this embodiment, the improved mosaic data enhancement method extends the original four-image random splicing to nine images: each image has a corresponding frame, and after random cutting and random splicing the nine images are combined together, achieving a balance among targets of different scales.
In step S2, the data-enhanced images are input into the improved YOLOV5 model for training to obtain the trained improved YOLOV5 model. In the embodiment of the present invention, the improved YOLOV5 model comprises: 1. embedding an inverted residual module (IRC3) and an inverted residual attention module (IRAC3) in the feature extraction part to extract image features; 2. designing a multi-scale feature fusion module in the feature fusion part for feature fusion and generating four detection heads with different receptive fields; 3. optimizing the prediction box regression loss function.
(1) A feature extraction section:
the main purpose of feature extraction is to learn the mapping relationship between high-resolution images and low-resolution images using a convolutional neural network. As shown in fig. 2, in the embodiment of the present invention, the feature extraction module mainly includes a slice convolution layer (Focus Conv), a convolution layer (Conv), an Inverted residual C3 (IRC 3) module, an Inverted residual Attention C3 (IRAC 3) module, and a feature pyramid module (SPP); wherein, the structures of IRC3 and IRAC3 are respectively shown in FIG. 3 and FIG. 4; wherein Conv represents convolution, H represents the height of the feature map, W represents the width of the feature map, C represents the number of channels of the feature map, 2C represents the number of channels obtained after twice expansion,. Alpha.represents fusion operation,
Figure GDA0004058460450000072
represents the stitching operation, DWConv (Depthwise Convolution) represents Depth Convolution, PWConv (Pointwise Convolution) represents point-by-point Convolution, ECA-Net (Efficient Channel Attention) represents valid Channel Attention, and SD (Stochartic Depth, SD) represents random Depth.
Referring to fig. 2, the specific process of feature extraction is as follows:
s1, an input image is subjected to a first layer of slice convolution (Focus Conv), specifically, every other pixel in each picture takes one value, the method is similar to down-sampling, the four pictures are divided into four pictures in the mode, the four pictures are similar but have no information loss, information is concentrated to a channel space through the operation, an input channel is expanded by 4 times, namely, a channel for splicing the pictures is 12 channels, and then the pictures are subjected to convolution operation to obtain a Feature map Feature _ C0, so that the parameter number is reduced and the training speed is improved;
s2, after passing through a convolutional layer (Conv), entering an inverted residual C3 module (IRC 3), wherein in the module, firstly, expansion of the number of channels is realized on an input Feature map by utilizing an expansion factor, then, high-dimensional channels are mapped to low-dimensional channels by utilizing linear change to obtain rich shallow features, and identity mapping is combined with the input features through residual operation to obtain a Feature map Feature _ C1;
s3, the Feature map Feature _ C1 passes through a convolution layer and an inverted residual error C3 module to obtain a Feature map Feature _ C2, and then passes through the convolution layer and enters an inverted residual error attention C3 (IRAC 3) module;
in the embodiment of the invention, the inverse residual attention C3 module comprises a depth separable convolution and effective channel attention module, and firstly passes through a depth convolution module which is designed to replace a standard convolution of 3 multiplied by 3 by using a depth convolution with less parameters and low calculation complexity. As shown in the following formula:
Figure GDA0004058460450000071
wherein
Figure GDA0004058460450000081
Represents a deep convolution kernel, i, j represents the convolution kernel size, k, l represents the feature map size, and->
Figure GDA0004058460450000082
The c-th convolution kernel in (a) is applied to the c-th channel, in the feature multiplied therewith, in->
Figure GDA0004058460450000083
For the c-th channel of the feature map, the features of the deep convolution outputs are calculated by a 1 × 1 convolution and combined linearly, on the basis of the results of the linear combination>
Figure GDA0004058460450000084
Represents the Feature after 3x3 convolution of the Feature map Feature _ C2.
Further, after the depthwise separable convolution, the feature passes through the efficient channel attention module, in which each channel captures local cross-channel interaction information together with its k neighboring channels; finally, after a pointwise convolution with kernel size 1×1, the number of channels is reduced back to the original number, yielding the feature map Feature_C3;
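The depthwise and pointwise convolutions in the formula above can be sketched in NumPy as follows; valid padding and the (H, W, C) layout are assumptions for illustration, and the ECA step is omitted:

```python
import numpy as np

def depthwise_conv(x, k):
    """Depthwise 3x3 convolution with valid padding: the c-th kernel of k
    is applied only to the c-th channel of x, as in the formula above.
    x: (H, W, C), k: (3, 3, C), output: (H-2, W-2, C)."""
    H, W, C = x.shape
    out = np.zeros((H - 2, W - 2, C))
    for c in range(C):
        for i in range(H - 2):
            for j in range(W - 2):
                out[i, j, c] = np.sum(x[i:i + 3, j:j + 3, c] * k[:, :, c])
    return out

def pointwise_conv(x, w):
    """1x1 pointwise convolution: a linear combination across channels.
    x: (H, W, C_in), w: (C_in, C_out), output: (H, W, C_out)."""
    return x @ w
```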
further, the obtained Feature _ C3 enters a second IRAC3 module in the Feature extraction module after being convolved by a convolution kernel with a size of 3 × 3 and further being subjected to Spatial Pyramid Pooling (SPP), so as to obtain a Feature _ C4.
(2) A feature fusion part:
referring to fig. 2, a specific process of feature fusion is as follows:
s1, a final Feature map Feature _ C4 of a Feature extraction part enters a multi-scale Feature fusion module, a Feature map Feature _ d1 is obtained through convolution with a convolution kernel size of 3 multiplied by 3 and a channel number of 512 (convolution is represented by Conv in the figure and the same principle is applied below), and a Feature map Feature _ Up1 is obtained after upsampling operation (represented by 'Upesple' in figure 2 and the same principle is applied below);
s2, carrying out cascade operation on the Feature map Feature _ Up1 and the Feature _ C3 of the Feature extraction module to obtain a Feature map Feature _ Fuse1; obtaining a Feature map Feature _ d2 through convolution with a convolution kernel of 3 multiplied by 3 and a channel number of 256 by a C3 module, and obtaining a Feature map Feature _ Up2 through an Up-sampling operation;
s3, performing cascade operation on the Feature map Feature _ Up2 and the Feature map Feature _ C2 to obtain a Feature map Feature _ Fuse2, then obtaining a Feature map Feature _ d3 through a C3 module and convolution, and obtaining a Feature map Feature _ Up3 through an Up-sampling operation;
s4, performing cascade operation on the obtained Feature map Feature _ Up3 and Feature _ C1 in a backbone network (a Feature extraction module) to obtain a Feature map Feature _ Fuse3, and then performing Feature extraction through a C3 module and a convolution layer with the convolution kernel size of 1 multiplied by 1 to obtain a Feature map F4, wherein the Feature size is 1/4 of the original image and is used for detecting a minimum target;
s5, convolving the Feature map Feature _ Fuse3 with a convolution kernel of 3 multiplied by 3 through a C3 module and cascading with the Feature map Feature _ d3 to obtain a Feature map Feature _ Fuse4, and then performing Feature extraction through the convolution of the convolution kernel of 1 multiplied by 1 through the C3 module to obtain a Feature map F3, wherein the Feature size is 1/8 of that of an original image and the Feature map F3 is used for detecting small targets;
s6, convolving the Feature map Feature _ Fuse4 with a convolution kernel of 3 multiplied by 3 size through a C3 module and cascading with the Feature map Feature _ d2 to obtain a Feature map Feature _ Fuse5, and then performing Feature extraction through the convolution layer of 1 multiplied by 1 size through the C3 module and the convolution kernel to obtain a Feature map F2, wherein the Feature size is 1/16 of the original image and the Feature map Feature _ Fuse is used for detecting a medium target;
and S7, cascading the Feature map Feature _ Fuse5 with the Feature map Feature _ d1 through convolution of a C3 module and a convolution kernel with the size of 3 multiplied by 3 to obtain a Feature map Feature _ Fuse6, and then performing Feature extraction through the C3 module and a convolution layer with the convolution kernel size of 1 multiplied by 1 to obtain a Feature map F1, wherein the Feature size is 1/32 of the original image and the Feature map F1 is used for detecting a large target.
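Assuming a typical 640×640 YOLOv5 input size (an assumption, not stated in the patent), the four heads produced by S4-S7 would have the following spatial sizes:

```python
# Strides of the four detection heads relative to the input image,
# as described in steps S4-S7 (F4: 1/4, F3: 1/8, F2: 1/16, F1: 1/32).
def head_sizes(input_size, strides=(4, 8, 16, 32)):
    return {f"1/{s}": (input_size // s, input_size // s) for s in strides}

print(head_sizes(640))
# {'1/4': (160, 160), '1/8': (80, 80), '1/16': (40, 40), '1/32': (20, 20)}
```

The extra 1/4-stride head is what distinguishes this design from stock YOLOv5, which stops at stride 8; each grid cell of the 160×160 map covers only a 4×4 pixel patch, hence its suitability for the smallest helmets.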
(3) Optimizing the prediction box regression loss function:

in the embodiment of the invention, the CIoU loss is adopted as the prediction box regression loss function $L_{CIoU}$ of the improved YOLOV5 model algorithm, defined as:

$$CIoU = IoU - \frac{\rho^2(b,\, b^{gt})}{c^2} - \alpha v$$

$$L_{CIoU} = 1 - CIoU$$

wherein $IoU$ denotes the intersection-over-union of the prediction box and the ground-truth box, $b$ denotes the centre point of the prediction box, $b^{gt}$ denotes the centre point of the ground-truth box, $\rho^2(b, b^{gt})$ denotes the squared Euclidean distance between them, and $c$ denotes the diagonal length of the smallest enclosing rectangle that contains both the prediction box and the ground-truth box; $\alpha$ is a trade-off parameter and $v$ measures the consistency of the aspect ratios;

wherein the expressions for $\alpha$ and $v$ are:

$$\alpha = \frac{v}{(1 - IoU) + v}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein $w$ and $h$ are the width and height of the prediction box, respectively; $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, respectively.
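A plain-Python sketch of the CIoU loss defined above, for axis-aligned boxes in (x1, y1, x2, y2) corner format (the production YOLOv5 implementation is vectorized over tensors, but the arithmetic is the same):

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for a prediction box and a ground-truth box."""
    # Intersection-over-union (IoU)
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)

    # Squared centre-point distance rho^2(b, b_gt)
    cx_p, cy_p = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cx_g, cy_g = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2

    # Squared diagonal c^2 of the smallest enclosing rectangle
    c2 = (max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])) ** 2 \
       + (max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])) ** 2

    # Aspect-ratio consistency v and trade-off weight alpha
    w_p, h_p = box_p[2] - box_p[0], box_p[3] - box_p[1]
    w_g, h_g = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - (iou - rho2 / c2 - alpha * v)

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0: identical boxes incur no loss
```

Unlike a plain IoU loss, the centre-distance term keeps the gradient informative even when the boxes do not overlap at all.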
Further, the training parameters are set as follows: batch size 16, 100 iterations, initial learning rate 0.01, termination learning rate 0.2, momentum 0.937, weight decay 0.0005; a stochastic gradient descent strategy is adopted;
in the embodiment of the invention, rotation and horizontal mirroring are used to add helmet images at different angles, combined with the improved Mosaic method, to improve the recognition capability for objects; training is performed with the improved convolutional neural network and the optimized loss function, and upon completion of training the final improved YOLOV5 convolutional neural network is obtained.
Further, in the embodiment of the present invention, the mean average precision (mAP@0.5) is used as the index for measuring model performance; fig. 5 shows the mean average precision achieved by the improved YOLOv5s model after 100 rounds of training.
Further, two images of helmet wearing in a complex environment are input into the trained improved YOLOV5 model to obtain the detection result of whether the relevant personnel wear safety helmets, and the result is compared with that of the unimproved YOLOV5 model, as shown in figs. 6 and 7. Figs. 6a and 7a are the helmet-wearing detection results of the unimproved YOLOV5 model, with figs. 6c and 7c their partial enlarged views; figs. 6b and 7b are the helmet-wearing detection results of the improved YOLOV5 model, with figs. 6d and 7d their partial enlarged views. The left-right comparison in fig. 6 shows that the unimproved YOLOV5 model misses helmet-wearing small targets (three wearing targets are detected in fig. 6c, four in fig. 6d); fig. 7 likewise shows that the unimproved model misses helmet-wearing small targets (two wearing targets are detected in fig. 7c, three in fig. 7d). As can be seen from figs. 6 and 7, the method of the present invention effectively resolves missed detection of helmet wearing among dense targets and detects helmet wearing reliably even in complex scenes, avoiding missed detection of small targets.
Through the description of the embodiments, those skilled in the art can see that the invention provides a helmet wearing detection method based on an improved YOLOV5 model. First, an improved Mosaic data enhancement method is designed, which enriches the diversity of image samples, establishes a linear relationship between data, and improves the robustness of the algorithm. Second, to address the low detection precision for small targets, the backbone network is optimized: an inverted residual module and an inverted residual attention module are embedded in the backbone, and low-dimensional feature information is mapped to high dimensions to obtain rich small-target spatial information and deep semantic information, improving small-target detection precision. Finally, a multi-scale feature fusion module is designed in the feature fusion part, which fuses shallow spatial information with deep semantic information and generates four detection heads with different receptive fields; this improves the model's ability to recognize small-size targets, reduces missed detections of small targets, effectively addresses missed and false detections of small targets in construction-site video surveillance images, and improves the precision of helmet wearing detection.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. A safety helmet wearing detection method based on an improved YOLOV5 model is characterized by comprising the following steps:
s1, acquiring a helmet wearing image data set, and randomly selecting N images from the helmet wearing image data set to perform image data enhancement to obtain data-enhanced images;
s2, inputting the data-enhanced image into an improved YOLOV5 model for training to obtain a trained improved YOLOV5 model; the improved YOLOV5 model comprises: embedding an inverted residual error module and an inverted residual error attention module in the feature extraction part to extract image features; designing a multi-scale feature fusion module in the feature fusion part for feature fusion, and generating four detection heads with different receptive fields; optimizing a prediction box regression loss function;
s3, inputting the image to be detected into the trained improved YOLOV5 model to obtain a detection result of whether the related person wears the safety helmet or not;
in the step S1, N images are randomly selected from the helmet wearing image data set to perform image data enhancement, where the image data enhancement includes:
flipping, scaling and color gamut transformation are carried out on the image;
randomly cutting the image after the turning, the scaling and the color gamut conversion according to a preset template and splicing;
the scaling of the image specifically comprises: randomly selecting N images from the helmet wearing image data set and, using the width and height of each image as boundary values, scaling the images by scaling factors $t_x$ and $t_y$:

$$t_x = f_r(t_w,\, t_w + \Delta t_w)$$

$$t_y = f_r(t_h,\, t_h + \Delta t_h)$$

wherein $t_w$ and $t_h$ are the minimum values of the width and height magnifications, respectively, $\Delta t_w$ and $\Delta t_h$ are the lengths of the random intervals of the width and height magnifications, respectively, and $f_r$ denotes a random value function;
the splicing of the zoomed images after random cutting according to a preset template specifically comprises: determining an image template with height h and width w as the size of the output image, randomly generating four dividing lines in the width and height directions for cutting, splicing the nine cut images, and cutting off the portions overflowing the frame; the internally overlapping portions are cut a second time, and the spliced image is obtained after cutting.
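The scaling-factor selection in claim 1 can be sketched as follows, taking the random value function $f_r$ to be a uniform draw and the magnification bounds as illustrative values (both are assumptions for the example):

```python
import random

def random_scale_factors(t_w, dt_w, t_h, dt_h, seed=None):
    """Draw the width/height scaling factors t_x and t_y of claim 1,
    with f_r taken to be a uniform random draw over each interval."""
    rng = random.Random(seed)
    t_x = rng.uniform(t_w, t_w + dt_w)   # t_x = f_r(t_w, t_w + dt_w)
    t_y = rng.uniform(t_h, t_h + dt_h)   # t_y = f_r(t_h, t_h + dt_h)
    return t_x, t_y

t_x, t_y = random_scale_factors(t_w=0.4, dt_w=0.4, t_h=0.4, dt_h=0.4, seed=0)
print(t_x, t_y)  # both fall in [0.4, 0.8]
```

Because width and height are drawn independently, the nine tiles spliced into the template can have varied aspect ratios, which is part of what enriches the sample diversity.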
2. The method for detecting wearing of a safety helmet based on the improved YOLOV5 model as claimed in claim 1, wherein in step S2, the image feature extraction is performed by embedding an inverted residual error module and an inverted residual error attention module in the feature extraction part; the method specifically comprises the following steps:
a. the data-enhanced image is passed through Focus convolution and a 3×3 convolution to obtain the feature map Feature_C0;
b. the feature map Feature_C0 is input into the first inverted residual module, shallow features are amplified by channel expansion, features are extracted by convolution and learned repeatedly through residual connections, and the feature map Feature_C1 is output;
c. the feature map Feature_C1 passes through one convolutional layer and the second inverted residual module to obtain the feature map Feature_C2; Feature_C2 passes through one convolutional layer and is input to the first inverted residual attention module to obtain the feature map Feature_C3; Feature_C3 is convolved with a 3×3 kernel, undergoes spatial pyramid pooling, and then enters the second inverted residual attention module to obtain the feature map Feature_C4, which serves as the input of the multi-scale feature fusion module.
3. The method for detecting wearing of a safety helmet based on the improved YOLOV5 model as claimed in claim 2, wherein in the step S2, the multi-scale feature fusion module is designed in the feature fusion part for feature fusion, and four detection heads with different receptive fields are generated, specifically comprising the following steps:
1) The feature map Feature_C4 is convolved to obtain the feature map Feature_d1, and an upsampling operation yields the feature map Feature_Up1;
2) The feature map Feature_Up1 is concatenated with the feature map Feature_C3 to obtain the feature map Feature_Fuse1; convolution through a C3 module yields the feature map Feature_d2, and an upsampling operation yields the feature map Feature_Up2;
3) The feature map Feature_Up2 is concatenated with the feature map Feature_C2 to obtain the feature map Feature_Fuse2; a C3 module and a convolution then yield the feature map Feature_d3, and an upsampling operation yields the feature map Feature_Up3;
4) The feature map Feature_Up3 is concatenated with the feature map Feature_C1 to obtain the feature map Feature_Fuse3; feature extraction through a C3 module and a convolution with a 1×1 kernel then yields the feature map F4, whose size is 1/4 of the original image;
5) The feature map Feature_Fuse3 passes through a C3 module and a convolution with a 3×3 kernel and is concatenated with the feature map Feature_d3 to obtain the feature map Feature_Fuse4; feature extraction through a C3 module and a convolution with a 1×1 kernel then yields the feature map F3, whose size is 1/8 of the original image;
6) The feature map Feature_Fuse4 passes through a C3 module and a convolution with a 3×3 kernel and is concatenated with the feature map Feature_d2 to obtain the feature map Feature_Fuse5; feature extraction through a C3 module and a convolution with a 1×1 kernel then yields the feature map F2, whose size is 1/16 of the original image;
7) The feature map Feature_Fuse5 passes through a C3 module and a convolution with a 3×3 kernel and is concatenated with the feature map Feature_d1 to obtain the feature map Feature_Fuse6; feature extraction through a C3 module and a convolutional layer with a 1×1 kernel then yields the feature map F1, whose size is 1/32 of the original image.
4. The improved YOLOV5 model-based helmet wearing detection method according to claim 1, wherein in the optimized prediction box regression loss function, the CIoU loss is adopted as the prediction box regression loss function $L_{CIoU}$ of the improved YOLOV5 model algorithm, defined as:

$$CIoU = IoU - \frac{\rho^2(b,\, b^{gt})}{c^2} - \alpha v$$

$$L_{CIoU} = 1 - CIoU$$

wherein $IoU$ denotes the intersection-over-union of the prediction box and the ground-truth box, $b$ denotes the centre point of the prediction box, $b^{gt}$ denotes the centre point of the ground-truth box, $\rho^2(b, b^{gt})$ denotes the squared Euclidean distance between them, and $c$ denotes the diagonal length of the smallest enclosing rectangle that contains both the prediction box and the ground-truth box; $\alpha$ is a trade-off parameter and $v$ measures the consistency of the aspect ratios;

wherein the expressions for $\alpha$ and $v$ are:

$$\alpha = \frac{v}{(1 - IoU) + v}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein $w$ and $h$ are the width and height of the prediction box, respectively; $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, respectively.
CN202211534970.0A 2022-12-02 2022-12-02 Safety helmet wearing detection method based on improved YOLOV5 model Active CN115546614B (en)

Publications (2)

Publication Number Publication Date
CN115546614A (en) 2022-12-30
CN115546614B (en) 2023-04-18


