CN112464765A - Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof - Google Patents

Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof

Info

Publication number: CN112464765A (application CN202011282208.9A)
Authority: CN (China)
Prior art keywords: feature, pixel, value, characteristic, safety helmet
Priority date: 2020-09-10; filing date: 2020-11-16 (the priority date is an assumption and is not a legal conclusion)
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN112464765B
Inventors: 姜丽芬, 周雍恒, 孙华志, 马春梅, 梁妍, 马建扬
Current/Original Assignee: Tianjin Normal University (the listed assignees may be inaccurate)
Application filed by Tianjin Normal University
Publication of CN112464765A; application granted; publication of CN112464765B; anticipated expiration status noted

Classifications

    • G06V20/20: Scenes; scene-specific elements in augmented reality scenes
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; classification techniques
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06V10/40: Image or video recognition; extraction of image or video features
    • G06V2201/07: Indexing scheme; target detection


Abstract

The invention discloses a safety helmet detection algorithm based on single-pixel feature amplification, and an application thereof. The detection algorithm comprises the following steps: preprocessing and enhancing a safety helmet data set; extracting a feature representation of the target through an Efficientnet-b0 network; performing feature filtering on the backbone network features with a single-pixel feature scaling module to enhance the foreground elements in the features; performing a multi-scale feature fusion operation on the enhanced features through a BiFPN feature fusion module; and inputting the fused features into a target prediction network to classify and locate the targets. The SPZ-Det helmet-wearing detection algorithm mainly uses the SPZ module to scale the features, ensuring that small-target features are not lost inside the network and improving the algorithm's performance on small-target detection.

Description

Safety helmet detection algorithm based on single-pixel characteristic amplification and application thereof
Technical Field
The invention relates to the technical field of deep learning and target detection, in particular to a safety helmet detection algorithm based on single-pixel feature amplification and application thereof.
Background
Wearing a safety helmet is a necessary safety precaution on construction sites; research reports show that every year in China hundreds of construction workers suffer construction accidents, mostly because on-site safety supervision is not in place. On a construction site, wearing a safety helmet is the most basic protective measure, but because workers' safety awareness and self-protection awareness are weak, they may remove their helmets for convenience while working, so that once an accident happens, the life of the construction worker is threatened.
At present, helmet-wearing detection mainly relies on video monitoring, manual patrol and similar schemes, which cannot warn workers without helmets in time and consume large amounts of human resources, so automatic helmet detection technology is very important. Helmet detection is a practical application of target detection. Early helmet detection compared the color distributions of the helmet and the human face to determine their relative positions, and finally decided from this position information whether a worker was wearing a helmet. Such color-distribution-based detection algorithms depend heavily on the color contrast of the helmet and can hardly handle detection environments with many types of helmets.
With the development and maturation of deep learning, deep neural networks can automatically capture finer-grained feature information and adaptively use the captured features to help the subsequent detection task predict target positions. Safety helmet detection algorithms of the deep learning era avoid dependence on a single feature; the network adaptively acquires more precise feature information for predicting the target. General target detection algorithms fall into two categories: regression-based single-step detectors represented by YOLO, SSD and RetinaNet, and region-based two-step detectors such as Fast-RCNN. Some algorithms adopt the two-step Fast-RCNN to detect helmets in pursuit of accuracy, but the complex computation of a two-step detector makes its detection speed very slow, so it is difficult to apply in real life.
In addition, helmet-wearing detection research faces huge challenges: the background of a construction site changes greatly and the scene is complex; individuals far from the camera are often small and hard to distinguish from the complex background; and sites are densely populated, so several people often occlude each other in the same scene. These challenges greatly limit the performance of helmet-wearing detection algorithms.
Disclosure of Invention
The invention aims to provide a safety helmet detection algorithm based on single-pixel feature amplification, addressing the complex detection steps, low detection speed and high recognition difficulty of safety helmet wearing detection in the prior art.
In another aspect of the invention, the application of the safety helmet detection algorithm based on single-pixel feature amplification in construction site monitoring is provided.
The technical scheme adopted for realizing the purpose of the invention is as follows:
a safety helmet detection algorithm based on single-pixel feature amplification comprises the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data;
step 2, extracting a characteristic representation form of a target from the preprocessed and enhanced sample data obtained in the step 1 through an Efficientnet-b0 network to obtain a backbone network characteristic;
step 3, performing feature filtering on the backbone network features obtained in the step 2 by using a single-pixel feature scaling module, and enhancing foreground elements in the features to obtain new feature values;
step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
step 5, inputting the fused features obtained in step 4 into a target prediction network, and classifying and locating the targets.
In the above technical solution, the preprocessing in step 1 comprises the following steps:
step 1.1, expanding the safety helmet data set by horizontal flipping, so that each sample in the data set exists in both its original and mirrored forms;
step 1.2, randomly inserting noise into the sample data to raise sample complexity and improve the robustness of the algorithm at the data level.
In the above technical solution, when selecting the trunk feature layers of Efficientnet-b0 in step 2, the topmost three feature layers, one down-sampling layer feature and one lower-layer feature are selected;
the backbone network characteristics are extracted by the following method:
step 2.1, in the feature maps extracted by Efficientnet-b0, upper feature layers of different resolutions come in pairs: a low-layer feature X1 and a high-layer feature X2;
step 2.2, for feature layers of the same resolution, the algorithm keeps only the high-layer feature X2 as the feature representation for subsequent computation.
In the above technical solution, the new characteristic value in step 3 is obtained through the following steps:
a spatial attention enhancement is first applied to the backbone network features to obtain the main regions of the foreground elements, namely the attention-enhanced feature; within this feature, the contribution of each pixel to the overall feature is computed, a feature contribution map is derived from the contribution values, and different pixels are scaled accordingly to obtain the scaled feature.
In the above technical solution, the new characteristic value in step 3 is obtained through the following steps:
step 3.1, a simple spatial attention computation is applied once to the backbone network feature f to obtain the attention-enhanced feature F, as shown in formula (1):

F_i = f_i \otimes S\big(R\big(v([\max(f_i);\ \mathrm{mean}(f_i)])\big)\big) \quad (1)

where max is max pooling, mean is average pooling, v is a 7 × 7 convolution, R and S denote the ReLU and Sigmoid operations, and f_i is the initial feature;
step 3.2, after the attention-enhanced feature F is obtained, a pixel-level feature amplification operation is performed on it: the contribution value of each pixel point to the feature map is computed first, a feature contribution map is then obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically:

\mathrm{Feature}_i = \begin{cases} n_i, & S(C_i) > \dfrac{1}{H \times W} \\ 1 - n_i, & \text{otherwise} \end{cases} \quad (2)

evaluated at each pixel position, where C_i denotes the feature map of the i-th channel, n_i is the scaling value, f_i is the initial feature, and H and W are the height and width of the feature map. S is the Softmax function: the Softmax score of the single-channel feature is obtained through S first, and this score represents the contribution of each pixel position to the overall feature of the channel; the score is then compared with the channel's average single-pixel contribution 1/(H × W). If the contribution is larger than the average, the scaling value is set to n_i; if it is smaller, the scaling value is set to (1 - n_i). This finally yields the feature contribution map.
step 3.3, the feature contribution map is point-multiplied with the input feature C_i to obtain the scaled feature; a residual structure is then introduced and the initial feature f_i is added, yielding the new feature value used for the multi-scale module's feature fusion.
In the above technical solution, the multi-scale feature fusion operation in step 4 is as follows: the new feature values enhanced by the single-pixel scaling module are passed to the BiFPN feature fusion module, which fuses features of different sizes from different hierarchies to compensate for the information lost to down-sampling.
In the above technical solution, the BiFPN feature fusion module uses a three-layer cross-link operation to keep the original backbone features flowing onward, and a control factor controls the proportion of the different features.
In the above technical solution, in step 5 the two detection subnetworks, each a three-layer CNN, classify and locate the target.
In the above technical solution, the method for classifying and positioning the target in step 5 includes the following steps:
step 5.1, in the classification network, a Focal Loss computation strategy is used to suppress the large number of background elements and keep positive and negative samples balanced;
step 5.2, in the localization regression network, the smooth L1 function shown in formula (3) is used as the loss computation strategy, computing the loss between the predicted position offset and the offset of the sample's real position, where the real-position offset is computed by formula (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}, \qquad x = gt - reg \quad (3)

d_x = \frac{t_x - a_x}{a_w}, \quad d_y = \frac{t_y - a_y}{a_h}, \quad d_w = \log\frac{t_w}{a_w}, \quad d_h = \log\frac{t_h}{a_h} \quad (4)

where gt is the converted regression offset, reg is the predicted offset of the regression subnetwork, and (dx, dy, dw, dh) is the regression label, given by the relative positional offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
In another aspect of the invention, the safety helmet detection algorithm based on single-pixel feature amplification is applied to construction site monitoring: a foreground terminal monitors the construction site with a camera, the camera transmits the real-time picture to a background processing terminal, and the background terminal runs the safety helmet detection algorithm based on single-pixel feature amplification for detection and analysis, returning the result to the foreground terminal to remind workers in real time.
Compared with the prior art, the invention has the beneficial effects that:
1. When the data set is preprocessed, horizontal-flip data enhancement is applied to the open-source safety helmet detection data set, doubling the volume of the data set and supplementing its content; meanwhile, noise is randomly inserted into the data set to raise the complexity of the data samples and ensure stronger robustness of the model at the data level.
2. The invention adopts Efficientnet-b0 as the backbone network for feature extraction. To address the severe loss of small-target feature information, a bottom-level feature layer is introduced into the backbone feature layers, increasing the share of small-target features in the network. Specifically, in backbone feature layer selection, Efficientnet originally takes the topmost three feature layers and then down-samples two further layers, five in total; here a lower-level feature layer is added to the backbone and one down-sampling layer is removed, still five feature layers in total. This means that, compared with the original Efficientnet network, one more layer of backbone features is retained and one extra down-sampled feature layer is dropped. To ensure that small-target features still survive in the feature fusion and detection networks, a single-pixel enhancement module controls the features so that foreground elements are not lost.
3. The invention scales the features extracted from the backbone network at the pixel level, passes the enhanced features to the multi-scale BiFPN feature fusion module for interactive fusion of upper- and lower-layer features, and then has the detection head network (target prediction network) predict the classification and location of the targets.
4. Aiming at problems such as complex occlusion and small-target detection, the invention provides SPZ-Det, a detection model with context-attention-based single-pixel feature scaling. The model introduces detail-rich bottom-level features into the network, ensuring effective small-target detection, and addresses the difficulties of mutual occlusion between people, small-target detection, and extracting accurate features. The single-pixel feature scaling (SPZ) module strengthens the main information in the features, ensures that the main feature information is not ignored or displaced by noise features during network inference, and alleviates feature loss during feature transfer.
5. The selection of feature layers and the introduction of the single-pixel feature scaling module resolve the loss of feature information during network transfer, and the effectiveness of the SPZ module is verified by comparison. The model improves target detection accuracy while maintaining detection speed, and the AP for helmet-wearing detection reaches 94%.
Drawings
FIG. 1 is a diagram of an SPZ-Det network model;
FIG. 2 is a block diagram of the SPZ module, where M is max pooling; A is average pooling; C is concatenation; S is the Sigmoid computation; Ghost stands for GhostModule; Feature_i denotes formula (2).
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A safety helmet detection algorithm based on single-pixel feature amplification comprises the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data:
step 1.1, expanding the safety helmet data set by horizontal flipping, so that each sample in the data set exists in both its original and mirrored forms;
step 1.2, randomly inserting noise into the sample data to raise sample complexity and improve the robustness of the algorithm at the data level.
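The following is a minimal Python sketch of this preprocessing, assuming images as numpy arrays in height × width × channel order and boxes as [x1, y1, x2, y2] pixel coordinates; the function names and the noise amplitude are illustrative, not taken from the patent.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Mirror the image and its [x1, y1, x2, y2] boxes (step 1.1)."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    flipped_boxes = boxes.copy()
    flipped_boxes[:, [0, 2]] = w - boxes[:, [2, 0]]   # mirror and swap x1/x2
    return flipped, flipped_boxes

def add_random_noise(image, sigma=8.0):
    """Inject Gaussian pixel noise to raise sample complexity (step 1.2)."""
    noise = np.random.normal(0.0, sigma, image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def augment_dataset(samples):
    """Double the data set with flips, then randomly perturb samples with noise."""
    doubled = []
    for image, boxes in samples:
        doubled.append((image, boxes))
        doubled.append(horizontal_flip(image, boxes))
    return [(add_random_noise(img) if np.random.rand() < 0.5 else img, b)
            for img, b in doubled]
```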
Step 2, extracting a characteristic representation form of a target from the preprocessed and enhanced sample data obtained in the step 1 through an Efficientnet-b0 network to obtain a backbone network characteristic:
step 2.1, in the feature maps extracted by Efficientnet-b0, upper feature layers of different resolutions come in pairs: a low-layer feature X1 and a high-layer feature X2;
step 2.2, for feature layers of the same resolution, the algorithm keeps only the high-layer feature X2 as the feature representation for subsequent computation.
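As an illustration of step 2, the following sketch pulls multi-level features from an Efficientnet-b0 backbone using the timm library; the use of timm and the chosen out_indices are assumptions, since the patent does not name an implementation.

```python
import timm
import torch

# features_only returns the intermediate feature maps of the backbone.
backbone = timm.create_model(
    "efficientnet_b0",
    pretrained=True,
    features_only=True,
    out_indices=(1, 2, 3, 4),   # one lower-level layer plus the upper layers
)

x = torch.randn(1, 3, 512, 512)
features = backbone(x)          # list of maps at strides 4, 8, 16, 32
for f in features:
    print(tuple(f.shape))
# Per step 2.2, where two layers share a resolution only the higher-level
# map (X2) would be kept as the representation for subsequent computation.
```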
Step 3, performing feature filtering on the backbone network features obtained in step 2 with the single-pixel feature scaling module and enhancing the foreground elements in the features to obtain new feature values:
step 3.1, a simple spatial attention computation is applied once to the backbone network feature f to obtain the attention-enhanced feature F, as shown in formula (1):

F_i = f_i \otimes S\big(R\big(v([\max(f_i);\ \mathrm{mean}(f_i)])\big)\big) \quad (1)

where max is max pooling, mean is average pooling, v is a 7 × 7 convolution, R and S denote the ReLU and Sigmoid operations, and f_i is the initial feature;
step 3.2, after the attention-enhanced feature F is obtained, a pixel-level feature amplification operation is performed on it: the contribution value of each pixel point to the feature map is computed first, a feature contribution map is then obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically:

\mathrm{Feature}_i = \begin{cases} n_i, & S(C_i) > \dfrac{1}{H \times W} \\ 1 - n_i, & \text{otherwise} \end{cases} \quad (2)

evaluated at each pixel position, where C_i denotes the feature map of the i-th channel, n_i is the scaling value, f_i is the initial feature, and H and W are the height and width of the feature map. S is the Softmax function: the Softmax score of the single-channel feature is obtained through S first, and this score represents the contribution of each pixel position to the overall feature of the channel; the score is then compared with the channel's average single-pixel contribution 1/(H × W). If the contribution is larger than the average, the scaling value is set to n_i; if it is smaller, the scaling value is set to (1 - n_i). This finally yields the feature contribution map.
step 3.3, the feature contribution map is point-multiplied with the input feature C_i to obtain the scaled feature; a residual structure is then introduced and the initial feature f_i is added, yielding the new feature value used for the multi-scale module's feature fusion. For the residual structure, see He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778; it is not described in detail here.
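A minimal PyTorch sketch of the SPZ module described in steps 3.1 to 3.3 follows. The scalar scaling value n and the plain 7 × 7 convolution are simplifying assumptions (the embodiment actually selects Ghost convolution for this module), and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPZ(nn.Module):
    """Single-pixel feature scaling: spatial attention, per-pixel contribution
    scaling (formula (2)), and a residual connection back to the input."""

    def __init__(self, n: float = 0.7):
        super().__init__()
        self.n = n                                              # scaling value n_i
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)   # v in formula (1)

    def forward(self, f):                            # f: (B, C, H, W) backbone feature
        # step 3.1: simple spatial attention, formula (1)
        mx, _ = f.max(dim=1, keepdim=True)           # max pooling over channels
        mean = f.mean(dim=1, keepdim=True)           # average pooling over channels
        attn = torch.sigmoid(F.relu(self.conv(torch.cat([mx, mean], dim=1))))
        feat = f * attn                              # attention-enhanced feature F

        # step 3.2: per-pixel contribution map, formula (2)
        b, c, h, w = feat.shape
        score = F.softmax(feat.view(b, c, -1), dim=-1).view(b, c, h, w)
        avg = 1.0 / (h * w)                          # average single-pixel contribution
        scale = torch.where(score > avg,
                            torch.full_like(score, self.n),
                            torch.full_like(score, 1.0 - self.n))

        # step 3.3: dot-multiply with the input and add the residual (initial feature)
        return feat * scale + f
```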
Step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
The new feature values enhanced by the single-pixel scaling module are passed to the BiFPN feature fusion module, which fuses features of different sizes from different hierarchies to compensate for the information lost to down-sampling.
In the BiFPN feature fusion module, a three-layer cross-link operation keeps the original backbone features flowing onward, and a control factor controls the proportion between different features. For BiFPN feature fusion, see Tan M, Pang R, Le Q V. EfficientDet: Scalable and efficient object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10781-10790.
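The weighted fusion with control factors can be sketched as follows, after the fast normalized fusion of EfficientDet; the three-input usage and the epsilon value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFuse(nn.Module):
    """Fuse same-sized feature maps with learnable, normalized control factors."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))   # one control factor per input
        self.eps = eps

    def forward(self, inputs):
        w = F.relu(self.w)                  # keep the factors non-negative
        w = w / (w.sum() + self.eps)        # normalize so the proportions sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))

# Usage: a three-input node fuses the top-down path, the bottom-up path and the
# cross-linked original backbone feature of the same resolution.
fuse = WeightedFuse(num_inputs=3)
out = fuse([torch.randn(1, 64, 32, 32) for _ in range(3)])
```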
Step 5, inputting the fused features obtained in step 4 into the target prediction network, where two detection subnetworks, each a three-layer CNN, classify and locate the targets:
step 5.1, in the classification network, a Focal Loss computation strategy is used to suppress the large number of background elements and keep positive and negative samples balanced;
step 5.2, in the localization regression network, the smooth L1 function shown in formula (3) is used as the loss computation strategy, computing the loss between the predicted position offset and the offset of the sample's real position, where the real-position offset is computed by formula (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}, \qquad x = gt - reg \quad (3)

d_x = \frac{t_x - a_x}{a_w}, \quad d_y = \frac{t_y - a_y}{a_h}, \quad d_w = \log\frac{t_w}{a_w}, \quad d_h = \log\frac{t_h}{a_h} \quad (4)

where gt is the converted regression offset, reg is the predicted offset of the regression subnetwork, and (dx, dy, dw, dh) is the regression label, given by the relative positional offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
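A short sketch of formulas (3) and (4) in PyTorch follows; the box layout (center x, center y, width, height) is an assumption about the encoding.

```python
import torch

def encode_offsets(gt_boxes, anchors):
    """Formula (4): anchor-relative offsets (dx, dy, dw, dh) from true boxes
    (tx, ty, tw, th) and anchor boxes (ax, ay, aw, ah)."""
    tx, ty, tw, th = gt_boxes.unbind(dim=-1)
    ax, ay, aw, ah = anchors.unbind(dim=-1)
    dx = (tx - ax) / aw
    dy = (ty - ay) / ah
    dw = torch.log(tw / aw)
    dh = torch.log(th / ah)
    return torch.stack([dx, dy, dw, dh], dim=-1)

def smooth_l1(reg, gt):
    """Formula (3): quadratic near zero, linear for large errors, with x = gt - reg."""
    diff = (gt - reg).abs()
    return torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()
```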
In another aspect of the invention, the safety helmet detection algorithm based on single-pixel feature amplification is applied to construction site monitoring: a foreground terminal monitors the construction site with a camera, the camera transmits the real-time picture to a background processing terminal, and the background terminal runs the safety helmet detection algorithm based on single-pixel feature amplification for detection and analysis, returning the result to the foreground terminal to remind workers in real time.
Example 2
This embodiment adopts the public Safety-Helmet-Wearing-Dataset provided by Wenshaihui, comprising 7582 images in total, with 9044 bounding boxes of people wearing helmets (positive class) and 111514 bounding boxes of people without helmets (negative class); most of the negative-class data come from the SCUT-HEAD data set. The annotations contain a large number of small heads and occluded, unclear targets; the data are cluttered and complex, and some annotated data do not belong to the detection categories. During data reading, wrongly classed and hard-to-detect targets were first removed to obtain the data set that finally takes part in training. The 7582 images were divided into a training set and a test set at a ratio of 8:2, preserving the distribution of the original data.
The intersection-over-union ratio IoU is commonly used in the field of target detection to evaluate whether a prediction locates the position of a real object, as shown in formula (5):

\mathrm{IoU} = \frac{\mathrm{area}(DR \cap GT)}{\mathrm{area}(DR \cup GT)} \quad (5)

where DR is the detection-result box predicted by the network and GT is the ground-truth position box; the larger the IoU value in the experiment, the better the model prediction matches the position box of the real sample. IoU decides whether a prediction can serve as a final result; with the threshold set to 0.5 in the experiment, a box above it is judged to be a usable position prediction box.
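A minimal sketch of formula (5), assuming boxes in [x1, y1, x2, y2] form:

```python
def iou(dr, gt):
    """Formula (5): intersection over union of a detection-result box DR and a
    ground-truth box GT, both given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(dr[0], gt[0]), max(dr[1], gt[1])
    ix2, iy2 = min(dr[2], gt[2]), min(dr[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((dr[2] - dr[0]) * (dr[3] - dr[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union

# With the experimental threshold, a prediction counts as a usable position box
# when iou(dr, gt) >= 0.5.
```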
Detection performance is usually evaluated with the mAP value, which is the average of the AP values predicted over all categories. To obtain the AP value of a single category, precision and recall must be obtained, as shown in formulas (6) and (7):
\mathrm{Precision} = \frac{TP}{TP + FP} \quad (6)

\mathrm{Recall} = \frac{TP}{TP + FN} \quad (7)
where TP, FP and FN are defined as shown in Table 1.

TABLE 1 TP, FP, FN definitions

TP   a predicted box matching a ground-truth box of the same class with IoU above the threshold
FP   a predicted box matching no ground-truth box
FN   a ground-truth box missed by every prediction
A PR curve can be constructed from the precision and recall values, and the AP value is then calculated as shown in formula (8):

AP = \int_0^1 P(R)\,dR \quad (8)

That is, the AP value equals the area under the PR curve. The AP value of each category is calculated according to formula (8), and the final result mAP is their average; the larger the mAP value, the better the detection performance of the network.
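A sketch of formulas (6) to (8) in code form follows; the all-point interpolation of the PR curve is an assumption, since the patent does not specify how the integral is discretized.

```python
import numpy as np

def average_precision(recalls, precisions):
    """Formula (8): area under the PR curve, here via the monotone (all-point
    interpolated) envelope of the precision values."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])          # make precision non-increasing in recall
    idx = np.where(r[1:] != r[:-1])[0]      # recall change points
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# precision = TP / (TP + FP), recall = TP / (TP + FN), per formulas (6) and (7);
# mAP is the mean of the per-class AP values.
```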
Selection of the convolution module: to achieve the best effect with the least computation, standard convolution, separable convolution and Ghost convolution were compared, and Ghost convolution, which is effective with a low computation cost, was finally selected as the convolution operation scheme of the single-pixel scaling module.
To verify the validity of the proposed model, comparative experiments were conducted against two baseline models and their variants. Specifically:
(1) Efficientdet-d0: the original Efficientdet model; its detection performance on the small-target Person category is low, and its final mAP is only 52.3%.
(2) Efficientdet-change: the improved Efficientdet model with added bottom-level feature information; its final mAP reaches 77.9%.
(3) YOLOv3: tested on the same data set, YOLOv3 reaches an mAP of 71.4%.
(4) YOLOv3+SPZ: introducing the proposed single-pixel scaling module into YOLOv3 raises the mAP to 73.5%.
(5) SPZ-Det: the model finally proposed here, a new safety helmet detection algorithm built on the Efficientdet structure combined with bottom-level features and the SPZ single-pixel scaling module; its mAP finally reaches 80.2%, and its AP for helmet-wearing detection reaches 94.6%.
Comparative analysis shows that attending to the bottom-level features improves the representational power of the network features, and that the single-pixel feature scaling module can be embedded in other detection models to improve their detection performance. The experimental results are shown in Table 2.
TABLE 2 results of the experiment
Detection method      AP(hat)   AP(person)   mAP
Efficientdet-d0       79.6%     24.9%        52.3%
Efficientdet-change   93.9%     61.9%        77.9%
YOLOv3                86.3%     56.4%        71.4%
YOLOv3+SPZ            91.5%     55.5%        73.5%
SPZ-Det               94.6%     65.8%        80.2%
Feature extraction is performed with the Efficientnet-b0 backbone network; the important feature layers are reselected to strengthen the representational power of the features extracted by the backbone; and the single-pixel feature scaling module is introduced into the model to counter the disappearance of small-target features during computation. Finally, the model's detection accuracy on helmet wearing reaches 94%, and the overall mAP reaches 80.2%.
Example 3
A monitoring system is built around the safety helmet detection algorithm based on single-pixel feature amplification: a foreground terminal monitors the construction site with a camera, the camera transmits the real-time picture to a background processing terminal, and the background terminal runs the detection algorithm for detection and analysis, returning the result to the foreground terminal to remind workers in real time.
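A minimal sketch of this deployment, assuming an OpenCV-readable camera stream; run_spz_det() is a hypothetical stand-in for the SPZ-Det inference call, which the patent does not expose as an API.

```python
import cv2

def run_spz_det(frame):
    """Hypothetical placeholder for SPZ-Det inference; returns a list of
    (box, label, score) detections for the frame."""
    return []

def monitor(stream_url: str, alert):
    cap = cv2.VideoCapture(stream_url)             # foreground camera feed
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for box, label, score in run_spz_det(frame):   # background detection/analysis
            if label == "person" and score > 0.5:      # head detected without a helmet
                alert(box)                         # result returned to the foreground terminal
    cap.release()
```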
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A safety helmet detection algorithm based on single-pixel feature amplification is characterized by comprising the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data;
step 2, extracting a characteristic representation form of a target from the preprocessed and enhanced sample data obtained in the step 1 through an Efficientnet-b0 network to obtain a backbone network characteristic;
step 3, performing feature filtering on the backbone network features obtained in the step 2 by using a single-pixel feature scaling module, and enhancing foreground elements in the features to obtain new feature values;
step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
step 5, inputting the fused features obtained in step 4 into a target prediction network, and classifying and locating the targets.
2. The safety helmet detection algorithm based on single-pixel feature amplification of claim 1, wherein the preprocessing in step 1 comprises the following steps:
step 1.1, expanding the safety helmet data set by horizontal flipping, so that each sample in the data set exists in both its original and mirrored forms;
step 1.2, randomly inserting noise into the sample data to raise sample complexity and improve the robustness of the algorithm at the data level.
3. The helmet detection algorithm based on single-pixel feature amplification of claim 1, wherein in step 2, when selecting the Efficientnet-b0 trunk feature layers, the topmost three feature layers, one down-sampling layer feature and one lower-layer feature are selected;
the backbone network characteristics are extracted by the following method:
step 2.1, in the feature maps extracted by Efficientnet-b0, upper feature layers of different resolutions come in pairs: a low-layer feature X1 and a high-layer feature X2;
step 2.2, for feature layers of the same resolution, the algorithm keeps only the high-layer feature X2 as the feature representation for subsequent computation.
4. The helmet detection algorithm based on single-pixel feature amplification of claim 1, wherein the new feature value in step 3 is obtained by the following steps:
a spatial attention enhancement is first applied to the backbone network features to obtain the main regions of the foreground elements, namely the attention-enhanced feature; within this feature, the contribution of each pixel to the overall feature is computed, a feature contribution map is derived from the contribution values, and different pixels are scaled accordingly to obtain the scaled feature.
5. The helmet detection algorithm based on single-pixel feature amplification of claim 4, wherein the new feature value in step 3 is obtained by the following steps:
step 3.1, a simple spatial attention computation is applied once to the backbone network feature f to obtain the attention-enhanced feature F, as shown in formula (1):

F_i = f_i \otimes S\big(R\big(v([\max(f_i);\ \mathrm{mean}(f_i)])\big)\big) \quad (1)

where max is max pooling, mean is average pooling, v is a 7 × 7 convolution, R and S denote the ReLU and Sigmoid operations, and f_i is the initial feature;
step 3.2, after the attention-enhanced feature F is obtained, a pixel-level feature amplification operation is performed on it: the contribution value of each pixel point to the feature map is computed first, a feature contribution map is then obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically:

\mathrm{Feature}_i = \begin{cases} n_i, & S(C_i) > \dfrac{1}{H \times W} \\ 1 - n_i, & \text{otherwise} \end{cases} \quad (2)

evaluated at each pixel position, where C_i denotes the feature map of the i-th channel, n_i is the scaling value, f_i is the initial feature, and H and W are the height and width of the feature map. S is the Softmax function: the Softmax score of the single-channel feature is obtained through S first, and this score represents the contribution of each pixel position to the overall feature of the channel; the score is then compared with the channel's average single-pixel contribution 1/(H × W). If the contribution is larger than the average, the scaling value is set to n_i; if it is smaller, the scaling value is set to (1 - n_i). This finally yields the feature contribution map.
step 3.3, the feature contribution map is point-multiplied with the input feature C_i to obtain the scaled feature; a residual structure is then introduced and the initial feature f_i is added, yielding the new feature value used for the multi-scale module's feature fusion.
6. The safety helmet detection algorithm based on single-pixel feature amplification of claim 1, wherein the multi-scale feature fusion operation in step 4 is as follows: the new feature values enhanced by the single-pixel scaling module are passed to the BiFPN feature fusion module, which fuses features of different sizes from different hierarchies to compensate for the information lost to down-sampling.
7. The single-pixel feature amplification-based helmet detection algorithm of claim 6, wherein in the BiFPN feature fusion module a three-layer cross-link operation keeps the original backbone features flowing onward, and a control factor controls the proportion between different features.
8. The single-pixel feature amplification-based helmet detection algorithm of claim 1, wherein in step 5 the two detection subnetworks, each a three-layer CNN, classify and locate the target.
9. The safety helmet detection algorithm based on single-pixel feature amplification as claimed in claim 1, wherein the method for classifying and positioning the target in step 5 comprises the following steps:
step 5.1, in the classification network, a Focal Loss computation strategy is used to suppress the large number of background elements and keep positive and negative samples balanced;
step 5.2, in the localization regression network, the smooth L1 function shown in formula (3) is used as the loss computation strategy, computing the loss between the predicted position offset and the offset of the sample's real position, where the real-position offset is computed by formula (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}, \qquad x = gt - reg \quad (3)

d_x = \frac{t_x - a_x}{a_w}, \quad d_y = \frac{t_y - a_y}{a_h}, \quad d_w = \log\frac{t_w}{a_w}, \quad d_h = \log\frac{t_h}{a_h} \quad (4)

where gt is the converted regression offset, reg is the predicted offset of the regression subnetwork, and (dx, dy, dw, dh) is the regression label, given by the relative positional offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
10. Application of the safety helmet detection algorithm based on single-pixel feature amplification of any one of claims 1-9 to construction site monitoring, wherein a foreground terminal monitors a construction site with a camera, the camera transmits the real-time picture to a background processing terminal, and the background terminal runs the safety helmet detection algorithm based on single-pixel feature amplification for detection and analysis and returns the result to the foreground terminal to remind workers in real time.
CN202011282208.9A 2020-09-10 2020-11-16 Safety helmet detection method based on single-pixel characteristic amplification and application thereof Active CN112464765B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020109498709 2020-09-10
CN202010949870 2020-09-10

Publications (2)

Publication Number Publication Date
CN112464765A true CN112464765A (en) 2021-03-09
CN112464765B CN112464765B (en) 2022-09-23

Family

ID=74837081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282208.9A Active CN112464765B (en) 2020-09-10 2020-11-16 Safety helmet detection method based on single-pixel characteristic amplification and application thereof

Country Status (1)

Country Link
CN (1) CN112464765B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN110796009A (en) * 2019-09-29 2020-02-14 航天恒星科技有限公司 Method and system for detecting marine vessel based on multi-scale convolution neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINGXING TAN ET AL.: "EfficientDet: Scalable and Efficient Object Detection", 《ARXIV》 *
SANGHYUN WOO ET AL.: "CBAM: Convolutional Block Attention Module", 《ARXIV》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011365A (en) * 2021-03-31 2021-06-22 中国科学院光电技术研究所 Target detection method combined with lightweight network
CN114462555A (en) * 2022-04-13 2022-05-10 国网江西省电力有限公司电力科学研究院 Multi-scale feature fusion power distribution network equipment identification method based on raspberry pi
US11631238B1 2022-04-13 2023-04-18 Jiangxi Electric Power Research Institute Of State Grid Method for recognizing distribution network equipment based on raspberry pi multi-scale feature fusion

Also Published As

Publication number Publication date
CN112464765B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN109670441B (en) Method, system, terminal and computer readable storage medium for realizing wearing recognition of safety helmet
CN108009473B (en) Video structuralization processing method, system and storage device based on target behavior attribute
CN108062349B (en) Video monitoring method and system based on video structured data and deep learning
CN108052859B (en) Abnormal behavior detection method, system and device based on clustering optical flow characteristics
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN109918971B (en) Method and device for detecting number of people in monitoring video
CN104978567B (en) Vehicle checking method based on scene classification
CN109935080B (en) Monitoring system and method for real-time calculation of traffic flow on traffic line
CN112464765B (en) Safety helmet detection method based on single-pixel characteristic amplification and application thereof
CN111401310B (en) Kitchen sanitation safety supervision and management method based on artificial intelligence
CN111753651A (en) Subway group abnormal behavior detection method based on station two-dimensional crowd density analysis
CN111738336A (en) Image detection method based on multi-scale feature fusion
CN113989858B (en) Work clothes identification method and system
CN111832461A (en) Non-motor vehicle riding personnel helmet wearing detection method based on video stream
CN112163572A (en) Method and device for identifying object
CN110853025A (en) Crowd density prediction method based on multi-column residual error cavity convolutional neural network
KR101030257B1 (en) Method and System for Vision-Based People Counting in CCTV
CN114092877A (en) Garbage can unattended system design method based on machine vision
CN111259736B (en) Real-time pedestrian detection method based on deep learning in complex environment
CN114885119A (en) Intelligent monitoring alarm system and method based on computer vision
CN109325426B (en) Black smoke vehicle detection method based on three orthogonal planes time-space characteristics
CN111709305A (en) Face age identification method based on local image block
CN106372566A (en) Digital signage-based emergency evacuation system and method
CN109064444B (en) Track slab disease detection method based on significance analysis
CN114708532A (en) Monitoring video quality evaluation method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant