CN112070043A - Safety helmet wearing convolutional network based on feature fusion, training and detecting method - Google Patents
- Publication number
- CN112070043A (application number CN202010966231.3A)
- Authority
- CN
- China
- Prior art keywords
- conv
- feature
- sampling
- module
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a feature-fusion-based convolutional network for safety-helmet-wearing detection, together with its training and detection methods. Three modules are introduced into CenterNet in sequence. The feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, while Conv-4 has its channel count changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated analogously, i.e. the higher layer is up-sampled n-fold and fused with the next layer after the latter passes an m × m convolution kernel. The global guide module comprises a pyramid pooling module and a global guiding flow module. The feature integration module down-samples the fused features by factors of n, 2n and 4n, applies average pooling, up-samples the results by the corresponding factors, integrates them, and applies a convolution with a 3m × 3m kernel. The invention greatly improves detection efficiency, markedly improves the detection of helmet wearing for workers at small scales in the image, reaches a detection speed of 21 fps, and essentially satisfies real-time requirements.
Description
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a helmet-wearing detection method based on feature fusion, CenterNet-Feature-Fusion (CenterNetFF).
Background
Statistical data show that in May of this year, 13 major accidents occurred nationwide in the building industry, killing 51 people, increases of 18.2% and 34.2% respectively over the same period last year. The safety helmet is an important tool for protecting a worker's head and is of great significance to life safety. However, many workers lack safety awareness and fail to wear their helmets, so automatically detecting whether a worker is wearing a safety helmet is a matter of real importance.
According to surveys, most surveillance cameras on construction sites are mounted at a high position; workers occupy a small proportion of the captured images and their features are difficult to recognize, so helmet-wearing detection for small-scale workers is a problem in urgent need of a solution.
Sensor-based approaches focus on localization and tracking technologies such as radio-frequency identification (RFID) and wireless local area networks (WLANs). Dong et al. developed a real-time location system (RTLS) with virtual configuration for worker location tracking: a pressure sensor is placed in the helmet and the pressure information is transmitted via Bluetooth to determine whether the worker is wearing the helmet. Zhang et al. developed an intelligent helmet system using an Internet-of-Things-based architecture, in which an infrared beam detector and a thermal infrared sensor are placed inside the helmet to determine whether the helmet is in use.
Object detection must not only identify the class of an object but also predict its location in the image, typically marked with a bounding box. Traditional target detection methods generally use a sliding-window framework comprising three steps:
(1) slide windows of different scales over the image and select candidate regions;
(2) extract visual features of the candidate regions, such as HOG features, Haar features, etc.;
(3) classify with a classifier, such as an SVM.
With the development of deep learning, deep-learning-based target detection algorithms have become mainstream. Unlike traditional methods, features are learned autonomously by a multi-layer convolutional neural network rather than designed by hand. Current deep-learning-based target detection algorithms fall roughly into two types: two-stage methods and one-stage methods. The difference is that two-stage methods must first extract preselected candidate boxes and then classify them, e.g. the R-CNN series and FPN, whereas one-stage methods skip candidate-box extraction and treat target localization directly as a regression problem, e.g. the YOLO series and SSD.
Compared with sensor-based methods, vision-based methods have attracted growing attention; the spread of cameras and advances in computer vision and pattern recognition lay a solid foundation for vision-based helmet-wearing detection, which is a small application within target detection. Traditional methods based on hand-designed features include the following. Liu Xiaohui et al. use skin-color detection to locate the face region, extract Hu-moment features of that region, and classify them with an SVM. Park et al. extract HOG features of the human body and the helmet separately and then match them by their spatial relationship; when the person is not standing, the method detects poorly. Li et al. propose a color-based hybrid descriptor to extract features of helmets of different colors, then use a hierarchical support vector machine to classify objects into four categories (red, yellow, blue and no helmet). With the development of deep-learning target detection, Fang et al. apply the Faster R-CNN algorithm to helmet detection, first selecting candidate region boxes with an RPN and then predicting on each candidate region. Shuai et al. use the YOLOv3 algorithm to detect helmets, requiring no preselected candidate boxes and treating target localization directly as a regression problem.
The invention introduces a novel feature-fusion scheme on top of the advanced CenterNet target detection algorithm to solve helmet-wearing detection for small-scale workers on construction sites. As outlined above, existing helmet-wearing detection methods are still at an early stage: sensor-based methods fail once the signal range is exceeded and require regular charging, so they cannot run for long periods; methods based on hand-designed features generalize poorly, have low universality and suit only limited scenes. Later approaches based on deep-learning target detection followed, but the deep-learning detection algorithms previously used for helmet-wearing detection still have the following problems: (1) they adopt an anchor-box mechanism, in which designing anchor sizes and aspect ratios is troublesome and the many redundant boxes cause severe positive/negative sample imbalance; (2) under complex working conditions workers are scattered at varying distances from the lens, distant workers occupy a small proportion of the image, and their features are difficult to obtain.
The original CenterNet structure is shown in Figure 1: a 512 × 512 picture is input, image features are extracted by a backbone network, and, to obtain a high-resolution feature map, bilinear interpolation is used for up-sampling to a 128 × 128 feature map. The prediction part uses three branches, generating the key-point heatmap, the bounding-box scale prediction $(W_i, H_i)$ and the key-point offset prediction $(\Delta X_i, \Delta Y_i)$, implemented with 3 × 3 and 1 × 1 convolutions. From the predicted center-point coordinates $(X_i, Y_i)$, i.e. the peak points of the key-point heatmap, together with the bounding-box scale and the offset, the target can be localized at $(X_i+\Delta X_i-W_i/2,\ Y_i+\Delta Y_i-H_i/2,\ X_i+\Delta X_i+W_i/2,\ Y_i+\Delta Y_i+H_i/2)$. Although CenterNet up-samples the last feature layer to obtain a higher-resolution output feature map, it does not incorporate the details of shallow features; CenterNet is therefore less effective at detecting helmet wearing for small-scale workers.
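The localization rule above can be sketched as follows (a minimal illustration; the function name and scalar interface are my own, not the patent's code):

```python
def decode_box(cx, cy, dx, dy, w, h):
    """Decode one CenterNet detection: heatmap peak (cx, cy), predicted
    center offset (dx, dy) and predicted size (w, h) -> (x1, y1, x2, y2)."""
    x1 = cx + dx - w / 2.0
    y1 = cy + dy - h / 2.0
    x2 = cx + dx + w / 2.0
    y2 = cy + dy + h / 2.0
    return x1, y1, x2, y2
```

A peak at (10, 20) with offset (0.5, -0.5) and size 4 × 6 thus decodes to the box (8.5, 16.5, 12.5, 22.5).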
Disclosure of Invention
1. The invention aims to provide a novel method.
The invention provides a feature-fusion-based safety-helmet-wearing convolutional network, training and detection methods, a storage medium and a device, aiming at the problem in the prior art that as the number of network layers increases, deep features become semantically rich but lose position information, while shallow features are rich in position information but weak in semantic information.
2. The technical scheme adopted by the invention is disclosed.
The invention discloses a method for generating a feature-fusion-based convolutional network for safety-helmet-wearing detection, which introduces three modules into CenterNet in sequence: a feature pyramid module, a global guide module and a feature integration module.
The feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, while Conv-4 has its channel count changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated analogously, i.e. the higher layer is up-sampled n-fold and fused with the next layer after the latter passes an m × m convolution kernel;
the global guide module comprises a pyramid pooling module and a global guiding flow module; the global guiding flow module adds the pyramid-pooled features, up-sampled n-, 2n- and 4n-fold respectively, at each lateral connection in the top-down process of the feature pyramid;
the feature integration module down-samples the fused features n-, 2n- and 4n-fold, applies average pooling, up-samples the results by the corresponding factors and integrates them, and then applies a convolution with a 3m × 3m kernel.
Preferably, in step 1, a top-down process is adopted starting from Conv-5: Conv-5 is first up-sampled 2-fold, while Conv-4 has its channel count changed by a 1 × 1 convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated analogously, i.e. 2-fold up-sampling is applied and the result is fused with the next layer's features after the latter passes a 1 × 1 convolution kernel, yielding the final fused features;
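The lateral fusion of step 1 can be sketched in numpy as follows (an illustrative sketch only: nearest-neighbour up-sampling stands in for the bilinear interpolation a real implementation would use, and all names are my own):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map.
    (The patent describes bilinear interpolation; nearest keeps the sketch short.)"""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution as a per-pixel linear map over channels.
    x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', weight, x)

def top_down_fuse(deep, shallow, weight):
    """One lateral connection of the feature pyramid: up-sample the deeper
    layer 2x, project the shallower layer's channels with a 1x1 kernel,
    and fuse by element-wise addition."""
    return upsample2x(deep) + conv1x1(shallow, weight)
```

Applied twice (Conv-5 into Conv-4, then the result into Conv-3), this yields the final fused features of step 1.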
Step 2.1, capturing global information
Average pooling is applied to the last layer of the CenterNet feature-extraction network, i.e. Conv-5, generating pooled features at the different scales 1 × 1, 2 × 2, 3 × 3 and 6 × 6; a 1 × 1 convolution reduces the channel count of the pooled features to 1/4 of the original; the features are then sampled back to the original feature-layer size by bilinear interpolation and finally merged with the original features to obtain the pyramid-pooled features, aggregating context information from different regions and thereby capturing global information;
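The pyramid pooling step can be sketched as follows (a minimal numpy illustration under simplifying assumptions: nearest-neighbour up-sampling instead of bilinear, spatial sizes divisible by every pooling scale, and hypothetical weight matrices for the 1 × 1 convolutions):

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Average-pool a (C, H, W) map to (C, out_size, out_size).
    Assumes H and W are divisible by out_size."""
    c, h, w = x.shape
    bh, bw = h // out_size, w // out_size
    return x.reshape(c, out_size, bh, out_size, bw).mean(axis=(2, 4))

def pyramid_pool(x, conv_weights):
    """Pyramid pooling sketch: pool Conv-5 to 1x1, 2x2, 3x3 and 6x6, reduce
    channels to C/4 with a 1x1 conv (one weight matrix per scale), sample
    back to the original size and concatenate with the original features."""
    c, h, w = x.shape
    outs = [x]
    for s, wt in zip((1, 2, 3, 6), conv_weights):
        p = adaptive_avg_pool(x, s)                          # (C, s, s)
        p = np.einsum('oc,chw->ohw', wt, p)                  # (C/4, s, s)
        p = p.repeat(h // s, axis=1).repeat(w // s, axis=2)  # back to (C/4, H, W)
        outs.append(p)
    return np.concatenate(outs, axis=0)                      # (2C, H, W)
```

With C input channels and four scales reduced to C/4 each, the merged output has 2C channels, as in the standard pyramid pooling design.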
step 2.2, globally guiding flow module
the pyramid-pooled features, up-sampled 2-, 4- and 8-fold respectively, are added at each lateral connection in the top-down process of the feature pyramid;
Preferably, a feature integration module is added on the basis of step 2: Conv-5 first undergoes feature integration and is then fused with Conv-4; the fused features undergo feature integration in turn; the third and second layers are processed analogously; the per-layer fused features are integrated to obtain the final fused features, from which three branches separately predict the key points, the center-point offset and the target size.
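The multi-scale integration described above can be sketched as follows (an illustrative numpy version under my own simplifications: nearest-neighbour up-sampling, and the trailing 3 × 3 convolution omitted):

```python
import numpy as np

def avg_pool_down(x, k):
    """Average-pool a (C, H, W) map with kernel and stride k, i.e. k-fold
    down-sampling. Assumes H and W are divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def integrate(x):
    """Feature-integration sketch: down-sample the fused map 2-, 4- and
    8-fold by average pooling, sample each result back up and sum them with
    the original, observing local context at several scales. A 3x3
    convolution would follow in the full module (omitted here)."""
    out = x.copy()
    for k in (2, 4, 8):
        p = avg_pool_down(x, k)
        out += p.repeat(k, axis=1).repeat(k, axis=2)
    return out
```

Summing the re-up-sampled pooled maps with the original keeps the spatial size unchanged while mixing in coarser-scale context, which is what lets the module reduce aliasing from high-factor sampling.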
The invention provides a training method for the helmet-wearing detection convolutional network, comprising the following steps, where the first stage is a forward propagation stage and the second stage is a backward propagation stage:
(1) The weights obtained by training the original CenterNet on the COCO dataset are taken as the initial weights of the network.
(2) The input image is forward-propagated through the modified CenterNet network, yielding the key-point heatmap, the center-point offset and the target size.
(3) The error between the predicted value and the target value is calculated; the error function is divided into three parts:
$L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off}$ (I.4)

wherein equation I.1,

$L_k = \dfrac{-1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log(\hat{Y}_{xyc}), & Y_{xyc}=1\\ (1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}), & \text{otherwise}\end{cases}$ (I.1)

represents the key-point classification loss, a focal loss adopted to address the imbalance of positive and negative samples during training; $\hat{Y}_{xyc}=1$ represents a detected center point and $\hat{Y}_{xyc}=0$ the background; the ground-truth key points are distributed onto the heatmap $Y_{xyc}$ by a Gaussian kernel; N represents the number of targets in the image, and α and β are hyper-parameters of the loss function, set to 2 and 4 respectively. Equation I.2,

$L_{size} = \dfrac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k} - s_k\right|$ (I.2)

represents the target-size loss, using the L1 loss: if $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ are the coordinates of the k-th target's bounding box, then $p_k = \big(\tfrac{x_1^{(k)}+x_2^{(k)}}{2},\ \tfrac{y_1^{(k)}+y_2^{(k)}}{2}\big)$ is the center point of the k-th target, $s_k = (x_2^{(k)}-x_1^{(k)},\ y_2^{(k)}-y_1^{(k)})$ is the size of the k-th target, and $\hat{S}_{p_k}$ is the predicted target size. Equation I.3,

$L_{off} = \dfrac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \big(\tfrac{p}{R} - \tilde{p}\big)\right|$ (I.3)

represents the center-point offset loss, where p is the position of the center point in the input image, $\tfrac{p}{R}$ is the position of the center point p after R-fold down-sampling, $\tilde{p} = \lfloor \tfrac{p}{R} \rfloor$ is the rounded position after down-sampling (which introduces a bias), and $\hat{O}_{\tilde{p}}$ is the predicted center-point offset. Equation I.4 is the total loss, with $\lambda_1$ and $\lambda_2$ the weights of the different loss terms.
The network weights are continuously adjusted by gradient descent; the weights are updated with a learning rate of 1.25e-4, the number of iterations is 200, the learning-rate decay steps are 90 and 120, and the learning-rate decay factor is 0.1.
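The stated schedule (base rate 1.25e-4, decayed by 0.1 at steps 90 and 120 over 200 iterations) can be sketched as a simple step schedule (an illustration; the function name and interface are my own):

```python
def learning_rate(epoch, base_lr=1.25e-4, decay_steps=(90, 120), factor=0.1):
    """Step learning-rate schedule: start at base_lr and multiply by the
    decay factor at each decay step that has been reached."""
    lr = base_lr
    for step in decay_steps:
        if epoch >= step:
            lr *= factor
    return lr
```

So the rate is 1.25e-4 before step 90, 1.25e-5 from step 90, and 1.25e-6 from step 120 onward.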
The invention proposes a helmet-wearing detection method: the key-point heatmap is predicted by the above convolutional network; each point is compared with its adjacent points and kept if its response value is greater than or equal to those of its neighbours, and finally all peak points satisfying the condition are retained. Let $\hat{P} = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{n}$ be the set of peak points; the final target bounding box is $(\hat{x}_i+\Delta\hat{x}_i-\hat{w}_i/2,\ \hat{y}_i+\Delta\hat{y}_i-\hat{h}_i/2,\ \hat{x}_i+\Delta\hat{x}_i+\hat{w}_i/2,\ \hat{y}_i+\Delta\hat{y}_i+\hat{h}_i/2)$, where $(\Delta\hat{x}_i, \Delta\hat{y}_i)$ is the predicted center-point offset and $(\hat{w}_i, \hat{h}_i)$ is the predicted target size; the predicted coordinate values are drawn in the image as boxes and the corresponding category is displayed.
The invention provides a storage medium storing a program that implements the above feature-fusion-based helmet-wearing detection method.
The invention provides a storage device, comprising:
a memory;
one or more processors, and
one or more programs stored in the memory and configured to be executed by the one or more processors, the programs, when executed by the processors, implementing the feature-fusion-based helmet-wearing detection method of claim 5.
3. The technical effect produced by the invention.
1) The invention adopts a feature pyramid module with a U-shaped structure, integrating multi-layer features and improving sensitivity to small targets.
2) The invention introduces a global guide module and a feature integration module on the basis of the feature pyramid, further sharpening the details of salient targets.
3) The invention introduces the global guiding flow and feature integration modules to gradually refine deep semantic information; the feature integration module reduces the aliasing effect caused by high-factor sampling and observes local information at different spatial positions across scale spaces, further enlarging the receptive field of the whole network.
4) The invention trains and tests CenterNetFF on the public Safety-Helmet-Wearing-Dataset (SHWD). The results show that, compared with CenterNet, detection efficiency is greatly improved, the detection of helmet wearing for workers at small scales in the image improves even more markedly, the detection speed reaches 21 fps, and real-time operation is essentially satisfied.
Drawings
Figure 1 is a diagram of the CenterNet network architecture.
Fig. 2 is a structural diagram of a feature pyramid module.
FIG. 3 is a block diagram of a feature pyramid in combination with a global boot module.
FIG. 4 is a pyramid pooling module structure.
FIG. 5 is a feature integration module architecture.
Fig. 6 is a diagram of the CenterNetFF network architecture.
FIG. 7 compares the results of CenterNetFF and CenterNet on the SHWD dataset.
FIG. 8 compares the results of CenterNetFF with other deep-learning target detection algorithms on the SHWD dataset.
Fig. 9 is a sample of a partial data set.
Fig. 10 shows the detection effect of CenterNetFF on helmet wearing by field workers.
Figure 11 compares the results of CenterNetFF and CenterNet in detecting helmet wearing by workers on a worksite.
Detailed Description
Aiming at the problems that workers are scattered under complex working conditions at varying distances from the camera, that distant workers occupy a small proportion of the image, and that their features are difficult to extract, the invention provides CenterNetFF, a method based on novel feature fusion for detecting the helmet wearing of workers on construction sites. It comprises the following steps:
The experimental data set of the invention is SHWD, with two categories, hat and person; it totals 7581 images, comprising 9044 positive instances wearing a helmet and 11151 negative instances not wearing one. The invention divides the data set into 4548 training images, 1516 validation images and 1517 test images. Part of the data set is shown in Fig. 9.
Training process
Experimental hardware environment: Ubuntu 16.04, a Tesla P100 graphics card with 16 GB of video memory. Code running environment: the PyTorch 0.4.1 deep-learning framework, Python 3.6, CUDA 8.0, cuDNN 5.1.
The training process is divided into two phases: the first stage is the forward propagation stage and the second stage is the backward propagation stage. The specific process comprises the following steps:
(1) The invention uses the weights obtained by training the original CenterNet on the COCO dataset as the initial weights of the network.
(2) Images of size 512 × 512 are input, 16 images per batch, and forward propagation through the improved CenterNet network yields the key-point heatmap, the center-point offset and the target size.
(3) An error between the predicted value and the target value is calculated. The error function is divided into three parts:
$L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off}$ (I.4)

wherein equation I.1,

$L_k = \dfrac{-1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log(\hat{Y}_{xyc}), & Y_{xyc}=1\\ (1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}), & \text{otherwise}\end{cases}$ (I.1)

represents the key-point classification loss, a focal loss adopted to address the imbalance of positive and negative samples during training. $\hat{Y}_{xyc}=1$ represents a detected center point and $\hat{Y}_{xyc}=0$ the background. The ground-truth key points are distributed onto the heatmap $Y_{xyc}$ by a Gaussian kernel; N represents the number of targets in the image, and α and β are hyper-parameters of the loss function, set to 2 and 4 respectively. Equation I.2,

$L_{size} = \dfrac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k} - s_k\right|$ (I.2)

represents the target-size loss, using the L1 loss: if $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ are the coordinates of the k-th target's bounding box, then $p_k = \big(\tfrac{x_1^{(k)}+x_2^{(k)}}{2},\ \tfrac{y_1^{(k)}+y_2^{(k)}}{2}\big)$ is the center point of the k-th target, $s_k = (x_2^{(k)}-x_1^{(k)},\ y_2^{(k)}-y_1^{(k)})$ is the size of the k-th target, and $\hat{S}_{p_k}$ is the predicted target size. Equation I.3,

$L_{off} = \dfrac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \big(\tfrac{p}{R} - \tilde{p}\big)\right|$ (I.3)

represents the center-point offset loss, where p is the position of the center point in the input image, $\tfrac{p}{R}$ is the position of the center point p after R-fold down-sampling, $\tilde{p} = \lfloor \tfrac{p}{R} \rfloor$ is the rounded position after down-sampling (which introduces a bias), and $\hat{O}_{\tilde{p}}$ is the predicted center-point offset. Formula I.4 is the total loss; to assign the loss terms their specific weights, $\lambda_1 = 0.1$ and $\lambda_2 = 1$.
(4) The network weights are continuously adjusted by gradient descent to minimize the error. The invention adjusts the weights with a learning rate of 1.25e-4; the number of iterations is 200, the learning-rate decay steps are 90 and 120, and the learning-rate decay factor is 0.1.
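The error function of step (3) can be sketched as follows (an illustrative numpy version of the focal, size and offset terms under eq. I.1 to I.4; the array shapes and function name are my own assumptions, not the patent's code):

```python
import numpy as np

def centernet_losses(hm_pred, hm_gt, size_pred, size_gt, off_pred, off_gt,
                     alpha=2, beta=4, lam1=0.1, lam2=1.0):
    """Sketch of the total loss L = Lk + lam1*Lsize + lam2*Loff:
    focal loss on the key-point heatmap (hm_gt holds the Gaussian-splatted
    ground truth), L1 loss on sizes, L1 loss on center offsets."""
    eps = 1e-12                       # numerical guard for log
    pos = hm_gt == 1.0                # ground-truth center points
    n = max(pos.sum(), 1)
    pos_loss = ((1 - hm_pred[pos]) ** alpha * np.log(hm_pred[pos] + eps)).sum()
    neg = ~pos
    neg_loss = ((1 - hm_gt[neg]) ** beta * hm_pred[neg] ** alpha
                * np.log(1 - hm_pred[neg] + eps)).sum()
    l_k = -(pos_loss + neg_loss) / n  # focal key-point loss (eq. I.1)
    l_size = np.abs(size_pred - size_gt).mean()  # L1 size loss (eq. I.2)
    l_off = np.abs(off_pred - off_gt).mean()     # L1 offset loss (eq. I.3)
    return l_k + lam1 * l_size + lam2 * l_off    # total loss (eq. I.4)
```

When every prediction matches its target exactly, all three terms vanish and the total loss is zero, which is a quick sanity check on the implementation.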
Detection
The key-point heatmap is obtained by network prediction; each point is compared with its 8 adjacent points and kept if its response value is greater than or equal to all 8 of them, and finally the 100 peak points satisfying the condition are retained. Let $\hat{P} = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{100}$ be the set of peak points. The final target bounding box is $(\hat{x}_i+\Delta\hat{x}_i-\hat{w}_i/2,\ \hat{y}_i+\Delta\hat{y}_i-\hat{h}_i/2,\ \hat{x}_i+\Delta\hat{x}_i+\hat{w}_i/2,\ \hat{y}_i+\Delta\hat{y}_i+\hat{h}_i/2)$, where $(\Delta\hat{x}_i, \Delta\hat{y}_i)$ is the predicted center-point offset and $(\hat{w}_i, \hat{h}_i)$ is the predicted target size. The predicted coordinate values are drawn in the image as boxes, and the corresponding categories are displayed.
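The peak-selection rule, comparing each point with its 8 neighbours and keeping the highest-scoring peaks, can be sketched as a 3 × 3 max-filter (a minimal numpy illustration; the function name and return format are my own):

```python
import numpy as np

def find_peaks(heatmap, top_k=100):
    """Keep heatmap points whose response is >= all 8 neighbours (i.e. equal
    to the 3x3 local maximum), then retain the top_k highest peaks.
    Returns (row, col) coordinates sorted by response."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # stack the 3x3 neighbourhood of every position and take its maximum
    windows = np.stack([padded[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    is_peak = heatmap >= windows.max(axis=0)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-heatmap[ys, xs])[:top_k]
    return list(zip(ys[order].tolist(), xs[order].tolist()))
```

This max-filter formulation is equivalent to the 8-neighbour comparison and replaces a conventional non-maximum-suppression step.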
To verify the effectiveness of the feature pyramid, global guide and feature integration modules proposed by the invention, the three modules are introduced into CenterNet in sequence. Rows 2, 6 and 10 of Fig. 7 show the detection results after the feature pyramid is added under different backbone networks: the average precision AP improves by 0.4%, 1.4% and 1.8% respectively, and the average recall AR by 2.4%, 4.3% and 0.7% respectively. The detection of small-scale targets also improves greatly, with AP_small rising by 4.0%, 3.6% and 1.0% and AR_small by 4.9%, 4.9% and 2.0% respectively; the introduction of the feature pyramid clearly improves sensitivity to small targets. Rows 3, 7 and 11 of Fig. 7 give the results after the global guide module is introduced: AP and AR improve further, AP by 0.4%, 0.7% and 0.8% and AR by 0.5%, 0.4% and 1.0% respectively. Finally, the CenterNetFF algorithm presented here reaches an AP of 87.8% and an AR of 43.1%; for small-scale targets, AP_small reaches 33.2% and AR_small 43.1%, improvements of 2.0% and 2.7% respectively over the original CenterNet.
The CenterNetFF algorithm is compared with the currently advanced target detection algorithms Faster R-CNN, YOLOv3 and SSD under the same experimental environment; the results are shown in Fig. 8. Faster R-CNN achieves almost the same average precision as the present algorithm, reaching 87.02%, but its detection speed is only 4.92 fps, far from real time; YOLOv3 and SSD achieve real-time detection speeds, but their average precision is somewhat lower and cannot meet the precision requirement of helmet-wearing detection under complex working conditions. The CenterNetFF of the invention reaches an average precision of 87.8% and a detection speed of 21 fps, satisfying the real-time requirement.
Fig. 10 shows detection results of the improved CenterNet algorithm on the SHWD data set, for workers both far from and near the camera. Fig. 11 compares detection on the SHWD data set before and after the model improvement: Fig. 11a shows the detection effect before the improvement and Fig. 11b after; it can be seen that CenterNetFF solves the small-scale worker detection problem.
Claims (8)
1. A method for generating a convolutional network for detecting safety helmet wearing based on feature fusion is characterized in that three modules are introduced into a CenterNet in sequence:
the feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, while Conv-4 has its channel count changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated analogously, i.e. the higher layer is up-sampled n-fold and fused with the next layer after the latter passes an m × m convolution kernel;
the global guide module comprises a pyramid pooling module and a global guiding flow module; the global guiding flow module adds the pyramid-pooled features, up-sampled n-, 2n- and 4n-fold respectively, at each lateral connection in the top-down process of the feature pyramid;
the feature integration module down-samples the fused features n-, 2n- and 4n-fold, applies average pooling, up-samples the results by the corresponding factors and integrates them, and then applies a convolution with a 3m × 3m kernel.
2. The feature fusion based helmet wearing detection convolutional network generating method of claim 1,
step 1, a top-down process is adopted starting from Conv-5: Conv-5 is first upsampled 2 times, Conv-4 uses a 1 × 1 convolution kernel to change the number of channels and is fused with the upsampled Conv-5 feature layer; Conv-4 and Conv-3 are handled similarly, with 2-fold upsampling followed by fusion with the next feature layer after a 1 × 1 convolution, giving the finally fused features;
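The top-down fusion of step 1 can be sketched in PyTorch as follows. The channel counts for Conv-3/4/5 (512/1024/2048) and the output width (256) are illustrative assumptions, not values stated in the claim; element-wise addition is assumed as the fusion operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Sketch of the step-1 feature pyramid: channel widths are assumptions."""
    def __init__(self, c3=512, c4=1024, c5=2048, out=256):
        super().__init__()
        # 1 x 1 convolutions change the channel count of each lateral layer
        self.lat5 = nn.Conv2d(c5, out, 1)
        self.lat4 = nn.Conv2d(c4, out, 1)
        self.lat3 = nn.Conv2d(c3, out, 1)

    def forward(self, conv3, conv4, conv5):
        # Conv-5 is upsampled 2x and fused (added) with the 1x1-convolved Conv-4
        p5 = self.lat5(conv5)
        p4 = self.lat4(conv4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        # the same operation joins the fused layer with Conv-3
        p3 = self.lat3(conv3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return p3  # finally fused feature

fpn = TopDownFusion()
c3 = torch.randn(1, 512, 64, 64)
c4 = torch.randn(1, 1024, 32, 32)
c5 = torch.randn(1, 2048, 16, 16)
print(fpn(c3, c4, c5).shape)  # torch.Size([1, 256, 64, 64])
```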
step 2, global guide module
Step 2.1, capturing global information
Average pooling is applied to Conv-5, the last layer of the CenterNet feature extraction network, to generate pooled features at four scales: 1 × 1, 2 × 2, 3 × 3 and 6 × 6. A 1 × 1 convolution reduces the number of channels of each pooled feature to 1/4 of the original; the features are then sampled back to the original feature-layer size by bilinear interpolation and finally merged with the original features to obtain the pyramid-pooled features, aggregating context information of different regions and thereby capturing global information;
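Step 2.1 can be sketched as a standard pyramid pooling module. The Conv-5 channel count (2048) is an assumption for illustration; the bin sizes, the 1/4 channel reduction, the bilinear upsampling and the channel-wise merge follow the step as described.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Sketch of step 2.1; in_ch=2048 is an assumed Conv-5 width."""
    def __init__(self, in_ch=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        # each branch: adaptive average pool to b x b, then a 1x1 conv
        # that reduces channels to 1/4 of the original
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // 4, 1))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for branch in self.branches:
            # sample back to the original feature size by bilinear interpolation
            outs.append(F.interpolate(branch(x), size=(h, w),
                                      mode="bilinear", align_corners=False))
        # merge with the original features: 2048 + 4 * 512 = 4096 channels
        return torch.cat(outs, dim=1)

ppm = PyramidPooling()
y = ppm(torch.randn(1, 2048, 16, 16))
print(y.shape)  # torch.Size([1, 4096, 16, 16])
```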
step 2.2, globally guiding flow module
Pyramid pooling features upsampled 2, 4 and 8 times are respectively added at each lateral connection during the top-down process of the feature pyramid;
step 3, the fused features are first downsampled by 2, 4 and 8 times, average pooling is applied, the results are upsampled by the corresponding factors and integrated together, and a convolution with a 3 × 3 kernel is then applied; the feature integration module is thus introduced into CenterNet.
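The feature integration of step 3 can be sketched as below. The channel width (256) is an illustrative assumption, and summation is assumed as the "integrate together" operation; downsampling with stride-s average pooling covers both the "downsample" and "average pooling" sub-steps in one call.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureIntegration(nn.Module):
    """Sketch of step 3: downsample by 2/4/8 with average pooling,
    upsample back by the same factor, sum, then a 3x3 convolution."""
    def __init__(self, ch=256):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        merged = torch.zeros_like(x)
        for s in (2, 4, 8):
            # s-fold downsampling realised as stride-s average pooling
            d = F.avg_pool2d(x, kernel_size=s, stride=s)
            # upsample by the corresponding factor and integrate (sum)
            merged = merged + F.interpolate(d, size=(h, w), mode="bilinear",
                                            align_corners=False)
        return self.conv(merged)

fim = FeatureIntegration()
out = fim(torch.randn(1, 256, 64, 64))
print(out.shape)  # torch.Size([1, 256, 64, 64])
```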
3. The feature fusion based helmet wearing detection convolutional network generating method of claim 2,
a feature integration module is added on the basis of step 2: feature integration is first performed on Conv-5, which is then fused with Conv-4, and feature integration is performed on the fused features; the third layer has a structure similar to the second, and the fused features of each layer are integrated to obtain the final fused features; three branches separately predict the keypoints, the center point offset and the target size.
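The three prediction branches at the end of claim 3 can be sketched as lightweight convolutional heads. The input width (256), the intermediate width (64) and the class count (2, i.e. helmet / no helmet) are assumptions; only the output channel layout (C-class heatmap, 2-channel offset, 2-channel size) follows the standard CenterNet head design the claim builds on.

```python
import torch
import torch.nn as nn

def head(in_ch, out_ch):
    # a small 3x3 conv + ReLU + 1x1 conv head, one per branch
    return nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                         nn.ReLU(inplace=True),
                         nn.Conv2d(64, out_ch, 1))

num_classes = 2  # assumed: helmet / no-helmet
heatmap_head = head(256, num_classes)  # keypoint heatmap branch
offset_head = head(256, 2)             # center point offset branch (dx, dy)
size_head = head(256, 2)               # target size branch (w, h)

f = torch.randn(1, 256, 128, 128)  # final fused feature (width assumed)
print(heatmap_head(f).shape, offset_head(f).shape, size_head(f).shape)
```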
4. The helmet wearing detection convolutional network training method based on feature fusion of claim 3, comprising a first stage of forward propagation and a second stage of backward propagation:
(1) The initial weights of the network are taken as the weights obtained by training the original CenterNet on the COCO dataset.
(2) The input image is forward-propagated through the modified CenterNet network, yielding the keypoint heatmap, the center point offset and the target size.
(3) Calculating an error between the predicted value and the target value; the error function is divided into three parts:
L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off} \quad (I.4)

Equation I.1 represents the keypoint classification loss; a focal loss is adopted to solve the imbalance of positive and negative samples during training:

L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log\hat{Y}_{xyc} & \text{if } Y_{xyc}=1\\(1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}) & \text{otherwise}\end{cases} \quad (I.1)

where \hat{Y}_{xyc}=1 represents a detected center point of the image and \hat{Y}_{xyc}=0 represents background; the ground-truth keypoints are spread onto the heatmap Y\in[0,1]^{\frac{W}{R}\times\frac{H}{R}\times C} by a Gaussian kernel; N represents the number of targets in the image, and \alpha and \beta are hyperparameters of the loss function, set to 2 and 4 respectively.

Equation I.2 represents the target size loss, using the L1 loss. Assuming (x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)}) are the coordinates of the k-th target bounding box, then p_k=\left(\frac{x_1^{(k)}+x_2^{(k)}}{2},\frac{y_1^{(k)}+y_2^{(k)}}{2}\right) is the coordinate of the center point of the k-th target, s_k=(x_2^{(k)}-x_1^{(k)},\,y_2^{(k)}-y_1^{(k)}) indicates the size of the k-th target, and \hat{S}_{p_k} is the predicted target size:

L_{size}=\frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k}-s_k\right| \quad (I.2)

Equation I.3 represents the center point offset loss: p is the position of a center point in the input image, and \tilde{p}=\left\lfloor p/R\right\rfloor represents the position of the center point p after R-fold downsampling; the rounding after downsampling introduces an offset, and \hat{O}_{\tilde{p}} is the predicted center point offset:

L_{off}=\frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}}-\left(\frac{p}{R}-\tilde{p}\right)\right| \quad (I.3)

Equation I.4 is the total loss; \lambda_1 and \lambda_2 are the weights of the different loss terms.
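The three loss terms can be sketched directly from their definitions. The weights lambda1 = 0.1 and lambda2 = 1 below are common CenterNet defaults, assumed here since the claim does not state their values; the mask marks the N center-point locations at which the L1 losses of I.2 and I.3 are evaluated.

```python
import torch

def focal_loss(pred, gt, alpha=2, beta=4):
    """Keypoint focal loss of equation I.1 (alpha=2, beta=4 per the claim)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    n = pos.sum().clamp(min=1)  # N: number of targets
    pos_loss = ((1 - pred) ** alpha * torch.log(pred) * pos).sum()
    neg_loss = ((1 - gt) ** beta * pred ** alpha * torch.log(1 - pred) * neg).sum()
    return -(pos_loss + neg_loss) / n

def masked_l1(pred, target, mask):
    """L1 loss of equations I.2 / I.3, averaged over center-point locations."""
    n = mask.sum().clamp(min=1)
    return (torch.abs(pred - target) * mask).sum() / n

def total_loss(hm_pred, hm_gt, size_pred, size_gt, off_pred, off_gt, mask,
               lam1=0.1, lam2=1.0):  # lambda values are assumed defaults
    # clamp avoids log(0) in the focal loss
    return (focal_loss(hm_pred.clamp(1e-4, 1 - 1e-4), hm_gt)
            + lam1 * masked_l1(size_pred, size_gt, mask)
            + lam2 * masked_l1(off_pred, off_gt, mask))

hm_pred = torch.full((1, 2, 4, 4), 0.5)
hm_gt = torch.zeros(1, 2, 4, 4); hm_gt[0, 0, 1, 1] = 1.0
mask = torch.zeros(1, 1, 4, 4); mask[0, 0, 1, 1] = 1.0
loss = total_loss(hm_pred, hm_gt,
                  torch.ones(1, 2, 4, 4), torch.ones(1, 2, 4, 4),
                  torch.zeros(1, 2, 4, 4), torch.zeros(1, 2, 4, 4), mask)
print(float(loss))
```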
5. The helmet wearing detection convolutional network training method based on feature fusion of claim 4, wherein a gradient descent method is adopted to continuously adjust the network weights; a learning rate of 1.25e-4 is used, the number of iterations is 200, the learning rate decay steps are 90 and 120, and the learning rate decay factor is 0.1.
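The schedule of claim 5 maps onto a standard step-decay setup. The Adam optimizer and the stand-in model below are assumptions for illustration (the claim only specifies "gradient descent"); the learning rate, milestones and decay factor are those stated in the claim.

```python
import torch

model = torch.nn.Conv2d(3, 2, 3)  # stand-in for the full network (assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
# decay by 0.1 at epochs 90 and 120, as stated in claim 5
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[90, 120], gamma=0.1)

for epoch in range(200):  # 200 iterations per claim 5
    # ... per-batch: forward pass, loss, backward, optimizer.step() ...
    optimizer.step()   # placeholder step so the scheduler sees an update
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # ~1.25e-6 after both decays
```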
6. A safety helmet wearing detection method based on feature fusion according to claim 4 or 5, characterized in that a keypoint heatmap is obtained through prediction by the convolutional network; each point is compared with its 8 adjacent points, and all peak points whose response value is greater than or equal to those of their neighbours are retained; let \hat{P}_c=\{(\hat{x}_i,\hat{y}_i)\}_{i=1}^{n} be the set of peak points of class c; the final target bounding box is

\left(\hat{x}_i+\delta\hat{x}_i-\frac{\hat{w}_i}{2},\ \hat{y}_i+\delta\hat{y}_i-\frac{\hat{h}_i}{2},\ \hat{x}_i+\delta\hat{x}_i+\frac{\hat{w}_i}{2},\ \hat{y}_i+\delta\hat{y}_i+\frac{\hat{h}_i}{2}\right)

where (\delta\hat{x}_i,\delta\hat{y}_i) is the predicted center point offset and (\hat{w}_i,\hat{h}_i) the predicted target size.
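The 8-neighbour peak comparison of claim 6 is commonly implemented with a 3 × 3 max pooling, as sketched below; the top-k selection (k = 100) is an assumed decoding detail, not stated in the claim.

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap, k=100):
    """Keep points whose value is >= all 8 neighbours, then take the top k."""
    # a point equals the 3x3 max iff it is a local peak
    hmax = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (hmax == heatmap).float()  # suppress non-peak points
    scores, idx = torch.topk(peaks.flatten(), k)
    w = heatmap.shape[-1]
    ys, xs = idx // w, idx % w  # recover 2-D coordinates
    return scores, ys, xs

hm = torch.zeros(1, 1, 8, 8)
hm[0, 0, 3, 4] = 0.9  # single synthetic peak
scores, ys, xs = extract_peaks(hm, k=1)
print(ys.item(), xs.item())  # 3 4
```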
7. A storage medium storing a program which, when executed, implements the safety helmet wearing detection method based on feature fusion according to claim 6.
8. A memory device, comprising:
a memory;
one or more processors, and
one or more programs stored in the memory and configured to be executed by the one or more processors, the programs, when executed by the processors, implementing the safety helmet wearing detection method based on feature fusion of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010966231.3A CN112070043B (en) | 2020-09-15 | 2020-09-15 | Feature fusion-based safety helmet wearing convolution network, training and detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112070043A true CN112070043A (en) | 2020-12-11 |
CN112070043B CN112070043B (en) | 2023-11-10 |
Family
ID=73695749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010966231.3A Active CN112070043B (en) | 2020-09-15 | 2020-09-15 | Feature fusion-based safety helmet wearing convolution network, training and detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070043B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101725271A (en) * | 2009-11-17 | 2010-06-09 | 江苏南通三建集团有限公司 | Rapid construction method of reinforced concrete chimney |
CN109034215A (en) * | 2018-07-09 | 2018-12-18 | 东北大学 | A kind of safety cap wearing detection method based on depth convolutional neural networks |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN109376673A (en) * | 2018-10-31 | 2019-02-22 | 南京工业大学 | A kind of coal mine down-hole personnel unsafe acts recognition methods based on human body attitude estimation |
CN110119686A (en) * | 2019-04-17 | 2019-08-13 | 电子科技大学 | A kind of safety cap real-time detection method based on convolutional neural networks |
CN110728223A (en) * | 2019-10-08 | 2020-01-24 | 济南东朔微电子有限公司 | Helmet wearing identification method based on deep learning |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949730A (en) * | 2021-03-11 | 2021-06-11 | 江苏禹空间科技有限公司 | Method, device, storage medium and equipment for detecting target with few samples |
CN112949730B (en) * | 2021-03-11 | 2024-04-09 | 无锡禹空间智能科技有限公司 | Method, device, storage medium and equipment for detecting target with few samples |
CN113222904A (en) * | 2021-04-21 | 2021-08-06 | 重庆邮电大学 | Concrete pavement crack detection method for improving PoolNet network structure |
CN113128413A (en) * | 2021-04-22 | 2021-07-16 | 广州织点智能科技有限公司 | Face detection model training method, face detection method and related device thereof |
CN113177486A (en) * | 2021-04-30 | 2021-07-27 | 重庆师范大学 | Dragonfly order insect identification method based on regional suggestion network |
CN113177486B (en) * | 2021-04-30 | 2022-06-03 | 重庆师范大学 | Dragonfly order insect identification method based on regional suggestion network |
CN113222990A (en) * | 2021-06-11 | 2021-08-06 | 青岛高重信息科技有限公司 | Chip counting method based on image data enhancement |
CN113222990B (en) * | 2021-06-11 | 2023-03-14 | 青岛高重信息科技有限公司 | Chip counting method based on image data enhancement |
CN113643235A (en) * | 2021-07-07 | 2021-11-12 | 青岛高重信息科技有限公司 | Chip counting method based on deep learning |
CN113643235B (en) * | 2021-07-07 | 2023-12-29 | 青岛高重信息科技有限公司 | Chip counting method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112070043B (en) | Feature fusion-based safety helmet wearing convolution network, training and detection method | |
CN110502965B (en) | Construction safety helmet wearing monitoring method based on computer vision human body posture estimation | |
Shen et al. | Detecting safety helmet wearing on construction sites with bounding‐box regression and deep transfer learning | |
Han et al. | Deep learning-based workers safety helmet wearing detection on construction sites using multi-scale features | |
CN111126325B (en) | Intelligent personnel security identification statistical method based on video | |
Su et al. | RCAG-Net: Residual channelwise attention gate network for hot spot defect detection of photovoltaic farms | |
CN102163290B (en) | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information | |
Shi et al. | Real-time traffic light detection with adaptive background suppression filter | |
CN103530638B (en) | Method for pedestrian matching under multi-cam | |
CN112149512A (en) | Helmet wearing identification method based on two-stage deep learning | |
CN110728252B (en) | Face detection method applied to regional personnel motion trail monitoring | |
CN113139437B (en) | Helmet wearing inspection method based on YOLOv3 algorithm | |
CN112287827A (en) | Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole | |
JP2014093023A (en) | Object detection device, object detection method and program | |
CN107688830A (en) | It is a kind of for case string and show survey visual information association figure layer generation method | |
CN113033315A (en) | Rare earth mining high-resolution image identification and positioning method | |
CN106570471A (en) | Scale adaptive multi-attitude face tracking method based on compressive tracking algorithm | |
CN115512387A (en) | Construction site safety helmet wearing detection method based on improved YOLOV5 model | |
Dai et al. | Real-time safety helmet detection system based on improved SSD | |
CN117197676A (en) | Target detection and identification method based on feature fusion | |
CN112785564B (en) | Pedestrian detection tracking system and method based on mechanical arm | |
CN112541403B (en) | Indoor personnel falling detection method by utilizing infrared camera | |
CN110929711B (en) | Method for automatically associating identity information and shape information applied to fixed scene | |
Peng et al. | [Retracted] Helmet Wearing Recognition of Construction Workers Using Convolutional Neural Network | |
CN115273150A (en) | Novel identification method and system for wearing safety helmet based on human body posture estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||