CN112070043A - Safety helmet wearing convolutional network based on feature fusion, training and detecting method - Google Patents

Safety helmet wearing convolutional network based on feature fusion, training and detecting method

Info

Publication number
CN112070043A
Authority
CN
China
Prior art keywords
conv
feature
sampling
module
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010966231.3A
Other languages
Chinese (zh)
Other versions
CN112070043B (en)
Inventor
周敏新
张方舟
王学宇
任鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changshu Institute of Technology
Original Assignee
Changshu Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changshu Institute of Technology filed Critical Changshu Institute of Technology
Priority to CN202010966231.3A priority Critical patent/CN112070043B/en
Publication of CN112070043A publication Critical patent/CN112070043A/en
Application granted granted Critical
Publication of CN112070043B publication Critical patent/CN112070043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a feature-fusion-based convolutional network for safety helmet wearing detection, together with training and detection methods. Three modules are introduced into CenterNet in sequence. The feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, and Conv-4 has its channel number changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; the operation on Conv-4 and Conv-3 is similar, i.e. n-fold up-sampling followed by fusion with the next layer after an m × m convolution. The global guide module comprises a pyramid pooling module and a global guide flow module. The feature integration module down-samples the fused features by factors of n, 2n and 4n, applies average pooling, up-samples by the corresponding factors and integrates the results, and then applies a convolution with a 3m × 3m kernel. The invention greatly improves detection performance, most obviously for the helmet wearing of workers that occupy a small scale in the image; the detection speed reaches 21 fps, which basically meets the real-time requirement.

Description

Safety helmet wearing convolutional network based on feature fusion, training and detecting method
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a safety helmet wearing detection method based on feature fusion, CenterNet-Feature-Fusion (CenterNetFF).
Background
Statistics show that 13 major construction accidents occurred nationwide in May of this year, causing 51 deaths, increases of 18.2% and 34.2% respectively over the same period last year. The safety helmet is an important tool for protecting a worker's head and is of great significance for life safety. However, many workers lack safety awareness and fail to wear helmets, so automatically detecting whether workers are wearing safety helmets is a matter of real importance.
According to investigation, most surveillance cameras on construction sites are installed at high positions, so workers occupy only a small proportion of the captured images and their features are difficult to identify; the problem of detecting helmet wearing for small-scale workers therefore urgently needs to be solved.
Sensor-based approaches focus on location and tracking technologies such as Radio Frequency Identification (RFID) and Wireless Local Area Networks (WLANs). Dong et al. developed a real-time location system (RTLS) and virtual configuration for worker location tracking: a pressure sensor is placed in the helmet and pressure information is transmitted via Bluetooth to determine whether the worker is wearing the helmet. Zhang et al. developed an intelligent helmet system using an Internet-of-Things-based architecture; to determine whether the helmet is in use, an infrared beam detector and a thermal infrared sensor are placed inside it.
Object detection must not only identify the class of an object but also predict its location in the image, typically marked with a bounding box. Traditional object detection methods generally use a sliding-window framework and mainly comprise three steps:
(1) slide windows of different scales over the image and select certain regions as candidate areas;
(2) extract visual features of the candidate regions, such as HOG features, Haar features, etc.;
(3) classify with a classifier, such as an SVM.
With the development of deep learning, object detection algorithms based on deep learning have become mainstream. Unlike traditional methods, the features are learned automatically by a multi-layer convolutional neural network rather than designed by hand. Current deep-learning-based object detection algorithms fall roughly into two types: two-stage methods and one-stage methods. The difference is that two-stage methods first extract candidate region proposals and then classify them, as in the R-CNN series, FPN and the like, whereas one-stage methods need no candidate proposals and treat target localization directly as a regression problem, as in the YOLO series, SSD and the like.
Compared with sensor-based methods, vision-based methods have attracted more and more attention; the spread of cameras and advances in computer vision and pattern recognition have laid a solid foundation for vision-based helmet wearing detection, which is a small application within object detection. Traditional methods based on hand-crafted features include the following. Liu Xiaohui et al. used skin color detection to locate the face region, extracted Hu moment features of that region, and classified them with an SVM. Park et al. extracted HOG features of the human body and of the safety helmet separately and then matched them according to their spatial relationship; when the person is not standing, this method detects poorly. Li et al. proposed a color-based hybrid descriptor to extract features of helmets of different colors, and then used a hierarchical support vector machine to classify objects into four categories (red, yellow, blue and no helmet). With the development of deep-learning-based object detection, Fang et al. used the Faster R-CNN algorithm for helmet detection, first selecting candidate region boxes with an RPN and then making a prediction for each candidate region. Shuai et al. used the YOLOv3 algorithm to detect safety helmets, which needs no pre-selected candidate boxes and treats target localization directly as a regression problem.
The invention introduces a novel feature fusion method on the basis of the advanced CenterNet object detection algorithm to address helmet wearing detection for small-scale workers on construction sites. As described above, existing helmet wearing detection methods are still at an early stage: sensor-based methods cannot detect once the signal range is exceeded, and the devices must be charged regularly and cannot run for long periods; methods based on hand-crafted features generalize poorly, have low universality and are limited to specific scenes. Deep-learning-based object detection methods were applied to helmet wearing detection only later, and the algorithms used so far still have the following problems: (1) they adopt an anchor-box mechanism, in which designing the sizes and aspect ratios of the anchor boxes is troublesome and a large number of redundant boxes causes severe imbalance between positive and negative samples; (2) under complex working conditions workers are scattered at varying distances from the camera, so distant workers occupy only a small proportion of the image and their features are difficult to extract.
The original CenterNet structure is shown in Fig. 1: a 512 × 512 picture is input, the features of the image are extracted by a backbone network, and, to obtain a high-resolution feature map, up-sampling by bilinear interpolation produces a 128 × 128 feature map. The prediction part uses three branches, generating the keypoint heatmap, the bounding-box size prediction (W_i, H_i) and the keypoint offset prediction (ΔX_i, ΔY_i), each using 3 × 3 and 1 × 1 convolutions. From the predicted center-point coordinates (X_i, Y_i), i.e. the peak points in the keypoint heatmap, together with the bounding-box size and the offset values, the target can be located at (X_i + ΔX_i - W_i/2, Y_i + ΔY_i - H_i/2, X_i + ΔX_i + W_i/2, Y_i + ΔY_i + H_i/2). It can be seen that although CenterNet up-samples the last feature layer to obtain a higher-resolution output feature map, it does not incorporate the details of the shallow features; CenterNet is therefore less effective at detecting the helmet wearing of small-scale workers.
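For illustration only, the three-branch prediction part described above could be written as the following minimal PyTorch sketch (PyTorch is the framework used in the embodiment below); the channel counts, the class number and the ReLU placement are assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn


class CenterNetHead(nn.Module):
    """Three prediction branches: keypoint heatmap, box size (W, H) and center offset (dX, dY)."""

    def __init__(self, in_channels=64, num_classes=2, head_channels=64):
        super().__init__()

        def branch(out_channels):
            # each branch: a 3 x 3 convolution followed by a 1 x 1 convolution
            return nn.Sequential(
                nn.Conv2d(in_channels, head_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(head_channels, out_channels, 1),
            )

        self.heatmap = branch(num_classes)  # one heatmap channel per class (e.g. hat / person)
        self.size = branch(2)               # (W_i, H_i)
        self.offset = branch(2)             # (dX_i, dY_i)

    def forward(self, feat):
        # feat: the 128 x 128 high-resolution feature map
        return torch.sigmoid(self.heatmap(feat)), self.size(feat), self.offset(feat)
```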
Disclosure of Invention
1. The invention aims to provide a novel safety helmet wearing detection method.
The invention provides a feature-fusion-based safety helmet wearing detection convolutional network, together with training and detection methods, a storage medium and a device, aiming at the problem in the prior art that as the number of network layers increases semantic information becomes rich but position information is lost, while shallow features are rich in position information but lack semantic information.
2. The technical scheme adopted by the invention is as follows.
The invention discloses a method for generating a convolutional network for safety helmet wearing detection based on feature fusion, in which three modules, namely a feature pyramid module, a global guide module and a feature integration module, are introduced into CenterNet in sequence:
the feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, and Conv-4 has its channel number changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; the operation on Conv-4 and Conv-3 is similar, i.e. n-fold up-sampling followed by fusion with the next layer after an m × m convolution;
the global guide module comprises a pyramid pooling module and a global guide flow module; the global guide flow module adds pyramid pooling features up-sampled by factors of n, 2n and 4n respectively at each lateral connection of the feature pyramid during the top-down process;
the feature integration module down-samples the fused features by factors of n, 2n and 4n, applies average pooling, up-samples by the corresponding factors and integrates the results, and then applies a convolution with a 3m × 3m kernel.
Preferably, in step 1, a top-down process is adopted: starting from Conv-5, Conv-5 is first up-sampled by a factor of 2, and Conv-4 has its channel number changed by a 1 × 1 convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated similarly, i.e. 2-fold up-sampling followed by fusion with the next feature layer after a 1 × 1 convolution, giving the finally fused features;
step 2, global guide module
Step 2.1, capturing global information
Average pooling is applied to Conv-5, the last layer of the CenterNet feature extraction network, to generate pooled features at the scales 1 × 1, 2 × 2, 3 × 3 and 6 × 6; a 1 × 1 convolution changes the channel number of the pooled features to 1/4 of the original; the features are then up-sampled back to the original feature-layer size by bilinear interpolation and finally merged with the original features to obtain the pyramid-pooled features, aggregating context information from different regions and thereby capturing global information;
Step 2.2, global guide flow module
Pyramid pooling features up-sampled by factors of 2, 4 and 8 are added respectively at each lateral connection in the top-down process of the feature pyramid;
Step 3, the fused features are first down-sampled by factors of 2, 4 and 8, then average-pooled, then up-sampled by the corresponding factors and integrated together, and finally convolved with a 3 × 3 kernel; this feature integration module is introduced into CenterNet.
Preferably, a feature integration module is added on the basis of step 2: feature integration is first applied to Conv-5, which is then fused with Conv-4, and feature integration is applied to the fused features; the third and second layers are treated similarly; the fused features of each layer are integrated to obtain the final fused features. Three branches then predict the keypoints, the center-point offsets and the target sizes respectively.
The invention further provides a training method for the safety helmet wearing detection convolutional network, comprising a first stage of forward propagation and a second stage of backward propagation:
(1) The weights obtained by training the original CenterNet on the COCO dataset are used as the initial weights of the network.
(2) The input image is propagated forward through the improved CenterNet network to obtain the keypoint heatmap, the center-point offsets and the target sizes.
(3) The error between the predicted values and the target values is calculated; the error function is divided into three parts:
$$L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right), & Y_{xyc}=1\\ \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right), & \text{otherwise}\end{cases} \quad (\text{I.1})$$

$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k}-s_k\right| \quad (\text{I.2})$$

$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}}-\left(\frac{p}{R}-\tilde{p}\right)\right| \quad (\text{I.3})$$

$$L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off} \quad (\text{I.4})$$

wherein formula I.1 is the keypoint classification loss, for which a focal loss is adopted to handle the imbalance between positive and negative samples during training; $\hat{Y}_{xyc}=1$ denotes a detected center point and $\hat{Y}_{xyc}=0$ denotes background; the ground-truth keypoints are spread onto the heatmap $Y\in[0,1]^{\frac{W}{R}\times\frac{H}{R}\times C}$ by a Gaussian kernel; $N$ is the number of targets in the image, and $\alpha$ and $\beta$ are hyper-parameters of the loss, set to 2 and 4 respectively. Formula I.2 is the target-size loss, using the L1 loss: let $(x_1^{(k)},y_1^{(k)},x_2^{(k)},y_2^{(k)})$ be the coordinates of the $k$-th target bounding box; then $p_k=\bigl(\tfrac{x_1^{(k)}+x_2^{(k)}}{2},\tfrac{y_1^{(k)}+y_2^{(k)}}{2}\bigr)$ is the center point of the $k$-th target, $s_k=(x_2^{(k)}-x_1^{(k)},\,y_2^{(k)}-y_1^{(k)})$ is the size of the $k$-th target, and $\hat{S}_{p_k}$ is the predicted target size. Formula I.3 is the center-point offset loss: $p$ is the position of the center point in the input image, $\tilde{p}=\lfloor p/R\rfloor$ is the position of $p$ after $R$-fold down-sampling, the rounding after down-sampling introduces an offset, and $\hat{O}_{\tilde{p}}$ is the predicted center-point offset. Formula I.4 is the total loss, where $\lambda_1$ and $\lambda_2$ are the weights of the different loss terms.
The network weights are continuously adjusted by gradient descent; a learning rate of 1.25e-4 is used to adjust the weights, the number of iterations is 200, the learning-rate decay steps are 90 and 120, and the learning-rate decay factor is 0.1.
The invention further provides a safety helmet wearing detection method: the keypoint heatmap is obtained by prediction with the above convolutional network, every point is compared with its 8 adjacent points, and a point is retained if its response value is greater than or equal to those of its neighbours, so that finally all peak points satisfying this condition are kept. Let $\hat{\mathcal{P}}_c=\{(\hat{x}_i,\hat{y}_i)\}_{i=1}^{n}$ be the set of peak points; the final target bounding box is $(\hat{x}_i+\delta\hat{x}_i-\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i-\hat{h}_i/2,\ \hat{x}_i+\delta\hat{x}_i+\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i+\hat{h}_i/2)$, where $(\delta\hat{x}_i,\delta\hat{y}_i)$ is the predicted center-point offset and $(\hat{w}_i,\hat{h}_i)$ is the predicted target size. The predicted coordinate values are drawn in the image in the form of boxes and the corresponding categories are displayed.
The invention further provides a storage medium storing a program for the feature-fusion-based safety helmet wearing detection method.
The invention provides a storage device, comprising:
a memory;
one or more processors, and
one or more programs stored in the memory and configured to be executed by the one or more processors, the programs, when executed by the processors, implementing the above feature-fusion-based safety helmet wearing detection method.
3. Technical effects produced by the invention.
1) The invention adopts a feature pyramid module with a U-shaped structure, fusing multiple feature layers and improving the sensitivity to small targets.
2) The invention introduces a global guide module and a feature integration module on the basis of the feature pyramid, further sharpening the details of salient targets.
3) The invention introduces global guide flows and a feature integration module to gradually refine deep semantic information; the feature integration module reduces the aliasing effect caused by high-factor up-sampling and observes local information at different spatial positions in different scale spaces, further enlarging the receptive field of the whole network.
4) The publicly available Safety-Helmet-Wearing-Dataset (SHWD) is used to train and test CenterNetFF. The results show that, compared with CenterNet, the detection performance is greatly improved, most obviously for the helmet wearing of workers that occupy a small scale in the image; the detection speed reaches 21 fps, basically meeting the real-time requirement.
Drawings
Figure 1 is a diagram of the architecture of the CenterNet network.
Fig. 2 is a structural diagram of a feature pyramid module.
FIG. 3 is a block diagram of the feature pyramid combined with the global guide module.
FIG. 4 is a pyramid pooling module structure.
FIG. 5 is a feature integration module architecture.
Fig. 6 is a diagram of the CenterNetFF network architecture.
FIG. 7 is a comparison of the results of CenterNetFF and CenterNet on the SHWD dataset.
FIG. 8 is a comparison of the results of CenterNetFF and other deep learning object detection algorithms on the SHWD data set.
Fig. 9 is a sample of a partial data set.
Fig. 10 shows the detection results of CenterNetFF on the helmet wearing of workers on a construction site.
Figure 11 is a comparison of the results of CenterNetFF and CenterNet in detecting the helmet wearing of workers on a construction site.
Detailed Description
Aiming at the problems that workers are scattered under complex working conditions at varying distances from the camera, so that distant workers occupy only a small proportion of the image and their features are difficult to extract, the invention provides CenterNetFF, a method based on a novel feature fusion, for detecting the helmet wearing of workers on construction sites. It comprises the following steps:
Step 1: as shown in Fig. 6, the sizes of the Conv_2 to Conv_5 feature maps gradually decrease as the number of network layers increases; their resolution becomes lower and their semantic information richer, but position information is lost. Shallow features have high resolution, a receptive field matched to small target sizes and rich position information, but lack semantic information. The CenterNet shown in Fig. 1 predicts only from the last feature layer, ignores the detail information of shallow features, is therefore insensitive to small targets, and detects workers far from the camera on a construction site poorly. The invention adopts a top-down process, as shown in Fig. 2, where F represents the feature fusion process, i.e. the F module in Fig. 6: starting from Conv-5, Conv-5 is first up-sampled by a factor of 2, and Conv-4 has its channel number changed by a 1 × 1 convolution kernel and is fused with the up-sampled Conv-5 feature layer. Conv-4 and Conv-3 are treated similarly, i.e. 2-fold up-sampling followed by fusion with the next feature layer after a 1 × 1 convolution, giving the final fused feature P2.
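As a minimal sketch of the top-down fusion just described (not the authoritative implementation), assuming ResNet-style channel counts for Conv-3 to Conv-5 and element-wise addition as the fusion operation:

```python
import torch.nn as nn
import torch.nn.functional as F


class TopDownFusion(nn.Module):
    """Top-down feature pyramid: up-sample the deeper layer by 2, change the
    shallower layer's channel number with a 1 x 1 convolution, then fuse."""

    def __init__(self, channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1 x 1 convolutions for Conv-3, Conv-4 and Conv-5 respectively
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in channels)

    def forward(self, conv3, conv4, conv5):
        p5 = self.lateral[2](conv5)
        # Conv-5 up-sampled 2x and fused with the channel-reduced Conv-4
        p4 = self.lateral[1](conv4) + F.interpolate(p5, scale_factor=2, mode='bilinear')
        # the same operation between that result and Conv-3 gives the final fused feature
        p2 = self.lateral[0](conv3) + F.interpolate(p4, scale_factor=2, mode='bilinear')
        return p2, p4, p5
```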
Step 2: the feature pyramid is a typical structure for fusing multiple feature layers, but it has the drawback that deep semantic information is gradually diluted during the top-down process. Previous research has shown that the actual receptive field of a convolutional neural network is smaller than its theoretical receptive field, especially for deep features, so the receptive field of the whole network is not large enough to capture global information of the input image, and a salient target is easily swallowed by the background; that is, a worker on a construction site is easily swallowed by the background of buildings and the like, causing missed detections. The invention introduces a Global Guide Module (GGM) on top of the feature pyramid; the GGM comprises two parts, a Pyramid Pooling Module (PPM) and Global Guide Flows (GGFs). First, average pooling is applied to Conv-5, the last layer of the CenterNet feature extraction network, generating pooled features at the scales 1 × 1, 2 × 2, 3 × 3 and 6 × 6; a 1 × 1 convolution changes the channel number of the pooled features to 1/4 of the original; they are then up-sampled back to the original feature-layer size by bilinear interpolation and finally merged with the original features, giving the pyramid-pooled features and aggregating context information from different regions, thereby capturing global information. Second, as shown by the dashed box in Fig. 3, the pyramid pooling features up-sampled by factors of 2, 4 and 8 are added at each lateral connection in the top-down process of the feature pyramid, so that the semantic information is not diluted.
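A hedged PyTorch sketch of the pyramid pooling module and the global guide flows described above; the projection to 256 channels, the ordering of the pyramid levels and the matching channel widths are assumptions made only to keep the example self-contained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """Pyramid pooling module: average-pool Conv-5 to 1x1, 2x2, 3x3 and 6x6 grids,
    reduce each to 1/4 of the channels with a 1x1 conv, up-sample back by bilinear
    interpolation and merge with the original feature."""

    def __init__(self, in_channels=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_channels, in_channels // 4, 1))
            for b in bins)
        # reduce the concatenated feature so it can be added onto the pyramid levels
        self.project = nn.Conv2d(in_channels * 2, 256, 1)

    def forward(self, conv5):
        h, w = conv5.shape[2:]
        pooled = [F.interpolate(s(conv5), size=(h, w), mode='bilinear') for s in self.stages]
        return self.project(torch.cat([conv5] + pooled, dim=1))


def add_global_guidance(pyramid_feats, global_feat):
    """Global guide flows: add the pyramid-pooled feature, up-sampled 2x, 4x and 8x,
    at each lateral connection of the feature pyramid (finest-to-coarsest ordering
    of pyramid_feats is an assumption of this sketch)."""
    return [p + F.interpolate(global_feat, scale_factor=2 ** (i + 1), mode='bilinear')
            for i, p in enumerate(pyramid_feats)]
```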
Step 3: the global guide module in the previous step adds global guidance information to every feature layer in the top-down process, but this also brings a problem. The traditional feature pyramid module applies a 3 × 3 convolution after each 2-fold up-sampling and fusion to eliminate the aliasing effect caused by up-sampling, whereas the global guide module requires up-sampling by factors of 4 or 8, so the differences between the GGFs and feature layers of different scales must be handled efficiently. The invention adopts a feature integration module, whose structure is shown in Fig. 5: the fused features are first down-sampled by factors of 2, 4 and 8, then average-pooled, then up-sampled by the corresponding factors and integrated together, and finally convolved with a 3 × 3 kernel. The feature integration module is introduced into CenterNet as shown in Fig. 6: on the basis of the second step, a feature integration module A is added; feature integration is first applied to Conv-5, which is then fused with Conv-4, and feature integration is applied to the fused features; the third and second layers are treated similarly, and integrating the fused features of each layer yields the final fused feature F2. The feature integration module reduces the aliasing effect caused by high-factor up-sampling and observes local information at different spatial positions in different scale spaces, further enlarging the receptive field of the whole network. Finally, as in the original CenterNet, three branches predict the keypoints, the center-point offsets and the target sizes respectively.
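A minimal sketch of the feature integration module, under the assumption that average pooling performs the 2-, 4- and 8-fold down-sampling and that the resampled copies are merged with the original feature by summation:

```python
import torch.nn as nn
import torch.nn.functional as F


class FeatureIntegration(nn.Module):
    """Feature integration module: down-sample the fused feature by 2x, 4x and 8x
    with average pooling, up-sample each copy back to the original size, merge them,
    and smooth with a 3x3 convolution to reduce up-sampling aliasing."""

    def __init__(self, channels=256, factors=(2, 4, 8)):
        super().__init__()
        self.factors = factors
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        merged = x
        for f in self.factors:
            y = F.avg_pool2d(x, kernel_size=f, stride=f)                       # down-sample by f
            merged = merged + F.interpolate(y, size=(h, w), mode='bilinear')   # back up by f
        return self.smooth(merged)
```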
The experimental data set of the invention is SHWD, with the categories hat and person; it contains 7581 images in total, with 9044 positive samples wearing helmets and 11151 negative samples not wearing helmets. The invention divides the data set into 4548 training images, 1516 validation images and 1517 test images. Samples from the data set are shown in Fig. 9.
Training process
Experimental hardware environment: Ubuntu 16.04, a Tesla P100 GPU with 16 GB of memory. Software environment: deep learning framework PyTorch 0.4.1, Python 3.6, CUDA 8.0, cuDNN 5.1.
The training process is divided into two phases: the first stage is the forward propagation stage and the second stage is the backward propagation stage. The specific process comprises the following steps:
(1) The invention uses the weights obtained by training the original CenterNet on the COCO dataset as the initial weights of the network.
(2) Images of size 512 × 512 are input, 16 images per batch, and forward propagation through the improved CenterNet network yields the keypoint heatmap, the center-point offsets and the target sizes.
(3) An error between the predicted value and the target value is calculated. The error function is divided into three parts:
$$L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right), & Y_{xyc}=1\\ \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right), & \text{otherwise}\end{cases} \quad (\text{I.1})$$

$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k}-s_k\right| \quad (\text{I.2})$$

$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}}-\left(\frac{p}{R}-\tilde{p}\right)\right| \quad (\text{I.3})$$

$$L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off} \quad (\text{I.4})$$

wherein formula I.1 is the keypoint classification loss, for which a focal loss is adopted to handle the imbalance between positive and negative samples during training; $\hat{Y}_{xyc}=1$ denotes a detected center point and $\hat{Y}_{xyc}=0$ denotes background; the ground-truth keypoints are spread onto the heatmap $Y\in[0,1]^{\frac{W}{R}\times\frac{H}{R}\times C}$ by a Gaussian kernel; $N$ is the number of targets in the image, and $\alpha$ and $\beta$ are hyper-parameters of the loss, set to 2 and 4 respectively. Formula I.2 is the target-size loss, using the L1 loss: let $(x_1^{(k)},y_1^{(k)},x_2^{(k)},y_2^{(k)})$ be the coordinates of the $k$-th target bounding box; then $p_k=\bigl(\tfrac{x_1^{(k)}+x_2^{(k)}}{2},\tfrac{y_1^{(k)}+y_2^{(k)}}{2}\bigr)$ is the center point of the $k$-th target, $s_k=(x_2^{(k)}-x_1^{(k)},\,y_2^{(k)}-y_1^{(k)})$ is the size of the $k$-th target, and $\hat{S}_{p_k}$ is the predicted target size. Formula I.3 is the center-point offset loss: $p$ is the position of the center point in the input image, $\tilde{p}=\lfloor p/R\rfloor$ is the position of $p$ after $R$-fold down-sampling, the rounding after down-sampling introduces an offset, and $\hat{O}_{\tilde{p}}$ is the predicted center-point offset. Formula I.4 is the total loss; to assign specific weights to the different loss terms, $\lambda_1=0.1$ and $\lambda_2=1$.
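For illustration, the three loss terms I.1 to I.3 and the total loss I.4 could be computed roughly as follows in PyTorch; the tensor layouts and the validity mask are assumptions, not details given in the patent:

```python
import torch


def centernet_losses(pred_hm, gt_hm, pred_size, gt_size, pred_off, gt_off, mask,
                     alpha=2, beta=4, lambda1=0.1, lambda2=1.0):
    """Focal keypoint loss (I.1), L1 size loss (I.2), L1 offset loss (I.3), total (I.4).
    pred_hm/gt_hm: (B, C, H, W); size/offset tensors: (B, N, 2); mask: (B, N) valid targets."""
    pred_hm = pred_hm.clamp(1e-4, 1 - 1e-4)
    pos = gt_hm.eq(1).float()                      # ground-truth center points
    neg = 1.0 - pos                                # Gaussian-weighted background
    num_targets = pos.sum().clamp(min=1)           # N, the number of targets

    pos_loss = ((1 - pred_hm) ** alpha) * torch.log(pred_hm) * pos
    neg_loss = ((1 - gt_hm) ** beta) * (pred_hm ** alpha) * torch.log(1 - pred_hm) * neg
    l_k = -(pos_loss.sum() + neg_loss.sum()) / num_targets

    m = mask.unsqueeze(-1).float()
    l_size = (torch.abs(pred_size - gt_size) * m).sum() / num_targets
    l_off = (torch.abs(pred_off - gt_off) * m).sum() / num_targets

    return l_k + lambda1 * l_size + lambda2 * l_off
```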
(4) The network weights are continuously adjusted by gradient descent to minimize the error. The invention uses a learning rate of 1.25e-4 to adjust the weights, with 200 iterations, learning-rate decay steps of 90 and 120, and a learning-rate decay factor of 0.1.
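The optimizer itself is not named here; assuming Adam, as used by the original CenterNet implementation, the schedule above could be set up as in this sketch (model, train_loader and the loss function from the previous sketch are assumed to exist):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[90, 120], gamma=0.1)

for epoch in range(200):                           # 200 iterations (epochs) in total
    for images, targets in train_loader:           # 512x512 images, batch size 16
        pred_hm, pred_size, pred_off = model(images)
        loss = centernet_losses(pred_hm, targets['hm'], pred_size, targets['size'],
                                pred_off, targets['off'], targets['mask'])
        optimizer.zero_grad()
        loss.backward()                             # back-propagation stage
        optimizer.step()                            # weight update
    scheduler.step()                                # decay lr by 0.1 at epochs 90 and 120
```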
Detection of
A keypoint heatmap is obtained by network prediction; every point is compared with its 8 adjacent points and retained if its response value is greater than or equal to those 8 values, and finally the top 100 peak points satisfying this condition are kept. Let $\hat{\mathcal{P}}_c=\{(\hat{x}_i,\hat{y}_i)\}_{i=1}^{n}$ be the set of peak points. The final target bounding box is $(\hat{x}_i+\delta\hat{x}_i-\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i-\hat{h}_i/2,\ \hat{x}_i+\delta\hat{x}_i+\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i+\hat{h}_i/2)$, where $(\delta\hat{x}_i,\delta\hat{y}_i)$ is the predicted center-point offset and $(\hat{w}_i,\hat{h}_i)$ is the predicted target size. The predicted coordinate values are drawn in the image in the form of boxes and the corresponding categories are displayed.
In order to verify the effectiveness of the feature pyramid, the global guide module and the feature integration module provided by the invention, the three modules are introduced into CenterNet in sequence. Lines 2, 6 and 10 of Fig. 7 show the detection results after the feature pyramid is added under different backbone networks: the average detection precision AP is improved by 0.4%, 1.4% and 1.8% respectively, and the average recall AR by 2.4%, 4.3% and 0.7% respectively. The detection of small-scale targets improves most markedly, with AP_small rising by 4.0%, 3.6% and 1.0% and AR_small by 4.9%, 4.9% and 2.0% respectively, showing that the introduction of the feature pyramid clearly improves the sensitivity to small targets. Lines 3, 7 and 11 of Fig. 7 give the results after the global guide module is introduced; the average detection precision AP and the average recall AR improve further, with AP rising by 0.4%, 0.7% and 0.8% and AR by 0.5%, 0.4% and 1.0% respectively. Finally, the CenterNetFF algorithm of the invention reaches an AP of 87.8% and an AR of 43.1%; for small-scale targets, AP_small reaches 33.2% and AR_small 43.1%, improvements of 2.0% and 2.7% respectively over the original CenterNet.
The CenterNetFF algorithm is compared with the currently advanced object detection algorithms Faster R-CNN, YOLOv3 and SSD under the same experimental environment; the results are shown in Fig. 8. Faster R-CNN achieves almost the same average precision as the method of this invention, reaching 87.02%, but its detection speed is only 4.92 fps, far from real-time. YOLOv3 and SSD reach real-time detection speeds, but their average precision is lower and cannot meet the accuracy requirement of helmet wearing detection under complex working conditions. The CenterNetFF of the invention reaches an average precision of 87.8% and a detection speed of 21 fps, satisfying the real-time requirement.
Fig. 10 shows detection results of the improved CenterNet algorithm on the SHWD data set, covering both distant and nearby workers. Fig. 11 compares the detection effect on the SHWD data set before and after the model improvement: Fig. 11a shows the effect before the improvement and Fig. 11b the effect after the improvement, and it can be seen that CenterNetFF solves the detection problem for small-scale workers.

Claims (8)

1. A method for generating a convolutional network for safety helmet wearing detection based on feature fusion, characterized in that three modules are introduced into CenterNet in sequence:
the feature pyramid module adopts a top-down process: Conv-5 is first up-sampled by a factor of n, and Conv-4 has its channel number changed by an m × m convolution kernel and is fused with the up-sampled Conv-5 feature layer; the operation on Conv-4 and Conv-3 is similar, i.e. n-fold up-sampling followed by fusion with the next layer after an m × m convolution;
the global guide module comprises a pyramid pooling module and a global guide flow module; the global guide flow module adds pyramid pooling features up-sampled by factors of n, 2n and 4n respectively at each lateral connection of the feature pyramid during the top-down process;
the feature integration module down-samples the fused features by factors of n, 2n and 4n, applies average pooling, up-samples by the corresponding factors and integrates the results, and then applies a convolution with a 3m × 3m kernel.
2. The feature fusion based helmet wearing detection convolutional network generating method of claim 1,
in step 1, a top-down process is adopted: starting from Conv-5, Conv-5 is first up-sampled by a factor of 2, and Conv-4 has its channel number changed by a 1 × 1 convolution kernel and is fused with the up-sampled Conv-5 feature layer; Conv-4 and Conv-3 are treated similarly, i.e. 2-fold up-sampling followed by fusion with the next feature layer after a 1 × 1 convolution, giving the finally fused features;
step 2, global guide module
Step 2.1, capturing global information
Average pooling is applied to Conv-5, the last layer of the CenterNet feature extraction network, to generate pooled features at the scales 1 × 1, 2 × 2, 3 × 3 and 6 × 6; a 1 × 1 convolution changes the channel number of the pooled features to 1/4 of the original; the features are then up-sampled back to the original feature-layer size by bilinear interpolation and finally merged with the original features to obtain the pyramid-pooled features, aggregating context information from different regions and thereby capturing global information;
Step 2.2, global guide flow module
Pyramid pooling features up-sampled by factors of 2, 4 and 8 are added respectively at each lateral connection in the top-down process of the feature pyramid;
Step 3, the fused features are first down-sampled by factors of 2, 4 and 8, then average-pooled, then up-sampled by the corresponding factors and integrated together, and finally convolved with a 3 × 3 kernel; this feature integration module is introduced into CenterNet.
3. The feature fusion based helmet wearing detection convolutional network generating method of claim 2,
a feature integration module is added on the basis of step 2: feature integration is first applied to Conv-5, which is then fused with Conv-4, and feature integration is applied to the fused features; the third and second layers are treated similarly; the fused features of each layer are integrated to obtain the final fused features; three branches then predict the keypoints, the center-point offsets and the target sizes respectively.
4. A training method for the feature-fusion-based safety helmet wearing detection convolutional network of claim 3, characterized by comprising a first stage of forward propagation and a second stage of backward propagation:
(1) the weights obtained by training the original CenterNet on the COCO dataset are used as the initial weights of the network;
(2) the input image is propagated forward through the improved CenterNet network to obtain the keypoint heatmap, the center-point offsets and the target sizes;
(3) the error between the predicted values and the target values is calculated; the error function is divided into three parts:
$$L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right), & Y_{xyc}=1\\ \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right), & \text{otherwise}\end{cases} \quad (\text{I.1})$$

$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k}-s_k\right| \quad (\text{I.2})$$

$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}}-\left(\frac{p}{R}-\tilde{p}\right)\right| \quad (\text{I.3})$$

$$L = L_k + \lambda_1 L_{size} + \lambda_2 L_{off} \quad (\text{I.4})$$

wherein formula I.1 represents the keypoint classification loss, for which a focal loss is adopted to handle the imbalance between positive and negative samples during training; $\hat{Y}_{xyc}=1$ denotes a detected center point and $\hat{Y}_{xyc}=0$ denotes background; the ground-truth keypoints are spread onto the heatmap $Y\in[0,1]^{\frac{W}{R}\times\frac{H}{R}\times C}$ by a Gaussian kernel; $N$ is the number of targets in the image, and $\alpha$ and $\beta$ are hyper-parameters of the loss, set to 2 and 4 respectively; formula I.2 represents the target-size loss, using the L1 loss: let $(x_1^{(k)},y_1^{(k)},x_2^{(k)},y_2^{(k)})$ be the coordinates of the $k$-th target bounding box; then $p_k=\bigl(\tfrac{x_1^{(k)}+x_2^{(k)}}{2},\tfrac{y_1^{(k)}+y_2^{(k)}}{2}\bigr)$ is the center point of the $k$-th target, $s_k=(x_2^{(k)}-x_1^{(k)},\,y_2^{(k)}-y_1^{(k)})$ is the size of the $k$-th target, and $\hat{S}_{p_k}$ is the predicted target size; formula I.3 represents the center-point offset loss: $p$ is the position of the center point in the input image, $\tilde{p}=\lfloor p/R\rfloor$ is the position of $p$ after $R$-fold down-sampling, the rounding after down-sampling introduces an offset, and $\hat{O}_{\tilde{p}}$ is the predicted center-point offset; formula I.4 is the total loss, where $\lambda_1$ and $\lambda_2$ are the weights of the different loss terms.
5. The feature-fusion-based safety helmet wearing detection convolutional network training method of claim 4, characterized in that the network weights are continuously adjusted by gradient descent; a learning rate of 1.25e-4 is used to adjust the weights, the number of iterations is 200, the learning-rate decay steps are 90 and 120, and the learning-rate decay factor is 0.1.
6. A safety helmet wearing detection method based on feature fusion according to claim 4 or 5, characterized in that a keypoint heatmap is obtained by prediction with the convolutional network; every point is compared with its 8 adjacent points and retained if its response value is greater than or equal to those of its neighbours, so that finally all peak points satisfying this condition are kept; let $\hat{\mathcal{P}}_c=\{(\hat{x}_i,\hat{y}_i)\}_{i=1}^{n}$ be the set of peak points; the final target bounding box is $(\hat{x}_i+\delta\hat{x}_i-\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i-\hat{h}_i/2,\ \hat{x}_i+\delta\hat{x}_i+\hat{w}_i/2,\ \hat{y}_i+\delta\hat{y}_i+\hat{h}_i/2)$, where $(\delta\hat{x}_i,\delta\hat{y}_i)$ is the predicted center-point offset and $(\hat{w}_i,\hat{h}_i)$ is the predicted target size; the predicted coordinate values are drawn in the image in the form of boxes and the corresponding categories are displayed.
7. A storage medium storing a program which implements the feature-fusion-based safety helmet wearing detection method of claim 6.
8. A memory device, comprising:
a memory;
one or more processors, and
one or more programs stored in the memory and configured to be executed by the one or more processors, the programs, when executed by the processors, implementing the feature-fusion-based safety helmet wearing detection method of claim 6.
CN202010966231.3A 2020-09-15 2020-09-15 Feature fusion-based safety helmet wearing convolution network, training and detection method Active CN112070043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010966231.3A CN112070043B (en) 2020-09-15 2020-09-15 Feature fusion-based safety helmet wearing convolution network, training and detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010966231.3A CN112070043B (en) 2020-09-15 2020-09-15 Feature fusion-based safety helmet wearing convolution network, training and detection method

Publications (2)

Publication Number Publication Date
CN112070043A true CN112070043A (en) 2020-12-11
CN112070043B CN112070043B (en) 2023-11-10

Family

ID=73695749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010966231.3A Active CN112070043B (en) 2020-09-15 2020-09-15 Feature fusion-based safety helmet wearing convolution network, training and detection method

Country Status (1)

Country Link
CN (1) CN112070043B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949730A (en) * 2021-03-11 2021-06-11 江苏禹空间科技有限公司 Method, device, storage medium and equipment for detecting target with few samples
CN113128413A (en) * 2021-04-22 2021-07-16 广州织点智能科技有限公司 Face detection model training method, face detection method and related device thereof
CN113177486A (en) * 2021-04-30 2021-07-27 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN113222990A (en) * 2021-06-11 2021-08-06 青岛高重信息科技有限公司 Chip counting method based on image data enhancement
CN113643235A (en) * 2021-07-07 2021-11-12 青岛高重信息科技有限公司 Chip counting method based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101725271A (en) * 2009-11-17 2010-06-09 江苏南通三建集团有限公司 Rapid construction method of reinforced concrete chimney
CN109034215A (en) * 2018-07-09 2018-12-18 东北大学 A kind of safety cap wearing detection method based on depth convolutional neural networks
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109376673A (en) * 2018-10-31 2019-02-22 南京工业大学 A kind of coal mine down-hole personnel unsafe acts recognition methods based on human body attitude estimation
CN110119686A (en) * 2019-04-17 2019-08-13 电子科技大学 A kind of safety cap real-time detection method based on convolutional neural networks
CN110728223A (en) * 2019-10-08 2020-01-24 济南东朔微电子有限公司 Helmet wearing identification method based on deep learning

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949730A (en) * 2021-03-11 2021-06-11 江苏禹空间科技有限公司 Method, device, storage medium and equipment for detecting target with few samples
CN112949730B (en) * 2021-03-11 2024-04-09 无锡禹空间智能科技有限公司 Method, device, storage medium and equipment for detecting target with few samples
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN113128413A (en) * 2021-04-22 2021-07-16 广州织点智能科技有限公司 Face detection model training method, face detection method and related device thereof
CN113177486A (en) * 2021-04-30 2021-07-27 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113177486B (en) * 2021-04-30 2022-06-03 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113222990A (en) * 2021-06-11 2021-08-06 青岛高重信息科技有限公司 Chip counting method based on image data enhancement
CN113222990B (en) * 2021-06-11 2023-03-14 青岛高重信息科技有限公司 Chip counting method based on image data enhancement
CN113643235A (en) * 2021-07-07 2021-11-12 青岛高重信息科技有限公司 Chip counting method based on deep learning
CN113643235B (en) * 2021-07-07 2023-12-29 青岛高重信息科技有限公司 Chip counting method based on deep learning

Also Published As

Publication number Publication date
CN112070043B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN112070043B (en) Feature fusion-based safety helmet wearing convolution network, training and detection method
CN110502965B (en) Construction safety helmet wearing monitoring method based on computer vision human body posture estimation
Shen et al. Detecting safety helmet wearing on construction sites with bounding‐box regression and deep transfer learning
Han et al. Deep learning-based workers safety helmet wearing detection on construction sites using multi-scale features
CN111126325B (en) Intelligent personnel security identification statistical method based on video
Su et al. RCAG-Net: Residual channelwise attention gate network for hot spot defect detection of photovoltaic farms
CN102163290B (en) Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information
Shi et al. Real-time traffic light detection with adaptive background suppression filter
CN103530638B (en) Method for pedestrian matching under multi-cam
CN112149512A (en) Helmet wearing identification method based on two-stage deep learning
CN110728252B (en) Face detection method applied to regional personnel motion trail monitoring
CN113139437B (en) Helmet wearing inspection method based on YOLOv3 algorithm
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
JP2014093023A (en) Object detection device, object detection method and program
CN107688830A (en) It is a kind of for case string and show survey visual information association figure layer generation method
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN106570471A (en) Scale adaptive multi-attitude face tracking method based on compressive tracking algorithm
CN115512387A (en) Construction site safety helmet wearing detection method based on improved YOLOV5 model
Dai et al. Real-time safety helmet detection system based on improved SSD
CN117197676A (en) Target detection and identification method based on feature fusion
CN112785564B (en) Pedestrian detection tracking system and method based on mechanical arm
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
CN110929711B (en) Method for automatically associating identity information and shape information applied to fixed scene
Peng et al. [Retracted] Helmet Wearing Recognition of Construction Workers Using Convolutional Neural Network
CN115273150A (en) Novel identification method and system for wearing safety helmet based on human body posture estimation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant