CN113553936A - Mask wearing detection method based on improved YOLOv3 - Google Patents

Mask wearing detection method based on improved YOLOv3

Info

Publication number
CN113553936A
CN113553936A (application CN202110813607.1A)
Authority
CN
China
Prior art keywords
mask
wearing detection
mask wearing
yolov3
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110813607.1A
Other languages
Chinese (zh)
Inventor
刘阳
李莉
彭娜
李冰雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Engineering
Original Assignee
Hebei University of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Engineering filed Critical Hebei University of Engineering
Priority to CN202110813607.1A priority Critical patent/CN113553936A/en
Publication of CN113553936A publication Critical patent/CN113553936A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a mask wearing detection method based on improved YOLOv3, and belongs to the technical field of target detection. The method first acquires a masked-face data set, then constructs a mask wearing detection network based on YOLOv3, trains the network, and selects the optimal model, which is then used to detect mask wearing in dense crowds. The invention uses a channel attention mechanism so that the feature extraction network pays more attention to the associated target areas, and uses the K-means++ algorithm to perform cluster optimization on the mask data set, thereby improving detection efficiency. In addition, the invention optimizes the detection algorithm by taking CIoU as the loss function, which reduces the loss function value and improves the bounding box regression effect.

Description

Mask wearing detection method based on improved YOLOv3
Technical Field
The invention relates to the technical field of target detection, in particular to a mask wearing detection method based on improved YOLOv3.
Background
Since the outbreak of novel coronavirus pneumonia (COVID-19), tertiary industries such as tourism and catering, as well as labor-intensive enterprises, have been forced to delay the resumption of work and production, greatly affecting national economic development and people's daily lives. Research shows that the novel coronavirus is mainly transmitted through droplets and aerosols, that the population is generally susceptible, and that large-scale clustered outbreaks of infection may occur at any time, so wearing a mask in public places has become a necessary means of normalized epidemic control. In densely populated areas such as shopping malls and stations, manually inspecting whether masks are worn consumes considerable manpower and is inefficient.
In recent years, deep convolutional neural networks have made great progress in the field of target detection, and the associated algorithms can be broadly divided into two-stage and single-stage approaches. Two-stage algorithms are mainly represented by the R-CNN series, and single-stage algorithms by the SSD (Single Shot MultiBox Detector) series and the YOLO (You Only Look Once) series. A two-stage algorithm first generates target candidate boxes and then uses a convolutional neural network for feature extraction, classification, and bounding box regression; although its detection precision is excellent, its detection speed is slow and real-time detection cannot be guaranteed. A single-stage algorithm treats target detection as a single regression problem and realizes detection directly through regression, so it is computationally efficient and can run in real time. The YOLOv3 algorithm stands out among these algorithms due to its high speed, high precision, and strong practicability. However, when the YOLOv3 algorithm is applied directly to certain specific scenes, it cannot meet the detection requirements; mask wearing detection in particular is difficult because the scenes are complex, the crowds are dense, pedestrians occupy a small proportion of the image pixels, and the visual differences between mask wearing states are subtle. Therefore, a method that realizes real-time and efficient mask wearing detection is of great significance.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a mask wearing detection method based on improved YOLOv3, which can realize automatic detection of the wearing condition of a mask of a person and has higher detection precision.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a mask wearing detection method based on improved YOLOv3 comprises the following steps:
step 1, acquiring a mask shielding face data set, carrying out classification marking and format conversion on the data set, and dividing the data set into a training set and a testing set;
step 2, constructing a mask wearing detection network based on YOLOv3;
step 3, using the training set in the step 1, training the mask wearing detection network for multiple times, and adjusting the learning rate parameter of each training to make the loss function converge, wherein each training obtains a mask wearing detection model;
step 4, the plurality of mask wearing detection models obtained in the step 3 are respectively tested by using the test set in the step 1, the accuracy of each mask wearing detection model is recorded, and the optimal model is selected as the final mask wearing detection model;
and 5, carrying out mask wearing detection on the dense population by using the mask wearing detection model selected in the step 4.
Further, the specific manner of step 2 is as follows:
step 201, reconstructing the YOLOv3 feature extraction network by using a multi-scale channel attention mechanism;
step 202, performing target anchor frame clustering on the data set;
step 203, optimize the loss function.
Further, the specific way of step 201 is to embed the SENet channel attention mechanism into the 5 residual network structures of the backbone feature extraction network of YOLOv3, deeply mine the context of the target, emphasize useful detail information, suppress invalid interference information, and complete the reconstruction of the feature extraction network.
Further, in step 202, the K-means++ algorithm is used to optimize the anchor box sizes for the mask occlusion face data set, thereby improving the detection efficiency.
Further, in step 203, a bounding box regression is performed using the CIoU loss function, so as to improve the positioning accuracy.
As can be seen from the above description, the technical scheme of the invention has the beneficial effects that:
1. aiming at the problem of insufficient feature extraction capability of the original YOLOv3, the method utilizes a channel attention mechanism to enable the feature extraction network to have higher attention to the associated target area, so that the feature extraction capability of the network is improved.
2. Aiming at the problem that the target size in the mask data set is small and the prior frame of the public data set is not suitable any more, the mask data set is subjected to cluster optimization by using a K-means + + algorithm, and the most appropriate anchor frame size is selected, so that the detection effect is optimized, the model convergence speed is accelerated, and the detection efficiency can be improved.
3. Aiming at the problems that IoU (Intersection over Union), the evaluation standard of the detection effect in the original YOLOv3 algorithm, is insensitive to the target object scale and cannot accurately reflect the overlap between a prediction box and a real box, the invention optimizes the detection algorithm by taking CIoU (Complete-IoU) as the loss function, so that the loss function value can be reduced and the bounding box regression effect can be improved.
In summary, by adopting the above three measures, the detection precision of the mask wearing detection task can be improved in dense-crowd scenes.
Drawings
In order to more clearly describe this patent, one or more of the following figures are provided.
FIG. 1 is a diagram of the SENet (Squeeze-and-Excitation Networks) architecture.
FIG. 2 is a diagram of the SE-Res structure.
FIG. 3 is a schematic diagram of the mask wearing detection model according to an embodiment of the present invention.
FIG. 4 is a graph of the visualized clustering results of the RMFD data set.
Detailed Description
To help those skilled in the art understand the technical solutions of this patent, the technical solutions are further described below with reference to a specific embodiment.
A mask wearing detection method based on improved YOLOv3 comprises the following steps:
step 1: and acquiring an open mask shielded Face data set RMFD (Real-World Masked Face Dataset), carrying out classification marking and format conversion on the data set, and dividing the data set into a training set and a testing set.
Step 2: the mask wearing detection network is constructed in the following specific mode:
step 201: a multi-scale channel attention mechanism SENet structure was built as shown in figure 1. In the figure, the C ' W ' H ' feature layers X are subjected to a switching operation FtrObtaining C characteristic layers U of W and H, and realizing the process as shown in the formula (1):
u_c = v_c * X = Σ_{s=1}^{C′} v_c^s * x^s    (1)

where u_c denotes the c-th two-dimensional matrix in the feature map U, v_c denotes the c-th convolution kernel, and x^s denotes the s-th input channel.
After U is obtained, the Squeeze compression operation compresses the width W and height H of each feature layer with global average pooling, so that the C feature layers are converted into a 1 × 1 × C vector, as shown in equation (2):

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (2)

where u_c(i, j) denotes the element in row i and column j of the matrix u_c.
The Excitation operation captures the channel dependencies in full, as shown in equation (3):

s = F_ex(z, W) = σ(g(z, W)) = σ(W₂ δ(W₁ z))    (3)

Equation (3) learns the nonlinear interaction between channels, where σ and δ are the Sigmoid activation function and the ReLU function respectively, W₁ is the dimensionality-reduction weight matrix, W₂ is the dimensionality-restoration weight matrix, and s holds the weight of each channel.
Finally, the Scale operation of equation (4) is applied to obtain the final output:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (4)

where s_c is the weight of the c-th two-dimensional matrix.
The Squeeze operation in SENet compresses the input feature map into a channel-wise one-dimensional vector with global average pooling, giving a global receptive field and a wider perceptive area. Two fully connected layers follow, which reduce the parameter count while the Excitation operation learns the dependencies between channels; the channel weights are then fixed between 0 and 1 by a Sigmoid activation function, and finally the input features are multiplied by these weights to obtain the final output. The SENet structure is embedded into a residual structure (Residual) to complete the construction of the SE-Res module, as shown in FIG. 2. Finally, the 5 residual structures in YOLOv3 are replaced with SE-Res structures to complete the network structure of the mask wearing detection model, as shown in FIG. 3. A 416 × 416 picture is input into the network; after initialization by DBL (convolution, batch normalization, and activation function) layers, it passes through the 5 SE-Res layers for feature extraction, the last three feature layers are taken for the multi-scale feature fusion of the feature enhancement network, and finally prediction boxes at three scales are obtained through the prediction layers.
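As a minimal illustrative sketch (not the patent's actual implementation), the Squeeze, Excitation, and Scale operations of equations (2) to (4) can be written as a NumPy forward pass; the weight matrices W1 and W2 are assumed here to be already-trained parameters of the two fully connected layers:

```python
import numpy as np

def se_block(U, W1, W2):
    """Forward pass of a Squeeze-and-Excitation block (eqs. (2)-(4)).

    U  : feature map of shape (C, H, W)
    W1 : dimensionality-reduction weights, shape (C//r, C)
    W2 : dimensionality-restoration weights, shape (C, C//r)
    """
    # Squeeze (eq. 2): global average pooling -> 1x1xC descriptor z
    z = U.mean(axis=(1, 2))                      # shape (C,)
    # Excitation (eq. 3): s = sigmoid(W2 . relu(W1 . z)), weights in (0, 1)
    relu = lambda x: np.maximum(x, 0.0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    s = sigmoid(W2 @ relu(W1 @ z))               # shape (C,)
    # Scale (eq. 4): reweight each channel of U by its learned weight s_c
    return U * s[:, None, None]
```

Because every channel weight lies strictly between 0 and 1, the block can only attenuate channels, which is how less informative channels are suppressed relative to the emphasized ones.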
Step 202: and carrying out target anchor frame clustering on the RMFD data set. The target detection network based on the anchor frame needs reasonable anchor frame setting, and if the size of the anchor frame is not consistent with the size of a target, the number of positive samples can be greatly reduced, so that a large number of missed detection and false detection situations occur. YOLOv3 adopts a K-means algorithm to cluster targets in the data set to obtain prior frames with 9 sizes, the prior frames are distributed to 3 different detection layers, the RMFD mask shields the targets in the face data set to be smaller, and the prior frames of the public data set are not suitable any more. The K-means algorithm initialization clustering center is randomly selected from the samples, and the selection of the clustering center has great influence on the clustering result and the running time. The K-means + + algorithm is improved in the aspect of random selection, when the clustering centers are initialized, the distance between the clustering centers is increased as far as possible, and the inter-cluster distance is increased, so that the global optimum is achieved. In order to optimize the detection effect, the dimension and the width and the height of the mask data set are re-optimized and clustered by using a K-means + + clustering algorithm, and the obtained visual clustering result of the RMFD data set is shown in FIG. 4. In fig. 4, the abscissa represents the width of the object, the ordinate represents the height of the object, and the triangle represents the cluster center.
Step 203: the loss function is optimized. The original YOLOv3 algorithm uses L2 norm loss to calculate the regression loss of the bounding box position coordinates, but using IoU as the evaluation criterion of the target detection effect cannot truly reflect the overlapping condition of the prediction box and the real box. In order to solve the problems, a CIoU is introduced as a loss function, the CIoU takes the scale, distance, overlapping rate and punishment items between the target and the anchor frame into consideration, the problems of divergence and the like in the training process like IoU are avoided, and the regression of the target frame becomes more stable. CIoU is represented by formula (5):
CIoU = IoU − ρ²(b, b^gt) / c² − αv    (5)

where b and b^gt denote the center points of the prediction box and the real box respectively, ρ²(b, b^gt) is the squared Euclidean distance between the two center points, and c is the diagonal length of the smallest enclosing box that contains both the prediction box and the real box.
α and v are computed as in equations (6) and (7):

α = v / ((1 − IoU) + v)    (6)

v = (4 / π²) (arctan(W^gt / H^gt) − arctan(W / H))²    (7)

where W^gt and H^gt are the width and height of the real box, and W and H are the width and height of the prediction box.
The CIoU loss function is shown in equation (8):

L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + αv    (8)
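Equations (5) to (8) can be checked numerically with a small sketch. The corner-coordinate box format (x1, y1, x2, y2) and the small epsilon guard in α are implementation assumptions, not specified in the patent:

```python
import numpy as np

def ciou_loss(pred, gt):
    """CIoU loss (eq. 8) for two boxes given as (x1, y1, x2, y2)."""
    # IoU term
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # center-distance term: rho^2(b, b_gt) / c^2
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    # c^2: squared diagonal of the smallest enclosing box
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
       + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    # aspect-ratio consistency term v and its weight alpha (eqs. 6-7)
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / np.pi ** 2) * (np.arctan(w_g / h_g) - np.arctan(w_p / h_p)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)   # eps guards the 0/0 perfect-match case
    return 1 - iou + rho2 / c2 + alpha * v
```

A perfectly matched box gives a loss of 0, and unlike a pure IoU loss, disjoint boxes still receive a finite, distance-sensitive gradient signal through the ρ²/c² term.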
Step 3: Perform data enhancement on the training set from step 1 by rotation, translation, brightness and contrast adjustment, random cropping, and similar transforms; this increases image diversity, gives the network stronger generalization, and improves the robustness of the model. The images are then fed into the mask wearing detection network for training. The network is optimized with the Adam optimizer; the initial learning rate is set to 0.001 and multiplied by 0.1 every 30 epochs (1 epoch means all samples in the training set are trained once), and the batch size is 12. A total of 150 epochs are trained so that the loss function converges. Each epoch yields a mask wearing detection model, giving 150 mask wearing detection models in total.
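The step-decay schedule described in this embodiment (initial learning rate 0.001, multiplied by 0.1 every 30 epochs over 150 epochs) amounts to the following; the function name is an illustrative choice:

```python
def step_decay_lr(epoch, base_lr=1e-3, drop=0.1, every=30):
    """Step-decay schedule: lr starts at base_lr and is multiplied
    by `drop` once every `every` epochs (epochs counted from 0)."""
    return base_lr * drop ** (epoch // every)
```

Over the 150 training epochs this produces five plateaus, from 1e-3 down to 1e-7, so the later models in the run are fine-tuned with very small updates while the loss converges.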
Step 4: Test the 150 mask wearing detection models with the test set from step 1, record the accuracy of the models under the different learning rate parameters, and select the optimal model as the final mask wearing detection model.
Step 5: Use the mask wearing detection model obtained in step 4 to perform mask wearing detection on dense crowds.
It should be noted that the above embodiment is only one specific example of the implementation of this patent and does not cover all possible implementations, so the scope of protection cannot be regarded as limited to it; all implementations based on the same concept as the above case, and combinations of the above schemes, fall within the protection scope of this patent.

Claims (5)

1. A mask wearing detection method based on improved YOLOv3 is characterized by comprising the following steps:
step 1, acquiring a mask shielding face data set, carrying out classification marking and format conversion on the data set, and dividing the data set into a training set and a testing set;
step 2, constructing a mask wearing detection network based on YOLOv3;
step 3, using the training set in the step 1, training the mask wearing detection network for multiple times, and adjusting the learning rate parameter of each training to make the loss function converge, wherein each training obtains a mask wearing detection model;
step 4, the plurality of mask wearing detection models obtained in the step 3 are respectively tested by using the test set in the step 1, the accuracy of each mask wearing detection model is recorded, and the optimal model is selected as the final mask wearing detection model;
and 5, carrying out mask wearing detection on the dense population by using the mask wearing detection model selected in the step 4.
2. The mask wearing detection method based on the improved YOLOv3 as claimed in claim 1, wherein the specific mode of step 2 is as follows:
step 201, reconstructing the YOLOv3 feature extraction network by using a multi-scale channel attention mechanism;
step 202, performing target anchor frame clustering on the data set;
step 203, optimize the loss function.
3. The mask wearing detection method based on improved YOLOv3 according to claim 2, wherein the specific way of step 201 is to embed the SENet channel attention mechanism into the 5 residual network structures of the backbone feature extraction network of YOLOv3, deeply mine the context of the target, emphasize useful detail information, suppress invalid interference information, and complete the reconstruction of the feature extraction network.
4. The mask wearing detection method based on improved YOLOv3 according to claim 2, wherein in step 202, the K-means++ algorithm is used to optimize the anchor box sizes for the mask occlusion face data set, thereby improving the detection efficiency.
5. The mask wearing detection method based on improved YOLOv3 according to claim 2, wherein in step 203, the CIoU loss function is used to perform bounding box regression, thereby improving the positioning accuracy.
CN202110813607.1A 2021-07-19 2021-07-19 Mask wearing detection method based on improved YOLOv3 Pending CN113553936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813607.1A CN113553936A (en) 2021-07-19 2021-07-19 Mask wearing detection method based on improved YOLOv3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813607.1A CN113553936A (en) 2021-07-19 2021-07-19 Mask wearing detection method based on improved YOLOv3

Publications (1)

Publication Number Publication Date
CN113553936A true CN113553936A (en) 2021-10-26

Family

ID=78132030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813607.1A Pending CN113553936A (en) 2021-07-19 2021-07-19 Mask wearing detection method based on improved YOLOv3

Country Status (1)

Country Link
CN (1) CN113553936A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241548A (en) * 2021-11-22 2022-03-25 电子科技大学 Small target detection algorithm based on improved YOLOv5

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109214399A (en) * 2018-10-12 2019-01-15 清华大学深圳研究生院 A kind of improvement YOLOV3 Target Recognition Algorithms being embedded in SENet structure
CN111062282A (en) * 2019-12-05 2020-04-24 武汉科技大学 Transformer substation pointer type instrument identification method based on improved YOLOV3 model
CN111507199A (en) * 2020-03-25 2020-08-07 杭州电子科技大学 Method and device for detecting mask wearing behavior
CN112215188A (en) * 2020-10-21 2021-01-12 平安国际智慧城市科技股份有限公司 Traffic police gesture recognition method, device, equipment and storage medium
CN112270341A (en) * 2020-10-15 2021-01-26 西安工程大学 Mask detection method integrating transfer learning and deep learning
CN112949572A (en) * 2021-03-26 2021-06-11 重庆邮电大学 Slim-YOLOv 3-based mask wearing condition detection method


Non-Patent Citations (2)

Title
Cao Chengshuo et al.: "Mask Wearing Detection Method Based on the YOLO-Mask Algorithm", Laser & Optoelectronics Progress *
Wang Yihao et al.: "Mask Wearing Detection Algorithm Based on Improved YOLOv3 in Complex Scenes", Computer Engineering *


Similar Documents

Publication Publication Date Title
CN112733749B (en) Real-time pedestrian detection method integrating attention mechanism
CN112966684B (en) Cooperative learning character recognition method under attention mechanism
Yuan et al. Gated CNN: Integrating multi-scale feature layers for object detection
CN105512289B (en) Image search method based on deep learning and Hash
CN111310861A (en) License plate recognition and positioning method based on deep neural network
CN110889449A (en) Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN112949673A (en) Feature fusion target detection and identification method based on global attention
CN103942557B (en) A kind of underground coal mine image pre-processing method
CN110503063A (en) Fall detection method based on hourglass convolution autocoding neural network
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN113569881A (en) Self-adaptive semantic segmentation method based on chain residual error and attention mechanism
CN115294563A (en) 3D point cloud analysis method and device based on Transformer and capable of enhancing local semantic learning ability
Zhou et al. Algorithm of Helmet Wearing Detection Based on AT-YOLO Deep Mode.
CN110334584A (en) A kind of gesture identification method based on the full convolutional network in region
CN109447014A (en) A kind of online behavioral value method of video based on binary channels convolutional neural networks
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
Yuan et al. Few-shot scene classification with multi-attention deepemd network in remote sensing
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN109711442A (en) Unsupervised layer-by-layer generation fights character representation learning method
CN117079098A (en) Space small target detection method based on position coding
CN115830449A (en) Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
Hu et al. Deep learning for distinguishing computer generated images and natural images: A survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211026