CN113033375A - Face and mask detection method, system, equipment and medium based on YOLOV3 - Google Patents

Face and mask detection method, system, equipment and medium based on YOLOV3

Info

Publication number
CN113033375A
CN113033375A
Authority
CN
China
Prior art keywords
data
yolov3
algorithm model
face
mask detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110303335.0A
Other languages
Chinese (zh)
Inventor
王健
林浪
王宋凌
张海彬
刘诗伟
王柏芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Institute Of Software Engineering Gu
Original Assignee
South China Institute Of Software Engineering Gu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Institute Of Software Engineering Gu filed Critical South China Institute Of Software Engineering Gu
Priority to CN202110303335.0A priority Critical patent/CN113033375A/en
Publication of CN113033375A publication Critical patent/CN113033375A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face mask detection method, system, equipment and medium based on YOLOV3. The method comprises: inputting data to be detected into a target YOLOV3 algorithm model and performing feature extraction through the DarkNet53 feature extraction network to obtain feature data at different scales, wherein the data to be detected are face images in a uniform format and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function; inputting the feature data into a feature fusion layer and performing feature fusion through convolution and upsampling; and performing convolution operation on the fused data through an output layer and detecting the bounding boxes of the face image to obtain a face mask detection result. The face mask detection method based on YOLOV3 has the advantages of high detection speed and high result accuracy.

Description

Face and mask detection method, system, equipment and medium based on YOLOV3
Technical Field
The invention relates to the technical field of face mask detection, and in particular to a face mask detection method, system, equipment and medium based on YOLOV3.
Background
At present, wearing a mask when going out has become a basic requirement of daily life. When entering various public places, people can pass security checks only if wearing a mask. For face mask detection, the prior art mainly relies on manual one-by-one inspection, but this approach is prone to missed detections, consumes manpower and material resources, is inefficient, and carries considerable potential safety hazards.
Disclosure of Invention
The invention aims to provide a face mask detection method, system, equipment and medium based on YOLOV3 which, by improving the YOLOV3 algorithm structure, solve the prior-art problems of low face mask detection efficiency and unguaranteed accuracy.
In order to overcome the defects in the prior art, the invention provides a face mask detection method based on Yolov3, which comprises the following steps:
inputting the data to be detected into a target YOLOV3 algorithm model, and performing feature extraction through a DarkNet53 feature extraction network to obtain feature data at different scales; the data to be detected is a face image with a uniform format, and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function;
inputting the feature data into a feature fusion layer, and performing feature fusion through convolution and upsampling;
and performing convolution operation on the feature-fused data through an output layer, and detecting a bounding box of the face image to obtain a face mask detection result.
Further, before the inputting the data to be tested into the target YOLOV3 algorithm model, the method further includes:
performing detection frame regression analysis by using an IOU Loss function according to a YOLOV3 algorithm model to generate an initial YOLOV3 algorithm model;
and adjusting the learning rate, the number of training iteration rounds and the number of training data set samples of the initial YOLOV3 algorithm model to obtain a target YOLOV3 algorithm model.
Further, the adjusting the learning rate of the initial YOLOV3 algorithm model includes: fitting the learning rate by an optimizer using a decay strategy.
Further, the number of training iteration rounds is 270.
Further, before the inputting the data to be tested into the target YOLOV3 algorithm model, the method further includes:
the method comprises the steps of collecting a face image data set, and carrying out labeling, duplicate removal, data cleaning and normalization processing on the face image data set to obtain a face image with a uniform format.
Further, the normalization processing method comprises a Z-score normalization method.
Further, the face image dataset is acquired using image detection or liveness detection.
The invention also provides a face mask detection system based on YOLOV3, which comprises:
the feature extraction unit is used for inputting the data to be detected into a target YOLOV3 algorithm model, and extracting features through a DarkNet53 feature extraction network to obtain feature data at different scales; the data to be detected is a face image with a uniform format, and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function;
the feature fusion unit is used for inputting the feature data into a feature fusion layer and carrying out feature fusion through convolution and upsampling;
and the detection unit is used for performing convolution operation on the feature-fused data through an output layer and detecting a bounding box of the face image to obtain a face mask detection result.
The present invention also provides a computer terminal device, comprising:
one or more processors;
a memory coupled to the processor for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the YOLOV3-based face mask detection method as described in any one of the above.
The invention also provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the YOLOV3-based face mask detection method as described in any one of the above.
Compared with the prior art, the invention has the beneficial effects that:
Based on the YOLOV3 algorithm combined with the IOU Loss function, the accuracy of the data set can be correspondingly controlled during detection so that the value of the loss function is reduced, the overall learning effect is ultimately improved, and both the working efficiency of detecting whether a face wears a mask and the accuracy of the detection result are further improved.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a face mask detection method based on YOLOV3 according to an embodiment of the present invention;
FIG. 2 is a grid structure diagram of YOLOV3 according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an image detection method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a variation process of a learning rate according to an embodiment of the present invention;
FIG. 5 is a diagram of a service architecture under Paddle-Lite according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face mask detection system based on YOLOV3 according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, an embodiment of the present invention provides a face mask detection method based on YOLOV3, including:
s10, inputting the data to be tested into a target YOLOV3 algorithm model, and performing feature extraction through a DarkNet53 feature extraction network to obtain feature data in different formats; the data to be detected is a face image with a uniform format, and the target Yolov3 algorithm model is a Yolov3 algorithm model combined with an IOU Loss function;
in this embodiment, it should be noted that the face and mask detection is a technology for determining whether a face wears a mask according to facial features of a person, and the face and mask detection technology is implemented by collecting a face worn with a mask and a face not worn with a mask and integrating the faces into a data set, training the face detection at a mobile terminal and other devices by using a related algorithm, and determining whether the face wears the mask.
Specifically, in step S10, the pre-acquired data to be tested are input into the target YOLOV3 algorithm model to perform the first layer of operations. Before explaining the target YOLOV3 algorithm model, the network structure and the loss function adopted by the model are first explained. As shown in fig. 2, YOLOV3 divides an input image into S × S grid cells, i.e., the input image is mapped into a grid format and the coordinates of the grid cells are marked, so that the position information and categories of all objects in the image can be inferred by scanning the image only once. Each grid cell predicts B bounding boxes, and each bounding box predicts a Location (x, y, w, h), a Confidence Score and the probabilities of C categories, so the number of channels of the output layer of YOLOV3 is B × (5 + C). The loss function of YOLOV3 likewise has three components: Location error, Confidence error and classification error.
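To make the channel arithmetic concrete, the following minimal Python sketch (assuming B = 3 boxes per cell and the C = 80 COCO classes used later in this description; the function name is ours) computes the output-tensor shape at each detection scale:

```python
def yolov3_output_shape(grid_size: int, num_boxes: int = 3, num_classes: int = 80):
    """Shape of one YOLOV3 output feature map.

    Each grid cell predicts `num_boxes` bounding boxes; each box carries
    4 coordinates (x, y, w, h), 1 confidence score and `num_classes`
    class probabilities, hence B * (5 + C) channels per cell.
    """
    channels = num_boxes * (5 + num_classes)
    return (grid_size, grid_size, channels)

# The three detection scales mentioned later in this description:
for s in (13, 26, 52):
    print(s, yolov3_output_shape(s))   # e.g. 13 -> (13, 13, 255)
```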
Further, the network structure of YOLOV3 is composed of a basic feature extraction network, a multi-scale feature fusion layer and an output layer. YOLOV3 uses DarkNet53 as the feature extraction network; DarkNet53 is essentially a fully convolutional network in which the pooling layers are replaced by convolution operations with a stride of 2, and Residual units are added so that vanishing gradients are avoided when the network becomes very deep.
Further, the loss function is crucial to the self-learning effect of the YOLOV3 model. It should be noted that the loss function of YOLOV3 is composed of five parts:

$$
\begin{aligned}
\mathrm{Loss} ={}& \lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right] &&(1)\\
&+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right] &&(2)\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2} &&(3)\\
&+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2} &&(4)\\
&+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in \mathrm{classes}}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2} &&(5)
\end{aligned}
$$

Here, (1) is the loss error of the rectangular-box center point, where $(x_{i}, y_{i})$ are the center coordinates of the predicted rectangular box, $(\hat{x}_{i}, \hat{y}_{i})$ are the center coordinates of the labeled rectangular box, and $\mathbb{1}_{ij}^{obj}$ represents the responsibility of a rectangular box: it is 1 if an object target is detected, and 0 otherwise.
(2) is the predicted-box width and height error, where $(w_{i}, h_{i})$ denote the width and height of the predicted rectangular box and $(\hat{w}_{i}, \hat{h}_{i})$ those of the labeled rectangular box; the sum of the widths and heights of all predicted rectangular boxes is compared against that of the labeled rectangular boxes.
(3) and (4) are the predicted-box confidence loss, where $\hat{C}_{i}$ indicates the probability score that the prediction box contains the target object, and $\mathbb{1}_{ij}^{noobj}$, whose value is determined by whether the $(i, j)$-th rectangular box fulfils the prediction responsibility, is either 1 or 0.
(5) is the predicted-box class loss, where $p_{i}(c)$ is the probability of class $c$ in the $i$-th prediction cell, and the ground-truth value $\hat{p}_{i}(c)$ is likewise only 1 or 0.
It should be noted that the loss function under YOLOV3 uses smooth L1 Loss to perform regression on the detection box, and is divided into three parts: bounding-box mean square error, confidence cross entropy and category cross entropy. Binary cross entropy is used for the detection part; it depicts the distance between two probability distributions, and the smaller the cross-entropy value, the closer the two distributions. Let the two distributions be p and q, where p is the current distribution and q the predicted distribution; the cross entropy of q with respect to p is:

$$H(p,q)=-\sum_{i} p(x_{i})\log q(x_{i}) \qquad (6)$$

This equation is also referred to as the loss-function probability prediction equation of YOLOV3. In the process of calculating with the YOLOV3 function, it was found that the mAP of target detection only reaches about 70-90 without a more accurate value, so the relevant function under YOLOV3 needs to be improved to obtain the target YOLOV3; the main improvement is to use the IOU Loss function instead of the smooth L1 Loss function, as shown in Table 1:
TABLE 1 loss function formula for IOU function
$$IoU=\frac{\left|B\cap B^{gt}\right|}{\left|B\cup B^{gt}\right|},\qquad L_{IoU}=-\ln(IoU)\quad\text{or}\quad L_{IoU}=1-IoU$$
Under this algorithm, regression analysis is performed on the box formed by the 4 points of each of the two frames as a whole: the IoU of the two frames is obtained, and then −ln(IoU) is computed; in actual use, the IoU Loss is often defined as 1 − IoU. Here IoU is the ratio of the intersection to the union of the real frame and the prediction frame; when the two completely overlap, IoU = 1. For the Loss, the smaller the better, since a small loss indicates a high overlap ratio, so the IoU Loss can simply be expressed as 1 − IoU, and the linear regression is then adjusted using the two boxes.
Therefore, in this embodiment, the IOU Loss function is adopted to linearly modify the previous Loss function and obtain the target YOLOV3 algorithm model; the underlying principle is still cross entropy, and the IOU ratio is likewise restricted to [0, 1]. Training and testing on images of multiple scales and multiple aspect ratios can thus be carried out, the IOU loss decreases as the number of iterations increases, and objects in the prediction frame can be analyzed more accurately; the accuracy of the data set is correspondingly controlled, the value of the loss function is reduced, and the overall learning effect is improved.
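As a concrete illustration of the two variants above, here is a minimal Python sketch (an illustrative sketch assuming corner-format boxes, not the patented implementation; the function names are ours):

```python
import math

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target, variant="linear"):
    """IoU Loss: 1 - IoU (the common 'linear' form) or -ln(IoU)."""
    overlap = iou(pred, target)
    if variant == "log":
        # -ln(IoU) is undefined for IoU == 0, so clamp with a small epsilon
        return -math.log(max(overlap, 1e-9))
    return 1.0 - overlap

print(iou_loss((0, 0, 10, 10), (5, 5, 15, 15)))  # IoU = 1/7, loss ~ 0.857
```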
S20, inputting the feature data into a feature fusion layer, and performing feature fusion through convolution and upsampling;
in this embodiment, to solve the problem that the previous YOLO version is not sensitive to small targets, YOLOV3 uses 3 feature maps with different scales for target detection, 13 × 13, 26 × 26, and 52 × 52, respectively, to detect three targets, i.e., large, medium, and small. The feature fusion layer selects three scale feature maps produced by DarkNet as input, and fuses the feature maps of all scales through a series of convolution layers and upsampling by using the idea of FPN (feature pyramid templates). It should be noted that the 3 different-scale feature maps of the present embodiment, that is, "13 × 13, 26 × 26, 52 × 52" is only a preferred way, and there may be other adaptive options in practical applications, and the present invention is not limited herein.
And S30, performing convolution operation on the data after feature fusion through an output layer, and detecting a bounding box of the face image to obtain a face mask detection result.
In this embodiment, the output layer also uses a fully convolutional structure, where the number of convolution kernels of the last convolution layer is 255, since 3 × (4 + 1 + 80) = 255: 3 indicates that one grid cell contains 3 bounding boxes, 4 indicates the 4 coordinate values of the box, 1 indicates the Confidence Score, and 80 indicates the probabilities of the 80 classes in the COCO data set. It should be noted that 80 may be modified to the actual number of categories if another data set is used instead. After the convolution operation is finished, the bounding boxes of the face image are detected, the output results are drawn with detection frames in different colors and marked with the confidence of the corresponding class, and the results are then classified as mask or no-mask, completing the detection process.
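For reference, the published YOLOV3 rule for converting the raw outputs (t_x, t_y, t_w, t_h) of such an output layer into a bounding box can be sketched in Python as follows; the grid cell, anchor size and input resolution used in the example call are illustrative assumptions:

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """YOLOV3 box decoding: center offsets are squashed into the grid cell,
    width/height scale the anchor prior exponentially."""
    bx = (sigmoid(tx) + cx) * stride   # center x in pixels
    by = (sigmoid(ty) + cy) * stride   # center y in pixels
    bw = pw * math.exp(tw)             # width in pixels
    bh = ph * math.exp(th)             # height in pixels
    return bx, by, bw, bh

# One box in cell (6, 4) of a 13x13 map on a 416x416 input (stride 32),
# using an illustrative anchor of 116x90 pixels:
print(decode_box(0.2, -0.1, 0.05, 0.1, cx=6, cy=4, pw=116, ph=90, stride=32))
```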
This embodiment of the invention, based on the YOLOV3 algorithm and combined with the IOU Loss function, can correspondingly control the accuracy of the data set during detection so as to reduce the value of the loss function, ultimately improving the overall learning effect and further improving both the working efficiency of detecting whether a face wears a mask and the accuracy of the detection result.
In a certain embodiment, before step S10, the method further includes acquiring a face image data set, and performing labeling, deduplication, data cleaning, and normalization processing on the face image data set to obtain a face image with a uniform format. In this embodiment, a process of constructing a face mask data set is mainly given:
the collection of the data set pictures is all obtained from a network, and the resolution is larger than or equal to 1920 x 1080. Finally, 7949 effective pictures are obtained through screening and de-weighting, the total marked number is 16635, wherein the marked number of the masks is 7024, and the marked number of the normal is 9611. Screening the collected pictures, deleting the pictures with low resolution and non-conformity, and performing duplicate removal processing on the pictures by using duplicate removal software. And (3) labeling the image in labelimg image labeling software, wherein the labeling items are divided into a mask and a nomask, the mask represents the face with the mask, and the nomask represents the face without the mask.
And then, performing data cleaning on the marked data. Data cleansing is the process of re-examining and verifying data with the aim of deleting duplicate information, correcting existing errors, and providing data consistency. Firstly, data duplication removal is carried out, and a DuplicatePhotoFinder software is used for deleting pictures with the same characteristics (characteristic duplication removal); and then, deleting the noise data, and deleting the noise data from the data after the characteristic de-duplication by a manual selection mode.
And finally, carrying out data normalization operation and unifying the data formats. Through the series of data processing operations, a data set with more effective lattice content can be obtained, and further the training efficiency and effect are greatly improved.
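DuplicatePhotoFinder is an off-the-shelf tool; purely as a hedged illustration of feature deduplication, a simple average-hash filter could be sketched as follows (the hashing scheme and Hamming threshold are our assumptions, not the tool's actual algorithm):

```python
from pathlib import Path
from PIL import Image

def average_hash(path, size=8):
    """64-bit average hash: downscale, grayscale, threshold on the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def find_duplicates(folder, max_hamming=5):
    """Pair each image with an earlier near-duplicate, if one exists."""
    seen, duplicates = {}, []
    for path in sorted(Path(folder).glob("*.jpg")):
        h = average_hash(path)
        match = next((p for p, h0 in seen.items()
                      if bin(h ^ h0).count("1") <= max_hamming), None)
        if match is not None:
            duplicates.append((path, match))
        else:
            seen[path] = h
    return duplicates
```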
In one embodiment, the normalization is performed by the Z-score standardization method, which standardizes the data using the mean and standard deviation of the raw data. The processed data conform to the standard normal distribution, i.e., a mean of 0 and a standard deviation of 1, with the conversion function:
$$z=\frac{x-\mu}{\sigma}$$
where μ is the mean of all sample data and σ is the standard deviation of all sample data. The Z-score standardization method processes the image so that it is normalized to [−1, 1] rather than limited to [0, 1]; the input can then take positive or negative values, which accelerates model training.
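A minimal NumPy sketch of this preprocessing step follows; computing μ and σ per image is an assumption here, since the description does not state whether the statistics are taken per image or over the whole data set:

```python
import numpy as np

def z_score_normalize(image: np.ndarray) -> np.ndarray:
    """Z-score standardization: zero mean, unit standard deviation.

    Unlike min-max scaling to [0, 1], the result is roughly centered
    around 0 and may be negative, which can speed up training.
    """
    mu = image.mean()
    sigma = image.std()
    return (image - mu) / (sigma + 1e-8)   # epsilon guards flat images

img = np.random.randint(0, 256, (416, 416, 3)).astype(np.float32)
normed = z_score_normalize(img)
print(normed.mean().round(6), normed.std().round(6))  # ~0.0, ~1.0
```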
In one embodiment, the ways of acquiring face images include image detection and living-body detection. Image detection refers to acquiring a picture from a web page or locally and obtaining the face position in the image through a face detection algorithm, attribute detection and feature analysis for detection and comparison: first it is detected whether a face exists in the image and its position is obtained, attribute detection then determines whether the face wears a mask, and the detection result is output by the corresponding algorithm; the specific flow is shown in fig. 3. Living-body detection differs from image detection in that it acquires real-time pictures through the camera of the device, processes each frame in real time, detects the face in the picture, extracts features, performs attribute detection, and outputs the detection result. Living-body detection covers whatever range the device can recognize in real time; as long as a face falls within that range, the system automatically recognizes it and reports the detection result, so it is more flexible than image detection, and the confidence of whether a mask is worn can be displayed on the bounding box, which makes the detection results convenient to analyze.
In addition, during living-body detection the positioning of the acquired dynamic image frames is not very consistent, and the frames can only be positioned accurately if the subject keeps still, so the algorithm needs further optimization and improvement. Likewise, where the video is continuously fed to the model using ajax, each frame triggers one response, and the continuous asynchronous requests from Android can over time cause problems for the server and the model; this problem is addressed as well.
In a certain embodiment, before step S10, the method further includes adjusting a learning rate, a number of training iteration rounds, and a number of training data set samples of the initial YOLOV3 algorithm model to obtain a target YOLOV3 algorithm model.
In the previous embodiments it was explained that the training effect is enhanced by replacing the loss function; here the parameters of the model are optimized to assist detection. First, for the learning rate, an optimizer is used to implement a decay strategy: during training, the learning rate starts small, gradually increases, is held at a fixed value, and is then decayed step by step, as shown in fig. 4. This gives the learning process a rising warm-up phase, prevents overfitting, slows the learning down where needed, reduces the instability problems that occur during learning, and reduces the oscillation of the data training results that appears when learning too fast makes the loss value too large.
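A hedged Python sketch of such a schedule is shown below; the phase boundaries, base rate and decay factor are illustrative assumptions, as the description only specifies the shape of the curve in fig. 4:

```python
def learning_rate(step, base_lr=1e-3, warmup_steps=500,
                  hold_steps=4000, decay_every=1000, decay_factor=0.5):
    """Piecewise schedule: linear warm-up, plateau, then stepwise decay."""
    if step < warmup_steps:                       # small -> gradually increasing
        return base_lr * (step + 1) / warmup_steps
    if step < warmup_steps + hold_steps:          # held at a fixed value
        return base_lr
    decays = (step - warmup_steps - hold_steps) // decay_every + 1
    return base_lr * (decay_factor ** decays)     # step-by-step decay

for s in (0, 250, 500, 3000, 5000, 8000):
    print(s, learning_rate(s))
```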
In one embodiment, the number of training iteration rounds (num_epochs) is preferably 270. Fixing the number of iteration rounds at 270 uses the increased round count to improve the training effect and completeness: the machine repeatedly scans the data set, which improves learning accuracy.
In one embodiment, the training batch size is adjusted according to the size of the data set; when the batch size is set to 32, training produces good results and the gradient is moderate.
In one embodiment, supervised learning is used for training, and comparative analysis is performed on experimental results to illustrate the effect of the invention. Wherein, the experimental parameters are shown in table 2:
TABLE 2 Experimental parameters and values
(table rendered as an image in the original document)
By setting the parameters, the results of YOLOV3 training and YOLOV3 training in combination with IOU Loss were compared, and the results are shown in table 3:
TABLE 3 YOLOV3 training results compared with YOLOV3 combined with IOU Loss
(table rendered as an image in the original document)
As can be seen from the above table, the mAP of YOLOV3 combined with IOU Loss is significantly improved, and the training time is also significantly shortened. In addition, the YOLOV3 algorithm was compared with other algorithms, and the results are shown in table 4:
TABLE 4 Training results of the four algorithms
(table rendered as an image in the original document)
From the above table it can be seen that, among the four algorithms, the accuracy, training speed, mAP and small-object detection accuracy of the YOLOV3 algorithm combined with IOU Loss are all optimal.
Referring to fig. 5, in a certain embodiment, a service architecture diagram under Paddle-Lite is provided. A real-time video monitoring technology can be implemented by using the Paddle-Lite project in combination with an Android platform: after the model is exported and the application data are set, an Android-based real-time video playing and docking technology is formed, yielding an app focused on video and real-time monitoring that implements the face mask detection method based on YOLOV3 provided by the present invention.
Referring to fig. 6, in an embodiment, a face mask detection system based on YOLOV3 is further provided, including:
the feature extraction unit 01 is used for inputting data to be detected into a target YOLOV3 algorithm model, and performing feature extraction through a DarkNet53 feature extraction network to obtain feature data at different scales; the data to be detected is a face image with a uniform format, and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function;
the feature fusion unit 02 is used for inputting the feature data into a feature fusion layer and performing feature fusion through convolution and upsampling;
and the detection unit 03 is configured to perform convolution operation on the data after feature fusion through an output layer, and detect a bounding box of the face image to obtain a face mask detection result.
In an embodiment, there is also provided a computer terminal device including:
one or more processors;
a memory coupled to the processor for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the YOLOV3-based face mask detection method as described above.
The processor is used for controlling the overall operation of the computer terminal device so as to complete all or part of the steps of the face mask detection method based on the YOLOV 3. The memory is used to store various types of data to support the operation at the computer terminal device, which data may include, for example, instructions for any application or method operating on the computer terminal device, as well as application-related data. The Memory may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk.
The computer terminal Device may be implemented by one or more Application Specific Integrated Circuits (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to perform the face mask detection method based on YOLOV3 according to any of the embodiments described above and achieve technical effects consistent with the above methods.
In an embodiment, a computer readable storage medium is further provided, which includes program instructions, when executed by a processor, to implement the steps of the YOLOV 3-based face mask detection method according to any one of the above embodiments. For example, the computer readable storage medium may be the above memory including program instructions, which are executable by a processor of a computer terminal device to perform the face mask detection method based on YOLOV3 according to any of the above embodiments, and achieve the technical effects consistent with the above methods.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A face mask detection method based on YOLOV3 is characterized by comprising the following steps:
inputting the data to be detected into a target YOLOV3 algorithm model, and performing feature extraction through a DarkNet53 feature extraction network to obtain feature data at different scales; the data to be detected is a face image with a uniform format, and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function;
inputting the feature data into a feature fusion layer, and performing feature fusion through convolution and upsampling;
and performing convolution operation on the feature-fused data through an output layer, and detecting a bounding box of the face image to obtain a face mask detection result.
2. The YOLOV3-based face mask detection method according to claim 1, wherein before the inputting the data to be tested into the target YOLOV3 algorithm model, the method further comprises:
performing detection frame regression analysis by using an IOU Loss function according to a YOLOV3 algorithm model to generate an initial YOLOV3 algorithm model;
and adjusting the learning rate, the number of training iteration rounds and the number of training data set samples of the initial YOLOV3 algorithm model to obtain a target YOLOV3 algorithm model.
3. The YOLOV3-based face mask detection method according to claim 2, wherein the adjusting the learning rate of the initial YOLOV3 algorithm model comprises: fitting the learning rate by an optimizer using a decay strategy.
4. The YOLOV3-based face mask detection method according to claim 2, wherein the number of training iteration rounds comprises 270 rounds.
5. The YOLOV3-based face mask detection method according to claim 1, wherein before the inputting the data to be tested into the target YOLOV3 algorithm model, the method further comprises:
the method comprises the steps of collecting a face image data set, and carrying out labeling, duplicate removal, data cleaning and normalization processing on the face image data set to obtain a face image with a uniform format.
6. The YOLOV3-based face mask detection method according to claim 5, wherein the normalization processing comprises a Z-score normalization method.
7. The YOLOV3-based face mask detection method according to claim 5, wherein the face image data set is acquired using image detection or living-body detection.
8. A face mask detection system based on YOLOV3, characterized by comprising:
the feature extraction unit is used for inputting the data to be detected into a target YOLOV3 algorithm model, and extracting features through a DarkNet53 feature extraction network to obtain feature data at different scales; the data to be detected is a face image with a uniform format, and the target YOLOV3 algorithm model is a YOLOV3 algorithm model combined with an IOU Loss function;
the feature fusion unit is used for inputting the feature data into a feature fusion layer and carrying out feature fusion through convolution and upsampling;
and the detection unit is used for performing convolution operation on the feature-fused data through an output layer and detecting a bounding box of the face image to obtain a face mask detection result.
9. A computer terminal device, comprising:
one or more processors;
a memory coupled to the processor for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the YOLOV3-based face mask detection method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the YOLOV3-based face mask detection method according to any one of claims 1 to 7.
CN202110303335.0A 2021-03-22 2021-03-22 Face and mask detection method, system, equipment and medium based on YOLOV3 Pending CN113033375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110303335.0A CN113033375A (en) 2021-03-22 2021-03-22 Face and mask detection method, system, equipment and medium based on YOLOV3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110303335.0A CN113033375A (en) 2021-03-22 2021-03-22 Face and mask detection method, system, equipment and medium based on YOLOV3

Publications (1)

Publication Number Publication Date
CN113033375A 2021-06-25

Family

ID=76472416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110303335.0A Pending CN113033375A (en) 2021-03-22 2021-03-22 Face and mask detection method, system, equipment and medium based on YOLOV3

Country Status (1)

Country Link
CN (1) CN113033375A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153175A1 (en) * 2018-02-08 2019-08-15 国民技术股份有限公司 Machine learning-based occluded face recognition system and method, and storage medium
CN111291637A (en) * 2020-01-19 2020-06-16 中国科学院上海微系统与信息技术研究所 Face detection method, device and equipment based on convolutional neural network
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112183471A (en) * 2020-10-28 2021-01-05 西安交通大学 Automatic detection method and system for standard wearing of epidemic prevention mask of field personnel

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王艺皓 (Wang Yihao) et al., "Mask-Wearing Detection Algorithm Based on Improved YOLOv3 in Complex Scenes", 计算机工程 (Computer Engineering), vol. 46, no. 11, pages 12-22 *

Similar Documents

Publication Publication Date Title
US11341626B2 (en) Method and apparatus for outputting information
CN110807429A (en) Construction safety detection method and system based on tiny-YOLOv3
WO2018108129A1 (en) Method and apparatus for use in identifying object type, and electronic device
CN109815770B (en) Two-dimensional code detection method, device and system
CN109948497B (en) Object detection method and device and electronic equipment
CN111291637A (en) Face detection method, device and equipment based on convolutional neural network
CN111523414A (en) Face recognition method and device, computer equipment and storage medium
CN110889446A (en) Face image recognition model training and face image recognition method and device
CN112419202B (en) Automatic wild animal image recognition system based on big data and deep learning
CN116843999B (en) Gas cylinder detection method in fire operation based on deep learning
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN113111817A (en) Semantic segmentation face integrity measurement method, system, equipment and storage medium
CN111738164B (en) Pedestrian detection method based on deep learning
CN117495735B (en) Automatic building elevation texture repairing method and system based on structure guidance
CN112819821A (en) Cell nucleus image detection method
CN116416884A (en) Testing device and testing method for display module
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN117372424B (en) Defect detection method, device, equipment and storage medium
CN113240699B (en) Image processing method and device, model training method and device, and electronic equipment
Lin et al. Integrated circuit board object detection and image augmentation fusion model based on YOLO
CN112884721A (en) Anomaly detection method and system and computer readable storage medium
CN112037173A (en) Chromosome detection method and device and electronic equipment
CN113033375A (en) Face and mask detection method, system, equipment and medium based on YOLOV3
CN113052798A (en) Screen aging detection model training method and screen aging detection method
CN112184708B (en) Sperm survival rate detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination