CN116370151A - Mask identification method, mask identification device, mask identification equipment and storage medium - Google Patents

Mask identification method, mask identification device, mask identification equipment and storage medium Download PDF

Info

Publication number
CN116370151A
CN116370151A (Application CN202310077871.2A)
Authority
CN
China
Prior art keywords
attribute
head
image
mask
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310077871.2A
Other languages
Chinese (zh)
Inventor
钟浩
熊超
牛昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to CN202310077871.2A priority Critical patent/CN116370151A/en
Publication of CN116370151A publication Critical patent/CN116370151A/en
Pending legal-status Critical Current


Landscapes

  • Engineering & Computer Science (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a mask identification method, device, equipment and storage medium, wherein the method comprises the following steps: acquiring a pedestrian image frame to be detected, and identifying a head region image in the pedestrian image frame; generating head attribute information corresponding to the head region image under the condition that the head region image meets the detection size condition, wherein the head attribute information is at least used for representing the pose and definition of the head in the head region image; and if the head attribute information meets the preset attribute requirement, detecting the outline information of the target object in the head area image, and generating a mask recognition result based on the outline information of the target object. According to the technical scheme provided by one or more embodiments of the present disclosure, accuracy of mask recognition can be improved.

Description

Mask identification method, mask identification device, mask identification equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, in particular to a mask identification method, a mask identification device, mask identification equipment and a storage medium.
Background
In some face recognition scenarios, it is often necessary to detect whether a pedestrian is wearing a mask as required. At present, most mask identification methods adopt face-based target detection and classification: a human body or human face image is first detected from the monitoring picture by a target detection model, and the detected image is then classified by a classification model as either wearing a mask or not wearing a mask, which completes the mask identification process.
However, in practical applications, the quality of the monitoring picture is generally not guaranteed. For example, the sharpness of the originally captured monitoring picture may be low, or pedestrians in the monitoring picture may occlude one another, so the mask recognition result is often inaccurate.
In view of this, there is a need for a more accurate mask identification method.
Disclosure of Invention
One or more embodiments of the present disclosure provide a mask recognition method, apparatus, device, and storage medium, which can improve the accuracy of mask recognition.
In one aspect, the disclosure provides a mask recognition method, which includes: acquiring a pedestrian image frame to be detected, and identifying a head region image in the pedestrian image frame; generating head attribute information corresponding to the head region image under the condition that the head region image meets the detection size condition, wherein the head attribute information is at least used for representing the pose and definition of the head in the head region image; and if the head attribute information meets the preset attribute requirement, detecting the outline information of the target object in the head area image, and generating a mask recognition result based on the outline information of the target object.
Another aspect of the present disclosure also provides a mask recognition device, the device including: a head region identification unit for acquiring a pedestrian image frame to be detected and identifying a head region image in the pedestrian image frame; the head attribute identification unit is used for generating head attribute information corresponding to the head area image under the condition that the head area image meets the detection size condition, wherein the head attribute information is at least used for representing the pose and definition of the head in the head area image; and the mask identification unit is used for detecting the outline information of the target object in the head region image if the head attribute information meets the preset attribute requirement and generating a mask identification result based on the outline information of the target object.
The disclosure also provides an electronic device, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the computer program is executed by the processor to realize the mask recognition method.
Another aspect of the present disclosure also provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the mask recognition method described above.
According to the technical scheme provided by one or more embodiments of the present disclosure, after the pedestrian image frame to be detected is acquired, the head region image in the image frame can be identified. In the prior art, after an image frame containing a face is captured, the image frame is classified by a classification model to determine whether the face in it wears a mask. This detection mode ensures high recognition efficiency, but in certain scenes it easily produces false recognition results because the image frame is of poor quality or the face in it is occluded. That is, although the mask recognition scheme in the related art can generate a mask recognition result for every image frame in the monitoring video, the error rate of those results is relatively high. In view of this, in order to ensure the accuracy of mask recognition, the present disclosure may verify, after recognizing the head region image in an image frame, that the head region image meets a detection size condition. If the head region image is too small, mask identification proceeds with the next image frame in the monitoring video, avoiding an inaccurate mask identification result. When the detection size condition is satisfied, the head attribute information corresponding to the head region image may be further generated. The head attribute information may characterize the pose and sharpness of the head in the head region image, both of which play a key role in the subsequent mask recognition. Screening the pedestrian image frames with the head attribute information therefore further ensures the accuracy of the subsequent mask identification.
When the head attribute information meets the preset attribute requirement, it indicates that the sharpness of the head region image is sufficient and that the pose of the face also meets the mask recognition requirement, so the contour information of the target object in the head region image can be detected. The target object may be a nose or a mask; by detecting the contour information of the nose or the mask, it can be effectively distinguished whether the face in the head region image wears a mask and whether the mask is worn according to specification.
Therefore, by identifying the head region and the head attributes, the image quality used for mask identification can be guaranteed, improving the accuracy of mask identification.
In an actual mask recognition scene, a monitoring camera usually covers a monitoring area and generates a corresponding monitoring video by filming the pedestrians in that area. While a pedestrian walks through the monitored area, the camera continuously captures image frames containing that pedestrian, so the final monitoring video contains a large number of such frames. For example, if the camera captures 25 frames per second and the pedestrian stays in the monitored area for 10 seconds, 250 frames containing the pedestrian are obtained. Although the technical scheme filters out some image frames that do not meet the requirements, with this many frames available an image frame that does meet the requirements can eventually be found, and an accurate mask recognition result can be output based on it. That is, when a pedestrian has just entered the monitoring area, the technical scheme provided by the present disclosure may not immediately give a mask recognition result if the captured image frames do not meet the requirements. However, as the pedestrian travels through the monitored area, there is a high probability that a qualifying image frame will be captured, and an accurate mask recognition result is finally given.
Therefore, in actual application scenarios, the technical scheme provided by the disclosure maintains high mask recognition efficiency while guaranteeing pedestrian coverage. At the same time, it ensures that the output mask recognition result has high accuracy, striking a good balance between recognition efficiency and recognition accuracy in the mask recognition scene.
Drawings
The features and advantages of the various embodiments of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
fig. 1 is a schematic diagram showing a system configuration of mask recognition in an embodiment of the present disclosure;
fig. 2 is a schematic diagram showing steps of a mask recognition method according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of detection of attribute information in one embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of mask identification in one embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of identification of profile information in one embodiment of the present disclosure;
fig. 6 is a schematic diagram showing functional modules of a mask recognition device according to an embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of an electronic device in one embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present disclosure. All other embodiments obtained by those skilled in the art without inventive effort, based on the embodiments of the present disclosure, are intended to be within the scope of the present disclosure.
An embodiment of the present disclosure provides a mask recognition method, which can be applied to the system architecture shown in fig. 1. In the present embodiment, the input image frames may be sequentially processed by a human head detection model, an attribute detection model and a mask recognition model to finally obtain the mask recognition result. The human head detection model identifies the head region image of a pedestrian from the pedestrian image frame; the attribute detection model then identifies the head attributes of the head region image, which characterize the pose and sharpness of the head; finally, the mask recognition model detects the mask state in the head region image.
Referring to fig. 2, in one embodiment, the mask recognition method may include the following steps.
S1: and acquiring a pedestrian image frame to be detected, and identifying a head area image in the pedestrian image frame.
In this embodiment, when mask detection is required, a monitoring video of a pedestrian can be acquired by a camera. The monitoring video contains the image frame of the pedestrian, and can be used as the image frame of the pedestrian to be detected. For a pedestrian image frame to be detected, a head region image in the pedestrian image frame may be first identified.
The human head detection model can adopt the YOLOV4 detection framework. The prior prediction boxes (anchors) used in the conventional YOLOV4 network architecture are clustered from the COCO dataset and may not be suitable for the head detection scenario of the present embodiment. In view of this, the YOLOV4 detection framework requires some improvement.
Specifically, in the training process, a human head detection training sample set may be obtained, where each training sample in the set has a labeled head region detection frame. These labeled head region detection frames serve as labeled boxes and, in the human head detection model, can be clustered to generate a specified number of prior detection frames. The clustering process may adopt the K-means clustering algorithm: a specified number (for example, 9) of initial detection frames are selected from the labeled head region detection frames, and the remaining head region detection frames are then clustered against these initial frames by computing overlap areas. During clustering, the positions of the initial detection frames are adjusted. When clustering completes and the positions are no longer adjusted, the specified number of detection frames is obtained, and these can be used as the prior detection frames suited to training the human head detection model for the head detection scenario.
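For illustration only, the clustering step might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text only says that K-means is applied to the labeled boxes by computing overlap areas, so the width/height-only IoU metric and the helper names here are assumptions, not the patented implementation.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """Overlap ratio computed from widths and heights only,
    as is conventional when clustering anchor shapes."""
    inter_w = np.minimum(boxes[:, None, 0], anchors[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    inter = inter_w * inter_h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def cluster_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labeled (w, h) detection frames into k prior detection frames."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)  # nearest anchor per box
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):  # positions no longer adjusted
            break
        anchors = new
    return anchors
```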
Subsequently, the prior detection frame can be utilized to learn the training sample, so that the human head detection model with higher precision and more accurate positioning can be obtained through training.
In practical applications, the human head detection model can be modified in other aspects. For example, in the original YOLOV4 network architecture, the number of network channels is generally large, and the calculation amount of data in the network is also large. In order to reduce the data calculation amount and improve the mask recognition efficiency, the number of channels in the YOLOV4 network can be compressed. For example, the number of channels of the YOLOV4 network in the human head detection model can be compressed to 60% of the original number, so that the detection speed of the head region image can be improved.
In addition, compressing the number of network channels reduces the detection accuracy of the human head detection model. To balance detection accuracy and detection speed, the image input size of the YOLOV4 network can be appropriately increased. Specifically, the original image input size of the YOLOV4 network is generally 416×416, whereas the image size of the training samples and of the pedestrian image frames at prediction time is generally 640×384. The image input size of the head detection model can therefore be adjusted during training so that it is consistent with the image size of the training samples, ensuring detection accuracy and speed in a monitoring scene with 1080p resolution.
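The two adjustments above (channel compression to 60% and the 640×384 input size) might be recorded in a configuration sketch like the one below; the channel list and the configuration keys are hypothetical, since the text gives only the compression ratio and the input size.

```python
def compress_channels(channels, ratio=0.6):
    """Scale every layer's channel count to 60% of the original YOLOV4 width."""
    return [max(1, int(round(c * ratio))) for c in channels]

# Hypothetical configuration reflecting the two adjustments described above.
head_detector_cfg = {
    "input_size": (640, 384),  # matches the training-sample image size
    "channels": compress_channels([64, 128, 256, 512, 1024]),  # example widths
    "num_anchors": 9,          # prior detection frames from the clustering step
}
```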
S3: and under the condition that the head region image meets the detection size condition, generating head attribute information corresponding to the head region image, wherein the head attribute information is at least used for representing the pose and definition of the head in the head region image.
In the present embodiment, the pedestrian image frame is identified by the human head detection model, and the head region image is extracted and then represented by the upper left corner coordinate information and the width and height of the human head detection frame. For example, x may represent abscissa information of the upper left corner, y may represent ordinate information of the upper left corner, w may represent the width of the head detection frame, and h may represent the height of the head detection frame. Thus, the detected head region image can be characterized by (x, y, w, h).
In the present embodiment, since the accuracy of the subsequent mask recognition is affected if the head of the person in the head region image is too small, a detection size condition may be predetermined, and when the head region image meets the detection size condition, a subsequent data processing process is performed.
Specifically, the detection size condition may define lower limits for the width and height of the human head detection frame. If either the width or the height of the human head detection frame in the head region image is smaller than its corresponding lower limit, the head region image does not meet the detection size condition; only when both the width and the height are greater than or equal to their lower limits does the head region image meet the detection size condition.
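A minimal sketch of this size gate; the concrete lower limits are placeholders, since the text does not fix their values.

```python
MIN_HEAD_W, MIN_HEAD_H = 32, 32  # placeholder lower limits, not given in the text

def meets_detection_size(head_box):
    """head_box is (x, y, w, h); both width and height must reach the lower limits."""
    _, _, w, h = head_box
    return w >= MIN_HEAD_W and h >= MIN_HEAD_H
```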
In this embodiment, if the head area image does not meet the detection size condition, the pedestrian image frame to be detected may be directly discarded, a next pedestrian image frame to be detected in the surveillance video may be acquired, and processing of the next pedestrian image frame to be detected may be started according to the description of step S1.
In the case that the head region image meets the detection size condition, the head attribute information corresponding to the head region image may be further generated through an attribute detection model. In this embodiment, the head attribute information may at least characterize the pose and sharpness of the head in the head region image. The pose attributes may include a head orientation attribute and a head-up attribute. In practical application, the orientation attribute can be divided into three types (side, front and back), representing three different orientations of the head in the head region image; the head-up attribute can be divided into three types (head down, level and head raised), representing three different degrees of elevation of the head. The pose of the head is obtained jointly from the orientation attribute and the head-up attribute. The sharpness attribute comprehensively reflects the imaging effect of the head region image; in practical applications it may be classified into types such as image brightness too dark, image quality clear, and non-human head. The head attribute information can thus specifically characterize the imaging quality of the head in the head region image. When the orientation attribute indicates the front, the head-up attribute indicates a level head, and the sharpness attribute indicates clear image quality, the imaging quality of the head region image is optimal, and the subsequent mask recognition is most accurate.
In order to output the above head attribute information accurately, in the present embodiment an attribute detection model may be constructed on a ResNet18 network. Referring to fig. 3, the attribute detection model may include four different types of residual blocks, where the first three blocks, together with the convolutional layers, sample the input head region image layer by layer to extract its image features. At the output layer of the attribute detection model, a plurality of attribute branches can be constructed on top of the fourth block, and each attribute branch outputs its corresponding attribute information. For example, in one specific application, the output layer of the attribute detection model may include three attribute branches, which respectively output the above orientation attribute, head-up attribute and sharpness attribute. Each block is composed of convolution layers, and in practical application the number and architecture of the convolution layers in a block can be adjusted flexibly according to actual requirements.
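One possible PyTorch realization of this structure is sketched below. The torchvision ResNet18 backbone and the pooling details are assumptions; only the three three-way branches follow the text.

```python
import torch.nn as nn
from torchvision.models import resnet18

class AttributeDetector(nn.Module):
    """ResNet18 backbone with one classification branch per head attribute."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)  # trained from scratch here
        # Keep everything up to (and including) the global average pool.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.orientation = nn.Linear(512, 3)  # side / front / back
        self.head_up = nn.Linear(512, 3)      # head down / level / head raised
        self.sharpness = nn.Linear(512, 3)    # too dark / clear / non-human head

    def forward(self, x):                     # x: (N, 3, 128, 128)
        feat = self.features(x).flatten(1)    # (N, 512)
        return self.orientation(feat), self.head_up(feat), self.sharpness(feat)
```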
In this embodiment, each attribute branch may be regarded as a classifier. For example, the branch for the orientation attribute can be regarded as a classifier over three classes: side, front and back. Accordingly, in the training process, an image training sample set can be obtained in which each image sample carries an attribute tag, and the attribute tag at least characterizes the pose and sharpness of the image sample. For example, in one practical application scenario, the attribute tag of an image sample may include sub-tags for three different aspects: the orientation attribute, the head-up attribute and the sharpness attribute. The label value of a sub-tag can be expressed in a preset manner. For example, for the orientation attribute, a label value of 0 may represent the side, 1 the front and 2 the back. Similarly, for the sharpness attribute, a label value of 0 may represent image brightness too dark, 1 clear image quality and 2 a non-human-head image. Of course, the above setting of label values is only one feasible manner; in practical application, label values can be set flexibly as needed, as long as they represent the different classification results under each attribute branch.
In this embodiment, after the labeled image samples are obtained, the image features of an image sample may be extracted by the attribute detection model. For example, the image sample may be sampled by the convolutional layer and the first three blocks in fig. 3, thereby extracting the corresponding image features. The extracted image features may then be fed separately into the different attribute branches, so that the image features are processed by the plurality of attribute branches to generate an attribute prediction result for each branch. The attribute prediction result may be the classification result under that attribute branch.
For the attribute prediction result of any attribute branch, the prediction result is compared with the corresponding tag value in the attribute tag, thereby obtaining the error between the two. For example, for the orientation-attribute branch, if the tag value of the input image sample indicates that the orientation is the front while the generated attribute prediction result indicates the side, an error between the two can be produced through a loss function.
In this embodiment, each attribute branch may obtain its corresponding error through its own loss function (for example, a cross-entropy loss function). The errors of all attribute branches may then be aggregated into a total error, and the attribute detection model may be corrected based on the aggregated error.
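The error aggregation might then read as in the sketch below; the dictionary layout of the labels is an assumption.

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def attribute_loss(model, images, labels):
    """labels holds one integer class tensor per attribute branch."""
    logits_orient, logits_up, logits_sharp = model(images)
    # Sum the per-branch errors into the total used to correct the model.
    return (criterion(logits_orient, labels["orientation"])
            + criterion(logits_up, labels["head_up"])
            + criterion(logits_sharp, labels["sharpness"]))
```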
From the above description, each attribute branch may be regarded as a classifier, and the result output by a classifier is typically a probability vector containing the probability values of the respective classes. For example, for the orientation-attribute branch, the output probability vector may include three probability values corresponding to the side, front and back respectively. The class with the maximum probability value is taken as the final classification result. For example, if the probability vector output by the orientation-attribute branch is (0.1, 0.7, 0.2), the maximum probability value is 0.7, which corresponds to the front, so the final classification result indicates that the predicted orientation of the head in the input head region image is the front.
It can be seen that the attribute prediction result of the attribute branch is a probability vector, and a plurality of probability values contained in the probability vector are respectively in one-to-one correspondence with a plurality of attribute categories of the attribute branch. After the attribute prediction results of the attribute branches are generated, the maximum probability value can be identified from probability vectors characterized by the attribute prediction results of the attribute branches, and the attribute category corresponding to the maximum probability value is used as the classification result of the attribute branches.
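Using the probability vector from the example above, the argmax step is simply:

```python
import numpy as np

ORIENTATION_CLASSES = ["side", "front", "back"]

probs = np.array([0.1, 0.7, 0.2])        # output of the orientation branch
predicted = ORIENTATION_CLASSES[probs.argmax()]
assert predicted == "front"              # 0.7 is the maximum probability value
```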
In practical applications, to balance the calculation speed and accuracy of the model, the size of the head region image of the input attribute detection model may be uniformly scaled to 128×128 resolution. By performing the processing of the different attribute branches on the input head region image, the head attribute information characterizing the head region image can be finally obtained.
It should be noted that, in some application scenarios, when the head region image meets the detection size condition, the blurriness of the head region image needs to be judged further. If the judgment indicates that the head region image is too blurry, the current pedestrian image frame also needs to be discarded; only when the blurriness of the head region image meets the requirement is the head attribute information of the head region image detected further.
In this embodiment, a mature blur-degree evaluation method from the related art may be employed when judging the blurriness of the head region image. For example, the Laplacian operator and the meanStdDev function in the OpenCV library may be used for the blurriness judgment.
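A minimal sketch of such a check, combining the Laplacian operator with meanStdDev as named above; the threshold is a placeholder value, not taken from the text.

```python
import cv2

def is_too_blurry(head_img_bgr, threshold=100.0):
    """Low variance of the Laplacian response indicates a blurry image."""
    gray = cv2.cvtColor(head_img_bgr, cv2.COLOR_BGR2GRAY)
    lap = cv2.Laplacian(gray, cv2.CV_64F)
    _, std = cv2.meanStdDev(lap)
    return float(std[0][0]) ** 2 < threshold
```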
S5: and if the head attribute information meets the preset attribute requirement, detecting the outline information of the target object in the head area image, and generating a mask recognition result based on the outline information of the target object.
In this embodiment, after the head attribute information of the head region image is generated, the head region images whose head attribute information satisfies the preset attribute requirement may be screened out. A head region image meeting the preset attribute requirement may refer to one whose orientation attribute is the front, whose head-up attribute is a level head, and whose sharpness attribute is clear image quality. When the head attribute information satisfies this preset attribute requirement, the contour information of the target object in the head region image can be detected further; if the head attribute information does not satisfy the preset attribute requirement, the current pedestrian image frame can be discarded.
As shown in fig. 4, if the head region image does not meet the detection size condition, if it is too blurry, or if its head attribute information does not meet the preset attribute requirement, the pedestrian image frame to be detected is discarded, the next pedestrian image frame to be detected is acquired, the head region image in that frame is identified, and the subsequent judgment and processing are carried out on it.
This processing aims to filter the unqualified pedestrian image frames out of the monitoring video: on one hand, it guarantees and improves the accuracy of mask identification; on the other hand, unqualified pedestrian image frames can be filtered out quickly, simplifying the data processing.
In this embodiment, when the head region image satisfies the above-described series of requirements, the head region image may be finally fed into the mask recognition model, so that the contour information of the target object in the head region image is output through the mask recognition model. Wherein the target object may be a nose or a mask. By identifying the contours of the nose and the mask in the head region image, whether the mask is worn by the face in the head region image and whether the mask is worn in compliance with the specifications can be determined.
Specifically, HRNet can be used to construct the above mask recognition model. HRNet may include a network hierarchy of multiple stages. For example, in fig. 5, the mask recognition model may include a network hierarchy of 4 stages, where each stage may be implemented in the conventional HRNet manner (e.g., with different resolutions and different numbers of convolution layers). The head region image is processed through the HRNet, which finally outputs the contour information of the region of interest. In the training stage, a mask recognition image sample set is used to train the mask recognition model, and each image sample carries annotated contour information. The contour information may be classified according to the target object: for example, the contour of the nose and the contour of the mask may both be labeled in an image sample, with the nose contour treated as the contour information of a first class of target object and the mask contour as that of a second class. In practical application, the contour information of the different classes of target objects can be obtained by polygon labeling with the labelme annotation tool. The label of an image sample thus characterizes both the class of each target object and its contour information, where the contour information is represented by the positions of the polygons marked in the image sample. For example, the label of an image sample may be expressed as {category 1: contour information 1; category 2: contour information 2}, where category 1 may represent the nose and category 2 the mask, and contour information 1 and contour information 2 represent their respective contours. Of course, some image samples may carry contour information for only one category. For example, for an image sample without a mask, the label may contain only the contour information of the nose; for an image sample in which the mask is worn according to specification, the label may contain only the contour information of the mask (the nose being covered by the mask).
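For illustration, the labelme polygons might be rasterized into per-class training masks as sketched below; the JSON field names follow labelme's standard format, and the class-id mapping mirrors the {category 1: nose; category 2: mask} example above.

```python
import json
import cv2
import numpy as np

CLASS_IDS = {"nose": 1, "mask": 2}  # category 1: nose, category 2: mask

def labelme_to_mask(json_path, height, width):
    """Rasterize labelme polygon annotations into a single-channel class mask."""
    with open(json_path, encoding="utf-8") as f:
        ann = json.load(f)
    mask = np.zeros((height, width), dtype=np.uint8)
    for shape in ann["shapes"]:
        if shape["label"] in CLASS_IDS:
            pts = np.array(shape["points"], dtype=np.int32)
            cv2.fillPoly(mask, [pts], CLASS_IDS[shape["label"]])
    return mask
```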
In the present embodiment, the obtained image samples may be expanded by certain means. For example, considering the colors (blue, gray, red, white, etc.) and styles (N95, medical mask, etc.) that a mask may have, masks of different colors and styles should be included in the mask recognition image sample set. In the training process, data enhancement such as random-angle rotation, horizontal flipping, blur disturbance and color disturbance of the mask region can be applied to some image samples, generating richer image samples and providing data support for the subsequent training.
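The enhancement steps named above might be sketched as follows; the probabilities and parameter ranges are assumptions, and the mask-region color disturbance is omitted for brevity.

```python
import random
import cv2

def augment(image, mask):
    """Random-angle rotation, horizontal flipping and blur disturbance,
    applied consistently to the image and its annotation mask."""
    h, w = image.shape[:2]
    if random.random() < 0.5:  # random-angle rotation
        M = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-15, 15), 1.0)
        image = cv2.warpAffine(image, M, (w, h))
        mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    if random.random() < 0.5:  # horizontal flipping
        image, mask = cv2.flip(image, 1), cv2.flip(mask, 1)
    if random.random() < 0.3:  # blur disturbance
        image = cv2.GaussianBlur(image, (5, 5), 0)
    return image, mask
```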
In the present embodiment, after training the mask recognition model, the mask recognition model can accurately output the contour information of the target object for any of the input head region images. The final mask recognition result can be obtained by recognizing the outline information of the target object.
Specifically, if the contour information of the target object includes only the contour information of the mask, and the contour area represented by that contour information is greater than a specified area threshold, it may be determined that the pedestrian in the pedestrian image frame wears the mask according to specification.
If the contour information of the target object includes both the contour information of the nose and the contour information of the mask, and the contour area represented by the mask contour is greater than the specified area threshold, it may be determined that the pedestrian in the pedestrian image frame does not wear the mask according to specification (the nose is exposed).
If the contour information of the target object only includes contour information of the nose, it may be determined that the mask is not worn by the pedestrian in the pedestrian image frame.
The area threshold is set in order to reduce misjudgment by the model. Specifically, when no mask is worn, the model may still produce an erroneous segmentation and thus output a wrong recognition result: when performing contour segmentation on such an image, the model may segment a very small region as mask contour information, and misjudgment would result if the contour area were not checked further.
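Putting the three judgment rules and the area check together, a sketch might read as follows; the threshold value is a placeholder, since the text only requires "a specified area threshold".

```python
import cv2

AREA_THRESHOLD = 400.0  # placeholder; the text fixes no concrete value

def mask_result(nose_contour, mask_contour):
    """Apply the three judgment rules to the detected contours (None if absent)."""
    mask_large = (mask_contour is not None
                  and cv2.contourArea(mask_contour) > AREA_THRESHOLD)
    if mask_contour is not None and nose_contour is None and mask_large:
        return "mask worn according to specification"
    if mask_contour is not None and nose_contour is not None and mask_large:
        return "mask worn, but not according to specification"
    if nose_contour is not None and mask_contour is None:
        return "no mask worn"
    return "undetermined"  # e.g. mask contour below the area threshold
```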
According to the technical scheme provided by one or more embodiments of the present disclosure, after the pedestrian image frame to be detected is acquired, the head region image in the image frame can be identified. In the prior art, after an image frame containing a face is captured, the image frame is classified by a classification model to determine whether the face in it wears a mask. This detection mode ensures high recognition efficiency, but in certain scenes it easily produces false recognition results because the image frame is of poor quality or the face in it is occluded. That is, although the mask recognition scheme in the related art can generate a mask recognition result for every image frame in the monitoring video, the error rate of those results is relatively high. In view of this, in order to ensure the accuracy of mask recognition, the present disclosure may verify, after recognizing the head region image in an image frame, that the head region image meets a detection size condition. If the head region image is too small, mask identification proceeds with the next image frame in the monitoring video, avoiding an inaccurate mask identification result. When the detection size condition is satisfied, the head attribute information corresponding to the head region image may be further generated. The head attribute information may characterize the pose and sharpness of the head in the head region image, both of which play a key role in the subsequent mask recognition. Screening the pedestrian image frames with the head attribute information therefore further ensures the accuracy of the subsequent mask identification.
When the head attribute information meets the preset attribute requirement, it indicates that the sharpness of the head region image is sufficient and that the pose of the face also meets the mask recognition requirement, so the contour information of the target object in the head region image can be detected. The target object may be a nose or a mask; by detecting the contour information of the nose or the mask, it can be effectively distinguished whether the face in the head region image wears a mask and whether the mask is worn according to specification.
Therefore, by identifying the head region and the head attributes, the image quality used for mask identification can be guaranteed, improving the accuracy of mask identification.
In an actual mask recognition scene, a monitoring camera usually covers a monitoring area and generates a corresponding monitoring video by filming the pedestrians in that area. While a pedestrian walks through the monitored area, the camera continuously captures image frames containing that pedestrian, so the final monitoring video contains a large number of such frames. For example, if the camera captures 25 frames per second and the pedestrian stays in the monitored area for 10 seconds, 250 frames containing the pedestrian are obtained. Although the technical scheme filters out some image frames that do not meet the requirements, with this many frames available an image frame that does meet the requirements can eventually be found, and an accurate mask recognition result can be output based on it. That is, when a pedestrian has just entered the monitoring area, the technical scheme provided by the present disclosure may not immediately give a mask recognition result if the captured image frames do not meet the requirements. However, as the pedestrian travels through the monitored area, there is a high probability that a qualifying image frame will be captured, and an accurate mask recognition result is finally given.
Therefore, in actual application scenarios, the technical scheme provided by the disclosure maintains high mask recognition efficiency while guaranteeing pedestrian coverage. At the same time, it ensures that the output mask recognition result has high accuracy, striking a good balance between recognition efficiency and recognition accuracy in the mask recognition scene.
Referring to fig. 6, an embodiment of the present disclosure further provides a mask recognition device, including:
a head region identification unit 100 for acquiring a pedestrian image frame to be detected and identifying a head region image in the pedestrian image frame;
a head attribute identifying unit 200, configured to generate head attribute information corresponding to the head area image, where the head area image meets a detection size condition, and the head attribute information is at least used to characterize a pose and a definition of a head in the head area image;
and the mask recognition unit 300 is configured to detect contour information of a target object in the head region image if the head attribute information meets a preset attribute requirement, and generate a mask recognition result based on the contour information of the target object.
The respective units set forth in the above embodiments may be implemented by a computer chip or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
Referring to fig. 7, the disclosure further provides an electronic device, where the electronic device includes a memory and a processor, and the memory is configured to store a computer program, and when the computer program is executed by the processor, implement the mask recognition method described above.
The present disclosure also provides a computer readable storage medium for storing a computer program which, when executed by a processor, implements the mask recognition method described above.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the methods in embodiments of the present disclosure. The processor executes various functional applications of the processor and data processing, i.e., implements the methods of the method embodiments described above, by running non-transitory software programs, instructions, and modules stored in memory.
The memory may include a memory program area and a memory data area, wherein the memory program area may store an operating system, at least one application program required for a function; the storage data area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated by those skilled in the art that all or part of the flow of the above embodiment methods may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, may carry out the flow of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk (HDD), or a solid state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for embodiments of the apparatus, device and storage medium, the description is relatively simple as it is substantially similar to the method embodiments, as relevant points are found in the partial description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Although embodiments of the present disclosure have been described with reference to the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations fall within the scope as defined by the appended claims.

Claims (11)

1. A mask identification method, the method comprising:
acquiring a pedestrian image frame to be detected, and identifying a head region image in the pedestrian image frame;
generating head attribute information corresponding to the head region image under the condition that the head region image meets the detection size condition, wherein the head attribute information is at least used for representing the pose and definition of the head in the head region image;
and if the head attribute information meets the preset attribute requirement, detecting the outline information of the target object in the head area image, and generating a mask recognition result based on the outline information of the target object.
2. The method of claim 1, wherein the head region image in the pedestrian image frame is identified by a trained human head detection model, and the human head detection model is trained in the following manner:
acquiring a human head detection training sample set, wherein each training sample in the human head detection training sample set is provided with a marked head region detection frame;
in the human head detection model, clustering the marked head region detection frames to generate a specified number of priori detection frames;
and training the human head detection model according to the prior detection frame.
3. The method of claim 2, wherein in training the human head detection model according to the a priori detection frame, the method further comprises:
and adjusting the image input size of the human head detection model so that the adjusted image input size is consistent with the image size of the training sample.
4. The method according to claim 1, wherein the method further comprises:
if the head region image does not meet the detection size condition or the head attribute information does not meet the preset attribute requirement, discarding the pedestrian image frame to be detected, acquiring a next pedestrian image frame to be detected, and identifying a head region image in the next pedestrian image frame to be detected.
5. The method of claim 1, wherein the head attribute information is identified by a trained attribute detection model, and the attribute detection model is trained in the following manner:
acquiring an image training sample set, wherein each image sample in the image training sample set is provided with an attribute tag, and the attribute tag is at least used for representing the pose and definition of the image sample;
extracting image features of the image sample through an attribute detection model, and processing the image features through a plurality of attribute branches to generate attribute prediction results of the attribute branches;
comparing the attribute prediction result with a label value in the attribute label aiming at the attribute prediction result of any attribute branch to obtain an error between the attribute prediction result and the label value;
and counting errors corresponding to the attribute branches, and correcting the attribute detection model based on the counted errors.
6. The method according to claim 5, wherein the attribute prediction result of the attribute branch is a probability vector, and a plurality of probability values contained in the probability vector are respectively in one-to-one correspondence with a plurality of attribute categories of the attribute branch;
after generating the attribute prediction result of each of the attribute branches, the method further includes:
and identifying the maximum probability value from the probability vector represented by the attribute prediction result of the attribute branch, and taking the attribute category corresponding to the maximum probability value as the classification result of the attribute branch.
7. The method according to claim 1, wherein the profile information of the target object is detected by a trained mask recognition model, wherein a mask recognition image sample set is used for training the mask recognition model, each image sample having annotated profile information, wherein the profile information at least characterizes a category of the target object and a profile area of the target object, and the category of the target object is nose or mask.
8. The method according to claim 1 or 7, wherein generating a mask recognition result based on the profile information of the target object includes:
if the outline information of the target object only comprises the outline information of the mask, and the outline area represented by the outline information of the mask is larger than a specified area threshold, judging that the pedestrian in the pedestrian image frame wears the mask according to specification;
if the outline information of the target object comprises the outline information of the nose and the outline information of the mask, and the outline area represented by the outline information of the mask is larger than the specified area threshold, judging that the pedestrian in the pedestrian image frame does not wear the mask according to specification;
and if the outline information of the target object only comprises the outline information of the nose, judging that the mask is not worn by the pedestrian in the pedestrian image frame.
9. A mask identification device, the device comprising:
a head region identification unit for acquiring a pedestrian image frame to be detected and identifying a head region image in the pedestrian image frame;
the head attribute identification unit is used for generating head attribute information corresponding to the head area image under the condition that the head area image meets the detection size condition, wherein the head attribute information is at least used for representing the pose and definition of the head in the head area image;
and the mask identification unit is used for detecting the outline information of the target object in the head region image if the head attribute information meets the preset attribute requirement and generating a mask identification result based on the outline information of the target object.
10. An electronic device comprising a memory and a processor, the memory for storing a computer program which, when executed by the processor, implements the method of any of claims 1 to 8.
11. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202310077871.2A 2023-01-13 2023-01-13 Mask identification method, mask identification device, mask identification equipment and storage medium Pending CN116370151A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310077871.2A (priority date 2023-01-13, filing date 2023-01-13): Mask identification method, mask identification device, mask identification equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310077871.2A (priority date 2023-01-13, filing date 2023-01-13): Mask identification method, mask identification device, mask identification equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116370151A (en) 2023-07-04

Family

ID=86964530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310077871.2A Pending CN116370151A (en) 2023-01-13 2023-01-13 Mask identification method, mask identification device, mask identification equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116370151A (en)

Similar Documents

Publication Publication Date Title
CN110163114B (en) Method and system for analyzing face angle and face blurriness and computer equipment
WO2022036953A1 (en) Defect detection method and related apparatus, device, storage medium, and computer program product
JP6330385B2 (en) Image processing apparatus, image processing method, and program
US9542751B2 (en) Systems and methods for reducing a plurality of bounding regions
US7970180B2 (en) Method, apparatus, and program for processing red eyes
US20210133980A1 (en) Image processing apparatus, training apparatus, image processing method, training method, and storage medium
KR101781358B1 (en) Personal Identification System And Method By Face Recognition In Digital Image
US20160092726A1 (en) Using gestures to train hand detection in ego-centric video
US20150262068A1 (en) Event detection apparatus and event detection method
US20220215560A1 (en) Method and device for tracking multiple target objects in motion state
KR100735549B1 (en) Method and apparatus for conversion of skin color of image
CN111368758A (en) Face ambiguity detection method and device, computer equipment and storage medium
KR20070016849A (en) Method and apparatus for serving prefer color conversion of skin color applying face detection and skin area detection
US8498496B2 (en) Method and apparatus for filtering red and/or golden eye artifacts
CN112800978A (en) Attribute recognition method, and training method and device for part attribute extraction network
CN111160107A (en) Dynamic region detection method based on feature matching
CN110321778B (en) Face image processing method and device and storage medium
CN112581481A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110348353B (en) Image processing method and device
CN112347967A (en) Pedestrian detection method fusing motion information in complex scene
CN113012030A (en) Image splicing method, device and equipment
CN111160340A (en) Moving target detection method and device, storage medium and terminal equipment
CN111179287A (en) Portrait instance segmentation method, device, equipment and storage medium
CN116370151A (en) Mask identification method, mask identification device, mask identification equipment and storage medium
CN113435358B (en) Sample generation method, device, equipment and program product for training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination