CN115457624B - Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features - Google Patents

Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features

Info

Publication number
CN115457624B
Authority
CN
China
Prior art keywords
face
mask
features
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210990521.0A
Other languages
Chinese (zh)
Other versions
CN115457624A (en)
Inventor
陈岸明
温峻峰
林群雄
洪小龙
孙全忠
李鑫
杜海江
罗海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Tianwang Guangdong Technology Co ltd
Original Assignee
Zhongke Tianwang Guangdong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Tianwang Guangdong Technology Co ltd filed Critical Zhongke Tianwang Guangdong Technology Co ltd
Priority to CN202210990521.0A
Publication of CN115457624A
Application granted
Publication of CN115457624B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/169: Holistic features and representations, i.e. based on the facial image taken as a whole
    • G06N 3/04: Neural network architecture, e.g. interconnection topology
    • G06N 3/08: Neural network learning methods
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/30: Noise filtering
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the feature extraction level of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/172: Classification, e.g. identification
    • Y02T 10/40: Engine management systems (climate change mitigation tag for road transport)


Abstract

The embodiment of the invention discloses a face recognition method, device, equipment and medium for mask-wearing faces in which local face features and whole face features are cross-fused. An image of the user to be detected is acquired by a front-end camera; the image is input into a mask face detection model, which outputs position information for the face and the mask; the image is cropped according to this position information to obtain the complete face region and the face region not occluded by the mask, and image denoising or enhancement is applied; the face region and the unoccluded region are input into a mask face feature extraction network, where the face region enters a main-path network to extract whole contour features and the unoccluded region enters a branch-path network to extract local features; a feature fusion module integrates the two kinds of information and outputs the fused mask face features; finally, the mask face features are input into a classifier to obtain the identification result for the identity of the user to be detected. By introducing a mask localization task, the detection network can locate the face region and the mask region at the same time.

Description

Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features
Technical Field
The embodiment of the invention relates to the technical field of face recognition, in particular to a face recognition method, device, equipment and medium for mask-wearing faces based on cross fusion of local and whole face features.
Background
Driven by deep learning theory, face recognition technology has made breakthroughs and matured steadily, and is widely applied in daily life, for example in face payment, community access control and station security inspection systems. However, these applications currently often require users to operate under controlled conditions; under unconstrained conditions such as poor ambient lighting or face occlusion, the accuracy of face recognition is often compromised.
Masked face recognition belongs to the category of occluded face recognition, and aims to solve the drop in recognition accuracy caused by the reduction of usable facial features under mask occlusion. Currently, there are two common ideas: the first is to make effective use of the facial features of the non-occluded area, and the second is to restore the occluded area, i.e. the inherent features under the mask. Because the occlusion position caused by a mask is relatively fixed, the former approach is effective, but it ignores the contribution of the whole facial features to recognition; the latter usually employs a generative adversarial network to repair the image from both local and global aspects, but such methods are often difficult to train and it is hard to ensure consistency between the repaired area and the other areas. Researchers have recently proposed feature-fusion-based methods that use human context information to assist face recognition; such models are more stable, and the fused features can effectively reduce the influence of mask occlusion on classification. However, how to enhance the discriminability of the fused information, reduce its redundancy, and put forward a more efficient and accurate masked-face recognition method is still the problem to be considered next.
Disclosure of Invention
The embodiment of the invention aims to provide a face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features, so as to solve the problems described in the background art.
In order to achieve the above purpose, the embodiment of the present invention mainly provides the following technical solution: a face recognition method for mask-wearing faces with cross fusion of local face features and whole face features, characterized by comprising the following steps:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
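Taken together, the five steps above form a single inference pipeline. A minimal sketch of that flow is given below; the function and argument names are illustrative assumptions only, and the cropping rule anticipates the detailed description of the preprocessing step.

```python
def recognize_masked_face(image, detector, extractor, classifier):
    """Hypothetical glue code for the five steps above (names are illustrative).

    image:      H x W x 3 array captured by the front-end camera (step 1)
    detector:   mask face detection model returning face and mask boxes (step 2)
    extractor:  mask face feature extraction network with main and branch paths (step 4)
    classifier: identity classifier trained on registered users (step 5)
    """
    # Step 2: predicted boxes as (x1, y1, x2, y2) pixel coordinates
    face_box, mask_box = detector(image)
    fx1, fy1, fx2, fy2 = face_box
    mask_top = mask_box[1]  # upper edge of the mask region

    # Step 3: whole-face crop plus the unoccluded region above the mask's top edge;
    # denoising / enhancement of both crops would be applied at this point
    face_region = image[fy1:fy2, fx1:fx2]
    eyebrow_region = image[fy1:mask_top, fx1:fx2]

    # Step 4: fused mask face features from the two parallel paths
    fused_features = extractor(face_region, eyebrow_region)

    # Step 5: match the fused features against registered identities
    return classifier(fused_features)
```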
Preferably, the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to the face and mask area;
the context awareness module is composed of a context awareness module and a CBAM awareness module. The context sensing module is provided with three branches, wherein the three branches respectively comprise 1, 2 and 3 multiplied by 3 convolution kernels, and the output results of the three branches are combined into a feature map through channel cascading operation and are input into the CBAM attention module;
the mask face detection model is trained by face data of a wearer wearing a mask. Each image in the face data has a tag file annotated with the face position and mask position information. After inputting the image into the model, the model outputs a corresponding prediction result according to the extracted characteristics, wherein the prediction result comprises the coordinates of a face and the confidence of the face, the coordinates of a mask and the confidence of the mask, and a loss value between the prediction result and a true value in a tag file is calculated through a preset mask face loss function, so that the loss value is reduced as an optimization target, and the mask face detection model is trained;
the mask face loss function adopts multitasking loss, and consists of face position offset loss and confidence loss, and mask position offset loss and confidence loss, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence penalty term factor and β represents a position offset penalty term factor.
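The expression of the loss is given in the original figure and is not reproduced here. Under the stated decomposition, one plausible reading is L = α·(confidence losses) + β·(position offset losses), with the offset terms gated by the face/mask presence indicators; the sketch below follows that reading purely as an assumption, using binary cross-entropy and smooth-L1 as common SSD-style choices for the two kinds of terms.

```python
import torch.nn.functional as F

def mask_face_loss(pred, target, alpha=1.0, beta=1.0):
    """Hypothetical multitask loss with face/mask confidence and box-offset terms.

    pred:   dict with 'face_conf', 'mask_conf' in [0, 1] and 'face_box', 'mask_box'
    target: dict with 'has_face', 'has_mask' (0./1. tensors) and ground-truth boxes
    """
    l_conf = (F.binary_cross_entropy(pred['face_conf'], target['has_face'])
              + F.binary_cross_entropy(pred['mask_conf'], target['has_mask']))

    # offset losses only contribute when the corresponding object is present
    l_loc = (target['has_face'] * F.smooth_l1_loss(pred['face_box'], target['face_box'])
             + target['has_mask'] * F.smooth_l1_loss(pred['mask_box'], target['mask_box']))

    return alpha * l_conf + beta * l_loc
```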
Preferably, the user image to be detected is cut according to the output result of the mask face detection model. And cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line. And after cutting, obtaining corresponding face area and eyebrow area images. In order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement processing.
Preferably, the face area and eyebrow area images are input into the mask face feature extraction network to obtain the mask face features; in order to utilize all describable characteristics in the mask face image as much as possible, the mask face feature extraction network adopts a parallel design of a main-path network and a branch-path network, respectively used for extracting the whole contour features and the local eyebrow features of the face, and finally the mask face features are obtained through a whole-local feature fusion module; in order to ensure feature extraction efficiency, the main network and the branch network adopt two lightweight networks, namely an InceptionV3 network and a MobileNet network. Meanwhile, in order to make the main network pay more attention to the contour and appearance characteristics of the face, a CBAM attention module is connected after the InceptionV3. The main network and the branch network are finally connected to the whole-local feature fusion module;
the whole-local feature fusion module is provided with two stages, wherein the first stage is an information interaction stage, the second stage is an information integration stage, and the whole outline features respectively output by a main network and a branch network are set as F o ∈R C×H×W The local eyebrow is characterized by F l ∈R C×H×W Wherein R is C×H×W The spatial dimension representing the feature map is composed of the channel number C, the height H and the width W, respectively, when F o And F l After being input into the integral-local feature fusion module, the mask facial features F are obtained through an information interaction stage and an information integration stage in sequence m ∈R C×H×W
In the information interaction stage, the integral outline feature F o And local eyebrow feature F l After the channel dimensions are compressed through a 1X 1 convolution kernel respectively, the two are combined into a feature F by using a bilinear fusion operation b ∈R C×H×W Feature F b And feature F o And F l After channel cascade, the weight W is obtained by a 1X 1 convolution kernel and softmax function o ∈R C×H×W And W is l ∈R C×H×W . Weight W o And W is l Respectively with characteristic F o And F l Multiplying, and respectively passing through a 1×1 convolution kernel to obtain feature F o ′∈R C×H×W And F l ′∈R C×H×W . Finally, F b 、F o ' and F l The three characteristics are subjected to channel cascade to obtain an output characteristic F of the first stage s1 ∈R 3C ×H×W . The above process can be expressed by the following formula:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation;
in the information integration stage, feature F s1 Respectively into an identity branch and a residual branch. The identity branch is composed of only one 1×1 convolution kernel, and the residual branch is composed of one 1×1 convolution kernel, one 3×3 depth separable convolution kernel, a ReLU activation function and one 1×1 convolution kernel which are connected in sequence. Adding the outputs of the identity branch and the residual branch, and regularizing to obtain the fused mask face feature F m The method comprises the steps of carrying out a first treatment on the surface of the The above procedure can be represented by the following formula:
F indentity =conv 1×1 (F s1 )
F residual =conv 1×1 (relu(DWconv 3×3 (conv 1×1 (F s1 ))))
wherein F is identity ∈R C×H×W And F residual ∈R C×H×W Output characteristics of the identity branch and the residual branch are respectively represented; DWconv 3×3 (. Cndot.) represents a 3 x 3 depth separable convolution operation; reLU (·) represents a ReLU activation function; norm (·) represents a regularization operation;
the mask face feature extraction network is used for training face data of a mask, and is used for expanding an existing face recognition dataset of the mask, a GAN network is used for generating the mask for the disclosed face recognition dataset so as to simulate the face dataset of the mask and increase sample diversity. And simultaneously, the data set is subjected to similar clipping processing to obtain images of the face area and the eyebrow area and corresponding character identity labels. In the training stage, the tail end of the mask face feature extraction network is connected with a full connection layer, classification is carried out according to the extracted mask face features, a ternary group loss function is adopted to calculate a loss value between a classification result and a true value, and the network is trained by taking the loss value reduction as a target to enable the network to be converged. And in the model operation stage, extracting the mask face characteristics of the user to be detected from the input face area and eyebrow area images by using a mask face characteristic extraction network without a full connection layer.
Preferably, the mask face features of the user to be tested are input into a trained classifier to obtain a prediction of the user's identity. The classifier is trained in advance on registered user samples whose features have been extracted by the mask face feature extraction network, so that it can correctly match a user's identity information according to similar mask face features.
A face recognition system for a wearer's mask, the system comprising:
the user image acquisition module is used for acquiring image data of a user to be detected so as to identify the identity; meanwhile, the system is also used for collecting images of registered users so as to train a classifier for matching identity information;
the mask face detection module is used for accurately positioning a face area and a mask area in a user image and outputting position information of the corresponding area;
the image preprocessing module is used for cutting the user image according to the position information output by the mask face detection module and preprocessing such as denoising and enhancing the cut image;
the mask face feature extraction module is used for extracting integral outline features and local eyebrow and eye features from the face region and the eyebrow and eye region images respectively, and obtaining robust mask face features after information of the integral outline features and the local eyebrow and eye features are fused;
and the mask face matching module is used for classifying according to the extracted mask face characteristics and matching the identities of the registered characters in the library.
The computer equipment comprises a processor and a memory for storing a program executable by the processor, wherein the facial mask face recognition method is realized when the processor executes the program stored by the memory.
A storage medium storing a program which, when executed by a processor, implements the face recognition method for a wearer's face.
The technical scheme provided by the embodiment of the invention has at least the following advantages:
1. according to the invention, a mask localization task is introduced on the basis of the existing face detection network, and through multitask learning the network can detect the face region and the mask region at the same time, providing a good foundation for the subsequent feature extraction task;
2. in order to fully utilize the identifiable information of the face wearing the mask and reduce the influence caused by shielding of the mask as much as possible, the invention fuses the whole outline feature and the local eyebrow feature of the face, thereby obtaining the robust face feature of the mask and improving the accuracy of face recognition;
3. in order to strengthen the generalization capability of the mask face feature extraction network, the invention uses a GAN network to add masks to publicly available conventional face datasets, forming a masked-face dataset and increasing the diversity and number of data samples.
Drawings
Fig. 1 is a flowchart of the face recognition method for mask-wearing faces with cross fusion of local and global face features according to embodiment 1 of the present invention.
Fig. 2 is a schematic structural diagram of the mask face detection network according to embodiment 1 of the present invention.
Fig. 3 is a schematic structural diagram of the mask face feature extraction network according to embodiment 1 of the present invention.
Fig. 4 is a schematic structural diagram of the whole-local feature fusion module according to embodiment 1 of the present invention.
Fig. 5 is a block diagram of the mask-wearing face recognition system according to embodiment 2 of the present invention.
Detailed Description
Further advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of the present invention, which is described by the following specific examples.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
Referring to fig. 1-5, four embodiments of the present invention are provided:
example 1:
in the embodiment, based on a Python programming language, a Pytorch deep learning framework is used to construct a network model structure, and training of a model is completed under a Ubuntu system. The hardware environment is Ubuntu18.04.3, and the GPU model is GeForce RTX2080T i.
As shown in fig. 1, this embodiment discloses a method, device, equipment and medium for face recognition of mask-wearing faces with cross fusion of local and whole face features; the method specifically comprises the following steps:
s101, acquiring an image of a user to be detected through a front-end camera.
The method comprises the steps of acquiring image data of a registered user by using a front-end camera in a pre-registration stage to train a classifier for matching user identities, and acquiring the image data of a current user to be tested by using the camera in a later use stage.
S102, inputting the image of the user to be detected into a mask face detection model, and outputting position information about a face and a mask;
as shown in fig. 2, the network structure of the mask face detection model consists of a trunk, a neck and a detection head. The trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts SSD algorithm, and adds a context attention module inside to make the network pay attention to the face and mask area.
Specifically, the context attention module consists of a context awareness module and a CBAM attention module. The context awareness module has three branches, which respectively comprise one, two and three 3×3 convolution kernels; the output results of the three branches are combined into one feature map through channel concatenation and input into the CBAM attention module.
The mask face detection model is trained by face data of a wearer wearing a mask. Each image in the face data has a tag file annotated with the face position and mask position information. After the image is input into the model, the model outputs corresponding prediction results according to the extracted features, wherein the prediction results comprise the coordinates (upper left corner and lower right corner) of the face, the confidence of the face, the coordinates (upper left corner and lower right corner) of the mask and the confidence of the mask. And calculating a loss value between a predicted result and a true value in a tag file through a preset mask face loss function, and training the mask face detection model by taking the loss value as an optimization target.
The mask face loss function adopts multitasking loss, and consists of position offset loss and confidence loss of the face, and position offset loss and confidence loss of the mask, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence penalty term factor and β represents a position offset penalty term factor.
S103, cutting the image of the user to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
specifically, according to the output result of the mask face detection model, the image of the user to be detected is cut. And cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line. And after cutting, obtaining corresponding face area and eyebrow area images. In order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement processing.
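A minimal sketch of this cropping rule is shown below, assuming the predicted boxes are (x1, y1, x2, y2) pixel coordinates with the origin at the top-left corner; OpenCV's non-local-means denoising is used only as one possible choice for the preprocessing step.

```python
import cv2

def crop_face_and_eyebrow(image, face_box, mask_box):
    """Crop the whole-face region and the unoccluded region above the mask.

    face_box, mask_box: (x1, y1, x2, y2) boxes predicted by the detection model;
    the mask box's top edge is used as the lower boundary of the eyebrow-eye crop.
    """
    fx1, fy1, fx2, fy2 = face_box
    mask_top = mask_box[1]

    face_region = image[fy1:fy2, fx1:fx2]
    eyebrow_region = image[fy1:mask_top, fx1:fx2]  # face area above the mask's upper edge

    # optional denoising to improve quality before feature extraction / matching
    face_region = cv2.fastNlMeansDenoisingColored(face_region)
    eyebrow_region = cv2.fastNlMeansDenoisingColored(eyebrow_region)
    return face_region, eyebrow_region
```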
S104, inputting the face area and the eyebrow area into a mask face feature extraction network, wherein the face area enters a main-path network to extract overall contour features, the area not occluded by the mask enters a branch-path network to extract local eyebrow features, and finally the two kinds of information are integrated through a feature fusion module to output the fused mask face features;
as shown in fig. 3, in order to utilize all the describable features in the mask face image as much as possible, the mask face feature extraction network adopts a parallel design of a main path network and a branch path network, which are respectively used for extracting the whole outline features and the local eyebrow features of the face, and finally the mask face features are obtained through a whole-local feature fusion module. In order to ensure the feature extraction efficiency, the main network and the branch network respectively adopt two lightweight networks, namely an acceptance V3 network and a MobileNet network. Meanwhile, in order to make the main network pay more attention to the outline and appearance characteristics of the face, a CBAM attention module is connected after the InceptionV 3. The main network and the branch network are finally connected with a whole-local feature fusion module.
As shown in fig. 4, the whole-local feature fusion module has two stages: the first is an information exchange stage and the second is an information integration stage. Let the whole contour features output by the main-path network be F_o ∈ R^{C×H×W} and the local eyebrow features output by the branch-path network be F_l ∈ R^{C×H×W}, where R^{C×H×W} denotes the spatial dimensions of the feature map, composed of the channel number C, the height H and the width W. After F_o and F_l are input into the whole-local feature fusion module, they pass through the information exchange stage and the information integration stage in sequence to obtain the mask face features F_m ∈ R^{C×H×W}.
In the information exchange stage, the whole contour feature F_o and the local eyebrow feature F_l are each passed through a 1×1 convolution kernel to compress the channel dimension, and the two results are combined into a feature F_b ∈ R^{C×H×W} by a bilinear fusion operation. The features F_o, F_b and F_l are concatenated along the channel dimension and passed through a 1×1 convolution kernel and a softmax function to obtain the weights W_o ∈ R^{C×H×W} and W_l ∈ R^{C×H×W}. The weights W_o and W_l are multiplied element-wise with the features F_o and F_l respectively, and each product is passed through a 1×1 convolution kernel to obtain the features F_o′ ∈ R^{C×H×W} and F_l′ ∈ R^{C×H×W}. Finally, the three features F_b, F_o′ and F_l′ are concatenated along the channel dimension to obtain the output feature of the first stage, F_s1 ∈ R^{3C×H×W}. The above process can be expressed by the following formulas:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation.
In the information integration stage, the feature F_s1 enters an identity branch and a residual branch respectively. The identity branch is composed of only one 1×1 convolution kernel, while the residual branch is composed of one 1×1 convolution kernel, one 3×3 depthwise separable convolution kernel, a ReLU activation function and one 1×1 convolution kernel connected in sequence. The outputs of the identity branch and the residual branch are added and regularized to obtain the fused mask face feature F_m. The above procedure can be represented by the following formulas:
F_identity = conv_1×1(F_s1)
F_residual = conv_1×1(relu(DWconv_3×3(conv_1×1(F_s1))))
F_m = Norm(F_identity + F_residual)
where F_identity ∈ R^{C×H×W} and F_residual ∈ R^{C×H×W} represent the output features of the identity branch and the residual branch, respectively; DWconv_3×3(·) represents a 3×3 depthwise separable convolution operation; relu(·) represents the ReLU activation function; Norm(·) represents the regularization operation.
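Pulling the two stages together, a PyTorch sketch of the whole-local feature fusion module might look as follows. The concrete realisation of the bilinear fusion, the direction of the softmax over the weight pair and the Norm(·) operation are not fixed by the text above, so the choices below (element-wise product of the compressed maps, pairwise softmax, batch normalisation) are assumptions.

```python
import torch
import torch.nn as nn

class WholeLocalFusion(nn.Module):
    """Two-stage whole-local feature fusion: information interaction + integration."""

    def __init__(self, channels):
        super().__init__()
        c = channels
        # stage 1: information interaction
        self.compress_o = nn.Conv2d(c, c, 1)            # 1x1 "compression" convs, kept at C channels here
        self.compress_l = nn.Conv2d(c, c, 1)
        self.weight_conv = nn.Conv2d(3 * c, 2 * c, 1)   # produces W_o and W_l from cat(F_o, F_b, F_l)
        self.proj_o = nn.Conv2d(c, c, 1)
        self.proj_l = nn.Conv2d(c, c, 1)
        # stage 2: information integration
        self.identity = nn.Conv2d(3 * c, c, 1)
        self.residual = nn.Sequential(
            nn.Conv2d(3 * c, c, 1),
            nn.Conv2d(c, c, 3, padding=1, groups=c),    # 3x3 depthwise conv (DWconv)
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 1),
        )
        self.norm = nn.BatchNorm2d(c)                   # Norm(.) assumed to be batch normalisation

    def forward(self, f_o, f_l):
        # stage 1
        f_b = self.compress_o(f_o) * self.compress_l(f_l)            # F_b ("bilinear fusion" as a product)
        w = self.weight_conv(torch.cat([f_o, f_b, f_l], dim=1))
        w_o, w_l = torch.softmax(torch.stack(w.chunk(2, dim=1)), dim=0)
        f_s1 = torch.cat([self.proj_o(f_o * w_o), f_b, self.proj_l(f_l * w_l)], dim=1)
        # stage 2: F_m = Norm(F_identity + F_residual)
        return self.norm(self.identity(f_s1) + self.residual(f_s1))
```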
The mask face feature extraction network is trained with masked-face data. In order to expand the existing masked-face recognition datasets, a GAN network is used to add masks to publicly available face recognition datasets, simulating a masked-face dataset and increasing sample diversity. The dataset is also cropped in the same way as above to obtain face area and eyebrow area images with the corresponding person identity labels. In the training stage, a fully connected layer is attached to the end of the mask face feature extraction network, classification is carried out according to the extracted mask face features, a triplet loss function is adopted to calculate the loss value between the classification result and the true value, and the network is trained with reducing the loss value as the objective until it converges. In the model operation stage, the mask face features of the user to be detected are extracted from the input face area and eyebrow area images using the mask face feature extraction network without the fully connected layer.
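The training loop can be sketched as below, with torch's built-in TripletMarginLoss standing in for the triplet loss and a linear layer as the training-time fully connected head; the margin, the anchor/positive/negative sampling strategy and the exact place where the FC projection is applied are assumptions, since only the overall recipe (GAN-augmented masked data, identical cropping, triplet loss, FC layer dropped at inference) is fixed above.

```python
import torch.nn as nn

def train_step(extractor, fc_head, optimizer, anchor, positive, negative):
    """One triplet-loss training step for the mask face feature extraction network.

    anchor/positive/negative: (face_crop, eyebrow_crop) batches, where anchor and
    positive share an identity label and negative does not; fc_head is the fully
    connected layer attached only during training.
    """
    triplet_loss = nn.TripletMarginLoss(margin=0.2)     # margin value is an assumption

    def embed(sample):
        face, eyebrow = sample
        features = extractor(face, eyebrow)             # fused mask face features F_m
        return fc_head(features.flatten(1))             # training-time FC projection

    loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```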
S105, inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
Specifically, the mask face features of the user to be tested are input into a trained classifier to obtain a prediction result of the user's identity. The classifier is trained in advance on registered user samples whose features have been extracted by the mask face feature extraction network, so that it can correctly match a user's identity information according to similar mask face features.
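One simple way to realise such a classifier is nearest-neighbour matching over the registered users' stored features, for example by cosine similarity with a rejection threshold; the sketch below assumes that design, which the text does not prescribe.

```python
import torch
import torch.nn.functional as F

def match_identity(query_feature, gallery, threshold=0.6):
    """Match a query mask face feature against registered users.

    gallery:   dict mapping user id -> feature vector extracted in advance by the
               mask face feature extraction network (without the FC layer)
    threshold: minimum cosine similarity to accept a match (assumed value)
    """
    query = F.normalize(query_feature.flatten(), dim=0)
    best_id, best_score = None, threshold
    for user_id, stored in gallery.items():
        score = torch.dot(query, F.normalize(stored.flatten(), dim=0)).item()  # cosine similarity
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id  # None means "no registered identity matched"
```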
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all illustrated operations be performed in order to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Example 2:
as shown in fig. 5, the present embodiment provides a mask-wearing face recognition system with local and whole face features fused, which includes a user image acquisition module 501, a mask face detection module 502, an image preprocessing module 503, a mask face feature extraction module 504 and a mask face matching module 505, wherein:
the user image acquisition module 501 is used for acquiring image data of a user to be detected so as to identify the identity; and the system is also used for collecting images of registered users so as to train a classifier for matching identity information.
The mask face detection module 502 is configured to accurately locate a face region and a mask region in a user image, and output position information of the corresponding region;
an image preprocessing module 503, which performs clipping processing on the user image according to the position information output by the mask face detection module, and performs preprocessing such as denoising and enhancement on the clipped image;
the mask face feature extraction module 504 is configured to extract overall contour features and local eyebrow features from the face region and the eyebrow region images, and fuse the two information to obtain robust mask face features;
the mask face matching module 505 classifies the mask face according to the extracted mask face features, and matches the identities of the persons registered in the library.
Specific implementation of each module in this embodiment may be referred to embodiment 1 above, and will not be described in detail herein; it should be noted that, in the system provided in this embodiment, only the division of the above functional modules is used as an example, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to perform all or part of the functions described above.
Example 3:
The present embodiment provides a computer device, which may be a computer, in which a processor, a memory, an input system, a display and a network interface are connected through a system bus. The processor provides computing and control capabilities; the memory includes a nonvolatile storage medium and an internal memory, the nonvolatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program in the nonvolatile storage medium. When the processor executes the computer program stored in the memory, the masked-face recognition method of the foregoing embodiment 1 is implemented, as follows:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the face recognition method for a mask of the above embodiment 1, as follows:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected. The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention in further detail, and are not to be construed as limiting the scope of the invention, but are merely intended to cover any modifications, equivalents, improvements, etc. based on the teachings of the invention.

Claims (5)

1. A face recognition method for mask-wearing faces with cross fusion of local face features and whole face features, characterized by comprising the following steps:
acquiring an image of a user to be detected through a front-end camera;
inputting the user image to be detected into a mask face detection model, outputting position information about a face and a mask, and cutting the user image to be detected according to the output result of the mask face detection model; cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line; after cutting, obtaining corresponding face area and eyebrow area images; in order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement treatment, and the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to a face area and an eye-brow area;
the context attention module consists of a context awareness module and a CBAM attention module; the context awareness module is provided with three branches, which respectively comprise one, two and three 3×3 convolution kernels, and the output results of the three branches are combined into a feature map through channel concatenation and are input into the CBAM attention module;
the mask face detection model is trained by face data of a wearer wearing a mask; each image in the face data has a label file annotated with the face position and mask position information; after inputting the image into the model, the model outputs a corresponding prediction result according to the extracted characteristics, wherein the prediction result comprises the coordinates of a face and the confidence of the face, the coordinates of a mask and the confidence of the mask, and a loss value between the prediction result and a true value in a tag file is calculated through a preset mask face loss function, so that the loss value is reduced as an optimization target, and the mask face detection model is trained;
the mask face loss function adopts multitasking loss, and consists of face position offset loss and confidence loss, and mask position offset loss and confidence loss, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence loss term factor, and β represents a position offset loss term factor;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main network to extract overall outline features, the non-mask shielding region enters a branch network to extract local eyebrow features, and finally, after integrating the two information through a feature fusion module, outputting fused mask face features, and inputting the face region and the eyebrow region images into the mask face feature extraction network to obtain mask face features; in order to utilize all descriptive characteristics in the mask face image as much as possible, the mask face characteristic extraction network adopts a parallel design of a main path network and a branch path network, and is respectively used for extracting the whole outline characteristics and the local eyebrow characteristics of the face, and finally the mask face characteristics are obtained through a whole-local characteristic fusion module; in order to ensure the feature extraction efficiency, the main network and the branch network respectively adopt two lightweight networks, namely an InceptionV3 network and a MobileNet network; meanwhile, in order to make the main network pay more attention to the outline and appearance characteristics of the face, a CBAM attention module is connected behind the InceptionV 3; the main network and the branch network are finally connected with a whole-local feature fusion module;
the whole-local feature fusion module is provided with two stages, wherein the first stage is an information interaction stage, the second stage is an information integration stage, and the whole outline features respectively output by a main network and a branch network are set as F o ∈R C×H×W The local eyebrow is characterized by F l ∈R C×H×W Wherein R is C×H×W The spatial dimension representing the feature map is composed of the channel number C, the height H and the width W, respectively, when F o And F l After being input into the integral-local feature fusion module, the mask facial features F are obtained through an information interaction stage and an information integration stage in sequence m ∈R C×H×W
In the information interaction stage, the integral outline feature F o And local eyebrow feature F l After the channel dimensions are compressed through a 1X 1 convolution kernel respectively, the two are combined into a feature F by using a bilinear fusion operation b ∈R C×H×W Feature F b And feature F o And F l After channel cascade, the weight W is obtained by a 1X 1 convolution kernel and softmax function o ∈R C×H×W And W is l ∈R C×H×W The method comprises the steps of carrying out a first treatment on the surface of the Weight W o And W is l Respectively with characteristic F o And F l Multiplication, respectively through a 1X 1 convolution kernel, to obtain the characteristic F' o ∈R C×H×W And F l ′∈R C ×H×W The method comprises the steps of carrying out a first treatment on the surface of the Finally, F b 、F′ o And F l The three characteristics are subjected to channel cascade to obtain an output characteristic F of the first stage s1 ∈R 3C×H×W The method comprises the steps of carrying out a first treatment on the surface of the The above process can be expressed by the following formula:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation;
in the information integration stage, the feature F_s1 enters an identity branch and a residual branch respectively; the identity branch is formed by only one 1×1 convolution kernel, and the residual branch is formed by sequentially connecting a 1×1 convolution kernel, a 3×3 depthwise separable convolution kernel, a ReLU activation function and a 1×1 convolution kernel; the outputs of the identity branch and the residual branch are added and regularized to obtain the fused mask face feature F_m; the above procedure can be represented by the following formulas:
F_identity = conv_1×1(F_s1)
F_residual = conv_1×1(relu(DWconv_3×3(conv_1×1(F_s1))))
F_m = Norm(F_identity + F_residual)
where F_identity ∈ R^{C×H×W} and F_residual ∈ R^{C×H×W} represent the output features of the identity branch and the residual branch, respectively; DWconv_3×3(·) represents a 3×3 depthwise separable convolution operation; relu(·) represents the ReLU activation function; Norm(·) represents a regularization operation;
the mask face feature extraction network is trained with masked-face data, and in order to expand the existing masked-face recognition dataset, a GAN network is used for generating masks for the disclosed face recognition dataset so as to simulate a masked-face dataset and increase sample diversity; simultaneously, the dataset is subjected to the same clipping processing to obtain images of the face area and eyebrow area and corresponding person identity labels; in the training stage, the tail end of the mask face feature extraction network is connected with a fully connected layer, classification is carried out according to the extracted mask face features, a triplet loss function is adopted to calculate a loss value between the classification result and the true value, and the network is trained with reducing the loss value as the objective until it converges; in the model operation stage, the mask face features of the user to be detected are extracted from the input face area and eyebrow area images by using the mask face feature extraction network without the fully connected layer;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
2. The face recognition method for mask-wearing faces with cross fusion of local and whole face features according to claim 1, wherein: the mask face features of the user to be tested are input into a trained classifier to obtain a prediction result of the identity of the user to be tested; the classifier is trained in advance with registered user samples whose features have been extracted through the mask face feature extraction network, so that the classifier can correctly match a user's identity information according to the mask face features.
3. A mask-wearing face recognition device with cross fusion of local and global face features, comprising:
the user image acquisition module is used for acquiring image data of a user to be detected so as to identify the identity; meanwhile, the system is also used for collecting images of registered users so as to train a classifier for matching identity information;
the mask face detection module is used for accurately positioning a face area and a mask area in a user image and outputting position information of the corresponding areas, and comprises a mask face detection model, wherein the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to a face area and an eye-brow area;
an image preprocessing module, used for cropping the user image according to the position information output by the mask face detection module, and for denoising and enhancing the cropped images (a preprocessing sketch follows this claim);
a mask face feature extraction module, used for extracting whole facial contour features and local eyebrow features from the face-region and eyebrow-region images respectively, and integrating them into robust mask face features; the module comprises a mask face feature extraction network that adopts a parallel design of a backbone network and a branch network, used respectively for extracting the whole facial contour features and the local eyebrow features, with the mask face features finally obtained through a whole-local feature integration module (a sketch of this parallel design follows this claim); the mask face feature extraction network is trained on mask-wearing face data, and a GAN network is adopted to expand the existing masked face recognition data sets by adding masks to a public face recognition data set, thereby simulating a masked face data set and increasing sample diversity; the data set is also cropped in the same way to obtain face-region and eyebrow-region images together with the corresponding person identity labels; in the training stage, a fully connected layer is attached to the end of the mask face feature extraction network and classification is carried out according to the extracted mask face features; a triplet loss function is used to compute the loss between the classification result and the ground truth, and the network is trained with the goal of making this loss converge; in the model operation stage, the mask face feature extraction network without the fully connected layer extracts the mask face features of the user to be identified from the input face-region and eyebrow-region images;
and a mask face matching module, used for classifying according to the extracted mask face features and matching them against the identities of the registered persons in the gallery.
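The detection module of claim 3 names its backbone (ResNet), neck (FPN) and head (SSD with a context attention module) but gives no further structural detail. The sketch below wires a ResNet-50 backbone to a torchvision FeaturePyramidNetwork to illustrate the backbone-neck part only; the detection head and the context attention module are omitted, and the output channels, input resolution and names are assumptions.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torchvision
from torchvision.ops import FeaturePyramidNetwork

class ResNetFPNTrunk(nn.Module):
    """Backbone (ResNet-50) plus neck (FPN); the SSD-style head is left out."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet50()
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4
        # Channel sizes of ResNet-50 stages C2-C5.
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], out_channels)

    def forward(self, x: torch.Tensor):
        c2 = self.layer1(self.stem(x))
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        # The FPN refines the stage outputs and aggregates semantics across levels.
        return self.fpn(OrderedDict([("c2", c2), ("c3", c3), ("c4", c4), ("c5", c5)]))

pyramid = ResNetFPNTrunk()(torch.randn(1, 3, 320, 320))  # dict of multi-scale feature maps
```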
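For the image preprocessing module, the claim only requires cropping by the detector's boxes followed by denoising and enhancement. A minimal OpenCV sketch is given below; non-local-means denoising and CLAHE enhancement are illustrative choices rather than operators specified in the patent, and the (x, y, w, h) box format is an assumption.

```python
import cv2
import numpy as np

def preprocess_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop a detected region, then denoise and enhance it."""
    x, y, w, h = box                      # box from the mask face detection module
    crop = image[y:y + h, x:x + w]
    # Denoising: non-local means on the colour crop.
    denoised = cv2.fastNlMeansDenoisingColored(crop, None, 10, 10, 7, 21)
    # Enhancement: CLAHE applied to the luminance channel only.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```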
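For the mask face feature extraction module, the sketch below illustrates the parallel backbone/branch design with a stand-in fusion step. How the whole and local features are combined into F_s1, the encoder architectures, and the embedding size are all assumptions; the whole-local integration sketch after claim 1 would replace the stand-in fusion layer here.

```python
import torch
import torch.nn as nn

def small_encoder(out_channels: int = 256) -> nn.Module:
    """Placeholder CNN; stands in for the backbone or the branch network."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(7),
    )

class MaskFaceFeatureExtractor(nn.Module):
    """Parallel design: whole facial contour features from the face region,
    local eyebrow features from the eyebrow region, then a fusion step."""

    def __init__(self, channels: int = 256, embed_dim: int = 512):
        super().__init__()
        self.backbone = small_encoder(channels)   # whole facial contour features
        self.branch = small_encoder(channels)     # local eyebrow features
        # Stand-in for the whole-local feature integration module (see the sketch after claim 1).
        self.integration = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels))
        self.embed = nn.Sequential(nn.Flatten(), nn.Linear(channels * 7 * 7, embed_dim))

    def forward(self, face_img: torch.Tensor, eyebrow_img: torch.Tensor) -> torch.Tensor:
        f_whole = self.backbone(face_img)
        f_local = self.branch(eyebrow_img)
        f_s1 = f_whole + f_local                  # additive fusion is an assumption
        return self.embed(self.integration(f_s1))

# Example: face crop and eyebrow crop of arbitrary (illustrative) sizes.
emb = MaskFaceFeatureExtractor()(torch.randn(1, 3, 112, 112), torch.randn(1, 3, 56, 112))
```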
4. A computer device comprising a processor and a memory storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the mask-wearing face recognition method with cross fusion of local and whole face features according to claim 1.
5. A storage medium storing a program which, when executed by a processor, implements the mask-wearing face recognition method with cross fusion of local and whole face features according to claim 1.
CN202210990521.0A 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features Active CN115457624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210990521.0A CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210990521.0A CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Publications (2)

Publication Number Publication Date
CN115457624A CN115457624A (en) 2022-12-09
CN115457624B true CN115457624B (en) 2023-09-01

Family

ID=84298205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210990521.0A Active CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Country Status (1)

Country Link
CN (1) CN115457624B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311553B (en) * 2023-05-17 2023-08-15 武汉利楚商务服务有限公司 Human face living body detection method and device applied to semi-occlusion image
CN116895091A (en) * 2023-07-24 2023-10-17 山东睿芯半导体科技有限公司 Facial recognition method and device for incomplete image, chip and terminal
CN116883670B (en) * 2023-08-11 2024-05-14 智慧眼科技股份有限公司 Anti-shielding face image segmentation method
CN117152575B (en) * 2023-10-26 2024-02-02 吉林大学 Image processing apparatus, electronic device, and computer-readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461028A (en) * 2020-04-02 2020-07-28 杭州视在科技有限公司 Mask detection model training and detection method, medium and device in complex scene
CN111914630A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating training data for face recognition
WO2021174880A1 (en) * 2020-09-01 2021-09-10 平安科技(深圳)有限公司 Feature extraction model training method, facial recognition method, apparatus, device and medium
JP2022053158A (en) * 2020-09-24 2022-04-05 エヌ・ティ・ティ・コミュニケーションズ株式会社 Information processor, information processing method, and information processing program
CN112597867A (en) * 2020-12-17 2021-04-02 佛山科学技术学院 Face recognition method and system for mask, computer equipment and storage medium
CN112560828A (en) * 2021-02-25 2021-03-26 佛山科学技术学院 Lightweight mask face recognition method, system, storage medium and equipment
CN113158883A (en) * 2021-04-19 2021-07-23 汇纳科技股份有限公司 Face recognition method, system, medium and terminal based on regional attention
CN113283405A (en) * 2021-07-22 2021-08-20 第六镜科技(北京)有限公司 Mask detection method and device, computer equipment and storage medium
CN113807332A (en) * 2021-11-19 2021-12-17 珠海亿智电子科技有限公司 Mask robust face recognition network, method, electronic device and storage medium
CN114220143A (en) * 2021-11-26 2022-03-22 华南理工大学 Face recognition method for wearing mask
CN114360033A (en) * 2022-03-18 2022-04-15 武汉大学 Mask face recognition method, system and equipment based on image convolution fusion network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭富海 (Guo Fuhai). Mask-wearing detection based on YOLOv4 block-wise weight pruning and its embedded implementation. China Excellent Master's Theses Full-text Database (Information Science and Technology), 2022, No. 2022(02), I138-515. *

Also Published As

Publication number Publication date
CN115457624A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN115457624B (en) Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features
Li et al. A deep learning-based hybrid framework for object detection and recognition in autonomous driving
Li et al. An automatic detection method of bird’s nest on transmission line tower based on faster_RCNN
CN111639544B (en) Expression recognition method based on multi-branch cross-connection convolutional neural network
CN109948526B (en) Image processing method and device, detection equipment and storage medium
Yang et al. Masked relation learning for deepfake detection
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
Sincan et al. Using motion history images with 3d convolutional networks in isolated sign language recognition
Tan et al. Fine-grained classification via hierarchical bilinear pooling with aggregated slack mask
He et al. Semi-supervised skin detection by network with mutual guidance
CN105893941B (en) A kind of facial expression recognizing method based on area image
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
Wu et al. Traffic sign detection based on SSD combined with receptive field module and path aggregation network
Li et al. Distracted driving detection by combining ViT and CNN
Su et al. Pose graph parsing network for human-object interaction detection
Hu et al. Hierarchical attention vision transformer for fine-grained visual classification
Agrawal et al. Multimodal vision transformers with forced attention for behavior analysis
Jing et al. SmokeSeger: a transformer-CNN coupled model for urban scene smoke segmentation
WO2023246921A1 (en) Target attribute recognition method and apparatus, and model training method and apparatus
Mohod et al. Human detection in surveillance video using deep learning approach
Li et al. Sequential interactive biased network for context-aware emotion recognition
CN116311495A (en) Dual-stream global-local action recognition method, system, equipment and storage medium based on video input
Chen et al. Object detection using dual graph network
Das et al. Pedestrian detection in thermal and color images using a new combination of saliency network and Faster R-CNN
Lin et al. A coarse-to-fine pattern parser for mitigating the issue of drastic imbalance in pixel distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant