CN115457624B - Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features - Google Patents

Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features

Info

Publication number
CN115457624B
Authority
CN
China
Prior art keywords
face
mask
features
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210990521.0A
Other languages
Chinese (zh)
Other versions
CN115457624A (en)
Inventor
陈岸明
温峻峰
林群雄
洪小龙
孙全忠
李鑫
杜海江
罗海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Tianwang Guangdong Technology Co ltd
Original Assignee
Zhongke Tianwang Guangdong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Tianwang Guangdong Technology Co ltd filed Critical Zhongke Tianwang Guangdong Technology Co ltd
Priority to CN202210990521.0A
Publication of CN115457624A
Application granted
Publication of CN115457624B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/169: Holistic features and representations, i.e. based on the facial image taken as a whole
    • G06N 3/04: Neural network architecture, e.g. interconnection topology
    • G06N 3/08: Neural network learning methods
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/30: Noise filtering
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion, i.e. combining data from various sources at the feature extraction level of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/172: Classification, e.g. identification
    • Y02T 10/40: Engine management systems (climate change mitigation tag for road transport)


Abstract

The embodiment of the invention discloses a face recognition method, device, equipment and medium for mask-wearing faces in which local face features and whole face features are cross-fused. An image of the user to be detected is acquired by a front-end camera; the image is input into a mask face detection model, which outputs position information for the face and the mask; the image is cropped according to this position information to obtain the complete face region and the face region not occluded by the mask, and image denoising or enhancement is applied; the face region and the unoccluded region are input into a mask face feature extraction network, where the face region enters a main-path network to extract whole contour features and the unoccluded region enters a branch-path network to extract local features; a feature fusion module integrates the two kinds of information and outputs the fused mask face features; finally, the mask face features are input into a classifier to obtain the identification result for the identity of the user to be detected. By introducing a mask localization task, the detection network can locate the face region and the mask region at the same time.

Description

Face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features
Technical Field
The embodiment of the invention relates to the technical field of face recognition, in particular to a face recognition method, device, equipment and medium for mask-wearing faces based on cross fusion of local and whole face features.
Background
Driven by deep learning theory, face recognition technology has made breakthroughs and matured steadily, and is widely applied in daily life, for example in face payment, community access control and station security inspection systems. However, these applications currently often require users to operate under controlled conditions; under unconstrained conditions such as poor ambient lighting or face occlusion, the accuracy of face recognition is often compromised.
Masked face recognition belongs to the category of occluded face recognition, and aims to solve the drop in recognition accuracy caused by the reduction of usable facial features under mask occlusion. Currently, there are two common ideas: the first is to make effective use of the facial features of the non-occluded area, and the second is to restore the occluded area, i.e. the inherent features under the mask. Because the occlusion position caused by a mask is relatively fixed, the former approach is effective, but it ignores the contribution of the whole facial features to recognition; the latter usually employs a generative adversarial network to repair the image from both local and global aspects, but such methods are often difficult to train and it is hard to ensure consistency between the repaired area and the other areas. Researchers have recently proposed feature-fusion-based methods that use human context information to assist face recognition; such models are more stable, and the fused features can effectively reduce the influence of mask occlusion on classification. However, how to enhance the discriminability of the fused information, reduce its redundancy, and put forward a more efficient and accurate masked-face recognition method is still the problem to be considered next.
Disclosure of Invention
The embodiment of the invention aims to provide a face recognition method, device, equipment and medium for mask-wearing faces with cross fusion of local and whole face features, so as to solve the problems described in the background art.
In order to achieve the above purpose, the embodiment of the present invention mainly provides the following technical solution: a face recognition method for mask-wearing faces with cross fusion of local face features and whole face features, characterized by comprising the following steps:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
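Taken together, the five steps above form a single inference pipeline. A minimal sketch of that flow is given below; the function and argument names are illustrative assumptions only, and the cropping rule anticipates the detailed description of the preprocessing step.

```python
def recognize_masked_face(image, detector, extractor, classifier):
    """Hypothetical glue code for the five steps above (names are illustrative).

    image:      H x W x 3 array captured by the front-end camera (step 1)
    detector:   mask face detection model returning face and mask boxes (step 2)
    extractor:  mask face feature extraction network with main and branch paths (step 4)
    classifier: identity classifier trained on registered users (step 5)
    """
    # Step 2: predicted boxes as (x1, y1, x2, y2) pixel coordinates
    face_box, mask_box = detector(image)
    fx1, fy1, fx2, fy2 = face_box
    mask_top = mask_box[1]  # upper edge of the mask region

    # Step 3: whole-face crop plus the unoccluded region above the mask's top edge;
    # denoising / enhancement of both crops would be applied at this point
    face_region = image[fy1:fy2, fx1:fx2]
    eyebrow_region = image[fy1:mask_top, fx1:fx2]

    # Step 4: fused mask face features from the two parallel paths
    fused_features = extractor(face_region, eyebrow_region)

    # Step 5: match the fused features against registered identities
    return classifier(fused_features)
```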
Preferably, the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to the face and mask area;
the context awareness module is composed of a context awareness module and a CBAM awareness module. The context sensing module is provided with three branches, wherein the three branches respectively comprise 1, 2 and 3 multiplied by 3 convolution kernels, and the output results of the three branches are combined into a feature map through channel cascading operation and are input into the CBAM attention module;
the mask face detection model is trained by face data of a wearer wearing a mask. Each image in the face data has a tag file annotated with the face position and mask position information. After inputting the image into the model, the model outputs a corresponding prediction result according to the extracted characteristics, wherein the prediction result comprises the coordinates of a face and the confidence of the face, the coordinates of a mask and the confidence of the mask, and a loss value between the prediction result and a true value in a tag file is calculated through a preset mask face loss function, so that the loss value is reduced as an optimization target, and the mask face detection model is trained;
the mask face loss function adopts multitasking loss, and consists of face position offset loss and confidence loss, and mask position offset loss and confidence loss, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence penalty term factor and β represents a position offset penalty term factor.
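The expression of the loss is given in the original figure and is not reproduced here. Under the stated decomposition, one plausible reading is L = α·(confidence losses) + β·(position offset losses), with the offset terms gated by the face/mask presence indicators; the sketch below follows that reading purely as an assumption, using binary cross-entropy and smooth-L1 as common SSD-style choices for the two kinds of terms.

```python
import torch.nn.functional as F

def mask_face_loss(pred, target, alpha=1.0, beta=1.0):
    """Hypothetical multitask loss with face/mask confidence and box-offset terms.

    pred:   dict with 'face_conf', 'mask_conf' in [0, 1] and 'face_box', 'mask_box'
    target: dict with 'has_face', 'has_mask' (0./1. tensors) and ground-truth boxes
    """
    l_conf = (F.binary_cross_entropy(pred['face_conf'], target['has_face'])
              + F.binary_cross_entropy(pred['mask_conf'], target['has_mask']))

    # offset losses only contribute when the corresponding object is present
    l_loc = (target['has_face'] * F.smooth_l1_loss(pred['face_box'], target['face_box'])
             + target['has_mask'] * F.smooth_l1_loss(pred['mask_box'], target['mask_box']))

    return alpha * l_conf + beta * l_loc
```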
Preferably, the user image to be detected is cut according to the output result of the mask face detection model. And cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line. And after cutting, obtaining corresponding face area and eyebrow area images. In order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement processing.
Preferably, the face area and eyebrow area images are input into the mask face feature extraction network to obtain the mask face features; in order to utilize all describable characteristics in the mask face image as much as possible, the mask face feature extraction network adopts a parallel design of a main-path network and a branch-path network, respectively used for extracting the whole contour features and the local eyebrow features of the face, and finally the mask face features are obtained through a whole-local feature fusion module; in order to ensure feature extraction efficiency, the main network and the branch network adopt two lightweight networks, namely an InceptionV3 network and a MobileNet network. Meanwhile, in order to make the main network pay more attention to the contour and appearance characteristics of the face, a CBAM attention module is connected after the InceptionV3. The main network and the branch network are finally connected to the whole-local feature fusion module;
the whole-local feature fusion module is provided with two stages, wherein the first stage is an information interaction stage, the second stage is an information integration stage, and the whole outline features respectively output by a main network and a branch network are set as F o ∈R C×H×W The local eyebrow is characterized by F l ∈R C×H×W Wherein R is C×H×W The spatial dimension representing the feature map is composed of the channel number C, the height H and the width W, respectively, when F o And F l After being input into the integral-local feature fusion module, the mask facial features F are obtained through an information interaction stage and an information integration stage in sequence m ∈R C×H×W
In the information interaction stage, the integral outline feature F o And local eyebrow feature F l After the channel dimensions are compressed through a 1X 1 convolution kernel respectively, the two are combined into a feature F by using a bilinear fusion operation b ∈R C×H×W Feature F b And feature F o And F l After channel cascade, the weight W is obtained by a 1X 1 convolution kernel and softmax function o ∈R C×H×W And W is l ∈R C×H×W . Weight W o And W is l Respectively with characteristic F o And F l Multiplying, and respectively passing through a 1×1 convolution kernel to obtain feature F o ′∈R C×H×W And F l ′∈R C×H×W . Finally, F b 、F o ' and F l The three characteristics are subjected to channel cascade to obtain an output characteristic F of the first stage s1 ∈R 3C ×H×W . The above process can be expressed by the following formula:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation;
in the information integration stage, feature F s1 Respectively into an identity branch and a residual branch. The identity branch is composed of only one 1×1 convolution kernel, and the residual branch is composed of one 1×1 convolution kernel, one 3×3 depth separable convolution kernel, a ReLU activation function and one 1×1 convolution kernel which are connected in sequence. Adding the outputs of the identity branch and the residual branch, and regularizing to obtain the fused mask face feature F m The method comprises the steps of carrying out a first treatment on the surface of the The above procedure can be represented by the following formula:
F indentity =conv 1×1 (F s1 )
F residual =conv 1×1 (relu(DWconv 3×3 (conv 1×1 (F s1 ))))
wherein F is identity ∈R C×H×W And F residual ∈R C×H×W Output characteristics of the identity branch and the residual branch are respectively represented; DWconv 3×3 (. Cndot.) represents a 3 x 3 depth separable convolution operation; reLU (·) represents a ReLU activation function; norm (·) represents a regularization operation;
the mask face feature extraction network is used for training face data of a mask, and is used for expanding an existing face recognition dataset of the mask, a GAN network is used for generating the mask for the disclosed face recognition dataset so as to simulate the face dataset of the mask and increase sample diversity. And simultaneously, the data set is subjected to similar clipping processing to obtain images of the face area and the eyebrow area and corresponding character identity labels. In the training stage, the tail end of the mask face feature extraction network is connected with a full connection layer, classification is carried out according to the extracted mask face features, a ternary group loss function is adopted to calculate a loss value between a classification result and a true value, and the network is trained by taking the loss value reduction as a target to enable the network to be converged. And in the model operation stage, extracting the mask face characteristics of the user to be detected from the input face area and eyebrow area images by using a mask face characteristic extraction network without a full connection layer.
Preferably, the mask face features of the user to be tested are input into a trained classifier to obtain a prediction of the user's identity. The classifier is trained in advance on registered user samples whose features have been extracted by the mask face feature extraction network, so that it can correctly match a user's identity information according to similar mask face features.
A face recognition system for a wearer's mask, the system comprising:
the user image acquisition module is used for acquiring image data of a user to be detected so as to identify the identity; meanwhile, the system is also used for collecting images of registered users so as to train a classifier for matching identity information;
the mask face detection module is used for accurately positioning a face area and a mask area in a user image and outputting position information of the corresponding area;
the image preprocessing module is used for cutting the user image according to the position information output by the mask face detection module and preprocessing such as denoising and enhancing the cut image;
the mask face feature extraction module is used for extracting integral outline features and local eyebrow and eye features from the face region and the eyebrow and eye region images respectively, and obtaining robust mask face features after information of the integral outline features and the local eyebrow and eye features are fused;
and the mask face matching module is used for classifying according to the extracted mask face characteristics and matching the identities of the registered characters in the library.
The computer equipment comprises a processor and a memory for storing a program executable by the processor, wherein the facial mask face recognition method is realized when the processor executes the program stored by the memory.
A storage medium storing a program which, when executed by a processor, implements the face recognition method for a wearer's face.
The technical scheme provided by the embodiment of the invention has at least the following advantages:
1. according to the invention, a mask localization task is introduced on the basis of the existing face detection network, and through multitask learning the network can detect the face region and the mask region at the same time, providing a good foundation for the subsequent feature extraction task;
2. in order to fully utilize the identifiable information of the face wearing the mask and reduce the influence caused by shielding of the mask as much as possible, the invention fuses the whole outline feature and the local eyebrow feature of the face, thereby obtaining the robust face feature of the mask and improving the accuracy of face recognition;
3. in order to strengthen the generalization capability of the mask face feature extraction network, the invention uses a GAN network to add masks to publicly available conventional face datasets, forming a masked-face dataset and increasing the diversity and number of data samples.
Drawings
Fig. 1 is a flowchart of the face recognition method for mask-wearing faces with cross fusion of local and global face features according to embodiment 1 of the present invention.
Fig. 2 is a schematic structural diagram of the mask face detection network according to embodiment 1 of the present invention.
Fig. 3 is a schematic structural diagram of the mask face feature extraction network according to embodiment 1 of the present invention.
Fig. 4 is a schematic structural diagram of the whole-local feature fusion module according to embodiment 1 of the present invention.
Fig. 5 is a block diagram of the mask-wearing face recognition system according to embodiment 2 of the present invention.
Detailed Description
Further advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of the present invention, which is described by the following specific examples.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
Referring to fig. 1-5, four embodiments of the present invention are provided:
example 1:
in the embodiment, based on a Python programming language, a Pytorch deep learning framework is used to construct a network model structure, and training of a model is completed under a Ubuntu system. The hardware environment is Ubuntu18.04.3, and the GPU model is GeForce RTX2080T i.
As shown in fig. 1, this embodiment discloses a method, device, equipment and medium for face recognition of mask-wearing faces with cross fusion of local and whole face features; the method specifically comprises the following steps:
s101, acquiring an image of a user to be detected through a front-end camera.
The method comprises the steps of acquiring image data of a registered user by using a front-end camera in a pre-registration stage to train a classifier for matching user identities, and acquiring the image data of a current user to be tested by using the camera in a later use stage.
S102, inputting the image of the user to be detected into a mask face detection model, and outputting position information about a face and a mask;
as shown in fig. 2, the network structure of the mask face detection model consists of a trunk, a neck and a detection head. The trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts SSD algorithm, and adds a context attention module inside to make the network pay attention to the face and mask area.
Specifically, the context attention module consists of a context awareness module and a CBAM attention module. The context awareness module has three branches, which respectively comprise one, two and three 3×3 convolution kernels; the output results of the three branches are combined into one feature map through channel concatenation and input into the CBAM attention module.
The mask face detection model is trained by face data of a wearer wearing a mask. Each image in the face data has a tag file annotated with the face position and mask position information. After the image is input into the model, the model outputs corresponding prediction results according to the extracted features, wherein the prediction results comprise the coordinates (upper left corner and lower right corner) of the face, the confidence of the face, the coordinates (upper left corner and lower right corner) of the mask and the confidence of the mask. And calculating a loss value between a predicted result and a true value in a tag file through a preset mask face loss function, and training the mask face detection model by taking the loss value as an optimization target.
The mask face loss function adopts multitasking loss, and consists of position offset loss and confidence loss of the face, and position offset loss and confidence loss of the mask, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence penalty term factor and β represents a position offset penalty term factor.
S103, cutting the image of the user to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
specifically, according to the output result of the mask face detection model, the image of the user to be detected is cut. And cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line. And after cutting, obtaining corresponding face area and eyebrow area images. In order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement processing.
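A minimal sketch of this cropping rule is shown below, assuming the predicted boxes are (x1, y1, x2, y2) pixel coordinates with the origin at the top-left corner; OpenCV's non-local-means denoising is used only as one possible choice for the preprocessing step.

```python
import cv2

def crop_face_and_eyebrow(image, face_box, mask_box):
    """Crop the whole-face region and the unoccluded region above the mask.

    face_box, mask_box: (x1, y1, x2, y2) boxes predicted by the detection model;
    the mask box's top edge is used as the lower boundary of the eyebrow-eye crop.
    """
    fx1, fy1, fx2, fy2 = face_box
    mask_top = mask_box[1]

    face_region = image[fy1:fy2, fx1:fx2]
    eyebrow_region = image[fy1:mask_top, fx1:fx2]  # face area above the mask's upper edge

    # optional denoising to improve quality before feature extraction / matching
    face_region = cv2.fastNlMeansDenoisingColored(face_region)
    eyebrow_region = cv2.fastNlMeansDenoisingColored(eyebrow_region)
    return face_region, eyebrow_region
```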
S104, inputting the face area and the eyebrow area into a mask face feature extraction network, wherein the face area enters a main-path network to extract overall contour features, the area not occluded by the mask enters a branch-path network to extract local eyebrow features, and finally the two kinds of information are integrated through a feature fusion module to output the fused mask face features;
as shown in fig. 3, in order to utilize all the describable features in the mask face image as much as possible, the mask face feature extraction network adopts a parallel design of a main path network and a branch path network, which are respectively used for extracting the whole outline features and the local eyebrow features of the face, and finally the mask face features are obtained through a whole-local feature fusion module. In order to ensure the feature extraction efficiency, the main network and the branch network respectively adopt two lightweight networks, namely an acceptance V3 network and a MobileNet network. Meanwhile, in order to make the main network pay more attention to the outline and appearance characteristics of the face, a CBAM attention module is connected after the InceptionV 3. The main network and the branch network are finally connected with a whole-local feature fusion module.
As shown in fig. 4, the whole-local feature fusion module has two stages: the first is an information exchange stage and the second is an information integration stage. Let the whole contour features output by the main-path network be F_o ∈ R^{C×H×W} and the local eyebrow features output by the branch-path network be F_l ∈ R^{C×H×W}, where R^{C×H×W} denotes the spatial dimensions of the feature map, composed of the channel number C, the height H and the width W. After F_o and F_l are input into the whole-local feature fusion module, they pass through the information exchange stage and the information integration stage in sequence to obtain the mask face features F_m ∈ R^{C×H×W}.
In the information exchange stage, the whole contour feature F_o and the local eyebrow feature F_l are each passed through a 1×1 convolution kernel to compress the channel dimension, and the two results are combined into a feature F_b ∈ R^{C×H×W} by a bilinear fusion operation. The features F_o, F_b and F_l are concatenated along the channel dimension and passed through a 1×1 convolution kernel and a softmax function to obtain the weights W_o ∈ R^{C×H×W} and W_l ∈ R^{C×H×W}. The weights W_o and W_l are multiplied element-wise with the features F_o and F_l respectively, and each product is passed through a 1×1 convolution kernel to obtain the features F_o′ ∈ R^{C×H×W} and F_l′ ∈ R^{C×H×W}. Finally, the three features F_b, F_o′ and F_l′ are concatenated along the channel dimension to obtain the output feature of the first stage, F_s1 ∈ R^{3C×H×W}. The above process can be expressed by the following formulas:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation.
In the information integration stage, the feature F_s1 enters an identity branch and a residual branch respectively. The identity branch is composed of only one 1×1 convolution kernel, while the residual branch is composed of one 1×1 convolution kernel, one 3×3 depthwise separable convolution kernel, a ReLU activation function and one 1×1 convolution kernel connected in sequence. The outputs of the identity branch and the residual branch are added and regularized to obtain the fused mask face feature F_m. The above procedure can be represented by the following formulas:
F_identity = conv_1×1(F_s1)
F_residual = conv_1×1(relu(DWconv_3×3(conv_1×1(F_s1))))
F_m = Norm(F_identity + F_residual)
where F_identity ∈ R^{C×H×W} and F_residual ∈ R^{C×H×W} represent the output features of the identity branch and the residual branch, respectively; DWconv_3×3(·) represents a 3×3 depthwise separable convolution operation; relu(·) represents the ReLU activation function; Norm(·) represents the regularization operation.
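Pulling the two stages together, a PyTorch sketch of the whole-local feature fusion module might look as follows. The concrete realisation of the bilinear fusion, the direction of the softmax over the weight pair and the Norm(·) operation are not fixed by the text above, so the choices below (element-wise product of the compressed maps, pairwise softmax, batch normalisation) are assumptions.

```python
import torch
import torch.nn as nn

class WholeLocalFusion(nn.Module):
    """Two-stage whole-local feature fusion: information interaction + integration."""

    def __init__(self, channels):
        super().__init__()
        c = channels
        # stage 1: information interaction
        self.compress_o = nn.Conv2d(c, c, 1)            # 1x1 "compression" convs, kept at C channels here
        self.compress_l = nn.Conv2d(c, c, 1)
        self.weight_conv = nn.Conv2d(3 * c, 2 * c, 1)   # produces W_o and W_l from cat(F_o, F_b, F_l)
        self.proj_o = nn.Conv2d(c, c, 1)
        self.proj_l = nn.Conv2d(c, c, 1)
        # stage 2: information integration
        self.identity = nn.Conv2d(3 * c, c, 1)
        self.residual = nn.Sequential(
            nn.Conv2d(3 * c, c, 1),
            nn.Conv2d(c, c, 3, padding=1, groups=c),    # 3x3 depthwise conv (DWconv)
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 1),
        )
        self.norm = nn.BatchNorm2d(c)                   # Norm(.) assumed to be batch normalisation

    def forward(self, f_o, f_l):
        # stage 1
        f_b = self.compress_o(f_o) * self.compress_l(f_l)            # F_b ("bilinear fusion" as a product)
        w = self.weight_conv(torch.cat([f_o, f_b, f_l], dim=1))
        w_o, w_l = torch.softmax(torch.stack(w.chunk(2, dim=1)), dim=0)
        f_s1 = torch.cat([self.proj_o(f_o * w_o), f_b, self.proj_l(f_l * w_l)], dim=1)
        # stage 2: F_m = Norm(F_identity + F_residual)
        return self.norm(self.identity(f_s1) + self.residual(f_s1))
```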
The mask face feature extraction network is trained with masked-face data. In order to expand the existing masked-face recognition datasets, a GAN network is used to add masks to publicly available face recognition datasets, simulating a masked-face dataset and increasing sample diversity. The dataset is also cropped in the same way as above to obtain face area and eyebrow area images with the corresponding person identity labels. In the training stage, a fully connected layer is attached to the end of the mask face feature extraction network, classification is carried out according to the extracted mask face features, a triplet loss function is adopted to calculate the loss value between the classification result and the true value, and the network is trained with reducing the loss value as the objective until it converges. In the model operation stage, the mask face features of the user to be detected are extracted from the input face area and eyebrow area images using the mask face feature extraction network without the fully connected layer.
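The training loop can be sketched as below, with torch's built-in TripletMarginLoss standing in for the triplet loss and a linear layer as the training-time fully connected head; the margin, the anchor/positive/negative sampling strategy and the exact place where the FC projection is applied are assumptions, since only the overall recipe (GAN-augmented masked data, identical cropping, triplet loss, FC layer dropped at inference) is fixed above.

```python
import torch.nn as nn

def train_step(extractor, fc_head, optimizer, anchor, positive, negative):
    """One triplet-loss training step for the mask face feature extraction network.

    anchor/positive/negative: (face_crop, eyebrow_crop) batches, where anchor and
    positive share an identity label and negative does not; fc_head is the fully
    connected layer attached only during training.
    """
    triplet_loss = nn.TripletMarginLoss(margin=0.2)     # margin value is an assumption

    def embed(sample):
        face, eyebrow = sample
        features = extractor(face, eyebrow)             # fused mask face features F_m
        return fc_head(features.flatten(1))             # training-time FC projection

    loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```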
S105, inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
Specifically, the mask face features of the user to be tested are input into a trained classifier to obtain a prediction result of the user's identity. The classifier is trained in advance on registered user samples whose features have been extracted by the mask face feature extraction network, so that it can correctly match a user's identity information according to similar mask face features.
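One simple way to realise such a classifier is nearest-neighbour matching over the registered users' stored features, for example by cosine similarity with a rejection threshold; the sketch below assumes that design, which the text does not prescribe.

```python
import torch
import torch.nn.functional as F

def match_identity(query_feature, gallery, threshold=0.6):
    """Match a query mask face feature against registered users.

    gallery:   dict mapping user id -> feature vector extracted in advance by the
               mask face feature extraction network (without the FC layer)
    threshold: minimum cosine similarity to accept a match (assumed value)
    """
    query = F.normalize(query_feature.flatten(), dim=0)
    best_id, best_score = None, threshold
    for user_id, stored in gallery.items():
        score = torch.dot(query, F.normalize(stored.flatten(), dim=0)).item()  # cosine similarity
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id  # None means "no registered identity matched"
```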
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all illustrated operations be performed in order to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Example 2:
as shown in fig. 5, the present embodiment provides a mask-wearing face recognition system with local and whole face features fused, which includes a user image acquisition module 501, a mask face detection module 502, an image preprocessing module 503, a mask face feature extraction module 504 and a mask face matching module 505, wherein:
the user image acquisition module 501 is used for acquiring image data of a user to be detected so as to identify the identity; and the system is also used for collecting images of registered users so as to train a classifier for matching identity information.
The mask face detection module 502 is configured to accurately locate a face region and a mask region in a user image, and output position information of the corresponding region;
an image preprocessing module 503, which performs clipping processing on the user image according to the position information output by the mask face detection module, and performs preprocessing such as denoising and enhancement on the clipped image;
the mask face feature extraction module 504 is configured to extract overall contour features and local eyebrow features from the face region and the eyebrow region images, and fuse the two information to obtain robust mask face features;
the mask face matching module 505 classifies the mask face according to the extracted mask face features, and matches the identities of the persons registered in the library.
Specific implementation of each module in this embodiment may be referred to embodiment 1 above, and will not be described in detail herein; it should be noted that, in the system provided in this embodiment, only the division of the above functional modules is used as an example, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to perform all or part of the functions described above.
Example 3:
The present embodiment provides a computer device, which may be a computer, in which a processor, a memory, an input system, a display and a network interface are connected through a system bus. The processor provides computing and control capabilities; the memory includes a nonvolatile storage medium and an internal memory, the nonvolatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program in the nonvolatile storage medium. When the processor executes the computer program stored in the memory, the masked-face recognition method of the foregoing embodiment 1 is implemented, as follows:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the face recognition method for a mask of the above embodiment 1, as follows:
acquiring an image of a user to be detected through a front-end camera;
inputting the image of the user to be detected into a mask face detection model, and outputting position information about the face and the mask;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main path network to extract overall outline features, the non-mask shielding region enters a branch path network to extract local eyebrow features, and finally, integrating the two information through a feature fusion module to output fused mask face features;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected. The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention in further detail, and are not to be construed as limiting the scope of the invention, but are merely intended to cover any modifications, equivalents, improvements, etc. based on the teachings of the invention.

Claims (5)

1. A face recognition method for mask-wearing faces with cross fusion of local face features and whole face features, characterized by comprising the following steps:
acquiring an image of a user to be detected through a front-end camera;
inputting the user image to be detected into a mask face detection model, outputting position information about a face and a mask, and cutting the user image to be detected according to the output result of the mask face detection model; cutting out a face region from the image of the user to be detected according to the predicted face region coordinates, and cutting out an eyebrow region which is not shielded above the boundary line according to the predicted mask region coordinates by taking the upper boundary of the mask region as the boundary line; after cutting, obtaining corresponding face area and eyebrow area images; in order to improve the image quality and ensure the accuracy of the subsequent feature extraction and feature matching, the face and eyebrow region images are further subjected to denoising or enhancement treatment, and the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to a face area and an eye-brow area;
the context attention module consists of a context awareness module and a CBAM attention module; the context awareness module is provided with three branches, which respectively comprise one, two and three 3×3 convolution kernels, and the output results of the three branches are combined into a feature map through channel concatenation and are input into the CBAM attention module;
the mask face detection model is trained by face data of a wearer wearing a mask; each image in the face data has a label file annotated with the face position and mask position information; after inputting the image into the model, the model outputs a corresponding prediction result according to the extracted characteristics, wherein the prediction result comprises the coordinates of a face and the confidence of the face, the coordinates of a mask and the confidence of the mask, and a loss value between the prediction result and a true value in a tag file is calculated through a preset mask face loss function, so that the loss value is reduced as an optimization target, and the mask face detection model is trained;
the mask face loss function adopts multitasking loss, and consists of face position offset loss and confidence loss, and mask position offset loss and confidence loss, and the expression is as follows:
where L represents the loss value of the mask face detection model; L_conf(·) and L_loc(·) represent the confidence loss function and the position offset loss function, respectively; an indicator variable marks whether a face exists (1 if a face exists, 0 if not) and another marks whether a mask exists (1 if a mask exists, 0 if not); the ground-truth coordinates of the face region and of the mask region are taken from the label file; P_fc represents the confidence of predicting that a face exists, P_mc the confidence of predicting that a mask exists, P_fl the predicted face region coordinates, and P_ml the predicted mask region coordinates; α represents a confidence loss term factor, and β represents a position offset loss term factor;
cutting the user image to be detected according to the position information of the face and the mask, obtaining a complete face area and an eyebrow area in the image, and carrying out image denoising or enhancing treatment;
inputting the face region and the eyebrow region into a mask face feature extraction network, wherein the face region enters a main network to extract overall outline features, the non-mask shielding region enters a branch network to extract local eyebrow features, and finally, after integrating the two information through a feature fusion module, outputting fused mask face features, and inputting the face region and the eyebrow region images into the mask face feature extraction network to obtain mask face features; in order to utilize all descriptive characteristics in the mask face image as much as possible, the mask face characteristic extraction network adopts a parallel design of a main path network and a branch path network, and is respectively used for extracting the whole outline characteristics and the local eyebrow characteristics of the face, and finally the mask face characteristics are obtained through a whole-local characteristic fusion module; in order to ensure the feature extraction efficiency, the main network and the branch network respectively adopt two lightweight networks, namely an InceptionV3 network and a MobileNet network; meanwhile, in order to make the main network pay more attention to the outline and appearance characteristics of the face, a CBAM attention module is connected behind the InceptionV 3; the main network and the branch network are finally connected with a whole-local feature fusion module;
the whole-local feature fusion module is provided with two stages, wherein the first stage is an information interaction stage, the second stage is an information integration stage, and the whole outline features respectively output by a main network and a branch network are set as F o ∈R C×H×W The local eyebrow is characterized by F l ∈R C×H×W Wherein R is C×H×W The spatial dimension representing the feature map is composed of the channel number C, the height H and the width W, respectively, when F o And F l After being input into the integral-local feature fusion module, the mask facial features F are obtained through an information interaction stage and an information integration stage in sequence m ∈R C×H×W
In the information interaction stage, the integral outline feature F o And local eyebrow feature F l After the channel dimensions are compressed through a 1X 1 convolution kernel respectively, the two are combined into a feature F by using a bilinear fusion operation b ∈R C×H×W Feature F b And feature F o And F l After channel cascade, the weight W is obtained by a 1X 1 convolution kernel and softmax function o ∈R C×H×W And W is l ∈R C×H×W The method comprises the steps of carrying out a first treatment on the surface of the Weight W o And W is l Respectively with characteristic F o And F l Multiplication, respectively through a 1X 1 convolution kernel, to obtain the characteristic F' o ∈R C×H×W And F l ′∈R C ×H×W The method comprises the steps of carrying out a first treatment on the surface of the Finally, F b 、F′ o And F l The three characteristics are subjected to channel cascade to obtain an output characteristic F of the first stage s1 ∈R 3C×H×W The method comprises the steps of carrying out a first treatment on the surface of the The above process can be expressed by the following formula:
F_b = bilinear(conv_1×1(F_o), conv_1×1(F_l))
W_o, W_l = softmax(conv_1×1(cat(F_o, F_b, F_l)))
F_s1 = cat(conv_1×1(F_o ⊙ W_o), F_b, conv_1×1(F_l ⊙ W_l))
where conv_1×1(·) represents a 1×1 convolution operation; bilinear(·) represents a bilinear fusion operation; softmax(·) represents the softmax function; ⊙ represents element-wise matrix multiplication; cat(·) represents a channel concatenation (cascade) operation;
in the information integration stage, the feature F_s1 enters an identity branch and a residual branch respectively; the identity branch is formed by only one 1×1 convolution kernel, and the residual branch is formed by sequentially connecting a 1×1 convolution kernel, a 3×3 depthwise separable convolution kernel, a ReLU activation function and a 1×1 convolution kernel; the outputs of the identity branch and the residual branch are added and regularized to obtain the fused mask face feature F_m; the above procedure can be represented by the following formulas:
F_identity = conv_1×1(F_s1)
F_residual = conv_1×1(relu(DWconv_3×3(conv_1×1(F_s1))))
F_m = Norm(F_identity + F_residual)
where F_identity ∈ R^{C×H×W} and F_residual ∈ R^{C×H×W} represent the output features of the identity branch and the residual branch, respectively; DWconv_3×3(·) represents a 3×3 depthwise separable convolution operation; relu(·) represents the ReLU activation function; Norm(·) represents a regularization operation;
the mask face feature extraction network is trained with masked-face data, and in order to expand the existing masked-face recognition dataset, a GAN network is used for generating masks for the disclosed face recognition dataset so as to simulate a masked-face dataset and increase sample diversity; simultaneously, the dataset is subjected to the same clipping processing to obtain images of the face area and eyebrow area and corresponding person identity labels; in the training stage, the tail end of the mask face feature extraction network is connected with a fully connected layer, classification is carried out according to the extracted mask face features, a triplet loss function is adopted to calculate a loss value between the classification result and the true value, and the network is trained with reducing the loss value as the objective until it converges; in the model operation stage, the mask face features of the user to be detected are extracted from the input face area and eyebrow area images by using the mask face feature extraction network without the fully connected layer;
and inputting the facial features of the mask into a classifier to obtain the identification result of the identity of the user to be detected.
2. The face recognition method for mask-wearing faces with cross fusion of local and whole face features according to claim 1, wherein: the mask face features of the user to be tested are input into a trained classifier to obtain a prediction result of the identity of the user to be tested; the classifier is trained in advance with registered user samples whose features have been extracted through the mask face feature extraction network, so that the classifier can correctly match a user's identity information according to the mask face features.
3. A mask-wearing face recognition device with cross fusion of local and global face features, comprising:
the user image acquisition module is used for acquiring image data of a user to be detected so as to identify the identity; meanwhile, the system is also used for collecting images of registered users so as to train a classifier for matching identity information;
the mask face detection module is used for accurately positioning a face area and a mask area in a user image and outputting position information of the corresponding areas, and comprises a mask face detection model, wherein the network structure of the mask face detection model consists of a trunk, a neck and a detection head; the trunk part adopts a general feature extraction network ResNet; the neck adopts FPN to refine the original feature map, and aggregate semantic information of different layers; the detection head adopts an SSD algorithm, and a context attention module is added in the detection head to enable the network to pay attention to a face area and an eye-brow area;
an image preprocessing module, used for cropping the user image according to the position information output by the mask face detection module, and for denoising and enhancing the cropped images (a preprocessing sketch follows this claim);
a mask face feature extraction module, used for extracting whole facial contour features and local eyebrow features from the face-region and eyebrow-region images respectively, and integrating them into robust mask face features; the module comprises a mask face feature extraction network that adopts a parallel design of a backbone network and a branch network, used respectively for extracting the whole facial contour features and the local eyebrow features, with the mask face features finally obtained through a whole-local feature integration module (a sketch of this parallel design follows this claim); the mask face feature extraction network is trained on mask-wearing face data, and a GAN network is adopted to expand the existing masked face recognition data sets by adding masks to a public face recognition data set, thereby simulating a masked face data set and increasing sample diversity; the data set is also cropped in the same way to obtain face-region and eyebrow-region images together with the corresponding person identity labels; in the training stage, a fully connected layer is attached to the end of the mask face feature extraction network and classification is carried out according to the extracted mask face features; a triplet loss function is used to compute the loss between the classification result and the ground truth, and the network is trained with the goal of making this loss converge; in the model operation stage, the mask face feature extraction network without the fully connected layer extracts the mask face features of the user to be identified from the input face-region and eyebrow-region images;
and a mask face matching module, used for classifying according to the extracted mask face features and matching them against the identities of the registered persons in the gallery.
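The detection module of claim 3 names its backbone (ResNet), neck (FPN) and head (SSD with a context attention module) but gives no further structural detail. The sketch below wires a ResNet-50 backbone to a torchvision FeaturePyramidNetwork to illustrate the backbone-neck part only; the detection head and the context attention module are omitted, and the output channels, input resolution and names are assumptions.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torchvision
from torchvision.ops import FeaturePyramidNetwork

class ResNetFPNTrunk(nn.Module):
    """Backbone (ResNet-50) plus neck (FPN); the SSD-style head is left out."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet50()
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4
        # Channel sizes of ResNet-50 stages C2-C5.
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], out_channels)

    def forward(self, x: torch.Tensor):
        c2 = self.layer1(self.stem(x))
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        # The FPN refines the stage outputs and aggregates semantics across levels.
        return self.fpn(OrderedDict([("c2", c2), ("c3", c3), ("c4", c4), ("c5", c5)]))

pyramid = ResNetFPNTrunk()(torch.randn(1, 3, 320, 320))  # dict of multi-scale feature maps
```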
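For the image preprocessing module, the claim only requires cropping by the detector's boxes followed by denoising and enhancement. A minimal OpenCV sketch is given below; non-local-means denoising and CLAHE enhancement are illustrative choices rather than operators specified in the patent, and the (x, y, w, h) box format is an assumption.

```python
import cv2
import numpy as np

def preprocess_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop a detected region, then denoise and enhance it."""
    x, y, w, h = box                      # box from the mask face detection module
    crop = image[y:y + h, x:x + w]
    # Denoising: non-local means on the colour crop.
    denoised = cv2.fastNlMeansDenoisingColored(crop, None, 10, 10, 7, 21)
    # Enhancement: CLAHE applied to the luminance channel only.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```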
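For the mask face feature extraction module, the sketch below illustrates the parallel backbone/branch design with a stand-in fusion step. How the whole and local features are combined into F_s1, the encoder architectures, and the embedding size are all assumptions; the whole-local integration sketch after claim 1 would replace the stand-in fusion layer here.

```python
import torch
import torch.nn as nn

def small_encoder(out_channels: int = 256) -> nn.Module:
    """Placeholder CNN; stands in for the backbone or the branch network."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(7),
    )

class MaskFaceFeatureExtractor(nn.Module):
    """Parallel design: whole facial contour features from the face region,
    local eyebrow features from the eyebrow region, then a fusion step."""

    def __init__(self, channels: int = 256, embed_dim: int = 512):
        super().__init__()
        self.backbone = small_encoder(channels)   # whole facial contour features
        self.branch = small_encoder(channels)     # local eyebrow features
        # Stand-in for the whole-local feature integration module (see the sketch after claim 1).
        self.integration = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels))
        self.embed = nn.Sequential(nn.Flatten(), nn.Linear(channels * 7 * 7, embed_dim))

    def forward(self, face_img: torch.Tensor, eyebrow_img: torch.Tensor) -> torch.Tensor:
        f_whole = self.backbone(face_img)
        f_local = self.branch(eyebrow_img)
        f_s1 = f_whole + f_local                  # additive fusion is an assumption
        return self.embed(self.integration(f_s1))

# Example: face crop and eyebrow crop of arbitrary (illustrative) sizes.
emb = MaskFaceFeatureExtractor()(torch.randn(1, 3, 112, 112), torch.randn(1, 3, 56, 112))
```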
4. A computer device comprising a processor and a memory storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the mask-wearing face recognition method with cross fusion of local and whole face features according to claim 1.
5. A storage medium storing a program which, when executed by a processor, implements the mask-wearing face recognition method with cross fusion of local and whole face features according to claim 1.
CN202210990521.0A 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features Active CN115457624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210990521.0A CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210990521.0A CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Publications (2)

Publication Number Publication Date
CN115457624A CN115457624A (en) 2022-12-09
CN115457624B true CN115457624B (en) 2023-09-01

Family

ID=84298205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210990521.0A Active CN115457624B (en) 2022-08-18 2022-08-18 Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features

Country Status (1)

Country Link
CN (1) CN115457624B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311553B (en) * 2023-05-17 2023-08-15 武汉利楚商务服务有限公司 Human face living body detection method and device applied to semi-occlusion image
CN116895091A (en) * 2023-07-24 2023-10-17 山东睿芯半导体科技有限公司 Facial recognition method and device for incomplete image, chip and terminal
CN116883670B (en) * 2023-08-11 2024-05-14 智慧眼科技股份有限公司 Anti-shielding face image segmentation method
CN117152575B (en) * 2023-10-26 2024-02-02 吉林大学 Image processing apparatus, electronic device, and computer-readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461028A (en) * 2020-04-02 2020-07-28 杭州视在科技有限公司 Mask detection model training and detection method, medium and device in complex scene
CN111914630A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating training data for face recognition
WO2021174880A1 (en) * 2020-09-01 2021-09-10 平安科技(深圳)有限公司 Feature extraction model training method, facial recognition method, apparatus, device and medium
JP2022053158A (en) * 2020-09-24 2022-04-05 エヌ・ティ・ティ・コミュニケーションズ株式会社 Information processor, information processing method, and information processing program
CN112597867A (en) * 2020-12-17 2021-04-02 佛山科学技术学院 Face recognition method and system for mask, computer equipment and storage medium
CN112560828A (en) * 2021-02-25 2021-03-26 佛山科学技术学院 Lightweight mask face recognition method, system, storage medium and equipment
CN113158883A (en) * 2021-04-19 2021-07-23 汇纳科技股份有限公司 Face recognition method, system, medium and terminal based on regional attention
CN113283405A (en) * 2021-07-22 2021-08-20 第六镜科技(北京)有限公司 Mask detection method and device, computer equipment and storage medium
CN113807332A (en) * 2021-11-19 2021-12-17 珠海亿智电子科技有限公司 Mask robust face recognition network, method, electronic device and storage medium
CN114220143A (en) * 2021-11-26 2022-03-22 华南理工大学 Face recognition method for wearing mask
CN114360033A (en) * 2022-03-18 2022-04-15 武汉大学 Mask face recognition method, system and equipment based on image convolution fusion network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭富海 (Guo Fuhai). Mask-wearing detection based on YOLOv4 block-wise weight pruning and its embedded implementation. China Excellent Master's Theses Full-text Database (Information Science and Technology), 2022, No. 2022(02), I138-515. *

Also Published As

Publication number Publication date
CN115457624A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN115457624B (en) Face recognition method, device, equipment and medium for wearing mask by cross fusion of local face features and whole face features
Li et al. A deep learning-based hybrid framework for object detection and recognition in autonomous driving
Li et al. An automatic detection method of bird’s nest on transmission line tower based on faster_RCNN
CN111639544B (en) Expression recognition method based on multi-branch cross-connection convolutional neural network
CN109948526B (en) Image processing method and device, detection equipment and storage medium
Yang et al. Masked relation learning for deepfake detection
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
Sincan et al. Using motion history images with 3d convolutional networks in isolated sign language recognition
Tan et al. Fine-grained classification via hierarchical bilinear pooling with aggregated slack mask
He et al. Semi-supervised skin detection by network with mutual guidance
CN105893941B (en) A kind of facial expression recognizing method based on area image
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
Wu et al. Traffic sign detection based on SSD combined with receptive field module and path aggregation network
Li et al. Distracted driving detection by combining ViT and CNN
Su et al. Pose graph parsing network for human-object interaction detection
Hu et al. Hierarchical attention vision transformer for fine-grained visual classification
Agrawal et al. Multimodal vision transformers with forced attention for behavior analysis
Jing et al. SmokeSeger: a transformer-CNN coupled model for urban scene smoke segmentation
WO2023246921A1 (en) Target attribute recognition method and apparatus, and model training method and apparatus
Mohod et al. Human detection in surveillance video using deep learning approach
Li et al. Sequential interactive biased network for context-aware emotion recognition
CN116311495A (en) Dual-stream global-local action recognition method, system, equipment and storage medium based on video input
Chen et al. Object detection using dual graph network
Das et al. Pedestrian detection in thermal and color images using a new combination of saliency network and Faster R-CNN
Lin et al. A coarse-to-fine pattern parser for mitigating the issue of drastic imbalance in pixel distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant