CN113537066B - Mask-wearing face recognition method based on multi-granularity mixed loss and electronic device - Google Patents


Info

Publication number
CN113537066B
Authority
CN
China
Prior art keywords
face
loss
granularity
mask
branch
Prior art date
Legal status
Active
Application number
CN202110808959.8A
Other languages
Chinese (zh)
Other versions
CN113537066A (en)
Inventor
王一帆
杜兵
罗翚
Current Assignee
Fiberhome Telecommunication Technologies Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd filed Critical Fiberhome Telecommunication Technologies Co Ltd
Priority to CN202110808959.8A
Publication of CN113537066A
Application granted
Publication of CN113537066B

Classifications

    • G06F18/253 Pattern recognition; analysing; fusion techniques of extracted features
    • G06N3/045 Neural networks; architecture; combinations of networks
    • G06N3/047 Neural networks; architecture; probabilistic or stochastic networks
    • G06N3/08 Neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a mask-wearing face recognition method based on multi-granularity mixed loss, comprising the following steps: S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity; S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner; S3, training each branch of each granularity of the base network with the mixed loss function on a training data set to obtain a trained face recognition network model; and S4, recognizing the face image to be recognized with the trained face recognition network model. The invention is compatible with recognition of both masked and unmasked faces, uses the mixed loss to effectively pull apart the distances between different faces so that threshold tuning becomes more convenient and flexible, improves recognition accuracy, runs fast, and is easy to port. The invention also provides a corresponding electronic device.

Description

Mask-wearing face recognition method based on multi-granularity mixed loss and electronic device
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a mask-wearing face recognition method based on multi-granularity mixed loss and an electronic device.
Background
Current training methods for face recognition mainly rely on a single loss to train a convolutional neural network, with feature vectors extracted from the network serving as the model output. Feature vectors are extracted from the image to be recognized and from the base library respectively and compared, finally yielding the similarity of the image to be recognized against the base library and thereby realizing face recognition. However, when recognition is performed on a face wearing a mask, most facial features are occluded by the mask, and recognition accuracy drops sharply.
At present, mainstream models are trained on classification loss alone with a single base network, which cannot effectively pull apart the similarity of unfamiliar faces; feature comparisons therefore come out too close together, making the threshold sensitive to set. Meanwhile, mainstream face recognition networks cannot effectively solve the recognition of masked faces, whose recognition accuracy remains low.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a mask-wearing face recognition method based on multi-granularity mixed loss, which preserves the recognition accuracy of normal faces while improving the recognition accuracy of masked and otherwise occluded faces.
To achieve the above object, according to one aspect of the present invention, there is provided a mask-wearing face recognition method based on multi-granularity mixed loss, comprising:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity;
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner;
S3, training each branch of each granularity of the base network with the mixed loss function in a mixed-loss manner on a training data set, to obtain a trained face recognition network model;
and S4, recognizing the face image to be recognized using the trained face recognition network model.
In an embodiment of the present invention, the base network generating a plurality of branches according to a preset granularity comprises:
splitting the feature map into three branches in the middle of the base network, one branch representing the whole face, one the upper half of the face and one the lower half of the face, the upper and lower halves being divided according to a preset ratio.
In one embodiment of the present invention, the mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss.
In an embodiment of the present invention, the training of each branch of each granularity of the base network in a mixed-loss manner is specifically realized as follows:
each branch of each granularity is trained with the mixed loss, wherein each branch is simultaneously optimized by the boundary mining loss and by the cross-entropy classification loss with softmax as the activation function, and a BNNeck layer is used before the classification loss.
In an embodiment of the present invention, the boundary mining loss is specifically:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
Each batch has P IDs with k face images per ID, i.e., P x k images per batch. The images in the batch are traversed during loss calculation; the currently selected image is taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p; images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs. Max(d_a,p) computes the Euclidean distances between the anchor feature and all positive-sample features and then selects the maximum of all the Euclidean distances; Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum; Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum; m1 and m2 serve as margin/weight terms in the loss function.
In one embodiment of the present invention, the training data set comprises an original face training set and a simulated-mask training set based on the original faces, wherein:
the original face training set is a training set used for learning face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination;
the simulated-mask training set based on the original faces is a training set used for learning both bare-face and masked-face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination; the simulated-mask training set simulates masks onto the corresponding faces according to mask style and face angle, and the proportion of mask simulation is determined by the number of images in each folder.
In one embodiment of the present invention, in the process of training the face recognition network model, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize the network.
In an embodiment of the present invention, recognizing the face image to be recognized with the trained face recognition network model is specifically:
the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors, and computes the similarity of the two vectors by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance; the formula yields N similarities representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among the N similarities and checked against a preset face similarity threshold; when the similarity exceeds the threshold, the ID to be recognized exists in the base library, and otherwise the face library is considered not to contain the ID.
In an embodiment of the present invention, the feature vector obtained by feature extraction on a face is:
the fusion of the feature vectors of the three branches, which yields the feature vector of the image.
According to another aspect of the present invention, there is also provided an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above mask-wearing face recognition method based on multi-granularity mixed loss.
In general, compared with the prior art, the technical scheme conceived by the present invention achieves the following beneficial effects:
(1) the model outputs multiple branches at multiple granularities; a single model simultaneously yields feature vectors of different granularities, making recognition compatible with both masked and unmasked faces (the recognition model extracts face features at different granularities, which assist each other and improve accuracy);
(2) the high similarity between unfamiliar faces is fully considered; the mixed loss effectively pulls apart the distances between different faces, making threshold tuning more convenient and flexible and improving recognition accuracy;
(3) the method is easy to deploy, the model is easy to make lightweight, only conventional neural network layers are used, execution is efficient, running speed is high, and porting is convenient; meanwhile, the model architecture is not tied to a particular base network structure, and the size of the network used can be adjusted dynamically as required.
Drawings
FIG. 1 is a schematic flow diagram of the mask-wearing face recognition method based on multi-granularity mixed loss according to the present invention;
FIG. 2 is a schematic structural diagram of the face recognition network model in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the first half of the base network in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the second half of the base network in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
To achieve the object of the present invention, as shown in FIG. 1, the invention provides a mask-wearing face recognition method based on multi-granularity mixed loss, comprising:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity.
A deep-learning model is designed that collects face information at several granularities and outputs multi-branch feature vectors. The feature map is split into three branches of two granularities in the middle of the base network (i.e., at its halfway layer): one branch carries the global granularity and represents the whole face, while the other two represent the upper and lower halves of the face (the feature map is divided into two parts, either evenly or according to a preset ratio such as 3:7 or 4:6). The feature vector finally output by each branch is connected to its own mixed loss for training and optimization. The mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss. After optimization by the loss functions, each branch outputs its own feature vector; during feature extraction, the feature vectors of all branches are concatenated and fused into the final feature vector used for face recognition. This method reduces the negative impact of a worn mask on recognition accuracy, while the mixed loss shortens distances within the same face ID and effectively pulls apart distances between different face IDs, making the face-similarity threshold more accurate and flexible to tune.
In summary, as shown in FIG. 2, the input image passes through image preprocessing and the first half of the base network, and the split-layer operation is performed in the middle of the base network (i.e., at its halfway layer): if the base network is 100 layers deep, the multi-granularity branches separate at around layer 50 (in this embodiment, three branches of two granularities, i.e., the three paths after the first half of the base network in the figure; the split point can be set at the split layer as needed, and depending on the granularity, a corresponding number of branches is generated: the global granularity yields one branch, a two-way granularity two branches, a three-way granularity three branches, and so on). Although the current base network architecture follows MobileFaceNet, many base networks are possible; for example, when the device's flash and memory have headroom, the invention supports using a larger, stronger base network such as ResNet50. The network after the split layer divides into two granularity structures, global granularity and refined granularity. One granularity is the global feature map, forming an independent branch (global granularity - branch C in FIG. 2). The other is the more refined granularity, in which each branch occupies only half of the feature map, or a fraction designed according to some ratio (an uneven split produces two further granularities; granularity here is an abstract notion of feature-map size). The current network splits the feature map evenly into two parts, forming two new branches, the upper half face (Upface) and the lower half face (Downface) (refined granularity - upper half face branch A and refined granularity - lower half face branch B in FIG. 2). When designing the network, the split ratio can be adjusted flexibly for the application, e.g., 3:7 or 4:6; this flexibility exists because the fraction of the face covered by a mask is not fixed and varies with, for example, the shooting angle, so the invention supports it. The ratio refers to dividing the feature map into two parts along the vertical direction. The invention focuses on multi-granularity network splitting; the granularity can be adjusted flexibly to the actual application, and once the split ratio is determined, subsequent training and face recognition proceed with that ratio. Meanwhile, all granularity branches share the weights of the first-half base network and feed back gradients to optimize them.
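As a minimal PyTorch sketch (the function name and tensor layout are illustrative, not part of the patent), the split-layer operation amounts to slicing the feature map along its height:

```python
import torch

def split_branches(feat: torch.Tensor, ratio: float = 0.5):
    """Split a (N, C, H, W) feature map into the three granularity branches.

    feat:  output of the first half of the base network, e.g. (N, 128, 14, 14).
    ratio: vertical split point; 0.5 is the even split, 0.3 or 0.4 give 3:7 / 4:6.
    """
    cut = int(feat.shape[2] * ratio)
    up_face = feat[:, :, :cut, :]     # refined granularity - upper half face (branch A)
    down_face = feat[:, :, cut:, :]   # refined granularity - lower half face (branch B)
    whole_face = feat                 # global granularity - whole face (branch C)
    return up_face, down_face, whole_face
```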
The three branches yield feature A, feature B and feature C respectively, and training based on the mixed loss (boundary mining loss, and BNNeck + classification loss) is then performed on each of them to obtain the face recognition network model. During face recognition, feature A, feature B and feature C output by the three branches are concatenated directly as the final feature for face recognition.
Before an image is fed into the base network (for both training and recognition), it must be preprocessed. Preprocessing comprises data centering, in which 127.5 is subtracted from the value of each element of the input RGB image so that the centered three-channel values range from -127.5 to 127.5, and data normalization, in which each element is divided by 127.5 so that the processed values range from -1 to 1. After this approximate centering and standardization, the data distribution has mean approximately 0 and standard deviation approximately 1, which helps the model converge faster.
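A minimal sketch of this preprocessing, assuming an 8-bit RGB input:

```python
import numpy as np

def preprocess(img_rgb: np.ndarray) -> np.ndarray:
    """Center and scale an HxWx3 uint8 RGB image to [-1, 1]: (x - 127.5) / 127.5."""
    return (img_rgb.astype(np.float32) - 127.5) / 127.5
```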
The base network is described in further detail below. Its overall architecture is adapted from MobileFaceNet and modified according to the multi-granularity, multi-branch design concept. FIG. 3 shows the first half of the base network and FIG. 4 the second half.
As shown in FIG. 3, the input to the base network is a standard 112x112x3 RGB image, 112 pixels high and wide, consistent with the training data set image size, with the three RGB channels ranging from 0 to 255.
The conv1 layer is the first neural-network combination layer after image preprocessing, comprising [Conv2d(c_in=3, c_out=64, kernel=3, s=2, p=1), BatchNorm, PReLU]. In the Conv2d attributes, c_in is the number of input channels, c_out the number of output channels, kernel the convolution kernel size (here a 3x3 kernel), s the convolution stride, and p the padding. After conv1, an output of 56x56x64 is obtained.
The dw_conv1 layer comprises [Conv2d(c_in=64, c_out=64, kernel=3, s=1, p=1, dw=True), BatchNorm, PReLU] and takes the output of conv1 as its input, where dw=True marks the convolution as depthwise separable. The output after dw_conv1 is 56x56x64.
The BottleNeck layer (bottleneck network) exists as a common building block throughout the base network and realizes different structures under parameter control, e.g. BottleNeck(inp, oup, b_s, expansion), where inp is the number of input channels, oup the number of output channels, b_s the stride of the 3x3 convolution inside the BottleNeck, and expansion the channel expansion factor of the convolutions inside the BottleNeck. Abstracted by these parameters, the BottleNeck layer comprises [Conv2d(c_in=inp, c_out=inp*expansion, kernel=1, s=1, p=0), BatchNorm, PReLU, Conv2d(c_in=inp*expansion, c_out=inp*expansion, kernel=3, s=b_s, p=1), BatchNorm, PReLU, Conv2d(c_in=inp*expansion, c_out=oup, kernel=1, s=1, p=0), BatchNorm].
The block1 layer takes the output of dw_conv1 as its input. block1 consists of several BottleNecks and has attributes t=2, c=64, n=5, s=2, where t=2 means block1 contains two BottleNecks, c=64 means block1 finally outputs 64 channels, n=5 is the expansion factor of the BottleNecks, and s=2 means the first BottleNeck has convolution stride 2 while the remaining BottleNecks have stride 1. The final output of block1 is 28x28x64.
The block2 layer takes the output of block1 as its input. block2 consists of several BottleNecks with attributes t=4, c=128, n=1, s=2. The final output of block2 is 14x14x128. This completes the first half of the base network shown in FIG. 3; the subsequent network layers split the feature map into multiple granularities.
The split layer, as shown in FIG. 4, splits the output of block2 into three branches: the refined granularity - upper half face branch A receives 7x14x128 (up), the refined granularity - lower half face branch B receives 7x14x128 (down), and the global granularity - whole face branch C receives 14x14x128.
The block3a, block3b and block3c layers take the outputs of the three split-layer branches as the inputs of the three branch layers; their attributes are uniformly t=2, c=128, n=6, s=1. The outputs of the three branches are obtained respectively: 7x14x128 for branch A, 7x14x128 for branch B and 14x14x128 for branch C.
The block4a, block4b and block4c layers take the outputs of block3a, block3b and block3c as the inputs of the current three branch layers; their attributes are uniformly t=4, c=128, n=1, s=2. The outputs are 4x7x128 for branch A, 4x7x128 for branch B and 7x7x128 for branch C.
The block5a, block5b and block5c layers take the outputs of block4a, block4b and block4c as the inputs of the current three branch layers; their attributes are uniformly t=2, c=128, n=2, s=1. The outputs are 4x7x128 for branch A, 4x7x128 for branch B and 7x7x128 for branch C.
The linear1a, linear1b and linear1c layers take the outputs of block5a, block5b and block5c as the inputs of the current three branch layers. Each linear layer is a fully connected layer, a way of converting a feature map into a preset feature vector; the fully connected output of each of the three branches is 128, giving the branch outputs 1x128 for branch A, 1x128 for branch B and 1x128 for branch C.
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner.
The mixed loss function is designed to optimize the outputs of the base network, i.e., the feature vectors of the three branches. To this end, the invention trains each branch of each granularity with a mixed loss, where each branch is simultaneously optimized by the boundary mining loss (MSML) and by the cross-entropy classification loss with softmax as the activation function, with a BNNeck layer used before the classification loss. The advantage is that the feature vector can be optimized by the boundary mining loss in free Euclidean space, while after BNNeck processing it is mapped onto the hypersphere, which better suits optimization by the classification loss. BNNeck accelerates network convergence and better balances the inconsistency in gradient direction between the boundary mining loss and the cross-entropy classification loss during backpropagation, making convergence smoother.
Boundary mining loss:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
The loss is computed over batches: each batch has P IDs with k face images per ID, i.e., P x k images per batch. During loss calculation the images in the batch are traversed, and the currently selected image is taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p (lower case, distinct from the ID count P above); images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs. Max(d_a,p) first computes the Euclidean distances between the anchor feature and all positive-sample features, then selects the maximum among them. Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum. Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum. m1 and m2 act as margin/weight terms in the loss function and can be adjusted flexibly according to how the actual model converges; the recommended settings are 0.8 and 0.4. The loss computed by this function is fed back to the network and optimizes the weights of every layer of the base network, achieving the optimization goal of shortening distances within the same ID and widening distances between different IDs.
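A minimal PyTorch sketch of this batch-level mining, assuming the hinge form of the formula reconstructed above (function and variable names are illustrative):

```python
import torch

def boundary_mining_loss(feats: torch.Tensor, labels: torch.Tensor,
                         m1: float = 0.8, m2: float = 0.4) -> torch.Tensor:
    """Sketch of the boundary mining loss for one branch.

    feats:  (P*k, D) feature vectors of a batch with P IDs and k images per ID
            (assumes P >= 3 and k >= 2 so every mined term is defined).
    labels: (P*k,) integer face IDs.
    """
    dist = torch.cdist(feats, feats)             # pairwise Euclidean distances
    total = feats.new_zeros(())
    n = len(labels)
    for a in range(n):                           # traverse the batch; a is the anchor
        pos = labels == labels[a]
        pos[a] = False                           # exclude the anchor itself
        neg = labels != labels[a]
        d_ap = dist[a][pos].max()                # Max(d_a,p): hardest positive
        d_an = dist[a][neg].min()                # Min(d_a,n): hardest negative
        idx = neg.nonzero(as_tuple=True)[0]      # this anchor's negatives
        neg_d = dist[idx][:, idx]                # negative-negative distances
        diff = labels[idx][:, None] != labels[idx][None, :]
        d_nn = neg_d[diff].min()                 # Min(d_n1,n2): closest cross-ID negatives
        total = total + torch.relu(d_ap - d_an + m1) + torch.relu(d_ap - d_nn + m2)
    return total / n
```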
BNNeck layer:
The BNNeck layer takes the output feature vector of the base network as input and comprises [transpose, BatchNorm, transpose]. BNNeck is designed to use BatchNorm instead of L2 Norm to approximately normalize the features; because the BatchNorm layer normalizes along the channel dimension, which does not directly match this purpose, a transpose is used to swap the input channel dimension with the Mini Batch dimension, and after BatchNorm a second transpose restores the original shape for the subsequent loss computation. Meanwhile, the BatchNorm layer has parameters that can absorb the gradient of the subsequent loss, making convergence smoother. The output dimension is 1x128.
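A minimal PyTorch sketch of the layer, assuming a (batch, 128) branch feature; nn.BatchNorm1d on such a tensor already normalizes each feature channel across the Mini Batch, so the two transposes of the description are implicit here:

```python
import torch.nn as nn

class BNNeck(nn.Module):
    """Sketch of the BNNeck layer applied to a (batch, 128) branch feature."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # The learnable BatchNorm parameters absorb the gradient of the
        # subsequent classification loss, smoothing convergence.
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x):  # x: (batch, dim)
        return self.bn(x)
```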
Softmax cross-entropy classification loss:
The cross-entropy loss with softmax as the activation function does not optimize the base network's output feature vector directly; it takes the output feature vector of the BNNeck layer as input, BNNeck having mapped the features onto the hypersphere, i.e., optimization proceeds with the modulus of the feature vector taken as 1. The final goal is still to optimize the output feature vectors of the base network.
Mixed loss:
In the actual training process, the boundary mining loss and the cross-entropy classification loss of each branch are added to obtain the final loss. The deep-learning training framework automatically computes the gradient from the loss, feeds it back through the whole network, and updates the weights of every network layer participating in the loss calculation.
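A minimal sketch of the per-branch combination, assuming the boundary_mining_loss and BNNeck sketches above and a hypothetical linear classifier head:

```python
import torch.nn as nn
import torch.nn.functional as F

def branch_mixed_loss(feat, labels, bnneck: nn.Module, classifier: nn.Linear,
                      m1: float = 0.8, m2: float = 0.4):
    """feat: (batch, 128) feature of one branch; classifier maps 128 -> number of IDs."""
    metric = boundary_mining_loss(feat, labels, m1, m2)  # optimized in free Euclidean space
    logits = classifier(bnneck(feat))                    # BNNeck first, then classification
    return metric + F.cross_entropy(logits, labels)      # softmax cross entropy added on top

# The final batch loss is the sum of branch_mixed_loss over branches A, B and C;
# the training framework back-propagates it through the shared base network.
```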
S3, training each branch of each granularity of the base network in a mixed-loss manner on a training data set to obtain a trained face recognition network model.
Step S31: prepare the training data set. To enhance the generalization capability of the model, two training sets are used: an original face training set and a simulated-mask training set based on the original faces, wherein:
Original face training set: a training set used for learning face features. Each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination. The number of images per folder varies, generally around 6 to 40 faces depending on the actual collection capacity.
Simulated-mask training set based on the original faces: a training set used for learning both bare-face and masked-face features. Each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination. Unlike the original face training set, the simulated-mask training set pastes simulated masks onto the corresponding faces according to mask style (N95, KN95, scientific, gas, inpaint and cloth) and face angle (left, center, right), and determines the proportion of mask simulation by the number of images in each folder. To guarantee both the accuracy of face recognition and a sufficient, balanced number of masked faces, the ratio of the two kinds of data, original face training data (no mask) and simulated mask data (masked), must be tuned to a suitable value; the current ratio is 1:1.
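A minimal PyTorch sketch of the 1:1 merge; the two dataset objects are hypothetical placeholders, and in practice a P x k identity-balanced sampler would replace plain shuffling so each Mini Batch contains P IDs with k images each:

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

def make_train_loader(original_faces: Dataset, simulated_masks: Dataset,
                      batch_size: int = 64) -> DataLoader:
    """Merge the two training sets (kept near 1:1) and draw random Mini Batches."""
    combined = ConcatDataset([original_faces, simulated_masks])
    return DataLoader(combined, batch_size=batch_size, shuffle=True)
```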
Step S32: train the face recognition network model. First, the base network and the corresponding mixed losses are built according to the base network and mixed-loss-function architecture described above. Meanwhile, in the process of training the base network, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize and train the network.
Step S33: export the model and extract features. Operation layers such as the classifier (classification loss function), the boundary mining loss function and BNNeck are not needed when the face recognition network model is actually used, because these operations occur after the feature vectors and exist only to optimize the network. The training network is therefore pruned: the classifier, the boundary mining loss function and the BNNeck layer are removed, and only the weights of the base network are retained. This yields the face recognition network model used in the recognition process.
The concat layer: the feature vectors of the three branches are obtained from the base network and joined by a concat operation into one final feature vector for feature comparison. This operation layer is used only in the recognition process; its inputs are the three feature vectors output by the base network, and its output is a 1x384 feature vector.
Feature extraction: when the recognition model extracts features from an image, the feature vectors of the three branches are obtained from the recognition model and sent to the concat layer, producing a 1x384 feature vector output. This operation fuses the feature vectors of the multiple branches; the fused feature vector is more robust, and the fusion of multi-dimensional feature vectors effectively strengthens its representational power. The fused feature vector is then normalized with L2 Normalization, and the resulting vector serves as the final output of face recognition feature extraction.
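A minimal PyTorch sketch of the concat-and-normalize step (names are illustrative):

```python
import torch
import torch.nn.functional as F

def extract_feature(feat_a: torch.Tensor, feat_b: torch.Tensor,
                    feat_c: torch.Tensor) -> torch.Tensor:
    """Fuse the three 1x128 branch vectors into the final recognition feature."""
    fused = torch.cat([feat_a, feat_b, feat_c], dim=1)  # concat layer -> 1x384
    return F.normalize(fused, p=2, dim=1)               # L2 Normalization
```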
S4, recognizing the face image to be recognized using the trained face recognition network model.
During 1:N retrieval, the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, and computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors. The similarity of the two vectors is calculated by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance. In this way, N similarities are obtained representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among them and checked against a preset face similarity threshold (set to 0.76 here). When the similarity exceeds the threshold, the ID to be recognized exists in the base library; otherwise, the face library is considered not to contain the ID.
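A minimal PyTorch sketch of the 1:N retrieval, assuming L2-normalized features so the inner product equals the cosine distance:

```python
import torch

def retrieve_1_to_n(query: torch.Tensor, gallery: torch.Tensor, threshold: float = 0.76):
    """query: (1, 384) and gallery: (N, 384), both L2-normalized.

    similarity = (dist + 1) / 2 maps the cosine distance from [-1, 1] to [0, 1].
    Returns (index, similarity) of the best match, or None if below threshold.
    """
    similarity = (query @ gallery.t() + 1) / 2   # (1, N) similarities
    best = similarity.max()
    return (int(similarity.argmax()), float(best)) if best > threshold else None
```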
During 1:1 verification, the face library in the above operation is replaced by the single target face.
Further, the present invention also provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above mask-wearing face recognition method based on multi-granularity mixed loss.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A mask-wearing face recognition method based on multi-granularity mixed loss, characterized by comprising the following steps:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity;
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner;
S3, training each branch of each granularity of the base network with the mixed loss function in a mixed-loss manner on a training data set, to obtain a trained face recognition network model; the mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss; the training of each branch of each granularity of the base network in the mixed-loss manner is specifically realized as follows: each branch of each granularity is trained with the mixed loss, wherein each branch is simultaneously optimized by the boundary mining loss and by the cross-entropy classification loss with softmax as the activation function, and a BNNeck layer is used before the classification loss;
and S4, recognizing the face image to be recognized using the trained face recognition network model.
2. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1, wherein the base network generating a plurality of branches according to a preset granularity comprises:
splitting the feature map into three branches in the middle of the base network, one branch representing the whole face, one the upper half of the face and one the lower half of the face, the upper and lower halves being divided according to a preset ratio.
3. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1, wherein the boundary mining loss is specifically:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
each batch has P IDs with k face images per ID, i.e., P x k images per batch; the images in the batch are traversed during loss calculation, the currently selected image being taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p; images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs; Max(d_a,p) computes the Euclidean distances between the anchor feature and all positive-sample features and then selects the maximum; Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum; Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum; and m1 and m2 serve as margin/weight terms in the loss function.
4. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein the training data set comprises an original face training set and a simulated-mask training set based on the original faces, wherein:
the original face training set is a training set used for learning face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination;
the simulated-mask training set based on the original faces is a training set used for learning both bare-face and masked-face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination; the simulated-mask training set simulates masks onto the corresponding faces according to mask style and face angle, and the proportion of mask simulation is determined by the number of images in each folder.
5. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein, in the process of training the face recognition network model, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize the network.
6. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein the trained face recognition network model is used for recognizing the face image to be recognized, specifically:
the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors, and computes the similarity of the two vectors by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance; N similarities are obtained by the formula, representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among the N similarities and checked against a preset face similarity threshold; when the similarity exceeds the threshold, the ID to be recognized exists in the base library, and otherwise the face library is considered not to contain the ID.
7. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 6, wherein the feature vector obtained by feature extraction on a face is:
the fusion of the feature vectors of the three branches, which yields the feature vector of the image.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
CN202110808959.8A 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment Active CN113537066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110808959.8A CN113537066B (en) 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment


Publications (2)

Publication Number Publication Date
CN113537066A CN113537066A (en) 2021-10-22
CN113537066B true CN113537066B (en) 2022-09-09

Family

ID=78100009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110808959.8A Active CN113537066B (en) 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment

Country Status (1)

Country Link
CN (1) CN113537066B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723368B (en) * 2021-10-29 2022-07-12 杭州魔点科技有限公司 Multi-scene compatible face recognition method and device, electronic equipment and storage medium
CN113963237B (en) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 Model training method, mask wearing state detection method, electronic device and storage medium
CN114120430B (en) * 2022-01-26 2022-04-22 杭州魔点科技有限公司 Mask face recognition method based on double-branch weight fusion homology self-supervision

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348416A (en) * 2019-07-17 2019-10-18 北方工业大学 Multi-task face recognition method based on multi-scale feature fusion convolutional neural network
CN111783600A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Face recognition model training method, device, equipment and medium
CN112036266A (en) * 2020-08-13 2020-12-04 北京迈格威科技有限公司 Face recognition method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480261B (en) * 2017-08-16 2020-06-16 上海荷福人工智能科技(集团)有限公司 Fine-grained face image fast retrieval method based on deep learning
CN113033328A (en) * 2021-03-05 2021-06-25 杭州追猎科技有限公司 Personnel mask wearing state detection and identification method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348416A (en) * 2019-07-17 2019-10-18 北方工业大学 Multi-task face recognition method based on multi-scale feature fusion convolutional neural network
CN111783600A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Face recognition model training method, device, equipment and medium
CN112036266A (en) * 2020-08-13 2020-12-04 北京迈格威科技有限公司 Face recognition method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-state adaptive face recognition method based on deep convolutional adversarial neural networks; 杜翠凤, 温云龙, 李建中; Mobile Communications; 2019-09-30; vol. 43, no. 9; pp. 9-14 *
Application of accurate face recognition and temperature measurement technology in epidemic prevention and control; 彭骏, 吉纲, 张艳红, 占涛; Software Guide; 2020-10-31; vol. 19, no. 10; pp. 75-85 *

Also Published As

Publication number Publication date
CN113537066A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN113537066B (en) Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
Wang et al. Domain generalization via shuffled style assembly for face anti-spoofing
Liu et al. Detach and adapt: Learning cross-domain disentangled deep representation
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
CN109829356B (en) Neural network training method and pedestrian attribute identification method based on neural network
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN111950656B (en) Image recognition model generation method and device, computer equipment and storage medium
CN109033938A (en) A kind of face identification method based on ga s safety degree Fusion Features
CN112446302B (en) Human body posture detection method, system, electronic equipment and storage medium
CN109472274B (en) Training device and method for deep learning classification model
CN110598017B (en) Self-learning-based commodity detail page generation method
CN104391879B (en) The method and device of hierarchical clustering
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN108073851A (en) A kind of method, apparatus and electronic equipment for capturing gesture identification
CN113761105A (en) Text data processing method, device, equipment and medium
CN113449671A (en) Multi-scale and multi-feature fusion pedestrian re-identification method and device
CN111178403A (en) Method and device for training attribute recognition model, electronic equipment and storage medium
CN111125415A (en) Clothing design method and device, computer equipment and storage medium
CN111125396B (en) Image retrieval method of single-model multi-branch structure
CN109255382A (en) For the nerve network system of picture match positioning, method and device
Zhang et al. Facial component-landmark detection with weakly-supervised lr-cnn
Ou et al. Ad-rcnn: Adaptive dynamic neural network for small object detection
CN111914796B (en) Human body behavior identification method based on depth map and skeleton points
CN111191527B (en) Attribute identification method, attribute identification device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant