CN113537066B - Mask-wearing face recognition method based on multi-granularity mixed loss and electronic device - Google Patents


Info

Publication number
CN113537066B
Authority
CN
China
Prior art keywords
face
loss
granularity
mask
branch
Prior art date
Legal status
Active
Application number
CN202110808959.8A
Other languages
Chinese (zh)
Other versions
CN113537066A (en)
Inventor
王一帆
杜兵
罗翚
Current Assignee
Fiberhome Telecommunication Technologies Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd filed Critical Fiberhome Telecommunication Technologies Co Ltd
Priority to CN202110808959.8A
Publication of CN113537066A
Application granted
Publication of CN113537066B

Classifications

    • G06F18/253 Pattern recognition; analysing; fusion techniques of extracted features
    • G06N3/045 Neural networks; architecture; combinations of networks
    • G06N3/047 Neural networks; architecture; probabilistic or stochastic networks
    • G06N3/08 Neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a mask-wearing face recognition method based on multi-granularity mixed loss, comprising the following steps: S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity; S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner; S3, training each branch of each granularity of the base network with the mixed loss function on a training data set to obtain a trained face recognition network model; and S4, recognizing the face image to be recognized with the trained face recognition network model. The invention is compatible with recognition of both masked and unmasked faces, uses the mixed loss to effectively pull apart the distances between different faces so that threshold tuning becomes more convenient and flexible, improves recognition accuracy, runs fast, and is easy to port. The invention also provides a corresponding electronic device.

Description

Mask-wearing face recognition method based on multi-granularity mixed loss and electronic device
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a mask-wearing face recognition method based on multi-granularity mixed loss and an electronic device.
Background
Current training methods for face recognition mainly rely on a single loss to train a convolutional neural network, with feature vectors extracted from the network serving as the model output. Feature vectors are extracted from the image to be recognized and from the base library respectively and compared, finally yielding the similarity of the image to be recognized against the base library and thereby realizing face recognition. However, when recognition is performed on a face wearing a mask, most facial features are occluded by the mask, and recognition accuracy drops sharply.
At present, mainstream models are trained on classification loss alone with a single base network, which cannot effectively pull apart the similarity of unfamiliar faces; feature comparisons therefore come out too close together, making the threshold sensitive to set. Meanwhile, mainstream face recognition networks cannot effectively solve the recognition of masked faces, whose recognition accuracy remains low.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a mask-wearing face recognition method based on multi-granularity mixed loss, which preserves the recognition accuracy of normal faces while improving the recognition accuracy of masked and otherwise occluded faces.
To achieve the above object, according to one aspect of the present invention, there is provided a mask-wearing face recognition method based on multi-granularity mixed loss, comprising:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity;
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner;
S3, training each branch of each granularity of the base network with the mixed loss function in a mixed-loss manner on a training data set, to obtain a trained face recognition network model;
and S4, recognizing the face image to be recognized using the trained face recognition network model.
In an embodiment of the present invention, the base network generating a plurality of branches according to a preset granularity comprises:
splitting the feature map into three branches in the middle of the base network, one branch representing the whole face, one the upper half of the face and one the lower half of the face, the upper and lower halves being divided according to a preset ratio.
In one embodiment of the present invention, the mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss.
In an embodiment of the present invention, the training of each branch of each granularity of the base network in a mixed-loss manner is specifically realized as follows:
each branch of each granularity is trained with the mixed loss, wherein each branch is simultaneously optimized by the boundary mining loss and by the cross-entropy classification loss with softmax as the activation function, and a BNNeck layer is used before the classification loss.
In an embodiment of the present invention, the boundary mining loss is specifically:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
Each batch has P IDs with k face images per ID, i.e., P x k images per batch. The images in the batch are traversed during loss calculation; the currently selected image is taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p; images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs. Max(d_a,p) computes the Euclidean distances between the anchor feature and all positive-sample features and then selects the maximum of all the Euclidean distances; Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum; Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum; m1 and m2 serve as margin/weight terms in the loss function.
In one embodiment of the present invention, the training data set comprises an original face training set and a simulated-mask training set based on the original faces, wherein:
the original face training set is a training set used for learning face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination;
the simulated-mask training set based on the original faces is a training set used for learning both bare-face and masked-face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination; the simulated-mask training set simulates masks onto the corresponding faces according to mask style and face angle, and the proportion of mask simulation is determined by the number of images in each folder.
In one embodiment of the present invention, in the process of training the face recognition network model, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize the network.
In an embodiment of the present invention, recognizing the face image to be recognized with the trained face recognition network model is specifically:
the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors, and computes the similarity of the two vectors by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance; the formula yields N similarities representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among the N similarities and checked against a preset face similarity threshold; when the similarity exceeds the threshold, the ID to be recognized exists in the base library, and otherwise the face library is considered not to contain the ID.
In an embodiment of the present invention, the feature vector obtained by feature extraction on a face is:
the fusion of the feature vectors of the three branches, which yields the feature vector of the image.
According to another aspect of the present invention, there is also provided an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above mask-wearing face recognition method based on multi-granularity mixed loss.
In general, compared with the prior art, the technical scheme conceived by the present invention achieves the following beneficial effects:
(1) the model outputs multiple branches at multiple granularities; a single model simultaneously yields feature vectors of different granularities, making recognition compatible with both masked and unmasked faces (the recognition model extracts face features at different granularities, which assist each other and improve accuracy);
(2) the high similarity between unfamiliar faces is fully considered; the mixed loss effectively pulls apart the distances between different faces, making threshold tuning more convenient and flexible and improving recognition accuracy;
(3) the method is easy to deploy, the model is easy to make lightweight, only conventional neural network layers are used, execution is efficient, running speed is high, and porting is convenient; meanwhile, the model architecture is not tied to a particular base network structure, and the size of the network used can be adjusted dynamically as required.
Drawings
FIG. 1 is a schematic flow diagram of the mask-wearing face recognition method based on multi-granularity mixed loss according to the present invention;
FIG. 2 is a schematic structural diagram of the face recognition network model in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the first half of the base network in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the second half of the base network in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
To achieve the object of the present invention, as shown in FIG. 1, the invention provides a mask-wearing face recognition method based on multi-granularity mixed loss, comprising:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity.
A deep-learning model is designed that collects face information at several granularities and outputs multi-branch feature vectors. The feature map is split into three branches of two granularities in the middle of the base network (i.e., at its halfway layer): one branch carries the global granularity and represents the whole face, while the other two represent the upper and lower halves of the face (the feature map is divided into two parts, either evenly or according to a preset ratio such as 3:7 or 4:6). The feature vector finally output by each branch is connected to its own mixed loss for training and optimization. The mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss. After optimization by the loss functions, each branch outputs its own feature vector; during feature extraction, the feature vectors of all branches are concatenated and fused into the final feature vector used for face recognition. This method reduces the negative impact of a worn mask on recognition accuracy, while the mixed loss shortens distances within the same face ID and effectively pulls apart distances between different face IDs, making the face-similarity threshold more accurate and flexible to tune.
In summary, as shown in FIG. 2, the input image passes through image preprocessing and the first half of the base network, and the split-layer operation is performed in the middle of the base network (i.e., at its halfway layer): if the base network is 100 layers deep, the multi-granularity branches separate at around layer 50 (in this embodiment, three branches of two granularities, i.e., the three paths after the first half of the base network in the figure; the split point can be set at the split layer as needed, and depending on the granularity, a corresponding number of branches is generated: the global granularity yields one branch, a two-way granularity two branches, a three-way granularity three branches, and so on). Although the current base network architecture follows MobileFaceNet, many base networks are possible; for example, when the device's flash and memory have headroom, the invention supports using a larger, stronger base network such as ResNet50. The network after the split layer divides into two granularity structures, global granularity and refined granularity. One granularity is the global feature map, forming an independent branch (global granularity - branch C in FIG. 2). The other is the more refined granularity, in which each branch occupies only half of the feature map, or a fraction designed according to some ratio (an uneven split produces two further granularities; granularity here is an abstract notion of feature-map size). The current network splits the feature map evenly into two parts, forming two new branches, the upper half face (Upface) and the lower half face (Downface) (refined granularity - upper half face branch A and refined granularity - lower half face branch B in FIG. 2). When designing the network, the split ratio can be adjusted flexibly for the application, e.g., 3:7 or 4:6; this flexibility exists because the fraction of the face covered by a mask is not fixed and varies with, for example, the shooting angle, so the invention supports it. The ratio refers to dividing the feature map into two parts along the vertical direction. The invention focuses on multi-granularity network splitting; the granularity can be adjusted flexibly to the actual application, and once the split ratio is determined, subsequent training and face recognition proceed with that ratio. Meanwhile, all granularity branches share the weights of the first-half base network and feed back gradients to optimize them.
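As a minimal PyTorch sketch (the function name and tensor layout are illustrative, not part of the patent), the split-layer operation amounts to slicing the feature map along its height:

```python
import torch

def split_branches(feat: torch.Tensor, ratio: float = 0.5):
    """Split a (N, C, H, W) feature map into the three granularity branches.

    feat:  output of the first half of the base network, e.g. (N, 128, 14, 14).
    ratio: vertical split point; 0.5 is the even split, 0.3 or 0.4 give 3:7 / 4:6.
    """
    cut = int(feat.shape[2] * ratio)
    up_face = feat[:, :, :cut, :]     # refined granularity - upper half face (branch A)
    down_face = feat[:, :, cut:, :]   # refined granularity - lower half face (branch B)
    whole_face = feat                 # global granularity - whole face (branch C)
    return up_face, down_face, whole_face
```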
The three branches yield feature A, feature B and feature C respectively, and training based on the mixed loss (boundary mining loss, and BNNeck + classification loss) is then performed on each of them to obtain the face recognition network model. During face recognition, feature A, feature B and feature C output by the three branches are concatenated directly as the final feature for face recognition.
Before an image is fed into the base network (for both training and recognition), it must be preprocessed. Preprocessing comprises data centering, in which 127.5 is subtracted from the value of each element of the input RGB image so that the centered three-channel values range from -127.5 to 127.5, and data normalization, in which each element is divided by 127.5 so that the processed values range from -1 to 1. After this approximate centering and standardization, the data distribution has mean approximately 0 and standard deviation approximately 1, which helps the model converge faster.
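A minimal sketch of this preprocessing, assuming an 8-bit RGB input:

```python
import numpy as np

def preprocess(img_rgb: np.ndarray) -> np.ndarray:
    """Center and scale an HxWx3 uint8 RGB image to [-1, 1]: (x - 127.5) / 127.5."""
    return (img_rgb.astype(np.float32) - 127.5) / 127.5
```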
The base network is described in further detail below. Its overall architecture is adapted from MobileFaceNet and modified according to the multi-granularity, multi-branch design concept. FIG. 3 shows the first half of the base network and FIG. 4 the second half.
As shown in FIG. 3, the input to the base network is a standard 112x112x3 RGB image, 112 pixels high and wide, consistent with the training data set image size, with the three RGB channels ranging from 0 to 255.
The conv1 layer is the first neural-network combination layer after image preprocessing, comprising [Conv2d(c_in=3, c_out=64, kernel=3, s=2, p=1), BatchNorm, PReLU]. In the Conv2d attributes, c_in is the number of input channels, c_out the number of output channels, kernel the convolution kernel size (here a 3x3 kernel), s the convolution stride, and p the padding. After conv1, an output of 56x56x64 is obtained.
The dw_conv1 layer comprises [Conv2d(c_in=64, c_out=64, kernel=3, s=1, p=1, dw=True), BatchNorm, PReLU] and takes the output of conv1 as its input, where dw=True marks the convolution as depthwise separable. The output after dw_conv1 is 56x56x64.
The BottleNeck layer (bottleneck network) exists as a common building block throughout the base network and realizes different structures under parameter control, e.g. BottleNeck(inp, oup, b_s, expansion), where inp is the number of input channels, oup the number of output channels, b_s the stride of the 3x3 convolution inside the BottleNeck, and expansion the channel expansion factor of the convolutions inside the BottleNeck. Abstracted by these parameters, the BottleNeck layer comprises [Conv2d(c_in=inp, c_out=inp*expansion, kernel=1, s=1, p=0), BatchNorm, PReLU, Conv2d(c_in=inp*expansion, c_out=inp*expansion, kernel=3, s=b_s, p=1), BatchNorm, PReLU, Conv2d(c_in=inp*expansion, c_out=oup, kernel=1, s=1, p=0), BatchNorm].
The block1 layer takes the output of dw_conv1 as its input. block1 consists of several BottleNecks and has attributes t=2, c=64, n=5, s=2, where t=2 means block1 contains two BottleNecks, c=64 means block1 finally outputs 64 channels, n=5 is the expansion factor of the BottleNecks, and s=2 means the first BottleNeck has convolution stride 2 while the remaining BottleNecks have stride 1. The final output of block1 is 28x28x64.
The block2 layer takes the output of block1 as its input. block2 consists of several BottleNecks with attributes t=4, c=128, n=1, s=2. The final output of block2 is 14x14x128. This completes the first half of the base network shown in FIG. 3; the subsequent network layers split the feature map into multiple granularities.
The split layer, as shown in FIG. 4, splits the output of block2 into three branches: the refined granularity - upper half face branch A receives 7x14x128 (up), the refined granularity - lower half face branch B receives 7x14x128 (down), and the global granularity - whole face branch C receives 14x14x128.
The block3a, block3b and block3c layers take the outputs of the three split-layer branches as the inputs of the three branch layers; their attributes are uniformly t=2, c=128, n=6, s=1. The outputs of the three branches are obtained respectively: 7x14x128 for branch A, 7x14x128 for branch B and 14x14x128 for branch C.
The block4a, block4b and block4c layers take the outputs of block3a, block3b and block3c as the inputs of the current three branch layers; their attributes are uniformly t=4, c=128, n=1, s=2. The outputs are 4x7x128 for branch A, 4x7x128 for branch B and 7x7x128 for branch C.
The block5a, block5b and block5c layers take the outputs of block4a, block4b and block4c as the inputs of the current three branch layers; their attributes are uniformly t=2, c=128, n=2, s=1. The outputs are 4x7x128 for branch A, 4x7x128 for branch B and 7x7x128 for branch C.
The linear1a, linear1b and linear1c layers take the outputs of block5a, block5b and block5c as the inputs of the current three branch layers. Each linear layer is a fully connected layer, a way of converting a feature map into a preset feature vector; the fully connected output of each of the three branches is 128, giving the branch outputs 1x128 for branch A, 1x128 for branch B and 1x128 for branch C.
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner.
The mixed loss function is designed to optimize the outputs of the base network, i.e., the feature vectors of the three branches. To this end, the invention trains each branch of each granularity with a mixed loss, where each branch is simultaneously optimized by the boundary mining loss (MSML) and by the cross-entropy classification loss with softmax as the activation function, with a BNNeck layer used before the classification loss. The advantage is that the feature vector can be optimized by the boundary mining loss in free Euclidean space, while after BNNeck processing it is mapped onto the hypersphere, which better suits optimization by the classification loss. BNNeck accelerates network convergence and better balances the inconsistency in gradient direction between the boundary mining loss and the cross-entropy classification loss during backpropagation, making convergence smoother.
Boundary mining loss:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
The loss is computed over batches: each batch has P IDs with k face images per ID, i.e., P x k images per batch. During loss calculation the images in the batch are traversed, and the currently selected image is taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p (lower case, distinct from the ID count P above); images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs. Max(d_a,p) first computes the Euclidean distances between the anchor feature and all positive-sample features, then selects the maximum among them. Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum. Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum. m1 and m2 act as margin/weight terms in the loss function and can be adjusted flexibly according to how the actual model converges; the recommended settings are 0.8 and 0.4. The loss computed by this function is fed back to the network and optimizes the weights of every layer of the base network, achieving the optimization goal of shortening distances within the same ID and widening distances between different IDs.
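A minimal PyTorch sketch of this batch-level mining, assuming the hinge form of the formula reconstructed above (function and variable names are illustrative):

```python
import torch

def boundary_mining_loss(feats: torch.Tensor, labels: torch.Tensor,
                         m1: float = 0.8, m2: float = 0.4) -> torch.Tensor:
    """Sketch of the boundary mining loss for one branch.

    feats:  (P*k, D) feature vectors of a batch with P IDs and k images per ID
            (assumes P >= 3 and k >= 2 so every mined term is defined).
    labels: (P*k,) integer face IDs.
    """
    dist = torch.cdist(feats, feats)             # pairwise Euclidean distances
    total = feats.new_zeros(())
    n = len(labels)
    for a in range(n):                           # traverse the batch; a is the anchor
        pos = labels == labels[a]
        pos[a] = False                           # exclude the anchor itself
        neg = labels != labels[a]
        d_ap = dist[a][pos].max()                # Max(d_a,p): hardest positive
        d_an = dist[a][neg].min()                # Min(d_a,n): hardest negative
        idx = neg.nonzero(as_tuple=True)[0]      # this anchor's negatives
        neg_d = dist[idx][:, idx]                # negative-negative distances
        diff = labels[idx][:, None] != labels[idx][None, :]
        d_nn = neg_d[diff].min()                 # Min(d_n1,n2): closest cross-ID negatives
        total = total + torch.relu(d_ap - d_an + m1) + torch.relu(d_ap - d_nn + m2)
    return total / n
```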
BNNeck layer:
The BNNeck layer takes the output feature vector of the base network as input and comprises [transpose, BatchNorm, transpose]. BNNeck is designed to use BatchNorm instead of L2 Norm to approximately normalize the features; because the BatchNorm layer normalizes along the channel dimension, which does not directly match this purpose, a transpose is used to swap the input channel dimension with the Mini Batch dimension, and after BatchNorm a second transpose restores the original shape for the subsequent loss computation. Meanwhile, the BatchNorm layer has parameters that can absorb the gradient of the subsequent loss, making convergence smoother. The output dimension is 1x128.
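A minimal PyTorch sketch of the layer, assuming a (batch, 128) branch feature; nn.BatchNorm1d on such a tensor already normalizes each feature channel across the Mini Batch, so the two transposes of the description are implicit here:

```python
import torch.nn as nn

class BNNeck(nn.Module):
    """Sketch of the BNNeck layer applied to a (batch, 128) branch feature."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # The learnable BatchNorm parameters absorb the gradient of the
        # subsequent classification loss, smoothing convergence.
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x):  # x: (batch, dim)
        return self.bn(x)
```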
Softmax cross-entropy classification loss:
The cross-entropy loss with softmax as the activation function does not optimize the base network's output feature vector directly; it takes the output feature vector of the BNNeck layer as input, BNNeck having mapped the features onto the hypersphere, i.e., optimization proceeds with the modulus of the feature vector taken as 1. The final goal is still to optimize the output feature vectors of the base network.
Mixed loss:
In the actual training process, the boundary mining loss and the cross-entropy classification loss of each branch are added to obtain the final loss. The deep-learning training framework automatically computes the gradient from the loss, feeds it back through the whole network, and updates the weights of every network layer participating in the loss calculation.
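A minimal sketch of the per-branch combination, assuming the boundary_mining_loss and BNNeck sketches above and a hypothetical linear classifier head:

```python
import torch.nn as nn
import torch.nn.functional as F

def branch_mixed_loss(feat, labels, bnneck: nn.Module, classifier: nn.Linear,
                      m1: float = 0.8, m2: float = 0.4):
    """feat: (batch, 128) feature of one branch; classifier maps 128 -> number of IDs."""
    metric = boundary_mining_loss(feat, labels, m1, m2)  # optimized in free Euclidean space
    logits = classifier(bnneck(feat))                    # BNNeck first, then classification
    return metric + F.cross_entropy(logits, labels)      # softmax cross entropy added on top

# The final batch loss is the sum of branch_mixed_loss over branches A, B and C;
# the training framework back-propagates it through the shared base network.
```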
S3, training each branch of each granularity of the base network in a mixed-loss manner on a training data set to obtain a trained face recognition network model.
Step S31: prepare the training data set. To enhance the generalization capability of the model, two training sets are used: an original face training set and a simulated-mask training set based on the original faces, wherein:
Original face training set: a training set used for learning face features. Each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination. The number of images per folder varies, generally around 6 to 40 faces depending on the actual collection capacity.
Simulated-mask training set based on the original faces: a training set used for learning both bare-face and masked-face features. Each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination. Unlike the original face training set, the simulated-mask training set pastes simulated masks onto the corresponding faces according to mask style (N95, KN95, scientific, gas, inpaint and cloth) and face angle (left, center, right), and determines the proportion of mask simulation by the number of images in each folder. To guarantee both the accuracy of face recognition and a sufficient, balanced number of masked faces, the ratio of the two kinds of data, original face training data (no mask) and simulated mask data (masked), must be tuned to a suitable value; the current ratio is 1:1.
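A minimal PyTorch sketch of the 1:1 merge; the two dataset objects are hypothetical placeholders, and in practice a P x k identity-balanced sampler would replace plain shuffling so each Mini Batch contains P IDs with k images each:

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

def make_train_loader(original_faces: Dataset, simulated_masks: Dataset,
                      batch_size: int = 64) -> DataLoader:
    """Merge the two training sets (kept near 1:1) and draw random Mini Batches."""
    combined = ConcatDataset([original_faces, simulated_masks])
    return DataLoader(combined, batch_size=batch_size, shuffle=True)
```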
Step S32: train the face recognition network model. First, the base network and the corresponding mixed losses are built according to the base network and mixed-loss-function architecture described above. Meanwhile, in the process of training the base network, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize and train the network.
Step S33: export the model and extract features. Operation layers such as the classifier (classification loss function), the boundary mining loss function and BNNeck are not needed when the face recognition network model is actually used, because these operations occur after the feature vectors and exist only to optimize the network. The training network is therefore pruned: the classifier, the boundary mining loss function and the BNNeck layer are removed, and only the weights of the base network are retained. This yields the face recognition network model used in the recognition process.
The concat layer: the feature vectors of the three branches are obtained from the base network and joined by a concat operation into one final feature vector for feature comparison. This operation layer is used only in the recognition process; its inputs are the three feature vectors output by the base network, and its output is a 1x384 feature vector.
Feature extraction: when the recognition model extracts features from an image, the feature vectors of the three branches are obtained from the recognition model and sent to the concat layer, producing a 1x384 feature vector output. This operation fuses the feature vectors of the multiple branches; the fused feature vector is more robust, and the fusion of multi-dimensional feature vectors effectively strengthens its representational power. The fused feature vector is then normalized with L2 Normalization, and the resulting vector serves as the final output of face recognition feature extraction.
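A minimal PyTorch sketch of the concat-and-normalize step (names are illustrative):

```python
import torch
import torch.nn.functional as F

def extract_feature(feat_a: torch.Tensor, feat_b: torch.Tensor,
                    feat_c: torch.Tensor) -> torch.Tensor:
    """Fuse the three 1x128 branch vectors into the final recognition feature."""
    fused = torch.cat([feat_a, feat_b, feat_c], dim=1)  # concat layer -> 1x384
    return F.normalize(fused, p=2, dim=1)               # L2 Normalization
```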
S4, recognizing the face image to be recognized using the trained face recognition network model.
During 1:N retrieval, the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, and computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors. The similarity of the two vectors is calculated by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance. In this way, N similarities are obtained representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among them and checked against a preset face similarity threshold (set to 0.76 here). When the similarity exceeds the threshold, the ID to be recognized exists in the base library; otherwise, the face library is considered not to contain the ID.
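A minimal PyTorch sketch of the 1:N retrieval, assuming L2-normalized features so the inner product equals the cosine distance:

```python
import torch

def retrieve_1_to_n(query: torch.Tensor, gallery: torch.Tensor, threshold: float = 0.76):
    """query: (1, 384) and gallery: (N, 384), both L2-normalized.

    similarity = (dist + 1) / 2 maps the cosine distance from [-1, 1] to [0, 1].
    Returns (index, similarity) of the best match, or None if below threshold.
    """
    similarity = (query @ gallery.t() + 1) / 2   # (1, N) similarities
    best = similarity.max()
    return (int(similarity.argmax()), float(best)) if best > threshold else None
```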
During 1:1 verification, the face library in the above operation is replaced by the single target face.
Further, the present invention also provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above mask-wearing face recognition method based on multi-granularity mixed loss.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A mask-wearing face recognition method based on multi-granularity mixed loss, characterized by comprising the following steps:
S1, designing a base network, wherein the base network generates a plurality of branches according to a preset granularity;
S2, designing a mixed loss function, wherein the mixed loss function trains and optimizes each branch of each granularity of the base network in a mixed-loss manner;
S3, training each branch of each granularity of the base network with the mixed loss function in a mixed-loss manner on a training data set, to obtain a trained face recognition network model; the mixed loss comprises a cross-entropy classification loss with softmax as the activation function and a boundary mining loss; the training of each branch of each granularity of the base network in the mixed-loss manner is specifically realized as follows: each branch of each granularity is trained with the mixed loss, wherein each branch is simultaneously optimized by the boundary mining loss and by the cross-entropy classification loss with softmax as the activation function, and a BNNeck layer is used before the classification loss;
and S4, recognizing the face image to be recognized using the trained face recognition network model.
2. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1, wherein the base network generating a plurality of branches according to a preset granularity comprises:
splitting the feature map into three branches in the middle of the base network, one branch representing the whole face, one the upper half of the face and one the lower half of the face, the upper and lower halves being divided according to a preset ratio.
3. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1, wherein the boundary mining loss is specifically:
loss = [Max(d_a,p) - Min(d_a,n) + m1]+ + [Max(d_a,p) - Min(d_n1,n2) + m2]+ (where [x]+ denotes max(x, 0))
each batch has P IDs with k face images per ID, i.e., P x k images per batch; the images in the batch are traversed during loss calculation, the currently selected image being taken as the anchor, denoted a; all k images of the anchor's ID are positive samples, denoted p; images of all IDs other than the anchor's are negative samples, denoted n, where n1 and n2 denote negative samples from different IDs; Max(d_a,p) computes the Euclidean distances between the anchor feature and all positive-sample features and then selects the maximum; Min(d_a,n) computes the Euclidean distances between the anchor feature and all negative-sample features and takes the minimum; Min(d_n1,n2) computes the Euclidean distances between negatives of different IDs and finds the minimum; and m1 and m2 serve as margin/weight terms in the loss function.
4. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein the training data set comprises an original face training set and a simulated-mask training set based on the original faces, wherein:
the original face training set is a training set used for learning face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination;
the simulated-mask training set based on the original faces is a training set used for learning both bare-face and masked-face features; each folder in the data set represents one face ID, and each ID contains face images of that person under different angles and illumination; the simulated-mask training set simulates masks onto the corresponding faces according to mask style and face angle, and the proportion of mask simulation is determined by the number of images in each folder.
5. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein, in the process of training the face recognition network model, the original face training set and the simulated-mask training set based on the original faces are merged, and samples are randomly fed from the merged set to the face recognition network model in Mini Batches to optimize the network.
6. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 1 or 2, wherein the trained face recognition network model is used for recognizing the face image to be recognized, specifically:
the face recognition network model extracts features from the face to be recognized and from the faces in the face library respectively, computes the inner product of the feature vector of the face to be recognized with each feature vector in the face library to obtain the cosine distance between the two feature vectors, and computes the similarity of the two vectors by the following formula:
similarity=(dist+1)/2
where dist is the cosine distance; N similarities are obtained by the formula, representing the similarity between the face to be recognized and the N face IDs in the face library; the highest similarity is found among the N similarities and checked against a preset face similarity threshold; when the similarity exceeds the threshold, the ID to be recognized exists in the base library, and otherwise the face library is considered not to contain the ID.
7. The mask-wearing face recognition method based on multi-granularity mixed loss according to claim 6, wherein the feature vector obtained by feature extraction on a face is:
the fusion of the feature vectors of the three branches, which yields the feature vector of the image.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
CN202110808959.8A 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment Active CN113537066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110808959.8A CN113537066B (en) 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment


Publications (2)

Publication Number Publication Date
CN113537066A CN113537066A (en) 2021-10-22
CN113537066B true CN113537066B (en) 2022-09-09

Family

ID=78100009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110808959.8A Active CN113537066B (en) 2021-07-16 2021-07-16 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment

Country Status (1)

Country Link
CN (1) CN113537066B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723368B (en) * 2021-10-29 2022-07-12 杭州魔点科技有限公司 Multi-scene compatible face recognition method and device, electronic equipment and storage medium
CN113963237B (en) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 Model training method, mask wearing state detection method, electronic device and storage medium
CN114120430B (en) * 2022-01-26 2022-04-22 杭州魔点科技有限公司 Mask face recognition method based on double-branch weight fusion homology self-supervision

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348416A (en) * 2019-07-17 2019-10-18 北方工业大学 Multi-task face recognition method based on multi-scale feature fusion convolutional neural network
CN111783600A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Face recognition model training method, device, equipment and medium
CN112036266A (en) * 2020-08-13 2020-12-04 北京迈格威科技有限公司 Face recognition method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480261B (en) * 2017-08-16 2020-06-16 上海荷福人工智能科技(集团)有限公司 Fine-grained face image fast retrieval method based on deep learning
CN113033328A (en) * 2021-03-05 2021-06-25 杭州追猎科技有限公司 Personnel mask wearing state detection and identification method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348416A (en) * 2019-07-17 2019-10-18 北方工业大学 Multi-task face recognition method based on multi-scale feature fusion convolutional neural network
CN111783600A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Face recognition model training method, device, equipment and medium
CN112036266A (en) * 2020-08-13 2020-12-04 北京迈格威科技有限公司 Face recognition method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-state adaptive face recognition method based on deep convolutional adversarial neural networks; 杜翠凤, 温云龙, 李建中; Mobile Communications; 2019-09-30; vol. 43, no. 9; pp. 9-14 *
Application of accurate face recognition and temperature measurement technology in epidemic prevention and control; 彭骏, 吉纲, 张艳红, 占涛; Software Guide; 2020-10-31; vol. 19, no. 10; pp. 75-85 *

Also Published As

Publication number Publication date
CN113537066A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN113537066B (en) Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
Wang et al. Domain generalization via shuffled style assembly for face anti-spoofing
Liu et al. Detach and adapt: Learning cross-domain disentangled deep representation
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
CN109829356B (en) Neural network training method and pedestrian attribute identification method based on neural network
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN111950656B (en) Image recognition model generation method and device, computer equipment and storage medium
CN109033938A (en) A kind of face identification method based on ga s safety degree Fusion Features
CN112446302B (en) Human body posture detection method, system, electronic equipment and storage medium
CN109472274B (en) Training device and method for deep learning classification model
CN110598017B (en) Self-learning-based commodity detail page generation method
CN104391879B (en) The method and device of hierarchical clustering
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN108073851A (en) A kind of method, apparatus and electronic equipment for capturing gesture identification
CN113761105A (en) Text data processing method, device, equipment and medium
CN113449671A (en) Multi-scale and multi-feature fusion pedestrian re-identification method and device
CN111178403A (en) Method and device for training attribute recognition model, electronic equipment and storage medium
CN111125415A (en) Clothing design method and device, computer equipment and storage medium
CN111125396B (en) Image retrieval method of single-model multi-branch structure
CN109255382A (en) For the nerve network system of picture match positioning, method and device
Zhang et al. Facial component-landmark detection with weakly-supervised lr-cnn
Ou et al. Ad-rcnn: Adaptive dynamic neural network for small object detection
CN111914796B (en) Human body behavior identification method based on depth map and skeleton points
CN111191527B (en) Attribute identification method, attribute identification device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant