CN112949565A - Single-sample partially-shielded face recognition method and system based on attention mechanism - Google Patents


Info

Publication number
CN112949565A
CN112949565A (application CN202110320104.0A)
Authority
CN
China
Prior art keywords
feature map
module
face
layer
channel
Prior art date
Legal status
Granted
Application number
CN202110320104.0A
Other languages
Chinese (zh)
Other versions
CN112949565B (en)
Inventor
钟福金
侯梦军
王润生
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110320104.0A priority Critical patent/CN112949565B/en
Publication of CN112949565A publication Critical patent/CN112949565A/en
Application granted granted Critical
Publication of CN112949565B publication Critical patent/CN112949565B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The invention belongs to the field of single-sample partially-occluded face recognition, and in particular relates to a single-sample partially-occluded face recognition method and system based on an attention mechanism. The method comprises the following steps: acquiring a partially occluded test face image and an unoccluded single-sample face Gallery set, and preprocessing both; inputting the preprocessed data into a differential network formed by two ResNet-34 branches, and extracting a shallow feature map through a convolutional layer; using a spatial attention module to adjust the weights of the shallow feature map according to spatial position information, and multiplying the weights with the feature map output by the last convolutional layer to highlight the local detail features of the shallow feature map; subtracting the feature maps of the occluded and unoccluded images with highlighted local detail features; after taking the absolute value of the difference, calibrating the original feature maps channel by channel through a channel attention module; feeding the calibrated feature map into a fully connected layer to output a classification result. The invention can accurately identify partially occluded face images.

Description

Single-sample partially-shielded face recognition method and system based on attention mechanism
Technical Field
The invention belongs to the field of single-sample partially-occluded face recognition, and particularly relates to a single-sample partially-occluded face recognition method and system based on an attention mechanism.
Background
Face recognition is a biometric technology that identifies a person based on facial feature information: images or videos containing faces are collected with a camera, and the collected faces undergo a series of technical processing to recognize the identities of different people. After many years of research, face recognition has achieved good results under controlled conditions, but under unconstrained conditions it still faces many challenges. In some special application scenarios, such as identity-card management systems, criminal-investigation and law-enforcement systems, passport verification, and identity recognition at registration ports, only one face image (a certificate-photo face image) of each person can be collected as a training sample. When the test sample is affected by severe facial changes such as illumination, pose, expression, or occlusion by external objects, the intra-class differences of facial features become larger than the inter-class differences. This gives rise to the problem of single-sample partially-occluded face recognition: designing a face recognition method that identifies an unknown face image using only a single training sample per person.
In real life, the situations causing facial occlusion fall into three categories: 1) occlusion by external objects (such as sunglasses, scarves, and hats); 2) extreme lighting (e.g., shadows); 3) self-occlusion caused by pose change (e.g., a side face). Hereinafter, occlusion refers primarily to occlusion by external objects.
Although occlusion has been widely studied in face recognition, some problems remain: 1) current methods still cannot completely remove the influence of occlusion, and the recognition result would be more ideal if that influence could be removed entirely; 2) when facing the single-sample problem, existing mature face recognition algorithms cannot extract intra-class variation information from a single training sample, so their recognition performance is poor.
Most research focuses on improving the accuracy of the recognition system while ignoring problems of the face database itself. For example, due to the difficulty of collecting samples or the storage limits of the system, each person in the database may have only one sample image. In this situation, most traditional methods such as PCA and LDA degrade in performance or even fail to work; and when a deep-learning route is adopted, a database that stores only one image per person lacks the large number of samples needed to learn rich intra-class variation information, so handling occlusion on this premise also performs poorly. In summary, single-sample partially-occluded face recognition is unavoidable in real application scenarios, and face recognition under the single-sample constraint still faces great challenges.
Disclosure of Invention
In view of the above problems, namely the inability to learn rich intra-class variation information due to the lack of a large number of samples, and the loss of facial feature information caused by occlusion that degrades recognition accuracy, the present invention provides a single-sample partially-occluded face recognition method and system based on an attention mechanism. The method comprises:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a space attention module between the second layer residual error module and the fourth layer residual error module, adjusting the weight of pixels in the uncovered area of the shallow layer feature map, and outputting a space position weight feature map;
the spatial position weight feature map is multiplied and connected with a feature map output by a residual error module of a fourth layer, and local detail features from a lower layer are obtained through fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of the channel attention module;
the channel attention module calibrates, channel by channel according to the input absolute value, the feature maps of the occluded and unoccluded images with highlighted local detail features, and the calibrated feature map is fed into the fully connected layer to output a classification result;
iteratively training the network by jointly optimizing the cross entropy loss function and the contrast loss caused by the difference within each face image pair;
after multiple rounds of training, the network loss tends to be stable, and the iterative training process is finished to obtain a trained network model;
and inputting the target domain single sample human face Gallery set and the partially shielded test human face images into a trained network model, and calculating and outputting the category of the finally shielded human face according to the cosine distance of the characteristics of the human face images by the model.
Further, preprocessing the face image data set includes cropping the face images to a size of 128 × 128 and performing a pixel normalization operation on the cropped face images, expressed as:
X_pix = (X_pix - 128) / 128;
where X_pix is the corresponding face image pixel value.
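As an illustrative sketch (not part of the patent itself; the function name is ours), the normalization formula above maps 8-bit pixel values into roughly [-1, 1):

```python
def normalize_pixels(pixels):
    """Map 8-bit pixel values (0..255) to roughly [-1, 1) via X = (X - 128) / 128."""
    return [(p - 128) / 128 for p in pixels]

# A black (0) pixel maps to -1.0 and a mid-gray (128) pixel maps to 0.0.
row = normalize_pixels([0, 128, 255])
```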
Further, extracting the shallow feature map through a convolutional layer includes: inputting the 3-channel face image into a convolutional layer with a 3 × 3 kernel, 64 channels, and a stride of 1 for feature extraction, outputting a 128 × 128 feature map with 64 channels, and passing the feature map through a max-pooling layer to obtain the shallow feature map of the face image.
Furthermore, in the four residual module groups which are sequentially cascaded, each residual module group sequentially comprises 3, 4, 6 and 3 residual modules according to the cascade order, and the size of the feature map output by each residual module group according to the cascade order is 64 × 64, 32 × 32, 16 × 16 and 8 × 8.
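The spatial sizes listed above are consistent with each residual group after the first beginning with a stride-2 block that halves the feature map. A hypothetical sketch of that size progression (the function and its name are ours, not from the patent):

```python
def trace_sizes(input_size, num_groups):
    """Each residual group after the first halves the spatial size (stride-2 first block)."""
    sizes = [input_size]
    for _ in range(num_groups - 1):
        sizes.append(sizes[-1] // 2)
    return sizes

# 64 -> 32 -> 16 -> 8, matching the group outputs listed above.
sizes = trace_sizes(64, 4)
```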
Further, the process of acquiring local detail features from the lower layer includes:
a spatial attention module is embedded between the second-layer and fourth-layer residual modules;
applying global average pooling and global max pooling to the input h′ × w′ × c′ three-dimensional tensor to obtain a weight-vector feature map;
processing the weight-vector feature map with a convolutional layer with a 7 × 7 kernel, padding of 3, and 1 output channel, followed by a Sigmoid nonlinear activation layer;
upsampling the 8 × 8 feature map output by the fourth-layer residual module group to a 32 × 32 feature map by bilinear interpolation;
and multiplying the upsampled feature map with the processed weight-vector feature map, then outputting an 8 × 8 feature map through downsampling to obtain the local detail features from the lower layer.
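The spatial-attention steps above can be sketched in plain Python on a toy tensor. This is a simplified, hypothetical illustration: the 7 × 7 convolution is omitted, and the mean- and max-pooled maps are simply averaged before the Sigmoid, so it shows only the channel-pooling, gating, and rescaling idea:

```python
import math

def spatial_attention_mask(x):
    """x: h x w x c nested lists. Pool over channels (mean and max), average the
    two maps, and squash with a sigmoid. This stands in for the patent's
    concat + 7x7 conv + Sigmoid step; the conv is omitted for brevity."""
    h, w = len(x), len(x[0])
    mask = []
    for i in range(h):
        row = []
        for j in range(w):
            ch = x[i][j]
            pooled = (sum(ch) / len(ch) + max(ch)) / 2.0
            row.append(1.0 / (1.0 + math.exp(-pooled)))  # weight in (0, 1)
        mask.append(row)
    return mask

def apply_mask(feat, mask):
    """Multiply every channel of feat by the per-position spatial weight."""
    return [[[v * mask[i][j] for v in feat[i][j]]
             for j in range(len(feat[0]))] for i in range(len(feat))]

x = [[[1.0, 3.0], [0.0, 0.0]]]   # a 1 x 2 x 2 toy tensor
m = spatial_attention_mask(x)    # higher weight where responses are strong
y = apply_mask(x, m)
```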
Further, the channel attention module calibrating, channel by channel, the feature maps of the occluded and unoccluded images with highlighted local detail features according to the absolute value of the difference of the input feature maps includes:
performing subtraction operation on the feature maps of the occluded image and the unoccluded image with the highlighted local detail features, and taking the absolute value of the subtracted difference as the input of a channel attention module;
the channel attention module obtains two 1 × 1 × C channel descriptions by respectively using global average pooling and global maximum pooling operations;
the two channel descriptions are respectively sent into a shallow neural network, the number of neurons in the first layer in the shallow neural network is C/r, the activation function is ReLU, and the number of neurons in the second layer is C;
adding and combining two characteristic graphs extracted by a shallow neural network, and obtaining a weight coefficient of each channel by using a Sigmoid activation function;
after the weight of the characteristic channel is obtained, the original characteristic is weighted channel by channel through multiplication, and the original characteristic recalibration on the channel dimension is completed;
wherein C is the number of channels, and r is the dimensionality reduction factor.
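A hedged sketch of the channel-attention recalibration described above, on toy data; the MLP weights w1/w2 are illustrative stand-ins for learned parameters, not values from the patent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def channel_attention(channels, w1, w2):
    """channels: list of C channels, each a flat list of spatial activations.
    Global average- and max-pool each channel, push both C-vectors through a
    shared 2-layer MLP (C -> C/r -> C), add, and squash with a sigmoid to get
    one weight per channel; then rescale each channel by its weight."""
    avg = [sum(c) / len(c) for c in channels]
    mx = [max(c) for c in channels]
    merged = [a + b for a, b in zip(matvec(w2, relu(matvec(w1, avg))),
                                    matvec(w2, relu(matvec(w1, mx))))]
    weights = [sigmoid(z) for z in merged]
    recalibrated = [[v * w for v in c] for c, w in zip(channels, weights)]
    return recalibrated, weights

chans = [[1.0, 3.0], [0.0, 0.0]]        # C = 2 toy channels
w1, w2 = [[1.0, 1.0]], [[1.0], [0.0]]   # C -> C/r = 1 -> C, illustrative weights
recal, weights = channel_attention(chans, w1, w2)
```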
Further, the cross entropy loss function is expressed as:
L_ce = -(1/N) · Σ_{i=1..N} log p(y_i)
p(y_i) = softmax( F( μ(·) ⊙ f(x̃_i) ) )_{y_i}
wherein L_ce is the cross entropy loss function; p(y_i) represents the probability the network assigns to the identity classification of sample y_i; x̃_i indicates the occluded face image in the i-th face image pair; F is the fully connected layer after the last convolutional layer in the differential network; μ(·) ⊙ f(x̃_i) is the weight feature map of the partially occluded face features after the channel attention mask operation, where μ(·) denotes the channel attention output, a mask with values in [0, 1]; f(·) represents the features finally output by the convolutional layers; N represents the total number of training samples of face images.
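As a plain-Python illustration (our sketch, not the patent's implementation), the cross-entropy term averages the negative log-probability of the true identity over the batch:

```python
import math

def softmax(logits):
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy_loss(batch_logits, labels):
    """Mean negative log-probability of the true identity over N samples,
    matching L = -(1/N) * sum_i log p(y_i)."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total += -math.log(softmax(logits)[y])
    return total / len(labels)

# One confident and one uniform prediction over 2 classes:
loss = cross_entropy_loss([[5.0, 0.0], [0.0, 0.0]], [0, 1])
```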
Further, the contrast loss function is expressed as:
L_con = (1/N) · Σ_{i=1..N} ‖ μ(·) ⊙ f(x̃_i) - μ(·) ⊙ f(x_i) ‖²
wherein L_con is the contrast loss between the two images of a pair caused by the presence of the occluded region; μ(·) represents the weight feature map output by channel attention, a mask with values in [0, 1]; x̃_i indicates the face image with occlusion in the i-th face image pair; x_i indicates the face image without occlusion in the i-th face image pair; f(·) represents the features finally output by the convolutional layers; N represents the total number of training samples of face images.
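Assuming the contrast loss is the masked squared distance between the pair's features (a reading of the definitions above, not a formula the patent text preserves), a minimal sketch:

```python
def contrast_loss(pairs, masks):
    """Mean squared distance between masked features of the occluded and clean
    image in each pair: L = (1/N) * sum_i || mu * f_occ_i - mu * f_clean_i ||^2.
    `masks` holds per-pair channel-attention weights mu in [0, 1]."""
    total = 0.0
    for (f_occ, f_clean), mu in zip(pairs, masks):
        total += sum((m * (a - b)) ** 2 for m, a, b in zip(mu, f_occ, f_clean))
    return total / len(pairs)

# A mask of 0 on the occluded channel removes its contribution entirely:
pairs = [([1.0, 4.0], [1.0, 0.0])]
full = contrast_loss(pairs, [[1.0, 1.0]])    # (0)^2 + (4)^2 = 16
damped = contrast_loss(pairs, [[1.0, 0.0]])  # occluded channel masked out
```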
The invention also provides a single-sample partially-occluded face recognition system based on the attention mechanism, which comprises an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source-domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network formed by two identical ResNet-34 branches embedded with attention mechanisms;
and the output module is used for outputting the final identity category of the face image to be tested: the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are fed into the model, and the identity of the partially occluded test face image is determined.
The invention has the beneficial technical effects that:
(1) The method is fast and accurate, and can correctly determine the identity of any input partially occluded face image.
(2) The invention provides a novel feature-extraction framework that uses a differential network while considering the fusion of high-level and low-level information. Cross-layer information fusion is obtained by connecting global and local information, which enhances the feature-characterization capability of the network, yields more discriminative representations, and achieves high-precision single-sample partially-occluded face recognition.
(3) A spatial attention module and a channel attention module are embedded in the differential network. The spatial attention module guides the model to focus on meaningful features, i.e., the features of unoccluded regions; the channel attention module models the correlation between channels to recalibrate the original features along the channel dimension, suppressing channels that respond actively to occluded regions, thereby overcoming shortcomings of existing single-sample partially-occluded face recognition methods.
Drawings
Fig. 1 is a schematic process diagram of a single-sample partially-occluded face recognition method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic view of a spatial attention module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a channel attention module according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a training process according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a single-sample partially-occluded face recognition network based on an attention mechanism according to an embodiment of the present invention;
fig. 6 is a diagram illustrating an application effect of the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a single-sample partially-occluded face recognition method based on an attention mechanism, which specifically comprises the following steps of:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a space attention module between the second layer residual error module and the fourth layer residual error module, adjusting the weight of pixels in the uncovered area of the shallow layer feature map, and outputting a space position weight feature map;
the spatial position weight feature map is multiplied and connected with a feature map output by a residual error module of a fourth layer, and local detail features from a lower layer are obtained through fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of the channel attention module;
the channel attention module calibrates, channel by channel according to the input absolute value, the feature maps of the occluded and unoccluded images with highlighted local detail features, and the calibrated feature map is fed into the fully connected layer to output a classification result;
iteratively training the network by jointly optimizing the cross entropy loss function and the contrast loss caused by the difference within each face image pair;
after multiple rounds of training, the network loss tends to be stable, and the iterative training process is finished to obtain a trained network model;
and inputting the target domain single sample human face Gallery set and the partially shielded test human face images into a trained network model, and calculating and outputting the category of the finally shielded human face according to the cosine distance of the characteristics of the human face images by the model.
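The final identification step above, matching the probe feature against the single-sample Gallery by cosine distance, can be sketched as follows; the gallery names and 2-D features are toy values, not data from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(probe_feat, gallery):
    """Return the identity whose single gallery feature is closest to the
    probe by cosine similarity (the final classification step, sketched)."""
    return max(gallery, key=lambda name: cosine_similarity(probe_feat, gallery[name]))

gallery = {"id_A": [1.0, 0.0], "id_B": [0.0, 1.0]}
who = identify([0.9, 0.1], gallery)
```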
In one embodiment, the data set used in the training phase is the CASIA_WebFace face data set, which serves as the source domain and contains 494,414 face images of 10,575 people collected from the Internet; images of the same identity share the same numeric label. To counter the drop in recognition accuracy caused by occluded regions, the invention randomly adds black blocks or real occluders such as masks and leaves to the CASIA_WebFace data set for data enhancement, and trains the model on the enhanced data set. After training, the model is transferred to the target domain data sets. To ensure fairness in the split between training and test sets, the partition of the target-domain single-sample face Gallery set and the partially occluded test face image set is identical across all comparison experiments in this embodiment. The target domain comprises three single-sample face data sets: AR, Extended Yale B, and CAS-PEAL-R1.
Preprocessing the source and target domain data sets: all pictures are uniformly cropped to 128 × 128, and the processed face images undergo pixel normalization:
X_pix = (X_pix - 128) / 128;
where X_pix is the pixel value of the input face image, specifically the face image input to the differential network.
The preprocessed sample images are fed to the neural network in sequence, and the network is trained by minimizing the loss function via back-propagation. Compared with traditional single-sample partially-occluded face recognition algorithms, the method adopts ResNet-34 to reduce model size and improve accuracy. ResNet-34 adds a shortcut branch alongside the original convolutional layers to form a basic residual module: the original mapping H(x) is expressed as H(x) = F(x) + x, where F(x) is the residual mapping and x is the input signal. The residual structure converts the convolutional layers' learning of H(x) into learning F(x), which is simpler; this reduces computation and effectively alleviates the degradation problem caused by very deep networks.
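The residual identity H(x) = F(x) + x can be demonstrated in two lines; `residual_fn` stands in for the learned mapping F:

```python
def residual_block(x, residual_fn):
    """H(x) = F(x) + x: the block learns only the residual F, and the
    shortcut carries x through unchanged."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]

# If F learns the zero mapping, the block reduces to the identity -- the
# property that lets very deep networks train without degradation.
out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
```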
A face image is input into the ResNet-34 network, and shallow features are extracted through a convolutional layer as the input feature map of the subsequent residual network. Specifically, the 3-channel input passes through a convolutional layer with a 3 × 3 kernel, 64 channels, and a stride of 1; the output feature map is 128 × 128 with 64 channels, and after a max-pooling layer it serves as the input feature map of the subsequent residual modules.
And after the shallow feature map is extracted, four groups of residual modules which are sequentially connected are connected to form a residual network, and the residual network is used as a global branch to extract the global features of the face image.
It can be understood that the core improvement of the invention is the integration of two proposed modules: the attention mechanism and cross-layer information fusion. The attention mechanism is divided into spatial attention and channel attention and is embedded mainly in ResNet-34. To input a pair of face images at a time, ResNet-34 is adapted into a differential network: a shallow feature map is extracted by the initial convolutional layer, and four sequentially connected groups of residual modules follow the max-pooling layer to form a residual network, which serves as the global branch extracting the global features of the face image. In addition, a spatial attention module is embedded between the second and fourth layers of the residual network, the feature maps output by the second and fourth layers are connected (i.e., cross-layer information fusion), and the absolute value of the difference of the fourth-layer feature maps of the differential network is used as the input of the subsequent channel attention module. Unless specifically emphasized, the residual network in the invention mainly refers to the differential network composed of several groups of residual modules and attention modules following the ResNet-34 stem; this division merely highlights the improvements of the invention, and those skilled in the art can understand it adaptively from the overall embodiments and the drawings.
Further, the differential network is composed of a residual module and an attention mechanism module.
The process of constructing the differential network comprises the following steps:
inputting the shallow layer feature map output by the first convolutional layer into two ResNet-34 network branches, wherein each ResNet-34 network branch is formed by connecting 4 groups of residual modules in series, the number of input channels of each group of residual modules is 64, 128, 256 and 512, each residual module consists of convolution operation, Batch Normalization (BN) operation and modified linear unit (ReLU) operation, the series of operations act on the mapping of global features, and the corresponding output channels are 64, 128, 256 and 512;
embedding a spatial attention module between the second-layer and fourth-layer residual modules of each branch of the differential network. The embedding process is as follows: a spatial attention module is inserted into the residual network to guide the model to attend to meaningful features. Specifically, for the h′ × w′ × c′ three-dimensional tensor output by the convolutional layers of the second-layer residual module, global average pooling and global max pooling are applied; here the operation is performed along the channel dimension, i.e., all input channels at each spatial position are pooled into 2 real numbers, so two (h′ × w′ × 1) weight vectors are obtained from the (h′ × w′ × c′) input. The two (h′ × w′ × 1) weight vectors are then concatenated along the channel axis into an (h′ × w′ × 2) weight-vector feature map, where h′ and w′ are the height and width of the input feature map and c′ is the number of channels;
convolution is performed with a 7 × 7 kernel and padding of 3, compressing the number of channels to 1; a Sigmoid nonlinear activation after the convolution then yields a new (h′ × w′ × 1) weight-vector feature map;
the 8 × 8 feature map output by the fourth-layer residual module is upsampled to a 32 × 32 feature map by bilinear interpolation, multiplied channel-wise by the (h′ × w′ × 1) weight-vector feature map, and downsampled back to an 8 × 8 feature map, giving the scaled new features;
and the scaled new feature maps of the two branches are subtracted at the end of the network, with the absolute value of the difference taken as the input of the channel attention module.
In one embodiment, the second-layer residual module outputs, after its convolution operations, a three-dimensional tensor of 32 × 32 × 128, i.e. the height, width and number of channels of the feature map. Global maximum pooling and global average pooling are first applied to this tensor along the channel dimension: at each spatial position the maximum and the average over the channels are taken, so that one column of channels is pooled into one value at a time while the height and width remain unchanged. The 32 × 32 × 128 input feature map therefore becomes a 32 × 32 × 1 feature map after each pooling. The two (32 × 32 × 1) weight vectors are then spliced along the channel axis into a (32 × 32 × 2) weight vector feature map.
Convolution is performed with a 7 × 7 convolution kernel and a padding size of 3 to compress the number of channels to 1, and a Sigmoid nonlinear activation after the convolution forms a new (32 × 32 × 1) weight vector feature map;
the 8 × 8 feature map output by the fourth-layer residual module is upsampled into a 32 × 32 feature map by bilinear interpolation and multiplied by the (32 × 32 × 1) weight vector feature map at the channel level, i.e. features from the lower layer are fused, and a feature map of size 8 × 8 is output through downsampling to obtain the scaled new features.
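The pooling, concatenation, 7 × 7 convolution and Sigmoid steps above can be sketched in NumPy. This is a naive loop implementation for clarity, not the patent's implementation: the convolution weights and the input feature map are random illustrative values.

```python
import numpy as np

def spatial_attention(x, conv_w, conv_b):
    """Spatial attention mask for an (H, W, C) feature map.

    Pools across the channel axis (max and mean), stacks the two
    (H, W, 1) maps into an (H, W, 2) tensor, applies a 7x7 convolution
    with padding 3 that compresses it to one channel, then a Sigmoid.
    """
    h, w, _ = x.shape
    pooled = np.stack([x.max(axis=2), x.mean(axis=2)], axis=2)  # (H, W, 2)
    pad = np.pad(pooled, ((3, 3), (3, 3), (0, 0)))              # zero padding 3
    out = np.empty((h, w))
    for i in range(h):                                          # naive 7x7 conv
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 7, j:j + 7, :] * conv_w) + conv_b
    return 1.0 / (1.0 + np.exp(-out))                           # values in (0, 1)

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 128))        # illustrative 32x32x128 feature map
w7 = rng.normal(scale=0.01, size=(7, 7, 2))  # illustrative, untrained 7x7 kernel
mask = spatial_attention(feat, w7, 0.0)      # (32, 32) spatial weight map
```

The subsequent bilinear upsampling of the 8 × 8 deep feature map and the channel-level multiplication with this mask are omitted here for brevity.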
The specific process by which the channel attention module adjusts the weight of each channel is as follows:
the scaled new feature maps of the two branches at the end of the network are subtracted, and the absolute value of the feature difference is taken as the input of the channel attention module. To aggregate the spatial features, the channel attention module applies global average pooling and global maximum pooling to the received difference feature map, obtaining two 1 × 1 × 512 channel descriptors; using two different pooling operations makes the extracted high-level features richer;
the two channel descriptors are respectively sent into a shallow neural network, wherein the number of neurons in the first layer is 512/16 with a ReLU activation function, the number of neurons in the second layer is 512, and this two-layer network is shared between the two descriptors;
the two resulting feature vectors are added and combined, and the weight coefficient of each channel is obtained with a Sigmoid activation function;
after the feature channel weights are obtained, the original features are weighted channel by channel through multiplication, completing the recalibration of the original features in the channel dimension;
the features recalibrated by the channel attention module are taken as the final face feature representation, and the classification result is output through the fully connected layer.
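The channel attention steps above (dual pooling, shared two-layer MLP with reduction ratio 16, Sigmoid, channel-wise rescaling) can be sketched in NumPy as follows. The MLP weights and the input difference map are random illustrative values, not trained parameters.

```python
import numpy as np

def channel_attention(diff, w1, w2):
    """Channel attention on the absolute feature difference.

    diff: (H, W, C) absolute difference of the two branch features.
    Global average pooling and global maximum pooling give two (C,)
    descriptors; both pass through a shared two-layer MLP
    (C -> C/r -> C with ReLU in between), the outputs are summed,
    and a Sigmoid yields one weight per channel, which rescales diff.
    """
    avg = diff.mean(axis=(0, 1))              # (C,) average-pooled descriptor
    mx = diff.max(axis=(0, 1))                # (C,) max-pooled descriptor

    def mlp(v):
        hidden = np.maximum(v @ w1, 0.0)      # first layer, ReLU
        return hidden @ w2                    # second layer, back to C units

    weights = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # (C,) in (0, 1)
    return diff * weights                     # channel-wise recalibration

rng = np.random.default_rng(1)
c = 512
d = np.abs(rng.normal(size=(8, 8, c)))              # |feature difference|
w1 = rng.normal(scale=0.05, size=(c, c // 16))      # C -> C/16 (r = 16)
w2 = rng.normal(scale=0.05, size=(c // 16, c))      # C/16 -> C
out = channel_attention(d, w1, w2)                  # recalibrated features
```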
The network is trained iteratively by jointly optimizing a cross entropy loss function and the contrast loss of the face image pair caused by the difference between its two images, the whole differential network being jointly optimized by minimizing the loss function through back propagation.
Further, the cross entropy loss function is expressed as follows:

$$\mathcal{L}_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\log p\left(y_i \mid \tilde{x}_i\right)$$

$$p\left(y_i \mid \tilde{x}_i\right) = \mathrm{softmax}\left(F\left(f(\tilde{x}_i)\right)\right)_{y_i}$$

The contrast loss is expressed as follows:

$$\mathcal{L}_{con} = \frac{1}{N}\sum_{i=1}^{N}\left\|\mu\left(f(\tilde{x}_i)\right)\otimes f(\tilde{x}_i) - \mu\left(f(\tilde{x}_i)\right)\otimes f(x_i)\right\|_2^2$$

wherein $\mathcal{L}_{ce}$ is the cross entropy loss function; $p(y_i \mid \tilde{x}_i)$ represents the probability with which the network classifies sample $\tilde{x}_i$ as identity $y_i$; $\tilde{x}_i$ denotes the occluded face image in the $i$-th face image pair and $x_i$ the unoccluded face image in the $i$-th face image pair; $F$ is the fully connected layer after the last convolution layer in the differential network; $\mathcal{L}_{con}$ is the contrast loss between the two images caused by the presence of the occluded region; $\mu(f(\tilde{x}_i))\otimes f(\tilde{x}_i)$ represents the feature of the face with local occlusion after the channel attention mask operation, wherein $\mu(\cdot)$ is the weight feature map output by the channel attention, a mask with values in $[0,1]$; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
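Since the published equations survive only as image placeholders, the following NumPy sketch assumes the standard softmax cross entropy plus a squared-error contrast term on the attention-masked features, matching the symbol definitions in the text. The balance weight `lam` and all numeric inputs are illustrative assumptions not present in the source.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(logits, labels, feat_occ, feat_clean, mask, lam=1.0):
    """Cross entropy on the occluded-branch logits plus a contrast
    loss between the masked features of the occluded and clean images.

    mask: channel attention weights mu(.) with values in [0, 1].
    lam:  assumed balance weight between the two terms.
    """
    n = logits.shape[0]
    p = softmax(logits)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    con = np.mean(np.sum((mask * feat_occ - mask * feat_clean) ** 2, axis=1))
    return ce + lam * con

rng = np.random.default_rng(2)
logits = rng.normal(size=(4, 10))        # 4 samples, 10 identities
labels = np.array([0, 1, 2, 3])
f_occ = rng.normal(size=(4, 512))        # occluded-branch features
f_cln = rng.normal(size=(4, 512))        # clean-branch features
mu = rng.uniform(size=(4, 512))          # channel attention mask in [0, 1]
loss = joint_loss(logits, labels, f_occ, f_cln, mu)
```

Note that when the two branches produce identical features the contrast term vanishes and only the cross entropy remains.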
An Adam optimizer is used for training; after multiple rounds of training the network tends to be stable, and the iteration process ends with a trained network model. The training process, shown in figure 4, is as follows:
after an image data set is obtained, preprocessing a face image;
constructing a differential network model based on an attention mechanism, namely the network model constructed by the invention;
training the network using the data set and performing multiple iterations;
computing the loss between the network output and the true identity label corresponding to the face image, until the loss tends to be stable.
At this time, the training is finished and the trained network model is output.
The trained network model is shown in fig. 5.
When the trained neural network model is used, the target-domain single-sample face Gallery set and the partially occluded test face images are input into the trained model, and the model computes and outputs the category of the occluded face according to the cosine distance between the face image features.
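The final matching step, assigning the occluded probe face the identity of the nearest gallery face by cosine similarity, can be sketched as follows. The gallery features, identity labels and probe feature are illustrative stand-ins for the network's output embeddings.

```python
import numpy as np

def classify_by_cosine(gallery_feats, gallery_ids, probe_feat):
    """Return the identity of the gallery feature with the highest
    cosine similarity to the probe feature (smallest cosine distance)."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feat / np.linalg.norm(probe_feat)
    sims = g @ p                          # cosine similarity per gallery entry
    return gallery_ids[int(np.argmax(sims))]

# Toy single-sample gallery: one feature per identity.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
ids = np.array([7, 8, 9])
probe = np.array([0.9, 0.1, 0.05])        # closest to the first gallery entry
pred = classify_by_cosine(gallery, ids, probe)  # -> 7
```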
The invention also provides a single-sample partially occluded face recognition system based on the attention mechanism, which comprises an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network formed by two identical residual networks embedded with an attention mechanism;
the output module is used for outputting the identity category to which the face image to be detected finally belongs, namely the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are sent into the model, and the identity of the partially occluded test face image is judged.
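A minimal sketch of the preprocessing module's two operations, pixel normalization (X − 128)/128 and random occluder augmentation, is given below. The occluder size range and the black fill value are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def preprocess(img):
    """Pixel normalization described above: X = (X - 128) / 128."""
    return (img.astype(np.float32) - 128.0) / 128.0

def add_random_occluder(img, rng, max_frac=0.5):
    """Paste a random black rectangle onto a face image to synthesize a
    partially occluded training sample (size/position are illustrative)."""
    h, w = img.shape[:2]
    oh = int(rng.integers(1, int(h * max_frac) + 1))   # occluder height
    ow = int(rng.integers(1, int(w * max_frac) + 1))   # occluder width
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))
    out = img.copy()
    out[top:top + oh, left:left + ow] = 0              # black occluder
    return out

rng = np.random.default_rng(3)
face = rng.integers(0, 256, size=(128, 128, 3)).astype(np.uint8)
occluded = add_random_occluder(face, rng)
norm = preprocess(occluded)   # values in [-1, 1)
```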
Further, the neural network module comprises ResNet-34. When the data output by the data preprocessing module enters the neural network module, a shallow feature map is extracted through a convolution layer and input into a residual network formed by four sequentially cascaded residual module groups to extract the global features of the face image; a spatial attention module embedded between the second-layer residual module and the fourth-layer residual module adjusts the weight of pixels in the unoccluded area of the shallow feature map and outputs a spatial position weight feature map; the spatial position weight feature map is connected with the feature map output by the fourth-layer residual module, and local detail features from the lower layer are acquired through the fusion of cross-layer information;
the feature maps of the occluded image and the unoccluded image with highlighted local detail features are subtracted, and the absolute value of the difference is taken as the input of a channel attention module;
the channel attention module adjusts the weight of each channel as follows:
global average pooling and global maximum pooling are applied to the subtracted values received by the channel attention module to obtain two channel descriptors;
the two channel descriptors are respectively sent into a shallow neural network to extract features, and the extracted features are combined through addition;
the combined features yield the weight coefficient of each channel through a Sigmoid activation function;
after the feature channel weights are obtained, the feature maps of the occluded image and the unoccluded image with highlighted local detail features are weighted channel by channel through multiplication, completing the recalibration of the original features in the channel dimension;
the features recalibrated by the channel attention module serve as the final face feature representation, and the classification result is output through the fully connected layer.
The network is trained iteratively by jointly optimizing the cross entropy loss function and the contrast loss of the face image pair caused by the difference; the training process is detailed in the method section and is not repeated here.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. The single-sample partial occlusion face recognition method based on the attention mechanism is characterized by comprising the following steps:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a spatial attention module between the second-layer residual module and the fourth-layer residual module, adjusting the weight of pixels in the unoccluded area of the shallow feature map, and outputting a spatial position weight feature map;
multiplying and connecting the spatial position weight feature map with the feature map output by the fourth-layer residual module, and acquiring local detail features from the lower layer through the fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of a channel attention module;
the channel attention module calibrating, channel by channel, the feature maps of the occluded image and the unoccluded image with highlighted local detail features according to the input absolute value, and sending the calibrated feature maps into the fully connected layer to output the classification result;
performing iterative training on the network by jointly optimizing a cross entropy loss function and the contrast loss of the face image pair caused by the difference;
after multiple rounds of training the network loss tends to be stable, and the iterative training process ends with a trained network model;
inputting the target-domain single-sample face Gallery set and the partially occluded test face images into the trained network model, the model computing and outputting the category of the occluded face according to the cosine distance between the face image features.
2. The attention-based single-sample partial occlusion face recognition method of claim 1, wherein preprocessing the face image data set comprises cropping the face images to 128 × 128 size and performing a pixel normalization operation on the cropped face images, expressed as:
X_pix = (X_pix - 128) / 128;
wherein X_pix is the pixel value corresponding to the face image.
3. The attention mechanism-based single-sample partially-occluded face recognition method according to claim 1, wherein extracting the shallow feature map through a convolution layer comprises: inputting the face image with 3 channels into a convolution layer with a convolution kernel size of 3 × 3, 64 channels and a stride of 1 for feature extraction, outputting a feature map of size 128 × 128 with 64 output channels, and passing the feature map through a maximum pooling layer to obtain the shallow feature map of the face image.
4. The method for recognizing the single-sample partially-occluded face based on the attention mechanism of claim 1, wherein in the four residual module groups which are cascaded in sequence, each residual module group comprises 3, 4, 6 and 3 residual modules in sequence according to the cascade order, and the feature map output by each residual module group according to the cascade order has the size of 64 × 64, 32 × 32, 16 × 16 and 8 × 8.
5. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 4, wherein the process of acquiring local detail features from a lower layer comprises:
a spatial attention module is embedded between the second-layer residual module and the fourth-layer residual module;
obtaining a weight vector feature map from the input h' × w' × c' three-dimensional tensor by global average pooling and global maximum pooling;
processing the weight vector feature map with a convolution layer having a 7 × 7 convolution kernel, a padding size of 3 and 1 output channel, followed by a Sigmoid nonlinear activation layer;
upsampling the 8 × 8 feature map output by the fourth-layer residual module group into a 32 × 32 feature map by bilinear interpolation;
multiplying the upsampled feature map by the processed weight vector feature map, and outputting a feature map of size 8 × 8 through downsampling to obtain the local detail features from the lower layer.
6. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the channel attention module performing calibration by channels on feature maps of an occlusion image and an unoccluded image after highlighting the local detail feature according to the input absolute value comprises:
performing subtraction operation on the feature maps of the occluded image and the unoccluded image with the highlighted local detail features, and taking the absolute value of the subtracted difference as the input of a channel attention module;
the channel attention module obtains two 1 × 1 × C channel descriptions by respectively using global average pooling and global maximum pooling operations;
the two channel descriptions are respectively sent into a shallow neural network, the number of neurons in the first layer in the shallow neural network is C/r, the activation function is ReLU, and the number of neurons in the second layer is C;
adding and combining two characteristic graphs extracted by a shallow neural network, and obtaining a weight coefficient of each channel by using a Sigmoid activation function;
after the weight of the characteristic channel is obtained, weighting the characteristics of the shielded image and the unoccluded image with the highlighted local detail characteristics channel by channel through multiplication, and finishing the recalibration of the original characteristics on the channel dimension;
wherein C is the number of channels, and r is the dimensionality reduction factor.
7. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the cross entropy loss function is expressed as:

$$\mathcal{L}_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\log p\left(y_i \mid \tilde{x}_i\right)$$

$$p\left(y_i \mid \tilde{x}_i\right) = \mathrm{softmax}\left(F\left(f(\tilde{x}_i)\right)\right)_{y_i}$$

wherein $\mathcal{L}_{ce}$ is the cross entropy loss function; $p(y_i \mid \tilde{x}_i)$ represents the probability with which the network classifies sample $\tilde{x}_i$ as identity $y_i$; $\tilde{x}_i$ denotes the occluded face image in the $i$-th face image pair; $F$ is the fully connected layer after the last convolution layer in the differential network; $\mu(f(\tilde{x}_i))\otimes f(\tilde{x}_i)$ represents the feature of the face with partial occlusion after the channel attention mask operation, wherein $\mu(\cdot)$ is the weight feature map output by the channel attention, a mask with values in $[0,1]$; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
8. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the contrast loss function is expressed as:

$$\mathcal{L}_{con} = \frac{1}{N}\sum_{i=1}^{N}\left\|\mu\left(f(\tilde{x}_i)\right)\otimes f(\tilde{x}_i) - \mu\left(f(\tilde{x}_i)\right)\otimes f(x_i)\right\|_2^2$$

wherein $\mathcal{L}_{con}$ is the contrast loss between the two images caused by the presence of the occluded region; $\mu(\cdot)$ represents the weight feature map output by the channel attention, a mask with values in $[0,1]$; $\tilde{x}_i$ denotes the face image with occlusion in the $i$-th face image pair; $x_i$ denotes the face image without occlusion in the $i$-th face image pair; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
9. A single-sample partially occluded face recognition system based on the attention mechanism, characterized by comprising an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network consisting of two identical ResNet-34 networks embedded with an attention mechanism;
the output module is used for outputting the identity category to which the face image to be detected finally belongs, namely the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are sent into the model, and the identity of the partially occluded test face image is judged.
CN202110320104.0A 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism Active CN112949565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320104.0A CN112949565B (en) 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism


Publications (2)

Publication Number Publication Date
CN112949565A true CN112949565A (en) 2021-06-11
CN112949565B CN112949565B (en) 2022-06-03

Family

ID=76227742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320104.0A Active CN112949565B (en) 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism

Country Status (1)

Country Link
CN (1) CN112949565B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361431A (en) * 2021-06-15 2021-09-07 山西大学 Network model and method for face shielding detection based on graph reasoning
CN113378980A (en) * 2021-07-02 2021-09-10 西安电子科技大学 Mask face shading recovery method based on self-adaptive context attention mechanism
CN113536965A (en) * 2021-06-25 2021-10-22 深圳数联天下智能科技有限公司 Method and related device for training face shielding recognition model
CN113705466A (en) * 2021-08-30 2021-11-26 浙江中正智能科技有限公司 Human face facial feature occlusion detection method used for occlusion scene, especially under high-imitation occlusion
CN113723414A (en) * 2021-08-12 2021-11-30 中国科学院信息工程研究所 Mask face shelter segmentation method and device
CN113947782A (en) * 2021-10-14 2022-01-18 哈尔滨工程大学 Pedestrian target alignment method based on attention mechanism
CN113947802A (en) * 2021-12-21 2022-01-18 武汉天喻信息产业股份有限公司 Method, device and equipment for identifying face with shielding and readable storage medium
CN113989902A (en) * 2021-11-15 2022-01-28 天津大学 Method, device and storage medium for identifying shielded face based on feature reconstruction
CN114331904A (en) * 2021-12-31 2022-04-12 电子科技大学 Face shielding identification method
CN114998958A (en) * 2022-05-11 2022-09-02 华南理工大学 Face recognition method based on lightweight convolutional neural network
CN115100709A (en) * 2022-06-23 2022-09-23 北京邮电大学 Feature-separated image face recognition and age estimation method
CN115631530A (en) * 2022-12-22 2023-01-20 暨南大学 Fair facial expression recognition method based on face action unit
CN115908964A (en) * 2022-09-20 2023-04-04 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN115984949A (en) * 2023-03-21 2023-04-18 威海职业学院(威海市技术学院) Low-quality face image recognition method and device with attention mechanism
CN116051859A (en) * 2023-02-21 2023-05-02 阿里巴巴(中国)有限公司 Service providing method, apparatus and storage medium
CN116311479A (en) * 2023-05-16 2023-06-23 四川轻化工大学 Face recognition method, system and storage medium for unlocking automobile
CN116563926A (en) * 2023-05-17 2023-08-08 智慧眼科技股份有限公司 Face recognition method, system, equipment and computer readable storage medium
CN117437684A (en) * 2023-12-14 2024-01-23 深圳须弥云图空间科技有限公司 Image recognition method and device based on corrected attention
CN117475357A (en) * 2023-12-27 2024-01-30 北京智汇云舟科技有限公司 Monitoring video image shielding detection method and system based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2016213857A1 (en) * 2012-09-28 2016-09-01 Sony Corporation Image Processing Device
CN111428116A (en) * 2020-06-08 2020-07-17 四川大学 Microblog social robot detection method based on deep neural network
CN111523407A (en) * 2020-04-08 2020-08-11 上海涛润医疗科技有限公司 Face recognition system and method and medical care recording system based on face recognition
CN111898413A (en) * 2020-06-16 2020-11-06 深圳市雄帝科技股份有限公司 Face recognition method, face recognition device, electronic equipment and medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI Y: "occlusion aware facial expression recognition using CNN with attention mechanism", 《IEEE TRANSACTION ON IMAGE PROCESSING》 *
杨壮等: "基于注意力机制和深度恒等映射的人脸识别", 《传感器与微系统》 *
闫硕: "单样本部分遮挡人脸识别研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *


Also Published As

Publication number Publication date
CN112949565B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN111563508B (en) Semantic segmentation method based on spatial information fusion
WO2022036777A1 (en) Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN110969124B (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111325111A (en) Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111814611B (en) Multi-scale face age estimation method and system embedded with high-order information
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN109190581A (en) Image sequence target detection recognition methods
CN112801015B (en) Multi-mode face recognition method based on attention mechanism
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN111612008A (en) Image segmentation method based on convolution network
CN112288011A (en) Image matching method based on self-attention deep neural network
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
CN113947814A (en) Cross-visual angle gait recognition method based on space-time information enhancement and multi-scale saliency feature extraction
CN113011253A (en) Face expression recognition method, device, equipment and storage medium based on ResNeXt network
CN116071676A (en) Infrared small target detection method based on attention-directed pyramid fusion
Guo et al. Multifeature extracting CNN with concatenation for image denoising
CN114882537A (en) Finger new visual angle image generation method based on nerve radiation field
Sakthimohan et al. Detection and Recognition of Face Using Deep Learning
CN111881803B (en) Face recognition method based on improved YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant