CN112949565A - Single-sample partially-shielded face recognition method and system based on attention mechanism - Google Patents


Info

Publication number
CN112949565A
CN112949565A (application CN202110320104.0A)
Authority
CN
China
Prior art keywords
feature map
module
face
layer
channel
Prior art date
Legal status
Granted
Application number
CN202110320104.0A
Other languages
Chinese (zh)
Other versions
CN112949565B (en)
Inventor
钟福金
侯梦军
王润生
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110320104.0A priority Critical patent/CN112949565B/en
Publication of CN112949565A publication Critical patent/CN112949565A/en
Application granted granted Critical
Publication of CN112949565B publication Critical patent/CN112949565B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The invention belongs to the field of single-sample partially-occluded face recognition, and in particular relates to a single-sample partially-occluded face recognition method and system based on an attention mechanism. The method comprises the following steps: acquiring a partially occluded test face image and an unoccluded single-sample face Gallery set, and preprocessing both; inputting the preprocessed data into a differential network formed by two ResNet-34 branches, and extracting a shallow feature map through a convolutional layer; using a spatial attention module to adjust the weights of the shallow feature map according to spatial position information, and multiplying the weights with the feature map output by the last convolutional layer to highlight the local detail features of the shallow feature map; subtracting the feature maps of the occluded and unoccluded images with highlighted local detail features; after taking the absolute value of the difference, calibrating the original feature maps channel by channel through a channel attention module; feeding the calibrated feature map into a fully connected layer to output a classification result. The invention can accurately identify partially occluded face images.

Description

Single-sample partially-shielded face recognition method and system based on attention mechanism
Technical Field
The invention belongs to the field of single-sample partially-occluded face recognition, and particularly relates to a single-sample partially-occluded face recognition method and system based on an attention mechanism.
Background
Face recognition is a biometric technology that identifies a person based on facial feature information: images or videos containing faces are collected with a camera, and the collected faces undergo a series of technical processing to recognize the identities of different people. After many years of research, face recognition has achieved good results under controlled conditions, but under unconstrained conditions it still faces many challenges. In some special application scenarios, such as identity-card management systems, criminal-investigation and law-enforcement systems, passport verification, and identity recognition at registration ports, only one face image (a certificate-photo face image) of each person can be collected as a training sample. When the test sample is affected by severe facial changes such as illumination, pose, expression, or occlusion by external objects, the intra-class differences of facial features become larger than the inter-class differences. This gives rise to the problem of single-sample partially-occluded face recognition: designing a face recognition method that identifies an unknown face image using only a single training sample per person.
In real life, the situations causing facial occlusion fall into three categories: 1) occlusion by external objects (such as sunglasses, scarves, and hats); 2) extreme lighting (e.g., shadows); 3) self-occlusion caused by pose change (e.g., a side face). Hereinafter, occlusion refers primarily to occlusion by external objects.
Although occlusion has been widely studied in face recognition, some problems remain: 1) current methods still cannot completely remove the influence of occlusion, and the recognition result would be more ideal if that influence could be removed entirely; 2) when facing the single-sample problem, existing mature face recognition algorithms cannot extract intra-class variation information from a single training sample, so their recognition performance is poor.
Most research focuses on improving the accuracy of the recognition system while ignoring problems of the face database itself. For example, due to the difficulty of collecting samples or the storage limits of the system, each person in the database may have only one sample image. In this situation, most traditional methods such as PCA and LDA degrade in performance or even fail to work; and when a deep-learning route is adopted, a database that stores only one image per person lacks the large number of samples needed to learn rich intra-class variation information, so handling occlusion on this premise also performs poorly. In summary, single-sample partially-occluded face recognition is unavoidable in real application scenarios, and face recognition under the single-sample constraint still faces great challenges.
Disclosure of Invention
In view of the above problems, namely the inability to learn rich intra-class variation information due to the lack of a large number of samples, and the loss of facial feature information caused by occlusion that degrades recognition accuracy, the present invention provides a single-sample partially-occluded face recognition method and system based on an attention mechanism. The method comprises:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a space attention module between the second layer residual error module and the fourth layer residual error module, adjusting the weight of pixels in the uncovered area of the shallow layer feature map, and outputting a space position weight feature map;
the spatial position weight feature map is multiplied and connected with a feature map output by a residual error module of a fourth layer, and local detail features from a lower layer are obtained through fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of the channel attention module;
the channel attention module calibrates, channel by channel according to the input absolute value, the feature maps of the occluded and unoccluded images with highlighted local detail features, and the calibrated feature map is fed into the fully connected layer to output a classification result;
iteratively training the network by jointly optimizing the cross entropy loss function and the contrast loss caused by the difference within each face image pair;
after multiple rounds of training, the network loss tends to be stable, and the iterative training process is finished to obtain a trained network model;
and inputting the target domain single sample human face Gallery set and the partially shielded test human face images into a trained network model, and calculating and outputting the category of the finally shielded human face according to the cosine distance of the characteristics of the human face images by the model.
Further, preprocessing the face image data set includes cropping the face images to a size of 128 × 128 and performing a pixel normalization operation on the cropped face images, expressed as:
X_pix = (X_pix - 128) / 128;
where X_pix is the corresponding face image pixel value.
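As an illustrative sketch (not part of the patent itself; the function name is ours), the normalization formula above maps 8-bit pixel values into roughly [-1, 1):

```python
def normalize_pixels(pixels):
    """Map 8-bit pixel values (0..255) to roughly [-1, 1) via X = (X - 128) / 128."""
    return [(p - 128) / 128 for p in pixels]

# A black (0) pixel maps to -1.0 and a mid-gray (128) pixel maps to 0.0.
row = normalize_pixels([0, 128, 255])
```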
Further, extracting the shallow feature map through a convolutional layer includes: inputting the 3-channel face image into a convolutional layer with a 3 × 3 kernel, 64 channels, and a stride of 1 for feature extraction, outputting a 128 × 128 feature map with 64 channels, and passing the feature map through a max-pooling layer to obtain the shallow feature map of the face image.
Furthermore, in the four residual module groups which are sequentially cascaded, each residual module group sequentially comprises 3, 4, 6 and 3 residual modules according to the cascade order, and the size of the feature map output by each residual module group according to the cascade order is 64 × 64, 32 × 32, 16 × 16 and 8 × 8.
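The spatial sizes listed above are consistent with each residual group after the first beginning with a stride-2 block that halves the feature map. A hypothetical sketch of that size progression (the function and its name are ours, not from the patent):

```python
def trace_sizes(input_size, num_groups):
    """Each residual group after the first halves the spatial size (stride-2 first block)."""
    sizes = [input_size]
    for _ in range(num_groups - 1):
        sizes.append(sizes[-1] // 2)
    return sizes

# 64 -> 32 -> 16 -> 8, matching the group outputs listed above.
sizes = trace_sizes(64, 4)
```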
Further, the process of acquiring local detail features from the lower layer includes:
a spatial attention module is embedded between the second-layer and fourth-layer residual modules;
applying global average pooling and global max pooling to the input h′ × w′ × c′ three-dimensional tensor to obtain a weight-vector feature map;
processing the weight-vector feature map with a convolutional layer with a 7 × 7 kernel, padding of 3, and 1 output channel, followed by a Sigmoid nonlinear activation layer;
upsampling the 8 × 8 feature map output by the fourth-layer residual module group to a 32 × 32 feature map by bilinear interpolation;
and multiplying the upsampled feature map with the processed weight-vector feature map, then outputting an 8 × 8 feature map through downsampling to obtain the local detail features from the lower layer.
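The spatial-attention steps above can be sketched in plain Python on a toy tensor. This is a simplified, hypothetical illustration: the 7 × 7 convolution is omitted, and the mean- and max-pooled maps are simply averaged before the Sigmoid, so it shows only the channel-pooling, gating, and rescaling idea:

```python
import math

def spatial_attention_mask(x):
    """x: h x w x c nested lists. Pool over channels (mean and max), average the
    two maps, and squash with a sigmoid. This stands in for the patent's
    concat + 7x7 conv + Sigmoid step; the conv is omitted for brevity."""
    h, w = len(x), len(x[0])
    mask = []
    for i in range(h):
        row = []
        for j in range(w):
            ch = x[i][j]
            pooled = (sum(ch) / len(ch) + max(ch)) / 2.0
            row.append(1.0 / (1.0 + math.exp(-pooled)))  # weight in (0, 1)
        mask.append(row)
    return mask

def apply_mask(feat, mask):
    """Multiply every channel of feat by the per-position spatial weight."""
    return [[[v * mask[i][j] for v in feat[i][j]]
             for j in range(len(feat[0]))] for i in range(len(feat))]

x = [[[1.0, 3.0], [0.0, 0.0]]]   # a 1 x 2 x 2 toy tensor
m = spatial_attention_mask(x)    # higher weight where responses are strong
y = apply_mask(x, m)
```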
Further, the channel attention module calibrating, channel by channel, the feature maps of the occluded and unoccluded images with highlighted local detail features according to the absolute value of the difference of the input feature maps includes:
performing subtraction operation on the feature maps of the occluded image and the unoccluded image with the highlighted local detail features, and taking the absolute value of the subtracted difference as the input of a channel attention module;
the channel attention module obtains two 1 × 1 × C channel descriptions by respectively using global average pooling and global maximum pooling operations;
the two channel descriptions are respectively sent into a shallow neural network, the number of neurons in the first layer in the shallow neural network is C/r, the activation function is ReLU, and the number of neurons in the second layer is C;
adding and combining two characteristic graphs extracted by a shallow neural network, and obtaining a weight coefficient of each channel by using a Sigmoid activation function;
after the weight of the characteristic channel is obtained, the original characteristic is weighted channel by channel through multiplication, and the original characteristic recalibration on the channel dimension is completed;
wherein C is the number of channels, and r is the dimensionality reduction factor.
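A hedged sketch of the channel-attention recalibration described above, on toy data; the MLP weights w1/w2 are illustrative stand-ins for learned parameters, not values from the patent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def channel_attention(channels, w1, w2):
    """channels: list of C channels, each a flat list of spatial activations.
    Global average- and max-pool each channel, push both C-vectors through a
    shared 2-layer MLP (C -> C/r -> C), add, and squash with a sigmoid to get
    one weight per channel; then rescale each channel by its weight."""
    avg = [sum(c) / len(c) for c in channels]
    mx = [max(c) for c in channels]
    merged = [a + b for a, b in zip(matvec(w2, relu(matvec(w1, avg))),
                                    matvec(w2, relu(matvec(w1, mx))))]
    weights = [sigmoid(z) for z in merged]
    recalibrated = [[v * w for v in c] for c, w in zip(channels, weights)]
    return recalibrated, weights

chans = [[1.0, 3.0], [0.0, 0.0]]        # C = 2 toy channels
w1, w2 = [[1.0, 1.0]], [[1.0], [0.0]]   # C -> C/r = 1 -> C, illustrative weights
recal, weights = channel_attention(chans, w1, w2)
```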
Further, the cross entropy loss function is expressed as:
L_ce = -(1/N) · Σ_{i=1..N} log p(y_i)
p(y_i) = softmax( F( μ(·) ⊙ f(x̃_i) ) )_{y_i}
wherein L_ce is the cross entropy loss function; p(y_i) represents the probability the network assigns to the identity classification of sample y_i; x̃_i indicates the occluded face image in the i-th face image pair; F is the fully connected layer after the last convolutional layer in the differential network; μ(·) ⊙ f(x̃_i) is the weight feature map of the partially occluded face features after the channel attention mask operation, where μ(·) denotes the channel attention output, a mask with values in [0, 1]; f(·) represents the features finally output by the convolutional layers; N represents the total number of training samples of face images.
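As a plain-Python illustration (our sketch, not the patent's implementation), the cross-entropy term averages the negative log-probability of the true identity over the batch:

```python
import math

def softmax(logits):
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy_loss(batch_logits, labels):
    """Mean negative log-probability of the true identity over N samples,
    matching L = -(1/N) * sum_i log p(y_i)."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total += -math.log(softmax(logits)[y])
    return total / len(labels)

# One confident and one uniform prediction over 2 classes:
loss = cross_entropy_loss([[5.0, 0.0], [0.0, 0.0]], [0, 1])
```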
Further, the contrast loss function is expressed as:
L_con = (1/N) · Σ_{i=1..N} ‖ μ(·) ⊙ f(x̃_i) - μ(·) ⊙ f(x_i) ‖²
wherein L_con is the contrast loss between the two images of a pair caused by the presence of the occluded region; μ(·) represents the weight feature map output by channel attention, a mask with values in [0, 1]; x̃_i indicates the face image with occlusion in the i-th face image pair; x_i indicates the face image without occlusion in the i-th face image pair; f(·) represents the features finally output by the convolutional layers; N represents the total number of training samples of face images.
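Assuming the contrast loss is the masked squared distance between the pair's features (a reading of the definitions above, not a formula the patent text preserves), a minimal sketch:

```python
def contrast_loss(pairs, masks):
    """Mean squared distance between masked features of the occluded and clean
    image in each pair: L = (1/N) * sum_i || mu * f_occ_i - mu * f_clean_i ||^2.
    `masks` holds per-pair channel-attention weights mu in [0, 1]."""
    total = 0.0
    for (f_occ, f_clean), mu in zip(pairs, masks):
        total += sum((m * (a - b)) ** 2 for m, a, b in zip(mu, f_occ, f_clean))
    return total / len(pairs)

# A mask of 0 on the occluded channel removes its contribution entirely:
pairs = [([1.0, 4.0], [1.0, 0.0])]
full = contrast_loss(pairs, [[1.0, 1.0]])    # (0)^2 + (4)^2 = 16
damped = contrast_loss(pairs, [[1.0, 0.0]])  # occluded channel masked out
```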
The invention also provides a single-sample partially-occluded face recognition system based on the attention mechanism, which comprises an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source-domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network formed by two identical ResNet-34 branches embedded with attention mechanisms;
and the output module is used for outputting the final identity category of the face image to be tested: the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are fed into the model, and the identity of the partially occluded test face image is determined.
The invention has the beneficial technical effects that:
(1) The method is fast and accurate, and can correctly determine the identity of any input partially occluded face image.
(2) The invention provides a novel feature-extraction framework that uses a differential network while considering the fusion of high-level and low-level information. Cross-layer information fusion is obtained by connecting global and local information, which enhances the feature-characterization capability of the network, yields more discriminative representations, and achieves high-precision single-sample partially-occluded face recognition.
(3) A spatial attention module and a channel attention module are embedded in the differential network. The spatial attention module guides the model to focus on meaningful features, i.e., the features of unoccluded regions; the channel attention module models the correlation between channels to recalibrate the original features along the channel dimension, suppressing channels that respond actively to occluded regions, thereby overcoming shortcomings of existing single-sample partially-occluded face recognition methods.
Drawings
Fig. 1 is a schematic process diagram of a single-sample partially-occluded face recognition method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic view of a spatial attention module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a channel attention module according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a training process according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a single-sample partially-occluded face recognition network based on an attention mechanism according to an embodiment of the present invention;
fig. 6 is a diagram illustrating an application effect of the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a single-sample partially-occluded face recognition method based on an attention mechanism, which specifically comprises the following steps of:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a space attention module between the second layer residual error module and the fourth layer residual error module, adjusting the weight of pixels in the uncovered area of the shallow layer feature map, and outputting a space position weight feature map;
the spatial position weight feature map is multiplied and connected with a feature map output by a residual error module of a fourth layer, and local detail features from a lower layer are obtained through fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of the channel attention module;
the channel attention module calibrates, channel by channel according to the input absolute value, the feature maps of the occluded and unoccluded images with highlighted local detail features, and the calibrated feature map is fed into the fully connected layer to output a classification result;
iteratively training the network by jointly optimizing the cross entropy loss function and the contrast loss caused by the difference within each face image pair;
after multiple rounds of training, the network loss tends to be stable, and the iterative training process is finished to obtain a trained network model;
and inputting the target domain single sample human face Gallery set and the partially shielded test human face images into a trained network model, and calculating and outputting the category of the finally shielded human face according to the cosine distance of the characteristics of the human face images by the model.
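The final identification step above, matching the probe feature against the single-sample Gallery by cosine distance, can be sketched as follows; the gallery names and 2-D features are toy values, not data from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(probe_feat, gallery):
    """Return the identity whose single gallery feature is closest to the
    probe by cosine similarity (the final classification step, sketched)."""
    return max(gallery, key=lambda name: cosine_similarity(probe_feat, gallery[name]))

gallery = {"id_A": [1.0, 0.0], "id_B": [0.0, 1.0]}
who = identify([0.9, 0.1], gallery)
```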
In one embodiment, the data set used in the training phase is the CASIA_WebFace face data set, which serves as the source domain and contains 494,414 face images of 10,575 people collected from the Internet; images of the same identity share the same numeric label. To counter the drop in recognition accuracy caused by occluded regions, the invention randomly adds black blocks or real occluders such as masks and leaves to the CASIA_WebFace data set for data enhancement, and trains the model on the enhanced data set. After training, the model is transferred to the target domain data sets. To ensure fairness in the split between training and test sets, the partition of the target-domain single-sample face Gallery set and the partially occluded test face image set is identical across all comparison experiments in this embodiment. The target domain comprises three single-sample face data sets: AR, Extended Yale B, and CAS-PEAL-R1.
Preprocessing the source and target domain data sets: all pictures are uniformly cropped to 128 × 128, and the processed face images undergo pixel normalization:
X_pix = (X_pix - 128) / 128;
where X_pix is the pixel value of the input face image, specifically the face image input to the differential network.
The preprocessed sample images are fed to the neural network in sequence, and the network is trained by minimizing the loss function via back-propagation. Compared with traditional single-sample partially-occluded face recognition algorithms, the method adopts ResNet-34 to reduce model size and improve accuracy. ResNet-34 adds a shortcut branch alongside the original convolutional layers to form a basic residual module: the original mapping H(x) is expressed as H(x) = F(x) + x, where F(x) is the residual mapping and x is the input signal. The residual structure converts the convolutional layers' learning of H(x) into learning F(x), which is simpler; this reduces computation and effectively alleviates the degradation problem caused by very deep networks.
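The residual identity H(x) = F(x) + x can be demonstrated in two lines; `residual_fn` stands in for the learned mapping F:

```python
def residual_block(x, residual_fn):
    """H(x) = F(x) + x: the block learns only the residual F, and the
    shortcut carries x through unchanged."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]

# If F learns the zero mapping, the block reduces to the identity -- the
# property that lets very deep networks train without degradation.
out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
```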
A face image is input into the ResNet-34 network, and shallow features are extracted through a convolutional layer as the input feature map of the subsequent residual network. Specifically, the 3-channel input passes through a convolutional layer with a 3 × 3 kernel, 64 channels, and a stride of 1; the output feature map is 128 × 128 with 64 channels, and after a max-pooling layer it serves as the input feature map of the subsequent residual modules.
And after the shallow feature map is extracted, four groups of residual modules which are sequentially connected are connected to form a residual network, and the residual network is used as a global branch to extract the global features of the face image.
It can be understood that the core improvement of the invention is the integration of two proposed modules: the attention mechanism and cross-layer information fusion. The attention mechanism is divided into spatial attention and channel attention and is embedded mainly in ResNet-34. To input a pair of face images at a time, ResNet-34 is adapted into a differential network: a shallow feature map is extracted by the initial convolutional layer, and four sequentially connected groups of residual modules follow the max-pooling layer to form a residual network, which serves as the global branch extracting the global features of the face image. In addition, a spatial attention module is embedded between the second and fourth layers of the residual network, the feature maps output by the second and fourth layers are connected (i.e., cross-layer information fusion), and the absolute value of the difference of the fourth-layer feature maps of the differential network is used as the input of the subsequent channel attention module. Unless specifically emphasized, the residual network in the invention mainly refers to the differential network composed of several groups of residual modules and attention modules following the ResNet-34 stem; this division merely highlights the improvements of the invention, and those skilled in the art can understand it adaptively from the overall embodiments and the drawings.
Further, the differential network is composed of a residual module and an attention mechanism module.
The process of constructing the differential network comprises the following steps:
inputting the shallow layer feature map output by the first convolutional layer into two ResNet-34 network branches, wherein each ResNet-34 network branch is formed by connecting 4 groups of residual modules in series, the number of input channels of each group of residual modules is 64, 128, 256 and 512, each residual module consists of convolution operation, Batch Normalization (BN) operation and modified linear unit (ReLU) operation, the series of operations act on the mapping of global features, and the corresponding output channels are 64, 128, 256 and 512;
embedding a spatial attention module between the second-layer and fourth-layer residual modules of each branch of the differential network. The embedding process is as follows: a spatial attention module is inserted into the residual network to guide the model to attend to meaningful features. Specifically, for the h′ × w′ × c′ three-dimensional tensor output by the convolutional layers of the second-layer residual module, global average pooling and global max pooling are applied; here the operation is performed along the channel dimension, i.e., all input channels at each spatial position are pooled into 2 real numbers, so two (h′ × w′ × 1) weight vectors are obtained from the (h′ × w′ × c′) input. The two (h′ × w′ × 1) weight vectors are then concatenated along the channel axis into an (h′ × w′ × 2) weight-vector feature map, where h′ and w′ are the height and width of the input feature map and c′ is the number of channels;
convolution is performed with a 7 × 7 kernel and padding of 3, compressing the number of channels to 1; a Sigmoid nonlinear activation after the convolution then yields a new (h′ × w′ × 1) weight-vector feature map;
the 8 × 8 feature map output by the fourth-layer residual module is upsampled to a 32 × 32 feature map by bilinear interpolation, multiplied channel-wise by the (h′ × w′ × 1) weight-vector feature map, and downsampled back to an 8 × 8 feature map, giving the scaled new features;
and the scaled new feature maps of the two branches are subtracted at the end of the network, with the absolute value of the difference taken as the input of the channel attention module.
In one embodiment, the second-layer residual module outputs, after its convolution operations, a three-dimensional tensor of 32 × 32 × 128, i.e. the height, width and number of channels of the feature map. Global maximum pooling and global average pooling are first applied to this tensor along the channel dimension: at each spatial position the maximum and the average over the channels are taken, so that one column of channels is pooled into one value at a time while the height and width remain unchanged. The 32 × 32 × 128 input feature map therefore becomes a 32 × 32 × 1 feature map after each pooling. The two (32 × 32 × 1) weight vectors are then spliced along the channel axis into a (32 × 32 × 2) weight vector feature map.
Convolution is performed with a 7 × 7 convolution kernel and a padding size of 3 to compress the number of channels to 1, and a Sigmoid nonlinear activation after the convolution forms a new (32 × 32 × 1) weight vector feature map;
the 8 × 8 feature map output by the fourth-layer residual module is upsampled into a 32 × 32 feature map by bilinear interpolation and multiplied by the (32 × 32 × 1) weight vector feature map at the channel level, i.e. features from the lower layer are fused, and a feature map of size 8 × 8 is output through downsampling to obtain the scaled new features.
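The pooling, concatenation, 7 × 7 convolution and Sigmoid steps above can be sketched in NumPy. This is a naive loop implementation for clarity, not the patent's implementation: the convolution weights and the input feature map are random illustrative values.

```python
import numpy as np

def spatial_attention(x, conv_w, conv_b):
    """Spatial attention mask for an (H, W, C) feature map.

    Pools across the channel axis (max and mean), stacks the two
    (H, W, 1) maps into an (H, W, 2) tensor, applies a 7x7 convolution
    with padding 3 that compresses it to one channel, then a Sigmoid.
    """
    h, w, _ = x.shape
    pooled = np.stack([x.max(axis=2), x.mean(axis=2)], axis=2)  # (H, W, 2)
    pad = np.pad(pooled, ((3, 3), (3, 3), (0, 0)))              # zero padding 3
    out = np.empty((h, w))
    for i in range(h):                                          # naive 7x7 conv
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 7, j:j + 7, :] * conv_w) + conv_b
    return 1.0 / (1.0 + np.exp(-out))                           # values in (0, 1)

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 128))        # illustrative 32x32x128 feature map
w7 = rng.normal(scale=0.01, size=(7, 7, 2))  # illustrative, untrained 7x7 kernel
mask = spatial_attention(feat, w7, 0.0)      # (32, 32) spatial weight map
```

The subsequent bilinear upsampling of the 8 × 8 deep feature map and the channel-level multiplication with this mask are omitted here for brevity.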
The specific process by which the channel attention module adjusts the weight of each channel is as follows:
the scaled new feature maps of the two branches at the end of the network are subtracted, and the absolute value of the feature difference is taken as the input of the channel attention module. To aggregate the spatial features, the channel attention module applies global average pooling and global maximum pooling to the received difference feature map, obtaining two 1 × 1 × 512 channel descriptors; using two different pooling operations makes the extracted high-level features richer;
the two channel descriptors are respectively sent into a shallow neural network, wherein the number of neurons in the first layer is 512/16 with a ReLU activation function, the number of neurons in the second layer is 512, and this two-layer network is shared between the two descriptors;
the two resulting feature vectors are added and combined, and the weight coefficient of each channel is obtained with a Sigmoid activation function;
after the feature channel weights are obtained, the original features are weighted channel by channel through multiplication, completing the recalibration of the original features in the channel dimension;
the features recalibrated by the channel attention module are taken as the final face feature representation, and the classification result is output through the fully connected layer.
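The channel attention steps above (dual pooling, shared two-layer MLP with reduction ratio 16, Sigmoid, channel-wise rescaling) can be sketched in NumPy as follows. The MLP weights and the input difference map are random illustrative values, not trained parameters.

```python
import numpy as np

def channel_attention(diff, w1, w2):
    """Channel attention on the absolute feature difference.

    diff: (H, W, C) absolute difference of the two branch features.
    Global average pooling and global maximum pooling give two (C,)
    descriptors; both pass through a shared two-layer MLP
    (C -> C/r -> C with ReLU in between), the outputs are summed,
    and a Sigmoid yields one weight per channel, which rescales diff.
    """
    avg = diff.mean(axis=(0, 1))              # (C,) average-pooled descriptor
    mx = diff.max(axis=(0, 1))                # (C,) max-pooled descriptor

    def mlp(v):
        hidden = np.maximum(v @ w1, 0.0)      # first layer, ReLU
        return hidden @ w2                    # second layer, back to C units

    weights = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # (C,) in (0, 1)
    return diff * weights                     # channel-wise recalibration

rng = np.random.default_rng(1)
c = 512
d = np.abs(rng.normal(size=(8, 8, c)))              # |feature difference|
w1 = rng.normal(scale=0.05, size=(c, c // 16))      # C -> C/16 (r = 16)
w2 = rng.normal(scale=0.05, size=(c // 16, c))      # C/16 -> C
out = channel_attention(d, w1, w2)                  # recalibrated features
```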
The network is trained iteratively by jointly optimizing a cross entropy loss function and the contrast loss of the face image pair caused by the difference between its two images, the whole differential network being jointly optimized by minimizing the loss function through back propagation.
Further, the cross entropy loss function is expressed as follows:

$$\mathcal{L}_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\log p\left(y_i \mid \tilde{x}_i\right)$$

$$p\left(y_i \mid \tilde{x}_i\right) = \mathrm{softmax}\left(F\left(f(\tilde{x}_i)\right)\right)_{y_i}$$

The contrast loss is expressed as follows:

$$\mathcal{L}_{con} = \frac{1}{N}\sum_{i=1}^{N}\left\|\mu\left(f(\tilde{x}_i)\right)\otimes f(\tilde{x}_i) - \mu\left(f(\tilde{x}_i)\right)\otimes f(x_i)\right\|_2^2$$

wherein $\mathcal{L}_{ce}$ is the cross entropy loss function; $p(y_i \mid \tilde{x}_i)$ represents the probability with which the network classifies sample $\tilde{x}_i$ as identity $y_i$; $\tilde{x}_i$ denotes the occluded face image in the $i$-th face image pair and $x_i$ the unoccluded face image in the $i$-th face image pair; $F$ is the fully connected layer after the last convolution layer in the differential network; $\mathcal{L}_{con}$ is the contrast loss between the two images caused by the presence of the occluded region; $\mu(f(\tilde{x}_i))\otimes f(\tilde{x}_i)$ represents the feature of the face with local occlusion after the channel attention mask operation, wherein $\mu(\cdot)$ is the weight feature map output by the channel attention, a mask with values in $[0,1]$; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
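Since the published equations survive only as image placeholders, the following NumPy sketch assumes the standard softmax cross entropy plus a squared-error contrast term on the attention-masked features, matching the symbol definitions in the text. The balance weight `lam` and all numeric inputs are illustrative assumptions not present in the source.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(logits, labels, feat_occ, feat_clean, mask, lam=1.0):
    """Cross entropy on the occluded-branch logits plus a contrast
    loss between the masked features of the occluded and clean images.

    mask: channel attention weights mu(.) with values in [0, 1].
    lam:  assumed balance weight between the two terms.
    """
    n = logits.shape[0]
    p = softmax(logits)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    con = np.mean(np.sum((mask * feat_occ - mask * feat_clean) ** 2, axis=1))
    return ce + lam * con

rng = np.random.default_rng(2)
logits = rng.normal(size=(4, 10))        # 4 samples, 10 identities
labels = np.array([0, 1, 2, 3])
f_occ = rng.normal(size=(4, 512))        # occluded-branch features
f_cln = rng.normal(size=(4, 512))        # clean-branch features
mu = rng.uniform(size=(4, 512))          # channel attention mask in [0, 1]
loss = joint_loss(logits, labels, f_occ, f_cln, mu)
```

Note that when the two branches produce identical features the contrast term vanishes and only the cross entropy remains.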
An Adam optimizer is used for training; after multiple rounds of training the network tends to be stable, and the iteration process ends with a trained network model. The training process, shown in figure 4, is as follows:
after an image data set is obtained, preprocessing a face image;
constructing a differential network model based on an attention mechanism, namely the network model constructed by the invention;
training the network using the data set and performing multiple iterations;
computing the loss between the network output and the true identity label corresponding to the face image, until the loss tends to be stable.
At this time, the training is finished and the trained network model is output.
The trained network model is shown in fig. 5.
When the trained neural network model is used, the target-domain single-sample face Gallery set and the partially occluded test face images are input into the trained model, and the model computes and outputs the category of the occluded face according to the cosine distance between the face image features.
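The final matching step, assigning the occluded probe face the identity of the nearest gallery face by cosine similarity, can be sketched as follows. The gallery features, identity labels and probe feature are illustrative stand-ins for the network's output embeddings.

```python
import numpy as np

def classify_by_cosine(gallery_feats, gallery_ids, probe_feat):
    """Return the identity of the gallery feature with the highest
    cosine similarity to the probe feature (smallest cosine distance)."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feat / np.linalg.norm(probe_feat)
    sims = g @ p                          # cosine similarity per gallery entry
    return gallery_ids[int(np.argmax(sims))]

# Toy single-sample gallery: one feature per identity.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
ids = np.array([7, 8, 9])
probe = np.array([0.9, 0.1, 0.05])        # closest to the first gallery entry
pred = classify_by_cosine(gallery, ids, probe)  # -> 7
```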
The invention also provides a single-sample partially occluded face recognition system based on the attention mechanism, which comprises an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network formed by two identical residual networks embedded with an attention mechanism;
the output module is used for outputting the identity category to which the face image to be detected finally belongs, namely the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are sent into the model, and the identity of the partially occluded test face image is judged.
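A minimal sketch of the preprocessing module's two operations, pixel normalization (X − 128)/128 and random occluder augmentation, is given below. The occluder size range and the black fill value are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def preprocess(img):
    """Pixel normalization described above: X = (X - 128) / 128."""
    return (img.astype(np.float32) - 128.0) / 128.0

def add_random_occluder(img, rng, max_frac=0.5):
    """Paste a random black rectangle onto a face image to synthesize a
    partially occluded training sample (size/position are illustrative)."""
    h, w = img.shape[:2]
    oh = int(rng.integers(1, int(h * max_frac) + 1))   # occluder height
    ow = int(rng.integers(1, int(w * max_frac) + 1))   # occluder width
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))
    out = img.copy()
    out[top:top + oh, left:left + ow] = 0              # black occluder
    return out

rng = np.random.default_rng(3)
face = rng.integers(0, 256, size=(128, 128, 3)).astype(np.uint8)
occluded = add_random_occluder(face, rng)
norm = preprocess(occluded)   # values in [-1, 1)
```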
Further, the neural network module comprises ResNet-34. When the data output by the data preprocessing module enters the neural network module, a shallow feature map is extracted through a convolution layer and input into a residual network formed by four sequentially cascaded residual module groups to extract the global features of the face image; a spatial attention module embedded between the second-layer residual module and the fourth-layer residual module adjusts the weight of pixels in the unoccluded area of the shallow feature map and outputs a spatial position weight feature map; the spatial position weight feature map is connected with the feature map output by the fourth-layer residual module, and local detail features from the lower layer are acquired through the fusion of cross-layer information;
the feature maps of the occluded image and the unoccluded image with highlighted local detail features are subtracted, and the absolute value of the difference is taken as the input of a channel attention module;
the channel attention module adjusts the weight of each channel as follows:
global average pooling and global maximum pooling are applied to the subtracted values received by the channel attention module to obtain two channel descriptors;
the two channel descriptors are respectively sent into a shallow neural network to extract features, and the extracted features are combined through addition;
the combined features yield the weight coefficient of each channel through a Sigmoid activation function;
after the feature channel weights are obtained, the feature maps of the occluded image and the unoccluded image with highlighted local detail features are weighted channel by channel through multiplication, completing the recalibration of the original features in the channel dimension;
the features recalibrated by the channel attention module serve as the final face feature representation, and the classification result is output through the fully connected layer.
The network is trained iteratively by jointly optimizing the cross entropy loss function and the contrast loss of the face image pair caused by the difference; the training process is detailed in the method section and is not repeated here.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. The single-sample partial occlusion face recognition method based on the attention mechanism is characterized by comprising the following steps:
inputting a set of face image pairs with category labels as the source domain data set, wherein each face image pair comprises a clean frontal face and a partially occluded face of the same identity, and preprocessing the face image pair data set;
inputting the preprocessed face image into a differential network formed by two ResNet-34, and extracting a shallow feature map through a convolution layer;
inputting the shallow feature map into a residual network consisting of four sequentially cascaded residual module groups, and extracting the global features of the face image;
embedding a spatial attention module between the second-layer residual module and the fourth-layer residual module, adjusting the weight of pixels in the unoccluded area of the shallow feature map, and outputting a spatial position weight feature map;
multiplying and connecting the spatial position weight feature map with the feature map output by the fourth-layer residual module, and acquiring local detail features from the lower layer through the fusion of cross-layer information;
taking the absolute value of the difference between the feature maps of the occluded image and the unoccluded image with highlighted local detail features as the input of a channel attention module;
the channel attention module calibrating, channel by channel, the feature maps of the occluded image and the unoccluded image with highlighted local detail features according to the input absolute value, and sending the calibrated feature maps into the fully connected layer to output the classification result;
performing iterative training on the network by jointly optimizing a cross entropy loss function and the contrast loss of the face image pair caused by the difference;
after multiple rounds of training the network loss tends to be stable, and the iterative training process ends with a trained network model;
inputting the target-domain single-sample face Gallery set and the partially occluded test face images into the trained network model, the model computing and outputting the category of the occluded face according to the cosine distance between the face image features.
2. The attention-based single-sample partial occlusion face recognition method of claim 1, wherein preprocessing the face image data set comprises cropping the face images to 128 × 128 size and performing a pixel normalization operation on the cropped face images, expressed as:
X_pix = (X_pix - 128) / 128;
wherein X_pix is the pixel value corresponding to the face image.
3. The attention mechanism-based single-sample partially-occluded face recognition method according to claim 1, wherein extracting the shallow feature map through a convolution layer comprises: inputting the face image with 3 channels into a convolution layer with a convolution kernel size of 3 × 3, 64 channels and a stride of 1 for feature extraction, outputting a feature map of size 128 × 128 with 64 output channels, and passing the feature map through a maximum pooling layer to obtain the shallow feature map of the face image.
4. The method for recognizing the single-sample partially-occluded face based on the attention mechanism of claim 1, wherein in the four residual module groups which are cascaded in sequence, each residual module group comprises 3, 4, 6 and 3 residual modules in sequence according to the cascade order, and the feature map output by each residual module group according to the cascade order has the size of 64 × 64, 32 × 32, 16 × 16 and 8 × 8.
5. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 4, wherein the process of acquiring local detail features from a lower layer comprises:
a spatial attention module is embedded between the second-layer residual module and the fourth-layer residual module;
obtaining a weight vector feature map from the input h' × w' × c' three-dimensional tensor by global average pooling and global maximum pooling;
processing the weight vector feature map with a convolution layer having a 7 × 7 convolution kernel, a padding size of 3 and 1 output channel, followed by a Sigmoid nonlinear activation layer;
upsampling the 8 × 8 feature map output by the fourth-layer residual module group into a 32 × 32 feature map by bilinear interpolation;
multiplying the upsampled feature map by the processed weight vector feature map, and outputting a feature map of size 8 × 8 through downsampling to obtain the local detail features from the lower layer.
6. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the channel attention module performing calibration by channels on feature maps of an occlusion image and an unoccluded image after highlighting the local detail feature according to the input absolute value comprises:
performing subtraction operation on the feature maps of the occluded image and the unoccluded image with the highlighted local detail features, and taking the absolute value of the subtracted difference as the input of a channel attention module;
the channel attention module obtains two 1 × 1 × C channel descriptions by respectively using global average pooling and global maximum pooling operations;
the two channel descriptions are respectively sent into a shallow neural network, the number of neurons in the first layer in the shallow neural network is C/r, the activation function is ReLU, and the number of neurons in the second layer is C;
adding and combining two characteristic graphs extracted by a shallow neural network, and obtaining a weight coefficient of each channel by using a Sigmoid activation function;
after the weight of the characteristic channel is obtained, weighting the characteristics of the shielded image and the unoccluded image with the highlighted local detail characteristics channel by channel through multiplication, and finishing the recalibration of the original characteristics on the channel dimension;
wherein C is the number of channels, and r is the dimensionality reduction factor.
7. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the cross entropy loss function is expressed as:

$$\mathcal{L}_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\log p\left(y_i \mid \tilde{x}_i\right)$$

$$p\left(y_i \mid \tilde{x}_i\right) = \mathrm{softmax}\left(F\left(f(\tilde{x}_i)\right)\right)_{y_i}$$

wherein $\mathcal{L}_{ce}$ is the cross entropy loss function; $p(y_i \mid \tilde{x}_i)$ represents the probability with which the network classifies sample $\tilde{x}_i$ as identity $y_i$; $\tilde{x}_i$ denotes the occluded face image in the $i$-th face image pair; $F$ is the fully connected layer after the last convolution layer in the differential network; $\mu(f(\tilde{x}_i))\otimes f(\tilde{x}_i)$ represents the feature of the face with partial occlusion after the channel attention mask operation, wherein $\mu(\cdot)$ is the weight feature map output by the channel attention, a mask with values in $[0,1]$; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
8. The attention mechanism-based single-sample partial occlusion face recognition method according to claim 1, wherein the contrast loss function is expressed as:

$$\mathcal{L}_{con} = \frac{1}{N}\sum_{i=1}^{N}\left\|\mu\left(f(\tilde{x}_i)\right)\otimes f(\tilde{x}_i) - \mu\left(f(\tilde{x}_i)\right)\otimes f(x_i)\right\|_2^2$$

wherein $\mathcal{L}_{con}$ is the contrast loss between the two images caused by the presence of the occluded region; $\mu(\cdot)$ represents the weight feature map output by the channel attention, a mask with values in $[0,1]$; $\tilde{x}_i$ denotes the face image with occlusion in the $i$-th face image pair; $x_i$ denotes the face image without occlusion in the $i$-th face image pair; $f(\cdot)$ represents the feature finally output by the convolution layers; and $N$ is the total number of samples in the face image training set.
9. A single-sample partially occluded face recognition system based on the attention mechanism, characterized by comprising an image acquisition module, a data preprocessing module, a neural network module and an output module, wherein:
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing pixel normalization on the face images and for data enhancement within the source domain data set, expanding the source domain training set by randomly adding occluders;
the neural network module is used for constructing and training a differential neural network consisting of two identical ResNet-34 networks embedded with an attention mechanism;
the output module is used for outputting the identity category to which the face image to be detected finally belongs, namely the model trained on the source domain is transferred to the target domain data set, the single-sample face Gallery set and the partially occluded test face image are sent into the model, and the identity of the partially occluded test face image is judged.
CN202110320104.0A 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism Active CN112949565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320104.0A CN112949565B (en) 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism


Publications (2)

Publication Number Publication Date
CN112949565A true CN112949565A (en) 2021-06-11
CN112949565B CN112949565B (en) 2022-06-03

Family

ID=76227742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320104.0A Active CN112949565B (en) 2021-03-25 2021-03-25 Single-sample partially-shielded face recognition method and system based on attention mechanism

Country Status (1)

Country Link
CN (1) CN112949565B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361431A (en) * 2021-06-15 2021-09-07 山西大学 Network model and method for face shielding detection based on graph reasoning
CN113378980A (en) * 2021-07-02 2021-09-10 西安电子科技大学 Mask face shading recovery method based on self-adaptive context attention mechanism
CN113536965A (en) * 2021-06-25 2021-10-22 深圳数联天下智能科技有限公司 Method and related device for training face shielding recognition model
CN113705466A (en) * 2021-08-30 2021-11-26 浙江中正智能科技有限公司 Human face facial feature occlusion detection method used for occlusion scene, especially under high-imitation occlusion
CN113723414A (en) * 2021-08-12 2021-11-30 中国科学院信息工程研究所 Mask face shelter segmentation method and device
CN113947782A (en) * 2021-10-14 2022-01-18 哈尔滨工程大学 Pedestrian target alignment method based on attention mechanism
CN113947802A (en) * 2021-12-21 2022-01-18 武汉天喻信息产业股份有限公司 Method, device and equipment for identifying face with shielding and readable storage medium
CN113989902A (en) * 2021-11-15 2022-01-28 天津大学 Method, device and storage medium for identifying shielded face based on feature reconstruction
CN114331904A (en) * 2021-12-31 2022-04-12 电子科技大学 Face shielding identification method
CN114998958A (en) * 2022-05-11 2022-09-02 华南理工大学 Face recognition method based on lightweight convolutional neural network
CN115100709A (en) * 2022-06-23 2022-09-23 北京邮电大学 Feature-separated image face recognition and age estimation method
CN115631530A (en) * 2022-12-22 2023-01-20 暨南大学 Fair facial expression recognition method based on face action unit
CN115908964A (en) * 2022-09-20 2023-04-04 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN115984949A (en) * 2023-03-21 2023-04-18 威海职业学院(威海市技术学院) Low-quality face image recognition method and device with attention mechanism
CN116051859A (en) * 2023-02-21 2023-05-02 阿里巴巴(中国)有限公司 Service providing method, apparatus and storage medium
CN116311479A (en) * 2023-05-16 2023-06-23 四川轻化工大学 Face recognition method, system and storage medium for unlocking automobile
CN116563926A (en) * 2023-05-17 2023-08-08 智慧眼科技股份有限公司 Face recognition method, system, equipment and computer readable storage medium
CN117437684A (en) * 2023-12-14 2024-01-23 深圳须弥云图空间科技有限公司 Image recognition method and device based on corrected attention
CN117475357A (en) * 2023-12-27 2024-01-30 北京智汇云舟科技有限公司 Monitoring video image shielding detection method and system based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2016213857A1 (en) * 2012-09-28 2016-09-01 Sony Corporation Image Processing Device
CN111428116A (en) * 2020-06-08 2020-07-17 四川大学 Microblog social robot detection method based on deep neural network
CN111523407A (en) * 2020-04-08 2020-08-11 上海涛润医疗科技有限公司 Face recognition system and method and medical care recording system based on face recognition
CN111898413A (en) * 2020-06-16 2020-11-06 深圳市雄帝科技股份有限公司 Face recognition method, face recognition device, electronic equipment and medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI Y: "occlusion aware facial expression recognition using CNN with attention mechanism", 《IEEE TRANSACTION ON IMAGE PROCESSING》 *
杨壮等: "基于注意力机制和深度恒等映射的人脸识别", 《传感器与微系统》 *
闫硕: "单样本部分遮挡人脸识别研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *


Also Published As

Publication number Publication date
CN112949565B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN111563508B (en) Semantic segmentation method based on spatial information fusion
WO2022036777A1 (en) Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN110969124B (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111325111A (en) Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111814611B (en) Multi-scale face age estimation method and system embedded with high-order information
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN109190581A (en) Image sequence target detection recognition methods
CN112801015B (en) Multi-mode face recognition method based on attention mechanism
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN111612008A (en) Image segmentation method based on convolution network
CN112288011A (en) Image matching method based on self-attention deep neural network
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
CN113947814A (en) Cross-visual angle gait recognition method based on space-time information enhancement and multi-scale saliency feature extraction
CN113011253A (en) Face expression recognition method, device, equipment and storage medium based on ResNeXt network
CN116071676A (en) Infrared small target detection method based on attention-directed pyramid fusion
Guo et al. Multifeature extracting CNN with concatenation for image denoising
CN114882537A (en) Finger new visual angle image generation method based on nerve radiation field
Sakthimohan et al. Detection and Recognition of Face Using Deep Learning
CN111881803B (en) Face recognition method based on improved YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant