CN115862097A - Occluded face recognition method and device based on multi-attention and multi-scale feature learning

Info

Publication number: CN115862097A
Application number: CN202211493911.3A
Authority: CN
Inventors: 杨新宇; 张硕; 胡冠宇; 宋怡馨; 魏洁; 张与弛; 曹至欣; 郭靖宜
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2023-03-28

Abstract

The invention discloses a method and a device for recognizing occluded faces based on multi-attention and multi-scale feature learning. Built on channel attention and spatial attention mechanisms, the method reduces the influence of occluders on face recognition and addresses the drop in recognition accuracy under occlusion. First, a multi-level attention network is added to a conventional convolutional neural network to extract the channel attention map and the spatial attention map of the face image. Second, a multi-scale feature fuser is constructed to integrate the local and global information of the face image. Next, an occlusion mask generator locates the occluded region and generates an occlusion mask, reducing the influence of that region. Finally, a multi-task learning network performs occlusion category classification and face identity classification to obtain the final face recognition result. The method is simple and easy to implement, the model is shallow and low in cost, and occluded faces are recognized accurately while recognition efficiency is maintained.

Description

Occluded face recognition method and device based on multi-attention multi-scale feature learning

Technical Field

The present invention belongs to the field of artificial-intelligence face recognition, and in particular relates to an occluded face recognition method and device based on multi-attention multi-scale feature learning.

Background Art

As a non-invasive means of identification and verification, face recognition is more widely liked and accepted by the public than other biometric technologies. As recognition technology has advanced, face recognition has been widely deployed in scenarios such as surveillance systems, security systems, industrial production, and home monitoring, making many aspects of daily life more convenient.

The accuracy of face recognition depends largely on how well the model extracts key facial features, and the completeness of the face region strongly affects that extraction. With the global COVID-19 pandemic, wearing a mask became a routine requirement when going out. As an external interference factor, a mask occludes part of the face image and thereby corrupts some of the features. Under these conditions, commonly used face recognition algorithms lose their normally high accuracy and ultimately fail at masked face recognition. New algorithms for recognizing occluded faces are therefore urgently needed.

Summary of the Invention

The purpose of the present invention is to provide an occluded face recognition method and device based on multi-attention multi-scale feature learning. By extracting and fusing multi-level features of the face image, and by using channel attention and spatial attention mechanisms to suppress the influence of occluded regions on face recognition, the invention provides a method that effectively improves the accuracy of occluded face recognition. The logic of the invention is simple and its effect is significant: it effectively shields face recognition from the adverse effects of partial occlusion, and it also supports face recognition in unoccluded scenes.

The present invention is achieved by the following technical solution:

An occluded face recognition method based on multi-attention multi-scale feature learning. First, a multi-level attention network is added to a convolutional neural network to extract the channel attention map and the spatial attention map of the face image. Second, a multi-scale feature fuser is constructed to integrate the local and global information of the face image and obtain more robust face features. Then, an occlusion mask generator locates the occluded region and generates an occlusion mask, reducing the influence of that region. Finally, a multi-task learning network classifies the occlusion category and the face identity at the same time, achieving the best generalization in face recognition.

The occluded face recognition method based on multi-attention multi-scale feature learning adopted by the present invention specifically includes the following steps:

Extracting face image features: a residual neural network is used as the face image feature extraction module to extract face image features;

Obtaining the channel attention map of the face image: after the face image features are obtained, an attention weight is computed for the feature map of each channel, giving the degree of relevance between each channel and the key information;

Obtaining the spatial attention map of the face image: the feature map refined by the channel attention map is taken as input, and the degree of relevance between each pixel and the key information is computed, giving the multi-level features of the face image;

Multi-scale feature fusion of the face image: for the multi-level features of the face image, a multi-scale feature fuser is built from a three-layer deconvolution structure, and feature maps of different scales are added element by element to obtain multi-scale face feature information with different resolutions and semantic strengths;

Occlusion mask generation: a feature mask that is highly sensitive to the occlusion position in the input image is learned, the weights corresponding to the occluded region are computed, and the influence of corrupted features on face recognition is eliminated by assigning different weights to the features;

Occlusion category classification: the feature mask learned by the occlusion mask generator is taken as input and classified into an occlusion category, which supervises the learning of the occlusion mask generator;

Face category classification: the region weights obtained by the occlusion mask generator are multiplied with the multi-scale feature information of the face image to obtain cleaned face features, which are fed to the face category classifier to obtain the face classification result.

A further improvement of the present invention is that obtaining the channel attention map of the face image specifically includes:

Applying average pooling and max pooling to the extracted face features to aggregate spatial information, and feeding each result into two shared fully connected layers that fit the correlation between channel features, giving two channel feature maps;

Adding the corresponding elements of the two channel feature maps and applying the Sigmoid activation function to obtain the channel attention map of the face image, whose weights reflect the degree of relevance between each channel and the key information.

A further improvement of the present invention is that obtaining the spatial attention map of the face image includes:

Applying max pooling and average pooling to the extracted face features along the channel direction to obtain two spatial feature maps;

Concatenating the two spatial feature maps and fitting the feature correlation in the spatial dimension with a convolution operation to obtain the spatial attention map, whose weights reflect the degree of relevance between each pixel and the key information.

A further improvement of the present invention is that the multi-scale feature fusion of the face image specifically includes:

Taking the face image feature extraction module as the backbone of the multi-scale feature fuser and building a pyramid-structured model using a top-down architecture with lateral connections;

The input of the pyramid-structured model is the preprocessed face image; convolution and upsampling operations produce multi-scale face feature information with different resolutions and semantic strengths.

A further improvement of the present invention is that the occlusion mask generation specifically includes:

Taking as input the face features that contain different scales and global information, and passing them through a convolutional network combined with the PReLU activation function, a batch normalization layer, and the Sigmoid function to obtain the final occlusion mask, which is used to clean the original face features damaged by partial occlusion.

A further improvement of the present invention is that the occlusion category classification specifically includes:

Dividing the face image into a number of rectangular grid cells, simulating occluded regions by combining cells into rectangles and defining a new occlusion category for each, and on this basis building an occlusion dictionary of all occlusion categories, which also includes the unoccluded case;

Selecting images of different types of masks as occluders, randomly choosing the occluder center, and compositing the occluder image onto the face image;

Computing the corresponding occlusion matrix according to whether each grid cell is occluded, and looking up the matching occlusion category in the generated occlusion dictionary as the label of the occluded face image;

Feeding the labeled occluded face images into the occlusion mask generator to learn the mask associated with each occlusion category;

Feeding the learned mask into the occlusion category classifier, with cross entropy as the loss function supervising the learning of the occlusion mask generator, to obtain a more accurate occlusion mask.
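The labeling procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid size `k`, the square image of side `img_size`, the axis-aligned occluder box, and all function names are hypothetical and are not taken from the patent text.

```python
import random

def build_occlusion_dictionary(k):
    """Map every rectangle of grid cells (r0, c0, r1, c1) to a class id.
    The key None stands for the unoccluded case and gets id 0."""
    table = {None: 0}
    for r0 in range(k):
        for r1 in range(r0, k):
            for c0 in range(k):
                for c1 in range(c0, k):
                    table[(r0, c0, r1, c1)] = len(table)
    return table

def random_occluder_box(img_size, occ_h, occ_w, rng):
    """Pick a random occluder center and return the clipped pixel box
    (top, left, bottom, right), with bottom and right exclusive."""
    cy, cx = rng.randrange(img_size), rng.randrange(img_size)
    top, left = cy - occ_h // 2, cx - occ_w // 2
    return (max(0, top), max(0, left),
            min(img_size, top + occ_h), min(img_size, left + occ_w))

def occlusion_matrix(img_size, k, box):
    """Return a k x k matrix with 1 where a grid cell overlaps the box."""
    cell = img_size / k
    top, left, bottom, right = box
    mat = [[0] * k for _ in range(k)]
    for r in range(k):
        for c in range(k):
            y0, y1 = r * cell, (r + 1) * cell
            x0, x1 = c * cell, (c + 1) * cell
            if top < y1 and bottom > y0 and left < x1 and right > x0:
                mat[r][c] = 1
    return mat

def occlusion_label(mat, table):
    """Look up the class id of the rectangle spanned by the occluded cells."""
    cells = [(r, c) for r, row in enumerate(mat) for c, v in enumerate(row) if v]
    if not cells:
        return table[None]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return table[(min(rows), min(cols), max(rows), max(cols))]
```

With a 4 x 4 grid this sketch yields 100 rectangle categories plus the unoccluded one. An axis-aligned occluder always covers a rectangular block of cells, so the bounding rectangle of the occluded cells identifies the category directly.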

A further improvement of the present invention is that the face category classification specifically includes:

Taking as input the face features processed by the occlusion mask, and using the margin-based loss function LMCL (large margin cosine loss) to supervise the model in learning identity-related face features;

Finally, the loss function of the face recognition task and the loss function of the occlusion category recognition task are added together as the final loss function, which supervises the model so that it converges faster and completes the face category classification.
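A minimal sketch of the joint objective follows, assuming the usual large-margin cosine formulation for LMCL (the target-class cosine is reduced by a margin `m` and all cosines are scaled by `s` before a softmax cross entropy). The values of `s` and `m`, the unweighted sum, and the function names are illustrative assumptions; the text above states only that the two losses are added.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross entropy for one sample, computed stably."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def lmcl_loss(feature, class_weights, target, s=30.0, m=0.35):
    """Large margin cosine loss for one sample.
    s and m are illustrative values, not taken from the patent."""
    f = normalize(feature)
    cosines = [sum(a * b for a, b in zip(f, normalize(w))) for w in class_weights]
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    return cross_entropy(logits, target)

def total_loss(face_feature, class_weights, identity,
               occlusion_logits, occlusion_class):
    """Sum of the face identity loss and the occlusion category loss."""
    face = lmcl_loss(face_feature, class_weights, identity)
    occ = cross_entropy(occlusion_logits, occlusion_class)
    return face + occ
```

The margin makes the identity task harder during training: for the same feature, the loss with `m > 0` is larger than with `m = 0`, which pushes features of the same identity closer together in angle.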

An occluded face recognition device based on multi-attention multi-scale feature learning comprises:

A face image feature extraction module, which uses a residual neural network to extract face image features;

A channel attention map construction module, which, after the face image features are obtained, computes an attention weight for the feature map of each channel, giving the degree of relevance between each channel and the key information;

A spatial attention map construction module, which takes the feature map refined by the channel attention map as input and computes the degree of relevance between each pixel and the key information, giving the multi-level features of the face image;

A multi-scale feature fusion module, which, for the multi-level features of the face image, builds a multi-scale feature fuser from a three-layer deconvolution structure and adds feature maps of different scales element by element to obtain multi-scale face feature information with different resolutions and semantic strengths;

An occlusion mask generation module, which learns a feature mask that is highly sensitive to the occlusion position in the input image, computes the weights corresponding to the occluded region, and eliminates the influence of corrupted features on face recognition by assigning different weights to the features;

An occlusion category classification module, which takes the feature mask learned by the occlusion mask generator as input and classifies it into an occlusion category, supervising the learning of the occlusion mask generator;

A face category classification module, which multiplies the region weights obtained by the occlusion mask generator with the multi-scale feature information of the face image to obtain cleaned face features, and feeds them to the face category classifier to obtain the face classification result.

The present invention has at least the following beneficial technical effects:

The present invention provides an occluded face recognition method based on multi-attention multi-scale feature learning. The method first adds a multi-level attention mechanism to the face feature extraction network to extract the channel attention map and the spatial attention map of the face image. Second, a multi-scale feature fuser is constructed to integrate the local and global information of the image. Next, an occlusion mask generator locates the occluded region and generates an occlusion mask, reducing the influence of that region. Finally, a multi-task learning network classifies the occlusion category and the face identity to obtain the final face recognition result. Compared with ordinary face recognition algorithms, the method reduces the effect of occluders such as masks on recognition accuracy. Experiments show that the method achieves an accuracy of 97.76% on the occluded face task, better than existing occluded face recognition algorithms. On the unoccluded face recognition task, its accuracy is comparable to that of existing methods.

The present application also provides an occluded face recognition device based on multi-attention multi-scale feature learning, which includes seven modules: a face feature extraction module, a channel attention map construction module, a spatial attention map construction module, a multi-scale feature fusion module, an occlusion mask generation module, an occlusion classification module, and a face classification module. The face feature extraction module provides deep face features for the subsequent modules. The channel attention and spatial attention modules extract the channel attention map and the spatial attention map. The multi-scale feature fusion module fuses the local and global information of the image to obtain multi-scale face feature information that is more favorable for recognition. The occlusion mask generation module locates the occluded region and generates the occlusion mask, reducing the influence of that region and improving classification accuracy. The occlusion classification module classifies the occlusion mask and thereby supervises the mask generation process. The final face classification module takes the cleaned face features as input and classifies the face category effectively, improving classification efficiency.

Brief Description of the Drawings

Figure 1 is the overall processing flowchart of the attention-based occluded face recognition algorithm;

Figure 2 is a schematic diagram of the channel attention map extraction process;

Figure 3 is a schematic diagram of the spatial attention map extraction process;

Figure 4 is a schematic diagram of the multi-level feature fusion module;

Figure 5 shows the confusion matrices of the different algorithms on the LFW and Occ-LFW datasets;

Figure 6 is a functional module diagram of the occluded face recognition device based on multi-attention multi-scale feature learning provided by the present invention;

Figure 7 is a schematic structural diagram of an electronic device that implements the occluded face recognition algorithm based on multi-attention multi-scale feature learning provided by the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art. It should be noted that, where no conflict arises, the embodiments of the present invention and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in combination with the embodiments.

Referring to Figure 1, in the occluded face recognition method based on multi-attention multi-scale feature learning provided by the present invention, features are first extracted from the input face image, and the channel attention map and spatial attention map of the face image are computed by the multi-level attention mechanism. The multi-scale feature fuser then produces feature information with different resolutions and semantic strengths. In the occlusion category recognition branch, the occlusion mask generator learns the occlusion mask of the face image, and this mask is fed into the occlusion category classifier to supervise the learning of the mask generator. In the face recognition branch, the occlusion mask learned by the occlusion category recognition branch is applied to the original face features to eliminate the adverse effect of occlusion on recognition, completing the occluded face recognition task. The method specifically includes the following modules:

1. Multi-level feature extraction of the face image: this includes face image feature extraction, channel attention map acquisition, and spatial attention map acquisition, with the following steps:

Step 1. Extract face image features: a residual neural network is used as the face image feature extraction network to extract face image features;

Step 2. Obtain the channel attention map of the face image: referring to Figure 2, after the face image features are obtained, an attention weight is computed for the feature map of each channel, giving the degree of relevance between each channel and the key information. The specific steps are as follows:

First, average pooling and max pooling are applied to the input $F \in \mathbb{R}^{C \times H \times W}$ to aggregate spatial information, giving the feature maps $F^{c}_{avg}$ and $F^{c}_{max}$. Second, $F^{c}_{avg}$ and $F^{c}_{max}$ are fed into two shared fully connected layers to fit the correlation between channel features:

$$\mathrm{MLP}(F^{c}_{avg}) = W_1 \left( W_0 F^{c}_{avg} + b_0 \right) + b_1 \qquad (1)$$

$$\mathrm{MLP}(F^{c}_{max}) = W_1 \left( W_0 F^{c}_{max} + b_0 \right) + b_1 \qquad (2)$$

where $(W_0, b_0, W_1, b_1)$ are the weights and biases of the two fully connected layers, with $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$, and $r$ is the compression ratio, which reduces the number of parameters.

Finally, the corresponding elements of Equations (1) and (2) are added and passed through the Sigmoid activation function to obtain the final channel attention map $M_c(F)$:

$$M_c(F) = \sigma\left( \mathrm{MLP}(F^{c}_{avg}) + \mathrm{MLP}(F^{c}_{max}) \right) \qquad (3)$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the Sigmoid activation function.
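The channel attention step can be sketched in plain Python as follows. This is an illustration under assumptions rather than the patented implementation: features are nested lists of shape C x H x W, the function names are hypothetical, and the ReLU between the two shared layers is borrowed from common channel-attention designs, since the text names only the weights and biases of the two layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(weights, bias, vec):
    """Fully connected layer: weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def shared_mlp(vec, w0, b0, w1, b1):
    """Two fully connected layers shared by both pooled vectors.
    w0 has shape (C/r) x C and w1 has shape C x (C/r)."""
    hidden = [max(0.0, h) for h in dense(w0, b0, vec)]  # ReLU: an assumption
    return dense(w1, b1, hidden)

def channel_attention(feat, w0, b0, w1, b1):
    """feat: C channels, each an H x W nested list.
    Returns one weight in (0, 1) per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    a = shared_mlp(avg, w0, b0, w1, b1)
    m = shared_mlp(mx, w0, b0, w1, b1)
    return [sigmoid(x + y) for x, y in zip(a, m)]

def refine(feat, weights):
    """Scale every channel by its attention weight."""
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(feat, weights)]
```

Because the same `w0, b0, w1, b1` are applied to both pooled vectors, the two branches add no extra parameters, and the compression ratio `r` controls the width of the hidden layer.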

Step 3. Obtain the spatial attention map of the face image: referring to Figure 3, the feature map refined by the channel attention map is taken as input, and the degree of relevance between each pixel and the key information is computed. The specific steps are as follows:

The input of this module is the feature map $F$ refined by the channel attention map, and the output is the spatial weight map $M_s$. First, max pooling and average pooling are applied to $F$ along the channel direction, giving the feature maps $F^{s}_{max}$ and $F^{s}_{avg}$. Then the average pooling result $F^{s}_{avg}$ and the max pooling result $F^{s}_{max}$ are concatenated into new data with 2 channels, which is fed into a single convolutional layer to fit the feature correlation in the spatial dimension:

$$M_s(F) = f^{k_1 \times k_2}\left( \left[ F^{s}_{avg} ; F^{s}_{max} \right] \right) \qquad (4)$$

where $f^{k_1 \times k_2}$ denotes a convolution operation with a kernel of size $k_1 \times k_2$.
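A plain-Python sketch of the spatial attention step is given below. It assumes nested-list features of shape C x H x W, an odd kernel size with zero padding so that the output keeps the H x W size, and a final Sigmoid so that the weights fall in (0, 1). The padding scheme, the Sigmoid, and the function name are assumptions, since the text names only the pooling, the concatenation, and the convolution.

```python
import math

def spatial_attention(feat, kernel, bias=0.0):
    """feat: C x H x W nested lists.
    kernel: 2 x k1 x k2, slice 0 for the average map, slice 1 for the max map.
    Returns an H x W map of weights."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Pool along the channel direction, one value per pixel.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    stacked = [avg, mx]  # concatenation into 2 channels
    k1, k2 = len(kernel[0]), len(kernel[0][0])
    p1, p2 = k1 // 2, k2 // 2
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            acc = bias
            for ch in range(2):
                for di in range(k1):
                    for dj in range(k2):
                        y, x = i + di - p1, j + dj - p2
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            acc += kernel[ch][di][dj] * stacked[ch][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))  # Sigmoid: an assumption
        out.append(row)
    return out
```

The resulting map is multiplied pixel by pixel with every channel of the input, so positions the convolution scores as informative are kept and the rest are attenuated.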

2. Multi-scale feature fusion of the face image: referring to Figure 4, the face image feature extraction module from step 1 serves as the backbone of the multi-scale feature fuser, and a pyramid structure is built using a top-down architecture with lateral connections. The model takes the preprocessed face image as input and outputs face features $x_1, x_2, x_3$ at different scales, where $x_1$ is the base face recognition feature that needs to be cleaned, and $x_2, x_3$ contain local and global information at different scales. The process can be expressed formally as follows:

$$x_2 = \mathrm{conv}\left( \mathrm{upsample}(\mathrm{conv}(x_1)) + \mathrm{conv}(C_2) \right) \qquad (5)$$

$$x_3 = \mathrm{conv}\left( \mathrm{upsample}(\mathrm{conv}(x_2)) + \mathrm{conv}(C_3) \right) \qquad (6)$$

where conv is the convolution operation and upsample is the upsampling operation.
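The two fusion equations can be traced on toy data with the sketch below. It makes several simplifying assumptions that are not in the patent text: feature maps are single-channel nested lists, `conv` is reduced to a 1 x 1 convolution with a scalar weight, `upsample` is nearest-neighbor doubling, and `C2`, `C3` are treated as lateral backbone feature maps at twice and four times the resolution of `x1`.

```python
def conv1x1(fmap, weight=1.0):
    """Stand-in for a learned convolution: scale every element."""
    return [[weight * v for v in row] for row in fmap]

def upsample2x(fmap):
    """Nearest-neighbor upsampling by a factor of 2 in both directions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse(x_prev, lateral):
    """One top-down step: conv(upsample(conv(x_prev)) + conv(lateral))."""
    return conv1x1(add(upsample2x(conv1x1(x_prev)), conv1x1(lateral)))

# Toy maps: x1 is the coarsest feature, c2 and c3 are finer lateral features.
x1 = [[1.0]]
c2 = [[1.0, 2.0], [3.0, 4.0]]
c3 = [[1.0] * 4 for _ in range(4)]

x2 = fuse(x1, c2)  # 2 x 2 map, first fusion step
x3 = fuse(x2, c3)  # 4 x 4 map, second fusion step
```

Each step doubles the resolution of the coarser map before the element-wise addition, which is why the lateral map must already have the target size.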

3. Occlusion mask generation: a feature mask that is highly sensitive to the occlusion position in the input image is learned, the weights corresponding to the occluded region are computed, and the influence of corrupted features on face recognition is eliminated by assigning different weights to the features.

4. Multi-task occluded face classification model: this includes two subtasks, face category classification and occlusion category classification. The specific steps are as follows:

Step 1. Occlusion category classifier: the feature mask learned by the occlusion mask generator is taken as input and classified into an occlusion category, supervising the learning of the occlusion mask generator.

Step 2. Face category classification: the region weights computed by the occlusion mask generator are multiplied with the multi-scale feature information of the face image to obtain cleaned face features, which are fed to the face category classifier to obtain the face classification result.
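How a mask in (0, 1) cleans the features by multiplication can be sketched as follows. The layer order (convolution, batch normalization, PReLU, Sigmoid), the scalar 1 x 1 convolution, the normalization over a single map, and the PReLU slope of 0.25 are all illustrative assumptions. The patent text lists these components but does not fix their arrangement or parameters.

```python
import math

def prelu(x, alpha=0.25):
    """PReLU with a fixed slope; in a real network alpha is learned."""
    return x if x > 0 else alpha * x

def batch_norm(values, eps=1e-5):
    """Normalize a flat list to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def generate_mask(fused, weight=1.0, bias=0.0):
    """fused: H x W nested list of fused features.
    Returns an H x W mask with entries in (0, 1)."""
    width = len(fused[0])
    flat = [weight * v + bias for row in fused for v in row]  # 1x1 conv stand-in
    activated = [prelu(v) for v in batch_norm(flat)]
    gate = [1.0 / (1.0 + math.exp(-v)) for v in activated]
    return [gate[i:i + width] for i in range(0, len(gate), width)]

def apply_mask(features, mask):
    """Element-wise product that down-weights positions with a small mask value."""
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(features, mask)]
```

In the multi-task setup, the same mask is also passed to the occlusion category classifier, so the occlusion label supervises where the mask is small.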

Referring to Table 1, compared with other algorithms in the field of face recognition, the proposed algorithm is essentially on par with ArcFace for unoccluded face verification and is 0.4% higher than FROM, an algorithm from the occluded face recognition field. For occluded face verification, the accuracy of the proposed algorithm is 1.2% higher than ArcFace and 1% higher than the occluded face recognition algorithm FROM. This demonstrates the effectiveness of the attention-based occluded face recognition algorithm for face verification.

Table 1: Comparison of face accuracy between the present invention and the face recognition algorithms ArcFace and FROM on the LFW and Occ-LFW datasets.

To evaluate the algorithm further against the others, the confusion matrices of the three algorithms on the LFW and Occ-LFW datasets were also obtained; see Figure 5. On the LFW dataset, FROM is the most likely to confuse same-identity pairs with different-identity pairs, while ArcFace is the least likely. For same-identity pairs, the present invention performs better than FROM and close to ArcFace; for different-identity pairs, it performs clearly better than FROM but slightly worse than ArcFace. On the Occ-LFW dataset, ArcFace is the most likely to confuse same-identity pairs with different-identity pairs, while the present invention is the least likely. For both same-identity and different-identity pairs, the present invention outperforms FROM and ArcFace, which demonstrates the effectiveness of the attention-based occluded face recognition algorithm.

The occluded face recognition device based on multi-attention multi-scale feature learning provided by the present invention comprises a face feature extraction module, a channel attention map construction module, a spatial attention map construction module, a multi-scale feature fusion module, an occlusion mask generation module, an occlusion classification module, and a face classification module.

1. The face image feature extraction module uses a residual neural network to extract face image features;

2. The channel attention map construction module, after the face image features are obtained, computes an attention weight for the feature map of each channel, giving the degree of relevance between each channel and the key information;

3. The spatial attention map construction module takes the feature map refined by the channel attention map as input and computes the degree of relevance between each pixel and the key information, giving the multi-level features of the face image;

4. The multi-scale feature fusion module, for the multi-level features of the face image, builds a multi-scale feature fuser from a three-layer deconvolution structure and adds feature maps of different scales element by element to obtain multi-scale face feature information with different resolutions and semantic strengths;

5. The occlusion mask generation module learns a feature mask that is highly sensitive to the occlusion position in the input image, computes the weights corresponding to the occluded region, and eliminates the influence of corrupted features on face recognition by assigning different weights to the features;

6. The occlusion category classification module takes the feature mask learned by the occlusion mask generator as input and classifies it into an occlusion category, supervising the learning of the occlusion mask generator;

7. The face category classification module multiplies the region weights obtained by the occlusion mask generator with the multi-scale feature information of the face image to obtain cleaned face features, and feeds them to the face category classifier to obtain the face classification result.

Although the present invention has been described in detail above by means of a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on the basis of the present invention. Such modifications or improvements, made without departing from the spirit of the present invention, all fall within the scope of protection claimed by the present invention.

Claims

1. An occluded face recognition method based on multi-attention multi-scale feature learning, characterized in that: first, a multi-level attention network is added to a convolutional neural network to extract the channel attention map and the spatial attention map of the face image; second, a multi-scale feature fuser is constructed to integrate the local and global information of the face image and obtain more robust face features; then, an occlusion mask generator locates the occluded region and generates an occlusion mask to reduce the influence of the occluded region; finally, occlusion category classification and face identity classification are performed simultaneously through a multi-task learning network to achieve the best face recognition generalization effect.

2. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 1, characterized in that the method specifically comprises the following steps:

Extract face image features: a residual neural network is used as the face image feature extraction module to extract the face image features;

Obtain channel attention map of face image: After obtaining the features of face image, calculate the attention weight for the feature map of each channel to obtain the relevance of each channel to the key information;

Obtain the spatial attention map of the face image: take the feature map after the channel attention map is refined as input, calculate the correlation between different pixels and key information, and obtain the multi-level features of the face image;

Multi-scale feature fusion of the face image: for the multi-level features of the face image, a three-layer deconvolution structure is used to construct a multi-scale feature fuser, and feature maps of different scales are added element by element to obtain multi-scale feature information of the face image with different resolutions and semantic strengths;

Occlusion mask generation: Learn feature masks that are highly sensitive to the occlusion positions of the input image, calculate the weights corresponding to the occluded areas, and eliminate the impact of damaged features on face recognition by assigning different weights to features;

Occlusion category classification: The feature mask learned by the occlusion mask generator is taken as input and classified into occlusion categories to supervise the learning of the occlusion mask generator;

Face category classification: Multiply the regional weights obtained by the occlusion mask generator with the multi-scale feature information of the face image to obtain the cleaned face features, which are used as the input of the face category classifier to obtain the classification result of the face category.

3. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 2, characterized in that obtaining the channel attention map of the face image specifically comprises:

The extracted facial features are subjected to average pooling and maximum pooling operations to aggregate spatial information, and then input into two shared fully connected layers to fit the correlation between channel features to obtain two channel feature maps;

The corresponding elements in the two channel feature maps are added and processed using the Sigmoid activation function to obtain the channel attention map of the face image. The weight in the map reflects the degree of relevance of the channel to the key information.
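The channel attention computation recited above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the ReLU between the two shared fully connected layers, the reduction ratio `r`, the random weights `w1` and `w2`, and the function name `channel_attention` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """features: (C, H, W); w1: (C // r, C); w2: (C, C // r).

    Returns one attention weight per channel, each in (0, 1).
    """
    # Aggregate spatial information with average pooling and max pooling.
    avg_desc = features.mean(axis=(1, 2))  # shape (C,)
    max_desc = features.max(axis=(1, 2))   # shape (C,)

    def shared_mlp(v):
        # Two fully connected layers shared by both descriptors.
        # The ReLU in between is an assumption; the claim does not name it.
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Add the two channel feature maps element-wise, then apply Sigmoid.
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))

# Illustrative sizes and random weights (hypothetical values).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

weights = channel_attention(features, w1, w2)
refined = features * weights[:, None, None]  # channel-refined feature map
```

Multiplying `features` by the broadcast weights yields the channel-refined feature map that the spatial attention step takes as input.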

4. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 2, characterized in that obtaining a spatial attention map of a face image comprises:

Perform maximum pooling and average pooling operations on the extracted facial features along the channel direction to obtain two spatial feature maps;

The two spatial feature maps are concatenated, and the feature correlation in the spatial dimension is fitted through convolution operation to obtain a spatial attention map. The weights in the map reflect the degree of correlation between different pixels and key information.
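As a rough sketch of the spatial attention steps above, the following NumPy code pools along the channel axis, concatenates the two maps, and applies one convolution. The kernel size, the zero padding, the final Sigmoid, and the name `spatial_attention` are assumptions; the claim specifies only pooling, concatenation, and a convolution.

```python
import numpy as np

def spatial_attention(features, kernel):
    """features: (C, H, W); kernel: (2, k, k) with odd k.

    Returns an (H, W) map of per-pixel attention weights in (0, 1).
    """
    # Max pooling and average pooling along the channel direction.
    max_map = features.max(axis=0)
    avg_map = features.mean(axis=0)
    stacked = np.stack([max_map, avg_map])  # concatenation, shape (2, H, W)

    # Single-output convolution with zero padding so the size is preserved.
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    height, width = max_map.shape
    response = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            response[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)

    # Sigmoid squashing is an assumption borrowed from common practice.
    return 1.0 / (1.0 + np.exp(-response))

# Illustrative input and a small random kernel (hypothetical values).
rng = np.random.default_rng(1)
features = rng.standard_normal((8, 6, 6))
kernel = rng.standard_normal((2, 3, 3)) * 0.1

attention = spatial_attention(features, kernel)
refined = features * attention[None, :, :]  # same weights for every channel
```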

5. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 2, characterized in that the multi-scale feature fusion of the face image specifically comprises:

The face image feature extraction module is used as the backbone of the multi-scale feature fuser, and a pyramid structure model is constructed using a top-down architecture with lateral connections;

The input of the pyramid structure model is a preprocessed face image, and multi-scale feature information of the face image with different resolutions and semantic strengths is obtained through convolution and upsampling operations.
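The top-down fusion with element-wise addition can be sketched as follows. This sketch swaps in nearest-neighbour upsampling where the claims recite a deconvolution structure, and it assumes that all levels already share the same channel count, so no lateral convolution is shown. The names `upsample2x` and `top_down_fuse` are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(levels):
    """levels: list of (C, H, W) maps ordered fine to coarse, each level
    having half the spatial resolution of the previous one.

    Returns the fused maps in the same order.
    """
    fused = [levels[-1]]  # the coarsest level starts the top-down pathway
    for lateral in reversed(levels[:-1]):
        # Element-wise addition of the lateral map and the upsampled map.
        fused.append(lateral + upsample2x(fused[-1]))
    return fused[::-1]

# Three constant-valued levels make the additions easy to follow.
fine = np.full((4, 16, 16), 1.0)
middle = np.full((4, 8, 8), 2.0)
coarse = np.full((4, 4, 4), 3.0)

fused = top_down_fuse([fine, middle, coarse])
```

With these inputs the coarse level stays at 3, the middle level becomes 2 + 3, and the fine level becomes 1 + 5, so the finest map carries information from all three scales.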

6. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 5, characterized in that the occlusion mask generation specifically comprises:

The input comprises face features of different scales together with global information; the final occlusion mask is obtained through a convolutional network combined with the PReLU activation function, a batch normalization layer and the Sigmoid function, and is used to clean the original face features corrupted by partial occlusion.
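One possible arrangement of the listed layers is sketched below in NumPy. The layer order, the use of 1x1 convolutions, the PReLU slope of 0.25, the batch normalization without learned scale or shift, and every function name are assumptions, since the claim lists the components without fixing their configuration.

```python
import numpy as np

def conv1x1(x, weight):
    """x: (N, C_in, H, W); weight: (C_out, C_in). Pointwise convolution."""
    return np.einsum("oc,nchw->nohw", weight, x)

def batch_norm(x, eps=1e-5):
    """Normalize each channel with batch statistics (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def prelu(x, alpha=0.25):
    """PReLU with a fixed slope; in training alpha would be learned."""
    return np.where(x > 0, x, alpha * x)

def generate_mask(features, w1, w2):
    """Return a mask with the same shape as features and values in (0, 1)."""
    hidden = prelu(batch_norm(conv1x1(features, w1)))
    logits = batch_norm(conv1x1(hidden, w2))
    return 1.0 / (1.0 + np.exp(-logits))  # Sigmoid

# Illustrative batch of fused features and random weights (hypothetical).
rng = np.random.default_rng(2)
features = rng.standard_normal((2, 4, 6, 6))
w1 = rng.standard_normal((4, 4))
w2 = rng.standard_normal((4, 4))

mask = generate_mask(features, w1, w2)
cleaned = features * mask  # corrupted positions are down-weighted
```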

7. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 2, characterized in that the occlusion category classification specifically comprises:

The face image is divided into a grid of rectangular cells, occluded regions are simulated by combining adjacent cells into rectangles, and each such combination defines an occlusion category; on this basis, an occlusion dictionary of all occlusion categories is obtained, which also includes the unoccluded case;

Select different types of mask images as occluders, randomly select the occluder center, and overlay the occluder image on the face image;

Calculate the corresponding occlusion matrix according to whether each grid cell is occluded, and look up the corresponding occlusion category in the generated occlusion dictionary as the label of the occluded face image;

The labeled occluded face images are fed into the occlusion mask generator to learn the mask associated with the occlusion category;

The learned mask is fed into the occlusion category classifier for classification, and the cross entropy is used as the loss function to supervise the learning process of the occlusion mask generator to obtain a more accurate occlusion mask.
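The occlusion dictionary and the label lookup described in this claim can be sketched in plain Python. The grid size, the image size, the rule that a cell counts as occluded when the occluder overlaps it at all, and the names `build_occlusion_dictionary` and `occlusion_matrix` are assumptions for illustration only.

```python
from itertools import product

def build_occlusion_dictionary(k):
    """Map every occlusion matrix of a k x k grid (flattened to a tuple of
    0/1 flags) to a category id. Id 0 is reserved for the unoccluded case."""
    dictionary = {tuple([0] * (k * k)): 0}
    for top, left in product(range(k), repeat=2):
        for bottom, right in product(range(top, k), range(left, k)):
            # One category per axis-aligned rectangle of adjacent cells.
            matrix = tuple(
                1 if top <= r <= bottom and left <= c <= right else 0
                for r in range(k) for c in range(k)
            )
            dictionary[matrix] = len(dictionary)
    return dictionary

def occlusion_matrix(k, image_size, box):
    """box = (x0, y0, x1, y1) in pixels. A cell is flagged when the occluder
    overlaps it at all (an assumed rule, not stated in the claim)."""
    cell = image_size / k
    x0, y0, x1, y1 = box
    return tuple(
        1 if (c * cell < x1 and (c + 1) * cell > x0
              and r * cell < y1 and (r + 1) * cell > y0) else 0
        for r in range(k) for c in range(k)
    )

# A 3 x 3 grid on a 90-pixel image with one occluder box (hypothetical).
dictionary = build_occlusion_dictionary(3)
matrix = occlusion_matrix(3, 90, (40, 40, 80, 80))
label = dictionary[matrix]
```

Because an axis-aligned occluder box always covers a rectangular block of cells, every matrix produced this way is a key of the dictionary. A k x k grid gives (k(k+1)/2)^2 rectangles plus the unoccluded case, which is 37 categories for k = 3.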

8. The occluded face recognition method based on multi-attention multi-scale feature learning according to claim 2, characterized in that the face category classification specifically includes:

Input the face features processed by the occlusion mask, and use the margin-based loss function LMCL (large margin cosine loss) to supervise the model in learning identity-related face features;

Finally, the loss function of the face recognition task and the loss function of the occlusion category recognition task are added together as the final loss function to supervise the model so that it converges faster and completes the face category classification.
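The combined objective can be illustrated with the standard form of LMCL, in which the embedding and the class weights are L2-normalized, a margin `m` is subtracted from the target cosine, and the result is scaled by `s` before softmax cross entropy. The values of `s` and `m`, the toy inputs, and the function names below are assumptions; the claim does not give hyperparameters.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross entropy for one sample, computed stably."""
    shifted = logits - logits.max()
    return -(shifted[label] - np.log(np.exp(shifted).sum()))

def lmcl_loss(embedding, class_weights, label, s=30.0, m=0.35):
    """Large margin cosine loss for one sample.

    embedding: (D,); class_weights: (num_classes, D); s and m are
    illustrative values, not taken from the source.
    """
    x = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = w @ x                           # cosine to every class center
    logits = s * cos
    logits[label] = s * (cos[label] - m)  # margin on the target class only
    return cross_entropy(logits, label)

# Toy identity branch: two identities in a 2-D embedding space.
embedding = np.array([0.8, 0.6])
class_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
face_loss = lmcl_loss(embedding, class_weights, label=0)

# Toy occlusion branch: cross entropy over three occlusion categories.
occlusion_loss = cross_entropy(np.array([2.0, 0.5, -1.0]), label=0)

# Final objective: the two task losses are added together.
total_loss = face_loss + occlusion_loss
```

Subtracting the margin lowers the target logit, so for the same input the loss with `m > 0` is larger than plain normalized softmax, which pushes identities further apart in cosine space.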

9. An occluded face recognition device based on multi-attention multi-scale feature learning, characterized by comprising:

The face image feature extraction module, which uses a residual neural network to extract face image features;

The channel attention map construction module calculates the attention weight for the feature map of each channel after obtaining the facial image features, and obtains the correlation between each channel and the key information;

The spatial attention map construction module takes the feature map refined by the channel attention map as input, calculates the correlation between different pixels and key information, and obtains the multi-level features of the face image;

The multi-scale feature fusion module uses a three-layer deconvolution structure to construct a multi-scale feature fuser for the multi-level features of the face image and, by adding feature maps of different scales element by element, obtains multi-scale feature information of the face image with different resolutions and semantic strengths;

The occlusion mask generation module learns feature masks that are highly sensitive to the occlusion positions of the input image, calculates the weights corresponding to the occluded areas, and eliminates the impact of damaged features on face recognition by assigning different weights to features;

The occlusion category classification module takes the feature mask learned by the occlusion mask generator as input and classifies it into occlusion categories to supervise the learning of the occlusion mask generator;

The face category classification module multiplies the regional weight obtained by the occlusion mask generator with the multi-scale feature information of the face image to obtain the cleaned face features, and uses them as the input of the face category classifier to obtain the classification result of the face category.