CN110263756A - A face super-resolution reconstruction system based on joint multi-task learning - Google Patents
A face super-resolution reconstruction system based on joint multi-task learning
- Publication number
- CN110263756A (application CN201910578695.4A)
- Authority
- CN
- China
- Prior art keywords
- face
- resolution
- information
- image
- human face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The present invention provides a face super-resolution reconstruction system based on joint multi-task learning, comprising: an acquisition module, a first extraction module, a reconstruction module, a second extraction module and a training module. The invention obtains a shared representation of face features across related tasks through a joint training method for multi-attribute face learning tasks; it then demonstrates the feasibility of using a perceptual loss to improve the reconstruction of face semantic information; finally, the face attribute data set is augmented: samples with missing attribute labels are screened out, the landmark attributes are re-extracted with a face key point detection algorithm, and joint multi-task learning is performed on this basis to generate super-resolution results that are more realistic in visual perception.
Description
Technical Field
The invention relates to face reconstruction technology, in particular to a face super-resolution reconstruction system based on joint multi-task learning, which is suitable for restoring and reconstructing face images captured at low resolution.
Background
Images collected in surveillance environments are degraded by atmospheric and imaging blur and by the motion of the target, so the resolution of captured face images is often too low for recognition by people or machines; improving the definition of the acquired images is therefore a problem that needs to be solved urgently. Enhancing the resolution of a face image with face super-resolution restoration techniques has become an important means of solving this problem. Face super-resolution reconstruction, the process of predicting a high-resolution face image from one or more observed low-resolution face images, is a typical ill-posed problem.
Super-resolution algorithms for the face domain are mainly divided into reconstruction-based and learning-based methods, and the learning-based methods can be further subdivided into shallow learning and deep learning approaches. Reconstruction-based methods generate new image information from low-resolution images using some particular model. In practical application scenarios, however, the resolution of the acquired face image is generally low, so a large magnification factor is required; and as the magnification factor grows, the performance of reconstruction-based super-resolution algorithms degrades markedly, making it difficult to meet practical requirements. Learning-based methods, by training on large data sets, can reconstruct the high-frequency edge and texture information of the face that is missing from the original low-resolution image.
Early face super-resolution algorithms assumed that the face was in a controlled environment with small variations, learned a prior spatial distribution of image gradients, and realized the mapping between low-resolution and high-resolution faces through feature transformations. However, because the matching of face components depends on face feature point detection, accurate detection results are difficult to obtain when the resolution of the face image is small.
In recent years, deep convolutional neural networks have been successfully applied to the face super-resolution task. Generative adversarial networks (GANs) have been used for face super-resolution reconstruction, with the adversarial loss judging the realism of the generated face, and a spatial transformer network (STN) has been proposed as a compensation stage for the deconvolution network. However, because the training process of generative adversarial networks is unstable, artifacts often appear in the output. Moreover, because the quality of face super-resolution data is uneven, the model has difficulty distinguishing genuinely relevant information from noisy data.
Disclosure of Invention
In view of the technical problem that accurate face recognition results are difficult to obtain at low resolution, a face super-resolution reconstruction system based on joint multi-task learning is provided, which optimizes face super-resolution technology by combining auxiliary tasks such as face feature point detection, gender classification and facial expression recognition.
The technical means adopted by the invention are as follows:
a face super-resolution reconstruction system based on joint multi-task learning, characterized by comprising the following modules:
the acquisition module, which acquires a small-size face image and performs preliminary magnification to obtain a large-size, low-resolution blurred face image;
the first extraction module, which extracts features of the blurred face image with a multi-scale feature map fusion model to obtain shared features;
the reconstruction module, which reconstructs the shared features to obtain a rough high-resolution face image, fuses face gender information, facial expression information, face age information, face key point information and the high-resolution face image with a multi-task learning method, acquires a shared representation of the face features across the related tasks, and finally obtains face prior knowledge;
the second extraction module, which feeds the obtained high-resolution face image and the corresponding high-definition face image into a VGG16 network to obtain a first face perceptual semantic feature map for the high-resolution face image and a second face perceptual semantic feature map for the high-definition face image, and extracts the difference between the two;
and the training module, which back-propagates to train the multi-scale feature map fusion model in the first extraction module, using the difference and the face prior knowledge as constraints.
Further, the acquisition module performs the preliminary magnification of the input image with a bicubic interpolation algorithm.
Further, the multi-scale feature map fusion model connects the high-resolution face image with the face gender information, facial expression information, face age information and face key point information through a residual structure, and restores the detail and texture features of the face with an encoder-decoder structure to obtain the shared features.
Further, the reconstruction module reconstructs the shared features with 3 × 3 convolution kernels, and uses global mean pooling and fully connected layers for the detection tasks of face gender, facial expression, face age and face key point information to obtain the final outputs.
Further, when the training module trains the multi-scale feature map fusion model, a square loss function is used for the face key point detection auxiliary task and a cross-entropy loss function is used for the other attribute detection auxiliary tasks.
The invention first obtains a shared representation of face features across related tasks through a joint training method for the multi-attribute face learning task, and on this basis improves the reconstruction of face semantic information by incorporating a perceptual loss. The beneficial effects include:
1) The invention designs a cross-layer-connected multi-scale feature map fusion network that obtains a feature representation of the face information in a high-dimensional space and fuses the feature maps of the encoder and the decoder at different visual levels through a symmetric cross-layer connection structure, effectively improving the face super-resolution reconstruction effect of the algorithm.
2) The invention accurately reconstructs facial details using attributes such as face feature points, facial expression and gender; by combining the related tasks of face super-resolution, facial feature point detection and the like through multi-task learning, it obtains a shared representation of the face features across the related tasks and thereby rich face prior knowledge.
3) The invention uses the face prior knowledge and the perceptual loss constraint to generate face edges and texture details that are more realistic and sharper in visual perception.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the system operation of the present invention.
FIG. 2 is a flow chart of the multi-scale fusion model according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a face super-resolution reconstruction system based on joint multi-task learning, comprising the following modules:
The acquisition module acquires the small-size face image and performs preliminary magnification to obtain the large-size, low-resolution blurred face image. Specifically, the acquisition module performs the preliminary magnification of the input image with a bicubic interpolation algorithm.
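The cubic kernel underlying this preliminary magnification can be sketched as follows (a minimal 1-D NumPy illustration using the common Keys kernel with a = -0.5; the patent does not specify which bicubic variant is used, so this is an assumption):

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel, the weight function behind
    bicubic interpolation (a = -0.5 is the common choice)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def upsample_1d(signal, factor):
    """Upsample a 1-D signal by `factor` using cubic convolution.
    Border samples are replicated (clamped indexing). A 2-D bicubic
    magnification applies this separably along rows and columns."""
    n = len(signal)
    out = np.empty(n * factor)
    for j in range(n * factor):
        src = j / factor                      # position in input coordinates
        base = int(np.floor(src))
        acc = 0.0
        for m in range(base - 1, base + 3):   # the 4 nearest input samples
            w = cubic_kernel(src - m)
            acc += w * signal[min(max(m, 0), n - 1)]
        out[j] = acc
    return out

row = np.array([0.0, 1.0, 2.0, 3.0])
up = upsample_1d(row, 8)                      # 4 samples -> 32 samples
```

Applying the same operation to both axes of a 16 × 16 face image with factor 8 yields the 128 × 128 blurred image the later modules consume.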
The first extraction module extracts features of the blurred face image with a multi-scale feature map fusion model to obtain shared features. The super-resolution main task and the related auxiliary tasks learn from these shared features and compute their losses, and the sum of all task losses is back-propagated to train the network. Specifically, a residual structure connects the higher-resolution shallow feature maps with the lower-resolution but semantically stronger deep features. The network body uses an encoder-decoder structure: the encoder extracts deeper-level visual features by gradually reducing the dimensionality of the feature space and feeds them to the decoder, which uses these deep visual features to gradually restore the spatial dimensions and repair the detail and texture features of the face.
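The encoder-decoder flow with a symmetric cross-layer connection can be illustrated schematically (a NumPy toy in which average pooling, nearest-neighbour upsampling and channel concatenation stand in for the learned convolutions; all shapes are illustrative assumptions, not the patent's actual layer configuration):

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on an (H, W, C) tensor: encoder downsampling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbour upsampling: decoder spatial restoration."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder_decoder(x):
    """Toy encoder-decoder with one symmetric cross-layer connection:
    the shallow encoder map is concatenated onto the decoder map of the
    same spatial size, fusing shallow detail with deep semantics."""
    shallow = x                      # high-resolution, shallow features
    deep = avg_pool2(shallow)        # encoder: reduce spatial dimensions
    restored = upsample2(deep)       # decoder: restore spatial dimensions
    fused = np.concatenate([restored, shallow], axis=-1)  # skip fusion
    return fused

feat = np.random.rand(8, 8, 4)       # (H, W, C) feature map
out = encoder_decoder(feat)          # (8, 8, 8): deep + shallow channels
```

In the real model each arrow is a learned convolution block, but the shape bookkeeping (downsample, upsample, concatenate at matching resolutions) is the same.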
The reconstruction module reconstructs the shared features to obtain a rough high-resolution face image, then fuses face gender information, facial expression information, face age information, face key point information and the high-resolution face image with a multi-task learning method, acquires a shared representation of the face features across related tasks, and finally obtains face prior knowledge. The shared representation is the vector representation, in a high-dimensional space, of the face prior information obtained by multi-task learning. Although features of this kind cannot be visualized, experimental results show that the face prior knowledge better supports the face super-resolution task. Specifically, the reconstruction module reconstructs the shared features with 3 × 3 convolution kernels, and produces the outputs of the face gender, facial expression, face age and face key point detection tasks with global mean pooling and a fully connected layer.
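The attribute heads described above can be sketched as follows (NumPy; the channel count, the weight shapes and the two-class gender head are illustrative assumptions, not the patent's actual dimensions):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attribute_head(feature_map, W, b):
    """Global mean pooling over the spatial dimensions followed by a
    fully connected layer, as used for the attribute detection tasks."""
    pooled = feature_map.mean(axis=(0, 1))    # (H, W, C) -> (C,)
    return softmax(W @ pooled + b)            # class probabilities

rng = np.random.default_rng(0)
shared = rng.random((16, 16, 32))             # shared feature map
W_gender = rng.standard_normal((2, 32))       # hypothetical 2-way gender head
b_gender = np.zeros(2)
p_gender = attribute_head(shared, W_gender, b_gender)
```

The same pooling-plus-linear pattern serves the expression, age and landmark heads, with output sizes matching each task.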
The second extraction module feeds the obtained high-resolution face image and the corresponding high-definition face image into a VGG16 network to obtain a first face perceptual semantic feature map for the high-resolution face image and a second face perceptual semantic feature map for the high-definition face image, and extracts the difference between the two. VGG-16 is a model proposed by the University of Oxford in 2014; the semantic feature map here is the output vector of the fourth convolution layer in the second convolution stage.
The training module back-propagates to train the multi-scale feature map fusion model in the first extraction module, using the difference and the face prior knowledge as constraints. When training the multi-scale feature map fusion model, a square loss function is used for the face key point detection auxiliary task and a cross-entropy loss function is used for the other attribute detection auxiliary tasks.
Specifically, the multi-scale feature map fusion model uses, as the loss function of the face super-resolution task, the joint loss of a pixel-wise difference term and a perceptual term, namely:

L_SR = L_MSE + λ · L_perce

where L_MSE denotes the loss function of the pixel-by-pixel comparison and L_perce the loss function of the semantic feature comparison (the perceptual loss).
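The joint loss can be computed as follows (a NumPy sketch; the perceptual term compares precomputed feature maps standing in for the VGG16 activations, and λ = 0.1 is an assumed value, since the patent does not fix λ):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return np.mean((a - b) ** 2)

def sr_loss(sr_img, hr_img, sr_feat, hr_feat, lam=0.1):
    """L_SR = L_MSE + lambda * L_perce: pixel-wise loss on the images
    plus a perceptual loss on their semantic feature maps."""
    l_mse = mse(sr_img, hr_img)          # pixel-by-pixel comparison
    l_perce = mse(sr_feat, hr_feat)      # semantic feature comparison
    return l_mse + lam * l_perce

rng = np.random.default_rng(1)
sr, hr = rng.random((128, 128)), rng.random((128, 128))
f_sr, f_hr = rng.random((32, 32, 8)), rng.random((32, 32, 8))
total = sr_loss(sr, hr, f_sr, f_hr)
```

With λ = 0 the loss degenerates to the plain pixel-wise MSE, which is the baseline the perceptual term improves on.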
In the invention, a square loss function is used for the feature point detection auxiliary task and cross-entropy loss functions for the other related auxiliary tasks. The auxiliary-task loss function used during training has the following form:

L_aux = (λ_1 / N) Σ_{i=1..N} ||y_1^(i) − ŷ_1^(i)||² + Σ_{k=2,3} (λ_k / N) Σ_{i=1..N} CE(y_k^(i), ŷ_k^(i))

where {(x^(i), y^(i))}, i = 1, …, N, is the given face image training set, N is the number of pictures in the training set, x^(i) is a low-resolution image and y^(i) the corresponding high-resolution image; y_1^(i) is the ground-truth face attribute of the key point detection auxiliary task (the image attribute carried by the training sample, which can be extracted directly) and ŷ_1^(i) is the predicted value of that attribute; y_k^(i) and ŷ_k^(i) are the true and predicted face attributes of the remaining auxiliary tasks, where k = 2 and k = 3 denote the classification tasks of expression classification and gender recognition respectively, so each predicted value is a probability, that is, a number between 0 and 1. λ_1 is the weight of the feature point detection auxiliary task, and λ_k, k = 2, 3, are the weights of the remaining auxiliary tasks.

The first term above is the square loss function used for the face key point detection task; the second term comprises the cross-entropy loss functions used for the other related auxiliary tasks.
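The two auxiliary loss functions can be sketched as follows (NumPy; the landmark coordinates and class probabilities are toy values for illustration):

```python
import numpy as np

def square_loss(pred_pts, true_pts):
    """Square loss for the face key point detection auxiliary task."""
    return np.sum((pred_pts - true_pts) ** 2)

def cross_entropy(pred_prob, true_onehot, eps=1e-12):
    """Cross-entropy loss for the classification auxiliary tasks
    (expression classification, gender recognition)."""
    return -np.sum(true_onehot * np.log(pred_prob + eps))

# 5 toy landmarks (x, y): prediction off by one pixel in each coordinate
true_pts = np.array([[30, 40], [90, 40], [60, 70], [40, 95], [80, 95]], float)
pred_pts = true_pts + 1.0
l_landmark = square_loss(pred_pts, true_pts)      # 10 coords, each err 1

# gender recognition: predicted probabilities vs. a one-hot ground truth
l_gender = cross_entropy(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```

A perfect landmark prediction drives the square loss to zero, while the cross-entropy term only vanishes when the correct class receives probability 1.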
The training module trains the model with a gradient descent algorithm. Because the loss functions and learning difficulties of the different face attribute tasks differ, at the start of training the learning of the super-resolution task (the main task) is constrained by auxiliary tasks such as face key point detection, gender recognition and expression classification, which keeps the main network out of poor local optima. As training progresses, once the loss value of an auxiliary task falls below a threshold, that task no longer benefits the main task and its learning process is stopped.
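This schedule can be sketched as a loop in which an auxiliary task's weight is zeroed once its loss drops below its threshold (the losses, thresholds and geometric decay are simulated toy values, not the patent's actual training dynamics):

```python
# Toy sketch of the auxiliary-task schedule: each auxiliary task
# constrains the main task until its own loss falls below a threshold,
# after which it is dropped from the joint objective.
aux_tasks = {
    "keypoints":  {"loss": 1.0, "weight": 1.0, "threshold": 0.2},
    "gender":     {"loss": 0.8, "weight": 1.0, "threshold": 0.3},
    "expression": {"loss": 0.9, "weight": 1.0, "threshold": 0.25},
}

history = []                                      # active aux tasks per epoch
for epoch in range(20):
    for task in aux_tasks.values():
        task["loss"] *= 0.8                       # simulated loss decay
        if task["weight"] > 0 and task["loss"] < task["threshold"]:
            task["weight"] = 0.0                  # stop this auxiliary task
    active = sum(1 for t in aux_tasks.values() if t["weight"] > 0)
    history.append(active)
```

The number of active auxiliary tasks can only decrease over training: all three constrain the main task early on, and none remain once every loss has crossed its threshold.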
As shown in FIG. 1, the present invention relates to a face super-resolution reconstruction system based on joint multi-task learning, which can execute the following steps:
step 1, preprocessing an input face image, and specifically, primarily amplifying the face image by using a bicubic sampling algorithm.
In this embodiment, 4000 low-resolution face images of size 16 × 16 are used as the training set and 1000 as the test set. In step 1, each face image is initially magnified 8 times to obtain a low-resolution face image of size 128 × 128.
And 2, extracting feature representation of the face information in a high-dimensional space by using a multi-scale feature map fusion network for each low-resolution image obtained in the step 1.
In this embodiment, a multi-scale feature map fusion method is adopted. As shown in fig. 2, a residual structure connects the higher-resolution shallow feature maps with the lower-resolution but semantically stronger deep features. At the same time, this maximizes the information flow between the layers of the network, so that each connected layer can take the feature maps of the preceding layers as input. The network body uses an encoder-decoder structure: the encoder extracts deeper-level visual features by gradually reducing the dimensionality of the feature space and feeds them to the decoder, which uses these deep visual features to gradually restore the spatial dimensions and repair the detail and texture features of the face. The feature maps of the encoder and the decoder at different visual levels are fused through a symmetric cross-layer connection structure, effectively improving the face super-resolution reconstruction effect of the algorithm.
And 3, respectively sending the high-dimensional characteristics of the face information obtained in the step 2 into branches of different face attribute tasks, so as to obtain a face initial super-resolution result, face key point positions, face age information and face gender information.
In this embodiment, the super-resolution main task uses a convolution kernel of 3 × 3 and shared features to perform face reconstruction, and the auxiliary tasks such as feature point detection use global mean pooling and a full connection layer to obtain final output.
And 4, respectively inputting the initial face super-resolution result obtained in the step 3 and the high-definition face image corresponding to the image in the training set in the step 1 into the VGG-16 model trained by the ImageNet data set, extracting an output vector of the fourth convolution layer of the second convolution section as a high-level semantic feature of each image, and calculating a difference value.
And step 5, train the face super-resolution reconstruction network by back-propagation, using the face semantic feature perceptual loss computed in step 4 and the face multi-attribute information obtained in step 3 as constraints.
In this embodiment, the training set includes 4000 low-resolution face images proposed in step 1 and 4000 high-resolution face images corresponding to the 4000 low-resolution face images.
And 6, obtaining a final face super-resolution result from the low-resolution face image to be reconstructed according to the steps 1-3.
In this embodiment, the 1000 test set images proposed in step 1 are subjected to step 1-3 to obtain a final face super-resolution result. The reconstructed face image can reach the accuracy of 30.65dB on the evaluation index of the peak signal-to-noise ratio.
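The peak signal-to-noise ratio used for this evaluation can be computed as follows (a NumPy sketch for 8-bit images; the toy inputs below are illustrative, not the paper's test data):

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images:
    PSNR = 10 * log10(peak^2 / MSE)."""
    err = np.mean((reference.astype(float) - reconstructed.astype(float)) ** 2)
    if err == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(peak ** 2 / err)

ref = np.full((128, 128), 100.0)       # toy "ground truth" image
rec = ref + 16.0                       # uniform error of 16 grey levels
value = psnr(ref, rec)
```

Higher values indicate a reconstruction closer to the ground-truth high-resolution image; the 30.65 dB figure above was obtained on the 1000-image test set.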
In summary, the present invention provides a new face reconstruction algorithm based on joint multi-task learning for the face super-resolution problem. The performance of the face super-resolution algorithm is optimized with auxiliary tasks such as face feature point detection, gender classification and facial expression recognition. The pixel-wise difference loss is combined with a perceptual loss function, which improves the reconstruction of face perceptual semantic information while recovering the face edge and texture features, giving a more realistic visual effect. Experimental analysis shows that the proposed algorithm makes better use of face prior knowledge and generates face edges and texture details that are more realistic and sharper in visual perception.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (5)
1. A face super-resolution reconstruction system based on joint multi-task learning, characterized by comprising:
the acquisition module, which acquires a small-size face image and performs preliminary magnification to obtain a large-size, low-resolution blurred face image;
the first extraction module, which extracts features of the blurred face image with a multi-scale feature map fusion model to obtain shared features;
the reconstruction module, which reconstructs the shared features to obtain a rough high-resolution face image, fuses face gender information, facial expression information, face age information, face key point information and the high-resolution face image with a multi-task learning method, acquires a shared representation of the face features across the related tasks, and finally obtains face prior knowledge;
the second extraction module, which feeds the obtained high-resolution face image and the corresponding high-definition face image into a VGG16 network to obtain a first face perceptual semantic feature map for the high-resolution face image and a second face perceptual semantic feature map for the high-definition face image, and extracts the difference between the two;
and the training module, which back-propagates to train the multi-scale feature map fusion model in the first extraction module, using the difference and the face prior knowledge as constraints.
2. The system of claim 1, wherein the acquisition module performs a preliminary magnification on the input image by using a bicubic interpolation algorithm.
3. The face super-resolution reconstruction system according to claim 1 or 2, wherein the multi-scale feature map fusion model connects the high-resolution face image with the face gender information, facial expression information, face age information and face key point information through a residual structure, and uses an encoder-decoder structure to repair the detail and texture features of the face to obtain the shared features.
4. The face super-resolution reconstruction system according to claim 3, wherein the reconstruction module reconstructs the shared features using 3 × 3 convolution kernels, and uses global mean pooling and fully connected layers to detect the face gender information, facial expression information, face age information and face key point information to obtain the final output.
5. The system of claim 1, wherein when the training module trains the multi-scale feature map fusion model, a square loss function is used for a face key point information detection auxiliary task, and a cross entropy loss function is used for other information detection auxiliary tasks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910578695.4A CN110263756A (en) | 2019-06-28 | 2019-06-28 | A face super-resolution reconstruction system based on joint multi-task learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910578695.4A CN110263756A (en) | 2019-06-28 | 2019-06-28 | A face super-resolution reconstruction system based on joint multi-task learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263756A true CN110263756A (en) | 2019-09-20 |
Family
ID=67923079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910578695.4A Pending CN110263756A (en) | 2019-06-28 | 2019-06-28 | A face super-resolution reconstruction system based on joint multi-task learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263756A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070242883A1 (en) * | 2006-04-12 | 2007-10-18 | Hannes Martin Kruppa | System And Method For Recovering Image Detail From Multiple Image Frames In Real-Time |
US20100124383A1 (en) * | 2008-11-19 | 2010-05-20 | Nec Laboratories America, Inc. | Systems and methods for resolution-invariant image representation |
WO2016050729A1 (en) * | 2014-09-30 | 2016-04-07 | Thomson Licensing | Face inpainting using piece-wise affine warping and sparse coding |
CN105760859A (en) * | 2016-03-22 | 2016-07-13 | 中国科学院自动化研究所 | Method and device for identifying reticulate pattern face image based on multi-task convolutional neural network |
WO2017015390A1 (en) * | 2015-07-20 | 2017-01-26 | University Of Maryland, College Park | Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition |
CN106529402A (en) * | 2016-09-27 | 2017-03-22 | 中国科学院自动化研究所 | Multi-task learning convolutional neural network-based face attribute analysis method |
CN106815566A (en) * | 2016-12-29 | 2017-06-09 | 天津中科智能识别产业技术研究院有限公司 | A kind of face retrieval method based on multitask convolutional neural networks |
CN107958444A (en) * | 2017-12-28 | 2018-04-24 | 江西高创保安服务技术有限公司 | A kind of face super-resolution reconstruction method based on deep learning |
CN107958246A (en) * | 2018-01-17 | 2018-04-24 | 深圳市唯特视科技有限公司 | A kind of image alignment method based on new end-to-end human face super-resolution network |
CN109063565A (en) * | 2018-06-29 | 2018-12-21 | 中国科学院信息工程研究所 | A kind of low resolution face identification method and device |
CN109101915A (en) * | 2018-08-01 | 2018-12-28 | 中国计量大学 | Face and pedestrian and Attribute Recognition network structure design method based on deep learning |
CN109146813A (en) * | 2018-08-16 | 2019-01-04 | 广州视源电子科技股份有限公司 | Multitask image reconstruction method, device, equipment and medium |
CN109255831A (en) * | 2018-09-21 | 2019-01-22 | 南京大学 | The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate |
2019-06-28: Application CN201910578695.4A filed; published as CN110263756A (status: Pending)
Non-Patent Citations (2)
Title |
---|
Y. CHEN ET AL.: "FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
JIA PING: "Research on Image Super-Resolution Reconstruction Methods Based on Multi-Task Learning", China Master's Theses Full-Text Database * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110689618A (en) * | 2019-09-29 | 2020-01-14 | 天津大学 | Three-dimensional deformable object filling method based on multi-scale variational graph convolution |
CN110738160A (en) * | 2019-10-12 | 2020-01-31 | 成都考拉悠然科技有限公司 | human face quality evaluation method combining with human face detection |
CN112784660A (en) * | 2019-11-01 | 2021-05-11 | 财团法人工业技术研究院 | Face image reconstruction method and system |
CN112784660B (en) * | 2019-11-01 | 2023-10-24 | 财团法人工业技术研究院 | Face image reconstruction method and system |
WO2021179822A1 (en) * | 2020-03-12 | 2021-09-16 | Oppo广东移动通信有限公司 | Human body feature point detection method and apparatus, electronic device, and storage medium |
US11900563B2 (en) | 2020-04-01 | 2024-02-13 | Boe Technology Group Co., Ltd. | Computer-implemented method, apparatus, and computer-program product |
CN111507248A (en) * | 2020-04-16 | 2020-08-07 | 成都东方天呈智能科技有限公司 | Face forehead area detection and positioning method and system of low-resolution thermodynamic diagram |
CN111612133B (en) * | 2020-05-20 | 2021-10-19 | 广州华见智能科技有限公司 | Internal organ feature coding method based on face image multi-stage relation learning |
CN111612133A (en) * | 2020-05-20 | 2020-09-01 | 广州华见智能科技有限公司 | Internal organ feature coding method based on face image multi-stage relation learning |
CN111753670A (en) * | 2020-05-29 | 2020-10-09 | 清华大学 | Human face overdividing method based on iterative cooperation of attention restoration and key point detection |
US11710215B2 (en) * | 2020-06-17 | 2023-07-25 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Face super-resolution realization method and apparatus, electronic device and storage medium |
US20210209732A1 (en) * | 2020-06-17 | 2021-07-08 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Face super-resolution realization method and apparatus, electronic device and storage medium |
CN112348743A (en) * | 2020-11-06 | 2021-02-09 | 天津大学 | Image super-resolution method fusing discriminant network and generation network |
CN112348743B (en) * | 2020-11-06 | 2023-01-31 | 天津大学 | Image super-resolution method fusing discriminant network and generation network |
CN112419158A (en) * | 2020-12-07 | 2021-02-26 | 上海互联网软件集团有限公司 | Image video super-resolution and super-definition reconstruction system and method |
CN112818833A (en) * | 2021-01-29 | 2021-05-18 | 中能国际建筑投资集团有限公司 | Face multitask detection method, system, device and medium based on deep learning |
CN112818833B (en) * | 2021-01-29 | 2024-04-12 | 中能国际建筑投资集团有限公司 | Face multitasking detection method, system, device and medium based on deep learning |
CN112507997A (en) * | 2021-02-08 | 2021-03-16 | 之江实验室 | Face super-resolution system based on multi-scale convolution and receptive field feature fusion |
CN113807265A (en) * | 2021-09-18 | 2021-12-17 | 山东财经大学 | Diversified human face image synthesis method and system |
CN114140843A (en) * | 2021-11-09 | 2022-03-04 | 东南大学 | Cross-database expression identification method based on sample self-repairing |
CN114140843B (en) * | 2021-11-09 | 2024-04-16 | 东南大学 | Cross-database expression recognition method based on sample self-repairing |
CN114170484A (en) * | 2022-02-11 | 2022-03-11 | 中科视语(北京)科技有限公司 | Picture attribute prediction method and device, electronic equipment and storage medium |
CN117225921A (en) * | 2023-09-26 | 2023-12-15 | 山东天衢铝业有限公司 | Automatic control system and method for extrusion of aluminum alloy profile |
CN117225921B (en) * | 2023-09-26 | 2024-03-12 | 山东天衢铝业有限公司 | Automatic control system and method for extrusion of aluminum alloy profile |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263756A (en) | A kind of human face super-resolution reconstructing system based on joint multi-task learning | |
EP3961484B1 (en) | Medical image segmentation method and device, electronic device and storage medium | |
Zhou et al. | Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network | |
CN111047516B (en) | Image processing method, image processing device, computer equipment and storage medium | |
Zhuge et al. | Deep embedding features for salient object detection | |
Wang et al. | A survey of deep face restoration: Denoise, super-resolution, deblur, artifact removal | |
CN116797787B (en) | Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network | |
CN113298736B (en) | Face image restoration method based on face pattern | |
CN114170184A (en) | Product image anomaly detection method and device based on embedded feature vector | |
Chen et al. | Self-supervised remote sensing images change detection at pixel-level | |
CN117974693B (en) | Image segmentation method, device, computer equipment and storage medium | |
CN112131969A (en) | Remote sensing image change detection method based on full convolution neural network | |
CN116258632A (en) | Text image super-resolution reconstruction method based on text assistance | |
CN111666813A (en) | Subcutaneous sweat gland extraction method based on three-dimensional convolutional neural network of non-local information | |
Conrad et al. | Two-stage seamless text erasing on real-world scene images | |
CN109165551B (en) | Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics | |
Susan et al. | Deep learning inpainting model on digital and medical images-a review. | |
Gupta et al. | A robust and efficient image de-fencing approach using conditional generative adversarial networks | |
Ma et al. | MHGAN: A multi-headed generative adversarial network for underwater sonar image super-resolution | |
CN113421212B (en) | Medical image enhancement method, device, equipment and medium | |
CN108154107B (en) | Method for determining scene category to which remote sensing image belongs | |
Ghanem et al. | Face completion using generative adversarial network with pretrained face landmark generator | |
CN115116117A (en) | Learning input data acquisition method based on multi-mode fusion network | |
Samadzadegan | Data integration related to sensors, data and models | |
Wyzykowski et al. | A Universal Latent Fingerprint Enhancer Using Transformers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
2023-09-29 | AD01 | Patent right deemed abandoned | Effective date of abandoning: 2023-09-29 |