CN117315735A - Face super-resolution reconstruction method based on priori information and attention mechanism - Google Patents

Face super-resolution reconstruction method based on priori information and attention mechanism Download PDF

Info

Publication number
CN117315735A
CN117315735A (application CN202211528427.XA)
Authority
CN
China
Prior art keywords
resolution
image
network
face
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211528427.XA
Other languages
Chinese (zh)
Inventor
端木春江 (Duanmu Chunjiang)
吴成红 (Wu Chenghong)
叶靖 (Ye Jing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU filed Critical Zhejiang Normal University CJNU
Priority to CN202211528427.XA
Publication of CN117315735A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a face super-resolution reconstruction method based on prior information and an attention mechanism. The model comprises a shallow feature extraction network, a deep feature extraction network, a prior estimation network, and a fine reconstruction network. The method proceeds as follows: a low-resolution image is input; shallow features are extracted by convolution, a group of residual blocks, and a further convolution; the shallow features are fed both into the deep feature extraction network and into the prior estimation network; and the outputs of the two branches are fed into the fine reconstruction network, which outputs the final super-resolution reconstructed image. The invention introduces face edge information and a face component parsing map into the face super-resolution network as prior information and adds an efficient channel attention mechanism to the network, so the network can reconstruct a comparatively clear face image with more facial features at lower model complexity, improving both subjective and objective evaluation indices.

Description

Face super-resolution reconstruction method based on priori information and attention mechanism
Technical Field
The invention belongs to the technical fields of image processing and face super-resolution reconstruction, and in particular relates to a face image super-resolution magnification method based on facial prior information and an attention mechanism.
Background
Face super-resolution reconstruction is a super-resolution technique targeted at the special structure of the human face; its aim is to convert a low-resolution face into a high-resolution face. The face, however, differs from ordinary images: it combines strong structural similarity with identity-specific differences in detail, so reconstruction is both harder and more demanding, requiring geometric consistency to be maintained while texture information is recovered accurately. Face super-resolution therefore remains a significant challenge. The concept of face super-resolution was first proposed by Baker and Kanade in 2000; it is a branch of image super-resolution specialized for face scenes. In recent years, deep learning has been widely applied in image processing, and the face super-resolution field, combined with deep learning, has entered a new stage of development.
Face super-resolution methods can be divided into: interpolation-based reconstruction, reconstruction-based methods, convolutional-neural-network-based methods, and generative-adversarial-network-based methods. Dong et al. proposed the SRCNN model, the first to apply deep learning to image super-resolution. SRCNN first enlarges the low-resolution image to the target size by bicubic interpolation, then extracts image features through a three-layer convolutional neural network to establish a nonlinear mapping, and finally generates the high-resolution image, greatly improving reconstruction quality. Huang D. and Liu H. proposed the SRCNN-IBP algorithm, which combines the SRCNN network with the iterative back-projection (IBP) algorithm; SRCNN-IBP can be regarded as introducing prior information about the high-resolution image on top of SRCNN, so the quality of its reconstructed images exceeds that of SRCNN, which also shows that prior information matters for face super-resolution reconstruction. Ledig et al. applied a generative adversarial network (GAN) to the super-resolution problem, proposing SRGAN, which uses a trained discriminator network to distinguish SR images from original real images. Yu Chen et al. proposed FSRNet, a face super-resolution reconstruction method that adds prior information by extracting facial geometry; their results show that face key points and a face parsing map improve face recovery, but the generated face images lack texture detail, the model is complex, and training takes a long time.
Therefore, how to strengthen the restorative effect of prior information on the face, make full use of high-frequency features, and reduce redundant information, by providing a face super-resolution reconstruction method based on prior information and an attention mechanism, is the problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face super-resolution reconstruction method based on prior information and an attention mechanism.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the human face super-resolution method based on the prior information and the attention mechanism is characterized by comprising the following steps of: a shallow feature extraction network, a deep feature extraction network, a priori estimation network and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
The deep feature extraction network extracts the deep features of the face. The coarse high-resolution face image y_c is input to the deep feature extraction network H_D for deep feature extraction. H_D comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer, a ReLU activation function, and 12 residual blocks, finally yielding the extracted 64-channel feature map F. The formula is as follows:
F = H_D(y_c)
where y_c denotes the coarse high-resolution face image and H_D the deep feature extraction network adopted;
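The stride-2 convolution in the deep feature extraction network halves the spatial resolution before the residual blocks. As a sanity check, the standard output-size formula can trace an input through the layer list described above (a minimal sketch: the 128×128 input size and the padding of 1 are assumptions; only the 64 output channels and the stride of 2 come from the text):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Standard convolution output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def deep_feature_shape(h, w):
    """Trace the spatial size through the deep feature extraction network:
    one 3x3 conv with stride 2 (assumed padding 1), then BN + ReLU and
    12 residual blocks, which preserve spatial size, ending in a
    64-channel feature map."""
    h = conv_out(h, kernel=3, stride=2, padding=1)
    w = conv_out(w, kernel=3, stride=2, padding=1)
    return (64, h, w)  # 64 output channels per the patent text

print(deep_feature_shape(128, 128))  # the stride-2 conv halves 128x128 to 64x64
```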
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 stacking HourGlass networks, namely HourGlass modules, are constructed, prior information extraction is carried out, and in order to effectively merge features across scales and reserve space information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; 1The x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>. The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
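The cross-scale fusion performed by a stacked HourGlass module can be sketched in plain NumPy: each level pools, recurses on the coarser scale, upsamples, and adds the symmetric skip branch back in. This is a structural sketch only; the identity mapping stands in for the learned residual blocks of a real module, and the recursion depth of 4 is an assumption:

```python
import numpy as np

def pool2(x):
    """2x2 average pooling: the downsampling half of the hourglass."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    """Nearest-neighbour upsampling by 2: the upsampling half."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth=4):
    """One hourglass pass: pool, recurse on the coarser scale, upsample,
    and add the symmetric skip branch so spatial detail at every scale is
    preserved (identity stands in for learned residual blocks)."""
    if depth == 0:
        return x
    skip = x                            # skip connection between symmetric layers
    coarse = hourglass(pool2(x), depth - 1)
    return skip + up2(coarse)           # cross-scale feature fusion

x = np.random.rand(64, 64)
y = hourglass(x)
print(y.shape)  # resolution is preserved end to end
```

Because every downsampling level keeps its skip branch, the output retains the input resolution while still mixing in information from the coarser scales.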
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
Preferably, the coarse reconstruction network H_C comprises the following steps:
nonlinear mapping is performed through 3 residual blocks to generate a feature map; reconstruction from the feature map is carried out based on the attention mechanism, through a 3×3 convolution layer; finally, an ECA attention module is added after the convolution layer to generate the comparatively coarse high-resolution face image y_c. The formula is as follows:
y_c = H_C(I_bic)
where I_bic denotes the bicubic-upsampled low-resolution face image and H_C the shallow feature extraction network adopted;
preferably, the network is finely rebuiltComprising the following steps:
first, the feature map is formedAnd resolution map->Fusing the analysis graph and the feature graph to obtain a fused feature graphThe method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the characteristic image into a fine reconstruction network, firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process, up-sampling the characteristic image by using a 4X 4 deconvolution layer, decoding the characteristic image by connecting 3 residual blocks, and then obtaining the characteristic image by using a 3X 3 convolution layer process; finally, willThe feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
Preferably, the loss function comprises:
(1) Pixel loss
In image super-resolution reconstruction, a mean square error (MSE) loss usually yields higher evaluation indices such as PSNR and SSIM, but it tends to lose high-frequency texture information, resulting in over-smoothed images. To avoid this problem, the L1 loss is used as the pixel loss function:
L_pixel = (1/N) * Σ_{i=1}^{N} ( ||y_HR^(i) - y_c^(i)||_1 + ||y_HR^(i) - y_f^(i)||_1 )
(2) Face prior loss
To constrain the estimation of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss, which adopts a mean square error form:
L_prior = (1/N) * Σ_{i=1}^{N} ||P^(i) - P̂^(i)||_2^2
(3) Total loss
The above losses are combined with weights to obtain the total loss finally used for model training:
L_total = α * L_pixel + β * L_prior
where N denotes the total number of training-set images, y_HR^(i) is the i-th high-resolution image, y_c^(i) is the corresponding i-th coarse high-resolution restored image, y_f^(i) is the corresponding i-th fine high-resolution restored image, P^(i) is the ground-truth face parsing map of the i-th image, P̂^(i) is the parsing map produced for the i-th image by the prior estimation network, and α and β are the term weights.
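The loss terms above can be sketched directly in NumPy (the weights alpha and beta are assumed placeholders; the patent weights the terms but does not state values):

```python
import numpy as np

def l1_pixel_loss(hr, coarse, fine):
    """L1 pixel loss over both the coarse and the fine reconstruction,
    averaged over the batch of training images."""
    return np.mean(np.abs(hr - coarse)) + np.mean(np.abs(hr - fine))

def prior_loss(parsing_gt, parsing_pred):
    """MSE loss between ground-truth and predicted face parsing maps."""
    return np.mean((parsing_gt - parsing_pred) ** 2)

def total_loss(hr, coarse, fine, p_gt, p_pred, alpha=1.0, beta=1.0):
    """Weighted sum used for end-to-end training; alpha and beta are
    assumed placeholder weights (the patent does not state values)."""
    return alpha * l1_pixel_loss(hr, coarse, fine) + beta * prior_loss(p_gt, p_pred)

hr = np.ones((4, 64, 64))
coarse = np.zeros_like(hr)   # worst-case coarse output
fine = hr.copy()             # perfect fine output
p = np.zeros((4, 64, 64))    # parsing maps agree exactly
print(total_loss(hr, coarse, fine, p, p))  # -> 1.0 (only the coarse L1 term remains)
```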
The human face super-resolution method based on the prior information and the attention mechanism comprises the following steps:
s1, downloading an original image data set, wherein the original image data set comprises an original face image and an original face analysis chartData processing is carried out, an original image after the data processing is input into a downsampling model, a low-resolution image is obtained through processing, then double-three upsampling is carried out on the low-resolution image, an image with the same size as a high-resolution image is obtained as a low-resolution data set, and finally the data set is divided into a training set and a testing set;
s2, inputting the image obtained in the S1 into a shallow feature extraction module to extract shallow features of the face image, and extracting features of the low-resolution face image by using a convolution layer, wherein the convolution layer only can extract outline features of the face image to obtain a rough high-resolution image
S3, the rough high-resolution image obtained in the S2 is processedInputting into deep feature extraction network for feature extraction to obtain feature map +.>
S4, carrying out rough high-resolution image obtained in S2Inputting into a priori estimation network, extracting priori information to obtain an analytic graph +.>Wherein the prior estimation network consists of ResNet and stacked hourglass networks;
s5, the feature map obtained in the S3 is processedAnd S4>Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>
S6, the feature map obtained in the S5 is processedInputting the images into a fine reconstruction network for super-resolution reconstruction to obtain a final fine reconstruction face image +.>
S7, training set images obtained in the step S2Original high resolution image->Final result->Input into a pixel-by-pixel loss function, and generate a fine high resolution image by processing the pixel-by-pixel loss function>Calculating to obtain loss function->The method comprises the steps of carrying out a first treatment on the surface of the Resolution map obtained in S4->And an analytic map in the original image dataset +.>Input into the pixel-by-pixel loss function, calculate the loss function +.>The method comprises the steps of carrying out a first treatment on the surface of the Adding the above loss functions to obtain the total loss function +.>Continuously iterating to minimize the loss function, training, and finally generating a face super-resolution network model;
s8, setting super parameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the face super-resolution network model, and finally generating a high-resolution face image with clear detail texture and better effect through residual network processing and loss function minimization iteration.
Compared with the prior art, the invention has the beneficial effects that:
(1) To improve the network's ability to recover edge information, the invention adds face prior information, taking the face image and its component parsing maps as prior constraints on the network; the parsing maps corresponding to different face components are fused with the corresponding feature maps, which strengthens the parsing maps' guidance of face image super-resolution, exploits the extracted useful features more effectively, improves reconstruction efficiency, strengthens the reconstruction effect, and reconstructs finer facial geometric information;
(2) The invention adds the efficient channel attention module ECA after the fine reconstruction network, which improves the network's use of feature information, lets the network learn purposefully, adaptively adjusts the feature channel information, and enhances the expressive power of the features, recovering more details such as contours and textures, improving the perceptual quality of the face image, and raising both subjective and objective evaluation scores.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the overall structure of a super-resolution network used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 2 is a schematic diagram of the shallow feature extraction network structure used in the face super-resolution reconstruction method based on prior information and an attention mechanism; where Conv denotes a convolution layer and Res denotes a residual block (a convolution layer with a skip connection).
FIG. 3 is a schematic diagram of a deep feature extraction network structure used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 4 is a schematic diagram of a prior estimation network used in a face super-resolution reconstruction method based on prior information and an attention mechanism; wherein HourGlass represents a stacked HourGlass network module;
FIG. 5 is a schematic diagram of a fine reconstruction network used in a face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention;
FIG. 6 is a schematic diagram of a stacked hourglass network used in a face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention;
FIG. 7 is a schematic diagram of an efficient channel attention network used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 8 is a schematic view of sample images from the CelebAMask-HQ dataset used in the face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention, wherein only the part of each face below the eyes is shown;
FIG. 9 is a comparison of face super-resolution images generated by the present invention and by other networks; wherein "Ours" denotes the method proposed by the invention;
fig. 10 is an enlarged detail contrast diagram of the face super-resolution image generated by the present invention and other networks.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a face super-resolution reconstruction method based on priori information and an attention mechanism, which is shown in fig. 1 and comprises the following steps: a shallow feature extraction network, a deep feature extraction network, a priori estimation network and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
The deep feature extraction network extracts the deep features of the face. The coarse high-resolution face image y_c is input to the deep feature extraction network H_D for deep feature extraction. H_D comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer, a ReLU activation function, and 12 residual blocks, finally yielding the extracted 64-channel feature map F. The formula is as follows:
F = H_D(y_c)
where y_c denotes the coarse high-resolution face image and H_D the deep feature extraction network adopted;
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 stacking HourGlass modules, namely HourGlass modules, are constructed, prior information extraction is carried out, and in order to effectively merge features across scales and retain spatial information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; the 1x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>. The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
It should be noted that: the super-resolution reconstruction of the human face is a super-resolution technology aiming at a special structure of the human face, and aims to convert a low-resolution human face into a high-resolution human face through a certain technology. However, the face structure is special, the reconstruction difficulty is higher, the requirements are higher, and in the reconstruction process, the consistency of geometric features is ensured, and the accurate recovery of texture information is also required to be noted. However, experiments prove that the addition of the prior information only cannot generate an ideal face output result, and the key is how to establish a super-resolution method for simultaneously improving the human eye perception effect and the objective evaluation standard according to the structure of the face. Therefore, the invention provides the facial super-resolution method based on the prior information and the attention mechanism, the addition of the prior information aims at fusing the analysis graphs and the feature graphs corresponding to different facial components, the extracted useful features are more effectively utilized, the reconstruction efficiency is improved, and finer facial geometric information is reconstructed; the addition of the high-efficiency attention module ECA improves the utilization effect of the network on the characteristic information, enables the network to learn purposefully, adjusts the characteristic channel information in a self-adaptive mode, enhances the expression capacity of the characteristics, is beneficial to recovering more details such as contour textures and the like, and improves the human eye perception effect of the face image.
In order to further implement the above technical solution, the coarse reconstruction network H_C comprises the following steps:
nonlinear mapping is performed through 3 residual blocks to generate a feature map; reconstruction from the feature map is carried out based on the attention mechanism, through a 3×3 convolution layer; finally, an ECA attention block consisting of 3 ECA modules is added after the convolution layer to generate the comparatively coarse high-resolution face image y_c. The formula is as follows:
y_c = H_C(I_bic)
where I_bic denotes the bicubic-upsampled low-resolution face image and H_C the shallow feature extraction network adopted;
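The residual blocks used throughout the coarse and fine reconstruction paths follow the standard conv-ReLU-conv-plus-identity pattern; a minimal single-channel NumPy sketch follows (the naive convolution and the zero-initialized second kernel are for illustration only, not the patent's learned weights):

```python
import numpy as np

def conv3x3(x, w):
    """Naive single-channel 3x3 'same' convolution with zero padding."""
    h, wd = x.shape
    p = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """Standard residual block: conv -> ReLU -> conv, plus the identity
    skip connection that lets gradients bypass the convolutions."""
    y = np.maximum(conv3x3(x, w1), 0.0)  # ReLU
    return x + conv3x3(y, w2)            # skip connection

x = np.random.rand(16, 16)
w1 = np.random.randn(3, 3) * 0.1
w2 = np.zeros((3, 3))        # zero-initialised second conv: block acts as identity
out = residual_block(x, w1, w2)
print(np.allclose(out, x))   # -> True
```

The zero-initialised second kernel illustrates why residual blocks are easy to train: at initialisation the block can reduce to the identity, so stacking 12 of them (as in the deep feature extraction network) does not degrade the signal.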
In order to further implement the above technical solution, the fine reconstruction network H_R comprises the following steps:
first, the deep feature map F and the parsing map P are fused to obtain a fused feature map F_c; F_c is then input to the fine reconstruction network, which first reduces the number of channels of the feature map with a 3×3 convolution layer, upsamples the feature map with a 4×4 deconvolution layer, decodes the features with 3 connected residual blocks, and processes the result with a 3×3 convolution layer; finally, the feature map is sent to an ECA attention module to obtain the final fine super-resolution face image y_f.
In order to further implement the above technical solution, the loss function includes:
Pixel loss

In image super-resolution reconstruction, using the mean square error (MSE) loss usually yields higher evaluation indexes such as PSNR and SSIM, but high-frequency texture information is usually lost, leaving the image over-smoothed. To avoid this problem, the L1 loss is used as the pixel loss function:

L_c = (1/N) Σ_i ||y_i − ŷ_i^c||_1 ,  L_f = (1/N) Σ_i ||y_i − ŷ_i^f||_1
Face prior loss

In order to constrain the estimation process of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss:

L_prior = (1/N) Σ_i ||p_i − p̂_i||_2^2
Total loss

The total loss function finally used for model training is obtained by weighted combination:

L_total = λ_c·L_c + λ_f·L_f + λ_p·L_prior

where the prior loss adopts a mean square error loss function; N represents the total number of training-set images, y_i is the i-th high-resolution image, ŷ_i^c is the corresponding i-th coarse high-resolution restored image, and ŷ_i^f is the corresponding i-th fine high-resolution restored image; p_i represents the real face parsing map corresponding to the i-th image, and p̂_i represents the face parsing map estimated for the i-th image by the prior estimation network; λ_c, λ_f, λ_p are the weights of the respective terms.
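A minimal NumPy sketch of these loss computations for one training sample (the weights wc, wf, wp are illustrative):

```python
import numpy as np

def pixel_losses(hr, coarse, fine):
    """L1 pixel losses of the coarse and the fine reconstruction."""
    return np.abs(hr - coarse).mean(), np.abs(hr - fine).mean()

def prior_loss(p_gt, p_pred):
    """MSE loss on the estimated face-parsing map."""
    return ((p_gt - p_pred) ** 2).mean()

def total_loss(hr, coarse, fine, p_gt, p_pred, wc=1.0, wf=1.0, wp=1.0):
    """Weighted sum of the three loss terms; weights are illustrative."""
    lc, lf = pixel_losses(hr, coarse, fine)
    return wc * lc + wf * lf + wp * prior_loss(p_gt, p_pred)
```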
It should be noted that:
Because the network is trained end to end, the three loss terms, each matched with its respective weight, are summed into the total loss function of the face super-resolution network. The training-set images, the original high-resolution images, the original parsing maps, the parsing maps extracted by the network, and the final results are input into the pixel-wise loss functions; a high-resolution image is generated through their processing, and the loss is iteratively minimized until a set of weight parameters minimizing the total loss function is obtained. This set of parameters is taken as the trained model parameters, yielding the trained face super-resolution model.
The human face super-resolution method based on the prior information and the attention mechanism comprises the following steps:
S1, downloading an original image dataset, which comprises original face images and original face parsing maps p, and performing data processing; inputting the processed original images into a downsampling model to obtain low-resolution images; then performing bicubic upsampling on the low-resolution images to obtain images of the same size as the high-resolution images as the low-resolution dataset; finally, dividing the dataset into a training set and a test set;
S2, inputting the images obtained in S1 into the shallow feature extraction module to extract shallow features of the face image; a convolution layer extracts features from the low-resolution face image, but this layer can only extract contour features of the face, yielding a coarse high-resolution image y_c;
S3, inputting the coarse high-resolution image y_c obtained in S2 into the deep feature extraction network for feature extraction to obtain the feature map F_d;
S4, inputting the coarse high-resolution image y_c obtained in S2 into the prior estimation network and extracting prior information to obtain the parsing map p̂, wherein the prior estimation network consists of a ResNet and stacked hourglass networks;
S5, fusing the feature map F_d obtained in S3 with the parsing map p̂ obtained in S4 to obtain the fused feature map F_f;
S6, inputting the feature map F_f obtained in S5 into the fine reconstruction network for super-resolution reconstruction to obtain the final fine reconstructed face image y_f;
S7, inputting the training-set image obtained in S2, the original high-resolution image y, and the final result y_f into a pixel-wise loss function, generating a fine high-resolution image through its processing and calculating the pixel loss; inputting the parsing map p̂ obtained in S4 and the parsing map p from the original image dataset into a pixel-wise loss function and calculating the prior loss L_prior; weighting and adding the above losses to obtain the total loss function L_total; iterating continuously to minimize the loss function during training, finally generating the face super-resolution network model;
S8, setting the hyperparameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the model, and finally generating high-resolution face images with clear detail textures and better visual quality through residual network processing and loss-minimizing iteration.
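The degradation and split of step S1 can be sketched as follows. The patent specifies bicubic resampling; average pooling and nearest-neighbor repetition stand in here to keep the sketch dependency-free, and the 0.85 train fraction is illustrative:

```python
import numpy as np

def degrade(hr, scale=8):
    """Build the LR input of S1: downsample a (C, H, W) image by `scale`,
    then upsample back to HR size.  Average pooling / nearest-neighbor
    repetition are simple stand-ins for the bicubic resampling used in
    the patent."""
    c, h, w = hr.shape
    lr = hr.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))
    return np.repeat(np.repeat(lr, scale, axis=1), scale, axis=2)

def split(images, train_frac=0.85, seed=0):
    """Shuffle a list of images and split it into train / test sets."""
    idx = np.random.default_rng(seed).permutation(len(images))
    cut = int(len(images) * train_frac)
    return [images[i] for i in idx[:cut]], [images[i] for i in idx[cut:]]
```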
The invention will be further illustrated by the following specific experiments:
1. Data set
The CelebA dataset is a large-scale face detection benchmark dataset from the Chinese University of Hong Kong. It contains 202599 face pictures of 10177 celebrities, and the images cover large pose variations and background clutter. Each image has 40 attribute annotations, such as whether glasses are worn, hair length, nose, lips, color, and gender. The dataset is labeled by gender to distinguish male and female faces, comprising 118165 pictures of female faces and 138704 pictures of male faces.
CelebA Mask-HQ is a high-quality face-attribute segmentation dataset derived from CelebA, with a total of 30000 high-definition face images of size 1024×1024. For CelebA Mask-HQ, 17000 pictures were randomly selected for training and the remaining 13000 images were used for testing; for the Helen dataset, 1200 pictures were randomly selected for training and the remaining 400 for testing.
2. Training details
The training images are roughly cropped around the face region and resized to 128×128 without any pre-alignment, and color images are used for training. The low-resolution image is first bicubic-interpolated to the high-resolution size and then used for training; the model is trained with the RMSprop (root mean square propagation) algorithm, with an initial learning rate of 2.5×10⁻⁴ and a mini-batch size of 14. The CelebA Mask-HQ face images are resized to 128×128 as the original ground-truth images.
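The RMSprop update named above keeps a running average of squared gradients and divides each step by its square root. A one-step NumPy sketch, using the patent's initial learning rate (decay and epsilon are common defaults, not values from the patent):

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=2.5e-4, decay=0.9, eps=1e-8):
    """One RMSprop update.

    cache holds the running average of squared gradients; the effective
    step size for each parameter is lr / sqrt(cache), which adapts the
    learning rate per parameter.
    """
    cache = decay * cache + (1 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```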
3. Analysis of results
A. Quantitative analysis
Previous methods often neglect the recovery of facial detail, sacrificing part of that detail quality to lift the face reconstruction effect, which greatly affects overall image quality and is unfavorable when the image serves as input to a next-stage task. For the image super-resolution reconstruction task, the invention adopts the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) as indexes for evaluating SR performance. The face super-resolution network of the invention applies the super-resolution reconstruction algorithm to the face image task for the first time and performs a bicubic-interpolation downsampling size-conversion operation on the network input image, so it cannot be compared numerically with other face super-resolution models directly. For a fair comparison, the invention applies the same input-image downsampling to several public super-resolution models: among public face super-resolution models, the URDGN algorithm and the FSRNet model are selected; among general super-resolution reconstruction models, the SRCNN, EDSR, and Bicubic algorithms are selected for comparison. After being tuned to their optimal states, the compared models are trained on the CelebA Mask-HQ and Helen datasets one by one; larger PSNR and SSIM values are better. The data indexes describing super-resolution performance at a magnification factor of ×8 on the CelebA Mask-HQ and Helen datasets are shown in Table 1.
Table 1 reports PSNR/SSIM describing super-resolution performance at a magnification factor of ×8 on the CelebA Mask-HQ and Helen datasets, with the optimal results in bold.
TABLE 1
It should be noted that, according to the data in Table 1, the proposed model achieves a significant performance improvement over the other methods: in SR performance, it surpasses the second-best method by 0.43 dB and 0.34 dB in PSNR on the CelebA Mask-HQ and Helen datasets respectively, and by 0.01 in SSIM on CelebA Mask-HQ.
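The two evaluation indexes can be computed as follows. The PSNR matches the standard definition; the SSIM here is a simplified single-window (global) variant of the usual locally-windowed measure, shown only to make the formula concrete:

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = ((ref.astype(float) - rec.astype(float)) ** 2).mean()
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def ssim_global(ref, rec, peak=255.0):
    """Single-window (global) SSIM -- a simplification of the usual
    locally-windowed SSIM, enough to show the formula."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = ref.mean(), rec.mean()
    vx, vy = ref.var(), rec.var()
    cov = ((ref - mx) * (rec - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Production evaluation would use a windowed SSIM (e.g. 11×11 Gaussian windows) rather than this global form.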
B. Qualitative analysis
A face image is selected from the test set, and the reconstruction effect of the improved algorithm is compared with that of the original algorithm, as shown in Figure 9. The image reconstructed by the original algorithm shows obvious distortion in regions such as the eyes and lips; the improved algorithm raises the image quality in these regions and clearly reduces facial distortion, demonstrating the effectiveness of the improved algorithm in improving the quality of the reconstructed face image, reducing reconstruction distortion, and aiding face discrimination.
Face images are selected from the test set and reconstructed with the various algorithms; the overall reconstruction effect and locally magnified reconstructions are shown in Figure 10. The face images reconstructed by Bicubic interpolation ignore much detail information and are too blurry; the SRCNN algorithm uses a convolutional neural network and improves the structural similarity of the reconstruction relative to interpolation; the recovery of the EDSR algorithm is clearly better; the results recovered by URDGN are visibly distorted; FSRNet uses a more complex network structure and improves reconstructed image quality, but parts of the image remain over-smoothed. The invention improves the perceptual quality: the images better match human visual perception, have textures closer to the high-resolution images, and improve both PSNR and SSIM.
4. Ablation experiments
(1) Influence of attention module
To verify the role of the attention mechanism, the network was split into 2 comparison networks: one is the base network with the attention module added, the other the base network with the attention module removed. Each network was retrained, with the results shown in Table 2.
TABLE 2
From the table it can be concluded that every data index improves as attention modules are added, which proves the effectiveness of the attention mechanism for the face super-resolution reconstruction task. Notably, the experimental data show that the gain in each index slows as attention increases, since the attention modules have already collected enough features. At the same time, the number of ECA attention modules necessarily affects the load of the network; considering the trade-off between performance and computation, it is recommended to choose the number of ECAs according to the characteristics of the dataset itself. In the qualitative and quantitative experimental analyses, the best-performing network with 3 ECAs is compared with the other methods; to reduce training cost, an ECA number of 3 is also used for the other ablation studies.
(2) Influence of attention mechanisms and a priori information
To verify the effect of the attention mechanism and the prior information, the network was split into 2 comparison networks: one is the base network with the attention module and prior information added, the other the base network with both removed. Each network was retrained, with the results shown in Table 3.
TABLE 3
From the table it can be concluded that every data index improves as the attention module and prior information are added, which proves the effectiveness of the attention mechanism and the prior information for the face super-resolution reconstruction task. Notably, the gain in each index slows as attention and prior information increase, since enough features have already been acquired by the prior information and attention modules. At the same time, the number of ECA modules and the extraction of prior information necessarily affect the load of the network; considering the trade-off between performance and computation, it is recommended to choose the number of ECAs according to the characteristics of the dataset itself. In the qualitative and quantitative experimental analyses of this setting, the best-performing network with 1 ECA is compared with the other methods; to reduce training cost, an ECA number of 1 is used for the other ablation experiments in this group.
(3) Influence of the number of stacked hourglass networks
To verify the effect of stacking hourglass networks, the influence of the number of stacked hourglass modules in the prior estimation network on network performance was studied; each network was retrained, with the results shown in Table 4.
TABLE 4
From the table it can be concluded that every data index improves as the number of HourGlass modules increases, which proves the effectiveness of stacked HourGlass modules for the face super-resolution reconstruction task. Notably, the gain in each index slows as the number of HourGlass modules increases, since the prior information has already acquired enough features. Meanwhile, the number of HourGlass modules inevitably affects the load of the network; considering the trade-off between performance and computation, it is suggested to choose the number of HourGlass modules according to the characteristics of the dataset itself. Therefore, a HourGlass number of 2 was finally selected for the experiments, obtaining a good reconstruction effect.
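The stacked hourglass structure examined in this ablation — a symmetric encoder/decoder with skip connections between layers of matching scale — can be sketched as below, with average pooling and nearest-neighbor upsampling standing in for the learned convolutions of the real module:

```python
import numpy as np

def down(x):
    """2x average pooling of a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    """2x nearest-neighbor upsampling."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def hourglass(x, depth=2):
    """One hourglass: recurse down `depth` scales, come back up, and add
    the skip connection at each matching scale."""
    if depth == 0:
        return x
    skip = x                              # skip connection at this scale
    y = hourglass(down(x), depth - 1)
    return up(y) + skip

def stacked_hourglass(x, n=2, depth=2):
    """Stack n hourglass modules (the patent stacks 2)."""
    for _ in range(n):
        x = hourglass(x, depth)
    return x
```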
In summary, the invention provides a face super-resolution reconstruction method based on prior information and an attention mechanism to recover finer face images. A low-resolution image is input; convolution extracts shallow features of the image, and a group of residual blocks plus a convolution operation produce the shallow feature map, which is sent both into the deep feature extraction network and into the prior estimation network; an efficient channel attention module is added in the middle of the deep feature extraction network, improving the indexes and visual effect of face image restoration; finally, the outputs of the two branches are fed into the fine reconstruction network, which outputs a satisfactory super-resolution reconstructed image. The method is validated on public datasets and shown to be superior to some existing methods.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for the identical and similar parts, the embodiments may refer to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. The face super-resolution method based on prior information and an attention mechanism, characterized by comprising: a shallow feature extraction network, a deep feature extraction network, a prior estimation network, and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
the deep feature extraction network extracts the deep features of the face: the coarse high-resolution face image y_c is input into the deep feature extraction network F_DFE for deep feature extraction; F_DFE comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer and a ReLU activation function, and then 12 residual blocks; finally, the extracted 64-channel feature map F_d is obtained. The formula is as follows:

F_d = F_DFE(y_c)

where y_c represents the coarse high-resolution face image and F_DFE represents the adopted deep feature extraction network;
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 HourGlass stacking HourGlass modules are constructed for priori information extraction, and in order to effectively merge features across scales and retain spatial information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; the 1x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
wherein the coarse reconstruction network comprises: nonlinear mapping through 3 residual blocks to generate a feature map; reconstruction from the attention-weighted feature map through a 3×3 convolution layer; finally, an ECA attention module consisting of 3 ECA modules is added after the convolution layer to generate the relatively coarse high-resolution face image y_c. The formula is as follows:

y_c = F_SFE(x_bic)

where x_bic represents the bicubic-upsampled low-resolution face image and F_SFE represents the adopted shallow feature extraction network;
wherein the fine reconstruction network comprises: first, the feature map F_d and the parsing map p̂ are fused to obtain the fused feature map F_f; the feature map F_f is then input into the fine reconstruction network, where a 3×3 convolution layer first reduces the number of channels, a 4×4 deconvolution layer upsamples the feature map, 3 connected residual blocks decode the features, and a 3×3 convolution layer processes the result; finally, the feature map is sent to an ECA attention module to obtain the final fine super-resolution face image y_f.
2. The face super-resolution method based on prior information and an attention mechanism according to claim 1, wherein the loss functions adopted in training the face super-resolution network comprise:
pixel loss:

in image super-resolution reconstruction, using the mean square error (MSE) loss can obtain higher evaluation indexes such as PSNR and SSIM, but high-frequency texture information is lost and the image becomes over-smoothed; to avoid this problem, the L1 loss is used as the pixel loss function:

L_c = (1/N) Σ_i ||y_i − ŷ_i^c||_1 ,  L_f = (1/N) Σ_i ||y_i − ŷ_i^f||_1
face prior loss:

in order to constrain the estimation process of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss:

L_prior = (1/N) Σ_i ||p_i − p̂_i||_2^2
total loss:

the total loss function finally used for model training is obtained by weighted combination:

L_total = λ_c·L_c + λ_f·L_f + λ_p·L_prior

where the prior loss adopts a mean square error loss function; N represents the total number of training-set images, y_i is the i-th high-resolution image, ŷ_i^c is the corresponding i-th coarse high-resolution restored image, and ŷ_i^f is the corresponding i-th fine high-resolution restored image; p_i represents the real face parsing map corresponding to the i-th image, and p̂_i represents the face parsing map estimated for the i-th image by the prior estimation network; λ_c, λ_f, λ_p are the weights of the respective terms.
3. The face super-resolution method based on prior information and an attention mechanism according to claim 1, comprising the following steps:
S1, downloading an original image dataset, which comprises original face images and original face parsing maps p, and performing data processing; inputting the processed original images into a downsampling model to obtain low-resolution images; performing bicubic upsampling on the low-resolution images to obtain images of the same size as the high-resolution images as the low-resolution dataset; finally, dividing the dataset into a training set and a test set;
S2, inputting the images obtained in S1 into the shallow feature extraction module to extract shallow features of the face image; a convolution layer extracts features from the low-resolution face image, but can only extract contour features of the face, yielding a coarse high-resolution image y_c;
S3, inputting the coarse high-resolution image y_c obtained in S2 into the deep feature extraction network for feature extraction to obtain the feature map F_d;
S4, inputting the coarse high-resolution image y_c obtained in S2 into the prior estimation network and extracting prior information to obtain the parsing map p̂, wherein the prior estimation network consists of a ResNet and stacked hourglass networks;
S5, fusing the feature map F_d obtained in S3 with the parsing map p̂ obtained in S4 to obtain the fused feature map F_f;
S6, inputting the feature map F_f obtained in S5 into the fine reconstruction network for super-resolution reconstruction to obtain the final fine reconstructed face image y_f;
S7, inputting the training-set image obtained in S2, the original high-resolution image y, and the final result y_f into a pixel-wise loss function, generating a fine high-resolution image through its processing and calculating the pixel loss; inputting the parsing map p̂ obtained in S4 and the parsing map p from the original image dataset into a pixel-wise loss function and calculating the prior loss L_prior; weighting and adding the above losses to obtain the total loss function L_total; iterating continuously to minimize the loss function during training, finally generating the face super-resolution network model;
S8, setting the hyperparameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the model, and finally generating high-resolution face images with clear detail textures and better visual quality through residual network processing and loss-minimizing iteration.
CN202211528427.XA 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism Pending CN117315735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211528427.XA CN117315735A (en) 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211528427.XA CN117315735A (en) 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism

Publications (1)

Publication Number Publication Date
CN117315735A true CN117315735A (en) 2023-12-29

Family

ID=89248630


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649343A (en) * 2024-01-29 2024-03-05 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder
CN118333860A (en) * 2024-06-12 2024-07-12 济南大学 Residual enhancement type frequency space mutual learning face super-resolution method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination