CN117315735A - Face super-resolution reconstruction method based on priori information and attention mechanism - Google Patents

Face super-resolution reconstruction method based on priori information and attention mechanism Download PDF

Info

Publication number
CN117315735A
CN117315735A (application CN202211528427.XA)
Authority
CN
China
Prior art keywords
resolution
image
network
face
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211528427.XA
Other languages
Chinese (zh)
Inventor
端木春江 (Duanmu Chunjiang)
吴成红 (Wu Chenghong)
叶靖 (Ye Jing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU filed Critical Zhejiang Normal University CJNU
Priority to CN202211528427.XA
Publication of CN117315735A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a face super-resolution reconstruction method based on prior information and an attention mechanism. The model comprises a shallow feature extraction network, a deep feature extraction network, a prior estimation network, and a fine reconstruction network. The method proceeds as follows: a low-resolution image is input; shallow features are extracted by convolution, a group of residual blocks, and a further convolution; the shallow features are fed both into the deep feature extraction network and into the prior estimation network; and the outputs of the two branches are fed into the fine reconstruction network, which outputs the final super-resolution reconstructed image. The invention introduces face edge information and a face component parsing map into the face super-resolution network as prior information and adds an efficient channel attention mechanism to the network, so the network can reconstruct a comparatively clear face image with more facial features at lower model complexity, improving both subjective and objective evaluation indices.

Description

Face super-resolution reconstruction method based on priori information and attention mechanism
Technical Field
The invention belongs to the technical fields of image processing and face super-resolution reconstruction, and in particular relates to a face image super-resolution magnification method based on facial prior information and an attention mechanism.
Background
Face super-resolution reconstruction is a super-resolution technique targeted at the special structure of the human face; its aim is to convert a low-resolution face into a high-resolution face. The face, however, differs from ordinary images: it combines strong structural similarity with identity-specific differences in detail, so reconstruction is both harder and more demanding, requiring geometric consistency to be maintained while texture information is recovered accurately. Face super-resolution therefore remains a significant challenge. The concept of face super-resolution was first proposed by Baker and Kanade in 2000; it is a branch of image super-resolution specialized for face scenes. In recent years, deep learning has been widely applied in image processing, and the face super-resolution field, combined with deep learning, has entered a new stage of development.
Face super-resolution methods can be divided into: interpolation-based reconstruction, reconstruction-based methods, convolutional-neural-network-based methods, and generative-adversarial-network-based methods. Dong et al. proposed the SRCNN model, the first to apply deep learning to image super-resolution. SRCNN first enlarges the low-resolution image to the target size by bicubic interpolation, then extracts image features through a three-layer convolutional neural network to establish a nonlinear mapping, and finally generates the high-resolution image, greatly improving reconstruction quality. Huang D. and Liu H. proposed the SRCNN-IBP algorithm, which combines the SRCNN network with the iterative back-projection (IBP) algorithm; SRCNN-IBP can be regarded as introducing prior information about the high-resolution image on top of SRCNN, so the quality of its reconstructed images exceeds that of SRCNN, which also shows that prior information matters for face super-resolution reconstruction. Ledig et al. applied a generative adversarial network (GAN) to the super-resolution problem, proposing SRGAN, which uses a trained discriminator network to distinguish SR images from original real images. Yu Chen et al. proposed FSRNet, a face super-resolution reconstruction method that adds prior information by extracting facial geometry; their results show that face key points and a face parsing map improve face recovery, but the generated face images lack texture detail, the model is complex, and training takes a long time.
Therefore, how to strengthen the restorative effect of prior information on the face, make full use of high-frequency features, and reduce redundant information, by providing a face super-resolution reconstruction method based on prior information and an attention mechanism, is the problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a face super-resolution reconstruction method based on prior information and an attention mechanism.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the human face super-resolution method based on the prior information and the attention mechanism is characterized by comprising the following steps of: a shallow feature extraction network, a deep feature extraction network, a priori estimation network and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
The deep feature extraction network extracts the deep features of the face. The coarse high-resolution face image y_c is input to the deep feature extraction network H_D for deep feature extraction. H_D comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer, a ReLU activation function, and 12 residual blocks, finally yielding the extracted 64-channel feature map F. The formula is as follows:
F = H_D(y_c)
where y_c denotes the coarse high-resolution face image and H_D the deep feature extraction network adopted;
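The stride-2 convolution in the deep feature extraction network halves the spatial resolution before the residual blocks. As a sanity check, the standard output-size formula can trace an input through the layer list described above (a minimal sketch: the 128×128 input size and the padding of 1 are assumptions; only the 64 output channels and the stride of 2 come from the text):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Standard convolution output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def deep_feature_shape(h, w):
    """Trace the spatial size through the deep feature extraction network:
    one 3x3 conv with stride 2 (assumed padding 1), then BN + ReLU and
    12 residual blocks, which preserve spatial size, ending in a
    64-channel feature map."""
    h = conv_out(h, kernel=3, stride=2, padding=1)
    w = conv_out(w, kernel=3, stride=2, padding=1)
    return (64, h, w)  # 64 output channels per the patent text

print(deep_feature_shape(128, 128))  # the stride-2 conv halves 128x128 to 64x64
```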
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 stacking HourGlass networks, namely HourGlass modules, are constructed, prior information extraction is carried out, and in order to effectively merge features across scales and reserve space information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; 1The x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>. The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
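The cross-scale fusion performed by a stacked HourGlass module can be sketched in plain NumPy: each level pools, recurses on the coarser scale, upsamples, and adds the symmetric skip branch back in. This is a structural sketch only; the identity mapping stands in for the learned residual blocks of a real module, and the recursion depth of 4 is an assumption:

```python
import numpy as np

def pool2(x):
    """2x2 average pooling: the downsampling half of the hourglass."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    """Nearest-neighbour upsampling by 2: the upsampling half."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth=4):
    """One hourglass pass: pool, recurse on the coarser scale, upsample,
    and add the symmetric skip branch so spatial detail at every scale is
    preserved (identity stands in for learned residual blocks)."""
    if depth == 0:
        return x
    skip = x                            # skip connection between symmetric layers
    coarse = hourglass(pool2(x), depth - 1)
    return skip + up2(coarse)           # cross-scale feature fusion

x = np.random.rand(64, 64)
y = hourglass(x)
print(y.shape)  # resolution is preserved end to end
```

Because every downsampling level keeps its skip branch, the output retains the input resolution while still mixing in information from the coarser scales.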
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
Preferably, the coarse reconstruction network H_C comprises the following steps:
nonlinear mapping is performed through 3 residual blocks to generate a feature map; reconstruction from the feature map is carried out based on the attention mechanism, through a 3×3 convolution layer; finally, an ECA attention module is added after the convolution layer to generate the comparatively coarse high-resolution face image y_c. The formula is as follows:
y_c = H_C(I_bic)
where I_bic denotes the bicubic-upsampled low-resolution face image and H_C the shallow feature extraction network adopted;
preferably, the network is finely rebuiltComprising the following steps:
first, the feature map is formedAnd resolution map->Fusing the analysis graph and the feature graph to obtain a fused feature graphThe method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the characteristic image into a fine reconstruction network, firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process, up-sampling the characteristic image by using a 4X 4 deconvolution layer, decoding the characteristic image by connecting 3 residual blocks, and then obtaining the characteristic image by using a 3X 3 convolution layer process; finally, willThe feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
Preferably, the loss function comprises:
(1) Pixel loss
In image super-resolution reconstruction, a mean square error (MSE) loss usually yields higher evaluation indices such as PSNR and SSIM, but it tends to lose high-frequency texture information, resulting in over-smoothed images. To avoid this problem, the L1 loss is used as the pixel loss function:
L_pixel = (1/N) * Σ_{i=1}^{N} ( ||y_HR^(i) - y_c^(i)||_1 + ||y_HR^(i) - y_f^(i)||_1 )
(2) Face prior loss
To constrain the estimation of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss, which adopts a mean square error form:
L_prior = (1/N) * Σ_{i=1}^{N} ||P^(i) - P̂^(i)||_2^2
(3) Total loss
The above losses are combined with weights to obtain the total loss finally used for model training:
L_total = α * L_pixel + β * L_prior
where N denotes the total number of training-set images, y_HR^(i) is the i-th high-resolution image, y_c^(i) is the corresponding i-th coarse high-resolution restored image, y_f^(i) is the corresponding i-th fine high-resolution restored image, P^(i) is the ground-truth face parsing map of the i-th image, P̂^(i) is the parsing map produced for the i-th image by the prior estimation network, and α and β are the term weights.
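The loss terms above can be sketched directly in NumPy (the weights alpha and beta are assumed placeholders; the patent weights the terms but does not state values):

```python
import numpy as np

def l1_pixel_loss(hr, coarse, fine):
    """L1 pixel loss over both the coarse and the fine reconstruction,
    averaged over the batch of training images."""
    return np.mean(np.abs(hr - coarse)) + np.mean(np.abs(hr - fine))

def prior_loss(parsing_gt, parsing_pred):
    """MSE loss between ground-truth and predicted face parsing maps."""
    return np.mean((parsing_gt - parsing_pred) ** 2)

def total_loss(hr, coarse, fine, p_gt, p_pred, alpha=1.0, beta=1.0):
    """Weighted sum used for end-to-end training; alpha and beta are
    assumed placeholder weights (the patent does not state values)."""
    return alpha * l1_pixel_loss(hr, coarse, fine) + beta * prior_loss(p_gt, p_pred)

hr = np.ones((4, 64, 64))
coarse = np.zeros_like(hr)   # worst-case coarse output
fine = hr.copy()             # perfect fine output
p = np.zeros((4, 64, 64))    # parsing maps agree exactly
print(total_loss(hr, coarse, fine, p, p))  # -> 1.0 (only the coarse L1 term remains)
```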
The human face super-resolution method based on the prior information and the attention mechanism comprises the following steps:
s1, downloading an original image data set, wherein the original image data set comprises an original face image and an original face analysis chartData processing is carried out, an original image after the data processing is input into a downsampling model, a low-resolution image is obtained through processing, then double-three upsampling is carried out on the low-resolution image, an image with the same size as a high-resolution image is obtained as a low-resolution data set, and finally the data set is divided into a training set and a testing set;
s2, inputting the image obtained in the S1 into a shallow feature extraction module to extract shallow features of the face image, and extracting features of the low-resolution face image by using a convolution layer, wherein the convolution layer only can extract outline features of the face image to obtain a rough high-resolution image
S3, the rough high-resolution image obtained in the S2 is processedInputting into deep feature extraction network for feature extraction to obtain feature map +.>
S4, carrying out rough high-resolution image obtained in S2Inputting into a priori estimation network, extracting priori information to obtain an analytic graph +.>Wherein the prior estimation network consists of ResNet and stacked hourglass networks;
s5, the feature map obtained in the S3 is processedAnd S4>Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>
S6, the feature map obtained in the S5 is processedInputting the images into a fine reconstruction network for super-resolution reconstruction to obtain a final fine reconstruction face image +.>
S7, training set images obtained in the step S2Original high resolution image->Final result->Input into a pixel-by-pixel loss function, and generate a fine high resolution image by processing the pixel-by-pixel loss function>Calculating to obtain loss function->The method comprises the steps of carrying out a first treatment on the surface of the Resolution map obtained in S4->And an analytic map in the original image dataset +.>Input into the pixel-by-pixel loss function, calculate the loss function +.>The method comprises the steps of carrying out a first treatment on the surface of the Adding the above loss functions to obtain the total loss function +.>Continuously iterating to minimize the loss function, training, and finally generating a face super-resolution network model;
s8, setting super parameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the face super-resolution network model, and finally generating a high-resolution face image with clear detail texture and better effect through residual network processing and loss function minimization iteration.
Compared with the prior art, the invention has the beneficial effects that:
(1) To improve the network's ability to recover edge information, the invention adds face prior information, taking the face image and its component parsing maps as prior constraints on the network; the parsing maps corresponding to different face components are fused with the corresponding feature maps, which strengthens the parsing maps' guidance of face image super-resolution, exploits the extracted useful features more effectively, improves reconstruction efficiency, strengthens the reconstruction effect, and reconstructs finer facial geometric information;
(2) The invention adds the efficient channel attention module ECA after the fine reconstruction network, which improves the network's use of feature information, lets the network learn purposefully, adaptively adjusts the feature channel information, and enhances the expressive power of the features, recovering more details such as contours and textures, improving the perceptual quality of the face image, and raising both subjective and objective evaluation scores.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the overall structure of a super-resolution network used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 2 is a schematic diagram of the shallow feature extraction network structure used in the face super-resolution reconstruction method based on prior information and an attention mechanism; where Conv denotes a convolution layer and Res denotes a residual block (a convolution layer with a skip connection).
FIG. 3 is a schematic diagram of a deep feature extraction network structure used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 4 is a schematic diagram of a prior estimation network used in a face super-resolution reconstruction method based on prior information and an attention mechanism; wherein HourGlass represents a stacked HourGlass network module;
FIG. 5 is a schematic diagram of a fine reconstruction network used in a face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention;
FIG. 6 is a schematic diagram of a stacked hourglass network used in a face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention;
FIG. 7 is a schematic diagram of an efficient channel attention network used in a face super-resolution reconstruction method based on prior information and an attention mechanism;
FIG. 8 is a schematic view of sample images from the CelebAMask-HQ dataset used in the face super-resolution reconstruction method based on prior information and an attention mechanism of the present invention, wherein only the part of each face below the eyes is shown;
FIG. 9 is a comparison of face super-resolution images generated by the present invention and by other networks; wherein "Ours" denotes the method proposed by the invention;
fig. 10 is an enlarged detail contrast diagram of the face super-resolution image generated by the present invention and other networks.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a face super-resolution reconstruction method based on priori information and an attention mechanism, which is shown in fig. 1 and comprises the following steps: a shallow feature extraction network, a deep feature extraction network, a priori estimation network and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
The deep feature extraction network extracts the deep features of the face. The coarse high-resolution face image y_c is input to the deep feature extraction network H_D for deep feature extraction. H_D comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer, a ReLU activation function, and 12 residual blocks, finally yielding the extracted 64-channel feature map F. The formula is as follows:
F = H_D(y_c)
where y_c denotes the coarse high-resolution face image and H_D the deep feature extraction network adopted;
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 stacking HourGlass modules, namely HourGlass modules, are constructed, prior information extraction is carried out, and in order to effectively merge features across scales and retain spatial information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; the 1x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>. The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
It should be noted that: the super-resolution reconstruction of the human face is a super-resolution technology aiming at a special structure of the human face, and aims to convert a low-resolution human face into a high-resolution human face through a certain technology. However, the face structure is special, the reconstruction difficulty is higher, the requirements are higher, and in the reconstruction process, the consistency of geometric features is ensured, and the accurate recovery of texture information is also required to be noted. However, experiments prove that the addition of the prior information only cannot generate an ideal face output result, and the key is how to establish a super-resolution method for simultaneously improving the human eye perception effect and the objective evaluation standard according to the structure of the face. Therefore, the invention provides the facial super-resolution method based on the prior information and the attention mechanism, the addition of the prior information aims at fusing the analysis graphs and the feature graphs corresponding to different facial components, the extracted useful features are more effectively utilized, the reconstruction efficiency is improved, and finer facial geometric information is reconstructed; the addition of the high-efficiency attention module ECA improves the utilization effect of the network on the characteristic information, enables the network to learn purposefully, adjusts the characteristic channel information in a self-adaptive mode, enhances the expression capacity of the characteristics, is beneficial to recovering more details such as contour textures and the like, and improves the human eye perception effect of the face image.
In order to further implement the above technical solution, the coarse reconstruction network H_C comprises the following steps:
nonlinear mapping is performed through 3 residual blocks to generate a feature map; reconstruction from the feature map is carried out based on the attention mechanism, through a 3×3 convolution layer; finally, an ECA attention block consisting of 3 ECA modules is added after the convolution layer to generate the comparatively coarse high-resolution face image y_c. The formula is as follows:
y_c = H_C(I_bic)
where I_bic denotes the bicubic-upsampled low-resolution face image and H_C the shallow feature extraction network adopted;
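The residual blocks used throughout the coarse and fine reconstruction paths follow the standard conv-ReLU-conv-plus-identity pattern; a minimal single-channel NumPy sketch follows (the naive convolution and the zero-initialized second kernel are for illustration only, not the patent's learned weights):

```python
import numpy as np

def conv3x3(x, w):
    """Naive single-channel 3x3 'same' convolution with zero padding."""
    h, wd = x.shape
    p = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """Standard residual block: conv -> ReLU -> conv, plus the identity
    skip connection that lets gradients bypass the convolutions."""
    y = np.maximum(conv3x3(x, w1), 0.0)  # ReLU
    return x + conv3x3(y, w2)            # skip connection

x = np.random.rand(16, 16)
w1 = np.random.randn(3, 3) * 0.1
w2 = np.zeros((3, 3))        # zero-initialised second conv: block acts as identity
out = residual_block(x, w1, w2)
print(np.allclose(out, x))   # -> True
```

The zero-initialised second kernel illustrates why residual blocks are easy to train: at initialisation the block can reduce to the identity, so stacking 12 of them (as in the deep feature extraction network) does not degrade the signal.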
In order to further implement the above technical solution, the fine reconstruction network H_R comprises the following steps:
first, the deep feature map F and the parsing map P are fused to obtain a fused feature map F_c; F_c is then input to the fine reconstruction network, which first reduces the number of channels of the feature map with a 3×3 convolution layer, upsamples the feature map with a 4×4 deconvolution layer, decodes the features with 3 connected residual blocks, and processes the result with a 3×3 convolution layer; finally, the feature map is sent to an ECA attention module to obtain the final fine super-resolution face image y_f.
In order to further implement the above technical solution, the loss function includes:
Pixel loss

In image super-resolution reconstruction, using the mean square error (MSE) loss usually yields higher evaluation indexes such as PSNR and SSIM, but high-frequency texture information is usually lost, leaving the image over-smoothed. To avoid this problem, the L1 loss is used as the pixel loss function:

L_c = (1/N) Σ_i ||y_i − ŷ_i^c||_1 ,  L_f = (1/N) Σ_i ||y_i − ŷ_i^f||_1
Face prior loss

In order to constrain the estimation process of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss:

L_prior = (1/N) Σ_i ||p_i − p̂_i||_2^2
Total loss

The total loss function finally used for model training is obtained by weighted combination:

L_total = λ_c·L_c + λ_f·L_f + λ_p·L_prior

where the prior loss adopts a mean square error loss function; N represents the total number of training-set images, y_i is the i-th high-resolution image, ŷ_i^c is the corresponding i-th coarse high-resolution restored image, and ŷ_i^f is the corresponding i-th fine high-resolution restored image; p_i represents the real face parsing map corresponding to the i-th image, and p̂_i represents the face parsing map estimated for the i-th image by the prior estimation network; λ_c, λ_f, λ_p are the weights of the respective terms.
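A minimal NumPy sketch of these loss computations for one training sample (the weights wc, wf, wp are illustrative):

```python
import numpy as np

def pixel_losses(hr, coarse, fine):
    """L1 pixel losses of the coarse and the fine reconstruction."""
    return np.abs(hr - coarse).mean(), np.abs(hr - fine).mean()

def prior_loss(p_gt, p_pred):
    """MSE loss on the estimated face-parsing map."""
    return ((p_gt - p_pred) ** 2).mean()

def total_loss(hr, coarse, fine, p_gt, p_pred, wc=1.0, wf=1.0, wp=1.0):
    """Weighted sum of the three loss terms; weights are illustrative."""
    lc, lf = pixel_losses(hr, coarse, fine)
    return wc * lc + wf * lf + wp * prior_loss(p_gt, p_pred)
```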
It should be noted that:
Because the network is trained end to end, the three loss terms, each matched with its respective weight, are summed into the total loss function of the face super-resolution network. The training-set images, the original high-resolution images, the original parsing maps, the parsing maps extracted by the network, and the final results are input into the pixel-wise loss functions; a high-resolution image is generated through their processing, and the loss is iteratively minimized until a set of weight parameters minimizing the total loss function is obtained. This set of parameters is taken as the trained model parameters, yielding the trained face super-resolution model.
The human face super-resolution method based on the prior information and the attention mechanism comprises the following steps:
S1, downloading an original image dataset, which comprises original face images and original face parsing maps p, and performing data processing; inputting the processed original images into a downsampling model to obtain low-resolution images; then performing bicubic upsampling on the low-resolution images to obtain images of the same size as the high-resolution images as the low-resolution dataset; finally, dividing the dataset into a training set and a test set;
S2, inputting the images obtained in S1 into the shallow feature extraction module to extract shallow features of the face image; a convolution layer extracts features from the low-resolution face image, but this layer can only extract contour features of the face, yielding a coarse high-resolution image y_c;
S3, inputting the coarse high-resolution image y_c obtained in S2 into the deep feature extraction network for feature extraction to obtain the feature map F_d;
S4, inputting the coarse high-resolution image y_c obtained in S2 into the prior estimation network and extracting prior information to obtain the parsing map p̂, wherein the prior estimation network consists of a ResNet and stacked hourglass networks;
S5, fusing the feature map F_d obtained in S3 with the parsing map p̂ obtained in S4 to obtain the fused feature map F_f;
S6, inputting the feature map F_f obtained in S5 into the fine reconstruction network for super-resolution reconstruction to obtain the final fine reconstructed face image y_f;
S7, inputting the training-set image obtained in S2, the original high-resolution image y, and the final result y_f into a pixel-wise loss function, generating a fine high-resolution image through its processing and calculating the pixel loss; inputting the parsing map p̂ obtained in S4 and the parsing map p from the original image dataset into a pixel-wise loss function and calculating the prior loss L_prior; weighting and adding the above losses to obtain the total loss function L_total; iterating continuously to minimize the loss function during training, finally generating the face super-resolution network model;
S8, setting the hyperparameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the model, and finally generating high-resolution face images with clear detail textures and better visual quality through residual network processing and loss-minimizing iteration.
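The degradation and split of step S1 can be sketched as follows. The patent specifies bicubic resampling; average pooling and nearest-neighbor repetition stand in here to keep the sketch dependency-free, and the 0.85 train fraction is illustrative:

```python
import numpy as np

def degrade(hr, scale=8):
    """Build the LR input of S1: downsample a (C, H, W) image by `scale`,
    then upsample back to HR size.  Average pooling / nearest-neighbor
    repetition are simple stand-ins for the bicubic resampling used in
    the patent."""
    c, h, w = hr.shape
    lr = hr.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))
    return np.repeat(np.repeat(lr, scale, axis=1), scale, axis=2)

def split(images, train_frac=0.85, seed=0):
    """Shuffle a list of images and split it into train / test sets."""
    idx = np.random.default_rng(seed).permutation(len(images))
    cut = int(len(images) * train_frac)
    return [images[i] for i in idx[:cut]], [images[i] for i in idx[cut:]]
```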
The invention will be further illustrated by the following specific experiments:
1. Data set
The CelebA dataset is a large-scale face detection benchmark dataset from the Chinese University of Hong Kong. It contains 202599 face pictures of 10177 celebrities, and the images cover large pose variations and background clutter. Each image has 40 attribute annotations, such as whether glasses are worn, hair length, nose, lips, color, and gender. The dataset is labeled by gender to distinguish male and female faces, comprising 118165 pictures of female faces and 138704 pictures of male faces.
CelebA Mask-HQ is a high-quality face-attribute segmentation dataset derived from CelebA, with a total of 30000 high-definition face images of size 1024×1024. For CelebA Mask-HQ, 17000 pictures were randomly selected for training and the remaining 13000 images were used for testing; for the Helen dataset, 1200 pictures were randomly selected for training and the remaining 400 for testing.
2. Training details
The training images are roughly cropped around the face region and resized to 128×128 without any pre-alignment, and color images are used for training. The low-resolution image is first bicubic-interpolated to the high-resolution size and then used for training; the model is trained with the RMSprop (root mean square propagation) algorithm, with an initial learning rate of 2.5×10⁻⁴ and a mini-batch size of 14. The CelebA Mask-HQ face images are resized to 128×128 as the original ground-truth images.
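The RMSprop update named above keeps a running average of squared gradients and divides each step by its square root. A one-step NumPy sketch, using the patent's initial learning rate (decay and epsilon are common defaults, not values from the patent):

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=2.5e-4, decay=0.9, eps=1e-8):
    """One RMSprop update.

    cache holds the running average of squared gradients; the effective
    step size for each parameter is lr / sqrt(cache), which adapts the
    learning rate per parameter.
    """
    cache = decay * cache + (1 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```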
3. Analysis of results
A. Quantitative analysis
Previous methods often neglect the recovery of facial detail, sacrificing part of that detail quality to lift the face reconstruction effect, which greatly affects overall image quality and is unfavorable when the image serves as input to a next-stage task. For the image super-resolution reconstruction task, the invention adopts the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) as indexes for evaluating SR performance. The face super-resolution network of the invention applies the super-resolution reconstruction algorithm to the face image task for the first time and performs a bicubic-interpolation downsampling size-conversion operation on the network input image, so it cannot be compared numerically with other face super-resolution models directly. For a fair comparison, the invention applies the same input-image downsampling to several public super-resolution models: among public face super-resolution models, the URDGN algorithm and the FSRNet model are selected; among general super-resolution reconstruction models, the SRCNN, EDSR, and Bicubic algorithms are selected for comparison. After being tuned to their optimal states, the compared models are trained on the CelebA Mask-HQ and Helen datasets one by one; larger PSNR and SSIM values are better. The data indexes describing super-resolution performance at a magnification factor of ×8 on the CelebA Mask-HQ and Helen datasets are shown in Table 1.
Table 1 reports PSNR/SSIM describing super-resolution performance at a magnification factor of ×8 on the CelebA Mask-HQ and Helen datasets, with the optimal results in bold.
TABLE 1
It should be noted that, according to the data in Table 1, the proposed model achieves a significant performance improvement over the other methods: in SR performance, it surpasses the second-best method by 0.43 dB and 0.34 dB in PSNR on the CelebA Mask-HQ and Helen datasets respectively, and by 0.01 in SSIM on CelebA Mask-HQ.
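The two evaluation indexes can be computed as follows. The PSNR matches the standard definition; the SSIM here is a simplified single-window (global) variant of the usual locally-windowed measure, shown only to make the formula concrete:

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = ((ref.astype(float) - rec.astype(float)) ** 2).mean()
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def ssim_global(ref, rec, peak=255.0):
    """Single-window (global) SSIM -- a simplification of the usual
    locally-windowed SSIM, enough to show the formula."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = ref.mean(), rec.mean()
    vx, vy = ref.var(), rec.var()
    cov = ((ref - mx) * (rec - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Production evaluation would use a windowed SSIM (e.g. 11×11 Gaussian windows) rather than this global form.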
B. Qualitative analysis
A face image is selected from the test set, and the reconstruction effect of the improved algorithm is compared with that of the original algorithm, as shown in Figure 9. The image reconstructed by the original algorithm shows obvious distortion in regions such as the eyes and lips; the improved algorithm raises the image quality in these regions and clearly reduces facial distortion, demonstrating the effectiveness of the improved algorithm in improving the quality of the reconstructed face image, reducing reconstruction distortion, and aiding face discrimination.
Face images are selected from the test set and reconstructed with the various algorithms; the overall reconstruction effect and locally magnified reconstructions are shown in Figure 10. The face images reconstructed by Bicubic interpolation ignore much detail information and are too blurry; the SRCNN algorithm uses a convolutional neural network and improves the structural similarity of the reconstruction relative to interpolation; the recovery of the EDSR algorithm is clearly better; the results recovered by URDGN are visibly distorted; FSRNet uses a more complex network structure and improves reconstructed image quality, but parts of the image remain over-smoothed. The invention improves the perceptual quality: the images better match human visual perception, have textures closer to the high-resolution images, and improve both PSNR and SSIM.
4. Ablation experiments
(1) Influence of attention module
To verify the role of the attention mechanism, the network was split into 2 comparison networks: one is the base network with the attention module added, the other the base network with the attention module removed. Each network was retrained, with the results shown in Table 2.
TABLE 2
From the table it can be concluded that every data index improves as attention modules are added, which proves the effectiveness of the attention mechanism for the face super-resolution reconstruction task. Notably, the experimental data show that the gain in each index slows as attention increases, since the attention modules have already collected enough features. At the same time, the number of ECA attention modules necessarily affects the load of the network; considering the trade-off between performance and computation, it is recommended to choose the number of ECAs according to the characteristics of the dataset itself. In the qualitative and quantitative experimental analyses, the best-performing network with 3 ECAs is compared with the other methods; to reduce training cost, an ECA number of 3 is also used for the other ablation studies.
(2) Influence of attention mechanisms and a priori information
To verify the effect of the attention mechanism and the prior information, the network was split into 2 comparison networks: one is the base network with the attention module and prior information added, the other the base network with both removed. Each network was retrained, with the results shown in Table 3.
TABLE 3
From the table it can be concluded that every data index improves as the attention module and prior information are added, which proves the effectiveness of the attention mechanism and the prior information for the face super-resolution reconstruction task. Notably, the gain in each index slows as attention and prior information increase, since enough features have already been acquired by the prior information and attention modules. At the same time, the number of ECA modules and the extraction of prior information necessarily affect the load of the network; considering the trade-off between performance and computation, it is recommended to choose the number of ECAs according to the characteristics of the dataset itself. In the qualitative and quantitative experimental analyses of this setting, the best-performing network with 1 ECA is compared with the other methods; to reduce training cost, an ECA number of 1 is used for the other ablation experiments in this group.
(3) Influence of the number of stacked hourglass networks
To verify the effect of stacking hourglass networks, the influence of the number of stacked hourglass modules in the prior estimation network on network performance was studied; each network was retrained, with the results shown in Table 4.
TABLE 4
From the table it can be concluded that every data index improves as the number of HourGlass modules increases, which proves the effectiveness of stacked HourGlass modules for the face super-resolution reconstruction task. Notably, the gain in each index slows as the number of HourGlass modules increases, since the prior information has already acquired enough features. Meanwhile, the number of HourGlass modules inevitably affects the load of the network; considering the trade-off between performance and computation, it is suggested to choose the number of HourGlass modules according to the characteristics of the dataset itself. Therefore, a HourGlass number of 2 was finally selected for the experiments, obtaining a good reconstruction effect.
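The stacked hourglass structure examined in this ablation — a symmetric encoder/decoder with skip connections between layers of matching scale — can be sketched as below, with average pooling and nearest-neighbor upsampling standing in for the learned convolutions of the real module:

```python
import numpy as np

def down(x):
    """2x average pooling of a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    """2x nearest-neighbor upsampling."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def hourglass(x, depth=2):
    """One hourglass: recurse down `depth` scales, come back up, and add
    the skip connection at each matching scale."""
    if depth == 0:
        return x
    skip = x                              # skip connection at this scale
    y = hourglass(down(x), depth - 1)
    return up(y) + skip

def stacked_hourglass(x, n=2, depth=2):
    """Stack n hourglass modules (the patent stacks 2)."""
    for _ in range(n):
        x = hourglass(x, depth)
    return x
```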
In summary, the invention provides a face super-resolution reconstruction method based on prior information and an attention mechanism to recover finer face images. A low-resolution image is input; convolution extracts shallow features of the image, and a group of residual blocks plus a convolution operation produce the shallow feature map, which is sent both into the deep feature extraction network and into the prior estimation network; an efficient channel attention module is added in the middle of the deep feature extraction network, improving the indexes and visual effect of face image restoration; finally, the outputs of the two branches are fed into the fine reconstruction network, which outputs a satisfactory super-resolution reconstructed image. The method is validated on public datasets and shown to be superior to some existing methods.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for the identical and similar parts, the embodiments may refer to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. The face super-resolution method based on prior information and an attention mechanism, characterized by comprising: a shallow feature extraction network, a deep feature extraction network, a prior estimation network, and a fine reconstruction network;
the low-resolution face image feature extraction method comprises a low-layer feature extraction network for extracting the low-layer features of a face image, and a convolution layer for extracting features of the low-resolution face image, wherein the convolution layer can only extract preliminary features from the low-resolution image to generate a relatively coarse high-resolution face image
the deep feature extraction network extracts the deep features of the face: the coarse high-resolution face image y_c is input into the deep feature extraction network F_DFE for deep feature extraction; F_DFE comprises a 3×3 convolution kernel with stride 2, followed by a batch normalization layer and a ReLU activation function, and then 12 residual blocks; finally, the extracted 64-channel feature map F_d is obtained. The formula is as follows:

F_d = F_DFE(y_c)

where y_c represents the coarse high-resolution face image and F_DFE represents the adopted deep feature extraction network;
the prior estimation network adopts 7x7 convolution check firstPerforming convolution, then performing normalization, reLU and other operations to obtain a 64x64 feature map, and connecting 3 residual blocks behind the obtained feature map; 2 HourGlass stacking HourGlass modules are constructed for priori information extraction, and in order to effectively merge features across scales and retain spatial information of different scales, the HourGlass modules adopt a jump connection mechanism between symmetrical layers; the 1x1 convolution layer post-processes the obtained features, connecting the shared features to two separate 1x1 convolution layers to generate a heatmap and a resolution map>The formula is as follows:
wherein,representing a coarse high resolution face image, < >>Representing the adopted prior estimation network;
fine rebuilding network, first mapping the feature mapAnd resolution map->Fusing the analysis chart and the feature chart to obtain a fused feature chart +.>The method comprises the steps of carrying out a first treatment on the surface of the The feature map is then->Inputting the reduced characteristic image into a fine reconstruction network, and firstly reducing the channel number of the characteristic image by using a 3X 3 convolution layer process; up-sampling the feature map by a 4x 4 deconvolution layer, connecting 3 residual blocks to decode the feature, and processing by a 3 x 3 deconvolution layer to obtain the feature map; finally, the feature map is sent to an ECA attention module to obtain a final fine super-resolution face image +.>
wherein the coarse reconstruction network comprises: nonlinear mapping through 3 residual blocks to generate a feature map; reconstruction from the attention-weighted feature map through a 3×3 convolution layer; finally, an ECA attention module consisting of 3 ECA modules is added after the convolution layer to generate the relatively coarse high-resolution face image y_c. The formula is as follows:

y_c = F_SFE(x_bic)

where x_bic represents the bicubic-upsampled low-resolution face image and F_SFE represents the adopted shallow feature extraction network;
wherein the fine reconstruction network comprises: first, the feature map F_d and the parsing map p̂ are fused to obtain the fused feature map F_f; the feature map F_f is then input into the fine reconstruction network, where a 3×3 convolution layer first reduces the number of channels, a 4×4 deconvolution layer upsamples the feature map, 3 connected residual blocks decode the features, and a 3×3 convolution layer processes the result; finally, the feature map is sent to an ECA attention module to obtain the final fine super-resolution face image y_f.
2. The face super-resolution method based on prior information and an attention mechanism according to claim 1, wherein the loss functions adopted in training the face super-resolution network comprise:
pixel loss:

in image super-resolution reconstruction, using the mean square error (MSE) loss can obtain higher evaluation indexes such as PSNR and SSIM, but high-frequency texture information is lost and the image becomes over-smoothed; to avoid this problem, the L1 loss is used as the pixel loss function:

L_c = (1/N) Σ_i ||y_i − ŷ_i^c||_1 ,  L_f = (1/N) Σ_i ||y_i − ŷ_i^f||_1
face prior loss:

in order to constrain the estimation process of the face prior information and make full use of it, the prior estimation network is optimized with the face prior loss:

L_prior = (1/N) Σ_i ||p_i − p̂_i||_2^2
total loss:

the total loss function finally used for model training is obtained by weighted combination:

L_total = λ_c·L_c + λ_f·L_f + λ_p·L_prior

where the prior loss adopts a mean square error loss function; N represents the total number of training-set images, y_i is the i-th high-resolution image, ŷ_i^c is the corresponding i-th coarse high-resolution restored image, and ŷ_i^f is the corresponding i-th fine high-resolution restored image; p_i represents the real face parsing map corresponding to the i-th image, and p̂_i represents the face parsing map estimated for the i-th image by the prior estimation network; λ_c, λ_f, λ_p are the weights of the respective terms.
3. The face super-resolution method based on prior information and an attention mechanism according to claim 1, comprising the following steps:
S1, downloading an original image dataset, which comprises original face images and original face parsing maps p, and performing data processing; inputting the processed original images into a downsampling model to obtain low-resolution images; performing bicubic upsampling on the low-resolution images to obtain images of the same size as the high-resolution images as the low-resolution dataset; finally, dividing the dataset into a training set and a test set;
S2, inputting the images obtained in S1 into the shallow feature extraction module to extract shallow features of the face image; a convolution layer extracts features from the low-resolution face image, but can only extract contour features of the face, yielding a coarse high-resolution image y_c;
S3, inputting the coarse high-resolution image y_c obtained in S2 into the deep feature extraction network for feature extraction to obtain the feature map F_d;
S4, inputting the coarse high-resolution image y_c obtained in S2 into the prior estimation network and extracting prior information to obtain the parsing map p̂, wherein the prior estimation network consists of a ResNet and stacked hourglass networks;
S5, fusing the feature map F_d obtained in S3 with the parsing map p̂ obtained in S4 to obtain the fused feature map F_f;
S6, inputting the feature map F_f obtained in S5 into the fine reconstruction network for super-resolution reconstruction to obtain the final fine reconstructed face image y_f;
S7, inputting the training-set image obtained in S2, the original high-resolution image y, and the final result y_f into a pixel-wise loss function, generating a fine high-resolution image through its processing and calculating the pixel loss; inputting the parsing map p̂ obtained in S4 and the parsing map p from the original image dataset into a pixel-wise loss function and calculating the prior loss L_prior; weighting and adding the above losses to obtain the total loss function L_total; iterating continuously to minimize the loss function during training, finally generating the face super-resolution network model;
S8, setting the hyperparameters of the face super-resolution network model, inputting the preprocessed test set of S1 into the model, and finally generating high-resolution face images with clear detail textures and better visual quality through residual network processing and loss-minimizing iteration.
CN202211528427.XA 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism Pending CN117315735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211528427.XA CN117315735A (en) 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211528427.XA CN117315735A (en) 2022-12-01 2022-12-01 Face super-resolution reconstruction method based on priori information and attention mechanism

Publications (1)

Publication Number Publication Date
CN117315735A true CN117315735A (en) 2023-12-29

Family

ID=89248630


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649343A (en) * 2024-01-29 2024-03-05 北京航空航天大学 Data uncertainty generation method and system based on conditional variation self-encoder
CN118333860A (en) * 2024-06-12 2024-07-12 济南大学 Residual enhancement type frequency space mutual learning face super-resolution method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination