CN113658040A - Face super-resolution method based on prior information and attention fusion mechanism


Info

Publication number: CN113658040A
Application number: CN202110794066.2A (priority)
Authority: CN (China)
Prior art keywords: resolution, image, network, feature, face
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113658040B
Inventors: 张九龙, 马仲杰, 屈小娥
Original assignee: Xi'an University of Technology
Current assignee: Beijing Hai Bai Sichuan Science and Technology Co., Ltd.
Application filed by Xi'an University of Technology; published as CN113658040A, granted and published as CN113658040B.

Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076: Scaling based on super-resolution, using the original low-resolution images to iteratively correct the high-resolution images
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/04: Computing arrangements based on biological models; neural network architecture, e.g. interconnection topology
    • G06N3/08: Neural network learning methods
    • G06T9/002: Image coding using neural networks
    • G06T2207/20081: Training; learning (indexing scheme for image analysis or enhancement)
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30201: Face (subject of image: human being; person)


Abstract

The invention discloses a face super-resolution method based on prior information and an attention fusion mechanism. The method constructs a training set and a test set; inputs the training set into a coarse super-resolution network to obtain I_SR1; inputs I_SR1 into an encoder network and a prior-information extraction network to obtain a feature map f and a parsing map p, respectively; inputs the feature map f and the parsing map p into a feature fusion network to obtain the fused feature map f_Fusion; inputs f_Fusion into a decoder network for decoding to obtain the final result I_SR; and constructs a joint loss function that is iteratively minimized until training generates the super-resolution network model. The invention addresses the insufficient use of face prior information: it fuses the feature map and the parsing map with an attention mechanism, fusing the parsing map and feature map corresponding to each facial component separately, which strengthens the guidance the parsing map provides for face-image super-resolution, improves reconstruction efficiency, and enhances the reconstruction effect.

Description

Face super-resolution method based on prior information and attention fusion mechanism
Technical Field
The invention belongs to the technical field of digital image processing methods, and relates to a face super-resolution method based on prior information and an attention fusion mechanism.
Background
Image super-resolution is a very important research problem in the fields of computer vision and image processing. Applying image super-resolution reconstruction to face images is called face hallucination or face super-resolution (SR), a super-resolution problem specific to the face-image domain. In many practical situations face images are of low quality, limited by the physical imaging system and by human factors. Such images are often low in resolution and poor in identifiability, which hinders communication, criminal investigation and case solving, security enhancement, and the like, so face super-resolution has important research significance. With the development of deep learning, face-image super-resolution methods based on deep learning have achieved good results. The mainstream deep-learning-based face super-resolution methods currently include the following types: super-resolution based on CNN networks, super-resolution based on GAN networks, super-resolution based on reinforcement learning, super-resolution based on ensemble learning, and face super-resolution guided by prior information.
The first four of these methods treat the face image as a generic image, on the premise that super-resolution algorithms for generic images also apply to face images. Face images, however, are strongly structured and have salient characteristics: face prior information such as facial landmarks, face parsing maps, and facial heatmaps is available, so more accurate methods can be designed specifically for them. Generic face super-resolution methods ignore this prior information and generate face images with blurred facial structure. In addition, most existing networks extract image features through convolution and treat every channel and spatial position equally; since features differ in importance, this equal treatment makes the network spend considerable computing resources on unimportant features.
Disclosure of Invention
The invention aims to provide a face super-resolution method based on prior information and an attention fusion mechanism that solves the insufficient use of face prior information in the prior art and effectively improves the quality of face-image super-resolution reconstruction, as measured by PSNR and SSIM.
The technical scheme adopted by the invention is a face super-resolution method based on prior information and an attention fusion mechanism, implemented according to the following steps:
Step 1, build an original image data set and apply data enhancement; input the enhanced face images into a degradation model to obtain a low-resolution image data set; bicubically upsample each low-resolution image to the size of the high-resolution image, forming the low-resolution data set; finally, divide the data set into a training set and a test set;
Step 2, input the images obtained in step 1 into a coarse super-resolution network to obtain the coarsely super-resolved image I_SR1;
Step 3, input the training-set image I_SR1 obtained in step 2 into an encoder network for feature extraction, obtaining a feature map f;
Step 4, input the image I_SR1 obtained in step 2 into a prior-information extraction network to extract prior information, obtaining a parsing map p; the prior-information extraction network consists of a ResNet and a stacked hourglass network;
Step 5, input the feature map f from step 3 and the parsing map p from step 4 into a feature fusion network, which fuses the parsing map with the feature map to obtain the fused feature map f_Fusion;
Step 6, input the feature map f_Fusion from step 5 into a decoder network for decoding, obtaining the final super-resolution result I_SR;
Step 7, input the I_SR1 from step 2 and the original image into a pixel-wise loss function to obtain l1; input the parsing map p from step 4 and the ground-truth parsing map p̃ from the original data set into the pixel-wise loss function to obtain l2; input the final result I_SR from step 6 and the original image into the pixel-wise loss function to obtain l3; add these losses to obtain L_total. Iterate continuously to minimize the loss function, finally training the super-resolution network model;
Step 8, set the hyper-parameters of the super-resolution network model, input the preprocessed test-set images from step 1 into the model, and through residual-network processing and loss-minimizing iteration finally generate high-resolution face images with clear detail and texture.
The present invention is also characterized in that:
the step 1 specifically comprises the following steps:
step 1.1, downloading a CelebAMask-HQ data set, wherein a total amount of 30000 high-definition face images of 1024x1024 are obtained, and cutting the images into 128x128 images by using a resize function of matlab as the size of an original image, so that the calculation amount is reduced.
And step 1.2, carrying out mirror image turning on all images in the data set to obtain 60000 human face images and obtain a human face data set with enhanced data.
And step 1.3, performing degradation processing on the data set obtained in the step 1.2, inputting all images in the data set into a prepared degradation model in advance to generate a corresponding low-resolution face image, and simulating a degradation process in reality.
The degradation function is particularly complex and the super-resolution is difficult because many factors (including blur, noise, etc.) in the actual environment can reduce the resolution of the image. Therefore, in the existing super-resolution technology research, the degradation process is simplified, only blurring, down-sampling and noise are considered, as shown in formula 1,
Figure BDA0003162154820000041
wherein k represents a fuzzy core, which means that the fuzzy core performs convolution operation on the high-resolution face image, ↓isdownsampling operation, s represents a downsampling factor, and n represents noise. Thus, the degradation process can be described as blurring the high resolution face image, then 8 times down-sampling the blurred image, and then adding noise to the resulting image to obtain a degraded low resolution face image with a size of 16x 16.
Step 1.4, carrying out double-thrice upsampling operation on the low-resolution face image obtained in the step 1.3 to obtain a low-resolution face image I with the size consistent with that of the original imageLRAnd the size is 128x 128.
Step 1.5, according to 6: 2: 2 divide the data set in step 1.4 into a training set, a validation set and a test set. Wherein 36000 images are in the training set, and 12000 images are in the verification set and the test set.
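The step 1.3-1.4 pipeline can be sketched as follows. This is a minimal illustration only: formula 1 does not fix the blur or noise type, so a Gaussian kernel and additive Gaussian noise are assumed, and all function names and parameter values here are hypothetical.

    import torch
    import torch.nn.functional as F

    def gaussian_kernel(ksize=7, sigma=1.6):
        # An assumed blur kernel; formula 1 leaves k unspecified.
        ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
        g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
        k = torch.outer(g, g)
        return k / k.sum()

    def degrade(hr, kernel, scale=8, noise_sigma=0.01):
        """Formula 1 followed by step 1.4: blur, downsample by s=8, add noise,
        then bicubically upsample back to the 128x128 original size."""
        c = hr.shape[1]
        k = kernel.repeat(c, 1, 1, 1)                        # one kernel per channel
        pad = kernel.shape[0] // 2
        blurred = F.conv2d(F.pad(hr, [pad] * 4, mode="reflect"), k, groups=c)
        lr = blurred[:, :, ::scale, ::scale]                 # 128x128 -> 16x16
        lr = (lr + noise_sigma * torch.randn_like(lr)).clamp(0, 1)
        return F.interpolate(lr, scale_factor=scale, mode="bicubic",
                             align_corners=False)            # I_LR at 128x128

    i_lr = degrade(torch.rand(4, 3, 128, 128), gaussian_kernel())  # (4, 3, 128, 128)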
Step 2 specifically comprises the following steps:
Apply coarse super-resolution to the low-resolution face image I_LR obtained in step 1.5; that is, feed I_LR into the CoarseSRNet network to obtain I_SR1, as shown in formula 2:
I_SR1 = CoarseSRNet(I_LR)    (2)
where I_LR denotes the low-resolution image after bicubic upsampling and CoarseSRNet denotes the coarse super-resolution network employed.
The CoarseSRNet network in step 2 uses 3x3 convolution kernels with a ReLU activation function and 64 filters to generate 64 feature maps; a final 3x3 convolution yields the coarse super-resolution result I_SR1, whose size remains 128x128.
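A minimal sketch of CoarseSRNet consistent with this paragraph follows; the patent fixes the kernel size (3x3), the activation (ReLU), and the filter count (64), but not the depth, so the number of layers is an assumption.

    import torch
    import torch.nn as nn

    class CoarseSRNet(nn.Module):
        def __init__(self, n_layers=3, n_filters=64):    # depth is an assumption
            super().__init__()
            layers = [nn.Conv2d(3, n_filters, 3, padding=1), nn.ReLU(inplace=True)]
            for _ in range(n_layers - 1):
                layers += [nn.Conv2d(n_filters, n_filters, 3, padding=1),
                           nn.ReLU(inplace=True)]
            layers.append(nn.Conv2d(n_filters, 3, 3, padding=1))  # final 3x3 conv
            self.body = nn.Sequential(*layers)

        def forward(self, i_lr):
            return self.body(i_lr)       # I_SR1: same 128x128 size as the input

    i_sr1 = CoarseSRNet()(torch.rand(1, 3, 128, 128))     # (1, 3, 128, 128)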
Step 3 specifically comprises the following steps, as shown in formula 3:
f = Encoder(I_SR1)    (3)
Step 3.1, input the I_SR1 obtained in step 2 into a feature extraction network that uses an encoder structure. The encoder uses 64 3x3 convolution kernels with stride 2 followed by batch normalization, downsampling the input image I_SR1 to 64x64 and producing a 64-channel feature map of size 64x64, realizing the mapping from image space to feature space.
Step 3.2, combine an attention mechanism with residual blocks to form a residual attention network for feature extraction. Input the feature map from step 3.1 into the residual attention network to extract deep features, obtaining a multi-channel feature map.
Step 3.3, input the feature map from step 3.2 into a 3x3 convolution layer; after convolution, normalization, and a Tanh activation function, the extracted feature map f is obtained, with 64 channels and size 64x64.
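The encoder can be sketched as follows, using the 12 residual attention blocks given later in the embodiment (step 3.2.2). The block internals are simplified to a squeeze-and-excitation-style channel attention, one plausible reading of the RCAN-inspired design; the patent does not spell them out.

    import torch
    import torch.nn as nn

    class RAB(nn.Module):
        """Residual attention block: residual block plus channel attention."""
        def __init__(self, ch=64, reduction=16):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))
            self.ca = nn.Sequential(                  # channel-wise weights in (0, 1)
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

        def forward(self, x):
            res = self.body(x)
            return x + res * self.ca(res)             # reweight, then residual add

    class Encoder(nn.Module):
        """Step 3: stride-2 conv + BN (128 -> 64), RABs, then 3x3 conv + Tanh."""
        def __init__(self, n_blocks=12):
            super().__init__()
            self.head = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1),
                                      nn.BatchNorm2d(64), nn.ReLU(inplace=True))
            self.blocks = nn.Sequential(*[RAB() for _ in range(n_blocks)])
            self.tail = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                      nn.BatchNorm2d(64), nn.Tanh())

        def forward(self, i_sr1):
            return self.tail(self.blocks(self.head(i_sr1)))

    f = Encoder()(torch.rand(1, 3, 128, 128))          # feature map f: (1, 64, 64, 64)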
Step 4 specifically comprises the following steps, as shown in formula 4:
p = PriorEstimate(I_SR1)    (4)
Step 4.1, input the coarse super-resolution result I_SR1 obtained in step 2 into the prior-information extraction network: convolve I_SR1 with 128 7x7 convolution kernels, then apply normalization and a ReLU operation to obtain 128 feature maps of size 64x64;
Step 4.2, construct a stacked hourglass network for prior-information extraction, stacking 4 hourglass networks to extract the face parsing map. To merge features across scales effectively and retain spatial information at different scales, the stacked hourglass network uses a skip-connection mechanism at symmetric layers. The resulting features are post-processed by a 1x1 convolution layer; finally, the shared features are connected to two separate 1x1 convolution layers to generate a landmark heatmap and a parsing map.
Step 4.3, input the feature maps obtained in step 4.1 into the stacked hourglass network to obtain a 128-channel face parsing map p of size 128x64x64.
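A compact sketch of the prior-information extraction network follows: the 7x7, 128-filter convolution head (taken to use stride 2, since it maps the 128x128 input to 64x64 features), four stacked hourglasses with skips at symmetric layers, a shared 1x1 post-processing layer, and two separate 1x1 heads. The hourglass internals and the landmark-heatmap channel count (68) are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def conv_block(cin, cout):
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                             nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

    class Hourglass(nn.Module):
        """Recursive downsample/upsample with a skip at each symmetric layer."""
        def __init__(self, depth=4, ch=128):
            super().__init__()
            self.skip = conv_block(ch, ch)
            self.down = conv_block(ch, ch)
            self.inner = Hourglass(depth - 1, ch) if depth > 1 else conv_block(ch, ch)
            self.up = conv_block(ch, ch)

        def forward(self, x):
            y = self.up(self.inner(self.down(F.max_pool2d(x, 2))))
            return self.skip(x) + F.interpolate(y, scale_factor=2)  # symmetric skip

    class PriorEstimate(nn.Module):
        def __init__(self, n_stacks=4, parse_ch=128, landmark_ch=68):
            super().__init__()
            self.head = nn.Sequential(nn.Conv2d(3, 128, 7, stride=2, padding=3),
                                      nn.BatchNorm2d(128), nn.ReLU(inplace=True))
            self.stacks = nn.Sequential(*[Hourglass() for _ in range(n_stacks)])
            self.post = nn.Conv2d(128, 128, 1)         # shared 1x1 post-processing
            self.to_parse = nn.Conv2d(128, parse_ch, 1)         # parsing map p
            self.to_landmark = nn.Conv2d(128, landmark_ch, 1)   # landmark heatmap

        def forward(self, i_sr1):
            shared = self.post(self.stacks(self.head(i_sr1)))   # (B, 128, 64, 64)
            return self.to_parse(shared), self.to_landmark(shared)

    p, heatmap = PriorEstimate()(torch.rand(1, 3, 128, 128))    # p: (1, 128, 64, 64)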
Step 5 specifically comprises the following steps:
Input the 64-channel feature map f obtained in step 3.3 and the 128-channel face parsing map p obtained in step 4.3 into the feature fusion network, which fuses parsing map and feature map to obtain the fused feature map f_Fusion of size 64x64x11. It has 11 channels, and each channel's features correspond to one facial component: face skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, mouth, upper lip, and lower lip, 11 facial components in total.
Step 5.1, construct the feature fusion network, which mainly comprises three parts. The first part consists of 1x1 convolutions that reduce the dimensionality of the face parsing map. The second part consists of the attention module CBAM, which weights the feature maps through a channel attention mechanism and a spatial attention mechanism to obtain feature maps describing the 11 different facial components. The third part combines each component's weighted feature map with its parsing map and averages, yielding the final fused feature map f_Fusion.
Step 5.2, use 11 1x1 convolution kernels to reduce the 128-channel face parsing map p obtained in step 4.3 to 11 channels, obtaining p_j, where j ranges from 1 to 11 and each p_j represents the parsing map of one facial component; this is constrained by the parsing-map loss l2 defined in step 7.
Step 5.3, process the feature map with an attention mechanism to obtain a weighted feature map for each facial component, then concatenate them.
An attention module is formed by a channel attention mechanism and a spatial attention mechanism in series; it automatically learns the importance of different spatial positions and different channels within each feature and multiplies them by different weights, promoting useful features and suppressing features unimportant to the current task.
Step 5.4, execute step 5.3 cyclically 11 times, weighting the feature maps corresponding to the 11 facial components to obtain the attention-processed features f_j of size 64x64x64, with j from 1 to 11; each f_j is then fused with the parsing map of the corresponding facial component.
Step 5.5, perform a weighted-average operation between each face parsing map p_j obtained in step 5.2 and the attention-processed feature map f_j with the corresponding index obtained in step 5.4, obtaining the fused feature map f_Fusion^j of size 64x64x1, as shown in formula 5:
f_Fusion^j = Mean(Cbam(f_j) ⊙ p_j)    (5)
where f_Fusion^j denotes the fused features of the j-th channel, Mean denotes the cross-channel averaging operation, Cbam denotes the attention weighting applied to f_j, and ⊙ denotes element-by-element multiplication.
Step 5.6, concatenate the fused feature maps f_Fusion^j obtained in step 5.5 to obtain the output of the final feature fusion network, f_Fusion, of size 64x64x11, as shown in formula 6:
f_Fusion = cat(f_Fusion^1, f_Fusion^2, ..., f_Fusion^11)    (6)
where f_Fusion denotes the output of the final feature fusion network, cat denotes the concatenation operation, and f_Fusion^j denotes the fused features corresponding to the j-th facial component, with j from 1 to 11 indexing the 11 facial components.
Step 6 specifically comprises:
Input the fused feature map f_Fusion obtained in step 5.6 into the decoder for decoding; after the network's convolution, normalization, and ReLU activation, a deconvolution layer is added for upsampling, and a final 3x3 convolution yields the result I_SR. Meanwhile, skip connections splice the low-resolution image I_LR and the coarsely super-resolved image I_SR1 with the output of the feature fusion module, achieving a better reconstruction.
Step 7 specifically comprises:
Step 7.1, define the joint loss function, as shown in formula 7:
L_total = (1/N) Σ_{i=1}^{N} ( ||I_SR1^(i) - hr^(i)||^2 + λ||p^(i) - p̃^(i)||^2 + ||I_SR^(i) - hr^(i)||^2 )    (7)
where the loss function is the mean-squared-error loss, N denotes the number of images in the training set, hr^(i) denotes the high-resolution image corresponding to the i-th low-resolution image, I_SR1^(i) denotes the result of the i-th image after coarse super-resolution processing, p̃^(i) denotes the ground-truth parsing map corresponding to the i-th image, p^(i) denotes the parsing map estimated for the i-th image by the prior-information estimation network, and I_SR^(i) denotes the final result of the i-th image after super-resolution processing.
Step 7.2, input the I_SR1 output in step 2, the original image hr, the ground-truth parsing map p̃, the parsing map p extracted by the network, and the final result I_SR into the pixel-wise loss function; the high-resolution image is generated through pixel-wise loss processing, and the loss function is continuously iteratively minimized.
Step 7.3, iterate step 7.2 continuously, take the set of weight parameters that minimizes the joint loss function L_total as the trained model parameters, and obtain the trained super-resolution network model.
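The joint loss of formula 7 can be sketched as below. Placing λ on the parsing-map term follows the embodiment's statement that λ is set to 0.8 empirically; this weighting is our reading of the formula, not a verbatim reproduction.

    import torch
    import torch.nn.functional as F

    def joint_loss(i_sr1, i_sr, p, hr, p_gt, lam=0.8):
        l1 = F.mse_loss(i_sr1, hr)    # coarse result vs. original image
        l2 = F.mse_loss(p, p_gt)      # estimated vs. ground-truth parsing map
        l3 = F.mse_loss(i_sr, hr)     # final result vs. original image
        return l1 + lam * l2 + l3     # L_total

    # Example shapes: images (B, 3, 128, 128); parsing maps (B, 128, 64, 64).
    loss = joint_loss(torch.rand(2, 3, 128, 128), torch.rand(2, 3, 128, 128),
                      torch.rand(2, 128, 64, 64), torch.rand(2, 3, 128, 128),
                      torch.rand(2, 128, 64, 64))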
Step 8 specifically comprises:
Train the model with the RMSprop algorithm, input the test-set data preprocessed in step 1 into the model generated in step 7.3, and through residual-network processing and joint-loss-minimizing iteration finally generate the super-resolved high-definition face image.
The invention has the following beneficial effects:
(1) The method introduces a channel attention mechanism into the residual block for feature extraction, so that the network learns purposefully and adaptively adjusts feature-channel information, strengthening the expressive power of the features and helping recover more details such as contours and textures.
(2) The method fuses the feature map and the parsing map with an attention mechanism, fusing the parsing map and feature map corresponding to each facial component separately. This strengthens the guidance the parsing map provides for face-image super-resolution, exploits the extracted useful features more effectively, and suppresses useless features. The network can allocate computing resources accurately according to the weights, improving reconstruction efficiency and enhancing the reconstruction effect.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the super-resolution network used in the face super-resolution method based on prior information and an attention fusion mechanism according to the present invention;
FIG. 2 is a schematic diagram of the feature extraction network structure used in the face super-resolution method based on prior information and an attention fusion mechanism according to the present invention;
FIG. 3 is a schematic diagram of the prior-information extraction network used in the face super-resolution method based on prior information and an attention fusion mechanism according to the present invention;
FIG. 4 is a schematic diagram of the feature fusion module used in the face super-resolution method based on prior information and an attention fusion mechanism according to the present invention;
FIG. 5 is a diagram of conventional face priors;
FIG. 6 is a schematic diagram of the face parsing map used in the face super-resolution method based on prior information and an attention fusion mechanism according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses a face super-resolution method based on prior information and an attention fusion mechanism, implemented according to the following steps:
Step 1, build an original image data set and apply data enhancement; input the enhanced face images into a degradation model to obtain a low-resolution image data set; bicubically upsample each low-resolution image to the size of the high-resolution image, forming the low-resolution data set; and divide the data set into a training set and a test set.
Step 2, input the images obtained in step 1 into a coarse super-resolution network to obtain the coarsely super-resolved image I_SR1.
Step 2 specifically comprises: apply coarse super-resolution to the low-resolution face image I_LR obtained in step 1.5; that is, feed I_LR into the CoarseSRNet network to obtain I_SR1, as shown in formula 2:
I_SR1 = CoarseSRNet(I_LR)    (2)
where I_LR denotes the low-resolution image after bicubic upsampling and CoarseSRNet denotes the coarse super-resolution network employed; the LR image is thus coarsely super-resolved.
Step 3, input the training-set image I_SR1 obtained in step 2 into the encoder network for feature extraction, obtaining the feature map f, as shown in formula 3:
f = Encoder(I_SR1)    (3)
This specifically comprises:
Step 3.1, input the I_SR1 obtained in step 2 into a feature extraction network that uses an encoder structure. Considering the computational cost, the parsing map is downsampled to 64x64; to keep the feature sizes consistent, the encoder uses 64 3x3 convolution kernels with stride 2 followed by batch normalization, downsampling the input image I_SR1 to 64x64 and producing a 64-channel feature map of size 64x64, realizing the mapping from image space to feature space.
Step 3.2, combine an attention mechanism with residual blocks to form a residual attention network for feature extraction. Input the feature map from step 3.1 into the residual attention network to extract deep features, obtaining a multi-channel feature map.
Step 3.3, input the feature map from step 3.2 into a 3x3 convolution layer; after convolution, normalization, and a Tanh activation function, the extracted feature map f is obtained, with 64 channels and size 64x64.
Step 4, input the image I_SR1 obtained in step 2 into the prior-information extraction network to extract prior information, obtaining the parsing map p. The prior-information extraction network consists of a ResNet and a stacked hourglass network. Specifically, as shown in formula 4:
p = PriorEstimate(I_SR1)    (4)
Step 4.1, input the coarse super-resolution result I_SR1 obtained in step 2 into the prior-information extraction network: convolve I_SR1 with 128 7x7 convolution kernels, then apply normalization and a ReLU operation to obtain 128 feature maps of size 64x64;
Step 4.2, adopt a stacked hourglass network to extract the parsing map, stacking 4 hourglass networks for extracting the face parsing map. The resulting features are post-processed by a 1x1 convolution layer; finally, the shared features are connected to two separate 1x1 convolution layers to generate a landmark heatmap and a parsing map.
Step 4.3, input the feature maps obtained in step 4.1 into the stacked hourglass network to obtain a 128-channel face parsing map p of size 128x64x64.
Step 5, input the feature map f obtained in step 3 and the parsing map p obtained in step 4 into the feature fusion network, which fuses parsing map and feature map to obtain the fused feature map f_Fusion. This specifically comprises:
Input the 64-channel feature map f obtained in step 3.3 and the 128-channel face parsing map p obtained in step 4.3 into the feature fusion network to obtain the fused feature map f_Fusion of size 64x64x11. It has 11 channels, and each channel's features correspond to one facial component: face skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, mouth, upper lip, and lower lip, 11 facial components in total.
Step 5.1, construct the feature fusion network, which mainly comprises three parts. The first part consists of 1x1 convolutions that reduce the dimensionality of the face parsing map. The second part consists of the attention module CBAM, which weights the feature maps through a channel attention mechanism and a spatial attention mechanism to obtain feature maps describing the 11 different facial components. The third part combines each component's weighted feature map with its parsing map and averages, yielding the final fused feature map f_Fusion.
Step 5.2, use 11 1x1 convolution kernels to reduce the 128-channel face parsing map p obtained in step 4.3 to 11 channels, obtaining p_j, where j ranges from 1 to 11 and each p_j represents the parsing map of one facial component; this is constrained by the parsing-map loss l2 defined in step 7.
Step 5.3, process the feature map with an attention mechanism to obtain a weighted feature map for each facial component, then concatenate them.
An attention module is formed by a channel attention mechanism and a spatial attention mechanism in series. The 64-channel feature map obtained in step 3.3 is input into the channel attention module for weighting: the importance of each feature channel is learned automatically and each channel is multiplied by its weight. The channel-attended feature map is then input into the spatial attention module, which in a similar learned manner obtains the importance of different spatial positions within each feature and multiplies them by different weights, promoting useful features and suppressing features unimportant to the current task.
Step 5.4, execute step 5.3 cyclically 11 times, weighting the feature maps corresponding to the 11 facial components to obtain the attention-processed features f_j of size 64x64x64, with j from 1 to 11; each f_j is then fused with the parsing map of the corresponding facial component.
Step 5.5, perform a weighted-average operation between each face parsing map p_j obtained in step 5.2 and the attention-processed feature map f_j with the corresponding index obtained in step 5.4, obtaining the fused feature map f_Fusion^j of size 64x64x1, as shown in formula 5:
f_Fusion^j = Mean(Cbam(f_j) ⊙ p_j)    (5)
where f_Fusion^j denotes the fused features of the j-th channel, Mean denotes the cross-channel averaging operation, Cbam denotes the attention weighting applied to f_j, and ⊙ denotes element-by-element multiplication.
Step 5.6, concatenate the fused feature maps f_Fusion^j obtained in step 5.5 to obtain the output of the final feature fusion network, f_Fusion, of size 64x64x11, as shown in formula 6:
f_Fusion = cat(f_Fusion^1, f_Fusion^2, ..., f_Fusion^11)    (6)
where f_Fusion denotes the output of the final feature fusion network, cat denotes the concatenation operation, and f_Fusion^j denotes the fused features corresponding to the j-th facial component, with j from 1 to 11 indexing the 11 facial components.
Step 6, input the feature map f_Fusion obtained in step 5 into the decoder network for decoding, obtaining the final super-resolution result I_SR.
Step 7, input the I_SR1 obtained in step 2 and the original image into a pixel-wise loss function to obtain l1; input the parsing map p obtained in step 4 and the ground-truth parsing map p̃ from the original data set into the pixel-wise loss function to obtain l2; input the final result I_SR obtained in step 6 and the original image into the pixel-wise loss function to obtain l3; add these losses to obtain L_total. Iterate continuously to minimize the loss function and train to generate the super-resolution network model.
the step 7 is specifically that,
step 7.1, define the joint loss function, as shown in equation 7,
Figure BDA0003162154820000151
wherein, the loss function adopts a mean square error loss function, N represents the number of images in the training set, hr(i)The high resolution image corresponding to the ith low resolution image is shown.
Figure BDA0003162154820000152
Showing the result of the i-th image after the rough super-resolution processing.
Figure BDA0003162154820000153
Representing the true analytic graph, p, corresponding to the ith image(i)Representing a real face analysis graph obtained by the ith image through a prior information estimation network;
Figure BDA0003162154820000154
and the final result obtained after the ith image is subjected to super-resolution processing is shown.
Step 7.2, input the I_SR1 output in step 2, the original image hr, the ground-truth parsing map p̃, the parsing map p extracted by the network, and the final result I_SR into the pixel-wise loss function; the high-resolution image is generated through pixel-wise loss processing, the loss function is continuously iteratively minimized, and training finally generates the super-resolution network model.
Step 7.3, iterate step 7.2 continuously, take the set of weight parameters that minimizes the joint loss function L_total as the trained model parameters, and obtain the trained super-resolution network model.
Step 8, set the hyper-parameters of the super-resolution network model, input the preprocessed test-set images from step 1 into the model, and through residual-network processing and loss-minimizing iteration finally generate high-resolution face images with clear detail and texture.
Examples
A face super-resolution method based on prior information and an attention fusion mechanism, as shown in FIG. 1, is implemented according to the following steps:
Step 1, build an original image data set and apply data enhancement; input the enhanced face images into a degradation model to obtain a low-resolution image data set; bicubically upsample each low-resolution image to the size of the high-resolution image, forming the low-resolution data set; and divide the data set into a training set and a test set. Specifically:
Step 1.1, download the CelebAMask-HQ data set and resize the images to 128x128 with MATLAB's resize function as the original image size.
Step 1.2, mirror-flip all images in the data set for data enhancement.
Step 1.3, input the data set obtained in step 1.2 into a pre-built degradation model to generate the corresponding low-resolution face images, simulating real-world degradation, as shown in formula 1:
I_LR = (I_HR ⊗ k)↓s + n    (1)
where k denotes the blur kernel convolved with the high-resolution face image I_HR, ↓ denotes the downsampling operation, s the downsampling factor, and n the noise; the processed, degraded low-resolution face image of size 16x16 is obtained.
Step 1.4, bicubically upsample the low-resolution face image obtained in step 1.3 to obtain a low-resolution face image I_LR whose size, 128x128, matches the original image.
Step 1.5, divide the data set from step 1.4 into a training set, a validation set, and a test set: 36000 images in the training set and 12000 each in the validation and test sets.
Step 2, input the images obtained in step 1 into the coarse super-resolution network to obtain the coarsely super-resolved image I_SR1.
Step 2 specifically comprises the following steps:
To support accurate subsequent feature extraction and prior-information extraction, first apply coarse super-resolution to the low-resolution face image I_LR obtained in step 1.5; that is, feed I_LR into the CoarseSRNet network to obtain I_SR1, as shown in formula 2:
I_SR1 = CoarseSRNet(I_LR)    (2)
where I_LR denotes the low-resolution image after bicubic upsampling and CoarseSRNet denotes the coarse super-resolution network employed. The general image super-resolution network SRCNN is simplified to serve as CoarseSRNet, and the LR image is coarsely super-resolved with it.
The CoarseSRNet network in step 2 uses 3x3 convolution kernels with a ReLU activation function and 64 filters to generate 64 feature maps; a final 3x3 convolution yields the coarse super-resolution result I_SR1, whose size remains 128x128.
Step 3, input the training-set image I_SR1 obtained in step 2 into the encoder network for feature extraction, obtaining the feature map f, as shown in FIG. 2.
This specifically comprises the following steps:
Step 3.1, input the I_SR1 obtained in step 2 into a feature extraction network that uses an encoder structure, as shown in formula 3:
f = Encoder(I_SR1)    (3)
Considering the computational cost, the parsing map is downsampled to 64x64. Therefore, to keep the feature sizes consistent, the encoder uses 64 3x3 convolution kernels with stride 2 followed by batch normalization, downsampling the input image I_SR1 to 64x64 and producing a 64-channel feature map of size 64x64, realizing the mapping from image space to feature space.
Step 3.2, inspired by the residual channel attention network (RCAN), combine an attention mechanism with residual blocks to form a residual attention network for feature extraction. Input the feature map from step 3.1 into the residual attention network to extract deep features, obtaining a multi-channel feature map.
Step 3.2 is specifically as follows:
Step 3.2.1, traditional deep learning methods treat channel domains of different importance equally, so a large amount of computing resources is wasted on unimportant features. To solve this, a residual attention block (RAB) is constructed by introducing a channel attention mechanism into the residual block, so that the network learns purposefully, extracts effective features more efficiently, and suppresses useless ones. The attention mechanism captures the weight information implied in the channel domain, allocating computing resources more efficiently and accelerating network convergence;
Step 3.2.2, combine 12 residual attention blocks (RABs) to form the residual attention network, as shown in FIG. 2.
Step 3.2.3, input the multi-channel feature map obtained in step 3.1 into the residual attention network to extract deep features.
Step 3.3, input the feature map obtained in step 3.2.3 into a 3x3 convolution layer; after convolution, normalization, and a Tanh activation function, the extracted feature map f is obtained, with 64 channels and size 64x64.
Step 4, input the image I_SR1 obtained in step 2 into the prior-information extraction network to extract prior information, obtaining the parsing map p, as shown in FIG. 3. The prior-information extraction network consists of a ResNet and a stacked hourglass network. Specifically:
the prior information of the human face mainly comprises a human face landmark image, a landmark heat map, a human face analytic map and the like, and due to the fact that under the condition that the resolution of the image is too small, key points of the human face are not accurate enough, the subsequent prior information can influence the process of guiding the super-resolution of the human face. Therefore, the face analysis graph is selected as the face prior information instead of the face key point, the above three kinds of face prior information are shown in figure 5,
step 4.1, the result I after the rough super resolution obtained in the step 2 is processedSR1The global feature is input into a priori information extraction network, and generally, the larger the convolution kernel is, the larger the receptive field is, and the better the obtained global feature is. So 128 convolution checks I of 7x7 are usedSR1Convolution is carried out, and then 128 feature maps of 64x64 are obtained through normalization and ReLu operation, as shown in formula 4,
p=PriorEstimate(ISR1) (4)
and 4.2, constructing a stacked hourglass network for prior information extraction. Inspired by the latest success of the stacked hourglass network in human body posture estimation, the stacked hourglass network is adopted to extract an analytic graph. And stacking 4 hourglass networks for extracting the face analysis graph. Since the analytic graph is two-dimensional, in the a priori estimation network, all features except the last layer are shared between the two tasks. In order to effectively merge features across scales and retain spatial information of different scales, the stacked hourglass network adopts a jump connection mechanism at a symmetrical layer time. The resulting features were post-processed followed by a 1x1 convolutional layer. Finally, the shared features are concatenated to two separate 1 × 1 convolutional layers to generate a landmark heat map and a parse map.
And 4.3, inputting the feature map obtained in the step 4.1 into a stacked hourglass network, and processing to obtain a face analysis map p with 128 channels, wherein the size of the face analysis map p is 128x64x 64.
Step 5, inputting the feature diagram f obtained in the step 3 and the analysis diagram p obtained in the step 4 into a feature fusion network for fusion of the analysis diagram and the feature diagram to obtain a fused feature diagram fFusion. As shown in fig. 4, the feature fusion module specifically includes:
inputting the feature map f of the 64 channels obtained in the step 4.3 and the face analysis map p of the 128 channels obtained in the step five into a feature fusion network for fusion of the analysis maps and the feature maps to obtain a fused feature map fFusionThe size of the face component is 64x64x11, there are 11 channels, and each channel is characterized by a face component, which is face skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, mouth, upper lip, lower lip, and 11 face components in total, as shown in fig. 6.
Step 5.1, constructing a feature fusion network, which mainly comprises three parts, wherein the first part is formed by 1x1 convolution and is used for carrying out dimension reduction processing on a face analysis graph; the second part is composed of an attention module CBAM, and the feature maps are weighted through a channel attention mechanism and a space attention mechanism to obtain feature maps describing 11 different face components; the third part is that the feature graph f after final fusion is obtained by respectively adding and averaging the feature graph describing different face components and the analysis graphFusion
Step 5.2, use 11 1x1 convolution kernels to reduce the 128-channel face parsing map p obtained in step 4.3 to 11 channels, obtaining p_j, where j ranges from 1 to 11 and each p_j represents the parsing map of one facial component; this is constrained by the parsing-map loss l2 defined in step 7.
Step 5.3, in existing approaches the facial structure may not be fully exploited, because the features of different facial components are typically extracted by a shared network; prior information present in different facial components may thus be ignored. Different facial regions should therefore be restored separately for better performance. Accordingly, the feature map is processed with an attention mechanism to obtain a weighted feature map for each facial component, and the results are then concatenated.
An attention module is formed by a channel attention mechanism and a spatial attention mechanism in series. The 64-channel feature map obtained in step 3.3 is input into the channel attention module for weighting: the importance of each feature channel is learned automatically and each channel is multiplied by its weight. The channel-attended feature map is then input into the spatial attention module, which in a similar learned manner obtains the importance of different spatial positions within each feature and multiplies them by different weights, promoting useful features and suppressing features unimportant to the current task.
Step 5.4, execute step 5.3 cyclically 11 times, weighting the feature maps corresponding to the 11 facial components to obtain the attention-processed features f_j of size 64x64x64, with j from 1 to 11; each f_j is then fused with the parsing map of the corresponding facial component.
Step 5.5, perform a weighted-average operation between each face parsing map p_j obtained in step 5.2 and the attention-processed feature map f_j with the corresponding index obtained in step 5.4, obtaining the fused feature map f_Fusion^j of size 64x64x1, as shown in formula 5:
f_Fusion^j = Mean(Cbam(f_j) ⊙ p_j)    (5)
where f_Fusion^j denotes the fused features of the j-th channel, Mean denotes the cross-channel averaging operation, Cbam denotes the attention weighting applied to f_j, and ⊙ denotes element-by-element multiplication.
Step 5.6, concatenate the fused feature maps f_Fusion^j obtained in step 5.5 to obtain the output of the final feature fusion network, f_Fusion, of size 64x64x11, as shown in formula 6:
f_Fusion = cat(f_Fusion^1, f_Fusion^2, ..., f_Fusion^11)    (6)
where f_Fusion denotes the output of the final feature fusion network, cat denotes the concatenation operation, and f_Fusion^j denotes the fused features corresponding to the j-th facial component, with j from 1 to 11 indexing the 11 facial components.
Step 6, input the feature map f_Fusion obtained in step 5 into the decoder network for decoding, obtaining the final super-resolution result I_SR. Specifically:
Input the fused feature map f_Fusion obtained in step 5.6 into the decoder for decoding. The decoder's structure mirrors the encoder's and is likewise built from residual blocks, except that a deconvolution layer is added after the network's convolution, normalization, and ReLU activation for upsampling; a final 3x3 convolution yields the result I_SR. Meanwhile, to better exploit the abundant low-frequency image information contained in the shallow features of I_LR, skip connections splice the low-resolution image I_LR and the coarsely super-resolved image I_SR1 with the output of the feature fusion module, so that low-frequency information passes directly to the end of the module, achieving a better reconstruction.
Step 7, input the I_SR1 obtained in step 2 and the original image into a pixel-wise loss function to obtain l1; input the parsing map p obtained in step 4 and the ground-truth parsing map p̃ from the original data set into the pixel-wise loss function to obtain l2; input the final result I_SR obtained in step 6 and the original image into the pixel-wise loss function to obtain l3; add these losses to obtain L_total. Iterate continuously to minimize the loss function, finally training the super-resolution network model.
the step 7 is specifically that,
step 7.1, define the joint loss function, as shown in equation 7,
Figure BDA0003162154820000222
wherein, the loss function adopts a mean square error loss function, N represents the number of images in the training set, hr(i)The high resolution image corresponding to the ith low resolution image is shown.
Figure BDA0003162154820000231
Showing the result of the i-th image after the rough super-resolution processing.
Figure BDA0003162154820000232
Representing the true analytic graph, p, corresponding to the ith image(i)Is shown asThe i images are subjected to a real face analysis graph obtained by a priori information estimation network;
Figure BDA0003162154820000233
and the final result obtained after the ith image is subjected to super-resolution processing is shown.
Step 7.2, input the I_SR1 output in step 2, the original image hr, the ground-truth parsing map p̃, the parsing map p extracted by the network, and the final result I_SR into the pixel-wise loss function; the high-resolution image is generated through pixel-wise loss processing, the loss function is continuously iteratively minimized, and training finally generates the super-resolution network model.
Step 7.3, iterate step 7.2 continuously, take the set of weight parameters that minimizes the joint loss function L_total as the trained model parameters, and obtain the trained super-resolution network model.
Step 8, set the hyper-parameters of the super-resolution network model, input the preprocessed test-set images from step 1 into the model, and through residual-network processing and loss-minimizing iteration finally generate high-resolution face images with clear detail and texture.
Step 8 specifically comprises:
The model is trained using the RMSprop algorithm with an initial learning rate of 2.5x10^-4 and a mini-batch size of 14; λ is set to 0.8 empirically. Training is run with a batch size of 8, and the learning rate, set to 10^-3, is halved every 1.2x10^5 iterations;
Input the test-set data preprocessed in step 1 into the model generated in step 7.3; through residual-network processing and joint-loss-minimizing iteration, the super-resolved high-definition face image is finally generated.

Claims (8)

1. A face super-resolution method based on prior information and attention fusion mechanism is characterized by comprising the following steps:
step 1, downloading an original image data set, including an original face image and an original face analysis image p-and performing data enhancement, inputting the original image subjected to data enhancement processing into a degradation model to process to obtain a low-resolution image, performing double-thrice up-sampling on the low-resolution image to obtain an image with the same size as a high-resolution image as a low-resolution data set, and finally dividing the data set into a training set and a testing set;
step 2, inputting the training set image obtained in the step 1 into a rough super-resolution network for processing to obtain a training set image I after rough super-resolution processingSR1
Step 3, training set image I obtained in step 2SR1Inputting the data into a coder network for feature extraction to obtain a feature map f;
step 4, training set image I obtained in step 2SR1Inputting the prior information into a prior information extraction network to extract the prior information to obtain an analytic graph p, wherein the prior information extraction network consists of ResNet and a stacked hourglass network;
step 5, inputting the feature diagram f obtained in the step 3 and the analysis diagram p obtained in the step 4 into a feature fusion network for fusion of the analysis diagram and the feature diagram to obtain a fused feature diagram fFusion
Step 6, the characteristic diagram f obtained in the step 5 is processedFusionInputting the data into a decoder network for decoding to obtain a final super-resolution processing result ISR
Step 7, the I obtained in the step 2SR1Calculating a loss function l from the pixel-by-pixel loss function of the input original image1Analyzing graph p obtained in step 4 and analyzing graph in original image data set
Figure FDA0003162154810000021
The loss function l is calculated by inputting the loss function into the pixel-by-pixel2And (4) obtaining a super-resolution processing result I obtained in the step (6)SRAnd inputting the original image into a pixel-by-pixel loss function to obtain a loss function l3To be connected toThe loss functions of the surfaces are added to obtain a joint loss function Ltotal. Continuously iterating to minimize the loss function, and finally generating a super-resolution network model after training;
and step 8, setting the hyper-parameters of the super-resolution network model, inputting the test set preprocessed in step 1 into the super-resolution network model, and finally generating a high-resolution face image with clear detail texture and good effect through residual network processing and loss-function-minimizing iteration.
2. The method according to claim 1, wherein step 1 is specifically:
step 1.1, downloading the data set to obtain high-definition face images, and resizing each image to 128x128 as the original-image size using the resize function of matlab, in order to reduce the amount of computation;
step 1.2, carrying out mirror image turning on all images in the data set to obtain a data-enhanced face data set;
step 1.3, performing degradation processing on the data set obtained in step 1.2: inputting all images in the data set into a degradation model prepared in advance to generate corresponding low-resolution face images, simulating the degradation process encountered in reality;
step 1.4, performing a bicubic up-sampling operation on the low-resolution face images obtained in step 1.3 to obtain low-resolution face images I_LR of the same size as the original images;
and step 1.5, dividing the data set of step 1.4 into a training set, a validation set and a test set at a ratio of 6:2:2.
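As a minimal sketch of the preprocessing in steps 1.1 to 1.4, the snippet below resizes to 128x128, mirror-flips for augmentation, and builds the bicubically up-sampled low-resolution input I_LR. Since the claim does not specify the degradation model, plain bicubic downscaling (with an assumed scale factor of 8) stands in for it, and all function names are illustrative.

```python
import torch.nn.functional as F
import torchvision.transforms.functional as TF

# Illustrative sketch of steps 1.1-1.4 (names hypothetical). The claim's
# degradation model is unspecified, so bicubic downscaling stands in.
def make_pair(hr_img, scale=8):
    hr = TF.resize(hr_img, [128, 128])            # step 1.1: resize to 128x128
    x = hr.unsqueeze(0)                           # add a batch dimension
    lr = F.interpolate(x, scale_factor=1 / scale, mode='bicubic')  # step 1.3: degrade
    i_lr = F.interpolate(lr, size=(128, 128), mode='bicubic')      # step 1.4: upsample
    return i_lr.squeeze(0), hr

def augment(hr_img):
    return TF.hflip(hr_img)                       # step 1.2: mirror flipping
```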
3. The method according to claim 1, wherein step 2 is specifically:
performing coarse super-resolution on the low-resolution face image I_LR obtained in step 1.5, i.e. feeding I_LR into the CoarseSRNet network to obtain I_SR1, as shown in Equation (2),
I_SR1 = CoarseSRNet(I_LR)    (2)
where I_LR denotes the bicubically up-sampled low-resolution image, and CoarseSRNet denotes the coarse super-resolution network employed;
the CoarseSRNet network in step 2 adopts 3x3 convolution kernels and the ReLU activation function, uses 64 filters to generate 64 feature maps, and finally obtains the coarse super-resolution result I_SR1 through a 3x3 convolution.
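A minimal sketch of a coarse super-resolution network consistent with this claim is given below: 3x3 convolutions with ReLU, 64 filters producing 64 feature maps, and a final 3x3 convolution back to an RGB image. The network depth is an assumption, since the claim fixes only the kernel size and width.

```python
import torch.nn as nn

# Sketch of a CoarseSRNet consistent with the claim; depth is assumed.
class CoarseSRNet(nn.Module):
    def __init__(self, channels=64, n_layers=4):
        super().__init__()
        layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(n_layers - 1):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 3, 3, padding=1)]   # final 3x3 conv
        self.body = nn.Sequential(*layers)

    def forward(self, i_lr):            # I_SR1 = CoarseSRNet(I_LR), Eq. (2)
        return self.body(i_lr)
```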
4. The method according to claim 1, wherein step 3 is specifically:
step 3.1, inputting the I_SR1 obtained in step 2 into a feature extraction network, which uses an encoder structure; the encoder uses a 3x3 convolution kernel with stride 2 together with a batch normalization operation to down-sample the input image I_SR1 to 64x64, obtaining a 64-channel feature map of size 64x64 and realizing the mapping from image space to feature space, as shown in Equation (3),
f = Encoder(I_SR1)    (3)
step 3.2, combining an attention mechanism with residual blocks to form a residual attention network for feature extraction; inputting the feature map obtained in step 3.1 into the residual attention network to extract deep features, obtaining a multi-channel feature map;
and step 3.3, inputting the feature map obtained in step 3.2 into a 3x3 convolutional layer; the extracted feature map f, with 64 channels, is obtained through convolution, normalization, and the Tanh activation function.
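The sketch below illustrates an encoder consistent with steps 3.1 to 3.3: a stride-2 3x3 convolution with batch normalization down-samples the 128x128 input to a 64-channel 64x64 feature map, residual attention blocks (left abstract and passed in as a factory) extract deep features, and a final 3x3 convolution with normalization and Tanh yields f. The number of blocks is an assumption.

```python
import torch.nn as nn

# Encoder sketch for steps 3.1-3.3; `res_attention_block` is an abstract
# factory for the residual attention blocks, and n_blocks is assumed.
class Encoder(nn.Module):
    def __init__(self, res_attention_block, n_blocks=4):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),   # 128x128 -> 64x64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        self.blocks = nn.Sequential(*[res_attention_block() for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.Tanh(),
        )

    def forward(self, i_sr1):            # f = Encoder(I_SR1), Eq. (3)
        return self.tail(self.blocks(self.down(i_sr1)))
```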
5. The method according to claim 1, wherein step 4 is specifically:
step 4.1, inputting the coarse super-resolution result I_SR1 obtained in step 2 into the prior information extraction network; I_SR1 is convolved with a 7x7 kernel and then passed through normalization and ReLU operations to obtain a 64x64 feature map, as shown in Equation (4),
p = PriorEstimate(I_SR1)    (4)
step 4.2, constructing a stacked hourglass network for prior information extraction, stacking 4 hourglass networks to extract the face analysis map; to fuse features effectively across scales and retain spatial information at different scales, the stacked hourglass network adopts a skip-connection mechanism between symmetric layers; the resulting features are post-processed by a subsequent 1x1 convolutional layer; finally, the shared features are fed into two separate 1x1 convolutional layers to generate a landmark heat map and an analysis map;
and step 4.3, inputting the feature map obtained in step 4.1 into the stacked hourglass network, obtaining a 128-channel face analysis map p after processing.
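Below is a compact, illustrative sketch of a prior estimation network in the spirit of steps 4.1 to 4.3: a 7x7 stride-2 head, 4 stacked hourglass modules with skip connections between symmetric layers, a 1x1 post-processing layer, and two separate 1x1 heads for the landmark heat map and the 128-channel analysis map. Block widths, the hourglass depth, and the 68-channel landmark head are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

# Recursive hourglass sketch; each level keeps a skip path (symmetric
# layers) and processes a pooled copy one level deeper.
class Hourglass(nn.Module):
    def __init__(self, depth=4, ch=64):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.inner = Hourglass(depth - 1, ch) if depth > 1 else nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        skip = self.conv(x)                               # symmetric-layer skip
        up = F.interpolate(self.inner(F.max_pool2d(x, 2)), scale_factor=2)
        return skip + up

class PriorEstimate(nn.Module):
    def __init__(self, n_stacks=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3),  # 128 -> 64
                                  nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.stacks = nn.Sequential(*[Hourglass() for _ in range(n_stacks)])
        self.post = nn.Conv2d(64, 64, 1)                  # 1x1 post-processing
        self.landmarks = nn.Conv2d(64, 68, 1)             # 68 landmarks: an assumption
        self.parse = nn.Conv2d(64, 128, 1)                # 128-channel parse map p

    def forward(self, i_sr1):                             # p = PriorEstimate(I_SR1), Eq. (4)
        shared = self.post(self.stacks(self.head(i_sr1)))
        return self.landmarks(shared), self.parse(shared)
```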
6. The method of claim 1, wherein the step 5 comprises:
inputting the 64-channel feature map f obtained in step 3.3 and the 128-channel face analysis map p obtained in step 4.3 into a feature fusion network for fusion of the analysis map and the feature map, obtaining a fused feature map f_Fusion; the fused feature map has 11 channels, each channel corresponding to one face component, namely face skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, mouth, upper lip and lower lip, 11 face components in total; the specific steps are as follows:
step 5.1, constructing a feature fusion network of three parts: the first part consists of 1x1 convolutions and performs dimension reduction on the face analysis map; the second part consists of the attention module CBAM, which weights the feature maps through channel and spatial attention mechanisms to obtain feature maps describing 11 different face components; the third part performs a weighted-average operation between the feature maps describing the different face components and their analysis maps, obtaining the final fused feature map f_Fusion;
step 5.2, using 11 1x1 convolution kernels to reduce the 128-channel face analysis map p obtained in step 4.3 to 11 channels, obtaining analysis maps p_j, with j ranging from 1 to 11, each representing the analysis map of one face component; the specific implementation process is constrained by the loss function l_3;
step 5.3, processing the feature map with an attention mechanism to obtain the weighted feature map for each face component, and then cascading; the attention module is formed by serial channel and spatial attention mechanisms, which automatically learn the importance of different spatial positions and different channels within each feature, multiplying by different weights to enhance useful features and suppress features unimportant to the current task;
step 5.4, executing step 5.3 cyclically 11 times, weighting the feature maps corresponding to the 11 face components respectively to obtain the attention-processed features f_j, with j ranging from 1 to 11; these features are used to cascade with the analysis maps of the corresponding face components;
step 5.5, performing a weighted-average operation on the face analysis map p_j obtained in step 5.2 and the attention-processed feature map f_j of the corresponding subscript obtained in step 5.4, obtaining the fused feature map f_Fusion^j, as shown in Equation (5),
f_Fusion^j = Mean(p_j ⊗ Cbam(f_j))    (5)
where f_Fusion^j denotes the fused feature of the j-th channel, Mean denotes the cross-channel averaging operation, Cbam denotes the attention weighting applied to f_j, and ⊗ denotes element-by-element multiplication;
and step 5.6, cascading the fused feature maps f_Fusion^j obtained in step 5.5 to obtain the output f_Fusion of the final feature fusion network, as shown in Equation (6),
f_Fusion = cat(f_Fusion^1, f_Fusion^2, ..., f_Fusion^11)    (6)
where f_Fusion denotes the output of the final feature fusion network, cat denotes the concatenation operation, and f_Fusion^j denotes the fused feature corresponding to the j-th face component, with j from 1 to 11 indexing the 11 face components.
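A sketch of the feature fusion network of steps 5.1 to 5.6 follows; the CBAM attention module is left abstract and passed in as a factory. Equation (5) is realized as an element-wise product of each attention-weighted feature with its component analysis map followed by a cross-channel mean, and Equation (6) as channel concatenation; all names are illustrative.

```python
import torch
import torch.nn as nn

# Fusion-network sketch for steps 5.1-5.6; `cbam_module` stands for the
# serial channel + spatial attention module named in the claim.
class FusionNet(nn.Module):
    def __init__(self, cbam_module, n_parts=11):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(128, 1, 1) for _ in range(n_parts)])
        self.cbam = nn.ModuleList([cbam_module(64) for _ in range(n_parts)])

    def forward(self, f, p):
        fused = []
        for reduce_j, cbam_j in zip(self.reduce, self.cbam):
            p_j = reduce_j(p)                  # step 5.2: per-component parse map
            w_j = cbam_j(f)                    # steps 5.3-5.4: attention-weighted f_j
            # Eq. (5): f_Fusion^j = Mean(p_j (x) Cbam(f_j)), cross-channel mean
            fused.append((p_j * w_j).mean(dim=1, keepdim=True))
        return torch.cat(fused, dim=1)         # Eq. (6): 11-channel f_Fusion
```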
7. The method according to claim 1, wherein step 6 is specifically:
inputting the fused feature map f_Fusion obtained in step 5.6 into a decoder for decoding; after convolution, normalization, and ReLU activation, a deconvolution layer is added for up-sampling; the result I_SR is finally obtained through a 3x3 convolution; meanwhile, a skip connection is used to splice the low-resolution image I_LR, the coarsely super-resolved image I_SR1, and the output of the feature fusion module, which achieves a good reconstruction effect.
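A decoder sketch consistent with this claim is shown below; the channel widths and the resolution at which the jump connection is spliced (here the fusion output's 64x64 grid) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Decoder sketch for step 6: conv + BN + ReLU, a transposed convolution
# for up-sampling, and a final 3x3 conv; input width 11 + 3 + 3 follows
# the jump connection described in the claim and is assumed here.
class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(11 + 3 + 3, 64, 3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1),  # 64x64 -> 128x128
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, f_fusion, i_lr, i_sr1):
        size = f_fusion.shape[-2:]
        # Jump connection: splice the fusion output with I_LR and I_SR1.
        x = torch.cat([f_fusion,
                       F.interpolate(i_lr, size=size),
                       F.interpolate(i_sr1, size=size)], dim=1)
        return self.body(x)                    # I_SR
```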
8. The method according to claim 1, wherein step 7 is specifically:
step 7.1, defining the joint loss function, as shown in Equation (7),
L_total = (1/N) Σ_{i=1}^{N} ( ||hr^(i) - I_SR1^(i)||^2 + λ ||p̃^(i) - p^(i)||^2 + ||hr^(i) - I_SR^(i)||^2 )    (7)
where each term adopts the mean-square-error loss, N denotes the number of images in the training set, hr^(i) denotes the high-resolution image corresponding to the i-th low-resolution image, I_SR1^(i) denotes the result of the i-th image after coarse super-resolution processing, p̃^(i) denotes the true analysis map corresponding to the i-th image, p^(i) denotes the face analysis map obtained from the i-th image through the prior information estimation network, and I_SR^(i) denotes the final result obtained after the i-th image is super-resolved;
step 7.2, inputting the training-set image I_SR1 output in step 2, the original image hr, the original analysis map p̃, the analysis map p extracted by the network, and the final result I_SR into the pixel-by-pixel loss function, generating a high-resolution image through pixel-by-pixel loss processing, and iterating continuously to minimize the loss function;
and step 7.3, iterating step 7.2 continuously, and taking the set of weight parameters that minimizes the joint loss function L_total as the trained model parameters, obtaining the trained super-resolution network model.
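For illustration, the joint loss of Equation (7) can be sketched as three pixel-wise mean-square-error terms; placing λ on the parsing term follows the hyper-parameter description in step 8 and is otherwise an assumption.

```python
import torch.nn.functional as F

# Joint-loss sketch for Eq. (7); the lambda placement is assumed.
def joint_loss(i_sr1, p, i_sr, hr, p_true, lam=0.8):
    l1 = F.mse_loss(i_sr1, hr)    # coarse super-resolution vs. ground truth
    l2 = F.mse_loss(p, p_true)    # estimated parse map vs. true parse map
    l3 = F.mse_loss(i_sr, hr)     # final super-resolution vs. ground truth
    return l1 + lam * l2 + l3     # L_total
```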
CN202110794066.2A 2021-07-14 2021-07-14 Human face super-resolution method based on priori information and attention fusion mechanism Active CN113658040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110794066.2A CN113658040B (en) 2021-07-14 2021-07-14 Human face super-resolution method based on priori information and attention fusion mechanism

Publications (2)

Publication Number Publication Date
CN113658040A true CN113658040A (en) 2021-11-16
CN113658040B CN113658040B (en) 2024-07-16

Family

ID=78477390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110794066.2A Active CN113658040B (en) 2021-07-14 2021-07-14 Human face super-resolution method based on priori information and attention fusion mechanism

Country Status (1)

Country Link
CN (1) CN113658040B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089803A1 (en) * 2016-03-21 2018-03-29 Boe Technology Group Co., Ltd. Resolving Method and System Based on Deep Learning
CN107958246A (en) * 2018-01-17 2018-04-24 深圳市唯特视科技有限公司 A kind of image alignment method based on new end-to-end human face super-resolution network
WO2020168731A1 (en) * 2019-02-19 2020-08-27 华南理工大学 Generative adversarial mechanism and attention mechanism-based standard face generation method
CN110148085A (en) * 2019-04-22 2019-08-20 智慧眼科技股份有限公司 Face image super-resolution reconstruction method and computer-readable storage medium
CN112070668A (en) * 2020-08-18 2020-12-11 西安理工大学 Image super-resolution method based on deep learning and edge enhancement
CN111768342A (en) * 2020-09-03 2020-10-13 之江实验室 Human face super-resolution method based on attention mechanism and multi-stage feedback supervision
CN112686830A (en) * 2020-12-30 2021-04-20 太原科技大学 Super-resolution method of single depth map based on image decomposition
CN112750082A (en) * 2021-01-21 2021-05-04 武汉工程大学 Face super-resolution method and system based on fusion attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIN Wei; CHEN Ying: "Face super-resolution network with a multi-scale residual channel attention mechanism", Journal of Computer-Aided Design & Computer Graphics, no. 06, 30 June 2020 (2020-06-30), pages 959 - 967 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118303A (en) * 2022-01-25 2022-03-01 中科视语(北京)科技有限公司 Face key point detection method and device based on prior constraint
CN114118303B (en) * 2022-01-25 2022-04-29 中科视语(北京)科技有限公司 Face key point detection method and device based on prior constraint
CN114529450A (en) * 2022-01-25 2022-05-24 华南理工大学 Face image super-resolution method based on improved depth iterative cooperative network
CN114529450B (en) * 2022-01-25 2023-04-25 华南理工大学 Face image super-resolution method based on improved depth iteration cooperative network
CN116630168A (en) * 2022-02-10 2023-08-22 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, medium, and computer program product
CN115205117A (en) * 2022-07-04 2022-10-18 中国电信股份有限公司 Image reconstruction method and device, computer storage medium and electronic equipment
CN115205117B (en) * 2022-07-04 2024-03-08 中国电信股份有限公司 Image reconstruction method and device, computer storage medium and electronic equipment
CN115358932A (en) * 2022-10-24 2022-11-18 山东大学 Multi-scale feature fusion face super-resolution reconstruction method and system
CN115358932B (en) * 2022-10-24 2023-03-24 山东大学 Multi-scale feature fusion face super-resolution reconstruction method and system
CN116563916A (en) * 2023-04-25 2023-08-08 山东大学 Attention fusion-based cyclic face super-resolution method and system
CN117274067A (en) * 2023-11-22 2023-12-22 浙江优众新材料科技有限公司 Light field image blind super-resolution processing method and system based on reinforcement learning

Also Published As

Publication number Publication date
CN113658040B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
CN113658040A (en) Face super-resolution method based on prior information and attention fusion mechanism
CN110827216B (en) Multi-generator generation countermeasure network learning method for image denoising
CN111488865B (en) Image optimization method and device, computer storage medium and electronic equipment
Huang et al. Underwater image enhancement via adaptive group attention-based multiscale cascade transformer
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN112541877B (en) Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
Zhou et al. Guided deep network for depth map super-resolution: How much can color help?
CN112509144A (en) Face image processing method and device, electronic equipment and storage medium
CN117274059A (en) Low-resolution image reconstruction method and system based on image coding-decoding
CN115631107A (en) Edge-guided single image noise removal
Han et al. UIEGAN: Adversarial learning-based photorealistic image enhancement for intelligent underwater environment perception
CN114758030B (en) Underwater polarization imaging method integrating physical model and deep learning
CN117114984A (en) Remote sensing image super-resolution reconstruction method based on generation countermeasure network
CN115861094A (en) Lightweight GAN underwater image enhancement model fused with attention mechanism
Liu et al. Facial image inpainting using multi-level generative network
CN117593187A (en) Remote sensing image super-resolution reconstruction method based on meta-learning and transducer
CN116563554A (en) Low-dose CT image denoising method based on hybrid characterization learning
Zou et al. Diffcr: A fast conditional diffusion framework for cloud removal from optical satellite images
Lee et al. Two-stream learning-based compressive sensing network with high-frequency compensation for effective image denoising
CN115018726A (en) U-Net-based image non-uniform blur kernel estimation method
Toutounchi et al. Advanced super-resolution using lossless pooling convolutional networks
Wang et al. APST-Flow: A Reversible Network-Based Artistic Painting Style Transfer Method.
CN115496989B (en) Generator, generator training method and method for avoiding image coordinate adhesion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240522

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region after: China

Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 5

Applicant before: XI'AN University OF TECHNOLOGY

Country or region before: China

TA01 Transfer of patent application right

Effective date of registration: 20240619

Address after: Room 28, A01, 3rd Floor, No. 17 Guangshun North Street, Chaoyang District, Beijing, 100020

Applicant after: Beijing Hai Bai Sichuan Science and Technology Co.,Ltd.

Country or region after: China

Address before: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant