CN112598604A - Blind face restoration method and system - Google Patents
Blind face restoration method and system
- Publication number: CN112598604A
- Application number: CN202110241203.XA
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/73
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06T5/92
- G06T2207/10004: Still image; photographic image
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30201: Face
Abstract
The invention discloses a blind face restoration method and system, comprising the following steps: acquiring a blind face data set, evaluating the quality of the blind face data set using the Laplacian gradient, and removing blurred and non-face images; enhancing the image data of the blind face data set and randomly splitting it to obtain a training set and a test set; constructing an AFFNet network; inputting the images of the training set into the AFFNet network, training it by combining a reconstruction loss function, a perceptual loss function, a style loss function and an adversarial loss function, and optimizing it with the stochastic gradient descent (SGD) algorithm to obtain an optimal blind face restoration model; and inputting the images of the test set into the optimal blind face restoration model, then matching and selecting the image with the highest accuracy as the final retrieval result. Through this scheme, the method has the advantages of simple logic, accuracy and reliability, and has high practical and popularization value in the technical field of image processing.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a blind face restoration method and a blind face restoration system.
Background
Blind face restoration as described herein is the restoration of low-quality degraded images (noise, artifacts, blurring, and combinations thereof) into sharp, high-quality images. In recent years, the acquisition and sharing of face images have grown enormously: on the one hand, with the development of image acquisition and display technologies, more and more high-quality (HQ) visual media have come into being; on the other hand, degraded images and video remain ubiquitous owing to the variety of acquisition equipment and the influence of the environment and object motion. Therefore, how to recover a clear, high-quality image from degraded images is a valuable research topic in the field of computer vision.
High-quality face images play a very important role in entertainment, surveillance, human-computer interaction and other applications, so face restoration is an urgent need for a versatile visual system. GFRNet in the prior art is a representative method for face restoration guided by a single exemplar image, but when the pose and expression of the guide image differ from those of the degraded image, the restored sharpness drops noticeably. In addition, GFRNet uses direct concatenation to fuse the degraded and curve features; this fusion is limited to a single degradation state and generalizes poorly to low-quality (LQ) images produced by unknown degradation processes. GFRNet neither reconstructs the fine texture details of the face from the guide image well, nor completely removes the noise and artifacts of the degraded image. Therefore, the single-exemplar guided methods of the prior art perform poorly when restoring LQ face images.
Multi-sample images can greatly improve the capability of image restoration compared with single-sample restoration. For a degraded LQ face image, multiple HQ exemplar images of the same person are likely to be available. For example, face images in a smartphone album are typically grouped by appearance, so a high-quality (HQ) exemplar can readily be found to reference for a low-quality (LQ) image. Therefore, introducing multiple exemplars greatly reduces the difficulty of degradation estimation and image restoration, and provides a new perspective for improving blind face restoration methods.
To solve the above problems, a method based on multi-sample images offers unique advantages for guiding LQ image restoration. At present, the blind face restoration methods of the prior art still have the following problems:
firstly, most existing blind face restoration methods are based on single-sample HQ images, which limits their generalization to unknown degradation processes;
secondly, the prior-art GFRNet uses a curve subnetwork to spatially calibrate the guide and degraded images; however, owing to the lack of direct supervision for the guide image, the curve subnetwork is difficult to train and generalizes poorly;
thirdly, the guide image and the degraded image are usually shot under different illumination conditions, and the background difference is large;
fourthly, cascade-based fusion remains limited in exploiting the complementarity between the guide image and the degraded image.
Therefore, there is an urgent need for a blind face restoration method based on multi-sample images and adaptive spatial feature fusion, with simple logic, accuracy and reliability, to improve the accuracy and generalization capability of blind face restoration.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a blind face restoration method and system, and the technical solution adopted by the present invention is as follows:
a blind face restoration method based on multi-sample image and adaptive spatial feature fusion comprises the following steps:
acquiring a blind face data set, evaluating the quality of the blind face data set using the Laplacian gradient, and removing blurred and non-face images; enhancing the image data of the blind face data set, and randomly splitting it to obtain a training set and a test set;
constructing an AFFNet network;
inputting the images of the training set into the AFFNet network, training the AFFNet network by combining a reconstruction loss function, a perceptual loss function, a style loss function and an adversarial loss function, and training and optimizing the AFFNet network using the stochastic gradient descent (SGD) optimization algorithm to obtain an optimal blind face restoration model;
and inputting the images of the test set into the optimal blind face restoration model, then matching and selecting the image with the highest accuracy as the final retrieval result.
Further, the image data is enhanced for the blind face data set, including random cropping, horizontal flipping and chrominance transformation of the image of the blind face data set.
Further, the expression of the blind face restoration model is as follows:

$$\hat{I} = \mathcal{F}\big(I_d,\ L_d,\ \{I_g^k, L_g^k\}_{k=1}^{N};\ \Theta\big)$$

where $I_d$ represents the degraded face image; $F_d$ represents the features of the degraded image; $L_d = \{l_d^m\}_{m=1}^{M}$ are the key points of the degraded image; $L_g^k$ are the key points of the $k$-th guide image; $M$ is the number of key points ($M = 68$); $k$ indexes the guide images; and $\Theta$ represents the model parameters.
Furthermore, the method also comprises performing degradation-model processing on the blind face data, with the expression:

$$I_d = \mathrm{JPEG}_q\big((I \otimes K)\!\downarrow_s +\, n_\sigma\big)$$

where $\otimes$ represents a convolution operation, $K$ represents a blur kernel, $\downarrow_s$ represents a bicubic down-sampler with scale factor $s$, $n_\sigma$ represents Gaussian noise with noise level $\sigma$, and $\mathrm{JPEG}_q$ represents JPEG compression with quality factor $q$.
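As an illustration, the degradation pipeline can be sketched in a few lines of NumPy. This is a hedged approximation, not the patent's implementation: nearest-neighbour striding stands in for the bicubic down-sampler, and the JPEG compression step is left as an optional callable hook since it requires an image codec.

```python
import numpy as np

def degrade(img, kernel, scale=4, sigma=0.03, jpeg=None, rng=None):
    """Sketch of the degradation model I_d = JPEG_q((I (*) K) downarrow_s + n_sigma).
    `img` is a 2-D grayscale float array in [0, 1]; `kernel` is the blur kernel K."""
    rng = np.random.default_rng(0) if rng is None else rng
    # 1) blur: 'same'-size 2-D convolution of the image with kernel K
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2,), (kw // 2,)), mode="edge")
    blurred = np.zeros_like(img)
    for i in range(kh):
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    # 2) downsample (nearest-neighbour stand-in; the patent specifies bicubic)
    small = blurred[::scale, ::scale]
    # 3) additive Gaussian noise with noise level sigma
    noisy = small + rng.normal(0.0, sigma, small.shape)
    # 4) JPEG_q compression (placeholder hook; needs a real codec in practice)
    return jpeg(noisy) if jpeg is not None else noisy
```

A constant image run through a normalized box kernel with zero noise should pass through unchanged apart from the downsampling.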
Further, the AFFNet network selects an optimal guide image from the blind face data set using a weighted least squares (WLS) model, performs spatial calibration and illumination translation on the guide image in feature space using the moving least squares method and adaptive instance normalization, and fuses the curve features of the guide image with the restoration features of the degraded image using adaptive spatial feature fusion.
Furthermore, the weighted least squares WLS model selects the optimal guide image from the blind face data set using the minimum weighted affine distance:

$$k^* = \arg\min_{k}\ D_a\big(L_d, L_g^k\big), \qquad D_a\big(L_d, L_g^k\big) = \min_{\alpha}\ \sum_{m=1}^{M} w_m \big\| l_d^m - \tilde{l}_g^{k,m}\, \alpha \big\|^2$$

with the closed-form solution $\alpha^* = \big(\tilde{L}_g^{k\top} W \tilde{L}_g^{k}\big)^{-1} \tilde{L}_g^{k\top} W L_d$, where $D_a(L_d, L_g^k)$ represents the affine distance; $w_m$ represents the weight of the $m$-th key point; $l_d^m$ and $l_g^{k,m}$ respectively represent the $m$-th key point of the degraded image and of the $k$-th guide image; $\tilde{l}_g^{k,m}$ is the homogeneous representation of $l_g^{k,m}$; $W = \mathrm{Diag}(w)$ is the diagonal matrix of the key-point weight vector $w$; and $\top$ represents the matrix transpose.
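The weighted affine distance has a closed-form weighted-least-squares solution, which can be sketched as follows. The function names and the use of `np.linalg.solve` are illustrative assumptions, not from the patent:

```python
import numpy as np

def affine_distance(L_d, L_g, w):
    """Weighted affine distance D_a(L_d, L_g): fit the affine map that best
    carries the guide key points L_g (M, 2) onto the degraded key points
    L_d (M, 2), weighted per key point by w (M,), and return the residual."""
    M = L_d.shape[0]
    L_g_h = np.hstack([L_g, np.ones((M, 1))])        # homogeneous coords (M, 3)
    W = np.diag(w)
    # closed form: alpha = (Lg~^T W Lg~)^-1 Lg~^T W L_d
    alpha = np.linalg.solve(L_g_h.T @ W @ L_g_h, L_g_h.T @ W @ L_d)
    residual = L_d - L_g_h @ alpha                   # (M, 2)
    return float(np.sum(w * np.sum(residual ** 2, axis=1)))

def select_guide(L_d, guides, w):
    """Return the index k* of the guide with minimum weighted affine distance."""
    return int(np.argmin([affine_distance(L_d, L_g, w) for L_g in guides]))
```

A guide whose key points are an exact affine transform of the degraded key points (e.g. a pure translation) has distance near zero and is therefore selected.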
Further, the key-point weights are initialized from the degraded face image $I_d$, the optimal guide image $I_g^{k^*}$ is found among the guide images $\{I_g^k\}$ during forward propagation, and the key-point weights are updated using the back-propagation algorithm, the key-point weight loss being:

$$\ell_w = D_a\big(L_d, L_g^{k^*}\big)$$

where $D_a(L_d, L_g^{k^*})$ represents the affine distance of the guide image $I_g^{k^*}$.
Further, the method for performing spatial calibration and illumination translation on the guide image in feature space using the moving least squares method and adaptive instance normalization comprises the following steps:
the affine matrix $M_p$ of the guide image is expressed as:

$$M_p = \big(\tilde{L}_g^{\top} W_p\, \tilde{L}_g\big)^{-1} \tilde{L}_g^{\top} W_p\, L_d$$

where $L_g$ represents the optimal guide image key points; $L_d$ represents the key points of the degraded image; $\tilde{L}_g$ is the homogeneous representation of $L_g$; $p = (x, y)$ are the coordinates in the degraded image; and $W_p$ is the position-dependent key-point weight matrix;
the curve features of the guide image $F_g^{w}$ are obtained through bilinear interpolation:

$$F_g^{w}(x, y) = \sum_{(x', y') \in N(x_g, y_g)} F_g(x', y')\,\big(1 - |x_g - x'|\big)\big(1 - |y_g - y'|\big), \qquad (x_g, y_g) = \tilde{p}\, M_p$$

where $(x, y)$ represents a coordinate of the degraded image; $(x_g, y_g)$ represents the corresponding coordinate of the guide image; $\tilde{p}$ is the homogeneous coordinate of $(x, y)$; $N(x_g, y_g)$ represents the 4 nearest neighbours of $(x_g, y_g)$; and $F_g$ represents the features of the optimal guide image;
and (3) adjusting the curve characteristics of the guide image by using self-adaptive example normalization, wherein the expression is as follows:
wherein the content of the first and second substances,F d andF g w,a curve feature representing a restoration feature of the degraded image and a curve feature of the guide image, respectively;andmean and standard deviation, respectively.
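A minimal sketch of the AIN adjustment, assuming (C, H, W) feature arrays and per-channel statistics (the array layout and epsilon are assumptions for illustration):

```python
import numpy as np

def adaptive_instance_norm(F_g_w, F_d, eps=1e-5):
    """AIN illumination translation sketch: re-normalize the warped guide
    features F_g_w so that their per-channel mean and standard deviation
    match those of the degraded features F_d.  Both arrays are (C, H, W)."""
    mu_g = F_g_w.mean(axis=(1, 2), keepdims=True)
    sd_g = F_g_w.std(axis=(1, 2), keepdims=True)
    mu_d = F_d.mean(axis=(1, 2), keepdims=True)
    sd_d = F_d.std(axis=(1, 2), keepdims=True)
    return sd_d * (F_g_w - mu_g) / (sd_g + eps) + mu_d
```

After the adjustment, the guide features carry the first- and second-order statistics (the "illumination style") of the degraded features.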
Further, the AFFNet network is trained by combining the reconstruction loss function, the perceptual loss function, the style loss function and the adversarial loss function:

$$\ell = \ell_r + \ell_{real}$$

where $\ell_r$ represents the joint loss of the reconstruction loss function and the perceptual loss function, and $\ell_{real}$ represents the photo-realistic loss composed of the style loss and the adversarial loss;
the expression of the joint loss function of the perceptual loss function and the reconstruction loss function is as follows:
wherein the content of the first and second substances, MSE a weight parameter representing a reconstruction loss function, which has a value in the range of 0 to 1, perc and the weight parameter represents a perception loss function and has a value ranging from 0 to 1.
The reconstruction loss function constrains the reconstructed image to be close to the real image, using the mean square error to measure the difference between them:

$$\ell_{MSE} = \frac{1}{CHW} \big\| \hat{I} - I \big\|^2$$

where $\hat{I}$ represents the reconstructed image, $I$ represents the real image, and $C$, $H$ and $W$ respectively represent the channel, height and width of the image;
and (3) adopting a perception loss function to constrain the reconstructed image, wherein the expression is as follows:
wherein the content of the first and second substances,second to represent a pre-trained faceNet modeluLayer characteristics; the above-mentionedu 1,2,3,4];
The photo-realistic loss function $\ell_{real}$ is expressed as:

$$\ell_{real} = \lambda_{styl}\, \ell_{styl} + \lambda_{adv}\, \ell_{adv}$$

where $\lambda_{styl}$ represents the weight of the style loss function, with a value in the range 0 to 1, and $\lambda_{adv}$ represents the weight of the adversarial loss function, with a value in the range 0 to 1.
The expression of the style loss function is:

$$\ell_{styl} = \sum_{u} \frac{1}{C_u H_u W_u} \Big\| \psi_u(\hat{I})^{\top} \psi_u(\hat{I}) - \psi_u(I)^{\top} \psi_u(I) \Big\|^2$$

where $\hat{I}$ represents the reconstructed image; $I$ represents the real image; $C_u$, $H_u$ and $W_u$ respectively represent the channel, height and width of the $u$-th layer feature map; $\psi_u$ represents the $u$-th layer features of the pre-trained FaceNet model, $u \in [1, 2, 3, 4]$; and $\top$ indicates the interchange of the rows and columns of the matrix (i.e. the Gram-matrix computation).
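A sketch of the Gram-matrix style loss follows. Plain arrays stand in for the pre-trained FaceNet features $\psi_u$, and the $1/(C_u H_u W_u)$ normalization is folded into the Gram computation; the function names are illustrative:

```python
import numpy as np

def gram(F):
    """Normalized Gram matrix of a (C, H, W) feature map."""
    C, H, W = F.shape
    flat = F.reshape(C, H * W)
    return flat @ flat.T / (C * H * W)

def style_loss(feats_hat, feats_real):
    """Style loss sketch: squared Frobenius distance between the Gram
    matrices of corresponding feature layers of the reconstructed and
    real images (here, lists of plain arrays replace FaceNet features)."""
    return float(sum(np.sum((gram(a) - gram(b)) ** 2)
                     for a, b in zip(feats_hat, feats_real)))
```

Identical feature stacks give a loss of exactly zero; any difference in channel correlations makes it positive.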
The adversarial loss function trains the discriminator $\ell_{adv,D}$ and the generator $\ell_{adv,G}$ of the AFFNet network, with the expressions:

$$\ell_{adv,D} = -\,\mathbb{E}_{I \sim P(I)}\big[\log D(I)\big] - \mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[\log\big(1 - D(\hat{I})\big)\big]$$

$$\ell_{adv,G} = -\,\mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[\log D(\hat{I})\big]$$

where $I$ and $\hat{I}$ respectively represent the real image and the reconstructed image; $P(I)$ and $P(\hat{I})$ respectively represent the real-image and reconstructed-image distributions; $G$ and $D$ each represent a neural network (the generator and the discriminator); $\mathbb{E}$ represents the expectation (maximum-likelihood estimate over the corresponding distribution); and $D(I)$ and $D(\hat{I})$ represent the discriminator outputs for the real and reconstructed images.
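The two adversarial terms can be sketched directly from the expressions above, assuming the discriminator emits sigmoid probabilities in (0, 1); the small epsilon guarding the logarithms is an implementation assumption:

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-12):
    """Discriminator loss sketch: -E[log D(I)] - E[log(1 - D(I_hat))],
    where d_real / d_fake are arrays of sigmoid discriminator outputs."""
    return float(-np.mean(np.log(d_real + eps))
                 - np.mean(np.log(1.0 - d_fake + eps)))

def g_loss(d_fake, eps=1e-12):
    """Generator loss sketch (non-saturating form): -E[log D(I_hat)]."""
    return float(-np.mean(np.log(d_fake + eps)))
```

A discriminator that scores real images high and reconstructions low has a smaller loss than one that outputs 0.5 everywhere, and the generator loss shrinks as the reconstructions fool the discriminator.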
A blind face restoration system comprises: a data preprocessing module for acquiring a blind face data set, evaluating the quality of the blind face data set using the Laplacian gradient, and removing blurred and non-face images; then enhancing the image data of the blind face data set, and randomly splitting it to obtain a training set and a test set;
the feature extraction module is used for extracting high-dimensional image features based on the constructed AFFNet network;
the training module is used for initializing the parameters of the AFFNet network, inputting the images of the training set into the AFFNet network, training the AFFNet network by combining a reconstruction loss function, a perceptual loss function, a style loss function and an adversarial loss function, and training and optimizing the AFFNet network using the stochastic gradient descent (SGD) optimization algorithm to obtain an optimal blind face restoration model;
and the test module is used for inputting the images of the test set into the optimal blind face restoration model, matching and selecting the images to obtain the image with the highest accuracy as the final retrieval result.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method skillfully adopts a weighted least squares (WLS) model, selecting samples with similar pose and expression from the multi-sample HQ images as the optimal guide image; WLS drives the guide selection at the key points, and learning the key-point weights lets the selected guide image reach the highest restoration precision, thereby overcoming the limitation of single-sample-HQ-based blind face restoration methods in generalizing to unknown degradation processes.
(2) The invention introduces the moving least squares method (MLS): guide selection greatly reduces the pose and expression differences, so MLS can calibrate the guide image and the degraded image in feature space, solving the problems that the guide image lacks direct supervision and that a curve subnetwork is difficult to train and generalizes poorly.
(3) The invention adopts adaptive instance normalization (AIN) and uses AIN to perform illumination translation on the guide image, reducing the illumination difference between the guide image and the degraded image.
(4) The invention provides 4 adaptive spatial feature fusion (AFF) blocks, which fuse the curve features of the guide image and the restoration features of the degraded image in an adaptive and progressive manner to build the reconstruction subnetwork of AFFNet, overcoming the limited exploitation of complementarity between the guide image and the degraded image in cascade-based fusion.
(5) The AFFNet of the invention has good generalization capability to complex and unknown degradation processes, and can effectively generate vivid results on LQ images;
(6) The invention skillfully adopts random cropping, horizontal flipping and chrominance transformation (brightness and contrast) to enhance the image data;
in conclusion, the method has the advantages of simple logic, accuracy, reliability and the like, and has high practical value and popularization value in the technical field of image processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of protection, and it is obvious for those skilled in the art that other related drawings can be obtained according to these drawings without inventive efforts.
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is a schematic diagram of the AFFNet network structure of the present invention.
Detailed Description
To further clarify the objects, technical solutions and advantages of the present application, the present invention will be further described with reference to the accompanying drawings and examples, and embodiments of the present invention include, but are not limited to, the following examples. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Examples
As shown in fig. 1 to fig. 2, the present embodiment provides a blind face restoration method and a system, wherein the system includes a data preprocessing module, a feature extraction module, a training module, and a testing module.
Specifically, as shown in fig. 1, the data preprocessing module S101 collects the blind face data set VGGFace2, evaluates the quality of the data set using Laplacian gradients, removes blurred and non-face images, enhances the image data using random cropping, horizontal flipping and chrominance transformation (luminance and contrast), sets the image size to 256 × 256, then converts the data to corresponding TFRecord-format files, reads the data in a multi-threaded, parallelized manner, and obtains the training and test sets;
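A minimal sketch of the Laplacian-gradient blur screening used by the preprocessing module. The 5-point Laplacian stencil and the threshold value are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def laplacian_sharpness(img):
    """Blur-screening sketch: variance of the Laplacian response of a 2-D
    grayscale float array.  Low variance means few edges, i.e. a likely
    blurred image."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]     # vertical neighbours
           + img[1:-1, :-2] + img[1:-1, 2:])    # horizontal neighbours
    return float(lap.var())

def keep_image(img, threshold=1e-4):
    """Keep only images whose Laplacian variance exceeds a threshold
    (the threshold here is an illustrative assumption)."""
    return laplacian_sharpness(img) > threshold
```

A textured image scores well above a featureless one, which scores exactly zero.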
the feature extraction module S102 is used for extracting high-dimensional image features through a convolution layer of the network based on the constructed AFFNet network;
the training module S103 is used for initializing the parameters of the AFFNet network structure, inputting the blind face images into the AFFNet network, introducing 4 loss functions (reconstruction, perceptual, style, and adversarial) to train the whole network structure, training and optimizing the AFFNet network with the stochastic gradient descent (SGD) optimization algorithm, and fusing the curve features of the guide image with the restoration features of the degraded image in an adaptive and progressive manner to obtain the optimal blind face restoration model;
and the test module S104 inputs the test images into the optimal blind face restoration model for matching and selects the image with the highest accuracy as the final retrieval result.
The following describes the blind face restoration method and system in detail, focusing on the guide selection, spatial calibration, illumination translation, and adaptive feature fusion modules proposed in this embodiment.
As shown in fig. 2, this embodiment proposes a weighted least squares (WLS) model that selects an optimal guide image from the multi-sample image set; then spatial calibration and illumination translation are performed on the guide image in feature space using the moving least squares (MLS) method and adaptive instance normalization (AIN), to mitigate the pose and expression differences that remain after guide selection. Finally, 4 adaptive feature fusion (AFF) blocks fuse the curve features of the guide image and the restoration features of the degraded image.
The blind face restoration method reconstructs, from a set of guide images $\{I_g^k\}_{k=1}^{N}$ and a degraded face image $I_d$, the corresponding HQ image $\hat{I}$. $I_d$, $I_g^k$ and $\hat{I}$ have the same size 256 × 256; when the image sizes differ, the images are resized to the same size (256 × 256) using bicubic sampling. Each face image yields 68 key points through a face key-point detection method, and therefore the blind face restoration model can be expressed as:

$$\hat{I} = \mathcal{F}\big(I_d,\ L_d,\ \{I_g^k, L_g^k\}_{k=1}^{N};\ \Theta\big)$$

where $I_d$ is the degraded face image; $F_d$ represents the features of the degraded image; $L_d = \{l_d^m\}$ are the key points of the degraded image, $l_d^m \in \mathbb{R}^2$ ($m = 1, \ldots, M = 68$); $L_g^k$ are the key points of the $k$-th guide image; $M$ is the number of key points ($M = 68$); and $\Theta$ represents the model parameters.
For most guided blind face restoration methods, the pose and expression differences between the guide image and the degraded image reduce the accuracy of the restoration. It is therefore preferable to select a guide image whose pose and expression are similar to the degraded image. The method builds a weighted least squares (WLS) model that measures the similarity between key points with a weighted affine distance, and determines the optimal guide image $k^*$ by minimizing it:

$$k^* = \arg\min_{k}\ D_a\big(L_d, L_g^k\big), \qquad D_a\big(L_d, L_g^k\big) = \min_{\alpha}\ \sum_{m=1}^{M} w_m \big\| l_d^m - \tilde{l}_g^{k,m}\, \alpha \big\|^2$$

where $D_a(L_d, L_g^k)$ represents the affine distance; $w_m$ represents the weight of the $m$-th key point; $l_d^m$ and $l_g^{k,m}$ respectively represent the $m$-th key point of the degraded image and of the $k$-th guide image; $\tilde{l}_g^{k,m}$ is the homogeneous representation of $l_g^{k,m}$ (a point with coordinates $[x, y]^{\top}$ accordingly has homogeneous coordinates $[x, y, 1]^{\top}$); and $W = \mathrm{Diag}(w)$ is the diagonal matrix of the key-point weight vector $w$.
In this embodiment, the degraded image $I_d$ is used to initialize the key-point weights, the guide image with the best accuracy, $I_g^{k^*}$, is found among the guide images during forward propagation, and the key-point weights are updated through the back-propagation algorithm so that the selected guide image has a relatively small affine distance. Learning the key-point weights lets the selected guide image reach the highest restoration precision; the key-point weight loss $\ell_w$ can be expressed as:

$$\ell_w = D_a\big(L_d, L_g^{k^*}\big)$$
although the optimal guide image and the degraded image have similar postures and expressions, the error is still large, and the reconstructed image is subjected to artifact. Thus, GFRNet uses a curvilinear sub-network to spatially calibrate the guide image and the degraded image. However, due to the lack of direct monitoring information to guide the image, the curved sub-network is difficult to train and has poor generalization capability. In addition, the guide image and the degraded image are generally taken under different lighting conditions. To solve these problems, the present embodiment employs the MLS method for spatial calibration and the AIN method for illumination translation.
The embodiment introduces a Moving Least Squares (MLS) method to calibrate the guide image and the degraded image in the feature space, rather than learning curve subnets, and the difference of the pose and the expression can be greatly reduced through guide selection. In addition, the MLS calibration is minute, and the feature extraction sub-network of the curve sub-network can perform end-to-end learning in the training process, so that the feature extraction and the MLS can work cooperatively to calibrate the image more accurately.
Specifically, the diagonal matrix $W_p$ has size 68 × 68, its $m$-th diagonal element weighting the $m$-th key point by its proximity to the position $p$ (in the standard moving-least-squares form, e.g. $w_p^m = 1/\|l_d^m - p\|^{2\alpha}$). Thus, the position-specific affine matrix $M_p$ can be expressed as:

$$M_p = \big(\tilde{L}_g^{\top} W_p\, \tilde{L}_g\big)^{-1} \tilde{L}_g^{\top} W_p\, L_d$$

where $L_g$ represents the optimal guide image key points; $L_d$ represents the key points of the degraded image; $\tilde{L}_g$ is the homogeneous representation of $L_g$; and $p = (x, y)$ are the coordinates in the degraded image. The curve subnetwork then obtains the curve features of the guide image $F_g^{w}$ through bilinear interpolation:

$$F_g^{w}(x, y) = \sum_{(x', y') \in N(x_g, y_g)} F_g(x', y')\,\big(1 - |x_g - x'|\big)\big(1 - |y_g - y'|\big), \qquad (x_g, y_g) = \tilde{p}\, M_p$$

where $(x, y)$ is a coordinate of the degraded image; $(x_g, y_g)$ is the corresponding coordinate of the guide image; $\tilde{p}$ is the homogeneous coordinate of $(x, y)$; $N(x_g, y_g)$ is the set of 4 nearest neighbours of $(x_g, y_g)$; and $F_g$ represents the features of the optimal guide image. The curve features are differentiable, so feature extraction can also be learned end to end during training.
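The 4-nearest-neighbour bilinear interpolation can be sketched per coordinate as follows; a single-channel (H, W) feature map is assumed for brevity, and the boundary clamping is an implementation assumption:

```python
import numpy as np

def bilinear_sample(F, xg, yg):
    """Bilinear interpolation sketch: sample feature map F (H, W) at the
    real-valued warped coordinate (xg, yg), blending its 4 nearest
    integer-grid neighbours with (1 - |dx|)(1 - |dy|) weights."""
    x0, y0 = int(np.floor(xg)), int(np.floor(yg))
    x1 = min(x0 + 1, F.shape[1] - 1)             # clamp at the border
    y1 = min(y0 + 1, F.shape[0] - 1)
    dx, dy = xg - x0, yg - y0
    return ((1 - dy) * (1 - dx) * F[y0, x0] + (1 - dy) * dx * F[y0, x1]
            + dy * (1 - dx) * F[y1, x0] + dy * dx * F[y1, x1])
```

Sampling at an integer coordinate returns the stored value exactly; sampling halfway between two grid points returns their average.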
In the present embodiment, adaptive instance normalization (AIN) transfers features into a required style. The invention treats illumination as a style and uses AIN to adjust the curve features of the guide image so that they have illumination similar to the restoration features of the degraded image; the adjusted curve features $F_g^{w,a}$ can be expressed as:

$$F_g^{w,a} = \sigma(F_d)\,\frac{F_g^{w} - \mu(F_g^{w})}{\sigma(F_g^{w})} + \mu(F_d)$$

where $F_d$ and $F_g^{w}$ respectively represent the restoration features of the degraded image and the curve features of the guide image, and $\mu(\cdot)$ and $\sigma(\cdot)$ respectively represent the mean and standard deviation.
GFRNet employs concatenation-based fusion carried out at multiple feature layers. However, concatenation-based fusion remains limited in exploiting the complementarity between the guide image and the degraded image. This embodiment therefore proposes 4 AFF blocks that adaptively and progressively fuse the warped features of the guide image with the restoration features of the degraded image, from which the AFFNet subnetwork is built. The AFFNet subnetwork consists of two shuffle (upsampling) layers, each followed by two residual blocks.
In this embodiment, on the one hand, the guide image typically contains more high-quality facial details; on the other hand, spatially transferring the complementarity between F_g^{w,a} and F_d allows the HQ image to be reconstructed better. Therefore, the facial keypoint features F_l are first obtained from the face image by a keypoint detection algorithm; then F_g^{w,a}, F_d and F_l are taken as input features, and a control module generates an attention mask F_m. Guided by F_m, F_g^{w,a} and F_d are fused, and the fused features pass through the 4 AFF blocks to obtain the combined feature F_c.
Compared with the concatenation-based fusion of GFRNet, AFF is a more flexible fusion method that can adapt to different degraded and guide images. Owing to its adaptive and progressive fusion, AFFNet generalizes well to LQ face images produced by complex and unknown degradation processes.
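The progressive, mask-guided blending through the AFF blocks can be illustrated roughly as below. The real AFF blocks are convolutional; this sketch reduces each block to a per-location soft blend under an externally supplied mask, which is an assumption made for illustration:

```python
import numpy as np

def affnet_fuse(F_gwa, F_d, masks):
    """Progressive fusion through len(masks) AFF-like steps: each step blends
    the illumination-translated guide feature F_gwa into the running estimate
    F_c, weighted per location by an attention mask in [0, 1]."""
    F_c = F_d
    for F_m in masks:
        # F_m -> 1 trusts the guide detail; F_m -> 0 keeps the degraded feature
        F_c = F_m * F_gwa + (1.0 - F_m) * F_c
    return F_c
```

With an all-ones mask the output collapses to the guide feature, and with an all-zeros mask to the degraded feature; intermediate masks realize the adaptive compromise the embodiment describes.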
In this embodiment, 4 loss functions (reconstruction, perceptual, style and adversarial) are introduced to train the whole network structure, as follows:
(1) The reconstruction loss function constrains the reconstructed image to be closer to the real image, measured by the mean squared error between \hat{I} and I; the mean squared error \ell_{MSE} can be expressed as:

\ell_{MSE} = \frac{1}{CHW} \left\| \hat{I} - I \right\|_2^2
where \hat{I} and I denote the reconstructed image and the real image, respectively; C, H and W denote the channel, height and width of the image.
(2) The perceptual loss function constrains the reconstructed image in feature space, improving its visual quality and bringing it closer to the real image:

\ell_{perc} = \sum_{u} \frac{1}{C_u H_u W_u} \left\| \psi_u(\hat{I}) - \psi_u(I) \right\|_2^2
where \psi_u denotes the u-th layer features of a pretrained face-recognition network (FaceNet), u \in [1, 2, 3, 4]. The total reconstruction loss L_{rec} can be expressed as:

L_{rec} = \lambda_{MSE}\, \ell_{MSE} + \lambda_{perc}\, \ell_{perc}
where \lambda_{MSE} and \lambda_{perc} are weight parameters; \lambda_{MSE} takes values in the range 0 to 1, and \lambda_{perc} takes values in the range 0 to 1.
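The weighted reconstruction objective above can be sketched numerically. This is an illustrative NumPy stand-in: `feats` here is a placeholder for the pretrained FaceNet feature extractor \psi, and the default weights echo the experiment section, both assumptions of the sketch:

```python
import numpy as np

def mse_loss(I_hat, I):
    """Pixel reconstruction term: mean squared error over C*H*W."""
    return float(np.mean((I_hat - I) ** 2))

def rec_loss(I_hat, I, feats, lam_mse=300.0, lam_perc=5.0):
    """L_rec = lam_mse * l_MSE + lam_perc * sum_u mean((psi_u(I_hat)-psi_u(I))^2).
    `feats(image)` must return a list of per-layer feature maps (stand-in
    for the pretrained FaceNet extractor psi_u, u = 1..4)."""
    l_perc = sum(np.mean((fu_hat - fu) ** 2)
                 for fu_hat, fu in zip(feats(I_hat), feats(I)))
    return lam_mse * mse_loss(I_hat, I) + lam_perc * float(l_perc)
```

Both terms vanish exactly when the reconstruction equals the real image, which is the constraint the embodiment imposes.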
(3) The style loss function helps generate accurate visual textures; the style loss \ell_{style} can be expressed as:

\ell_{style} = \sum_{u} \left\| G\big(\psi_u(\hat{I})\big) - G\big(\psi_u(I)\big) \right\|_2^2, \qquad G(F) = \frac{1}{C_u H_u W_u} F F^{\top}

where G(\cdot) denotes the Gram matrix of the layer features.
(4) The adversarial loss is an effective means of improving visual quality and is widely used in image generation tasks. The invention introduces spectral normalization on the weights of each convolutional layer and trains the discriminator with loss \ell_{adv,D} and the generator with loss \ell_{adv,G}, with the formulas:

\ell_{adv,D} = \mathbb{E}_{I \sim P(I)}\big[\max(0,\, 1 - D(I))\big] + \mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[\max(0,\, 1 + D(\hat{I}))\big], \qquad \ell_{adv,G} = -\,\mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[D(\hat{I})\big]
where \ell_{adv,D} is used to update the discriminator, while \ell_{adv,G} is used to update AFFNet; I and \hat{I} respectively represent the real image and the reconstructed image; P(I) and P(\hat{I}) respectively represent the real image distribution and the reconstructed image distribution; G and D each denote a neural network; \mathbb{E} denotes the expectation, \mathbb{E}_{I \sim P(I)} being taken over real images and \mathbb{E}_{\hat{I} \sim P(\hat{I})} over reconstructed images; D(I) and D(\hat{I}) denote the discriminator outputs for the real and reconstructed images.
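The adversarial terms can be sketched as below. The hinge formulation is an assumption here (it is the form commonly paired with spectral normalization); the patent text itself does not spell the formulas out, and `d_real`/`d_fake` stand for discriminator scores on real and reconstructed batches:

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push D's score above +1 on real images
    and below -1 on reconstructions."""
    return float(np.mean(np.maximum(0.0, 1.0 - d_real))
                 + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_adv_loss(d_fake):
    """Generator (AFFNet) adversarial term: raise D's score on reconstructions."""
    return float(-np.mean(d_fake))
```

The discriminator term saturates to zero once real and fake scores are well separated, while the generator term keeps a non-vanishing gradient, one reason this pairing trains stably.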
Therefore, the overall perceptual-quality loss L_{real} can be expressed as:

L_{real} = \lambda_{style}\, \ell_{style} + \lambda_{adv}\, \ell_{adv,G}
where \lambda_{style} and \lambda_{adv} are weight parameters; \lambda_{style} takes values in the range 0 to 1, and \lambda_{adv} takes values in the range 0 to 1.
In this embodiment, the overall objective function L for blind face restoration is defined as:

L = L_{rec} + L_{real}
in addition, the degradation model of the present embodiment can be expressed as:
where \otimes represents the convolution operation; k represents a blur kernel; \downarrow_s represents a bicubic downsampler; n_\sigma represents Gaussian noise with noise level \sigma; \mathrm{JPEG}_q represents JPEG compression with quality factor q. This degradation model generates realistic LQ images, thereby supporting the highest restoration accuracy.
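The blur–downsample–noise–compress pipeline can be sketched in NumPy as follows. This is illustrative only: nearest-neighbor striding stands in for the bicubic downsampler, and JPEG compression is left as an optional callable, both assumptions of the sketch:

```python
import numpy as np

def degrade(I, k, scale, sigma, jpeg=None, rng=None):
    """Synthesize an LQ training image: blur with kernel k (odd-sized),
    downsample by `scale` (stand-in for bicubic), add Gaussian noise of
    level sigma, then optionally JPEG-compress via the `jpeg` callable.
    I: (H, W) float image; k: 2-D blur kernel summing to 1."""
    rng = rng or np.random.default_rng(0)
    kh, kw = k.shape
    pad = np.pad(I, ((kh // 2,), (kw // 2,)), mode="edge")
    blurred = np.zeros_like(I)
    for dy in range(kh):                     # direct 2-D convolution
        for dx in range(kw):
            blurred += k[dy, dx] * pad[dy:dy + I.shape[0], dx:dx + I.shape[1]]
    down = blurred[::scale, ::scale]         # stand-in for bicubic downsampling
    noisy = down + rng.normal(0.0, sigma, down.shape)
    return jpeg(noisy) if jpeg else noisy
```

Sampling k, scale, sigma and q at random per training image is what lets the model cover the complex, unknown degradations mentioned above.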
All experiments were run on an NVIDIA platform using Python 3.7, with the blind face data set VGGFace2. The VGGFace2 data set contains 160,000 sets of face images, split into 100,000 training sets and 60,000 test sets, each set containing 3-10 HQ sample images; the poses and expressions of the training and test sets do not overlap. The experiments trained AFFNet with an SGD optimizer with batch size 8, momentum parameters \beta_1 = 0.5 and \beta_2 = 0.999, an initial learning rate of 0.0002, and loss-term weight parameters \lambda_{MSE} = 300, \lambda_{perc} = 5, \lambda_{style} = 1 and \lambda_{adv} = 2. Peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and LPIPS were used to quantify model accuracy.
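Of the three metrics, PSNR has the simplest closed form and can be sketched directly (illustrative; assumes images scaled to a known peak value):

```python
import numpy as np

def psnr(I_hat, I, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((I_hat - I) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

Higher PSNR means lower pixel-wise error; SSIM and LPIPS complement it by measuring structural and perceptual similarity, respectively.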
TABLE 1
Experiments compared 4 variants of AFFNet to verify the effectiveness of adaptive feature fusion: 1-Concat fuses with 1 concatenation block, 4-Concat fuses with 4 concatenation blocks, and w/o 1-Atten and w/o 4-Atten remove the attention mask from 1 and 4 AFF blocks, respectively. The results are shown in Table 1: AFFNet outperforms the Concat and w/o Atten variants on the PSNR and SSIM indexes, demonstrating the effectiveness of adaptive spatial feature fusion. To verify the effectiveness of progressive fusion, AFFNet models were built with different numbers of AFF blocks (1-AFF, 2-AFF, 4-AFF and 8-AFF). Owing to progressive fusion, stacking more AFF blocks yields better accuracy, but PSNR and SSIM begin to saturate beyond 4 blocks; 4-AFF is therefore taken as the optimal AFFNet model. In addition, three further AFFNet variants were considered: w/o AIN removes the AdaIN module, w/o MLS removes the MLS module, and Untrained F_g extracts the feature F_g with a FaceNet-initialized subnetwork that is not trained further. AFFNet is the most accurate in Table 1, indicating that the differentiability of MLS makes F_g learnable and benefits the spatial alignment between the degraded image and the selected guide image. Moreover, AdaIN-based illumination translation and adaptive fusion generate realistic results on real LQ images, improving both the restoration accuracy and the generalization ability of AFFNet.
The above-mentioned embodiments are only preferred embodiments of the present invention, and do not limit the scope of the present invention, but all the modifications made by the principles of the present invention and the non-inventive efforts based on the above-mentioned embodiments shall fall within the scope of the present invention.
Claims (10)
1. A blind face restoration method, comprising the steps of:
acquiring a blind face data set, evaluating the quality of the blind face data set by using a Laplacian gradient, and removing blurred and non-human face images; enhancing image data of the blind face data set, and randomly distributing to obtain a training set and a test set;
constructing an AFFNet network;
inputting images of the training set into the AFFNet network, training the AFFNet network by combining a reconstruction loss function, a perceptual loss function, a style loss function and an adversarial loss function, and optimizing the AFFNet network with a stochastic gradient descent (SGD) optimization algorithm to obtain an optimal blind face restoration model;
and inputting the images of the test set into the optimal blind face restoration model, and selecting by matching the image with the highest accuracy as the final restoration result.
2. A blind face restoration method according to claim 1, wherein the enhancing image data of the blind face data set comprises randomly cropping, horizontally flipping and chroma transforming the image of the blind face data set.
3. The blind face restoration method according to claim 1, wherein the expression of the blind face restoration model is:
where I_d represents a degraded face image; F_d represents the features of the degraded image; L_d represents the keypoints of the degraded image; L_g^k represents the keypoints of the k-th guide image; m indexes the keypoints; k \in [0, K], with K representing the number of guide images; and \theta represents the model parameters.
4. The blind face restoration method according to claim 3, further comprising performing degradation model processing on the blind face data, with the expression:

I_d = \mathrm{JPEG}_q\Big( \big( I \otimes K \big) \downarrow_s +\, n_\sigma \Big)
where \otimes represents a convolution operation, K represents a blur kernel, \downarrow_s represents a bicubic downsampler, n_\sigma represents Gaussian noise with noise level \sigma, and \mathrm{JPEG}_q represents JPEG compression with quality factor q.
5. The blind face restoration method according to claim 4, wherein the AFFNet network adopts a weighted least squares (WLS) model to select an optimal guide image from the blind face data set, performs spatial alignment and illumination translation on the guide image in the feature space using the moving least squares method and adaptive instance normalization, and fuses the warped features of the guide image with the restoration features of the degraded image by adaptive spatial feature fusion.
6. The blind face restoration method according to claim 5, wherein the weighted least squares (WLS) model selects the optimal guide image from the blind face data set by minimizing a weighted affine distance, with the expression:

D_a\big(L_d, L_g^k\big) = \left\| W^{1/2} \left( L_d - \hat{L}_g^k M_k \right) \right\|_2^2, \qquad M_k = \left( \hat{L}_g^{k\top} W \hat{L}_g^k \right)^{-1} \hat{L}_g^{k\top} W L_d
where D_a(L_d, L_g^k) represents the affine distance; w_m represents the weight of the m-th keypoint; L_d^m and L_g^{k,m} respectively represent the m-th keypoint of the degraded image and the m-th keypoint of the k-th guide image; \hat{L}_g^k is the homogeneous representation of L_g^k; W represents the diagonal matrix of the keypoint weight vector w; and \top denotes the row-column interchange (transpose) of a matrix.
7. The blind face restoration method according to claim 6, wherein the keypoint weights are initialized from the degraded face image, the optimal guide image is searched for during forward propagation, and the keypoint weights are updated with a back-propagation algorithm, the keypoint weight expression being:
8. The blind face restoration method according to claim 7, wherein the spatial alignment and illumination translation of the guide image in the feature space using the moving least squares method and adaptive instance normalization comprise the following steps:
the affine matrix M_p of the guide image is expressed as:

M_p = \left( \hat{L}_d^{\top} W_p \hat{L}_d \right)^{-1} \hat{L}_d^{\top} W_p L_g
where L_g represents the optimal guide image keypoints; L_d represents the keypoints of the degraded image; \hat{L}_d is the homogeneous representation of L_d; p is a coordinate of the degraded image, p = (x, y);
obtaining the warped features F_g^w of the guide image by bilinear interpolation, with the expression:

(x', y') = \hat{p}\, M_p, \qquad F_g^w(x, y) = \sum_{(x_i, y_i) \in N(x', y')} F_g(x_i, y_i)\, (1 - |x' - x_i|)(1 - |y' - y_i|)
where (x, y) represents a coordinate of the degraded image; (x', y') represents the corresponding coordinate of the guide image; \hat{p} is the homogeneous coordinate of (x, y); N(x', y') represents the 4 nearest neighbors of (x', y'); F_g represents the features of the optimal guide image;
and adjusting the warped features of the guide image using adaptive instance normalization, with the expression:

F_g^{w,a} = \sigma(F_d)\, \frac{F_g^w - \mu(F_g^w)}{\sigma(F_g^w)} + \mu(F_d).
9. The blind face restoration method according to claim 1, wherein the joint reconstruction loss function, perceptual loss function, style loss function and adversarial loss function train the AFFNet network, with the expression:

L = L_{rec} + L_{real}
where L_{rec} represents the joint loss function of the perceptual loss function and the reconstruction loss function, and L_{real} represents the perceptual-quality loss function;
the expression of the joint loss function of the perceptual loss function and the reconstruction loss function is:

L_{rec} = \lambda_{MSE}\, \ell_{MSE} + \lambda_{perc}\, \ell_{perc}
where \lambda_{MSE} represents the weight parameter of the reconstruction loss function, with value in the range 0 to 1, and \lambda_{perc} represents the weight parameter of the perceptual loss function, with value in the range 0 to 1;
the reconstruction loss function constrains the reconstructed image so as to obtain a reconstructed image close to the real image, with the mean squared error measuring the difference between the reconstructed image and the real image, with the expression:

\ell_{MSE} = \frac{1}{CHW} \left\| \hat{I} - I \right\|_2^2
where \hat{I} represents the reconstructed image, I represents the real image, and C, H and W represent the channel, height and width of the image, respectively;
the perceptual loss function constrains the reconstructed image, with the expression:

\ell_{perc} = \sum_{u} \frac{1}{C_u H_u W_u} \left\| \psi_u(\hat{I}) - \psi_u(I) \right\|_2^2
where \psi_u represents the u-th layer features of a pretrained FaceNet model, u \in [1, 2, 3, 4];
the perceptual-quality loss function L_{real} is expressed as:

L_{real} = \lambda_{style}\, \ell_{style} + \lambda_{adv}\, \ell_{adv,G}
where \lambda_{style} represents the weight parameter of the style loss function, with value in the range 0 to 1, and \lambda_{adv} represents the weight parameter of the adversarial loss function, with value in the range 0 to 1;
the expression of the style loss function is:

\ell_{style} = \sum_{u} \left\| G\big(\psi_u(\hat{I})\big) - G\big(\psi_u(I)\big) \right\|_2^2, \qquad G(F) = \frac{1}{C_u H_u W_u} F F^{\top}
where \hat{I} represents the reconstructed image, I represents the real image, C, H and W represent the channel, height and width of the image respectively, \psi_u represents the u-th layer features of the pretrained FaceNet model, u \in [1, 2, 3, 4], and \top represents the row-column interchange (transpose) of a matrix;
the adversarial loss function trains the discriminator of the AFFNet network with \ell_{adv,D} and the generator with \ell_{adv,G}, with the expressions:

\ell_{adv,D} = \mathbb{E}_{I \sim P(I)}\big[\max(0,\, 1 - D(I))\big] + \mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[\max(0,\, 1 + D(\hat{I}))\big], \qquad \ell_{adv,G} = -\,\mathbb{E}_{\hat{I} \sim P(\hat{I})}\big[D(\hat{I})\big]
where I and \hat{I} represent the real image and the reconstructed image respectively; P(I) and P(\hat{I}) represent the real image distribution and the reconstructed image distribution respectively; G and D each represent a neural network; \mathbb{E} represents the expectation, taken over I \sim P(I) for real images and over \hat{I} \sim P(\hat{I}) for reconstructed images; D(I) and D(\hat{I}) represent the discriminator outputs for the real and reconstructed images.
10. A system for using the blind face restoration method according to any one of claims 1 to 9, comprising:
the data preprocessing module is used for acquiring a blind face data set, evaluating the quality of the blind face data set by utilizing a Laplacian gradient and removing blurred and non-face images; enhancing image data of the blind face data set, and randomly distributing to obtain a training set and a test set;
the feature extraction module is used for extracting high-dimensional image features based on the constructed AFFNet network;
the training module is used for initializing parameters of the AFFNet network, inputting images of the training set into the AFFNet network, training the AFFNet network by combining a reconstruction loss function, a perceptual loss function, a style loss function and an adversarial loss function, and optimizing the AFFNet network with a stochastic gradient descent (SGD) optimization algorithm to obtain an optimal blind face restoration model;
and the test module is used for inputting the images of the test set into the optimal blind face restoration model, and selecting by matching the image with the highest accuracy as the final restoration result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110241203.XA CN112598604A (en) | 2021-03-04 | 2021-03-04 | Blind face restoration method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112598604A true CN112598604A (en) | 2021-04-02 |
Family
ID=75210161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110241203.XA Pending CN112598604A (en) | 2021-03-04 | 2021-03-04 | Blind face restoration method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112598604A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898699A (en) * | 2020-08-11 | 2020-11-06 | 海之韵(苏州)科技有限公司 | Automatic detection and identification method for hull target |
CN112131121A (en) * | 2020-09-27 | 2020-12-25 | 腾讯科技(深圳)有限公司 | Fuzzy detection method and device for user interface, electronic equipment and storage medium |
CN112149591A (en) * | 2020-09-28 | 2020-12-29 | 长沙理工大学 | SSD-AEFF automatic bridge detection method and system for SAR image |
Non-Patent Citations (1)
Title |
---|
XIAOMING LI et al.: "Enhanced Blind Face Restoration with Multi-Exemplar Images and Adaptive Spatial Feature Fusion", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113222910A (en) * | 2021-04-25 | 2021-08-06 | 南京邮电大学 | Method and device for extracting characteristic points of X-ray head shadow measurement image based on perception loss |
CN113222910B (en) * | 2021-04-25 | 2022-11-01 | 南京邮电大学 | Method and device for extracting characteristic points of X-ray head shadow measurement image based on perception loss |
CN113239866A (en) * | 2021-05-31 | 2021-08-10 | 西安电子科技大学 | Face recognition method and system based on space-time feature fusion and sample attention enhancement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111047516B (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN110136062B (en) | Super-resolution reconstruction method combining semantic segmentation | |
Yu et al. | A unified learning framework for single image super-resolution | |
Cai et al. | FCSR-GAN: Joint face completion and super-resolution via multi-task learning | |
WO2021022929A1 (en) | Single-frame image super-resolution reconstruction method | |
Cheng et al. | Zero-shot image super-resolution with depth guided internal degradation learning | |
CN112541864A (en) | Image restoration method based on multi-scale generation type confrontation network model | |
CN111626927B (en) | Binocular image super-resolution method, system and device adopting parallax constraint | |
CN112288627A (en) | Recognition-oriented low-resolution face image super-resolution method | |
CN113538246B (en) | Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network | |
Guan et al. | Srdgan: learning the noise prior for super resolution with dual generative adversarial networks | |
CN112598604A (en) | Blind face restoration method and system | |
Zheng et al. | T-net: Deep stacked scale-iteration network for image dehazing | |
Hu et al. | Meta-USR: A unified super-resolution network for multiple degradation parameters | |
Muqeet et al. | Hybrid residual attention network for single image super resolution | |
CN115526777A (en) | Blind over-separation network establishing method, blind over-separation method and storage medium | |
CN116934592A (en) | Image stitching method, system, equipment and medium based on deep learning | |
CN115526779A (en) | Infrared image super-resolution reconstruction method based on dynamic attention mechanism | |
Xia et al. | Meta-learning based degradation representation for blind super-resolution | |
CN113034388B (en) | Ancient painting virtual repair method and construction method of repair model | |
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism | |
CN109615576B (en) | Single-frame image super-resolution reconstruction method based on cascade regression basis learning | |
CN114359041A (en) | Light field image space super-resolution reconstruction method | |
Liang et al. | Image deblurring by exploring in-depth properties of transformer | |
CN112200752A (en) | Multi-frame image deblurring system and method based on ER network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20210402 |