CN116362991A - Blind face recovery method based on domain alignment GAN priori - Google Patents


Info

Publication number: CN116362991A
Application number: CN202310061184.1A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Inventors: 李志恒, 尹宇婷
Applicant and current assignee: Shenzhen International Graduate School of Tsinghua University
Prior art keywords: image, domain, gan, training, encoder

Classifications

    • G06T5/73
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • Y02T10/40 Engine management systems

Abstract

A blind face recovery method based on a domain-aligned GAN prior comprises the following steps: acquiring a training set of low-definition face images; constructing a network comprising a domain-aligned GAN inversion branch and a feature extraction-fusion branch, where the domain-aligned GAN inversion branch includes an encoder E_w, a pre-trained generator G, an image-domain discriminator D and a latent-space discriminator D_w, and the feature extraction-fusion branch includes an Encoder and a Decoder, the Decoder also containing a feature fusion module; inputting the training set into the network for training, with loss functions defined on both the image domain and the latent space of the GAN network to constrain the training; and inputting the low-definition face image to be restored into the trained network to obtain a restored high-definition face image. The invention can restore face images with rich details, high fidelity and good definition from blurred faces.

Description

Blind face recovery method based on a domain-aligned GAN prior
Technical Field
The invention relates to the technical field of face restoration, and in particular to a blind face restoration method based on a domain-aligned GAN (generative adversarial network) prior.
Background
With the development of computers and artificial intelligence, applications such as ultra-high-definition video, face editing and face recognition have emerged one after another, and the demand for and dependence on high-quality face images keep growing. In the real world, however, owing to objective factors such as illumination, weather, imaging equipment and compression during storage, face images often suffer complex and varied unknown degradations such as low resolution, noise, blur and compression artifacts, and therefore cannot meet the requirements of this range of application scenarios. Restoring low-quality face images is thus an urgent problem to be solved.
Blind face restoration is an image processing technique that reconstructs a high-quality face with realistic details from a low-quality face image with unknown degradation. In recent years, owing to the great potential of face restoration in practical applications, many blind face restoration algorithms have been proposed. Early work mainly used the inherent attributes of face images, such as face parsing maps, facial landmarks and facial heatmaps, as prior knowledge to help recover the geometric structure and facial details of the face, but these methods rarely achieve satisfactory results because the geometric priors extracted from low-quality images are inaccurate and lack rich texture details. Other methods use high-quality faces as references to guide the restoration of low-quality faces, but overlook the fact that high-quality face references cannot be obtained in real scenes, which limits the applicability of such algorithms.
It should be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
The invention aims to overcome the shortcomings described in the background above and to provide a blind face recovery method based on a domain-aligned GAN prior, so as to restore face images with richer details, higher fidelity and better definition from blurred faces.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
in a first aspect, a blind face recovery method based on a domain-aligned GAN prior includes the following steps:
a1, acquiring a low-definition face image training set;
a2, constructing a network comprising a domain-aligned GAN inversion branch and a feature extraction-fusion branch; the domain-aligned GAN inversion branch includes an encoder E_w, a pre-trained generator G, an image-domain discriminator D and a latent-space discriminator D_w, wherein the encoder E_w is used for inverting the low-definition input image into the latent space of the GAN network to obtain a latent code, the pre-trained generator G is used for obtaining generative face prior features from the corresponding latent code, the image-domain discriminator D is used for distinguishing the reconstructed image from the real image, and the latent-space discriminator D_w is used for domain-aligning the latent codes obtained through inversion with the latent codes of the original training of the GAN network; the feature extraction-fusion branch comprises an Encoder and a Decoder, wherein the Decoder also comprises a feature fusion module;
a3, inputting the training set into the network for training, wherein loss functions are defined on the image domain and the latent space of the GAN network, respectively, to constrain the training of the network;
a4, inputting the low-definition face image to be restored into the trained network for restoration to obtain a restored high-definition face image.
In a second aspect, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the blind face recovery method based on a domain-aligned GAN prior described above.
The invention has the following beneficial effects:
the invention provides a blind face restoration method based on a domain-aligned GAN prior, which can exploit the generative face prior information, such as color and texture, contained in a pre-trained GAN network (e.g. the StyleGAN face generator) to assist in restoring faithful details during face restoration. Unlike existing GAN-based face restoration methods, the invention uses a domain-aligned GAN inversion branch so that the network can effectively exploit the generative face prior, and restores the face by feature fusion so as to balance detail faithfulness and face fidelity. The method can effectively restore face images with richer details, higher fidelity and better definition from blurred faces, thereby broadening the range of applications of other technologies that rely on face images.
Advantages of embodiments of the present invention include:
1. Adding the discriminator D_w when training the GAN inversion branch aligns the inverted latent codes with the real latent space and improves the quality of the generative face prior.
2. In the feature extraction-fusion branch, the feature fusion module fuses the multi-resolution convolutional features extracted from the low-definition input image with the generative face prior features extracted from the generator G, so that restoration of both the face region and the image background is taken into account, the faithful details of the original low-definition image are fully recovered, and face fidelity is maintained.
3. Embodiments of the invention can restore low-definition face images with complex real-world degradation, and offer better quality and practicality than existing methods.
Drawings
Fig. 1 is a flowchart of a blind face recovery method based on domain alignment GAN prior according to an embodiment of the invention.
Fig. 2 is a schematic diagram of a network structure according to an embodiment of the present invention.
Fig. 3 is a network training flow chart according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the invention or its applications.
Referring to fig. 1, an embodiment of the present invention provides a blind face recovery method based on domain alignment GAN prior, including the following steps:
a1, acquiring a low-definition face image training set;
a2, constructing a network comprising a domain-aligned GAN inversion branch and a feature extraction-fusion branch; the domain-aligned GAN inversion branch includes an encoder E_w, a pre-trained generator G, an image-domain discriminator D and a latent-space discriminator D_w, wherein the encoder E_w is used for inverting the low-definition input image into the latent space of the GAN network to obtain a latent code, the pre-trained generator G is used for obtaining generative face prior features from the corresponding latent code, the image-domain discriminator D is used for distinguishing the reconstructed image from the real image, and the latent-space discriminator D_w is used for domain-aligning the latent codes obtained through inversion with the latent codes of the original training of the GAN network; the feature extraction-fusion branch comprises an Encoder and a Decoder, wherein the Decoder also comprises a feature fusion module;
a3, inputting the training set into the network for training, wherein loss functions are defined on the image domain and the latent space of the GAN network, respectively, to constrain the training of the network;
a4, inputting the low-definition face image to be restored into the trained network for restoration to obtain a restored high-definition face image.
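The data flow of steps A2 and A4 can be sketched as follows. This is a minimal illustration with the four sub-networks passed in as plain callables; the names e_w, g, encoder and decoder are illustrative stand-ins for the patent's trained modules, not an actual implementation.

```python
def restore(i_lq, e_w, g, encoder, decoder):
    """Restore a low-definition face image i_lq (two-branch sketch)."""
    w = e_w(i_lq)                 # GAN inversion: image -> latent code
    prior_feats = g(w)            # generative face prior features from G
    conv_feats = encoder(i_lq)    # multi-resolution convolutional features
    # the decoder's feature fusion modules merge both feature streams
    return decoder(conv_feats, prior_feats)
```

With real networks each callable would be a trained module; the flow itself can be checked with trivial stand-ins.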
Specific embodiments of the present invention are described further below.
Referring to fig. 2-3, in some embodiments, a domain alignment GAN a priori based blind face recovery method may include the steps of:
s1: training data set construction: N high-definition face images I_HQ are obtained from an open-source high-definition face image database, and the corresponding low-definition face images I_LQ are obtained through degradation processing, forming the training set;
s2: and (3) network construction: constructing a network structure required during training;
s3: defining loss functions: constraining the training of the network by defining loss functions on the image domain and the latent space of the GAN network (e.g. the face generation network StyleGAN), respectively;
s4: training a network: inputting the training set into the network for training;
s5: low-definition face image restoration: and inputting the low-definition face image to be restored into a trained network for restoration to obtain a restored high-definition face image.
The open-source high-definition face image database may be the FFHQ face dataset, which contains 70,000 high-definition face images.
Wherein the degradation processing may be performed as follows:
a series of degradation operations with random parameters, including blurring, downsampling, Gaussian noise addition and JPEG compression, is applied to the high-definition face images. Mathematically:

I_{LQ} = \mathrm{JPEG}_{q}\!\left(\big(I_{HQ} \circledast k_{\sigma}\big)\downarrow_{s} + n_{\delta}\right)

where k_{\sigma} and n_{\delta} denote the blur kernel and the Gaussian noise, and the operators \circledast, \downarrow_{s} and \mathrm{JPEG}_{q} respectively denote two-dimensional convolution, downsampling and JPEG compression. The parameters \sigma, s, \delta and q respectively take values in {0.2:10}, {1:8}, {0:15} and {60:100}.
Preferably, the network constructed in step S2 includes the following:
a domain-aligned GAN inversion branch, consisting of an encoder E_w, a pre-trained generator G, an image-domain discriminator D and a latent-space discriminator D_w;
a feature extraction-fusion branch, consisting of an Encoder and a Decoder, wherein the Decoder also contains the designed feature fusion module.
Preferably, the loss functions defined in step S3 include:
(1) An image reconstruction loss, comprising an L2 loss in pixel space and a perceptual loss in feature space between the reconstructed image and the input low-definition image.
(2) An adversarial loss in image space, which helps generate realistic textures during image restoration.
(3) An adversarial loss in the latent space of the GAN network, which aligns the latent code obtained by inverting the low-definition image through the encoder with the latent codes used when the original GAN network was trained.
Referring to fig. 2 to 3, in some embodiments, a blind face recovery method based on domain alignment GAN a priori may specifically include the following steps:
s1, training data set construction
70,000 high-definition face images are downloaded from the FFHQ open-source face dataset and randomly split into a training set and a test set at a ratio of 9:1. All images are processed by the random mixed degradation model to construct the paired low-definition/high-definition dataset for training.
Preferably, the random mixed degradation model may be expressed as:

I_{LQ} = \mathrm{JPEG}_{q}\!\left(\big(I_{HQ} \circledast k_{\sigma}\big)\downarrow_{s} + n_{\delta}\right)

where k_{\sigma} and n_{\delta} denote the blur kernel and the Gaussian noise, and the operators \circledast, \downarrow_{s} and \mathrm{JPEG}_{q} respectively denote two-dimensional convolution, downsampling and JPEG compression. The parameters \sigma, s, \delta and q respectively take values in {0.2:10}, {1:8}, {0:15} and {60:100}.
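The random mixed degradation model can be sketched in NumPy as follows. This is an illustrative sample of one degradation draw, not the patent's implementation: the JPEG compression step is stubbed out as a clipping pass-through (a real pipeline would round-trip the image through an actual JPEG encoder), and the kernel radius is an arbitrary choice.

```python
import numpy as np

def conv2d(img, k):
    """Plain 2-D convolution with edge padding (k is symmetric here)."""
    r = k.shape[0] // 2
    pad = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += k[dy + r, dx + r] * pad[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

def degrade(i_hq, rng):
    """One random sample of I_LQ = JPEG_q((I_HQ * k_sigma) down_s + n_delta)."""
    sigma = rng.uniform(0.2, 10.0)   # blur kernel width, sigma in {0.2:10}
    s = int(rng.integers(1, 9))      # downsampling factor, s in {1:8}
    delta = rng.uniform(0.0, 15.0)   # noise std on a 0-255 scale, delta in {0:15}
    # q = rng.integers(60, 101)      # JPEG quality in {60:100}; stubbed below

    r = 7                            # kernel radius (illustrative choice)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    k /= k.sum()                     # normalized Gaussian blur kernel k_sigma

    blurred = conv2d(i_hq, k)        # (I_HQ convolved with k_sigma)
    down = blurred[::s, ::s]         # downsampling by factor s
    noisy = down + rng.normal(0.0, delta, down.shape)   # + n_delta
    return np.clip(noisy, 0.0, 255.0)  # JPEG_q step omitted in this stub
```

Applying `degrade` to each high-definition image with a fresh random draw yields the paired low-definition/high-definition training set.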
S2, network construction
As shown in fig. 2, the domain-aligned GAN inversion branch consists of the following parts: the encoder E_w, which inverts the low-definition input image into the latent space of the GAN network to obtain a latent code; the pre-trained generator G, which produces generative face prior features from the corresponding latent code; the image-domain discriminator D, which distinguishes reconstructed images from real images; and the latent-space discriminator D_w, which aligns the inverted latent codes with the latent codes of the original training of the GAN network.
The feature extraction-fusion branch consists of an Encoder and a Decoder. The Encoder is a stack of residual blocks that extract convolutional features from the degraded input image, gradually halving the resolution of the feature maps down to 4x4. The multi-resolution convolutional features are then fed into the Decoder for image reconstruction. The Decoder is a stack of upsampling blocks, each followed by a feature fusion module that fuses the two kinds of features, until the final result is generated. In addition, skip connections bridging the decoder and encoder fuse shallow and deep features and provide more semantic information during decoding.
The feature fusion module adaptively integrates local features and global dependencies through a global attention module and a local attention module connected in series. The global attention module is applied first to selectively emphasize specific feature maps across all channels; the serially connected local attention module then selectively aggregates features at each position, yielding the final feature representation.
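The serial global-then-local attention fusion described above can be sketched functionally in NumPy. The text does not give exact layer shapes or gating functions, so the squeeze-and-excitation-style channel gate and the mean/max spatial gate below are illustrative assumptions, as are the parameter names w_channel, w_mean and w_max.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(conv_feat, prior_feat, w_channel, w_mean=1.0, w_max=1.0):
    """Fuse convolutional features with generative prior features via a
    global (channel) attention step followed by a local (spatial) one.
    conv_feat, prior_feat: arrays of shape (C, H, W); w_channel: (2C, 2C)."""
    x = np.concatenate([conv_feat, prior_feat], axis=0)      # (2C, H, W)

    # global attention: squeeze spatial dims, reweight every channel,
    # selectively emphasizing particular feature maps
    squeezed = x.mean(axis=(1, 2))                           # (2C,)
    channel_gate = _sigmoid(w_channel @ squeezed)            # (2C,)
    x = x * channel_gate[:, None, None]

    # local attention in series: a per-position gate built from simple
    # channel statistics, selectively aggregating features at each location
    spatial_gate = _sigmoid(w_mean * x.mean(axis=0) + w_max * x.max(axis=0))
    return x * spatial_gate[None, :, :]                      # (2C, H, W)
```

In the actual network these gates would be learned layers inside each decoder stage; the sketch only shows how the two attention steps compose in series.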
S3, defining a loss function
The usual image restoration formulation minimizes the mean square error (MSE) between the restored image and the reference image, but this usually makes the restored image blurrier, so we add the adversarial loss of a generative adversarial network (GAN) during model training, making the texture boundaries of the restored image clearer and more natural. In addition, when training E_w, we propose adding an adversarial loss on the latent space so that the inverted latent code falls in a high-dimensional space with the same data distribution as the latent codes of the original GAN training. The modified loss functions are as follows:
The loss functions when training the domain-aligned GAN inversion branch are as follows:

\mathcal{L}_{rec} = \lambda_{L2}\,\lVert G(E_w(I_{lq})) - I_{hq}\rVert_2 + \lambda_{per}\,\lVert \Phi(G(E_w(I_{lq}))) - \Phi(I_{hq})\rVert_2

where I_{lq} and I_{hq} are the degraded low-definition input image and the reference high-definition image, E_w is the encoder, G is the generator, \lambda_{L2} and \lambda_{per} respectively denote the weight hyper-parameters of the pixel loss and the perceptual loss, and \Phi(\cdot) denotes a VGG feature extraction model;

\mathcal{L}_{adv} = -\lambda_{adv}\,\mathbb{E}\big[\log D\big(G(E_w(I_{lq}))\big)\big]

where D is the image-domain discriminator and \lambda_{adv} is the weight hyper-parameter of the image-domain adversarial loss;

\mathcal{L}_{adv}^{w} = -\lambda_{w}\,\mathbb{E}\big[\log D_w\big(E_w(I_{lq})\big)\big]

where D_w is the latent-space discriminator and \lambda_{w} is the weight hyper-parameter of the latent-space adversarial loss;

finally, the loss function when training the domain-aligned GAN inversion branch is composed of the reconstruction loss \mathcal{L}_{rec}, the image-domain adversarial loss \mathcal{L}_{adv} and the latent-space adversarial loss \mathcal{L}_{adv}^{w}:

\mathcal{L}_{total} = \mathcal{L}_{rec} + \mathcal{L}_{adv} + \mathcal{L}_{adv}^{w}
The loss function when training the feature extraction-fusion branch is as follows:

\mathcal{L}_{rec} = \lambda_{L2}\,\lVert I_{rec} - I_{hq}\rVert_2 + \lambda_{per}\,\lVert \Phi(I_{rec}) - \Phi(I_{hq})\rVert_2

where I_{lq}, I_{hq} and I_{rec} respectively denote the degraded low-definition input image, the reference high-definition image and the final reconstructed image, and \lambda_{L2} and \lambda_{per} respectively denote the weight hyper-parameters of the pixel and perceptual losses;

\mathcal{L}_{adv} = -\lambda_{adv}\,\mathbb{E}\big[\log D(I_{rec})\big]

where \lambda_{adv} denotes the weight hyper-parameter of the adversarial loss;

finally, the loss function when training the feature extraction-fusion branch is composed of the reconstruction loss \mathcal{L}_{rec} and the adversarial loss \mathcal{L}_{adv}:

\mathcal{L}_{total} = \mathcal{L}_{rec} + \mathcal{L}_{adv}
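The per-term structure of the losses can be checked with a small NumPy sketch. The squared-error form of the reconstruction terms and the non-saturating log form of the adversarial terms are standard assumptions here, since the original equation images only name the terms and their weights; phi stands in for the VGG feature extractor.

```python
import numpy as np

def reconstruction_loss(i_rec, i_hq, phi, lam_l2=1.0, lam_per=1.0):
    """lam_l2 * ||I_rec - I_hq||^2 + lam_per * ||phi(I_rec) - phi(I_hq)||^2."""
    l2 = np.mean((i_rec - i_hq) ** 2)
    per = np.mean((phi(i_rec) - phi(i_hq)) ** 2)
    return lam_l2 * l2 + lam_per * per

def adversarial_loss(d_logits, lam_adv=0.01):
    """Generator-side non-saturating GAN loss -lam * E[log D(x)],
    written as lam * E[softplus(-logits)] for numerical stability."""
    return lam_adv * np.mean(np.logaddexp(0.0, -d_logits))

def inversion_branch_loss(i_rec, i_hq, phi, d_img_logits, d_w_logits,
                          lam_l2=1.0, lam_per=1.0, lam_adv=0.01, lam_w=0.01):
    """L_total = L_rec + image-domain adversarial + latent-space adversarial."""
    return (reconstruction_loss(i_rec, i_hq, phi, lam_l2, lam_per)
            + adversarial_loss(d_img_logits, lam_adv)
            + adversarial_loss(d_w_logits, lam_w))
```

The fusion-branch total loss is the same expression with the latent-space term dropped.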
s4: training a network: inputting the training set into the network for training;
further, as shown in fig. 3, the network training at least includes:
s401: train the domain-aligned GAN inversion branch. Train E_w to invert the input low-definition images into the latent space of the pre-trained GAN network to obtain latent codes, then feed the latent codes into the pre-trained generator G to obtain reconstructed images. By constraining the reconstruction loss between the reconstructed image and the high-definition image, the model that performs best on the PSNR metric is kept as the initialization of E_w;
s402: add D_w and D to the domain-aligned GAN inversion branch, and add the adversarial losses of the image space and the latent space for iterative training until the output-layer error reaches the preset precision requirement or the number of training iterations reaches the maximum, obtaining the trained E_w, D_w and D;
wherein the learning rates of the encoder E_w and of the discriminators D_w and D are 1e-5 and 1e-4, respectively, halved after every 3e4 iterations.
S403: fix the network parameters of E_w, D_w and D, and train the feature extraction-fusion branch until the output-layer error reaches the preset precision requirement or the number of training iterations reaches the maximum; then save the network structure and parameters of the current model to obtain the optimized model.
Wherein the learning rates of the Encoder and the Decoder are 1e-4 and 2e-4, respectively.
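The staged schedule of S401-S403, with its per-module base learning rates and halving after every 3e4 iterations, can be sketched as:

```python
def lr_at(step, base_lr, halve_every=30_000):
    """Learning rate after `step` iterations, halved after every 3e4 steps."""
    return base_lr * 0.5 ** (step // halve_every)

# base learning rates quoted in the text
BASE_LR = {
    "E_w": 1e-5,        # encoder of the GAN inversion branch (stage S402)
    "D_w": 1e-4,        # latent-space discriminator (stage S402)
    "D": 1e-4,          # image-domain discriminator (stage S402)
    "Encoder": 1e-4,    # feature-extraction encoder (stage S403)
    "Decoder": 2e-4,    # decoder with fusion modules (stage S403)
}

def stage_s402_lrs(step):
    """Learning rates used while training E_w, D_w and D in stage S402."""
    return {m: lr_at(step, BASE_LR[m]) for m in ("E_w", "D_w", "D")}
```

The text states the halving schedule only for the S402 modules; whether the Encoder/Decoder rates in S403 follow the same decay is not specified, so the sketch applies `lr_at` only to stage S402.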
S5: low-definition face image restoration: and inputting the low-definition face image to be restored into a trained network for restoration to obtain a restored high-definition face image.
The blind face restoration method based on a domain-aligned GAN prior in the embodiments of the invention can exploit the generative face prior information, such as color and texture, contained in a pre-trained GAN network (e.g. the StyleGAN face generator) to assist in restoring faithful details during face restoration. Unlike existing GAN-based face restoration methods, the invention uses the domain-aligned GAN inversion branch so that the network can effectively exploit the generative face prior, and restores the face by feature fusion so as to balance detail faithfulness and face fidelity. The method can effectively restore face images with richer details, higher fidelity and better definition from blurred faces, thereby broadening the range of applications of other technologies that rely on face images.
Advantages of embodiments of the present invention include:
Adding the discriminator D_w when training the GAN inversion branch aligns the inverted latent codes with the real latent space and improves the quality of the generative face prior.
In the feature extraction-fusion branch, the feature fusion module fuses the multi-resolution convolutional features extracted from the low-definition input image with the generative face prior features extracted from the generator G, so that restoration of both the face region and the image background is taken into account, the faithful details of the original low-definition image are fully recovered, and face fidelity is maintained.
Embodiments of the invention can restore low-definition face images with complex real-world degradation, and offer better quality and practicality than existing methods.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The background section of the present invention may contain background information about the problems or environments of the present invention and is not necessarily descriptive of the prior art. Accordingly, inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and it is not intended that the invention be limited to such description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the invention, and these alternatives or modifications should be considered to be within the scope of the invention. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "preferred embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Those skilled in the art may combine and combine the features of the different embodiments or examples described in this specification and of the different embodiments or examples without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A blind face recovery method based on a domain-aligned GAN prior, characterized by comprising the following steps:
a1, acquiring a low-definition face image training set;
a2, constructing a network comprising a domain-aligned GAN inversion branch and a feature extraction-fusion branch; the domain-aligned GAN inversion branch includes an encoder E_w, a pre-trained generator G, an image-domain discriminator D and a latent-space discriminator D_w, wherein the encoder E_w is used for inverting the low-definition input image into the latent space of the GAN network to obtain a latent code, the pre-trained generator G is used for obtaining generative face prior features from the corresponding latent code, the image-domain discriminator D is used for distinguishing the reconstructed image from the real image, and the latent-space discriminator D_w is used for domain-aligning the latent codes obtained through inversion with the latent codes of the original training of the GAN network; the feature extraction-fusion branch comprises an Encoder and a Decoder, wherein the Decoder also comprises a feature fusion module;
a3, inputting the training set into the network for training, wherein loss functions are defined on the image domain and the latent space of the GAN network, respectively, to constrain the training of the network;
a4, inputting the low-definition face image to be restored into the trained network for restoration to obtain a restored high-definition face image.
2. The blind face recovery method based on a domain-aligned GAN prior according to claim 1, wherein in step A1, N high-definition face images I_HQ are obtained, and the corresponding low-definition face images I_LQ are obtained through degradation processing to form the training set;
wherein the degradation processing is performed as follows:
blurring, downsampling, Gaussian noise addition and JPEG compression with random parameters are applied to the high-definition face images, expressed mathematically as:

I_{LQ} = \mathrm{JPEG}_{q}\!\left(\big(I_{HQ} \circledast k_{\sigma}\big)\downarrow_{s} + n_{\delta}\right)

where k_{\sigma} and n_{\delta} denote the blur kernel and the Gaussian noise, the operators \circledast, \downarrow_{s} and \mathrm{JPEG}_{q} respectively denote two-dimensional convolution, downsampling and JPEG compression, and the parameters \sigma, s, \delta and q respectively take values in {0.2:10}, {1:8}, {0:15} and {60:100}.
3. The blind face recovery method based on a domain-aligned GAN prior according to claim 1 or 2, wherein the feature fusion module fuses the multi-resolution convolutional features extracted from the low-definition input image with the generative face prior features extracted from the pre-trained generator G.
4. The blind face recovery method based on a domain-aligned GAN prior according to any one of claims 1 to 3, wherein the feature fusion module adaptively integrates local features and global dependencies through a global attention module and a local attention module connected in series; the global attention module selectively emphasizes specific feature maps across all channels, and the serially connected local attention module selectively aggregates features at each position, yielding the final feature representation.
5. The blind face restoration method based on a domain-aligned GAN prior according to any one of claims 1 to 4, wherein the loss function comprises: an image reconstruction loss, comprising an L2 loss in pixel space and a perceptual loss in feature space between the reconstructed image and the reference high-definition image; an image-space adversarial loss, for generating realistic textures during image restoration; and a latent-space adversarial loss of the GAN network, for aligning the latent code obtained by inverting the low-definition image through the encoder with the latent codes used when training the original GAN network.
6. The blind face restoration method based on a domain-aligned GAN prior according to any one of claims 1 to 5, wherein
the loss function when training the domain-aligned GAN inversion branch is as follows:

$$\mathcal{L}_{rec}^{w}=\lambda_{L2}\left\|G(E_{w}(I_{lq}))-I_{hq}\right\|_{2}+\lambda_{per}\left\|\Phi(G(E_{w}(I_{lq})))-\Phi(I_{hq})\right\|_{2}$$

wherein $I_{lq}$ and $I_{hq}$ are the degraded low-definition input image and the reference high-definition image, $E_{w}$ is the encoder, $G$ is the generator, $\lambda_{L2}$ and $\lambda_{per}$ respectively denote the weight hyper-parameters of the pixel loss and the perceptual loss, and $\Phi(\cdot)$ denotes a VGG feature extraction model;

$$\mathcal{L}_{adv}^{img}=\lambda_{adv}^{img}\,\mathbb{E}_{I_{lq}}\left[\operatorname{softplus}\left(-D\left(G(E_{w}(I_{lq}))\right)\right)\right]$$

wherein $D$ is the image-domain discriminator and $\lambda_{adv}^{img}$ is the weight hyper-parameter of the image-domain adversarial loss;

$$\mathcal{L}_{adv}^{w}=\lambda_{adv}^{w}\,\mathbb{E}_{I_{lq}}\left[\operatorname{softplus}\left(-D_{w}\left(E_{w}(I_{lq})\right)\right)\right]$$

wherein $D_{w}$ is the latent-space discriminator and $\lambda_{adv}^{w}$ is the weight hyper-parameter of the latent-space adversarial loss;

finally, the loss function when training the domain-aligned GAN inversion branch is composed of the reconstruction loss $\mathcal{L}_{rec}^{w}$, the image-domain adversarial loss $\mathcal{L}_{adv}^{img}$ and the latent-space adversarial loss $\mathcal{L}_{adv}^{w}$:

$$\mathcal{L}^{w}=\mathcal{L}_{rec}^{w}+\mathcal{L}_{adv}^{img}+\mathcal{L}_{adv}^{w}$$

the loss function when training the feature extraction-fusion branch is as follows:

$$\mathcal{L}_{rec}=\lambda_{L2}\left\|I_{rec}-I_{hq}\right\|_{2}+\lambda_{per}\left\|\Phi(I_{rec})-\Phi(I_{hq})\right\|_{2}$$

wherein $I_{lq}$, $I_{hq}$ and $I_{rec}$ respectively denote the degraded low-definition input image, the reference high-definition image and the final reconstructed image, and $\lambda_{L2}$ and $\lambda_{per}$ respectively denote the weight hyper-parameters of the pixel loss and the perceptual loss;

$$\mathcal{L}_{adv}=\lambda_{adv}\,\mathbb{E}\left[\operatorname{softplus}\left(-D(I_{rec})\right)\right]$$

wherein $\lambda_{adv}$ denotes the weight hyper-parameter of the adversarial loss;

finally, the loss function when training the feature extraction-fusion branch is composed of the reconstruction loss $\mathcal{L}_{rec}$ and the adversarial loss $\mathcal{L}_{adv}$:

$$\mathcal{L}=\mathcal{L}_{rec}+\mathcal{L}_{adv}$$
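The combined inversion-branch loss can be illustrated numerically as follows. Here `phi` is a toy stand-in for the VGG feature extractor, the two discriminators are represented only by their output logits, and all weight values are placeholders, not the patent's settings:

```python
import numpy as np

def inversion_branch_loss(rec, hq, phi, d_img_logit, d_w_logit,
                          lam_l2=1.0, lam_per=0.1, lam_adv=0.05, lam_w=0.05):
    """Reconstruction (pixel + perceptual) term plus non-saturating
    adversarial terms on the image domain and the latent domain."""
    l_rec = lam_l2 * np.mean((rec - hq) ** 2) \
          + lam_per * np.mean((phi(rec) - phi(hq)) ** 2)
    # softplus(-logit) == -log(sigmoid(logit)): non-saturating GAN loss.
    l_adv_img = lam_adv * np.logaddexp(0.0, -d_img_logit)
    l_adv_w = lam_w * np.logaddexp(0.0, -d_w_logit)
    return l_rec + l_adv_img + l_adv_w

phi = lambda x: x[::2, ::2]   # toy "feature extractor" for illustration
rec = np.zeros((8, 8))
hq = np.ones((8, 8))
loss = inversion_branch_loss(rec, hq, phi, d_img_logit=0.0, d_w_logit=0.0)
print(round(float(loss), 4))  # 1.1693
```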
7. The blind face restoration method based on a domain-aligned GAN prior according to any one of claims 1 to 6, wherein in step A3 the training specifically comprises the following steps:
A31: training the domain-aligned GAN inversion branch: the encoder E_w is trained to invert the input low-definition image into the latent space of the pre-trained GAN network to obtain a latent code, which is then fed to the pre-trained generator G to obtain a reconstructed image; by constraining a reconstruction loss function between the reconstructed image and the high-definition image, the model that scores best under the PSNR metric is taken as the initialization of the encoder E_w;
A32: adding the image-domain discriminator D and the latent-space discriminator D_w to the domain-aligned GAN inversion branch, adding the image-space and latent-space adversarial losses, and training iteratively until the output-layer error reaches the preset accuracy requirement or the number of iterations reaches the maximum, thereby obtaining the trained encoder E_w, image-domain discriminator D and latent-space discriminator D_w;
A33: with the encoder E_w, the image-domain discriminator D and the latent-space discriminator D_w fixed, training the feature extraction-fusion branch until the output-layer error reaches the preset accuracy requirement or the number of iterations reaches the maximum, and saving the network structure and parameters of the current model to obtain the optimized model.
8. The blind face restoration method based on a domain-aligned GAN prior according to claim 7, wherein in steps A31 and A32 the learning rates of the encoder E_w and the latent-space discriminator D_w are set to 1e-5 and that of the image-domain discriminator D to 1e-4, halved after every 3e4 iterations; in step A33 the learning rates of the Encoder and the Decoder are set to 1e-4 and 2e-4 respectively.
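The halving rule above amounts to multiplying the base rate by 0.5 for every completed block of 3e4 iterations, e.g.:

```python
def lr_at(iteration, base_lr, halve_every=30000):
    """Learning rate after the step-decay of claim 8: halve the base
    rate once per completed block of `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)

print(lr_at(0, 1e-5))      # base rate, unchanged
print(lr_at(30000, 1e-5))  # halved once
print(lr_at(90000, 1e-4))  # halved three times
```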
9. The blind face restoration method based on a domain-aligned GAN prior according to any one of claims 1 to 8, wherein the Encoder is formed by stacking a plurality of blocks that extract convolutional features from the degraded input image, progressively halving the resolution of the feature maps down to 4^2; the multi-resolution convolutional features are then fed into the Decoder for image reconstruction; the Decoder is formed by stacking a plurality of blocks, each followed by a feature fusion module that fuses the two kinds of features, until the final result is generated; preferably, the Decoder and the Encoder are bridged by skip connections that fuse shallow and deep features.
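Since each Encoder block halves the spatial resolution until 4x4 is reached, the number of downsampling stages follows directly from the input size (512x512 is assumed here for illustration; the claim does not fix it):

```python
import math

def num_down_stages(input_res, min_res=4):
    """Number of resolution halvings needed to go from input_res x
    input_res down to min_res x min_res (4^2 per claim 9)."""
    return int(math.log2(input_res // min_res))

print(num_down_stages(512))  # 7
print(num_down_stages(256))  # 6
```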
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the blind face restoration method based on a domain-aligned GAN prior according to any one of claims 1 to 9.
CN202310061184.1A 2023-01-17 2023-01-17 Blind face recovery method based on domain alignment GAN priori Pending CN116362991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310061184.1A CN116362991A (en) 2023-01-17 2023-01-17 Blind face recovery method based on domain alignment GAN priori


Publications (1)

Publication Number Publication Date
CN116362991A true CN116362991A (en) 2023-06-30

Family

ID=86939719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310061184.1A Pending CN116362991A (en) 2023-01-17 2023-01-17 Blind face recovery method based on domain alignment GAN priori

Country Status (1)

Country Link
CN (1) CN116362991A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593611A (en) * 2024-01-19 2024-02-23 荣耀终端有限公司 Model training method, image reconstruction method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination