CN116664435A - Face restoration method based on multi-scale face analysis map integration - Google Patents
Face restoration method based on multi-scale face analysis map integration
- Publication number
- CN116664435A (application number CN202310643998.6A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- network
- restoration
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/77 Retouching; Inpainting; Scratch removal (under G06T5/00 Image enhancement or restoration)
- G06N3/0455 Auto-encoder networks; Encoder-decoder networks
- G06N3/0464 Convolutional networks [CNN, ConvNet]
- G06N3/08 Learning methods
- G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20081 Training; Learning
- G06T2207/20084 Artificial neural networks [ANN]
- G06T2207/20221 Image fusion; Image merging
- G06T2207/30201 Face
- Y02T10/40 Engine management systems
Abstract
The invention discloses a face restoration method based on the integration of multi-scale face analysis maps, belonging to the technical fields of computer vision, machine learning and the like. First, the invention builds a base network on an encoder-decoder structure and adds skip connections equipped with channel attention modules, so that the decoder can fully exploit the effective information contained in the extracted feature maps for face restoration. Meanwhile, the face analysis map is treated as a style image and fused into the generated face image. Finally, the invention adds to training a loss function that helps preserve identity information, constraining the identity of the restored face at the feature level extracted by a face recognition network, and adds an adversarial loss and a multi-scale discriminator, so that the network further generates face images with more realistic details.
Description
Technical Field
The invention belongs to the technical fields of computer vision, machine learning and the like, and particularly relates to a face restoration method based on deep learning.
Background
Face restoration is an important problem in the field of computer vision. Face restoration technology recovers high-quality face images from low-quality ones, and lays the foundation for downstream high-level applications such as face recognition and expression recognition. At present, face restoration technology is applied in everyday image-processing software, in the restoration of old photographs and old films, in digital-camera zoom compensation, in intelligent surveillance systems, and more.
Most current face restoration methods can be divided into two categories according to whether GAN prior information is used. The first category uses a pre-trained StyleGAN model as a GAN prior and designs the network around it. Its advantage is superior visual quality: even extremely low-quality inputs can yield very high-definition outputs, which has made it a popular approach in recent years. However, the core idea of these GAN-prior-based methods is to encode the degraded face image into the latent space of the pre-trained GAN, which requires designing complex modules to modify the already-packaged GAN prior, or fine-tuning the GAN prior with additional training. The resulting models therefore have many parameters, placing high demands on the GPU memory and RAM of the training equipment. Moreover, the latent space of the pre-trained GAN is low-dimensional and its spatial expressiveness is limited, so the face structure of the degraded image cannot be fully captured and the face structure of the restored image is unnatural.
The second category comprises face restoration networks that use other prior information or none at all. These methods have fewer network parameters and are easy to train, but they share two common problems. (1) The output high-definition image does not preserve the identity information of the original input. The ultimate purpose of face restoration is to serve deeper visual tasks such as face detection and recognition, so the identity information of the face in the image is critical both to humans and to network models. However, most existing methods focus only on the structural design of the generator network, aiming at images with better visual quality, and neglect that the information in the low-quality input must be fully extracted and effectively utilized; otherwise, the identity of the restored image becomes inconsistent with that of the input. (2) Face structure information is not effectively utilized. The structural information used in such networks, e.g. face analysis maps or facial landmark maps, is usually estimated from the low-quality input image, whose information is coarse; if a simple face-structure estimation network is applied directly, the estimated prior map is inaccurate, which in turn directly degrades the restoration result. In addition, existing approaches typically exploit face structure information through direct feature concatenation, attention mechanisms and the like, which cannot make full use of global and local information, so the help that face structure information provides to face restoration is very limited.
Based on the above analysis, we wish to design a face restoration network that satisfies the following three points: (1) the network model has few parameters and is easy to train, i.e. it uses no GAN prior; (2) it makes effective use of the pixel, structure and other information in the low-definition input, i.e. it fully extracts the pixel information of the low-definition face and reasonably exploits the facial prior information; (3) its loss functions are designed to help preserve identity information and maintain image fidelity.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a face restoration network that can fully exploit face structure and identity information, thereby improving the fidelity of the restored face. First, the invention builds a base network on the encoder-decoder structure commonly used for image restoration, in which the encoder gradually extracts features from the input low-definition face image and the decoder gradually upsamples back to the input resolution. Second, skip connections equipped with channel attention modules are added to the base network so that the decoder can fully exploit the effective information contained in the extracted feature maps for face restoration. Meanwhile, the face analysis map is chosen as the structural prior; to integrate this prior into the network efficiently, the AdaIN method from style transfer is borrowed so that the face analysis map acts as a style image fused into the generated face image. Finally, the invention adds to training a loss function that helps preserve identity information, constraining the identity of the restored face at the feature level extracted by a face recognition network, and adds an adversarial loss and a multi-scale discriminator, so that the network further generates face images with more realistic details.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A face restoration method based on an encoder-decoder structure with multi-scale face analysis map integration, characterized by comprising the following steps: resizing the original low-definition face image to a face image with a resolution of 512×512, and inputting it into a face restoration network to obtain the restored high-definition face image with a resolution of 512×512.
The face restoration network comprises an initialization layer, a backbone network and an RGB conversion layer which are sequentially connected.
The initialization layer adjusts the face image with a resolution of 512×512 to obtain a 512×512×32 feature map F_0.
The backbone network adopts an encoder-decoder structure with jump connection added;
wherein the encoder-decoder structure comprises four downsampling blocks and four upsampling blocks connected in sequence; the feature map F_0 is gradually reduced to a resolution of 32×32 by the four downsampling blocks and then gradually enlarged by the four upsampling blocks, finally generating a feature map F_4^Up of the same size as F_0;

the skip connections operate as follows: the feature map F_i^Down output by a downsampling block is feature-fused, by vector splicing, with the upsampling-block output feature map of the same resolution, wherein the feature map F_1^Down output by the first downsampling block is feature-fused by splicing with itself; the feature map F'_i^Up obtained by feature fusion and channel-dimension reduction serves as the input to the next upsampling block;

the RGB conversion layer converts the 512×512×32 feature map F_4^Up into the restored high-definition face image with a resolution of 512×512.
Preferably, based on the backbone network, the feature map F'_i^Up is passed through a style modulation branch to obtain a feature map F''_i^Up, which serves as the input feature map of the next upsampling block, so that the structural information in the face analysis map is integrated into the face restoration network in a lightweight manner;

specifically, the style modulation branch comprises a convolution layer, an activation layer and a two-branch structure connected in sequence; the two-branch structure consists of two identical convolution layers, the two branches respectively outputting the style parameters a_i and b_i, which are merged into the face restoration network through the following formula:

F''_i^Up = a_i · (F'_i^Up − μ(F'_i^Up)) / σ(F'_i^Up) + b_i

where μ denotes the mean and σ denotes the standard deviation.
Preferably, a channel attention module is inserted in the skip connections to select the features that assist face restoration.
Preferably, a multi-scale discriminator is used to stabilize training when training the face restoration network.
The beneficial effects of the invention are as follows:
(1) A face restoration network based on an encoder-decoder structure is built, with skip connections equipped with a channel attention mechanism, so that the network can fully extract and utilize the effective information in the low-definition face.
(2) A style modulation branch that effectively exploits the face analysis map is added to the base restoration network, yielding a face restoration network with good performance.
(3) Loss functions are designed that help preserve texture and identity details.
Drawings
Fig. 1 is a schematic diagram of a backbone network and a hop connection network.
Fig. 2 is a schematic diagram of a style modulation branch added to the network architecture of fig. 1.
Fig. 3 is a schematic diagram of the face restoration method based on the encoder-decoder structure with multi-scale face analysis map integration.
Fig. 4 is a schematic diagram of the channel attention structure.
Fig. 5 is a schematic diagram of the face analysis map integration module.
Fig. 6 is a qualitative comparison with state-of-the-art face restoration methods on the Helen dataset.
Fig. 7 is a quantitative comparison with state-of-the-art face restoration methods on the Helen dataset.
Detailed Description
A detailed description of each aspect of the technical scheme of the present invention is given below.
(1) Building the encoder-decoder face restoration network with skip connections
The backbone network of the present invention is an encoder-decoder structure with added skip connections.
The encoder-decoder structure consists of 4 downsampling blocks and 4 upsampling blocks. The low-definition face image is first resized to 512×512 and input into the backbone network; its resolution is gradually reduced to 32×32 through the 4 downsampling blocks and then gradually increased through the 4 upsampling blocks, finally generating an output of the same size as the input image.
The features of different network layers differ: shallow layers attend to texture features while deep layers attend to global features, and both are important for the face restoration task. In addition, the downsampling operations inevitably lose some edge features. Therefore, to exploit more comprehensive information from the low-definition image, skip connections are added to the backbone network, and feature maps of the same resolution in the encoder and decoder are vector-spliced to achieve feature fusion.
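For illustration only, a minimal PyTorch sketch of such a skip-connected encoder-decoder backbone follows; the channel widths, the strided/transposed convolutions, and the 1×1 channel-reduction convolutions are assumptions made for the sketch, not the exact configuration of the disclosed network.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Halves spatial resolution with a strided convolution (assumed design)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True))

    def forward(self, x):
        return self.body(x)

class UpBlock(nn.Module):
    """Doubles spatial resolution with a transposed convolution (assumed design)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True))

    def forward(self, x):
        return self.body(x)

class Backbone(nn.Module):
    """512x512x32 feature map -> 32x32 bottleneck -> 512x512x32, with
    concatenation skip connections and 1x1 channel-reduction convolutions."""
    def __init__(self):
        super().__init__()
        enc = [32, 64, 128, 256, 512]                  # assumed channel widths
        dec = [512, 256, 128, 64, 32]
        self.downs = nn.ModuleList(DownBlock(enc[i], enc[i + 1]) for i in range(4))
        self.fuse = nn.ModuleList(nn.Conv2d(2 * dec[i], dec[i], 1) for i in range(4))
        self.ups = nn.ModuleList(UpBlock(dec[i], dec[i + 1]) for i in range(4))

    def forward(self, f0):
        skips, x = [], f0
        for down in self.downs:                        # 512->256->128->64->32 px
            x = down(x)
            skips.append(x)                            # encoder feature maps
        for i, (fuse, up) in enumerate(zip(self.fuse, self.ups)):
            skip = skips[3 - i]                        # same-resolution feature;
            x = fuse(torch.cat([x, skip], dim=1))      # at i=0 this is x itself,
            x = up(x)                                  # i.e. bottleneck self-fusion
        return x                                       # same size as f0

# sanity check: one 512x512x32 initialization feature map in, same size out
out = Backbone()(torch.rand(1, 32, 512, 512))
assert out.shape == (1, 32, 512, 512)
```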
(2) Channel attention module of the face restoration network
Not all features extracted by the encoder provide effective information for face restoration; for example, the low-definition input may contain invalid information such as noise and artifacts, and introducing all of it can cause blotches and other poor-restoration phenomena in the output image. To solve this problem, the invention applies a channel attention module (CAM) to the feature map extracted by each downsampling block in the encoder to learn which features contribute to face restoration, and then concatenates the result with the feature map generated by the corresponding upsampling block in the decoder. The working mechanism of channel attention is to model the importance of individual feature channels and, through learning, to enhance or suppress different channels, acting here like a "filter". This embodiment is described using the channel attention module SENet (Squeeze-and-Excitation Network) as an example.
As shown in Fig. 4, the feature map F_i^Down output by a downsampling block is fed into a two-branch structure. In the branch that computes the weights, a pooling layer first collapses the spatial dimensions of the feature map, i.e. each two-dimensional feature map is aggregated into a single scalar; this is equivalent to pooling over the global receptive field, and the number of feature channels is unchanged. Then two fully connected layers model the correlation between channels and generate a weight for each feature channel; the weights are normalized by an activation layer. Finally, the weights are multiplied with the feature map F_i^Down, thereby changing the importance of the different channels.
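A minimal sketch of an SE-style channel attention module consistent with the above description is given below; the reduction ratio r = 16 is an assumed hyper-parameter.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze (global average pooling over the
    spatial dimensions), excitation (two fully connected layers), and a
    sigmoid that normalizes the per-channel weights."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # HxW -> 1x1 per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                  # per-channel descriptor
        w = self.fc(w).view(b, c, 1, 1)              # learned channel weights
        return x * w                                 # enhance/suppress channels
```

In the backbone sketched above, this module would be applied to each F_i^Down before the skip concatenation.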
(3) Adding the style modulation branch to the face restoration network
To let the face analysis map provide information that helps restore the face while remaining sufficiently lightweight, the invention uses the classical adaptive instance normalization method (AdaIN) from style transfer. Specifically, a face analysis map is taken as input, and the style modulation branch comprises a convolution layer, an activation layer and a two-branch structure connected in sequence; the two-branch structure consists of two identical convolution layers, the two branches respectively outputting the style parameters a_i and b_i, which are merged into the face restoration network through the following formula:

F''_i^Up = a_i · (F'_i^Up − μ(F'_i^Up)) / σ(F'_i^Up) + b_i

where μ denotes the mean and σ denotes the standard deviation.
AdaIN builds on a feed-forward neural network, generates quickly, and supports transfer of arbitrary styles. StyleGAN uses AdaIN to blend the style of a real face into a generated face in order to synthesize realistic faces, showing that AdaIN can fuse a style face image into a synthesized face image well. In this implementation, unlike AdaIN, which uses the mean and standard deviation of the style input as the style parameters, the two style parameters of the invention are learned adaptively by simple convolution layers, because the task here is not simple style transfer: the face analysis map is expected to provide the face with additional prior knowledge, i.e. information beneficial to the final task. Each face analysis map input used by the face restoration network is extracted from the output of the previous upsampling block; a higher-definition input yields a more accurate face analysis map, and thus a higher-quality generated face image.
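The style modulation branch described above might be sketched as follows; the kernel sizes, channel widths, and the nearest-neighbor resizing of the face analysis map are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleModulation(nn.Module):
    """AdaIN-style modulation whose scale a_i and shift b_i are learned from
    the face analysis map by convolutions, rather than taken directly as the
    style input's statistics."""
    def __init__(self, parsing_ch, feat_ch):
        super().__init__()
        self.shared = nn.Sequential(                 # convolution + activation
            nn.Conv2d(parsing_ch, feat_ch, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True))
        self.to_a = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)  # style scale a_i
        self.to_b = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)  # style shift b_i

    def forward(self, feat, parsing):
        # resize the analysis map to the feature resolution (assumed handling)
        parsing = F.interpolate(parsing, size=feat.shape[-2:], mode="nearest")
        h = self.shared(parsing)
        a, b = self.to_a(h), self.to_b(h)
        mu = feat.mean(dim=(2, 3), keepdim=True)           # channel-wise mean
        sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-8  # channel-wise std
        return a * (feat - mu) / sigma + b                 # modulated F''_i^Up
```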
(4) The model is trained and its effectiveness is verified experimentally.
The loss terms of the face restoration network mainly comprise the following five parts:
(a) Texture detail loss
The invention adds, as the texture loss, the difference between the Gram matrices of corresponding semantic regions in the generated image and the high-definition image. Specifically, features are extracted with VGG19 and the loss is computed on its relu3_1, relu4_1 and relu5_1 features. Denoting the m-th layer feature of VGG19 as φ_m and the parsing mask of the n-th of the 19 regions as M_n, L_ss is:

L_ss = Σ_m Σ_{n=1}^{19} || G(φ_m(Î_H), M_n) − G(φ_m(I_H), M_n) ||

where Î_H is the generated face image, I_H is the real high-definition face image, and G(·) computes the Gram matrix of the semantic region M_n within the feature φ_m:

G(φ, M) = (φ ⊙ M)(φ ⊙ M)^T / (Σ M + ε)

where ε is a constant added to avoid division by zero.
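A sketch of such a region-wise Gram texture loss follows, assuming torchvision's VGG19 (where relu3_1, relu4_1 and relu5_1 correspond to feature indices 11, 20 and 29), inputs already normalized for VGG, and the mask-sum normalization implied by the ε term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class TextureLoss(nn.Module):
    """Region-wise Gram-matrix loss on VGG19 relu3_1/relu4_1/relu5_1 features."""
    def __init__(self):
        super().__init__()
        feats = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in feats.parameters():
            p.requires_grad_(False)
        # slices ending at relu3_1 (idx 11), relu4_1 (idx 20), relu5_1 (idx 29)
        self.slices = nn.ModuleList([feats[:12], feats[12:21], feats[21:30]])

    @staticmethod
    def gram(phi, mask, eps=1e-8):
        """Gram matrix of one masked semantic region; the mask-sum
        normalization is an assumption consistent with the eps term."""
        f = (phi * mask).flatten(2)                  # (B, C, H*W)
        return f @ f.transpose(1, 2) / (mask.sum() + eps)

    def forward(self, fake, real, masks):            # masks: 19 (B,1,H,W) maps
        loss, x, y = 0.0, fake, real
        for sl in self.slices:
            x, y = sl(x), sl(y)                      # features phi_m
            for m in masks:
                mm = F.interpolate(m, size=x.shape[-2:])   # resize mask M_n
                loss = loss + (self.gram(x, mm) - self.gram(y, mm)).abs().mean()
        return loss
```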
(b) Reconstruction loss
Reconstruction loss, also called generation loss, measures the difference between the output image of the generator network and the real image. The reconstruction loss used in the invention combines the mean square error (MSE) in pixel space and in feature space, and aims to constrain the network output Î_H to be as close as possible to the real image I_H:

L_rec = Σ_{i=1}^{4} ( || Î_i − I_i ||_2^2 + Σ_k || D_k(Î_i) − D_k(I_i) ||_2^2 )

where Î_i is the intermediate face image generated by the i-th upsampling block, I_i is the real high-definition face image downsampled to the corresponding resolution, and the second term of L_rec is the multi-scale feature-matching loss, which matches the discriminator features D_k(·) of Î_i and I_i.
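A sketch of this multi-scale reconstruction loss; the equal weighting of the pixel term and the feature-matching term is an assumption.

```python
import torch

def reconstruction_loss(fake_pyramid, real_pyramid,
                        fake_feats=None, real_feats=None):
    """Pixel-space MSE between each upsampling block's intermediate image and
    the ground truth at that resolution, plus an optional multi-scale
    feature-matching MSE on discriminator features."""
    loss = sum(torch.mean((f - r) ** 2)
               for f, r in zip(fake_pyramid, real_pyramid))
    if fake_feats is not None:
        loss = loss + sum(torch.mean((ff - rf) ** 2)
                          for ff, rf in zip(fake_feats, real_feats))
    return loss
```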
(c) Identity loss
To prevent the situation in which the qualitative and quantitative indices of the output are good but its identity hardly matches that of the original input, the invention introduces an identity loss that constrains the distance between the high-dimensional features of the generated image and the real image, thereby improving identity similarity. Concretely, the high-dimensional features of the generated image are extracted with the pre-trained face recognition model ArcFace and the gap is measured with the Euclidean distance:

L_id = || φ(Î_H) − φ(I_H) ||_2

where φ(·) denotes the pre-trained face recognition model ArcFace.
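A sketch of the identity loss, assuming `arcface` is a frozen pre-trained ArcFace model that maps a batch of face images to one embedding per image.

```python
import torch

def identity_loss(arcface, fake, real):
    """Euclidean distance between ArcFace embeddings of the restored face
    and the ground-truth face."""
    with torch.no_grad():
        target = arcface(real)                # no gradient through the GT branch
    return torch.norm(arcface(fake) - target, p=2, dim=1).mean()
```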
(d) Adversarial loss
To give the face images generated by the EDSM network high fidelity, the invention designs a discriminator and corresponding loss functions, extending the network into EDSM-GAN. The specific loss is the non-saturating loss, whose goal is to maximize the probability that the generated image is judged real, thereby providing the generator with larger gradients early in GAN training:

L_GAN_D = Σ_i E[ −log D_i(I_i) ] + E[ −log(1 − D_i(Î_i)) ]
L_GAN_G = Σ_i E[ −log D_i(Î_i) ]

where L_GAN_D is the optimization target of the discriminator, L_GAN_G is the optimization target of the generator, i indexes the i-th sub-discriminator, D_i(·) denotes the discriminator, Î_i is the intermediate face image generated by the i-th upsampling block, and I_i is the real high-definition face image at the corresponding resolution.
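The non-saturating losses can be written compactly with the identity −log sigmoid(x) = softplus(−x); a sketch over all sub-discriminators:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(discs, fake_pyramid, real_pyramid):
    """Non-saturating discriminator loss summed over all sub-discriminators:
    softplus(-x) = -log(sigmoid(x)); softplus(x) = -log(1 - sigmoid(x))."""
    loss = 0.0
    for d, f, r in zip(discs, fake_pyramid, real_pyramid):
        loss = loss + F.softplus(-d(r)).mean() + F.softplus(d(f.detach())).mean()
    return loss

def generator_gan_loss(discs, fake_pyramid):
    """Non-saturating generator loss: maximize log D_i(fake) at every scale."""
    return sum(F.softplus(-d(f)).mean() for d, f in zip(discs, fake_pyramid))
```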
(e) Training targets
The EDSM-GAN is trained by alternately minimizing L_G and L_GAN_D, where:

L_G = λ_ss·L_ss + λ_rec·L_rec + λ_id·L_id + λ_adv·L_GAN_G

and λ_ss, λ_rec, λ_id and λ_adv are the weights of the texture detail loss, reconstruction loss, identity loss and adversarial loss, respectively.
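A sketch of the combined generator objective; the default λ values below are assumptions, as the actual weights are not disclosed.

```python
def generator_objective(l_ss, l_rec, l_id, l_gan_g,
                        lam_ss=1.0, lam_rec=1.0, lam_id=1.0, lam_adv=0.1):
    """L_G = lam_ss*L_ss + lam_rec*L_rec + lam_id*L_id + lam_adv*L_GAN_G."""
    return lam_ss * l_ss + lam_rec * l_rec + lam_id * l_id + lam_adv * l_gan_g
```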
In addition to the above loss terms, to generate more realistic details and improve the restoration result on subjective quality indices, this embodiment further addresses the difficulty GAN networks have in generating high-resolution images. Early in training, the generator is not yet capable of imitating the many fine details of high-resolution images, so the discriminator can distinguish real from fake images far too easily and fails to provide useful feedback to the generator; this amplifies the gradient problems of GAN training, destabilizing it and even collapsing the model. To circumvent this problem, the invention uses a multi-scale discriminator to stabilize training.
The multi-scale discriminator consists of several sub-discriminators; the input of each sub-discriminator is the intermediate face restoration image of the corresponding upsampling block together with the ground truth obtained by downsampling the high-definition image to the corresponding resolution. Each sub-discriminator can obtain only limited information from the image at its resolution, so different sub-discriminators solve discrimination tasks of different levels. For example, it is harder to tell a low-resolution generated image from the real one, so the low-resolution sub-discriminators play an important role in stabilizing the early stage of GAN training. As training proceeds, the focus gradually shifts to the higher-resolution sub-discriminators, so that all discriminators attain excellent discrimination ability and provide effective feedback to the face restoration network.
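A sketch of a multi-scale discriminator built from small PatchGAN-style sub-discriminators (the sub-discriminator architecture is assumed), together with a helper that builds the ground-truth pyramid by downsampling the high-definition image:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubDiscriminator(nn.Module):
    """Small PatchGAN-style sub-discriminator (architecture assumed)."""
    def __init__(self, c_in=3, base=64):
        super().__init__()
        layers, c = [], c_in
        for c_out in (base, base * 2, base * 4):
            layers += [nn.Conv2d(c, c_out, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = c_out
        layers += [nn.Conv2d(c, 1, 4, padding=1)]    # patch-level real/fake logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# one sub-discriminator per upsampling block
discriminators = nn.ModuleList(SubDiscriminator() for _ in range(4))

def real_pyramid(hq, sizes=(64, 128, 256, 512)):
    """Downsample the HQ ground truth to each sub-discriminator's resolution."""
    return [F.interpolate(hq, size=(s, s), mode="bilinear",
                          align_corners=False, antialias=True) for s in sizes]
```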
To show the visual quality of the face restoration model more comprehensively, three groups of low-definition inputs with degradation ranging from slight to severe were selected; Fig. 6 compares the qualitative results on the Helen dataset. When the degradation of the input image is slight, all face restoration algorithms recover results of good visual quality; when the degradation is severe, the performance of all algorithms declines to varying degrees. The method of Wan et al. shows no restorative effect on the low-definition input; GFPGAN and GCFSR perform poorly, retaining the color blocks of the low-definition image in the restored output; GPEN and the proposed EDSM-GAN restore better face structure and texture details.
Fig. 7 shows the comparison of image quality indices of different face restoration algorithms on the Helen dataset. The proposed network achieves the best results on all four image quality indices, demonstrating that the proposed method has good restoration performance.
Claims (4)
1. A face restoration method based on an encoder-decoder structure with multi-scale face analysis map integration, characterized by comprising: resizing the original low-definition face image to a face image with a resolution of 512×512 and inputting it into a face restoration network to obtain the restored high-definition face image with a resolution of 512×512;
the face restoration network comprises an initialization layer, a backbone network and an RGB conversion layer connected in sequence;
the initialization layer adjusts the face image with a resolution of 512×512 to obtain a 512×512×32 feature map F_0;
the backbone network adopts an encoder-decoder structure with added skip connections;
wherein the encoder-decoder structure comprises four downsampling blocks and four upsampling blocks connected in sequence; the feature map F_0 is gradually reduced to a resolution of 32×32 by the four downsampling blocks and then gradually enlarged by the four upsampling blocks, finally generating a feature map F_4^Up of the same size as F_0;
the skip connections operate as follows: the feature map F_i^Down output by a downsampling block is feature-fused, by vector splicing, with the upsampling-block output feature map of the same resolution, wherein the feature map F_1^Down output by the first downsampling block is feature-fused by splicing with itself; the feature map F'_i^Up obtained by feature fusion and channel-dimension reduction serves as the input to the next upsampling block;
the RGB conversion layer converts the 512×512×32 feature map F_4^Up into the restored high-definition face image with a resolution of 512×512.
2. The face restoration method based on an encoder-decoder structure with multi-scale face analysis map integration according to claim 1, characterized in that: based on the backbone network, the feature map F'_i^Up is passed through a style modulation branch to obtain a feature map F''_i^Up, which serves as the input feature map of the next upsampling block, so that the structural information in the face analysis map is integrated into the face restoration network in a lightweight manner;
specifically, the style modulation branch comprises a convolution layer, an activation layer and a two-branch structure connected in sequence; the two-branch structure consists of two identical convolution layers, the two branches respectively outputting the style parameters a_i and b_i, which are merged into the face restoration network through the following formula:
F''_i^Up = a_i · (F'_i^Up − μ(F'_i^Up)) / σ(F'_i^Up) + b_i
where μ denotes the mean and σ denotes the standard deviation.
3. The face restoration method based on an encoder-decoder structure with multi-scale face analysis map integration according to claim 1 or 2, characterized in that: during the skip connection, a channel attention module is inserted for picking out features that assist face restoration.
4. The face restoration method based on an encoder-decoder structure with multi-scale face analysis map integration according to claim 3, characterized in that: when training the face restoration network, a multi-scale discriminator is used to stabilize the training.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310643998.6A | 2023-06-01 | 2023-06-01 | Face restoration method based on multi-scale face analysis map integration |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310643998.6A | 2023-06-01 | 2023-06-01 | Face restoration method based on multi-scale face analysis map integration |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN116664435A | 2023-08-29 |
Family
ID=87714856
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310643998.6A | Face restoration method based on multi-scale face analysis map integration | 2023-06-01 | 2023-06-01 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN116664435A (en) |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117391995A | 2023-10-18 | 2024-01-12 | 山东财经大学 | Progressive face image restoration method, system, equipment and storage medium |
| CN118072332A | 2024-04-19 | 2024-05-24 | 西北工业大学 | Self-evolution zero sample target identification method based on sketch and text double prompt |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |