CN116109510A - Face image restoration method based on structure and texture dual generation - Google Patents

Face image restoration method based on structure and texture dual generation

Info

Publication number
CN116109510A
CN116109510A
Authority
CN
China
Prior art keywords
texture
face image
image
dual
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310141472.8A
Other languages
Chinese (zh)
Inventor
李剑波
尹泽召
黄进
冯义从
汪依帆
曾涛
荣鹏
王新元
刘俊宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202310141472.8A priority Critical patent/CN116109510A/en
Publication of CN116109510A publication Critical patent/CN116109510A/en
Pending legal-status Critical Current

Classifications

    • G06T5/77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a face image restoration method based on structure and texture dual generation, which relates to the technical field of image restoration, realizes restoration of damaged face images by a deep learning method, solves the problem of inconsistent structure and texture after the face images are restored, improves the restoration effect of large-area damaged images, and comprises the following steps: step S1: preprocessing an input image to obtain a face image to be repaired; step S2: establishing a face image restoration model generated based on structure and texture dual, and inputting the image obtained in the step S1 into the image restoration model for training; step S3: the face image restoration model is obtained by continuous iterative training until the network finally converges; step S4: and inputting the damaged face image into a trained face image restoration model to obtain a restored face image.

Description

Face image restoration method based on structure and texture dual generation
Technical Field
The invention relates to the technical field of image restoration, and in particular to a face image restoration method based on structure and texture dual generation.
Background
Image restoration aims to restore the pixels of damaged areas in an image while keeping the filled image as consistent as possible with the original at the visual and semantic levels. It is not only critical in computer vision tasks, but is also an important cornerstone for research on other image processing tasks. As one of its important branches, face restoration plays an important role in practical applications. Compared with general image restoration, the human face has stronger semantics and more complex texture details, so the restoration process must not only consider the plausibility of the facial structure but also preserve the identity information of the person.
Image restoration has made great progress from early traditional methods to the current deep-learning-based methods. Traditional methods are only suitable for repairing a single image with simple, small missing regions, and their results lack semantic consistency. Therefore, deep-learning-based methods have become mainstream.
Pathak et al. first proposed Context Encoders, which use an encoder-decoder network to extract features and output reconstruction results; this was also the first GAN-based restoration method. Iizuka et al. introduced a local-global dual discriminator on the basis of Context Encoders and used dilated convolution, proposing the GLCIC network. Yu et al. proposed the DeepFill network, which borrows or copies feature information from known background patches through a contextual attention mechanism to generate the missing foreground patches. Nazeri et al. designed EdgeConnect as a two-stage model: an edge generator first produces an edge map of the irregular missing region as a prior result, and an image inpainting network then fills the missing region based on that edge map.
However, these methods do not exploit structural and texture features jointly, resulting in inconsistent structure and texture in the output image. Defect repair involves both high-level semantic knowledge and low-level pixel information, and only a deep fusion of these two kinds of information can approach the image-repair ability of the human visual system. To this end, Guo et al. proposed a novel dual-stream network for image restoration, which models structure-constrained texture synthesis and texture-guided structure reconstruction in a coupled manner to obtain more reasonable outputs. Although this method improves the consistency between structure and texture, two problems remain: 1) the relationship between structure and texture is not fully exploited, so the degree of consistency between them is limited; 2) the lack of contextual reasoning over the global and local pixel continuity of the image leaves the repaired image with structural distortion and texture blurring, especially when a large area is damaged. In view of these two defects, this scheme proposes a face image restoration method based on structure and texture dual generation. The method enhances the texture-structure consistency of face image restoration while producing good restorations even for large damaged areas.
Disclosure of Invention
The invention aims to solve the above technical problems, and to this end provides a face image restoration method based on structure and texture dual generation.
The invention adopts the following technical scheme for realizing the purposes:
a face image restoration method based on structure and texture dual generation comprises the following steps:
step S1: preprocessing an input image to obtain a face image to be repaired;
step S2: establishing a face image restoration model generated based on structure and texture dual, and inputting the image obtained in the step S1 into the image restoration model for training;
step S3: the face image restoration model is obtained by continuous iterative training until the network finally converges;
step S4: and inputting the damaged face image into a trained face image restoration model to obtain a restored face image.
As an optional technical solution, in step S2, the face image restoration model is based on a generative adversarial network structure and is composed of a generator and a discriminator;
the generator comprises a dual encoder-decoder and a feature fusion part, and the discriminator consists of a texture discriminator and a structure discriminator.
As an optional technical solution, the convolution layers of the dual encoder-decoder employ gated convolution to encode and decode features, and a batch normalization layer is added after each gated convolution layer, expressed as:
Gating = ∑∑ W_g · I
Feature = ∑∑ W_f · I
Output = BN(φ(Feature) ⊙ σ(Gating))
wherein I represents the input feature map; Gating represents the gating map; Feature represents the feature map after convolution; Output represents the final output feature map; W_g and W_f respectively represent different convolution kernels; φ is the LeakyReLU activation function, σ is the Sigmoid activation function, and ⊙ represents element-wise multiplication; BN represents batch normalization. Compared with hard gating, the gating values of gated convolution lie between 0 and 1, and the closer a gating value is to 1, the more valid pixels it represents.
As an optional technical solution, in the dual encoder-decoder,
during the encoding stage, the left and right encoders respectively receive the damaged image and the damaged structure image to encode texture and structural features;
during the decoding stage, the texture decoder synthesizes structure-constrained texture by borrowing structural features from the structure encoder, while the structure decoder recovers the texture-guided structure by retrieving texture features from the texture encoder.
As an optional technical solution, the discriminator is a dual-flow discriminator with a texture branch and a structural branch, and the structural branch of the discriminator is further provided with an additional edge detector for edge extraction, wherein two discriminator trunks are composed of common convolutions, and the edge detector is composed of convolutional neural network residual blocks.
As an optional technical solution, the preprocessing in step S1 is:
firstly, the size of the image is adjusted to 256×256 by clipping and filling;
then, a binarization mask M is obtained from an irregular mask data set provided by NVIDIA to artificially damage the image, so that a damaged image is obtained; graying the damaged image to obtain a damaged gray image;
and finally, extracting the face contour information from the damaged gray level image through a Canny edge detection algorithm to obtain a damaged edge image.
As an optional technical solution, step S3 adopts the CelebA-HQ dataset for training, including training images and test images; the experimental equipment adopts an NVIDIA V100, and the whole model is implemented using PyTorch; when training the model, the batch size is set to 8, and optimization is performed using an Adam optimizer.
As an optional technical solution, initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵; the fine-tuned model is trained using a joint loss, including reconstruction loss, perceptual loss, style loss, and adversarial loss.
As an alternative solution, four loss functions are as follows:
reconstruction loss function: L_rec = E[ ||I_out − I_gt||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and ||·||_1 represents the L_1 norm;
perceptual loss function: L_perc = E[ Σ_i ||φ_i(I_out) − φ_i(I_gt)||_1 ]
the perceptual loss, computed with a VGG-16 network pre-trained on ImageNet, is used to simulate human visual perception of image quality; wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, ||·||_1 represents the L_1 norm, φ_i represents the activation map of the i-th pooling layer of VGG-16, and in practice i ∈ [1,3];
style loss function: L_style = E[ Σ_i ||G_i^φ(I_out) − G_i^φ(I_gt)||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and G_i^φ represents the Gram matrix of the activation map φ_i;
adversarial loss function: L_adv = min_G max_D E[ log D(I_gt, E_gt) ] + E[ log(1 − D(I_out, E_out)) ]
wherein E represents expectation, G represents the generator, D represents the discriminator, I_gt represents the real picture, E_gt represents the real edge map, I_out represents the generated picture, and E_out represents the generated edge map.
As an optional technical solution, in order to guide the dual encoder-decoder to generate structural and texture features, an intermediate loss is also introduced on F_s and F_t:
L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))
wherein I_gt represents the real picture, E_gt represents the real edge map, and P_s and P_t are mapping functions composed of convolution and residual blocks, which map the structural feature F_s and the texture feature F_t to a corresponding edge map and RGB picture, respectively.
The beneficial effects of the invention are as follows:
1. Current image restoration models cannot fully exploit structural and texture information at the same time, which causes inconsistent structure and texture in the restored image. By generating structure and texture in a dual manner and fusing them adaptively, the invention enhances the structure-texture consistency of the restored face image.
2. The existing research still has the problem of structural distortion or texture blurring when repairing large-area irregular missing areas, mainly because the context of the image is not fully utilized, resulting in insufficient connection from local features to overall consistency. The invention can fully utilize the context information of the image and has better repairing effect when the image is damaged in a large area.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a network structure diagram of the method of the present invention.
Fig. 3 is an adaptive bi-directional feature fusion module (Adaptive Dual Feature Fusion, ADFF) in the method generator of the present invention.
FIG. 4 is a block diagram of the gated aggregated contextual transformation (Gated Aggregated Contextual Transformations, GACT) module in the generator of the method of the present invention.
FIG. 5 is a schematic diagram showing the qualitative comparison effect of the method of the present invention with other methods.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
A face image restoration method based on structure and texture dual generation, referring to fig. 1, comprises the following steps:
and step S1, preprocessing an input image to obtain a face image to be repaired. The size of the image is first adjusted, and the image is adjusted to 256×256 size by clipping and filling. And then, acquiring a binarization mask M from the irregular mask data set provided by NVIDIA to artificially damage the image, so as to obtain a damaged image. And carrying out graying treatment on the damaged image to obtain a damaged gray image, and finally extracting face contour information from the damaged gray image through a Canny edge detection algorithm to obtain a damaged edge image.
And S2, building a face image restoration model generated based on the structure and texture dual, and inputting the image obtained in the S1 into the image restoration model for training.
The face image restoration model based on structure and texture dual generation is shown in fig. 2. Based on a generative adversarial network structure, the model is composed of a generator and a discriminator.
Specifically, the dual encoder-decoder uses a connection similar to U-net, the two encoders on the left and right of the encoding stage respectively receive the corrupted image and the corrupted structure image to encode texture and structural features, during the decoding stage the texture decoder synthesizes texture constrained by borrowing structural features from the structural encoder, and the structural decoder restores texture-guided structure by retrieving texture features from the texture encoder. By using the dual structure, the structure and the texture can be well complemented, thereby improving the consistency of the texture and the structure.
The convolution layers of the dual encoder-decoder adopt gated convolution to encode and decode features. Compared with partial convolution, gated convolution learns the features end-to-end and dynamically updates the mask, so it can effectively adapt to uneven pixel distributions and makes the repair result clearer and more consistent with the contextual semantics. At the same time, a batch normalization layer is added after each gated convolution layer to prevent gradient vanishing during training, which can be expressed as:
Gating = ∑∑ W_g · I
Feature = ∑∑ W_f · I
Output = BN(φ(Feature) ⊙ σ(Gating))
wherein I represents the input feature map; Gating represents the gating map; Feature represents the feature map after convolution; Output represents the final output feature map; W_g and W_f respectively represent different convolution kernels; BN represents batch normalization; φ is the LeakyReLU activation function and σ is the Sigmoid activation function. Compared with hard gating, the gating value of gated convolution is between 0 and 1, with a gating value closer to 1 indicating more valid pixels.
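A minimal PyTorch sketch of such a gated convolution layer is given below. The class name, kernel size, stride and LeakyReLU negative slope are assumptions for illustration, not details taken from the invention.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Gated convolution with batch normalization: Output = BN(LeakyReLU(Feature) * Sigmoid(Gating))."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2
        # two separate kernels: W_f for the feature branch, W_g for the gating branch
        self.feature_conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)
        self.gating_conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, dilation)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feature = self.feature_conv(x)                # Feature = W_f * I
        gating = torch.sigmoid(self.gating_conv(x))   # soft gate, values in (0, 1)
        return self.bn(self.act(feature) * gating)    # element-wise gating, then BN
```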
In addition, 6 gated aggregated contextual transformation (Gated Aggregated Contextual Transformations, GACT) modules are introduced into the dual encoder-decoder. They are embedded between the encoder and decoder with gated residual connections, enabling the network to capture long-range contextual information of the image and to enrich the patterns of interest. As shown in FIG. 3, the GACT module is designed with a splitting, transformation and aggregation strategy. (i) Splitting: the input 256-channel feature map x_1 is split into 4 sub-feature maps of 64 channels each using 4 gated convolutions with 3×3 kernels. (ii) Transformation: the convolution kernels of the gated convolutions have different dilation rates; a larger dilation rate enables the convolution kernel to focus on a larger area of the input image, while a kernel with a smaller dilation rate focuses on local patterns within a smaller receptive field. (iii) Aggregation: the 4 contextually transformed features from different receptive fields are finally aggregated by channel-dimension concatenation and a standard gated convolution to obtain the fused feature x_2. In addition, drawing on the residual connection structure, a 3×3 standard convolution followed by a Sigmoid operation is first applied to x_1 to form a gate g, and the transformed fused feature and the original feature are gate-weighted to obtain the final output feature, where the weighting formula is: x_1 × g + x_2 × (1 − g).
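A sketch of this split-transform-aggregate idea is shown below, reusing the GatedConv2d layer sketched above. The specific dilation rates (1, 2, 4, 8) are an assumption for illustration, since the text only states that the branches use different dilation rates.

```python
import torch
import torch.nn as nn

class GACT(nn.Module):
    """Gated Aggregated Contextual Transformations: split -> transform -> aggregate,
    followed by a gated residual combination x1*g + x2*(1-g)."""
    def __init__(self, channels=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        split_ch = channels // len(dilations)     # 256 channels -> 4 sub-maps of 64 channels
        self.branches = nn.ModuleList(
            [GatedConv2d(channels, split_ch, kernel_size=3, dilation=d) for d in dilations]
        )
        self.aggregate = GatedConv2d(channels, channels, kernel_size=3)   # fuse the concatenated contexts
        self.gate = nn.Sequential(                 # 3x3 standard conv + Sigmoid -> gate g
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, x1):
        contexts = [branch(x1) for branch in self.branches]   # different receptive fields
        x2 = self.aggregate(torch.cat(contexts, dim=1))       # channel-wise concat + gated conv
        g = self.gate(x1)
        return x1 * g + x2 * (1 - g)                          # gated residual weighting
```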
After the complete structural and texture features are generated by the dual encoder-decoder portion of the generator, the two features are further fused using an adaptive bi-directional feature fusion module (Adaptive Dual Feature Fusion, ADFF). By controlling the fusion ratio of texture and structure, the module adaptively fuses the two semantic features, enhancing structural continuity and texture detail and making the repair result more reasonable. The ADFF module is shown in fig. 4.
Specifically, the texture feature map output by the decoder is denoted as F_t and the structural feature map as F_s. To build texture-aware structural features, a soft gate G_t is formulated as:
G_t = σ(SE(g([F_s, F_t])))
wherein [·] represents concatenation along the channel dimension, g(·) represents a convolution with a kernel size of 3, SE(·) represents the channel attention mechanism used to obtain important channel-dimension information, and σ(·) is the Sigmoid activation function. Using G_t, texture features can be dynamically fused into the structural features, and the fusion formula is as follows:
F_s' = F_s ⊕ α(G_t ⊙ F_t)
where α is a learnable parameter, and ⊙ and ⊕ represent pixel-wise multiplication and pixel-wise addition, respectively. The structure-aware texture features are calculated in the same way, and the fusion formulas are as follows:
G_s = σ(SE(h([F_s, F_t])))
F_t' = F_t ⊕ β(G_s ⊙ F_s)
where β is likewise a learnable parameter.
and finally, fusing texture and structural characteristics through the following formula to obtain final fusion characteristics.
F_b = SK(k([F_s', F_t']))
where SK is a convolution kernel attention mechanism that can adaptively select appropriate convolution kernels, which helps keep the structure and texture of the repaired image consistent.
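The adaptive fusion above can be sketched as follows. The SE block shown is a generic squeeze-and-excitation stand-in for SE(·), a plain convolution stands in for the SK kernel-attention operator, and the zero initialization of the learnable weights α and β is an assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Generic squeeze-and-excitation channel attention, standing in for SE(.)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class ADFF(nn.Module):
    """Adaptive dual feature fusion: soft-gated exchange between structure and texture features."""
    def __init__(self, channels):
        super().__init__()
        self.g = nn.Conv2d(2 * channels, channels, 3, padding=1)   # g(.)
        self.h = nn.Conv2d(2 * channels, channels, 3, padding=1)   # h(.)
        self.se_t = SEBlock(channels)
        self.se_s = SEBlock(channels)
        self.alpha = nn.Parameter(torch.zeros(1))                  # learnable fusion weights
        self.beta = nn.Parameter(torch.zeros(1))
        self.k = nn.Conv2d(2 * channels, channels, 3, padding=1)   # k(.) before the final fusion

    def forward(self, f_s, f_t):
        cat = torch.cat([f_s, f_t], dim=1)
        g_t = torch.sigmoid(self.se_t(self.g(cat)))      # G_t = sigma(SE(g([F_s, F_t])))
        g_s = torch.sigmoid(self.se_s(self.h(cat)))      # G_s = sigma(SE(h([F_s, F_t])))
        f_s_new = f_s + self.alpha * (g_t * f_t)         # texture-aware structural features F_s'
        f_t_new = f_t + self.beta * (g_s * f_s)          # structure-aware texture features F_t'
        return self.k(torch.cat([f_s_new, f_t_new], dim=1))  # plain conv standing in for SK(k([.]))
```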
The resulting fused features are finally fed into a contextual feature aggregation (Contextual Feature Aggregation, CFA) module that generates more vivid details by modeling long-term spatial dependencies.
The discriminator is a dual-stream discriminator having a texture branch and a structure branch, and the structure branch also has an additional edge detector for edge extraction. The two discriminator trunks consist of ordinary convolutions, and normalization is also applied to improve the stability of the generative adversarial network. The edge detector is composed of convolutional neural network residual blocks.
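A rough sketch of such a dual-stream discriminator follows. The number of layers, channel widths, the exact inputs fed to each branch, and the omission of any extra normalization are all assumptions made for illustration rather than details taken from the invention.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain convolutional residual block, as used by the edge detector."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

def conv_trunk(in_ch):
    """Ordinary-convolution trunk producing patch-level real/fake scores."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(256, 1, 4, padding=1),
    )

class DualStreamDiscriminator(nn.Module):
    """Texture branch judges the RGB image; structure branch judges edge information
    extracted by a residual-block edge detector."""
    def __init__(self):
        super().__init__()
        self.texture_branch = conv_trunk(3)
        self.edge_detector = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), ResBlock(64), ResBlock(64),
            nn.Conv2d(64, 1, 3, padding=1),
        )
        self.structure_branch = conv_trunk(2)   # detected edges + input edge map

    def forward(self, image, gray, edge):
        texture_score = self.texture_branch(image)
        detected = self.edge_detector(gray)
        structure_score = self.structure_branch(torch.cat([detected, edge], dim=1))
        return texture_score, structure_score
```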
And step S3, training is continuously iterated until the network finally converges, and a face image restoration model is obtained.
The invention adopts the CelebA-HQ dataset for training, comprising 28000 training images and 2000 test images. The experimental equipment is an NVIDIA V100, and the whole model is implemented in PyTorch. When training the model, the batch size is set to 8, and optimization is performed using an Adam optimizer. Initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵.
The model is trained using a joint loss, including reconstruction loss, perceptual loss, style loss, and adversarial loss, to obtain visually realistic and semantically reasonable repair results.
Reconstruction loss: L_rec = E[ ||I_out − I_gt||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and ||·||_1 represents the L_1 norm.
Perceptual loss: L_perc = E[ Σ_i ||φ_i(I_out) − φ_i(I_gt)||_1 ]
The perceptual loss, computed with a VGG-16 network pre-trained on ImageNet, is used to simulate human visual perception of image quality. Here E represents expectation, I_out represents the generated picture, I_gt represents the real picture, ||·||_1 represents the L_1 norm, φ_i represents the activation map of the i-th pooling layer of VGG-16, and in practice i ∈ [1,3].
Style loss: L_style = E[ Σ_i ||G_i^φ(I_out) − G_i^φ(I_gt)||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and G_i^φ represents the Gram matrix of the activation map φ_i.
Adversarial loss: L_adv = min_G max_D E[ log D(I_gt, E_gt) ] + E[ log(1 − D(I_out, E_out)) ]
wherein E represents expectation, G represents the generator, D represents the discriminator, I_gt represents the real picture, E_gt represents the real edge map, I_out represents the generated picture, and E_out represents the generated edge map.
To guide the dual encoder-decoder to generate structural and texture features, an intermediate loss is also introduced on F_s and F_t: L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))
wherein I_gt represents the real picture, E_gt represents the real edge map, and P_s and P_t are mapping functions composed of convolution and residual blocks, which map the structural feature F_s and the texture feature F_t to a corresponding edge map and RGB picture, respectively.
The total loss is: L_joint = λ_rec·L_rec + λ_perc·L_perc + λ_style·L_style + λ_adv·L_adv + λ_inter·L_inter
wherein λ_rec, λ_perc, λ_style, λ_adv and λ_inter respectively represent the weighting coefficients of the corresponding losses. They are set as:
λ_rec = 10, λ_perc = 0.1, λ_style = 250, λ_adv = 0.1 and λ_inter = 1.
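For illustration, a sketch of how the joint generator loss could be assembled in PyTorch is given below. The VGG-16 slicing indices correspond to the first three pooling stages of torchvision's VGG-16 (torchvision ≥ 0.13 weights API assumed), the Gram-matrix normalization is an assumed convention, and the adversarial and intermediate terms are assumed to be computed elsewhere and passed in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class VGG16Pools(nn.Module):
    """Activation maps phi_1..phi_3 of the first three pooling layers of a pretrained VGG-16."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        self.slices = nn.ModuleList([vgg[:5], vgg[5:10], vgg[10:17]])   # pool1, pool2, pool3
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, x):
        feats = []
        for s in self.slices:
            x = s(x)
            feats.append(x)
        return feats

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)   # assumed normalization

def joint_generator_loss(i_out, i_gt, vgg, l_adv, l_inter, lambdas=(10, 0.1, 250, 0.1, 1)):
    l_rec = F.l1_loss(i_out, i_gt)                                        # reconstruction loss
    f_out, f_gt = vgg(i_out), vgg(i_gt)
    l_perc = sum(F.l1_loss(a, b) for a, b in zip(f_out, f_gt))            # perceptual loss
    l_style = sum(F.l1_loss(gram(a), gram(b)) for a, b in zip(f_out, f_gt))  # style loss
    w_rec, w_perc, w_style, w_adv, w_inter = lambdas
    return (w_rec * l_rec + w_perc * l_perc + w_style * l_style
            + w_adv * l_adv + w_inter * l_inter)
```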
Step S4: and inputting the damaged face image into a trained face image restoration model to obtain a restored face image. To verify the effectiveness of the algorithm, the experiment used the test set of the CelebA-HQ dataset, and the algorithm was compared qualitatively and quantitatively with the EdgeConnect, RFR-Inpainting and CTSDG algorithms under different mask area ratios.
Qualitative analysis: as shown in fig. 5, fig. 5a shows the damaged face images to be repaired. In fig. 5b, when the damaged area is large, the face structure repaired by EdgeConnect is distorted and severely deformed; it produces good results only when the damaged area is small. In fig. 5c, RFR-Inpainting produces overly smooth content and, in the case of large-area damage, suffers from color inconsistency, artifacts and texture blurring. In fig. 5d, CTSDG also suffers from texture blurring and distortion. Fig. 5e shows the repair results of the present invention: the repaired face structure and texture are more consistent, the semantics are more reasonable, and good repair results are produced even when a large area is damaged. Fig. 5f shows the real images corresponding to the damaged images.
Quantitative analysis: experiments were performed on the CelebA-HQ dataset with masks of different proportions, from 10% to 50%, representing the size of the damaged area, and the generated results were compared quantitatively. Three evaluation indexes are used, namely PSNR, SSIM and MAE. As shown in the table below, compared with the other methods, the proposed method achieves the best result on all three indexes (↑ indicates that a larger value is better, ↓ indicates that a smaller value is better, and the best results are shown in bold).
Table 1: objective evaluation index comparison of CelebA-HQ data set experimental results
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. The face image restoration method based on the structure and texture dual generation is characterized by comprising the following steps of:
step S1: preprocessing an input image to obtain a face image to be repaired;
step S2: establishing a face image restoration model generated based on structure and texture dual, and inputting the image obtained in the step S1 into the image restoration model for training;
step S3: the face image restoration model is obtained by continuous iterative training until the network finally converges;
step S4: and inputting the damaged face image into a trained face image restoration model to obtain a restored face image.
2. The face image restoration method based on structure and texture dual generation according to claim 1, wherein in step S2, the face image restoration model is based on a generative adversarial network structure and is composed of a generator and a discriminator;
the generator comprises a dual encoder-decoder and a feature fusion part, and the discriminator consists of a texture discriminator and a structure discriminator.
3. A face image restoration method based on structure and texture dual generation according to claim 2, wherein the convolution layers of the dual encoder-decoder employ gated convolution to encode and decode features, and a batch normalization layer is added after each gated convolution layer, expressed as:
Gating = ∑∑ W_g · I
Feature = ∑∑ W_f · I
Output = BN(φ(Feature) ⊙ σ(Gating))
wherein I represents the input feature map; Gating represents the gating map; Feature represents the feature map after convolution; Output represents the final output feature map; W_g and W_f respectively represent different convolution kernels; φ is the LeakyReLU activation function and σ is the Sigmoid activation function; compared with hard gating, the gating value of gated convolution is between 0 and 1, and the closer the gating value is to 1, the more valid pixels are represented; BN represents batch normalization.
4. A face image restoration method based on structure and texture dual generation according to claim 2, wherein said dual encoder-decoder,
during the encoding stage, the left and right encoders respectively receive the broken image and the broken structural image to encode texture and structural features,
during the decoding stage, the texture decoder synthesizes structure-constrained texture by borrowing structural features from the structure encoder, while the structure decoder recovers the texture-guided structure by retrieving texture features from the texture encoder.
5. A face image restoration method based on structure and texture dual generation according to claim 2, wherein the discriminator is a dual-flow discriminator with texture branches and structure branches, the structure branches of the discriminator are also provided with an additional edge detector for edge extraction, wherein two discriminator trunks are composed of common convolutions, and the edge detector is composed of convolutional neural network residual blocks.
6. The face image restoration method based on structure and texture dual generation according to claim 1, wherein the preprocessing in step S1 is:
firstly, the size of the image is adjusted to 256×256 by clipping and filling;
then, a binarization mask M is obtained from an irregular mask data set provided by NVIDIA to artificially damage the image, so that a damaged image is obtained; graying the damaged image to obtain a damaged gray image;
and finally, extracting the face contour information from the damaged gray level image through a Canny edge detection algorithm to obtain a damaged edge image.
7. The face image restoration method based on structure and texture dual generation according to claim 1, wherein step S3 adopts the CelebA-HQ dataset for training, including training images and test images; the experimental equipment adopts an NVIDIA V100, and the whole model is implemented using PyTorch; when training the model, the batch size is set to 8, and optimization is performed using an Adam optimizer.
8. A face image restoration method based on structure and texture dual generation according to claim 7, characterized in that initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵; the fine-tuned model is trained using a joint loss, including reconstruction loss, perceptual loss, style loss, and adversarial loss.
9. The face image restoration method based on structure and texture dual generation according to claim 8, wherein four loss functions are as follows:
reconstruction loss function: L_rec = E[ ||I_out − I_gt||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and ||·||_1 represents the L_1 norm;
perceptual loss function: L_perc = E[ Σ_i ||φ_i(I_out) − φ_i(I_gt)||_1 ]
the perceptual loss, computed with a VGG-16 network pre-trained on ImageNet, is used to simulate human visual perception of image quality, wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, ||·||_1 represents the L_1 norm, φ_i represents the activation map of the i-th pooling layer of VGG-16, and in practice i ∈ [1,3];
style loss function: L_style = E[ Σ_i ||G_i^φ(I_out) − G_i^φ(I_gt)||_1 ]
wherein E represents expectation, I_out represents the generated picture, I_gt represents the real picture, and G_i^φ represents the Gram matrix of the activation map φ_i;
adversarial loss function: L_adv = min_G max_D E[ log D(I_gt, E_gt) ] + E[ log(1 − D(I_out, E_out)) ]
wherein E represents expectation, G represents the generator, D represents the discriminator, I_gt represents the real picture, E_gt represents the real edge map, I_out represents the generated picture, and E_out represents the generated edge map.
10. A face image restoration method based on structure and texture dual generation according to claim 1, characterized in that, in order to guide the dual encoder-decoder to generate structural and texture features, an intermediate loss is also introduced on F_s and F_t:
L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))
wherein I_gt represents the real picture, E_gt represents the real edge map, and P_s and P_t are mapping functions composed of convolution and residual blocks, which map the structural feature F_s and the texture feature F_t to a corresponding edge map and RGB picture, respectively.
CN202310141472.8A 2023-02-21 2023-02-21 Face image restoration method based on structure and texture dual generation Pending CN116109510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310141472.8A CN116109510A (en) 2023-02-21 2023-02-21 Face image restoration method based on structure and texture dual generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310141472.8A CN116109510A (en) 2023-02-21 2023-02-21 Face image restoration method based on structure and texture dual generation

Publications (1)

Publication Number Publication Date
CN116109510A true CN116109510A (en) 2023-05-12

Family

ID=86263723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310141472.8A Pending CN116109510A (en) 2023-02-21 2023-02-21 Face image restoration method based on structure and texture dual generation

Country Status (1)

Country Link
CN (1) CN116109510A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116895091A (en) * 2023-07-24 2023-10-17 山东睿芯半导体科技有限公司 Facial recognition method and device for incomplete image, chip and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination