CN116109510A - Face image restoration method based on structure and texture dual generation - Google Patents
Face image restoration method based on structure and texture dual generation
- Publication number: CN116109510A
- Application number: CN202310141472.8A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/77
- G06N3/04 — Neural network architecture, e.g. interconnection topology
- G06N3/08 — Neural network learning methods
- G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30201 — Face
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a face image restoration method based on structure and texture dual generation, relating to the technical field of image restoration. It restores damaged face images with a deep learning method, solves the problem of inconsistent structure and texture in restored face images, and improves the restoration of images with large damaged areas. The method comprises the following steps. Step S1: preprocess the input image to obtain the face image to be repaired. Step S2: build a face image restoration model based on structure and texture dual generation, and input the image obtained in step S1 into the model for training. Step S3: iterate the training until the network converges, obtaining the trained face image restoration model. Step S4: input the damaged face image into the trained face image restoration model to obtain the restored face image.
Description
Technical Field
The invention relates to the technical field of image restoration, and in particular to a face image restoration method based on structure and texture dual generation.
Background
Image restoration aims to restore the pixels of damaged areas in an image while keeping the filled image as consistent as possible with the original image at the visual and semantic levels. It is not only critical to computer vision tasks but also an important cornerstone for research on other image processing tasks. As one of its important branches, face restoration plays an important role in practical applications. Compared with general image restoration, human faces carry stronger semantics and more complex texture details, so the restoration process must consider not only the plausibility of the face structure but also the preservation of identity information.
Image restoration has made great progress, from early traditional methods to current deep learning-based methods. Traditional methods are only suitable for repairing single images with simple, small missing regions and lack semantic consistency. Deep learning-based methods have therefore become mainstream.
Pathak et al. first proposed Context Encoders, using an encoder-decoder network to extract features and output reconstruction results; this was also the first GAN-based restoration method. Iizuka et al. introduced a local-global dual discriminator on the basis of Context Encoders and used dilated convolution, proposing the GLCIC network. Yu et al. proposed the DeepFill network, which borrows or copies feature information from known background patches through a contextual attention mechanism to generate the missing foreground patches. Nazeri et al. designed EdgeConnect as a two-stage model: an edge generator first produces an edge sketch of the irregular missing region as a prior result, and an image inpainting network then fills the missing region based on that edge sketch.
However, these methods do not exploit structural and texture features jointly, resulting in inconsistent structure and texture in the output image. Defect repair involves both high-level semantic knowledge and low-level pixel information, and only by deeply fusing these two kinds of information can the repair quality approach that of the human visual system. To this end, Guo et al. proposed a novel dual-stream network for image restoration that couples structure-constrained texture synthesis with texture-guided structure reconstruction to obtain more reasonable outputs. Although this method improves the consistency between structure and texture, two problems remain: 1) the relationship between structure and texture is not fully considered, so the degree of consistency between them is limited; 2) the lack of context reasoning over the global and local pixel continuity of the image leads to structural distortion and texture blurring in the repaired image, especially when large areas are damaged. Addressing these two defects, this scheme proposes a face image restoration method based on structure and texture dual generation, which enhances the texture and structure consistency of face image restoration while achieving high-quality restoration of large damaged areas.
Disclosure of Invention
The invention aims to solve the above technical problems, and provides a face image restoration method based on structure and texture dual generation.
The invention adopts the following technical scheme to achieve this aim:
a face image restoration method based on structure and texture dual generation comprises the following steps:
step S1: preprocessing an input image to obtain a face image to be repaired;
step S2: establishing a face image restoration model generated based on structure and texture dual, and inputting the image obtained in the step S1 into the image restoration model for training;
step S3: the face image restoration model is obtained by continuous iterative training until the network finally converges;
step S4: inputting the damaged face image into the trained face image restoration model to obtain the restored face image.
As an optional technical solution, in step S2, the face image restoration model is a generative adversarial network structure consisting of a generator and a discriminator;
the generator comprises a dual encoder-decoder and a feature fusion part, and the discriminator consists of a texture discriminator and a structure discriminator.
As an optional technical solution, the convolution layers of the dual encoder-decoder use gated convolution to encode and decode features, and a batch normalization layer is added after each gated convolution layer, expressed as:

Gating = ΣΣ W_g · I

Feature = ΣΣ W_f · I

Output = BN(φ(Feature) ⊙ σ(Gating))

where I denotes the input feature map; Gating denotes the gating map; Feature denotes the feature map after convolution; Output denotes the final output feature map; W_g and W_f denote different convolution kernels; φ is the LeakyReLU activation function; σ is the Sigmoid activation function; ⊙ denotes element-wise multiplication; and BN denotes batch normalization. Compared with hard gating, the gating value of gated convolution lies between 0 and 1, and the closer the gating value is to 1, the more valid the pixels.
As an optional technical solution, in the dual encoder-decoder:
during the encoding stage, the left and right encoders respectively receive the damaged image and the damaged structure image to encode texture and structural features;
during the decoding stage, the texture decoder synthesizes structure-constrained texture by borrowing structural features from the structure encoder, while the structure decoder recovers texture-guided structure by retrieving texture features from the texture encoder.
As an optional technical solution, the discriminator is a dual-stream discriminator with a texture branch and a structure branch, and the structure branch is additionally provided with an edge detector for edge extraction; the two discriminator backbones consist of standard convolutions, and the edge detector consists of convolutional neural network residual blocks.
As an optional technical solution, the preprocessing in step S1 is:
first, resize the image to 256×256 by cropping and padding;
then, obtain a binary mask M from the irregular mask dataset provided by NVIDIA to artificially damage the image, producing a damaged image, and convert the damaged image to grayscale to obtain a damaged gray image;
finally, extract face contour information from the damaged gray image with the Canny edge detection algorithm to obtain a damaged edge map.
As an optional technical solution, step S3 uses the CelebA-HQ dataset for training, comprising training images and test images; the experimental equipment is an NVIDIA V100, and the whole model is implemented in PyTorch. When training the model, the batch size is set to 8 and optimization is performed with the Adam optimizer.
As an optional technical solution, initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵; the model is trained using a joint loss comprising reconstruction loss, perceptual loss, style loss, and adversarial loss.
As an optional technical solution, the four loss functions are as follows.

Reconstruction loss: L_rec = E[ ||I_out − I_gt||_1 ]

where E denotes expectation, I_out denotes the generated picture, I_gt denotes the real picture, and ||·||_1 denotes the L1 norm.

Perceptual loss, which uses a VGG-16 network pre-trained on ImageNet to simulate human visual perception of image quality: L_perc = E[ Σ_i ||φ_i(I_out) − φ_i(I_gt)||_1 ]

where φ_i denotes the activation map of the i-th pooling layer of VGG-16; in practice, i ∈ [1, 3].

Style loss: L_style = E[ Σ_i ||G_i^φ(I_out) − G_i^φ(I_gt)||_1 ]

where G_i^φ denotes the Gram matrix of the activation map φ_i.

Adversarial loss: L_adv = min_G max_D E[log D(I_gt, E_gt)] + E[log(1 − D(I_out, E_out))]

where G denotes the generator, D denotes the discriminator, E_gt denotes the real edge map, and E_out denotes the generated edge map.
As an optional technical solution, in order to guide the dual encoder-decoder to generate structural and texture features, an intermediate loss is also introduced on F_s and F_t:

L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))

where I_gt denotes the real picture, E_gt denotes the real edge map, and P_s and P_t are mapping functions composed of convolutional residual blocks that map the structural features F_s and texture features F_t to the corresponding edge map and RGB picture, respectively.
The beneficial effects of the invention are as follows:
1. Current image restoration models cannot simultaneously and fully exploit structural and texture information, so the restored image suffers from inconsistent structure and texture; the dual-generation design of the invention fuses the two kinds of features and improves this consistency.
2. Existing research still suffers from structural distortion or texture blurring when repairing large irregular missing areas, mainly because the context of the image is not fully utilized, leaving the link from local features to overall consistency insufficient. The invention makes full use of the context information of the image and achieves a better repair effect when the image is damaged over a large area.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a network structure diagram of the method of the present invention.
FIG. 3 is the gated aggregated contextual transformation (Gated Aggregated Contextual Transformations, GACT) module in the generator of the method of the present invention.
FIG. 4 is the adaptive dual feature fusion module (Adaptive Dual Feature Fusion, ADFF) in the generator of the method of the present invention.
FIG. 5 is a schematic diagram showing the qualitative comparison effect of the method of the present invention with other methods.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
A face image restoration method based on structure and texture dual generation, referring to fig. 1, comprises the following steps:
and step S1, preprocessing an input image to obtain a face image to be repaired. The size of the image is first adjusted, and the image is adjusted to 256×256 size by clipping and filling. And then, acquiring a binarization mask M from the irregular mask data set provided by NVIDIA to artificially damage the image, so as to obtain a damaged image. And carrying out graying treatment on the damaged image to obtain a damaged gray image, and finally extracting face contour information from the damaged gray image through a Canny edge detection algorithm to obtain a damaged edge image.
Step S2: build the face image restoration model based on structure and texture dual generation, and input the image obtained in step S1 into the model for training.
The face image restoration model based on structure and texture dual generation is shown in fig. 2; the model adopts a generative adversarial network structure composed of a generator and a discriminator. The generator comprises a dual encoder-decoder and a feature fusion part, and the discriminator consists of a texture discriminator and a structure discriminator.
Specifically, the dual encoder-decoder uses U-Net-like skip connections. In the encoding stage, the left and right encoders respectively receive the damaged image and the damaged structure image to encode texture and structural features. In the decoding stage, the texture decoder synthesizes structure-constrained texture by borrowing structural features from the structure encoder, and the structure decoder recovers texture-guided structure by retrieving texture features from the texture encoder. With this dual structure, structure and texture complement each other well, improving the consistency between texture and structure.
The convolution layers of the dual encoder-decoder use gated convolution to encode and decode features. Compared with partial convolution, gated convolution learns features end-to-end and dynamically updates the mask, so it adapts effectively to unevenly distributed valid pixels and makes the repair result clearer and more consistent with the surrounding semantics. Meanwhile, a batch normalization layer is added after each gated convolution layer to prevent vanishing gradients during training, which can be expressed as:

Gating = ΣΣ W_g · I

Feature = ΣΣ W_f · I

Output = BN(φ(Feature) ⊙ σ(Gating))

where I denotes the input feature map; Gating denotes the gating map; Feature denotes the feature map after convolution; Output denotes the final output feature map; W_g and W_f denote different convolution kernels; BN denotes batch normalization; φ is the LeakyReLU activation function; σ is the Sigmoid activation function; and ⊙ denotes element-wise multiplication. Compared with hard gating, the gating value of gated convolution lies between 0 and 1, and the closer the gating value is to 1, the more valid the pixels.
In addition, six gated aggregated contextual transformation (Gated Aggregated Contextual Transformations, GACT) modules are introduced into the dual encoder-decoder. They are embedded between the encoder and decoder with gated residual connections, enabling the network to capture long-range context information of the image and enrich the patterns of interest. As shown in FIG. 3, the GACT module adopts a split-transform-aggregate strategy. (i) Split: the input 256-channel feature map x_1 is reduced to four 64-channel sub-feature maps using four 3×3 gated convolutions. (ii) Transform: the convolution kernel of each gated convolution has a different dilation rate; a larger dilation rate lets the kernel attend to a larger area of the input image, while a kernel with a smaller dilation rate focuses on local patterns within a smaller receptive field. (iii) Aggregate: the four context-transformed features from different receptive fields are aggregated by channel-dimension concatenation and a standard gated convolution to obtain the fused feature x_2. A residual connection structure is also borrowed: x_1 is passed through a 3×3 standard gated convolution and a Sigmoid operation to form a threshold g, and the transformed fused feature and the original feature are gate-weighted to obtain the final output feature, with the weighting formula: x_1 × g + x_2 × (1 − g).
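The split-transform-aggregate idea of the GACT module might be sketched as below. Plain convolutions with ReLU stand in for the gated convolutions of the branches, and the dilation rates (1, 2, 4, 8) are illustrative assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

class GACT(nn.Module):
    """Sketch of the gated aggregated contextual transformation block:
    split a 256-channel map into four 64-channel branches with different
    dilation rates, aggregate them by concatenation + convolution, and
    blend with the input through a learned gate:
    out = x1 * g + x2 * (1 - g)."""
    def __init__(self, ch=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch // 4, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(ch, ch, 3, padding=1)   # aggregate branches -> x2
        self.gate = nn.Conv2d(ch, ch, 3, padding=1)   # threshold g

    def forward(self, x1):
        ctx = torch.cat([torch.relu(b(x1)) for b in self.branches], dim=1)
        x2 = self.fuse(ctx)
        g = torch.sigmoid(self.gate(x1))
        return x1 * g + x2 * (1 - g)                  # gated residual blend
```

With `padding == dilation` each dilated branch preserves spatial size, so the four context features concatenate cleanly back to 256 channels.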
After the dual encoder-decoder part of the generator produces the complete structural and texture features, the two features are further fused using an adaptive dual feature fusion module (Adaptive Dual Feature Fusion, ADFF). By controlling the fusion ratio of texture and structure, the module adaptively fuses the two semantic features, making the repair result more reasonable while enhancing structural continuity and texture. The ADFF module is shown in FIG. 4.
Specifically, denote the texture feature map output by the decoder as F_t and the structural feature map as F_s. To build texture-aware structural features, a soft gating G_t is formulated as:

G_t = σ(SE(g([F_s, F_t])))

where [·] denotes concatenation along the channel dimension, g(·) denotes a convolution with kernel size 3, SE(·) denotes the channel attention mechanism used to obtain important channel-dimension information, and σ(·) is the Sigmoid activation function. Using G_t, texture features can be dynamically fused into the structural features, with the fusion formula:

F_s' = F_s ⊕ (α · G_t ⊙ F_t)

where α is a learnable parameter, and ⊙ and ⊕ denote pixel-wise multiplication and pixel-wise addition, respectively. Structure-aware texture features are computed in the same way, with the fusion formulas:

G_s = σ(SE(h([F_s, F_t])))

F_t' = F_t ⊕ (β · G_s ⊙ F_s)

where h(·) is likewise a convolution with kernel size 3 and β is a learnable parameter. Finally, the texture and structural features are fused by the following formula to obtain the final fused feature:

F_b = SK(k([F_s', F_t']))

where k(·) is a convolution and SK is a selective-kernel attention mechanism that can adaptively select an appropriate convolution kernel, helping to repair the consistency of image structure and texture.
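One direction of the soft-gated fusion (texture into structure, F_s' = F_s ⊕ (α · G_t ⊙ F_t)) might be sketched as follows. The SE block here is a standard squeeze-and-excitation channel-attention layer, the reduction ratio is an assumption, and initializing α to zero is an illustrative choice rather than something the text specifies; the symmetric structure-to-texture gate and the final SK fusion would be built analogously.

```python
import torch
import torch.nn as nn

class SoftGateFusion(nn.Module):
    """Sketch of one direction of the ADFF soft gating:
    G_t = sigmoid(SE(g([F_s, F_t]))),  F_s' = F_s + alpha * (G_t * F_t)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.g = nn.Conv2d(2 * ch, ch, 3, padding=1)   # g(.): 3x3 conv on [F_s, F_t]
        self.se = nn.Sequential(                       # squeeze-and-excitation SE(.)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1))
        self.alpha = nn.Parameter(torch.zeros(1))      # learnable blend weight

    def forward(self, f_s, f_t):
        z = self.g(torch.cat([f_s, f_t], dim=1))       # channel concat then conv
        w = torch.sigmoid(self.se(z))                  # channel attention weights
        gate = torch.sigmoid(z * w)                    # soft gate G_t in (0, 1)
        return f_s + self.alpha * gate * f_t           # F_s' = F_s + alpha*(G_t . F_t)
```

With α starting at zero the module initially passes F_s through unchanged and learns how much gated texture to inject during training.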
The resulting fused feature is finally fed into a contextual feature aggregation (Contextual Feature Aggregation, CFA) module, which generates more vivid details by modeling long-range spatial dependencies.
The discriminator is a dual-stream discriminator with a texture branch and a structure branch; the structure branch also has an additional edge detector for edge extraction. The two discriminator backbones consist of standard convolutions and, to improve the stability of the generative adversarial network, spectral normalization is also applied. The edge detector consists of convolutional neural network residual blocks.
Step S3: iterate the training until the network finally converges, obtaining the face image restoration model.
The invention uses the CelebA-HQ dataset for training, comprising 28000 training images and 2000 test images. The experimental equipment is an NVIDIA V100, and the whole model is implemented in PyTorch. When training the model, the batch size is set to 8 and optimization is performed with the Adam optimizer. Initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵.
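The optimizer schedule described (Adam, batch size 8, initial learning rate 2×10⁻⁴, fine-tuning at 5×10⁻⁵) could be set up as below. The stand-in module and the Adam betas are assumptions; `generator` is a placeholder for the full restoration model.

```python
import torch

# Stand-in for the full dual-generation restoration model.
generator = torch.nn.Conv2d(3, 3, 3, padding=1)

# Initial training phase: Adam at 2e-4 (betas are an assumption).
opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.999))

def switch_to_finetune(optimizer, lr=5e-5):
    """Drop the learning rate for the fine-tuning phase described in the text."""
    for group in optimizer.param_groups:
        group['lr'] = lr
```

In practice the switch would happen after the initial phase converges; here it is exposed as a helper so the two phases share one optimizer state.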
The model is trained using a joint loss, comprising reconstruction loss, perceptual loss, style loss, and adversarial loss, to obtain visually realistic and semantically reasonable repair results.
Reconstruction loss: L_rec = E[ ||I_out − I_gt||_1 ]

where E denotes expectation, I_out denotes the generated picture, I_gt denotes the real picture, and ||·||_1 denotes the L1 norm.

Perceptual loss, which uses a VGG-16 network pre-trained on ImageNet to simulate human visual perception of image quality: L_perc = E[ Σ_i ||φ_i(I_out) − φ_i(I_gt)||_1 ]

where φ_i denotes the activation map of the i-th pooling layer of VGG-16; in practice, i ∈ [1, 3].

Style loss: L_style = E[ Σ_i ||G_i^φ(I_out) − G_i^φ(I_gt)||_1 ]

where G_i^φ denotes the Gram matrix of the activation map φ_i.

Adversarial loss: L_adv = min_G max_D E[log D(I_gt, E_gt)] + E[log(1 − D(I_out, E_out))]

where G denotes the generator, D denotes the discriminator, E_gt denotes the real edge map, and E_out denotes the generated edge map.
To guide the dual encoder-decoder to generate structural and texture features, an intermediate loss is also introduced on F_s and F_t:

L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))

where I_gt denotes the real picture, E_gt denotes the real edge map, and P_s and P_t are mapping functions composed of convolutional residual blocks that map the structural features F_s and texture features F_t to the corresponding edge map and RGB picture, respectively.
The total loss is:

L_joint = λ_rec·L_rec + λ_perc·L_perc + λ_style·L_style + λ_adv·L_adv + λ_inter·L_inter

where λ_rec, λ_perc, λ_style, λ_adv and λ_inter are the weights of the corresponding loss terms, set as λ_rec = 10, λ_perc = 0.1, λ_style = 250, λ_adv = 0.1 and λ_inter = 1.
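The non-adversarial part of the weighted joint loss can be sketched as below, using the weights given in the text (λ_rec = 10, λ_perc = 0.1, λ_style = 250). `feats_out`/`feats_gt` stand in for the VGG-16 pooling activations φ_i of the generated and real images; the adversarial and intermediate terms are omitted because they need the discriminator and the dual-decoder internals.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of an activation map, as used by the style loss."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def joint_loss(i_out, i_gt, feats_out, feats_gt,
               w=dict(rec=10.0, perc=0.1, style=250.0)):
    """Sketch of L_joint without the adversarial and intermediate terms:
    lambda_rec*L_rec + lambda_perc*L_perc + lambda_style*L_style."""
    l_rec = F.l1_loss(i_out, i_gt)                       # L1 reconstruction
    l_perc = sum(F.l1_loss(a, b)                         # perceptual (phi_i)
                 for a, b in zip(feats_out, feats_gt))
    l_style = sum(F.l1_loss(gram(a), gram(b))            # style (Gram matrices)
                  for a, b in zip(feats_out, feats_gt))
    return w['rec'] * l_rec + w['perc'] * l_perc + w['style'] * l_style
```

In a real training loop the feature lists would come from a frozen, ImageNet-pretrained VGG-16 applied to both images.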
Step S4: input the damaged face image into the trained face image restoration model to obtain the restored face image. To verify the effectiveness of the algorithm, the experiments use the test set of the CelebA-HQ dataset and compare the algorithm qualitatively and quantitatively with the EdgeConnect, RFR-Inpainting, and CTSDG algorithms under different mask area ratios.
Qualitative analysis: as shown in fig. 5, fig. 5a shows the damaged face image to be repaired. In fig. 5b, EdgeConnect produces distorted and severely warped face structures when large areas are damaged, and yields good results only for small damaged areas. In fig. 5c, RFR-Inpainting produces overly smooth content and, for large damaged areas, suffers from color inconsistency, artifacts, and texture blurring. In fig. 5d, CTSDG also exhibits texture blurring and distortion. Fig. 5e shows the repair results of the present invention: the repaired face structure and texture are more consistent and semantically reasonable, and good results are produced even when large areas are damaged. Fig. 5f shows the real image corresponding to the damaged image.
Quantitative analysis: experiments were performed on the CelebA-HQ dataset, with 10% to 50% different proportions of masks representing the size of the damaged area, and the results generated were quantitatively compared. Mainly, three evaluation indexes are needed, and the PSNR, SSIM and MAE are shown in the following table, and compared with other methods, the method has the optimal result on all three indexes. (+.cndot.C. representing the larger and better value, +.cndot.C. representing the smaller and better value, +.cndot.C. representing the best result by bold)
Table 1: objective evaluation index comparison of CelebA-HQ data set experimental results
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (10)
1. The face image restoration method based on the structure and texture dual generation is characterized by comprising the following steps of:
step S1: preprocessing an input image to obtain a face image to be repaired;
step S2: establishing a face image restoration model generated based on structure and texture dual, and inputting the image obtained in the step S1 into the image restoration model for training;
step S3: the face image restoration model is obtained by continuous iterative training until the network finally converges;
step S4: and inputting the damaged face image into a trained face image restoration model to obtain a restored face image.
2. The face image restoration method based on structure and texture dual generation according to claim 1, wherein in the step S2, the face image restoration model is a generative adversarial network composed of a generator and a discriminator;
the generator comprises a dual encoder-decoder and a feature fusion part, and the discriminator consists of a texture discriminator and a structure discriminator.
3. A face image restoration method based on structure and texture dual generation according to claim 2, wherein the convolution layers of the dual encoder-decoder employ gated convolution to encode and decode features, and a batch normalization layer is added after each gated convolution layer, expressed as:
Gating = ΣΣ W_g · I
Feature = ΣΣ W_f · I
Output = BN(φ(Feature) ⊙ σ(Gating))
wherein I represents the input feature map; Gating represents the gating map; Feature represents the feature map after convolution; Output represents the final output feature map; W_g and W_f represent different convolution kernels; φ is the LeakyReLU activation function and σ is the Sigmoid activation function. In contrast to hard gating, the gating value of the gated convolution lies between 0 and 1: the closer the gating value is to 1, the more valid the pixels. BN denotes batch normalization.
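A minimal numpy sketch of the gating combination in claim 3, taking the two convolution outputs (Feature and Gating) as given and reducing batch normalization to a per-map standardization for illustration:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """phi: LeakyReLU activation."""
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    """sigma: soft gate in (0, 1), unlike a hard 0/1 gate."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_output(feature, gating):
    """Output = BN(phi(Feature) * sigma(Gating)); BN reduced to standardization here."""
    gated = leaky_relu(feature) * sigmoid(gating)
    return (gated - gated.mean()) / (gated.std() + 1e-5)
```

Pixels whose gating value is driven toward 0 are suppressed in the output, which is how the gated convolution distinguishes valid from invalid (damaged) regions.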
4. A face image restoration method based on structure and texture dual generation according to claim 2, wherein said dual encoder-decoder,
during the encoding stage, the left and right encoders respectively receive the damaged image and the damaged structural image to encode texture and structural features;
at the decoding stage, the texture decoder synthesizes structure-constrained textures by borrowing structural features from the structure encoder, while the structure decoder recovers texture-guided structures by retrieving texture features from the texture encoder.
5. A face image restoration method based on structure and texture dual generation according to claim 2, wherein the discriminator is a dual-flow discriminator with texture branches and structure branches, the structure branches of the discriminator are also provided with an additional edge detector for edge extraction, wherein two discriminator trunks are composed of common convolutions, and the edge detector is composed of convolutional neural network residual blocks.
6. The face image restoration method based on structure and texture dual generation according to claim 1, wherein the preprocessing in step S1 is:
firstly, the size of the image is adjusted, the image is adjusted to 256 multiplied by 256 by clipping and filling,
then, a binarization mask M is obtained from an irregular mask data set provided by NVIDIA to artificially damage the image, so that a damaged image is obtained; graying the damaged image to obtain a damaged gray image;
and finally, extracting the face contour information from the damaged gray level image through a Canny edge detection algorithm to obtain a damaged edge image.
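The preprocessing pipeline of claim 6 can be sketched as follows. This is a hedged illustration, not the patent's implementation: the image is assumed already resized to 256×256, the mask uses 1 for damaged pixels, and a simple gradient-magnitude threshold stands in for the Canny detector (in practice one would call e.g. OpenCV's Canny):

```python
import numpy as np

def preprocess(rgb, mask):
    """rgb: 256x256x3 uint8 image; mask: 256x256 binary map (1 = damaged).
    Returns the damaged image, its grayscale version, and a damaged edge map."""
    damaged = rgb * (1 - mask)[..., None]               # artificially damage the image
    gray = damaged @ np.array([0.299, 0.587, 0.114])    # luminance grayscale
    # crude edge strength; a Canny detector would be used in practice
    gx = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    gy = np.abs(np.diff(gray, axis=0, prepend=gray[:1, :]))
    edges = ((gx + gy) > 30).astype(np.uint8)           # binarized damaged edge map
    return damaged, gray, edges
```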
7. The face image restoration method based on structure and texture dual generation according to claim 1, wherein in the step S3 a CelebA-HQ dataset is adopted for training, comprising training images and test images; the experimental equipment adopts an NVIDIA V100, and the whole model is implemented with PyTorch; when training the model, the batch size is set to 8, and optimization is performed using the Adam optimizer.
8. The face image restoration method based on structure and texture dual generation according to claim 7, wherein initial training is first performed with a learning rate of 2×10⁻⁴, and the model is then fine-tuned with a learning rate of 5×10⁻⁵; the model is trained using a joint loss, including reconstruction loss, perceptual loss, style loss, and adversarial loss.
9. The face image restoration method based on structure and texture dual generation according to claim 8, wherein four loss functions are as follows:
reconstruction loss function: L_rec = E[||I_out − I_gt||_1]
wherein E represents the expectation, I_out represents the generated picture, I_gt represents the real picture, and ||·||_1 represents the L1 norm;
the perceived loss of pre-training by VGG-16 on ImageNet is used to simulate human visual perception of image quality, where E represents the desire, I out Representing the generated picture, I gt A picture representing a true image is displayed, I.I 1 Represents L 1 Norms, phi i Representing the activation diagram of the ith pooling layer of Vgg16, in actual process, i E [1,3 ]];
style loss function: L_style = E[||G_i^φ(I_out) − G_i^φ(I_gt)||_1]
wherein E represents the expectation, I_out represents the generated picture, I_gt represents the real picture, and G_i^φ represents the Gram matrix of the activation map φ_i;
adversarial loss function: L_adv = min_G max_D E[log D(I_gt, E_gt)] + E[log(1 − D(I_out, E_out))]
wherein E represents the expectation, G represents the generator, D represents the discriminator, I_gt represents the real picture, E_gt represents the true edge map, I_out represents the generated picture, and E_out represents the generated edge map.
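The style loss compares Gram matrices of activation maps rather than the activations themselves. A minimal numpy sketch (an illustration with random features standing in for VGG-16 activations, not the patent's implementation):

```python
import numpy as np

def gram(features):
    """Gram matrix of an activation map with shape (C, H, W):
    correlations between channels, normalized by the number of elements."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)

def style_loss(feat_out, feat_gt):
    """L1 distance between the Gram matrices of the two activation maps."""
    return float(np.mean(np.abs(gram(feat_out) - gram(feat_gt))))
```

Because the Gram matrix discards spatial layout and keeps only channel correlations, this term penalizes texture-statistics mismatches independently of where the texture appears in the image.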
10. The face image restoration method based on structure and texture dual generation according to claim 1, wherein, in order to guide the dual encoder-decoder to generate structural and textural features, intermediate losses are also introduced on F_s and F_t:
L_inter = L_structure + L_texture = BCE(E_gt, P_s(F_s)) + l_1(I_gt, P_t(F_t))
wherein I_gt represents the real picture, E_gt represents the true edge map, and P_s and P_t are mapping functions composed of convolution and residual blocks, which map the structural feature F_s and the texture feature F_t to the corresponding edge map and RGB picture, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310141472.8A CN116109510A (en) | 2023-02-21 | 2023-02-21 | Face image restoration method based on structure and texture dual generation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116109510A true CN116109510A (en) | 2023-05-12 |
Family
ID=86263723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310141472.8A Pending CN116109510A (en) | 2023-02-21 | 2023-02-21 | Face image restoration method based on structure and texture dual generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116109510A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116895091A (en) * | 2023-07-24 | 2023-10-17 | 山东睿芯半导体科技有限公司 | Facial recognition method and device for incomplete image, chip and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111292264B (en) | Image high dynamic range reconstruction method based on deep learning | |
CN113658051B (en) | Image defogging method and system based on cyclic generation countermeasure network | |
CN114463209B (en) | Image restoration method based on deep multi-feature collaborative learning | |
CN110717868B (en) | Video high dynamic range inverse tone mapping model construction and mapping method and device | |
CN110689495B (en) | Image restoration method for deep learning | |
CN111709900A (en) | High dynamic range image reconstruction method based on global feature guidance | |
CN114627006B (en) | Progressive image restoration method based on depth decoupling network | |
CN114066747A (en) | Low-illumination image enhancement method based on illumination and reflection complementarity | |
CN115018727A (en) | Multi-scale image restoration method, storage medium and terminal | |
CN114897742B (en) | Image restoration method with texture and structural features fused twice | |
CN115829876A (en) | Real degraded image blind restoration method based on cross attention mechanism | |
CN116109510A (en) | Face image restoration method based on structure and texture dual generation | |
Liu et al. | Facial image inpainting using multi-level generative network | |
CN113066025A (en) | Image defogging method based on incremental learning and feature and attention transfer | |
CN113034388A (en) | Ancient painting virtual repairing method and construction method of repairing model | |
CN117408924A (en) | Low-light image enhancement method based on multiple semantic feature fusion network | |
CN116681621A (en) | Face image restoration method based on feature fusion and multiplexing | |
CN116934613A (en) | Branch convolution channel attention module for character repair | |
CN116416216A (en) | Quality evaluation method based on self-supervision feature extraction, storage medium and terminal | |
CN116309171A (en) | Method and device for enhancing monitoring image of power transmission line | |
CN116051407A (en) | Image restoration method | |
CN115035170A (en) | Image restoration method based on global texture and structure | |
CN114494387A (en) | Data set network generation model and fog map generation method | |
CN116958317A (en) | Image restoration method and system combining edge information and appearance stream operation | |
CN113888417A (en) | Human face image restoration method based on semantic analysis generation guidance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||