CN111768354A - Face image restoration system based on multi-scale face part feature dictionary - Google Patents
- Publication number
- CN111768354A (application number CN202010779169.7A)
- Authority
- CN
- China
- Prior art keywords
- face
- scale
- dictionary
- features
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
A face image restoration system based on a multi-scale face part feature dictionary, belonging to the technical field of face image restoration. The invention addresses a limitation of existing face image restoration techniques: recovering a high-quality face image from a real low-quality face image requires guidance from a high-definition face image of the same person, which restricts practical application. The system comprises a face part feature dictionary offline generation module, used for extracting high-definition face part features from each sample image in a high-definition face image data set and obtaining a face part feature dictionary by applying k-means clustering to the extraction results; and a face image restoration module, used for extracting features from the degraded face image to be restored, fusing the feature extraction results with the face part feature dictionary to obtain part-enhanced face features to be restored, and reconstructing those features to obtain a guided restoration result image. The invention is used for restoring low-quality images.
Description
Technical Field
The invention relates to a face image restoration system based on a multi-scale face part feature dictionary, and belongs to the technical field of face image restoration.
Background
Face image restoration technology recovers a low-quality face image (degraded by blur, heavy noise, compression artifacts, long-distance shooting, low-quality capture equipment, network transmission, and the like) into a high-quality face image. With the development of technology and equipment, people increasingly pursue high-definition images and videos, and mobile phone manufacturers likewise pursue high quality in captured face images. When unavoidable factors produce a low-quality face image, the visual experience is often unacceptable, and the real degradation of an image cannot be accurately simulated; how to recover a high-quality image from a real low-quality image has therefore become a research focus for enterprises and researchers.
In recent years, deep learning has made breakthroughs in image quality improvement and can significantly raise the visual quality of images. However, most current methods are limited to single-image restoration, learning a mapping from low-quality face images to high-quality images through a convolutional neural network. Because the degradation of real images cannot be faithfully simulated, such methods fail on most real images and therefore do not achieve ideal robustness or restoration quality.
To address these problems, some methods adopt one or more high-definition images of the same person as a guide to assist the network's recovery process. Although this yields a certain performance improvement, it requires that the identity of the degraded image be known in advance and that one or more high-definition guide images be available, which greatly limits the range of application.
Disclosure of Invention
The invention provides a face image restoration system based on a multi-scale face part feature dictionary, addressing the problem that, in existing face image restoration technology, obtaining a high-quality face image from a real low-quality face image requires guidance from a high-definition face image of the same user, which limits its application.
The invention discloses a face image restoration system based on a multi-scale face part feature dictionary, which comprises:
the face part feature dictionary offline generation module: used for extracting high-definition face part features from each sample image in a high-definition face image data set, and obtaining a face part feature dictionary by applying k-means clustering to the extraction results;
the face image restoration module: used for extracting features from the degraded face image to be restored, and fusing the feature extraction results with the face part feature dictionary to obtain part-enhanced face features to be restored; the face features to be restored are then reconstructed to obtain a guided restoration result image.
According to the face image restoration system based on the multi-scale face part feature dictionary, the face part feature dictionary obtained by the face feature dictionary offline generation module comprises M scales, where M is an integer greater than or equal to 1.
According to the face image restoration system based on the multi-scale face part feature dictionary, when M = 4, processing each sample image with the VGGFace model comprises the following steps:
sequentially performing convolution, activation, pooling, convolution, activation, convolution and activation operations on each sample image to obtain high-definition face part features with the scale of 1;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 1 to obtain high-definition face part features with the scale of 2;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 2 to obtain high-definition face part features with the scale of 3;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 3 to obtain high-definition face part features with the scale of 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolution operations are both 64 convolution operations of 3 x 3 with a step size of 1;
the third and fourth convolution operations are both 128 convolution operations of 3 x 3 with a step size of 1;
the fifth convolution operation to the ninth convolution operation are all 256 convolution operations with 3 × 3 and step size of 1;
the tenth to sixteenth convolution operations are all 512 convolution operations of 3 × 3 with a step size of 1;
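The multi-scale extraction described above can be sketched in miniature. The sketch below uses random stand-in weights and reduced channel widths (8/16/32/64 rather than the 64–512 of VGGFace), so only the structure — one pooling per stage, halving the spatial size at each scale — follows the claims; all function names and parameter sizes here are illustrative, not part of the claimed system.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, out_ch):
    # 'same' 3x3 convolution with random stand-in weights (the real system uses VGGFace weights)
    in_ch, h, w = x.shape
    wgt = rng.standard_normal((out_ch, in_ch, 3, 3)) * 0.01
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((out_ch, h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum('oi,ihw->ohw', wgt[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + w])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def vgg_pyramid(img):
    # scale 1: conv - relu - pool - conv - relu - conv - relu
    f = relu(conv3x3(img, 8))
    f = maxpool2(f)
    f = relu(conv3x3(f, 16))
    s1 = relu(conv3x3(f, 16))
    feats = [s1]
    # scales 2-4: (relu) - pool - conv - relu - conv - relu - conv
    for ch in (32, 64, 64):
        f = maxpool2(relu(feats[-1]))
        f = relu(conv3x3(f, ch))
        f = relu(conv3x3(f, ch))
        feats.append(conv3x3(f, ch))
    return feats

feats = vgg_pyramid(rng.standard_normal((3, 64, 64)))
print([f.shape for f in feats])  # spatial size halves at each successive scale
```

Running the sketch on a 64 × 64 input yields feature maps of 32, 16, 8 and 4 pixels per side for scales 1 through 4.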
the high-definition face part feature with the scale of 1, the high-definition face part feature with the scale of 2, the high-definition face part feature with the scale of 3 and the high-definition face part feature with the scale of 4 are respectively processed through a dictionary generation module to obtain a face part feature dictionary with the corresponding scale;
the process of processing the input data by the dictionary generation module comprises the following steps:
acquiring high-definition face part features with the scale of 1, high-definition face part features with the scale of 2, high-definition face part features with the scale of 3 or high-definition face part features with the scale of 4;
then, a region alignment operation is carried out: the positions of the left eye, right eye, nose and mouth are obtained from the high-definition face part features by a face key point detection algorithm; according to the obtained part positions, the left eye, right eye, nose and mouth are cropped from the corresponding high-definition face part features by RoIAlign, and the resulting part features of the left eye, right eye, nose and mouth form the extraction result;
by k-means clustering over all part features in the extraction result, K1 clustering centers are obtained for the left eye, K2 for the right eye, K3 for the nose and K4 for the mouth; the K1 clustering centers constitute the left-eye dictionary, the K2 clustering centers the right-eye dictionary, the K3 clustering centers the nose dictionary, and the K4 clustering centers the mouth dictionary; K1, K2, K3 and K4 are all greater than or equal to 1;
the face part feature dictionary with the scale of 1 is obtained corresponding to the high-definition face part feature with the scale of 1, the face part feature dictionary with the scale of 2 is obtained corresponding to the high-definition face part feature with the scale of 2, the face part feature dictionary with the scale of 3 is obtained corresponding to the high-definition face part feature with the scale of 3, and the face part feature dictionary with the scale of 4 is obtained corresponding to the high-definition face part feature with the scale of 4.
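The offline dictionary construction — RoIAlign-cropped part features clustered with k-means into K1–K4 centers per part — can be sketched as follows. The feature arrays are random stand-ins, and `kmeans` / `build_part_dictionary` are illustrative helpers, not the claimed implementation.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means; the k cluster centers serve as the dictionary atoms for one part."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each part feature to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(0)
    return centers

def build_part_dictionary(part_features, k_per_part):
    # part_features: {part_name: (N, D) array of RoIAlign-cropped features at one scale}
    return {part: kmeans(feats, k_per_part[part])
            for part, feats in part_features.items()}

rng = np.random.default_rng(1)
parts = {p: rng.standard_normal((200, 32))
         for p in ("left_eye", "right_eye", "nose", "mouth")}
dic = build_part_dictionary(parts, {"left_eye": 4, "right_eye": 4, "nose": 3, "mouth": 5})
print({p: c.shape for p, c in dic.items()})
```

Each part then contributes a (K, D) block of cluster centers to the dictionary at that scale; the same procedure is repeated per scale.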
According to the facial image restoration system based on the multi-scale facial feature dictionary, the facial image restoration module comprises:
a primary face feature extraction module: sequentially performing convolution, activation, pooling, convolution, activation and convolution operations on a degraded face image to be restored to obtain degraded face features with the scale of 1;
a secondary face feature extraction module: sequentially performing pooling, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 1 to obtain degraded human face features with the scale of 2;
the three-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 2 to obtain degraded human face features with the scale of 3;
four-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 3 to obtain degraded human face features with the scale of 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolution operations are both 64 convolution operations of 3 x 3 with a step size of 1;
the third and fourth convolution operations are both 128 convolution operations of 3 x 3 with a step size of 1;
the fifth to ninth convolution operations are all 256 convolution operations of 3 × 3 with a step size of 1;
the tenth to sixteenth convolution operations are all 512 convolution operations of 3 × 3 with a step size of 1;
the dictionary feature guidance enhancement module with the scale of 1: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 1 and the face part feature dictionary with the scale of 1 to obtain first-level face features to be restored after part enhancement;
scale 2 dictionary feature guidance enhancement module: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 2 and the face part feature dictionary with the scale of 2 to obtain second-level face features to be restored after part enhancement;
scale 3 dictionary feature guidance enhancement module: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 3 and the face part feature dictionary with the scale of 3 to obtain three-level face features to be restored after part enhancement;
the dictionary feature guidance enhancement module with the scale of 4: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 4 and the face part feature dictionary with the scale of 4 to obtain four-level face features to be restored after part enhancement;
a four-level reconstruction module: used for performing affine transformation on the degraded face features with the scale of 4 and the four-level face features to be restored, and decoding the transformation result through the network to obtain four-level reconstruction result features;
a three-level reconstruction module: used for performing affine transformation on the four-level reconstruction result features and the three-level face features to be restored, and decoding the transformation result through the model network to obtain three-level reconstruction result features;
a secondary reconstruction module: used for performing affine transformation on the three-level reconstruction result features and the secondary face features to be restored, and decoding the transformation result through the model network to obtain secondary reconstruction result features;
a primary reconstruction module: used for performing affine transformation on the secondary reconstruction result features and the primary face features to be restored, and decoding the transformation result through the model network to obtain the primary reconstruction result image;
an output module: and the primary reconstruction result image is output as a guide restoration result image.
According to the face image restoration system based on the multi-scale face part feature dictionary, the processing process of the dictionary feature guidance enhancement module with the scale of 1, the dictionary feature guidance enhancement module with the scale of 2, the dictionary feature guidance enhancement module with the scale of 3 and the dictionary feature guidance enhancement module with the scale of 4 on input data is the same; taking the dictionary feature guidance enhancement module with the scale of 1 as an example for explanation:
the scale-1 dictionary feature guidance enhancement module comprises:
a face part feature extraction module: used for obtaining the part features of the left eye, right eye, nose and mouth from the degraded face features with the scale of 1 according to the face key points;
a dictionary feature adaptive normalization module: used for performing an adaptive normalization operation on the left-eye dictionary, right-eye dictionary, nose dictionary and mouth dictionary in the face part feature dictionary with the scale of 1, combined with the part features of the left eye, right eye, nose and mouth, to obtain normalized dictionary features;
a traversing dictionary module: used for traversing the normalized dictionary features to obtain, for each of the left-eye, right-eye, nose and mouth part features, the closest corresponding feature as the matching dictionary feature;
a confidence prediction module: used for predicting a confidence from the residual between the part features of the left eye, right eye, nose and mouth and the corresponding matching dictionary features, to obtain adaptive fusion features of the parts;
a restoration module: used for substituting the adaptive fusion features of the parts back into the degraded face features to be restored according to the face key points, to obtain the primary face features to be restored.
According to the face image restoration system based on the multi-scale face part feature dictionary, the method by which the dictionary feature adaptive normalization module acquires the normalized dictionary features comprises the following steps:
performing an adaptive normalization operation on the dictionary features of the left eye, right eye, nose and mouth:

$\hat{Dic}^{k}_{s,c} = \sigma(F_{s,c})\,\dfrac{Dic^{k}_{s,c} - \mu(Dic^{k}_{s,c})}{\sigma(Dic^{k}_{s,c})} + \mu(F_{s,c})$

wherein: $\hat{Dic}^{k}_{s,c}$ is the normalized dictionary feature; $Dic^{k}_{s,c}$ is the $k$-th clustering center of the $c$-th part feature with scale $s$ in the offline-constructed dictionary; $\sigma$ is the variance operation and $\mu$ is the mean operation; $F_{s,c}$ is the $c$-th part feature of the degraded face image to be restored with scale $s$; $c \in$ {left eye, right eye, nose, mouth}, and here $s = 1$;
the process by which the traversing dictionary module obtains the matching dictionary features comprises the following steps:
calculating, for each of the part features of the left eye, right eye, nose and mouth, the closest normalized dictionary feature:

$S^{k}_{s,c} = \left\langle \dfrac{F_{s,c}}{\|F_{s,c}\|},\ \dfrac{\hat{Dic}^{k}_{s,c}}{\|\hat{Dic}^{k}_{s,c}\|} \right\rangle, \qquad Dic^{*}_{s,c} = \hat{Dic}^{\,\hat{k}}_{s,c},\quad \hat{k} = \arg\max_{k} S^{k}_{s,c}$

in the formula, $S^{k}_{s,c}$ is the confidence of the matched dictionary feature, and $\langle\cdot,\cdot\rangle$ is the inner product operation.
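The adaptive normalization and inner-product matching steps can be sketched as follows, assuming global (per-feature) mean and standard deviation statistics; the toy feature sizes and the small epsilon guards are illustrative choices, not stated in the text.

```python
import numpy as np

def adain(dic_atom, f):
    """Normalize a dictionary atom to the mean/std statistics of the degraded part feature f."""
    return np.std(f) * (dic_atom - dic_atom.mean()) / (np.std(dic_atom) + 1e-8) + f.mean()

def match_dictionary(f, dictionary):
    """Return the normalized atom whose normalized inner product with f is largest."""
    normed = [adain(d, f) for d in dictionary]
    fu = f / (np.linalg.norm(f) + 1e-8)
    scores = [float(np.dot(fu.ravel(), (d / (np.linalg.norm(d) + 1e-8)).ravel()))
              for d in normed]
    best = int(np.argmax(scores))
    return normed[best], scores[best]

rng = np.random.default_rng(2)
f = rng.standard_normal((8, 4, 4))              # degraded left-eye feature (toy size)
dictionary = rng.standard_normal((6, 8, 4, 4))  # K1 = 6 left-eye atoms at this scale
atom, score = match_dictionary(f, dictionary)
print(atom.shape, round(score, 3))
```

Because both vectors are unit-normalized before the inner product, the matching score is a cosine similarity in [-1, 1].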
According to the face image restoration system based on the multi-scale face part feature dictionary, the process by which the confidence prediction module obtains the adaptive fusion features of the parts comprises the following steps:

$\tilde{F}_{s,c} = F_{s,c} + Dic^{*}_{s,c} \otimes \mathcal{C}\!\left(F_{s,c} - Dic^{*}_{s,c};\ \Theta_{C}\right)$

in the formula, $\tilde{F}_{s,c}$ is the adaptive fusion feature, $Dic^{*}_{s,c}$ is the matched closest corresponding feature, $\mathcal{C}$ is the confidence prediction network, and $\Theta_{C}$ are the learnable parameters of the confidence prediction network; the confidence prediction network comprises two layers of 3 × 3 convolution with a step size of 1;
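A minimal sketch of the residual-driven confidence fusion follows, assuming the two-layer 3 × 3 confidence prediction network ends in a sigmoid so the predicted confidence lies in (0, 1) — the sigmoid is an assumption not stated in the text, and the weights are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv3x3(x, out_ch, wgt):
    # 'same' 3x3 convolution with explicitly supplied weights
    in_ch, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((out_ch, h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum('oi,ihw->ohw', wgt[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + w])
    return out

def confidence_fuse(f, matched, w1, w2):
    """F_tilde = F + Dic* (x) C(F - Dic*): a residual-driven gate on the matched atom."""
    resid = f - matched
    h = np.maximum(conv3x3(resid, f.shape[0], w1), 0.0)        # first 3x3 conv + ReLU
    conf = 1.0 / (1.0 + np.exp(-conv3x3(h, f.shape[0], w2)))   # second conv -> confidence in (0, 1)
    return f + matched * conf

c = 8
f = rng.standard_normal((c, 4, 4))
matched = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c, c, 3, 3)) * 0.1
w2 = rng.standard_normal((c, c, 3, 3)) * 0.1
fused = confidence_fuse(f, matched, w1, w2)
print(fused.shape)
```

The gate lets the network trust the dictionary atom more where the residual indicates heavy degradation, and less where the degraded feature is already reliable.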
the primary reconstruction module obtains a scale change parameter alpha and a displacement change parameter beta through convolution operation of two layers of 3 x 3 with the step length of 1 for a secondary reconstruction result image and primary human face features to be restored, and obtains a primary reconstruction result image SFTs through calculation:
According to the face image restoration system based on the multi-scale face part feature dictionary, the restoration model further comprises a training constraint: the training network constrains the learning of the whole network through a reconstruction loss, which comprises the pixel-space and feature-space losses between the primary reconstruction result image and its corresponding undegraded high-definition image:

$L_{rec} = \lambda_{l2}\,\big\|\hat{I} - I^{h}\big\|_{2}^{2} + \lambda_{pm}\sum_{m}\dfrac{1}{C_{m}H_{m}W_{m}}\big\|\Psi_{m}(\hat{I}) - \Psi_{m}(I^{h})\big\|_{2}^{2}$

in the formula, $\lambda_{l2}$ represents the pixel-space loss weight, $\lambda_{pm}$ represents the feature-space loss weight, $\hat{I}$ is the primary reconstruction result image, $I^{h}$ is the corresponding undegraded high-definition image, $C_{m}$, $H_{m}$, $W_{m}$ sequentially represent the number of channels, height and width of the $m$-th layer features of the primary reconstruction result image, and $\Psi_{m}$ extracts the $m$-th layer convolution features from a pre-trained face recognition network;
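The reconstruction loss can be sketched as follows; `toy_feat_fn` is a stand-in for the pre-trained face recognition network Ψ, using two average-pooling stages as two feature "layers", and the weights are illustrative.

```python
import numpy as np

def reconstruction_loss(restored, target, feat_fn, lam_l2=1.0, lam_pm=0.01):
    """Pixel-space MSE plus feature-space MSE, each feature layer normalized by C*H*W."""
    loss = lam_l2 * np.mean((restored - target) ** 2)
    for fm_r, fm_t in zip(feat_fn(restored), feat_fn(target)):
        c, h, w = fm_r.shape
        loss += lam_pm * np.sum((fm_r - fm_t) ** 2) / (c * h * w)
    return float(loss)

def toy_feat_fn(img):
    # stand-in for the pre-trained face recognition network Psi_m:
    # two 2x average-pooling stages serve as two "layers" of features
    f1 = img.reshape(img.shape[0], 16, 2, 16, 2).mean((2, 4))
    f2 = f1.reshape(img.shape[0], 8, 2, 8, 2).mean((2, 4))
    return [f1, f2]

rng = np.random.default_rng(5)
a = rng.standard_normal((3, 32, 32))
b = rng.standard_normal((3, 32, 32))
print(round(reconstruction_loss(a, b, toy_feat_fn), 4))
```

The loss is zero only when the restored image matches the target in both pixel and feature space.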
the training network further comprises a multi-scale discrimination loss function:
the guided restoration result image is down-sampled by factors r ∈ {1, 2, 4, 8} to obtain 4 groups of images with different resolutions, and the loss is calculated through four discrimination networks using a hinge loss; for learning of the discrimination network at scale r, the loss $L_{d,r}$ is defined as:

$L_{d,r} = -\,\mathbb{E}_{I^{h}\downarrow r \sim P(I^{h}\downarrow r)}\big[\min\big(0,\ D_{r}(I^{h}\!\downarrow\! r) - 1\big)\big] - \mathbb{E}_{\hat{I}\downarrow r}\big[\min\big(0,\ -D_{r}(\hat{I}\!\downarrow\! r) - 1\big)\big]$

wherein $r \le R$, $R$ is the upper limit of the scale, $D_{r}$ is the discriminator at scale $r$, $I^{h}\!\downarrow\! r$ is the undegraded high-definition image down-sampled by a factor of $r$, $\downarrow r$ denotes down-sampling by a factor of $r$, $\mathbb{E}$ is the expectation, and $P(I^{h}\!\downarrow\! r)$ is the distribution of $I^{h}\!\downarrow\! r$;
for learning of the generative network under the discrimination network constraints, the loss $L_{g}$ is defined as:

$L_{g} = -\sum_{r=1}^{R} \lambda_{a,r}\,\mathbb{E}_{I^{d}\sim P(I^{d})}\Big[D_{r}\big(\Phi(I^{d}, L_{d}, Dic;\ \theta)\!\downarrow\! r\big)\Big]$

in the formula, $\lambda_{a,r}$ is the weight of the discrimination network at scale $r$; $L_{d}$ are the face key points, $Dic$ is the constructed face dictionary, $\theta$ are the learnable parameters of the model, $I^{d}$ is the degraded face image to be restored, $P(I^{d})$ is the distribution of $I^{d}$, and $\Phi$ is the restoration network;
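Assuming the multi-scale discrimination loss is the standard hinge formulation, the per-scale discriminator term and the scale-weighted generator term can be sketched as follows; the score arrays stand in for discriminator outputs and are purely illustrative.

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Hinge loss for one scale-r discriminator: push real scores above +1, fake below -1."""
    return float(np.mean(np.maximum(0.0, 1.0 - d_real))
                 + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_adv_loss(d_fake_per_scale, weights):
    """Generator adversarial loss summed over scales r with weights lambda_{a,r}."""
    return float(-sum(w * np.mean(s) for w, s in zip(weights, d_fake_per_scale)))

d_real = np.array([1.5, 0.2])   # discriminator outputs on real (downsampled) HD images
d_fake = np.array([-1.2, 0.5])  # discriminator outputs on restored images
print(d_hinge_loss(d_real, d_fake))

# one fake-score batch per scale r in {1, 2, 4, 8}
fakes = [np.array([0.3]), np.array([-0.1]), np.array([0.2]), np.array([0.4])]
print(g_adv_loss(fakes, [1.0, 0.5, 0.25, 0.125]))
```

Real samples already scoring above +1 (and fakes below -1) contribute nothing, which is what keeps hinge training stable compared with an unclipped loss.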
according to the face image restoration system based on the multi-scale face part feature dictionary, the training network performs end-to-end training on other network structures except the first-level face feature extraction module, the second-level face feature extraction module, the third-level face feature extraction module and the fourth-level face feature extraction module by adopting an Adam optimization algorithm.
According to the face image restoration system based on the multi-scale face part feature dictionary, the degraded face image to be restored is obtained by sequentially applying blurring, down-sampling, noise addition and JPEG compression to a high-definition face image. The blurring uses Gaussian blur and motion blur, with the standard deviation of the Gaussian blur kernel drawn from a range parameterized by P; down-sampling is performed by bicubic down-sampling with a sampling scale S ∈ {1:0.1:S}; noise addition uses Gaussian white noise with a noise level N ∈ {0, 1:0.1:N}; the JPEG compression quality parameter is Q ∈ {0, 10:0.1:Q}; where P ≥ 5, S ≥ 8, N ≥ 15 and Q ≥ 80;
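The degradation pipeline can be sketched as follows; the separable Gaussian blur is real, but strided slicing stands in for bicubic down-sampling, motion blur is omitted, and the JPEG compression step is left out, so the sketch only illustrates the order of operations, not the claimed parameter ranges.

```python
import numpy as np

rng = np.random.default_rng(6)

def gaussian_kernel1d(sigma, radius=4):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur applied per channel ('same' boundary handling)."""
    k = gaussian_kernel1d(sigma)
    rows = np.stack([np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, ch)
                     for ch in img])
    return np.stack([np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, ch)
                     for ch in rows])

def degrade(img, sigma=1.5, scale=4, noise_level=0.02):
    x = blur(img, sigma)                           # blur first
    x = x[:, ::scale, ::scale]                     # strided stand-in for bicubic down-sampling
    x = x + rng.normal(0.0, noise_level, x.shape)  # additive Gaussian white noise
    return np.clip(x, 0.0, 1.0)                    # JPEG compression step omitted in this sketch

hd = rng.random((3, 64, 64))  # stand-in high-definition face image in [0, 1]
lq = degrade(hd)
print(lq.shape)
```

Pairs of (lq, hd) produced this way would form the training data for the end-to-end optimization described above.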
the whole network is trained through the constructed low-quality degraded human face image to be restored and the corresponding high-definition human face image, and the obtained trained network is used for restoring the low-quality image.
The invention has the following beneficial effects: by constructing a high-definition face part dictionary to replace the guide image for enhancement, image restoration is no longer limited in its application range and can be applied to most face enhancement scenarios; compared with one or several guide images, the constructed face part dictionary allows high-quality features with higher similarity to be selected as guidance, greatly improving the quality of guided enhancement.
Aiming at the problem that real low-quality images cannot be effectively restored by the prior art, and the problem that guide-image-based enhancement methods require one or more high-definition images of the same identity, the invention provides a face part dictionary algorithm to assist face image enhancement.
Drawings
FIG. 1 is a flow chart of a face image restoration system based on a multi-scale face part feature dictionary according to the present invention;
FIG. 2 is a block flow diagram of a face image restoration module;
FIG. 3 is a schematic diagram of a network structure for generating a face region feature dictionary;
fig. 4 is a schematic diagram of a network structure in which a face part feature dictionary is migrated to a face image restoration module to realize image restoration.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
In a first embodiment, as shown in fig. 1 to 4, the present invention provides a face image restoration system based on a multi-scale face part feature dictionary, including:
the face part feature dictionary offline generation module 100: used for extracting high-definition face part features from each sample image in a high-definition face image data set, and obtaining a face part feature dictionary by applying k-means clustering to the extraction results;
the face image restoration module 200: used for extracting features from the degraded face image to be restored, and fusing the feature extraction results with the face part feature dictionary to obtain part-enhanced face features to be restored; the face features to be restored are then reconstructed to obtain a guided restoration result image.
In the embodiment, the sample images forming the high-definition face image data set have different postures, expressions, illumination conditions and the like, so that the diversity of the sample images is ensured.
In the embodiment, the face part feature dictionary is generated in an off-line mode, which is beneficial to greatly improving the efficiency of low-quality image restoration.
Further, the face feature dictionary obtained by the face feature dictionary offline generation module 100 includes M scales, where M is an integer greater than or equal to 1.
Still further, with reference to fig. 1 to 4, when M is 4, the processing of the sample image by using the VggFace model includes:
sequentially performing convolution, activation, pooling, convolution, activation, convolution and activation operations on each sample image to obtain high-definition face part features with the scale of 1;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 1 to obtain high-definition face part features with the scale of 2;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 2 to obtain high-definition face part features with the scale of 3;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 3 to obtain high-definition face part features with the scale of 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolution operations each use 64 convolution kernels of 3 × 3 with a stride of 1;
the third and fourth convolution operations each use 128 convolution kernels of 3 × 3 with a stride of 1;
the fifth to ninth convolution operations each use 256 convolution kernels of 3 × 3 with a stride of 1;
the tenth to sixteenth convolution operations each use 512 convolution kernels of 3 × 3 with a stride of 1;
the high-definition face part feature with the scale of 1, the high-definition face part feature with the scale of 2, the high-definition face part feature with the scale of 3 and the high-definition face part feature with the scale of 4 are respectively processed through a dictionary generation module to obtain a face part feature dictionary with the corresponding scale;
the process of processing the input data by the dictionary generation module comprises the following steps:
acquiring high-definition face part features with the scale of 1, high-definition face part features with the scale of 2, high-definition face part features with the scale of 3 or high-definition face part features with the scale of 4;
then, carrying out a region alignment operation: the positions of the left eye, right eye, nose and mouth are located on the high-definition face part features by a face key point detection algorithm, with the parameters of the VggFace model kept fixed; according to the obtained part positions, the left eye, right eye, nose and mouth are cropped from the corresponding high-definition face part features by RoIAlign, obtaining the part features of the left eye, right eye, nose and mouth as the extraction result; in this way, a large number of high-quality features of the individual parts can be obtained;
respectively obtaining K1 clustering centers of the left eye, K2 clustering centers of the right eye, K3 clustering centers of the nose and K4 clustering centers of the mouth of all the part characteristics of all the parts in the extraction result in a K-means clustering mode; wherein K1 cluster centers correspond to the left-eye dictionary, K2 cluster centers correspond to the right-eye dictionary, K3 cluster centers correspond to the nose dictionary, and K4 cluster centers correspond to the mouth dictionary; k1, K2, K3 and K4 are all greater than or equal to 1;
the face part feature dictionary with the scale of 1 is obtained corresponding to the high-definition face part feature with the scale of 1, the face part feature dictionary with the scale of 2 is obtained corresponding to the high-definition face part feature with the scale of 2, the face part feature dictionary with the scale of 3 is obtained corresponding to the high-definition face part feature with the scale of 3, and the face part feature dictionary with the scale of 4 is obtained corresponding to the high-definition face part feature with the scale of 4.
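The offline per-part clustering step above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the feature dimension, the number of samples, the part names, and K = 8 clusters per part are hypothetical choices, and a library such as scikit-learn would normally replace the hand-rolled k-means loop.

```python
# Hypothetical sketch: part features cropped from many high-definition images
# are grouped with k-means, and the cluster centres become the dictionary atoms.
import numpy as np

def build_part_dictionary(part_features, k, n_iter=20, seed=0):
    """Cluster flattened part features (N x D) into k centres (k x D)."""
    rng = np.random.default_rng(seed)
    feats = np.asarray(part_features, dtype=np.float64)
    centres = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each feature to its nearest centre (Euclidean distance).
        d = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute each centre; keep the old centre if its cluster is empty.
        for j in range(k):
            if (labels == j).any():
                centres[j] = feats[labels == j].mean(0)
    return centres

# One dictionary per part and per scale, e.g. keyed by (scale, part).
rng = np.random.default_rng(1)
dictionary = {
    (1, part): build_part_dictionary(rng.normal(size=(200, 64)), k=8)
    for part in ("left_eye", "right_eye", "nose", "mouth")
}
```

In the full system this would be repeated for each of the four scales, with separate K1..K4 values for the four parts.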
This embodiment extracts features at S scales, where S is 4; the network structure comprises convolutional layers C1 to C16 and pooling operations A1 to A4:
the convolution layer C1 performs a first convolution on a high-definition input sample image, and performs a first activation operation;
convolutional layer C2 performs a second convolution on the output of convolutional layer C1, a second activation operation;
pooling operation A1 performs a first pooling operation on the output of convolutional layer C2;
convolutional layer C3 performs a third convolution on the output of pooling operation A1, a third activation operation;
convolutional layer C4 performs a fourth convolution on the output of convolutional layer C3, a fourth activation operation;
the output of convolutional layer C4 is the high-definition face part feature with a scale of 1;
pooling layer A2 performs a fifth activation operation on the output of convolutional layer C4, a second pooling operation;
convolutional layer C5 performs a fifth convolution operation, a sixth activation operation, on the output of pooling layer A2;
convolutional layer C6 performs a sixth convolution operation, a seventh activation operation on the convolutional layer C5 output;
convolutional layer C7 performs the seventh convolution operation, the eighth activation operation on the convolutional layer C6 output;
convolutional layer C8 performs an eighth convolution operation on the output of convolutional layer C7;
the output of convolutional layer C8 is the high-definition face part feature with a scale of 2;
the pooling layer A3 performs a ninth activation operation and a third pooling operation on the output of the convolutional layer C8;
convolutional layer C9 performs the ninth convolution operation, the tenth activation operation on the output of pooling layer A3;
the convolutional layer C10 performs the tenth convolution operation, the eleventh activation operation on the output of the convolutional layer C9;
the convolutional layer C11 performs the eleventh convolution operation, the twelfth activation operation on the output of the convolutional layer C10;
convolutional layer C12 performs a twelfth convolution operation on the output of convolutional layer C11;
the output of convolutional layer C12 is the high-definition face part feature with a scale of 3;
pooling layer A4 performs a thirteenth activation operation, a fourth pooling operation on the output of convolutional layer C12;
convolutional layer C13 performs a thirteenth convolution operation, a fourteenth activation operation on the output of pooling layer A4;
convolutional layer C14 performs a fourteenth convolution operation, a fifteenth activation operation on the output of convolutional layer C13;
the convolutional layer C15 performs a fifteenth convolution operation, a sixteenth activation operation on the output of the convolutional layer C14;
convolutional layer C16 performs a sixteenth convolution operation on the output of convolutional layer C15;
the output of convolutional layer C16 is the high-definition face part feature with a scale of 4.
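The sixteen-layer, four-scale encoder above can be miniaturized into a toy sketch. The essential point it illustrates is that a 2 × 2 max-pool sits between consecutive scales, so each scale's feature map has half the spatial resolution of the previous one. The single 3 × 3 convolution per scale and the channel-free (H, W) maps are simplifications, not the VggFace structure.

```python
# Toy NumPy encoder: conv + ReLU at each scale, max-pool between scales,
# so a 64x64 input yields features at 64, 32, 16 and 8 pixels.
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution of a (H, W) map with one (3, 3) kernel."""
    h, wd = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = (p[i:i + 3, j:j + 3] * w).sum()
    return out

def maxpool2(x):
    """2x2 max-pooling with stride 2."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max((1, 3))

def encode(img, kernels):
    """Return features at scales 1..4 (pool moves to the next coarser scale)."""
    feats, x = [], img
    for k in kernels:
        x = np.maximum(conv3x3(x, k), 0.0)  # conv + ReLU
        feats.append(x)
        x = maxpool2(x)
    return feats

img = np.random.default_rng(0).normal(size=(64, 64))
kernels = [np.full((3, 3), 1 / 9.0)] * 4   # illustrative box kernels
scales = encode(img, kernels)
```

The same sketch applies unchanged to the degraded-image branch described later, since the two encoders share the layer layout.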
Still further, as shown in fig. 2, the facial image restoration module 200 includes:
a primary face feature extraction module: sequentially performing convolution, activation, pooling, convolution, activation and convolution operations on a degraded face image to be restored to obtain degraded face features with the scale of 1;
a secondary face feature extraction module: sequentially performing pooling, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 1 to obtain degraded human face features with the scale of 2;
the three-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 2 to obtain degraded human face features with the scale of 3;
four-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded human face features with the scale of 3 to obtain degraded human face features with the scale of 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolution operations each use 64 convolution kernels of 3 × 3 with a stride of 1;
the third and fourth convolution operations each use 128 convolution kernels of 3 × 3 with a stride of 1;
the fifth to ninth convolution operations each use 256 convolution kernels of 3 × 3 with a stride of 1;
the tenth to sixteenth convolution operations each use 512 convolution kernels of 3 × 3 with a stride of 1;
the dictionary feature guidance enhancement module with the scale of 1: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 1 and the face part feature dictionary with the scale of 1 to obtain first-level face features to be restored after part enhancement;
scale 2 dictionary feature guidance enhancement module: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 2 and the face part feature dictionary with the scale of 2 to obtain second-level face features to be restored after part enhancement;
scale 3 dictionary feature guidance enhancement module: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 3 and the face part feature dictionary with the scale of 3 to obtain three-level face features to be restored after part enhancement;
the dictionary feature guidance enhancement module with the scale of 4: the face feature dictionary fusion method is used for fusing the degraded face features with the scale of 4 and the face part feature dictionary with the scale of 4 to obtain four-level face features to be restored after part enhancement;
a fourth-level reconstruction module: used for performing affine transformation on the scale-4 degraded face features and the fourth-level face features to be restored, and feeding the transformation result into the network decoder features to obtain fourth-level reconstruction result features;
a third-level reconstruction module: used for performing affine transformation on the fourth-level reconstruction result features and the third-level face features to be restored, and feeding the transformation result into the network decoder features to obtain third-level reconstruction result features;
a secondary reconstruction module: used for performing affine transformation on the third-level reconstruction result features and the second-level face features to be restored, and feeding the transformation result into the network decoder features to obtain second-level reconstruction result features;
a primary reconstruction module: used for performing affine transformation on the second-level reconstruction result features and the first-level face features to be restored, and feeding the transformation result into the network decoder features to obtain a first-level reconstruction result image;
an output module: and the primary reconstruction result image is output as a guide restoration result image.
In this embodiment, the feature extraction process of the four face feature extraction modules specifically includes:
the first-level human face feature extraction module comprises the following steps:
the convolution layer C1 performs the first convolution on the degraded human face image to be restored, and performs the first activation operation;
convolutional layer C2 performs a second convolution on the output of convolutional layer C1, a second activation operation;
pooling operation A1 performs a first pooling operation on the output of convolutional layer C2;
convolutional layer C3 performs a third convolution on the output of pooling operation A1, a third activation operation;
convolutional layer C4 performs a fourth convolution on the output of convolutional layer C3, a fourth activation operation;
the output of convolutional layer C4 is the degraded face feature with a scale of 1;
the secondary face feature extraction module comprises the following steps:
pooling layer A2 performs a fifth activation operation on the output of convolutional layer C4, a second pooling operation;
convolutional layer C5 performs a fifth convolution operation, a sixth activation operation, on the output of pooling layer A2;
convolutional layer C6 performs a sixth convolution operation, a seventh activation operation on the convolutional layer C5 output;
convolutional layer C7 performs the seventh convolution operation, the eighth activation operation on the convolutional layer C6 output;
convolutional layer C8 performs an eighth convolution operation on the output of convolutional layer C7;
the output of convolutional layer C8 is the degraded face feature with a scale of 2;
the pooling layer A3 performs a ninth activation operation and a third pooling operation on the output of the convolutional layer C8;
convolutional layer C9 performs the ninth convolution operation, the tenth activation operation on the output of pooling layer A3;
the convolutional layer C10 performs the tenth convolution operation, the eleventh activation operation on the output of the convolutional layer C9;
the convolutional layer C11 performs the eleventh convolution operation, the twelfth activation operation on the output of the convolutional layer C10;
convolutional layer C12 performs a twelfth convolution operation on the output of convolutional layer C11;
the output of convolutional layer C12 is the degraded face feature with a scale of 3;
pooling layer A4 performs a thirteenth activation operation, a fourth pooling operation on the output of convolutional layer C12;
convolutional layer C13 performs a thirteenth convolution operation, a fourteenth activation operation on the output of pooling layer A4;
convolutional layer C14 performs a fourteenth convolution operation, a fifteenth activation operation on the output of convolutional layer C13;
the convolutional layer C15 performs a fifteenth convolution operation, a sixteenth activation operation on the output of the convolutional layer C14;
convolutional layer C16 performs a sixteenth convolution operation on the output of convolutional layer C15;
the output of convolutional layer C16 is the degraded face feature with a scale of 4.
Still further, with reference to fig. 2, the processing procedure of the dictionary feature guidance enhancement module with scale 1, the processing procedure of the dictionary feature guidance enhancement module with scale 2, the processing procedure of the dictionary feature guidance enhancement module with scale 3, and the processing procedure of the dictionary feature guidance enhancement module with scale 4 are the same; taking the dictionary feature guidance enhancement module with the scale of 1 as an example for explanation:
the scale-1 dictionary feature guidance enhancement module comprises:
face part feature extraction module: used for obtaining the part features of the left eye, right eye, nose and mouth from the scale-1 degraded face features according to the face key points;
the dictionary feature self-adaptive normalization module: performing self-adaptive normalization operation on the part characteristics of the left eye, the right eye, the nose and the mouth in the face part characteristic dictionary with the scale of 1 by combining the left eye dictionary, the right eye dictionary, the nose dictionary and the mouth dictionary to obtain normalized dictionary characteristics;
and traversing the dictionary module: traversing in the normalized dictionary features to obtain corresponding features closest to the features of the left eye, the right eye, the nose and the mouth as matching dictionary features;
a confidence prediction module: the system is used for predicting confidence according to residual errors between the part features of the left eye, the right eye, the nose and the mouth and the corresponding matched dictionary features to obtain self-adaptive fusion features of all parts;
a restoration module: and the method is used for replacing the self-adaptive fusion features of all parts into the degraded human face image to be restored according to the human face key points to obtain the first-level human face features to be restored. Specifically, the enhanced features of each part are replaced with corresponding features in the features of the input image according to the positions detected by the key points of the human face, so as to obtain the enhanced next-stage human face features to be restored.
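The crop-and-replace mechanics of the restoration module can be sketched as below; the box coordinates stand in for the face-key-point detections, and the enhancement of the cropped feature is outside this snippet. All names and sizes are illustrative, not the patent's configuration.

```python
# Hedged sketch of the "replace back" step: a part feature is cropped at the
# key-point location, enhanced elsewhere, then pasted over the same region of
# the input feature map.
import numpy as np

def crop_part(feature_map, box):
    """Cut feature_map[y0:y1, x0:x1] out as the part feature."""
    y0, x0, y1, x1 = box
    return feature_map[y0:y1, x0:x1].copy()

def paste_part(feature_map, part_feature, box):
    """Overwrite the same region with the (enhanced) part feature."""
    y0, x0, y1, x1 = box
    out = feature_map.copy()
    out[y0:y1, x0:x1] = part_feature
    return out

fm = np.zeros((8, 8))
box = (1, 2, 3, 5)                    # (y0, x0, y1, x1) from key points
part = crop_part(fm, box) + 1.0       # stand-in for dictionary enhancement
restored = paste_part(fm, part, box)
```

In the real module this happens once per part (left eye, right eye, nose, mouth) and per scale, with RoIAlign in place of the integer crop.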
In this embodiment, the four dictionary feature guidance enhancement modules add the enhanced features to the network decoding features through feature affine transformation, and the four enhancement modules respectively operate in feature spaces of different scales, so that a coarse-to-fine guidance restoration result can be obtained.
Still further, the method for obtaining the normalized dictionary features by the dictionary feature adaptive normalization module comprises the following steps:
performing an adaptive normalization operation on the dictionary features of the left eye, right eye, nose and mouth:

$$\widehat{Dic}_{s,c}^{\,k} \;=\; \sigma\!\left(F_{d,s}^{c}\right)\cdot\frac{Dic_{s,c}^{k}-\mu\!\left(Dic_{s,c}^{k}\right)}{\sigma\!\left(Dic_{s,c}^{k}\right)} \;+\; \mu\!\left(F_{d,s}^{c}\right)$$

wherein $\widehat{Dic}_{s,c}^{\,k}$ is the normalized dictionary feature; $Dic_{s,c}^{k}$ is the $k$-th clustering center, constructed offline, of the $c$-th part feature at scale $s$; $F_{d,s}^{c}$ is the $c$-th part feature of the degraded face image to be restored at scale $s$; $\sigma$ is the standard-deviation operation and $\mu$ is the mean operation; $c \in \{\text{left eye}, \text{right eye}, \text{nose}, \text{mouth}\}$ and $s = 1$;
based on the above operation, an adaptive normalization operation is performed on all dictionary features of each part c.
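A minimal sketch of this adaptive normalization, assuming an AdaIN-style reading of the operation (each dictionary atom is re-styled to the mean and standard deviation of the degraded input's part feature):

```python
# Re-style a dictionary atom to the statistics of the input part feature:
# sigma(F) * (Dic - mu(Dic)) / sigma(Dic) + mu(F). Epsilon is an assumption
# added for numerical safety.
import numpy as np

def adaptive_normalize(dic_atom, input_part, eps=1e-8):
    mu_d, sd_d = dic_atom.mean(), dic_atom.std()
    mu_f, sd_f = input_part.mean(), input_part.std()
    return sd_f * (dic_atom - mu_d) / (sd_d + eps) + mu_f

rng = np.random.default_rng(0)
atom = rng.normal(5.0, 3.0, size=(4, 4))       # illustrative dictionary atom
part = rng.normal(0.0, 1.0, size=(4, 4))       # illustrative degraded feature
styled = adaptive_normalize(atom, part)
```

After this transform the atom shares the input's first- and second-order statistics, which is what lets atoms built from high-definition images be compared against degraded features.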
The process of traversing the dictionary module to obtain the matching dictionary features comprises the following steps:
calculating, for each of the left-eye, right-eye, nose and mouth part features, the closest dictionary feature:

$$k^{*} \;=\; \arg\max_{k}\;\left\langle F_{d,s}^{c},\, \widehat{Dic}_{s,c}^{\,k} \right\rangle$$

wherein the inner product $\langle\cdot,\cdot\rangle$ gives the confidence of the matching dictionary feature; the inner product operation can be implemented by a convolution operation with the bias fixed to 0.
And outputting the similarity between the part of the current input image and each clustering center of the dictionary, and selecting the feature with the highest score, namely the most similar feature as the matching feature.
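The dictionary traversal can be sketched as a plain inner-product search; the zero-bias convolution mentioned above reduces to exactly this dot product when the atom and the part feature have the same spatial size. Shapes are illustrative.

```python
# Score every normalized atom against the input part feature by inner product
# and return the index of the highest-scoring (most similar) atom.
import numpy as np

def match_atom(input_part, normalized_atoms):
    f = np.asarray(input_part, dtype=float).ravel()
    scores = np.array([np.dot(f, np.asarray(a, dtype=float).ravel())
                       for a in normalized_atoms])
    return int(scores.argmax()), scores

# Toy data: atom 1 is identical to the input, atom 2 is its negation.
part = np.ones(4)
atoms = [np.zeros(4), np.ones(4), -np.ones(4)]
best, scores = match_atom(part, atoms)
```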
Still further, the process of obtaining the adaptive fusion characteristics of each part by the confidence prediction module includes:
$$\hat{F}_{d,s}^{c} \;=\; F_{d,s}^{c} \;+\; \widehat{Dic}_{s,c}^{\,k^{*}} \cdot \mathcal{C}\!\left(F_{d,s}^{c} - \widehat{Dic}_{s,c}^{\,k^{*}};\, \Theta_{C}\right)$$

wherein $\hat{F}_{d,s}^{c}$ is the adaptive fusion feature; $\widehat{Dic}_{s,c}^{\,k^{*}}$ is the closest matched dictionary feature; $\mathcal{C}(\cdot;\Theta_{C})$ is the confidence prediction network and $\Theta_{C}$ its learnable parameters; the confidence prediction network comprises two layers of 3 × 3 convolutions with a stride of 1;
the primary reconstruction module obtains a scale parameter $\alpha$ and a shift parameter $\beta$ from the second-level reconstruction result features and the first-level face features to be restored, each through two layers of 3 × 3 convolutions with a stride of 1, and computes the first-level reconstruction result image through a spatial feature transform:

$$SFT_{s} \;=\; \alpha \odot F_{s} \;+\; \beta$$

wherein $F_{s}$ denotes the decoder features at scale $s$ and $\odot$ denotes element-wise multiplication.
In the embodiment, for the matched dictionary features, a confidence coefficient is predicted according to a residual error between the dictionary features and the input part features, and the confidence coefficient is applied to the dictionary features and is added back to the input part features.
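A hedged sketch of the two operations just described: residual-driven confidence fusion, and SFT-style affine modulation. The learned confidence network is replaced by a caller-supplied toy function of the residual, and alpha/beta are passed in directly rather than predicted by convolutions; both substitutions are illustrative simplifications.

```python
# Residual-based confidence fusion plus spatial feature transform, with toy
# stand-ins for the learned sub-networks.
import numpy as np

def fuse_with_confidence(input_part, matched_atom, conf_fn):
    """F_out = F + conf(F - Dic*) * Dic*, with conf_fn replacing the net."""
    conf = conf_fn(input_part - matched_atom)
    return input_part + conf * matched_atom

def sft(decoder_feat, alpha, beta):
    """Spatial feature transform: element-wise scale and shift."""
    return alpha * decoder_feat + beta

f = np.arange(6.0).reshape(2, 3)
atom = np.ones((2, 3))
fused_zero = fuse_with_confidence(f, atom, lambda r: 0.0)   # confidence 0
fused_half = fuse_with_confidence(f, atom, lambda r: 0.5)   # confidence 0.5
modulated = sft(f, 2.0, 1.0)
```

With zero confidence the input passes through untouched, which matches the intent that a mildly degraded input should lean less on the dictionary.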
Still further, the restoration model further comprises a training-time constraint: the training network constrains the learning of the whole network through a reconstruction loss, which measures an $\ell_{2}$ loss between the first-level reconstruction result image and its corresponding non-degraded high-definition image in both the pixel space and the feature space:

$$\mathcal{L}_{rec} \;=\; \lambda_{l2}\left\|\hat{I} - I^{h}\right\|_{2}^{2} \;+\; \sum_{m}\frac{\lambda_{pm}}{C_{m}H_{m}W_{m}}\left\|\Psi_{m}\!\left(\hat{I}\right) - \Psi_{m}\!\left(I^{h}\right)\right\|_{2}^{2}$$

wherein $\lambda_{l2}$ is the pixel-space loss weight; $\lambda_{pm}$ is the feature-space loss weight; $\hat{I}$ is the first-level reconstruction result image; $I^{h}$ is the corresponding non-degraded high-definition image; $C_{m}$, $H_{m}$ and $W_{m}$ are, in order, the number of channels, the height and the width of the $m$-th layer features of the first-level reconstruction result image; and $\Psi_{m}$ extracts the $m$-th layer convolution features from a pre-trained face recognition network;
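The reconstruction loss can be sketched as below; the pre-trained face recognition network is replaced by caller-supplied feature extractors, and the loss weights are illustrative defaults rather than the patent's values.

```python
# Pixel-space L2 term plus a per-layer feature-space term, each feature term
# normalized by the feature size (standing in for C*H*W).
import numpy as np

def reconstruction_loss(pred, target, feat_layers, lam_l2=1.0, lam_pm=1.0):
    loss = lam_l2 * ((pred - target) ** 2).sum()
    for psi in feat_layers:                      # stand-ins for Psi_m
        fp, ft = psi(pred), psi(target)
        loss += lam_pm * ((fp - ft) ** 2).sum() / fp.size
    return float(loss)

target = np.ones((4, 4))
layers = [lambda x: x[::2, ::2]]                 # toy "feature extractor"
same = reconstruction_loss(target, target, layers)
diff = reconstruction_loss(target + 1.0, target, layers)
```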
the training network further comprises a multi-scale discriminant loss function:
the guided restoration result image is down-sampled by factors $r \in \{1, 2, 4, 8\}$ to obtain 4 groups of images with different resolutions, and the loss is computed in a hinge-loss manner through four discriminator networks. For the learning of the discriminator networks, the loss is defined as:

$$\mathcal{L}_{adv,D} \;=\; -\sum_{r=1}^{R}\left(\mathbb{E}_{I^{h}\downarrow r \,\sim\, P\left(I^{h}\downarrow r\right)}\!\left[\min\!\left(0,\,-1 + D_{r}\!\left(I^{h}{\downarrow}r\right)\right)\right] + \mathbb{E}\!\left[\min\!\left(0,\,-1 - D_{r}\!\left(\hat{I}{\downarrow}r\right)\right)\right]\right)$$

wherein $R$ is the upper limit of the scale; $D_{r}$ is the discriminator at scale $r$; $I^{h}{\downarrow}r$ is the non-degraded high-definition image down-sampled by a factor of $r$; ${\downarrow}r$ denotes down-sampling by a factor of $r$; $\mathbb{E}$ denotes expectation; and $P\!\left(I^{h}{\downarrow}r\right)$ is the distribution of $I^{h}{\downarrow}r$;
for the learning of the generative network under the discriminator-network constraints, the loss $\mathcal{L}_{adv,G}$ is defined as:

$$\mathcal{L}_{adv,G} \;=\; -\sum_{r=1}^{R}\lambda_{a,r}\;\mathbb{E}_{I^{d}\sim P\left(I^{d}\right)}\!\left[D_{r}\!\left(\Phi\!\left(I^{d}, L_{d}, Dic;\, \Theta\right){\downarrow}r\right)\right]$$

wherein $\lambda_{a,r}$ is the weight of the discriminator network at scale $r$; $L_{d}$ denotes the face key points; $Dic$ is the constructed face dictionary; $\Theta$ is the learnable model parameters; $I^{d}$ is the degraded face image to be restored; $P(I^{d})$ is the distribution of $I^{d}$; and $\Phi$ is the restoration module;
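A sketch of the multi-scale adversarial terms, read here as hinge losses (an assumption about the loss form); discriminator outputs are represented as plain arrays of per-sample scores, one array per scale, instead of real networks.

```python
# Multi-scale hinge losses over toy discriminator scores.
import numpy as np

def d_hinge_loss(real_scores, fake_scores):
    """Discriminator hinge loss, summed over scales."""
    loss = 0.0
    for dr, df in zip(real_scores, fake_scores):
        loss += np.maximum(0.0, 1.0 - dr).mean() + np.maximum(0.0, 1.0 + df).mean()
    return float(loss)

def g_hinge_loss(fake_scores, weights):
    """Generator loss: -sum_r lambda_r * mean(D_r(fake))."""
    return float(-sum(w * df.mean() for w, df in zip(weights, fake_scores)))

# One scale, scores already past the hinge margins.
real = [np.array([2.0, 3.0])]
fake = [np.array([-2.0])]
d_loss = d_hinge_loss(real, fake)
g_loss = g_hinge_loss(fake, [1.0])
```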
Still further, the training network performs end-to-end training, using the Adam optimization algorithm, on all network structures other than the first-level, second-level, third-level and fourth-level face feature extraction modules.
Further, the degraded face image to be restored is obtained by sequentially blurring, down-sampling, adding noise to, and JPEG-compressing a high-definition face image. The blurring process uses Gaussian blur and motion blur, with the Gaussian blur kernel standard deviation taken from $\{1{:}0.1{:}P\}$; down-sampling uses bicubic down-sampling with sampling scale $s \in \{1{:}0.1{:}S\}$; noise is added as Gaussian white noise with noise level $n \in \{0, 1{:}0.1{:}N\}$; the JPEG compression quality parameter is $q \in \{0, 10{:}0.1{:}Q\}$; where $P \geq 5$, $S \geq 8$, $N \geq 15$ and $Q \geq 80$;
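The degradation pipeline can be sketched as below. To keep the snippet dependency-free, JPEG compression and motion blur are omitted, naive strided down-sampling stands in for bicubic down-sampling, and all parameter values are illustrative, not the patent's sampling grids.

```python
# Toy synthesis of a degraded training image: Gaussian blur -> down-sample ->
# additive Gaussian white noise, clipped to the 8-bit range.
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(img, sigma=1.5, scale=2, noise_level=5.0, seed=0):
    k = gaussian_kernel(5, sigma)
    p = np.pad(img, 2, mode="edge")
    blurred = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            blurred[i, j] = (p[i:i + 5, j:j + 5] * k).sum()
    small = blurred[::scale, ::scale]            # naive down-sampling
    rng = np.random.default_rng(seed)
    noisy = small + rng.normal(0.0, noise_level, small.shape)
    return np.clip(noisy, 0.0, 255.0)

hd = np.full((16, 16), 128.0)                    # illustrative flat image
lq = degrade(hd)
```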
the whole network is trained through the constructed low-quality degraded human face image to be restored and the corresponding high-definition human face image, and the obtained trained network is used for restoring the low-quality image.
On the basis of existing convolutional-neural-network-based face image restoration systems, the restoration system of the invention provides the construction of a multi-scale face part feature dictionary and the transfer of the feature dictionary to the degraded image. First, each face part is extracted from a large number of high-definition face images, and the features of each part at different scales are obtained by k-means clustering. For each part of a degraded image there are thus K high-definition part dictionaries that can be used for guided enhancement. For each scale, a part-adaptive normalization operation is first applied to normalize the dictionary features, which handles the inconsistent distributions of the degraded image and the dictionary features. The whole normalized dictionary is then traversed to obtain the part feature most similar to the input feature as guidance. To handle the varying degree to which different degraded inputs need the dictionary, a confidence is predicted from the residual between the matched feature and the input feature, so that the dictionary feature is applied in a targeted manner. Finally, the multi-scale dictionary fusion features help the network learn detail features from coarse to fine.
The method is suitable for any face repairing scene.
The face feature extraction module adopted by the invention comprises but is not limited to the use of a Vggface model.
The invention assists image enhancement in the convolutional neural network by constructing a high-quality dictionary, including obtaining a plurality of high-definition part features by k-means clustering.
The present invention is implemented by convolution operation and multilayer convolution operation in a neural network structure, and is not limited to a neural network.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.
Claims (10)
1. A face image restoration system based on a multi-scale face part feature dictionary is characterized by comprising:
face feature dictionary offline generation module (100): used for respectively extracting high-definition face part features from each sample image in the high-definition face image data set, and obtaining a face part feature dictionary by applying k-means clustering to the extraction results;
face image restoration module (200): used for extracting features of the degraded face image to be restored, and fusing the feature extraction result with the face part feature dictionary to obtain part-enhanced face features to be restored; and reconstructing the face features to be restored to obtain a guided restoration result image.
2. The system for restoring a human face image based on a multi-scale human face feature dictionary according to claim 1, wherein the human face feature dictionary obtained by the human face feature dictionary off-line generation module (100) comprises M scales, and M is an integer greater than or equal to 1.
3. The system for restoring a human face image based on a multi-scale human face feature dictionary according to claim 2, wherein when M is 4, the process of processing the sample image by using the VggFace model comprises:
sequentially performing convolution, activation, pooling, convolution, activation, convolution and activation operations on each sample image to obtain high-definition face part features with the scale of 1;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 1 to obtain high-definition face part features with the scale of 2;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 2 to obtain high-definition face part features with the scale of 3;
sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the high-definition face features with the scale of 3 to obtain high-definition face part features with the scale of 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolution operations each use 64 convolution kernels of 3 × 3 with a stride of 1;
the third and fourth convolution operations each use 128 convolution kernels of 3 × 3 with a stride of 1;
the fifth to ninth convolution operations each use 256 convolution kernels of 3 × 3 with a stride of 1;
the tenth to sixteenth convolution operations each use 512 convolution kernels of 3 × 3 with a stride of 1;
the high-definition face part feature with the scale of 1, the high-definition face part feature with the scale of 2, the high-definition face part feature with the scale of 3 and the high-definition face part feature with the scale of 4 are respectively processed through a dictionary generation module to obtain a face part feature dictionary with the corresponding scale;
the process of processing the input data by the dictionary generation module comprises the following steps:
acquiring high-definition face part features with the scale of 1, high-definition face part features with the scale of 2, high-definition face part features with the scale of 3 or high-definition face part features with the scale of 4;
then, carrying out region alignment operation: acquiring positions of a left eye, a right eye, a nose and a mouth of the high-definition face position characteristics by adopting a face key point detection algorithm; cutting the left eye, the right eye, the nose and the mouth from the corresponding high-definition human face part features in a RoIAlign mode according to the obtained positions of all parts to obtain the part features of the left eye, the right eye, the nose and the mouth as extraction results;
respectively obtaining K1 clustering centers of the left eye, K2 clustering centers of the right eye, K3 clustering centers of the nose and K4 clustering centers of the mouth of all the part characteristics of all the parts in the extraction result in a K-means clustering mode; wherein K1 cluster centers correspond to the left-eye dictionary, K2 cluster centers correspond to the right-eye dictionary, K3 cluster centers correspond to the nose dictionary, and K4 cluster centers correspond to the mouth dictionary; k1, K2, K3 and K4 are all greater than or equal to 1;
the face part feature dictionary with the scale of 1 is obtained corresponding to the high-definition face part feature with the scale of 1, the face part feature dictionary with the scale of 2 is obtained corresponding to the high-definition face part feature with the scale of 2, the face part feature dictionary with the scale of 3 is obtained corresponding to the high-definition face part feature with the scale of 3, and the face part feature dictionary with the scale of 4 is obtained corresponding to the high-definition face part feature with the scale of 4.
4. The face image restoration system based on the multi-scale face part feature dictionary according to claim 3, wherein the face image restoration module (200) comprises:
a primary face feature extraction module: sequentially performing convolution, activation, pooling, convolution, activation and convolution operations on the degraded face image to be restored to obtain degraded face features at scale 1;
a secondary face feature extraction module: sequentially performing pooling, activation, convolution, activation and convolution operations on the degraded face features at scale 1 to obtain degraded face features at scale 2;
a three-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded face features at scale 2 to obtain degraded face features at scale 3;
a four-level face feature extraction module: sequentially performing activation, pooling, convolution, activation, convolution, activation and convolution operations on the degraded face features at scale 3 to obtain degraded face features at scale 4;
the activation operation adopts a ReLU activation function, and the pooling operation adopts maximum pooling operation;
the first and second convolutions each use 64 kernels of size 3 × 3 with stride 1;
the third and fourth convolutions each use 128 kernels of size 3 × 3 with stride 1;
the fifth through ninth convolutions each use 256 kernels of size 3 × 3 with stride 1;
the tenth through sixteenth convolutions each use 512 kernels of size 3 × 3 with stride 1;
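The four-level encoder above is VGG-style: stride-1 3 × 3 convolutions with 64/128/256/512 kernels per level plus one pooling step per level. A quick shape trace makes the multi-scale feature pyramid concrete; the 2 × 2/stride-2 max-pool and the 256 × 256 input size are assumptions, since the claim does not state them.

```python
def encoder_shapes(h, w, channels=(64, 128, 256, 512)):
    """Trace (level, channels, height, width) through the four-level
    encoder: one assumed 2x2 max-pool per level halves the resolution,
    while stride-1 3x3 convolutions set the channel count."""
    shapes = []
    for level, c in enumerate(channels, start=1):
        h, w = h // 2, w // 2          # one pooling step per level
        shapes.append((level, c, h, w))
    return shapes

pyramid = encoder_shapes(256, 256)
# scale 1 -> 64 channels at 128x128, ..., scale 4 -> 512 channels at 16x16
```

This is why the dictionaries must be built at four scales: each guidance module fuses with features at a different resolution and channel width.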
a scale-1 dictionary feature guidance enhancement module: configured to fuse the degraded face features at scale 1 with the face part feature dictionary at scale 1 to obtain part-enhanced first-level face features to be restored;
a scale-2 dictionary feature guidance enhancement module: configured to fuse the degraded face features at scale 2 with the face part feature dictionary at scale 2 to obtain part-enhanced second-level face features to be restored;
a scale-3 dictionary feature guidance enhancement module: configured to fuse the degraded face features at scale 3 with the face part feature dictionary at scale 3 to obtain part-enhanced third-level face features to be restored;
a scale-4 dictionary feature guidance enhancement module: configured to fuse the degraded face features at scale 4 with the face part feature dictionary at scale 4 to obtain part-enhanced fourth-level face features to be restored;
a fourth-level reconstruction module: configured to perform affine transformation on the degraded face features at scale 4 and the fourth-level face features to be restored, and to feed the transformation result into the network decoding features to obtain fourth-level reconstruction result features;
a third-level reconstruction module: configured to perform affine transformation on the fourth-level reconstruction result features and the third-level face features to be restored, and to feed the transformation result into the network decoding features to obtain third-level reconstruction result features;
a secondary reconstruction module: configured to perform affine transformation on the third-level reconstruction result features and the second-level face features to be restored, and to feed the transformation result into the network decoding features to obtain second-level reconstruction result features;
a primary reconstruction module: configured to perform affine transformation on the second-level reconstruction result features and the first-level face features to be restored, and to feed the transformation result into the network decoding features to obtain the first-level reconstruction result image;
an output module: configured to output the first-level reconstruction result image as the guided restoration result image.
5. The face image restoration system based on the multi-scale face part feature dictionary according to claim 4, wherein the dictionary feature guidance enhancement modules at scales 1, 2, 3 and 4 process their input data identically; the scale-1 dictionary feature guidance enhancement module is described as an example:
the scale-1 dictionary feature guidance enhancement module comprises:
a face part feature extraction module: configured to obtain the part features of the left eye, right eye, nose and mouth from the degraded face features at scale 1 according to the face key points;
a dictionary feature adaptive normalization module: configured to perform an adaptive normalization operation on the left-eye, right-eye, nose and mouth entries of the face part feature dictionary at scale 1, using the left-eye, right-eye, nose and mouth dictionaries, to obtain normalized dictionary features;
a dictionary traversal module: configured to traverse the normalized dictionary features and select, for each of the left eye, right eye, nose and mouth, the feature closest to the corresponding part feature as the matched dictionary feature;
a confidence prediction module: configured to predict a confidence from the residual between each part feature and its matched dictionary feature, and thereby obtain an adaptive fusion feature for each part;
a restoration module: configured to place the adaptive fusion features of each part back into the degraded face features according to the face key points, to obtain the first-level face features to be restored.
6. The face image restoration system based on the multi-scale face part feature dictionary according to claim 5, wherein the dictionary feature adaptive normalization module obtains the normalized dictionary features as follows:
performing the adaptive normalization operation on the part features of the left eye, right eye, nose and mouth:

$$\hat{D}_{s,c}^{k} = \sigma\left(F_{s,c}\right)\,\frac{D_{s,c}^{k}-\mu\left(D_{s,c}^{k}\right)}{\sigma\left(D_{s,c}^{k}\right)} + \mu\left(F_{s,c}\right)$$

wherein $\hat{D}_{s,c}^{k}$ is the normalized dictionary feature; $D_{s,c}^{k}$ is the $k$-th cluster centre of the $c$-th part feature at scale $s$ in the offline-constructed dictionary; $F_{s,c}$ is the $c$-th part feature of the degraded face image to be restored at scale $s$; $\sigma(\cdot)$ is the standard-deviation operation and $\mu(\cdot)$ the mean operation; $c \in \{\text{left eye}, \text{right eye}, \text{nose}, \text{mouth}\}$ and here $s = 1$;
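One reading of this adaptive normalization is AdaIN-style: each dictionary atom is re-normalised to carry the mean and standard deviation of the degraded part feature, so the subsequent matching compares structure rather than illumination or contrast. A numpy sketch under that assumption (the `eps` stabiliser is an added assumption, not in the claim):

```python
import numpy as np

def adaptive_norm(dic_atom, degraded_feat, eps=1e-5):
    """Re-normalise one dictionary atom D^k to the statistics of the
    degraded part feature F: sigma(F) * (D - mu(D)) / sigma(D) + mu(F)."""
    normalised = (dic_atom - dic_atom.mean()) / (dic_atom.std() + eps)
    return degraded_feat.std() * normalised + degraded_feat.mean()

rng = np.random.default_rng(0)
atom = rng.normal(5.0, 3.0, size=(8, 4, 4))   # one cluster centre D^k_{s,c}
feat = rng.normal(0.0, 1.0, size=(8, 4, 4))   # degraded part feature F_{s,c}
norm_atom = adaptive_norm(atom, feat)          # now shares feat's statistics
```

After this step every atom in the dictionary has (approximately) the same first- and second-order statistics as the degraded feature it will be compared against.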
the dictionary traversal module obtains the matched dictionary features as follows:
calculating, for each of the left eye, right eye, nose and mouth, the normalized dictionary feature closest to the part feature:

$$k^{*} = \arg\min_{k}\left\|F_{s,c}-\hat{D}_{s,c}^{k}\right\|_{2}, \qquad \hat{D}_{s,c}^{*} = \hat{D}_{s,c}^{k^{*}}$$
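The traversal step amounts to a nearest-neighbour search over the K normalised atoms. A minimal sketch, assuming L2 distance is the "closeness" measure the claim intends:

```python
import numpy as np

def match_dictionary(feat, norm_dict):
    """Return the index and atom of the normalised dictionary entry with
    the smallest L2 distance to the degraded part feature."""
    diff = (norm_dict - feat[None]).reshape(len(norm_dict), -1)
    dists = (diff ** 2).sum(axis=1)
    k = int(dists.argmin())
    return k, norm_dict[k]

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(16, 8, 4, 4))               # K = 16 atoms
feat = dictionary[7] + 0.01 * rng.normal(size=(8, 4, 4))  # near atom 7
k_star, matched_atom = match_dictionary(feat, dictionary)
```

With only K atoms per part and per scale, a brute-force scan like this is cheap; no approximate search structure is needed.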
7. The face image restoration system based on the multi-scale face part feature dictionary according to claim 6, wherein
the confidence prediction module obtains the adaptive fusion feature of each part as follows:

$$\tilde{F}_{s,c} = F_{s,c} + \hat{D}_{s,c}^{*}\odot C\!\left(F_{s,c}-\hat{D}_{s,c}^{*};\,\Theta_{C}\right)$$

wherein $\tilde{F}_{s,c}$ is the adaptive fusion feature; $\hat{D}_{s,c}^{*}$ is the closest matched dictionary feature; $C(\cdot;\Theta_{C})$ is the confidence prediction network with learnable parameters $\Theta_{C}$; the confidence prediction network comprises two 3 × 3 convolution layers with stride 1;
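The idea of the confidence-weighted fusion can be sketched without the learned network: where the degraded feature already agrees with the matched atom the residual is small, so little dictionary detail is injected. This is one plausible reading; the fixed sigmoid of the residual below is a stand-in for the claim's two-layer 3 × 3 confidence network, and the residual-interpolation form is an assumption, not the patent's exact formula.

```python
import numpy as np

def fuse_with_confidence(feat, matched_atom):
    """Inject matched-dictionary detail into the degraded feature,
    weighted per element by a confidence derived from the residual.
    The sigmoid here is a fixed stand-in for the learned network C."""
    residual = matched_atom - feat
    confidence = 1.0 / (1.0 + np.exp(-np.abs(residual)))  # in (0.5, 1)
    return feat + residual * confidence

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))          # degraded part feature
atom = rng.normal(size=(8, 4, 4))          # matched, normalised atom
fused = fuse_with_confidence(feat, atom)   # lies between feat and atom
```

Because the weight is element-wise, trustworthy regions of the degraded feature are preserved while badly degraded regions are pulled toward the high-quality dictionary atom.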
the primary reconstruction module obtains a scale parameter $\alpha$ and a shift parameter $\beta$ from the secondary reconstruction result image and the first-level face features to be restored through two 3 × 3 convolution layers with stride 1, and computes the primary reconstruction result image by a spatial feature transform (SFT):

$$\mathrm{SFT}(F) = \alpha \odot F + \beta$$
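The SFT step itself is just an element-wise scale-and-shift of the decoder feature; in the claim, $\alpha$ and $\beta$ come from the two learned convolutions, whereas this sketch passes them in directly as given arrays.

```python
import numpy as np

def sft(feat, alpha, beta):
    """Spatial feature transform: element-wise modulation alpha * F + beta.
    In the patent, alpha/beta are produced by two 3x3 stride-1 convs."""
    return alpha * feat + beta

feat = np.ones((8, 4, 4))                # a decoder feature map F
alpha = np.full((8, 4, 4), 2.0)          # illustrative scale map
beta = np.full((8, 4, 4), -1.0)          # illustrative shift map
out = sft(feat, alpha, beta)             # 2 * 1 - 1 = 1 everywhere
```

The same transform is applied at every reconstruction level, only with different learned $\alpha$, $\beta$ and feature resolutions.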
8. The face image restoration system based on the multi-scale face part feature dictionary according to claim 7, wherein the restoration model further comprises a training-network constraint: the training network constrains the learning of the whole network through a reconstruction loss, which comprises pixel-space and feature-space losses between the primary reconstruction result image and its corresponding non-degraded high-definition image:

$$\ell_{rec} = \lambda_{l2}\left\|\hat{I}-I^{h}\right\|_{2}^{2} + \lambda_{pm}\sum_{m}\frac{1}{C_{m}H_{m}W_{m}}\left\|\Psi_{m}\!\left(\hat{I}\right)-\Psi_{m}\!\left(I^{h}\right)\right\|_{2}^{2}$$

wherein $\lambda_{l2}$ is the pixel-space loss weight; $\lambda_{pm}$ is the feature-space loss weight; $\hat{I}$ is the primary reconstruction result image; $I^{h}$ is the corresponding non-degraded high-definition image; $C_{m}$, $H_{m}$ and $W_{m}$ are respectively the number of feature channels, height and width at the $m$-th layer; and $\Psi_{m}$ yields the $m$-th layer convolution features of a pre-trained face recognition network;
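A direct numpy transcription of this two-term loss, with toy arrays standing in for the images and for the feature maps that a pre-trained recognition network $\Psi$ would produce:

```python
import numpy as np

def reconstruction_loss(pred, target, feats_pred, feats_target,
                        lam_l2=1.0, lam_pm=1.0):
    """Pixel-space l2 term plus feature-space (perceptual) terms, each
    feature term normalised by its layer's C_m * H_m * W_m."""
    loss = lam_l2 * float(((pred - target) ** 2).sum())
    for fp, ft in zip(feats_pred, feats_target):
        c, h, w = fp.shape
        loss += lam_pm * float(((fp - ft) ** 2).sum()) / (c * h * w)
    return loss

# toy example: 3x8x8 "images" and one 4x2x2 "feature layer"
pred = np.zeros((3, 8, 8)); target = np.ones((3, 8, 8))
feats_pred = [np.zeros((4, 2, 2))]; feats_target = [np.ones((4, 2, 2))]
loss = reconstruction_loss(pred, target, feats_pred, feats_target)
# pixel term 3*64 = 192, feature term 16/16 = 1 -> loss = 193
```

The per-layer normalisation keeps the perceptual terms comparable across layers of very different sizes, so $\lambda_{pm}$ controls the balance globally.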
the training network further comprises a multi-scale discriminant loss function:
the guided restoration result image is down-sampled by factors $r \in \{1, 2, 4, 8\}$ to obtain 4 groups of images at different resolutions, and the loss is computed by four discrimination networks in a hinge-loss manner; for learning of the discrimination networks, the loss $\ell_{adv,D}$ is defined as:

$$\ell_{adv,D} = -\sum_{r=1}^{R}\left(\mathbb{E}_{I^{h}_{\downarrow r}\sim\mathbb{P}\left(I^{h}_{\downarrow r}\right)}\left[\min\left(0,\,-1+D_{r}\!\left(I^{h}_{\downarrow r}\right)\right)\right] + \mathbb{E}_{\hat{I}_{\downarrow r}\sim\mathbb{P}\left(\hat{I}_{\downarrow r}\right)}\left[\min\left(0,\,-1-D_{r}\!\left(\hat{I}_{\downarrow r}\right)\right)\right]\right)$$

wherein $R$ is the upper limit of the scale; $D_{r}$ is the discriminator at scale $r$; $I^{h}_{\downarrow r}$ is the non-degraded high-definition image down-sampled by a factor of $r$, and $\downarrow r$ denotes down-sampling by $r$; $\mathbb{E}$ is the expectation and $\mathbb{P}(\cdot)$ the corresponding distribution;
for learning of the generating network under the discrimination-network constraint, the loss $\ell_{adv,G}$ is defined as:

$$\ell_{adv,G} = -\sum_{r=1}^{R}\mathbb{E}_{\hat{I}_{\downarrow r}}\left[D_{r}\!\left(\hat{I}_{\downarrow r}\right)\right]$$
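The two multi-scale adversarial losses above can be sketched given the discriminator outputs at each scale. This assumes the "changeloss" of the translation is the standard hinge loss; the scalar per-scale scores below stand in for real discriminator outputs.

```python
import numpy as np

def d_hinge_loss(d_real_outputs, d_fake_outputs):
    """Multi-scale discriminator hinge loss, summed over scales r:
    max(0, 1 - D_r(real)) + max(0, 1 + D_r(fake))."""
    return sum(np.maximum(0.0, 1.0 - dr).mean()
               + np.maximum(0.0, 1.0 + df).mean()
               for dr, df in zip(d_real_outputs, d_fake_outputs))

def g_hinge_loss(d_fake_outputs):
    """Generator loss under the discriminator constraint: -sum E[D_r(fake)]."""
    return -sum(df.mean() for df in d_fake_outputs)

# one illustrative score per scale r in {1, 2, 4, 8}
d_real = [np.array([2.0])] * 4    # confidently real -> no hinge penalty
d_fake = [np.array([-2.0])] * 4   # confidently fake -> no hinge penalty
loss_d = d_hinge_loss(d_real, d_fake)   # 0.0 for these saturated scores
loss_g = g_hinge_loss(d_fake)           # -4 * (-2.0) = 8.0
```

The hinge form saturates once the discriminator is confident by a margin of 1, which is why `loss_d` vanishes on the toy scores above while the generator loss stays large.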
9. The face image restoration system based on the multi-scale face part feature dictionary according to claim 8, wherein the training network adopts the Adam optimization algorithm to train, end to end, all network structures other than the first-, second-, third- and fourth-level face feature extraction modules.
10. The face image restoration system based on the multi-scale face part feature dictionary according to claim 9, wherein the degraded face image to be restored is obtained by sequentially performing blurring, down-sampling, noise addition and JPEG compression on a high-definition face image; the blur processing applies Gaussian blur and motion blur, with the standard deviation of the Gaussian blur drawn from the range parameterised by P; the down-sampling uses bicubic down-sampling with sampling scale S ∈ {1:0.1:S}; the noise addition uses white Gaussian noise with noise level N ∈ {0, 1:0.1:N}; the JPEG compression uses quality parameter Q ∈ {0, 10:0.1:Q}; wherein P ≥ 5, S ≥ 8, N ≥ 15 and Q ≥ 80;
the whole network is trained on the constructed low-quality degraded face images to be restored and their corresponding high-definition face images; the trained network is then used to restore low-quality images.
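The degradation pipeline of claim 10 (blur, down-sample, add noise, JPEG-compress) can be sketched with numpy alone. The JPEG stage is omitted here because it needs an image codec; the 3-sigma kernel radius and the nearest-neighbour resize standing in for bicubic are assumptions of this sketch, not the patent's choices.

```python
import numpy as np

def degrade(img, blur_sigma=2.0, scale=4, noise_level=10.0, seed=0):
    """Blur -> down-sample -> add white Gaussian noise, on one 2-D channel.
    (The patent's final JPEG compression step is omitted here.)"""
    rng = np.random.default_rng(seed)
    # separable Gaussian blur; kernel radius of 3*sigma is an assumption
    r = int(3 * blur_sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * blur_sigma ** 2))
    k /= k.sum()
    blurred = np.apply_along_axis(lambda v: np.convolve(v, k, 'same'), 0, img)
    blurred = np.apply_along_axis(lambda v: np.convolve(v, k, 'same'), 1, blurred)
    # nearest-neighbour stand-in for the patent's bicubic down-sampling
    low = blurred[::scale, ::scale]
    # additive white Gaussian noise, clipped back to valid intensities
    noisy = low + rng.normal(0.0, noise_level, low.shape)
    return np.clip(noisy, 0.0, 255.0)

high = np.full((64, 64), 128.0)    # toy high-definition channel
degraded = degrade(high, blur_sigma=2.0, scale=4, noise_level=10.0)
```

Sampling the blur, scale, noise and quality parameters from the ranges in claim 10 yields a diverse pool of (degraded, high-definition) training pairs for the end-to-end training of claim 9.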
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010779169.7A CN111768354A (en) | 2020-08-05 | 2020-08-05 | Face image restoration system based on multi-scale face part feature dictionary |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111768354A true CN111768354A (en) | 2020-10-13 |
Family
ID=72729707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010779169.7A Pending CN111768354A (en) | 2020-08-05 | 2020-08-05 | Face image restoration system based on multi-scale face part feature dictionary |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111768354A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103996024A (en) * | 2014-05-13 | 2014-08-20 | 南京信息工程大学 | Bayesian estimation sparse representation face recognition method based on dictionary reconstruction |
US20180268203A1 (en) * | 2017-03-17 | 2018-09-20 | Nec Laboratories America, Inc. | Face recognition system for face recognition in unlabeled videos with domain adversarial learning and knowledge distillation |
CN110288697A (en) * | 2019-06-24 | 2019-09-27 | 天津大学 | 3D face representation and method for reconstructing based on multiple dimensioned figure convolutional neural networks |
CN111260577A (en) * | 2020-01-15 | 2020-06-09 | 哈尔滨工业大学 | Face image restoration system based on multi-guide image and self-adaptive feature fusion |
Non-Patent Citations (1)
Title |
---|
XIAOMING LI et al.: "Blind Face Restoration via Deep Multi-scale Component Dictionaries", HTTPS://ARXIV.ORG/PDF/2008.00418.PDF, 2 August 2020 (2020-08-02), pages 6-10 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113128624A (en) * | 2021-05-11 | 2021-07-16 | 山东财经大学 | Graph network face recovery method based on multi-scale dictionary |
CN113554569A (en) * | 2021-08-04 | 2021-10-26 | 哈尔滨工业大学 | Face image restoration system based on double memory dictionaries |
CN113554569B (en) * | 2021-08-04 | 2022-03-08 | 哈尔滨工业大学 | Face image restoration system based on double memory dictionaries |
CN113688752A (en) * | 2021-08-30 | 2021-11-23 | 厦门美图宜肤科技有限公司 | Face pigment detection model training method, device, equipment and storage medium |
WO2023029233A1 (en) * | 2021-08-30 | 2023-03-09 | 厦门美图宜肤科技有限公司 | Face pigment detection model training method and apparatus, device, and storage medium |
CN113688752B (en) * | 2021-08-30 | 2024-02-02 | 厦门美图宜肤科技有限公司 | Training method, device, equipment and storage medium for face color detection model |
CN114170108A (en) * | 2021-12-14 | 2022-03-11 | 哈尔滨工业大学 | Natural scene image blind restoration system based on human face degradation model migration |
CN114170108B (en) * | 2021-12-14 | 2024-04-12 | 哈尔滨工业大学 | Natural scene image blind restoration system based on face degradation model migration |
CN116452466A (en) * | 2023-06-14 | 2023-07-18 | 荣耀终端有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN116452466B (en) * | 2023-06-14 | 2023-10-20 | 荣耀终端有限公司 | Image processing method, device, equipment and computer readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201013 |