CN114820310A - Semantic feature-based face super-resolution reconstruction method and system - Google Patents
- Publication number: CN114820310A
- Application number: CN202210426417.9A
- Authority: CN (China)
- Prior art keywords: face, layer, resolution, reconstruction, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V40/162 — Face detection, localisation or normalisation using pixel segmentation or colour matching
- G06V40/171 — Local features and components; facial parts; geometrical relationships
Abstract
The invention discloses a semantic feature-based face super-resolution reconstruction method and system. The method comprises the following steps: step one, a degradation stage, synthesizing the high-resolution faces of the training set into low-resolution face images containing complex noise and blur distributions; step two, a generation stage, coarsely amplifying the low-resolution images to the target resolution and then integrating semantic and general features to form the super-resolution face reconstruction result. The system comprises a synthesis module, an amplification module and an integration module. The invention improves the generalization capability of the FSR network model under multiple degradation modes and enhances the perceptual quality of the SR faces reconstructed under those modes.
Description
Technical Field
The invention relates to a face super-resolution reconstruction method and system based on semantic features.
Background
Super-resolution reconstruction (SR) is an important area of image quality enhancement research. The technique recovers texture information from a low-resolution (LR) image to form a high-resolution (HR) image. Face super-resolution reconstruction (FSR) is an application branch of SR technology that aims to recover a high-resolution face morphology from a low-resolution face. FSR technology can assist related high-level vision tasks, such as face recognition, face correction, security video analysis and other biometric applications.
Deep learning techniques have been widely explored for face super-resolution reconstruction. Building on research applying convolutional neural networks to SR reconstruction of natural images, FSR methods have also advanced. Existing FSR methods fall into two major categories, supervised and unsupervised. Supervised FSR methods learn from paired LR-HR face data sets: an LR face from the training set passes through a convolutional network that enhances its features and outputs a reconstructed SR face, supervised and optimized against the corresponding HR face in the training set. Unsupervised FSR methods use data augmentation or richer network functional relationships, require no paired LR-HR data set, and can complete learning from single face images alone, reducing the dependence of model training on data compared with supervised methods.
In addition, compared with natural images, face images contain unique prior information, such as semantic maps, facial landmarks, edges and other biometric features, and this prior information plays a distinctive role in some FSR methods. Some existing methods fuse such feature information into the FSR reconstruction network as input, where it guides face reconstruction. Other methods predict the relevant prior information and establish constraints during the face reconstruction process, strengthening the expression of the priors in the FSR network model. Prior information specific to face images can improve the directivity of an FSR method during face reconstruction and improve its reconstruction of the facial structure and related attributes.
In the prior art, supervised methods need paired LR-HR face data sets to support model training; the LR face is obtained from a real face by a fixed down-sampling method, and this single-degradation assumption cannot meet face reconstruction requirements under multiple degradation modes. In unsupervised learning, without a real reference face, the reconstructed SR face may be deformed or distorted, affecting the visual result. Meanwhile, under multiple degradation modes, some prior information such as facial landmarks and edges becomes inaccurate or unavailable, so FSR models guided by it reconstruct poorly. Selecting reasonable face prior information that efficiently enhances FSR reconstruction performance under multiple degradation modes is an urgent problem in the field.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a face super-resolution reconstruction method and system based on semantic features.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a face super-resolution reconstruction method based on semantic features comprises the following steps:
step one, a quality degradation stage: synthesizing the high-resolution face of the training set into a low-resolution face image containing complex noise and fuzzy distribution;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process roughly amplifies the low-resolution face image to the target resolution, and the fine reconstruction process extracts semantic features from the coarse result, integrates the semantic features with the general features under a semantic attention module, and finally forms the super-resolution face reconstruction result as the output of the network.
Preferably, the degradation phase comprises the steps of:
step S11, the high-resolution face image HR in the training set first passes through a residual network group comprising a plurality of residual network blocks with pooling layers between adjacent blocks; the pooling layers perform down-sampling, and once the image has been down-sampled to 4x4, a structure formed by sub-pixel layers and associated residual network blocks up-samples it back to 16x16; the result is finally integrated into a three-channel image as the degraded low-resolution face image LR_deg;
step S12, to ensure that the identity characteristics of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by interpolated down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2;
step S13, introducing complex noise and blur information in the degradation stage so that the content distribution of the synthesized LR_deg is closer to that of faces captured in real environments; a generative adversarial network is introduced, in which an LR discriminator D_LR distinguishes the face LR_real captured in a real environment from the synthesized LR_deg, and the LR adversarial loss function L_adv-LR established by this generative adversarial process is as follows: L_adv-LR = -σ[D_LR(LR_deg)·log(D_LR(LR_real)) + (1 - D_LR(LR_deg))·log(1 - D_LR(LR_real))].
preferably, the residual network block includes a convolutional layer a, a convolutional layer B, a regularization layer C, an activation layer D, a convolutional layer E, a regularization layer F, and an activation layer G, the convolutional layer a is connected to the regularization layer C through the convolutional layer B, the regularization layer C is connected to the convolutional layer E through the activation layer D, and the convolutional layer E is connected to the activation layer G through the regularization layer F.
Preferably, the LR discriminator D_LR comprises an activation layer H, a convolutional layer I, a spectral regularization layer J, an activation layer K, a convolutional layer L, a spectral regularization layer M and an activation layer N; the activation layer H is connected to the spectral regularization layer J through the convolutional layer I, the spectral regularization layer J is connected to the convolutional layer L through the activation layer K, and the convolutional layer L is connected to the activation layer N through the spectral regularization layer M.
Preferably, in the coarse reconstruction process LR_deg passes through a coarse reconstruction network that outputs a coarsely reconstructed SR face SR_Coarse at the target resolution, and a content loss function L_pix-CoarseSR is established: L_pix-CoarseSR = ||SR_Coarse - HR||_2; a semantic loss function L_seg-CoarseSR is then designed based on SR_Coarse: L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2. Seg(·) is a pre-trained face semantic segmentation network that outputs face semantic segmentation predictions over 19 channels, the channels corresponding to different face components including the eyes, nose, eyebrows, hair and head.
Preferably, in the fine reconstruction process, the general features and the semantic features of the rough reconstructed face are integrated and enhanced through a fine reconstruction network, and finally a face super-resolution reconstruction result is output.
Preferably, the fine reconstruction network is composed of a convolutional layer M, a residual network group, a semantic feature attention network, and an integrated convolutional layer.
Preferably, the fine reconstruction process comprises the following steps:
step A21, after general features are extracted from the input SR_Coarse by a general convolutional layer, they are fused with the corresponding semantic feature channels and enter a semantic feature attention network together, producing mixed features containing attention;
step A22, the mixed features containing attention enter a residual network group to deepen the feature representation, then enter an integration convolutional layer and are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage;
step A23, constraints are established on the SR_Fine output by the generation stage; first, a content loss function L_pix-FineSR is established between SR_Fine and the standard image HR in the same training set: L_pix-FineSR = ||SR_Fine - HR||_2;
step A24, the reconstructed SR_Fine passes through an SR discriminator D_SR, which integrates its features and outputs a tensor of size 16x16 whose values, in the range 0 to 1, represent D_SR's judgment of the reconstruction quality of the corresponding regions; the closer a value is to 1, the better the perceptual quality of the reconstruction. The SR adversarial loss function L_adv-SR of the SR face is thus established as follows:
L_adv-SR = -σ[D_SR(SR_Fine)·log(D_SR(HR)) + (1 - D_SR(SR_Fine))·log(1 - D_SR(HR))].
The semantic feature-based face super-resolution reconstruction system comprises a synthesis module, an amplification module and an integration module, the synthesis module being connected to the integration module through the amplification module; the synthesis module synthesizes a face into a low-resolution face image containing complex noise and blur distributions, the amplification module roughly amplifies the low-resolution face image to the target resolution, and the integration module integrates the semantic features and general features of the image to form the super-resolution face reconstruction result.
The invention has the following beneficial effects. The framework of the invention is an unsupervised model in which the face degradation stage and generation stage are trained jointly, which resolves the limitation of paired data sets in training and improves the generalization of the model under multiple degradation modes. In the degradation stage, a degradation network is designed to learn the image degradation process, so that the synthesized LR faces contain rich noise and blur information, removing the data-set limitation. In the generation stage, face semantic features are integrated in a simple-to-complex manner, a channel attention mechanism strengthens the expression of the semantic features in the deep reconstruction network, and this reasonable face prior information enhances the visual quality of the reconstructed faces, alleviating the tendency of face component contours to distort and deform under unsupervised learning. Experiments show that the method performs excellently in face super-resolution reconstruction under multiple degradations, with a clear and accurate visual perception effect. By combining an unsupervised learning framework with semantic feature prior information, the semantic feature-based face super-resolution reconstruction method for multiple degradation modes improves the generalization capability of the FSR network model under multiple degradation modes while enhancing the perceptual quality of the SR faces it reconstructs.
Drawings
FIG. 1 is a flowchart of the process of the present invention;
FIG. 2 is a schematic diagram of a residual block;
FIG. 3 is a schematic structural diagram of the LR discriminator D_LR;
FIG. 4 is a schematic diagram of a fine reconstruction network;
fig. 5 is a block diagram of a system.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings of the specification:
as shown in fig. 1, a face super-resolution reconstruction method based on semantic features includes the following steps:
step one, a quality degradation stage: synthesizing the high-resolution face of the training set into a low-resolution face image containing complex noise and fuzzy distribution;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process roughly amplifies the low-resolution face image to the target resolution, and the fine reconstruction process extracts semantic features from the coarse result, integrates the semantic features with the general features under a semantic attention module, and finally forms the super-resolution face reconstruction result as the output of the network.
In the degradation stage, the high-resolution faces of the training set are synthesized into low-resolution faces containing complex noise and blur distributions as the input of the generation stage. In the generation stage, the low-resolution image is roughly amplified to the target resolution, semantic features are extracted from the coarse result, the semantic features and general features are then integrated under a semantic attention module, and finally a super-resolution face reconstruction result is formed as the output of the network.
The degradation phase comprises the following steps:
s11, the high-resolution face image HR in the training set firstly passes through a residual error network group, the residual error network group comprises a plurality of residual error network blocks and a plurality of pooling layers, the pooling layers are positioned between two adjacent residual error network blocks, the pooling layers have a down-sampling function, when the high-resolution face image HR is down-sampled to 4x4, the image is up-sampled and restored to 16x16 size by a structure formed by a sub-pixel layer and related residual error network blocks, and finally the high-resolution face image HR is integrated into a three-channel image serving as a down-samplingLow resolution face image LR after quality deg ;
step S12, to ensure that the identity characteristics of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by interpolated down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2;
step S13, complex noise and blur information are introduced in the degradation stage so that the content distribution of the synthesized LR_deg is closer to that of faces captured in real environments. A generative adversarial network is therefore introduced, in which an LR discriminator D_LR distinguishes the face LR_real captured in a real environment from the synthesized LR_deg. In the degradation stage, the synthesized LR_deg passes through D_LR, which extracts integrated features and outputs a single-channel tensor of size 16x16. The values in the tensor range from 0 to 1 and represent the quality distribution of the low-resolution image region at the corresponding position; the closer a value is to 1, the better the degradation quality of the corresponding region. The LR adversarial loss function L_adv-LR established by this generative adversarial process is as follows: L_adv-LR = -σ[D_LR(LR_deg)·log(D_LR(LR_real)) + (1 - D_LR(LR_deg))·log(1 - D_LR(LR_real))].
the LR output in the degradation stage under the combined action of the content loss function and the LR counter-loss function deg The distribution of (a) is more complex and diverse than a single interpolation downsampling. The synthesized low-resolution face image is used as the input of a subsequent generation stage, and the generalization and reconstruction capability of the FSR model under the multi-degradation mode can be effectively improved.
As shown in fig. 2, the residual network block includes a convolutional layer A1, a convolutional layer B2, a regularization layer C3, an activation layer D4, a convolutional layer E5, a regularization layer F6 and an activation layer G7, where the convolutional layer A1 is connected to the regularization layer C3 through the convolutional layer B2, the regularization layer C3 is connected to the convolutional layer E5 through the activation layer D4, and the convolutional layer E5 is connected to the activation layer G7 through the regularization layer F6.
As shown in fig. 3, the LR discriminator D_LR comprises an activation layer H8, a convolutional layer I9, a spectral regularization layer J10, an activation layer K11, a convolutional layer L12, a spectral regularization layer M13 and an activation layer N14; the activation layer H8 is connected to the spectral regularization layer J10 through the convolutional layer I9, the spectral regularization layer J10 is connected to the convolutional layer L12 through the activation layer K11, and the convolutional layer L12 is connected to the activation layer N14 through the spectral regularization layer M13.
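The spectral regularization layers in D_LR correspond to spectral normalization, which constrains a layer's largest singular value. The following NumPy sketch (a hedged illustration, not the patent's implementation) estimates that value by power iteration and rescales a toy weight matrix:

```python
import numpy as np

def spectral_normalize(w: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Divide a weight matrix by its largest singular value,
    estimated with power iteration (as spectral regularization does)."""
    u = np.random.RandomState(0).randn(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v  # estimated largest singular value
    return w / sigma

w = np.array([[3.0, 0.0],   # toy 2x2 weight matrix, singular values 3 and 1
              [0.0, 1.0]])
w_sn = spectral_normalize(w)
print(np.linalg.norm(w_sn, 2))  # spectral norm of the normalized matrix: ~1.0
```

After normalization the matrix's spectral norm is 1, which bounds the discriminator's Lipschitz constant and stabilizes adversarial training.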
The coarse reconstruction process takes LR_deg through a coarse reconstruction network and outputs a coarsely reconstructed SR face SR_Coarse at the target resolution. The coarse reconstruction network consists of 11 residual network blocks divided into two groups in an 8:3 ratio, and the tail of each group of residual blocks contains a bilinear interpolation up-sampling layer. For the output SR_Coarse, a content loss function L_pix-CoarseSR is established: L_pix-CoarseSR = ||SR_Coarse - HR||_2. A semantic loss function L_seg-CoarseSR is then designed based on SR_Coarse: L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2.
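The content losses above are plain L2 distances between images. A minimal pure-Python example with hypothetical pixel values (the numbers are not data from the patent):

```python
import math

def l2_loss(a, b):
    """Content loss ||a - b||_2 between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 4-pixel images standing in for SR_Coarse and HR (values hypothetical).
sr_coarse = [0.2, 0.4, 0.6, 0.8]
hr        = [0.1, 0.4, 0.5, 1.0]
loss = l2_loss(sr_coarse, hr)
print(loss)  # sqrt(0.01 + 0 + 0.01 + 0.04) = sqrt(0.06)
```

The same function applied to Seg(SR_Coarse) and Seg(HR) channel maps would give the semantic loss L_seg-CoarseSR.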
Seg(·) is a pre-trained face semantic segmentation network that outputs face semantic segmentation predictions over 19 channels, the channels corresponding to different face components including the eyes, nose, eyebrows, hair and head. The semantic segmentation predicted from SR_Coarse serves as semantic prior information in the subsequent fine reconstruction stage.
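A 19-channel semantic prediction such as Seg(·) produces is typically reduced to a per-pixel label map by taking the highest-scoring channel. A hedged NumPy sketch with random toy scores (the 4x4 grid and the scores themselves are illustrative assumptions):

```python
import numpy as np

# Toy 19-channel semantic score map over a 4x4 pixel grid; each channel
# stands for one face component (eyes, nose, eyebrows, hair, head, ...).
rng = np.random.RandomState(42)
scores = rng.rand(19, 4, 4)

# Per-pixel semantic label: index of the channel with the highest score.
labels = scores.argmax(axis=0)

print(labels.shape)  # (4, 4) label map with values in 0..18
```

In the fine reconstruction stage the per-channel maps themselves, not the collapsed labels, are what gets fused with the general features.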
In the fine reconstruction process, the general features and the semantic features of the rough reconstructed face are integrated and enhanced through a fine reconstruction network, and finally a face super-resolution reconstruction result is output.
As shown in fig. 4, the fine reconstruction network consists of convolutional layer M15, residual network group 16, semantic feature attention network 17, and integrated convolutional layer 18.
In the fine reconstruction process, the fine reconstruction network integrates and enhances the general and semantic features of the coarsely reconstructed face and finally outputs the face super-resolution reconstruction result. As shown in the corresponding area of FIG. 1, the fine reconstruction network consists of a general convolutional layer, residual network groups, semantic feature attention networks and an integration convolutional layer (a convolutional layer that integrates the information of all channels); one semantic feature attention network and four residual network blocks together form a group with an overall residual connection. After general features are extracted from the input SR_Coarse by the general convolutional layer, they are fused with the corresponding semantic feature channels and enter the semantic feature attention network we designed. After the two kinds of face features input to the network are fused, a channel attention operation produces mixed features containing attention. The attention mechanism effectively distinguishes important face features and strengthens the influence of the semantic features on the general features in the reconstruction network. The mixed features then enter a residual network group to deepen the feature representation. After enhancement by the residual network groups and semantic feature attention networks, the mixed features enter the integration convolutional layer and are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage.
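The channel attention operation at the heart of the semantic feature attention network can be sketched in squeeze-and-excitation style. The patent does not specify the gating layers, so the sigmoid over pooled channel means below is an assumption for illustration:

```python
import numpy as np

def channel_attention(features: np.ndarray) -> np.ndarray:
    """Reweight each channel of a (C, H, W) mixed general+semantic feature
    map by a sigmoid gate computed from its global average (a sketch; the
    patent's exact excitation layers are not given)."""
    squeeze = features.mean(axis=(1, 2))       # global average pool -> (C,)
    gate = 1.0 / (1.0 + np.exp(-squeeze))      # sigmoid gating weights
    return features * gate[:, None, None]      # per-channel reweighting

feats = np.ones((8, 4, 4))                     # toy 8-channel feature map
out = channel_attention(feats)
print(out.shape)  # (8, 4, 4), each channel scaled by its attention weight
```

Channels whose pooled response is strong receive gates near 1 and pass through largely unchanged, while weak channels are suppressed, which is how the semantic channels can come to dominate the mixed representation.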
The fine reconstruction process comprises the following steps:
Step A21: after general features are extracted from the input SR_Coarse by a general convolutional layer, they are fused with the corresponding semantic feature channels and enter the semantic feature attention network together, producing mixed features containing attention.
Step A22: the mixed features containing attention enter a residual network group to deepen the feature representation, then enter the integration convolutional layer and are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage.
Step A23: constraints are established on the SR_Fine output by the generation stage; first, a content loss function L_pix-FineSR is established between SR_Fine and the standard image HR in the same training set: L_pix-FineSR = ||SR_Fine - HR||_2.
Step A24: the reconstructed SR_Fine passes through an SR discriminator D_SR, which integrates its features and outputs a tensor of size 16x16 whose values represent D_SR's judgment of the reconstruction quality of the corresponding regions; the closer a value is to 1, the better the perceptual quality of the reconstruction. The SR adversarial loss function L_adv-SR of the SR face is thus established as follows: L_adv-SR = -σ[D_SR(SR_Fine)·log(D_SR(HR)) + (1 - D_SR(SR_Fine))·log(1 - D_SR(HR))].
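With toy scalar discriminator scores standing in for the 16x16 output tensor, the cross-entropy form of the adversarial loss given above can be evaluated directly. All values below are hypothetical, and the scalar stand-in is an assumption for illustration:

```python
import math

def adv_loss(d_fake: float, d_real: float) -> float:
    """Cross-entropy style adversarial loss in the form stated above:
    -[d_fake * log(d_real) + (1 - d_fake) * log(1 - d_real)],
    with scalar scores standing in for the 16x16 discriminator tensor."""
    return -(d_fake * math.log(d_real) + (1 - d_fake) * math.log(1 - d_real))

# Toy discriminator scores in (0, 1): quality judgments for SR_Fine and HR.
loss = adv_loss(d_fake=0.3, d_real=0.8)
print(loss)
```

In training, this value would be averaged over all 16x16 positions of the discriminator output rather than computed from a single pair of scalars.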
As shown in fig. 5, the system implementing the semantic feature-based face super-resolution reconstruction method of the above steps comprises a synthesis module 61, an amplification module 62 and an integration module 63, the synthesis module 61 being connected to the integration module 63 through the amplification module 62; the synthesis module synthesizes a face into a low-resolution face image containing complex noise and blur distributions, the amplification module roughly amplifies the low-resolution face image to the target resolution, and the integration module integrates the semantic features and general features of the image to form the super-resolution face reconstruction result.
Under the combined action of the content loss function and the SR adversarial loss function, the final reconstruction result SR_Fine output by the generation stage is optimized in both content and perceptual quality. In summary, the innovation of the invention is as follows: the semantic feature-based face super-resolution reconstruction method for multiple degradation modes combines an unsupervised learning framework with semantic feature prior information, improving the generalization capability of the FSR network model under multiple degradation modes while enhancing the perceptual quality of the SR faces reconstructed under those modes.
The innovation of the invention lies in introducing face semantic features within an unsupervised learning paradigm, which enhances the generalization capability of the FSR network model when reconstructing under multiple degradation modes; the semantic features introduced as prior information in the unsupervised method remain stable and perform well across degradation modes. The unsupervised face super-resolution reconstruction network of the invention improves generalization, adapts to face reconstruction requirements under multiple degradation modes, and introduces face semantic features as a prior to improve visual perceptual quality.
By using an unsupervised network model and introducing face semantic features as prior information, the invention accomplishes the face super-resolution reconstruction task under multiple degradation modes and removes the limitation of supervised methods, which must use paired LR-HR face data sets. A network model trained under the framework of the invention can adapt to face super-resolution reconstruction requirements under various degradation conditions and improves the visual quality of the reconstructed faces. The network model is applicable to assisting video security detection, small-size face recognition, face verification, and other related high-level vision tasks in real environments.
It should be noted that the above describes only one specific embodiment of the present invention. The invention is clearly not limited to the embodiment described above; many variations that a person skilled in the art can derive or deduce directly from the disclosure of the invention are all considered to fall within the scope of the invention.
Claims (9)
1. A face super-resolution reconstruction method based on semantic features is characterized by comprising the following steps:
step one, a quality degradation stage: synthesizing from the high-resolution faces of the training set a low-resolution face image containing complex noise and blur distributions;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process coarsely amplifies the low-resolution face image to the target resolution; the fine reconstruction process extracts semantic features from the result of the coarse reconstruction process, integrates the semantic features and general features in a semantic attention module, and finally forms the super-resolution face reconstruction result as the output of the network.
2. The semantic feature-based face super-resolution reconstruction method according to claim 1, wherein the quality degradation stage comprises the following steps:
step S11, the high-resolution face image HR in the training set first passes through a residual network group, the residual network group comprising a plurality of residual network blocks and a plurality of pooling layers, each pooling layer located between two adjacent residual network blocks and providing a down-sampling function; after the high-resolution face image HR has been down-sampled to 4x4, the image is up-sampled back to 16x16 by a structure formed by a sub-pixel layer and associated residual network blocks, and finally integrated into a three-channel image as the degraded low-resolution face image LR_deg;
step S12, to ensure that the identity characteristics of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by interpolation down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2;
step S13, to introduce complex noise and blur information in the degradation stage so that the content distribution of the synthesized LR_deg is closer to faces captured in real environments, a generative adversarial network is introduced; an LR discriminator D_LR judges the synthesized LR_deg against faces LR_real captured in real environments, and the LR adversarial loss function L_adv-LR established by this adversarial process is as follows:
L_adv-LR = -σ[D_LR(LR_deg) log(D_LR(LR_real)) + (1 - D_LR(LR_deg)) log(1 - D_LR(LR_real))]
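The sub-pixel up-sampling layer mentioned in step S11 can be sketched in plain Python. This is a toy, stdlib-only rearrangement with nested lists standing in for tensors; the function name and list layout are illustrative, though the channel-to-phase mapping follows the usual sub-pixel convolution convention:

```python
def pixel_shuffle(channels, r):
    # Sub-pixel up-sampling: C*r*r channels of size HxW are rearranged
    # into C channels of size (H*r)x(W*r); each output pixel takes its
    # value from the channel matching its sub-pixel phase (i % r, j % r).
    c_out = len(channels) // (r * r)
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for c in range(c_out):
        plane = [[0.0] * (w * r) for _ in range(h * r)]
        for i in range(h * r):
            for j in range(w * r):
                ch = c * r * r + (i % r) * r + (j % r)
                plane[i][j] = channels[ch][i // r][j // r]
        out.append(plane)
    return out
```

Applied twice with r = 2, such a rearrangement takes a 4x4 feature map back up to 16x16, as in step S11.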
3. the semantic-feature-based face super-resolution reconstruction method according to claim 2, wherein the residual network block comprises a convolutional layer A (1), a convolutional layer B (2), a regularization layer C (3), an activation layer D (4), a convolutional layer E (5), a regularization layer F (6) and an activation layer G (7), the convolutional layer A (1) is connected with the regularization layer C (3) through the convolutional layer B (2), the regularization layer C (3) is connected with the convolutional layer E (5) through the activation layer D (4), and the convolutional layer E (5) is connected with the activation layer G (7) through the regularization layer F (6).
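The layer ordering in claim 3 can be sketched with placeholder scalar functions standing in for the real layers. This is an illustrative sketch only: the placeholder arithmetic is arbitrary, and the final skip connection is an assumption (standard for residual blocks, but not spelled out in the claim):

```python
def conv(x):
    return 2.0 * x          # placeholder for a convolutional layer

def norm(x):
    return x / 2.0          # placeholder for a regularization layer

def act(x):
    return max(0.0, x)      # placeholder for an activation layer (ReLU-like)

def residual_block(x):
    # conv A (1) -> conv B (2) -> regularization C (3) -> activation D (4)
    y = act(norm(conv(conv(x))))
    # -> conv E (5) -> regularization F (6) -> activation G (7)
    y = act(norm(conv(y)))
    # assumed skip connection: add the block input back onto the output
    return x + y
```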
4. The semantic feature-based face super-resolution reconstruction method of claim 2, wherein the LR discriminator D_LR comprises an activation layer H (8), a convolutional layer I (9), a spectral normalization layer G (10), an activation layer K (11), a convolutional layer L (12), a spectral normalization layer M (13) and an activation layer N (14); the activation layer H (8) is connected with the spectral normalization layer G (10) through the convolutional layer I (9), the spectral normalization layer G (10) is connected with the convolutional layer L (12) through the activation layer K (11), and the convolutional layer L (12) is connected with the activation layer N (14) through the spectral normalization layer M (13).
5. The semantic feature-based face super-resolution reconstruction method of claim 1, wherein in the coarse reconstruction process, LR_deg passes through a coarse reconstruction network to output a coarsely reconstructed SR face SR_Coarse at the target resolution, and a corresponding content loss function L_pix-CoarseSR is established: L_pix-CoarseSR = ||SR_Coarse - HR||_2; a semantic loss function L_seg-CoarseSR is then designed based on SR_Coarse: L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2, where Seg(·) is a pre-trained face semantic segmentation network that outputs face semantic segmentation predictions over 19 channels, the channels corresponding to different face components including the eyes, nose, eyebrows, hair and head.
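The semantic loss in claim 5 compares segmentation predictions channel by channel. A minimal sketch in plain Python, where each argument is a list of 19 per-component channels and each channel a flat list of scores; the function name is illustrative, and Seg(·) is assumed to have been applied already:

```python
import math

def semantic_loss(seg_sr, seg_hr):
    # ||Seg(SR_Coarse) - Seg(HR)||_2 over all semantic channels:
    # accumulate squared differences per channel, then take the root
    squared = 0.0
    for ch_sr, ch_hr in zip(seg_sr, seg_hr):
        squared += sum((a - b) ** 2 for a, b in zip(ch_sr, ch_hr))
    return math.sqrt(squared)
```

Identical segmentation maps give a loss of zero, so the coarse network is pushed toward placing facial components where the ground-truth HR face has them.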
6. The semantic feature-based face super-resolution reconstruction method of claim 5, wherein in the fine reconstruction process, the general features and semantic features of the coarsely reconstructed face are integrated and enhanced through a fine reconstruction network, which finally outputs the face super-resolution reconstruction result.
7. The super-resolution face reconstruction method based on semantic features of claim 6, wherein the fine reconstruction network is composed of a convolutional layer M (15), a residual network group (16), a semantic feature attention network (17) and an integrated convolutional layer (18).
8. The semantic feature-based face super-resolution reconstruction method according to claim 7, wherein the fine reconstruction process comprises the following steps:
step A21, after general features are extracted from the input SR_Coarse by a general convolutional layer, they are fused with the corresponding semantic feature channels and passed together into the semantic feature attention network, yielding mixed features containing attention;
step A22, the mixed features containing attention enter the residual network group to deepen the feature expression, then enter the integrated convolutional layer, and are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage;
step A23, constraints are established on the output SR_Fine of the generation stage: first, a content loss function L_pix-FineSR is established between SR_Fine and the standard image HR in the same training set: L_pix-FineSR = ||SR_Fine - HR||_2;
step A24, the reconstructed SR_Fine is passed through an SR discriminator D_SR, which integrates its features and outputs a 16x16 tensor whose values range from 0 to 1 and represent the judgment of D_SR on the reconstruction quality of the corresponding regions; the closer a value is to 1, the better the perceptual reconstruction quality; the SR adversarial loss function L_adv-SR of the SR face is thereby established as follows:
L_adv-SR = -σ[D_SR(SR_Fine) log(D_SR(HR)) + (1 - D_SR(SR_Fine)) log(1 - D_SR(HR))].
9. A system implementing the semantic feature-based face super-resolution reconstruction method according to claim 1, characterized in that the system comprises a synthesis module (61), an amplification module (62), and an integration module (63), the synthesis module (61) being connected to the integration module (63) through the amplification module (62); the synthesis module synthesizes from a face a low-resolution face image containing complex noise and blur distributions, the amplification module coarsely amplifies the low-resolution face image to the target resolution, and the integration module integrates the semantic features and general features of the image to form the super-resolution face reconstruction result.
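The module chain in claim 9 can be sketched as a simple composition. This toy sketch uses dicts standing in for images and placeholder functions for the three modules; all names are illustrative, and the 8x scale factor is an arbitrary assumption:

```python
def synthesis_module(hr_face):
    # synthesize a degraded low-resolution face from an HR face
    return {"res": hr_face["res"] // 8, "stage": "LR_deg"}

def amplification_module(lr_face):
    # coarsely amplify the LR face to the target resolution
    return {"res": lr_face["res"] * 8, "stage": "SR_Coarse"}

def integration_module(sr_coarse):
    # integrate semantic and general features into the final SR face
    return {"res": sr_coarse["res"], "stage": "SR_Fine"}

hr = {"res": 128, "stage": "HR"}
sr = integration_module(amplification_module(synthesis_module(hr)))
```

The composition mirrors the claim: the synthesis module feeds the integration module only through the amplification module.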
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210426417.9A CN114820310A (en) | 2022-04-21 | 2022-04-21 | Semantic feature-based face super-resolution reconstruction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114820310A true CN114820310A (en) | 2022-07-29 |
Family
ID=82505319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210426417.9A Pending CN114820310A (en) | 2022-04-21 | 2022-04-21 | Semantic feature-based face super-resolution reconstruction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114820310A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115953296A (*) | 2022-12-09 | 2023-04-11 | 中山大学·深圳 | Face super-resolution reconstruction method and system based on combination of Transformer and convolutional neural network |
CN115953296B (*) | 2022-12-09 | 2024-04-05 | 中山大学·深圳 | Face super-resolution reconstruction method and system based on combination of Transformer and convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112287940A (en) | Semantic segmentation method of attention mechanism based on deep learning | |
CN110969124B (en) | Two-dimensional human body posture estimation method and system based on lightweight multi-branch network | |
CN110348330B (en) | Face pose virtual view generation method based on VAE-ACGAN | |
CN111179167B (en) | Image super-resolution method based on multi-stage attention enhancement network | |
CN112819910B (en) | Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network | |
CN113283444B (en) | Heterogeneous image migration method based on generation countermeasure network | |
CN113658040A (en) | Face super-resolution method based on prior information and attention fusion mechanism | |
CN111476133B (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
CN110188667B (en) | Face rectification method based on three-party confrontation generation network | |
CN112070668A (en) | Image super-resolution method based on deep learning and edge enhancement | |
CN109903373A (en) | A kind of high quality human face generating method based on multiple dimensioned residual error network | |
CN114820310A (en) | Semantic feature-based face super-resolution reconstruction method and system | |
CN110992374A (en) | Hair refined segmentation method and system based on deep learning | |
CN113935435A (en) | Multi-modal emotion recognition method based on space-time feature fusion | |
CN115631107A (en) | Edge-guided single image noise removal | |
CN115526777A (en) | Blind over-separation network establishing method, blind over-separation method and storage medium | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN116664435A (en) | Face restoration method based on multi-scale face analysis map integration | |
CN114187668B (en) | Face silence living body detection method and device based on positive sample training | |
CN116309213A (en) | High-real-time multi-source image fusion method based on generation countermeasure network | |
CN116258627A (en) | Super-resolution recovery system and method for extremely-degraded face image | |
CN115424337A (en) | Iris image restoration system based on priori guidance | |
CN116266336A (en) | Video super-resolution reconstruction method, device, computing equipment and storage medium | |
CN115131414A (en) | Unmanned aerial vehicle image alignment method based on deep learning, electronic equipment and storage medium | |
CN111950496B (en) | Mask person identity recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||