CN114820310A - Semantic feature-based face super-resolution reconstruction method and system


Info

Publication number
CN114820310A
Authority
CN
China
Prior art keywords
face
layer
resolution
reconstruction
network
Prior art date
Legal status
Pending
Application number
CN202210426417.9A
Other languages
Chinese (zh)
Inventor
金枝
齐浩然
张欢荣
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210426417.9A priority Critical patent/CN114820310A/en
Publication of CN114820310A publication Critical patent/CN114820310A/en
Pending legal-status Critical Current


Classifications

    • G06T3/4053 - Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 - Scaling of whole images or parts thereof using neural networks
    • G06N3/045 - Combinations of networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/088 - Non-supervised learning, e.g. competitive learning
    • G06V10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/82 - Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V40/162 - Human faces: detection, localisation or normalisation using pixel segmentation or colour matching
    • G06V40/171 - Human faces: local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships


Abstract

The invention discloses a semantic feature-based face super-resolution reconstruction method and system. The method comprises the following steps: step one, a quality degradation stage: the high-resolution faces of the training set are synthesized into low-resolution face images containing complex noise and blur distributions; step two, a generation stage: the low-resolution face image is coarsely amplified to the target resolution, semantic features are extracted from the coarse result, the semantic features and general features are integrated under a semantic attention module, and a super-resolution face reconstruction result is finally formed as the output of the network. The system comprises a synthesis module, an amplification module and an integration module. The invention improves the generalization capability of the FSR network model under multiple degradation modes and enhances the perceptual quality of the SR faces reconstructed under those modes.

Description

Semantic feature-based face super-resolution reconstruction method and system
Technical Field
The invention relates to a face super-resolution reconstruction method and system based on semantic features.
Background
Super-resolution reconstruction (SR) is an important area of image quality enhancement research. The technique recovers texture information from a low-resolution (LR) image to form a high-resolution (HR) image. Face super-resolution reconstruction (FSR) is an application branch of SR technology that aims to recover high-resolution face morphology from a low-resolution face. FSR technology can assist related high-level vision tasks, such as face recognition, face correction, security video analysis and other biometric applications.
Deep learning techniques have been widely explored for face super-resolution reconstruction. Building on research applying convolutional neural network structures to SR reconstruction of natural images, FSR methods have also advanced. Existing FSR methods can be divided into two major categories, supervised and unsupervised. Supervised FSR methods learn from paired LR-HR face data sets: an LR face from the training set is passed through a convolutional network that enhances its features and outputs a reconstructed SR face, which is supervised and optimized against the corresponding HR face in the training set. Unsupervised FSR methods rely on data augmentation or richer network functional relationships; they do not need paired LR-HR data sets and can learn from single face images in the training set, reducing the dependence of network model training on data compared with supervised methods.
In addition, compared with natural images, face images contain unique prior information, such as semantic maps, facial landmark points, edges and other biometric features, and this prior information plays a distinctive role in some FSR methods. Some existing FSR methods introduce such feature information and fuse it into the FSR reconstruction network as input, where it guides face reconstruction. Other methods try to predict the relevant prior information and establish constraints during face reconstruction, strengthening the expression of the prior information in the FSR network model. Prior information specific to face images can improve the directivity of an FSR method during face reconstruction and improve the reconstruction of face structure and related attributes.
In the prior art, supervised methods need paired LR-HR face data sets to support model training, where the LR face is obtained from a real face by a fixed down-sampling method; this single-degradation assumption cannot meet face reconstruction requirements under multiple degradation modes. In unsupervised learning, without a real reference face, the reconstructed SR face may be deformed or distorted, which affects the visual effect. Meanwhile, under multiple degradation modes, some prior information, such as facial landmark points and edges, becomes inaccurate or unavailable, so FSR models guided by it reconstruct poorly. Selecting reasonable face prior information that efficiently enhances the reconstruction performance of FSR models under multiple degradation modes is therefore a problem to be solved urgently in the field.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a face super-resolution reconstruction method and system based on semantic features.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a face super-resolution reconstruction method based on semantic features comprises the following steps:
step one, a quality degradation stage: synthesizing the high-resolution faces of the training set into low-resolution face images containing complex noise and blur distributions;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process coarsely amplifies the low-resolution face image to the target resolution; the fine reconstruction process extracts semantic features based on the result of the coarse reconstruction process, integrates the semantic features and general features under a semantic attention module, and finally forms a super-resolution face reconstruction result as the output of the network.
Preferably, the degradation phase comprises the steps of:
step S11, the high-resolution face image HR in the training set first passes through a residual network group; the residual network group comprises a plurality of residual network blocks and a plurality of pooling layers, each pooling layer located between two adjacent residual network blocks and performing down-sampling; when the high-resolution face image HR has been down-sampled to 4x4, a structure formed by a sub-pixel layer and related residual network blocks up-samples the image back to 16x16, and the result is finally integrated into a three-channel image as the degraded low-resolution face image LR_deg;
Step S12, to ensure that the identity characteristics of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by interpolated down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2;
Step S13, complex noise and blur information are introduced in the degradation stage so that the content distribution of the synthesized LR_deg is closer to faces captured in real environments; a generative adversarial network is therefore introduced, in which an LR discriminator D_LR distinguishes faces LR_real captured in real environments from the synthesized LR_deg, and the LR adversarial loss function L_adv-LR established by this adversarial process is as follows:
L_adv-LR = -Σ[D_LR(LR_deg)·log(D_LR(LR_real)) + (1 - D_LR(LR_deg))·log(1 - D_LR(LR_real))]
preferably, the residual network block includes a convolutional layer A, a convolutional layer B, a regularization layer C, an activation layer D, a convolutional layer E, a regularization layer F and an activation layer G; the convolutional layer A is connected to the regularization layer C through the convolutional layer B, the regularization layer C is connected to the convolutional layer E through the activation layer D, and the convolutional layer E is connected to the activation layer G through the regularization layer F.
Preferably, the LR discriminator D LR The spectrum regularization layer G is connected with the convolution layer G through the activation layer K, and the convolution layer L is connected with the activation layer N through the spectrum regularization layer M.
Preferably, the coarse reconstruction process is LR deg SR face SR with rough reconstruction under target resolution output on rough reconstruction network Coarss And establishing a related content loss function L pix-CoarssSR :L pix-CoarssSR =||SR Coarss -HR|| 2 Then based on SR Coarss Design thereinSemantic loss function L seg-CoarssSR :L seg-CoarssSR =||Seg(SR Coarss )-Seg(HR)|| 2 (ii) a The Seg (-) is a pre-trained human face semantic segmentation network, the network can output human face semantic segmentation prediction results under 19 channels, and the 19 channels respectively correspond to different human face components including eyes, a nose, eyebrows, hair and a head.
Preferably, in the fine reconstruction process, the general features and the semantic features of the rough reconstructed face are integrated and enhanced through a fine reconstruction network, and finally a face super-resolution reconstruction result is output.
Preferably, the fine reconstruction network is composed of a convolutional layer M, a residual network group, a semantic feature attention network, and an integrated convolutional layer.
Preferably, the fine reconstruction process comprises the following steps:
step A21, general features are extracted from the input SR_Coarse by general convolutional layers, fused with the corresponding semantic feature channels, and passed into a semantic feature attention network, producing mixed features containing attention;
step A22, the mixed features containing attention enter a residual network group that deepens the feature expression, then enter an integration convolutional layer, where they are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage;
Step A23, generating the SR of the phase output Fins Establishing constraints, SR at first Fine Establishing a content loss function L between standard images HR in the same training set pix-FineSR :L pix-FineSR =||SR Fine -HR|| 2
Step A24, SR to be reconstructed Fine Passes through SR discriminator D SR Integrates the features and outputs a 16x16 size tensor whose values represent D SR The reconstruction quality of the corresponding region has a value in the range of 0-1. The closer to 1 represents the better perceived quality of the reconstruction, and the SR penalty function L of the SR face thus established adv-SR The following were used:
L adv-SR =-σ[D SR (SR Fine log(D SR (SR))+(1-D SR (SR Fine ))log(1-D SR (HR))]。
a semantic feature-based face super-resolution reconstruction system comprises a synthesis module, an amplification module and an integration module, the synthesis module connected to the integration module through the amplification module; the synthesis module synthesizes a low-resolution face image containing complex noise and blur distribution from a face, the amplification module coarsely amplifies the low-resolution face image to the target resolution, and the integration module integrates the semantic features and general features of the image to form the super-resolution face reconstruction result.
The invention has the following beneficial effects. The framework of the invention is an unsupervised model in which the face quality degradation stage and generation stage are trained jointly, which overcomes the limitation of the data set in training and improves the generalization of the model under multiple degradation modes. In the degradation stage, a degradation network is designed to learn the image degradation process, so the synthesized LR faces contain rich noise and blur information, removing the data-set limitation. In the generation stage, face semantic features are integrated in a simple-to-complex manner, a channel attention mechanism strengthens the expression of the semantic features in the deep reconstruction network, and reasonable face prior information enhances the visual effect of the reconstructed face, solving the problem that face component contours easily distort and deform without supervision. Experiments show that the method performs excellently in face super-resolution reconstruction under multiple degradations, with a clear and accurate visual perception effect. By combining an unsupervised learning framework with semantic feature prior information, the semantic feature-based face super-resolution reconstruction method for multiple degradation modes improves the generalization capability of the FSR network model under those modes while enhancing the perceptual quality of the reconstructed SR faces.
Drawings
FIG. 1 is a flowchart of the process of the present invention;
FIG. 2 is a schematic diagram of a residual block;
FIG. 3 is a schematic structural diagram of the LR discriminator D_LR;
FIG. 4 is a schematic diagram of a fine reconstruction network;
fig. 5 is a block diagram of a system.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings of the specification:
as shown in fig. 1, a face super-resolution reconstruction method based on semantic features includes the following steps:
step one, a quality degradation stage: synthesizing the high-resolution faces of the training set into low-resolution face images containing complex noise and blur distributions;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process coarsely amplifies the low-resolution face image to the target resolution; the fine reconstruction process extracts semantic features based on the result of the coarse reconstruction process, integrates the semantic features and general features under a semantic attention module, and finally forms a super-resolution face reconstruction result as the output of the network.
In the quality degradation stage, the high-resolution faces of the training set are synthesized into low-resolution faces containing complex noise and blur distributions, which serve as the input of the generation stage. In the generation stage, the low-resolution image is coarsely amplified to the target resolution, semantic features are extracted from the coarse result, the semantic features and general features are then integrated under a semantic attention module, and a super-resolution face reconstruction result is finally formed as the output of the network.
The degradation phase comprises the following steps:
S11, the high-resolution face image HR in the training set first passes through a residual network group; the residual network group comprises a plurality of residual network blocks and a plurality of pooling layers, each pooling layer located between two adjacent residual network blocks and performing down-sampling; when the high-resolution face image HR has been down-sampled to 4x4, a structure formed by a sub-pixel layer and related residual network blocks up-samples the image back to 16x16, and the result is finally integrated into a three-channel image as the degraded low-resolution face image LR_deg.
Step S12, to ensure that the identity characteristics of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by interpolated down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2.
Step S13, complex noise and blur information are introduced in the degradation stage so that the content distribution of the synthesized LR_deg is closer to faces captured in real environments; a generative adversarial network is therefore introduced, in which an LR discriminator D_LR distinguishes faces LR_real captured in real environments from the synthesized LR_deg. In the degradation stage, after the synthesized LR_deg passes through D_LR and the integrated features are extracted, a single-channel tensor of size 16x16 is output. Each value in the tensor lies between 0 and 1 and represents the quality distribution of the low-resolution image region at the corresponding position; the closer a value is to 1, the better the degradation quality of the corresponding region. The LR adversarial loss function L_adv-LR established by the generative adversarial process is as follows:
L_adv-LR = -Σ[D_LR(LR_deg)·log(D_LR(LR_real)) + (1 - D_LR(LR_deg))·log(1 - D_LR(LR_real))]
the LR output in the degradation stage under the combined action of the content loss function and the LR counter-loss function deg The distribution of (a) is more complex and diverse than a single interpolation downsampling. The synthesized low-resolution face image is used as the input of a subsequent generation stage, and the generalization and reconstruction capability of the FSR model under the multi-degradation mode can be effectively improved.
As shown in fig. 2, the residual network block includes convolutional layer a1, convolutional layer B2, regularization layer C3, active layer D4, convolutional layer E5, regularization layer F6, and active layer G7, where convolutional layer a1 is connected to regularization layer C3 through convolutional layer B2, regularization layer C3 is connected to convolutional layer E5 through active layer D4, and convolutional layer E5 is connected to active layer G7 through regularization layer F6.
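The layer ordering of the Fig. 2 residual block can be transcribed as the following sketch; the kernel size, channel width, the use of BatchNorm and ReLU for the regularization and activation layers, and the skip connection are assumptions:

```python
import torch
import torch.nn as nn

class Fig2ResidualBlock(nn.Module):
    """Residual network block in the Fig. 2 layer order:
    conv A -> conv B -> regularization C -> activation D ->
    conv E -> regularization F -> activation G."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),  # convolutional layer A (a1)
            nn.Conv2d(ch, ch, 3, padding=1),  # convolutional layer B (b2)
            nn.BatchNorm2d(ch),               # regularization layer C (c3)
            nn.ReLU(inplace=True),            # activation layer D (d4)
            nn.Conv2d(ch, ch, 3, padding=1),  # convolutional layer E (e5)
            nn.BatchNorm2d(ch),               # regularization layer F (f6)
            nn.ReLU(inplace=True),            # activation layer G (g7)
        )

    def forward(self, x):
        return x + self.body(x)               # assumed residual skip connection
```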
As shown in fig. 3, the LR discriminator D_LR comprises an activation layer H8, a convolutional layer I9, a spectral regularization layer G10, an activation layer K11, a convolutional layer L12, a spectral regularization layer M13 and an activation layer N14; the activation layer H8 is connected to the spectral regularization layer G10 through the convolutional layer I9, the spectral regularization layer G10 is connected to the convolutional layer L12 through the activation layer K11, and the convolutional layer L12 is connected to the activation layer N14 through the spectral regularization layer M13.
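The Fig. 3 discriminator can likewise be sketched; since spectral normalization is in practice a wrapper on a convolution's weights, each spectral regularization layer is modelled here as a spectrally normalized convolution, and the LeakyReLU slopes, channel widths and final Sigmoid (so that the quality map lies in [0, 1]) are assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_lr_discriminator(in_ch=3, ch=64):
    """Sketch of D_LR following the Fig. 3 layer order."""
    return nn.Sequential(
        nn.LeakyReLU(0.2),                               # activation layer H8
        nn.Conv2d(in_ch, ch, 3, padding=1),              # convolutional layer I9
        spectral_norm(nn.Conv2d(ch, ch, 3, padding=1)),  # spectral regularization G10
        nn.LeakyReLU(0.2),                               # activation layer K11
        nn.Conv2d(ch, ch, 3, padding=1),                 # convolutional layer L12
        spectral_norm(nn.Conv2d(ch, 1, 3, padding=1)),   # spectral regularization M13
        nn.Sigmoid(),                                    # activation layer N14
    )
```

A 16x16 LR input yields the single-channel 16x16 quality map described in step S13.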
In the coarse reconstruction process, LR_deg is passed through the coarse reconstruction network, which outputs the coarsely reconstructed SR face SR_Coarse at the target resolution. The coarse reconstruction network consists of 11 residual network blocks divided into two groups in a ratio of 8:3, with a bilinear-interpolation up-sampling layer at the tail of each group. For the network output SR_Coarse, a related content loss function L_pix-CoarseSR is established: L_pix-CoarseSR = ||SR_Coarse - HR||_2; a semantic loss function L_seg-CoarseSR is then designed based on SR_Coarse: L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2.
Seg(·) is a pre-trained face semantic segmentation network that outputs face semantic segmentation predictions over 19 channels, the channels corresponding to different face components including the eyes, nose, eyebrows, hair and head. The semantic segmentation predicted from SR_Coarse serves as semantic prior information in the subsequent fine reconstruction stage.
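The two coarse-stage losses can be written directly; the seg_net argument below stands in for the pre-trained 19-channel face parsing network Seg(·), which the patent does not identify:

```python
import torch

def coarse_losses(sr_coarse, hr, seg_net):
    """L_pix-CoarseSR = ||SR_Coarse - HR||_2 and
    L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2,
    with the L2 norm taken over all pixels."""
    l_pix = torch.norm(sr_coarse - hr, p=2)
    l_seg = torch.norm(seg_net(sr_coarse) - seg_net(hr), p=2)
    return l_pix, l_seg
```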
In the fine reconstruction process, the general features and the semantic features of the rough reconstructed face are integrated and enhanced through a fine reconstruction network, and finally a face super-resolution reconstruction result is output.
As shown in fig. 4, the fine reconstruction network consists of convolutional layer M15, residual network group 16, semantic feature attention network 17, and integrated convolutional layer 18.
In the fine reconstruction process, the fine reconstruction network integrates and enhances the general features and semantic features of the coarsely reconstructed face and finally outputs the face super-resolution reconstruction result. As shown in the corresponding area of FIG. 1, the fine reconstruction network consists of a general convolutional layer, a residual network group, a semantic feature attention network and an integration convolutional layer (a convolutional layer that integrates the information of all channels); one semantic feature attention network and four residual network blocks together form a group with an overall residual relation. General features are extracted from the input SR_Coarse by the general convolutional layers, fused with the corresponding semantic feature channels, and passed into the semantic feature attention network designed here. After the two kinds of face features input to the network are fused, a channel attention operation produces mixed features containing attention. The attention mechanism effectively distinguishes the important face features and strengthens the influence of the semantic features among the general features in the reconstruction network. The mixed features then enter the residual network group, which deepens the feature expression. After enhancement by the residual network group and the semantic feature attention network, the mixed features enter the integration convolutional layer and are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage.
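The fusion-plus-channel-attention step can be sketched as follows; the squeeze-and-excitation style attention and the reduction ratio are assumptions, as the patent only states that a channel attention operation produces the mixed features:

```python
import torch
import torch.nn as nn

class SemanticAttentionFusion(nn.Module):
    """Fuse general features with the 19 semantic feature channels,
    then re-weight channels with an SE-style attention branch."""
    def __init__(self, gen_ch=64, seg_ch=19, reduction=8):
        super().__init__()
        self.fuse = nn.Conv2d(gen_ch + seg_ch, gen_ch, 1)  # fuse the two feature types
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: per-channel statistic
            nn.Conv2d(gen_ch, gen_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(gen_ch // reduction, gen_ch, 1),
            nn.Sigmoid(),                                  # per-channel attention weights
        )

    def forward(self, general, semantic):
        x = self.fuse(torch.cat([general, semantic], dim=1))
        return x * self.attn(x)                            # mixed features with attention
```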
The fine reconstruction process comprises the following steps:
step A21, general features are extracted from the input SR_Coarse by general convolutional layers, fused with the corresponding semantic feature channels, and passed into the semantic feature attention network, producing mixed features containing attention;
step A22, the mixed features containing attention enter the residual network group that deepens the feature expression, then enter the integration convolutional layer, where they are converted into a 3-channel image as the face reconstruction result SR_Fine finally output by the generation stage;
Step A23, generating the SR of the phase output Fins Establishing constraints, SR at first Fine Establishing a content loss function L between standard images HR in the same training set pix-FineSR :L pix-FineSR =||SR Fine -HR|| 2
Step A24, SR to be reconstructed Fine Passes through SR discriminator D SR Integrates the features and outputs a 16x16 size tensor whose values represent D SR The closer to 1 the reconstruction quality of the corresponding region represents the better the reconstruction perception quality, and the SR antithetical loss function of the SR face is established therebyNumber L adv-SR The following were used: l is adv-SR =-σ[D SR (SR Fine log(D SR (HR))+(1-D SR (SR Fine ))log(1-D SR (HR))]。
As shown in fig. 5, the system implementing the semantic feature-based face super-resolution reconstruction method of the above steps comprises a synthesis module 61, an amplification module 62 and an integration module 63, the synthesis module 61 connected to the integration module 63 through the amplification module 62; the synthesis module synthesizes a low-resolution face image containing complex noise and blur distribution from a face, the amplification module coarsely amplifies the low-resolution face image to the target resolution, and the integration module integrates the semantic features and general features of the image to form the super-resolution face reconstruction result.
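The wiring of the three modules can be sketched as a simple pipeline; the toy stand-in networks below are placeholders for illustration, not the patent's actual sub-networks:

```python
import torch
import torch.nn.functional as F

class FaceSRSystem(torch.nn.Module):
    """Fig. 5 module wiring: synthesis (61) -> amplification (62) -> integration (63)."""
    def __init__(self, synthesis, amplification, integration):
        super().__init__()
        self.synthesis = synthesis          # HR face -> degraded LR face
        self.amplification = amplification  # LR -> coarse SR at target resolution
        self.integration = integration      # coarse SR + semantics -> SR_Fine

    def forward(self, hr):
        lr_deg = self.synthesis(hr)
        sr_coarse = self.amplification(lr_deg)
        return self.integration(sr_coarse)

# Toy stand-ins: 4x average-pool "degradation", bilinear 4x up-sampling, identity.
system = FaceSRSystem(
    synthesis=lambda x: F.avg_pool2d(x, 4),
    amplification=lambda x: F.interpolate(x, scale_factor=4, mode="bilinear",
                                          align_corners=False),
    integration=lambda x: x,
)
```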
Under the combined action of the content loss function and the SR adversarial loss function, the final reconstruction result SR_Fine output by the generation stage is constrained in both content and perceptual quality. In summary, by combining an unsupervised learning framework with semantic feature prior information, the semantic feature-based face super-resolution reconstruction method for multiple degradation modes improves the generalization capability of the FSR network model under those modes while enhancing the perceptual quality of the reconstructed SR faces.
The innovation of the invention lies in introducing face semantic features within an unsupervised learning mode, enhancing the generalization capability of the FSR network model when reconstructing under multiple degradation modes, and showing that semantic features introduced as prior information in an unsupervised method remain stable and perform excellently under multiple degradation modes. The invention provides an unsupervised face super-resolution reconstruction network that enhances generalization, adapts to face reconstruction requirements under multiple degradation modes, and introduces face semantic features as a prior to improve visual perceptual quality.
By using an unsupervised network model and introducing face semantic features as prior information, the invention accomplishes the face super-resolution reconstruction task under multiple degradation modes and removes the limitation that paired LR-HR face data sets must be used in supervised methods. A network model trained under the framework of the invention can adapt to face super-resolution reconstruction requirements under various degradation conditions and improves the visual perception of the reconstructed faces. The model is suitable for assisting video security inspection, small-size face recognition, face verification and other related high-level vision tasks in real environments.
It should be noted that the above description presents only one specific embodiment of the present invention. The invention is clearly not limited to this embodiment; many variations that a person skilled in the art can derive or infer directly from the disclosure of the invention are all considered to fall within the protection scope of the invention.

Claims (9)

1. A face super-resolution reconstruction method based on semantic features is characterized by comprising the following steps:
step one, a quality degradation stage: synthesizing the high-resolution faces of the training set into low-resolution face images containing complex noise and blur distributions;
step two, a generation stage: the generation stage comprises a coarse reconstruction process and a fine reconstruction process; the coarse reconstruction process coarsely magnifies the low-resolution face image to the target resolution, and the fine reconstruction process extracts semantic features from the result of the coarse reconstruction, integrates the semantic features with the general features in a semantic attention module, and finally forms the super-resolution face reconstruction result as the output of the network.
2. The semantic feature-based face super-resolution reconstruction method according to claim 1, wherein the quality degradation stage comprises the following steps:
step S11, the high-resolution face image HR in the training set first passes through a residual network group; the residual network group comprises a plurality of residual network blocks and a plurality of pooling layers, each pooling layer lying between two adjacent residual network blocks and performing down-sampling; when the image has been down-sampled to 4x4, it is up-sampled back to 16x16 by a structure formed by a sub-pixel layer and associated residual network blocks, and finally integrated into a three-channel image as the degraded low-resolution face image LR_deg;
step S12, to ensure that the identity features of LR_deg are not disturbed, a content loss function L_pix-LR is established between LR_deg and the standard image LR_bic obtained by bicubic-interpolation down-sampling: L_pix-LR = ||LR_deg - LR_bic||_2;
step S13, complex noise and blur information is introduced in the degradation stage so that the content distribution of the synthesized LR_deg is closer to that of faces captured in the real environment; to this end a generative adversarial network is introduced, in which an LR discriminator D_LR distinguishes the face LR_real captured in the real environment from the synthesized LR_deg, and the LR adversarial loss function L_adv-LR established by this adversarial process is as follows:
L_adv-LR = -σ[D_LR(LR_deg)log(D_LR(LR_real)) + (1 - D_LR(LR_deg))log(1 - D_LR(LR_real))]
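A minimal numeric sketch of the two degradation-stage losses follows: the L2 content loss of step S12 and an adversarial loss for step S13. The binary cross-entropy form below is the conventional GAN objective, assumed here as a reading of L_adv-LR; all arrays are random stand-ins for the actual images and discriminator scores.

```python
import numpy as np

def l2_loss(a, b):
    """Content loss L_pix-LR: L2 distance between LR_deg and the bicubic
    reference LR_bic (random arrays stand in for both images here)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def bce_adv_loss(d_real, d_fake, eps=1e-8):
    """Conventional binary cross-entropy GAN objective, assumed as an
    interpretation of L_adv-LR: the discriminator should score real LR
    faces near 1 and synthesized LR_deg near 0."""
    return float(-np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))

lr_deg = np.random.rand(16, 16)   # synthesized degraded LR
lr_bic = np.random.rand(16, 16)   # bicubic down-sampled standard image
d_real = np.array([0.9, 0.8])     # D_LR scores on real captured LR faces
d_fake = np.array([0.1, 0.2])     # D_LR scores on synthesized LR_deg
print(l2_loss(lr_deg, lr_bic), bce_adv_loss(d_real, d_fake))
```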
3. The semantic-feature-based face super-resolution reconstruction method according to claim 2, wherein the residual network block comprises a convolutional layer A (1), a convolutional layer B (2), a regularization layer C (3), an activation layer D (4), a convolutional layer E (5), a regularization layer F (6) and an activation layer G (7); the convolutional layer A (1) is connected with the regularization layer C (3) through the convolutional layer B (2), the regularization layer C (3) is connected with the convolutional layer E (5) through the activation layer D (4), and the convolutional layer E (5) is connected with the activation layer G (7) through the regularization layer F (6).
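The layer ordering of claim 3 can be sketched as follows. The conv, norm and act functions are toy stand-ins (a per-pixel scaling, a normalization and a LeakyReLU), not real 3x3 convolutions, and the identity skip connection is an assumption implied by the name "residual" rather than spelled out in the claim.

```python
import numpy as np

def conv(x, w):   # toy per-pixel linear map standing in for a 3x3 convolution
    return x * w

def norm(x):      # stand-in for the regularization (normalization) layers
    return (x - x.mean()) / (x.std() + 1e-8)

def act(x):       # LeakyReLU-style activation
    return np.where(x > 0.0, x, 0.2 * x)

def residual_block(x):
    """Layer order of claim 3: conv A -> conv B -> regularization C ->
    activation D -> conv E -> regularization F -> activation G, closed
    by an assumed identity skip connection."""
    y = conv(x, 1.1)    # convolutional layer A (1)
    y = conv(y, 0.9)    # convolutional layer B (2)
    y = act(norm(y))    # regularization layer C (3) + activation layer D (4)
    y = conv(y, 1.0)    # convolutional layer E (5)
    y = act(norm(y))    # regularization layer F (6) + activation layer G (7)
    return x + y        # residual skip connection

out = residual_block(np.random.rand(8, 8))
```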
4. The semantic feature-based face super-resolution reconstruction method of claim 2, wherein the LR discriminator D_LR comprises an activation layer H (8), a convolutional layer I (9), a spectrum regularization layer G (10), an activation layer K (11), a convolutional layer L (12), a spectrum regularization layer M (13) and an activation layer N (14); the activation layer H (8) is connected with the spectrum regularization layer G (10) through the convolutional layer I (9), the spectrum regularization layer G (10) is connected with the convolutional layer L (12) through the activation layer K (11), and the convolutional layer L (12) is connected with the activation layer N (14) through the spectrum regularization layer M (13).
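The spectrum regularization layers in the discriminator of claim 4 correspond to spectral normalization, whose core operation can be sketched with power iteration. This is a generic NumPy illustration of the standard technique, not the patent's implementation; the iteration count and matrix size are arbitrary choices.

```python
import numpy as np

def spectral_norm(w, n_iter=50):
    """Power iteration estimating the largest singular value of a weight
    matrix, then rescaling so the layer is roughly 1-Lipschitz: the
    standard operation behind spectral normalization of a discriminator."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v          # estimated top singular value
    return w / sigma

w = np.random.default_rng(1).standard_normal((4, 4))
w_sn = spectral_norm(w)
print(np.linalg.norm(w_sn, 2))  # close to 1.0 after normalization
```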
5. The semantic feature-based face super-resolution reconstruction method of claim 1, wherein in the coarse reconstruction process, LR_deg passes through a coarse reconstruction network that outputs a coarsely reconstructed SR face SR_Coarse at the target resolution, and a content loss function L_pix-CoarseSR is established: L_pix-CoarseSR = ||SR_Coarse - HR||_2; a semantic loss function L_seg-CoarseSR is then designed based on SR_Coarse: L_seg-CoarseSR = ||Seg(SR_Coarse) - Seg(HR)||_2, where Seg(·) is a pre-trained face semantic segmentation network that outputs face semantic segmentation predictions over 19 channels, the channels corresponding to different face components including the eyes, nose, eyebrows, hair and head.
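The semantic loss of claim 5 compares the 19-channel parsing maps of SR_Coarse and HR. A minimal sketch follows; Seg(·) itself is a pre-trained network, so random tensors stand in for its outputs here, and the 128x128 resolution is an assumption.

```python
import numpy as np

def semantic_loss(seg_sr, seg_hr):
    """Semantic loss L_seg-CoarseSR: L2 distance between the 19-channel
    face parsing outputs Seg(SR_Coarse) and Seg(HR)."""
    return float(np.sqrt(np.sum((seg_sr - seg_hr) ** 2)))

rng = np.random.default_rng(0)
seg_sr = rng.random((19, 128, 128))  # Seg(SR_Coarse): one channel per face part
seg_hr = rng.random((19, 128, 128))  # Seg(HR)
print(semantic_loss(seg_sr, seg_hr))
```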
6. The method for reconstructing a super-resolution face based on semantic features of claim 5, wherein in the fine reconstruction process, the general features and semantic features of the coarsely reconstructed face are integrated and enhanced through a fine reconstruction network, and the face super-resolution reconstruction result is finally output.
7. The super-resolution face reconstruction method based on semantic features of claim 6, wherein the fine reconstruction network is composed of a convolutional layer M (15), a residual network group (16), a semantic feature attention network (17) and an integrated convolutional layer (18).
8. The semantic feature-based face super-resolution reconstruction method according to claim 7, wherein the fine reconstruction process comprises the following steps:
step A21, the input SR_Coarse passes through a general convolutional layer to extract general features, which are fused with the corresponding semantic feature channels and fed together into the semantic feature attention network, yielding mixed features containing attention;
step A22, enabling the mixed features containing attention to enter a residual error network group deepening feature expression, then entering an integrated convolution layer, and then converting the integrated convolution layer into a 3-channel image serving as a face reconstruction result SR finally output in a generation stage Fins
step A23, constraints are established on the SR_Fine output by the generation stage; first, a content loss function L_pix-FineSR is established between SR_Fine and the standard image HR in the same training set: L_pix-FineSR = ||SR_Fine - HR||_2;
step A24, the reconstructed SR_Fine passes through an SR discriminator D_SR, which integrates the features and outputs a 16x16 tensor whose values range from 0 to 1, representing D_SR's judgment of the reconstruction quality of the corresponding regions (the closer a value is to 1, the better the perceptual quality of the reconstruction); the SR adversarial loss function L_adv-SR of the SR face is established as follows:
L_adv-SR = -σ[D_SR(SR_Fine)log(D_SR(HR)) + (1 - D_SR(SR_Fine))log(1 - D_SR(HR))].
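The 16x16 patch score map of step A24 can be illustrated with a generator-side adversarial term: each entry of the map rates one region of SR_Fine, and pushing every entry toward 1 lowers the loss. The BCE form below is a conventional sketch assumed for illustration, not the patent's exact L_adv-SR formula, and the score map is a constant stand-in.

```python
import numpy as np

def patch_adv_loss(d_map, eps=1e-8):
    """Generator-side adversarial term over the 16x16 patch score map
    output by D_SR: each entry rates one region of SR_Fine in [0, 1],
    and the generator pushes every entry toward 1."""
    return float(-np.mean(np.log(d_map + eps)))

d_map = np.full((16, 16), 0.8)  # D_SR's per-region quality scores on SR_Fine
print(patch_adv_loss(d_map))    # shrinks as the scores approach 1
```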
9. A system for the semantic feature-based face super-resolution reconstruction method according to claim 1, wherein the system comprises a synthesis module (61), an amplification module (62) and an integration module (63); the synthesis module (61) is connected to the integration module (63) through the amplification module (62); the synthesis module is used to synthesize low-resolution face images containing complex noise and blur distributions, the amplification module is used to coarsely magnify the low-resolution face image to the target resolution, and the integration module is used to integrate the semantic features and general features of the image, thereby forming the super-resolution face reconstruction result.
CN202210426417.9A 2022-04-21 2022-04-21 Semantic feature-based face super-resolution reconstruction method and system Pending CN114820310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210426417.9A CN114820310A (en) 2022-04-21 2022-04-21 Semantic feature-based face super-resolution reconstruction method and system


Publications (1)

Publication Number Publication Date
CN114820310A true CN114820310A (en) 2022-07-29

Family

ID=82505319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210426417.9A Pending CN114820310A (en) 2022-04-21 2022-04-21 Semantic feature-based face super-resolution reconstruction method and system

Country Status (1)

Country Link
CN (1) CN114820310A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953296A (en) * 2022-12-09 2023-04-11 中山大学·深圳 Transform and convolutional neural network combined based face super-resolution reconstruction method and system
CN115953296B (en) * 2022-12-09 2024-04-05 中山大学·深圳 Face super-resolution reconstruction method and system based on combination of transducer and convolutional neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination