CN114418872A - Real image aesthetic feeling enhancing method based on mGANPrior


Info

Publication number
CN114418872A
Authority
CN
China
Prior art keywords
image
aesthetic
enhanced
real image
Prior art date
Legal status
Pending
Application number
CN202111627418.1A
Other languages
Chinese (zh)
Inventor
张桦
苟若芸
张灵均
吴以凡
许艳萍
叶挺聪
包尔权
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202111627418.1A
Publication of CN114418872A

Classifications

    • G06T5/77
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T7/10 Segmentation; Edge detection
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a real image aesthetic enhancement method based on mGANPrior. For a real image whose aesthetic quality is to be enhanced, a PGGAN pre-trained generative model of the corresponding category is selected, and the type of aesthetic effect to be enhanced is determined; semantic segmentation is performed on the real image with the cascade segmentation module method; an inverse-mapped image is obtained with the mGANPrior method; according to the aesthetic style to be enhanced, corresponding degradation transformations are applied to the real image and the inverse-mapped image, the loss is computed, and the final latent vectors and the image I_enh with the enhanced aesthetic style are obtained through gradient-descent optimization. The method achieves aesthetic enhancement of real images, preserving the original information of the image to the greatest extent while applying a controllable aesthetic style modification. The invention proposes a loss function based on degradation transformations designed around aesthetic factors, and can produce an aesthetic blur effect on a real image.

Description

Real image aesthetic feeling enhancing method based on mGANPrior
Technical Field
The invention relates to the fields of generative adversarial network (GAN) inversion, image semantic segmentation, and image aesthetic enhancement, and in particular to inverse-mapping a real image into the latent space of a generative adversarial network and constructing aesthetic degradation transformations for loss computation for two aesthetic styles, so as to enhance the aesthetic quality of the real image.
Background
Generative adversarial networks (GANs) have made great breakthroughs in recent years, and researchers have developed a large number of excellent GAN-based derivative generative models such as PGGAN, StyleGAN, and BigGAN. PGGAN was the first generative model to propose a progressive training method for obtaining high-quality generated images; through progressive training it can ultimately produce 1024×1024 high-quality images. PGGAN provides pre-trained generative models for a number of scene categories, such as churches, towers, bridges, and bedrooms.
A GAN model maps a latent space to an image space, and GAN inversion is the inverse process, i.e., it establishes a mapping from the image space to the latent space. GAN inversion aims to map a real image back into the latent space of a pre-trained GAN model, so that the image can then be reconstructed by the generative model from the inverted latent code. mGANPrior is a GAN inversion method that reconstructs real images with high quality and achieves excellent inversion results on generative models of various indoor and outdoor scenes. In addition, before computing the loss between the generated image and the real image, mGANPrior applies different degradation transformations to the images, which enables image-processing tasks on real images such as colorization, super-resolution reconstruction, and image inpainting.
Semantic segmentation is a deep learning task that associates a label or class with every pixel of an image; it is used to identify the sets of pixels that make up distinguishable categories. For example, a bridge landscape may require identifying bridges, streams, grass, trees, sky, and so on. The cascade segmentation module method is a general semantic segmentation method proposed in the paper "Scene Parsing through ADE20K Dataset"; it parses a scene into stuff, objects, and object parts in a cascaded manner and can therefore be applied to semantic segmentation of different scenes.
Based on the above background, if degradation transformations can be designed for aesthetic styles, aesthetic enhancement of real images based on GAN inversion becomes possible, provided that the degradation transformation is known and differentiable; for example, the degradation transformations corresponding to image colorization, super-resolution reconstruction, and image inpainting are graying, downsampling, and image cropping, respectively. Current research offers no established transformation method for aesthetic factors, so the invention focuses on designing degradation transformations for aesthetic styles and thereby realizing GAN-based aesthetic enhancement of real images.
Disclosure of Invention
In view of the above problems, the invention provides a real image aesthetic enhancement method based on mGANPrior. Different degradation transformations are designed for two blur-type aesthetic styles, shallow depth of field and motion blur, realizing blur-aesthetic enhancement of real images. The technical scheme of the invention comprises the following steps:
Step 1: select a real image I whose aesthetic quality is to be enhanced, select a PGGAN pre-trained generative model of the corresponding category, and determine whether shallow-depth-of-field or motion-blur aesthetic enhancement is to be performed on the image.
Step 2: perform semantic segmentation on the real image I with the cascade segmentation module method and extract the subject pixels. Different binary matrices m are obtained according to the aesthetic effect to be enhanced (shallow depth of field or motion blur).
Step 3: obtain an inverse-mapped image I_inv with the mGANPrior method.
Step 4: according to the aesthetic style to be enhanced, apply the corresponding degradation transformation to the real image I and the inverse-mapped image I_inv and compute the loss; then optimize the latent vectors z_i (i ∈ {1, …, n}) by gradient descent, feed the optimized z_i back in as input, obtain a new inverse-mapped image I_inv with the mGANPrior method of step 3, and compute the loss again. Training stops once the loss shows no downward trend for ten consecutive iterations, yielding the final latent vectors z_i and the image I_enh with the enhanced aesthetic style. The loss function combines mean squared error (MSE) loss with perceptual feature reconstruction loss.
Further, the specific method in step 3 is as follows:
With the mGANPrior method, the generation network of the generative model selected in step 1 is split into two parts at a designated layer: the designated layer and the layers before it form the front network G1, and all layers after it form the rear network G2. The designated layer is chosen freely according to need, and the inversion quality is positively correlated with the layer depth. n latent vectors z_i (i ∈ {1, …, n}) are generated at random and used as the input of the front network G1 to obtain n feature maps; the feature maps are combined based on the adaptive channel importance and input into the rear network G2 to obtain the generated image I_inv.
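As a concrete illustration, the following minimal PyTorch sketch implements the split-and-recombine forward pass described above. It assumes the pre-trained PGGAN generator is available as an ordered list of layers (generator_layers); the latent dimension of 512 and the storage of the channel-importance weights alpha are illustrative assumptions, not part of the patent text. In mGANPrior, alpha is optimized jointly with the latent vectors.

import torch

def mganprior_forward(generator_layers, z, alpha, split_idx=8):
    # z: (n, latent_dim) latent vectors; alpha: (n, C) adaptive channel-importance
    # weights, where C is the channel count of the feature maps at the split layer.
    feats = []
    for zi in z:                                   # front network G1
        x = zi.view(1, -1, 1, 1)
        for layer in generator_layers[:split_idx]:
            x = layer(x)
        feats.append(x)                            # one feature map per latent vector
    f = torch.stack(feats, dim=0)                  # (n, 1, C, H, W)
    w = alpha.view(alpha.shape[0], 1, -1, 1, 1)    # (n, 1, C, 1, 1)
    x = (f * w).sum(dim=0)                         # combine by adaptive channel importance
    for layer in generator_layers[split_idx:]:     # rear network G2
        x = layer(x)
    return x                                       # inverse-mapped image I_inv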
Further, the specific method in step 4 is as follows:
(1) Apply an image degradation transformation to the real image I and the inverse-mapped image I_inv to obtain X and X_inv.
For shallow-depth-of-field aesthetic enhancement, the following shallow-depth-of-field degradation transformation is performed: according to m, the pixels of the background part of the real image I and of the subject part of the inverse-mapped image I_inv are downsampled. The degradation transformation equations are:
X = I*m + down(I*(1-m)) #(1)
X_inv = down(I_inv*m) + I_inv*(1-m) #(2)
For motion-blur aesthetic enhancement, the following motion-blur degradation transformation is performed: according to m, the pixels of the subject part of the real image I are downsampled. The degradation transformation equations are:
X = down(I*m) + I*(1-m) #(3)
X_inv = I_inv #(4)
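A minimal sketch of the two degradation transformations follows, under the assumption that down(·) downsamples and then upsamples back to the original resolution (the patent only states that the pixels are downsampled; resampling back is assumed here so that the masked terms can be summed elementwise):

import torch.nn.functional as F

def down(x, factor=4):
    # Average-pool downsample, then bilinear upsample back to the input size,
    # which acts as a blur on the masked region. x: (N, C, H, W) tensor.
    h, w = x.shape[-2:]
    x = F.avg_pool2d(x, factor)
    return F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False)

def degrade_shallow_dof(I, I_inv, m):
    X = I * m + down(I * (1 - m))              # eq. (1): blur the real background
    X_inv = down(I_inv * m) + I_inv * (1 - m)  # eq. (2): blur the inverted subject
    return X, X_inv

def degrade_motion_blur(I, I_inv, m):
    X = down(I * m) + I * (1 - m)              # eq. (3): blur the real subject
    return X, I_inv                            # eq. (4): X_inv = I_inv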
(2) Compute the MSE and perceptual feature loss between X and X_inv:
L(X, X_inv) = ||X - X_inv||₂² + ||φ(X) - φ(X_inv)||₁
where ||X - X_inv||₂² is the MSE between X and X_inv, φ(·) is a perceptual feature extractor, φ(X) and φ(X_inv) are the perceptual features of X and X_inv respectively, and ||φ(X) - φ(X_inv)||₁ is the L1 distance between φ(X) and φ(X_inv).
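A minimal sketch of this loss, assuming VGG16 convolutional features as the perceptual extractor φ (the patent does not name a specific extractor; ImageNet input normalization is omitted for brevity):

import torch
from torchvision.models import vgg16

_vgg = vgg16(weights='IMAGENET1K_V1').features[:16].eval()  # conv features up to relu3_3
for p in _vgg.parameters():
    p.requires_grad_(False)

def aesthetic_loss(X, X_inv):
    mse = ((X - X_inv) ** 2).mean()                    # ||X - X_inv||_2^2 (mean-reduced)
    perceptual = (_vgg(X) - _vgg(X_inv)).abs().mean()  # L1 distance of perceptual features
    return mse + perceptual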
(3) Optimize z_i (i ∈ {1, …, n}) with gradient descent.
(4) Perform iterative training: with the mGANPrior method of step 3, use the optimized z_i (i ∈ {1, …, n}) again as the input of the front network G1 to obtain n feature maps; combine the n feature maps according to the adaptive channel importance and input them into G2 to obtain a new inverse-mapped image I_inv.
(5) Repeat steps (1) to (4), applying the degradation transformation, computing the loss, optimizing z_i (i ∈ {1, …, n}) by gradient descent, and iterating. Training stops once the loss shows no downward trend for ten consecutive iterations, yielding the blur-aesthetic-enhanced image I_enh. The shallow-depth-of-field degradation transformation yields an image with enhanced shallow-depth-of-field aesthetics, and the motion-blur degradation transformation yields an image with enhanced motion-blur aesthetics.
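Putting the pieces together, a sketch of the full optimization loop under the assumptions above (Adam as the gradient-descent optimizer and the max_iters safety cap are illustrative choices; alpha is optimized jointly with z as in mGANPrior):

def enhance(I, m, generator_layers, degrade, n=30, lr=0.01, patience=10, max_iters=3000):
    z = torch.randn(n, 512, requires_grad=True)      # latent vectors z_i
    alpha = torch.ones(n, 512, requires_grad=True)   # adaptive channel-importance weights
    opt = torch.optim.Adam([z, alpha], lr=lr)
    best, stall = float('inf'), 0
    for _ in range(max_iters):
        I_inv = mganprior_forward(generator_layers, z, alpha)
        X, X_inv = degrade(I, I_inv, m)              # degrade_shallow_dof or degrade_motion_blur
        loss = aesthetic_loss(X, X_inv)
        opt.zero_grad(); loss.backward(); opt.step()
        stall = 0 if loss.item() < best else stall + 1
        best = min(best, loss.item())
        if stall >= patience:                        # no downward trend for ten iterations
            break
    return mganprior_forward(generator_layers, z, alpha).detach()  # I_enh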
The invention has the following beneficial effects:
1. and the aesthetic feeling of the real image is enhanced by using an inverse mapping method mGANPrior of GAN. The original information of the image is reserved to the greatest extent, and controllable aesthetic style modification is carried out on the image.
2. The fuzzy effect in the aesthetic style, such as shallow depth of field and motion blur, is the most difficult effect to realize by two kinds of GANs, because the GAN model can eliminate the blur as much as possible in the training process, the patent provides a loss function of degradation transformation according to the aesthetic factor, and the aesthetic fuzzy effect can be generated on the real image.
Drawings
FIG. 1 is a flow chart of a method embodying the present invention.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a real image aesthetic enhancement method based on mGANPrior: input a real image and select a suitable PGGAN generative model; perform semantic segmentation on the image to extract the subject portion to be enhanced, and generate a binary matrix m from the image and the extracted semantics, where the subject portion takes the value 1 and the background portion takes the value 0; obtain an inverse-mapped image with mGANPrior; apply the degradation transformation and loss computation to the real image and the inverse-mapped image; perform parameter optimization and iterative training by gradient descent; and finally obtain the aesthetically enhanced image. The detailed steps are as follows.
Step 1: select a real image I whose aesthetic quality is to be enhanced, and judge manually whether the image belongs to one of the categories of the PGGAN pre-trained generative models; for example, for an image containing a bridge, select the corresponding PGGAN bridge generative model. If the image does not belong to any of the PGGAN pre-trained categories, any PGGAN generative model may be selected. Then determine whether shallow-depth-of-field or motion-blur aesthetic enhancement is desired for the image.
Step 2: perform semantic segmentation on the image with the cascade segmentation module and extract the subject pixels to obtain a binary matrix m. For shallow-depth-of-field enhancement, set the values of the part of m that should remain sharply imaged to 1 and the remaining values to 0; for motion-blur enhancement, set the values of the part of m to which blur is to be added to 1 and the remaining values to 0.
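A minimal sketch of building m from a segmentation result, assuming a wrapper segment(image) around the cascade segmentation module that returns a per-pixel class map (the function name and class ids are hypothetical):

import numpy as np

def build_mask(seg_map, target_classes):
    # 1 for pixels in the region to keep sharp (shallow depth of field)
    # or to blur (motion blur); 0 elsewhere.
    return np.isin(seg_map, target_classes).astype(np.float32)

# e.g., m = build_mask(segment(image), target_classes=[BRIDGE_ID])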
Depth of field is the range before and behind the focal plane within which imaging is sharp: picture elements inside this range are imaged clearly, while elements outside it become progressively blurred. A shallow depth of field means that only part of the image is in focus. Motion blur is the visible blurred streaking produced by a fast-moving object in a static scene, or across a sequence of frames as in film or animation.
Step 3: obtain an inverse-mapped image I_inv through the mGANPrior model.
(1) Split the PGGAN generation network at layer 8: layers 1 to 8 form the front network G1, and the subsequent layers form the rear network G2;
(2) generate 30 latent vectors z_i (i ∈ {1, …, 30});
(3) input z_i (i ∈ {1, …, 30}) into G1 to obtain 30 feature maps;
(4) combine the 30 feature maps according to the adaptive channel importance and input the result into G2 to obtain the inverse-mapped image I_inv.
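In terms of the sketch given for step 3 of the disclosure, this concrete configuration corresponds to the following call (generator_layers again denotes the assumed layer list of the pre-trained PGGAN, and the shape of alpha must match the channel count at the split layer):

z = torch.randn(30, 512, requires_grad=True)     # 30 latent vectors
alpha = torch.ones(30, 512, requires_grad=True)  # channel-importance weights
I_inv = mganprior_forward(generator_layers, z, alpha, split_idx=8)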
Step 4: according to the aesthetic effect to be enhanced, apply the corresponding degradation transformations to the real image I and the inverse-mapped image I_inv to obtain X and X_inv, compute the loss between X and X_inv as the training loss, and optimize z_i (i ∈ {1, …, 30}) by gradient descent.
(1) Apply an image degradation transformation to the real image I and the inverse-mapped image I_inv to obtain X and X_inv.
For shallow-depth-of-field aesthetic enhancement, the following shallow-depth-of-field degradation transformation is performed: according to m, the pixels of the background part of the real image I and of the subject part of the inverse-mapped image I_inv are downsampled. The degradation transformation equations are:
X = I*m + down(I*(1-m)) #(1)
X_inv = down(I_inv*m) + I_inv*(1-m) #(2)
For motion-blur aesthetic enhancement, the following motion-blur degradation transformation is performed: according to m, the pixels of the subject part of the real image I are downsampled. The degradation transformation equations are:
X = down(I*m) + I*(1-m) #(3)
X_inv = I_inv #(4)
(2) Compute the MSE and perceptual feature loss between X and X_inv:
L(X, X_inv) = ||X - X_inv||₂² + ||φ(X) - φ(X_inv)||₁
where ||X - X_inv||₂² is the MSE between X and X_inv, φ(·) is a perceptual feature extractor, φ(X) and φ(X_inv) are the perceptual features of X and X_inv respectively, and ||φ(X) - φ(X_inv)||₁ is the L1 distance between φ(X) and φ(X_inv).
(3) Optimize z_i (i ∈ {1, …, 30}) with gradient descent.
(4) Perform iterative training: with the mGANPrior method of step 3, use the optimized z_i (i ∈ {1, …, 30}) again as the input of the front network G1 to obtain 30 feature maps; combine the 30 feature maps according to the adaptive channel importance and input them into G2 to obtain a new inverse-mapped image I_inv.
(5) Repeat steps (1) to (4), applying the degradation transformation, computing the loss, optimizing z_i (i ∈ {1, …, 30}) by gradient descent, and iterating. Training stops once the loss shows no downward trend for ten consecutive iterations, yielding the blur-aesthetic-enhanced image I_enh. The shallow-depth-of-field degradation transformation yields an image with enhanced shallow-depth-of-field aesthetics, and the motion-blur degradation transformation yields an image with enhanced motion-blur aesthetics.

Claims (3)

1. A real image aesthetic enhancement method based on mGANPrior, characterized by comprising the following steps:
step 1: selecting a real image I whose aesthetic quality is to be enhanced, selecting a PGGAN pre-trained generative model of the corresponding category, and determining whether shallow-depth-of-field or motion-blur aesthetic enhancement is to be performed on the image;
step 2: performing semantic segmentation on the real image I with the cascade segmentation module method and extracting the subject pixels; obtaining different binary matrices m according to the aesthetic effect to be enhanced;
step 3: obtaining an inverse-mapped image I_inv with the mGANPrior method;
step 4: according to the aesthetic style to be enhanced, applying the corresponding degradation transformation to the real image I and the inverse-mapped image I_inv and computing the loss; then optimizing the latent vectors z_i (i ∈ {1, …, n}) by gradient descent, feeding the optimized z_i back in as input, obtaining a new inverse-mapped image I_inv with the mGANPrior method of step 3, and computing the loss again; training stops once the loss shows no downward trend for ten consecutive iterations, yielding the final latent vectors z_i and the image I_enh with the enhanced aesthetic style; the loss function combines mean squared error (MSE) loss with perceptual feature reconstruction loss.
2. The real image aesthetic enhancement method based on mGANPrior according to claim 1, characterized in that the specific method of step 3 is as follows:
with the mGANPrior method, the generation network of the generative model selected in step 1 is split into two parts at a designated layer: the designated layer and the layers before it form the front network G1, and all layers after it form the rear network G2; the designated layer is chosen freely according to need, and the inversion quality is positively correlated with the layer depth; n latent vectors z_i (i ∈ {1, …, n}) are generated at random and used as the input of the front network G1 to obtain n feature maps; the feature maps are combined based on the adaptive channel importance and input into the rear network G2 to obtain the generated image I_inv.
3. The real image aesthetic enhancement method based on mGANPrior according to claim 1, characterized in that step 4 is as follows:
(1) applying an image degradation transformation to the real image I and the inverse-mapped image I_inv to obtain X and X_inv;
for shallow-depth-of-field aesthetic enhancement, performing the following shallow-depth-of-field degradation transformation: according to m, downsampling the pixels of the background part of the real image I and of the subject part of the inverse-mapped image I_inv; the degradation transformation equations are:
X = I*m + down(I*(1-m)) #(1)
X_inv = down(I_inv*m) + I_inv*(1-m) #(2)
for motion-blur aesthetic enhancement, performing the following motion-blur degradation transformation: according to m, downsampling the pixels of the subject part of the real image I; the degradation transformation equations are:
X = down(I*m) + I*(1-m) #(3)
X_inv = I_inv #(4)
(2) computing the MSE and perceptual feature loss between X and X_inv:
L(X, X_inv) = ||X - X_inv||₂² + ||φ(X) - φ(X_inv)||₁
where ||X - X_inv||₂² is the MSE between X and X_inv, φ(·) is a perceptual feature extractor, φ(X) and φ(X_inv) are the perceptual features of X and X_inv respectively, and ||φ(X) - φ(X_inv)||₁ is the L1 distance between φ(X) and φ(X_inv);
(3) optimizing z_i (i ∈ {1, …, n}) with gradient descent;
(4) performing iterative training: with the mGANPrior method of step 3, using the optimized z_i (i ∈ {1, …, n}) again as the input of the front network G1 to obtain n feature maps; combining the n feature maps according to the adaptive channel importance and inputting them into G2 to obtain a new inverse-mapped image I_inv;
(5) repeating steps (1) to (4), applying the degradation transformation, computing the loss, optimizing z_i (i ∈ {1, …, n}) by gradient descent, and iterating; training stops once the loss shows no downward trend for ten consecutive iterations, yielding the blur-aesthetic-enhanced image I_enh; the shallow-depth-of-field degradation transformation yields an image with enhanced shallow-depth-of-field aesthetics, and the motion-blur degradation transformation yields an image with enhanced motion-blur aesthetics.
CN202111627418.1A 2021-12-28 2021-12-28 Real image aesthetic feeling enhancing method based on mGANPrior Pending CN114418872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111627418.1A CN114418872A (en) 2021-12-28 2021-12-28 Real image aesthetic feeling enhancing method based on mGANPrior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111627418.1A CN114418872A (en) 2021-12-28 2021-12-28 Real image aesthetic feeling enhancing method based on mGANPrior

Publications (1)

Publication Number Publication Date
CN114418872A (en) 2022-04-29

Family

ID=81270002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111627418.1A Pending CN114418872A (en) 2021-12-28 2021-12-28 Real image aesthetic feeling enhancing method based on mGANPrior

Country Status (1)

Country Link
CN (1) CN114418872A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649338A (en) * 2024-01-29 2024-03-05 中山大学 Method for generating countermeasures against network inverse mapping for face image editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination