CN112581360A - Multi-style image aesthetic quality enhancement method based on structural constraint

Multi-style image aesthetic quality enhancement method based on structural constraint

Info

Publication number
CN112581360A
CN112581360A
Authority
CN
China
Prior art keywords
network
image
feature
aesthetic quality
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011609567.0A
Other languages
Chinese (zh)
Other versions
CN112581360B (en)
Inventor
俞俊
牛豪康
高飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202011609567.0A priority Critical patent/CN112581360B/en
Publication of CN112581360A publication Critical patent/CN112581360A/en
Application granted granted Critical
Publication of CN112581360B publication Critical patent/CN112581360B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics

Abstract

The invention discloses a structure-constrained multi-style image aesthetic quality enhancement method. The invention comprises the following steps: (1) converting the input image data into vectors in LAB space; (2) feeding the LAB-space vectors into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network, where the structure adjustment network improves the compositional aesthetics and the pixel adjustment network further adjusts the color and lighting effects of the image by adjusting the value of each pixel; (3) refining the extracted features: the features output by the enhancement network are fed into a refinement network to obtain the final aesthetic-quality-enhanced image; (4) optimizing the enhancement network and the refinement network with a multi-scale, multi-distribution-constrained discrimination network, thereby improving the quality of the final aesthetic-quality-enhanced image. The structure adjustment network of the invention automatically extracts the optimal n beautified regions without human intervention.

Description

Multi-style image aesthetic quality enhancement method based on structural constraint
Technical Field
The invention provides a novel multi-style image aesthetic quality enhancement method based on structural constraints. It mainly relates to training with a convolutional neural network that reconstructs partial regions of an image, captures deep feature information, and blends in a specified style, yielding a model capable of multi-style aesthetic quality optimization of images.
Background
The image aesthetic quality enhancement process typically involves adjusting factors such as hue, saturation, and composition. Existing methods generally adopt two approaches, cropping and pixel adjustment, and usually impose various rule constraints derived from expert knowledge on the adjustment process, which limits the diversity of the enhancement effects. In addition, conventional methods do not consider the internal correlations of the image when adjusting its pixels, so the plausibility of the enhanced image in terms of lighting and color cannot be guaranteed. Finally, image aesthetics come in a variety of styles, with large differences in composition, color, lighting, and so on between styles; existing methods do not take the style factor into account, usually produce an enhancement effect in a single style only, and thus struggle to meet the differing requirements of users.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a structure-constrained multi-style image aesthetic quality enhancement method. By breaking away from cropping-based adjustment of the image structure and introducing structural constraints on the image content, a multi-style framework for image aesthetic quality enhancement is realized.
The technical solution adopted by the invention to solve this problem comprises the following steps:
step (1) feature space transformation
Convert the input image data into vectors in LAB space, so that the image's representation of color and other lighting factors is consistent with the subjective perception of the human eye.
Step (2) feature extraction
Feed the LAB-space vectors into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network; the structure adjustment network improves the compositional aesthetics, while the pixel adjustment network further adjusts the color and lighting effects of the image by adjusting the value of each pixel.
Step (3) refining the extracted features
Feed the features output by the enhancement network into the refinement network to obtain the final aesthetic-quality-enhanced image.
Step (4) multi-scale multi-distribution constraint discrimination network
Optimize the enhancement network and the refinement network with a multi-scale, multi-distribution-constrained discrimination network, thereby improving the quality of the final aesthetic-quality-enhanced image.
Further, the feature space transformation in step (1):
1-1 Preprocess the input image by cropping, flipping, and the like;
1-2 Convert the preprocessed image into a vector in LAB space as the network input.
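As a concrete illustration of step (1), the sketch below implements the crop/flip preprocessing and the RGB-to-LAB conversion. scikit-image, the 224-pixel crop size, and the 50% flip probability are assumptions for illustration; the patent fixes none of them.

```python
# Hypothetical sketch of step (1): crop/flip preprocessing, then RGB -> LAB.
import numpy as np
from skimage import color  # assumed library; the patent names no implementation

def preprocess_to_lab(rgb: np.ndarray, crop_size: int = 224) -> np.ndarray:
    """rgb: HxWx3 uint8 image, assumed at least crop_size in each dimension."""
    h, w, _ = rgb.shape
    # Random crop ("cropping" in the patent text).
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    patch = rgb[top:top + crop_size, left:left + crop_size]
    # Random horizontal flip ("flipping").
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]
    # LAB coordinates track human perception of color and lighting more closely.
    return color.rgb2lab(patch)
```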
Further, the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps of training a pre-trained target detection reference network by combining a composition marking data set and an aesthetic quality evaluation data set, carrying out image aesthetic task fine adjustment on the network in the training process, adopting a graph evaluation model to score candidate regions by a fine adjustment strategy, and then selecting the optimal top n candidate regions based on a sequencing result. The pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, and therefore reliable feedback is provided for generation of the candidate region.
The output of the trained object detection reference network is taken as the input to a graph attention network (GAT): the detection network extracts object features, relation features, and region features from the input image and constructs a graph. The constructed graph is fed into a multi-layer GAT, which outputs a beautification map and a feature matrix corresponding to the beautified input image. During the GAT iterations, the features of each GAT layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification map and the feature matrices {X^(1), X^(2), ..., X^(L)} of all GAT layers are fed into the refinement network for synthesizing the enhanced image.
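For concreteness, the following is a minimal single-head graph attention layer in the spirit of the multi-layer GAT just described. The node feature sizes, and how the detection network's object/relation/region features are packed into graph nodes, are assumptions; the patent does not fix them.

```python
# Minimal single-head GAT layer (sketch; dimensions are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # node projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops."""
        h = self.W(x)                                     # (N, out_dim)
        n = h.size(0)
        # Attention logits e_ij = a([h_i || h_j]) for every node pair.
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))        # restrict to neighbors
        alpha = torch.softmax(e, dim=-1)                  # per-node attention
        return F.elu(alpha @ h)                           # aggregated features
```

Stacking L such layers would yield the per-layer feature matrices X^(1), ..., X^(L) that are passed on to the refinement network.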
2-2 Pixel adjustment network: this network adaptively adjusts the lighting and color of the image for different styles. The Lab three-channel data of the input image are fed into a content encoder to extract high-level semantic features. In addition, since the pixel adjustment rules differ between aesthetic styles, a one-hot style label vector is simultaneously fed into a style encoder to extract high-level style semantic features. The content and style features are then concatenated and fed into a decoder, which predicts a pixel adjustment factor matrix T for each position of the three Lab channels using an adjustable sigmoid activation function k·σ(·), where k is an adjustment factor and σ(·) denotes the sigmoid function. Finally, the element-wise product of T and the Lab matrix X of the original input image yields the image T⊙X with adjusted lighting and color.
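The adjustment head reduces to a few lines; the sketch below assumes the decoder has already produced per-pixel logits and uses k = 2 so the factors in T can both attenuate (T < 1) and amplify (T > 1) pixel values. The value of k is illustrative, not taken from the patent.

```python
# Sketch of the adjustable-sigmoid pixel adjustment: T = k * sigmoid(logits),
# output = T * X element-wise. Encoder/decoder bodies are omitted.
import torch

def adjust_pixels(decoder_logits: torch.Tensor, lab: torch.Tensor,
                  k: float = 2.0) -> torch.Tensor:
    """decoder_logits, lab: (B, 3, H, W) tensors over the Lab channels."""
    T = k * torch.sigmoid(decoder_logits)  # adjustable sigmoid k*sigma(.)
    return T * lab                         # Hadamard product T (.) X
```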
Furthermore, the content encoder, the style encoder, and the decoder are connected in a U-Net fashion. Since similar regions of the input image should receive similar adjustment factors k, a guiding attention (GA) mechanism is adopted to reconstruct the output features of the decoder. For a content-encoder feature map y (say, at the l-th layer) and the corresponding decoder feature map x (at the (n−l)-th layer), the guiding attention is computed as:
z_i = Σ_j α(f(y_i), g(y_j)) · h(x_j)
where α(·,·) is an attention function, typically a feature-similarity measure between different locations; f(·) and g(·) are mappings of the content-encoder feature map y, and h(·) is a mapping of the decoder feature map x. α thus describes the correlations between all locations in the input image; it serves as a structural description of the input image and is used to reconstruct the decoder output features. The reconstructed feature map z is concatenated with the content-encoder feature map y and the style-encoder feature map s, and then fed into the subsequent decoding layer to obtain the output. This ensures that similar positions in the input image receive similar entries in the output pixel adjustment factor matrix T, which encourages the output image to preserve a structure similar to that of the input.
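A minimal sketch of this guiding-attention reconstruction follows; realizing f, g, h as 1×1 convolutions and using a softmax-normalized affinity are assumptions borrowed from common attention designs, since the patent only names them as mappings.

```python
# Guiding attention (GA) sketch: the affinity is computed from content
# features y and applied to decoder features x, so the input image's
# structure guides the reconstruction. 1x1-conv mappings are assumed.
import torch
import torch.nn as nn

class GuidingAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.f = nn.Conv2d(ch, ch // 8, 1)
        self.g = nn.Conv2d(ch, ch // 8, 1)
        self.h = nn.Conv2d(ch, ch, 1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """x: decoder features, y: content-encoder features, both (B, C, H, W)."""
        b, c, hgt, wid = x.shape
        fy = self.f(y).flatten(2)                              # (B, C/8, HW)
        gy = self.g(y).flatten(2)                              # (B, C/8, HW)
        hx = self.h(x).flatten(2)                              # (B, C,   HW)
        # alpha_ij: correlation between positions i and j of the input image.
        alpha = torch.softmax(fy.transpose(1, 2) @ gy, dim=-1)  # (B, HW, HW)
        z = hx @ alpha.transpose(1, 2)      # z_i = sum_j alpha_ij * h(x)_j
        return z.view(b, c, hgt, wid)
```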
Further, the refining process in step (3):
3-1 Refinement network: based on the beautification map and feature matrices output by the structure adjustment network and the image output by the pixel adjustment network, the beautified image is synthesized.
The refinement network adopts an encoder-decoder structure, in which the convolutional layers use residual network blocks and a self-attention (SA) mechanism is introduced into the decoding layers to reconstruct the output features of the decoder. The SA computation for a decoder feature map x is:
z_i = Σ_j α(f(x_i), g(x_j)) · h(x_j)
where α(·,·) is an attention function and f(·), g(·), h(·) are mappings of the feature map x. α describes the correlations between all locations in the image. The basic idea is that each location is reconstructed using the features of all locations; similar contents in the output image therefore share similar feature expressions, and hence similar appearance, which helps ensure the plausibility of the aesthetic-quality-enhanced image.
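Self-attention here is the same computation as the guiding attention above with the affinity taken from the decoder map itself, i.e. y = x. A usage sketch, reusing the GuidingAttention module from the previous section (the channel width and map size are illustrative):

```python
# SA as GA applied to itself: attend the decoder map over its own positions.
import torch

sa = GuidingAttention(ch=256)                 # channel width is illustrative
decoder_map = torch.randn(1, 256, 32, 32)
reconstructed = sa(decoder_map, decoder_map)  # y = x gives self-attention
```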
Further, the multi-scale multi-distribution constraint discriminating network in the step (4):
4-1 Aesthetic quality discrimination: multi-distribution constraints are adopted to improve the quality of the aesthetic-quality-enhanced image. To strengthen the discrimination network, a pre-trained image aesthetic quality evaluation model is first used as the aesthetic feature extraction module; then, feature maps from three network layers of different depths in this model are taken as inputs to the discrimination network, and discrimination sub-networks {D^(1), D^(2), D^(3)} are constructed for them respectively. The different discrimination sub-networks correspond to expressions of image aesthetic quality at different scales. To improve the discrimination capability of the model, a multi-task, multi-label learning scheme is adopted: each discrimination sub-network simultaneously predicts the image style category, the aesthetic quality (good/bad), and the authenticity (real/fake), using cross-entropy loss, triplet loss, and L2 loss respectively. The triplet loss arises because the true aesthetic image Y, the enhanced image Ŷ, and the original image X should satisfy the aesthetic ordering Q(Y) > Q(Ŷ) > Q(X), where Q(·) denotes the aesthetic quality.
Triplet loss is therefore introduced as an objective function, namely:

L_tri = [Q(X) − Q(Ŷ) + α]_+ + [Q(Ŷ) − Q(Y) + α]_+
where α is an adjustment factor (margin) and [·]_+ means that only terms greater than 0 are kept.
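A sketch of this objective on discriminator aesthetic scores follows, assuming the ordering Q(Y) > Q(Ŷ) > Q(X) is enforced by two hinge terms as in the reconstruction above; the exact pairing in the patent's formula image is not recoverable, so this is one plausible form rather than the patented one.

```python
# Hinge-based triplet objective on aesthetic scores (one plausible form).
import torch

def triplet_loss(q_real: torch.Tensor, q_enh: torch.Tensor,
                 q_orig: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """q_*: (B,) discriminator aesthetic scores for Y, Y_hat, and X."""
    return (torch.clamp(q_orig - q_enh + alpha, min=0)      # push Q(Y_hat) > Q(X)
            + torch.clamp(q_enh - q_real + alpha, min=0)    # push Q(Y) > Q(Y_hat)
            ).mean()
```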
4-2 Content discrimination network: a pre-trained Faster R-CNN network is used to compare the feature maps extracted from corresponding regions of the aesthetic-quality-enhanced image and the input image, and the L2 distance between the feature maps is computed as the content loss. In the training phase, the generator and the discriminator are optimized end to end. In the testing phase, a given image and a style label are fed into the generator, which outputs the enhanced image.
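As one way to realize this content loss, the sketch below uses the backbone of torchvision's pre-trained Faster R-CNN and its finest FPN level; the choice of level and of torchvision are conveniences, not requirements of the patent.

```python
# Content loss sketch: L2 distance between Faster R-CNN backbone features of
# the enhanced image and the matching input region.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

backbone = fasterrcnn_resnet50_fpn(pretrained=True).backbone.eval()

def content_loss(enhanced: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """enhanced, reference: (B, 3, H, W) RGB tensors of matched regions."""
    with torch.no_grad():                       # reference features are fixed
        ref_feats = backbone(reference)["0"]    # "0" = finest FPN level
    enh_feats = backbone(enhanced)["0"]         # gradients flow to the image
    return torch.mean((enh_feats - ref_feats) ** 2)
```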
The invention has the following beneficial effects:
the invention aims to overcome the defects of the prior art and provides a method for enhancing the aesthetic quality of a multi-format image based on structural constraint. In order to break through the frame of adjusting the image structure based on clipping and introduce the structural constraint of the image content, a multi-style frame of enhancing the image aesthetic quality is realized. The invention has the advantage that the structure adjusting network can automatically extract the optimal n beautifying areas without human intervention. Further attention is drawn to the extent to which the network can further refine the area. And then, an optimization strategy is automatically provided by the pixel adjustment network through extracting image characteristics, and the optimization strategy is input into the refinement network, so that the model can beautify the input image with high efficiency and high quality.
Drawings
FIG. 1(a) is an overall architecture diagram of the structure-constrained multi-style image aesthetic quality enhancement model;
FIG. 1(b) is a schematic diagram of the structure adjustment network architecture;
FIG. 2(c) is a schematic diagram of the pixel adjustment network architecture;
FIG. 2(d) is a schematic diagram of the multi-scale multi-distribution constraint discrimination network architecture.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, the structure-constrained multi-style image aesthetic quality enhancement method specifically includes the following steps:
step (1) feature space transformation
Convert the input image data into vectors in LAB space, so that the image's representation of color and other lighting factors is consistent with the subjective perception of the human eye.
Step (2) feature extraction
Feed the LAB-space vectors into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network; the structure adjustment network improves the compositional aesthetics, while the pixel adjustment network further adjusts the color and lighting effects of the image by adjusting the value of each pixel.
Step (3) refining the extracted features
Feed the features output by the enhancement network into the refinement network to obtain the final aesthetic-quality-enhanced image.
Step (4) multi-scale multi-distribution constraint discrimination network
Optimize the enhancement network and the refinement network with a multi-scale, multi-distribution-constrained discrimination network, thereby improving the quality of the final aesthetic-quality-enhanced image.
Further, the feature space transformation in step (1):
1-1 Preprocess the input image by cropping, flipping, and the like;
1-2 Convert the preprocessed image into a vector in LAB space as the network input.
Further, the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps of training a pre-trained target detection reference network by combining a composition marking data set and an aesthetic quality evaluation data set, carrying out image aesthetic task fine adjustment on the network in the training process, adopting a graph evaluation model to score candidate regions by a fine adjustment strategy, and then selecting the optimal top n candidate regions based on a sequencing result. The pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, and therefore reliable feedback is provided for generation of the candidate region.
The output of the trained object detection reference network is taken as the input to a graph attention network (GAT): the detection network extracts object features, relation features, and region features from the input image and constructs a graph. The constructed graph is fed into a multi-layer GAT, which outputs a beautification map and a feature matrix corresponding to the beautified input image. During the GAT iterations, the features of each GAT layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification map and the feature matrices {X^(1), X^(2), ..., X^(L)} of all GAT layers are fed into the refinement network for synthesizing the enhanced image.
2-2 Pixel adjustment network: this network adaptively adjusts the lighting and color of the image for different styles. The Lab three-channel data of the input image are fed into a content encoder to extract high-level semantic features. In addition, since the pixel adjustment rules differ between aesthetic styles, a one-hot style label vector is simultaneously fed into a style encoder to extract high-level style semantic features. The content and style features are then concatenated and fed into a decoder, which predicts a pixel adjustment factor matrix T for each position of the three Lab channels using an adjustable sigmoid activation function k·σ(·), where k is an adjustment factor and σ(·) denotes the sigmoid function. Finally, the element-wise product of T and the Lab matrix X of the original input image yields the image T⊙X with adjusted lighting and color.
Furthermore, the content encoder, the style encoder, and the decoder are connected in a U-Net fashion. Since similar regions of the input image should receive similar adjustment factors k, a guiding attention (GA) mechanism is adopted to reconstruct the output features of the decoder. For a content-encoder feature map y (say, at the l-th layer) and the corresponding decoder feature map x (at the (n−l)-th layer), the guiding attention is computed as:
z_i = Σ_j α(f(y_i), g(y_j)) · h(x_j)
where α(·,·) is an attention function, typically a feature-similarity measure between different locations; f(·) and g(·) are mappings of the content-encoder feature map y, and h(·) is a mapping of the decoder feature map x. α thus describes the correlations between all locations in the input image; it serves as a structural description of the input image and is used to reconstruct the decoder output features. The reconstructed feature map z is concatenated with the content-encoder feature map y and the style-encoder feature map s, and then fed into the subsequent decoding layer to obtain the output. This ensures that similar positions in the input image receive similar entries in the output pixel adjustment factor matrix T, which encourages the output image to preserve a structure similar to that of the input.
Further, the refining process in step (3):
3-1 Refinement network: based on the beautification map and feature matrices output by the structure adjustment network and the image output by the pixel adjustment network, the beautified image is synthesized.
The refinement network adopts an encoder-decoder structure, in which the convolutional layers use residual network blocks and a self-attention (SA) mechanism is introduced into the decoding layers to reconstruct the output features of the decoder. The SA computation for a decoder feature map x is:
z_i = Σ_j α(f(x_i), g(x_j)) · h(x_j)
where α(·,·) is an attention function and f(·), g(·), h(·) are mappings of the feature map x. α describes the correlations between all locations in the image. The basic idea is that each location is reconstructed using the features of all locations; similar contents in the output image therefore share similar feature expressions, and hence similar appearance, which helps ensure the plausibility of the aesthetic-quality-enhanced image.
Further, the multi-scale multi-distribution constraint discriminating network in the step (4):
4-1 Aesthetic quality discrimination: multi-distribution constraints are adopted to improve the quality of the aesthetic-quality-enhanced image. To strengthen the discrimination network, a pre-trained image aesthetic quality evaluation model is first used as the aesthetic feature extraction module; then, feature maps from three network layers of different depths in this model are taken as inputs to the discrimination network, and discrimination sub-networks {D^(1), D^(2), D^(3)} are constructed for them respectively. The different discrimination sub-networks correspond to expressions of image aesthetic quality at different scales. To improve the discrimination capability of the model, a multi-task, multi-label learning scheme is adopted: each discrimination sub-network simultaneously predicts the image style category, the aesthetic quality (good/bad), and the authenticity (real/fake), using cross-entropy loss, triplet loss, and L2 loss respectively. The triplet loss arises because the true aesthetic image Y, the enhanced image Ŷ, and the original image X should satisfy the aesthetic ordering Q(Y) > Q(Ŷ) > Q(X), where Q(·) denotes the aesthetic quality.
Triplet loss is therefore introduced as an objective function, namely:

L_tri = [Q(X) − Q(Ŷ) + α]_+ + [Q(Ŷ) − Q(Y) + α]_+
where α is an adjustment factor (margin) and [·]_+ means that only terms greater than 0 are kept.
4-2 Content discrimination network: a pre-trained Faster R-CNN network is used to compare the feature maps extracted from corresponding regions of the aesthetic-quality-enhanced image and the input image, and the L2 distance between the feature maps is computed as the content loss. In the training phase, the generator and the discriminator are optimized end to end. In the testing phase, a given image and a style label are fed into the generator, which outputs the enhanced image.

Claims (5)

1. A structure-constrained multi-style image aesthetic quality enhancement method, characterized by comprising the following steps:
step (1) feature space conversion; converting input image data into vectors of an LAB space;
step (2) feature extraction; feeding the LAB-space vectors into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network; wherein the structure adjustment network is used for improving the compositional aesthetics; the pixel adjustment network further adjusts the color and lighting effects of the image by adjusting the value of each pixel;
step (3) refining the extracted features; feeding the features output by the enhancement network into the refinement network to obtain the final aesthetic-quality-enhanced image;
step (4) multi-scale multi-distribution constraint discrimination network; optimizing the enhancement network and the refinement network with the multi-scale multi-distribution constraint discrimination network, thereby improving the quality of the final aesthetic-quality-enhanced image.
2. The structure-constrained multi-style image aesthetic quality enhancement method according to claim 1, wherein the feature space transformation of step (1) comprises:
1-1, performing cropping and flipping preprocessing on the input image;
1-2 convert the preprocessed image as input into a vector in LAB space.
3. The structure-constrained multi-style image aesthetic quality enhancement method according to claim 1 or 2, characterized in that the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps that a pre-trained target detection reference network is adopted, a composition marking data set and an aesthetic quality evaluation data set are combined to train the pre-trained target detection reference network, image aesthetic task fine adjustment is conducted on the network in the training process, a fine adjustment strategy is to adopt a graph evaluation model to score candidate regions, and then the optimal top n candidate regions are selected based on a sequencing result; the pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, so that reliable feedback is provided for generation of candidate areas;
the output of the trained object detection reference network is taken as the input of the graph attention network; the object detection reference network extracts object features, relation features, and region features from the input image and constructs a graph; the constructed graph is then fed into a multi-layer graph attention network, which outputs a beautification map and a feature matrix corresponding to the beautified input image; during the GAT iterations, the features of each GAT layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification map and the feature matrices {X^(1), X^(2), ..., X^(L)} of all GAT layers are fed into the refinement network for synthesizing the enhanced image;
2-2 pixel adjustment network: this network adaptively adjusts the lighting and color of the image for different styles; the Lab three-channel data of the input image are fed into a content encoder to extract high-level semantic features; meanwhile, the one-hot style label vector is fed into a style encoder to extract high-level style semantic features; the high-level semantic features and the style features are then concatenated and fed into a decoder, which predicts a pixel adjustment factor matrix T for each position of the three Lab channels using an adjustable sigmoid activation function k·σ(·); wherein k is an adjustment factor and σ(·) denotes the sigmoid function; finally, the element-wise product of the pixel adjustment factor matrix T and the Lab matrix X of the original input image yields the image T⊙X with adjusted lighting and color;
the content encoder, the style encoder, and the decoder are connected overall in a U-Net fashion; considering that similar regions in the input image should have similar adjustment factors k, a guiding attention mechanism is adopted to reconstruct the output features of the decoder; the guiding attention computation for the content-encoder feature map y and the corresponding decoder-layer feature map x is:
z_i = Σ_j α(f(y_i), g(y_j)) · h(x_j)
wherein α(·,·) is an attention function; f(·) and g(·) are mappings of the content-encoder feature map y, and h(·) is a mapping of the decoder feature map x; α describes the correlations between all locations in the input image; α therefore serves as a structural description of the input image and is used to reconstruct the decoder output features; the reconstructed feature map z is concatenated with the content-encoder feature map y and the style-encoder feature map s, and then fed into the subsequent decoding layer to obtain the output; this ensures that similar positions in the input image have similar entries in the output pixel adjustment factor matrix T, which encourages the output image to preserve a structure similar to that of the input.
4. The structure-constrained multi-style image aesthetic quality enhancement method according to claim 3, wherein the refinement in step (3) comprises:
3-1 refinement network: based on the beautification maps and feature matrices output by the structure adjustment network and the images output by the pixel adjustment network, the beautified image is synthesized;
the refinement network adopts an encoder-decoder structure, wherein the convolutional layers use residual network blocks, and a self-attention mechanism is introduced into the decoding layers to reconstruct the output features of the decoder; the SA computation for a decoder feature map x is represented as:
z_i = Σ_j α(f(x_i), g(x_j)) · h(x_j)
wherein α(·,·) is an attention function; f(·), g(·), h(·) are mappings of the feature map x; α describes the correlations between all locations in the image; the basic idea is: for a specific location, reconstruct it using the features of all locations; similar contents in the output image thus have similar feature expressions and hence similar appearance, which ensures the plausibility of the aesthetic-quality-enhanced image.
5. The method according to claim 4, wherein the multi-scale multi-distribution constraint discrimination network of step (4) comprises:
4-1, firstly, a pre-trained image aesthetic quality evaluation model is used as the aesthetic feature extraction module; then, feature maps from three network layers of different depths in this model are taken as inputs of the discrimination network, and discrimination sub-networks {D^(1), D^(2), D^(3)} are constructed respectively; different discrimination sub-networks correspond to expressions of image aesthetic quality at different scales; the discrimination capability of the model is improved with a multi-task, multi-label learning scheme: each discrimination sub-network simultaneously predicts the image style category, the aesthetic quality (good/bad), and the authenticity (real/fake), using cross-entropy loss, triplet loss, and L2 loss respectively; the triplet loss is adopted because the true aesthetic image Y, the enhanced image Ŷ, and the original image X should satisfy the aesthetic ordering Q(Y) > Q(Ŷ) > Q(X), where Q(·) denotes the aesthetic quality;
the triplet loss is therefore introduced as an objective function, namely:

L_tri = [Q(X) − Q(Ŷ) + α]_+ + [Q(Ŷ) − Q(Y) + α]_+
wherein α is an adjustment factor (margin), and [·]_+ means that only terms greater than 0 are taken;
4-2, a pre-trained Faster R-CNN network is adopted to compare the feature maps obtained from corresponding regions of the aesthetic-quality-enhanced image and the input image, and the L2 distance between the feature maps is calculated as the content loss; in the training stage, the generator and the discriminator are optimized in an end-to-end manner; in the testing stage, a given image and a style label are input into the generator, which outputs the enhanced image.
CN202011609567.0A 2020-12-30 2020-12-30 Method for enhancing aesthetic quality of multi-style image based on structural constraint Active CN112581360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011609567.0A CN112581360B (en) 2020-12-30 2020-12-30 Method for enhancing aesthetic quality of multi-style image based on structural constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011609567.0A CN112581360B (en) 2020-12-30 2020-12-30 Method for enhancing aesthetic quality of multi-style image based on structural constraint

Publications (2)

Publication Number Publication Date
CN112581360A true CN112581360A (en) 2021-03-30
CN112581360B CN112581360B (en) 2024-04-09

Family

ID=75144595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011609567.0A Active CN112581360B (en) 2020-12-30 2020-12-30 Method for enhancing aesthetic quality of multi-style image based on structural constraint

Country Status (1)

Country Link
CN (1) CN112581360B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458750A (en) * 2019-05-31 2019-11-15 北京理工大学 A kind of unsupervised image Style Transfer method based on paired-associate learning
CN110782448A (en) * 2019-10-25 2020-02-11 广东三维家信息科技有限公司 Rendered image evaluation method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458750A (en) * 2019-05-31 2019-11-15 北京理工大学 A kind of unsupervised image Style Transfer method based on paired-associate learning
CN110782448A (en) * 2019-10-25 2020-02-11 广东三维家信息科技有限公司 Rendered image evaluation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
兰红; 刘秦邑: "Scene graph to image generation model with graph attention networks" (图注意力网络的场景图到图像生成模型), Journal of Image and Graphics (中国图象图形学报), no. 08

Also Published As

Publication number Publication date
CN112581360B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110223359B (en) Dual-stage multi-color-matching-line draft coloring model based on generation countermeasure network and construction method and application thereof
CN110443864B (en) Automatic artistic font generation method based on single-stage small-amount sample learning
Cheng et al. Light-guided and cross-fusion U-Net for anti-illumination image super-resolution
CN113313644B (en) Underwater image enhancement method based on residual double-attention network
CN111145290B (en) Image colorization method, system and computer readable storage medium
CN109766822B (en) Gesture recognition method and system based on neural network
CN113222875B (en) Image harmonious synthesis method based on color constancy
CN112950661A (en) Method for generating antithetical network human face cartoon based on attention generation
Li et al. Globally and locally semantic colorization via exemplar-based broad-GAN
CN111275613A (en) Editing method for generating confrontation network face attribute by introducing attention mechanism
CN111160138A (en) Fast face exchange method based on convolutional neural network
CN113392711A (en) Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN112767286A (en) Dark light image self-adaptive enhancement method based on intensive deep learning
CN111696136A (en) Target tracking method based on coding and decoding structure
CN113610732A (en) Full-focus image generation method based on interactive counterstudy
CN115222581A (en) Image generation method, model training method, related device and electronic equipment
CN115984323A (en) Two-stage fusion RGBT tracking algorithm based on space-frequency domain equalization
CN114639002A (en) Infrared and visible light image fusion method based on multi-mode characteristics
CN114359626A (en) Visible light-thermal infrared obvious target detection method based on condition generation countermeasure network
CN117351340A (en) Underwater image enhancement algorithm based on double-color space
CN112837212A (en) Image arbitrary style migration method based on manifold alignment
CN117151990B (en) Image defogging method based on self-attention coding and decoding
CN112581360B (en) Method for enhancing aesthetic quality of multi-style image based on structural constraint
CN109522918B (en) Hyperspectral image feature extraction method based on improved local singular spectrum analysis
CN116503502A (en) Unpaired infrared image colorization method based on contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant