CN112581360A - Multi-style image aesthetic quality enhancement method based on structural constraint - Google Patents
- Publication number
- CN112581360A (application number CN202011609567.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- image
- feature
- aesthetic quality
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/04
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
Abstract
The invention discloses a structure-constrained multi-style image aesthetic quality enhancement method comprising the following steps: (1) converting the input image data into vectors in the LAB space; (2) inputting the LAB-space vectors into an enhancement network, which comprises two sub-networks, a structure adjustment network and a pixel adjustment network: the structure adjustment network improves the aesthetics of the composition, while the pixel adjustment network further adjusts the color and the light-and-shadow effect of the image by adjusting the value of each pixel; (3) refining the extracted features: the features output by the enhancement network are input into a refinement network to obtain the final aesthetic-quality-enhanced image; (4) a multi-scale, multi-distribution-constraint discrimination network is adopted to optimize the enhancement network and the refinement network, improving the quality of the final output. The structure adjustment network of the invention can automatically extract the optimal n beautified regions without human intervention.
Description
Technical Field
The invention provides a novel multi-style image aesthetic quality enhancement method based on structural constraints. It mainly relates to training a convolutional neural network that reconstructs partial regions of an image, captures deep feature information, and mixes in a specific style, yielding a model capable of multi-style aesthetic quality optimization of images.
Background
The image aesthetic quality enhancement process typically involves adjusting factors such as hue, saturation, and composition. Existing methods generally adopt two modes, cropping and pixel adjustment, and usually impose various rule constraints on the adjustment process based on expert knowledge, limiting the diversity of the enhancement effect. In addition, conventional methods do not consider the internal correlations of the image when adjusting its pixels, so the plausibility of the enhanced image's light, shadow, and color cannot be guaranteed. Finally, image aesthetics come in a variety of styles, with large differences in composition, color, and shading between styles; existing methods do not take the style factor into account, usually produce an enhancement with only a single style, and thus struggle to meet different user requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a structure-constrained multi-style image aesthetic quality enhancement method. It breaks away from the cropping-based framework for adjusting image structure, introduces structural constraints derived from the image content, and realizes a multi-style framework for image aesthetic quality enhancement.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step (1) feature space transformation
The input image data is converted into vectors in the LAB space, so that the image's expression of color, light, and shadow is consistent with subjective human perception.
Step (2) feature extraction
The LAB-space vectors are input into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network: the structure adjustment network improves the aesthetics of the composition, while the pixel adjustment network further adjusts the color and light-and-shadow effects of the image by adjusting the value of each pixel.
Step (3) refining the extracted features
The features output by the enhancement network are input into the refinement network to obtain the final aesthetic-quality-enhanced image.
Step (4) multi-scale multi-distribution constraint discrimination network
A multi-scale, multi-distribution-constraint discrimination network is adopted to optimize the enhancement network and the refinement network, thereby improving the quality of the final aesthetic-quality-enhanced image.
Further, the feature space transformation in step (1):
1-1, preprocessing the input image by cropping, flipping, and similar operations;
1-2 convert the preprocessed image as input into a vector in LAB space.
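The feature space transformation of step (1) can be sketched as a per-pixel sRGB-to-CIELAB conversion. The following is a minimal pure-Python version assuming the standard sRGB matrix and D65 white point; in practice a library routine (e.g. an OpenCV or scikit-image color conversion) would be applied to the whole image.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (0-255 integers) to CIELAB (D65 white point)."""
    def to_linear(c):                       # inverse sRGB gamma
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # linear RGB -> CIE XYZ (sRGB/D65 matrix, 4-digit coefficients)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    # normalize by the D65 reference white
    xn, yn, zn = x / 0.95047, y / 1.0, z / 1.08883

    def f(t):                               # CIELAB cube-root nonlinearity
        eps = (6 / 29) ** 3
        return t ** (1 / 3) if t > eps else t / (3 * (6 / 29) ** 2) + 4 / 29

    L = 116 * f(yn) - 16
    a = 500 * (f(xn) - f(yn))
    b_lab = 200 * (f(yn) - f(zn))
    return L, a, b_lab
```

A sanity check: pure white maps to L ≈ 100 with a and b near zero (not exactly zero, because the 4-digit matrix rows do not sum exactly to the white point), and pure black maps to L = 0.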
Further, the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps of training a pre-trained target detection reference network by combining a composition marking data set and an aesthetic quality evaluation data set, carrying out image aesthetic task fine adjustment on the network in the training process, adopting a graph evaluation model to score candidate regions by a fine adjustment strategy, and then selecting the optimal top n candidate regions based on a sequencing result. The pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, and therefore reliable feedback is provided for generation of the candidate region.
The output of the trained target detection reference network is taken as the input of the graph attention network (GAT): the reference network extracts target features, relation features, and region features from the input image and constructs a graph. The constructed graph is fed into a multi-layer graph attention network, which outputs a beautification map and a feature matrix corresponding to the beautified input image. In the iterative process of the GAT, the features of each graph attention layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification maps and the feature matrices {X(1), X(2), ..., X(L)} of all layers in the GAT are input into the refinement network for synthesizing the enhanced image.
2-2 pixel adjustment network: it adaptively adjusts the shading and color of the image for different styles. The Lab three-channel data of the input image are fed into a content encoder to extract high-level semantic features. In addition, since different aesthetic styles call for different pixel adjustment rules, a one-hot style label vector is simultaneously fed into a style encoder to extract high-level style features. The content and style features are then concatenated and input into a decoder, which predicts a pixel adjustment factor matrix T for each position of the Lab three channels using an adjustable Sigmoid activation function k·σ(·), where k is an adjustment factor and σ(·) denotes the Sigmoid function. Finally, the pixel adjustment factor matrix T is multiplied element-wise with the Lab matrix X of the original input image to obtain the brightness- and color-adjusted image T⊙X.
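The adjustable activation k·σ(·) and the final element-wise product T⊙X can be sketched as follows. This is a toy 2×2, single-channel example; in the method itself T is predicted by the decoder, and the logit values here are made up for illustration.

```python
import math

def adjustable_sigmoid(v, k=2.0):
    """Scaled sigmoid k*sigma(v): maps a decoder logit to a factor in (0, k)."""
    return k / (1.0 + math.exp(-v))

def apply_adjustment(T, X):
    """Element-wise (Hadamard) product of adjustment factors and image values."""
    return [[t * x for t, x in zip(trow, xrow)] for trow, xrow in zip(T, X)]

logits = [[0.0, 1.5], [-1.5, 0.0]]      # hypothetical decoder outputs
T = [[adjustable_sigmoid(v) for v in row] for row in logits]
X = [[50.0, 80.0], [80.0, 50.0]]        # illustrative L-channel values
adjusted = apply_adjustment(T, X)
```

With k = 2, a zero logit yields a factor of exactly 1 (pixel unchanged), positive logits brighten, and negative logits darken.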
Furthermore, the content encoder, the style encoder, and the decoder are connected in a U-Net manner. Since similar regions in the input image should receive similar adjustment factors k, a Guided Attention (GA) mechanism is adopted to reconstruct the decoder output features. In the guided attention computation between a content encoder feature map y (e.g., from layer l) and the corresponding decoding-layer feature map x (layer n−l), α(·) represents an attention calculation function, typically a feature-similarity measure between different locations; f(·), g(·), and h(·) are mappings of the feature map x. Since α(·) describes the correlation between all locations in the input image, it serves as a structural description of the input image and is used to reconstruct the decoder output features. The reconstructed feature map z is concatenated with the content encoder feature map y and the style encoder feature map s, then fed into the subsequent decoding layer to produce the output. This ensures that similar positions in the input image receive similar entries in the output pixel adjustment factor matrix T, encouraging the output image to preserve a structure similar to the input.
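A minimal sketch of the guided-attention reconstruction described above, assuming the common dot-product form in which f and g produce queries and keys, h produces values, and α(·) is a softmax over pairwise position similarities. The exact projections are not specified in the text, so plain linear maps Wf, Wg, Wh are assumed here.

```python
import numpy as np

def guided_attention(y, x, Wf, Wg, Wh):
    """Reconstruct decoder features x guided by encoder features y.

    y, x: (N, C) feature maps flattened to N spatial positions, C channels.
    Wf, Wg, Wh: (C, D) linear projections standing in for f, g, h.
    """
    q = y @ Wf                         # queries from the encoder feature map
    k = x @ Wg                         # keys from the decoder feature map
    v = x @ Wh                         # values from the decoder feature map
    sim = q @ k.T                      # pairwise position similarities
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability
    alpha = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return alpha @ v                   # z: structure-aware reconstruction

rng = np.random.default_rng(0)
N, C, D = 16, 8, 8
y, x = rng.standard_normal((N, C)), rng.standard_normal((N, C))
Wf, Wg, Wh = (rng.standard_normal((C, D)) for _ in range(3))
z = guided_attention(y, x, Wf, Wg, Wh)
```

Setting y = x in this sketch gives ordinary self-attention, which is the SA mechanism used later in the refinement network.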
Further, the refining process in step (3):
3-1 refinement network: based on the beautification map and feature matrix output by the structure adjustment network, together with the image output by the pixel adjustment network, the beautified image is synthesized.
The refinement network adopts an encoding-decoding structure, in which the convolutional layers use residual network blocks and a Self-Attention (SA) mechanism is introduced in the decoding layers to reconstruct the decoder output features. The SA computation on the decoded-layer feature map x is defined as follows:
where α(·) represents an attention calculation function; f(·), g(·), and h(·) are mappings of the feature map x; and α(·) describes the correlation between all locations in the input image. The basic idea: each location is reconstructed using the features of all locations, so similar contents in the output image obtain similar feature expressions and hence similar appearance, ensuring the plausibility of the aesthetic-quality-enhanced image.
Further, the multi-scale multi-distribution constraint discriminating network in the step (4):
4-1 aesthetic quality discrimination: multi-distribution constraints are adopted to improve the quality of the aesthetic-quality-enhanced image. To strengthen the discrimination network, a pre-trained image aesthetic quality evaluation model is first used as the aesthetic feature extraction module; then the feature maps of three network layers at different depths of that model are used as inputs of the discrimination network, and a discrimination sub-network is constructed for each. Different discrimination sub-networks correspond to expressions of image aesthetic quality at different scales. To improve the discrimination capability of the model, a multi-task, multi-label learning scheme is adopted: each discrimination sub-network simultaneously predicts the image style type, the aesthetic quality (G/B), and the authenticity (R/F), using cross-entropy loss, Triplet Loss, and L2 loss respectively. The triplet loss is used because the true aesthetic image Y, the enhanced image Ŷ, and the original image X should satisfy an aesthetic-quality ordering Y ≽ Ŷ ≽ X; Triplet Loss is therefore introduced as an objective function, namely:
where α is a regulating (margin) factor and [·]+ means that only terms greater than 0 are taken.
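The triplet objective above, pull the enhanced image's features toward the true aesthetic image Y and away from the original X by at least the margin α, can be sketched as a hinge over feature distances. Squared Euclidean distance and the small feature vectors are assumptions for illustration; in the method the features come from the discriminator's aesthetic feature extractor.

```python
def triplet_loss(feat_enh, feat_real, feat_orig, alpha=0.2):
    """Hinge triplet loss: d(enh, real) should beat d(enh, orig) by margin alpha."""
    def dist(a, b):                    # squared Euclidean distance
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return max(0.0, dist(feat_enh, feat_real) - dist(feat_enh, feat_orig) + alpha)

# illustrative 3-D feature vectors
real, enh, orig = [1.0, 1.0, 1.0], [0.9, 1.0, 1.1], [0.0, 0.0, 0.0]
loss = triplet_loss(enh, real, orig)   # enhanced features near Y, far from X
```

When the enhanced features already sit close to Y and far from X, the hinge is inactive and the loss is zero; a violating triple yields a positive penalty.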
4-2 content discrimination network: a pre-trained Faster R-CNN network is adopted to compare the feature maps extracted from corresponding regions of the aesthetic-quality-enhanced image and the input image, and the L2 distance between the feature maps is computed as the content loss. In the training phase, the generator and the discriminator are optimized end-to-end; in the testing phase, a given image and a style label are input into the generator, which outputs the enhanced image.
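The content loss of step 4-2 is an L2 distance between feature maps of corresponding regions. In the sketch below, plain flattened lists stand in for Faster R-CNN region features, which this example does not compute.

```python
def l2_content_loss(feats_a, feats_b):
    """Mean squared L2 distance between two flattened feature maps."""
    assert len(feats_a) == len(feats_b)
    return sum((a - b) ** 2 for a, b in zip(feats_a, feats_b)) / len(feats_a)

enhanced_feats = [0.5, 0.8, 0.3, 0.9]   # illustrative region features
input_feats = [0.5, 0.6, 0.3, 1.1]
loss = l2_content_loss(enhanced_feats, input_feats)
```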
The invention has the following beneficial effects:
the invention aims to overcome the defects of the prior art and provides a method for enhancing the aesthetic quality of a multi-format image based on structural constraint. In order to break through the frame of adjusting the image structure based on clipping and introduce the structural constraint of the image content, a multi-style frame of enhancing the image aesthetic quality is realized. The invention has the advantage that the structure adjusting network can automatically extract the optimal n beautifying areas without human intervention. Further attention is drawn to the extent to which the network can further refine the area. And then, an optimization strategy is automatically provided by the pixel adjustment network through extracting image characteristics, and the optimization strategy is input into the refinement network, so that the model can beautify the input image with high efficiency and high quality.
Drawings
FIG. 1 is a schematic diagram of an aesthetic quality assessment framework using composition blending with global features;
FIG. 2 is an architectural diagram of a global feature and composition feature extraction network;
FIG. 1 (a) is an overall architecture diagram of a structurally constrained multi-format image aesthetic quality enhancement model;
FIG. 1 (b) is a schematic diagram of the structure adjustment network architecture;
FIG. 2 (c) is a schematic diagram of the pixel adjustment network architecture;
FIG. 2 (d) is a schematic diagram of the multi-scale, multi-distribution-constraint discrimination network architecture.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1 and fig. 2, a structure-constrained multi-style image aesthetic quality enhancement method specifically includes the following steps:
step (1) feature space transformation
The input image data is converted into vectors in the LAB space, so that the image's expression of color, light, and shadow is consistent with subjective human perception.
Step (2) feature extraction
The LAB-space vectors are input into an enhancement network comprising two sub-networks, a structure adjustment network and a pixel adjustment network: the structure adjustment network improves the aesthetics of the composition, while the pixel adjustment network further adjusts the color and light-and-shadow effects of the image by adjusting the value of each pixel.
Step (3) refining the extracted features
The features output by the enhancement network are input into the refinement network to obtain the final aesthetic-quality-enhanced image.
Step (4) multi-scale multi-distribution constraint discrimination network
A multi-scale, multi-distribution-constraint discrimination network is adopted to optimize the enhancement network and the refinement network, thereby improving the quality of the final aesthetic-quality-enhanced image.
Further, the feature space transformation in step (1):
1-1, preprocessing the input image by cropping, flipping, and similar operations;
1-2 convert the preprocessed image as input into a vector in LAB space.
Further, the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps of training a pre-trained target detection reference network by combining a composition marking data set and an aesthetic quality evaluation data set, carrying out image aesthetic task fine adjustment on the network in the training process, adopting a graph evaluation model to score candidate regions by a fine adjustment strategy, and then selecting the optimal top n candidate regions based on a sequencing result. The pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, and therefore reliable feedback is provided for generation of the candidate region.
The output of the trained target detection reference network is taken as the input of the graph attention network (GAT): the reference network extracts target features, relation features, and region features from the input image and constructs a graph. The constructed graph is fed into a multi-layer graph attention network, which outputs a beautification map and a feature matrix corresponding to the beautified input image. In the iterative process of the GAT, the features of each graph attention layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification maps and the feature matrices {X(1), X(2), ..., X(L)} of all layers in the GAT are input into the refinement network for synthesizing the enhanced image.
2-2 pixel adjustment network: it adaptively adjusts the shading and color of the image for different styles. The Lab three-channel data of the input image are fed into a content encoder to extract high-level semantic features. In addition, since different aesthetic styles call for different pixel adjustment rules, a one-hot style label vector is simultaneously fed into a style encoder to extract high-level style features. The content and style features are then concatenated and input into a decoder, which predicts a pixel adjustment factor matrix T for each position of the Lab three channels using an adjustable Sigmoid activation function k·σ(·), where k is an adjustment factor and σ(·) denotes the Sigmoid function. Finally, the pixel adjustment factor matrix T is multiplied element-wise with the Lab matrix X of the original input image to obtain the brightness- and color-adjusted image T⊙X.
Furthermore, the content encoder, the style encoder, and the decoder are connected in a U-Net manner. Since similar regions in the input image should receive similar adjustment factors k, a Guided Attention (GA) mechanism is adopted to reconstruct the decoder output features. In the guided attention computation between a content encoder feature map y (e.g., from layer l) and the corresponding decoding-layer feature map x (layer n−l), α(·) represents an attention calculation function, typically a feature-similarity measure between different locations; f(·), g(·), and h(·) are mappings of the feature map x. Since α(·) describes the correlation between all locations in the input image, it serves as a structural description of the input image and is used to reconstruct the decoder output features. The reconstructed feature map z is concatenated with the content encoder feature map y and the style encoder feature map s, then fed into the subsequent decoding layer to produce the output. This ensures that similar positions in the input image receive similar entries in the output pixel adjustment factor matrix T, encouraging the output image to preserve a structure similar to the input.
Further, the refining process in step (3):
3-1 refinement network: based on the beautification map and feature matrix output by the structure adjustment network, together with the image output by the pixel adjustment network, the beautified image is synthesized.
The refinement network adopts an encoding-decoding structure, in which the convolutional layers use residual network blocks and a Self-Attention (SA) mechanism is introduced in the decoding layers to reconstruct the decoder output features. The SA computation on the decoded-layer feature map x is defined as follows:
where α(·) represents an attention calculation function; f(·), g(·), and h(·) are mappings of the feature map x; and α(·) describes the correlation between all locations in the input image. The basic idea: each location is reconstructed using the features of all locations, so similar contents in the output image obtain similar feature expressions and hence similar appearance, ensuring the plausibility of the aesthetic-quality-enhanced image.
Further, the multi-scale multi-distribution constraint discriminating network in the step (4):
4-1 aesthetic quality discrimination: multi-distribution constraints are adopted to improve the quality of the aesthetic-quality-enhanced image. To strengthen the discrimination network, a pre-trained image aesthetic quality evaluation model is first used as the aesthetic feature extraction module; then the feature maps of three network layers at different depths of that model are used as inputs of the discrimination network, and a discrimination sub-network is constructed for each. Different discrimination sub-networks correspond to expressions of image aesthetic quality at different scales. To improve the discrimination capability of the model, a multi-task, multi-label learning scheme is adopted: each discrimination sub-network simultaneously predicts the image style type, the aesthetic quality (G/B), and the authenticity (R/F), using cross-entropy loss, Triplet Loss, and L2 loss respectively. The triplet loss is used because the true aesthetic image Y, the enhanced image Ŷ, and the original image X should satisfy an aesthetic-quality ordering Y ≽ Ŷ ≽ X; Triplet Loss is therefore introduced as an objective function, namely:
where α is a regulating (margin) factor and [·]+ means that only terms greater than 0 are taken.
4-2 content discrimination network: a pre-trained Faster R-CNN network is adopted to compare the feature maps extracted from corresponding regions of the aesthetic-quality-enhanced image and the input image, and the L2 distance between the feature maps is computed as the content loss. In the training phase, the generator and the discriminator are optimized end-to-end; in the testing phase, a given image and a style label are input into the generator, which outputs the enhanced image.
Claims (5)
1. A method for enhancing aesthetic quality of a multi-style image with structural constraint is characterized by comprising the following steps:
step (1) feature space conversion; converting input image data into vectors of an LAB space;
step (2) feature extraction; inputting the vectors converted into the LAB space into an enhancement network, wherein the enhancement network comprises two sub-networks, a structure adjustment network and a pixel adjustment network; the structure adjustment network is used for improving the aesthetics of the composition; the pixel adjustment network further adjusts the color and the light-and-shadow effect of the image by adjusting the value of each pixel;
step (3) refining the extracted features; inputting the features output by the enhancement network into the refinement network to obtain the final aesthetic-quality-enhanced image;
step (4), multi-scale multi-distribution constraint discrimination network; and optimizing an enhancement network and a refinement network by adopting a multi-scale multi-distribution constraint discrimination network, thereby improving the quality of the final output aesthetic quality enhanced image.
2. A method for enhancing the aesthetic quality of a structurally-constrained multi-style image according to claim 1, wherein the feature space transformation of step (1):
1-1, performing cropping and flipping preprocessing on the input image;
1-2 convert the preprocessed image as input into a vector in LAB space.
3. A method for enhancing the aesthetic quality of a structurally-constrained multi-style image according to claim 1 or 2, characterized in that the feature extraction in step (2) is implemented as follows:
2-1 structural adjustment network:
the method comprises the steps that a pre-trained target detection reference network is adopted, a composition marking data set and an aesthetic quality evaluation data set are combined to train the pre-trained target detection reference network, image aesthetic task fine adjustment is conducted on the network in the training process, a fine adjustment strategy is to adopt a graph evaluation model to score candidate regions, and then the optimal top n candidate regions are selected based on a sequencing result; the pre-trained target detection reference network has better composition evaluation and aesthetic quality prediction capabilities, so that reliable feedback is provided for generation of candidate areas;
taking the output of the trained target detection reference network as the input of the graph attention network; extracting target features, relation features, and region features from the input image by the target detection reference network, and constructing a graph; then inputting the constructed graph into a multi-layer graph attention network, which outputs a beautification map and a feature matrix corresponding to the beautified input image; in the iterative process of the GAT, the features of each graph attention layer express the progressive transformation of the image structure and the semantic expression of the corresponding content, so the predicted beautification maps and the feature matrices {X(1), X(2), ..., X(L)} of all layers in the GAT are input into the refinement network for synthesizing the enhanced image;
2-2 pixel adjustment network: aiming at adaptively adjusting the shading and color of the image for different styles; inputting the Lab three-channel data of the input image into a content encoder and extracting high-level semantic features; meanwhile, inputting the one-hot style label vector into a style encoder and extracting high-level style features; then concatenating the content and style features and inputting them into a decoder, which predicts a pixel adjustment factor matrix T for each position of the Lab three channels using an adjustable Sigmoid activation function k·σ(·), where k is an adjustment factor and σ(·) denotes the Sigmoid function; finally, multiplying the pixel adjustment factor matrix T element-wise with the Lab matrix X of the original input image to obtain the brightness- and color-adjusted image T⊙X;
the content encoder, the style encoder, and the decoder are connected as a whole in a U-Net manner; since similar regions in the input image should receive similar adjustment factors k, a guided attention mechanism is adopted to reconstruct the decoder output features; in the guided attention computation between the content encoder feature map y and the corresponding decoding-layer feature map x, α(·) represents an attention calculation function; f(·), g(·), and h(·) are mappings of the feature map x; α(·) describes the correlation between all locations in the input image and is therefore taken as a structural description of the input image, used to reconstruct the decoder output features; the reconstructed feature map z is concatenated with the content encoder feature map y and the style encoder feature map s, then fed into the subsequent decoding layer to produce the output; this ensures that similar positions in the input image receive similar entries in the output pixel adjustment factor matrix T, encouraging the output image to preserve a structure similar to the input.
4. The method for enhancing the aesthetic quality of multi-style images based on structural constraint according to claim 3, wherein the refinement network of step (3) comprises:
3-1 refinement network: synthesizes the final result from the beautified image and feature matrix output by the structure adjustment network together with the image output by the pixel adjustment network;
the refinement network adopts an encoding-decoding structure in which the convolution layers use residual network blocks, and a self-attention (SA) mechanism is introduced into the decoding layers to reconstruct the decoder output features; the SA calculation for the decoding-layer feature map x is represented as

α_{j,i} = softmax_i( f(x_i)^T g(x_j) ),   SA(x)_j = Σ_i α_{j,i} h(x_i)

where α(·) denotes the attention calculation function and f(·), g(·), h(·) are learned mappings of the feature map x; α(·) describes the correlation between all positions in the input; the basic idea is to reconstruct each specific position from the features of all positions, so that similar content in the output image obtains similar feature expressions and hence a similar appearance, which guarantees the plausibility of the aesthetic-quality-enhanced image.
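A numpy sketch of the SA step in the decoding layer; the residual term and the scale gamma follow common self-attention practice and are an assumption, not taken from the patent text, as are all sizes:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh, gamma=1.0):
    """SA over a flattened decoding-layer feature map x (N, C):
    each position is rebuilt as an attention-weighted sum of all positions."""
    f, g, h = x @ Wf, x @ Wg, x @ Wh
    alpha = softmax(g @ f.T, axis=-1)  # (N, N) correlations between positions
    return x + gamma * (alpha @ h)     # reconstructed decoder features

rng = np.random.default_rng(1)
N, C, Ck = 16, 8, 4                    # illustrative sizes
x = rng.standard_normal((N, C))
Wf = rng.standard_normal((C, Ck))
Wg = rng.standard_normal((C, Ck))
Wh = rng.standard_normal((C, C))
out = self_attention(x, Wf, Wg, Wh, gamma=0.5)
```

Unlike the guided attention of the pixel adjustment network, the correlations here are computed from the decoder features themselves, so coherence is enforced within the output rather than copied from the input.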
5. The method according to claim 4, wherein the multi-scale multi-distribution constraint discrimination network of step (4) comprises:
4-1, firstly, a pre-trained image aesthetic quality evaluation model serves as the aesthetic feature extraction module; then, feature maps from three network layers of different depths of this model are used as the inputs of the discrimination network, and a discrimination sub-network is built for each, the different sub-networks corresponding to the expression of image aesthetic quality at different scales; a multi-task, multi-label learning scheme improves the discrimination capability of the model: each discrimination sub-network simultaneously predicts the image style category, the aesthetic quality (good/bad) and the authenticity (real/fake), adopting cross-entropy loss, triplet loss and L2 loss respectively; for a real high-aesthetic image Y, the enhanced image Ŷ and the original image X should satisfy the aesthetic ordering that Y is at least as aesthetic as Ŷ, which in turn is at least as aesthetic as X; the triplet loss is therefore introduced as an objective function, namely:

L_tri = [ ||φ(Ŷ) − φ(Y)||² − ||φ(Ŷ) − φ(X)||² + α ]_+

wherein φ(·) denotes features from the aesthetic feature extraction module, α is an adjustment factor (margin), and [·]_+ means that only terms greater than 0 are taken;
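A minimal sketch of the triplet objective on feature vectors (the squared-L2 distance and the margin value are illustrative assumptions; in the patent's setting the features would come from the aesthetic feature extraction module):

```python
import numpy as np

def triplet_loss(feat_enh, feat_real, feat_orig, alpha=0.3):
    """[ d(enh, real) - d(enh, orig) + alpha ]_+ with squared L2 distances:
    pull the enhanced image's features toward a real high-quality image Y
    and push them away from the original input X."""
    d_pos = float(np.sum((feat_enh - feat_real) ** 2))
    d_neg = float(np.sum((feat_enh - feat_orig) ** 2))
    return max(d_pos - d_neg + alpha, 0.0)

# toy feature vectors: enhanced result still far from the "real" anchor
loss = triplet_loss(np.array([0.0, 0.0]),   # enhanced image features
                    np.array([2.0, 0.0]),   # real high-aesthetic features
                    np.array([1.0, 0.0]))   # original image features
# d_pos = 4.0, d_neg = 1.0  ->  4.0 - 1.0 + 0.3 = 3.3
```

Once the enhanced features are closer to the real image than to the original by more than the margin α, the hinge clamps the loss to zero and the gradient vanishes.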
4-2, a pre-trained Faster-RCNN network extracts feature maps from corresponding regions of the aesthetic-quality-enhanced image and of the input image, and the L2 distance between these feature maps is calculated as the content loss; in the training stage, the generator and the discriminator are optimized in an end-to-end manner; in the testing stage, a given image and a style label are input into the generator, which outputs the enhanced image.
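The content-loss term reduces to a distance between two feature maps; a sketch with mean-squared (L2) distance, assuming the features were already extracted by a pretrained detector such as Faster-RCNN (the extraction itself is not reproduced here):

```python
import numpy as np

def content_loss(feat_a, feat_b):
    """Mean squared (L2) distance between feature maps of corresponding
    regions of the enhanced image and the input image."""
    return float(np.mean((feat_a - feat_b) ** 2))

# toy feature maps for two corresponding regions
loss = content_loss(np.zeros((2, 2)), np.ones((2, 2)))  # 1.0
```

Comparing detector features of corresponding regions rather than raw pixels lets the enhancement change colors and tones freely while still penalizing changes to the depicted content.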
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011609567.0A CN112581360B (en) | 2020-12-30 | 2020-12-30 | Method for enhancing aesthetic quality of multi-style image based on structural constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112581360A true CN112581360A (en) | 2021-03-30 |
CN112581360B CN112581360B (en) | 2024-04-09 |
Family
ID=75144595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011609567.0A Active CN112581360B (en) | 2020-12-30 | 2020-12-30 | Method for enhancing aesthetic quality of multi-style image based on structural constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112581360B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458750A (en) * | 2019-05-31 | 2019-11-15 | 北京理工大学 | A kind of unsupervised image Style Transfer method based on paired-associate learning |
CN110782448A (en) * | 2019-10-25 | 2020-02-11 | 广东三维家信息科技有限公司 | Rendered image evaluation method and device |
Non-Patent Citations (1)
Title |
---|
LAN Hong; LIU Qinyi: "Scene graph to image generation model with graph attention network", Journal of Image and Graphics (中国图象图形学报), no. 08 *
Also Published As
Publication number | Publication date |
---|---|
CN112581360B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223359B (en) | Dual-stage multi-color-matching-line draft coloring model based on generation countermeasure network and construction method and application thereof | |
CN110443864B (en) | Automatic artistic font generation method based on single-stage small-amount sample learning | |
Cheng et al. | Light-guided and cross-fusion U-Net for anti-illumination image super-resolution | |
CN113313644B (en) | Underwater image enhancement method based on residual double-attention network | |
CN111145290B (en) | Image colorization method, system and computer readable storage medium | |
CN109766822B (en) | Gesture recognition method and system based on neural network | |
CN113222875B (en) | Image harmonious synthesis method based on color constancy | |
CN112950661A (en) | Method for generating antithetical network human face cartoon based on attention generation | |
Li et al. | Globally and locally semantic colorization via exemplar-based broad-GAN | |
CN111275613A (en) | Editing method for generating confrontation network face attribute by introducing attention mechanism | |
CN111160138A (en) | Fast face exchange method based on convolutional neural network | |
CN113392711A (en) | Smoke semantic segmentation method and system based on high-level semantics and noise suppression | |
CN112767286A (en) | Dark light image self-adaptive enhancement method based on intensive deep learning | |
CN111696136A (en) | Target tracking method based on coding and decoding structure | |
CN113610732A (en) | Full-focus image generation method based on interactive counterstudy | |
CN115222581A (en) | Image generation method, model training method, related device and electronic equipment | |
CN115984323A (en) | Two-stage fusion RGBT tracking algorithm based on space-frequency domain equalization | |
CN114639002A (en) | Infrared and visible light image fusion method based on multi-mode characteristics | |
CN114359626A (en) | Visible light-thermal infrared obvious target detection method based on condition generation countermeasure network | |
CN117351340A (en) | Underwater image enhancement algorithm based on double-color space | |
CN112837212A (en) | Image arbitrary style migration method based on manifold alignment | |
CN117151990B (en) | Image defogging method based on self-attention coding and decoding | |
CN112581360B (en) | Method for enhancing aesthetic quality of multi-style image based on structural constraint | |
CN109522918B (en) | Hyperspectral image feature extraction method based on improved local singular spectrum analysis | |
CN116503502A (en) | Unpaired infrared image colorization method based on contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||