CN111242841B - Image background style migration method based on semantic segmentation and deep learning


Info

Publication number
CN111242841B
CN111242841B (application CN202010043890.XA)
Authority
CN
China
Prior art keywords
picture
style
content
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010043890.XA
Other languages
Chinese (zh)
Other versions
CN111242841A (en)
Inventor
颜成钢
郑鑫磊
孙垚棋
张继勇
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202010043890.XA
Publication of CN111242841A
Application granted
Publication of CN111242841B
Legal status: Active

Classifications

    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T7/11 Region-based segmentation
    • G06T7/194 Segmentation involving foreground-background segmentation
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a picture background style migration method based on semantic segmentation and deep learning. First, a content picture and a style picture are selected and preprocessed; a picture relatively close to the final result is then computed directly from the content picture and the style picture through a ResNet network; style constraints and content constraints are then obtained through a VGG-19 network, gradient descent is performed according to a loss function, and the background style migration result is obtained through multiple iterations; finally, the migration result is put back into the picture. The invention offers a speed improvement of several hundred times, strong extensibility, and style migration of local areas with the main content of the image retained, highlighting the subject and enhancing the artistic expressiveness of the image, together with strong code readability and portability.

Description

Image background style migration method based on semantic segmentation and deep learning
Technical Field
The invention relates to the field of deep learning, in particular to a picture background style migration method based on semantic segmentation and deep learning.
Background
Neural-network-based image style migration was proposed in 2015 by Gatys et al. The article by Gatys makes a single innovation, but one of great significance for style migration and image texture generation: it proposes modeling texture with deep learning. Until then, researchers had hoped to find a way to describe texture with local statistical models, but manual modeling was too complex and generalized extremely poorly. Inspired by papers in the related field of object recognition, Gatys observed that a VGG-19 network can be regarded as a collection of local feature recognizers and verified experimentally that these feature recognizers also perform very well for style migration.
However, the Gatys method has obvious problems and disadvantages. The most important is that the migration speed of the original method is very slow: even on a top GPU of 2019, almost twenty minutes is needed, and style migration of a 512x512 picture takes more than twelve hours on an ordinary CPU.
The other disadvantage is that the Gatys transfer method can only transfer the whole picture; it cannot apply style transfer to a target with a given characteristic while keeping the other objects unchanged.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a picture background style migration method based on semantic segmentation and deep learning. The aim of the invention is to transfer only the image background during style migration, so as to highlight the subject and enhance the artistic expressiveness of the image.
In order to achieve this technical purpose, the technical scheme of the invention is a picture background style migration method based on semantic segmentation and deep learning, comprising the following steps:
Step (1), selecting a content picture and a style picture and preprocessing the pictures;
Select a picture with a clearly defined subject as the content picture, and perform semantic segmentation on the content picture with a U-Net network whose backbone network is ResNet-18. Set the number of output classes of the network to 2 and define the softmax function required for classification:

$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}$$

where $a_k(x)$ denotes the score of feature channel $k$ at pixel $x$, $K$ is the number of classes, and $p_k(x)$ is the class-$k$ classification result for the pixel. The softmax function classifies and outputs the result once the neural network computation is complete: before the softmax, the data is represented as a one-dimensional activation volume; the output after the softmax is mapped to the corresponding class, forming distinct labels for subject and background. After the subject is segmented, it is subtracted from the full image to obtain a content image with the subject removed and only the background remaining;
Select a style picture whose background style is to be obtained; generally, a picture with an obvious, strong style is chosen as the style picture to ensure a good result;
Finally, crop the content picture and the style picture to the same size to obtain the preprocessed content picture and style picture.
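For illustration only, a minimal PyTorch sketch of this segmentation step is given below; the trained `segmentation_model` (a 2-class U-Net with ResNet-18 backbone) and the tensor shapes are assumptions of the sketch, not details fixed by the method.

```python
import torch
import torch.nn.functional as F

def split_subject_background(image, segmentation_model):
    """Separate an image into subject and background with a 2-class U-Net.

    image: float tensor of shape (3, H, W) with values in [0, 1].
    segmentation_model: assumed pre-trained U-Net (ResNet-18 backbone)
    returning per-pixel scores a_k(x) for k in {background, subject}.
    """
    with torch.no_grad():
        logits = segmentation_model(image.unsqueeze(0))  # (1, 2, H, W)
        probs = F.softmax(logits, dim=1)                 # softmax formula above
        subject_mask = (probs[:, 1:2] > 0.5).float()     # (1, 1, H, W)
    subject_mask = subject_mask[0]                       # (1, H, W), broadcasts over RGB
    # subtract the subject from the full image: only the background remains
    background = image * (1.0 - subject_mask)
    subject = image * subject_mask
    return subject, background, subject_mask
```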
Step (2), input the preprocessed content picture and style picture respectively into a ResNet-50 network, and take the outputs of the last pooling layer, $\vec{F}_p$ for the content picture and $\vec{F}_a$ for the style picture. Perform a weighted addition with the formula

$$\vec{F}_0 = w_c \vec{F}_p + w_s \vec{F}_a$$

where $w_c$ and $w_s$ are the weights of the content and style features. Input $\vec{F}_0$ into ResNet-50 for up-sampling to obtain a picture $\vec{x}_0$ of the same size as the input picture.
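A sketch of how this initial estimate might be computed follows; the weighting coefficients `w_c`/`w_s` and the up-sampling `decoder` are assumptions (the text specifies only a weighted addition of the last pooling outputs followed by up-sampling), not values fixed by the method.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ResNet-50 truncated after its last pooling layer (global average pool)
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
encoder = nn.Sequential(*list(resnet.children())[:-1])  # output: (1, 2048, 1, 1)

def initial_estimate(content, style, decoder, w_c=0.5, w_s=0.5):
    """Weighted addition of pooled features, then up-sampling to image size.

    content, style: (1, 3, H, W) tensors; decoder: assumed up-sampling head
    mapping the fused (1, 2048, 1, 1) feature back to a (1, 3, H, W) picture.
    """
    with torch.no_grad():
        f_c = encoder(content)         # content feature after last pooling
        f_s = encoder(style)           # style feature after last pooling
        fused = w_c * f_c + w_s * f_s  # the weighted addition of step (2)
        return decoder(fused)          # picture of the same size as the input
```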
Step (3), input the style picture into the VGG-19 network and denote it $\vec{a}$. To obtain a representation of the style of the input image, a feature space designed to capture texture information is employed. This feature space can be built on top of the filter responses of any layer of the network; it consists of the correlations between the different filter responses, where the expectation is taken over the spatial extent of the feature maps. These feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

where $G^l_{ij}$ is the inner product between the vectorised feature maps $i$ and $j$ in layer $l$, $N_l$ is the number of feature maps in layer $l$, and $M_l$ is their spatial size.

Through this inner product operation, a style representation of the input image can be obtained. The information captured by these style feature spaces built on different layers of the network can be visualised by constructing an image that matches the style representation of a given input image. Performing gradient descent on $\vec{x}_0$ realises the style migration, where the loss function is defined as the mean squared distance between the Gram matrix entries of the style picture and those of the image to be generated.

Let $A^l$ and $G^l$ denote the respective style representations of $\vec{a}$ and $\vec{x}_0$ in layer $l$. The contribution of convolution layer $l$ to the total loss is:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

and the total style loss can be expressed as

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$$

where $w_l$ is the weight of each layer's contribution to the total loss. With respect to the activations in layer $l$, the derivative of $E_l$ can be computed analytically:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2} \left( (F^l)^\top \left( G^l - A^l \right) \right)_{ji} & \text{if } F^l_{ij} > 0 \\[4pt] 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

The gradient of $E_l$ with respect to the value of each pixel of $\vec{x}$ can then be readily computed using standard error back-propagation. The outputs of the first, third, fifth, ninth, and thirteenth layers of the VGG-19 network are taken as the style constraints.
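The Gram-matrix style loss can be written compactly in PyTorch. The sketch below assumes lists of feature maps already extracted (e.g. via forward hooks) from the chosen VGG-19 layers; the equal layer weights are only an example.

```python
import torch

def gram_matrix(feat):
    """G^l_ij = sum_k F^l_ik F^l_jk over vectorised feature maps."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2))  # (n, c, c)

def style_loss(feats_x, feats_a, layer_weights):
    """Total style loss sum_l w_l * E_l, with
    E_l = 1 / (4 N_l^2 M_l^2) * sum_ij (G^l_ij - A^l_ij)^2."""
    loss = feats_x[0].new_zeros(())
    for fx, fa, w_l in zip(feats_x, feats_a, layer_weights):
        n_l = fx.shape[1]                # number of feature maps N_l
        m_l = fx.shape[2] * fx.shape[3]  # spatial size M_l
        g, a = gram_matrix(fx), gram_matrix(fa)
        loss = loss + w_l * ((g - a) ** 2).sum() / (4.0 * n_l**2 * m_l**2)
    return loss
```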
Step (4), input the content picture into the VGG-19 network and denote it $\vec{p}$. To visualise the image information encoded in the different convolution layers, a gradient descent operation is performed on the picture $\vec{x}_0$ output in step (2) to find another image that matches the feature responses of the original image. Let $P^l$ denote the feature representation of $\vec{p}$ in layer $l$, and let $F^l$ denote the feature representation of $\vec{x}_0$ in layer $l$, i.e. the activations of all convolution kernels in that layer. The squared-error loss between the two feature representations is then defined as

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

The derivative of this loss with respect to the activation volume in layer $l$ equals

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

from which the gradient with respect to the image $\vec{x}$ can be computed using standard error back-propagation. The initial image can thus be modified until it produces, in a certain layer of the convolutional neural network, the same response as $\vec{p}$; here, the initial image is not a completely random noise map but the output $\vec{x}_0$ of step (2). The output of the tenth layer of the VGG-19 network is taken as the content constraint;
Step (5), to transfer the style of the style picture $\vec{a}$ onto the content picture $\vec{p}$, take the output picture $\vec{x}_0$ obtained in step (2) as the image to be synthesised; this image must simultaneously match the style representation of $\vec{a}$ and the content representation of $\vec{p}$. All three are input into the VGG-19 network. The loss function adds, each multiplied by a coefficient, the style loss $\mathcal{L}_{style}$ between the style constraint obtained in step (3) and the output picture $\vec{x}_0$ obtained in step (2), and the content loss $\mathcal{L}_{content}$ between the content constraint obtained in step (4) and the output picture $\vec{x}_0$ obtained in step (2). A gradient descent algorithm is used to optimise the loss function; the overall style migration jointly minimises the Euclidean distances between the feature representations of $\vec{p}$, $\vec{a}$ and $\vec{x}$. The overall loss function is as follows:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\alpha$ and $\beta$ are the weighting factors for content and style reconstruction, respectively.
Step (6), repeat step (5) for 10 iterations, so that the output picture $\vec{x}_0$ obtained in step (2) becomes, after iteration, an output $\vec{x}$ that is as similar as possible to the content picture in content and to the style picture in style.
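Steps (5) and (6) amount to a short optimisation loop. The sketch below reuses the `style_loss` and `content_loss` sketches above and assumes a hypothetical `vgg_features` helper that returns the style-layer and content-layer activations of an image; the optimiser choice and the values of `alpha`, `beta` and the learning rate are illustrative assumptions only.

```python
import torch

def background_style_transfer(x0, style_feats, content_feat, vgg_features,
                              alpha=1.0, beta=1e3, layer_weights=(0.2,) * 5,
                              steps=10, lr=0.05):
    """Optimise x (initialised with the step-(2) output x0) so that
    alpha * L_content + beta * L_style decreases; 10 iterations as in step (6)."""
    x = x0.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        feats_x, content_x = vgg_features(x)  # hypothetical feature extractor
        loss = (alpha * content_loss(content_x, content_feat)
                + beta * style_loss(feats_x, style_feats, layer_weights))
        loss.backward()
        optimizer.step()
    return x.detach()
```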
Step (7), finally, put the segmented subject back onto the style-migrated background.
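Putting the subject back is a masked composite; a short sketch using the `subject_mask` from the segmentation sketch of step (1):

```python
def composite(image, stylised_background, subject_mask):
    """Step (7): keep the original subject pixels, take the migrated
    background everywhere else (mask shape broadcasts over RGB)."""
    return image * subject_mask + stylised_background * (1.0 - subject_mask)
```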
The invention has the following beneficial effects:
(1) Compared with the traditional style migration algorithm, this algorithm does not perform gradient descent from a white-noise picture but computes a picture relatively close to the result directly from the content picture and the style picture, and it uses a faster and simpler ResNet network, so the test speed is increased by hundreds of times without an obvious reduction in quality;
(2) Extensibility is strong: deeper image style and content information can be extracted by increasing the depth of the ResNet, with only a slight increase in running time;
(3) Style migration is applied to a local area while the main content of the image is retained, highlighting the subject and enhancing the artistic expressiveness of the image;
(4) Development uses the fastai high-level framework built on PyTorch, so code readability and portability are strong.
Drawings
FIG. 1 is a system flow diagram of the present invention;
FIG. 2 is a content portrait image employed by embodiments of the present invention;
FIG. 3 is the result of the portrait style migration method of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the picture background style migration method of the present invention includes the following steps:
Step (1), selecting a content picture and a style picture and preprocessing the pictures;
Select a picture with a clearly defined subject, such as a portrait or a still-life with a definite theme, as the content picture, and perform semantic segmentation on the content picture with a U-Net network whose backbone network is ResNet-18; because the segmentation task is simple, a shallow network suffices. Set the number of output classes of the network to 2 and define the softmax function required for classification:
$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}$$

where $a_k(x)$ denotes the score of feature channel $k$ at pixel $x$, $K$ is the number of classes, and $p_k(x)$ is the class-$k$ classification result for the pixel. The softmax function classifies and outputs the result once the neural network computation is complete: before the softmax, the data is represented as a one-dimensional activation volume; the output after the softmax is mapped to the corresponding class, forming distinct labels for subject and background. After the subject is segmented, it is subtracted from the full image to obtain a content image with the subject removed and only the background remaining;
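Since the beneficial effects state that development uses the fastai framework on PyTorch, the segmentation network of this embodiment could be set up roughly as follows; the dataset layout, file naming and training schedule are assumptions of the sketch, not part of the patent text.

```python
from pathlib import Path
from fastai.vision.all import (SegmentationDataLoaders, unet_learner,
                               resnet18, get_image_files)

path = Path("data")  # assumed dataset root with images/ and labels/ folders
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path / "images"),
    label_func=lambda f: path / "labels" / f"{f.stem}_mask.png",
    codes=["background", "subject"])  # 2 output classes, as in the method

learn = unet_learner(dls, resnet18)   # U-Net with a ResNet-18 backbone
learn.fine_tune(3)                    # short schedule; illustrative only
```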
Select a style picture whose background style is to be obtained; for a better result, a picture with an obvious, strong style, such as an Impressionist or abstract oil painting, is generally chosen as the style picture;
Finally, crop the content picture and the style picture to the same size to obtain the preprocessed content picture and style picture.
Step (2), input the preprocessed content picture and style picture respectively into a ResNet-50 network, and take the outputs of the last pooling layer, $\vec{F}_p$ for the content picture and $\vec{F}_a$ for the style picture. Perform a weighted addition with the formula

$$\vec{F}_0 = w_c \vec{F}_p + w_s \vec{F}_a$$

where $w_c$ and $w_s$ are the weights of the content and style features. Input $\vec{F}_0$ into ResNet-50 for up-sampling to obtain a picture $\vec{x}_0$ of the same size as the input picture.
Step (3), input the style picture into the VGG-19 network and denote it $\vec{a}$. To obtain a representation of the style of the input image, a feature space designed to capture texture information is employed. This feature space can be built on top of the filter responses of any layer of the network; it consists of the correlations between the different filter responses, where the expectation is taken over the spatial extent of the feature maps. These feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

where $G^l_{ij}$ is the inner product between the vectorised feature maps $i$ and $j$ in layer $l$, $N_l$ is the number of feature maps in layer $l$, and $M_l$ is their spatial size.

Through this inner product operation, a style representation of the input image can be obtained; it captures the texture information of the input image rather than its global arrangement, because the inner product operation discards the spatial correlations of the original image. The information captured by these style feature spaces built on different layers of the network can be visualised by constructing an image that matches the style representation of a given input image. Performing gradient descent on $\vec{x}_0$ realises the style migration, where the loss function is defined as the mean squared distance between the Gram matrix entries of the style picture and those of the image to be generated.

Let $A^l$ and $G^l$ denote the respective style representations of $\vec{a}$ and $\vec{x}_0$ in layer $l$. The contribution of convolution layer $l$ to the total loss is:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

and the total style loss can be expressed as

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$$

where $w_l$ is the weight of each layer's contribution to the total loss. With respect to the activations in layer $l$, the derivative of $E_l$ can be computed analytically:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2} \left( (F^l)^\top \left( G^l - A^l \right) \right)_{ji} & \text{if } F^l_{ij} > 0 \\[4pt] 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

The gradient of $E_l$ with respect to the value of each pixel of $\vec{x}$ can then be readily computed using standard error back-propagation. The outputs of the first, third, fifth, ninth, and thirteenth layers of the VGG-19 network are taken as the style constraints.
Step (4), input the content picture into the VGG-19 network and denote it $\vec{p}$. To visualise the image information encoded in the different convolution layers, a gradient descent operation is performed on the picture $\vec{x}_0$ output in step (2) to find another image that matches the feature responses of the original image. Let $P^l$ denote the feature representation of $\vec{p}$ in layer $l$, and let $F^l$ denote the feature representation of $\vec{x}_0$ in layer $l$, i.e. the activations of all convolution kernels in that layer. The squared-error loss between the two feature representations is then defined as

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

The derivative of this loss with respect to the activation volume in layer $l$ equals

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

from which the gradient with respect to the image $\vec{x}$ can be computed using standard error back-propagation. The initial image can thus be modified until it produces, in a certain layer of the convolutional neural network, the same response as $\vec{p}$; here, the initial image is not a completely random noise map but the output $\vec{x}_0$ of step (2). The output of the tenth layer of the VGG-19 network is taken as the content constraint;
Step (5), to transfer the style of the style picture $\vec{a}$ onto the content picture $\vec{p}$, take the output picture $\vec{x}_0$ obtained in step (2) as the image to be synthesised; this image must simultaneously match the style representation of $\vec{a}$ and the content representation of $\vec{p}$. All three are input into the VGG-19 network. The loss function adds, each multiplied by a coefficient, the style loss $\mathcal{L}_{style}$ between the style constraint obtained in step (3) and the output picture $\vec{x}_0$ obtained in step (2), and the content loss $\mathcal{L}_{content}$ between the content constraint obtained in step (4) and the output picture $\vec{x}_0$ obtained in step (2). A gradient descent algorithm is used to optimise the loss function; the overall style migration jointly minimises the Euclidean distances between the feature representations of $\vec{p}$, $\vec{a}$ and $\vec{x}$. The overall loss function is as follows:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\alpha$ and $\beta$ are the weighting factors for content and style reconstruction, respectively.
FIG. 2 is a content portrait image employed by embodiments of the present invention;
FIG. 3 is the result of the portrait style migration method of the present invention.

Claims (4)

1. A picture background style migration method based on semantic segmentation and deep learning is characterized by comprising the following steps:
step (1), selecting a content picture and a style picture and preprocessing the pictures;

step (2), inputting the preprocessed content picture and style picture respectively into a ResNet-50 network, taking the outputs of the last pooling layer, $\vec{F}_p$ for the content picture and $\vec{F}_a$ for the style picture, and performing a weighted addition with the formula

$$\vec{F}_0 = w_c \vec{F}_p + w_s \vec{F}_a$$

where $w_c$ and $w_s$ are the weights of the content and style features; inputting $\vec{F}_0$ into ResNet-50 for up-sampling to obtain a picture $\vec{x}_0$ of the same size as the input picture;
Step (3), inputting the style picture into a VGG-19 network to obtain style constraint;
step (4), inputting the content picture into a VGG-19 network to obtain content constraint;
step (5), in order to transfer the style of the style picture $\vec{a}$ onto the content picture $\vec{p}$, adopting the output picture $\vec{x}_0$ obtained in step (2) as the image to be synthesised, which simultaneously matches the style representation of $\vec{a}$ and the content representation of $\vec{p}$; inputting all three into the VGG-19 network, and calculating with a loss function the style loss $\mathcal{L}_{style}$ between the style picture $\vec{a}$ obtained in step (3) and the output picture $\vec{x}_0$ obtained in step (2), and the content loss $\mathcal{L}_{content}$ between the content picture $\vec{p}$ obtained in step (4) and the output picture $\vec{x}_0$ obtained in step (2); multiplying the two by coefficients and then adding them; using a gradient descent algorithm to optimise the loss function; the overall style migration jointly minimises the Euclidean distances between the feature representations of $\vec{p}$, $\vec{a}$ and $\vec{x}$; the overall loss function is as follows:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\alpha$ and $\beta$ are the weighting factors for content and style reconstruction, respectively;
step (6), repeating step (5) for 10 iterative computations, so that the output picture $\vec{x}_0$ obtained in step (2) becomes, after iteration, an output $\vec{x}$ that is as similar as possible to the content picture in content and to the style picture in style;
step (7), finally, putting the segmented subject back onto the style-migrated background.
2. The method for migrating the background style of a picture based on semantic segmentation and deep learning according to claim 1, wherein the specific method of step (1) is as follows:
selecting a picture with a clearly defined subject as the content picture, and performing semantic segmentation on the content picture with a U-Net network, wherein the backbone network of the U-Net is ResNet-18; setting the number of output classes of the network to 2 and defining the softmax function required for classification:

$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}$$

wherein $a_k(x)$ denotes the score of feature channel $k$ at pixel $x$, $K$ is the number of classes, and $p_k(x)$ is the class-$k$ classification result for the pixel; the softmax function classifies and outputs the result once the neural network computation is complete: before the softmax, the data is represented as a one-dimensional activation volume, and the output after the softmax is mapped to the corresponding class, forming distinct labels for subject and background; after the subject is segmented, it is subtracted from the full image to obtain a content image with the subject removed and only the background remaining;
selecting a style picture whose background style is to be obtained, wherein a picture with an obvious, strong style is chosen as the style picture to obtain a better effect;
and finally, cropping the content picture and the style picture to the same size to obtain the preprocessed content picture and style picture.
3. The method for migrating the background style of a picture based on semantic segmentation and deep learning according to claim 2, wherein the specific method of step (3) is as follows:
inputting the style picture into the VGG-19 network and denoting it $\vec{a}$; to obtain a representation of the style of the input image, a feature space designed to capture texture information is employed; the feature space is built on top of the filter responses of any layer of the network and consists of the correlations between the different filter responses, where the expectation is taken over the spatial extent of the feature maps; the feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

wherein $G^l_{ij}$ is the inner product between the vectorised feature maps $i$ and $j$ in layer $l$, $N_l$ is the number of feature maps in layer $l$, and $M_l$ is their spatial size;

through the inner product operation, a style representation of the input image can be obtained; the information captured by these style feature spaces built on different layers of the network is visualised by constructing an image that matches the style representation of a given input image; performing gradient descent on $\vec{x}_0$ realises the style migration, wherein the loss function is defined as the mean squared distance between the Gram matrix entries of the style picture and those of the image to be generated;

$A^l$ and $G^l$ denote the respective style representations of $\vec{a}$ and $\vec{x}_0$ in layer $l$; the contribution of convolution layer $l$ to the total loss is:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

the total loss can be expressed as

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$$

wherein $w_l$ is the weight of each layer's contribution to the total loss; with respect to the activations in layer $l$, the derivative of $E_l$ can be computed analytically:

$$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2} \left( (F^l)^\top \left( G^l - A^l \right) \right)_{ji} & \text{if } F^l_{ij} > 0 \\[4pt] 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

wherein $F^l$ denotes the activated feature representation of all convolution kernels of $\vec{x}_0$ in layer $l$; the gradient of $E_l$ with respect to the value of each pixel of $\vec{x}$ can be readily computed using standard error back-propagation; the outputs of the first, third, fifth, ninth, and thirteenth layers of the VGG-19 network are taken as the style constraints.
4. The method for migrating the background style of a picture based on semantic segmentation and deep learning according to claim 3, wherein the specific method of step (4) is as follows:
inputting the content picture into the VGG-19 network and denoting it $\vec{p}$; to visualise the image information encoded in the different convolution layers, performing a gradient descent operation on the picture $\vec{x}_0$ output in step (2) to find another image that matches the feature responses of the original image; $P^l$ denotes the feature representation of $\vec{p}$ in layer $l$, and $F^l$ denotes the activated feature representation of all convolution kernels of $\vec{x}_0$ in layer $l$; the squared-error loss between the two feature representations is then defined as

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

the derivative of this loss with respect to the activation volume in layer $l$ equals

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

from which the gradient with respect to the image $\vec{x}$ can be computed using standard error back-propagation; the initial image can thus be modified until it produces, in a certain layer of the convolutional neural network, the same response as $\vec{p}$; the initial image is not a completely random noise map but the output $\vec{x}_0$ of step (2); the output of the tenth layer of the VGG-19 network is taken as the content constraint.
CN202010043890.XA 2020-01-15 2020-01-15 Image background style migration method based on semantic segmentation and deep learning Active CN111242841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010043890.XA CN111242841B (en) 2020-01-15 2020-01-15 Image background style migration method based on semantic segmentation and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010043890.XA CN111242841B (en) 2020-01-15 2020-01-15 Image background style migration method based on semantic segmentation and deep learning

Publications (2)

Publication Number Publication Date
CN111242841A CN111242841A (en) 2020-06-05
CN111242841B (en) 2023-04-18

Family

ID=70879562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010043890.XA Active CN111242841B (en) 2020-01-15 2020-01-15 Image background style migration method based on semantic segmentation and deep learning

Country Status (1)

Country Link
CN (1) CN111242841B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986302A (en) * 2020-07-23 2020-11-24 北京石油化工学院 Image style migration method and device based on deep learning
CN112686906B (en) * 2020-12-25 2022-06-14 山东大学 Image segmentation method and system based on uniform distribution migration guidance
CN112819873B (en) * 2021-02-05 2022-06-24 四川大学 High-generalization cross-domain road scene semantic segmentation method and system
CN113344771B (en) * 2021-05-20 2023-07-25 武汉大学 Multifunctional image style migration method based on deep learning
CN113570500A (en) * 2021-08-04 2021-10-29 光华临港工程应用技术研发(上海)有限公司 Method for realizing real image style migration based on global information guide network
CN113868651B (en) * 2021-09-27 2024-04-26 中国石油大学(华东) Web log-based website anticreeper method
CN114239116B (en) * 2021-12-21 2022-07-12 盈嘉互联(北京)科技有限公司 BIM design recommendation method based on style migration
CN115511700B (en) * 2022-09-15 2024-03-05 南京栢拓视觉科技有限公司 Material style migration system with refined high-quality effect
CN115641253B (en) * 2022-09-27 2024-02-20 南京栢拓视觉科技有限公司 Material nerve style migration method for improving aesthetic quality of content
CN116227428B (en) * 2023-05-08 2023-07-18 中国科学技术大学 Text style migration method based on migration mode perception
CN116452414B (en) * 2023-06-14 2023-09-08 齐鲁工业大学(山东省科学院) Image harmony method and system based on background style migration
CN118096505A (en) * 2024-04-28 2024-05-28 厦门两万里文化传媒有限公司 Commodity display picture generation method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049297B1 (en) * 2017-03-20 2018-08-14 Beihang University Data driven method for transferring indoor scene layout and color style
CN108805803A (en) * 2018-06-13 2018-11-13 衡阳师范学院 A kind of portrait style moving method based on semantic segmentation Yu depth convolutional neural networks
CN109697690A (en) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image Style Transfer method and system
CN110111291A (en) * 2019-05-10 2019-08-09 衡阳师范学院 Based on part and global optimization blending image convolutional neural networks Style Transfer method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049297B1 (en) * 2017-03-20 2018-08-14 Beihang University Data driven method for transferring indoor scene layout and color style
CN108805803A (en) * 2018-06-13 2018-11-13 衡阳师范学院 A kind of portrait style moving method based on semantic segmentation Yu depth convolutional neural networks
CN109697690A (en) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image Style Transfer method and system
CN110111291A (en) * 2019-05-10 2019-08-09 衡阳师范学院 Based on part and global optimization blending image convolutional neural networks Style Transfer method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭美钦; 江健民. Improved algorithm for face image style transfer. Journal of Shenzhen University (Science and Engineering), No. 3, full text. *

Also Published As

Publication number Publication date
CN111242841A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
CN109886881B (en) Face makeup removal method
CN107767328A (en) The moving method and system of any style and content based on the generation of a small amount of sample
CN106462771A (en) 3D image significance detection method
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN110543916B (en) Method and system for classifying missing multi-view data
CN114581356B (en) Image enhancement model generalization method based on style migration data augmentation
CN112101364B (en) Semantic segmentation method based on parameter importance increment learning
CN113256494B (en) Text image super-resolution method
CN112884668A (en) Lightweight low-light image enhancement method based on multiple scales
CN110852935A (en) Image processing method for human face image changing with age
CN108460400A (en) A kind of hyperspectral image classification method of combination various features information
CN107967497A (en) Manuscripted Characters Identification Method based on convolutional neural networks and extreme learning machine
CN109886281A (en) One kind is transfinited learning machine color image recognition method based on quaternary number
CN117788629A (en) Image generation method, device and storage medium with style personalization
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image
CN110210562B (en) Image classification method based on depth network and sparse Fisher vector
CN110222222B (en) Multi-modal retrieval method based on deep topic self-coding model
CN116030961A (en) Traditional Chinese medicine constitution identification method and system based on multi-view tongue picture feature fusion
CN114445280A (en) Point cloud down-sampling method based on attention mechanism
WO2021137942A1 (en) Pattern generation
CN110852167A (en) Remote sensing image classification method based on optimal activation model
Fadaeddini et al. Data augmentation using fast converging CIELAB-GAN for efficient deep learning dataset generation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant