CN108171649B - Image stylization method for keeping focus information - Google Patents

Image stylization method for keeping focus information

Info

Publication number
CN108171649B
Authority
CN
China
Prior art keywords
image
network
loss
stylized
focus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711292746.4A
Other languages
Chinese (zh)
Other versions
CN108171649A (en)
Inventor
叶武剑
徐佐腾
刘怡俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201711292746.4A priority Critical patent/CN108171649B/en
Publication of CN108171649A publication Critical patent/CN108171649A/en
Application granted
Publication of CN108171649B publication Critical patent/CN108171649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image stylization method for keeping focus information. A "focus position difference" is added to the traditional image stylization method as a penalty term: the sum of the perception loss and the focus loss is used as the total loss, and the Adam algorithm is used to adjust the weights of the image conversion network to obtain an optimized network. When a picture is input into the optimized network, it generates an image that retains the focus information of the original picture and in which the style is blended more naturally. The method not only makes the generated stylized graph retain the main semantic content and the focus information of the original graph, but also avoids the style transfer of traditional simple texture superposition, so that the resulting image better highlights the subject of the original image.

Description

Image stylization method for keeping focus information
Technical Field
The invention relates to the technical field of image processing and deep learning, in particular to an image stylizing method for keeping focus information.
Background
Existing image stylization methods generate an image with a residual neural network, compute a perception loss by comparing the feature maps obtained when the generated image, the original image and the style image pass through a VGG network, and train the residual neural network by back-propagation, so that it generates pictures with the desired specific style and content. To obtain the perception loss, two components need to be calculated: one is the content loss, obtained by comparing the high-level features of the original image and the generated image in the VGG network; the other is the style loss, obtained by comparing the low-level features of the style picture and the generated picture in the VGG network.
For example, document 1 (Johnson J, Alahi A, Li F. Perceptual Losses for Real-Time Style Transfer and Super-Resolution [M]. 2016) discusses an image difference measure called the "perceptual loss". Instead of directly comparing the pixels of two pictures, this method compares the differences between the features the pictures produce when passed through a neural network. It compares the high-dimensional style and texture information of the images as well as their shape and contour information, computes the perceptual loss from these comparisons, and finally trains a neural network that can add a given specific style to any picture.
For example, document 2 (Gatys L A, Ecker A S, Bethge M. A Neural Algorithm of Artistic Style [J]. Computer Science, 2015) discusses a method in which a picture with randomly initialized pixels is continuously modified by gradient descent so as to minimize its loss after passing through a trained neural network, finally yielding an image that fuses a given style and content. In the gradient descent method of document 2, VGG-19 is used as the neural network for calculating the loss; the network is modified so that the information of the target content image and the style image is recorded in one forward propagation, and after each forward propagation of the image being modified as the network input, the difference between that image and the targets is obtained, the loss and gradient are calculated, and the image is updated.
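For illustration, the following Python sketch outlines the pixel-optimization procedure of document 2 (prior art, not the method of the present invention); `perceptual_loss` stands in for the VGG-19 based content and style comparison described above, and the optimizer, step count and learning rate are assumptions.

```python
# Rough sketch of document 2's pixel-optimization style transfer (prior art).
# `perceptual_loss` is assumed to wrap the VGG-19 content/style comparison.
import torch

def stylize_by_pixel_optimization(content_img, style_img, perceptual_loss,
                                  steps=300, lr=1e-2):
    # Start from randomly initialized pixels and refine them by gradient descent.
    generated = torch.randn_like(content_img, requires_grad=True)
    optimizer = torch.optim.Adam([generated], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = perceptual_loss(generated, content_img, style_img)
        loss.backward()          # one backward pass per update, hundreds in total
        optimizer.step()
    return generated.detach()
```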
In the method of document 1, a residual neural network is obtained by training with the perceptual loss, and the network is bound to one specific image style. An image that needs style conversion is input into the network, and a stylized version of the image is obtained after a single forward propagation. However, the stylized picture obtained in this way is stylized as a whole, without differentiation or emphasis, much as if the texture information of the target style picture were simply superimposed onto the picture to be converted. The stylization effect is therefore rather ordinary.
The image stylization process of document 2 acts directly on the target image. Generating one stylized image generally requires several hundred forward and backward propagation passes, during which an image with randomly initialized pixels is continuously modified by gradient descent until it approaches the expected result. This method has the same disadvantage as document 1: the stylization is applied to the whole image, without differentiation or emphasis. Moreover, because each stylized image requires many forward and backward propagations, generating a stylized image takes a long time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a stylization method that keeps the focus information of an image, so that the stylized image does not lose the information the original image is intended to express, and retains its focus and emphasis.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the method comprises the following steps:
S1, building a residual neural network as an image conversion network;
the residual neural network has 12 layers, including 5 residual blocks, each containing two convolutional layers with 3 × 3 convolution kernels; a minimal sketch of such a network is given after step S7 below. The network has strong expressive capability and can record the information of the target style image. Only one style is designated in each training run, and a large number of images with different contents are then input into the network to obtain stylized images, so that the image conversion network is trained into a network that records one target style and can stylize any content image.
S2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss;
the loss is composed of two parts, one part represents the difference between the content contour of the generated image and the content contour of the original image, and is called content loss; part of the representation of the generated image and the target style sheet in the tone texture difference, called style loss;
S4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss;
the focus loss network used in this step is a pre-trained residual neural network with an 18-layer structure, whose number of layers differs from that of the image conversion network; the weight of the last Softmax layer of this network is a 1000 × 512 matrix, so a 512-dimensional vector is obtained for each classification result; a matrix multiplication of this vector with the activation value of the last convolutional layer of the network reveals the part of the image to which the network implicitly pays special attention;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
and S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information.
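As a minimal illustration of the image conversion network of step S1, the following PyTorch sketch builds five residual blocks of two 3 × 3 convolutions each; the channel width and the input/output convolutions are assumptions, since the text above fixes only the block count and kernel size.

```python
# Minimal sketch of the image conversion network of step S1 (assumptions noted above).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))  # identity shortcut

class TransformNet(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, channels, kernel_size=3, padding=1),
                                    nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(5)])
        self.decode = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decode(self.blocks(self.encode(x)))
```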
Further, the specific steps of calculating the perceptual loss in step S3 are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, firstly, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
S34, inputting the picture read in step S33 into the image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into the perception loss network to respectively obtain the content characteristic diagram and the style characteristic diagram of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
and S38, adding the losses of the layers to obtain the total perception loss.
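The following sketch illustrates how the perception loss of steps S35 to S38 could be computed, assuming the VGG feature maps (relu1_2, relu2_2, relu3_3, relu4_3 for style; relu3_3 for content) have already been extracted; the Gram-matrix form of the style term follows document 1 and is an assumption, since the text above only specifies a mean square error between style feature maps.

```python
# Sketch of the perception loss of steps S35-S38 (Gram-matrix style term is an assumption).
import torch

def content_loss(feat_gen, feat_target):
    # Step S35/S37: MSE between content feature maps, normalized by C*H*W.
    c, h, w = feat_gen.shape[-3:]
    return torch.sum((feat_gen - feat_target) ** 2) / (c * h * w)

def gram_matrix(feat):
    # C x C channel-correlation matrix summarizing texture statistics.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(feats_gen, feats_target):
    # Step S36: style difference summed over the selected layers.
    return sum(torch.sum((gram_matrix(fg) - gram_matrix(ft)) ** 2)
               for fg, ft in zip(feats_gen, feats_target))

def perception_loss(content_pair, style_pairs, style_weight=1.0):
    # Step S38: add the content part and the per-layer style parts.
    cg, ct = content_pair                      # generated / target relu3_3 features
    gen_feats, tgt_feats = zip(*style_pairs)   # per-layer (generated, target) pairs
    return content_loss(cg, ct) + style_weight * style_loss(gen_feats, tgt_feats)
```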
Further, the specific steps of calculating the focus loss in step S4 are as follows:
s41, extracting the weight value of the last Softmax layer of the focus loss network;
s42, taking out a content graph from the data set and obtaining a stylized graph obtained after the content graph passes through an image conversion network; then, scaling and normalizing the content graph and the generated stylized graph;
s43, respectively propagating the preprocessed content graph and the preprocessed stylized graph in the focus loss network once in the forward direction to obtain a classification result of the corresponding picture and an activation value of the last convolutional layer;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the content graph and the stylized graph;
S45, scaling the initial focus information to the size of the content graph and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the content graph and the stylized graph;
S46, calculating the difference between the focus positioning maps of the content graph and the stylized graph to obtain the focus loss, wherein the calculation formula of the focus loss is as follows:
$$\ell_{focus}(x,\hat{y})=\sqrt{\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F(x)_{i,j}-F(\hat{y})_{i,j}\right)^{2}}$$
where $F(x)$ and $F(\hat{y})$ are the H × W focus positioning maps of the content graph $x$ and the stylized graph $\hat{y}$ obtained in step S45.
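The following sketch illustrates steps S41 to S46 using a class-activation-map style computation on an 18-layer residual network; the use of torchvision's `resnet18`, the pretrained-weights argument, the interpolation mode and the normalization details are assumptions for illustration (a batch of one image is assumed).

```python
# Sketch of the focus-loss computation of steps S41-S46 (details are assumptions).
import torch
import torch.nn.functional as F
import torchvision.models as models

# The `pretrained` argument name may differ across torchvision versions.
resnet = models.resnet18(pretrained=True).eval()
fc_weight = resnet.fc.weight.data              # 1000 x 512 Softmax-layer weights (S41)

def focus_map(img, out_size):
    # Forward pass up to the last convolutional stage (S43).
    x = resnet.conv1(img); x = resnet.bn1(x); x = resnet.relu(x); x = resnet.maxpool(x)
    x = resnet.layer1(x); x = resnet.layer2(x); x = resnet.layer3(x)
    conv_feat = resnet.layer4(x)                         # B x 512 x h x w activations
    logits = resnet.fc(torch.flatten(resnet.avgpool(conv_feat), 1))
    cls = logits.argmax(dim=1)                           # classification result (S43)
    w = fc_weight[cls]                                   # class-specific 512-d vector (S44)
    cam = torch.einsum('bc,bchw->bhw', w, conv_feat)     # matrix product -> initial focus info
    cam = F.interpolate(cam.unsqueeze(1), size=out_size, mode='bilinear',
                        align_corners=False).squeeze(1)  # scale to content size (S45)
    cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8) * 256.0  # normalize to 0-256
    return cam

def focus_loss(content_img, stylized_img):
    # Step S46: root mean square error between the two focus positioning maps.
    size = content_img.shape[-2:]
    return torch.sqrt(torch.mean((focus_map(content_img, size)
                                  - focus_map(stylized_img, size)) ** 2))
```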
compared with the prior art, the principle of the scheme is as follows:
In the traditional image stylization method, only the perception loss is calculated, and a residual neural network (the image conversion network) is then trained through back-propagation, so that the image conversion network generates pictures with the required specific style and content. In this scheme, the focus position difference is added to the traditional image stylization method as a penalty term: the sum of the perception loss and the focus loss is used as the total loss, and the weights of the image conversion network are adjusted with the Adam algorithm to obtain an optimized network. When a picture is input into the optimized network, the generated stylized image keeps the focus information of the original picture unchanged and is not a simple superposition of textures.
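A sketch of the corresponding training loop (steps S2 to S6) is given below; the learning rate, the weighting between the two losses and the stopping criterion are illustrative assumptions, and `compute_perceptual_loss` and `focus_loss` are assumed to wrap the computations from the earlier sketches.

```python
# Sketch of the training loop of steps S2-S6 (hyperparameters are assumptions).
import torch

def train(transform_net, loader, compute_perceptual_loss, focus_loss, style_img,
          epochs=2, lr=1e-3, focus_weight=1.0, max_iters=40000):
    optimizer = torch.optim.Adam(transform_net.parameters(), lr=lr)
    it = 0
    for _ in range(epochs):
        for content_img in loader:                        # S6: next training image
            stylized = transform_net(content_img)          # S2: forward pass
            loss = (compute_perceptual_loss(stylized, content_img, style_img)   # S3
                    + focus_weight * focus_loss(content_img, stylized))          # S4
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                # S5: Adam weight update
            it += 1
            if it >= max_iters:                             # S6: stop at max iterations
                return transform_net
    return transform_net
```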
Compared with the prior art, the scheme has the following two advantages:
1. The generated stylized graph still retains the main semantic content of the original graph and keeps the focus information of the image.
2. The style transfer of traditional simple texture superposition is avoided, and the resulting image better highlights the subject of the original image.
Drawings
FIG. 1 is a block diagram of a method for stylizing an image that maintains focus information in accordance with the present invention;
FIG. 2 is a schematic diagram of the perception loss network according to the present invention;
FIG. 3 is a comparison of the original image, the stylized image obtained by the method proposed by Gatys et al. in document 2, and the focus positioning image of the stylized image obtained by the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
Referring to FIG. 1, in the image stylizing method for maintaining focus information according to this embodiment, X is the image to be stylized, which changes in each training iteration and also serves as the current target content image; Xs is the given, desired target style image; Y is the image produced by the current image conversion network, which integrates the content of X with the style of Xs while keeping the focus information of X unchanged;
the method comprises the following specific steps:
S1, building a residual neural network as an image conversion network;
s2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss; the specific steps for calculating the perceptual loss are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, firstly, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
S34, inputting the picture read in step S33 into the image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into the perception loss network to respectively obtain the content characteristic diagram and the style characteristic diagram of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
S38, adding the losses of the layers to obtain the total perception loss;
s4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss; the specific steps for calculating the focal loss are as follows:
S41, using the ResNet-18 residual neural network as the focus loss network, extracting the weight value of its last Softmax layer;
s42, taking out a content graph from the data set and obtaining a stylized graph obtained after the content graph passes through an image conversion network; then, scaling and normalizing the content graph and the generated stylized graph;
s43, respectively propagating the preprocessed content graph and the preprocessed stylized graph in the focus loss network once in the forward direction to obtain a classification result of the corresponding picture and an activation value of the last convolutional layer;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the content graph and the stylized graph;
S45, scaling the initial focus information to the size of the content graph and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the content graph and the stylized graph;
S46, calculating the difference between the focus positioning maps of the content graph and the stylized graph to obtain the focus loss, wherein the calculation formula of the focus loss is as follows:
$$\ell_{focus}(x,\hat{y})=\sqrt{\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F(x)_{i,j}-F(\hat{y})_{i,j}\right)^{2}}$$
where $F(x)$ and $F(\hat{y})$ are the H × W focus positioning maps of the content graph $x$ and the stylized graph $\hat{y}$ obtained in step S45;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
and S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information.
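Once trained, step S7 reduces to a single forward pass; a hypothetical usage example follows (the file names and the `TransformNet` instance are assumptions carried over from the earlier sketches).

```python
# Hypothetical inference for step S7 (file names and `transform_net` are assumptions).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

transform_net.eval()
img = transforms.ToTensor()(Image.open("input.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    stylized = transform_net(img)            # single forward pass through the optimized network
save_image(stylized.clamp(0, 1), "stylized_with_focus.jpg")
```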
This method solves the problems of traditional image stylization, in which the style transfer is simple and rigid, the style cannot be well fused with the original image content, and the focus information of the original content image is shifted or lost. The stylized result not only retains the main semantic information the original content graph is meant to express, but also looks more natural; the style transfer of traditional simple texture superposition is avoided, and the resulting image better highlights the subject of the original image.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (2)

1. An image stylization method that preserves focus information, characterized by: the method comprises the following steps:
S1, building a residual neural network as an image conversion network;
s2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss;
s4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information;
the specific steps of calculating the focus loss in step S4 are as follows:
S41, using the ResNet-18 residual neural network as the focus loss network, extracting the weight value of its last Softmax layer;
s42, taking out an original image from the data set, and obtaining a stylized graph obtained after the original image passes through an image conversion network; then, scaling and normalizing the original image and the generated stylized image;
s43, the preprocessed original image and the preprocessed stylized image are respectively transmitted once in the focus loss network, and the classification result of the corresponding image and the activation value of the last convolutional layer are obtained;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the original image and the stylized image;
S45, scaling the initial focus information to the size of the original image and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the original image and the stylized image;
and S46, calculating the difference between the focus positioning maps of the original image and the stylized image to obtain the focus loss.
2. An image stylization method that preserves focus information, as defined by claim 1, wherein: the specific steps of calculating the perceptual loss in step S3 are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
s34, inputting the pictures read in the step S33 into an image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into a perception loss network, and respectively obtaining a content characteristic graph and a style characteristic graph of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
and S38, adding the losses of the layers to obtain the total perception loss.
CN201711292746.4A 2017-12-08 2017-12-08 Image stylization method for keeping focus information Active CN108171649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711292746.4A CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711292746.4A CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Publications (2)

Publication Number Publication Date
CN108171649A CN108171649A (en) 2018-06-15
CN108171649B true CN108171649B (en) 2021-08-17

Family

ID=62525490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711292746.4A Active CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Country Status (1)

Country Link
CN (1) CN108171649B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144641B (en) * 2018-08-14 2021-11-02 四川虹美智能科技有限公司 Method and device for displaying image through refrigerator display screen
CN109345446B (en) * 2018-09-18 2022-12-02 西华大学 Image style transfer algorithm based on dual learning
CN109559363B (en) * 2018-11-23 2023-05-23 杭州网易智企科技有限公司 Image stylization processing method and device, medium and electronic equipment
CN111860823B (en) * 2019-04-30 2024-06-11 北京市商汤科技开发有限公司 Neural network training method, neural network image processing method, neural network training device, neural network image processing equipment and storage medium
TWI730467B (en) * 2019-10-22 2021-06-11 財團法人工業技術研究院 Method of transforming image and network for transforming image
CN111160138A (en) * 2019-12-11 2020-05-15 杭州电子科技大学 Fast face exchange method based on convolutional neural network
WO2022204868A1 (en) * 2021-03-29 2022-10-06 深圳高性能医疗器械国家研究院有限公司 Method for correcting image artifacts on basis of multi-constraint convolutional neural network
CN113469923B (en) * 2021-05-28 2024-05-24 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006009257A1 (en) * 2004-07-23 2006-01-26 Matsushita Electric Industrial Co., Ltd. Image processing device and image processing method
CN105913377A (en) * 2016-03-24 2016-08-31 南京大学 Image splicing method for reserving image correlation information
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107292875A (en) * 2017-06-29 2017-10-24 西安建筑科技大学 A kind of conspicuousness detection method based on global Local Feature Fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090310863A1 (en) * 2008-06-11 2009-12-17 Gallagher Andrew C Finding image capture date of hardcopy medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006009257A1 (en) * 2004-07-23 2006-01-26 Matsushita Electric Industrial Co., Ltd. Image processing device and image processing method
CN105913377A (en) * 2016-03-24 2016-08-31 南京大学 Image splicing method for reserving image correlation information
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107292875A (en) * 2017-06-29 2017-10-24 西安建筑科技大学 A kind of conspicuousness detection method based on global Local Feature Fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Perceptual Losses for Real-Time Style Transfer and Super-Resolution; Justin Johnson et al.; ECCV 2016: Computer Vision – ECCV 2016; 2016-09-17; pp. 694-711 *

Also Published As

Publication number Publication date
CN108171649A (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN108171649B (en) Image stylization method for keeping focus information
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN109344288B (en) Video description combining method based on multi-modal feature combining multi-layer attention mechanism
CN109977942B (en) Scene character recognition method based on scene classification and super-resolution
CN111324774B (en) Video duplicate removal method and device
CN110634170B (en) Photo-level image generation method based on semantic content and rapid image retrieval
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
Li et al. Context-aware semantic inpainting
Liu et al. Effective image super resolution via hierarchical convolutional neural network
CN114418853B (en) Image super-resolution optimization method, medium and equipment based on similar image retrieval
US11803950B2 (en) Universal style transfer using multi-scale feature transform and user controls
CN114549913A (en) Semantic segmentation method and device, computer equipment and storage medium
Tang et al. Attribute-guided sketch generation
Xing et al. Few-shot single-view 3d reconstruction with memory prior contrastive network
Wan et al. Generative adversarial learning for detail-preserving face sketch synthesis
Li et al. High-resolution network for photorealistic style transfer
CN115187456A (en) Text recognition method, device, equipment and medium based on image enhancement processing
CN117576248B (en) Image generation method and device based on gesture guidance
EP4075328A1 (en) Method and device for classifying and searching for a 3d model on basis of deep attention
CN116740069B (en) Surface defect detection method based on multi-scale significant information and bidirectional feature fusion
Ma et al. SwinFG: A fine-grained recognition scheme based on swin transformer
Ueno et al. Continuous and gradual style changes of graphic designs with generative model
CN117315090A (en) Cross-modal style learning-based image generation method and device
CN116469172A (en) Bone behavior recognition video frame extraction method and system under multiple time scales
CN114037644B (en) Artistic word image synthesis system and method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant