CN108171649B - Image stylization method for keeping focus information - Google Patents

Image stylization method for keeping focus information

Info

Publication number
CN108171649B
Authority
CN
China
Prior art keywords
image
network
loss
stylized
focus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711292746.4A
Other languages
Chinese (zh)
Other versions
CN108171649A (en)
Inventor
叶武剑
徐佐腾
刘怡俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201711292746.4A priority Critical patent/CN108171649B/en
Publication of CN108171649A publication Critical patent/CN108171649A/en
Application granted
Publication of CN108171649B publication Critical patent/CN108171649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image stylization method for keeping focus information. A "focus position difference" is added to the traditional image stylization method as a penalty term: the sum of the perception loss and the focus loss is used as the total loss, and the Adam algorithm is used to adjust the weights of the image conversion network to obtain an optimized network. When a picture is input into the optimized network, it generates an image that retains the focus information of the original picture and in which the style is blended more naturally. The method not only makes the generated stylized graph retain the main semantic content and the focus information of the original graph, but also avoids the style transfer of traditional simple texture superposition, so that the resulting image better highlights the subject of the original image.

Description

Image stylization method for keeping focus information
Technical Field
The invention relates to the technical field of image processing and deep learning, in particular to an image stylizing method for keeping focus information.
Background
Existing image stylization methods generate an image with a residual neural network, compute a perception loss by comparing the feature maps obtained when the generated image, the original image and the style image pass through a VGG network, and train the residual neural network by back-propagation, so that it generates pictures with the desired specific style and content. To obtain the perception loss, two components need to be calculated: one is the content loss, obtained by comparing the high-level features of the original image and the generated image in the VGG network; the other is the style loss, obtained by comparing the low-level features of the style picture and the generated picture in the VGG network.
For example, document 1 (Johnson J, Alahi A, Li F. Perceptual Losses for Real-Time Style Transfer and Super-Resolution [M]. 2016) discusses an image difference measure called the "perceptual loss". Instead of directly comparing the pixels of two pictures, this method compares the differences between the features the pictures produce when passed through a neural network. It compares the high-dimensional style and texture information of the images as well as their shape and contour information, computes the perceptual loss from these comparisons, and finally trains a neural network that can add a given specific style to any picture.
For example, document 2 (Gatys L A, Ecker A S, Bethge M. A Neural Algorithm of Artistic Style [J]. Computer Science, 2015) discusses a method in which a picture with randomly initialized pixels is continuously modified by gradient descent so as to minimize its loss after passing through a trained neural network, finally yielding an image that fuses a given style and content. In the gradient descent method of document 2, VGG-19 is used as the neural network for calculating the loss; the network is modified so that the information of the target content image and the style image is recorded in one forward propagation, and after each forward propagation of the image being modified as the network input, the difference between that image and the targets is obtained, the loss and gradient are calculated, and the image is updated.
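For illustration, the following Python sketch outlines the pixel-optimization procedure of document 2 (prior art, not the method of the present invention); `perceptual_loss` stands in for the VGG-19 based content and style comparison described above, and the optimizer, step count and learning rate are assumptions.

```python
# Rough sketch of document 2's pixel-optimization style transfer (prior art).
# `perceptual_loss` is assumed to wrap the VGG-19 content/style comparison.
import torch

def stylize_by_pixel_optimization(content_img, style_img, perceptual_loss,
                                  steps=300, lr=1e-2):
    # Start from randomly initialized pixels and refine them by gradient descent.
    generated = torch.randn_like(content_img, requires_grad=True)
    optimizer = torch.optim.Adam([generated], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = perceptual_loss(generated, content_img, style_img)
        loss.backward()          # one backward pass per update, hundreds in total
        optimizer.step()
    return generated.detach()
```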
In the method of document 1, a residual neural network is obtained by training with the perceptual loss, and the network is bound to one specific image style. An image that needs style conversion is input into the network, and a stylized version of the image is obtained after a single forward propagation. However, the stylized picture obtained in this way is stylized as a whole, without differentiation or emphasis, much as if the texture information of the target style picture were simply superimposed onto the picture to be converted. The stylization effect is therefore rather ordinary.
The image stylization process of document 2 acts directly on the target image. Generating one stylized image generally requires several hundred forward and backward propagation passes, during which an image with randomly initialized pixels is continuously modified by gradient descent until it approaches the expected result. This method has the same disadvantage as document 1: the stylization is applied to the whole image, without differentiation or emphasis. Moreover, because each stylized image requires many forward and backward propagations, generating a stylized image takes a long time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a stylization method that keeps the focus information of an image, so that the stylized image does not lose the information the original image is intended to express, and retains its focus and emphasis.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the method comprises the following steps:
S1, building a residual neural network as an image conversion network;
the residual neural network has 12 layers, including 5 residual blocks, each containing two convolutional layers with 3 × 3 convolution kernels; a minimal sketch of such a network is given after step S7 below. The network has strong expressive capability and can record the information of the target style image. Only one style is designated in each training run, and a large number of images with different contents are then input into the network to obtain stylized images, so that the image conversion network is trained into a network that records one target style and can stylize any content image.
S2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss;
the loss is composed of two parts, one part represents the difference between the content contour of the generated image and the content contour of the original image, and is called content loss; part of the representation of the generated image and the target style sheet in the tone texture difference, called style loss;
S4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss;
the focus loss network used in this step is a pre-trained residual neural network with an 18-layer structure, whose number of layers differs from that of the image conversion network; the weight of the last Softmax layer of this network is a 1000 × 512 matrix, so a 512-dimensional vector is obtained for each classification result; a matrix multiplication of this vector with the activation value of the last convolutional layer of the network reveals the part of the image to which the network implicitly pays special attention;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
and S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information.
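As a minimal illustration of the image conversion network of step S1, the following PyTorch sketch builds five residual blocks of two 3 × 3 convolutions each; the channel width and the input/output convolutions are assumptions, since the text above fixes only the block count and kernel size.

```python
# Minimal sketch of the image conversion network of step S1 (assumptions noted above).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))  # identity shortcut

class TransformNet(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, channels, kernel_size=3, padding=1),
                                    nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(5)])
        self.decode = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.decode(self.blocks(self.encode(x)))
```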
Further, the specific steps of calculating the perceptual loss in step S3 are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, firstly, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
S34, inputting the picture read in step S33 into the image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into the perception loss network to respectively obtain the content characteristic diagram and the style characteristic diagram of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
and S38, adding the losses of the layers to obtain the total perception loss.
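The following sketch illustrates how the perception loss of steps S35 to S38 could be computed, assuming the VGG feature maps (relu1_2, relu2_2, relu3_3, relu4_3 for style; relu3_3 for content) have already been extracted; the Gram-matrix form of the style term follows document 1 and is an assumption, since the text above only specifies a mean square error between style feature maps.

```python
# Sketch of the perception loss of steps S35-S38 (Gram-matrix style term is an assumption).
import torch

def content_loss(feat_gen, feat_target):
    # Step S35/S37: MSE between content feature maps, normalized by C*H*W.
    c, h, w = feat_gen.shape[-3:]
    return torch.sum((feat_gen - feat_target) ** 2) / (c * h * w)

def gram_matrix(feat):
    # C x C channel-correlation matrix summarizing texture statistics.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(feats_gen, feats_target):
    # Step S36: style difference summed over the selected layers.
    return sum(torch.sum((gram_matrix(fg) - gram_matrix(ft)) ** 2)
               for fg, ft in zip(feats_gen, feats_target))

def perception_loss(content_pair, style_pairs, style_weight=1.0):
    # Step S38: add the content part and the per-layer style parts.
    cg, ct = content_pair                      # generated / target relu3_3 features
    gen_feats, tgt_feats = zip(*style_pairs)   # per-layer (generated, target) pairs
    return content_loss(cg, ct) + style_weight * style_loss(gen_feats, tgt_feats)
```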
Further, the specific steps of calculating the focus loss in step S4 are as follows:
s41, extracting the weight value of the last Softmax layer of the focus loss network;
s42, taking out a content graph from the data set and obtaining a stylized graph obtained after the content graph passes through an image conversion network; then, scaling and normalizing the content graph and the generated stylized graph;
s43, respectively propagating the preprocessed content graph and the preprocessed stylized graph in the focus loss network once in the forward direction to obtain a classification result of the corresponding picture and an activation value of the last convolutional layer;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the content graph and the stylized graph;
S45, scaling the initial focus information to the size of the content graph and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the content graph and the stylized graph;
S46, calculating the difference between the focus positioning maps of the content graph and the stylized graph to obtain the focus loss, wherein the calculation formula of the focus loss is as follows:
$$\ell_{focus}(x,\hat{y})=\sqrt{\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F(x)_{i,j}-F(\hat{y})_{i,j}\right)^{2}}$$
where $F(x)$ and $F(\hat{y})$ are the H × W focus positioning maps of the content graph $x$ and the stylized graph $\hat{y}$ obtained in step S45.
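The following sketch illustrates steps S41 to S46 using a class-activation-map style computation on an 18-layer residual network; the use of torchvision's `resnet18`, the pretrained-weights argument, the interpolation mode and the normalization details are assumptions for illustration (a batch of one image is assumed).

```python
# Sketch of the focus-loss computation of steps S41-S46 (details are assumptions).
import torch
import torch.nn.functional as F
import torchvision.models as models

# The `pretrained` argument name may differ across torchvision versions.
resnet = models.resnet18(pretrained=True).eval()
fc_weight = resnet.fc.weight.data              # 1000 x 512 Softmax-layer weights (S41)

def focus_map(img, out_size):
    # Forward pass up to the last convolutional stage (S43).
    x = resnet.conv1(img); x = resnet.bn1(x); x = resnet.relu(x); x = resnet.maxpool(x)
    x = resnet.layer1(x); x = resnet.layer2(x); x = resnet.layer3(x)
    conv_feat = resnet.layer4(x)                         # B x 512 x h x w activations
    logits = resnet.fc(torch.flatten(resnet.avgpool(conv_feat), 1))
    cls = logits.argmax(dim=1)                           # classification result (S43)
    w = fc_weight[cls]                                   # class-specific 512-d vector (S44)
    cam = torch.einsum('bc,bchw->bhw', w, conv_feat)     # matrix product -> initial focus info
    cam = F.interpolate(cam.unsqueeze(1), size=out_size, mode='bilinear',
                        align_corners=False).squeeze(1)  # scale to content size (S45)
    cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8) * 256.0  # normalize to 0-256
    return cam

def focus_loss(content_img, stylized_img):
    # Step S46: root mean square error between the two focus positioning maps.
    size = content_img.shape[-2:]
    return torch.sqrt(torch.mean((focus_map(content_img, size)
                                  - focus_map(stylized_img, size)) ** 2))
```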
compared with the prior art, the principle of the scheme is as follows:
In the traditional image stylization method, only the perception loss is calculated, and a residual neural network (the image conversion network) is then trained through back-propagation, so that the image conversion network generates pictures with the required specific style and content. In this scheme, the focus position difference is added to the traditional image stylization method as a penalty term: the sum of the perception loss and the focus loss is used as the total loss, and the weights of the image conversion network are adjusted with the Adam algorithm to obtain an optimized network. When a picture is input into the optimized network, the generated stylized image keeps the focus information of the original picture unchanged and is not a simple superposition of textures.
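A sketch of the corresponding training loop (steps S2 to S6) is given below; the learning rate, the weighting between the two losses and the stopping criterion are illustrative assumptions, and `compute_perceptual_loss` and `focus_loss` are assumed to wrap the computations from the earlier sketches.

```python
# Sketch of the training loop of steps S2-S6 (hyperparameters are assumptions).
import torch

def train(transform_net, loader, compute_perceptual_loss, focus_loss, style_img,
          epochs=2, lr=1e-3, focus_weight=1.0, max_iters=40000):
    optimizer = torch.optim.Adam(transform_net.parameters(), lr=lr)
    it = 0
    for _ in range(epochs):
        for content_img in loader:                        # S6: next training image
            stylized = transform_net(content_img)          # S2: forward pass
            loss = (compute_perceptual_loss(stylized, content_img, style_img)   # S3
                    + focus_weight * focus_loss(content_img, stylized))          # S4
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                # S5: Adam weight update
            it += 1
            if it >= max_iters:                             # S6: stop at max iterations
                return transform_net
    return transform_net
```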
Compared with the prior art, the scheme has the following two advantages:
1. The generated stylized graph still retains the main semantic content of the original graph and keeps the focus information of the image.
2. The style transfer of traditional simple texture superposition is avoided, and the resulting image better highlights the subject of the original image.
Drawings
FIG. 1 is a block diagram of a method for stylizing an image that maintains focus information in accordance with the present invention;
FIG. 2 is a schematic diagram of the perception loss network according to the present invention;
FIG. 3 is a comparison of the original image, the stylized image obtained by the method proposed by Gatys et al. in document 2, and the focus positioning image of the stylized image obtained by the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples:
Referring to FIG. 1, in the image stylizing method for maintaining focus information according to this embodiment, X is the image to be stylized, which changes in each training iteration and also serves as the current target content image; Xs is the given, desired target style image; Y is the image produced by the current image conversion network, which integrates the content of X with the style of Xs while keeping the focus information of X unchanged;
the method comprises the following specific steps:
S1, building a residual neural network as an image conversion network;
s2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss; the specific steps for calculating the perceptual loss are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, firstly, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
S34, inputting the picture read in step S33 into the image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into the perception loss network to respectively obtain the content characteristic diagram and the style characteristic diagram of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
S38, adding the losses of the layers to obtain the total perception loss;
s4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss; the specific steps for calculating the focal loss are as follows:
S41, using the ResNet-18 residual neural network as the focus loss network, extracting the weight value of its last Softmax layer;
s42, taking out a content graph from the data set and obtaining a stylized graph obtained after the content graph passes through an image conversion network; then, scaling and normalizing the content graph and the generated stylized graph;
s43, respectively propagating the preprocessed content graph and the preprocessed stylized graph in the focus loss network once in the forward direction to obtain a classification result of the corresponding picture and an activation value of the last convolutional layer;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the content graph and the stylized graph;
S45, scaling the initial focus information to the size of the content graph and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the content graph and the stylized graph;
S46, calculating the difference between the focus positioning maps of the content graph and the stylized graph to obtain the focus loss, wherein the calculation formula of the focus loss is as follows:
$$\ell_{focus}(x,\hat{y})=\sqrt{\frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F(x)_{i,j}-F(\hat{y})_{i,j}\right)^{2}}$$
where $F(x)$ and $F(\hat{y})$ are the H × W focus positioning maps of the content graph $x$ and the stylized graph $\hat{y}$ obtained in step S45;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
and S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information.
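Once trained, step S7 reduces to a single forward pass; a hypothetical usage example follows (the file names and the `TransformNet` instance are assumptions carried over from the earlier sketches).

```python
# Hypothetical inference for step S7 (file names and `transform_net` are assumptions).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

transform_net.eval()
img = transforms.ToTensor()(Image.open("input.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    stylized = transform_net(img)            # single forward pass through the optimized network
save_image(stylized.clamp(0, 1), "stylized_with_focus.jpg")
```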
This method solves the problems of traditional image stylization, in which the style transfer is simple and rigid, the style cannot be well fused with the original image content, and the focus information of the original content image is shifted or lost. The stylized result not only retains the main semantic information the original content graph is meant to express, but also looks more natural; the style transfer of traditional simple texture superposition is avoided, and the resulting image better highlights the subject of the original image.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (2)

1. An image stylization method that preserves focus information, characterized by: the method comprises the following steps:
S1, building a residual neural network as an image conversion network;
s2, sending the image to be processed into an image conversion network to obtain a stylized image;
s3, using a VGG network as a perception loss network, firstly inputting a target style image into the network, capturing target style information, then respectively sending the image to be processed and the generated stylized image into the network, and calculating to obtain perception loss;
s4, respectively sending the generated stylized image and the original image into a focus loss network, calculating a matrix product, and solving the root mean square error of the stylized image and the original image as the focus loss;
s5, taking the sum of the perception loss and the focus loss as the total loss, and adjusting the weight of the image conversion network by using an Adam algorithm;
S6, taking the next image from the training set and inputting it into the adjusted image conversion network, and repeating steps S2 to S5 until the maximum number of iterations is reached, to obtain an optimized network;
S7, inputting the picture to be stylized into the optimized network to obtain a stylized image that keeps the focus information;
the specific steps of calculating the focus loss in step S4 are as follows:
S41, using the ResNet-18 residual neural network as the focus loss network, extracting the weight value of its last Softmax layer;
s42, taking out an original image from the data set, and obtaining a stylized graph obtained after the original image passes through an image conversion network; then, scaling and normalizing the original image and the generated stylized image;
s43, the preprocessed original image and the preprocessed stylized image are respectively transmitted once in the focus loss network, and the classification result of the corresponding image and the activation value of the last convolutional layer are obtained;
s44, extracting corresponding vectors from the weight data in the step S41 according to the index values of the classification results, and performing matrix multiplication operation on the vectors and the activation values in the step S43 to obtain initial focus information corresponding to the original image and the stylized image;
S45, scaling the initial focus information to the size of the original image and normalizing its values to between 0 and 256, obtaining focus positioning maps corresponding to the original image and the stylized image;
and S46, calculating the difference between the focus positioning maps of the original image and the stylized image to obtain the focus loss.
2. An image stylization method that preserves focus information, as defined by claim 1, wherein: the specific steps of calculating the perceptual loss in step S3 are as follows:
s31, selecting feature maps of the four layers of relu1_2, relu2_2, relu3_3 and relu4_3 of the perception loss network as style feature maps, and selecting feature maps of a relu3_3 layer as content feature maps;
s32, the target style image is transmitted once in the perception loss network, and style characteristic diagrams of all layers are captured and stored to be used as target style characteristic diagrams in the training process;
s33, reading a picture from the data set, inputting the picture as a target content graph to the perception loss network, capturing and storing the content characteristic graph as a target content characteristic graph of the training;
s34, inputting the pictures read in the step S33 into an image conversion network to obtain a generated stylized graph; inputting the generated stylized graph into a perception loss network, and respectively obtaining a content characteristic graph and a style characteristic graph of the generated stylized graph;
S35, calculating the mean square error between the content characteristic diagram of the stylized graph in step S34 and the target content characteristic diagram in step S33 as the content loss part of the perception loss;
S36, calculating the mean square error between the style characteristic diagram of the stylized graph in step S34 and the target style characteristic diagram in step S32 as the style loss part of the perception loss;
s37, assuming that the size of the feature map of the j-th layer is C × H × W, the perceptual loss calculation formula is as follows:
$$\ell^{\phi,j}(\hat{y},y)=\frac{1}{C\,H\,W}\left\lVert\phi_{j}(\hat{y})-\phi_{j}(y)\right\rVert_{2}^{2}$$
where $\phi_{j}(\cdot)$ denotes the feature map of the j-th layer of the perception loss network, $y$ is the target image and $\hat{y}$ is the generated stylized graph;
and S38, adding the losses of the layers to obtain the total perception loss.
CN201711292746.4A 2017-12-08 2017-12-08 Image stylization method for keeping focus information Active CN108171649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711292746.4A CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711292746.4A CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Publications (2)

Publication Number Publication Date
CN108171649A CN108171649A (en) 2018-06-15
CN108171649B true CN108171649B (en) 2021-08-17

Family

ID=62525490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711292746.4A Active CN108171649B (en) 2017-12-08 2017-12-08 Image stylization method for keeping focus information

Country Status (1)

Country Link
CN (1) CN108171649B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144641B (en) * 2018-08-14 2021-11-02 四川虹美智能科技有限公司 Method and device for displaying image through refrigerator display screen
CN109345446B (en) * 2018-09-18 2022-12-02 西华大学 Image style transfer algorithm based on dual learning
CN109559363B (en) * 2018-11-23 2023-05-23 杭州网易智企科技有限公司 Image stylization processing method and device, medium and electronic equipment
CN111860823B (en) * 2019-04-30 2024-06-11 北京市商汤科技开发有限公司 Neural network training method, neural network image processing method, neural network training device, neural network image processing equipment and storage medium
TWI730467B (en) * 2019-10-22 2021-06-11 財團法人工業技術研究院 Method of transforming image and network for transforming image
CN111160138A (en) * 2019-12-11 2020-05-15 杭州电子科技大学 Fast face exchange method based on convolutional neural network
WO2022204868A1 (en) * 2021-03-29 2022-10-06 深圳高性能医疗器械国家研究院有限公司 Method for correcting image artifacts on basis of multi-constraint convolutional neural network
CN113469923B (en) * 2021-05-28 2024-05-24 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006009257A1 (en) * 2004-07-23 2006-01-26 Matsushita Electric Industrial Co., Ltd. Image processing device and image processing method
CN105913377A (en) * 2016-03-24 2016-08-31 南京大学 Image splicing method for reserving image correlation information
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107292875A (en) * 2017-06-29 2017-10-24 西安建筑科技大学 A kind of conspicuousness detection method based on global Local Feature Fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090310863A1 (en) * 2008-06-11 2009-12-17 Gallagher Andrew C Finding image capture date of hardcopy medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006009257A1 (en) * 2004-07-23 2006-01-26 Matsushita Electric Industrial Co., Ltd. Image processing device and image processing method
CN105913377A (en) * 2016-03-24 2016-08-31 南京大学 Image splicing method for reserving image correlation information
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107292875A (en) * 2017-06-29 2017-10-24 西安建筑科技大学 A kind of conspicuousness detection method based on global Local Feature Fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Perceptual Losses for Real-Time Style Transfer and Super-Resolution; Justin Johnson et al.; ECCV 2016: Computer Vision – ECCV 2016; 2016-09-17; pp. 694-711 *

Also Published As

Publication number Publication date
CN108171649A (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN108171649B (en) Image stylization method for keeping focus information
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN109344288B (en) Video description combining method based on multi-modal feature combining multi-layer attention mechanism
CN109977942B (en) Scene character recognition method based on scene classification and super-resolution
CN111324774B (en) Video duplicate removal method and device
CN110634170B (en) Photo-level image generation method based on semantic content and rapid image retrieval
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
Li et al. Context-aware semantic inpainting
Liu et al. Effective image super resolution via hierarchical convolutional neural network
CN114418853B (en) Image super-resolution optimization method, medium and equipment based on similar image retrieval
US11803950B2 (en) Universal style transfer using multi-scale feature transform and user controls
CN114549913A (en) Semantic segmentation method and device, computer equipment and storage medium
Tang et al. Attribute-guided sketch generation
Xing et al. Few-shot single-view 3d reconstruction with memory prior contrastive network
Wan et al. Generative adversarial learning for detail-preserving face sketch synthesis
Li et al. High-resolution network for photorealistic style transfer
CN115187456A (en) Text recognition method, device, equipment and medium based on image enhancement processing
CN117576248B (en) Image generation method and device based on gesture guidance
EP4075328A1 (en) Method and device for classifying and searching for a 3d model on basis of deep attention
CN116740069B (en) Surface defect detection method based on multi-scale significant information and bidirectional feature fusion
Ma et al. SwinFG: A fine-grained recognition scheme based on swin transformer
Ueno et al. Continuous and gradual style changes of graphic designs with generative model
CN117315090A (en) Cross-modal style learning-based image generation method and device
CN116469172A (en) Bone behavior recognition video frame extraction method and system under multiple time scales
CN114037644B (en) Artistic word image synthesis system and method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant