CN115330590B - Image style migration method and system - Google Patents

Image style migration method and system

Info

Publication number
CN115330590B
CN115330590B (application CN202211019889.9A)
Authority
CN
China
Prior art keywords
style
feature
layer
image
pyramid
Prior art date
Legal status: Active
Application number
CN202211019889.9A
Other languages
Chinese (zh)
Other versions
CN115330590A (en)
Inventor
刘纯平
石涤波
陈哲恺
季怡
李蓥
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202211019889.9A
Publication of CN115330590A
Application granted
Publication of CN115330590B

Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T 3/04 — Context-preserving transformations, e.g. by using an importance map
    • G06T 11/001 — Texturing; colouring; generation of texture or colour
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses an image style migration method and system, comprising the following steps: S1, inputting a content image and a style image into two encoders respectively, obtaining a content feature and a style feature respectively; S2, inputting the obtained content features and style features into two SPNet pyramid networks respectively for semantic enhancement, obtaining enhanced content features and enhanced style features; S3, inputting the content features and style features before enhancement into a first feature fusion module for feature fusion, and inputting the enhanced content features and style features into a second feature fusion module for feature fusion; S4, adding the feature-fusion result of the first feature fusion module and that of the second feature fusion module, and inputting the sum into a first convolution layer for feature extraction to obtain stylized features; S5, inputting the obtained stylized features into a decoder, and decoding to obtain the style-transferred image. The image style migration method can obtain high-quality stylized images.

Description

Image style migration method and system
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image style migration method and system.
Background
Image style migration takes a content image and a style image and produces a new image: the new image renders the style information of the style image, such as colour and brush strokes, while preserving the content structure information of the content image.
Existing style migration methods based on neural networks fall into two categories: online neural methods based on image optimization and offline neural methods based on model optimization.
In 2015, Gatys et al. proposed neural-network-based style migration using an online neural method based on image optimization. The method starts from a white-noise image and iteratively optimizes it to simultaneously match the content feature representation of the content image and the style feature representation of the style image, finally obtaining the stylized result.
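The image-optimization approach of Gatys et al. represents style via Gram-matrix statistics of feature maps. The following is a minimal illustrative sketch (not part of the patent) of the Gram-matrix style representation; the NumPy functions and feature shapes are hypothetical stand-ins for real network activations:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map: channel-wise correlations.

    `features` has shape (C, H, W); the result (C, C) captures style
    (texture) statistics independent of spatial layout.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g1, g2 = gram_matrix(gen_features), gram_matrix(style_features)
    return float(np.mean((g1 - g2) ** 2))

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16, 16))
assert gram_matrix(f).shape == (8, 8)
assert style_loss(f, f) == 0.0
```

In the full method, this style loss (summed over several network layers) and a content loss are minimized with respect to the pixels of the generated image.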
The offline neural methods based on model optimization can be divided into three types: one model trained for one style, one model trained for multiple styles, and one model trained for arbitrary styles.
Most existing image style migration research trains one model for arbitrary styles. This line of work originates from the adaptive instance normalization method proposed by Huang et al. in 2017, which removed the need to predefine styles when training the generative model and thus realized arbitrary style migration.
The style migration algorithm AdaIN proposed by Huang et al. in 2017 was published in the paper Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization. AdaIN is based on a feed-forward neural network, and the authors propose a normalization method, Adaptive Instance Normalization (AdaIN), to achieve migration of arbitrary styles.
In 2019, Park et al. proposed an efficient novel style-attentional network (SANet) method that balances global and local style patterns while preserving the content structure, so as to synthesize high-quality stylized images. The method comes from Arbitrary Style Transfer with Style-Attentional Networks.
The SANet architecture takes feature maps of the content and style images from a VGG-19 encoder, normalizes them, and maps them into feature spaces in which the attention between the content and style feature maps is computed.
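The attention computation between a content feature map and a style feature map can be sketched as follows. This is an illustrative simplification, not the patented module: the real SANet first applies mean-variance normalization and learned 1×1 convolutions, both omitted here, and the feature shapes are hypothetical:

```python
import numpy as np

def style_attention(f_c, f_s):
    """Attention between content and style feature maps (sketch).

    f_c, f_s: (C, H, W). Each content position attends over all style
    positions; the output is the attention-weighted sum of style features.
    """
    c, h, w = f_c.shape
    q = f_c.reshape(c, h * w)                 # queries from content
    k = f_s.reshape(c, -1)                    # keys/values from style
    logits = q.T @ k / np.sqrt(c)             # (HW_content, HW_style)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over style positions
    return (k @ attn.T).reshape(c, h, w)      # weighted sum of style features

rng = np.random.default_rng(1)
fc, fs = rng.standard_normal((16, 8, 8)), rng.standard_normal((16, 8, 8))
assert style_attention(fc, fs).shape == (16, 8, 8)
```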
In 2019, Li et al. proposed a linear transformation module (LST) in a feed-forward network to achieve high-quality arbitrary style migration; the method comes from Learning Linear Transformations for Fast Image and Video Style Transfer.
However, the stylized results obtained by the above methods have poor visual quality, mainly in two respects: first, the original content structure or outline is distorted; second, deep style semantic information (texture) is not reflected in the result.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a high-quality image style migration method capable of rendering style texture information in the secondary area of the stylized result while maintaining the content structure or outline of the primary area.
In order to solve the above problems, the present invention provides an image style migration method, comprising the steps of:
S1, inputting a content image and a style image into two encoders respectively, obtaining a content feature and a style feature respectively;
S2, inputting the obtained content features and style features into two SPNet pyramid networks respectively for semantic enhancement, obtaining enhanced content features and enhanced style features;
S3, inputting the content features and style features before enhancement into a first feature fusion module for feature fusion, and inputting the enhanced content features and style features into a second feature fusion module for feature fusion;
S4, adding the feature-fusion result of the first feature fusion module and that of the second feature fusion module, and inputting the sum into a first convolution layer for feature extraction to obtain stylized features;
S5, inputting the obtained stylized features into a decoder, and decoding to obtain the style-transferred image.
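The data flow of steps S1-S5 can be sketched as follows. Every function here is a hypothetical placeholder — the actual encoder, SPNet networks, fusion modules, and decoder are learned networks whose weights and exact shapes the patent does not fix — so the sketch only traces how the intermediate results are combined:

```python
import numpy as np

# Placeholder stand-ins for the patent's learned modules; shapes are
# illustrative assumptions (features at 1/8 resolution, 256 channels).
def encode(img):                 # S1: encoder -> feature map
    c, h, w = img.shape
    return np.ones((256, h // 8, w // 8))

def spnet_enhance(feat):         # S2: semantic enhancement (shape-preserving)
    return feat * 1.0

def fuse(f_content, f_style):    # S3: feature fusion module (placeholder)
    return 0.5 * (f_content + f_style)

def conv3x3(feat):               # S4: first convolution layer (placeholder)
    return feat

def decode(feat):                # S5: decoder back to image resolution
    c, h, w = feat.shape
    return np.ones((3, h * 8, w * 8))

content, style = np.ones((3, 256, 256)), np.ones((3, 256, 256))
f_c, f_s = encode(content), encode(style)              # S1
f_c_e, f_s_e = spnet_enhance(f_c), spnet_enhance(f_s)  # S2
fused = fuse(f_c, f_s) + fuse(f_c_e, f_s_e)            # S3 fusion, S4 addition
stylized_feat = conv3x3(fused)                         # S4 feature extraction
result = decode(stylized_feat)                         # S5
assert result.shape == (3, 256, 256)
```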
As a further improvement of the present invention, the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above.
As a further improvement of the invention, the SPNet pyramid network comprises a sub-pixel convolution layer and a second convolution layer; the sub-pixel convolution layer doubles the resolution of each lower-layer feature map in the 5-layer pyramid and adds it to the feature map of the layer above, obtaining a 4-layer pyramid feature map; the second convolution layer then downsamples the topmost feature map of the 4-layer pyramid, adds it to the feature map of the next layer, and repeats this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained.
As a further improvement of the present invention, the first convolution layer is a 3×3 convolution layer and the second convolution layer is a 1×1 convolution layer.
As a further improvement of the present invention, the encoder is a VGG encoder, and the decoder is a VGG decoder.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any one of the methods described above when executing the program.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods described above.
The invention also provides an image style migration system, which comprises:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content characteristics and the style characteristics;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and style features respectively, obtaining enhanced content features and enhanced style features;
the first feature fusion module is used for carrying out feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for carrying out feature fusion on the enhanced content features and style features;
the adding module is used for adding the feature-fusion result of the first feature fusion module (first SAFF module) and that of the second feature fusion module (second SAFF module);
the first convolution layer is used for extracting features after addition to obtain stylized features;
and the decoder is used for receiving the stylized characteristics and decoding to obtain the image after style migration.
As a further improvement of the present invention, the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above.
As a further improvement of the present invention, the SPNet pyramid network includes:
the sub-pixel convolution layer, used for doubling the resolution of each lower-layer feature map in the 5-layer pyramid and adding it to the feature map of the layer above, obtaining a 4-layer pyramid feature map;
and the second convolution layer, used for downsampling the topmost feature map of the 4-layer pyramid, adding it to the feature map of the next layer, and repeating this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained.
The invention has the beneficial effects that:
the image style migration method of the invention renders style texture information on a secondary area (such as a background) of the stylized result, and maintains the content structure or outline of the primary area at the same time, thus obtaining high-quality images after style migration.
The foregoing is merely an overview of the technical solution of the present invention. To make the technical means of the invention clearer and implementable in accordance with the description, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of an image style migration method in a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of an image style migration method in a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of an SPNet pyramid network in a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a sub-pixel convolution layer in a preferred embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments. These embodiments are illustrative rather than limiting, and are provided so that those skilled in the art can better understand and practice the invention.
Example 1
As shown in fig. 1-2, the preferred embodiment of the present invention discloses an image style migration method, which includes the following steps:
S1, inputting a content image and a style image into two encoders respectively, obtaining a content feature and a style feature respectively; optionally, each encoder is a VGG encoder. Specifically, the content image I_C and the style image I_S are input into the two VGG encoders, obtaining the content feature F_C and the style feature F_S.
S2, inputting the obtained content features and style features into two SPNet pyramid networks respectively for semantic enhancement, and obtaining enhanced content features and style features respectively;
S3, inputting the content features and style features before enhancement into a first feature fusion module for feature fusion, and inputting the enhanced content features and style features into a second feature fusion module for feature fusion;
S4, adding the feature-fusion result of the first feature fusion module and that of the second feature fusion module, and inputting the sum into a first convolution layer for feature extraction to obtain stylized features; optionally, the first convolution layer is a 3×3 convolution layer.
S5, inputting the obtained stylized features into a decoder, and decoding to obtain the style-transferred image. Optionally, the decoder is a VGG decoder.
Alternatively, as shown in fig. 3, the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above.
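Under the stated halving/doubling rule, the shapes of the 5-layer pyramid can be computed as follows. The base resolution and channel count used here are hypothetical, since the patent does not fix concrete numbers:

```python
def pyramid_shapes(h, w, c, levels=5):
    """Shapes (channels, height, width) of a feature pyramid in which,
    from top to bottom, each level halves the resolution and doubles
    the channel count of the level above it."""
    return [(c * 2 ** i, h // 2 ** i, w // 2 ** i) for i in range(levels)]

# Hypothetical base shape for illustration.
shapes = pyramid_shapes(256, 256, 64)
assert shapes[0] == (64, 256, 256)    # top level
assert shapes[4] == (1024, 16, 16)    # bottom level
```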
The SPNet pyramid network comprises a sub-pixel convolution layer (sub-pixel convolution) and a second convolution layer. The sub-pixel convolution layer doubles the resolution of each lower-layer feature map in the 5-layer pyramid and adds it to the feature map of the layer above, obtaining a 4-layer pyramid feature map; the second convolution layer then downsamples the topmost feature map of the 4-layer pyramid, adds it to the feature map of the next layer, and repeats this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained. Optionally, the second convolution layer is a 1×1 convolution layer.
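The bottom-up and top-down passes described above can be sketched as follows. This is an illustrative simplification: nearest-neighbour upsampling stands in for the learned sub-pixel convolution, average pooling stands in for the strided second convolution, and a fixed channel count is assumed (the real pyramid doubles channels per level):

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling; stand-in for the learned
    sub-pixel convolution layer (assumption: channel count unchanged)."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(f):
    """2x2 average pooling; stand-in for the strided second (1x1)
    convolution layer used for downsampling."""
    c, h, w = f.shape
    return f.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def spnet(pyramid):
    """pyramid[0] is the top (highest-resolution) level; each lower level
    has half the resolution. Bottom-up: upsample each lower level and add
    it to the level above (5 -> 4 levels). Top-down: repeatedly
    downsample-and-add until reaching the lowest of the 4 resolutions."""
    merged = [pyramid[i] + upsample2x(pyramid[i + 1])
              for i in range(len(pyramid) - 1)]      # 4 merged levels
    out = merged[0]
    for level in merged[1:]:
        out = downsample2x(out) + level              # top-down pass
    return out

# Toy 5-level pyramid with a fixed channel count of 8.
pyr = [np.ones((8, 64 // 2 ** i, 64 // 2 ** i)) for i in range(5)]
out = spnet(pyr)
assert out.shape == (8, 8, 8)   # resolution of the lowest of the 4 levels
```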
As shown in fig. 4, the sub-pixel convolution layer changes the resolution and channel dimension of a feature map as follows: a low-resolution feature map (LR Feature), assumed to be of size w×h×C, is upsampled to 2w×2h by one layer of sub-pixel convolution, which reduces the number of channels by a factor of four, to C/4. A 1×1 convolution then produces a feature map of size 2w×2h×C/2.
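The sub-pixel (pixel-shuffle) rearrangement itself involves no learned weights; the following sketch reproduces the w×h×C → 2w×2h×C/4 shape change described above (the subsequent 1×1 convolution to C/2 is a learned layer and is not shown):

```python
import numpy as np

def pixel_shuffle(f, r=2):
    """Sub-pixel rearrangement: (r*r*C_out, H, W) -> (C_out, r*H, r*W).

    For r=2 the resolution doubles while the channel count shrinks by
    a factor of r**2, matching the w x h x C -> 2w x 2h x C/4 step.
    """
    c, h, w = f.shape
    assert c % (r * r) == 0
    out_c = c // (r * r)
    f = f.reshape(out_c, r, r, h, w)
    f = f.transpose(0, 3, 1, 4, 2)   # (out_c, h, r, w, r)
    return f.reshape(out_c, h * r, w * r)

x = np.arange(16 * 4 * 4, dtype=float).reshape(16, 4, 4)
y = pixel_shuffle(x)
assert y.shape == (4, 8, 8)   # 16 channels -> 4 channels, 4x4 -> 8x8
```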
Example two
The embodiment discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the image style migration method in the first embodiment.
Example III
The present embodiment discloses a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image style migration method described in the above embodiment one.
Example IV
The embodiment discloses an image style migration system, comprising:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content characteristics and the style characteristics;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and style features respectively, obtaining enhanced content features and enhanced style features;
the first feature fusion module is used for carrying out feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for carrying out feature fusion on the enhanced content features and style features;
the adding module is used for adding the feature-fusion result of the first feature fusion module (first SAFF module) and that of the second feature fusion module (second SAFF module);
the first convolution layer is used for extracting features after addition to obtain stylized features;
and the decoder is used for receiving the stylized characteristics and decoding to obtain the image after style migration.
Optionally, the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above.
The SPNet pyramid network includes:
the sub-pixel convolution layer, used for doubling the resolution of each lower-layer feature map in the 5-layer pyramid and adding it to the feature map of the layer above, obtaining a 4-layer pyramid feature map;
and the second convolution layer, used for downsampling the topmost feature map of the 4-layer pyramid, adding it to the feature map of the next layer, and repeating this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained.
The image style migration system of this embodiment implements the foregoing image style migration method. For the specific implementation of each part of the system, reference may be made to the description of the corresponding method embodiment above; since the role of each part corresponds to that of the method, it is not described again here.
The above embodiments are merely preferred embodiments used to fully illustrate the present invention, and the scope of the invention is not limited thereto. Equivalent substitutions and modifications made by those skilled in the art on the basis of the present invention fall within its scope. The protection scope of the invention is defined by the claims.

Claims (8)

1. An image style migration method is characterized by comprising the following steps:
S1, inputting a content image and a style image into two encoders respectively, obtaining a content feature and a style feature respectively;
S2, inputting the obtained content features and style features into two SPNet pyramid networks respectively for semantic enhancement, obtaining enhanced content features and enhanced style features;
S3, inputting the content features and style features before enhancement into a first feature fusion module for feature fusion, and inputting the enhanced content features and style features into a second feature fusion module for feature fusion;
S4, adding the feature-fusion result of the first feature fusion module and that of the second feature fusion module, and inputting the sum into a first convolution layer for feature extraction to obtain stylized features;
S5, inputting the obtained stylized features into a decoder, and decoding to obtain a style-transferred image;
the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above;
the SPNet pyramid network comprises a sub-pixel convolution layer and a second convolution layer, wherein the sub-pixel convolution layer doubles the resolution of each lower-layer feature map in the 5-layer pyramid and adds it to the feature map of the layer above, obtaining a 4-layer pyramid feature map; and the second convolution layer downsamples the topmost feature map of the 4-layer pyramid, adds it to the feature map of the next layer, and repeats this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained.
2. The image style migration method of claim 1, wherein the first convolution layer is a 3 x 3 convolution layer and the second convolution layer is a 1 x 1 convolution layer.
3. The image style migration method of claim 1, wherein the encoder is a VGG encoder and the decoder is a VGG decoder.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1-3 when the program is executed.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-3.
6. An image style migration system, comprising:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content characteristics and the style characteristics;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and style features respectively, obtaining enhanced content features and enhanced style features;
the first feature fusion module is used for carrying out feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for carrying out feature fusion on the enhanced content features and style features;
the adding module is used for adding the feature-fusion result of the first feature fusion module (first SAFF module) and that of the second feature fusion module (second SAFF module);
the first convolution layer is used for extracting features after addition to obtain stylized features;
and the decoder is used for receiving the stylized characteristics and decoding to obtain the image after style migration.
7. The image style migration system of claim 6, wherein the two encoders respectively divide the obtained content features and style features into 5-layer pyramid feature maps, in which, from top to bottom, the resolution of each layer is halved and the number of channels is doubled relative to the layer above.
8. The image style migration system of claim 7, wherein the SPNet pyramid network comprises:
the sub-pixel convolution layer, used for doubling the resolution of each lower-layer feature map in the 5-layer pyramid and adding it to the feature map of the layer above, obtaining a 4-layer pyramid feature map;
and the second convolution layer, used for downsampling the topmost feature map of the 4-layer pyramid, adding it to the feature map of the next layer, and repeating this downsample-and-add process until an enhanced feature map with the same resolution as the lowest layer of the 4-layer pyramid is obtained.
CN202211019889.9A 2022-08-24 2022-08-24 Image style migration method and system Active CN115330590B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211019889.9A CN115330590B (en) 2022-08-24 2022-08-24 Image style migration method and system


Publications (2)

Publication Number Publication Date
CN115330590A CN115330590A (en) 2022-11-11
CN115330590B true CN115330590B (en) 2023-07-11

Family

ID=83926895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211019889.9A Active CN115330590B (en) 2022-08-24 2022-08-24 Image style migration method and system

Country Status (1)

Country Link
CN (1) CN115330590B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819692A (en) * 2021-02-21 2021-05-18 北京工业大学 Real-time arbitrary style migration method based on double attention modules
CN114757819A (en) * 2022-04-18 2022-07-15 大连民族大学 Structure-guided style deviation correction type style migration method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11301999B2 (en) * 2020-05-07 2022-04-12 Tencent America LLC Shape-aware organ segmentation by predicting signed distance maps
CN114332047A (en) * 2021-12-31 2022-04-12 华中科技大学 Construction method and application of surface defect detection model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819692A (en) * 2021-02-21 2021-05-18 北京工业大学 Real-time arbitrary style migration method based on double attention modules
CN114757819A (en) * 2022-04-18 2022-07-15 大连民族大学 Structure-guided style deviation correction type style migration method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant