CN115330590A - Image style migration method and system - Google Patents

Image style migration method and system

Info

Publication number
CN115330590A
Authority
CN
China
Prior art keywords: style, features, feature, image, pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211019889.9A
Other languages
Chinese (zh)
Other versions
CN115330590B (en)
Inventor
刘纯平
石涤波
陈哲恺
季怡
李蓥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202211019889.9A
Publication of CN115330590A
Application granted
Publication of CN115330590B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/001: Texturing; Colouring; Generation of texture or colour
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Arrangements using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS AND CROSS-SECTIONAL TECHNOLOGIES
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses an image style migration method and system, comprising the following steps: S1, inputting a content image and a style image into two encoders respectively, obtaining content features and style features respectively; S2, inputting the obtained content features and style features into two SPNet pyramid networks respectively for semantic enhancement, obtaining enhanced content features and enhanced style features respectively; S3, inputting the pre-enhancement content features and style features into a first feature fusion module for feature fusion, and inputting the enhanced content features and style features into a second feature fusion module for feature fusion; S4, adding the feature fusion results of the first and second feature fusion modules and inputting the sum into a first convolution layer for feature extraction, obtaining stylized features; and S5, inputting the stylized features into a decoder and decoding them to obtain the style-transferred image. The image style migration method can obtain high-quality stylized images.

Description

Image style migration method and system
Technical Field
The invention relates to the technical field of image processing, in particular to an image style migration method and system.
Background
Image style migration takes a content image and a style image and produces a new image that renders the style information of the style image, such as colours and brush strokes, while preserving the content structure information of the content image.
Existing neural-network-based style migration methods can be divided into two categories: online neural methods based on image optimization and offline neural methods based on model optimization.
Gatys et al. proposed the seminal work on neural-network style migration in 2015, adopting an online neural method based on image optimization. The method takes a white-noise image as input and iteratively matches it to the content feature representation of the content image and the style feature representation of the style image, finally obtaining a stylized result.
The offline neural methods based on model optimization can be divided into three types: one model per style, one model for multiple styles, and one model for arbitrary styles.
Most existing image style migration research adopts one model for arbitrary styles. This line of work originates from Huang et al., who proposed adaptive instance normalization in 2017, removing the need to predefine styles when training the generative model and thereby realizing arbitrary style migration.
The style migration algorithm AdaIN proposed by Huang et al. in 2017 comes from the paper "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization". AdaIN is based on a feed-forward neural network; the authors propose an adaptive instance normalization (AdaIN) layer, thereby realizing migration of arbitrary styles.
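The AdaIN operation described above can be sketched in a few lines. The following is a minimal NumPy illustration of the published formula, AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), not code from the patent:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: re-scale the content feature map
    so that each channel's mean/std matches the style feature map.
    content, style: arrays of shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)  # zero-mean, unit-std per channel
    return s_std * normalized + s_mean               # adopt style statistics
```

Because AdaIN only swaps per-channel statistics, it needs no learned parameters, which is what makes a single trained decoder usable for arbitrary styles.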
In 2019, Park et al. proposed an efficient style-attentional network (SANet) that preserves the content structure and synthesizes high-quality stylized images while balancing global and local style patterns. The method is described in "Arbitrary Style Transfer with Style-Attentional Networks".
The SANet architecture takes the content and style feature maps produced by a VGG-19 encoder, normalizes them, and maps them into a feature space in which the degree of attention between the content and style feature maps is computed.
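The attention computation between content and style feature maps can be sketched roughly as follows. This is a hedged NumPy illustration of the general style-attention idea (mean-variance normalization, projection, softmax attention); the matrices `Wf`, `Wg`, `Wh` are hypothetical stand-ins for the learned 1x1-convolution projections, not the paper's trained weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def style_attention(Fc, Fs, Wf, Wg, Wh):
    """Rough sketch of style attention.
    Fc, Fs: content/style feature maps flattened to shape (C, N) with N = H*W.
    Wf, Wg, Wh: (C, C) stand-ins for learned 1x1-conv projections."""
    def norm(F):  # mean-variance normalization per channel
        return (F - F.mean(axis=1, keepdims=True)) / (F.std(axis=1, keepdims=True) + 1e-5)
    f = Wf @ norm(Fc)                   # projected normalized content, (C, Nc)
    g = Wg @ norm(Fs)                   # projected normalized style, (C, Ns)
    attn = softmax(f.T @ g, axis=-1)    # (Nc, Ns): each content position attends over style positions
    return (Wh @ Fs) @ attn.T           # (C, Nc): style features re-assembled per content position
```

Each row of `attn` is a distribution over style positions, so each content location receives a weighted mixture of style features.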
In 2019, Li et al. proposed a linear transformation module (LST) to realize high-quality arbitrary style transfer, from the paper "Learning Linear Transformations for Fast Image and Video Style Transfer".
However, the visual quality of the stylized results obtained by the above methods is poor, which mainly manifests in two aspects: first, the original content structure or contours are distorted; second, deep style semantic information (texture) is not reflected in the result.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a high-quality image style migration method that renders style texture information in the secondary areas of the stylized result while maintaining the content structure or contours of the primary areas.
In order to solve the above problem, the present invention provides an image style migration method, including the steps of:
S1, inputting a content image and a style image into two encoders respectively, and obtaining content features and style features respectively;
s2, respectively inputting the obtained content features and the obtained style features into two SPNet pyramid networks for semantic enhancement, and respectively obtaining the enhanced content features and the enhanced style features;
s3, inputting the content features and the style features before enhancement into a first feature fusion module for feature fusion, and inputting the content features and the style features after enhancement into a second feature fusion module for feature fusion;
s4, adding the result of the feature fusion of the first feature fusion module and the result of the feature fusion of the second feature fusion module, and inputting the result into a first convolution layer for feature extraction to obtain stylized features;
and S5, inputting the obtained stylized features into a decoder, and decoding to obtain the image with the style transferred.
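The five steps above compose into a simple pipeline. The sketch below shows only the data flow; every callable (`encode`, `spnet`, `fuse1`, `fuse2`, `conv3x3`, `decode`) is a hypothetical placeholder, since the patent does not publish the modules' implementations:

```python
def style_migration(Ic, Is, encode, spnet, fuse1, fuse2, conv3x3, decode):
    """Data flow of steps S1-S5; all callables are placeholder assumptions."""
    Fc, Fs = encode(Ic), encode(Is)            # S1: two encoders extract content/style features
    Fc_e, Fs_e = spnet(Fc), spnet(Fs)          # S2: SPNet semantic enhancement
    fused_raw = fuse1(Fc, Fs)                  # S3: fuse pre-enhancement features
    fused_enh = fuse2(Fc_e, Fs_e)              # S3: fuse enhanced features
    stylized = conv3x3(fused_raw + fused_enh)  # S4: add and extract stylized features
    return decode(stylized)                    # S5: decode to the stylized image
```

The two parallel fusion branches (raw and enhanced) added in S4 are what distinguish this design from a single-branch encoder-fusion-decoder pipeline.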
As a further improvement of the present invention, each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
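The shape bookkeeping of such a 5-layer pyramid can be made concrete with a tiny helper. The starting size 256x256x64 is an illustrative assumption; the patent only states that resolution halves and channel count doubles per layer:

```python
def pyramid_shapes(h, w, c, levels=5):
    """(H, W, C) per level, top (highest resolution) to bottom:
    resolution halves and channel count doubles at each level."""
    return [(h >> i, w >> i, c << i) for i in range(levels)]

# pyramid_shapes(256, 256, 64) yields
# [(256, 256, 64), (128, 128, 128), (64, 64, 256), (32, 32, 512), (16, 16, 1024)]
```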
As a further improvement of the invention, the SPNet pyramid network comprises a sub-pixel convolution layer and a second convolution layer. The sub-pixel convolution layer upsamples the feature map of each lower layer in the 5-layer pyramid by a factor of two and adds it to the feature map of the layer above, yielding a 4-layer pyramid of feature maps. The second convolution layer then down-samples the topmost feature map of the 4-layer pyramid and adds the feature map of the next layer, repeating the down-sampling and addition until an enhanced feature map is obtained whose resolution matches that of the bottommost feature map of the 4-layer pyramid.
As a further improvement of the present invention, the first convolutional layer is a 3 × 3 convolutional layer, and the second convolutional layer is a 1 × 1 convolutional layer.
As a further improvement of the invention, the encoder is a VGG encoder and the decoder is a VGG decoder.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any one of the above methods when executing the program.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of any of the methods described above.
The invention also provides an image style migration system, which comprises:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content features and the style features;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and the obtained style features respectively and obtaining the enhanced content features and the enhanced style features respectively;
the first feature fusion module is used for performing feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for performing feature fusion on the enhanced content features and the enhanced style features;
the adding module is used for adding the feature fusion result of the first feature fusion (SAFF) module and that of the second feature fusion (SAFF) module;
the first convolution layer is used for carrying out feature extraction after addition to obtain stylized features;
and the decoder is used for receiving the stylized features and decoding the stylized features to obtain the image with the style transferred.
As a further improvement of the present invention, each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
As a further improvement of the present invention, the SPNet pyramid network comprises:
the sub-pixel convolution layer is used for upsampling each lower-layer feature map in the 5-layer pyramid by a factor of two and adding it to the feature map of the layer above, obtaining a 4-layer pyramid of feature maps;
and the second convolution layer is used for performing down-sampling on the feature map of the uppermost layer in the 4 layers of pyramid feature maps, adding the feature map of the next layer, continuing the down-sampling, and adding the feature map of the next layer until obtaining an enhanced feature map with the resolution consistent with the resolution of the feature map of the lowermost layer in the 4 layers of pyramid feature maps.
The invention has the beneficial effects that:
the image style migration method of the invention renders the style texture information in the secondary area (such as background) of the stylized result, and maintains the content structure or outline of the main area, thereby obtaining the high-quality image after style migration.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer and implementable in accordance with this description, and to make the above and other objects, features and advantages of the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of an image style migration method in a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of an image style migration method in a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of an SPNet pyramid network in a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a subpixel convolution layer in a preferred embodiment of the present invention;
FIG. 5 is a visualization of a feature map before and after an SPNet pyramid network in a preferred embodiment of the present invention;
FIG. 6 is a stylized image resulting from an image style migration methodology in a preferred embodiment of the present invention;
FIG. 7 is a comparison of a stylized image resulting from an image style migration methodology in accordance with a preferred embodiment of the present invention with a prior art methodology.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Example one
As shown in fig. 1-2, a preferred embodiment of the present invention discloses an image style migration method, which comprises the following steps:
S1, inputting the content image and the style image into two encoders respectively, and obtaining the content features and the style features respectively; optionally, the encoder is a VGG encoder. Specifically, the content image I_C and the style image I_S are input into two VGG encoders to obtain the content features F_C and the style features F_S, respectively.
S2, respectively inputting the obtained content features and the obtained style features into two SPNet pyramid networks for semantic enhancement, and respectively obtaining the enhanced content features and the enhanced style features;
s3, inputting the content features and the style features before enhancement into a first feature fusion module for feature fusion, and inputting the content features and the style features after enhancement into a second feature fusion module for feature fusion;
s4, adding the result obtained after feature fusion of the first feature fusion module and the result obtained after feature fusion of the second feature fusion module, and inputting the result into the first convolution layer for feature extraction to obtain stylized features; optionally, the first convolutional layer is a 3 × 3 convolutional layer.
And S5, inputting the obtained stylized features into a decoder, and decoding to obtain the image with the style transferred. Optionally, the decoder is a VGG decoder.
Optionally, as shown in fig. 3, each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
The SPNet pyramid network comprises a sub-pixel convolution layer (sub-pixel convolution) and a second convolution layer. The sub-pixel convolution layer upsamples the feature map of each lower layer in the 5-layer pyramid by a factor of two and adds it to the feature map of the layer above, yielding a 4-layer pyramid of feature maps. The second convolution layer then down-samples the topmost feature map of the 4-layer pyramid and adds the feature map of the next layer, repeating the down-sampling and addition until an enhanced feature map is obtained whose resolution matches that of the bottommost feature map of the 4-layer pyramid. Optionally, the second convolutional layer is a 1 × 1 convolutional layer.
As shown in fig. 4, the resolution and channel changes of a feature map as it passes through the sub-pixel convolution layer are as follows. The input low-resolution feature map (LR Feature) is assumed to be of size w × h × C; sub-pixel convolution enlarges it to 2w × 2h while reducing the number of channels by a factor of four (to C/4). Finally, a 1 × 1 convolution produces a feature map of size 2w × 2h × C/2.
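The resolution/channel trade described here is the standard pixel-shuffle rearrangement underlying sub-pixel convolution. A NumPy sketch (assuming channel-first layout, an assumption not stated in the patent) reproduces the w × h × C to 2w × 2h × C/4 step; the subsequent 1 × 1 convolution that maps C/4 channels to C/2 is a separate learned layer and is not shown:

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Rearrange (C, H, W) -> (C / r^2, H * r, W * r), as in sub-pixel convolution."""
    c, h, w = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r^2"
    x = x.reshape(c // (r * r), r, r, h, w)  # split channels into r x r sub-pixel groups
    x = x.transpose(0, 3, 1, 4, 2)           # interleave: (C', H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)
```

Every output pixel is copied from exactly one input channel, so the operation is lossless and parameter-free; the learning happens in the convolution that precedes it.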
As shown in fig. 5, column 2 shows the feature map visualization before SPNet, and column 3 shows the enhanced feature map visualization after SPNet. It can be clearly seen that, after SPNet, the feature map contains more fine-grained semantic information.
As shown in fig. 6, column 2 is a visualization of the 5th-layer feature map in SPNet (which visibly contains important information). Column 3 is the stylized result of an SPNet without the 5th-layer feature map, and column 4 is the stylized result of the complete SPNet; the comparison shows that in column 3 the girl's nose is distorted by the style texture.
As shown in fig. 7, the visual quality of the stylized results obtained by the prior art (columns 3, 4, 5 and 6) is poor, mainly in two respects: first, the original content structure or contours are distorted; second, deep style semantic information (texture) is not reflected in the result. The last column shows the stylized result obtained by the SPNet method, whose quality is clearly better.
Example two
The embodiment discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the steps of the image style migration method in the first embodiment.
EXAMPLE III
The present embodiment discloses a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the image style migration method in the first embodiment.
Example four
The embodiment discloses an image style migration system, which comprises:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content features and the style features;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and the obtained style features respectively and obtaining the enhanced content features and the enhanced style features respectively;
the first feature fusion module is used for performing feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for performing feature fusion on the enhanced content features and the enhanced style features;
the adding module is used for adding the feature fusion result of the first feature fusion (SAFF) module and that of the second feature fusion (SAFF) module;
the first convolution layer is used for carrying out feature extraction after addition to obtain stylized features;
and the decoder is used for receiving the stylized features and decoding them to obtain the style-transferred image.
Optionally, each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
The SPNet pyramid network includes:
the sub-pixel convolution layer is used for upsampling the feature map of each lower layer in the 5-layer pyramid by a factor of two and adding it to the feature map of the layer above, obtaining a 4-layer pyramid of feature maps;
and the second convolution layer is used for performing down-sampling on the feature map of the uppermost layer in the 4 layers of pyramid feature maps, then adding the feature map of the next layer, continuing the down-sampling, and then adding the feature map of the next layer until obtaining an enhanced feature map with the resolution consistent with the resolution of the feature map of the lowermost layer in the 4 layers of pyramid feature maps.
The image style migration system of this embodiment is used to implement the foregoing image style migration method, and its function corresponds to that of the method. For its specific implementation, refer to the corresponding method embodiments described above; the details are not repeated here.
The above embodiments are merely preferred embodiments used to fully illustrate the present invention, and the protection scope of the invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the invention all fall within its protection scope. The protection scope of the invention is defined by the claims.

Claims (10)

1. An image style migration method is characterized by comprising the following steps:
S1, inputting a content image and a style image into two encoders respectively, and obtaining content features and style features respectively;
s2, respectively inputting the obtained content features and the obtained style features into two SPNet pyramid networks for semantic enhancement, and respectively obtaining the enhanced content features and the enhanced style features;
s3, inputting the content features and the style features before enhancement into a first feature fusion module for feature fusion, and inputting the content features and the style features after enhancement into a second feature fusion module for feature fusion;
s4, adding the result of the feature fusion of the first feature fusion module and the result of the feature fusion of the second feature fusion module, and inputting the result into a first convolution layer for feature extraction to obtain stylized features;
and S5, inputting the obtained stylized features into a decoder, and decoding to obtain the image with the style transferred.
2. The image style migration method according to claim 1, wherein each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
3. The image style migration method according to claim 2, wherein the SPNet pyramid network comprises a sub-pixel convolution layer and a second convolution layer; the sub-pixel convolution layer upsamples the feature map of each lower layer in the 5-layer pyramid by a factor of two and adds it to the feature map of the layer above to obtain a 4-layer pyramid of feature maps; and the second convolution layer down-samples the topmost feature map of the 4-layer pyramid and adds the feature map of the next layer, repeating the down-sampling and addition until an enhanced feature map is obtained whose resolution matches that of the bottommost feature map of the 4-layer pyramid.
4. The image style migration method of claim 3, wherein the first convolutional layer is a 3 x 3 convolutional layer, and the second convolutional layer is a 1 x 1 convolutional layer.
5. The image style migration method according to claim 1, wherein the encoder is a VGG encoder and the decoder is a VGG decoder.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1-5 are implemented when the program is executed by the processor.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
8. An image style migration system, comprising:
the two encoders are used for respectively receiving the content image and the style image and respectively obtaining the content features and the style features;
the two SPNet pyramid networks are used for performing semantic enhancement on the obtained content features and the obtained style features respectively and obtaining the enhanced content features and the enhanced style features respectively;
the first feature fusion module is used for performing feature fusion on the content features and the style features before enhancement;
the second feature fusion module is used for performing feature fusion on the enhanced content features and the enhanced style features;
the adding module is used for adding the feature fusion result of the first feature fusion (SAFF) module and that of the second feature fusion (SAFF) module;
the first convolution layer is used for carrying out feature extraction after addition to obtain stylized features;
and the decoder is used for receiving the stylized features and decoding the stylized features to obtain the image with the style transferred.
9. The image style migration system according to claim 8, wherein each of the two encoders organizes the obtained content features and style features into a 5-layer pyramid of feature maps, in which, from top to bottom, the resolution is halved layer by layer and the number of channels is doubled layer by layer.
10. The image style migration system of claim 9, wherein the SPNet pyramid network comprises:
the sub-pixel convolution layer is used for upsampling each lower-layer feature map in the 5-layer pyramid by a factor of two and adding it to the feature map of the layer above to obtain a 4-layer pyramid of feature maps;
and the second convolution layer is used for performing down-sampling on the feature map of the uppermost layer in the 4 layers of pyramid feature maps, then adding the feature map of the next layer, continuing the down-sampling, and then adding the feature map of the next layer until obtaining an enhanced feature map with the resolution consistent with the resolution of the feature map of the lowermost layer in the 4 layers of pyramid feature maps.
CN202211019889.9A 2022-08-24 2022-08-24 Image style migration method and system Active CN115330590B (en)

Priority Applications (1)

Application Number: CN202211019889.9A (granted as CN115330590B); Priority Date: 2022-08-24; Filing Date: 2022-08-24; Title: Image style migration method and system


Publications (2)

Publication Number Publication Date
CN115330590A (en) 2022-11-11
CN115330590B (en) 2023-07-11

Family

ID=83926895

Family Applications (1)

Application Number: CN202211019889.9A; Title: Image style migration method and system; Priority Date: 2022-08-24; Filing Date: 2022-08-24; Status: Active (CN115330590B)

Country Status (1)

Country Link
CN (1) CN115330590B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210350528A1 (en) * 2020-05-07 2021-11-11 Tencent America LLC Shape-aware organ segmentation by predicting signed distance maps
CN112819692A (en) * 2021-02-21 2021-05-18 北京工业大学 Real-time arbitrary style migration method based on double attention modules
CN114332047A (en) * 2021-12-31 2022-04-12 华中科技大学 Construction method and application of surface defect detection model
CN114757819A (en) * 2022-04-18 2022-07-15 大连民族大学 Structure-guided style deviation correction type style migration method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Lvmin Zhang, Yi Ji, Xin Lin: "Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN", ACPR 2017 *
Nicholas Kolkin et al.: "Style Transfer by Relaxed Optimal Transport and Self-Similarity", CVPR 2019 *
Xintao Wang et al.: "Deep Network Interpolation for Continuous Imagery Effect Transition", CVPR 2019 *
Xun Huang et al.: "Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization", ICCV 2017 *
Zhang Chunsen; Ge Yingwei; Jiang Xiao: "Building Extraction from High-Resolution Remote Sensing Images Based on Sparsity-Constrained SegNet", Journal of Xi'an University of Science and Technology, no. 03 *
Li Xinze; Zhang Xuanxiong; Chen Sheng: "SPNet: A Fast Pyramid Network for Efficient Detection in Complex Scenes", Journal of Image and Graphics, no. 05 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant