CN113888400B

CN113888400B - Image style migration method and device

Info

Publication number: CN113888400B
Application number: CN202111302183.9A
Authority: CN
Inventors: 李祎; 谢鑫; 付海燕; 王波; 郭艳卿
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2024-04-26
Anticipated expiration: 2041-11-04
Also published as: CN113888400A

Abstract

The invention provides an image style migration method and device. The method comprises the following steps: inputting a content image I_c and a style image I_s into a pre-trained encoder network E for feature extraction, then fusing the content features C and the style features S and projecting them into a hidden space Z; inputting the information of the style features S into the first-layer convolution of a decoder network D to obtain the demodulated first-layer weight A′₁ of the decoder; obtaining a separation matrix W based on the FastICA algorithm, the separation matrix W minimizing the correlation of the vectors in the matrix A′₁^T A′₁; computing a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁; and editing the hidden space vector in the hidden space Z based on the acquired semantic directions, and finally obtaining the style-migrated image by means of the decoder network D. The method does not need to train on a large-scale style data set or learn any parameters, and can be applied to most style migration models.

Description

Image style migration method and device

Technical Field

The invention relates to application of artificial intelligence in the fields of computer vision and image style migration, in particular to an image style migration method and device.

Background

Early image style migration techniques covered only a narrow range of styles: a given algorithm was often specific to a single type of image texture, and the migration results were not ideal. With the rise of artificial intelligence and deep learning in recent years, the field of image style migration has produced many excellent results, and the generated stylized images are increasingly realistic. The essence of style migration is to transfer the style of a painting onto a photograph while preserving the original content of the photograph. In order to produce images with multiple styles, models often require a large style data set. In addition to requiring large-scale data sets, most current style migration models adopt one of two approaches, iterative optimization or a feed-forward network, to improve the quality of the stylized images:

Iterative optimization method: the optimization target is the image itself, and style migration is realized by directly performing iterative optimization on a white-noise image. Many algorithms compute the maximum mean discrepancy during the iterative process to measure the difference between the style image and the content image. The two images are "aligned" so as to reduce the losses and errors caused by the image iterations.

Feed-forward network method: the optimization target is a neural network model. The model is updated by gradient descent, and style migration is realized by a feed-forward pass of the network.

Both methods have advantages and disadvantages. The method based on iterative optimization produces synthesized images of high quality, with good controllability and easy parameter adjustment, but has a long computation time and poor real-time performance. The method based on a feed-forward network is fast, can be used for rapid video stylization, and is currently the mainstream technology in industrial application software, but it needs a large amount of training data when the image generation quality is to be further improved.

Disclosure of Invention

In view of the technical problems of long computation time and the need for a large amount of training data, an image style migration method and device are provided. The invention mainly learns different style semantics from the hidden space of a pre-trained style migration model, modifies the related coding information in the hidden space along different semantic directions, and decodes it to obtain images with various styles.

The invention adopts the following technical means:

An image style migration method, comprising the steps of:

acquiring a content image I_c and a style image I_s;

inputting the content image I_c and the style image I_s into a pre-trained encoder network E for feature extraction, so as to obtain a content feature C and a style feature S;

fusing the content feature C and the style feature S through a mathematical operation or a convolution network, and projecting the fused image features to a hidden space Z;

inputting the information of the style feature S, namely the style tensor obtained by encoding the style image with the encoder, into the first-layer convolution of the decoder network D, and adjusting the weight A₁ corresponding to the first-layer convolution of the decoder network based on the style feature S, thereby obtaining the demodulated first-layer weight A′₁ of the decoder;

obtaining a separation matrix W based on the FastICA algorithm, wherein the separation matrix W minimizes the correlation of the vectors in the matrix A′₁^T A′₁;

calculating a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁;

and editing the hidden space vector in the hidden space Z based on the acquired semantic direction, and finally obtaining the style-migrated image by means of the decoder network D.

Further, fusing the content feature C and the style feature S includes obtaining a fusion result according to the following calculation:

AdaIN(C, S) = σ(S) · ((C − μ(C)) / σ(C)) + μ(S)

wherein AdaIN(C, S) is the fusion result of the content feature C and the style feature S, σ(S) is the standard deviation of the style feature S, μ(S) is the mean of the style feature S, σ(C) is the standard deviation of the content feature C, and μ(C) is the mean of the content feature C.
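The fusion step can be illustrated with a minimal NumPy sketch of adaptive instance normalization, AdaIN(C, S) = σ(S) · (C − μ(C)) / σ(C) + μ(S). The text does not say over which axes the statistics are taken; the sketch assumes feature maps of shape (channels, height, width) with per-channel statistics over the spatial axes, and the small `eps` guarding the division is likewise an assumption, not part of the source.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization on (channels, height, width) features.

    Assumption: mean and standard deviation are computed per channel
    over the spatial axes; eps only guards against division by zero.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    # Normalize the content feature, then re-scale and shift it
    # with the statistics of the style feature.
    return s_std * (content - c_mean) / c_std + s_mean

rng = np.random.default_rng(0)
C = rng.normal(size=(4, 8, 8))                       # toy content feature
S = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))   # toy style feature
fused = adain(C, S)
```

After fusion, each channel of `fused` carries the mean and (up to `eps`) the standard deviation of the corresponding style channel while keeping the spatial layout of the content feature.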

Further, adjusting the weight A₁ corresponding to the first-layer convolution of the decoder network based on the style feature S to obtain the demodulated first-layer weight A′₁ includes obtaining the demodulated first-layer weight A′₁ according to the following manner:

wherein A₁ is the weight corresponding to the first-layer convolution of the decoder network, A′₁ is the demodulated first-layer weight of the decoder, S is the style feature, and ε is a constant term.

Further, calculating the demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁ includes obtaining the demodulated semantic direction set according to the following calculation:

N = {n₁, n₂, …, n_k} = W A′₁^T A′₁

where N is the demodulated semantic direction set, n_i is the i-th semantic direction, i = 1, …, k, W is the separation matrix, A′₁ is the demodulated first-layer weight of the decoder, and A′₁^T is the transpose of the demodulated first-layer weight of the decoder.
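The text does not state the shapes of these matrices. One consistent reading, assuming the first-layer weight is flattened into a matrix whose columns correspond to the d input channels of the decoder (the symbols m and d, and the choice of a k-by-d separation matrix, are assumptions for illustration), is:

```latex
A'_1 \in \mathbb{R}^{m \times d}, \qquad
A_1'^{\top} A'_1 \in \mathbb{R}^{d \times d}, \qquad
W \in \mathbb{R}^{k \times d}, \qquad
N = W A_1'^{\top} A'_1 \in \mathbb{R}^{k \times d}, \qquad
n_i = N_{i,:} \in \mathbb{R}^{d}
```

Under this reading each n_i has the same dimension d as the channel axis of the hidden space vector, which is what allows a direction to be added to z.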

Further, editing the hidden space vector in the hidden space Z based on the acquired semantic direction includes obtaining the edited hidden space vector according to the following calculation:

z′ = z + αn_i

wherein z′ is the edited hidden space vector, z is the hidden space vector, α is the preset degree of style change, and n_i is the i-th semantic direction.

The invention also provides an image style migration device for implementing the image style migration method described above, comprising:

an acquisition unit configured to acquire a content image I_c and a style image I_s;

an encoding unit configured to input the content image I_c and the style image I_s into a pre-trained encoder network E for feature extraction, so as to obtain a content feature C and a style feature S;

a fusion unit configured to fuse the content feature C and the style feature S through a mathematical operation or a convolution network, and to project the fused image features to the hidden space Z;

a weight adjustment unit configured to input the information of the style feature S into the first-layer convolution of the decoder network D, and to adjust the weight A₁ corresponding to the first-layer convolution of the decoder network based on the style feature S, so as to obtain the demodulated first-layer weight A′₁ of the decoder;

a separation matrix acquisition unit configured to obtain a separation matrix W based on the FastICA algorithm, wherein the separation matrix W minimizes the correlation of the vectors in the matrix A′₁^T A′₁;

a calculating unit configured to calculate a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁;

and a decoding unit configured to edit the hidden space vector in the hidden space Z based on the acquired semantic direction, and finally to obtain the style-migrated image by means of the decoder network D.

Further, the fusion unit obtains the fusion result according to the following calculation:

AdaIN(C, S) = σ(S) · ((C − μ(C)) / σ(C)) + μ(S)

wherein AdaIN(C, S) is the fusion result of the content feature C and the style feature S, σ(S) is the standard deviation of the style feature S, μ(S) is the mean of the style feature S, σ(C) is the standard deviation of the content feature C, and μ(C) is the mean of the content feature C.

Further, the weight adjustment unit obtains the demodulated first-layer weight A′₁ of the decoder according to the following manner:

Further, the calculating unit obtains the demodulated semantic direction set according to the following calculation:

N = {n₁, n₂, …, n_k} = W A′₁^T A′₁

Further, the decoding unit obtains the edited hidden space vector according to the following calculation:

z′ = z + αn_i

wherein z′ is the edited hidden space vector, z is the hidden space vector, α is the degree of change of the latent vector, that is, the degree of style change, and n_i is the i-th semantic direction.

Compared with the prior art, the invention has the following advantages:

1. The invention greatly reduces the complexity of the model. It does not need to learn any parameters or use a large number of data sets, and can learn a large number of styles from the hidden space of a pre-trained model using only simple mathematical operations.

2. The invention can efficiently generate various types of images and edit the target properties of the images. The algorithm is simple and easy to use, can be embedded into different style migration models, and has strong universality and flexibility.

3. Compared with the traditional method, the method saves time and simultaneously avoids waste of equipment resources.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic diagram of a style migration basic framework.

FIG. 2 is a flow chart of an image style migration method according to the present invention.

FIG. 3 is a graphical representation of the results of the present invention in example 1 tested on AdaIN, linear, MST, SANet models.

FIG. 4 is a graphical representation of the results of the invention of example 2 performed on AdaIN, linear, MST, SANet models.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

Image style migration is an active research direction in the field of computer vision. With the rise of deep learning, the field has made breakthrough progress. Style migration transfers the style of a painting onto a real image while keeping the content of the real image unchanged. The input is a content image and a style image, and the output is a stylized result. As shown in fig. 1, the overall stylization process is divided into three parts.

Part 1: feature encoding. A content image and a style image are input, and the system uses an encoder network to extract the relevant features of the two images, namely the content features and the style features.

Part 2: feature fusion. The system needs to integrate the style features of the oil painting into the content features of the real image, that is, to combine the two features in a hidden space, in preparation for the subsequent decoder to generate a new picture.

Part 3: feature decoding. The decoder obtains the latent code from the hidden space and converts it, through the neural network, into an artistic image with the same style as the style image.

Currently, conventional style migration models must be trained with a large number of artwork data sets and advanced convolutional neural network architectures in order to produce realistic artistic images, a process that is time consuming and labor intensive. To solve this problem, the present invention provides a style migration method that mainly targets the hidden space used by the third part: by learning and exploring the rich latent information in the hidden space, the system can learn a large number of artistic styles from the hidden space of a pre-trained model. Compared with the traditional method, the method has strong universality and flexibility, and can be embedded into most style migration models, such as AdaIN, linear, MST, SANet and the like, without relearning the data set.

As shown in FIG. 2, the invention provides an image style migration method, which is an unsupervised decoupling method mainly applied to the feature decoding part of style migration. The first-layer weight of the decoder is decomposed, a large number of artistic semantics are learned from the demodulated weight, and the corresponding attributes of the image are edited along the directions of these artistic semantics, so that artistic works with various styles are generated. The method mainly comprises the following steps:

S1, acquiring a content image I_c and a style image I_s.

S2, inputting the content image I_c and the style image I_s into a pre-trained encoder network E for feature extraction, so as to obtain a content feature C and a style feature S.

S3, fusing the content feature C and the style feature S through a mathematical operation or a convolution network, and projecting the fused image features to the hidden space Z.

Specifically, in this embodiment, adaptive instance normalization is preferably used: the mean and standard deviation of each of the two features are calculated, and the content feature C and the style feature S are fused according to the following calculation:

AdaIN(C, S) = σ(S) · ((C − μ(C)) / σ(C)) + μ(S)

S4, inputting the information of the style feature S into the first-layer convolution of the decoder network D, wherein the information of the style feature S refers to the style tensor obtained by encoding the style image with the encoder, and adjusting the weight A₁ corresponding to the first-layer convolution of the decoder network based on the information of the style feature S, so as to obtain the demodulated first-layer weight A′₁ of the decoder. This step is mainly used to adapt the pre-trained model. Specifically, the demodulated first-layer weight A′₁ of the decoder is calculated according to the following:

wherein A₁ is the weight corresponding to the first-layer convolution of the decoder network, A′₁ is the demodulated first-layer weight of the decoder, S is the style feature, and ε is a constant term whose function is to keep the denominator of the formula from being 0.

The information of the style feature S is re-added to the first-layer convolution of the decoder network to achieve weight demodulation, so that the adjusted weight contains more style information.
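The demodulation formula itself is not reproduced in this text. The sketch below is therefore only one plausible reading, assuming the modulate-then-normalize form popularized by StyleGAN2: each input channel of A₁ is scaled by a per-channel style value, and every output filter is then divided by its L2 norm, with ε keeping the denominator nonzero. How the scale `s` is derived from S (here the per-channel standard deviation) and the function name `demodulate` are hypothetical choices for illustration.

```python
import numpy as np

def demodulate(A1, s, eps=1e-8):
    """Assumed StyleGAN2-style weight demodulation (not confirmed by the source).

    A1: first-layer decoder weight, shape (out_channels, in_channels, k, k)
    s:  per-input-channel style scale, shape (in_channels,)
    """
    # Modulate: scale each input channel of the weight by the style value.
    modulated = A1 * s[None, :, None, None]
    # Demodulate: normalize each output filter; eps keeps the denominator nonzero.
    norm = np.sqrt((modulated ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return modulated / norm

rng = np.random.default_rng(0)
A1 = rng.normal(size=(16, 8, 3, 3))   # toy first-layer weight
S = rng.normal(size=(8, 4, 4))        # toy style feature (channels, h, w)
s = S.std(axis=(1, 2))                # hypothetical per-channel style scale
A1_demod = demodulate(A1, s)
```

With this form, every output filter of `A1_demod` has approximately unit norm, so the style information enters through the relative weighting of the input channels rather than through the overall magnitude of the weight.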

S5, obtaining a separation matrix W based on the FastICA algorithm, wherein the separation matrix W minimizes the correlation of the vectors in the matrix A′₁^T A′₁.

S6, calculating a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁.

Since the artistic style in an image is extremely complex, most semantic attributes are coupled, so changing one attribute is highly likely to change another artistic attribute as well. To decouple effective artistic semantics, the correlation of the individual vectors in the matrix A′₁^T A′₁ needs to be reduced as much as possible. In this embodiment, the FastICA algorithm is preferably used to find a separation matrix W that is multiplied with the matrix A′₁^T A′₁, i.e., N = W A′₁^T A′₁, and correlation minimization is used to separate the individual artistic semantics as far as possible, so that artistic styles are learned from the hidden space.
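Steps S5 and S6 can be sketched with a small NumPy implementation of symmetric FastICA (tanh nonlinearity). The text does not say how the matrix is presented to FastICA; the sketch assumes each row of A′₁^T A′₁ is treated as one mixed signal and each column as one sample, and that A′₁ has been flattened to a 2-D matrix. The names `fast_ica` and `sym_decorrelate`, the number of components k, and the random stand-in for A′₁ are all hypothetical.

```python
import numpy as np

def sym_decorrelate(B):
    """Symmetric decorrelation: B <- (B B^T)^(-1/2) B, giving orthonormal rows."""
    eigval, eigvec = np.linalg.eigh(B @ B.T)
    eigval = np.clip(eigval, np.finfo(B.dtype).tiny, None)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ B

def fast_ica(X, k, n_iter=200, tol=1e-6, seed=0):
    """Return a separation matrix W of shape (k, n_signals).

    X holds one mixed signal per row and one sample per column.
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center each signal
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    top = np.argsort(eigval)[::-1][:k]              # k largest components
    K = (eigvec[:, top] / np.sqrt(eigval[top])).T   # whitening matrix (k, n_signals)
    Z = K @ Xc                                      # whitened data, identity covariance
    rng = np.random.default_rng(seed)
    B = sym_decorrelate(rng.normal(size=(k, k)))
    for _ in range(n_iter):
        G = np.tanh(B @ Z)
        # Fixed-point update: E[g(Bz) z^T] - diag(E[g'(Bz)]) B
        B_new = G @ Z.T / Z.shape[1] - (1.0 - G ** 2).mean(axis=1)[:, None] * B
        B_new = sym_decorrelate(B_new)
        change = np.max(np.abs(np.abs(np.sum(B_new * B, axis=1)) - 1.0))
        B = B_new
        if change < tol:
            break
    return B @ K

rng = np.random.default_rng(0)
A1_demod = rng.normal(size=(128, 32))   # random stand-in for flattened A'_1 (m x d)
M = A1_demod.T @ A1_demod               # A'_1^T A'_1, shape (d, d)
W = fast_ica(M, k=6)                    # separation matrix, shape (k, d)
N = W @ M                               # rows are the semantic directions n_i
```

Because the whitening step and the orthonormal rotation together decorrelate the outputs, the rows of `N` are mutually uncorrelated by construction; the fixed-point iterations additionally push them toward statistical independence.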

S7, editing the hidden space vector in the hidden space Z based on the acquired semantic direction, and finally obtaining the style-migrated image by means of the decoder network D.

Specifically, the hidden space vector is modified to z′ = z + αn_i, and a new artistic image I = D(z′) is finally generated.
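The editing step can be sketched as follows. The sketch assumes the hidden space vector z is a feature map of shape (channels, height, width) and that a direction n_i has one entry per channel, so it is broadcast over all spatial positions; this layout is an assumption. The `toy_decoder` (a 1×1 convolution followed by tanh) is a hypothetical stand-in for the pre-trained decoder network D, used only so that the example runs on its own.

```python
import numpy as np

def edit_latent(z, n_i, alpha):
    """Shift every spatial position of z along n_i: z' = z + alpha * n_i."""
    return z + alpha * n_i[:, None, None]

def toy_decoder(z, weight):
    """Hypothetical stand-in for D: 1x1 convolution to 3 channels, then tanh."""
    return np.tanh(np.einsum("oc,chw->ohw", weight, z))

rng = np.random.default_rng(0)
d, h, w = 8, 4, 4
z = rng.normal(size=(d, h, w))      # toy hidden space vector
n_i = rng.normal(size=d)            # toy semantic direction
weight = rng.normal(size=(3, d))    # toy decoder weight
alpha = 2.0                         # preset degree of style change

z_pos = edit_latent(z, n_i, alpha)      # edit along the positive direction
z_neg = edit_latent(z, n_i, -alpha)     # edit along the negative direction
image_pos = toy_decoder(z_pos, weight)  # I = D(z')
image_neg = toy_decoder(z_neg, weight)
```

A larger |α| moves z further along the direction and therefore produces a stronger change in the decoded image; flipping the sign of α edits the same attribute in the opposite sense.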

The application effect of the present invention will be further described below by way of specific application examples.

As shown in FIG. 3, the method of the present invention is tested on the AdaIN, linear, MST, and SANet models. In each set of images, the first column shows the source content image, with the style image in its lower right corner; the second column shows the decoded source output artistic image; the third and fourth columns show images with diversified styles obtained by modifying the hidden space vector along a semantic direction in the positive and negative directions respectively, namely z′ = z + αn_i (third column) and z′ = z − αn_i (fourth column), thereby editing the related attributes of the picture.

The present invention is applicable to a variety of aspects:

1. Entertainment applications

People increasingly rely on internet social networking and place ever higher demands on its specific applications. The algorithm can be applied in a wide range of software, from drawing software on computers to drawing software on mobile phones, so that people can easily beautify or modify their own pictures and share them on various social platforms. As people's demand for beauty grows, demand for artistic beautification has gradually emerged as well. Beyond enjoying the pictures they have taken, users also wish to render their favorite pictures in various styles, such as cool tone style, nostalgic style, photo-by-print style, sketch style, oil painting style, etc., as shown in fig. 3.

2. Auxiliary creation tool

With the development of mobile internet technology, intelligent products of all kinds keep emerging. In the image era, colorful and varied picture content is highly sought after by users, who are eager to beautify and edit freshly taken pictures, share them, label them, and render them. Beautifying photos is becoming a hobby for many people. The algorithm can serve as a user-assisted creation tool and is particularly helpful for painters to conveniently create artistic works of specific styles, as shown in fig. 4. It can also be applied to the creation of computer vision graphics, fashion design and the like.

3. Meeting functional requirements

The image style migration function often requires a server with at least one GPU and runs on the Linux operating system. Server-side functions require servers capable of network connection, often require large memory, and data persistence requires large memory server hard disks. This makes many excellent style migration methods impractical to use. The algorithm greatly reduces the complexity of the model, does not need a large number of data sets to train, does not need to learn any parameters, and can run on a plurality of platforms. The algorithm not only meets the hardware requirement, but also meets the requirement of a user.

As shown in fig. 4, the diversity of the algorithm is verified on the four models AdaIN, linear, MST, and SANet. The first column in the figure shows the source content image, with the style image in its lower right corner; the second column shows the decoded source output artistic image; the other columns show the diversified artistic images generated after modifying the latent vectors along the artistic semantic directions learned from the hidden space.

The application also provides an image style migration device corresponding to the image style migration method in the application, comprising:

N＝{n₁,n₂,…,n_k}＝WA′₁ ^TA′₁

z′＝z+αn_i

wherein z′ is the edited hidden space vector, z is the hidden space vector, n_i is the i-th semantic direction, and α is the degree of style change. This variable is set manually: to make the image change more obvious, α is set larger; otherwise, α is set smaller. A change in α causes a change in the latent vector, which in turn causes a change in the style of the final image, so the variable can also be defined as the degree of change of the latent vector.

For the embodiments of the present invention, since they correspond to those in the above embodiments, the description is relatively simple, and the relevant similarities will be found in the description of the above embodiments, and will not be described in detail herein.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An image style migration method is characterized by comprising the following steps:

acquiring a content image I_c and a style image I_s;

fusing the content feature C and the style feature S through a mathematical operation or a convolution network, and projecting the fused image features to a hidden space Z, wherein fusing the content feature C and the style feature S includes obtaining a fusion result according to the following calculation:

AdaIN(C, S) = σ(S) · ((C − μ(C)) / σ(C)) + μ(S)

wherein AdaIN(C, S) is the fusion result of the content feature C and the style feature S, σ(S) is the standard deviation of the style feature S, μ(S) is the mean of the style feature S, σ(C) is the standard deviation of the content feature C, and μ(C) is the mean of the content feature C;

inputting the information of the style feature S, namely the style tensor obtained by encoding the style image with the encoder, into the first-layer convolution of the decoder network D, and adjusting the weight A₁ corresponding to the first-layer convolution of the decoder network based on the style feature S, thereby obtaining the demodulated first-layer weight A′₁ of the decoder, including obtaining the demodulated first-layer weight A′₁ of the decoder according to the following manner:

wherein A₁ is the weight corresponding to the first-layer convolution of the decoder network, A′₁ is the demodulated first-layer weight of the decoder, S is the style feature, and ε is a constant term;

obtaining a separation matrix W based on the FastICA algorithm, wherein the separation matrix W minimizes the correlation of the vectors in the matrix A′₁^T A′₁;

calculating a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁, including obtaining the demodulated semantic direction set according to the following calculation:

N = {n₁, n₂, …, n_k} = W A′₁^T A′₁

wherein N is the demodulated semantic direction set, n_i is the i-th semantic direction, i = 1, …, k, W is the separation matrix, A′₁ is the demodulated first-layer weight of the decoder, and A′₁^T is the transpose of the demodulated first-layer weight of the decoder;

2. The image style migration method of claim 1, wherein editing the hidden space vector in the hidden space Z based on the acquired semantic direction comprises acquiring the hidden space vector in the hidden space Z based on the following calculation:

z′＝z+αn_i

3. An image style migration apparatus for implementing the image style migration method according to claim 1, comprising:

an encoding unit configured to input the content image I_c and the style image I_s into a pre-trained encoder network E for feature extraction, thereby obtaining a content feature C and a style feature S, wherein the content feature C and the style feature S are fused and the fusion result is obtained according to the following calculation:

AdaIN(C, S) = σ(S) · ((C − μ(C)) / σ(C)) + μ(S)

a weight adjustment unit configured to input the information of the style feature S into the first-layer convolution of the decoder network D, and to adjust the weight A₁ corresponding to the first-layer convolution of the decoder network based on the style feature S so as to obtain the demodulated first-layer weight A′₁ of the decoder, including obtaining the demodulated first-layer weight A′₁ according to the following manner:

wherein A₁ is the weight corresponding to the first-layer convolution of the decoder network, A′₁ is the demodulated first-layer weight of the decoder, S is the style feature, and ε is a constant term;

a separation matrix acquisition unit configured to obtain a separation matrix W based on the FastICA algorithm, wherein the separation matrix W minimizes the correlation of the vectors in the matrix A′₁^T A′₁;

a calculating unit configured to calculate a demodulated semantic direction set based on the separation matrix W and the matrix A′₁^T A′₁, including obtaining the demodulated semantic direction set according to the following calculation:

N = {n₁, n₂, …, n_k} = W A′₁^T A′₁

4. The image style migration apparatus according to claim 3, wherein the decoding unit acquires the hidden space vector in the hidden space Z based on the following calculation:

z′＝z+αn_i