CN111429342B - Photo style migration method based on style corpus constraint - Google Patents
- Publication number
- CN111429342B (application CN202010239903.0A)
- Authority
- CN
- China
- Prior art keywords
- style
- network
- photo
- student
- teacher
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a photo style migration method based on style corpus constraint, which comprises the following steps: acquiring the data set required to train a student network; selecting a teacher network and a student network and obtaining the photos they generate; constructing a style corpus; designing a multi-level adversarial distillation strategy based on the style corpus constraint; and training and optimizing the student network to perform photo style migration and obtain stylized photos. The method effectively alleviates problems such as distortion and unrealistic appearance in stylized images, which are caused by the mutual interference of the style information and content information of a single photo, and markedly improves the efficiency of photo style migration.
Description
Technical Field
The invention relates to the field of style migration in image processing, and in particular to a method for representing and migrating the style information of a single image in photo style migration.
Background
Style migration is a central research topic of non-photorealistic rendering in computer graphics: the rendering styles of different artistic forms are modeled algorithmically, enhancing how visual information is expressed in an image. Research on the artistic stylization of images enriches the theory of computer graphics and image processing and deepens and broadens the application domains of imagery. Photo style migration and art style migration are the two main tasks of style migration; compared with art style migration, photo style migration must not only transfer the style information of an art photo onto a content photo, but also require the stylized image to look like a photo taken by a camera.
Existing photo style migration methods mainly model the style information of a single artistic photo with statistical constructs such as the Gram matrix [1] and the covariance matrix [2][4], and perform style rendering through a Gram-matrix-based loss function and complex feature transformations. Because style information and content information are entangled within a single image, style information cannot be modeled clearly and accurately by a mathematical formula alone. As a result, content and style information interfere with each other during migration, and the stylized image suffers from structural distortion, inconsistent style within the same semantic region, and blurring, which does not meet the application requirements of photo style migration. To counter the image-quality degradation caused by the inability to model style information accurately, conventional methods must introduce complex color-space constraints [1], additional post-processing [2][3], or complex feature-transformation operations [4], which makes photo style migration slow and severely restricts practical application. There is therefore a need for photo style migration methods with better migration effects and higher efficiency.
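As background for the statistical modeling the passage criticizes, the Gram matrix of [1] can be sketched in a few lines (an illustrative example, not part of the patented method): each entry is the normalized inner product of two feature channels, so it records which features co-activate while discarding their spatial layout.

```python
def gram_matrix(features):
    """features: list of C channel vectors, each flattened to length H*W."""
    c = len(features)
    n = len(features[0])
    gram = [[0.0] * c for _ in range(c)]
    for a in range(c):
        for b in range(c):
            # Inner product of channel a and channel b, normalized by H*W.
            gram[a][b] = sum(x * y for x, y in zip(features[a], features[b])) / n
    return gram

# Two channels over four spatial positions.
feats = [[1.0, 2.0, 3.0, 4.0],
         [0.0, 1.0, 0.0, 1.0]]
g = gram_matrix(feats)
```

Because the spatial index is summed out, two images with the same channel co-activation statistics get the same Gram matrix, which is exactly why content and style become hard to separate with this statistic alone.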
References:
1. F. Luan, S. Paris, E. Shechtman, and K. Bala, "Deep photo style transfer," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4990-4998.
2. Y. Li, M.-Y. Liu, X. Li, M.-H. Yang, and J. Kautz, "A closed-form solution to photorealistic image stylization," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 453-468.
3. X. Li, S. Liu, J. Kautz, and M.-H. Yang, "Learning linear transformations for fast image and video style transfer," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3809-3817.
4. J. Yoo, Y. Uh, S. Chun, B. Kang, and J.-W. Ha, "Photorealistic style transfer via wavelet transforms," in International Conference on Computer Vision (ICCV), 2019.
Disclosure of Invention
Aiming at the problem that prior methods cannot effectively disentangle the style information and content information of a single image, the invention provides a photo style migration method based on a style corpus constraint and an adversarial distillation learning strategy, which mainly comprises the following steps:
step S1: acquiring a data set required for training a student network;
step S2: selecting a teacher network and a student network to obtain generated photos of the teacher network and the student network;
step S3: constructing a style corpus;
step S4: designing a multi-level adversarial distillation strategy based on style corpus constraints;
step S5: training and optimizing a student network to carry out photo style migration;
step S6: acquiring a stylized photo;
Compared with current methods that model the style information of a single image with statistical constructs, the proposed method constrains the style information of a single image with a style corpus, which effectively overcomes the difficulty of modeling style information accurately when it is entangled with content information. Exploiting the property that photos within one style package share the same style while photos in different style packages do not, adversarial learning imposes a consistency constraint on the style migration result, alleviating the distortion and unrealistic artifacts caused by the mutual interference of style and content information. Finally, the invention uses a knowledge distillation strategy in which a neural network directly learns the complex feature-transformation operations of photo style migration, thereby improving migration efficiency.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of a frame of the present invention;
FIG. 3 is an effect diagram of the present invention;
table 1 is a table of the test time statistics of the present invention.
Detailed Description
Referring to fig. 1 and 2, which are a flowchart and a framework diagram of the photo style migration method based on style corpus constraint, the method mainly comprises the following steps: acquiring the data set required to train a student network; selecting a teacher network and a student network and obtaining the photos they generate; constructing a style corpus; designing a multi-level adversarial distillation strategy based on the style corpus constraint; and training and optimizing the student network to perform photo style migration and obtain stylized photos. The specific implementation details of each step are as follows:
step S1: the data set required for training the student network is acquired in the following specific way:
step S11: the COCO dataset was downloaded as a content dataset, the number of images in the dataset being noted N.
Step S12: and downloading the art photos disclosed by the WikiArt website as a style data set, wherein the number of photos in the data set is recorded as M.
Step S2: select a teacher network and a student network and obtain the photos they generate, as follows:
Step S21: select the wavelet-corrected end-to-end style migration network WCT² as the teacher network; its network weight parameters are fixed and denoted T.
Step S22: select the artistic style migration network AdaIN as the student network, introduce skip connections between the pooling layers of the encoder and the corresponding deconvolution layers of the decoder, initialize the network randomly, and denote it S.
Step S23: normalize, crop, and batch the content data set and the style data set. Select any image from the content data set, denoted c_i (i = 1, 2, ..., N), and any image from the style data set, denoted r_j (j = 1, 2, ..., M). Input c_i and r_j into the teacher network T to obtain the generated photo t_{i,j}, and input c_i and r_j into the student network S to obtain the generated photo s_{i,j}.
Step S3: construct the style corpus, as follows:
Step S31: use the teacher network T to render the style of photo r_j onto every image in the content data set, and record the resulting set of generated photos as the style package B_j = {t_{i,j} | i = 1, 2, ..., N}. All photos in style package B_j differ in content but share the same style.
Step S32: following step S31, obtain the style packages corresponding to all photos in the style data set and define the style corpus as Ω = {B_j | j = 1, 2, ..., M}. Photos in different style packages differ in style information but are generated from the same content images.
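A minimal sketch of the corpus construction in steps S31-S32, under stand-in names (`teacher_render` is a placeholder for the fixed teacher network T, and string labels replace actual photos, so only the corpus structure is shown):

```python
def teacher_render(content, style):
    # Placeholder for the teacher network: T(c_i, r_j) -> t_{i,j}.
    return f"t({content},{style})"

def build_style_corpus(content_set, style_set):
    """Style corpus: one style package B_j per style photo r_j,
    each package holding r_j's style rendered onto every content image."""
    corpus = {}
    for style in style_set:
        # Style package B_j: same style across all N content images.
        corpus[style] = [teacher_render(c, style) for c in content_set]
    return corpus

corpus = build_style_corpus(["c1", "c2", "c3"], ["r1", "r2"])
```

With N content images and M style photos, the corpus holds M packages of N photos each, which is what makes the within-package (same style) and cross-package (same content) comparisons of step S4 possible.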
Step S4: design the multi-level adversarial distillation strategy based on the style corpus constraint, as follows:
Step S41: design the loss function L_pix = ||s_{i,j} - t_{i,j}||_1 so that the photo s_{i,j} generated by the student network is as close as possible, in pixel space, to the photo t_{i,j} generated by the teacher network.
Step S42: design the loss function L_feat = Σ_k λ_k ||Φ_k(s_{i,j}) - Φ_k(t_{i,j})||_1 so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in feature space, where Φ_k(·) denotes the feature map of an image in the loss network VGG-16, the index k ranges over layers 3, 8, 15, and 22 of the VGG-16 network, and λ_k denotes the weight coefficient of the corresponding layer, taken as 1, 1, 0.5, and 0.5 respectively.
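The two distillation losses of steps S41-S42 can be sketched on flat vectors (illustrative only; the real inputs are image tensors and VGG-16 feature maps, and the norm choice here is an assumption):

```python
def l1(a, b):
    """L1 norm ||a - b||_1 of the difference of two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pixel_loss(s, t):
    # Step S41: L_pix = ||s_ij - t_ij||_1 in pixel space.
    return l1(s, t)

def feature_loss(phi_s, phi_t, weights):
    # Step S42: weighted sum of distances between feature maps Phi_k
    # of the chosen loss-network layers (lambda_k = 1, 1, 0.5, 0.5).
    return sum(w * l1(fs, ft) for w, fs, ft in zip(weights, phi_s, phi_t))

student_pix = [2, 5, 9]
teacher_pix = [0, 5, 10]
lp = pixel_loss(student_pix, teacher_pix)       # |2-0| + |5-5| + |9-10|

phi_s = [[1, 2], [3, 4]]   # stand-in feature maps at two layers
phi_t = [[1, 1], [3, 3]]
lf = feature_loss(phi_s, phi_t, [1.0, 0.5])
```

The pixel term ties the student to the teacher point-by-point, while the layer-weighted feature term matches them at increasing levels of abstraction.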
Step S43: design the adversarial loss function L_cd = E_{(c_i, r_j, t_{i,j})}[log D_cd(c_i, r_j, t_{i,j})] + E_{(c_i, r_j, s_{i,j})}[log(1 - D_cd(c_i, r_j, s_{i,j}))] so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in overall distribution, where D_cd is a condition discriminator consisting entirely of convolutional layers, C denotes the set of content images, R denotes the set of style images, Ω_S denotes the set of student-network outputs, Ω_T denotes the set of teacher-network outputs, and E[·] denotes the expectation of the bracketed value.
The content image c_i, the style image r_j, and the student-generated photo s_{i,j} are concatenated together as a False sample, while c_i, r_j, and the teacher-generated photo t_{i,j} are concatenated together as a True sample; through adversarial training, the student-generated photo s_{i,j} is driven as close as possible to the teacher-generated photo t_{i,j}.
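Step S43's objective behaves like a standard conditional-GAN discriminator loss. The sketch below is an assumption about the exact form (the patent's formula is not fully legible here): probabilities stand in for D_cd's outputs on the concatenated triples, and the two expectation terms are simple averages.

```python
import math

def cd_loss(real_scores, fake_scores):
    """L_cd = E[log D(True triple)] + E[log(1 - D(False triple))].
    real_scores: D_cd outputs on teacher triples (c_i, r_j, t_ij);
    fake_scores: D_cd outputs on student triples (c_i, r_j, s_ij)."""
    real_term = sum(math.log(p) for p in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - p) for p in fake_scores) / len(fake_scores)
    return real_term + fake_term

# A discriminator that cannot tell student from teacher outputs 0.5
# everywhere -- the convergence target used later in step S55.
confused = cd_loss([0.5, 0.5], [0.5, 0.5])
sharp = cd_loss([0.99, 0.99], [0.01, 0.01])  # confident discriminator
```

A confident discriminator scores higher than a confused one, so driving D_cd back toward 0.5 on both sets means the student's distribution has become indistinguishable from the teacher's.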
Step S44: design the style-consistency loss function L_sd = E[log D_sd(t_{i+1,j}, t_{i,j})] + E[log(1 - D_sd(s_{i,j}, t_{i,j}))] + E[log(1 - D_sd(t_{i,j}, t_{i,j+1}))] so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in style, where D_sd denotes a style discriminator consisting entirely of convolutional layers and E[·] denotes the expectation of the bracketed value. Here (t_{i+1,j}, t_{i,j}) is a pair of teacher-generated photos with different content but the same style, (s_{i,j}, t_{i,j}) is a pair of photos with the same content and style generated by the student network and the teacher network respectively, and (t_{i,j}, t_{i,j+1}) is a pair of teacher-generated photos with the same content but different styles.
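The pairing scheme of step S44 can be made concrete with a small sketch (string labels stand in for generated photos; indices follow the patent's t_{i,j} convention of content i, style j):

```python
def make_style_pairs(t, s, i, j):
    """t[i][j]: teacher photo for content i, style j; s likewise for the student.
    Returns the pairs fed to the style discriminator D_sd."""
    positives = [(t[i + 1][j], t[i][j])]    # same style, different content
    negatives = [(s[i][j], t[i][j]),        # student vs. teacher, same content/style
                 (t[i][j], t[i][j + 1])]    # same content, different style
    return positives, negatives

t = [["t00", "t01"], ["t10", "t11"]]
s = [["s00", "s01"], ["s10", "s11"]]
pos, neg = make_style_pairs(t, s, 0, 0)
```

The style corpus is what supplies these pairs: positives come from within one style package B_j, and the cross-style negatives come from two different packages, so D_sd is forced to judge style alone rather than content.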
Step S5: train and optimize the student network to perform photo style migration, as follows:
Step S51: combine the loss functions with different weights to obtain the overall optimization objective L_total = λ_1 L_pix + λ_2 L_feat + λ_3 L_cd + λ_4 L_sd, and train the whole network; the parameters of the teacher network T remain fixed throughout training.
Step S52: lock the style discriminator D_sd and the student network S, update the parameters of the condition discriminator D_cd twice, then lock D_cd.
Step S53: unlock the style discriminator D_sd, update its parameters twice, then lock D_sd.
Step S54: unlock the student network S, update its parameters once, then lock S.
Step S55: repeat steps S52, S53, and S54 until the loss functions of the condition discriminator D_cd and the style discriminator D_sd converge near 0.5, then stop training and save the trained student network S, the condition discriminator D_cd, and the style discriminator D_sd.
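The alternating lock/unlock schedule of steps S52-S55 amounts to a fixed update ratio per cycle: two steps for D_cd, two for D_sd, one for S. A sketch with counters standing in for gradient updates:

```python
def train_cycles(n_cycles):
    """Count parameter updates per module over n cycles of S52-S54."""
    steps = {"D_cd": 0, "D_sd": 0, "S": 0}
    for _ in range(n_cycles):
        steps["D_cd"] += 2   # S52: D_sd and S locked, D_cd updated twice
        steps["D_sd"] += 2   # S53: D_sd updated twice, then locked
        steps["S"] += 1      # S54: student updated once, then locked
    return steps

counts = train_cycles(5)
```

Giving each discriminator two updates per student update keeps both discriminators near their equilibrium, which is why loss convergence around 0.5 is a usable stopping criterion.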
Step S6: obtain the stylized photo: select any content image and any style photo, input them into the student network S obtained in step S5, and obtain the stylized photo.
Compared with current methods that model the style information of a single image with statistical constructs, the proposed method constrains the style information of a single image with a style corpus, which effectively overcomes the difficulty of modeling style information accurately when it is entangled with content information. Exploiting the property that photos within one style package share the same style while photos in different style packages do not, adversarial learning imposes a consistency constraint on the style migration result, alleviating the distortion and unrealistic artifacts caused by the mutual interference of style and content information. Finally, the invention uses a knowledge distillation strategy in which a neural network directly learns the complex feature-transformation operations of photo style migration, thereby improving migration efficiency. The effect and efficiency of the photo style migration method based on the style corpus constraint are shown in fig. 3 and table 1: it effectively mitigates the image distortion and unrealistic appearance seen in prior photo style migration, and is 13 to 50 times faster than method [4].
Claims (1)
1. A photo style migration method based on style corpus constraint is characterized by comprising the following steps:
step S1: the data set required for training the student network is acquired in the following specific way:
step S11: downloading COCO data sets as content data sets, wherein the number of images in the data sets is recorded as N;
step S12: downloading art photos disclosed by a Wikiart website as a style data set, wherein the number of photos in the data set is recorded as M;
step S2: selecting a teacher network and a student network and obtaining the photos they generate, in the following specific way:
step S21: selecting the wavelet-corrected end-to-end style migration network WCT² as the teacher network, fixing its network weight parameters, and denoting it T;
step S22: selecting the artistic style migration network AdaIN as the student network, introducing skip connections between the pooling layers of the encoder and the corresponding deconvolution layers of the decoder, initializing the network randomly, and denoting it S;
step S23: normalizing, cropping, and batching the content data set and the style data set; selecting any image from the content data set, denoted c_i (i = 1, 2, ..., N), and any image from the style data set, denoted r_j (j = 1, 2, ..., M); inputting c_i and r_j into the teacher network T to obtain the generated photo t_{i,j}, and inputting c_i and r_j into the student network S to obtain the generated photo s_{i,j};
step S3: constructing the style corpus, in the following specific way:
step S31: using the teacher network T to render the style of photo r_j onto every image in the content data set, and recording the resulting set of generated photos as the style package B_j = {t_{i,j} | i = 1, 2, ..., N}, where all photos in style package B_j differ in content but share the same style;
step S32: following step S31, obtaining the style packages corresponding to all photos in the style data set and defining the style corpus as Ω = {B_j | j = 1, 2, ..., M}, where photos in different style packages differ in style information but are generated from the same content images;
step S4: designing the multi-level adversarial distillation strategy based on the style corpus constraint, in the following specific way:
step S41: designing the loss function L_pix = ||s_{i,j} - t_{i,j}||_1 so that the photo s_{i,j} generated by the student network is as close as possible, in pixel space, to the photo t_{i,j} generated by the teacher network;
step S42: designing the loss function L_feat = Σ_k λ_k ||Φ_k(s_{i,j}) - Φ_k(t_{i,j})||_1 so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in feature space, where Φ_k(·) denotes the feature map of an image in the loss network VGG-16, the index k ranges over layers 3, 8, 15, and 22 of the VGG-16 network, and λ_k denotes the weight coefficient of the corresponding layer, taken as 1, 1, 0.5, and 0.5 respectively;
step S43: designing the adversarial loss function L_cd = E_{(c_i, r_j, t_{i,j})}[log D_cd(c_i, r_j, t_{i,j})] + E_{(c_i, r_j, s_{i,j})}[log(1 - D_cd(c_i, r_j, s_{i,j}))] so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in overall distribution, where D_cd is a condition discriminator consisting entirely of convolutional layers, C denotes the set of content images, R denotes the set of style images, Ω_S denotes the set of student-network outputs, Ω_T denotes the set of teacher-network outputs, and E[·] denotes the expectation of the bracketed value; the content image c_i, the style image r_j, and the student-generated photo s_{i,j} are concatenated together as a False sample, while c_i, r_j, and the teacher-generated photo t_{i,j} are concatenated together as a True sample, and through adversarial training the student-generated photo s_{i,j} is driven as close as possible to the teacher-generated photo t_{i,j};
step S44: designing the style-consistency loss function L_sd = E[log D_sd(t_{i+1,j}, t_{i,j})] + E[log(1 - D_sd(s_{i,j}, t_{i,j}))] + E[log(1 - D_sd(t_{i,j}, t_{i,j+1}))] so that the student-generated photo s_{i,j} and the teacher-generated photo t_{i,j} are as close as possible in style, where D_sd denotes a style discriminator consisting entirely of convolutional layers and E[·] denotes the expectation of the bracketed value; (t_{i+1,j}, t_{i,j}) is a pair of teacher-generated photos with different content but the same style, (s_{i,j}, t_{i,j}) is a pair of photos with the same content and style generated by the student network and the teacher network respectively, and (t_{i,j}, t_{i,j+1}) is a pair of teacher-generated photos with the same content but different styles;
step S5: training and optimizing the student network to perform photo style migration, in the following specific way:
step S51: combining the loss functions with different weights to obtain the overall optimization objective L_total = λ_1 L_pix + λ_2 L_feat + λ_3 L_cd + λ_4 L_sd and training the whole network, the parameters of the teacher network T remaining fixed throughout training;
step S52: locking the style discriminator D_sd and the student network S, updating the parameters of the condition discriminator D_cd twice, then locking D_cd;
step S53: unlocking the style discriminator D_sd, updating its parameters twice, then locking D_sd;
step S54: unlocking the student network S, updating its parameters once, then locking S;
step S55: repeating steps S52, S53, and S54 until the loss functions of the condition discriminator D_cd and the style discriminator D_sd converge near 0.5, then stopping training and saving the trained student network S, the condition discriminator D_cd, and the style discriminator D_sd;
step S6: obtaining the stylized photo: selecting any content image and any style photo, inputting them into the student network S obtained in step S5, and obtaining the stylized photo.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010239903.0A CN111429342B (en) | 2020-03-31 | 2020-03-31 | Photo style migration method based on style corpus constraint |
Publications (2)
Publication Number | Publication Date
---|---
CN111429342A (en) | 2020-07-17
CN111429342B (en) | 2024-01-05
Family
ID=71550668
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344771B (en) * | 2021-05-20 | 2023-07-25 | 武汉大学 | Multifunctional image style migration method based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109523460A (en) * | 2018-10-29 | 2019-03-26 | 北京达佳互联信息技术有限公司 | Moving method, moving apparatus and the computer readable storage medium of image style |
CN110175951A (en) * | 2019-05-16 | 2019-08-27 | 西安电子科技大学 | Video Style Transfer method based on time domain consistency constraint |
CN110458750A (en) * | 2019-05-31 | 2019-11-15 | 北京理工大学 | A kind of unsupervised image Style Transfer method based on paired-associate learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10565757B2 (en) * | 2017-06-09 | 2020-02-18 | Adobe Inc. | Multimodal style-transfer network for applying style features from multi-resolution style exemplars to input images |
US10872399B2 (en) * | 2018-02-02 | 2020-12-22 | Nvidia Corporation | Photorealistic image stylization using a neural network model |
Non-Patent Citations (1)
Title |
---|
Design and Analysis of an Image Style Transfer Algorithm Based on VGG-19; Zhang Yue; Liu Caiyun; Xiong Jie; Information Technology and Informatization (Issue 01); full text *
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |