CN112529771B

CN112529771B - Portrait style migration method

Info

Publication number: CN112529771B
Application number: CN202011427405.5A
Authority: CN
Inventors: 张娟; 续兆攀; 周明全
Original assignee: Shaanxi Normal University
Current assignee: Shaanxi Normal University
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2024-05-31
Anticipated expiration: 2040-12-07
Also published as: CN112529771A

Abstract

The invention discloses a portrait style migration method, which comprises the following steps: 1. acquiring a content image; 2. and (3) establishing a style migration total loss function: obtaining a total loss function through content loss, style loss, total variation regularization loss and a regular penalty term; 3. iterative optimization of the generated image: and the computer adopts a gradient descent method, and performs iterative optimization on the initial generated image by utilizing a total loss function to obtain a style migration image. The method has simple steps and reasonable design, and adopts the content loss, the style loss, the total variation regularization loss and the regular penalty term as the total loss function, thereby adopting the gradient descent method to carry out iterative optimization so as to lead the final style migration image to be similar to the content image and the style image, and improving the quality and the effect of the portrait style migration.

Description

Portrait style migration method

Technical Field

The invention belongs to the technical field of portrait style migration, and particularly relates to a portrait style migration method.

Background

Image style migration refers to a technique of learning the style of a famous picture using an algorithm and then applying this style to another picture. Style migration techniques are widely used in the industrial fields of image processing, game making, movie effect rendering, and the like. Along with the rapid progress of artificial intelligence research, the method is inspired by a convolutional neural network, and a common image is rendered into an image with an artistic style by utilizing deep learning, so that the method has excellent effect of synthesizing the stylized image, and the defects of low modeling efficiency and poor stylized effect of the traditional method are overcome without independently modeling each style, thereby bringing about extensive attention in academia and industry, and then a great deal of research and application results are generated. The basic idea of the style migration technology is to utilize a deep neural network to extract the style textures and semantic content information of an image respectively, and then combine the style textures and semantic content information into a picture. The method has the texture of the artistic style image and the content of the common image, and realizes stylized rendering of the image. The deep learning style migration algorithm network may not need to be modeled separately for a particular style type by rendering the image to any artistic style. However, the style migration algorithm based on deep learning has the following main problems: iterative optimization is performed only by content loss and style loss, resulting in a very poor effect in many cases, especially for style migration of characters.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a portrait style migration method which has simple steps and reasonable design, and iterative optimization is carried out by taking content loss, style loss, total variation regularization loss and regular penalty term as a total loss function, so that the final style migration image is similar to the content image and the style image, and the quality and effect of portrait style migration are improved.

In order to solve the technical problems, the invention adopts the following technical scheme: a portrait style migration method is characterized in that:

step one, acquiring a content image:

Step 101, shooting a person under a solid background by adopting a camera to obtain a person image; wherein the size of the character image is A multiplied by C pixel points, A is larger than C, A and C are positive integers, wherein A represents rows and C represents columns;

102, separating the background and the foreground of the character image by a computer to obtain an optimized binarized character image; wherein the optimized binarized person image is a person image;

Step 103, selecting a background image from a background image library of the computer; the size of the background image is A multiplied by C pixel points;

104, the computer synthesizes the character region in the character image with the background image according to the optimized binarized character image to obtain a synthesized character image, and takes the synthesized character image as a content image;

Step two, establishing a style migration total loss function:

Step 201, randomly generating a white noise image as an initial generated image;

Step 202, selecting a convolutional neural network VGG19 as an original model; the convolutional neural network VGG19 comprises 16 convolutional layers and 5 pooling layers, wherein the 16 convolutional layers are Relu1_1 convolutional layers and Relu1_2 convolutional layers respectively; relu2_1 convolutional layer, relu2_2 convolutional layer; relu3_1 convolutional layer, relu3_2 convolutional layer, relu3_3 convolutional layer, relu3_4 convolutional layer; relu4_1 convolutional layer, relu4_2 convolutional layer, relu4_3 convolutional layer, relu4_4 convolutional layer; relu5_1 convolutional layer, relu5_2 convolutional layer, relu5 _5_3 convolutional layer, relu5 _5_4 convolutional layer;

step 203, the computer acquires content loss, and the specific process is as follows:

Step 2031, inputting the content image and the initial generated image into an original model, and setting that the content image outputs a 4_1 th layer of content feature map through a Relu4_1 convolution layer, and the content image outputs a 5_1 th layer of content feature map through a Relu5_1 convolution layer;

Setting an initial generated image to output a 4_1 layer generated feature map through a Relu4_1 convolution layer, and outputting a 5_1 layer generated feature map through a Relu5_1 convolution layer; the number of the 4_1 th layer content feature map, the 5_1 th layer content feature map, the 4_1 th layer generation feature map and the 5_1 th layer generation feature map is N, and n=512;

Step 2032, the computer is according to the formula Obtaining a content loss L _n,4 between the nth 4_1 layer content feature map and the nth 4_1 layer generation feature map; wherein/>Representing the eigenvalue at (i, j) in the nth 4_1 layer intra-layer eigenvector graph,/>The characteristic values at (I, J) in the 4_1 layer generated characteristic diagram of the nth layer are represented, I, J and N are positive integers, I is not less than 1 and not more than I, J is not less than 1 and not more than J, I represents the total column number of the 4_1 layer generated characteristic diagram or the 4_1 layer generated characteristic diagram, J represents the total line number of the 4_1 layer generated characteristic diagram or the 4_1 layer generated characteristic diagram, and N is not less than 1;

The computer is according to the formula Obtaining a content loss L _n,5 between the nth 5_1 layer content feature map and the nth 5_1 layer generation feature map; wherein/>Representing the eigenvalues at (i ', j') in the nth 5_1 layer intra-layer eigenvector graph,/>The characteristic values at (I ', J') in the nth 5_1 layer generated characteristic diagram are all positive integers, I 'is equal to or less than 1 and less than or equal to 1', J 'is equal to or less than 1 and less than or equal to J', I 'represents the total column number of the 5_1 layer generated characteristic diagram or the 5_1 layer generated characteristic diagram, and J' represents the total line number of the 5_1 layer generated characteristic diagram or the 5_1 layer generated characteristic diagram;

step 2033, the computer is according to the formula Obtaining content loss L _c;

Step 204, obtaining style loss by a computer, wherein the specific process is as follows:

Step 2041, the computer overlays the first mask layer on the style image to generate a first mask style image; the computer overlays the second mask layer on the style image to generate a second mask style image; wherein the transparency of the character area in the first mask layer is set to 100%, and the transparency of the background area in the first mask layer is set to 0; the transparency of the human object area in the second mask layer is set to be 0, the transparency of the background area in the second mask layer is set to be 100%, the background areas in the first mask layer and the second mask layer are both white, and the human object areas in the first mask layer and the second mask layer are both black;

Step 2042, the computer overlays the first mask layer on the initial generated image to generate a first mask generated image, and the computer overlays the second mask layer on the initial generated image to generate a second mask generated image;

Step 2043, the computer inputs the first mask-style image, the second mask-style image, the first mask-generated image, and the second mask-generated image into the original model, and the method for obtaining the style loss of the feature map output by the Relu1_1 convolution layer, relu2_1 convolution layer, relu3_1 convolution layer, relu4_1 convolution layer, and Relu5_1 convolution layer is the same, wherein the method for obtaining the style loss of the feature map output by the Relu c _1 convolution layer comprises the following specific steps:

Step 20431, setting a first mask style image, a second mask style image, a first mask generated image and a second mask generated image, and outputting a c_1 layer first mask style feature map, a c_1 layer second mask style feature map, a c_1 layer first mask generated feature map and a c_1 layer second mask generated feature map through Relu c _1 convolution layers respectively; the number of the c_1 layer first mask style feature map, the c_1 layer second mask style feature map, the c_1 layer first mask generation feature map and the c_1 layer second mask generation feature map is N _c; wherein c is a positive integer, and c is more than or equal to 1 and less than or equal to 5;

Step 20432, obtaining the gram matrix of the N _c c_1 th layer first mask style feature map by computer And/>The element value of the a _c th row b _c th column is recorded as/>And/>Representing the cross-correlation degree between an a _c th c_1 th layer first mask style feature map and a b _c th c_1 th layer first mask style feature map in N _c c_1 th layer first mask style feature maps, wherein a _c and b _c are positive integers, and a _c≤N_c,1≤b_c≤N_c is not less than 1;

Obtaining N _c (c_1) th layer of a gram matrix of the second mask style characteristic diagram And/>The element value of the a _c th row b _c th column is recorded as/>And/>Representing the degree of cross-correlation between the a _c th c_1 th layer second mask style feature map and the b _c th c_1 th layer second mask style feature map in the N _c c_1 th layer second mask style feature maps;

Obtaining N _c c-1 th layer first mask to generate a gram matrix of the feature map And/>The element value of the a _c th row b _c th column is recorded as/>And/>Representing the degree of cross-correlation between the a _c th c_1 th layer first mask generation feature map and the b _c th c_1 th layer first mask generation feature map in the N _c c_1 th layer first mask generation feature maps;

obtaining N _c c-1 th layer second masks to generate a gram matrix of the feature map And/>The element value of the a _c th row b _c th column is recorded as/>And/>Representing the degree of cross-correlation between the a _c th c_1 th layer second mask-generation feature map and the b _c th c_1 th layer second mask-generation feature map in the N _c c_1 th layer second mask-generation feature maps;

Step 20433, the computer is based on the formula Obtaining style loss L _s,c-1 of the c_1 layer generated feature map;

step 20434, the computer is based on the formula Obtaining style loss L _s;

Step 205, the computer is according to the formula Obtaining total variation regularization loss L _tv; where R _i″,j″、G_i″,j″ and B _i″,j″ represent the R, G, and B components at the (i ', j ') pixel coordinates in the initial generated image, R _i″,j″+1、G_i″,j″+1 and B _i″,j″+1 represent the R, G, and B components at the (i ', j ' +1) pixel coordinates in the initial generated image, R _i″+1,j″、G_i″+1,j″ and B _i″+1,j″ represent R, G and B components at the (i ' +1, j ') pixel coordinates in the initial generated image, i ', j ' are positive integers, and 1.ltoreq.i '.

Step 206, the computer is used for preparing the following formulaObtaining a regular penalty term L _m; wherein e is a positive integer, and e takes values of 1,2 and 3, when e=1, the R channel of the initial generated image is represented, and V ₁ (o) represents vectorization of the R component matrix of the initial generated image through PCA dimension reduction; when e=2, representing the G channel of the initial generated image, V ₂ (o) represents the vectorization of the G component matrix of the initial generated image through PCA dimension reduction; when e=3, representing the B channel of the initial generated image, V ₃ (o) represents the vectorization of the B component matrix of the initial generated image by PCA dimension reduction, and the sizes of V ₁(o)、V₂ (o) and V ₃ (o) are both c×1; m _I denotes a laplacian matrix of the content image, and the size of M _I is c×c;

Step 207, the computer obtains a total loss function L _total according to a formula L _total＝αL_c+ρL_s+γL_tv+L_m; wherein α represents the weight coefficient of the content loss, ρ represents the weight coefficient of the style loss, and γ represents the weight of the total variation regularization loss;

Step three, generating iterative optimization of the image:

Step 301, a computer adopts a gradient descent algorithm, and the initial generated image is subjected to iterative optimization by utilizing a total loss function;

and 302, repeating the iterative optimization of the step 301 until the iterative optimization preset times are met, and obtaining the style migration image.

The portrait style migration method is characterized by comprising the following steps of: in step 102, the computer separates the background from the foreground of the character image to obtain an optimized binarized character image, and the specific process is as follows:

Step 1021, a computer invokes a binarization module to perform binarization processing on the figure image to obtain a binarized figure image; wherein the background of the binarized character image is white, and the character area in the binarized character image is black;

Step 1022, the computer invokes the corrosion module to perform corrosion treatment on the binarized figure image to obtain a corroded binarized figure image;

Step 1023, the computer invokes an expansion module to expand the corroded binarized character image to obtain an optimized binarized character image.

The portrait style migration method is characterized by comprising the following steps of: in step 104, the computer synthesizes the character region in the character image with the background image according to the optimized binarized character image to obtain a synthesized character image, and the specific process is as follows:

step 1041, a computer invoking CNNY edge extraction module performs edge extraction on the optimized binarized character image to obtain a character region outline;

Step 1042, the computer marks the region surrounded by the outline of the character region as the character region, and obtains the pixel coordinates and RGB three-component values of each pixel point of the corresponding character region in the character image; wherein, in order from left to right and top to bottom, the pixel sitting of the a-th pixel point in the character region is marked as (u _a,v_a), the Red component at the pixel coordinate (u _a,v_a) of the a-th pixel point is marked as Red _a, the Blue component at the pixel coordinate (u _a,v_a) of the a-th pixel point is marked as Blue _a, the Green component at the pixel coordinate (u _a,v_a) of the a-th pixel point is marked as Green _a,u_a to represent the column coordinate of the a-th pixel point, v _a to represent the row coordinate of the a-th pixel point, and a is a positive integer;

In step 1043, the computer replaces the Red component, the blue component and the Green component at the pixel coordinate (u _a,v_a) of the a-th pixel point in the background image with Red _a、Blue_a and Green _a respectively until the replacement of all the pixel points in the human region is completed, so as to obtain the synthetic human image.

The portrait style migration method is characterized by comprising the following steps of: in step 201, the size of the initial generated image is the same as the size of the content image, the initial generated image is an RGB color image, and white noise on the initial generated image is subject to normal distribution.

The portrait style migration method is characterized by comprising the following steps of: in step 202, the activation functions of 16 convolution layers are all ReLU activation functions, the kernel sizes of 16 convolution layers are all 3×3, the step sizes of 16 convolution layers are all 1, the kernel sizes of 5 pooling layers are all 2×2, and the step sizes of 5 pooling layers are all 2.

The portrait style migration method is characterized by comprising the following steps of: in step 207, the value of the weighting coefficient alpha of the content loss is 0-2, the value of the weighting coefficient rho of the style loss is 0-1000, and the value of the weight gamma of the total variation regularization loss is 100-110;

in the step 301, the learning rate alpha 'in the gradient descent algorithm is 0 < alpha' < 1;

In step 302, the preset iterative optimization times are 2000-2100.

Compared with the prior art, the invention has the following advantages:

1. the portrait style migration method is simple in steps and reasonable in design, and improves the quality and effect of portrait style migration.

2. The portrait style migration method has good use effect, firstly, a personage image and a background image are obtained, and the personage image and the background image are synthesized to obtain a content image; and secondly, establishing a total loss function of style migration, and finally, performing iterative optimization on the initial generated image by using the total loss function by adopting a gradient descent algorithm to obtain a final style migration image which is similar to the content image and the style image, thereby improving the quality and effect of portrait style migration.

3. According to the method, the content loss, the style loss, the total variation regularization loss and the regular penalty term are taken into consideration in the establishment of the style migration total loss function, so that the subsequent iterative optimization is conveniently carried out by using the total loss function, constraint is added to the migration process of the content image through the regular penalty term, noise in the generated image can be effectively removed through the total variation regularization loss, and meanwhile, the contour and texture details of the generated image can be kept, so that the generated image is smooth, and the quality and effect of portrait style migration are improved.

In summary, the method has simple steps and reasonable design, and the content loss, the style loss, the total variation regularization loss and the regular penalty term are used as the total loss function, so that the iterative optimization is performed by adopting the gradient descent algorithm, the final style migration image is similar to the content image and the style image, and the quality and the effect of the portrait style migration are improved.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

FIG. 1 is a flow chart of a portrait style migration method according to the present invention.

Detailed Description

A portrait style migration method as shown in fig. 1, the method comprising the steps of:

step one, acquiring a content image:

Step two, establishing a style migration total loss function:

step 20434, the computer is based on the formula Obtaining style loss L _s;

Step three, generating iterative optimization of the image:

In this embodiment, in step 102, the computer performs background and foreground separation on the character image to obtain an optimized binarized character image, and the specific process is as follows:

In this embodiment, in step 104, the computer synthesizes the character region in the character image with the background image according to the optimized binarized character image to obtain a synthesized character image, which specifically includes the following steps:

In this embodiment, the size of the initial generated image and the size of the content image in step 201 are the same, the initial generated image is an RGB color image, and white noise on the initial generated image is subject to normal distribution.

In this embodiment, the activation functions of 16 convolution layers in step 202 are all ReLU activation functions, the kernel sizes of 16 convolution layers are all 3×3, the step sizes of 16 convolution layers are all 1, the kernel sizes of 5 pooling layers are all 2×2, and the step sizes of 5 pooling layers are all 2.

In this embodiment, in step 207, the value of the weighting coefficient α of the content loss is 0-2, the value of the weighting coefficient ρ of the style loss is 0-1000, and the value of the weight γ of the total variation regularization loss is 100-110;

In step 302, the preset iterative optimization times are 2000-2100.

In this embodiment, in actual use, the optimized binarized character image is used as the first mask layer and the second mask layer.

In the embodiment, the total variation regularization loss is obtained, so that noise in the generated image can be effectively removed, and the contour and texture details of the generated image can be reserved, so that the generated image is smooth, and the quality and effect of portrait style migration are improved.

In this embodiment, the size of the personal image is 1200×840, and the size of the background image is 1200×840. The size of the style image is 1200×840, and a=1200, c=840.

In this embodiment, the computer inputs the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image into the original model, and sets the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image to output a 1_1 layer first mask style feature map, a 1_1 layer second mask style feature map, a 1_1 layer first mask generated feature map, and a 1_1 layer second mask generated feature map through Relu1_1 convolution layers, respectively; the number of the first mask style feature map of the 1_1 layer, the second mask style feature map of the 1_1 layer, the first mask generation feature map of the 1_1 layer and the second mask generation feature map of the 1_1 layer is N ₁, and N ₁ =64.

In this embodiment, the computer inputs the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image into the original model, and sets the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image to output a layer 2_1 first mask style feature map, a layer 2_1 second mask style feature map, a layer 2_1 first mask generated feature map, and a layer 2_1 second mask generated feature map through Relu2_1 convolution layers, respectively; the numbers of the first mask style feature map of the layer 2_1, the second mask style feature map of the layer 2_1, the first mask generation feature map of the layer 2_1 and the second mask generation feature map of the layer 2_1 are N ₂, and N ₂ =128.

In this embodiment, the computer inputs the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image into the original model, and sets the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image to output a layer 3_1 first mask style feature map, a layer 3_1 second mask style feature map, a layer 3_1 first mask generated feature map, and a layer 3_1 second mask generated feature map through Relu3_1 convolution layers, respectively; and the number of the first mask style feature map of the 3_1 layer, the second mask style feature map of the 3_1 layer, the first mask generation feature map of the 3_1 layer and the second mask generation feature map of the 3_1 layer is N ₃, and N ₃ =256.

In this embodiment, the computer inputs the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image into the original model, and sets the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image to output a 4_1 layer first mask style feature map, a 4_1 layer second mask style feature map, a 4_1 layer first mask generated feature map, and a 4_1 layer second mask generated feature map through the Relu4_1 convolution layer, respectively; and the number of the first mask style feature map of the 4_1 layer, the second mask style feature map of the 4_1 layer, the first mask generation feature map of the 4_1 layer and the second mask generation feature map of the 4_1 layer is N ₄, and N ₄ =512.

In this embodiment, the computer inputs the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image into the original model, and sets the first mask style image, the second mask style image, the first mask generated image, and the second mask generated image to output a 5_1 layer first mask style feature map, a 5_1 layer second mask style feature map, a 5_1 layer first mask generated feature map, and a 5_1 layer second mask generated feature map through Relu5_1 convolution layers, respectively; and the numbers of the 5_1 layer first mask style feature map, the 5_1 layer second mask style feature map, the 5_1 layer first mask generation feature map and the 5_1 layer second mask generation feature map are all N ₅, and N ₅ =512.

In this embodiment, the learning rate α' in the gradient descent algorithm has a value of 0.1.

In this embodiment, α and ρ are set to balance content loss and style loss, where the more semantic content information is retained by α, the more generated stylized image semantic profiles are clearer, and the more texture style information is retained by ρ, the greater the stylized image stylized degree is greater. Therefore, the generated image has the content of the content image and the style of the style image, and the migration of the style of the image is realized.

In this embodiment, the value of γ is set to 100-110, so that noise in the generated image can be effectively removed while the contour and texture details of the image can be maintained.

The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent structural changes made to the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims

1. A portrait style migration method, characterized in that the method comprises the following steps:

step one, acquiring a content image:

Step two, establishing a style migration total loss function:

Step 20432, obtaining the gram matrix of the N _c c_1 th layer first mask style feature map by computer And is also provided withThe element value of the a _c th row b _c th column is recorded as/>And/>Representing the cross-correlation degree between an a _c th c_1 th layer first mask style feature map and a b _c th c_1 th layer first mask style feature map in N _c c_1 th layer first mask style feature maps, wherein a _c and b _c are positive integers, and a _c≤N_c,1≤b_c≤N_c is not less than 1;

step 20434, the computer is based on the formula Obtaining style loss L _s;

Step 206, the computer is used for preparing the following formulaObtaining a regular penalty term L _m; wherein e is a positive integer, and e takes values of 1,2 and 3, when e=1, the R channel of the initial generated image is represented, and V ₁ (o) represents vectorization of the R component matrix of the initial generated image through PCA dimension reduction; when e=2, representing the G channel of the initial generated image, V ₂ (o) represents the vectorization of the G component matrix of the initial generated image through PCA dimension reduction; when e=3, representing the B channel of the initial generated image, V ₃ (o) represents the vectorization of the B component matrix of the initial generated image through PCA dimension reduction, and M _I represents the laplace matrix of the content image;

Step three, generating iterative optimization of the image:

2. A portrait style migration method as claimed in claim 1, wherein: in step 102, the computer separates the background from the foreground of the character image to obtain an optimized binarized character image, and the specific process is as follows:

3. A portrait style migration method as claimed in claim 1, wherein: in step 104, the computer synthesizes the character region in the character image with the background image according to the optimized binarized character image to obtain a synthesized character image, and the specific process is as follows:

4. A portrait style migration method as claimed in claim 1, wherein: in step 201, the size of the initial generated image is the same as the size of the content image, the initial generated image is an RGB color image, and white noise on the initial generated image is subject to normal distribution.

5. A portrait style migration method as claimed in claim 1, wherein: in step 202, the activation functions of 16 convolution layers are all ReLU activation functions, the kernel sizes of 16 convolution layers are all 3×3, the step sizes of 16 convolution layers are all 1, the kernel sizes of 5 pooling layers are all 2×2, and the step sizes of 5 pooling layers are all 2.

6. A portrait style migration method as claimed in claim 1, wherein: in step 207, the value of the weighting coefficient alpha of the content loss is 0-2, the value of the weighting coefficient rho of the style loss is 0-1000, and the value of the weight gamma of the total variation regularization loss is 100-110;

In step 302, the preset iterative optimization times are 2000-2100.