CN108805803B - Portrait style migration method based on semantic segmentation and deep convolution neural network - Google Patents

Info

Publication number
CN108805803B
Authority
CN
China
Prior art keywords
image
portrait
style
content
semantic
Prior art date
Legal status
Active
Application number
CN201810606345.XA
Other languages
Chinese (zh)
Other versions
CN108805803A (en)
Inventor
赵辉煌
郑金华
孙雅琪
Current Assignee
Hengyang Normal University
Original Assignee
Hengyang Normal University
Priority date
Filing date
Publication date
Application filed by Hengyang Normal University
Priority to CN201810606345.XA
Publication of CN108805803A
Application granted
Publication of CN108805803B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a portrait style migration method based on semantic segmentation and a deep convolutional neural network. First, a portrait to be converted and a target style portrait are selected, and both images are semantically segmented into a portrait region and a background region; the facial features are then further segmented out of the portrait region. A portrait style migration loss function is defined, the deep convolutional neural network VGG-19 is adopted as the base model for extracting high-level style features of images, content constraint layers and style constraint layers are defined inside the VGG-19 model, and a new model structure is established. The segmented semantic images and the original images are input into the new VGG-19 model, the high-level style features and content features of the images are extracted, and, using the portrait style migration loss function and the gradient descent method, the loss function is minimized over repeated iterations to finally generate the style migration result image.

Description

Portrait style migration method based on semantic segmentation and deep convolution neural network
Technical Field
The invention relates to the field of deep learning, and in particular to a portrait style migration method based on semantic segmentation and a deep convolutional neural network.
Background
With the rapid development of science and technology in the field of deep learning research, the process of fusing the semantic content of a picture with different styles using a CNN has come to be called neural style transfer (Neural Style Transfer). The CVPR oral paper "Image Style Transfer Using Convolutional Neural Networks" by Gatys et al. demonstrated the surprising ability of convolutional neural networks (CNNs) in image style migration: by separating and recombining picture content and style, a CNN can create works of artistic charm. Since then, neural style migration has attracted great interest in academic research and industrial applications; transferring the artistic style of artworks onto everyday photos has become a computer vision task that receives great attention in both academia and industry, and has produced many remarkable applications in portrait style migration. Meanwhile, the Torr Vision Group at Oxford University presented a model at ICCV 2015 (Conditional Random Fields as Recurrent Neural Networks); after training, the CRF-as-RNN model can segment target contents in an image.
The existing style migration methods mainly have the following problem: image style migration is highly random, so in many cases the effect is not ideal. In particular, in portrait style migration errors sometimes occur, for example the eye features of the style image are migrated onto the mouth, or the image background features are migrated onto the portrait, making the migration effect very unsatisfactory.
Disclosure of Invention
The invention provides a portrait style migration method based on semantic segmentation and a deep convolutional neural network, aiming at realizing targeted style migration of portraits and improving the portrait style migration effect.
To achieve this technical purpose, the technical scheme of the invention is as follows:
a portrait style migration method based on semantic segmentation and a deep convolutional neural network comprises the following steps:
step 1, selecting a content portrait image that needs style transfer and a style portrait image serving as the style source, and performing semantic segmentation on the content image and the style image respectively to segment the portrait region and the background region, i.e., forming the semantic images of the content image and the style image;
step 2, adopting the deep convolutional neural network VGG-19 as the original model for high-level image feature extraction, with relu5_1 as the content constraint feature extraction layer and relu3_1 and relu4_1 as the style constraint feature extraction layers;
step 3, establishing new feature maps for the content constraint feature extraction layer and the style constraint feature extraction layer respectively;
step 4, randomly generating a Gaussian noise image as the new initialization image;
step 5, adjusting the size of the new initialization image according to the size of the content portrait image;
step 6, inputting the style portrait image, the semantic image of the content image and the semantic image of the style image into the convolutional neural network VGG-19, and then calculating the style constraint layer loss function of the content portrait semantic image and the style portrait semantic image on the style constraint layers relu3_1 and relu4_1 using a Markov random field;
step 7, inputting the new initialization image into the convolutional neural network VGG-19, and calculating the content constraint loss function of the finally generated style image at the content constraint layer relu5_1 using a Markov random field model;
step 8, combining the results of step 6 and step 7 to obtain a total loss function, and generating the portrait style migration result by applying an optimization algorithm based on the gradient descent method to the different layers; that is, the gradient of the style migration portrait is computed iteratively by gradient descent, and the total loss function is decreased along the negative gradient direction, so that the style migration portrait generated at each iteration is as similar as possible to the original content portrait and the style portrait;
step 9, repeating steps 6-8 for 100 iterations and steps 5-8 for 3 iterations, and outputting the final portrait style migration image (the overall control flow is sketched below).
In the method, in step 1, semantic segmentation is first performed on the content image and the style image to segment the semantic images of the portrait region and the background region; the portrait region is then further semantically segmented into 5 regions (face, nose, eyes, mouth and body) as 5 semantic images, so that 6 semantic images (background, face, nose, eyes, mouth and body) are finally obtained.
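As an illustration of this six-way split, the following sketch turns a per-pixel label map (such as one produced by a segmentation model; the integer label codes here are hypothetical) into the six binary semantic images:

```python
import numpy as np

# Hypothetical label codes for the six regions of step 1.
LABELS = {"background": 0, "face": 1, "nose": 2, "eyes": 3, "mouth": 4, "body": 5}

def semantic_images(label_map: np.ndarray) -> dict:
    """label_map: (H, W) integer array -> six binary masks m^k, k = 1..6."""
    return {name: (label_map == code).astype(np.float32)
            for name, code in LABELS.items()}
```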
In the method, in step 3, the new feature map of the content constraint feature extraction layer is

$$\tilde{f}_c^{\,l} = f_c^{\,l} \oplus \beta_c\, m_c^{\,k}, \qquad k = 1,2,3,4,5,6,$$

where $l$ denotes the content constraint feature extraction layer in VGG-19, i.e. relu5_1, $f_c^{\,l}$ is the feature map generated by the content portrait image at the content constraint layer of the VGG-19 network model, $\beta_c$ is the semantic content portrait weight adjustment parameter with value range [0,200], $m_c^{\,k}$ denotes the semantic images of the content portrait, and $\oplus$ denotes attaching the weighted semantic images to the feature map. The new feature map of the style constraint feature extraction layer is

$$\tilde{f}_s^{\,l} = f_s^{\,l} \oplus \beta_s\, m_s^{\,k}, \qquad k = 1,2,3,4,5,6,$$

where $l$ denotes the style constraint feature extraction layers in VGG-19, i.e. relu3_1 and relu4_1, $f_s^{\,l}$ is the feature map generated by the style portrait image at the style constraint layer of the VGG-19 network model, $\beta_s$ is the semantic style portrait weight adjustment parameter with value range [0,200], and $m_s^{\,k}$ denotes the semantic images of the style portrait.
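A minimal sketch of this feature-map update, assuming (as in the reconstruction above) that the β-weighted semantic images are attached to the VGG-19 activations as extra channels:

```python
import torch
import torch.nn.functional as F

def augment_features(feat: torch.Tensor, masks: list, beta: float) -> torch.Tensor:
    """feat: (1, C, H, W) VGG-19 activations; masks: six (h, w) tensors m^k.
    Returns the 'new feature map' with the weighted masks as extra channels."""
    H, W = feat.shape[-2:]
    sem = torch.stack([
        F.interpolate(m[None, None], size=(H, W), mode="nearest")[0, 0]
        for m in masks
    ])                                                  # (6, H, W), layer resolution
    return torch.cat([feat, beta * sem[None]], dim=1)   # (1, C + 6, H, W)
```

The same helper serves both branches, with βc for the content constraint layer and βs for the style constraint layers.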
In the method, in step 5, the size of the new initialization image is set to

$$(\hat{w}, \hat{h}) = \left( \frac{w_c}{2^{\,L-1}},\; \frac{h_c}{2^{\,L-1}} \right),$$

where $w_c$, $h_c$ are respectively the length and width of the content portrait image and L is a parameter adjusting the image size, taking the values 3, 2 and 1 in the successive iterations.
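Under this reading of the size rule (the halving schedule is the assumption stated above), the three working resolutions can be computed as:

```python
def level_size(w_c: int, h_c: int, L: int) -> tuple:
    """Assumed schedule: L = 3, 2, 1 -> quarter, half, full resolution."""
    return w_c // 2 ** (L - 1), h_c // 2 ** (L - 1)

# For a 600 x 800 content portrait:
# [level_size(600, 800, L) for L in (3, 2, 1)] == [(150, 200), (300, 400), (600, 800)]
```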
In the method, in step 6, the style constraint layer loss function is:

$$E_s\big(\Phi(x), \Phi(x_s), m_c, m_s\big) = \sum_{i=1}^{p_1} \left\| \Psi_i^{*}(\Phi(x)) - \Psi_{NN(i)}^{*}(\Phi(x_s)) \right\|^2,$$

where $\Psi_i^{*}(\Phi(x)) = \Psi_i\big(\Phi(x) \oplus \beta_c m_c\big)$ and $\Psi_j^{*}(\Phi(x_s)) = \Psi_j\big(\Phi(x_s) \oplus \beta_s m_s\big)$ are the local patches of the semantically augmented feature maps. Here $\Phi(x)$ is a feature map, $i$ denotes the $i$-th and $j$ the $j$-th local block; $\Phi(x)$ and $m_c$ are divided into local blocks of size $r \times r$, i.e. local patches, the sets of local patches being written $\Psi(\Phi(x))$ and $\Psi(m_c)$; the division of $\Phi(x)$ generates $p_1$ local patches and the division of $m_c$ generates $p_2$ local patches. $x_c \in \mathbb{R}^{w_c \times h_c \times 3}$ is the content portrait image and $x_s \in \mathbb{R}^{w_s \times h_s \times 3}$ the style portrait image, where $\mathbb{R}$ denotes the set of real numbers, $w_c, h_c$ are respectively the length and width of the content portrait image and $w_s, h_s$ those of the style portrait image; $m_c$ denotes the semantic image of the content portrait and $m_s$ that of the style portrait. $\Psi_i^{*}(\Phi(x))$ denotes the $i$-th local patch, and $\Psi_{NN(i)}^{*}(\Phi(x_s))$ denotes the local patch of $\Psi^{*}(\Phi(x_s))$ that best matches $\Psi_i^{*}(\Phi(x))$; $k$ denotes the number of semantic images;
wherein the local patch selection rule is defined as

$$NN(i) = \underset{j=1,\dots,p_2}{\arg\max} \; \frac{\Psi_i^{*}(\Phi(x)) \cdot \Psi_j^{*}(\Phi(x_s))}{\left|\Psi_i^{*}(\Phi(x))\right| \cdot \left|\Psi_j^{*}(\Phi(x_s))\right|}.$$
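A sketch of this style term at one constraint layer, under the reconstruction above: patches are taken from the semantically augmented feature maps, each patch of the generated image is matched to its nearest style patch by normalized cross-correlation, and the squared distance to the match is penalized. The helper names are illustrative, not library APIs.

```python
import torch
import torch.nn.functional as F

def patches(feat: torch.Tensor, r: int = 3) -> torch.Tensor:
    """(1, C, H, W) -> (p, C*r*r): stride-1 r x r local patches."""
    return F.unfold(feat, kernel_size=r).squeeze(0).t()

def style_layer_loss(feat_x: torch.Tensor, feat_s: torch.Tensor, r: int = 3):
    """feat_x, feat_s: augmented feature maps of x and x_s at one style layer."""
    px, ps = patches(feat_x, r), patches(feat_s, r)     # (p1, D), (p2, D)
    with torch.no_grad():                               # NN(i): match indices only
        # dense (p1, p2) similarity matrix; a chunked match would be used at scale
        nn = (F.normalize(px, dim=1) @ F.normalize(ps, dim=1).t()).argmax(dim=1)
    return ((px - ps[nn]) ** 2).sum()                   # E_s at this layer
```

Freezing the matches inside `torch.no_grad()` treats NN(i) as a constant during back-propagation, which is the usual practice for MRF-style losses.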
In the method, in step 7, the content constraint loss function is

$$E_c\big(\Phi(x), \Phi(x_c)\big) = \left\| \Phi(x) - \Phi(x_c) \right\|^2.$$
In the method, in step 8, the total loss function is

$$E(x) = \alpha_1 E_s\big(\Phi(x), \Phi(x_s), m_c, m_s\big) + \alpha_2 E_c\big(\Phi(x), \Phi(x_c)\big),$$

where $\alpha_1$ and $\alpha_2$ are adjustment parameters controlling how strongly the original content image and the style image appear in the generated image, with $\alpha_1 \in [0,1]$ and $\alpha_2 \in [0,200]$.
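The content and total losses are direct to express; a minimal sketch (the parameter values are those of the embodiment below):

```python
import torch

def content_loss(feat_x: torch.Tensor, feat_c: torch.Tensor) -> torch.Tensor:
    """E_c = ||Phi(x) - Phi(x_c)||^2 at the content constraint layer relu5_1."""
    return ((feat_x - feat_c) ** 2).sum()

def total_loss(e_s: torch.Tensor, e_c: torch.Tensor,
               alpha1: float = 0.001, alpha2: float = 20.0) -> torch.Tensor:
    """E(x) = alpha1 * E_s + alpha2 * E_c; the embodiment uses these values."""
    return alpha1 * e_s + alpha2 * e_c
```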
In the method, in step 8, the optimization algorithm based on the gradient descent method comprises the following steps:
(1) Initialization: set the iteration parameters i = 0 and j = m, define the matrix H and initialize it as a diagonal matrix with all elements 1, and set the allowable error $\varepsilon = 10^{-5}$; compute the initial gradient $g_1 = \nabla f(x_0)$, where $x_0$ is the Gaussian noise image randomly generated in step 4;
(2) If $i \ge itr$ or $\|\nabla f(x_{i+1})\| \le 10^{-5}$, output the i-th iteration result $x_{i+1}$ and end the optimization algorithm; otherwise go to step (3); itr is the maximum number of iterations;
(3) Define $p_i$ as the negative gradient direction of the i-th iteration: $p_i = -g_i$;
(4) Update the result of the i-th iteration: $x_{i+1} = x_i + p_i$;
(5) Define $s_i$ as the difference between the previous result $x_i$ and the result of this iteration, i.e. $s_i = x_{i+1} - x_i$; define $y_i$ as the difference between the gradient $\nabla f(x_i)$ of the previous result and the gradient $\nabla f(x_{i+1})$ of this iteration's result, i.e. $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$; and define $\rho_i = 1/(y_i^T s_i)$, where T denotes the matrix transpose;
(6) Update $H$: $H_{i+1} = (I - \rho_i s_i y_i^T)\, H_i\, (I - \rho_i y_i s_i^T) + \rho_i s_i s_i^T$;
(7) Define the variable q as the gradient of $x_i$: $q = \nabla f(x_i)$;
(8) Iterate from j = 1: take $a_j = \rho_{i-j}\, s_{i-j}^T\, q$ and update q as $q = q - a_j\, y_{i-j}$, until j = m, where m is the preset iteration number;
(9) Update $g_i$: $g_i = H_i q$;
(10) Iterate from j = 1: take $b = \rho_{i-j}\, y_{i-j}^T\, g_i$ and update $g_i$ as $g_i = g_i + s_{i-j}(a_j - b)$, until j = m;
(11) Update the iteration step: i = i + 1, and jump to step (2).
In the method, the optimization algorithm based on the gradient descent method further comprises a step, executed after step (5), of retaining only the results of the latest m iterations: if i > m, then $s_{i-m}, s_{i-m-1}, \dots, s_1$ and $y_{i-m}, y_{i-m-1}, \dots, y_1$ are deleted (see the sketch below).
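Steps (5)-(10) have the shape of a limited-memory BFGS two-loop recursion: the m most recent pairs $(s_i, y_i)$ reconstruct the action of the inverse Hessian on the current gradient without storing a full matrix. A sketch under that reading (NumPy; `s_hist` and `y_hist` are hypothetical names for the retained m pairs, oldest first):

```python
import numpy as np

def two_loop_direction(grad: np.ndarray, s_hist: list, y_hist: list) -> np.ndarray:
    """Search direction from the two-loop recursion of steps (7)-(10),
    with H_0 = I as in the initialization of step (1)."""
    rho = [1.0 / float(y @ s) for s, y in zip(s_hist, y_hist)]
    q = grad.copy()
    alphas = []
    for s, y, r in reversed(list(zip(s_hist, y_hist, rho))):  # first loop, newest first
        a = r * float(s @ q)
        alphas.append(a)
        q -= a * y
    g = q                                                     # apply H_0 = I
    for (s, y, r), a in zip(zip(s_hist, y_hist, rho), reversed(alphas)):
        b = r * float(y @ g)                                  # second loop, oldest first
        g += s * (a - b)
    return -g                                                 # p_i = -g_i
```

Deleting the pairs older than m, as in the memory-saving step above, is exactly what keeps this recursion limited-memory.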
The method establishes an image content model and an image style model based on high-level semantic representations in a convolutional neural network, and then optimizes an initial image (such as a random noise image) so that, within the same convolutional neural network, it has a content representation similar to that of the content portrait image and a style representation similar to that of the style portrait image, thereby generating an image that fuses the content of the content portrait image with the style of the style portrait image and realizing the style transfer function.
Compared with other style transfer algorithms, the differences and advantages of the invention are:
(1) The invention subdivides the feature map generated from the original portrait more finely, establishes a loss function over extracted sub-blocks (patches) of the feature map, and minimizes this loss function by the gradient descent method. The generated portrait therefore has better detail features and a more satisfactory effect, which is an essential difference from traditional methods.
(2) The method performs semantic segmentation on the original style portrait and content portrait to obtain several semantic images, converts these semantic portraits into feature maps, and attaches them to selected layers of the VGG network model, providing more features for the image style migration method to select from.
(3) The invention defines a new loss function that increases the constraint of the semantic images on the output result. This avoids certain errors in style transfer (such as eye features of the style portrait being transferred to the mouth, or image background features being transferred to the portrait) and improves the portrait style migration effect.
In conclusion, the invention achieves the technical effect of style transfer for any style portrait image that can be semantically segmented.
Drawings
FIG. 1 is a system flow diagram of the present invention;
FIG. 2 is a model architecture diagram of the present invention;
FIG. 3 is a content portrait image employed by embodiments of the present invention;
FIG. 4 is a stylistic portrait image employed by embodiments of the present invention;
FIG. 5 is the style migration result of the portrait style migration method of the present invention;
FIG. 6 is the style migration result of a conventional portrait style migration method.
Detailed Description
Referring to FIG. 1 and FIG. 2, which are respectively the system flowchart and the model architecture diagram of the present invention: this embodiment selects an artistic image as the style portrait $x_s$, shown in FIG. 4, and an image as the content portrait $x_c$, shown in FIG. 3, where $w_c, h_c$ are respectively the length and width of the content portrait image and $w_s, h_s$ are respectively the length and width of the style portrait image. Semantic segmentation is then performed on the style portrait and the content portrait using a semantics-based image segmentation algorithm:
Step 1, the CRF-as-RNN model developed by Oxford University is selected as the semantic segmentation model for the image portrait region, and semantic segmentation is performed on the content image and the style image respectively to segment the portrait region and the background region.
Step 2, the OpenFace face region segmentation algorithm is adopted to calibrate the face, nose, eyes, mouth and body regions of the portrait region; semantic segmentation then splits out the 5 regions of face, nose, eyes, mouth and body as 5 semantic images, so that 6 semantic images (background, face, nose, eyes, mouth and body) are finally obtained: the semantic images of the content portrait $m_c^{\,k}$ and the semantic images of the style portrait $m_s^{\,k}$, k = 1,2,3,4,5,6.
FIG. 3 is the target content image $x_c$ and FIG. 4 is the target portrait style image $x_s$; the goal is to generate the style migration result shown in FIG. 5.
Step 3, the deep convolutional neural network VGG-19, which achieved excellent results in the 2014 ImageNet image classification competition, is selected as the model for extracting high-level style features of images.
Step 4, the content constraint layer is set: with the target content image $x_c$ shown in FIG. 3 and the target style image $x_s$ shown in FIG. 4, relu5_1 is selected as the content constraint layer and relu3_1 and relu4_1 are selected as the style constraint layers; L is set to 3, 2 and 1, i.e., three levels of iteration are adopted, with the maximum number of iterations per level itr = 100;
Step 5, at the VGG-19 network content constraint layer relu5_1, the semantic images of the content portrait $m_c^{\,k}$ and the content portrait $x_c$ are read, and the feature maps in the VGG-19 network content constraint layer are updated:

$$\tilde{f}_c = f_c \oplus \beta_c\, m_c^{\,k}, \qquad k = 1,2,\dots,6,$$

where $\tilde{f}_c$ is the feature map of the new VGG-19 network at the content constraint layer and $f_c$ is the feature map generated by the content portrait $x_c$ at the content constraint layer; $\beta_c = 20$ is taken.
Step 6, a new input-output model is established at the content layer relu5_1, the gradient of the network model at the relu5_1 layer is recalculated, and the output of the network model at the relu5_1 layer is updated to obtain the new output at relu5_1.
Step 7, the style constraint layers are set: the target style image $x_s$ is input into the convolutional neural network VGG-19, and the feature maps of the style image are computed at the style constraint layers relu3_1 and relu4_1.
Step 8, at the VGG-19 network style constraint layers relu3_1 and relu4_1, the semantic images of the style portrait $m_s^{\,k}$ and the style portrait $x_s$ are read, and the feature maps in the VGG-19 network style constraint layers are updated:

$$\tilde{f}_s = f_s \oplus \beta_s\, m_s^{\,k}, \qquad k = 1,2,\dots,6,$$

where $\tilde{f}_s$ is the feature map of the new VGG-19 network at the style constraint layer and $f_s$ is the feature map generated by the style portrait $x_s$ at the style constraint layer; $\beta_s = 20$.
Step 9, new input-output models are established at the style layers relu3_1 and relu4_1, the gradients of the network model at the relu3_1 and relu4_1 layers are recalculated, and the outputs of the network model at relu3_1 and relu4_1 are updated to obtain the new outputs at these layers.
Step 10, a Gaussian noise image is randomly generated as the new initialization image $x_0$.
Step 11, using the result of the previous iteration, the image size is reset if necessary to

$$(\hat{w}, \hat{h}) = \left( \frac{w_c}{2^{\,L-1}},\; \frac{h_c}{2^{\,L-1}} \right).$$
Step 12, the target content portrait $x_c$ and its semantic image $m_c$ are input into the convolutional neural network VGG-19, and, using a Markov random field (MRF) model at the content constraint layer, the feature maps output by the network model are recorded as $\Phi(x_c)$, $m_c$.
Step 13, the target style image $x_s$ and its semantic image $m_s$ are input into the convolutional neural network VGG-19, and, using a Markov random field (MRF) model at the style constraint layers, the feature maps output by the network model are recorded as $\Phi(x_s)$, $m_s$.
Step 14, $\Phi(x_s)$ and $m_s$ are traversed with stride 1, and $\Phi(x_s)$, $m_s$ and $m_c$ are divided into p small blocks (local patches) of size 3 × 3.
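For reference, a stride-1 division of a feature map of spatial size $H \times W$ into $3 \times 3$ patches yields (assuming no padding)

$$p = (H - 2)(W - 2)$$

patches; for example, a $50 \times 50$ feature map gives $p = 48 \times 48 = 2304$ local patches.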
Step 15, loss functions on style constraint layers relu3_ l and relu4_1,
Figure BDA0001694432720000091
Figure BDA0001694432720000092
βcsthe method is used for adjusting the weight of semantic images, wherein p1 and p2 represent that phi (x) is segmented to generate p1 local patches and m iscThe segmentation generates p2 local patches,
step 16, Ψi(Φ (x)) represents a local patch, and
Figure BDA0001694432720000093
and
Figure BDA0001694432720000098
respectively represents phi (x)s) Or
Figure BDA0001694432720000094
Meso-and Ψi(Φ (x)) and
Figure BDA0001694432720000095
the best matching patch, k, represents the number of semantic images.
Step 17, the local patch selection rule is defined as,
Figure BDA0001694432720000096
Figure BDA0001694432720000097
Step 18, the loss function on the content constraint layer relu5_1 is calculated: the new image x is input into the convolutional neural network VGG-19, and, using a Markov random field (MRF) model at the content constraint layer, the loss function of the generated portrait x at the content constraint layer relu5_1 is obtained as

$$E_c\big(\Phi(x), \Phi(x_c)\big) = \left\| \Phi(x) - \Phi(x_c) \right\|^2.$$
Step 19, the total loss function is established:

$$E(x) = \alpha_1 E_s\big(\Phi(x), \Phi(x_s), m_c, m_s\big) + \alpha_2 E_c\big(\Phi(x), \Phi(x_c)\big),$$

with $\alpha_1 = 0.001$ and $\alpha_2 = 20$.
Step 20, the minimization of the optimization function E(x) is then solved by the gradient descent method to generate the output image x. The optimization algorithm based on the gradient descent method comprises the following steps:
(1) Initialization: set the iteration parameters i = 0 and j = m, define the matrix H and initialize it as a diagonal matrix with all elements 1, and set the allowable error $\varepsilon = 10^{-5}$; compute the initial gradient $g_1 = \nabla f(x_0)$, where $x_0$ is the Gaussian noise image randomly generated in step 10; the preset iteration number is m = 6 and itr = 100;
(2) If $i \ge itr$ or $\|\nabla f(x_{i+1})\| \le 10^{-5}$, output the i-th iteration result $x_{i+1}$ and end the optimization algorithm; otherwise go to step (3); itr is the maximum number of iterations;
(3) Define $p_i$ as the negative gradient direction of the i-th iteration: $p_i = -g_i$;
(4) Update the result of the i-th iteration: $x_{i+1} = x_i + p_i$;
(5) Define $s_i$ as the difference between the previous result $x_i$ and the result of this iteration, i.e. $s_i = x_{i+1} - x_i$; define $y_i$ as the difference between the gradient $\nabla f(x_i)$ of the previous result and the gradient $\nabla f(x_{i+1})$ of this iteration's result, i.e. $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$; and define $\rho_i = 1/(y_i^T s_i)$, where T denotes the matrix transpose;
(6) Update $H$: $H_{i+1} = (I - \rho_i s_i y_i^T)\, H_i\, (I - \rho_i y_i s_i^T) + \rho_i s_i s_i^T$;
(7) Define the variable q as the gradient of $x_i$: $q = \nabla f(x_i)$;
(8) Iterate from j = 1: take $a_j = \rho_{i-j}\, s_{i-j}^T\, q$ and update q as $q = q - a_j\, y_{i-j}$, until j = m;
(9) Update $g_i$: $g_i = H_i q$;
(10) Iterate from j = 1: take $b = \rho_{i-j}\, y_{i-j}^T\, g_i$ and update $g_i$ as $g_i = g_i + s_{i-j}(a_j - b)$, until j = m;
(11) Update the iteration step: i = i + 1, and jump to step (2).
Meanwhile, to save memory overhead, after step (5) only the results of the latest m iterations are retained: if i > m, then $s_{i-m}, s_{i-m-1}, \dots, s_1$ and $y_{i-m}, y_{i-m-1}, \dots, y_1$ are deleted, which saves memory at run time.
Step 21, steps 12-20 are repeated, and after 100 iterations a new generated image is produced.
Step 22, steps 11-21 are repeated, and after 3 iterations the final style migration result image is output.
The generated style transfer effect image is shown in FIG. 5.
The experimental results show that the method can effectively realize the style transfer function for images.

Claims (8)

1. A portrait style migration method based on semantic segmentation and a deep convolutional neural network, characterized by comprising the following steps:
step 1, selecting a content portrait image that needs style transfer and a style portrait image serving as the style source, and performing semantic segmentation on the content image and the style image respectively to segment the portrait region and the background region, i.e., forming the semantic images of the content image and the style image;
step 2, adopting the deep convolutional neural network VGG-19 as the original model for high-level image feature extraction, with relu5_1 as the content constraint feature extraction layer and relu3_1 and relu4_1 as the style constraint feature extraction layers;
step 3, establishing new feature maps for the content constraint feature extraction layer and the style constraint feature extraction layer respectively;
step 4, randomly generating a Gaussian noise image as the new initialization image;
step 5, adjusting the size of the new initialization image according to the size of the content portrait image;
step 6, inputting the style portrait image, the semantic image of the content image and the semantic image of the style image into the convolutional neural network VGG-19, and then calculating the style constraint layer loss function of the content portrait semantic image and the style portrait semantic image on the style constraint layers relu3_1 and relu4_1 using a Markov random field;
step 7, inputting the new initialization image into the convolutional neural network VGG-19, and calculating the content constraint loss function of the finally generated style image at the content constraint layer relu5_1 using a Markov random field model;
step 8, combining the results of step 6 and step 7 to obtain a total loss function, and generating the portrait style migration result by applying an optimization algorithm based on the gradient descent method to the different layers; that is, the gradient of the style migration portrait is computed iteratively by gradient descent, and the total loss function is decreased along the negative gradient direction, so that the style migration portrait generated at each iteration is as similar as possible to the original content portrait and the style portrait;
step 9, repeating steps 6-8 for 100 iterations and steps 5-8 for 3 iterations, and outputting the final portrait style migration image;
in step 6, the style constraint layer loss function is:

$$E_s\big(\Phi(x), \Phi(x_s), m_c, m_s\big) = \sum_{i=1}^{p_1} \left\| \Psi_i^{*}(\Phi(x)) - \Psi_{NN(i)}^{*}(\Phi(x_s)) \right\|^2,$$

where $\Psi_i^{*}(\Phi(x)) = \Psi_i\big(\Phi(x) \oplus \beta_c m_c\big)$ and $\Psi_j^{*}(\Phi(x_s)) = \Psi_j\big(\Phi(x_s) \oplus \beta_s m_s\big)$ are the local patches of the semantically augmented feature maps; $\Phi(x)$ is a feature map, $i$ denotes the $i$-th and $j$ the $j$-th local block; $\Phi(x)$ and $m_c$ are divided into local blocks of size $r \times r$, i.e. local patches, the sets of local patches being written $\Psi(\Phi(x))$ and $\Psi(m_c)$; the division of $\Phi(x)$ generates $p_1$ local patches and the division of $m_c$ generates $p_2$ local patches; $x_c \in \mathbb{R}^{w_c \times h_c \times 3}$ is the content portrait image and $x_s \in \mathbb{R}^{w_s \times h_s \times 3}$ the style portrait image, where $\mathbb{R}$ denotes the set of real numbers, $w_c, h_c$ are respectively the length and width of the content portrait image and $w_s, h_s$ those of the style portrait image; $m_c$ denotes the semantic image of the content portrait, $m_s$ the semantic image of the style portrait, $\beta_c$ the semantic content portrait weight adjustment parameter and $\beta_s$ the semantic style portrait weight adjustment parameter; $\Psi_i^{*}(\Phi(x))$ denotes the $i$-th local patch, and $\Psi_{NN(i)}^{*}(\Phi(x_s))$ denotes the local patch of $\Psi^{*}(\Phi(x_s))$ that best matches $\Psi_i^{*}(\Phi(x))$; $k$ denotes the number of semantic images;
wherein the local patch selection rule is defined as

$$NN(i) = \underset{j=1,\dots,p_2}{\arg\max} \; \frac{\Psi_i^{*}(\Phi(x)) \cdot \Psi_j^{*}(\Phi(x_s))}{\left|\Psi_i^{*}(\Phi(x))\right| \cdot \left|\Psi_j^{*}(\Phi(x_s))\right|}.$$
2. The method as claimed in claim 1, wherein in step 1, semantic segmentation is first performed on the content image and the style image to segment the semantic images of the portrait region and the background region; the portrait region is then further semantically segmented into 5 regions (face, nose, eyes, mouth and body) as 5 semantic images, so that 6 semantic images (background, face, nose, eyes, mouth and body) are finally obtained.
3. The method according to claim 2, wherein in step 3, the new feature map of the content constraint feature extraction layer is

$$\tilde{f}_c^{\,l} = f_c^{\,l} \oplus \beta_c\, m_c^{\,k}, \qquad k = 1,2,3,4,5,6,$$

where $l$ denotes the content constraint feature extraction layer in VGG-19, i.e. relu5_1, $f_c^{\,l}$ is the feature map generated by the content portrait image at the content constraint layer of the VGG-19 network model, $\beta_c$ is the semantic content portrait weight adjustment parameter with value range [0,200], and $m_c^{\,k}$ denotes the semantic images of the content portrait; the new feature map of the style constraint feature extraction layer is

$$\tilde{f}_s^{\,l} = f_s^{\,l} \oplus \beta_s\, m_s^{\,k}, \qquad k = 1,2,3,4,5,6,$$

where $l$ denotes the style constraint feature extraction layers in VGG-19, i.e. relu3_1 and relu4_1, $f_s^{\,l}$ is the feature map generated by the style portrait image at the style constraint layer of the VGG-19 network model, $\beta_s$ is the semantic style portrait weight adjustment parameter with value range [0,200], and $m_s^{\,k}$ denotes the semantic images of the style portrait.
4. The method of claim 1, wherein in step 5, the size of the new initialization image is set to

$$(\hat{w}, \hat{h}) = \left( \frac{w_c}{2^{\,L-1}},\; \frac{h_c}{2^{\,L-1}} \right),$$

where $w_c$, $h_c$ are respectively the length and width of the content portrait image and L is a parameter adjusting the image size, taking the values 3, 2 and 1 in the successive iterations.
5. The method of claim 4, wherein in step 7, the content constraint loss function is

$$E_c\big(\Phi(x), \Phi(x_c)\big) = \left\| \Phi(x) - \Phi(x_c) \right\|^2.$$
6. The method of claim 5, wherein in step 8, the total loss function is

$$E(x) = \alpha_1 E_s\big(\Phi(x), \Phi(x_s), m_c, m_s\big) + \alpha_2 E_c\big(\Phi(x), \Phi(x_c)\big),$$

where $\alpha_1$ and $\alpha_2$ are adjustment parameters controlling how strongly the original content image and the style image appear in the generated image, with $\alpha_1 \in [0,1]$ and $\alpha_2 \in [0,200]$.
7. The method according to claim 1, wherein in step 8, the optimization algorithm based on the gradient descent method comprises the following steps:
(1) Initialization: set the iteration parameters i = 0 and j = m, define the matrix H and initialize it as a diagonal matrix with all elements 1, and set the allowable error $\varepsilon = 10^{-5}$; compute the initial gradient $g_1 = \nabla f(x_0)$, where $x_0$ is the Gaussian noise image randomly generated in step 4;
(2) If $i \ge itr$ or $\|\nabla f(x_{i+1})\| \le 10^{-5}$, output the i-th iteration result $x_{i+1}$ and end the optimization algorithm; otherwise go to step (3); itr is the maximum number of iterations;
(3) Define $p_i$ as the negative gradient direction of the i-th iteration: $p_i = -g_i$;
(4) Update the result of the i-th iteration: $x_{i+1} = x_i + p_i$;
(5) Define $s_i$ as the difference between the previous result $x_i$ and the result of this iteration, i.e. $s_i = x_{i+1} - x_i$; define $y_i$ as the difference between the gradient $\nabla f(x_i)$ of the previous result and the gradient $\nabla f(x_{i+1})$ of this iteration's result, i.e. $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$; and define $\rho_i = 1/(y_i^T s_i)$, where T denotes the matrix transpose;
(6) Update $H$: $H_{i+1} = (I - \rho_i s_i y_i^T)\, H_i\, (I - \rho_i y_i s_i^T) + \rho_i s_i s_i^T$;
(7) Define the variable q as the gradient of $x_i$: $q = \nabla f(x_i)$;
(8) Iterate from j = 1: take $a_j = \rho_{i-j}\, s_{i-j}^T\, q$ and update q as $q = q - a_j\, y_{i-j}$, until j = m, where m is the preset iteration number;
(9) Update $g_i$: $g_i = H_i q$;
(10) Iterate from j = 1: take $b = \rho_{i-j}\, y_{i-j}^T\, g_i$ and update $g_i$ as $g_i = g_i + s_{i-j}(a_j - b)$, until j = m;
(11) Update the iteration step: i = i + 1, and jump to step (2).
8. The method according to claim 7, wherein the optimization algorithm based on the gradient descent method further comprises a step, executed after step (5), of retaining only the results of the latest m iterations: if i > m, then $s_{i-m}, s_{i-m-1}, \dots, s_1$ and $y_{i-m}, y_{i-m-1}, \dots, y_1$ are deleted.
CN201810606345.XA 2018-06-13 2018-06-13 Portrait style migration method based on semantic segmentation and deep convolution neural network Active CN108805803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810606345.XA CN108805803B (en) 2018-06-13 2018-06-13 Portrait style migration method based on semantic segmentation and deep convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810606345.XA CN108805803B (en) 2018-06-13 2018-06-13 Portrait style migration method based on semantic segmentation and deep convolution neural network

Publications (2)

Publication Number Publication Date
CN108805803A CN108805803A (en) 2018-11-13
CN108805803B true CN108805803B (en) 2020-03-13

Family

ID=64085760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810606345.XA Active CN108805803B (en) 2018-06-13 2018-06-13 Portrait style migration method based on semantic segmentation and deep convolution neural network

Country Status (1)

Country Link
CN (1) CN108805803B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829353B (en) * 2018-11-21 2023-04-18 东南大学 Face image stylizing method based on space constraint
CN109583362B (en) * 2018-11-26 2021-11-30 厦门美图之家科技有限公司 Image cartoon method and device
CN109712068A (en) * 2018-12-21 2019-05-03 云南大学 Image Style Transfer and analogy method for cucurbit pyrography
CN109961442B (en) * 2019-03-25 2022-11-18 腾讯科技(深圳)有限公司 Training method and device of neural network model and electronic equipment
CN111815756A (en) * 2019-04-12 2020-10-23 北京京东尚科信息技术有限公司 Image generation method and device, computer readable medium and electronic equipment
CN110084741B * 2019-04-26 2024-06-14 Hengyang Normal University Image style migration method based on saliency detection and deep convolutional neural network
JP7394147B2 (en) * 2019-04-29 2023-12-07 センスタイム グループ リミテッド Image generation method and device, electronic equipment, and storage medium
CN110378838B (en) * 2019-06-25 2023-04-18 达闼机器人股份有限公司 Variable-view-angle image generation method and device, storage medium and electronic equipment
CN112561779B (en) * 2019-09-26 2023-09-29 北京字节跳动网络技术有限公司 Image stylization processing method, device, equipment and storage medium
CN111127309B (en) * 2019-12-12 2023-08-11 杭州格像科技有限公司 Portrait style migration model training method, portrait style migration method and device
CN114930798A (en) * 2019-12-30 2022-08-19 苏州臻迪智能科技有限公司 Shooting object switching method and device, and image processing method and device
CN111223039A (en) * 2020-01-08 2020-06-02 广东博智林机器人有限公司 Image style conversion method and device, electronic equipment and storage medium
CN111242841B (en) * 2020-01-15 2023-04-18 杭州电子科技大学 Image background style migration method based on semantic segmentation and deep learning
CN111340720B (en) * 2020-02-14 2023-05-19 云南大学 Color matching woodcut style conversion algorithm based on semantic segmentation
CN111382782B (en) * 2020-02-23 2024-04-26 华为技术有限公司 Method and device for training classifier
CN111325664B (en) * 2020-02-27 2023-08-29 Oppo广东移动通信有限公司 Style migration method and device, storage medium and electronic equipment
CN113496238A (en) * 2020-03-20 2021-10-12 北京京东叁佰陆拾度电子商务有限公司 Model training method, point cloud data stylization method, device, equipment and medium
CN111402407B (en) * 2020-03-23 2023-05-02 杭州相芯科技有限公司 High-precision portrait model rapid generation method based on single RGBD image
CN111340745B (en) * 2020-03-27 2021-01-05 成都安易迅科技有限公司 Image generation method and device, storage medium and electronic equipment
CN111986302A (en) * 2020-07-23 2020-11-24 北京石油化工学院 Image style migration method and device based on deep learning
CN111986075B (en) * 2020-08-12 2022-08-09 兰州交通大学 Style migration method for target edge clarification
CN111986076A (en) * 2020-08-21 2020-11-24 深圳市慧鲤科技有限公司 Image processing method and device, interactive display device and electronic equipment
CN112288621B (en) * 2020-09-21 2022-09-16 山东师范大学 Image style migration method and system based on neural network
CN112529771B (en) * 2020-12-07 2024-05-31 陕西师范大学 Portrait style migration method
CN112541856B (en) * 2020-12-07 2022-05-03 重庆邮电大学 Medical image style migration method combining Markov field and Graham matrix characteristics
CN113160033B (en) * 2020-12-28 2023-04-28 武汉纺织大学 Clothing style migration system and method
CN112950454B (en) * 2021-01-25 2023-01-24 西安电子科技大学 Image style migration method based on multi-scale semantic matching
US20220237838A1 (en) * 2021-01-27 2022-07-28 Nvidia Corporation Image synthesis using one or more neural networks
CN114493994B (en) * 2022-01-13 2024-04-16 南京市测绘勘察研究院股份有限公司 Ancient painting style migration method for three-dimensional scene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847294A (en) * 2017-01-17 2017-06-13 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device based on artificial intelligence
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107767328A (en) * 2017-10-13 2018-03-06 上海交通大学 The moving method and system of any style and content based on the generation of a small amount of sample

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250931A (en) * 2016-08-03 2016-12-21 武汉大学 A kind of high-definition picture scene classification method based on random convolutional neural networks
US9922432B1 (en) * 2016-09-02 2018-03-20 Artomatix Ltd. Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847294A (en) * 2017-01-17 2017-06-13 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device based on artificial intelligence
CN106952224A (en) * 2017-03-30 2017-07-14 电子科技大学 A kind of image style transfer method based on convolutional neural networks
CN107767328A (en) * 2017-10-13 2018-03-06 上海交通大学 The moving method and system of any style and content based on the generation of a small amount of sample

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Style Transfer Via Texture Synthesis; Michael Elad et al.; IEEE Transactions on Image Processing; 2017-03-08; Vol. 26, No. 5; pp. 2338-2351 *
Design and Implementation of an Image Color Style Transfer System for Mobile Phone Applications; Cai Xingquan et al.; Information & Communications; 2016-06-30 (No. 6); pp. 139-140 *

Also Published As

Publication number Publication date
CN108805803A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108805803B (en) Portrait style migration method based on semantic segmentation and deep convolution neural network
Yue et al. Dual adversarial network: Toward real-world noise removal and noise generation
Yang et al. High-resolution image inpainting using multi-scale neural patch synthesis
CN110969250B (en) Neural network training method and device
CN109903236B (en) Face image restoration method and device based on VAE-GAN and similar block search
US20160283842A1 (en) Neural network and method of neural network training
CN110084741B (en) Image style migration method based on saliency detection and deep convolutional neural network
CN108647723B (en) Image classification method based on deep learning network
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN112183501B (en) Depth counterfeit image detection method and device
WO2017214507A1 (en) Neural network and method of neural network training
CN111986075B (en) Style migration method for target edge clarification
Zhang et al. Bionic face sketch generator
CN103942571B (en) Graphic image sorting method based on genetic programming algorithm
CA3137297C (en) Adaptive convolutions in neural networks
CN108734677B (en) Blind deblurring method and system based on deep learning
CN111127309B (en) Portrait style migration model training method, portrait style migration method and device
Xu et al. Styleswap: Style-based generator empowers robust face swapping
CN112101364B (en) Semantic segmentation method based on parameter importance increment learning
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN109920021A (en) A kind of human face sketch synthetic method based on regularization width learning network
CN112884648A (en) Method and system for multi-class blurred image super-resolution reconstruction
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
WO2016172889A1 (en) Image segmentation method and device
JP6935868B2 (en) Image recognition device, image recognition method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant