CN114255158A - Method for converting image style and computer program product thereof


Info

Publication number
CN114255158A
Authority
CN
China
Prior art keywords
style
content
neural network
network model
graph
Prior art date
Legal status
Pending
Application number
CN202011006694.1A
Other languages
Chinese (zh)
Inventor
林士豪
杨朝光
陈良其
叶书玮
Current Assignee
Acer Inc
Original Assignee
Acer Inc
Priority date
Filing date
Publication date
Application filed by Acer Inc filed Critical Acer Inc
Priority to CN202011006694.1A
Publication of CN114255158A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses an image style conversion method based on aesthetic design and a computer program product thereof. The new image generated by the method not only has the characteristics of both the content map and the style map, but is also more aesthetically pleasing than images produced by generally known image style conversion methods.

Description

Method for converting image style and computer program product thereof
Technical Field
The present invention relates to a method of image style transformation (style transfer) and a computer program product thereof, and more particularly, to a method of image style transformation designed based on aesthetics and a computer program product thereof.
Background
[ Prior art documents ]
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
According to the above prior art document, image style transformation uses an artificial intelligence (AI) algorithm based on the convolution operation to extract a content representation of a content image (content map) and a style representation of a style image (style map), and generates a new image according to the extracted content representation and style representation. The new image can have the characteristics of both the content map and the style map, such as the shape and contour of the objects in the content map, and the color and texture of the style map.
Currently, many software packages and applications that perform image style conversion with AI are available on the market, but their conversion effect and quality are often less than ideal. In view of the above, there is a need for an aesthetically designed image style conversion method that makes the image after style conversion more aesthetically pleasing.
Disclosure of Invention
The invention discloses an image style conversion method, which comprises the following steps: inputting a content map and a style map into a second convolutional neural network (CNN) model, wherein the second convolutional neural network model extracts a plurality of first feature maps of the content map and a plurality of second feature maps of the style map; inputting the content map into a style transformation neural network model, wherein the style transformation neural network model uses a specific number of filters to perform a convolution operation on the content map so as to generate a transformation map; inputting the transformation map into the second convolutional neural network model, wherein the second convolutional neural network model extracts a plurality of third feature maps of the transformation map; calculating a content loss from the first feature maps and the third feature maps, and calculating a style loss from the second feature maps and the third feature maps; adding the result of multiplying the content loss by a content weight coefficient and the result of multiplying the style loss by a style weight coefficient to obtain a total loss, wherein the style weight coefficient is 16 times the content weight coefficient; and recursively optimizing the style transformation neural network model using gradient descent to minimize the total loss, thereby obtaining an optimal transformation map.
In some embodiments, the content weight coefficient is 7.5 and the style weight coefficient is 120.
In some embodiments, the number of filters used by the style transformation neural network model is 32.
In some embodiments, the image style conversion method further includes: before the style map is input into the second convolutional neural network model, performing a preprocessing procedure on the style map to adjust it so that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map.
In some embodiments, the style weight coefficient is set to a value above 10000.
The invention also discloses a computer program product for image style conversion, which is loaded by a computer to execute the following program instructions: a first program instruction causing a processor to input a content map and a style map into a second convolutional neural network model, the second convolutional neural network model extracting a plurality of first feature maps of the content map and a plurality of second feature maps of the style map; a second program instruction causing the processor to input the content map into a style transformation neural network model, the style transformation neural network model performing a convolution operation on the content map using a specific number of filters to generate a transformation map; a third program instruction causing the processor to input the transformation map into the second convolutional neural network model, the second convolutional neural network model extracting a plurality of third feature maps of the transformation map; a fourth program instruction causing the processor to calculate a content loss based on the first feature maps and the third feature maps, and to calculate a style loss based on the second feature maps and the third feature maps; a fifth program instruction causing the processor to add the result of multiplying the content loss by a content weight coefficient and the result of multiplying the style loss by a style weight coefficient to obtain a total loss, wherein the style weight coefficient is 16 times the content weight coefficient; and a sixth program instruction causing the processor to recursively optimize the style transformation neural network model using a gradient descent method so that the total loss is minimized, thereby obtaining an optimal transformation map.
In some embodiments of the disclosed computer program product for image style conversion, the content weight coefficient is 7.5 and the style weight coefficient is 120.
In some embodiments of the disclosed computer program product for image style conversion, the number of filters used in the style conversion neural network model is 32.
The computer program product for image style conversion disclosed in the present invention further loads, via the computer, a seventh program instruction causing the processor to perform a preprocessing procedure on the style map before the style map is input into the second convolutional neural network model, so as to adjust the style map such that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map.
In some embodiments of the disclosed computer program product for image style conversion, the style weight coefficient is set to a value above 10000.
Drawings
Fig. 1 is a diagram 100 illustrating a convolution operation according to an embodiment of the present invention.
FIG. 2 is a flow diagram 200 of a method of image style conversion according to an embodiment of the present invention.
Fig. 3 illustrates the relationship between the ratio of the style weight coefficient to the content weight coefficient and the optimal transformation map according to an embodiment of the present invention.
FIG. 4 illustrates the effect of the number of filters used by the style transformation neural network model on the color richness of the optimal transformation map, according to an embodiment of the present invention.
Fig. 5 illustrates the effect of the ratio of the blank area of the style map to the area of the whole style map on the texture of the optimal transformation map, according to an embodiment of the present invention.
Fig. 6 shows the thin-film interference effect of the optimal transformation map obtained by setting the style weight coefficient β to a value of 10000 or more, according to an embodiment of the present invention.
Description of reference numerals:
100: schematic diagram
101: input image
102: filter
103: feature map
110, 111: partial matrices
120, 121: convolution values
200: flowchart
S201-S206: steps
301-305: images
401-408: images
501-504: images
512-514: images
601, 602: images
Detailed Description
The invention relates to an image style conversion method based on aesthetic design and a computer program product thereof, which make the image after style conversion more beautiful. Here, "aesthetic feeling" covers the concepts of "beauty (aesthetic)", "taste", "aesthetic perception", and "aesthetic experience". "Beauty" refers to the objective properties that an object presents at a given place and time; "taste" refers to the subjective value judgment that arises when the viewer's mind interacts with those objective facts; "aesthetic perception" refers to the sensory function of the viewer's body that perceives the properties of the object; and "aesthetic experience" refers to the sense of completeness and sufficiency that arises when the viewer comes into contact with a certain situation or object.
The existence of aesthetic feeling can be recognized from form, and can be observed, analyzed, and experienced in terms of "proportion", "color", "texture", "composition", and "structure". The image style conversion method of the invention is designed with emphasis on the aspects of proportion, color, texture, and the like.
The invention discloses an image style conversion method, which can be applied to a web interface or to applications. In some embodiments, the disclosed image style conversion method may be used in conjunction with the Web Graphics Library (WebGL) to present interactive 2D and 3D graphics in any compatible web browser without using a plug-in. For example, a user may upload a content map whose style is to be converted, together with a style map serving as the reference for the target style, to a server through a WebGL web interface; the server then generates a new image from the received content map and style map using the image style conversion method disclosed in the present invention and presents the new image on the web interface. The new image combines the features of the content map and the style map, such as the shape and contour of the objects in the content map, and the color and texture of the style map. In another example, the user may simply upload the content map and then select one of the style maps already provided on the web interface.
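For illustration only, a client could submit the two images to such a server with a short script like the one below; this is a hypothetical sketch, and the endpoint URL, form-field names, and response format are assumptions rather than part of this disclosure.

```python
# Hypothetical client-side sketch: upload a content map and a style map to a
# style-conversion server and save the returned image. The URL and the field
# names below are illustrative assumptions only.
import requests

with open("content.jpg", "rb") as content_f, open("style.jpg", "rb") as style_f:
    resp = requests.post(
        "https://example.com/api/style-transfer",   # hypothetical endpoint
        files={"content": content_f, "style": style_f},
    )
resp.raise_for_status()
with open("result.jpg", "wb") as out_f:
    out_f.write(resp.content)   # assumes the server returns the converted image bytes
```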
Fig. 1 is a diagram 100 illustrating a convolution operation according to an embodiment of the present invention. The diagram 100 includes an input image 101, a filter 102, and a feature map (feature map)103, wherein the input image 101 has a plurality of pixels, and pixel values of the plurality of pixels are represented in a matrix (for example, but not limited to, a 5 × 5 matrix in fig. 1). The filter 102 and the feature map 103 are also represented in the form of a matrix (for example, but not limited to, a 3 × 3 matrix in fig. 1).
As shown in fig. 1, performing a convolution operation on the input image 101 with the filter 102 produces the feature map 103. Specifically, the convolution operation multiplies the pixel values at corresponding positions in the input image 101 and the filter 102 one by one and sums the products to obtain the convolution value at the corresponding position (also referred to as a "feature point") in the feature map 103. By continuously sliding the filter 102 across the corresponding positions in the input image 101, all convolution values in the feature map 103 can be calculated. For example, the partial matrix 110 in the input image 101 and the filter 102 are calculated as follows
0*0+0*1+1*2+3*2+1*2+2*0+2*0+0*1+0*2=10
This results in the convolution value 120 of 10 in the feature map 103.
As another example, the partial matrix 111 in the input image 101 and the filter 102 are calculated as follows
2*0+1*1+0*2+1*2+3*2+1*0+2*0+2*1+3*2=17
This results in the convolution value 121 of 17 in the feature map 103.
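As a quick check of the arithmetic above, the two feature points can be reproduced with a few lines of NumPy. The 3x3 partial matrices 110 and 111 and the filter 102 are taken from the values spelled out in the example; the remaining entries of the 5x5 input image are not reproduced here.

```python
# Sketch reproducing the two feature points computed in Fig. 1.
import numpy as np

filter_102 = np.array([[0, 1, 2],
                       [2, 2, 0],
                       [0, 1, 2]])

partial_110 = np.array([[0, 0, 1],
                        [3, 1, 2],
                        [2, 0, 0]])

partial_111 = np.array([[2, 1, 0],
                        [1, 3, 1],
                        [2, 2, 3]])

# Element-wise multiply-and-sum, as described for the convolution operation.
print(int(np.sum(partial_110 * filter_102)))  # -> 10 (convolution value 120)
print(int(np.sum(partial_111 * filter_102)))  # -> 17 (convolution value 121)
```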
A convolutional neural network (CNN) model may have a plurality of convolutional layers, and each convolutional layer may have a plurality of filters. The plurality of feature maps obtained after performing the convolution operations of one convolutional layer are used as the input data of the next convolutional layer.
Fig. 2 is a flowchart 200 of a method of image style conversion according to an embodiment of the present invention, the flowchart 200 comprising steps S201 to S206. In step S201, the content map and the style map are input into the second convolutional neural network model, the second convolutional neural network model extracts a plurality of first feature maps of the content map and a plurality of second feature maps of the style map through the convolution operation described above, and then the process proceeds to step S202.
In some embodiments, the second convolutional neural network model may be a VGG (Visual Geometry Group) neural network model, such as VGG16 or VGG19. In a preferred embodiment, the second convolutional neural network model is VGG19.
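A minimal sketch of using a pretrained VGG19 as the second convolutional neural network model is given below, with intermediate activations collected as feature maps. The particular layer indices, the use of torchvision, and the omission of ImageNet input normalization are assumptions made for brevity; the patent does not fix these details.

```python
import torch
import torchvision.models as models

# Pretrained VGG19 used only as a fixed feature extractor (it is not trained).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract_feature_maps(image, layer_indices=(3, 8, 17, 26, 35)):
    """Return the feature maps produced at the chosen layers of vgg.features."""
    feats, x = [], image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in layer_indices:
            feats.append(x)
    return feats

content_map = torch.rand(1, 3, 256, 256)       # placeholder content map
first_feature_maps = extract_feature_maps(content_map)
```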
In step S202, the content map is input into the style transformation neural network model, the style transformation neural network model performs the convolution operation on the content map using a specific number of filters to generate a transformation map, and then the process proceeds to step S203.
In some embodiments, the style transformation neural network model may also be a convolutional neural network model, but one different from the second convolutional neural network model. In functional terms, the style transformation neural network model is used to transform an input image into a new image in a certain way. In the following steps, training processes of "result feedback" and "parameter update" are repeated, so that the new image output by the style transformation neural network model gradually converges and is optimized, and finally an optimal transformation map is output. In contrast, the second convolutional neural network model is used in the method of the present disclosure only to extract feature maps of its input images, so that these extracted feature maps serve as the basis for optimizing the style transformation neural network model in the subsequent steps; the second convolutional neural network model itself is not the object being trained. On the other hand, the style transformation neural network model may differ from the second convolutional neural network model in the number of convolutional layers, the number of filters, the values in the filter matrices, and so on.
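The sketch below shows what such a style transformation neural network model could look like. The exact architecture (number of layers, kernel sizes, use of residual blocks, and so on) is not specified here, so the structure shown is only an assumption, with the first convolutional layer using 32 filters as in the later embodiments.

```python
import torch
import torch.nn as nn

class StyleTransformNet(nn.Module):
    """Illustrative image-to-image network; architectural details are assumed."""
    def __init__(self, num_filters=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, num_filters, kernel_size=9, padding=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(num_filters, num_filters, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(num_filters, 3, kernel_size=9, padding=4),
            nn.Sigmoid(),          # keep the output in the [0, 1] pixel range
        )

    def forward(self, content_map):
        return self.body(content_map)          # the transformation map

transform_net = StyleTransformNet(num_filters=32)
transformation_map = transform_net(torch.rand(1, 3, 256, 256))
```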
In step S203, the transformation map is input into the second convolutional neural network model, and the second convolutional neural network model extracts a plurality of third feature maps of the transformation map through the convolution operation described above, and then the process proceeds to step S204.
In step S204, content loss is calculated according to the first feature maps and the third feature maps, and style loss is calculated according to the second feature maps and the third feature maps, and then the process proceeds to step S205.
According to an embodiment of the present invention, the content loss can be simply understood as "the gap between the transformation map and the content map in terms of content representation (e.g., the shape and contour of the objects in the maps)". Specifically, the content representation consists of the feature maps output by one specific convolutional layer, selected from all the feature maps output by the second convolutional neural network model. The content loss is calculated as in the following Equation 1:

L_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F_{ij}^{l} - P_{ij}^{l} \right)^{2}  (Equation 1)

In Equation 1, L_{content} is the content loss; \vec{p}, \vec{x}, and l denote the content map, the transformation map, and the layer index of the convolutional layer, respectively; and F_{ij}^{l} and P_{ij}^{l} denote the convolution values at a given feature point of the third feature maps output by the l-th convolutional layer (i.e., the content representation of the transformation map) and of the first feature maps (i.e., the content representation of the content map), respectively.
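In code, Equation 1 is half of the summed squared difference between corresponding feature maps. A minimal PyTorch sketch, with the choice of layer left to the caller, could look like this:

```python
import torch

def content_loss(third_feature_map, first_feature_map):
    """Equation 1: half the sum of squared differences between the content
    representation of the transformation map (F) and of the content map (P)."""
    return 0.5 * torch.sum((third_feature_map - first_feature_map) ** 2)
```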
According to an embodiment of the present invention, the style loss can be simply understood as "the gap between the transformation map and the style map in terms of style representation (e.g., color and texture)". Specifically, the style representation is the correlation between the feature maps output by each convolutional layer, expressed as in the following Equation 2:

G_{ij}^{l} = \sum_{k} F_{ik}^{l} F_{jk}^{l}  (Equation 2)

In Equation 2, G_{ij}^{l} denotes the style representation obtained from the l-th convolutional layer, expressed in the form of a Gram matrix, and \sum_{k} F_{ik}^{l} F_{jk}^{l} is the inner product between the feature maps output by the l-th convolutional layer. However, in embodiments of the present invention, unlike the content representation obtained from one specific convolutional layer, the style loss calculation must take into account the style representations on multiple convolutional layers, as shown in the following Equations 3 and 4:

E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G_{ij}^{l} - A_{ij}^{l} \right)^{2}  (Equation 3)

L_{style}(\vec{a}, \vec{x}) = \sum_{l} w_{l} E_{l}  (Equation 4)

In Equations 3 and 4, E_{l} is the partial style loss contributed by the l-th convolutional layer; G_{ij}^{l} and A_{ij}^{l} denote the style representation of the transformation map and the style representation of the style map obtained from the l-th convolutional layer, respectively; N_{l} and M_{l} denote the length and the width of the feature maps output by the l-th convolutional layer, respectively; L_{style} is the style loss; \vec{a} and \vec{x} denote the style map and the transformation map, respectively; and w_{l} is the weight assigned to the l-th convolutional layer, so that L_{style} is a weighted sum of the partial style losses contributed by the considered convolutional layers. In an embodiment of the present invention, the value of w_{l} is always 1 divided by the total number of convolutional layers considered in the style loss calculation, i.e., the weight assigned to each of these convolutional layers is equal, but the invention is not limited thereto.
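Equations 2 to 4 can be sketched as follows. Reshaping each set of feature maps into a channels-by-pixels matrix before taking inner products follows the usual Gram-matrix formulation; treating N_l as the number of feature maps and M_l as their spatial size in the normalization constant is an implementation assumption.

```python
import torch

def gram_matrix(feature_maps):
    """Equation 2: Gram matrix of the feature maps output by one layer."""
    b, c, h, w = feature_maps.shape            # batch, channels, height, width
    f = feature_maps.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2))     # inner products between maps

def style_loss(transform_feats, style_feats):
    """Equations 3 and 4: equally weighted sum of per-layer style losses."""
    w_l = 1.0 / len(transform_feats)           # equal weight per considered layer
    loss = 0.0
    for f_t, f_s in zip(transform_feats, style_feats):
        b, c, h, w = f_t.shape
        n_l, m_l = c, h * w
        g, a = gram_matrix(f_t), gram_matrix(f_s)
        e_l = torch.sum((g - a) ** 2) / (4.0 * n_l ** 2 * m_l ** 2)
        loss = loss + w_l * e_l
    return loss
```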
In step S205, the result of multiplying the content loss by the content weight coefficient and the result of multiplying the style loss by the style weight coefficient are added to obtain the total loss, and then the process proceeds to step S206. The calculation of the total loss is also called the loss function, as shown in the following Equation 5:

L_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha L_{content} + \beta L_{style}  (Equation 5)

In Equation 5, L_{total} is the total loss; \vec{p}, \vec{a}, and \vec{x} denote the content map, the style map, and the transformation map, respectively; L_{content} and L_{style} are the content loss and the style loss, respectively; and α and β are the content weight coefficient and the style weight coefficient, respectively. In an embodiment of the present invention, β is set to 16 times α.
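Equation 5 maps directly onto a small helper. The sketch below plugs in the α = 7.5 and β = 120 values mentioned in some embodiments (so that β = 16α), while keeping the coefficients as parameters:

```python
def total_loss(l_content, l_style, alpha=7.5, beta=120.0):
    """Equation 5: weighted sum of content loss and style loss
    (beta = 16 * alpha in the described embodiment)."""
    return alpha * l_content + beta * l_style
```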
In step S206, the gradient descent method is used recursively to optimize the style transformation neural network model, so that the total loss is minimized to obtain an optimal transformation map. Specifically, the gradient descent method obtains a gradient (i.e., the adjustment direction of the parameters of the style transformation neural network model) by performing partial derivative calculations on the loss function, and then adjusts the parameters of the style transformation neural network model according to the gradient so as to reduce the total loss. The total loss is gradually reduced through repeated training processes of result feedback, parameter update, and the like, until it converges to the minimum value, at which point the transformation map output by the style transformation neural network model is the optimal transformation map.
In some embodiments, the gradient descent method used in step S206 may be a Stochastic Gradient Descent (SGD) or adaptive moment estimation (Adam) algorithm.
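Putting steps S201 to S206 together, one possible training loop is sketched below using the Adam optimizer mentioned above. It reuses the illustrative helpers sketched earlier in this description; the choice of which feature maps feed the content loss, the learning rate, and the iteration count are assumptions, and content_map and style_map are assumed to be (1, 3, H, W) tensors in the [0, 1] range.

```python
import torch

# Assumes transform_net, extract_feature_maps, content_loss, style_loss, and
# total_loss are the sketches defined above.
optimizer = torch.optim.Adam(transform_net.parameters(), lr=1e-3)

first_feature_maps = extract_feature_maps(content_map)     # step S201
second_feature_maps = extract_feature_maps(style_map)      # step S201

for iteration in range(1000):                               # illustrative count
    optimizer.zero_grad()
    transformation_map = transform_net(content_map)         # step S202
    third_feature_maps = extract_feature_maps(transformation_map)       # S203
    l_c = content_loss(third_feature_maps[-2], first_feature_maps[-2])  # S204
    l_s = style_loss(third_feature_maps, second_feature_maps)           # S204
    loss = total_loss(l_c, l_s)                              # step S205
    loss.backward()                                          # step S206: gradient
    optimizer.step()                                         # update parameters

optimal_transformation_map = transform_net(content_map).detach()
```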
Fig. 3 illustrates the relationship between the ratio of the style weight coefficient to the content weight coefficient and the optimal transformation map according to an embodiment of the present invention. In fig. 3, the image 301 and the image 302 are a content map and a style map, respectively, and the image 303, the image 304, and the image 305 are the optimal transformation maps generated by the style transformation neural network model when the style weight coefficient β is 10 times, 16 times, and 27 times the content weight coefficient α, respectively. As shown in fig. 3, image 303 is more similar to image 301 (i.e., the content map) than image 304 and image 305; conversely, image 305 is more similar to image 302 (i.e., the style map) than image 303 and image 304.
According to an embodiment of the present invention, the style weight coefficient β is set to 16 times the content weight coefficient α, a setting based on the aesthetic aspect of "proportion". This arrangement ensures that the optimal transformation map is not distorted in content while still taking on the new style. On this basis, in some embodiments, the content weight coefficient is set to 7.5 and the style weight coefficient is set to 120, so that the optimal transformation map generated by the style transformation neural network model is indeed more aesthetically pleasing, as evaluated by experts in the art field.
According to embodiments of the present invention, in terms of the aesthetic aspect of "color", the number of filters used by the style transformation neural network model affects the color richness of the optimal transformation map. A lower number of filters makes the colors of the optimal transformation map monotonous, while a higher number of filters makes them richer. However, as the number of filters used by the style transformation neural network model increases, the time required to perform the image style conversion also increases, which affects the user experience. In addition, the gain in color richness obtained by adding more filters is less pronounced when the number of filters is already high than when it is low.
FIG. 4 illustrates the effect of the number of filters used by the style transformation neural network model on the color richness of the optimal transformation map, according to an embodiment of the present invention. In fig. 4, the image 401 and the image 402 are a content map and a style map, respectively, and the image 403, the image 404, the image 405, the image 406, the image 407, and the image 408 are the optimal transformation maps generated by the style transformation neural network model when the number of filters it uses is 1, 4, 16, 32, 64, and 128, respectively. As shown in FIG. 4, image 406 is significantly richer in color than images 403, 404, and 405. However, the colors of image 407 and image 408 do not change significantly compared to image 406.
When the number of filters used by the style transformation neural network model is set to 32, the colors of the optimal transformation map are rich enough, as evaluated by experts in the art field, and the gain in color richness from using more than 32 filters is not obvious. Therefore, in some embodiments, the number of filters used by the style transformation neural network model is set to 32 to balance the user experience against the color richness of the optimal transformation map.
Fig. 5 illustrates the effect of the ratio of the blank area of the style map to the area of the whole style map on the texture of the optimal transformation map, according to an embodiment of the present invention. In fig. 5, the image 501 is a content map; the images 502, 503, and 504 are style maps whose blank portions occupy more than 50%, about 20%, and about 5% of the whole map, respectively; and the images 512, 513, and 514 are the optimal transformation maps generated by the style transformation neural network model for the images 502, 503, and 504, respectively. As shown in fig. 5, the ratio of the blank area of the style map to the area of the whole style map has a significant impact on the aesthetic "texture" of the optimal transformation map.
According to the embodiment of the invention, when the area of the blank portion of the style map is about 25% of the area of the whole style map, the texture of the optimal transformation map is the most beautiful, as evaluated by experts in the art field. Therefore, in some embodiments, before the style map is input into the second convolutional neural network model, a preprocessing procedure may be performed on the style map to adjust it so that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map, thereby obtaining an optimal transformation map with the most aesthetically pleasing texture.
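The description does not spell out how the style map is adjusted. One plausible preprocessing sketch, which simply pads the image with a white border until roughly 25% of the adjusted image consists of near-white pixels, is shown below; the near-white threshold and the padding strategy are assumptions made for illustration.

```python
import numpy as np
from PIL import Image

def pad_to_white_ratio(style_map, target_ratio=0.25, threshold=240):
    """Add a uniform white border so that about `target_ratio` of the adjusted
    style map consists of near-white pixels (illustrative strategy only)."""
    arr = np.asarray(style_map.convert("RGB"))
    h, w = arr.shape[:2]
    white = int(np.all(arr >= threshold, axis=-1).sum())   # near-white pixels
    total = h * w
    if white / total >= target_ratio:
        return style_map                                    # already blank enough
    # Extra white area needed: (white + extra) / (total + extra) = target_ratio
    extra = (target_ratio * total - white) / (1.0 - target_ratio)
    # A border of width p on all four sides adds 2*p*(h+w) + 4*p**2 pixels.
    p = int(np.ceil((-(h + w) + np.sqrt((h + w) ** 2 + 4 * extra)) / 4))
    canvas = Image.new("RGB", (w + 2 * p, h + 2 * p), (255, 255, 255))
    canvas.paste(style_map.convert("RGB"), (p, p))
    return canvas
```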
In the embodiment of the present invention, the style weight coefficient β is 16 times the content weight coefficient α, as described above. On this basis, in some embodiments, the style weight coefficient is set to a value above 10000, so that the optimal transformation map generated by the style transformation neural network model exhibits the effect of thin-film interference.
Fig. 6 shows the thin-film interference effect of the optimal transformation maps obtained by setting the style weight coefficient β to a value of 10000 or more, according to an embodiment of the present invention. In fig. 6, the image 601 and the image 602 are the optimal transformation maps generated by the style transformation neural network model with the style weight coefficient set to 1000 and 10000, respectively. As shown in fig. 6, compared to image 601, image 602 (especially at the three circled positions in the figure) shows rainbow-like colors similar to those often seen on soap bubbles, which is the effect of thin-film interference.
The invention further discloses a computer program product for image style conversion, which is loaded with a program via a computer to execute a first program instruction, a second program instruction, a third program instruction, a fourth program instruction, a fifth program instruction and a sixth program instruction, wherein the first program instruction causes the processor to execute step S201 in fig. 2, the second program instruction causes the processor to execute step S202 in fig. 2, the third program instruction causes the processor to execute step S203 in fig. 2, the fourth program instruction causes the processor to execute step S204 in fig. 2, the fifth program instruction causes the processor to execute step S205 in fig. 2, and the sixth program instruction causes the processor to execute step S206 in fig. 2.
In some embodiments of the disclosed computer program product for image style conversion, the content weight coefficient is set to 7.5 and the style weight coefficient is set to 120, so that the optimal transformation map generated by the style transformation neural network model is more aesthetically pleasing.
In some embodiments of the disclosed computer program product for image style conversion, the number of filters used by the style transformation neural network model is set to 32 to balance the user experience against the color richness of the optimal transformation map.
The computer program product for image style conversion disclosed in the present invention further loads, via the computer, a seventh program instruction causing the processor to perform a preprocessing procedure on the style map before the style map is input into the second convolutional neural network model, so as to adjust the style map such that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map, thereby obtaining an optimal transformation map with the most aesthetically pleasing texture.
In some embodiments of the computer program product for image style conversion disclosed in the present invention, the style weight coefficient is set to a value above 10000, so that the optimal transformation map generated by the style transformation neural network model exhibits the effect of thin-film interference.
Reference numerals, such as "first", "second", etc., in the description and in the claims are used for convenience of description only and do not have a sequential relationship with each other.
The above paragraphs describe the invention from multiple aspects. It should be apparent that the teachings herein may be implemented in a wide variety of ways, and that any specific architecture or functionality disclosed in the examples is merely representative. Based on the teachings herein, one skilled in the art will appreciate that each aspect disclosed herein may be implemented independently, or that two or more aspects may be implemented in combination.
Although the present disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the disclosure, and therefore, the scope of the invention is to be determined by the appended claims.

Claims (10)

1. A method of image style conversion, comprising the steps of:
inputting a content graph and a style graph into a second convolutional neural network model, wherein the second convolutional neural network model extracts a plurality of first feature graphs of the content graph and a plurality of second feature graphs of the style graph;
inputting the content graph into a style transformation neural network model, wherein the style transformation neural network model uses a specific number of filters to perform convolution operation on the content graph so as to generate a transformation graph;
inputting the transformation diagram into the second convolutional neural network model, wherein the second convolutional neural network model extracts a plurality of third feature diagrams of the transformation diagram;
calculating a content loss according to the first feature maps and the third feature maps, and calculating a style loss according to the second feature maps and the third feature maps;
adding the result obtained by multiplying the content loss by a content weight coefficient and the result obtained by multiplying the style loss by a style weight coefficient to obtain a total loss, wherein the style weight coefficient is 16 times of the content weight coefficient;
the model of the neural network is recursively optimized using a gradient descent method to minimize the total loss, thereby obtaining an optimal transformation map.
2. The method of image style conversion as claimed in claim 1, wherein the content weight coefficient is 7.5 and the style weight coefficient is 120.
3. The method of image style conversion according to claim 1 or 2, wherein the specific number is 32.
4. The method of image style conversion according to claim 1 or 2, further comprising:
before the style map is input into the second convolutional neural network model, performing a preprocessing procedure on the style map to adjust the style map so that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map.
5. The method of image style conversion according to claim 1, wherein the style weighting factor is above 10000.
6. A computer program product for image style conversion, which is loaded by a computer to execute the following program instructions:
the first program instruction enables a processor to input a content graph and a style graph into a second convolutional neural network model, wherein the second convolutional neural network model extracts a plurality of first feature graphs of the content graph and a plurality of second feature graphs of the style graph;
second program instructions for causing the processor to input the content map into a style-transforming neural network model, the style-transforming neural network model performing a convolution operation on the content map using a specified number of filters to generate a transformed map;
third program instructions that cause the processor to input the transformed graph to the second convolutional neural network model, the second convolutional neural network model extracting a plurality of third feature maps of the transformed graph;
fourth program instructions for causing the processor to calculate a content loss based on the first feature maps and the third feature maps, and calculate a style loss based on the second feature maps and the third feature maps;
fifth program instructions to cause the processor to add a result of multiplying the content loss by a content weight coefficient and a result of multiplying the style loss by a style weight coefficient to obtain a total loss, wherein the style weight coefficient is 16 times the content weight coefficient;
sixth program instructions that cause the processor to recursively optimize the style transformation neural network model using a gradient descent method such that the total loss is minimized, so as to obtain an optimal transformation map.
7. The computer program product of claim 6, wherein the content weight factor is 7.5 and the style weight factor is 120.
8. The computer program product of claim 6 or 7, wherein the specific number is 32.
9. The computer program product of claim 6 or 7, further comprising seventh program instructions loaded by the computer, which cause the processor to perform a preprocessing procedure on the style map before the style map is input into the second convolutional neural network model, so as to adjust the style map such that the area of the blank portion of the adjusted style map is 25% of the area of the adjusted style map.
10. The computer program product of claim 6, wherein the style weight coefficient is above 10000.
CN202011006694.1A 2020-09-23 2020-09-23 Method for converting image style and computer program product thereof Pending CN114255158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011006694.1A CN114255158A (en) 2020-09-23 2020-09-23 Method for converting image style and computer program product thereof


Publications (1)

Publication Number Publication Date
CN114255158A true CN114255158A (en) 2022-03-29





Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination