TWI723547B - Style transfer method and computer program product thereof - Google Patents
- Publication number
- TWI723547B (application TW108133761A)
- Authority
- TW
- Taiwan
- Prior art keywords
- feature
- neural network
- map
- original image
- style
- Prior art date
Landscapes
- Image Analysis (AREA)
Abstract
Description
The present invention relates to a style transfer method, and more particularly to a fast style transfer method applied to a web interface, and to a computer program product thereof.
Style transfer is a technique that uses artificial intelligence (AI) visual algorithms to extract the content of a content image and the style of a style image, and then synthesize the two into a single result.
Web browsing is currently the most widely used medium in society. To make style transfer applications visible to more people, the first priority is enabling a large style transfer algorithm to produce its final synthesized result quickly in a web page. This task is all the more difficult now, when the various hardware AI accelerators have not yet taken web applications into account.
A style transfer method according to an embodiment of the present invention, applied in a web interface, includes: obtaining a first convolutional neural network (CNN) model, wherein the first CNN model is trained to extract features from a content image and includes a plurality of convolution layers, the convolution layers including a first specific number of feature filters and a second specific number of feature filters; reducing the first specific number of feature filters to one-eighth of its original count; reducing the second specific number of feature filters to one-eighth of its original count; inputting the content image into the first CNN model; and performing a convolution operation through the first CNN model to generate a plurality of first feature maps of the content image. Each feature filter extracts a different feature of the content image.
The style transfer method as described above further includes: obtaining a second convolutional neural network model; inputting the plurality of first feature maps of the content image and a style image into the second CNN model; extracting, through the convolution operation, the spatial feature maps associated with the content image from the plurality of first feature maps, and the non-spatial feature maps associated with the style image from the style image; merging, through a loss function operation, the spatial feature maps associated with the content image with the non-spatial feature maps associated with the style image to obtain a synthesized image after style transfer; and displaying the synthesized image in the web interface.
In the style transfer method as described above, the first CNN model convolves the content image with the first specific number and the second specific number of feature filters to extract and generate the plurality of first feature maps of the content image.
In the style transfer method as described above, the spatial feature maps include shape feature maps and boundary feature maps, and the non-spatial feature maps include color feature maps and texture feature maps.
In the style transfer method as described above, the first specific number is 32 and the second specific number is 64.
In the style transfer method as described above, the second convolutional neural network model is VGG19.
In the style transfer method as described above, the loss function operation includes: computing a first feature difference between the spatial feature maps associated with the content image and the synthesized image; computing a second feature difference between the non-spatial feature maps associated with the style image and the synthesized image; summing the first feature difference and the second feature difference to obtain the loss function; and minimizing the loss function through a gradient descent method to obtain the synthesized image.
A computer program product according to an embodiment of the present invention is applied in a web interface to perform style transfer on a content image and a style image. The computer program product is loaded by a computer to execute: a first call instruction, a filter setting instruction, a first input instruction, and a first feature extraction instruction. The first call instruction causes a processor of the computer to obtain a first convolutional neural network model from the computer's storage; the first CNN model is trained to extract the features of the content image and includes a plurality of convolution layers, the convolution layers including a first specific number of feature filters and a second specific number of feature filters. The filter setting instruction causes the processor to reduce the first specific number of feature filters to one-eighth of its original count, and to reduce the second specific number of feature filters to one-eighth of its original count. The first input instruction causes the processor to input the content image into the first CNN model. The first feature extraction instruction causes the processor to perform a convolution operation through the first CNN model to generate a plurality of first feature maps of the content image. Each feature filter extracts a different feature of the content image.
The computer program product as described above further includes: a second call instruction, a second input instruction, a second feature extraction instruction, a synthesis instruction, and a display instruction. The second call instruction causes the processor to obtain a second convolutional neural network model from the computer's storage. The second input instruction causes the processor to input the plurality of first feature maps of the content image and the style image into the second CNN model. The second feature extraction instruction causes the processor, through the convolution operation, to extract from the plurality of first feature maps the spatial feature maps associated with the content image, and to extract from the style image the non-spatial feature maps associated with the style image. The synthesis instruction causes the processor to merge, through a loss function operation, the spatial feature maps associated with the content image with the non-spatial feature maps associated with the style image to obtain a synthesized image after style transfer. The display instruction causes the processor to display the synthesized image in the web interface.
In the computer program product as described above, the first CNN model convolves the content image with the first specific number and the second specific number of feature filters to extract and generate the plurality of first feature maps of the content image.
In the computer program product as described above, the spatial feature maps include shape feature maps and boundary feature maps, and the non-spatial feature maps include color feature maps and texture feature maps.
In the computer program product as described above, the first specific number is 32 and the second specific number is 64.
In the computer program product as described above, the second convolutional neural network model is VGG19.
In the computer program product as described above, the loss function operation includes: computing a first feature difference between the spatial feature maps associated with the content image and the synthesized image; computing a second feature difference between the non-spatial feature maps associated with the style image and the synthesized image; summing the first feature difference and the second feature difference to obtain the loss function; and minimizing the loss function through a gradient descent method to obtain the synthesized image.
The style transfer method of the present invention is applied in a web interface. The web interface uses, for example, the Web Graphics Library (WebGL) to render interactive 3D or 2D graphics in any compatible web browser. In this embodiment, the user can upload a content image and a style image to the web interface, or upload the content image and select a style image provided by the web page instead. The style transfer method of the present invention then extracts, through convolution operations, the content from the content image and the style from the style image, synthesizes the two, and finally outputs a synthesized image in the web interface, achieving the effect of style transfer.
FIG. 1 is a schematic diagram of the convolution operation of an embodiment of the disclosure. As shown in FIG. 1, an input image 100 (for example the content image, represented here as a 7*7 matrix) is convolved with a feature filter 102 (represented here as a 3*3 matrix) to obtain a feature map 104 (a 5*5 matrix after the convolution operation). The more similar a feature in the input image 100 is to the feature filter 102, the larger the corresponding convolution value in the resulting feature map 104.
For example, as shown in FIG. 1, a partial feature 110 of the input image 100 (represented as a 3*3 matrix) is convolved with the feature filter 102. Because the partial feature 110 is identical to the feature filter 102, the convolution operation yields its corresponding convolution value of "4" in the feature map 104. By contrast, a partial feature 112 of the input image 100 is completely different from the feature filter 102, so the convolution operation yields its corresponding convolution value of "0" in the feature map 104. Through the convolution operation, the feature filter 102 thus extracts from the input image 100 the features most similar to the filter itself, producing the corresponding feature map 104. The values in the matrices of the input image 100, the feature filter 102, and the feature map 104 in FIG. 1 are merely illustrative and do not limit the invention.
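The multiply-and-sum sliding-window computation described above can be written in a few lines of Python. The following is a minimal sketch for illustration only; the example matrices are assumptions chosen to reproduce the values "4" and "0" of FIG. 1, not data from the disclosure.

import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" 2D cross-correlation: slide the kernel over the image,
    # multiply element-wise, and sum. A 7*7 input and a 3*3 kernel
    # yield a 5*5 feature map, as in FIG. 1.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Hypothetical binary filter with four 1s; the top-left patch of the input
# is made identical to it, so that position convolves to 4, while an
# all-zero patch convolves to 0.
kernel = np.array([[1, 0, 1],
                   [0, 0, 0],
                   [1, 0, 1]])
image = np.zeros((7, 7))
image[0:3, 0:3] = kernel
feature_map = conv2d_valid(image, kernel)
print(feature_map[0, 0], feature_map[4, 4])  # 4.0 0.0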
The style transfer method of the present invention extracts the features of the content image using a trained first convolutional neural network (CNN) model. Generally speaking, the first CNN model is an algorithm. Taking Python syntax as an example, the algorithm includes a plurality of convolution function calls, such as conv1 = _conv_layer(image, 32, 9, 1), where _conv_layer is the convolution function; depending on the application, it can be configured with a different input image, feature filter number (e.g. 32), feature filter size (e.g. 9), and convolution stride (e.g. 1).
FIG. 2 is a schematic diagram of a first convolutional neural network model 200 of an embodiment of the disclosure. As shown in FIG. 2, the trained first CNN model 200 includes a plurality of convolution layers (for example convolution layers 210, 212, 214, and 216) and a plurality of residual blocks (for example residual blocks 218, 220, 222, 224, 226, and 228). Each convolution layer and each residual block of the first CNN model 200 corresponds to a different convolution function call in its algorithm.
For example, the convolution layer 210 corresponds to the convolution function call conv1 = _conv_layer(image, 32, 9, 1), and the residual block 218 corresponds to the convolution function call conv3 = _conv_layer(conv2, 128, 3, 2). Therefore, through the configuration of these convolution function calls (for example, presetting the number or type of feature filters), the trained first CNN model 200 has a plurality of preset feature filters for extracting different features of the input content image.
For example, when the content image is input to the convolution layer 210, the content image is convolved with each of the 32 different feature filters in the convolution layer 210 (such as the feature filter 102 of FIG. 1) to capture 32 different features of the content image (such as shapes, boundaries, colors, textures, and so on), producing 32 different feature maps (such as the feature map 104 of FIG. 1). Similarly, the 32 feature maps produced by the convolution layer 210 are input in turn to the convolution layer 212, where the 64 different feature filters again capture features, producing 64 different feature maps from each input feature map.
Simply put, the content image is input to the convolution layer 210 of the first CNN model 200, passes in order through the convolution operations of the convolution layer 212, the residual blocks 218 to 228, the convolution layer 214, and the convolution layer 216, and the plurality of first feature maps of the content image are output from the convolution layer 216. In this embodiment, each feature filter in the convolution layers (convolution layers 210 to 216) performs one convolution operation on its input, while each feature filter in the residual blocks (residual blocks 218 to 228) performs two convolution operations on its input.
In this embodiment, the first convolutional neural network model 200 can, for example, be the following algorithm (programmed in Python syntax as an example):
def net(image):
    # Downsampling convolution layers: (input, filter number, filter size, stride)
    conv1 = _conv_layer(image, 32, 9, 1)
    conv2 = _conv_layer(conv1, 64, 3, 2)
    conv3 = _conv_layer(conv2, 128, 3, 2)
    # Five residual blocks, each carrying 128 feature filters of size 3
    resid1 = residual_block(conv3, 3)
    resid2 = residual_block(resid1, 3)
    resid3 = residual_block(resid2, 3)
    resid4 = residual_block(resid3, 3)
    resid5 = residual_block(resid4, 3)
    # Upsampling (transposed-convolution) layers back toward the image resolution
    conv_t1 = conv_transpose_layer(resid5, 64, 3, 2)
    conv_t2 = conv_transpose_layer(conv_t1, 32, 3, 2)
    …
The line conv1 = _conv_layer(image, 32, 9, 1) of the algorithm inputs the content image into the convolution layer 210, which has 32 feature filters.
The line conv2 = _conv_layer(conv1, 64, 3, 2) inputs the output feature maps computed by the convolution layer 210 into the convolution layer 212, which has 64 feature filters.
The line conv3 = _conv_layer(conv2, 128, 3, 2) inputs the output feature maps computed by the convolution layer 212 into the residual block 218, which has 128 feature filters.
The line resid1 = residual_block(conv3, 3) inputs the output feature maps computed by the residual block 218 into the residual block 220, which also has 128 feature filters.
The line resid2 = residual_block(resid1, 3) inputs the feature maps computed by the residual block 220 into the residual block 222, which also has 128 feature filters.
The line resid3 = residual_block(resid2, 3) inputs the feature maps computed by the residual block 222 into the residual block 224, which also has 128 feature filters.
The line resid4 = residual_block(resid3, 3) inputs the feature maps computed by the residual block 224 into the residual block 226, which also has 128 feature filters.
The line resid5 = residual_block(resid4, 3) inputs the feature maps computed by the residual block 226 into the residual block 228, which also has 128 feature filters.
The line conv_t1 = conv_transpose_layer(resid5, 64, 3, 2) inputs the feature maps computed by the residual block 228 into the convolution layer 214, which has 64 feature filters.
The line conv_t2 = conv_transpose_layer(conv_t1, 32, 3, 2) inputs the feature maps computed by the convolution layer 214 into the convolution layer 216, which has 32 feature filters.
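The listing above only calls _conv_layer, residual_block, and conv_transpose_layer; the disclosure does not give their bodies. Below is a minimal TensorFlow sketch of what such helpers might look like. The weight initialization, ReLU placement, and "SAME" padding are assumptions for illustration, not the patented implementation.

import tensorflow as tf

def _conv_layer(net, num_filters, filter_size, stride):
    # One convolution: num_filters feature filters of size
    # filter_size x filter_size, followed by a ReLU non-linearity.
    in_channels = int(net.shape[-1])
    weights = tf.Variable(tf.random.truncated_normal(
        [filter_size, filter_size, in_channels, num_filters], stddev=0.1))
    net = tf.nn.conv2d(net, weights, strides=[1, stride, stride, 1], padding='SAME')
    return tf.nn.relu(net)

def residual_block(net, filter_size):
    # Two convolutions plus a skip connection; the filter count
    # (128 in model 200) is inherited from the input's channel dimension,
    # which is why each filter here convolves its input twice.
    num_filters = int(net.shape[-1])
    tmp = _conv_layer(net, num_filters, filter_size, 1)
    return net + _conv_layer(tmp, num_filters, filter_size, 1)

def conv_transpose_layer(net, num_filters, filter_size, stride):
    # Upsampling counterpart of _conv_layer using a transposed convolution.
    in_channels = int(net.shape[-1])
    h, w = int(net.shape[1]), int(net.shape[2])
    batch = tf.shape(net)[0]
    weights = tf.Variable(tf.random.truncated_normal(
        [filter_size, filter_size, num_filters, in_channels], stddev=0.1))
    out_shape = tf.stack([batch, h * stride, w * stride, num_filters])
    net = tf.nn.conv2d_transpose(net, weights, out_shape,
                                 strides=[1, stride, stride, 1], padding='SAME')
    return tf.nn.relu(net)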
FIGS. 3A and 3B are flowcharts of the style transfer method of an embodiment of the disclosure. As shown in FIG. 3A, in step S300 the style transfer method of the present invention first obtains a first convolutional neural network model (for example, the first CNN model 200 of FIG. 2). Next, the method executes step S302, reducing a first specific number of feature filters to one-eighth of its original count, and in step S304 reduces a second specific number of feature filters to one-eighth of its original count.
FIG. 4 is a schematic diagram of a first convolutional neural network model 400 of an embodiment of the disclosure. In this embodiment, the style transfer method of the present invention reduces the 32 feature filters in the convolution layers 210 and 216 of the first CNN model 200 to 4 feature filters, correspondingly producing the convolution layers 410 and 416 of the first CNN model 400. The method likewise reduces the 64 feature filters in the convolution layers 212 and 214 of the first CNN model 200 to 8 feature filters, correspondingly producing the convolution layers 412 and 414 of the first CNN model 400.
Furthermore, the method reduces the 128 feature filters in each of the residual blocks 218, 220, 222, 224, 226, and 228 of the first CNN model 200 to 16 feature filters, correspondingly producing the residual blocks 418, 420, 422, 424, 426, and 428 of the first CNN model 400. As a result, the file size of the first CNN model (a TensorFlow file) is reduced from the original 6580 KB for model 200 to 148 KB for model 400.
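Assuming the same helper signatures as the listing above, the reduced model 400 could be defined as in the following sketch; only the filter numbers change (32 to 4, 64 to 8, 128 to 16), which is what shrinks the stored model file.

def net_reduced(image):
    # Same topology as model 200; every filter count is cut to one-eighth.
    conv1 = _conv_layer(image, 4, 9, 1)    # convolution layer 410 (was 32)
    conv2 = _conv_layer(conv1, 8, 3, 2)    # convolution layer 412 (was 64)
    conv3 = _conv_layer(conv2, 16, 3, 2)   # residual block 418 (was 128)
    resid1 = residual_block(conv3, 3)      # residual block 420
    resid2 = residual_block(resid1, 3)     # residual block 422
    resid3 = residual_block(resid2, 3)     # residual block 424
    resid4 = residual_block(resid3, 3)     # residual block 426
    resid5 = residual_block(resid4, 3)     # residual block 428
    conv_t1 = conv_transpose_layer(resid5, 8, 3, 2)   # convolution layer 414 (was 64)
    conv_t2 = conv_transpose_layer(conv_t1, 4, 3, 2)  # convolution layer 416 (was 32)
    return conv_t2  # any final output layer is elided in the original listing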
In step S306, the style transfer method of the present invention inputs the content image into the first CNN model 400. Next, in step S308, the method performs a convolution operation through the first CNN model 400 to generate the plurality of first feature maps of the content image. For example, the 4 feature filters in the convolution layer 410 of the first CNN model 400 capture 4 different features of the content image, and the 8 feature filters in the convolution layer 412 capture 8 different features of the content image.
After obtaining the plurality of first feature maps of the content image, the style transfer method of the present invention executes step S310 to obtain a second convolutional neural network model. In this embodiment, the second CNN model is also an algorithm, for example Visual Geometry Group 19 (VGG19); it likewise has a plurality of convolution layers and, through the principles of image recognition, classifies the images or pictures input to it.
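As a sketch of how a pretrained VGG19 might serve as the second, fixed feature extractor (the disclosure names VGG19 but neither a framework nor specific layers, so the Keras API and the layer choices below are assumptions):

import tensorflow as tf

# Pretrained VGG19 without its classifier head; used only to extract features.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False

# Hypothetical layer selection: a deep layer for spatial (content) features,
# several shallow-to-deep layers for non-spatial (style) features.
content_layers = ['block4_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1']
outputs = [vgg.get_layer(name).output for name in content_layers + style_layers]
feature_extractor = tf.keras.Model(vgg.input, outputs)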
In step S312, the style transfer method of the present invention inputs the plurality of first feature maps of the content image and the style image into the second CNN model, and in step S314 the second CNN model, through the convolution operation, extracts from the first feature maps the spatial feature maps associated with the content image (such as shape feature maps and boundary feature maps), and extracts from the style image the non-spatial feature maps associated with the style image (such as color feature maps and texture feature maps). The principle by which the second CNN model performs the convolution operation is the same as for the first CNN model 200 of FIG. 2 and the first CNN model 400 of FIG. 4, and is not repeated here.
Next, in step S316, the style transfer method of the present invention merges, through a loss function operation, the spatial feature maps associated with the content image with the non-spatial feature maps associated with the style image to obtain a synthesized image after style transfer. Finally, in step S318, the method displays the synthesized image in the web interface.
In step S316, the loss function operation performed by the style transfer method of the present invention includes: computing a first feature difference between the spatial feature maps associated with the content image and the synthesized image; computing a second feature difference between the non-spatial feature maps associated with the style image and the synthesized image; summing the first feature difference and the second feature difference to obtain the loss function; and minimizing the loss function through a gradient descent method to obtain the synthesized image. The gradient descent method is a first-order optimization algorithm; it finds a local minimum of the loss function, thereby minimizing it.
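A minimal sketch of such a loss function operation, assuming squared-error feature differences and Gram matrices as the non-spatial statistic (a common choice in the style transfer literature that the disclosure does not itself specify):

import tensorflow as tf

def gram_matrix(features):
    # Non-spatial statistic: channel-to-channel correlations with the
    # spatial dimensions summed out, so shape and position information is lost.
    result = tf.einsum('bijc,bijd->bcd', features, features)
    num_positions = tf.cast(tf.shape(features)[1] * tf.shape(features)[2], tf.float32)
    return result / num_positions

def total_loss(content_feats, synth_content_feats, style_feats, synth_style_feats):
    # First feature difference: spatial (content) features vs. the synthesized image.
    first_gap = tf.add_n([tf.reduce_mean(tf.square(c - s))
                          for c, s in zip(content_feats, synth_content_feats)])
    # Second feature difference: non-spatial (style) statistics vs. the synthesized image.
    second_gap = tf.add_n([tf.reduce_mean(tf.square(gram_matrix(a) - gram_matrix(b)))
                           for a, b in zip(style_feats, synth_style_feats)])
    # Summing the two differences gives the loss that gradient descent minimizes,
    # e.g. with tf.keras.optimizers.SGD and tf.GradientTape over the synthesized image.
    return first_gap + second_gap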
In this embodiment, when the resolution of the input content image or style image is 256*256 and the first CNN model 200 of FIG. 2 is used together with the second CNN model to perform style transfer (that is, executing steps S300, S306, S308, S310, S312, S314, S316, and S318 of FIGS. 3A and 3B, but not steps S302 and S304), the processing time for performing style transfer in the web interface is 1710 microseconds per image. In other words, from the moment the user uploads the content image and the style image to the web interface until the style transfer method of the present invention displays the final synthesized image in the web interface, the style transfer processing takes 1710 microseconds.
In another embodiment, when the resolution of the input content image or style image is 256*256 and the first CNN model 400 of FIG. 4 is used together with the second CNN model to perform style transfer (that is, executing all steps of FIGS. 3A and 3B, including steps S302 and S304), the processing time for performing style transfer in the web interface is only 130 microseconds per image. In other words, by reducing the number of feature filters in the first CNN model 200 to form the first CNN model 400, the style transfer method of the present invention lowers the total execution time of style transfer.
In another embodiment, when the resolution of the input content image or style image is 480*480 and the first CNN model 400 of FIG. 4 is used together with the second CNN model to perform style transfer (executing all steps of FIGS. 3A and 3B, including steps S302 and S304), the processing time is only 270 microseconds per image, still far faster than the 1710 microseconds per image taken by the first CNN model 200 together with the second CNN model on a 256*256 input.
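The disclosure does not describe its timing setup; a hypothetical way to measure the per-image processing time quoted above would be a simple wall-clock average:

import time

def microseconds_per_image(transfer_fn, images):
    # Average end-to-end wall-clock time per image, in microseconds,
    # for any callable that performs the full style transfer.
    start = time.perf_counter()
    for image in images:
        transfer_fn(image)
    return (time.perf_counter() - start) / len(images) * 1e6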
Although reducing the number of feature filters also reduces the number of feature points extracted from the content image or the style image, degrading the quality of the synthesized image after style transfer, the style transfer method of the present invention deliberately sacrifices a little synthesis quality: provided the user finds the visual result acceptable, the processing time of style transfer is effectively reduced, improving the user experience of performing style transfer in the web interface.
The present invention further discloses a computer program product, applied in a web interface, for performing style transfer on a content image and a style image. The computer program product is loaded by a computer to execute: a first call instruction, a filter setting instruction, a first input instruction, a first feature extraction instruction, a second call instruction, a second input instruction, a second feature extraction instruction, a synthesis instruction, and a display instruction. The first call instruction causes a processor of the computer to execute step S300 of FIG. 3A. The filter setting instruction causes the processor to execute steps S302 and S304 of FIG. 3A. The first input instruction causes the processor to execute step S306 of FIG. 3A. The first feature extraction instruction causes the processor to execute step S308 of FIG. 3A.
According to the computer program product disclosed by the present invention, the second call instruction causes the processor to execute step S310 of FIG. 3B. The second input instruction causes the processor to execute step S312 of FIG. 3B. The second feature extraction instruction causes the processor to execute step S314 of FIG. 3B. The synthesis instruction causes the processor to execute step S316 of FIG. 3B. Finally, the display instruction causes the processor to execute step S318 of FIG. 3B.
When the computer program product disclosed by the present invention executes the synthesis instruction, the processor performs a loss function operation that includes: computing a first feature difference between the spatial feature maps associated with the content image and the synthesized image; computing a second feature difference between the non-spatial feature maps associated with the style image and the synthesized image; summing the first feature difference and the second feature difference to obtain the loss function; and minimizing the loss function through a gradient descent method to obtain the synthesized image.
The style transfer method and computer program product disclosed by the present invention allow the style transfer function to be realized on low-end hardware devices through a web interface, so that users can smoothly experience the appeal of style transfer even on low-end hardware.
Although embodiments of the present invention are described above, it should be understood that they are presented as examples and not as limitations. Many changes to the above exemplary embodiments can be made without departing from the spirit and scope of the invention. Therefore, the breadth and scope of the present invention should not be limited by the embodiments described above; rather, the scope of the present invention should be defined by the following claims and their equivalents.
100 ~ input image
102 ~ feature filter
104 ~ feature map
110, 112 ~ partial features
200, 400 ~ first convolutional neural network models
210, 212, 214, 216 ~ convolution layers
410, 412, 414, 416 ~ convolution layers
218, 220, 222 ~ residual blocks
224, 226, 228 ~ residual blocks
418, 420, 422 ~ residual blocks
424, 426, 428 ~ residual blocks
S300, S302, S304, S306, S308 ~ steps
S310, S312, S314, S316, S318 ~ steps
FIG. 1 is a schematic diagram of the convolution operation of an embodiment of the disclosure.
FIG. 2 is a schematic diagram of a first convolutional neural network model 200 of an embodiment of the disclosure.
FIG. 3A is a flowchart of the style transfer method of an embodiment of the disclosure.
FIG. 3B is a flowchart of the style transfer method of an embodiment of the disclosure.
FIG. 4 is a schematic diagram of a first convolutional neural network model 400 of an embodiment of the disclosure.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW108133761A TWI723547B (en) | 2019-09-19 | 2019-09-19 | Style transfer method and computer program product thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202113687A TW202113687A (en) | 2021-04-01 |
TWI723547B (en) | 2021-04-01
Family
ID=76604234
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107180410A (en) * | 2017-04-11 | 2017-09-19 | 中国农业大学 | The stylized method for reconstructing and device of a kind of image |
US20180158224A1 (en) * | 2015-07-31 | 2018-06-07 | Eberhard Karls Universitaet Tuebingen | Method and device for image synthesis |
TWM569845U (en) * | 2018-07-12 | 2018-11-11 | 卓峰智慧生態有限公司 | Leather detection equipment and leather product production system based on artificial intelligence |
CN109933982A (en) * | 2017-12-15 | 2019-06-25 | 英特尔公司 | Use the malware detection and classification of artificial neural network |
US20190252073A1 (en) * | 2018-02-12 | 2019-08-15 | Ai.Skopy, Inc. | System and method for diagnosing gastrointestinal neoplasm |