WO2020088280A1 - 图像风格迁移方法和系统 - Google Patents

图像风格迁移方法和系统 Download PDF

Info

Publication number
WO2020088280A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
picture
content
loss function
image
Prior art date
Application number
PCT/CN2019/111968
Other languages
English (en)
French (fr)
Inventor
李强
张�雄
郑文
Original Assignee
北京达佳互联信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2020088280A1 publication Critical patent/WO2020088280A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map

Definitions

  • This application belongs to the field of computer software applications, in particular to a method and system for image style transfer.
  • Photorealistic Style Transfer studies the style conversion between natural pictures. For example: converting a portrait picture into a portrait picture with oil painting style; or converting a landscape photo taken in dim light into a landscape picture under bright light, etc.
  • the specific photorealistic style transfer process includes: designating an input picture as the base picture, also known as the content picture, and designating one or more pictures with the desired image style as the style picture. While preserving the structure of the content picture, its image style is converted; the final converted picture is a stylized picture that combines the image content of the content picture with the image style of the style picture.
  • Deep learning networks can be used to extract features that assist image style transfer.
  • the use of deep learning networks to assist image style transfer can be specifically divided into two categories: the first category is based on optimized networks to assist image style transfer, and the second category is based on pre-networks to assist image style transfer.
  • the first type uses a Convolutional Neural Network (CNN) to extract the features of the content pictures and style pictures, constructs a Gram matrix to define the content loss function and the style loss function, and then obtains the stylized target picture by optimization.
  • the second type uses an error back propagation (BP) algorithm to train a front-end network, and then performs image style transfer on a given content picture with the trained front-end network to obtain a stylized content picture.
  • Although this method achieves rapid image style transfer, because of the content loss and style loss used in the process it cannot obtain high-fidelity stylized content pictures; that is, even if the input content picture and style picture are both real photos, the output picture still looks like a painting, with distorted details such as warped clothing.
  • the embodiments of the present application disclose an image style transfer method and system, which can increase the speed of image style transfer while retaining the true details of the content pictures to the greatest extent.
  • a method for transferring image styles including:
  • the first neural network performs style transfer on the first content picture to obtain a second content picture
  • the second neural network calculates the loss between the second content picture and the first content picture and between the second content picture and the style picture based on the loss function
  • the given third content picture is style-transferred to obtain a fourth content picture.
  • an image style transfer system including:
  • the acquisition module is used to acquire the style picture and the first content picture
  • a first processing module configured to transfer the style of the first content picture according to the first neural network to obtain a second content picture
  • the second processing module is configured to calculate the loss between the second content picture and the first content picture and between the second content picture and the style picture based on a loss function according to the second neural network, according to The loss optimizes the first neural network;
  • the execution module is configured to perform style transfer on the given third content picture based on the optimized first neural network to obtain a fourth content picture.
  • an electronic device including:
  • a memory for storing processor-executable instructions
  • the processor is configured to execute the image style transfer method described in the first aspect above.
  • a non-transitory computer-readable storage medium stores computer instructions which, when executed, implement the image style transfer method described in the first aspect above.
  • a computer program product includes program instructions, and when the program instructions are executed by an electronic device, the electronic device is caused to execute the image style transfer method described in the first aspect above.
  • a semantic segmentation algorithm is used to enhance the style loss function in the second neural network to avoid semantically wrong transfers during the transfer of image styles.
  • the second neural network uses the image loss function to optimize the first neural network during the training phase, which can retain the true details of the input picture to the greatest extent during the image style transfer process.
  • the image style transfer model provided in the embodiment of the present application only needs to be trained once, unlike the prior art image style transfer model that needs to be retrained each time it is used, which improves the utilization efficiency of the image style transfer model.
  • Fig. 1 is a flowchart of a method for transferring image styles according to an exemplary embodiment
  • Fig. 2 is a structural diagram of an image style transfer system according to an exemplary embodiment
  • Fig. 3 is a block diagram of an image style transfer system according to an exemplary embodiment
  • Fig. 4 is a block diagram of an optimization unit according to an exemplary embodiment
  • Fig. 5 is a block diagram of an electronic device for performing a method for transferring image styles according to an exemplary embodiment.
  • Fig. 1 is a flowchart of a method for transferring image style according to an exemplary embodiment, including the following steps.
  • step S110 the style picture and the first content picture are obtained.
  • the style picture is a picture with the image style that the content picture should present after the image style transfer
  • the content picture is the picture on which the image style transfer is to be performed.
  • the final output stylized picture combines the image content of the content picture with the image style of the style picture.
  • the image style is the style of the picture.
  • For example: oil painting style, bright light style, nostalgic style, denim style.
  • Different image styles can be expressed with different textures, tones, brush strokes, etc.
  • image style transfer adjusts the image style of one picture to another image style. For example: converting a portrait picture into a portrait picture with oil painting style; or converting a landscape photo taken in dim light into a landscape picture under bright light, etc.
  • the content picture is the picture whose image style is to be adjusted
  • the final output image style picture is the content picture whose image style is adjusted to the style picture.
  • the acquired style picture may be one or more pictures; when there are multiple style pictures, they should have the same image style.
  • the acquired first content picture may also be one or more pictures.
  • the multiple first content pictures may have different image styles or the same image style.
  • the image style of the first content picture is different from the image style of the style picture.
  • step S120 the first neural network is used to transfer the style of the first content image to obtain the second content image.
  • style transfer is image style transfer.
  • the electronic device uses the first neural network to transfer the style of the first content picture to obtain the second content picture; that is, the electronic device inputs the first content picture into the first neural network, which processes it and adjusts its image style to the image style of the style picture, obtaining the second content picture. In other words, step S120 is: the first neural network performs image style transfer on the first content picture to obtain the second content picture.
  • the first neural network may use a convolutional neural network (Convolutional Neural Network, CNN).
  • the first neural network performs image style transfer on the first content picture to obtain a stylized second content picture.
  • the input of the first neural network is the first content picture
  • the output of the first neural network is the second content picture.
  • the second content picture is a picture showing a combination of the image content of the first content picture and the image style of the style picture.
  • the image content can be cars, trees, people, etc.
  • step S130 the second neural network is used to calculate the loss between the second content picture and the first content picture and the style picture.
  • the second neural network includes a loss function.
  • the electronic device uses the second neural network to calculate the losses between the second content picture and the first content picture and between the second content picture and the style picture; that is, the electronic device inputs the second content picture, the first content picture, and the style picture into the second neural network, which processes them to obtain these losses. In other words, step S130 is: the second neural network calculates, based on the loss function, the loss between the second content picture and the first content picture and between the second content picture and the style picture.
  • the second neural network may use a VGG (Visual Geometry Group) network.
  • the input of the second neural network is the second content picture and the first content picture and the style picture
  • the output of the second neural network is the loss between the second content picture and the first content picture and the style picture.
  • both the first neural network and the second neural network include multiple convolutional layers (Convolutional Layer), pooling layers (Pooling Layer), and fully connected layers.
  • the convolution layer is the feature extraction layer (Feature Layer).
  • Each convolutional layer is composed of several convolutional units.
  • the convolution unit can be implemented by a filter.
  • the convolutional layer extracts the features of the input picture through a convolution operation.
  • the pooling layer can control overfitting to a certain extent by continuously reducing the size of the data, and then reducing the number of parameters and calculations of the neural network.
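As a concrete illustration of the data-reduction role of pooling described above (a generic sketch, not code from this application), consider a single 2x2 window:

```python
def pool2x2(patch, mode="avg"):
    """Pool a flattened 2x2 patch of feature values.

    Average pooling keeps the mean of the window; max pooling keeps
    the largest value. Either way, 4 input values shrink to 1, which
    is how pooling reduces data size and downstream parameter counts.
    """
    if mode == "avg":
        return sum(patch) / len(patch)
    return max(patch)

print(pool2x2([1.0, 2.0, 3.0, 4.0], "avg"))  # 2.5
print(pool2x2([1.0, 2.0, 3.0, 4.0], "max"))  # 4.0
```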
  • the second neural network also includes a loss function layer, and the loss function layer contains a loss function.
  • the loss function is used to calculate the loss between the predicted result and the actual result of the neural network. Based on the calculated loss, it is possible to decide how to shorten the difference between the predicted result and the actual result of the neural network.
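The role of a loss function can be illustrated with a minimal, generic example (mean squared error; not one of this application's loss functions, which are defined later):

```python
def mse_loss(predicted, actual):
    """Mean squared error between a network's predicted result and the
    actual result: the smaller the loss, the closer the prediction."""
    assert len(predicted) == len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

# A perfect prediction has zero loss; deviations are penalized quadratically.
print(mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```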
  • step S140 the first neural network is optimized according to the above loss.
  • the basic principle of image style transfer is that the stylized picture should maintain the original image content of the content picture; for example, if the image content of the content picture is a car, the image content of the stylized picture should also be a car and cannot turn into a motorcycle.
  • at the same time, the stylized picture must take on the distinctive image style of the style picture, such as texture, color tone, brush strokes, etc.
  • the process shown in the above steps S110-S140 is the training process of the first neural network.
  • the parameters of each convolution unit of the convolution layer are optimized by the back propagation algorithm.
  • the trained first neural network can well adjust the image style of the given content picture to the specific image style of one or more style pictures obtained in step S110.
  • the optimized first neural network is the image style transfer model.
  • the first content picture, the second content picture, and the style picture are input to the second neural network.
  • the second neural network is a loss network and is used to evaluate the difference between the content image output by the first neural network and the original content image and style image.
  • the second neural network calculates the difference between the second content picture obtained through the first neural network and the first content picture and the style picture according to the loss function, and optimizes the first neural network according to the difference.
  • step S150 the given content picture is style-transferred based on the optimized first neural network to obtain a stylized content picture.
  • the optimized first neural network is the first neural network trained according to steps S110-S140.
  • the electronic device inputs the given content picture into the optimized first neural network, which processes it and adjusts its image style to the style of the style picture, obtaining a stylized content picture.
  • referring to the given content picture as the third content picture and to the stylized content picture as the fourth content picture is merely illustrative and not limiting.
  • step S150 is: performing image style transfer on the given third content picture based on the optimized first neural network to obtain a fourth content picture.
  • the fourth content picture is a picture showing a combination of the image content of the third content picture and the image style of the style picture.
  • the loss function may include: a content loss function, a style loss function, and an image loss function.
  • Content loss refers to the loss between the image content of the original content picture and the image content of the stylized content picture.
  • Style loss refers to the loss between the specific visual characteristics of the style picture and the specific visual characteristics of the stylized content picture.
  • the specific visual characteristics of a picture include texture, color, etc.
  • Image loss refers to loss of image detail. Image details include color information, location information of image content, etc.
  • the second neural network uses the content loss function to calculate the content loss between the second content picture and the first content picture; uses the style loss function to calculate the style loss between the second content picture and the style picture; and uses the image loss function to calculate the second Image loss between the content picture and the first content picture.
  • the features output by the lower convolution layer describe the specific visual characteristics of the picture, and the features output by the higher convolution layer describe the more abstract image content.
  • the features output by the convolutional layer of the lower layer are the features of the lower layer
  • the features output by the convolutional layer of the higher layer are the features of the higher layer.
  • the content loss function in the second neural network judges the content similarity between the first content picture and the second content picture (the output of the first neural network) by comparing the similarity of their high-level features. Based on this content similarity, the content loss between the first content picture and the second content picture is determined, and the first neural network is optimized according to the content loss.
  • the similarity of the high-level features of the first content picture and the second content picture may be expressed by the Euclidean distance between those features, or in other ways; the embodiments of the present application do not specifically limit this.
  • the content loss function can be:
  • L_c,i = [1 / (2·N_i·D_i)] · Σ_{j,k} (F_i(O)_{j,k} − F_i(I)_{j,k})²
  • where N_i represents the total number of convolution units in the i-th convolutional layer of the second neural network; D_i represents the total number of pixels in the feature map corresponding to each convolution unit in the i-th convolutional layer; F_i(O) represents the feature matrix of the second content picture in the i-th convolutional layer; F_i(I) represents the feature matrix of the first content picture in the i-th convolutional layer; (j, k) denotes the j-th and k-th feature vectors in the feature matrix; and j and k are natural numbers greater than zero.
  • the second neural network first obtains the N_i feature maps of the first content picture and the second content picture in the i-th convolutional layer, converts the pixels in the feature maps into feature vectors, and arranges the feature vectors horizontally and vertically to obtain the feature matrix F_i(I) of the first content picture and the feature matrix F_i(O) of the second content picture; (j, k) denotes the j-th feature vector in the horizontal direction and the k-th in the vertical direction, where j and k are natural numbers greater than zero.
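The per-layer content-loss computation described above can be sketched in plain Python; the function and variable names are illustrative, and the 1/(N_i·D_i) normalization is an assumption based on the coefficients named above:

```python
def content_loss(f_out, f_in):
    """Content loss for one convolutional layer: the normalized sum of
    squared differences between the feature matrix of the stylized
    output, F_i(O), and that of the content input, F_i(I).

    f_out, f_in: N_i x D_i feature matrices as lists of rows
    (N_i feature maps, each flattened to D_i pixel values).
    """
    n_i = len(f_out)      # number of convolution units / feature maps
    d_i = len(f_out[0])   # pixels per feature map
    diff = sum((o - c) ** 2
               for row_o, row_c in zip(f_out, f_in)
               for o, c in zip(row_o, row_c))
    return diff / (n_i * d_i)

# Identical feature matrices give zero content loss.
print(content_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]))  # 0.0
```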
  • the second neural network judges the style similarity between the second content picture (the output of the first neural network) and the style picture by comparing the similarity of their low-level features. Based on this style similarity, the style loss between the second content picture and the style picture is determined, and the first neural network is optimized according to the style loss.
  • the similarity of the low-level features of the second content picture and the style picture may be expressed by the Euclidean distance between those features, or in other ways; the embodiments of the present application do not specifically limit this.
  • the style loss function can be:
  • L_s,i = [1 / (2·N_i²)] · Σ_{j,k} (G_i(O)_{j,k} − G_i(I)_{j,k})²
  • where N_i represents the total number of convolution units in the i-th convolutional layer of the second neural network; G_i(O) represents the Gram matrix of the second content picture in the i-th convolutional layer; G_i(I) represents the Gram matrix of the style picture in the i-th convolutional layer; (j, k) denotes the j-th and k-th elements of the Gram matrix; and j and k are natural numbers greater than zero.
  • the second neural network first obtains the N_i feature maps of the style picture and the second content picture in the i-th convolutional layer, converts the pixels in the feature maps into feature vectors, computes the inner product between each pair of feature vectors, and arranges the inner products horizontally and vertically into a matrix, obtaining the Gram matrix G_i(I) of the style picture and the Gram matrix G_i(O) of the second content picture; (j, k) denotes the j-th horizontal and k-th vertical element of the Gram matrix, where j and k are natural numbers greater than zero.
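The Gram-matrix construction and per-layer style loss just described can be sketched as follows; the names and the 1/N_i² normalization are illustrative assumptions:

```python
def gram_matrix(features):
    """Gram matrix of one layer's features: entry (j, k) is the inner
    product of feature vector j with feature vector k, which captures
    style (texture/color correlations) while discarding spatial layout.
    features: list of N feature vectors, each of length D."""
    return [[sum(a * b for a, b in zip(fj, fk)) for fk in features]
            for fj in features]

def style_loss(f_out, f_style):
    """Style loss for one layer: squared difference of the two Gram
    matrices, normalized by the number of feature maps squared."""
    g_o, g_s = gram_matrix(f_out), gram_matrix(f_style)
    n = len(f_out)
    return sum((a - b) ** 2
               for row_a, row_b in zip(g_o, g_s)
               for a, b in zip(row_a, row_b)) / (n * n)

# Orthogonal unit feature vectors give the identity Gram matrix.
print(gram_matrix([[1.0, 0.0], [0.0, 1.0]]))  # [[1.0, 0.0], [0.0, 1.0]]
```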
  • the second neural network may also compare the difference in image details between the first content picture and the second content picture according to the image loss function to obtain the image loss, and then optimize the first neural network according to the image loss.
  • the above-mentioned differences in image details include channel loss on the color channel and position deviation of the image content. For example, if the image content is a car, the positional deviation of the image content may be the positional deviation between the car in the first content picture and the car in the second content picture.
  • calculating the image loss between the second content picture and the first content picture based on the image loss function may include: calculating the channel loss of the first content picture and the second content picture on each color channel, and adding the channel losses on the color channels to obtain the image loss between the second content picture and the first content picture.
  • the image loss function can be obtained based on the principle of local affine transformation: each pixel value of the output picture on each color channel can be obtained as a linear combination of the pixels in the corresponding local area of the input picture.
  • the image loss function can be:
  • L_m = Σ_{i=1}^{3} V_i(O)ᵀ · M_I · V_i(O)
  • where V_i(O) represents the vectorized result of the second content picture on the i-th color channel, and M_I represents the Laplacian matting matrix of the first content picture.
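The image loss described above, a quadratic form of each vectorized color channel with the Laplacian matting matrix, can be sketched as follows (a small dense matrix stands in for M_I purely for illustration; in practice M_I is a large sparse matrix):

```python
def quad_form(v, m):
    """Compute v^T * m * v for a vector v and a square matrix m."""
    mv = [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]
    return sum(v_r * mv_r for v_r, mv_r in zip(v, mv))

def image_loss(v_channels, m_lap):
    """Image (photorealism) loss: sum over the three color channels of
    V_i(O)^T * M_I * V_i(O), where V_i(O) is the vectorized output
    picture on channel i and M_I is the matting matrix of the input."""
    return sum(quad_form(v, m_lap) for v in v_channels)

identity = [[1.0, 0.0], [0.0, 1.0]]  # trivial stand-in for M_I
print(image_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], identity))  # 4.0
```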
  • the style loss function may also be optimized based on a semantic segmentation algorithm during the training of the first neural network. The first content picture is semantically segmented with a set of common labels by the semantic segmentation algorithm to obtain a segmentation template; the segmentation template is then added as an additional channel to the input image of the second neural network (i.e., the first content picture) and input into the style loss function, enhancing the style loss function through the semantic segmentation channel.
  • the above general labels include sky, buildings, water, etc.
  • the total loss function of the second neural network can be:
  • L_total = Σ_{i=1}^{L} α_i·L_c,i + λ_s · Σ_{i=1}^{L} β_i·L_s,i + λ_m·L_m
  • where L represents the total number of convolutional layers of the second neural network; i denotes the i-th convolutional layer; L and i are natural numbers greater than zero; λ_s and λ_m are used to adjust the ratio of the style loss function and the image loss function to the content loss function; and α_i and β_i are used to adjust the weight of the i-th convolutional layer of the second neural network.
  • the second neural network calculates the content loss, style loss, and image loss according to the outputs of different convolutional layers to obtain the total loss, and then continuously updates the parameters of the first neural network using optimization methods such as gradient descent and back propagation.
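The combination of the three losses and the parameter update in this step can be sketched as follows (the weighting names mirror the total loss function above; the toy gradient step is illustrative, not the application's actual optimizer):

```python
def total_loss(content_losses, style_losses, image_l,
               alphas, betas, lam_s, lam_m):
    """Combine per-layer content and style losses with the image loss:
    sum_i alpha_i * Lc_i + lam_s * sum_i beta_i * Ls_i + lam_m * Lm."""
    l_c = sum(a * lc for a, lc in zip(alphas, content_losses))
    l_s = sum(b * ls for b, ls in zip(betas, style_losses))
    return l_c + lam_s * l_s + lam_m * image_l

def gradient_step(params, grads, lr=0.01):
    """One gradient-descent update of the first network's parameters."""
    return [p - lr * g for p, g in zip(params, grads)]

# One content layer (loss 1.0), one style layer (loss 2.0), image loss 2.0,
# unit layer weights, lam_s = lam_m = 0.5.
print(total_loss([1.0], [2.0], 2.0, [1.0], [1.0], 0.5, 0.5))  # 3.0
```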
  • Fig. 2 is a structural diagram of an image style transfer system according to an exemplary embodiment.
  • the first neural network uses a convolutional neural network (Convolutional Neural Network, CNN).
  • the second neural network uses a 19-layer VGG (Visual Geometry Group) network.
  • the VGG network contains 16 convolutional layers and 3 pooling layers, and uses average-pooling instead of maximum pooling (Max-pooling).
  • the second neural network may also use other structure VGG networks.
  • the first neural network receives the first content picture I_input and performs stylized image transfer on it to obtain the second content picture I_out.
  • the second neural network receives the first content picture I_input, the second content picture I_out, and the style picture I_style. Based on the outputs of different convolutional layers, the second neural network calculates the content loss based on the content loss function, the style loss based on the style loss function, and the image loss based on the image loss function, and then obtains the total loss. Optimization methods such as gradient descent and back propagation are then used to continuously update the parameters of the first neural network.
  • the second neural network calculates the content loss (Content Loss) between the second content picture and the first content picture according to the output of the conv4_2 convolutional layer; calculates the style losses between the second content picture and the style picture according to the outputs of the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 convolutional layers, and adds these style losses to obtain the total style loss (Style Loss); and calculates the image loss from the pixels on each color channel of the second content picture and the Laplacian matting matrix of the first content picture. The content loss, total style loss, and image loss are summed to obtain the total loss.
  • the image style transfer method provided in the embodiments of the present application has the following beneficial effects.
  • the first neural network is used to transfer the image style of the first content picture to obtain the second content picture; the second neural network then calculates the style loss and content loss of the second content picture based on the various loss functions, and optimization methods such as gradient descent and back propagation are used to update the parameters of the first neural network according to these losses.
  • based on the optimized first neural network, image style transfer is performed on given content pictures, and highly realistic stylized pictures can be obtained.
  • the image style transfer method provided in the embodiment of the present application can, during the training phase of the first neural network, acquire multiple style pictures with the same image style and multiple first content pictures with different image styles, and optimize the first neural network based on them; that is, substantial computing resources are spent optimizing the first neural network once. In the application phase, the optimized first neural network is used directly for image style transfer, so highly realistic stylized pictures can be obtained quickly, achieving real-time image style transfer.
  • a semantic segmentation algorithm is used in the training stage to enhance the style loss function in the second neural network, to avoid semantic erroneous transfer during image style transfer.
  • the second neural network uses the image loss function to optimize the first neural network during the training phase, which can retain the true details of the input picture to the greatest extent during the image style transfer process.
  • the image style transfer model provided in the embodiments of the present application only needs to be trained once, unlike the prior art image style transfer model that needs to be retrained each time it is used, which improves the utilization efficiency of the image style transfer model.
  • Fig. 3 is a block diagram of an image style transfer system according to an exemplary embodiment.
  • the image style transfer system includes: an acquisition module 310, configured to acquire the style picture and the first content picture; a first processing module 320, configured to perform image style transfer on the first content picture according to the first neural network to obtain the second content picture; a second processing module 330, configured to calculate, according to the second neural network and based on the loss function, the losses between the second content picture and the first content picture and the style picture, and to optimize the first neural network according to those losses; and an execution module 340, configured to perform image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
  • both the first neural network and the second neural network include multiple feature extraction layers: convolutional layers, pooling layers, and fully connected layers.
  • Each convolutional layer is composed of several convolutional units (ie filters), and the parameters of each convolutional unit are optimized by a back propagation algorithm.
  • the convolutional layer extracts the features of the input picture through a convolution operation.
  • the pooling layer can control overfitting to a certain extent by continuously reducing the size of the data, and then reducing the number of parameters and calculations of the neural network.
  • the second neural network also includes a loss function layer.
  • the loss function is used to determine how to "punish" the difference between the predicted result and the real result of the neural network during the training process.
  • the first neural network uses CNN.
  • the second neural network uses the VGG network.
  • the second neural network uses a 19-layer VGG network that includes 16 convolutional layers and 3 pooling layers, and uses average pooling instead of maximum pooling.
  • the first neural network receives the first content picture, and performs stylized image migration on the first content picture to obtain the second content picture.
  • the second neural network receives the first content picture, the second content picture, and the style picture.
  • the second neural network calculates the content loss and style loss from the outputs of different convolutional layers, calculates the image loss based on the image loss function, and thus obtains the total loss. Optimization methods such as gradient descent and back-propagation are then used to continuously update the parameters of the first neural network.
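The update loop described above can be caricatured with a toy example: a single scalar "parameter" is repeatedly nudged by gradient descent until a quadratic stand-in for the total loss stops shrinking. This is purely illustrative — the real first network has many convolutional parameters updated by back-propagation:

```python
import numpy as np

# Toy stand-in for the training loop: the "first network" is one scalar
# weight w, the "total loss" is (w - target)^2, and hand-written gradient
# descent plays the role of back-propagation.
target = 3.0      # stands in for the ideal parameter value
w = 0.0           # initial parameter
lr = 0.1          # learning rate
losses = []
for step in range(100):
    loss = (w - target) ** 2
    grad = 2.0 * (w - target)   # d(loss)/dw
    w -= lr * grad              # parameter update, as in step S140
    losses.append(loss)
```

After enough steps the loss shrinks toward zero and `w` approaches the target, mirroring how repeated updates drive the first network's output toward the desired stylization.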
  • the second neural network uses the content loss function to calculate the content loss between the second content picture and the first content picture; and uses the style loss function to calculate the style loss between the second content picture and the style picture.
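A minimal NumPy sketch of the two losses, assuming layer features are stored as a (channels × pixels) matrix. The normalization constants follow the form of the loss functions given later in the description, but the helper names are our own:

```python
import numpy as np

def gram(F):
    """Gram matrix of a (channels, pixels) feature matrix."""
    return F @ F.T

def content_loss(F_out, F_in):
    """Squared feature difference, normalized by 2 * N_i * D_i
    (hedged sketch of the patent's content loss form)."""
    n, d = F_out.shape
    return ((F_out - F_in) ** 2).sum() / (2.0 * n * d)

def style_loss(F_out, F_style):
    """Squared Gram-matrix difference, normalized by 2 * N_i^2
    (hedged sketch of the patent's style loss form)."""
    n = F_out.shape[0]
    return ((gram(F_out) - gram(F_style)) ** 2).sum() / (2.0 * n ** 2)

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 16))    # 4 "channels", 16 "pixels"
```

Identical features give zero loss in both cases, and any deviation in the Gram statistics shows up as a style penalty even when per-pixel content matches.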
  • in order to preserve the real details of the input picture to the greatest extent during image style transfer, the second neural network also compares the differences in detail between the first content picture and the second content picture according to the image loss function (i.e., the image loss), and then optimizes the first neural network according to the image loss.
  • the second processing module 330 may be configured to: calculate the content loss between the second content picture and the first content picture based on the content loss function; calculate the style loss between the second content picture and the style picture based on the style loss function; and calculate the image loss between the second content picture and the first content picture based on the image loss function.
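The per-color-channel image loss that the second processing module computes might be sketched as follows. Note this simplified version uses a plain squared difference per channel, whereas the patent's image loss weights each vectorized channel by a matting-Laplacian matrix of the input picture:

```python
import numpy as np

def image_loss(out_img, in_img):
    """Sum of per-color-channel losses between two (H, W, 3) images.

    Simplified sketch: the per-channel term here is a plain squared
    difference; the description's version is a quadratic form with the
    input picture's matting Laplacian.
    """
    total = 0.0
    for c in range(out_img.shape[2]):   # one channel loss per color channel
        total += ((out_img[:, :, c] - in_img[:, :, c]) ** 2).sum()
    return total

a = np.zeros((2, 2, 3))
b = np.ones((2, 2, 3))
```

Summing the three channel losses gives a single scalar that can be folded into the total loss alongside the content and style terms.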
  • the second processing module 330 may further include an optimization unit 331 for optimizing the style loss function based on the semantic segmentation algorithm during the training process.
  • the optimization unit 331 may include a segmentation unit 3311 for semantically segmenting the first content picture annotated with a set of common labels (such as sky, buildings, water, etc.) through a semantic segmentation algorithm to obtain a segmentation template; and a function enhancement unit 3312 for adding the segmentation template to the input picture of the second neural network (i.e., the first content picture) as an additional channel and feeding the segmentation template into the style loss function, enhancing the style loss function through this semantic segmentation channel.
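One way to picture the segmentation channel: a 0/1 mask per label restricts which pixels feed each style statistic, so that, e.g., "sky" features are only matched against "sky" style regions. The helper below is an illustrative assumption, not the patent's implementation:

```python
import numpy as np

def masked_gram(F, mask):
    """Gram matrix of a (channels, pixels) feature matrix restricted to
    the pixels where mask (length `pixels`, values 0/1) equals 1.

    Illustrative sketch of a segmentation-guided style statistic; the
    label layout and shapes are assumptions for this example.
    """
    Fm = F * mask            # zero out features outside this label's region
    return Fm @ Fm.T

F = np.ones((2, 4))                          # 2 channels, 4 pixels
sky_mask = np.array([1.0, 1.0, 0.0, 0.0])    # first two pixels are "sky"
G_sky = masked_gram(F, sky_mask)
```

Computing one such masked Gram matrix per label, for both the content and the style picture, is the rough mechanism by which a segmentation channel keeps style transfer semantically aligned.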
  • Fig. 5 is a block diagram of an electronic device 400 for performing an image style transfer method according to an exemplary embodiment.
  • the electronic device 400 may be a mobile phone, a small computer, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • the electronic device 400 may include one or more of the following components: a processing component 405, a memory 401, a power component 402, a multimedia component 403, an audio component 404, an input/output (I/O) interface 408, a sensor component 407, and a communication component 406.
  • the processing component 405 generally controls the overall operations of the electronic device 400, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations.
  • the processing component 405 may include one or more processors 410 to execute instructions to complete all or part of the steps of the above method.
  • the processing component 405 may include one or more modules to facilitate interaction between the processing component 405 and other components.
  • the processing component 405 may include a multimedia module to facilitate interaction between the multimedia component 403 and the processing component 405.
  • the memory 401 is configured to store various types of data to support the operation of the electronic device 400. Examples of these data include instructions for any application or method operating on the electronic device 400, contact data, phone book data, messages, pictures, videos, and the like.
  • the memory 401 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
  • the power supply component 402 provides power to various components of the electronic device 400.
  • the power supply component 402 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 400.
  • the multimedia component 403 includes a screen providing an output interface between the electronic device 400 and the user.
  • the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Panel, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor can not only sense the boundary of the touch or sliding action, but also detect the duration and pressure related to the touch or sliding operation.
  • the multimedia component 403 includes a front camera and / or a rear camera. When the electronic device 400 is in an operation mode, such as a shooting mode or a video mode, the front camera and / or the rear camera may receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 404 is configured to output and / or input audio signals.
  • the audio component 404 includes a microphone (Microphone, MIC).
  • the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 401 or transmitted via the communication component 406.
  • the audio component 404 may further include a speaker for outputting audio signals.
  • the I / O interface 408 provides an interface between the processing component 405 and the peripheral interface module.
  • the above peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor assembly 407 includes one or more sensors for providing the electronic device 400 with status assessment in various aspects.
  • the sensor component 407 can detect the on / off state of the electronic device 400 and the relative positioning of the components, for example, the components are the display and the keypad of the electronic device 400.
  • the sensor component 407 can also detect the position change of the electronic device 400 or one of the components of the electronic device 400, the presence or absence of user contact with the electronic device 400, the orientation or acceleration / deceleration of the electronic device 400, and the temperature change of the electronic device 400, etc.
  • the sensor assembly 407 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor component 407 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor assembly 407 may further include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 406 is configured to facilitate wired or wireless communication between the electronic device 400 and other devices.
  • the electronic device 400 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 406 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 406 may further include a Near Field Communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the electronic device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the steps of the above image style transfer method.
  • a non-transitory computer-readable storage medium including instructions is also provided, for example a memory 401 including instructions; the instructions can be executed by the processor 410 of the electronic device 400 to complete the steps of the above image style transfer method.
  • the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • an embodiment of the present application further provides a computer program product.
  • the computer program product includes program instructions.
  • when executed by the electronic device, the program instructions cause the electronic device to perform the steps of the above image style transfer method.
  • For details, reference may be made to the embodiments shown in Figs. 1-2 above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

This application relates to an image style transfer method and system. The image style transfer method includes: acquiring a style picture and a first content picture; performing, by a first neural network, image style transfer on the first content picture to obtain a second content picture; calculating, by a second neural network based on loss functions, the losses between the second content picture and the first content picture and between the second content picture and the style picture; optimizing the first neural network according to the losses; and performing image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture. This realizes real-time style transfer, increasing the speed of style transfer while preserving the real details of the content picture to the greatest extent.

Description

Image style transfer method and system
This application claims priority to Chinese patent application No. 201811295359.0, titled "Image style transfer method and system" and filed with the China National Intellectual Property Administration on November 1, 2018, the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computer software applications, and in particular relates to an image style transfer method and system.
Background
Photorealistic style transfer (PST) studies style conversion between natural pictures, for example converting a portrait into a portrait with an oil-painting style, or converting a landscape photo taken in dim light into a landscape picture under bright light. A concrete photorealistic style transfer process is as follows: an input picture is designated as the base picture, also called the content picture, and one or more pictures carrying the desired image style are designated as style pictures; while the structure of the content picture is preserved, its image style is converted, and the final output is a stylized picture presenting a seamless combination of the content picture and the image style of the style picture.
The prior art usually uses deep learning networks to extract features to assist image style transfer. Such approaches fall into two categories: the first assists image style transfer with an optimization network, and the second with a feed-forward (front-end) network.
However, the inventors found the following.
The first category uses a convolutional neural network (CNN) to extract features of the content picture and the style picture and constructs Gram matrices, thereby defining a content loss function and a style loss function, and then obtains the stylized target picture by solving an optimization problem. Although this method can produce highly realistic stylized pictures, the optimization process takes a long time, so fast image style transfer cannot be achieved.
The second category trains a feed-forward network with the error back-propagation (BP) algorithm and then performs image style transfer on a given content picture with the trained network to obtain a stylized content picture. Although this method achieves fast image style transfer, content loss and style loss occur during the transfer, so a high-fidelity stylized content picture cannot be obtained; even if both the input content picture and the style picture are real photographs, the output still looks like a distorted painting.
Summary
To overcome the problems in the related art, the embodiments of this application disclose an image style transfer method and system that increase the speed of image style transfer while preserving the real details of the content picture to the greatest extent.
According to a first aspect of the embodiments of this application, an image style transfer method is provided, including:
acquiring a style picture and a first content picture;
performing, by a first neural network, style transfer on the first content picture to obtain a second content picture;
calculating, by a second neural network based on loss functions, the losses between the second content picture and the first content picture and between the second content picture and the style picture;
optimizing the first neural network according to the losses;
performing style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
According to a second aspect of the embodiments of this application, an image style transfer system is provided, including:
an acquisition module for acquiring a style picture and a first content picture;
a first processing module for performing style transfer on the first content picture according to a first neural network to obtain a second content picture;
a second processing module for calculating, according to a second neural network based on loss functions, the losses between the second content picture and the first content picture and between the second content picture and the style picture, and optimizing the first neural network according to the losses; and
an execution module for performing style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
According to a third aspect of the embodiments of this application, an electronic device is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the image style transfer method of the first aspect.
According to a fourth aspect of the embodiments of this application, a non-transitory computer-readable storage medium is provided; the computer-readable storage medium stores computer instructions which, when executed, implement the image style transfer method of the first aspect.
According to a fifth aspect of the embodiments of this application, a computer program product is provided; the computer program product includes program instructions which, when executed by an electronic device, cause the electronic device to perform the image style transfer method of the first aspect.
The technical solutions provided by the embodiments of this application may include the following beneficial effects:
1) Large amounts of computing resources are used to optimize the first neural network during the training stage, and the first neural network is used directly for image style transfer during the test stage, so highly realistic stylized pictures can be obtained quickly, realizing real-time image style transfer.
2) During training, a semantic segmentation algorithm is used to enhance the style loss function in the second neural network, avoiding semantically incorrect transfer during image style transfer.
3) The second neural network uses an image loss function to optimize the first neural network during training, so the real details of the input picture can be preserved to the greatest extent during image style transfer.
4) The image style transfer model provided by the embodiments of this application only needs to be trained once, unlike prior-art image style transfer models that must be retrained every time they are used, which improves the utilization efficiency of the model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Brief description of the drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of this application.
Fig. 1 is a flowchart of an image style transfer method according to an exemplary embodiment;
Fig. 2 is a structural diagram of an image style transfer system according to an exemplary embodiment;
Fig. 3 is a block diagram of an image style transfer system according to an exemplary embodiment;
Fig. 4 is a block diagram of an optimization unit according to an exemplary embodiment;
Fig. 5 is a block diagram of an electronic device for performing an image style transfer method according to an exemplary embodiment.
Detailed description
Exemplary embodiments will be described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
Fig. 1 is a flowchart of an image style transfer method according to an exemplary embodiment, including the following steps.
In step S110, a style picture and a first content picture are acquired.
In one embodiment, the style picture is a picture having the image style that the content picture is intended to present after image style transfer, and the content picture is the picture on which image style transfer is to be performed. The final output picture after image style transfer presents a seamless combination of the content picture and the style picture.
Here, the image style is the artistic style a picture has, for example an oil-painting style, a bright-light style, a nostalgic style, or a denim style. Different image styles may be expressed through different textures, tones, brush strokes, and so on. Image style transfer adjusts the image style of one picture to another image style, for example converting a portrait into a portrait with an oil-painting style, or converting a landscape photo taken in dim light into a bright-light landscape picture. On this basis, the content picture is the picture whose image style is to be adjusted, and the final output picture is the content picture with its image style adjusted to the image style of the style picture.
The acquired style picture may be one picture or multiple pictures. When there are multiple style pictures, they should share the same image style.
The acquired first content picture may be one picture or multiple pictures. When there are multiple first content pictures, they may have different image styles or the same image style; the image style of a first content picture differs from the image style of the style picture.
In step S120, the first neural network is used to perform style transfer on the first content picture to obtain a second content picture.
In the embodiments of this application, style transfer means image style transfer. The electronic device uses the first neural network to perform style transfer on the first content picture to obtain the second content picture; that is, the electronic device inputs the first content picture into the first neural network, which processes it and adjusts its image style to the image style of the style picture, obtaining the second content picture. In other words, step S120 is: the first neural network performs image style transfer on the first content picture to obtain the second content picture.
In some embodiments, the first neural network may use a convolutional neural network (CNN); the first neural network performs image style transfer on the first content picture to obtain the stylized second content picture.
The input of the first neural network is the first content picture, and its output is the second content picture, which presents a combination of the image content of the first content picture and the image style of the style picture. The image content may be a car, a tree, a person, and so on.
In step S130, the second neural network is used to calculate the losses between the second content picture and the first content picture and the style picture.
In the embodiments of this application, the second neural network includes loss functions. The electronic device uses the second neural network to calculate these losses; that is, the electronic device inputs the second content picture, the first content picture, and the style picture into the second neural network, which processes them based on the loss functions to obtain the losses. In other words, step S130 is: the second neural network calculates, based on the loss functions, the losses between the second content picture and the first content picture and between the second content picture and the style picture.
In some embodiments, the second neural network may use a Visual Geometry Group (VGG) network. The input of the second neural network is the second content picture, the first content picture, and the style picture; its output is the losses between the second content picture and the first content picture and the style picture.
In some embodiments, both the first neural network and the second neural network include multiple convolutional layers, pooling layers, and fully connected layers.
Here, a convolutional layer is a feature-extraction layer. Each convolutional layer is composed of several convolution units, which may be implemented by filters. The convolutional layers extract features of the input picture through convolution operations. The pooling layers continuously reduce the spatial size of the data, thereby reducing the number of parameters and the amount of computation of the neural network, which helps control overfitting to a certain extent.
The second neural network also includes a loss function layer containing the loss functions. A loss function is used to calculate the loss between the predicted result and the real result of the neural network; based on the calculated loss, it can be decided how to reduce the difference between the predicted result and the real result.
In step S140, the first neural network is optimized according to the above losses.
In one embodiment, the basic principle of image style transfer is that the stylized picture must keep the original image content of the content picture; for example, if the image content of the content picture is a car, the image content of the stylized picture should also be a car and must not become a motorcycle. In addition, the stylized picture must keep the distinctive image style of the style picture, such as its texture, tone, and brush strokes.
In the embodiments of this application, the process shown in steps S110-S140 is the training process of the first neural network. During training, the parameters of each convolution unit of the convolutional layers are optimized by the back-propagation algorithm. The trained first neural network can adjust the image style of a given content picture to the image style of the one or more style pictures acquired in step S110. The optimized first neural network is the image style transfer model.
During the training of the first neural network, the first content picture, the second content picture, and the style picture are input into the second neural network. The second neural network is a loss network used to evaluate the differences between the content picture output by the first neural network and the original content picture and style picture. The second neural network calculates, according to the loss functions, the differences between the second content picture obtained through the first neural network and the first content picture and the style picture, and the first neural network is optimized according to these differences.
In step S150, style transfer is performed on a given content picture based on the optimized first neural network to obtain a stylized content picture.
In the embodiments of this application, the optimized first neural network is the first neural network trained according to steps S110-S140. The electronic device inputs the given content picture into the optimized first neural network, which processes it and adjusts its image style to the image style of the style picture, obtaining a stylized content picture. Taking the given content picture as the third content picture and the stylized content picture as the fourth content picture is only an example and is not limiting. On this basis, step S150 is: performing image style transfer on the given third content picture based on the optimized first neural network to obtain the fourth content picture. The fourth content picture presents a combination of the image content of the third content picture and the image style of the style picture.
In some embodiments, the loss functions may include a content loss function, a style loss function, and an image loss function. The content loss is the loss between the image content of the original content picture and the image content of the stylized content picture. The style loss is the loss between the concrete visual features of the original content picture and those of the stylized content picture, where the concrete visual features of a picture include texture, color, and so on. The image loss is the loss in image details, which include color information, position information of the image content, and so on.
The second neural network uses the content loss function to calculate the content loss between the second content picture and the first content picture; the style loss function to calculate the style loss between the second content picture and the style picture; and the image loss function to calculate the image loss between the second content picture and the first content picture.
In a convolutional neural network, the features output by lower convolutional layers describe the concrete visual features of a picture, while the features output by higher convolutional layers describe more abstract image content.
Here, the features output by lower convolutional layers are low-level features, and the features output by higher convolutional layers are high-level features.
In some embodiments, the content loss function in the second neural network compares the content similarity between the first content picture and the second content picture by comparing the similarity of their high-level features output by the first neural network. Based on this content similarity, the content loss between the first content picture and the second content picture is determined, and the first neural network is optimized according to the content loss. The similarity of the high-level features may be expressed by the Euclidean distance between the high-level features of the two pictures, or in other ways; the embodiments of this application do not specifically limit this.
In one example, the content loss function may be:
    L_c^i = \frac{1}{2 N_i D_i} \sum_{j,k} \left( F_i(O) - F_i(I) \right)_{jk}^2
where N_i denotes the total number of convolution units in the i-th convolutional layer of the second neural network, D_i denotes the total number of pixels in the feature map corresponding to each convolution unit in the i-th convolutional layer, F_i(O) denotes the feature matrix of the second content picture at the i-th convolutional layer, F_i(I) denotes the feature matrix of the first content picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the feature matrix, and j and k are natural numbers greater than zero.
Specifically, the second neural network first obtains the N feature maps of the first content picture and of the second content picture in the i-th convolutional layer, converts the pixels of the N feature maps into feature vectors, and arranges the feature vectors horizontally and vertically into matrices, thereby obtaining the feature matrix F_i(I) of the first content picture and the feature matrix F_i(O) of the second content picture; (j, k) denote the j-th horizontal and the k-th vertical feature vectors in the feature matrix, and j and k are natural numbers greater than zero.
In some embodiments, the second neural network judges the style similarity between the second content picture and the style picture by comparing the similarity of the low-level features of the second content picture output by the first neural network and of the style picture. Based on this style similarity, the style loss between the second content picture and the style picture is determined, and the first neural network is optimized according to the style loss. The similarity of the low-level features may be expressed by the Euclidean distance between the low-level features of the two pictures, or in other ways; the embodiments of this application do not specifically limit this.
In one example, the style loss function may be:
    L_s^i = \frac{1}{2 N_i^2} \sum_{j,k} \left( G_i(O) - G_i(I) \right)_{jk}^2
where N_i denotes the total number of convolution units in the i-th convolutional layer of the second neural network, G_i(O) denotes the Gram matrix of the second content picture at the i-th convolutional layer, G_i(I) denotes the Gram matrix of the style picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the Gram matrix, and j and k are natural numbers greater than zero.
Specifically, the second neural network first obtains the N feature maps of the style picture and of the second content picture in the i-th convolutional layer, converts the pixels of the N feature maps into feature vectors, computes the inner product between every two of the N feature vectors, and arranges the inner products horizontally and vertically into matrices, thereby obtaining the Gram matrix G_i(I) of the style picture and the Gram matrix G_i(O) of the second content picture; (j, k) denote the j-th horizontal and the k-th vertical feature vectors in the Gram matrix, and j and k are natural numbers greater than zero.
In some embodiments, to preserve the real details of the input picture to the greatest extent during image style transfer, the second neural network may also compare the differences in image details between the first content picture and the second content picture according to the image loss function to obtain the image loss, and then optimize the first neural network according to the image loss. The differences in image details include channel losses on the color channels and position deviations of the image content; for example, if the image content is a car, the position deviation of the image content may be the position deviation between the car in the first content picture and the car in the second content picture.
In one example, calculating the image loss between the second content picture and the first content picture based on the image loss function may include: calculating the channel loss of the first content picture and the second content picture on each color channel, and summing the channel losses on each color channel to obtain the image loss between the second content picture and the first content picture.
Because the input picture already has all the required real image details, the image loss function can be obtained based on the principle of local affine transformation in order to keep these details: for example, every pixel value of the output picture on every color channel can be obtained from a linear combination of pixels in a local region of the input picture.
In one example, the image loss function may be:
    L_m = \sum_{i=1}^{3} V_i(O)^{\top} M_I \, V_i(O)
where V_i(O) denotes the vectorized result of the second content picture on the i-th color channel, and M_I denotes the matting Laplacian matrix of the first content picture. This constraint keeps the edge structure of the second content picture as similar as possible to that of the first content picture, thereby preserving the real details of the image to the greatest extent.
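The quadratic form above can be exercised with a tiny hand-made matrix standing in for the real matting Laplacian M_I. This is illustrative only: a real M_I is a large sparse positive semi-definite matrix built from local windows of the input picture, not the 2×2 toy used here:

```python
import numpy as np

# Tiny stand-in for M_I: penalizes the two "pixels" of each channel
# differing from each other (a crude edge-smoothness penalty).
M_I = np.array([[ 1.0, -1.0],
                [-1.0,  1.0]])

def photorealism_loss(channels, M):
    """Sum of V^T M V over the vectorized color channels."""
    return sum(v @ M @ v for v in channels)

flat = photorealism_loss([np.array([0.5, 0.5])] * 3, M_I)  # constant channels
edgy = photorealism_loss([np.array([1.0, 0.0])] * 3, M_I)  # differing pixels
```

Constant channels incur zero penalty while a sharp difference between the two pixels is penalized on every channel, which is the mechanism the loss uses to keep the output's local structure tied to the input's.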
In some embodiments, to avoid semantically incorrect transfer during image style transfer, the style loss function may also be optimized based on a semantic segmentation algorithm during the training of the first neural network. The first content picture, annotated with a set of common labels, is semantically segmented by the semantic segmentation algorithm to obtain a segmentation template; the segmentation template is then added to the input picture of the second neural network (i.e., the first content picture) as an additional channel, and the segmentation template is fed into the style loss function, enhancing the style loss function through this semantic segmentation channel. The common labels include sky, buildings, water, and so on.
Based on the above content loss function, style loss function, and image loss function, the total loss function of the second neural network may be:
    L_{total} = \sum_{i=1}^{L} \alpha_i L_c^i + \lambda_s \sum_{i=1}^{L} \beta_i L_s^i + \lambda_m L_m
where L denotes the total number of convolutional layers of the second neural network, i denotes the i-th convolutional layer of the second neural network, L and i are natural numbers greater than zero, L_c^i denotes the content loss function, L_s^i denotes the style loss function, L_m denotes the image loss function, \lambda_s and \lambda_m are used to adjust the proportions of the content loss function, the style loss function, and the image loss function, and \alpha_i and \beta_i are used to adjust the weight of the i-th convolutional layer of the second neural network.
Specifically, the second neural network calculates the content loss, the style loss, and the image loss from the outputs of different convolutional layers, obtains the total loss, and then continuously updates the parameters of the first neural network using optimization methods such as gradient descent and back-propagation.
The calculation of the losses between the second content picture and the first content picture and the style picture provided by the embodiments of this application is described in detail below with reference to Fig. 2.
Fig. 2 is a structural diagram of an image style transfer system according to an exemplary embodiment. As shown in Fig. 2, the first neural network uses a convolutional neural network (CNN). The second neural network uses a 19-layer Visual Geometry Group (VGG) network that contains 16 convolutional layers and 3 pooling layers and uses average pooling instead of max pooling. In the embodiments of this application, the second neural network may also use VGG networks of other structures.
The first neural network receives the first content picture I_input and performs stylized transfer on it to obtain the second content picture I_out. The second neural network receives the first content picture I_input, the second content picture I_out, and the style picture I_style; based on the outputs of different convolutional layers, it calculates the content loss and style loss using the content loss function and the style loss function, calculates the image loss using the image loss function, and thus obtains the total loss. Optimization methods such as gradient descent and back-propagation are then used to continuously update the parameters of the first neural network.
Specifically, as shown in Fig. 2, the second neural network calculates the content loss between the second content picture and the first content picture from the output of the conv4_2 convolutional layer; calculates the style losses between the second content picture and the style picture from the outputs of the five convolutional layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1, and then sums these style losses to obtain the total style loss; and calculates the image loss from the pixels of the second content picture on each color channel and the matting Laplacian matrix of the first content picture. The content loss, the total style loss, and the image loss are summed to obtain the total loss.
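The final weighted combination of the per-layer losses can be sketched as a plain weighted sum. The layer names below match Fig. 2, while the individual loss values and the λ weights are made-up placeholders, not values from the patent:

```python
# Sketch of L_total = sum_i alpha_i * Lc_i + lambda_s * sum_i beta_i * Ls_i
#                     + lambda_m * Lm, with all per-layer weights set to 1
# and illustrative loss values.
content_losses = {"conv4_2": 0.8}                       # single content layer
style_losses = {"conv1_1": 0.1, "conv2_1": 0.2, "conv3_1": 0.3,
                "conv4_1": 0.2, "conv5_1": 0.1}         # five style layers
image_loss = 0.05
lambda_s, lambda_m = 100.0, 10.0                        # assumed trade-offs

total = (sum(content_losses.values())
         + lambda_s * sum(style_losses.values())
         + lambda_m * image_loss)
```

Tuning λ_s and λ_m shifts the balance between faithful content, strong stylization, and preserved photographic detail; the scalar `total` is what gradient descent then drives down.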
The image style transfer method provided by the embodiments of this application has the following beneficial effects.
During the training stage of the first neural network, the first neural network performs image style transfer on the first content picture to obtain the second content picture; the second neural network then calculates the style loss and content loss of the second content picture based on multiple loss functions, and the parameters of the first neural network are updated according to these losses using optimization methods such as gradient descent and back-propagation. After training, stylized transfer is performed on a given content picture based on the optimized first neural network, producing highly realistic stylized pictures.
The image style transfer method provided by the embodiments of this application can, during the training stage of the first neural network, acquire multiple style pictures sharing one image style and multiple first content pictures with different image styles, and optimize the first neural network based on them; that is, large amounts of computing resources are used to optimize the first neural network during training, and the optimized first neural network is used directly for image style transfer during the application stage, so highly realistic stylized pictures can be obtained quickly, realizing real-time image style transfer.
In some embodiments, a semantic segmentation algorithm is used during training to enhance the style loss function in the second neural network, avoiding semantically incorrect transfer during image style transfer.
In some embodiments, the second neural network uses the image loss function during training to optimize the first neural network, preserving the real details of the input picture to the greatest extent during image style transfer.
In some embodiments, the image style transfer model provided by the embodiments of this application only needs to be trained once, unlike prior-art image style transfer models that must be retrained every time they are used, which improves the utilization efficiency of the model.
Fig. 3 is a block diagram of an image style transfer system according to an exemplary embodiment. As shown in Fig. 3, the image style transfer system includes: an acquisition module 310 for acquiring a style picture and a first content picture; a first processing module 320 for performing image style transfer on the first content picture according to a first neural network to obtain a second content picture; a second processing module 330 for calculating, according to a second neural network based on loss functions, the losses between the second content picture and the first content picture and the style picture, and optimizing the first neural network according to these losses; and an execution module 340 for performing image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
In one embodiment, both the first neural network and the second neural network include multiple feature-extraction layers, namely convolutional layers, pooling layers, and fully connected layers.
Each convolutional layer is composed of several convolution units (i.e., filters), and the parameters of each convolution unit are optimized by the back-propagation algorithm. The convolutional layers extract features of the input picture through convolution operations. The pooling layers continuously reduce the spatial size of the data, thereby reducing the number of parameters and the amount of computation of the neural network, which helps control overfitting to a certain extent.
The second neural network also includes a loss function layer; the loss function is used to determine how to "punish" the difference between the predicted result and the real result of the neural network during training.
In some embodiments, the first neural network uses a CNN and the second neural network uses a VGG network, for example a 19-layer VGG network containing 16 convolutional layers and 3 pooling layers, with average pooling used instead of max pooling.
The first neural network receives the first content picture and performs stylized transfer on it to obtain the second content picture. The second neural network receives the first content picture, the second content picture, and the style picture, calculates the content loss and style loss from the outputs of different convolutional layers, calculates the image loss based on the image loss function, and thus obtains the total loss. Optimization methods such as gradient descent and back-propagation are then used to continuously update the parameters of the first neural network.
Here, the second neural network uses the content loss function to calculate the content loss between the second content picture and the first content picture, and the style loss function to calculate the style loss between the second content picture and the style picture.
In some embodiments, to preserve the real details of the input picture to the greatest extent during image style transfer, the second neural network also compares the differences in detail between the first content picture and the second content picture according to the image loss function (i.e., the image loss), and then optimizes the first neural network according to the image loss.
In summary, the second processing module 330 may be configured to: calculate the content loss between the second content picture and the first content picture based on the content loss function; calculate the style loss between the second content picture and the style picture based on the style loss function; and calculate the image loss between the second content picture and the first content picture based on the image loss function.
In some embodiments, to avoid semantically incorrect transfer during image style transfer, the second processing module 330 may further include an optimization unit 331 for optimizing the style loss function based on a semantic segmentation algorithm during training.
In some embodiments, as shown in Fig. 4, the optimization unit 331 may include a segmentation unit 3311 for semantically segmenting the first content picture annotated with a set of common labels (such as sky, buildings, water, etc.) through the semantic segmentation algorithm to obtain a segmentation template; and a function enhancement unit 3312 for adding the segmentation template to the input picture of the second neural network (i.e., the first content picture) as an additional channel and feeding the segmentation template into the style loss function, enhancing the style loss function through this semantic segmentation channel.
Regarding the apparatus in the embodiments of this application, the specific manner in which each unit or module performs operations has been described in detail in the method embodiments and will not be elaborated here.
Fig. 5 is a block diagram of an electronic device 400 for performing an image style transfer method according to an exemplary embodiment. For example, the electronic device 400 may be a mobile phone, a small computer, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 5, the electronic device 400 may include one or more of the following components: a processing component 405, a memory 401, a power component 402, a multimedia component 403, an audio component 404, an input/output (I/O) interface 408, a sensor component 407, and a communication component 406.
The processing component 405 generally controls the overall operations of the electronic device 400, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 405 may include one or more processors 410 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 405 may include one or more modules to facilitate interaction between the processing component 405 and other components; for example, a multimedia module to facilitate interaction between the multimedia component 403 and the processing component 405.
The memory 401 is configured to store various types of data to support the operation of the electronic device 400. Examples of such data include instructions for any application or method operating on the electronic device 400, contact data, phone book data, messages, pictures, videos, and the like. The memory 401 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 402 provides power to the various components of the electronic device 400. The power component 402 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 400.
The multimedia component 403 includes a screen providing an output interface between the electronic device 400 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; a touch sensor senses not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 403 includes a front camera and/or a rear camera, which can receive external multimedia data when the electronic device 400 is in an operation mode such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 404 is configured to output and/or input audio signals. For example, the audio component 404 includes a microphone (MIC) configured to receive external audio signals when the electronic device 400 is in an operation mode such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 401 or transmitted via the communication component 406. In some embodiments, the audio component 404 may also include a speaker for outputting audio signals.
The I/O interface 408 provides an interface between the processing component 405 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 407 includes one or more sensors for providing the electronic device 400 with status assessments of various aspects. For example, the sensor component 407 can detect the on/off state of the electronic device 400 and the relative positioning of components, for example when the components are the display and keypad of the electronic device 400. The sensor component 407 can also detect a position change of the electronic device 400 or one of its components, the presence or absence of user contact with the electronic device 400, the orientation or acceleration/deceleration of the electronic device 400, and temperature changes of the electronic device 400. The sensor component 407 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 407 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 406 is configured to facilitate wired or wireless communication between the electronic device 400 and other devices. The electronic device 400 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 406 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 406 may also include a near-field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the steps of the above image style transfer method.
As the electronic device embodiment is substantially similar to the image style transfer method embodiments, its description is relatively simple; for relevant details, refer to the description of the method embodiments shown in Figs. 1-2.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory 401 including instructions that can be executed by the processor 410 of the electronic device 400 to complete the steps of the above image style transfer method; for details, refer to the embodiments shown in Figs. 1-2. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
As the non-transitory computer-readable storage medium embodiment is substantially similar to the image style transfer method embodiments, its description is relatively simple; for relevant details, refer to the description of the method embodiments shown in Figs. 1-2.
In an exemplary embodiment, the embodiments of this application also provide a computer program product including program instructions which, when executed by an electronic device, cause the electronic device to perform the steps of the above image style transfer method; for details, refer to the embodiments shown in Figs. 1-2.
As the computer program product embodiment is substantially similar to the image style transfer method embodiments, its description is relatively simple; for relevant details, refer to the description of the method embodiments shown in Figs. 1-2.
Those skilled in the art will easily conceive of other embodiments of this application after considering the specification and practicing the application disclosed here. This application is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (38)

  1. An image style transfer method, comprising:
    acquiring a style picture and a first content picture;
    performing, by a first neural network, image style transfer on the first content picture to obtain a second content picture;
    calculating, by a second neural network based on loss functions, losses between the second content picture and the first content picture and between the second content picture and the style picture;
    optimizing the first neural network according to the losses;
    performing image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
  2. The image style transfer method according to claim 1, wherein the loss functions comprise: a content loss function, a style loss function, and an image loss function.
  3. The image style transfer method according to claim 2, wherein calculating, by the second neural network based on the loss functions, the losses between the second content picture and the first content picture and between the second content picture and the style picture comprises:
    calculating a content loss between the second content picture and the first content picture based on the content loss function;
    calculating a style loss between the second content picture and the style picture based on the style loss function; and
    calculating an image loss between the second content picture and the first content picture based on the image loss function.
  4. The image style transfer method according to claim 2 or 3, further comprising:
    optimizing the style loss function based on a semantic segmentation algorithm.
  5. The image style transfer method according to claim 4, wherein optimizing the style loss function based on the semantic segmentation algorithm comprises:
    semantically segmenting the first content picture to obtain a segmentation template;
    feeding the segmentation template into the style loss function to enhance the style loss function.
  6. The image style transfer method according to claim 3, wherein calculating the image loss between the second content picture and the first content picture based on the image loss function comprises:
    calculating a channel loss of the first content picture and the second content picture on each color channel;
    summing the channel losses on each color channel to obtain the image loss between the second content picture and the first content picture.
  7. The image style transfer method according to claim 2, wherein the loss function is:
    L_{total} = \sum_{i=1}^{L} \alpha_i L_c^i + \lambda_s \sum_{i=1}^{L} \beta_i L_s^i + \lambda_m L_m
    where L denotes the total number of convolutional layers of the second neural network, i denotes the i-th convolutional layer of the second neural network, L and i are natural numbers greater than zero, L_c^i denotes the content loss function, L_s^i denotes the style loss function, L_m denotes the image loss function, \lambda_s and \lambda_m are used to adjust the proportions of the content loss function, the style loss function, and the image loss function, and \alpha_i and \beta_i are used to adjust the weight of the i-th convolutional layer of the second neural network.
  8. The image style transfer method according to claim 7, wherein the content loss function is:
    L_c^i = \frac{1}{2 N_i D_i} \sum_{j,k} \left( F_i(O) - F_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, D_i denotes the total number of pixels in the feature map corresponding to each filter in the i-th convolutional layer, F_i(O) denotes the feature matrix of the second content picture at the i-th convolutional layer, F_i(I) denotes the feature matrix of the first content picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the feature matrix, and j and k are natural numbers greater than zero.
  9. The image style transfer method according to claim 7, wherein the style loss function is:
    L_s^i = \frac{1}{2 N_i^2} \sum_{j,k} \left( G_i(O) - G_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, G_i(O) denotes the Gram matrix of the second content picture at the i-th convolutional layer, G_i(I) denotes the Gram matrix of the style picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the Gram matrix, and j and k are natural numbers greater than zero.
  10. The image style transfer method according to claim 7, wherein the image loss function is:
    L_m = \sum_{i=1}^{3} V_i(O)^{\top} M_I \, V_i(O)
    where V_i(O) denotes the vectorized result of the second content picture on the i-th color channel, and M_I denotes the matting Laplacian matrix of the first content picture.
  11. The image style transfer method according to claim 1, wherein the first neural network is a convolutional neural network.
  12. The image style transfer method according to claim 1, wherein the second neural network is a Visual Geometry Group (VGG) network.
  13. An image style transfer system, comprising:
    an acquisition module for acquiring a style picture and a first content picture;
    a first processing module for performing image style transfer on the first content picture according to a first neural network to obtain a second content picture;
    a second processing module for calculating, according to a second neural network based on loss functions, losses between the second content picture and the first content picture and between the second content picture and the style picture, and optimizing the first neural network according to the losses; and
    an execution module for performing image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
  14. The image style transfer system according to claim 13, wherein the loss functions comprise: a content loss function, a style loss function, and an image loss function.
  15. The image style transfer system according to claim 14, wherein the second processing module is configured to:
    calculate a content loss between the second content picture and the first content picture based on the content loss function;
    calculate a style loss between the second content picture and the style picture based on the style loss function; and
    calculate an image loss between the second content picture and the first content picture based on the image loss function.
  16. The image style transfer system according to claim 14 or 15, wherein the second processing module further comprises:
    an optimization unit for optimizing the style loss function based on a semantic segmentation algorithm.
  17. The image style transfer system according to claim 15, wherein the optimization unit comprises:
    a segmentation unit for semantically segmenting the first content picture to obtain a segmentation template;
    a function enhancement unit for feeding the segmentation template into the style loss function to enhance the style loss function.
  18. The image style transfer system according to claim 15, wherein the second processing module is specifically configured to:
    calculate a channel loss of the first content picture and the second content picture on each color channel;
    sum the channel losses on each color channel to obtain the image loss between the second content picture and the first content picture.
  19. The image style transfer system according to claim 14, wherein the loss function is:
    L_{total} = \sum_{i=1}^{L} \alpha_i L_c^i + \lambda_s \sum_{i=1}^{L} \beta_i L_s^i + \lambda_m L_m
    where L denotes the total number of convolutional layers of the second neural network, i denotes the i-th convolutional layer of the second neural network, L and i are natural numbers greater than zero, L_c^i denotes the content loss function, L_s^i denotes the style loss function, L_m denotes the image loss function, \lambda_s and \lambda_m are used to adjust the proportions of the content loss function, the style loss function, and the image loss function, and \alpha_i and \beta_i are used to adjust the weight of the i-th convolutional layer of the second neural network.
  20. The image style transfer system according to claim 19, wherein the content loss function is:
    L_c^i = \frac{1}{2 N_i D_i} \sum_{j,k} \left( F_i(O) - F_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, D_i denotes the total number of pixels in the feature maps corresponding to the filters in the i-th convolutional layer, F_i(O) denotes the feature matrix of the second content picture in the i-th convolutional layer, F_i(I) denotes the feature matrix of the first content picture in the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the feature matrix, and j and k are natural numbers greater than zero.
  21. The image style transfer system according to claim 19, wherein the style loss function is:
    L_s^i = \frac{1}{2 N_i^2} \sum_{j,k} \left( G_i(O) - G_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, G_i(O) denotes the Gram matrix of the second content picture at the i-th convolutional layer, G_i(I) denotes the Gram matrix of the style picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the Gram matrix, and j and k are natural numbers greater than zero.
  22. The image style transfer system according to claim 19, wherein the image loss function is:
    L_m = \sum_{i=1}^{3} V_i(O)^{\top} M_I \, V_i(O)
    where V_i(O) denotes the vectorized result of the second content picture on the i-th color channel, and M_I denotes the matting Laplacian matrix of the first content picture.
  23. The image style transfer system according to claim 13, wherein the first neural network is a convolutional neural network.
  24. The image style transfer system according to claim 13, wherein the second neural network is a Visual Geometry Group (VGG) network.
  25. An electronic device, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to execute:
    acquiring a style picture and a first content picture;
    performing, by a first neural network, image style transfer on the first content picture to obtain a second content picture;
    calculating, by a second neural network based on loss functions, losses between the second content picture and the first content picture and between the second content picture and the style picture;
    optimizing the first neural network according to the losses;
    performing image style transfer on a given third content picture based on the optimized first neural network to obtain a fourth content picture.
  26. The electronic device according to claim 25, wherein the loss functions comprise: a content loss function, a style loss function, and an image loss function.
  27. The electronic device according to claim 26, wherein the processor is specifically configured to execute:
    calculating a content loss between the second content picture and the first content picture based on the content loss function;
    calculating a style loss between the second content picture and the style picture based on the style loss function; and
    calculating an image loss between the second content picture and the first content picture based on the image loss function.
  28. The electronic device according to claim 25 or 26, wherein the processor is further configured to execute:
    optimizing the style loss function based on a semantic segmentation algorithm.
  29. The electronic device according to claim 28, wherein the processor is specifically configured to execute:
    semantically segmenting the first content picture to obtain a segmentation template;
    feeding the segmentation template into the style loss function to enhance the style loss function.
  30. The electronic device according to claim 27, wherein the processor is specifically configured to execute:
    calculating a channel loss of the first content picture and the second content picture on each color channel;
    summing the channel losses on each color channel to obtain the image loss between the second content picture and the first content picture.
  31. The electronic device according to claim 26, wherein the loss function is:
    L_{total} = \sum_{i=1}^{L} \alpha_i L_c^i + \lambda_s \sum_{i=1}^{L} \beta_i L_s^i + \lambda_m L_m
    where L denotes the total number of convolutional layers of the second neural network, i denotes the i-th convolutional layer of the second neural network, L and i are natural numbers greater than zero, L_c^i denotes the content loss function, L_s^i denotes the style loss function, L_m denotes the image loss function, \lambda_s and \lambda_m are used to adjust the proportions of the content loss function, the style loss function, and the image loss function, and \alpha_i and \beta_i are used to adjust the weight of the i-th convolutional layer of the second neural network.
  32. The electronic device according to claim 31, wherein the content loss function is:
    L_c^i = \frac{1}{2 N_i D_i} \sum_{j,k} \left( F_i(O) - F_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, D_i denotes the total number of pixels in the feature map corresponding to each filter in the i-th convolutional layer, F_i(O) denotes the feature matrix of the second content picture at the i-th convolutional layer, F_i(I) denotes the feature matrix of the first content picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the feature matrix, and j and k are natural numbers greater than zero.
  33. The electronic device according to claim 31, wherein the style loss function is:
    L_s^i = \frac{1}{2 N_i^2} \sum_{j,k} \left( G_i(O) - G_i(I) \right)_{jk}^2
    where N_i denotes the total number of filters in the i-th convolutional layer of the second neural network, G_i(O) denotes the Gram matrix of the second content picture at the i-th convolutional layer, G_i(I) denotes the Gram matrix of the style picture at the i-th convolutional layer, (j, k) denotes the j-th and k-th feature vectors in the Gram matrix, and j and k are natural numbers greater than zero.
  34. The electronic device according to claim 31, wherein the image loss function is:
    L_m = \sum_{i=1}^{3} V_i(O)^{\top} M_I \, V_i(O)
    where V_i(O) denotes the vectorized result of the second content picture on the i-th color channel, and M_I denotes the matting Laplacian matrix of the first content picture.
  35. The electronic device according to claim 1, wherein the first neural network is a convolutional neural network.
  36. The electronic device according to claim 1, wherein the second neural network is a Visual Geometry Group (VGG) network.
  37. A non-transitory computer-readable storage medium storing computer instructions which, when executed, implement the image style transfer method according to any one of claims 1-12.
  38. A computer program product comprising program instructions which, when executed by an electronic device, cause the electronic device to perform the image style transfer method according to any one of claims 1-12.
PCT/CN2019/111968 2018-11-01 2019-10-18 图像风格迁移方法和系统 WO2020088280A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811295359.0A CN109697690A (zh) 2018-11-01 2018-11-01 图像风格迁移方法和系统
CN201811295359.0 2018-11-01

Publications (1)

Publication Number Publication Date
WO2020088280A1 true WO2020088280A1 (zh) 2020-05-07

Family

ID=66229793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111968 WO2020088280A1 (zh) 2018-11-01 2019-10-18 Image style transfer method and system

Country Status (2)

Country Link
CN (1) CN109697690A (zh)
WO (1) WO2020088280A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669308A (zh) * 2021-01-06 2021-04-16 携程旅游信息技术(上海)有限公司 Image generation method, system, device and storage medium based on style transfer
CN114493994A (zh) * 2022-01-13 2022-05-13 南京市测绘勘察研究院股份有限公司 Ancient-painting style transfer method for three-dimensional scenes
CN114511440A (zh) * 2020-11-16 2022-05-17 迪斯尼企业公司 Adaptive convolutions in neural networks
CN114897673A (zh) * 2022-05-31 2022-08-12 浙江理工大学 Yunjin (cloud brocade) style image generation method based on a layering principle
CN115511700A (zh) * 2022-09-15 2022-12-23 南京栢拓视觉科技有限公司 Material style transfer system for refined, high-quality effects
CN115631091A (zh) * 2022-12-23 2023-01-20 南方科技大学 Selective style transfer method and terminal
GB2612775A (en) * 2021-11-10 2023-05-17 Sony Interactive Entertainment Inc System and method for generating assets

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697690A (zh) 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system
CN110222722A (zh) * 2019-05-14 2019-09-10 华南理工大学 Interactive image stylization processing method, system, computing device, and storage medium
CN110189246B (zh) * 2019-05-15 2023-02-28 北京字节跳动网络技术有限公司 Image stylization generation method and apparatus, and electronic device
CN110288975B (zh) * 2019-05-17 2022-04-22 北京达佳互联信息技术有限公司 Voice style transfer method and apparatus, electronic device, and storage medium
CN111127309B (zh) * 2019-12-12 2023-08-11 杭州格像科技有限公司 Portrait style transfer model training method, portrait style transfer method, and apparatus
CN111242841B (zh) * 2020-01-15 2023-04-18 杭州电子科技大学 Picture background style transfer method based on semantic segmentation and deep learning
CN111340720B (zh) * 2020-02-14 2023-05-19 云南大学 Color woodcut print style conversion algorithm based on semantic segmentation
CN111353964B (zh) * 2020-02-26 2022-07-08 福州大学 Structure-consistent stereoscopic image style transfer method based on a convolutional neural network
CN113313786B (zh) * 2020-02-27 2024-06-11 深圳云天励飞技术有限公司 Portrait picture colorization method and apparatus, and terminal device
CN111523561A (zh) * 2020-03-19 2020-08-11 深圳市彬讯科技有限公司 Image style recognition method and apparatus, computer device, and storage medium
CN111340745B (zh) * 2020-03-27 2021-01-05 成都安易迅科技有限公司 Image generation method and apparatus, storage medium, and electronic device
CN111476708B (zh) * 2020-04-03 2023-07-14 广州市百果园信息技术有限公司 Model generation method, model acquisition method, apparatus, device, and storage medium
CN111986075B (zh) * 2020-08-12 2022-08-09 兰州交通大学 Style transfer method with sharpened target edges
CN112288621B (zh) * 2020-09-21 2022-09-16 山东师范大学 Neural-network-based image style transfer method and system
CN114615421B (zh) * 2020-12-07 2023-06-30 华为技术有限公司 Image processing method and electronic device
CN114760497A (zh) * 2021-01-08 2022-07-15 阿里巴巴集团控股有限公司 Video generation method, non-volatile storage medium, and electronic device
CN112785493B (zh) * 2021-01-22 2024-02-09 北京百度网讯科技有限公司 Model training method, style transfer method, apparatus, device, and storage medium
CN112862669B (zh) * 2021-02-02 2024-02-09 百果园技术(新加坡)有限公司 Training method and generation method for an image generation model, apparatus, and device
CN113052757B (zh) * 2021-03-08 2024-08-02 Oppo广东移动通信有限公司 Image processing method and apparatus, terminal, and storage medium
CN113724132B (zh) * 2021-11-03 2022-02-18 浙江宇视科技有限公司 Image style transfer processing method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977414A (zh) * 2017-11-22 2018-05-01 西安财经学院 Deep-learning-based image style transfer method and system
US20180144509A1 (en) * 2016-09-02 2018-05-24 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN108629747A (zh) * 2018-04-25 2018-10-09 腾讯科技(深圳)有限公司 Image enhancement method and apparatus, electronic device, and storage medium
CN109697690A (zh) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470320B (zh) * 2018-02-24 2022-05-20 中山大学 CNN-based image stylization method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180144509A1 (en) * 2016-09-02 2018-05-24 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN107977414A (zh) * 2017-11-22 2018-05-01 西安财经学院 Deep-learning-based image style transfer method and system
CN108629747A (zh) * 2018-04-25 2018-10-09 腾讯科技(深圳)有限公司 Image enhancement method and apparatus, electronic device, and storage medium
CN109697690A (zh) * 2018-11-01 2019-04-30 北京达佳互联信息技术有限公司 Image style transfer method and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511440A (zh) * 2020-11-16 2022-05-17 迪斯尼企业公司 Adaptive convolutions in neural networks
CN112669308A (zh) * 2021-01-06 2021-04-16 携程旅游信息技术(上海)有限公司 Image generation method, system, device and storage medium based on style transfer
CN112669308B (zh) * 2021-01-06 2024-05-24 携程旅游信息技术(上海)有限公司 Image generation method, system, device and storage medium based on style transfer
GB2612775A (en) * 2021-11-10 2023-05-17 Sony Interactive Entertainment Inc System and method for generating assets
CN114493994A (zh) * 2022-01-13 2022-05-13 南京市测绘勘察研究院股份有限公司 Ancient-painting style transfer method for three-dimensional scenes
CN114493994B (zh) * 2022-01-13 2024-04-16 南京市测绘勘察研究院股份有限公司 Ancient-painting style transfer method for three-dimensional scenes
CN114897673A (zh) * 2022-05-31 2022-08-12 浙江理工大学 Yunjin (cloud brocade) style image generation method based on a layering principle
CN114897673B (zh) * 2022-05-31 2024-04-09 浙江理工大学 Yunjin (cloud brocade) style image generation method based on a layering principle
CN115511700A (zh) * 2022-09-15 2022-12-23 南京栢拓视觉科技有限公司 Material style transfer system for refined, high-quality effects
CN115511700B (zh) * 2022-09-15 2024-03-05 南京栢拓视觉科技有限公司 Material style transfer system for refined, high-quality effects
CN115631091A (zh) * 2022-12-23 2023-01-20 南方科技大学 Selective style transfer method and terminal
CN115631091B (zh) * 2022-12-23 2023-03-21 南方科技大学 Selective style transfer method and terminal

Also Published As

Publication number Publication date
CN109697690A (zh) 2019-04-30

Similar Documents

Publication Publication Date Title
WO2020088280A1 (zh) Image style transfer method and system
CN109359592B (zh) Video frame processing method and apparatus, electronic device, and storage medium
WO2022042776A1 (zh) Photographing method and terminal
CN106570110B (zh) Image deduplication method and apparatus
US11263723B2 (en) Image warping method and device
US11030733B2 (en) Method, electronic device and storage medium for processing image
US11977981B2 (en) Device for automatically capturing photo or video about specific moment, and operation method thereof
CN106485567B (zh) Item recommendation method and apparatus
US11847769B2 (en) Photographing method, terminal, and storage medium
CN114007099A (zh) Video processing method and apparatus, and apparatus for video processing
CN108664946A (zh) Image-based people-flow feature acquisition method and apparatus
US11961278B2 (en) Method and apparatus for detecting occluded image and medium
CN105427369A (zh) Mobile terminal and method for generating a three-dimensional image thereof
US20220284642A1 (en) Method for training convolutional neural network, and method and device for stylizing video
CN114096994A (zh) Image alignment method and apparatus, electronic device, and storage medium
CN109784164A (zh) Foreground recognition method and apparatus, electronic device, and storage medium
CN113965694A (zh) Video recording method, electronic device, and computer-readable storage medium
WO2022073516A1 (zh) Image generation method and apparatus, electronic device, and medium
CN111316628A (zh) Smart-terminal-based image capturing method and image capturing system
WO2022179087A1 (zh) Video processing method and apparatus
WO2023230927A1 (zh) Image processing method and apparatus, and readable storage medium
CN115423752B (zh) Image processing method, electronic device, and readable storage medium
CN115623313A (zh) Image processing method, image processing apparatus, electronic device, and storage medium
CN112434714A (zh) Multimedia recognition method and apparatus, storage medium, and electronic device
CN109035159A (zh) Image optimization processing method, mobile terminal, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19879039

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19879039

Country of ref document: EP

Kind code of ref document: A1