WO2020034481A1 - Image style conversion method and apparatus, device, and storage medium - Google Patents

Image style conversion method and apparatus, device, and storage medium

Info

Publication number
WO2020034481A1
WO2020034481A1 (PCT/CN2018/117293)
Authority
WO
WIPO (PCT)
Prior art keywords
image
gradient
style
pixel
feature map
Prior art date
Application number
PCT/CN2018/117293
Other languages
English (en)
French (fr)
Inventor
贺高远
柳一村
陈晓濠
任思捷
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司
Priority to JP2019569805A priority Critical patent/JP6874168B2/ja
Priority to SG11202000062RA priority patent/SG11202000062RA/en
Priority to US16/726,885 priority patent/US11200638B2/en
Publication of WO2020034481A1 publication Critical patent/WO2020034481A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/001Texturing; Colouring; Generation of texture or colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Definitions

  • The present application relates to image technology, and in particular, to an image style conversion method and apparatus, a device, and a storage medium.
  • Image style conversion based on deep learning is a new research topic of recent years. Although the problem of image style conversion has long existed, it was only in 2015 that the German researcher Gatys used a neural network method to open the door to creating artistic image styles with deep learning. The current technology does not optimize style conversion for face photos; for example, when existing methods are applied to a selfie image, the common shortcomings are deformation of the face edges and inconsistent face color caused by the image style conversion.
  • The embodiments of the present application provide an image style conversion method and apparatus, a device, and a storage medium, in order to solve at least one problem in the prior art.
  • An embodiment of the present application provides an image style conversion method.
  • The method includes: obtaining an initial image for style conversion; inputting a gradient of the initial image into an image style conversion model, and obtaining, from the image style conversion model, a feature map of the initial image on the gradient domain, where the image style conversion model is obtained based on pixel-level loss and perceptual loss training in the gradient domain; and performing image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image.
  • The image style conversion model includes a pixel-level loss model and a perceptual loss model, wherein the pixel-level loss model is obtained by training in the gradient domain with minimizing the pixel-level loss as the training target, and the perceptual loss model is obtained by training in the gradient domain with minimizing the perceptual loss as the training target.
  • The training process of the pixel-level loss model and the perceptual loss model includes: inputting the gradient of a training sample into the pixel-level loss model, and obtaining a sample output result of the training sample from the pixel-level loss model; determining the gradient of the stylized reference image corresponding to the training sample; and training the perceptual loss model according to a first output feature map of the j-th convolution layer of the perceptual loss model obtained from the gradient of the reference image, and a second output feature map of the j-th convolution layer of the perceptual loss model obtained from the sample output result.
  • Training the perceptual loss model according to the first output feature map and the second output feature map of the j-th convolution layer includes training the perceptual loss model using the following formula: $$L_{feat} = \frac{1}{C_j H_j W_j} \left\| \psi_j\!\left(F_W(\nabla I_i)\right) - \psi_j\!\left(\nabla \tilde S_i\right) \right\|_2^2$$ wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model, $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample, $\psi_j(\cdot)$ represents the output feature map of the j-th convolution layer when the perceptual loss model adopts a convolutional neural network model, and $C_j$, $H_j$ and $W_j$ denote the number of channels, the height, and the width of the feature map corresponding to the j-th convolution layer.
  • The training process of the pixel-level loss model includes: taking the gradient of a training sample as the input of the pixel-level loss model, and obtaining a sample output result from the pixel-level loss model; determining the gradient of the stylized reference image corresponding to the training sample; and training the pixel-level loss model according to the gradient of the reference image and the sample output result.
  • The pixel-level loss model includes a first set of convolutional layers, an upsampling layer, and a second set of convolutional layers, and training the pixel-level loss model according to the gradient of the reference image and the sample output result includes: inputting the gradient of the training sample to the first set of convolutional layers to obtain a sample feature map; inputting the sample feature map to the upsampling layer and upsampling it to the pixel size of the initial image; and inputting the up-sampled sample feature map to the second set of convolutional layers to obtain the sample output result.
  • Training the pixel-level loss model according to the gradient of the reference image and the sample output result includes: training the pixel-level loss model according to the absolute value of the difference between $F_W(\nabla I_i)$ and the corresponding $\nabla \tilde S_i$; wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model $F_W$, and $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample.
  • Training the pixel-level loss model according to the absolute value of the difference between $F_W(\nabla I_i)$ and the corresponding $\nabla \tilde S_i$ includes training the pixel-level loss model with the following formula: $$L_{pixel} = \frac{1}{D} \sum_{i=1}^{D} \left\| F_W(\nabla I_i) - \nabla \tilde S_i \right\|_1$$ wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model $F_W$, $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample, and $D$ represents the number of samples in the training sample set.
  • Performing image reconstruction based on the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain; wherein satisfying the structural similarity condition with the feature map of the initial image on the gradient domain includes: the degree of structural difference between the style image and the initial image is less than a similarity threshold, or the degree of structural difference between the style image and the initial image is the smallest, wherein the degree of structural difference is the difference, in at least one reference direction, between the change trend of the style image on the gradient domain and that of the feature map of the initial image on the gradient domain.
  • Performing image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image includes: performing image reconstruction according to $$S^* = \arg\min_S \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain the style image; wherein $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
  • Performing image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image includes: performing image reconstruction according to the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image.
  • Performing image reconstruction according to the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the initial image.
  • The method further includes: performing feature extraction on the initial image to obtain a face region in the initial image; correspondingly, performing image reconstruction according to the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the face region in the initial image.
  • Using, as the style image, the image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the initial image includes: performing image reconstruction according to $$S^* = \arg\min_S \; \lambda \left\| S - I \right\|^2 + \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain the style image; wherein $I$ represents the initial image, $S$ represents the style image, $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
  • The step of inputting the gradient of the initial image into the image style conversion model and obtaining the feature map of the initial image on the gradient domain from the image style conversion model includes: determining the gradient of the initial image in at least one reference direction; inputting the gradient in the at least one reference direction to the image style conversion model, and correspondingly obtaining, from the image style conversion model, the feature map of the initial image in the gradient domain in the at least one reference direction; correspondingly, image reconstruction is performed according to the feature map on the gradient domain in the at least one reference direction to obtain the style image.
  • The at least one reference direction includes the x and y directions in a plane reference coordinate system; correspondingly, the gradients of the initial image in the x and y directions are determined respectively; the gradients in the x and y directions are input to the image style conversion model, and the feature maps of the initial image in the gradient domain in the x and y directions are correspondingly obtained from the image style conversion model; correspondingly, image reconstruction is performed according to the feature maps in the gradient domain in the x and y directions to obtain the style image.
  • An embodiment of the present application provides an image style conversion device.
  • The device includes: an obtaining unit configured to obtain an initial image for style conversion; an obtaining unit configured to input a gradient of the initial image into an image style conversion model and obtain a feature map of the initial image on the gradient domain from the image style conversion model, where the image style conversion model is obtained based on pixel-level loss and perceptual loss training in the gradient domain; and a reconstruction unit configured to perform image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image.
  • The image style conversion model includes a pixel-level loss model and a perceptual loss model, wherein the pixel-level loss model is obtained by training in the gradient domain with minimizing the pixel-level loss as the training target, and the perceptual loss model is obtained by training in the gradient domain with minimizing the perceptual loss as the training target.
  • The training unit includes: a first input module, configured to input the gradient of a training sample into the pixel-level loss model and obtain a sample output result of the training sample from the pixel-level loss model; a determining module, configured to determine the gradient of the stylized reference image corresponding to the training sample; and a first training module, configured to train the perceptual loss model according to a first output feature map of the j-th convolution layer of the perceptual loss model obtained from the gradient of the reference image, and a second output feature map of the j-th convolution layer of the perceptual loss model obtained from the sample output result.
  • The first training module is configured to train the perceptual loss model using the following formula: $$L_{feat} = \frac{1}{C_j H_j W_j} \left\| \psi_j\!\left(F_W(\nabla I_i)\right) - \psi_j\!\left(\nabla \tilde S_i\right) \right\|_2^2$$ wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model, $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample, $\psi_j(\cdot)$ represents the output feature map of the j-th convolution layer when the perceptual loss model adopts a convolutional neural network model, and $C_j$, $H_j$ and $W_j$ denote the number of channels, the height, and the width of the feature map corresponding to the j-th convolution layer.
  • The training unit includes: a second determination module, configured to determine the gradient of the training sample; a second input module, configured to use the gradient of the training sample as the input of the pixel-level loss model and obtain a sample output result from the pixel-level loss model; and a third determination module, configured to determine the gradient of the stylized reference image corresponding to the training sample.
  • The pixel-level loss model includes a first set of convolutional layers, an upsampling layer, and a second set of convolutional layers; the second training module includes: a first input sub-module, configured to input the gradient of the training sample to the first set of convolutional layers to obtain a sample feature map; an upsampling sub-module, configured to input the sample feature map to the upsampling layer and upsample it to the pixel size of the initial image; and a second input sub-module, configured to input the up-sampled sample feature map to the second set of convolutional layers to obtain a sample output result.
  • The second training module is configured to train the pixel-level loss model according to the absolute value of the difference between $F_W(\nabla I_i)$ and the corresponding $\nabla \tilde S_i$; wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model $F_W$, and $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample.
  • The second training module is configured to train the pixel-level loss model using the following formula: $$L_{pixel} = \frac{1}{D} \sum_{i=1}^{D} \left\| F_W(\nabla I_i) - \nabla \tilde S_i \right\|_1$$ wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model $F_W$, $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample, and $D$ represents the number of samples in the training sample set.
  • The reconstruction unit is configured to use, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain; wherein satisfying the structural similarity condition with the feature map of the initial image on the gradient domain includes: the degree of structural difference between the style image and the initial image is less than a similarity threshold, or the degree of structural difference between the style image and the initial image is the smallest, wherein the degree of structural difference is the difference, in at least one reference direction, between the change trend of the style image on the gradient domain and that of the feature map of the initial image on the gradient domain.
  • The reconstruction unit is configured to perform image reconstruction according to $$S^* = \arg\min_S \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain a style image; wherein $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
  • the reconstruction unit is configured to perform image reconstruction according to color information of the initial image and a feature map of the initial image on a gradient domain to obtain a style image.
  • The reconstruction unit is configured to use, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the initial image.
  • The apparatus further includes: an extraction unit configured to perform feature extraction on the initial image to obtain a face region in the initial image; correspondingly, the reconstruction unit is configured to use, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the face region in the initial image.
  • The reconstruction unit is configured to perform image reconstruction according to $$S^* = \arg\min_S \; \lambda \left\| S - I \right\|^2 + \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain a style image; wherein $I$ represents the initial image, $S$ represents the style image, $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
  • The obtaining unit includes: a fourth determining module, configured to determine a gradient of the initial image in at least one reference direction; and an obtaining module, configured to input the gradient in the at least one reference direction to the image style conversion model and correspondingly obtain, from the image style conversion model, a feature map of the initial image in the gradient domain in the at least one reference direction; correspondingly, the reconstruction unit is configured to perform image reconstruction according to the feature map in the gradient domain in the at least one reference direction to obtain a style image.
  • the at least one reference direction includes x and y directions in a plane reference coordinate system, and correspondingly, a determining unit is configured to determine gradients of the initial image in x and y directions, respectively;
  • the obtaining unit is configured to input the gradients in the x and y directions to the image style conversion model, and correspondingly obtain a feature map of the initial image in the gradient domain in the x and y directions from the image style conversion model.
  • the reconstruction unit is configured to perform image reconstruction according to the feature maps in the gradient domain in the x and y directions to obtain a style image.
  • An embodiment of the present application provides a computer device including a memory and a processor.
  • the memory stores a computer program executable on the processor, and the processor executes the steps in the image style conversion method when the program is executed.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the steps in the foregoing image style conversion method are implemented.
  • An embodiment of the present application provides a computer program product.
  • the computer program product includes computer-executable instructions. After the computer-executable instructions are executed, the steps in the image style conversion method can be implemented.
  • An initial image to be style converted is obtained; a gradient of the initial image is input to an image style conversion model, and a feature map of the initial image on the gradient domain is obtained from the image style conversion model; image reconstruction is then performed according to the feature map to obtain a style image. In this way, the image style conversion model trained based on pixel-level loss and perceptual loss in the gradient domain can overcome the disadvantages of face edge deformation and inconsistent face color in the related technology, and better image style conversion can be achieved.
  • FIG. 1 is a schematic diagram of a composition structure of a network architecture according to an embodiment of the present application
  • FIG. 2A is a schematic flowchart of an image style conversion method according to an embodiment of the present application.
  • FIG. 2B is a schematic diagram of a download scenario according to an embodiment of the present application.
  • FIG. 3A is a first schematic view of an implementation scenario according to an embodiment of the present application.
  • FIG. 3B is a second schematic diagram of an implementation scenario according to an embodiment of the present application.
  • FIG. 4A is a third schematic diagram of an implementation scenario according to an embodiment of the present application.
  • FIG. 4B is a fourth schematic view of an implementation scenario according to an embodiment of the present application.
  • FIG. 5A is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present application.
  • FIG. 5B is a schematic structural diagram of a pixel-level loss model according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a composition of an image style conversion device according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
  • the process of generating a style map using a neural network method is generally as follows: using a neural network model such as VGG16 model or VGG19, image feature extraction is performed on an original image (Content Image) and a style image (Style Image), that is, Content features are extracted from the original image, and style features are extracted from the style image.
  • A loss function is constructed using the content features and the style features; the loss value is computed on a randomly initialized image, and the image is iteratively redrawn according to this feedback to produce a generated image; the generated image is similar in content to the original image and similar in style to the style image.
  • this algorithm needs to be trained every time it generates an image, and it takes a long time.
  • In a fast transfer algorithm, a trained network can transform any image into the style corresponding to that network, so each time an image is generated the network is propagated forward only once, and the speed is very fast.
  • the fast transfer algorithm generally includes two networks: one is an Image Transform Network, and one is a Loss Network.
  • the image conversion network is used to convert the image. Its parameters are changed, while the parameters of the loss network remain unchanged.
  • the VGG-16 network trained in the ImageNet image library can be used as the loss network.
  • The original image is converted by the image transform network; the conversion result, the style image, and the original image are all passed through the loss network to extract the perceptual loss, and the perceptual loss is used to train the image conversion network.
  • A large number of images are used to train the image conversion network to obtain a model; in the output phase, the model is applied to generate the output image. The network obtained in this way is three orders of magnitude faster than the Gatys model at generating images.
  • The current technology does not optimize style conversion for face photos: 1) the edges of the face may deviate from those in the original image, that is, the structural information of the output image is changed; 2) the skin color of the face may be inconsistent with the original skin color, that is, the color information of the output image is changed.
  • For example, the portrait of user A in the initial image has a round face, but in the output style image it has a sharply pointed face; the skin of user B is fair, but in the output style image the skin of user B is darkened. That is, how to better preserve the structural information and color information of the original initial image becomes a problem to be solved.
  • To this end, an embodiment of the present application proposes a Convolutional Neural Network (CNN) structure that performs image style conversion entirely in the image gradient domain. Owing to the edge-preserving property of gradient-domain learning, the image style conversion network provided by this embodiment of the application can overcome the edge-deformation drawback of previous methods.
  • a term called color confidence is introduced to maintain the fidelity of the skin color of the resulting image.
  • the image reconstruction phase uses both the structure information and the color information of the original image, which can make the result more natural.
  • perceptual loss is directly used in the gradient domain, so that the learned style information is more focused on strokes rather than colors, which makes it more suitable for the task of style conversion with human faces.
  • the sampling operation refers to a subsampled operation or a down-sampled operation. If the sampling object is a continuous signal, the continuous signal is a discrete signal after the downsampling operation.
  • the purpose of the downsampling operation may be to reduce the image for computational convenience.
  • The principle of the downsampling operation: for an image I of size M × N, downsampling it by a factor of s yields an image with a resolution of (M/s) × (N/s); of course, s should be a common divisor of M and N. If the image is viewed as a matrix, each s × s window of the original image is turned into a single pixel whose value is the average of all pixels in the window.
  • the upsampling operation is the reverse process of the downsampling operation. It is also called up-sampling or interpolating. For images, high-resolution images can be obtained by upsampling operations.
  • The principle of the upsampling operation: almost all image enlargement uses interpolation, that is, new pixels are inserted between the original pixels using a suitable interpolation algorithm.
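  • As a concrete illustration of the downsampling and upsampling operations described above, the following sketch uses PyTorch average pooling and bilinear interpolation; the library, the image size, and the factor s = 4 are illustrative assumptions rather than part of the described method.

```python
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 256, 256)   # a hypothetical M x N = 256 x 256 RGB image
s = 4                              # downsampling factor; s should divide M and N

# Downsampling: each s x s window of the original image becomes one pixel whose
# value is the average of the window, giving an (M/s) x (N/s) image.
small = F.avg_pool2d(img, kernel_size=s)            # -> 1 x 3 x 64 x 64

# Upsampling: insert new pixels between the original ones by interpolation
# (bilinear here), restoring the original resolution.
big = F.interpolate(small, scale_factor=s, mode="bilinear", align_corners=False)
print(small.shape, big.shape)      # (1, 3, 64, 64) and (1, 3, 256, 256)
```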
  • The word channel has two different meanings. The first is for sample images (images used as training samples), where channel refers to a color channel (the number of color channels) of the sample image; the second is the dimension of the output space, for example the number of output channels of a convolution operation, or the number of convolution kernels in each convolution layer.
  • Color channel: decomposes an image into one or more color components.
  • Single color channel: each pixel needs only one numerical value, which can represent only grayscale; 0 is black.
  • RGB (red, green, and blue) color channels: the image is divided into red, green, and blue color channels, which can represent color; all zeros represent black.
  • Alpha channel: an alpha value of 0 means fully transparent.
  • Convolutional neural network is a kind of multilayer supervised learning neural network.
  • the convolutional layer and pool sampling layer of the hidden layer are the core modules to realize the feature extraction function of the convolutional neural network.
  • The lower hidden layers of a convolutional neural network are composed of alternating convolutional layers and max-pool sampling layers, while the higher layers are fully connected layers corresponding to the hidden layers and logistic-regression classifier of a traditional multilayer perceptron.
  • the input of the first fully connected layer is a feature image obtained by feature extraction from the convolutional layer and the sub-sampling layer.
  • the final output layer is a classifier that can use logistic regression, Softmax regression or even support vector machines to classify the initial image.
  • Each layer in CNN is composed of multiple maps, and each map is composed of multiple neural units.
  • A CNN generally uses convolutional layers and sampling layers alternately, that is, a convolutional layer is followed by a sampling layer, and the sampling layer is followed by another convolutional layer; of course, multiple convolutional layers may also be connected to one sampling layer. In this way the convolutional layers extract features, which are then combined to form more abstract features and finally to describe the features of the image object; the CNN may also be followed by a fully connected layer.
  • the convolutional neural network structure includes a convolutional layer, a downsampling layer, and a fully connected layer.
  • Each layer has multiple feature maps, each feature map extracts a feature of the input through a convolution filter, and each feature map has multiple neurons.
  • Convolutional layer: the reason for using the convolutional layer is an important property of the convolution operation: through convolution, the original signal features can be enhanced and the noise can be reduced.
  • the reason for using downsampling is that according to the principle of image local correlation, subsampling the image can reduce the amount of calculation while maintaining the image rotation invariance.
  • Fully connected layer: using a softmax fully connected layer, the activation values obtained are the image features extracted by the convolutional neural network.
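  • The alternating convolution/pooling structure followed by a fully connected layer can be sketched as below; the layer counts, channel widths, and 32 × 32 input size are arbitrary illustrative choices, not values taken from this application.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Alternating convolutional and max-pool sampling layers, then a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # convolutional layer -> sampling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),    # assumes 32 x 32 input images
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = SimpleCNN()(torch.rand(4, 3, 32, 32))     # -> shape (4, 10)
```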
  • A neuron is the basic unit of a multilayer perceptron, and its function is activation and transmission. That is, for a neuron, the input is part or all of the input of the convolutional neural network, or part or all of the output of the previous layer; after computation by the activation function, the result is used as the output of the neuron.
  • activation functions include sigmoid function, tanh function, and linear rectifier function (Rectified Linear Unit, ReLu).
  • Pixel-wise loss: assuming I_est is the output of a convolutional neural network and I_HR is the original high-resolution image, pixel-wise loss emphasizes the matching of each corresponding pixel between I_est and I_HR, which is different from the way the human eye perceives images. Generally speaking, images trained with pixel-wise loss are usually overly smooth and lack high-frequency information.
  • Perceptual loss: I_est and I_HR are input into a differentiable function ψ, where I_est represents the output of the convolutional neural network and I_HR represents the original high-resolution image; this avoids requiring the output image of the network to be pixel-wise consistent with the original high-resolution image.
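  • The contrast between the two losses can be written down directly; the sketch below assumes some differentiable feature extractor `psi` (for example a truncated pretrained network) and is purely illustrative.

```python
import torch.nn.functional as F

def pixel_wise_loss(i_est, i_hr):
    # Pixel-wise loss: match every corresponding pixel of I_est and I_HR.
    return F.mse_loss(i_est, i_hr)

def perceptual_loss(i_est, i_hr, psi):
    # Perceptual loss: compare the two images after mapping them through a
    # differentiable feature function psi instead of pixel by pixel.
    return F.mse_loss(psi(i_est), psi(i_hr))
```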
  • The VGG16 and VGG19 model structures are simple and effective. The first few layers use only 3×3 convolution kernels to increase the network depth, and max pooling reduces the number of neurons in each layer in turn; the last three layers are fully connected layers (two 4096-dimensional fully connected layers followed by the classification layer). The "16" and "19" indicate the number of layers in the network that have weights (that is, parameters to be learned) and need to be updated.
  • the weights of the VGG16 model and the VGG19 model are both trained by ImageNet.
  • Model parameters can generally be understood as configuration variables inside the model. Historical data or training samples can be used to estimate the values of model parameters, or model parameters are variables that can be automatically learned from historical data or training samples.
  • Model parameters have the following characteristics: model parameters are needed for model prediction; model parameter values define the model's function; model parameters are obtained by data estimation or data learning; model parameters are generally not set manually by practitioners; model parameters are usually saved as part of the learned model; and optimization algorithms are usually used to estimate model parameters, which is an efficient search over the possible values of the parameters.
  • the weights and biases of network models are generally called model parameters.
  • Model hyperparameters can generally be understood as configuration outside the model, whose values cannot be estimated from the data. To some extent, the characteristics of model hyperparameters are: they are often used in the process of estimating model parameters; they are usually specified directly by the practitioner; they can usually be set using heuristics; and they are usually tuned for a given predictive modeling problem. In other words, model hyperparameters determine certain aspects of the model, and models with different hyperparameters differ. For example, suppose two models are both CNN models; if their numbers of layers are different, the models are different, even though both are CNN models. In deep learning, hyperparameters include the learning rate, the number of iterations, the number of layers, the number of neurons in each layer, and so on.
  • FIG. 1 is a schematic diagram of a composition structure of the network architecture according to the embodiment of the present application.
  • the network architecture includes two or more electronic devices 11 to 1N and a server 31.
  • the electronic devices 11 to 1N interact with the server 31 through the network 21.
  • the electronic device may be various types of computer devices with information processing capabilities during the implementation process.
  • the electronic device may include a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a television, and the like.
  • An embodiment of the present application proposes an image style conversion method, which can effectively solve the problem that the structural information of the output image changes compared to the initial image.
  • The method is applied to an electronic device, and the functions implemented by the method can be realized by a processor in the electronic device calling program code; of course, the program code can be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.
  • FIG. 2A is a schematic flowchart of an image style conversion method according to an embodiment of the present application. As shown in FIG. 2A, the method includes:
  • Step S201 Obtain an initial image for style conversion
  • the image style conversion method provided in the embodiment of the present application may be embodied by a client (application) during the implementation process.
  • The user downloads the client from the server 31 onto his or her electronic device 12.
  • the electronic device 12 sends a download request to the server 31.
  • the download request is used to download the client, and the server 31 responds to the download request.
  • The server 31 sends a download response to the electronic device 12, and the download response carries the client, for example an Android Application Package (APK) for the Android system; the user then installs the downloaded client on his or her electronic device and runs it, so that the electronic device can implement the image style conversion method provided in the embodiment of the present application.
  • When step S201 is implemented on the electronic device side, the implementation process may be as follows: when the user selects a picture from the album, the client receives the user's selection operation and determines the selected picture as the initial image to be style converted; or, the user takes a photo with the camera of the electronic device or an external camera, and the client receives the photographing operation and determines the captured photo as the initial image to be style converted.
  • this step can also have other implementations.
  • Step S202 input the gradient of the initial image into an image style conversion model, and obtain a feature map of the initial image on the gradient domain from the image style conversion model;
  • the image style conversion model is trained and obtained based on pixel-level loss and perceptual loss in the gradient domain.
  • the image style conversion model is obtained by using pixel-level loss and perceptual loss as training targets in the gradient domain;
  • Step S203 Perform image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image.
  • the style image is a reconstructed and stylized image.
  • the trained image style conversion model can be local to the electronic device, or it can be on the server side.
  • The trained image style conversion model may be installed when the electronic device installs the client; that is, installing the client also installs the trained image style conversion model.
  • The electronic device obtains the initial image through step S201, obtains the feature map (that is, the output result) of the initial image on the gradient domain through step S202, obtains the style image through step S203, and outputs the obtained style image to the user.
  • the trained image style conversion model may also be located on the server side, as shown in FIG. 3B.
  • The electronic device sends the initial image to the server, and the server receives the initial image sent by the electronic device and implements the above steps. In this case, step S201 includes: the server receives the initial image sent by the electronic device, that is, the server obtains the initial image to be style converted; the server then obtains the feature map of the initial image on the gradient domain through step S202, and finally obtains the output style image through step S203. The server may also send the style image to the electronic device.
  • After receiving the style image, the electronic device outputs the style image to the user. In this case, after the client is installed on the electronic device, the client uploads the user's initial image, receives the style image sent by the server, and outputs the style image to the user.
  • steps S201 to S203 may be partially performed by an electronic device, or may be partially performed by a server.
  • Steps S201 and S202 may be performed locally by the electronic device, which then sends the feature map of the initial image on the gradient domain to the server; after the server performs step S203 to obtain the style image, it sends the style image to the electronic device, and the electronic device outputs the style image.
  • Alternatively, steps S201 and S202 may be performed by the server; the server sends the feature map of the initial image on the gradient domain to the electronic device; after the electronic device performs step S203, the style image is obtained and output to the user.
  • the method further includes: training the image style conversion model, wherein the training target of the image style conversion model is to minimize the total loss L total , where L total is expressed by the following formula:
  • $L_{total} = \alpha L_{feat} + \beta L_{pixel}$; wherein $L_{feat}$ represents the perceptual loss, $L_{pixel}$ represents the pixel-level loss, and the values of $\alpha$ and $\beta$ are real numbers.
  • The ratio of $\alpha$ to $\beta$ is greater than 10 and less than 10 to the fifth power ($10^5$).
  • For example, the value of $\alpha$ is 10000 and the value of $\beta$ is 1.
  • The values of $\alpha$ and $\beta$ can be set according to specific application scenarios, and are not limited in the embodiments of the present application.
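  • Combining the two loss terms is straightforward; the sketch below uses the example values α = 10000 and β = 1 mentioned above as defaults.

```python
def total_loss(l_feat, l_pixel, alpha=10000.0, beta=1.0):
    # L_total = alpha * L_feat + beta * L_pixel; the ratio alpha / beta is kept
    # between 10 and 10^5, with alpha = 10000, beta = 1 as an example setting.
    return alpha * l_feat + beta * l_pixel
```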
  • the image style conversion model includes a pixel-level loss model and a perceptual loss model, wherein the pixel-level loss model is a pixel-level loss model obtained by using a minimum pixel-level loss as a training target in a gradient domain.
  • the perceptual loss model is obtained by training in a gradient domain with a minimum perceptual loss as a training target.
  • The training process of the perceptual loss model includes:
  • Step S11: determine the gradient of the training sample; assuming that $I_i$ represents the i-th training sample, the gradient of the i-th training sample $I_i$ is determined as $\nabla I_i$.
  • Step S12: input the gradient of the training sample into the pixel-level loss model, and obtain a sample output result of the training sample from the pixel-level loss model; that is, the gradient $\nabla I_i$ of the i-th training sample $I_i$ is input to the pixel-level loss model $F_W$, and the sample output result $F_W(\nabla I_i)$ of the training sample is obtained from the pixel-level loss model.
  • Step S13: determine the gradient of the stylized reference image corresponding to the training sample; here, the stylized reference image may be an unsatisfactory stylized reference image obtained by using an existing stylization algorithm. Suppose the stylized reference image corresponding to the training sample $I_i$ is $\tilde S_i$; then the gradient of the reference image is $\nabla \tilde S_i$.
  • Step S14: train the perceptual loss model according to the first output feature map of the j-th convolution layer of the perceptual loss model obtained from the gradient of the reference image, and the second output feature map of the j-th convolution layer of the perceptual loss model obtained from the sample output result.
  • the j-th convolutional layer may be any layer in a convolutional neural network model.
  • the j-th convolutional layer may be a conv3-3 layer in VGG16.
  • The pixel-level loss model includes a first set of convolutional layers, an upsampling layer, and a second set of convolutional layers, and training the pixel-level loss model according to the gradient of the reference image and the sample output result includes: inputting the gradient of the training sample to the first set of convolutional layers to obtain a sample feature map; inputting the sample feature map to the upsampling layer and upsampling it to the pixel size of the initial image; and inputting the up-sampled sample feature map to the second set of convolutional layers to obtain the sample output result.
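  • A minimal sketch of such a pixel-level loss network is given below; the exact number of layers, the channel widths, and the stride-2 convolution in the first set are assumptions made only for illustration, since the application specifies only the first convolution set, the upsampling layer, and the second convolution set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelLossNet(nn.Module):
    """Gradient-in, gradient-feature-map-out network F_W (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        # First set of convolutional layers (a stride-2 layer is used here, so the
        # sample feature map is smaller than the input gradient).
        self.conv_set1 = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Second set of convolutional layers, applied after upsampling.
        self.conv_set2 = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, grad):                        # grad: gradient of the image
        feat = self.conv_set1(grad)                 # sample feature map
        feat = F.interpolate(feat, size=grad.shape[-2:], mode="bilinear",
                             align_corners=False)   # upsample to the input pixel size
        return self.conv_set2(feat)                 # sample output result

out = PixelLossNet()(torch.rand(1, 1, 128, 128))    # same spatial size as the input
```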
  • Training the perceptual loss model according to the first output feature map and the second output feature map of the j-th convolution layer includes training the perceptual loss model using the following formula:

$$L_{feat} = \frac{1}{C_j H_j W_j} \left\| \psi_j\!\left(F_W(\nabla I_i)\right) - \psi_j\!\left(\nabla \tilde S_i\right) \right\|_2^2$$

  • wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model, $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample, $\psi_j(\cdot)$ represents the output feature map of the j-th convolution layer when the perceptual loss model uses a convolutional neural network model, and $C_j$, $H_j$ and $W_j$ denote the number of channels, the height, and the width of the feature map corresponding to the j-th convolution layer.
  • the j-th convolution layer is conv3-3.
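  • As an illustration of how the conv3-3 activation of a pretrained VGG16 could serve as $\psi_j$, the sketch below uses torchvision; the slice index 16 and the channel-replication step are assumptions of this illustration and should be checked against the actual loss network used.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# psi_j: activations up to VGG16's conv3-3 (features[:16] in torchvision's layer
# ordering); older torchvision versions use vgg16(pretrained=True) instead.
vgg_conv33 = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_conv33.parameters():
    p.requires_grad_(False)          # the loss network stays fixed during training

def to_rgb(g):
    # VGG expects 3 channels; replicate a single-channel gradient map if needed.
    return g.repeat(1, 3, 1, 1) if g.shape[1] == 1 else g

def perceptual_loss(fw_grad, ref_grad):
    """L_feat between F_W(gradient of the sample) and the gradient of the reference."""
    a = vgg_conv33(to_rgb(fw_grad))
    b = vgg_conv33(to_rgb(ref_grad))
    c, h, w = a.shape[1:]            # C_j, H_j, W_j
    return F.mse_loss(a, b, reduction="sum") / (c * h * w)
```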
  • The training process of the pixel-level loss model includes:
  • Step S21: determine the gradient of the training sample;
  • Step S22: use the gradient of the training sample as the input of the pixel-level loss model, and obtain the sample output result from the pixel-level loss model;
  • Step S23: determine the gradient of the stylized reference image corresponding to the training sample;
  • Step S24: train the pixel-level loss model according to the gradient of the reference image and the sample output result.
  • Training the pixel-level loss model according to the gradient of the reference image and the sample output result includes: training the pixel-level loss model according to the absolute value of the difference between $F_W(\nabla I_i)$ and the corresponding $\nabla \tilde S_i$; wherein $\nabla I_i$ represents the gradient of the i-th training sample, $F_W$ represents the pixel-level loss model, $F_W(\nabla I_i)$ represents the output of the gradient of the i-th training sample through the pixel-level loss model $F_W$, and $\nabla \tilde S_i$ represents the gradient of the stylized reference image of the i-th training sample.
  • Training the pixel-level loss model according to the absolute value of the difference between $F_W(\nabla I_i)$ and the corresponding $\nabla \tilde S_i$ includes training the pixel-level loss model with the following formula:

$$L_{pixel} = \frac{1}{D} \sum_{i=1}^{D} \left\| F_W(\nabla I_i) - \nabla \tilde S_i \right\|_1$$

  • wherein $F_W$ represents the pixel-level loss model and $D$ represents the number of samples in the training sample set.
  • Performing image reconstruction based on the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain.
  • Satisfying the structural similarity condition with the feature map of the initial image on the gradient domain includes: the degree of structural difference between the style image and the initial image is less than a similarity threshold, or the degree of structural difference between the style image and the initial image is the smallest, wherein the degree of structural difference is the difference, in at least one reference direction, between the change trend of the style image on the gradient domain and that of the feature map of the initial image on the gradient domain.
  • the reference direction can be the x and y directions of the image in the plane reference coordinate system. Of course, there can be other directions, or only one direction can be used.
  • The degree of difference can be the difference, the absolute value of the difference, or various mathematical transformations based on the difference, such as the sum of the squares of the absolute differences in the x and y directions, that is, $\left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$, where $I$ represents the initial image, $S$ represents the style image, and $\|\cdot\|$ represents the absolute-value symbol.
  • Performing image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image includes: performing image reconstruction according to $$S^* = \arg\min_S \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain the style image; wherein $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
  • Performing image reconstruction according to the feature map of the initial image on the gradient domain to obtain a style image includes: performing image reconstruction according to the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image.
  • Performing image reconstruction based on the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the initial image.
  • The method further includes: performing feature extraction on the initial image to obtain a face region in the initial image; correspondingly, performing image reconstruction according to the color information of the initial image and the feature map of the initial image on the gradient domain to obtain a style image includes: using, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the face region in the initial image.
  • The color similarity condition, that is, the condition that the color information satisfies: the degree of color difference between the style image and the initial image is less than a set value, or is the smallest, where the degree of color difference is the difference between the color values of corresponding sampling points of the image to be processed and the target image, and can be represented by $\| S - I \|$, where $I$ represents the initial image and $S$ represents the style image.
  • A color similarity condition is set, where the color similarity condition may be defined with respect to the color of the entire initial image, or only with respect to the color of the human face region in the initial image.
  • In theory, the above two conditions (the structural similarity condition and the color similarity condition) can each be used independently, that is, the style image can be calculated using only one of them; they can also be used at the same time, in which case corresponding coefficients (weights) are assigned, for example a weight $\lambda$ whose value is a real number.
  • Using, as the style image, the image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and satisfies the color similarity condition with the initial image includes: performing image reconstruction according to $$S^* = \arg\min_S \; \lambda \left\| S - I \right\|^2 + \left\| \nabla_x S - F_W(\nabla_x I) \right\|^2 + \left\| \nabla_y S - F_W(\nabla_y I) \right\|^2$$ to obtain the style image; wherein $I$ represents the initial image, $S$ represents the style image, $\nabla_x I$ represents the gradient of the initial image in the x direction, $F_W(\nabla_x I)$ represents the feature map of the gradient of the initial image in the x direction on the gradient domain after passing through the image style conversion model, $\nabla_y I$ represents the gradient of the initial image in the y direction, $F_W(\nabla_y I)$ represents the feature map of the gradient of the initial image in the y direction on the gradient domain after passing through the image style conversion model, $\nabla_x S$ represents the gradient of the style image in the x direction, and $\nabla_y S$ represents the gradient of the style image in the y direction.
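  • One simple way to carry out this reconstruction is to treat S as a free variable and minimize the objective above directly by gradient descent; the sketch below does this with PyTorch autograd. The solver, the forward-difference gradients, and the alignment of the model outputs to those differences are illustrative choices, and a closed-form Poisson-style solve would serve equally well.

```python
import torch

def reconstruct(initial, feat_x, feat_y, lam=1.0, steps=500, lr=0.05):
    """Minimize lam*||S - I||^2 + ||grad_x S - feat_x||^2 + ||grad_y S - feat_y||^2.

    feat_x / feat_y are the model outputs F_W(grad_x I), F_W(grad_y I), assumed to be
    cropped to the same shape as the forward differences of the image.
    """
    style = initial.clone().requires_grad_(True)
    opt = torch.optim.Adam([style], lr=lr)
    for _ in range(steps):
        dx = style[..., :, 1:] - style[..., :, :-1]    # forward difference in x
        dy = style[..., 1:, :] - style[..., :-1, :]    # forward difference in y
        loss = (lam * (style - initial).pow(2).sum()
                + (dx - feat_x).pow(2).sum()
                + (dy - feat_y).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return style.detach()
```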
  • The step of inputting the gradient of the initial image into the image style conversion model and obtaining the feature map of the initial image on the gradient domain from the image style conversion model includes: Step S31: determine the gradient of the initial image in at least one reference direction; Step S32: input the gradient in the at least one reference direction to the image style conversion model, and correspondingly obtain, from the image style conversion model, the feature map of the initial image in the gradient domain in the at least one reference direction; correspondingly, image reconstruction is performed according to the feature map on the gradient domain in the at least one reference direction to obtain the style image.
  • The at least one reference direction includes the x and y directions in a plane reference coordinate system; correspondingly, determining the gradient of the initial image in at least one reference direction includes determining the gradients of the initial image in the x and y directions, respectively; inputting the gradient in the at least one reference direction to the image style conversion model and correspondingly obtaining the feature map of the initial image in the gradient domain in the at least one reference direction from the image style conversion model includes inputting the gradients in the x and y directions to the image style conversion model and obtaining the feature maps of the initial image in the gradient domain in the x and y directions from the image style conversion model; and performing image reconstruction based on the feature map on the gradient domain in the at least one reference direction to obtain a style image includes performing image reconstruction based on the feature maps on the gradient domain in the x and y directions to obtain the style image.
  • In the first stage, the structure of the convolutional neural network model provided in the embodiment of the present application is described; the second stage then introduces the training process of the provided convolutional neural network model; and the third stage introduces the process of image reconstruction using the trained convolutional neural network, that is, the method of performing image style conversion on the initial image.
  • The first stage: the structure of the convolutional neural network model.
  • FIG. 5A is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present application. As shown in FIG. 5A, the convolutional neural network is composed of two parts:
  • the first part is the convolutional neural network 51 (the first convolutional neural network) to be trained, which takes the gradient of the selfie image as input, followed by consecutive convolutional layers and ReLu layers; an upsampling operation then upsamples the feature map to the original image size, and finally the pixel-wise loss L pixel is computed against the gradient of the artistic style reference image;
  • taking the gradient of the selfie image as input includes: taking the gradient of the selfie image in the x direction and the gradient of the selfie image in the y direction, respectively, as inputs of the convolutional neural network.
  • each convolution filter of the convolutional layer is applied repeatedly over the entire receptive field, convolving the input selfie image; the result of the convolution constitutes the feature map of the input selfie image, so that local features of the selfie image are extracted.
  • a feature of convolutional neural networks is max-pooling sampling, which is a non-linear downsampling method. From the mathematical formula of max-pooling, it can be seen that max-pooling takes the maximum of the feature points in a neighborhood. After image features are obtained through convolution, these features are used for classification; after the convolutional feature map of the image is obtained, the convolutional features are reduced in dimension by the max-pooling sampling method.
  • the convolutional features are divided into several disjoint regions, and the maximum (or average) feature of each region is used to represent the dimension-reduced convolutional features.
  • the effect of the max-pooling sampling method is reflected in two aspects: (1) it reduces the computational complexity coming from the upper hidden layer; (2) these pooling units have translation invariance, so that even if the image undergoes a small displacement, the extracted features remain unchanged. Because of this enhanced robustness to displacement, the max-pooling sampling method is an efficient sampling method for reducing the data dimension.
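  • For illustration only, the max-pooling sampling described above can be sketched in NumPy as follows; the 2x2 window size and the example values are assumptions made for the sketch and are not taken from the embodiment:

    import numpy as np

    def max_pool_2x2(feature_map):
        """Non-overlapping 2x2 max-pooling over a single-channel feature map."""
        h, w = feature_map.shape
        h2, w2 = h // 2, w // 2
        # split the map into disjoint 2x2 regions and keep the maximum of each region
        blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
        return blocks.max(axis=(1, 3))

    fmap = np.array([[1, 3, 2, 0],
                     [4, 2, 1, 1],
                     [0, 1, 5, 2],
                     [2, 2, 1, 3]], dtype=np.float32)
    print(max_pool_2x2(fmap))   # [[4. 2.] [2. 5.]]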
  • the second part is a VGG-16 network 52 (the second convolutional neural network) that has been pre-trained on ImageNet and is used to calculate the perceptual loss L feat.
  • the output of the conv3-3 layer of VGG-16 is actually used to calculate the perceptual loss.
  • the total objective function L total can be calculated using the following formula (3-1):
  • L total = αL feat + βL pixel (3-1); where the values of α and β are real numbers. For example, α and β can be set to integers during training.
  • Image gradient is a way of describing the differences between image pixels, and it can be used as a feature to characterize the image. From a mathematical point of view, the image gradient refers to the first derivative of the pixel values.
  • the gradient of the image in the x direction and the gradient in the y direction are represented by formula (3-2) and formula (3-3), respectively; written, for example, as forward differences, these take the form ∇xI(x, y) = I(x + 1, y) − I(x, y) and ∇yI(x, y) = I(x, y + 1) − I(x, y), although other gradient operators can equally be used, as long as they describe the differences between pixels.
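  • A minimal NumPy sketch of computing the x- and y-direction gradients by forward differences is given below; the forward-difference operator is only one possible choice (Sobel and other gradient operators could equally be used), and the zero value kept at the last row/column is an assumption made for illustration:

    import numpy as np

    def image_gradients(img):
        """Forward-difference gradients of an H x W (single-channel) image."""
        gx = np.zeros_like(img, dtype=np.float32)
        gy = np.zeros_like(img, dtype=np.float32)
        gx[:, :-1] = img[:, 1:] - img[:, :-1]   # gradient in the x (width) direction
        gy[:-1, :] = img[1:, :] - img[:-1, :]   # gradient in the y (height) direction
        return gx, gy

    img = np.random.rand(8, 8).astype(np.float32)
    grad_x, grad_y = image_gradients(img)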
  • The second stage: the training process of the first part of the convolutional neural network
  • First, the training samples are determined: suppose D groups of training images are collected, where I_i denotes the i-th original image and R_i denotes the unsatisfactory stylized reference image obtained for I_i with an existing stylization algorithm. The pixel-level loss L pixel computed by the first part is defined by formula (4-1), of the form L pixel = (1/D) Σ_{i=1..D} ‖F_W(∇I_i) − ∇R_i‖²; where ∇I_i denotes the gradient of the i-th original image I_i (with ∇xI_i its gradient in the x direction and ∇yI_i its gradient in the y direction); F_W denotes the first part of the convolutional neural network model, so F_W(∇I_i) denotes the result of passing the gradient of I_i through the convolutional neural network (correspondingly F_W(∇xI_i) in the x direction and F_W(∇yI_i) in the y direction); and ∇R_i denotes the gradient of the stylized reference image of I_i (∇xR_i in the x direction and ∇yR_i in the y direction).
  • The perceptual loss L feat computed by the second part is defined by formula (4-2), of the form L feat = (1/D) Σ_{i=1..D} (1/(C_j H_j W_j)) ‖ψ_j(F_W(∇I_i)) − ψ_j(∇R_i)‖²; where ψ_j() denotes the output feature map of the j-th convolutional layer of the VGG-16 network, and C_j, H_j and W_j respectively denote the number of channels, the height and the width of the feature map corresponding to the j-th convolutional layer.
  • the conv3-3 layer of VGG-16 is used.
  • the total objective function is the sum of the perceptual loss L feat and the pixel-level loss L pixel ;
  • L total = αL feat + βL pixel (4-3);
  • where the values of α and β are real numbers; for example, α and β can be set to integers during training. In the training of this embodiment, α and β were set to 10000 and 1, respectively.
  • Training was run for 100K iterations on an Nvidia Titan X GPU, using the Adam optimization method to optimize the total objective function; for the first 50K iterations the learning rate was set to 10^-8, and for the last 50K iterations the learning rate was set to 10^-9. It should be noted that, during the implementation process, those skilled in the art can make some modifications to formula (4-1) and formula (4-2).
  • For formula (4-1), any such modification is acceptable as long as it can still express a pixel-level loss; for example, individual terms in formula (4-1) can be replaced by other equivalent quantities, the squared absolute value can be replaced by the absolute value itself, or by the square root of the absolute value, and so on.
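  • The training procedure described above can be sketched in PyTorch as follows; this is a minimal, non-limiting sketch: the small stand-in network, the synthetic gradient tensors, and the use of layer index 14 of torchvision's VGG-16 for conv3-3 are assumptions made for illustration and do not reproduce the exact embodiment:

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    # Stand-in for the first-part network F_W; the real architecture is sketched further below.
    net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    # Pre-trained VGG-16 features up to conv3-3 (index 14 in torchvision's layer list).
    vgg_conv3_3 = vgg16(pretrained=True).features[:15].eval()
    for p in vgg_conv3_3.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adam(net.parameters(), lr=1e-8)
    alpha, beta = 10000.0, 1.0          # weights of the perceptual and pixel-level losses

    # Synthetic stand-ins for (gradient of training sample, gradient of stylized reference).
    grad_in = torch.rand(1, 3, 224, 224)
    grad_ref = torch.rand(1, 3, 224, 224)

    for step in range(100_000):         # 100K iterations, as in the described training
        if step == 50_000:              # lr 1e-8 for the first 50K steps, 1e-9 for the last 50K
            for g in optimizer.param_groups:
                g["lr"] = 1e-9
        out = net(grad_in)                                              # F_W(gradient of I_i)
        l_pixel = F.mse_loss(out, grad_ref)                             # pixel-level loss in the gradient domain
        l_feat = F.mse_loss(vgg_conv3_3(out), vgg_conv3_3(grad_ref))    # perceptual loss at conv3-3
        loss = alpha * l_feat + beta * l_pixel
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()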
  • The third stage: the image reconstruction process
  • When a new image, such as a new selfie image, is input, the following formula (5) is used to determine the output stylized image, i.e. an objective of the form S = argmin_S ‖S − I‖² + λ(‖∇xS − F_W(∇xI)‖² + ‖∇yS − F_W(∇yI)‖²) (5);
  • where I represents the new selfie image, that is, the initial image, and S represents the style image corresponding to the new selfie image; ∇xI and ∇yI represent the gradients of the selfie image in the x and y directions, F_W(∇xI) and F_W(∇yI) represent the outputs of those gradients through the trained model, and ∇xS and ∇yS represent the gradients of the style image in the x and y directions. The term ‖S − I‖ uses the color information of the original image for image reconstruction and can be called the color confidence; the gradient terms use the structural information of the original image, and λ is a weight parameter balancing these two kinds of information (set to 10 in the implementation).
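  • The reconstruction of formula (5) can be carried out, for example, by direct gradient-based optimization over the pixels of S; a minimal PyTorch sketch is given below, where the forward-difference gradient, the squared-error form of the terms, the placement of λ on the structure terms and the synthetic stand-in tensors are all assumptions made for illustration:

    import torch

    def grad_xy(img):
        """Forward-difference gradients of a (C, H, W) image tensor."""
        gx = torch.zeros_like(img)
        gy = torch.zeros_like(img)
        gx[:, :, :-1] = img[:, :, 1:] - img[:, :, :-1]
        gy[:, :-1, :] = img[:, 1:, :] - img[:, :-1, :]
        return gx, gy

    def reconstruct(initial, target_gx, target_gy, lam=10.0, steps=500, lr=0.05):
        """Solve for S that keeps the colors of `initial` while matching the stylized gradients."""
        style = initial.clone().requires_grad_(True)
        opt = torch.optim.Adam([style], lr=lr)
        for _ in range(steps):
            gx, gy = grad_xy(style)
            color_term = ((style - initial) ** 2).sum()                 # color confidence ||S - I||
            structure_term = ((gx - target_gx) ** 2).sum() + ((gy - target_gy) ** 2).sum()
            loss = color_term + lam * structure_term
            opt.zero_grad()
            loss.backward()
            opt.step()
        return style.detach()

    I = torch.rand(3, 64, 64)            # stand-in for the initial (selfie) image
    Fx, Fy = grad_xy(I)                  # stand-ins for F_W(gradient in x) and F_W(gradient in y)
    S = reconstruct(I, Fx, Fy)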
  • the embodiment of the present application implements a selfie-oriented image style conversion algorithm, which overcomes two important disadvantages of previous style conversion methods when applied to a human face: first, the deformation of face edges; second, the inconsistency of skin color.
  • the neural network structure of the embodiment of the present application is learned entirely in the gradient domain. Compared with other image style conversion methods, when converting the style of selfie photos this method overcomes the shortcomings of edge deformation and inconsistent color found in previous methods, and it can beautify and enhance images while achieving image style conversion.
  • the first part is the convolutional neural network 51 (first convolutional neural network) to be trained.
  • the convolutional neural network shown in FIG. 5B may be adopted.
  • FIG. 5B is a schematic structural diagram of the composition of the convolutional neural network model according to the embodiment of the present application. As shown in FIG. 5B, the structure of the model includes:
  • Input layer (input) 501: the gradient of the selfie image in the x or y direction is used as input; h represents the height of the gradient of the selfie image in the x or y direction, and w represents its width.
  • For a selfie image I, the gradient of I in the x direction and the gradient of I in the y direction are computed, and then each color channel (or color component) of these gradients is taken as an input. If the RGB (Red Green Blue) color model is used, there are three color channels; correspondingly, for one selfie image there are six inputs: the gradient in the x direction on the R, G and B channels, and the gradient in the y direction on the R, G and B channels. These inputs are followed by the conv1+ReLu1, conv2+ReLu2, conv3+ReLu3, conv4+ReLu4, conv5+ReLu5, conv6+ReLu6 and conv7+ReLu7 layers.
  • the output result is a feature map 502.
  • the height of the feature map 502 is h/r, the width of the feature map 502 is w/r, and the number of channels of the feature map 502 is c, where r is a coefficient; the values of r and c are related to the model hyper-parameters of the convolutional neural network model in the embodiment of the application, including the size of the convolution kernel, the stride of the convolution kernel, and the padding of the input feature map. Generally, the number of convolution kernels determines the number of channels c of the output feature map.
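  • For reference, the spatial size of a convolution output can be computed from these hyper-parameters as sketched below; the concrete kernel size, stride and padding values are only examples and are not fixed by the embodiment:

    def conv_output_size(in_size, kernel_size, stride, padding):
        """Standard formula for the output height or width of a convolution."""
        return (in_size + 2 * padding - kernel_size) // stride + 1

    # e.g. a 256-pixel side with a 3x3 kernel, stride 2 and padding 1 gives 128
    print(conv_output_size(256, kernel_size=3, stride=2, padding=1))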
  • the upsampling layer takes 511 to 51C as input and produces 521 to 52C as output.
  • the output feature map is split according to the number of channels c, so that c feature maps 511 to 51C are obtained, and each of the feature maps 511 to 51C is upsampled to the size of the initial image.
  • the input layer 501 refers to the initial image, that is, the selfie image, whose size is h * w, so the sizes of the upsampled images 521 to 52C output by the upsampling layer are also h * w.
  • the output corresponding to input 511 is 521
  • the output corresponding to input 512 is 522, and so on
  • the output corresponding to input 51C is 52C.
  • Composition layer 531: the input is 521 to 52C and the output is 531; the upsampled images 521 to 52C are merged to obtain the feature map 531. Output layer: the input is 531 and the output is 541; the feature map 531 is convolved and passed through excitation, that is, it is input successively to conv8, ReLu8 and conv9, finally producing the output 541, whose size is the size of the original image, h * w.
  • the convolutional neural network model shown in FIG. 5B can be used to replace the network part 53 in FIG. 5A.
  • the convolution process before upsampling has 7 layers, which are conv1 to conv7
  • the excitation process before upsampling also has 7 layers, which are ReLu1 to ReLu7.
  • the 7-layer convolutional layer (conv1 to conv7) can be considered as the first set of convolutional layers of the pixel-level loss model.
  • of course, the 7 convolutional layers together with the 7 excitation layers (ReLu1 to ReLu7) can also be considered as the first set of convolutional layers of the pixel-level loss model.
  • after the upsampling there are also two convolutional layers, conv8 and conv9 respectively; after the upsampling there is also one layer of excitation, namely the excitation layer ReLu8.
  • the two convolutional layers (conv8 and conv9) can be considered as the second convolutional layer set of the pixel-level loss model.
  • of course, the two convolutional layers together with the one excitation layer (ReLu8) can also be considered as the second set of convolutional layers of the pixel-level loss model.
  • the number of convolutional layers before the upsampling (the number of layers in the first set of convolutional layers) can be changed, for example to 5 layers, 9 layers, 10 layers or even dozens of layers; correspondingly, the number of excitation layers before the upsampling can also be changed, for example to 5 layers, 6 layers, 9 layers, 15 layers, and so on.
  • for the convolutional layers before the upsampling, each convolutional layer is followed by an excitation layer; that is, before the upsampling, convolutional layers and excitation layers alternate.
  • the number of alternating layers and excitation layers can also vary, for example, two convolutional layers followed by one excitation layer, and then one convolutional layer followed by two excitation layers.
  • the excitation function used by the excitation layer is ReLu.
  • the excitation layer may also use other excitation functions, such as a sigmoid function.
  • the pooling layer is not shown in the embodiment shown in FIG. 5B. In some embodiments, a pooling layer may also be added. After the upsampling, the number of layers of the convolutional layer (the number of layers of the convolutional layer in the second set of convolutional layers), and the order of the convolutional layer and the excitation layer can be changed.
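  • A minimal PyTorch sketch of the first-part network described above (seven conv+ReLU layers, upsampling back to the input size, then conv8, ReLu8 and conv9) is given below; the kernel sizes, channel width and the stride-2 first convolution are assumptions made for illustration, since the text leaves these hyper-parameters open:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StyleGradientNet(nn.Module):
        """Sketch of the pixel-level loss model: first conv set -> upsample -> second conv set."""
        def __init__(self, channels=3, width=64):
            super().__init__()
            layers = [nn.Conv2d(channels, width, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
            for _ in range(6):                       # conv2..conv7 with ReLu2..ReLu7
                layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
            self.first_set = nn.Sequential(*layers)  # produces the c-channel feature map 502
            self.conv8 = nn.Conv2d(width, width, 3, padding=1)
            self.conv9 = nn.Conv2d(width, channels, 3, padding=1)

        def forward(self, grad_map):
            h, w = grad_map.shape[-2:]
            feat = self.first_set(grad_map)
            # upsample the feature map back to the h * w size of the input gradient map
            feat = F.interpolate(feat, size=(h, w), mode="bilinear", align_corners=False)
            return self.conv9(F.relu(self.conv8(feat)))   # conv8 -> ReLu8 -> conv9

    net = StyleGradientNet()
    out = net(torch.rand(1, 3, 128, 128))        # output has the same spatial size as the input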
  • an embodiment of the present application provides an image style conversion device.
  • the device includes a number of units, and each module included in each unit can be implemented by a processor in an electronic device, or by specific logic circuits; in the implementation process, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA).
  • FIG. 6 is a schematic structural diagram of an image style conversion device according to an embodiment of the present application. As shown in FIG. 6, the device 600 includes an acquiring unit 601, an obtaining unit 602, and a reconstruction unit 603, where:
  • the acquiring unit 601 is configured to acquire an initial image for style conversion; the obtaining unit 602 is configured to input a gradient of the initial image into an image style conversion model, and obtain a feature map of the initial image on the gradient domain from the image style conversion model; the image style conversion model is trained in the gradient domain based on pixel-level loss and perceptual loss; the reconstruction unit 603 is configured to perform image reconstruction based on the feature map of the initial image on the gradient domain to obtain a style image.
  • the apparatus further includes a training unit for training the image style conversion model, and the training target of the image style conversion model is to minimize the total loss L total , where L total is expressed by the following formula :
  • L total = αL feat + βL pixel; wherein L feat represents the perceptual loss, L pixel represents the pixel-level loss, and the values of α and β are real numbers.
  • a ratio of α to β is greater than 10 and less than 10 to the fifth power.
  • the image style conversion model includes a pixel-level loss model and a perceptual loss model, wherein the pixel-level loss model is obtained by taking the minimum pixel-level loss as the training target in the gradient domain, and the perceptual loss model is obtained by training in the gradient domain with the minimum perceptual loss as the training target.
  • the training unit includes: a first input module, configured to input a gradient of a training sample into the pixel-level loss model and obtain a sample output result of the training sample from the pixel-level loss model; a first determining module, configured to determine a gradient of a stylized reference image corresponding to the training sample; and a first training module, configured to train the perceptual loss model according to a first output feature map of the j-th convolutional layer of the perceptual loss model for the gradient of the reference image, and a second output feature map of the j-th convolutional layer of the perceptual loss model for the sample output result.
  • the first training module is configured to train the perceptual loss model using a formula of the form L feat = (1/D) Σ_{i=1..D} (1/(C_j H_j W_j)) ‖ψ_j(F_W(∇I_i)) − ψ_j(∇R_i)‖²; where ∇I_i represents the gradient of the i-th training sample, F_W represents the pixel-level loss model, F_W(∇I_i) represents the output of the gradient of the i-th training sample through the pixel-level loss model, ∇R_i represents the gradient of the stylized reference image of the i-th training sample, ψ_j() represents the output feature map of the j-th convolutional layer of the network used by the perceptual loss model, C_j, H_j and W_j respectively represent the number of channels, height and width of the feature map corresponding to the j-th convolutional layer, and D represents the number of samples in the training sample set.
  • the j-th convolution layer is conv3-3.
  • the training unit further includes: a second determining module for determining a gradient of the training sample; and a second input module for using the gradient of the training sample as an input of the pixel-level loss model, A sample output result is obtained from the pixel-level loss model; a third determination module is configured to determine a gradient of a stylized reference image corresponding to the training sample; a second training module is configured to use the gradient and sample of the reference image The output trains the pixel-level loss model.
  • the pixel-level loss model includes a first set of convolutional layers, an upsampling layer, and a second set of convolutional layers, and training the pixel-level loss model according to the gradient of the reference image and the sample output result includes: inputting the gradient of the training sample into the first set of convolutional layers to obtain a sample feature map; inputting the sample feature map into the upsampling layer and upsampling it to the pixel size of the initial image; and inputting the upsampled sample feature map into the second set of convolutional layers to obtain the sample output result.
  • the second training module is configured to train the pixel-level loss model according to the absolute value of the difference between F_W(∇I_i) and the corresponding ∇R_i for each training sample; wherein ∇I_i represents the gradient of the i-th training sample, F_W represents the pixel-level loss model, F_W(∇I_i) represents the output of the gradient of the i-th training sample through the pixel-level loss model F_W, and ∇R_i represents the gradient of the stylized reference image of the i-th training sample.
  • the second training module is configured to train the pixel-level loss model using a formula of the form L pixel = (1/D) Σ_{i=1..D} ‖F_W(∇I_i) − ∇R_i‖²; where ∇I_i represents the gradient of the i-th training sample, F_W represents the pixel-level loss model, F_W(∇I_i) represents the output of the gradient of the i-th training sample through the pixel-level loss model F_W, ∇R_i represents the gradient of the stylized reference image of the i-th training sample, and D represents the number of samples in the training sample set.
  • the reconstruction unit is configured to use, as the style image, an image that meets a structural similarity condition with a feature map on the gradient domain of the initial image.
  • the feature map of the initial image on the gradient domain satisfying a structural similarity condition includes: the degree of structural difference between the style image and the initial image being less than a similarity threshold; or the degree of structural difference between the style image and the initial image being the smallest, wherein the degree of structural difference is measured between the style image on the gradient domain and the feature map of the initial image on the gradient domain, in terms of their variation in at least one reference direction.
  • the reconstruction unit is configured to perform image reconstruction according to an objective of the form S = argmin_S ‖∇xS − F_W(∇xI)‖² + ‖∇yS − F_W(∇yI)‖² to obtain a style image; where ∇xI represents the gradient of the initial image in the x direction, F_W(∇xI) represents the feature map on the gradient domain of that gradient after passing through the image style conversion model, ∇yI represents the gradient of the initial image in the y direction, F_W(∇yI) represents the corresponding feature map on the gradient domain, ∇xS represents the gradient of the style image in the x direction, and ∇yS represents the gradient of the style image in the y direction.
  • the reconstruction unit is configured to perform image reconstruction according to color information of the initial image and a feature map of the initial image on a gradient domain to obtain a style image.
  • a reconstruction unit is configured to use, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and that satisfies the color similarity condition with the initial image.
  • the apparatus further includes: an extraction unit configured to perform feature extraction on the initial image to obtain a face region in the initial image; correspondingly, the reconstruction unit is configured to use, as the style image, an image that satisfies the structural similarity condition with the feature map of the initial image on the gradient domain and that satisfies the color similarity condition with the face region in the initial image.
  • the reconstruction unit is configured to perform image reconstruction according to an objective of the form S = argmin_S ‖S − I‖² + λ(‖∇xS − F_W(∇xI)‖² + ‖∇yS − F_W(∇yI)‖²) to obtain a style image; where I represents the initial image, S represents the style image, ∇xI represents the gradient of the initial image in the x direction, F_W(∇xI) represents the feature map on the gradient domain of that gradient after passing through the image style conversion model, ∇yI represents the gradient of the initial image in the y direction, F_W(∇yI) represents the corresponding feature map on the gradient domain, ∇xS represents the gradient of the style image in the x direction, and ∇yS represents the gradient of the style image in the y direction.
  • the obtaining unit includes: a fourth determining module, configured to determine a gradient of the initial image in at least one reference direction; and an obtaining module, configured to input the gradient in at least one reference direction into the image style conversion model and correspondingly obtain, from the image style conversion model, a feature map of the initial image on the gradient domain in at least one reference direction; correspondingly, the reconstruction unit is configured to perform image reconstruction according to the feature map on the gradient domain in at least one reference direction to obtain a style image.
  • the at least one reference direction includes x and y directions in a plane reference coordinate system, and correspondingly, a determining unit is configured to determine gradients of the initial image in x and y directions, respectively;
  • the obtaining unit is configured to input the gradients in the x and y directions to the image style conversion model, and correspondingly obtain a feature map of the initial image in the gradient domain in the x and y directions from the image style conversion model.
  • the reconstruction unit is configured to perform image reconstruction according to the feature maps in the gradient domain in the x and y directions to obtain a style image.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be an electronic device or a server, etc.) to execute all or part of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk, which can store program codes.
  • the embodiments of the present application are not limited to any specific combination of hardware and software.
  • an embodiment of the present application provides a computer device including a memory and a processor.
  • the memory stores a computer program executable on the processor, and the processor implements the steps of the foregoing image style conversion method when executing the program.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image style conversion method described above.
  • An embodiment of the present application further provides a computer program product.
  • the computer program product includes computer-executable instructions. After the computer-executable instructions are executed, the steps in the image style conversion method can be implemented.
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
  • the hardware entity of the computer device 700 includes a processor 701, a communication interface 702, and a memory 703, of which:
  • the processor 701 generally controls the overall operation of the computer device 700.
  • the communication interface 702 may enable a computer device to communicate with other terminals or servers through a network.
  • the memory 703 is configured to store instructions and applications executable by the processor 701, and may also cache data to be processed or processed by each module in the processor 701 and the computer device 700 (for example, image data, audio data, voice communication data, and Video communication data), can be realized by flash memory (FLASH) or random access memory (Random Access Memory, RAM).
  • an embodiment or “an embodiment” mentioned throughout the specification means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application.
  • the appearances of "in one embodiment” or “in an embodiment” appearing throughout the specification are not necessarily referring to the same embodiment.
  • the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the superiority or inferiority of the embodiments.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the displayed or discussed components may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • If the above-mentioned integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be an electronic device or a server) to execute all or part of the methods described in the embodiments of the present application.
  • the foregoing storage media include: various types of media that can store program codes, such as a mobile storage device, a ROM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

一种图像风格转换方及装置、设备、存储介质,其中,所述方法包括:确定待进行风格转换的初始图像(S201);将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图(S202);所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像(S203)。

Description

一种图像风格转换方及装置、设备、存储介质
相关申请的交叉引用
本申请基于申请号为201810917979.7、申请日为2018年08月13日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以全文引入的方式引入本申请。
技术领域
本申请涉及图像技术,尤其涉及一种图像风格转换方及装置、设备、存储介质。
背景技术
基于深度学习的图像风格转换是近年来新起的一个研究问题。图像风格转换问题虽然一直都存在,但是2015年德国的研究员Gatys才第一次使用神经网络的方法打开了用深度学习创造图像艺术风格的大门。目前的技术并没有对人脸照片的风格转换进行优化,例如,现有的方法应用到自拍图像上时,普遍存在的缺点是:图像风格转换后导致的人脸边缘的变形及人脸肤色不一致。
发明内容
有鉴于此,本申请实施例为解决现有技术中存在的至少一个问题而提供一种图像风格转换方及装置、设备、存储介质。
本申请实施例的技术方案是这样实现的:
本申请实施例提供一种图像风格转换方法,所述方法包括:获取待进行风格转换的初始图像;将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述方法还包括:训练所述图像风格转换模型;其中,所述图像风格转换模型的训练目标为总的损失L total最小,其中,L total采用下式来表示:L total=αL feat+βL pixel;其中,所述L feat表示感知损失,所述L pixel表示像素级损失,所述α和所述β的取值均为实数。
在一些实施例中,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中所述像素级损失模型是通过在梯度域将像素级损失最小作为训练目标而得到的,所述感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到的。
在一些实施例中,所述像素级损失模型和所述感知损失模型的训练过程包括:将训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;确定所述训练样本对应的风格化的参考图像的梯度;根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所 述感知损失模型的第j层卷积层的第二输出特征图,训练所述感知损失模型。
在一些实施例中,所述根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图,训练所述感知损失模型,包括:采用下式训练所述感知损失模型:
Figure PCTCN2018117293-appb-000001
其中,
Figure PCTCN2018117293-appb-000002
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000003
表示第i个训练样本的梯度经过像素级损失模型的输出结果;
Figure PCTCN2018117293-appb-000004
表示第i个训练样本的风格化的参考图像的梯度;ψ j()表示感知损失模型采用感知损失模型时的第j层卷积层的输出特征图,C jH jW j别表示第j层卷积层对应的特征图的通道数、高和宽。
在一些实施例中,所述像素级损失模型的训练过程包括:将训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;确定所述训练样本对应的风格化的参考图像的梯度;根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。
在一些实施例中,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型包括:将所述训练样本的梯度输入到所述第一卷积层集合,得到样本特征图;将所述样本特征图输入到所述上采样层,上采样至所述初始图像的像素尺寸;将上采样后的样本特征图输入到所述第二卷积层集合,得到样本输出结果。
在一些实施例中,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型,包括:根据每一训练样本的
Figure PCTCN2018117293-appb-000005
与对应的
Figure PCTCN2018117293-appb-000006
之差的绝对值训练所述像素级损失模型;其中,
Figure PCTCN2018117293-appb-000007
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000008
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000009
表示第i个训练样本的风格化的参考图像的梯度。
在一些实施例中,所述根据每一训练样本的
Figure PCTCN2018117293-appb-000010
与对应的
Figure PCTCN2018117293-appb-000011
之差的绝对值训练所述像素级损失模型,包括:采用下式训练所述像素级损失模型:
Figure PCTCN2018117293-appb-000012
Figure PCTCN2018117293-appb-000013
其中,
Figure PCTCN2018117293-appb-000014
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000015
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000016
表示第i个训练样本的风格化的参考图像的梯度,D表示训练样本集合中的样本数。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像;其中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:所述风格图像与所述初始图像的结构差异程度小于相似度阈值,或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据
Figure PCTCN2018117293-appb-000017
进行图像重构,得到风格图像;其中:
Figure PCTCN2018117293-appb-000018
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000019
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000020
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000021
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000022
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000023
表示风格图像在y方向的梯度。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述方法还包括:对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像,包括:根据
Figure PCTCN2018117293-appb-000024
进行图像重构,得到风格图像;其中:I表示初始图像,S表示风格图像,
Figure PCTCN2018117293-appb-000025
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000026
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000027
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000028
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000029
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000030
表示风格图像在y方向的梯度。
在一些实施例中,所述将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得初始图像的在梯度域上的特征图,包括:确定所述初始图像在至少一个参考方向的梯度;将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;对应地,根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,确定所述初始图像分别在x、y方向上的梯度;分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的 在梯度域上的特征图;对应地,根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
本申请实施例提供一种图像风格转换装置,所述装置包括:获取单元,用于获取待进行风格转换的初始图像;获得单元,用于将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;重构单元,用于根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述装置还包括:训练单元,用于训练所述图像风格转换模型,其中所述图像风格转换模型的训练目标为总的损失L total最小,其中,L total采用下式来表示:L total=αL feat+βL pixel;其中,所述L feat表示感知损失,所述L pixel表示像素级损失,所述α和所述β的取值均为实数。
在一些实施例中,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中所述像素级损失模型是通过在梯度域将像素级损失最小作为训练目标而得到,所述感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到。
在一些实施例中,所述训练单元包括:第一输入模块,用于将训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;第一确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;第一训练模块,用于根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图训练所述感知损失模型。
在一些实施例中,所述第一训练模块,用于采用下式训练所述感知损失模型:
Figure PCTCN2018117293-appb-000031
其中,
Figure PCTCN2018117293-appb-000032
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000033
表示第i个训练样本的梯度经过像素级损失模型的输出结果;
Figure PCTCN2018117293-appb-000034
表示第i个训练样本的风格化的参考图像的梯度;ψ j()表示感知损失模型采用感知损失模型时的第j层卷积层的输出特征图,C jH jW j别表示第j层卷积层对应的特征图的通道数、高和宽。
在一些实施例中,所述训练单元包括:第二确定模块,用于确定训练样本的梯度;
第二输入模块,用于将所述训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;第三确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;第二训练模块,用于根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。
在一些实施例中,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述第二训练模块包括:第一输入子模块,用于将所述训练样本的梯度输入到所述第一卷积层集合,得到样本特征图;上采样子模块,用于将所述样本特征图输入到所述上采样层,上采样至所述初始图像的像素尺寸;第二输入子模块,用于将上采样后 的样本特征图输入到所述第二卷积层集合,得到样本输出结果。
在一些实施例中,所述第二训练模块,用于根据每一训练样本的
Figure PCTCN2018117293-appb-000035
与对应的
Figure PCTCN2018117293-appb-000036
之差的绝对值训练所述像素级损失模型;其中,
Figure PCTCN2018117293-appb-000037
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000038
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000039
表示第i个训练样本的风格化的参考图像的梯度。
在一些实施例中,所述第二训练模块,用于采用下式训练所述像素级损失模型:
Figure PCTCN2018117293-appb-000040
其中,
Figure PCTCN2018117293-appb-000041
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000042
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000043
表示第i个训练样本的风格化的参考图像的梯度,D表示训练样本集合中的样本数。
在一些实施例中,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像;其中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:所述风格图像与所述初始图像的结构差异程度小于相似度阈值,或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
在一些实施例中,所述重构单元,用于:根据
Figure PCTCN2018117293-appb-000044
Figure PCTCN2018117293-appb-000045
进行图像重构,得到风格图像;其中:
Figure PCTCN2018117293-appb-000046
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000047
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000048
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000049
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000050
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000051
表示风格图像在y方向的梯度。
在一些实施例中,所述重构单元,用于根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述装置还包括:提取单元,用于对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述重构单元,用于根据
Figure PCTCN2018117293-appb-000052
Figure PCTCN2018117293-appb-000053
进行图像重构,得到风格图像;其中:I表示初始图像,S表示风格图像,
Figure PCTCN2018117293-appb-000054
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000055
表示所述初始图像在x方向的梯 度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000056
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000057
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000058
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000059
表示风格图像在y方向的梯度。
在一些实施例中,所述获得单元,包括:第四确定模块,用于确定所述初始图像在至少一个参考方向的梯度;获得模块,用于将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;对应地,所述重构单元,用于根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,确定单元,用于确定所述初始图像分别在x、y方向上的梯度;所述获得单元,用于分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的在梯度域上的特征图;对应地,所述重构单元,用于根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
本申请实施例提供一种计算机设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,所述处理器执行所述程序时上述图像风格转换方法中的步骤。
本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述图像风格转换方法中的步骤。
本申请实施例提供一种计算机程序产品,所述计算机程序产品包括计算机可执行指令,该计算机可执行指令被执行后,能够实现上述图像风格转换方法中的步骤。
本申请实施例提供的图像风格转换方及装置、设备、存储介质,其中,获取待进行风格转换的初始图像;将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像;如此,通过在梯度域基于像素级损失和感知损失训练得到的图像风格转换模型,能够克服相关技术中人脸的边缘变形和颜色不一致的缺点,能够在实现图像风格转换的同时,对输入的初始图像进行美化和增强。
附图说明
图1为本申请实施例网络架构的组成结构示意图;
图2A为本申请实施例图像风格转换方法的实现流程示意图;
图2B为本申请实施例的下载场景示意图;
图3A为本申请实施例的实施场景示意图一;
图3B为本申请实施例的实施场景示意图二;
图4A为本申请实施例的实施场景示意图三;
图4B为本申请实施例的实施场景示意图四;
图5A为本申请实施例提供的卷积神经网络模型的组成结构示意图;
图5B为本申请实施例像素级损失模型的组成结构示意图;
图6为本申请实施例图像风格转换装置的组成结构示意图;
图7为本申请实施例中计算机设备的一种硬件实体示意图。
具体实施方式
使用神经网络的方法生成风格图的过程一般是这样的:利用神经网络模型例如VGG16模型或VGG19,对一张原图(Content Image)和一张风格图像(Style Image)分别进行图像特征提取,即对原图提取内容特征,对风格图提取风格特征。通过利用对内容特征和风格特征构造损失函数,对一张随机初始化图像进行损失值计算并反馈重绘图像得到生成图(Generated Image),这张生成图在内容上会与原图相似,在风格上会与风格图像相似。但是这个算法每一次生成一张图像都需要进行一次训练,需要耗费的时间比较长。
基于快速风格转移算法,训练一个网络,对于任意一张图像都可以转换成为网络对应的风格,所以每次生成一张图像仅仅是前向传播一次网络,速度会很快。
快速转移算法一般包含两个网络:一个为图像转换网络(Image Transform Network),一个为损失网络(Loss Network)。图像转换网络用来对图像进行转换,它的参数是变化的,而损失网络的参数是保持不变的,可以用在ImageNet图像库训练好的VGG-16网络作为损失网络,原图经过图像转换网络的结果图、风格图和原图这3张图都通过损失网络,对其提取感知损失(Perceptual Loss),利用感知损失来对图像转换网络进行训练。在训练阶段利用大量图像对图像转换网络进行训练得到模型,在输出阶段套用模型进行输出得到生成图,这样得出的网络相对Gatys的模型得到生成图的速度快上三个数量级。
但是,目前的技术并没有对人脸照片的风格转换进行优化,例如:现有的方法应用到自拍图像上时,普遍存在两个明显的缺点:1)人脸的边缘可能会偏离与原始的图像,即输出图像的结构信息发生变化;2)人脸的肤色可能与原始的肤色不一致,即输出图像的颜色信息发生变化。这样导致一个后果就是,在风格化之后,会让用户觉得不像是本人,例如初始图像中用户A的人像是圆脸,经过风格化之后,输出的风格图像中用户A的人像是锥子脸;再如,用户B的皮肤白皙,风格化之后,输出的风格图像的用户B的皮肤黝黑。即如何更好地保持原初始图像的结构信息和颜色信息成为需要解决的问题。
为了解决目前技术中的问题,本申请实施例提出了一种完全基于图像梯度域的图像风格转换的卷积神经网络(Convolutional Neural Networks,CNN)结构;由于梯度域学习的保边性,使得本申请实施例提供的图像风格转换网络可以克服以前方法的边缘形变的缺点。本申请实施例中,在图像风格转换的图像重构阶段,引入了称为颜色置信度(color confidence)的术语来保持结果图像皮肤颜色上的逼真性。图像重构阶段既利用了原图的结构信息,也利用了原图的颜色信息,这样可以使得结果更为自然。
本申请实施例中,首次直接在梯度域使用感知损失(perceptual loss),使得学习到 的风格信息更聚焦在笔画上而不是颜色上,使得其更适合与人脸的风格转换任务。
为了更好地理解本申请的各实施例,现对有关名词进行解释:
采样操作,通常采样操作指的是下采样(subsampled)操作或降采样(down-sampled),如果采样对象是连续信号,那么连续信号经过下采样操作之后,得到的是离散信号。对于图像来说,下采样操作的目的可能是为了在计算上比较方便而缩小图像。下采样操作的原理:对于一幅图像I尺寸为M*N,对其进行s倍下采样,即得到(M/s)*(N/s)尺寸的得分辨率图像,当然s应该是M和N的公约数才行,如果考虑的是矩阵形式的图像,就是把原始图像s*s窗口内的图像变成一个像素,这个像素点的值就是窗口内所有像素的均值。
上采样操作,是下采样操作的逆过程,也称增取样(Up-sampling)或内插(Interpolating)。对于图像而言,经过上采样操作可以得到高分辨率的图像。上采样操作的原理:图像放大几乎都是采用内插值方法,即在原有图像像素的基础上在像素点之间采用合适的插值算法插入新的像素。
通道(channel),该词语有两种不同的含义,第一种是对于样本图像(图像作为训练样本),通道是指颜色通道(Number of color channels in the example images),下面将用颜色通道来表示样本图像的通道;第二种是输出空间的维数,例如卷积操作中输出通道的个数,或者说每个卷积层中卷积核的数量。
颜色通道,把图像分解成一个或多个颜色成分或颜色分量。单颜色通道,一个像素点只需一个数值表示,只能表示灰度,0为黑色。三颜色通道,如果采用红绿蓝(Red Green Blue,RGB)色彩模式,把图像分为红绿蓝三个颜色通道,可以表示彩色,全0表示黑色。四颜色通道,在RGB色彩模式的基础上加上alpha通道,表示透明度,alpha=0表示全透明。
卷积神经网络,是一种多层的监督学习神经网络,隐含层的卷积层和池采样层是实现卷积神经网络特征提取功能的核心模块。卷积神经网络的低隐层是由卷积层和最大池采样层交替组成,高层是全连接层对应传统多层感知器的隐含层和逻辑回归分类器。第一个全连接层的输入是由卷积层和子采样层进行特征提取得到的特征图像。最后一层输出层是一个分类器,可以采用逻辑回归,Softmax回归甚至是支持向量机对初始图像进行分类。CNN中每一层的由多个map组成,每个map由多个神经单元组成,同一个map的所有神经单元共用一个卷积核(即权重),卷积核往往代表一个特征,比如某个卷积核代表一段弧,那么把这个卷积核在整个图像上卷积一遍,卷积值较大的区域就很有可能是一段弧。CNN一般采用卷积层与采样层交替设置,即一层卷积层接一层采样层,采样层后接一层卷积;当然也可以多个卷积层接一个采样层,这样卷积层提取出特征,再进行组合形成更抽象的特征,最后形成对图像对象的描述特征,CNN后面还可以跟全连接层。
卷积神经网络结构包括卷积层、降采样层和全连接层。每一层有多个特征图,每个特征图通过一种卷积滤波器提取输入的一种特征,每个特征图有多个神经元。卷积层, 使用卷积层的原因是卷积运算的一个重要特点是,通过卷积运算,可以使原信号特征增强,并且降低噪音。降采样层,使用降采样的原因是,根据图像局部相关性的原理,对图像进行子采样可以减少计算量,同时保持图像旋转不变性。全连接层,采用softmax全连接,得到的激活值即卷积神经网络提取到的图像特征。
激活函数,神经元是一个多层感知机的基本单元,它的函数就成为激活传输。即对于一个神经元来说,输入是部分或全部的卷积神经网络的输入或部分或全部的前一层的输出,经过激活函数的计算,得出的结果作为神经元的输出结果。常用的激活函数有sigmoid函数、tanh函数、线性整流函数(Rectified Linear Unit,ReLu)。
ReLu函数,其公式即为个ReLu(x)=max(0,x),从ReLu函数的图形可以看出ReLu与其他激活函数例如sigmoid函数相比,主要变化有三点:①单侧抑制;②相对宽阔的兴奋边界;③稀疏激活性。
像素级损失(Pixel-wise Loss),假设I est是卷积神经网络的输出结果,I HR是原始高分辨率图像,那么pixel-wise loss强调的是两幅图像I est和I HR之间每个对应像素的匹配,这与人眼的感知结果有所区别。一般来说,通过pixel-wise loss训练的图像通常会较为平滑,缺少高频信息。
感知损失(Perceptual Loss),假设I est表示卷积神经网络的输出结果,I HR表示原始高分辨率图像,将I est和I HR分别输入到一个可微分的函数Φ中,这样避免了要求网络输出图像与原始高分辨率图像在pixel-wise上的一致。
VGG模型,VGG模型结构简单有效,前几层仅使用3×3卷积核来增加网络深度,通过最大池化(max pooling)依次减少每层的神经元数量,最后三层分别是2个有4096个神经元的全连接层和一个softmax层。“16”和“19”表示网络中的需要更新需要权重(即weight,要学习的参数)的网络层数,VGG16模型和VGG19模型的权重都由ImageNet训练而来。
模型参数,一般可以理解为模型内部的配置变量,可以用历史数据或训练样本估计模型参数的值,或者说,模型参数是可以通过历史数据或训练样本自动学习出的变量。在某种程度上,模型参数有以下特征:进行模型预测时需要模型参数;模型参数值可以定义模型功能;模型参数用数据估计或数据学习得到;模型参数一般不由实践者手动设置;模型参数通常作为学习模型的一部分保存;通常使用优化算法估计模型参数,优化算法是对参数的可能值进行的一种有效搜索。在人工神经网络中,网络模型的权重、偏差一般称为模型参数。
模型超参数,一般可以理解为模型外部的配置,其值不能从数据估计得到。在某种程度上,模型超参数特征有:模型超参数常应用于估计模型参数的过程中;模型超参数通常由实践者直接指定;模型超参数通常可以使用启发式方法来设置;模型超参数通常根据给定的预测建模问题而调整。换句话说,模型超参数就是用来确定模型的一些参数,超参数不同,模型是不同的。这个模型不同的意思就是有微小的区别,比如假设都是CNN模型,如果层数不同,模型不一样,虽然都是CNN模型哈。在深度学习中,超参 数有:学习速率、迭代次数、层数、每层神经元的个数等等。
下面结合附图和实施例对本申请的技术方案进一步详细阐述。
本申请实施例先提供一种网络架构,图1为本申请实施例网络架构的组成结构示意图,如图1所示,该网络架构包括两个或多个电子设备11至1N和服务器31,其中电子设备11至1N与服务器31之间通过网络21进行交互。电子设备在实现的过程中可以为各种类型的具有信息处理能力的计算机设备,例如所述电子设备可以包括手机、平板电脑、台式机、个人数字助理、导航仪、数字电话、电视机等。
本申请实施例提出一种图像风格转换方法,能够有效解决输出图像的结构信息与初始图像相比发生变化的问题,该方法应用于电子设备,该方法所实现的功能可以通过电子设备中的处理器调用程序代码来实现,当然程序代码可以保存在计算机存储介质中,可见,该电子设备至少包括处理器和存储介质。
图2A为本申请实施例图像风格转换方法的实现流程示意图,如图2A所示,该方法包括:
步骤S201,获取待进行风格转换的初始图像;
本申请实施例提供的图像风格转换方法在实现的过程中可以通过客户端(应用程序)来体现。参见图2B所示,用户在自己上的电子设备12上从服务器31下载客户端,例如,电子设备12向服务器31发送下载请求,该下载请求用于下载客户端,服务器31响应该下载请求,服务器31向电子设备12发送下载响应,该下载响应中携带有客户端,例如安卓系统时的安卓应用包(Android Package,APK)然后用户在自己的电子设备上安装下载的客户端,然后电子设备运行客户端,即电子设备可以实现本申请实施例提供的图像风格转换方法。
如果步骤S201是在电子设备侧实现,那么实现过程可以是这样的:当用户从相册中选择一张图片,客户端接收用户的选择图片的操作,即客户端将选择的图片确定为待进行风格转换的初始图像;或者,用户用电子设备的相机或外置相机拍摄一张照片,客户端接收用户拍摄照片的操作,即客户端将拍摄的照片确定为待进行风格转换的初始图像。本领域的技术人员应当理解,该步骤还可以有其他的实施方式。
步骤S202,将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;
这里,所述图像风格转换模型是经过训练的,并且在梯度域基于像素级损失和感知损失训练得到。在一些实施例中,所述图像风格转换模型是通过在梯度域将像素级损失和感知损失作为训练目标而得到的;
步骤S203,根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
其中风格图像,是重构的进行风格化的图像。在实现的过程中,经过训练的图像风格转换模型可以在电子设备的本地,也可以是在服务器端。当经过训练的图像风格转换模型在电子设备本地时,可以是电子设备安装客户端的时候,即安装了经过训练的图像 风格转换模型,这样,参见图3A所示,电子设备通过步骤S201获得初始图像,然后通过步骤S202获得所述初始图像的在梯度域上的特征图(即输出结果),最后通过步骤S203获得输出的风格图像。从以上过程可以看出,电子设备在安装完客户端之后,上述的步骤S201至步骤S203都在电子设备本地执行,最后,电子设备将得到的风格图像输出给用户。
在一些实施例中,经过训练的图像风格转换模型也可以位于服务器端,参见图3B所示,这样电子设备将初始图像发送给服务器,这样服务器接收电子设备发送的初始图像,这样服务器实现了步骤S201,换句话说,如果上述的方法是在服务器端实现,那么步骤S201,包括:服务器接收电子设备发送的初始图像,即服务器获取待进行风格转换的初始图像,然后服务器通过步骤S202获得所述初始图像的在梯度域上的特征图,最后通过步骤S203获得输出的风格图像;从以上过程可以看出,上述的步骤S201至步骤S203都在服务器端执行,最后服务器还可以将风格图像发送给电子设备,这样电子设备接收到风格图像后,输出风格图像给用户。本申请实施例中,电子设备在安装完客户端之后,用户上传用户的初始图像,以及接收服务器发送的风格图像,并将风格图像输出给用户。
在一些实施例中,上述的步骤S201至步骤S203还可以有部分是由电子设备来完成的,也可以有部分是由服务器来完成,例如,参见图4A,步骤S201和步骤S202可以由电子设备在本地来执行,然后电子设备将初始图像的在梯度域上的特征图发送给服务器,服务器执行步骤S203之后,得到风格图像,然后再将风格图像发送给电子设备,由电子设备输出风格图像。又如,参见图4B,步骤S201和步骤S202可以由服务器来执行,服务器将初始图像的在梯度域上的特征图发送给电子设备,电子设备执行步骤S203之后,得到风格图像,然后再将风格图像输出给用户。
在一些实施例中,所述方法还包括:训练所述图像风格转换模型,其中,所述图像风格转换模型的训练目标为总的损失L total最小,其中,L total采用下式来表示:
L total=αL feat+βL pixel;其中,所述L feat表示感知损失,所述L pixel表示像素级损失,所述α和所述β的取值均为实数。所述α与所述β的比值大于10且小于10的五次方。本领域的例如,所述α的取值为10000,所述β的取值为1。本领域的技术人员应当理解,所述α与所述β的取值可以根据具体的应用场景而进行相应设置,本申请实施例对其取值不作限定。
在一些实施例中,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中,所述像素级损失模型是通过在梯度域将像素级损失最小作为训练目标而得到的像素级损失模型,所述的感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到的。
其中,所述像素级损失模型为像素级损失模型,且所述感知损失模型为感知损失模型时的训练过程,包括:
步骤S11,确定训练样本的梯度;假设用I i表示第i个训练样本时,确定第i个训练 样本I i的梯度为
Figure PCTCN2018117293-appb-000060
步骤S12,将所述训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;其中,将第i个训练样本I i的梯度
Figure PCTCN2018117293-appb-000061
输入所述像素级损失模型F W,从像素级损失模型获得训练样本的样本输出结果
Figure PCTCN2018117293-appb-000062
步骤S13,确定所述训练样本对应的风格化的参考图像的梯度;其中,风格化的参考图像可以为用现有的风格化算法得到的令人不满意的风格化参考图片,那么假设所述训练样本I i对应的风格化的参考图像为
Figure PCTCN2018117293-appb-000063
那么参考图像的梯度为
Figure PCTCN2018117293-appb-000064
步骤S14,根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图训练所述感知损失模型。其中,第j卷积层可以是卷积神经网络模型中的任意一层,当该卷积神经网络为VGG16时,第j卷积层可以为VGG16中的conv3-3层。
在一些实施例中,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型包括:将所述训练样本的梯度输入到所述第一卷积层集合,得到样本特征图;将所述样本特征图输入到所述上采样层,上采样至所述初始图像的像素尺寸;将上采样后的样本特征图输入到所述第二卷积层集合,得到样本输出结果。
在一些实施例中,所述根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图,训练所述感知损失模型,包括:
采用下式训练所述感知损失模型:
Figure PCTCN2018117293-appb-000065
其中,
Figure PCTCN2018117293-appb-000066
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000067
表示第i个训练样本的梯度经过像素级损失模型的输出结果;
Figure PCTCN2018117293-appb-000068
表示第i个训练样本的风格化的参考图像的梯度;ψ j()表示感知损失模型采用卷积神经网络模型时的第j层卷积层的输出特征图,C jH jW j别表示第j层卷积层对应的特征图的通道数、高和宽。
在一些实施例中,当所述卷积神经网络模型采用VGG16时,第j层卷积层为conv3-3。
在一些实施例中,所述像素级损失模型为像素级损失模型时的训练过程包括:
步骤S21,确定训练样本的梯度;步骤S22,将所述训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;步骤S23,确定所述训练样本对应的风格化的参考图像的梯度;步骤S24,根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。其中,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型,包括:根据每一训练样本的
Figure PCTCN2018117293-appb-000069
与对应的
Figure PCTCN2018117293-appb-000070
之差的绝对值训练所述像素级损失模型;其中,
Figure PCTCN2018117293-appb-000071
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000072
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000073
表示第i个训练样本的风格化的参考图像的梯度。
在一些实施例中,所述根据每一训练样本的
Figure PCTCN2018117293-appb-000074
与对应的
Figure PCTCN2018117293-appb-000075
之差的绝对值训 练所述像素级损失模型,包括:采用下式训练所述像素级损失模型:
Figure PCTCN2018117293-appb-000076
其中,
Figure PCTCN2018117293-appb-000077
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000078
表示第i个训练样本的梯度经过像素级损失模型的输出结果;
Figure PCTCN2018117293-appb-000079
表示第i个训练样本的风格化的参考图像的梯度,D表示训练样本集合中的样本数。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像。其中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:所述风格图像与所述初始图像的结构差异程度小于相似度阈值或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
其中,参考方向可以取图像的在平面参考坐标系中的x、y方向,当然可以有其他更多的方向,或者只使用一个方向。差异程度可以采用差值或差值的绝对值或基于差值的各种数学变形运算(例如在x、y方向差值的绝对值的平方和,即
Figure PCTCN2018117293-appb-000080
Figure PCTCN2018117293-appb-000081
其中I表示初始图像,S表示风格图像,‖ ‖表示绝对值符号)。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据
Figure PCTCN2018117293-appb-000082
进行图像重构,得到风格图像;其中:
Figure PCTCN2018117293-appb-000083
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000084
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000085
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000086
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000087
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000088
表示风格图像在y方向的梯度。
在一些实施例中,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。其中,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述方法还包括:对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。其中颜色相似度条件,即颜色信息满足的颜色相似条件,即风格图像与初始图像的颜色的差异程度即小于设定值或最小,其中,颜色的差 异程度采用待处理图像与目标图像的采样点的颜色值的差值表示,即采用‖S-I‖表示,其中I表示初始图像,S表示风格图像)。
本申请实施例中,为了不改变初始图像的颜色或者人脸的脸色,因此设置了颜色相似度条件,其中,颜色相似度条件中可以整个初始图像的颜色,也可以是初始图像中人脸的颜色。需要说明的是,上述两个条件结构相似度条件和颜色相似度条件,从理论上可以单独使用,即只使用一个条件来计算风格图像;也可以同时采用两个,同时分配对应的系数(权重),例如λ的取值为实数。
在一些实施例中,所述将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像,包括:根据
Figure PCTCN2018117293-appb-000089
进行图像重构,得到风格图像;其中:I表示初始图像,S表示风格图像,
Figure PCTCN2018117293-appb-000090
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000091
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000092
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000093
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000094
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000095
表示风格图像在y方向的梯度。
在一些实施例中,所述将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得初始图像的在梯度域上的特征图,包括:步骤S31,确定所述初始图像在至少一个参考方向的梯度;步骤S32,将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;对应地,根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,所述确定所述初始图像在至少一个参考方向的梯度,包括:确定所述初始图像分别在x、y方向上的梯度;所述将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图,包括:分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的在梯度域上的特征图;对应地,所述根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
下面分三阶段来介绍本申请实施例的技术方案:第一阶段先绍一下本申请实施例提供的卷积神经网络模型的结构,然后在第二阶段介绍一下提供的卷积神经网络模型的训练过程,接着在第三阶段介绍一下利用训练的卷积神经网络进行图像重建的过程,即对初始图像进行图像风格转换的方法。
第一阶段:卷积神经网络模型的结构
图5A为本申请实施例提供的卷积神经网络模型的组成结构示意图,如图5A所示, 该卷积神经网络网络由两部分构成:
第一部分是要训练的卷积神经网络51(第一卷积神经网络),其将自拍图像的梯度作为输入,后面接连续的卷积层和ReLu层,然后采用上采样操作将特征图(feature map)上采样至原图大小,最后与艺术风格的参考图像的梯度计算像素级损失(Pixel-wise Loss)L pixel;其中,将自拍图像的梯度作为输入包括:将自拍图像在x方向上的梯度
Figure PCTCN2018117293-appb-000096
和自拍图像在y方向上的梯度
Figure PCTCN2018117293-appb-000097
分别作为卷积神经网络的输入。
在卷积神经网络中,卷积层的每一个卷积滤波器重复的作用于整个感受野中,对输入的自拍图像进行卷积,卷积的结果构成了输入的自拍图像的特征图,这样就提取出了自拍图像的局部特征。卷积神经网络的一个特点就是:最大池化(max-pooling)采样,它是一种非线性降采样方法,从最大池化的数学公式可以看出,最大池化即对邻域内特征点取最大。在通过卷积获取图像特征之后是利用这些特征进行分类,在获取图像的卷积的特征图后,要通过最大池采样方法对卷积特征进行降维。将卷积特征划分为数个的不相交区域,用这些区域的最大(或平均)特征来表示降维后的卷积特征。最大池采样方法的作用体现在两个方面:(1)、它减小了来自上层隐藏层的计算复杂度;(2)、这些池化单元具有平移不变性,即使图像有小的位移,提取到的特征依然会保持不变。由于增强了对位移的鲁棒性,最大池采样方法是一个高效的降低数据维度的采样方法。
第二部分是在ImageNet中已经训练好的VGG-16网络52(第二卷积神经网络),用来计算感知损失(perceptual loss)L feat。实际使用VGG-16的conv3-3层的输出来计算感知损失。
最后,将第一部分的L pixel和第二部分的L feat加在一起就是要计算的最终的总的目标函数(即总的损失L total)。
在一种实施例中,总的目标函数L total可以采用下面的公式(3-1)来计算。L total=αL feat+βL pixel    (3-1);其中,α和β的取值均为实数。例如,在训练中可以将α和β分别设为整数。
下面简单介绍一下图像梯度,图像梯度是描述图像像素之间差异的一种方法,可以作为图像的一种特征表征图像。从数学角度而言,图像梯度是指像素的一阶导数,可以用下面的公式来表示图像x方向的梯度
Figure PCTCN2018117293-appb-000098
和y方向的梯度
Figure PCTCN2018117293-appb-000099
分别用下面的公式(3-2)和公式(3-3)来表示:
Figure PCTCN2018117293-appb-000100
Figure PCTCN2018117293-appb-000101
需要说明的是,计算图像的梯度本身就有很多计算方法,只要能够描述像素之间的差异即可,本领域的技术人员应当理解,计算图像的梯度并不是一定用上述公式(3-2)和公式(3-3),事实上通常用的也是其它公式。例如,如果是叠加卷积操作来计算图像梯度,那么所使用的模板,通常为称其为梯度算子,常见的梯度算子有Sobel算子、Robinson算子、Laplace算子等。
第二阶段:对第一部分的卷积神经网络的训练过程
首先,确定训练样本,假设采集了D组训练图像
Figure PCTCN2018117293-appb-000102
Figure PCTCN2018117293-appb-000103
其中I i表示第i张原始图像,
Figure PCTCN2018117293-appb-000104
表示对第i张原始图像I i采用现有的风格化算法得到的令人不满意的风格化参考图像。
图3中第一部分所计算的像素级损失L pixel的定义如公式(4-1)所示:
Figure PCTCN2018117293-appb-000105
Figure PCTCN2018117293-appb-000106
公式(4-1)中,
Figure PCTCN2018117293-appb-000107
表示第i张原始图像I i在x方向的梯度或梯度表示,
Figure PCTCN2018117293-appb-000108
表示y方向的梯度或梯度表示。
Figure PCTCN2018117293-appb-000109
表示原始图像的梯度,
Figure PCTCN2018117293-appb-000110
表示原始图像I i的在x方向上的梯度,
Figure PCTCN2018117293-appb-000111
表示原始图像I i的在y方向上的梯度。F W表示第一部分的卷积神经网络模型,所以
Figure PCTCN2018117293-appb-000112
表示第i张原始图像I i的梯度经过卷积神经网络网络的结果,
Figure PCTCN2018117293-appb-000113
表示第i张原始图像I i在x方向上的梯度经过卷积神经网络网络的结果,
Figure PCTCN2018117293-appb-000114
表示第i张原始图像I i在y方向上的梯度经过卷积神经网络网络的结果。
Figure PCTCN2018117293-appb-000115
表示第i张原始图像I i的风格化参考图像的梯度,
Figure PCTCN2018117293-appb-000116
表示第i张原始图像I i的风格化参考图像在x方向上的梯度,
Figure PCTCN2018117293-appb-000117
表示第i张原始图像I i的风格化参考图像在y方向上的梯度。
图3中第二部分所计算的感知损失L feat的定义如公式(4-2)所示:
Figure PCTCN2018117293-appb-000118
Figure PCTCN2018117293-appb-000119
(4-2);公式(4-2)中,ψ j()表示VGG-16网络的第j层卷积层的输出特征图(feature map),C j、H j、W j分别表示第j层卷积层对应的特征图的通道数、高和宽。在实施的过程中,使用VGG-16的conv3-3层。
Figure PCTCN2018117293-appb-000120
Figure PCTCN2018117293-appb-000121
的含义同第一部分相同,
Figure PCTCN2018117293-appb-000122
表示原始图像的梯度经过网络的结果;
Figure PCTCN2018117293-appb-000123
表示原始图像的风格化参考图像的梯度。
总的目标函数是感知损失L feat与像素级损失L pixel二者的和;
L total=αL feat+βL pixel    (4-3);公式(4-3)中,α和β的取值均为实数。例如,在训练中可以将α和β分别设为整数。在训练中将α和β分别设为了10000和1,用英伟达的Titan X GPU进行了100K次的迭代,使用adam优化方法来对目标函数公式3进行优化,前50K次迭代,将学习率设为10 -8,后50K次,将学习率设为10 -9。需要说明的是,本领域的技术人员在实施的过程中,可以对公式(4-1)和公式(4-2)进行一些修改。对公式(4-1),只要这些修改能够表示出像素级损失即可,例如,将公式(4-1)中的
Figure PCTCN2018117293-appb-000124
修改为别的数值,例如
Figure PCTCN2018117293-appb-000125
Figure PCTCN2018117293-appb-000126
等等,将将公式(4-1)中的绝对值的平方修改为绝对值,或者,将将公式(4-1)中的绝对值的平方修改为绝对 值的平方根。
第三阶段、图像重建过程
当新输入一张图像,如新的自拍图像,为得到其对应的风格图像,采用如下的公式(5)来确定输出的风格化的图像。
Figure PCTCN2018117293-appb-000127
Figure PCTCN2018117293-appb-000128
公式(5)中,I表示新的自拍图像即初始图像,S表示新的自拍图像对应的风格图像。
Figure PCTCN2018117293-appb-000129
表示自拍图像x方向的梯度,
Figure PCTCN2018117293-appb-000130
表示自拍图像x方向的梯度经过训练好的模型的输出,同样的
Figure PCTCN2018117293-appb-000131
是自拍图像y方向的梯度,
Figure PCTCN2018117293-appb-000132
表示自拍图像y方向的梯度经过训练好的模型的输出,
Figure PCTCN2018117293-appb-000133
表示风格图像x方向的梯度,
Figure PCTCN2018117293-appb-000134
表示风格图像y方向的梯度。在上式中‖S-I‖是利用了原图的色彩信息进行图像重构,可以称为颜色置信度(color confidence);
Figure PCTCN2018117293-appb-000135
是利用了原图的结构信息进行图像重构,λ表示这两个信息的权重参数。在实施的过程中,λ取10。通过对上式进行优化,即可得到S,即新的自拍图像的风格图像。
从以上实施例可以看出,本申请实施例实现了一种面向自拍的图像风格转换算法,克服了之前的风格转换方法应用到人脸上时的两个重要缺点:一,人脸边缘的变形;二,人脸肤色的不一致。本申请实施例的神经网络结构完全是在梯度域进行学习。相比于其他的图像风格转换方法,本方法在自拍照片的风格转换,会克服之前方法边缘形变和颜色不一致的缺点,能够在实现图像风格转换的同时,对图像进行美化和增强。
在一些实施例中,第一部分是要训练的卷积神经网络51(第一卷积神经网络)可以采用如图5B的卷积神经网络,图5B为本申请实施例卷积神经网络模型的组成结构示意图,如图5B所示,该模型的结构包括:
输入层(input)501,自拍图像在x或y方向上的梯度作为输入;需要说明的是,h表示自拍图像在x或y方向上的梯度的高(high),w表示自拍图像在x或y方向的梯度的宽(width)。对于一幅自拍图像I来说,对自拍图像I在x方向上求梯度得到
Figure PCTCN2018117293-appb-000136
和对自拍图像I在y方向上求梯度得到
Figure PCTCN2018117293-appb-000137
然后将
Figure PCTCN2018117293-appb-000138
Figure PCTCN2018117293-appb-000139
的每一个颜色通道(或颜色分量)作为输入。如果采用RGB(Red Green Blue,红绿蓝)颜色模型,则有三个颜色通道;对应地,对于一幅自拍图像来说,就有6个输入,分别是
Figure PCTCN2018117293-appb-000140
在R颜色通道、
Figure PCTCN2018117293-appb-000141
在G颜色通道和
Figure PCTCN2018117293-appb-000142
在B颜色通道,
Figure PCTCN2018117293-appb-000143
在R颜色通道、
Figure PCTCN2018117293-appb-000144
在G颜色通道和
Figure PCTCN2018117293-appb-000145
在B颜色通道。conv1+ReLu1层、conv2+ReLu2层、conv3+ReLu3层、conv4+ReLu4层、conv5+ReLu5层、conv6+ReLu6层和conv7+ReLu7层;
经过卷积层和ReLu层后,输出的结果是一个特征图502,该特征图502的高为
Figure PCTCN2018117293-appb-000146
该特征图502的宽为
Figure PCTCN2018117293-appb-000147
该特征图502的通道数为c,其中,r是系数,r和c的取值与本申请实施例中的卷积神经网络模型的模型超参数有关,在本申请实施例中,模型超参数包括卷积核的大小(size)、卷积核的移动步长(stride)、输入特征图补的数据(padding)。一般来说,卷积核的个数决定输出特征图的通道数c。
上采样层,输入为511至51C,输出为521至52C。将输出的特征图按照通道数c拆解开,这样得到c个特征图511至51C,对511至51C中的每一个特征图上采样至初始图像的大小。在输入层501中提到初始图像即自拍图像,自拍图像的大小为h*w,那么上采样层输出的上采样图像的大小521至52C也为h*w。在上采样层中,输入511对应的输出为521,输入512对应的输出为522,以此类推,输入51C对应的输出为52C。
合成层531,输入为521至52C,输出为531;将上采样图像521至52C进行合并,得到特征图531;输出层,输入为531,输出为541;对特征图531进行卷积和激励,即先后输入到conv8、ReLu8和conv9,最终得到输出541,输出541的大小为原图的大小h*w。
需要说明的是,图5B所示的卷积神经网络模型可以用于替换图5A中网络部分53。在本申请实施例中,在上采样之前的卷积过程有7层,分别为conv1至conv7,在上采样之前的激励过程也有7层,分别为ReLu1至ReLu7。其中,7层卷积层(conv1至conv7)可以认为是像素级损失模型的第一卷积层集合,当然,还可以将7层卷积层和7层激励层(ReLu1至ReLu7)认为是像素级损失模型的第一卷积层集合。在上采样之后的也有两层卷积,分别为conv8和conv9;在上采样之后还有一层的激励过程,即激励层ReLu8。其中,2层卷积层(conv8和conv9)可以认为是像素级损失模型的第二卷积层集合,当然,还可以将2层卷积层和1层激励层(ReLu8)认为是像素级损失模型的第二卷积层集合。
本领域的技术人员应当理解的是,在上采样之前的卷积层的层数(第一卷积层集合中卷积层的层数)可以有变化,例如采用5层,9层、10层或者几十层,对应的,在上采样之前的激励层的层数(第一卷积层集合中激励层的层数)也可以有变化,例如采用5层、6层、9层、15层等等。在实施例中,在上采样之前,卷积层后面会跟随一个激励层,即上采样之前,一个卷积层与一个激励层是交替地,本领域的技术人员应当理解的是,上述卷积层与激励层的交替层数也可以变化,例如两个卷积层后跟随一个激励层,然后一个卷积层后跟随两个激励层。本申请实施例中,激励层采用的激励函数为ReLu,在一些实施例中,激励层还可以采用其他的激励函数,例如sigmoid函数。在图5B所述的实施例中未表现出池化层,在一些实施例中,还可以加入池化层。在上采样之后,卷积层的层数(第二卷积层集合中卷积层的层数)、以及卷积层与激励层的顺序都是可以变化的。
基于前述的实施例,本申请实施例提供一种图像风格转换装置,该装置包括所包括的各单元、以及各单元所包括的各模块,可以通过电子设备中的处理器来实现;当然也可通过具体的逻辑电路实现;在实施的过程中,处理器可以为中央处理器(CPU)、微处理器(MPU)、数字信号处理器(DSP)或现场可编程门阵列(FPGA)等。
图6为本申请实施例图像风格转换装置的组成结构示意图,如图6所示,所述装置600包括获取单元601、获得单元602和重构单元603,其中:
获取单元601,用于获取待进行风格转换的初始图像;获得单元602,用于将所述 初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;重构单元603,用于根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述装置还包括训练单元,用于训练所述图像风格转换模型,所述图像风格转换模型的训练目标为总的损失L total最小,其中,L total采用下式来表示:
L total=αL feat+βL pixel;其中,所述L feat表示感知损失,所述L pixel表示像素级损失,所述α和所述β的取值均为实数。
在一些实施例中,所述α与所述β的比值大于10且小于10的五次方。
在一些实施例中,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中,感知损失模型是通过在梯度域将像素级损失最小作为训练目标而得到的像素级损失模型,感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到的。
在一些实施例中,所述训练单元包括:第一输入模块,用于将训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;第一确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;第一训练模块,用于根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图训练所述感知损失模型。
在一些实施例中,所述第一训练模块,用于采用下式训练所述感知损失模型:
Figure PCTCN2018117293-appb-000148
其中,
Figure PCTCN2018117293-appb-000149
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000150
表示第i个训练样本的梯度经过像素级损失模型的输出结果;
Figure PCTCN2018117293-appb-000151
表示第i个训练样本的风格化的参考图像的梯度;ψ j()表示感知损失模型采用感知损失模型时的第j层卷积层的输出特征图,C jH jW j别表示第j层卷积层对应的特征图的通道数、高和宽。
在一些实施例中,当所述感知损失模型采用VGG16时,第j层卷积层为conv3-3。
在一些实施例中,所述训练单元还包括:第二确定模块,用于确定训练样本的梯度;第二输入模块,用于将所述训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;第三确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;第二训练模块,用于根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。
在一些实施例中,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型包括:将所述训练样本的梯度输入到第一卷积层集合,得到作为样本特征图;将所述样本特征图输入到上采样层,上采样至所述初始图像的像素尺寸;将上采样后的样本特征图输入到第二卷积层集合,得到样本输出结果。
在一些实施例中,所述第二训练模块,用于根据每一训练样本的
Figure PCTCN2018117293-appb-000152
与对应的
Figure PCTCN2018117293-appb-000153
之差的绝对值训练所述像素级损失模型;其中,
Figure PCTCN2018117293-appb-000154
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000155
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000156
表示第i个训练样本的风格化的参考图像的梯度。
在一些实施例中,所述第二训练模块,用于采用下式训练所述像素级损失模型:
Figure PCTCN2018117293-appb-000157
其中,
Figure PCTCN2018117293-appb-000158
表示第i个训练样本的梯度,F W表示像素级损失模型,
Figure PCTCN2018117293-appb-000159
表示第i个训练样本的梯度经过像素级损失模型F W的输出结果;
Figure PCTCN2018117293-appb-000160
表示第i个训练样本的风格化的参考图像的梯度,D表示训练样本集合中的样本数。
在一些实施例中,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像。
在一些实施例中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:所述风格图像与所述初始图像的结构差异程度小于相似度阈值或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
在一些实施例中,所述重构单元,用于:根据
Figure PCTCN2018117293-appb-000161
Figure PCTCN2018117293-appb-000162
进行图像重构,得到风格图像;其中:
Figure PCTCN2018117293-appb-000163
表示所述初始图像在x方向的梯度,
Figure PCTCN2018117293-appb-000164
表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000165
表示所述初始图像在y方向的梯度,
Figure PCTCN2018117293-appb-000166
表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图,
Figure PCTCN2018117293-appb-000167
表示风格图像在x方向的梯度,
Figure PCTCN2018117293-appb-000168
表示风格图像在y方向的梯度。
在一些实施例中,所述重构单元,用于根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与初始图像满足颜色相似度条件的图像,作为所述风格图像。
在一些实施例中,所述装置还包括:提取单元,用于对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。
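提取初始图像中人脸区域的一种示意做法如下（假设使用OpenCV的Haar级联人脸检测，检测器与参数均为示例性假设，本申请实施例并不限定具体的人脸检测方式）；得到的掩码可用于只在人脸区域施加颜色相似度约束：

```python
import cv2
import numpy as np

def face_mask(image_bgr: np.ndarray) -> np.ndarray:
    """返回与输入同尺寸的掩码，人脸区域为1，其余为0。"""
    detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        mask[y:y + h, x:x + w] = 1.0
    return mask
```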
在一些实施例中，所述重构单元，用于根据
S* = argmin_S ( ‖∇_x S − Ĝ_x‖²_2 + ‖∇_y S − Ĝ_y‖²_2 + ‖S − I‖²_2 )
进行图像重构，得到风格图像；其中：I表示初始图像，S表示风格图像，∇_x I表示所述初始图像在x方向的梯度，Ĝ_x表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_y I表示所述初始图像在y方向的梯度，Ĝ_y表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_x S表示风格图像在x方向的梯度，∇_y S表示风格图像在y方向的梯度。
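在上述重构目标中加入颜色一致性约束的示意如下（假设使用PyTorch；权重 lam 以及仅在人脸区域施加颜色约束的掩码 mask 均为示例性假设）：

```python
import torch

def color_constrained_loss(s: torch.Tensor, init_img: torch.Tensor,
                           dx: torch.Tensor, dy: torch.Tensor,
                           gx_feat: torch.Tensor, gy_feat: torch.Tensor,
                           mask: torch.Tensor = None, lam: float = 1.0) -> torch.Tensor:
    # 结构项：风格图像的梯度与模型输出的在梯度域上的特征图尽量一致
    struct = ((dx - gx_feat[..., :, 1:]) ** 2).mean() + ((dy - gy_feat[..., 1:, :]) ** 2).mean()
    # 颜色项：风格图像与初始图像（或其中的人脸区域）的颜色尽量一致
    diff = (s - init_img) ** 2
    color = (diff * mask).mean() if mask is not None else diff.mean()
    return struct + lam * color
```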
在一些实施例中,所述获得单元,包括:第四确定模块,用于确定所述初始图像在至少一个参考方向的梯度;获得模块,用于将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;对应地,所述重构单元,用于根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
在一些实施例中,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,确定单元,用于确定所述初始图像分别在x、y方向上的梯度;所述获得单元,用于分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的在梯度域上的特征图;对应地,所述重构单元,用于根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
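确定初始图像分别在x、y方向上的梯度的一个示意实现如下（假设使用PyTorch，以前向差分近似，差分方式与边界补零处理均为示例性假设）：

```python
import torch
import torch.nn.functional as F

def image_gradients(img: torch.Tensor):
    """返回图像在 x、y 方向上的梯度（前向差分，边缘补零以保持尺寸不变）。"""
    gx = img[..., :, 1:] - img[..., :, :-1]
    gy = img[..., 1:, :] - img[..., :-1, :]
    gx = F.pad(gx, (0, 1, 0, 0))   # 在宽度方向右侧补一列零
    gy = F.pad(gy, (0, 0, 0, 1))   # 在高度方向底部补一行零
    return gx, gy
```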
以上装置实施例的描述，与上述方法实施例的描述是类似的，具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节，请参照本申请方法实施例的描述而理解。
需要说明的是，本申请实施例中，如果以软件功能模块的形式实现上述的图像风格转换方法，并作为独立的产品销售或使用时，也可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是电子设备或服务器等）执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read Only Memory，ROM）、磁碟或者光盘等各种可以存储程序代码的介质。这样，本申请实施例不限制于任何特定的硬件和软件结合。
对应地，本申请实施例提供一种计算机设备，包括存储器和处理器，所述存储器存储有可在处理器上运行的计算机程序，所述处理器执行所述程序时实现上述的图像风格转换方法中的步骤。
本申请实施例提供一种计算机可读存储介质，其上存储有计算机程序，该计算机程序被处理器执行时实现上述的图像风格转换方法中的步骤。本申请实施例还提供一种计算机程序产品，所述计算机程序产品包括计算机可执行指令，该计算机可执行指令被执行后，能够实现上述图像风格转换方法中的步骤。这里需要指出的是：以上存储介质和设备实施例的描述，与上述方法实施例的描述是类似的，具有同方法实施例相似的有益效果。对于本申请存储介质和设备实施例中未披露的技术细节，请参照本申请方法实施例的描述而理解。
需要说明的是，图7为本申请实施例中计算机设备的一种硬件实体示意图，如图7所示，该计算机设备700的硬件实体包括：处理器701、通信接口702和存储器703，其中：处理器701通常控制计算机设备700的总体操作。通信接口702可以使计算机设备通过网络与其他终端或服务器通信。存储器703配置为存储由处理器701可执行的指令和应用，还可以缓存处理器701以及计算机设备700中各模块待处理或已经处理的数据（例如，图像数据、音频数据、语音通信数据和视频通信数据），可以通过闪存（FLASH）或随机访问存储器（Random Access Memory，RAM）实现。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
另外,在本申请各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
或者,本申请上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是电子设备或者服务器等)执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (37)

  1. 一种图像风格转换方法,所述方法包括:
    获取待进行风格转换的初始图像;
    将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;
    根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
  2. 根据权利要求1所述的方法，所述方法还包括：训练所述图像风格转换模型；其中，所述图像风格转换模型的训练目标为总的损失L_total最小，其中，L_total采用下式来表示：L_total=αL_feat+βL_pixel；其中，所述L_feat表示感知损失，所述L_pixel表示像素级损失，所述α和所述β的取值均为实数。
  3. 根据权利要求1所述的方法,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中所述像素级损失模型是通过在梯度域将像素级损失最小作为训练目标而得到的,所述感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到的。
  4. 根据权利要求3所述的方法,所述像素级损失模型和所述感知损失模型的训练过程包括:将训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;
    确定所述训练样本对应的风格化的参考图像的梯度;
    根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图,训练所述感知损失模型。
  5. 根据权利要求4所述的方法,所述根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图,训练所述感知损失模型,包括:采用下式训练所述感知损失模型:
    L_feat = (1/(C_j·H_j·W_j))·‖ψ_j(F_W(∇I_i)) − ψ_j(∇Ŝ_i)‖²_2；
    其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度；ψ_j(·)表示感知损失模型的第j层卷积层的输出特征图，C_j、H_j、W_j分别表示第j层卷积层对应的特征图的通道数、高和宽。
  6. 根据权利要求3所述的方法,所述像素级损失模型的训练过程包括:将训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;
    确定所述训练样本对应的风格化的参考图像的梯度;
    根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。
  7. 根据权利要求4所述的方法,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型包括:
    将所述训练样本的梯度输入到所述第一卷积层集合,得到样本特征图;
    将所述样本特征图输入到所述上采样层,上采样至所述初始图像的像素尺寸;
    将上采样后的样本特征图输入到所述第二卷积层集合,得到样本输出结果。
  8. 根据权利要求7所述的方法，所述根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型，包括：根据每一训练样本的F_W(∇I_i)与对应的∇Ŝ_i之差的绝对值训练所述像素级损失模型；
    其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型F_W的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度。
  9. 根据权利要求8所述的方法，所述根据每一训练样本的F_W(∇I_i)与对应的∇Ŝ_i之差的绝对值训练所述像素级损失模型，包括：采用下式训练所述像素级损失模型：
    L_pixel = (1/D)·Σ_{i=1}^{D}‖F_W(∇I_i) − ∇Ŝ_i‖_1；
    其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型F_W的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度，D表示训练样本集合中的样本数。
  10. 根据权利要求1至9任一项所述的方法,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像;
    其中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:
    所述风格图像与所述初始图像的结构差异程度小于相似度阈值,或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
  11. 根据权利要求10所述的方法，所述根据所述初始图像的在梯度域上的特征图进行图像重构，得到风格图像，包括：根据
    S* = argmin_S ( ‖∇_x S − Ĝ_x‖²_2 + ‖∇_y S − Ĝ_y‖²_2 )
    进行图像重构，得到风格图像；其中：∇_x I表示所述初始图像在x方向的梯度，Ĝ_x表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_y I表示所述初始图像在y方向的梯度，Ĝ_y表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_x S表示风格图像在x方向的梯度，∇_y S表示风格图像在y方向的梯度。
  12. 根据权利要求1至9任一项所述的方法,所述根据所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
  13. 根据权利要求12所述的方法,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像。
  14. 根据权利要求12所述的方法,所述方法还包括:对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像,包括:将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。
  15. 根据权利要求13所述的方法，所述将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像，且将与所述初始图像满足颜色相似度条件的图像，作为所述风格图像，包括：根据
    S* = argmin_S ( ‖∇_x S − Ĝ_x‖²_2 + ‖∇_y S − Ĝ_y‖²_2 + ‖S − I‖²_2 )
    进行图像重构，得到风格图像；其中：I表示初始图像，S表示风格图像，∇_x I表示所述初始图像在x方向的梯度，Ĝ_x表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_y I表示所述初始图像在y方向的梯度，Ĝ_y表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_x S表示风格图像在x方向的梯度，∇_y S表示风格图像在y方向的梯度。
  16. 根据权利要求1至9任一项所述的方法,所述将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得初始图像的在梯度域上的特征图,包括:确定所述初始图像在至少一个参考方向的梯度;
    将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;
    对应地,根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
  17. 根据权利要求16所述的方法,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,确定所述初始图像分别在x、y方向上的梯度;
    分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的在梯度域上的特征图;
    对应地,根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
  18. 一种图像风格转换装置,所述装置包括:
    获取单元,用于获取待进行风格转换的初始图像;
    获得单元,用于将所述初始图像的梯度输入到图像风格转换模型,从所述图像风格转换模型获得所述初始图像的在梯度域上的特征图;所述图像风格转换模型是在梯度域基于像素级损失和感知损失训练得到;
    重构单元，用于根据所述初始图像的在梯度域上的特征图进行图像重构，得到风格图像。
  19. 根据权利要求18所述的装置,所述装置还包括:
    训练单元，用于训练所述图像风格转换模型，其中所述图像风格转换模型的训练目标为总的损失L_total最小，其中，L_total采用下式来表示：L_total=αL_feat+βL_pixel；其中，所述L_feat表示感知损失，所述L_pixel表示像素级损失，所述α和所述β的取值均为实数。
  20. 根据权利要求18所述的装置,所述图像风格转换模型包括像素级损失模型和感知损失模型,其中所述像素级损失模型是通过在梯度域将像素级损失最小作为训练目标而得到,所述感知损失模型是通过在梯度域训练将感知损失最小作为训练目标而得到。
  21. 根据权利要求20所述的装置,所述训练单元包括:
    第一输入模块,用于将训练样本的梯度输入所述像素级损失模型,从所述像素级损失模型获得所述训练样本的样本输出结果;
    第一确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;
    第一训练模块,用于根据所述参考图像的梯度在所述感知损失模型的第j层卷积层的第一输出特征图,和根据样本输出结果在所述感知损失模型的第j层卷积层的第二输出特征图训练所述感知损失模型。
  22. 根据权利要求21所述的装置,所述第一训练模块,用于采用下式训练所述感知损失模型:
    L_feat = (1/(C_j·H_j·W_j))·‖ψ_j(F_W(∇I_i)) − ψ_j(∇Ŝ_i)‖²_2；
    其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度；ψ_j(·)表示感知损失模型的第j层卷积层的输出特征图，C_j、H_j、W_j分别表示第j层卷积层对应的特征图的通道数、高和宽。
  23. 根据权利要求20所述的装置,所述训练单元包括:
    第二确定模块,用于确定训练样本的梯度;
    第二输入模块,用于将所述训练样本的梯度作为所述像素级损失模型的输入,从所述像素级损失模型获得样本输出结果;
    第三确定模块,用于确定所述训练样本对应的风格化的参考图像的梯度;
    第二训练模块,用于根据所述参考图像的梯度和样本输出结果训练所述像素级损失模型。
  24. 根据权利要求21所述的装置,所述像素级损失模型包括第一卷积层集合、上采样层和第二卷积层集合,所述第二训练模块包括:第一输入子模块,用于将所述训练样本的梯度输入到所述第一卷积层集合,得到样本特征图;
    上采样子模块,用于将所述样本特征图输入到所述上采样层,上采样至所述初始图像的像素尺寸;
    第二输入子模块，用于将上采样后的样本特征图输入到所述第二卷积层集合，得到样本输出结果。
  25. 根据权利要求24所述的装置，所述第二训练模块，用于根据每一训练样本的F_W(∇I_i)与对应的∇Ŝ_i之差的绝对值训练所述像素级损失模型；其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型F_W的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度。
  26. 根据权利要求25所述的装置,所述第二训练模块,用于采用下式训练所述像素级损失模型:
    L_pixel = (1/D)·Σ_{i=1}^{D}‖F_W(∇I_i) − ∇Ŝ_i‖_1；
    其中，∇I_i表示第i个训练样本的梯度，F_W表示像素级损失模型，F_W(∇I_i)表示第i个训练样本的梯度经过像素级损失模型F_W的输出结果；∇Ŝ_i表示第i个训练样本的风格化的参考图像的梯度，D表示训练样本集合中的样本数。
  27. 根据权利要求18至26任一项所述的装置,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,作为所述风格图像;其中,所述与所述初始图像的在梯度域上的特征图满足结构相似度条件,包括:所述风格图像与所述初始图像的结构差异程度小于相似度阈值,或者,所述风格图像与所述初始图像的结构差异程度最小,其中,结构差异程度为梯度域上的风格图像与所述初始图像的在梯度域上的特征图在至少一个参考方向的变化趋势。
  28. 根据权利要求27所述的装置，所述重构单元，用于：根据
    S* = argmin_S ( ‖∇_x S − Ĝ_x‖²_2 + ‖∇_y S − Ĝ_y‖²_2 )
    进行图像重构，得到风格图像；其中：∇_x I表示所述初始图像在x方向的梯度，Ĝ_x表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_y I表示所述初始图像在y方向的梯度，Ĝ_y表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_x S表示风格图像在x方向的梯度，∇_y S表示风格图像在y方向的梯度。
  29. 根据权利要求18至26任一项所述的装置,所述重构单元,用于根据所述初始图像的颜色信息和所述初始图像的在梯度域上的特征图进行图像重构,得到风格图像。
  30. 根据权利要求29所述的装置,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像满足颜色相似度条件的图像,作为所述风格图像。
  31. 根据权利要求29所述的装置,所述装置还包括:提取单元,用于对所述初始图像进行特征提取,得到所述初始图像中的人脸区域;对应地,所述重构单元,用于将与所述初始图像的在梯度域上的特征图满足结构相似度条件的图像,且将与所述初始图像中的人脸区域满足颜色相似度条件的图像,作为所述风格图像。
  32. 根据权利要求30所述的装置，所述重构单元，用于根据
    S* = argmin_S ( ‖∇_x S − Ĝ_x‖²_2 + ‖∇_y S − Ĝ_y‖²_2 + ‖S − I‖²_2 )
    进行图像重构，得到风格图像；
    其中：I表示初始图像，S表示风格图像，∇_x I表示所述初始图像在x方向的梯度，Ĝ_x表示所述初始图像在x方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_y I表示所述初始图像在y方向的梯度，Ĝ_y表示所述初始图像在y方向的梯度经过所述图像风格转换模型的在梯度域上的特征图，∇_x S表示风格图像在x方向的梯度，∇_y S表示风格图像在y方向的梯度。
  33. 根据权利要求18至26任一项所述的装置,所述获得单元,包括:
    第四确定模块,用于确定所述初始图像在至少一个参考方向的梯度;
    获得模块,用于将在至少一个参考方向的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在至少一个参考方向的在梯度域上的特征图;
    对应地,所述重构单元,用于根据在至少一个参考方向的在梯度域上的特征图进行图像重构,得到风格图像。
  34. 根据权利要求33所述的装置,所述至少一个参考方向包括在平面参考坐标系中的x、y方向上,对应地,确定单元,用于确定所述初始图像分别在x、y方向上的梯度;所述获得单元,用于分别将在x、y方向上的梯度输入到图像风格转换模型,从所述图像风格转换模型对应获得初始图像在x、y方向上的在梯度域上的特征图;对应地,所述重构单元,用于根据在x、y方向上的在梯度域上的特征图进行图像重构,得到风格图像。
  35. 一种计算机程序产品,所述计算机程序产品包括计算机可执行指令,该计算机可执行指令被执行后,能够实现权利要求1至17任一项所述图像风格转换方法中的步骤。
  36. 一种计算机存储介质,所述计算机存储介质上存储有计算机可执行指令,该计算机可执行指令被执行后,能够实现权利要求1至17任一项所述图像风格转换方法中的步骤。
  37. 一种计算机设备,所述计算机设备包括存储器和处理器,所述存储器上存储有计算机可执行指令,所述处理器执行所述程序时实现权利要求1至17任一项所述图像风格转换方法中的步骤。
PCT/CN2018/117293 2018-08-13 2018-11-23 一种图像风格转换方及装置、设备、存储介质 WO2020034481A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019569805A JP6874168B2 (ja) 2018-08-13 2018-11-23 画像スタイル変換方法および装置、機器、ならびに記憶媒体
SG11202000062RA SG11202000062RA (en) 2018-08-13 2018-11-23 Image style transform methods and apparatuses, devices and storage media
US16/726,885 US11200638B2 (en) 2018-08-13 2019-12-25 Image style transform methods and apparatuses, devices and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810917979.7A CN109308679B (zh) 2018-08-13 2018-08-13 一种图像风格转换方法及装置、设备、存储介质
CN201810917979.7 2018-08-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/726,885 Continuation US11200638B2 (en) 2018-08-13 2019-12-25 Image style transform methods and apparatuses, devices and storage media

Publications (1)

Publication Number Publication Date
WO2020034481A1 true WO2020034481A1 (zh) 2020-02-20

Family

ID=65223859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117293 WO2020034481A1 (zh) 2018-08-13 2018-11-23 一种图像风格转换方及装置、设备、存储介质

Country Status (6)

Country Link
US (1) US11200638B2 (zh)
JP (1) JP6874168B2 (zh)
CN (1) CN109308679B (zh)
SG (1) SG11202000062RA (zh)
TW (1) TWI749356B (zh)
WO (1) WO2020034481A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652846A (zh) * 2020-04-30 2020-09-11 成都数之联科技有限公司 一种基于特征金字塔卷积神经网络的半导体缺陷识别方法
CN111932445A (zh) * 2020-07-27 2020-11-13 广州市百果园信息技术有限公司 对风格迁移网络的压缩方法及风格迁移方法、装置和系统
CN112102154A (zh) * 2020-08-20 2020-12-18 北京百度网讯科技有限公司 图像处理方法、装置、电子设备和存储介质
CN113496238A (zh) * 2020-03-20 2021-10-12 北京京东叁佰陆拾度电子商务有限公司 模型训练方法、点云数据风格化方法、装置、设备及介质
CN114818803A (zh) * 2022-04-25 2022-07-29 上海韶脑传感技术有限公司 基于神经元优化的单侧肢体患者运动想象脑电建模方法
TWI779824B (zh) * 2021-09-10 2022-10-01 瑞昱半導體股份有限公司 卷積神經網路的圖像處理方法與系統

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033137A1 (zh) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 在视频图像中展示业务对象的方法、装置和电子设备
CN109308679B (zh) * 2018-08-13 2022-08-30 深圳市商汤科技有限公司 一种图像风格转换方法及装置、设备、存储介质
CN109766895A (zh) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 用于图像风格迁移的卷积神经网络的训练方法和图像风格迁移方法
CN111583165B (zh) * 2019-02-19 2023-08-08 京东方科技集团股份有限公司 图像处理方法、装置、设备及存储介质
CN110070482B (zh) * 2019-03-14 2023-05-02 北京字节跳动网络技术有限公司 图像处理方法、装置和计算机可读存储介质
EP3731154A1 (en) * 2019-04-26 2020-10-28 Naver Corporation Training a convolutional neural network for image retrieval with a listwise ranking loss function
CN111860823B (zh) * 2019-04-30 2024-06-11 北京市商汤科技开发有限公司 神经网络训练、图像处理方法及装置、设备及存储介质
CN110232401B (zh) * 2019-05-05 2023-08-04 平安科技(深圳)有限公司 基于图片转换的病灶判断方法、装置、计算机设备
CN110189246B (zh) * 2019-05-15 2023-02-28 北京字节跳动网络技术有限公司 图像风格化生成方法、装置及电子设备
CN112561778B (zh) * 2019-09-26 2024-07-02 北京字节跳动网络技术有限公司 图像风格化处理方法、装置、设备及存储介质
US11625576B2 (en) * 2019-11-15 2023-04-11 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image style transformation
US11080833B2 (en) * 2019-11-22 2021-08-03 Adobe Inc. Image manipulation using deep learning techniques in a patch matching operation
KR102172644B1 (ko) * 2020-01-13 2020-11-02 (주)에스프레소미디어 스타일 변환 외부 연동 시스템, 그리고 스타일 변환 외부 연동 서버
CN111340905B (zh) * 2020-02-13 2023-08-04 北京百度网讯科技有限公司 图像风格化方法、装置、设备和介质
CN111494946B (zh) * 2020-04-23 2021-05-18 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及计算机可读存储介质
CN113570508A (zh) * 2020-04-29 2021-10-29 上海耕岩智能科技有限公司 图像修复方法及装置、存储介质、终端
CN111402143B (zh) * 2020-06-03 2020-09-04 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及计算机可读存储介质
CN111667401B (zh) * 2020-06-08 2022-11-29 武汉理工大学 多层次渐变图像风格迁移方法及系统
CN111986075B (zh) * 2020-08-12 2022-08-09 兰州交通大学 一种目标边缘清晰化的风格迁移方法
CN112070668B (zh) * 2020-08-18 2024-07-12 中科南京人工智能创新研究院 一种基于深度学习和边缘增强的图像超分辨方法
US20220121931A1 (en) 2020-10-16 2022-04-21 Adobe Inc. Direct regression encoder architecture and training
CN112288622B (zh) * 2020-10-29 2022-11-08 中山大学 一种基于多尺度生成对抗网络的伪装图像生成方法
CN112233041B (zh) * 2020-11-05 2024-07-05 Oppo广东移动通信有限公司 图像美颜处理方法、装置、存储介质与电子设备
US20220156987A1 (en) * 2020-11-16 2022-05-19 Disney Enterprises, Inc. Adaptive convolutions in neural networks
CN112348739B (zh) * 2020-11-27 2021-09-28 广州博冠信息科技有限公司 图像处理方法、装置、设备及存储介质
CN112686269B (zh) * 2021-01-18 2024-06-25 北京灵汐科技有限公司 池化方法、装置、设备和存储介质
KR102573822B1 (ko) * 2021-02-04 2023-09-04 (주)비케이 벡터 이미지의 화풍 변환 및 재생 방법
US11195080B1 (en) * 2021-03-29 2021-12-07 SambaNova Systems, Inc. Lossless tiling in convolution networks—tiling configuration
CN113240576B (zh) * 2021-05-12 2024-04-30 北京达佳互联信息技术有限公司 风格迁移模型的训练方法、装置、电子设备及存储介质
CN113344772B (zh) * 2021-05-21 2023-04-07 武汉大学 一种用于地图艺术化的迁移模型的训练方法和计算机设备
CN113256750B (zh) * 2021-05-26 2023-06-23 武汉中科医疗科技工业技术研究院有限公司 医疗图像风格重建方法、装置、计算机设备和存储介质
CN113052786B (zh) * 2021-05-31 2021-09-03 北京星天科技有限公司 一种声呐图像合成方法和装置
CN113763233B (zh) * 2021-08-04 2024-06-21 深圳盈天下视觉科技有限公司 一种图像处理方法、服务器及拍照设备
US11989916B2 (en) * 2021-10-11 2024-05-21 Kyocera Document Solutions Inc. Retro-to-modern grayscale image translation for preprocessing and data preparation of colorization
CN114004905B (zh) * 2021-10-25 2024-03-29 北京字节跳动网络技术有限公司 人物风格形象图的生成方法、装置、设备及存储介质
CN117974425A (zh) * 2024-03-01 2024-05-03 艺咖(北京)科技有限公司 一种基于扩散模型的二维人脸重建方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719327A (zh) * 2016-02-29 2016-06-29 北京中邮云天科技有限公司 一种艺术风格化图像处理方法
CN107171932A (zh) * 2017-04-27 2017-09-15 腾讯科技(深圳)有限公司 一种图片风格转换方法、装置及系统
CN107277615A (zh) * 2017-06-30 2017-10-20 北京奇虎科技有限公司 直播风格化处理方法、装置、计算设备及存储介质
CN107481185A (zh) * 2017-08-24 2017-12-15 深圳市唯特视科技有限公司 一种基于视频图像优化的风格转换方法
CN107578367A (zh) * 2017-04-25 2018-01-12 北京陌上花科技有限公司 一种风格化图像的生成方法及装置

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6064768A (en) * 1996-07-29 2000-05-16 Wisconsin Alumni Research Foundation Multiscale feature detector using filter banks
AUPS170902A0 (en) * 2002-04-12 2002-05-16 Canon Kabushiki Kaisha Face detection and tracking in a video sequence
US6862024B2 (en) * 2002-04-17 2005-03-01 Mitsubishi Electric Research Laboratories, Inc. Enhancing textured range images using a 2D editor
US8306366B2 (en) * 2007-08-23 2012-11-06 Samsung Electronics Co., Ltd. Method and apparatus for extracting feature points from digital image
US8009921B2 (en) * 2008-02-19 2011-08-30 Xerox Corporation Context dependent intelligent thumbnail images
US8705847B2 (en) * 2011-09-30 2014-04-22 Cyberlink Corp. Method and system of two-dimensional to stereoscopic conversion
CN102360490B (zh) * 2011-09-30 2012-11-28 北京航空航天大学 基于颜色转换和编辑传播的图像季节特征增强方法
US9208539B2 (en) * 2013-11-30 2015-12-08 Sharp Laboratories Of America, Inc. Image enhancement using semantic components
CN106415594B (zh) * 2014-06-16 2020-01-10 北京市商汤科技开发有限公司 用于面部验证的方法和系统
US20150371360A1 (en) * 2014-06-20 2015-12-24 Qualcomm Incorporated Systems and methods for obtaining structural information from a digital image
US9826149B2 (en) * 2015-03-27 2017-11-21 Intel Corporation Machine learning of real-time image capture parameters
CN106022221B (zh) * 2016-05-09 2021-11-30 腾讯科技(深圳)有限公司 一种图像处理方法及处理系统
CN106780367B (zh) * 2016-11-28 2019-11-15 上海大学 基于字典学习的hdr照片风格转移方法
US20180197317A1 (en) * 2017-01-06 2018-07-12 General Electric Company Deep learning based acceleration for iterative tomographic reconstruction
US10262238B2 (en) * 2017-04-13 2019-04-16 Facebook, Inc. Panoramic camera systems
US10565757B2 (en) * 2017-06-09 2020-02-18 Adobe Inc. Multimodal style-transfer network for applying style features from multi-resolution style exemplars to input images
AU2017101166A4 (en) * 2017-08-25 2017-11-02 Lai, Haodong MR A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks
TWM558943U (zh) 2017-11-22 2018-04-21 Aiwin Technology Co Ltd 運用深度學習技術之智慧影像資訊及大數據分析系統
CN108280814B (zh) * 2018-02-08 2021-08-31 重庆邮电大学 基于感知损失的光场图像角度超分辨率重建方法
US10783622B2 (en) * 2018-04-25 2020-09-22 Adobe Inc. Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image
CN109308679B (zh) * 2018-08-13 2022-08-30 深圳市商汤科技有限公司 一种图像风格转换方法及装置、设备、存储介质
US10896534B1 (en) * 2018-09-19 2021-01-19 Snap Inc. Avatar style transformation using neural networks
US11310475B2 (en) * 2019-08-05 2022-04-19 City University Of Hong Kong Video quality determination system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719327A (zh) * 2016-02-29 2016-06-29 北京中邮云天科技有限公司 一种艺术风格化图像处理方法
CN107578367A (zh) * 2017-04-25 2018-01-12 北京陌上花科技有限公司 一种风格化图像的生成方法及装置
CN107171932A (zh) * 2017-04-27 2017-09-15 腾讯科技(深圳)有限公司 一种图片风格转换方法、装置及系统
CN107277615A (zh) * 2017-06-30 2017-10-20 北京奇虎科技有限公司 直播风格化处理方法、装置、计算设备及存储介质
CN107481185A (zh) * 2017-08-24 2017-12-15 深圳市唯特视科技有限公司 一种基于视频图像优化的风格转换方法

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496238A (zh) * 2020-03-20 2021-10-12 北京京东叁佰陆拾度电子商务有限公司 模型训练方法、点云数据风格化方法、装置、设备及介质
CN111652846A (zh) * 2020-04-30 2020-09-11 成都数之联科技有限公司 一种基于特征金字塔卷积神经网络的半导体缺陷识别方法
CN111652846B (zh) * 2020-04-30 2022-08-16 成都数之联科技股份有限公司 一种基于特征金字塔卷积神经网络的半导体缺陷识别方法
CN111932445A (zh) * 2020-07-27 2020-11-13 广州市百果园信息技术有限公司 对风格迁移网络的压缩方法及风格迁移方法、装置和系统
CN112102154A (zh) * 2020-08-20 2020-12-18 北京百度网讯科技有限公司 图像处理方法、装置、电子设备和存储介质
CN112102154B (zh) * 2020-08-20 2024-04-26 北京百度网讯科技有限公司 图像处理方法、装置、电子设备和存储介质
TWI779824B (zh) * 2021-09-10 2022-10-01 瑞昱半導體股份有限公司 卷積神經網路的圖像處理方法與系統
CN114818803A (zh) * 2022-04-25 2022-07-29 上海韶脑传感技术有限公司 基于神经元优化的单侧肢体患者运动想象脑电建模方法

Also Published As

Publication number Publication date
US11200638B2 (en) 2021-12-14
SG11202000062RA (en) 2020-03-30
CN109308679B (zh) 2022-08-30
CN109308679A (zh) 2019-02-05
TWI749356B (zh) 2021-12-11
US20200134778A1 (en) 2020-04-30
JP6874168B2 (ja) 2021-05-19
JP2020533660A (ja) 2020-11-19
TW202009800A (zh) 2020-03-01

Similar Documents

Publication Publication Date Title
WO2020034481A1 (zh) 一种图像风格转换方及装置、设备、存储介质
US20210150678A1 (en) Very high-resolution image in-painting with neural networks
CN109949255B (zh) 图像重建方法及设备
EP3678059B1 (en) Image processing method, image processing apparatus, and a neural network training method
WO2021109876A1 (zh) 图像处理方法、装置、设备及存储介质
JP7417640B2 (ja) リアルタイム映像超高解像度
US11704844B2 (en) View synthesis robust to unconstrained image data
Cao et al. Image Super-Resolution via Adaptive $\ell _ {p}(0< p< 1) $ Regularization and Sparse Representation
WO2023000895A1 (zh) 图像风格转换方法、装置、电子设备和存储介质
US11915383B2 (en) Methods and systems for high definition image manipulation with neural networks
CN113256504A (zh) 图像处理的方法和电子设备
Huang et al. Hybrid image enhancement with progressive laplacian enhancing unit
Purkait et al. Image upscaling using multiple dictionaries of natural image patches
Jiang et al. Fast and high quality image denoising via malleable convolution
CN114830168A (zh) 图像重建方法、电子设备和计算机可读存储介质
CN110428422B (zh) 超像素采样网络
Mikaeli et al. Single-image super-resolution via patch-based and group-based local smoothness modeling
Li et al. Wavenhancer: Unifying wavelet and transformer for image enhancement
KR102548407B1 (ko) 뉴럴 네트워크가 포함된 상미분 방정식을 이용하는 점진적 이미지 해상도 향상 방법 및 장치와 이를 이용하는 이미지 초해상도 방법
Xu et al. An edge guided coarse-to-fine generative network for image outpainting
CN109447900A (zh) 一种图像超分辨率重建方法及装置
Tojo et al. Image denoising using multi scaling aided double decker convolutional neural network
Perla et al. Low Light Image Illumination Adjustment Using Fusion of MIRNet and Deep Illumination Curves
Li et al. Convolutional Neural Network Combined with Half‐Quadratic Splitting Method for Image Restoration
Li et al. Incorporating multiscale contextual loss for image style transfer

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019569805

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18930498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18930498

Country of ref document: EP

Kind code of ref document: A1