WO2020200030A1 - Neural network training method, image processing method, image processing device and storage medium - Google Patents

Neural network training method, image processing method, image processing device and storage medium

Info

Publication number
WO2020200030A1
Authority
WO
WIPO (PCT)
Prior art keywords: network, training, image, style, convolution
Application number
PCT/CN2020/081375
Other languages
English (en)
French (fr)
Inventor
刘瀚文
那彦波
朱丹
张丽杰
Original Assignee
京东方科技集团股份有限公司
Application filed by 京东方科技集团股份有限公司
Publication of WO2020200030A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • the embodiments of the present disclosure relate to a neural network training method, image processing method, image processing device, and storage medium.
  • Deep learning technology based on artificial neural networks has made great progress in fields such as object classification, text processing, recommendation engines, image search, facial recognition, age and speech recognition, human-machine dialogue, and emotional computing.
  • deep learning technology has made breakthroughs in the field of human-like data perception. For example, deep learning technology can be used to describe image content, identify objects in complex environments, and perform speech recognition in noisy environments.
  • deep learning technology can also solve the problem of image generation and fusion.
  • At least one embodiment of the present disclosure provides a neural network training method, including: training a discriminant network based on a generative network; training the generative network based on the discriminant network; and alternately performing the above training processes to obtain a target network based on the trained generative network;
  • the target network is used to perform style transfer processing on an input image to obtain an output image, and the resolution of the output image is higher than the resolution of the input image;
  • training the generation network includes: using the generation network to perform style transfer processing on the first training input image to respectively generate a first training output image and a second training output image, wherein the resolution of the first training output image is higher than the resolution of the first training input image, and the resolution of the second training output image is equal to the resolution of the first training input image;
  • processing the first training output image through the discriminant network, processing the second training output image through an analysis network, calculating the system loss value of the generation network through the system loss function based on the output of the discriminant network and the output of the analysis network, and correcting the parameters of the generation network according to the system loss value.
  • the generation network includes a backbone network, a first branch network, and a second branch network, and the input of the first branch network and the input of the second branch network are both the output of the backbone network;
  • Using the generation network to perform style transfer processing on the first training input image to respectively generate the first training output image and the second training output image includes: according to the first training input image, generating the first training output image through the backbone network and the first branch network, and generating the second training output image through the backbone network and the second branch network.
  • the backbone network includes a plurality of convolution modules connected in sequence and a plurality of downsampling layers interleaved with adjacent convolution modules;
  • the first branch network includes a plurality of convolution modules connected in sequence and a plurality of up-sampling layers interposed between adjacent convolution modules;
  • the second branch network includes a plurality of convolution modules connected in sequence and a plurality of up-sampling layers interposed between adjacent convolution modules;
  • wherein the number of convolution modules and the number of up-sampling layers in the first branch network are respectively greater than the number of convolution modules and the number of down-sampling layers in the backbone network;
  • the number of convolution modules and the number of up-sampling layers in the second branch network are respectively equal to the number of convolution modules and the number of down-sampling layers in the backbone network.
  • the target network includes the backbone network and the first branch network of the generating network.
  • the system loss function includes a generation network adversarial loss function, and the system loss value includes a generation network adversarial loss value;
  • the generation network adversarial loss function is expressed as: $L_G = -\mathbb{E}_{z \sim P_z(z)}\left[\log D(G(z))\right]$
  • where $L_G$ denotes the generation network adversarial loss function, $z$ denotes the first training input image, $P_z(z)$ represents the set of first training input images, $G(z)$ represents the first training output image, $D(G(z))$ represents the output of the discriminant network for the first training output image, and $\mathbb{E}_{z \sim P_z(z)}$ denotes taking the expectation over the set of first training input images.
  • the analysis network includes a plurality of first convolution modules connected in sequence and a plurality of first down-sampling layers interleaved with adjacent first convolution modules; at least two of the first convolution modules are used to extract style features, and at least one of the first convolution modules is used to extract content features.
  • the system loss function further includes a content loss function
  • the system loss value further includes a content loss value
  • the content loss function is expressed as: $L_{content} = \sum_{m} w_{1m} \cdot C_m$
  • where $L_{content}$ represents the content loss function, $C_m$ represents the single-layer content loss function of the m-th first convolution module among the at least one first convolution module used to extract the content features, and $w_{1m}$ represents the weight of $C_m$;
  • the single-layer content loss function is expressed as: $C_m = \frac{1}{2 S_1} \sum_{ij} \left(F^m_{ij} - P^m_{ij}\right)^2$
  • where $S_1$ is a constant, $F^m_{ij}$ represents the value of the j-th position in the first content feature image of the first training input image extracted by the i-th first convolution kernel in the m-th first convolution module, and $P^m_{ij}$ represents the value of the j-th position in the second content feature image of the second training output image extracted by the i-th first convolution kernel in the m-th first convolution module.
  • the system loss function further includes a style loss function
  • the system loss value further includes a style loss value
  • the style loss function is expressed as: $L_{style} = \sum_{n} w_{2n} \cdot E_n$
  • where $L_{style}$ represents the style loss function, $E_n$ represents the single-layer style loss function of the n-th first convolution module among the at least two first convolution modules used to extract the style features, and $w_{2n}$ represents the weight of $E_n$;
  • the single-layer style loss function is expressed as: $E_n = \frac{1}{4 N_n^2 M_n^2 S_2} \sum_{ij} \left(G^n_{ij} - A^n_{ij}\right)^2$
  • where $S_2$ is a constant, $N_n$ represents the number of first convolution kernels in the n-th first convolution module, $M_n$ represents the size of the style feature image extracted by the first convolution kernels in the n-th first convolution module, and $A^n_{ij}$ and $G^n_{ij}$ represent the values of the j-th position in the Gram matrices of the style feature images of the first training style image and of the second training output image, respectively, extracted by the i-th first convolution kernel in the n-th first convolution module.
  • the parameters of the generating network include multiple convolution kernels and multiple biases
  • the system loss function also includes a weight bias ratio loss function
  • the system loss value also includes a weight bias ratio loss value;
  • training the generation network further includes: calculating the weight bias ratio loss value according to the multiple convolution kernels and the multiple biases through the weight bias ratio loss function.
  • the weight bias ratio loss function is expressed as: $L_{L1} = \frac{W}{B + \varepsilon}$
  • where $L_{L1}$ represents the weight bias ratio loss function, $W$ is the average of the absolute values of the multiple convolution kernels, $B$ is the average of the absolute values of the multiple biases, and $\varepsilon$ is a small positive number used to keep the denominator nonzero.
  • training the discriminant network based on the generation network includes: using the generation network to perform style transfer processing on a second training input image to generate a third training output image, wherein the resolution of the third training output image is greater than the resolution of the second training input image; inputting a second training style image and the third training output image to the discriminant network, wherein the resolution of the second training style image is equal to the resolution of the third training output image; calculating a discriminant network adversarial loss value through a discriminant network adversarial loss function according to the label of the second training style image and the output of the discriminant network corresponding to the second training style image, and the label of the third training output image and the output of the discriminant network corresponding to the third training output image; and correcting the parameters of the discriminant network according to the discriminant network adversarial loss value.
  • the discriminant network adversarial loss function is expressed as: $L_D = -\mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] - \mathbb{E}_{z \sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$
  • where $L_D$ represents the discriminant network adversarial loss function, $x$ represents the second training style image, $P_{data}(x)$ represents the set of second training style images, $D(x)$ represents the output of the discriminant network for the second training style image, $z$ represents the second training input image, $P_z(z)$ represents the set of second training input images, $G(z)$ represents the third training output image, $D(G(z))$ represents the output of the discriminant network for the third training output image, and $\mathbb{E}$ denotes taking the expectation over the corresponding set.
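  • As an illustrative sketch only (not the claimed implementation), the discriminant network adversarial loss reconstructed above can be computed as follows; the function and tensor names are hypothetical placeholders.

```python
import torch

def discriminator_adversarial_loss(d_real, d_fake, eps=1e-8):
    # d_real: discriminant network output D(x) for second training style images (label 1)
    # d_fake: discriminant network output D(G(z)) for third training output images (label 0)
    # L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
    return -(torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean())
```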
  • At least one embodiment of the present disclosure further provides an image processing method, including: acquiring an input image; and performing style transfer processing on the input image using a neural network to generate an output image; wherein the neural network includes the target network obtained by the training method according to any embodiment of the present disclosure.
  • the resolution of the output image is higher than the resolution of the input image.
  • At least one embodiment of the present disclosure further provides an image processing device, including: an image acquisition module for acquiring an input image; and an image processing module, including the target network obtained according to the training method provided in any embodiment of the present disclosure,
  • the image processing module is configured to perform style transfer processing on the input image by using the target network to generate the output image.
  • At least one embodiment of the present disclosure further provides an image processing device, including: a memory for non-transitory storage of computer-readable instructions; and a processor for running the computer-readable instructions, wherein when the computer-readable instructions are run by the processor, the training method provided by any embodiment of the present disclosure or the image processing method provided by any embodiment of the present disclosure is executed.
  • At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions, wherein when the computer-readable instructions are executed by a computer, the instructions of the training method provided by any embodiment of the present disclosure or the instructions of the image processing method provided by any embodiment of the present disclosure can be executed.
  • Figure 1 is a schematic diagram of a convolutional neural network
  • Figure 2A is a schematic diagram of a convolutional neural network
  • Figure 2B is a schematic diagram of the working process of a convolutional neural network
  • Figure 3 is a schematic diagram of another convolutional neural network
  • FIG. 4 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure
  • FIG. 5A is a schematic structural block diagram of a training generation network corresponding to the training method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • FIG. 5B is a schematic flowchart of a process of training a generation network provided by at least one embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of a generating network provided by at least one embodiment of the present disclosure.
  • FIG. 7A is a schematic diagram of an upsampling layer provided by at least one embodiment of the present disclosure.
  • FIG. 7B is a schematic diagram of another upsampling layer provided by at least one embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a discrimination network provided by at least one embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an analysis network provided by at least one embodiment of the present disclosure.
  • FIG. 10A is a schematic structural block diagram of a training discriminant network corresponding to the training method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • FIG. 10B is a schematic flowchart of a process of training a discriminant network provided by at least one embodiment of the present disclosure
  • FIG. 11 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.
  • FIG. 12A is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure.
  • FIG. 12B is a schematic block diagram of another image processing apparatus provided by at least one embodiment of the present disclosure.
  • the use of deep neural networks to transfer the artistic style of pictures is a technology that has emerged with the development of deep learning technology.
  • the input image is processed to obtain an output image that at least partially reflects the style.
  • the output image can retain the original content while still presenting certain style characteristics of the artist's painting, and even make people mistake it for the artist's work.
  • Traditional image processing effects (for example, the various filters provided by Instagram, etc.) cannot achieve such stylized transfer processing effects.
  • At least one embodiment of the present disclosure provides a neural network training method, an image processing method, and an image processing device, which combine generative adversarial networks, super-resolution technology, and style transfer technology.
  • The trained neural network can generate, based on an input image, a high-quality, high-resolution image with the target style, which improves the effect of image style transfer and image fusion and enhances the user's visual experience; it has better and wider application prospects.
  • FIG. 1 shows a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used for image processing, which uses images as input and output, and replaces scalar weights with convolution kernels.
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited in the embodiment of the present disclosure.
  • the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103.
  • the input layer 101 has 4 inputs
  • the hidden layer 102 has 3 outputs
  • the output layer 103 has 2 outputs.
  • the convolutional neural network finally outputs 2 images.
  • the 4 inputs of the input layer 101 may be 4 images, or 4 feature images of 1 image.
  • the three outputs of the hidden layer 102 may be characteristic images of the image input through the input layer 101.
  • the convolutional layer has weights $w^k_{ij}$ and biases $b^k_i$: the weights represent the convolution kernels, and each bias is a scalar superimposed on the output of the convolutional layer, where k is the label of the input layer 101, and i and j are the labels of the unit of the input layer 101 and the unit of the hidden layer 102, respectively.
  • the first convolutional layer 201 includes a first set of convolution kernels ($w^1_{ij}$ in FIG. 1) and a first set of biases ($b^1_i$ in FIG. 1).
  • the second convolutional layer 202 includes a second set of convolution kernels ($w^2_{ij}$ in FIG. 1) and a second set of biases ($b^2_i$ in FIG. 1).
  • each convolutional layer includes tens or hundreds of convolution kernels. If the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the convolutional neural network further includes a first activation layer 203 and a second activation layer 204.
  • the first activation layer 203 is located behind the first convolutional layer 201
  • the second activation layer 204 is located behind the second convolutional layer 202.
  • the activation layer (for example, the first activation layer 203 and the second activation layer 204) includes activation functions, which are used to introduce nonlinear factors into the convolutional neural network, so that the convolutional neural network can better solve more complex problems .
  • the activation function may include a rectified linear unit (ReLU) function, a Sigmoid function, or a hyperbolic tangent (tanh) function.
  • the ReLU function is an unsaturated nonlinear function
  • the Sigmoid function and tanh function are saturated nonlinear functions.
  • the activation layer can be used as a separate layer of the convolutional neural network, or the activation layer can be included in a convolutional layer (for example, the first convolutional layer 201 can include the first activation layer 203, and the second convolutional layer 202 can include the second activation layer 204).
  • For example, in the first convolutional layer 201, several convolution kernels $w^1_{ij}$ of the first set of convolution kernels and several biases $b^1_i$ of the first set of biases are first applied to each input to obtain the output of the first convolutional layer 201; then, the output of the first convolutional layer 201 can be processed by the first activation layer 203 to obtain the output of the first activation layer 203.
  • In the second convolutional layer 202, several convolution kernels $w^2_{ij}$ of the second set of convolution kernels and several biases $b^2_i$ of the second set of biases are applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; then, the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204.
  • For example, the output of the first convolutional layer 201 can be the result of adding the convolution kernels $w^1_{ij}$ applied to its input and the biases $b^1_i$, and the output of the second convolutional layer 202 can be the result of adding the convolution kernels $w^2_{ij}$ applied to the output of the first activation layer 203 and the biases $b^2_i$.
  • Before using the convolutional neural network for image processing, the convolutional neural network needs to be trained. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. In the training process, each convolution kernel and bias is adjusted through multiple sets of input/output example images and an optimization algorithm to obtain an optimized convolutional neural network model.
  • FIG. 2A shows a schematic diagram of the structure of a convolutional neural network
  • FIG. 2B shows a schematic diagram of the working process of a convolutional neural network.
  • the main components of a convolutional neural network can include multiple convolutional layers, multiple downsampling layers, and fully connected layers.
  • a complete convolutional neural network can be composed of these three layers.
  • FIG. 2A only shows three levels of a convolutional neural network, namely the first level, the second level, and the third level.
  • each level may include a convolution module and a downsampling layer.
  • each convolution module may include a convolution layer.
  • the processing process of each level may include: convolution and down-sampling of the input image.
  • each convolution module may further include an instance normalization layer, so that the processing process at each level may also include standardization processing.
  • the instance normalization layer is used to standardize the feature images output by the convolutional layer, so that the gray values of the pixels of the feature images vary within a predetermined range, thereby simplifying the image generation process and improving the quality of style transfer.
  • the predetermined range may be [-1, 1].
  • the instance standardization layer performs standardization processing on each feature image according to its own mean and variance.
  • the instance normalization layer can also be used to normalize a single image.
  • the standardization formula of the instance normalization layer can be expressed as follows: $y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \varepsilon}}$
  • where $x_{tijk}$ is the value of the t-th feature block (patch), the i-th feature image, the j-th column, and the k-th row in the feature image set output by the convolutional layer, $y_{tijk}$ represents the result obtained after processing $x_{tijk}$ by the instance normalization layer, $\mu_{ti}$ and $\sigma_{ti}^2$ are the mean and variance of the i-th feature image of the t-th patch, and $\varepsilon$ is a small positive number used to avoid a zero denominator.
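  • A minimal sketch of the instance normalization described above, assuming feature images arranged as a (patch, feature image, row, column) tensor; in PyTorch this corresponds to torch.nn.InstanceNorm2d with affine=False.

```python
import torch

def instance_norm(x, eps=1e-5):
    # x: feature images with shape (T, C, H, W); each feature image of each
    # patch is standardized with its own mean and variance.
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

# Equivalent built-in layer: torch.nn.InstanceNorm2d(num_features, affine=False)
```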
  • the convolutional layer is the core layer of the convolutional neural network.
  • a neuron is only connected to some of the neurons in the adjacent layer.
  • the convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features of the input image.
  • Each convolution kernel can extract one type of feature.
  • the convolution kernel is generally initialized in the form of a random decimal matrix. During the training process of the convolutional neural network, the convolution kernel will learn to obtain reasonable weights.
  • the result obtained after applying a convolution kernel to the input image is called a feature map, and the number of feature images is equal to the number of convolution kernels.
  • Each feature image is composed of some rectangularly arranged neurons, and the neurons of the same feature image share weights, and the shared weights here are the convolution kernels.
  • the feature image output by the convolutional layer of one level can be input to the convolutional layer of the next adjacent level and processed again to obtain a new feature image.
  • the first-level convolutional layer can output a first feature image, which is input to the second-level convolutional layer and processed again to obtain a second feature image.
  • the convolutional layer can use different convolution kernels to convolve the data of a certain local receptive field of the input image, and the convolution result is input to the activation layer, which is calculated according to the corresponding activation function To get the characteristic information of the input image.
  • the down-sampling layer is arranged between adjacent convolutional layers and performs a form of down-sampling.
  • the down-sampling layer can be used to reduce the scale of the input image, simplify the calculation complexity, and reduce over-fitting to a certain extent; on the other hand, the down-sampling layer can also perform feature compression to extract the input image Main features.
  • the down-sampling layer can reduce the size of feature images, but does not change the number of feature images.
  • For example, a 2×2 output image can be obtained, which means that 36 pixels of the input image are merged into 1 pixel of the output image.
  • the last downsampling layer or convolutional layer can be connected to one or more fully connected layers, which are used to connect all the extracted features.
  • the output of the fully connected layer is a one-dimensional matrix, which is a vector.
  • Figure 3 shows a schematic diagram of another convolutional neural network.
  • the output of the last convolutional layer (i.e., the t-th convolutional layer) is input to the flattening layer.
  • the flattening layer can convert feature images (2D images) into vectors (1D).
  • the flattening operation can be performed as follows: $v_k = f_{k/j,\, k\%j}$
  • where $v$ is a vector containing $k$ elements and $f$ is a matrix with $i$ rows and $j$ columns.
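  • A small illustration (with assumed dimensions) of how the flattening formula above matches a row-major reshape of a feature image:

```python
import numpy as np

f = np.arange(12).reshape(3, 4)      # a feature image with i=3 rows and j=4 columns
v = f.reshape(-1)                    # flattened 1D vector with i*j elements
k = 7
assert v[k] == f[k // 4, k % 4]      # v_k = f_{k/j, k%j} in row-major order
```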
  • the output of the flattening layer (i.e., the 1D vector) is input to a fully connected layer (FCN).
  • the fully connected layer can have the same structure as the convolutional neural network, except that it uses scalar values in place of convolution kernels.
  • the output of the last convolutional layer can also be input to the Averaging Layer (AVG).
  • the averaging layer is used to average the output, that is, the average value of the feature image is used to represent the output image. Therefore, a 2D feature image is converted into a scalar.
  • if the convolutional neural network includes an averaging layer, it may not include a flattening layer.
  • the averaging layer or the fully connected layer can be connected to a classifier, the classifier can classify according to the extracted features, and the output of the classifier can be used as the final output of the convolutional neural network, that is, the category identifier (label) that characterizes the image category.
  • the classifier may be a Support Vector Machine (SVM) classifier, a softmax classifier, a k-nearest neighbor (KNN) classifier, or the like.
  • the convolutional neural network includes a softmax classifier.
  • the softmax classifier is a generalization of the logistic function that can compress a K-dimensional vector z of arbitrary real numbers into a K-dimensional vector σ(z).
  • the formula of the softmax classifier is as follows: $\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},\; j = 1, \dots, K$
  • where $z_j$ represents the j-th element of the K-dimensional vector $z$, $\sigma(z)_j$ represents the predicted probability of the j-th category label, each element of $\sigma(z)$ is a real number in the range (0, 1), and the elements of the K-dimensional vector $\sigma(z)$ sum to 1.
  • each category identifier in the K-dimensional vector z is assigned a certain prediction probability, and the category identifier with the largest prediction probability is selected as the identifier or category of the input image.
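  • A minimal numerical sketch of the softmax formula above (the values are illustrative only):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()               # probabilities in (0, 1) that sum to 1

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))                    # approx. [0.659, 0.242, 0.099]; the first label is selected
```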
  • Fig. 4 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure.
  • the training method includes:
  • Step S10: train the discriminant network based on the generation network;
  • Step S20: train the generation network based on the discriminant network.
  • the above-mentioned training process is performed alternately to obtain a target network based on the generated network after training.
  • the target network obtained by the training method can be used to perform style transfer processing on an input image to obtain an output image, the resolution of the output image is higher than the resolution of the input image.
  • FIG. 5A is a schematic structural block diagram of a training generation network corresponding to the training method shown in FIG. 4 provided by at least one embodiment of the present disclosure
  • FIG. 5B is a schematic flowchart of a process of training a generation network provided by at least one embodiment of the present disclosure.
  • step S20 includes steps S201 to S203, as follows:
  • Step S201: Perform style transfer processing on the first training input image by using the generation network to respectively generate the first training output image and the second training output image, wherein the resolution of the first training output image is higher than the resolution of the first training input image, and the resolution of the second training output image is equal to the resolution of the first training input image;
  • Step S202: Process the first training output image through the discriminant network, process the second training output image through the analysis network, and calculate the system loss value of the generation network through the system loss function according to the output of the discriminant network and the output of the analysis network;
  • Step S203: Correct the parameters of the generation network according to the system loss value.
  • training the generative network based on the discriminant network may also include: judging whether the training of the generative network G satisfies a predetermined condition; if the predetermined condition is not met, the above training process of the generative network G is repeated; if the predetermined condition is met, the training process of the generative network G at this stage is stopped, and the generative network G trained at this stage is obtained.
  • the foregoing predetermined condition is that the system loss values corresponding to two consecutive (or more) first training input images no longer decrease significantly.
  • the foregoing predetermined condition is that the number of training times or training periods of the generating network G reaches a predetermined number. This disclosure does not limit this.
  • For example, the generative network, the discriminant network, the analysis network, and the various layers included in these neural networks (for example, convolutional layers, up-sampling layers, down-sampling layers, etc.), as well as the programs/methods corresponding to the respective processing procedures, can each be implemented by corresponding software, firmware, hardware, etc.; the same applies below and will not be repeated. The above is merely illustrative of the training process of the generative network.
  • the training phase a large number of sample images need to be used to train the neural network; at the same time, the training process of each sample image may include multiple iterations to modify the parameters of the generated network.
  • the training phase also includes fine-tuning the parameters of the generative network to obtain more optimized parameters.
  • the initial parameter of the generating network G may be a random number, for example, the random number conforms to a Gaussian distribution.
  • the initial parameters of the generating network G can also be trained parameters of image databases such as ImageNet. The embodiment of the present disclosure does not limit this.
  • the training process of generating network G may also include an optimization function (not shown in FIG. 5A).
  • the optimization function may calculate the error values of the parameters of the generative network G according to the system loss value calculated by the system loss function, and modify the parameters of the generative network G according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (batch gradient descent, BGD) algorithm, etc. to calculate the error value of the parameters of the generated network G.
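  • A minimal sketch of one such parameter-update iteration using SGD; the toy network, loss, and data below are placeholders, not the generation network or system loss defined in this disclosure.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))  # toy stand-in for G
optimizer = torch.optim.SGD(generator.parameters(), lr=1e-3)

x = torch.rand(1, 3, 64, 64)         # stand-in for a first training input image
target = torch.rand(1, 3, 64, 64)    # stand-in for the desired output

for _ in range(5):                   # several iterations per sample, as described above
    optimizer.zero_grad()
    loss = ((generator(x) - target) ** 2).mean()   # placeholder for the system loss
    loss.backward()                  # error values for the parameters of G
    optimizer.step()                 # correct the parameters of G
```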
  • the first training input image may be various types of images.
  • the first training input image may be an image taken by a digital camera or a mobile phone, which may be an image of a person, an image of animals and plants, or a landscape image.
  • FIG. 6 is a schematic structural diagram of a generating network provided by at least one embodiment of the present disclosure.
  • the generation network G includes a backbone network MN, a first branch network BN1, and a second branch network BN2.
  • the first branch network BN1 and the second branch network BN2 are respectively connected to the backbone network MN, that is, the first branch network BN1 and the backbone network MN are in the same processing flow, and the output of the backbone network MN is input into the first branch network BN1,
  • the second branch network BN2 and the backbone network MN are in the same processing flow, and the output of the backbone network MN is input into the second branch network BN2, so that the input of the first branch network BN1 and the input of the second branch network BN2 Both are the output of the backbone network MN, that is, the input of the first branch network BN1 is the same as the input of the second branch network BN2.
  • using the generation network G to perform style transfer processing on the first training input image to respectively generate the first training output image HR1 and the second training output image LR2 may include: according to the first training input image , The first training output image HR1 is generated through the backbone network MN and the first branch network BN1, and the second training output image LR2 is generated through the backbone network MN and the second branch network BN2.
  • the backbone network MN and the first branch network BN1 perform style transfer processing on the first training input image to obtain the first training output image HR1, and the backbone network MN and the second branch network BN2 perform style transfer processing on the first training input image to obtain the second training output image LR2.
  • the backbone network MN includes multiple convolution modules CM0 connected in sequence and multiple downsampling layers DS0 interleaved with adjacent convolution modules CM0;
  • the first branch network BN1 includes a plurality of convolution modules CM1 connected in sequence and a plurality of up-sampling layers US1 interleaved with adjacent convolution modules;
  • the second branch network BN2 includes a plurality of convolution modules CM2 connected in sequence and a plurality of up-sampling layers US2 interleaved with adjacent convolution modules.
  • each convolution module may include a convolution layer for extracting characteristic images.
  • the convolutional layer of the low-level convolution module is used to extract the low-level features (for example, points, edges, etc.) of the first training input image; as the level increases, the high-level convolutional layer can extract the first training input image High-level features (for example, straight lines, curves, triangles, etc.); high-level features can be obtained by combining low-level features.
  • the convolutional layer may include an activation layer as needed.
  • At least part of the convolution module may further include an instance normalization layer, which is used to normalize the feature image output by the convolution layer in the at least part of the convolution module.
  • the down-sampling layer DS0 is used to reduce the data amount of the feature image of the first training input image to improve the speed of image processing; for example, the down-sampling layer DS0 is used to reduce the value of each dimension of the feature image, thereby Reduce the data volume of feature images.
  • the upsampling layer (US1, US2) is used to increase the value of each dimension of the feature image, thereby increasing the data volume of the feature image.
  • the number of convolution modules CM1 and the number of up-sampling layers US1 in the first branch network BN1 are respectively greater than the number of convolution modules CM0 and the number of down-sampling layers DS0 in the backbone network MN; that is to say, the number of convolution modules CM1 in the first branch network BN1 is greater than the number of convolution modules CM0 in the backbone network MN, and the number of up-sampling layers US1 in the first branch network BN1 is greater than the number of down-sampling layers DS0 in the backbone network MN.
  • the number of convolution modules CM2 and the number of up-sampling layers US2 in the second branch network BN2 are respectively equal to the number of convolution modules CM0 and the number of down-sampling layers DS0 in the backbone network MN; that is, the number of convolution modules CM2 in the second branch network BN2 is equal to the number of convolution modules CM0 in the backbone network MN, and the number of up-sampling layers US2 in the second branch network BN2 is equal to the number of down-sampling layers DS0 in the backbone network MN.
  • the number of convolution modules CM1 in the first branch network BN1 is greater than the number of convolution modules CM2 in the second branch network BN2, and the number of up-sampling layers US1 in the first branch network BN1 is greater than the number of up-sampling layers US2 in the second branch network BN2.
  • the backbone network MN includes x1 convolution modules CM0, where x1 is a positive integer and is usually greater than 2.
  • the backbone network MN may include, for example, (x1−1) down-sampling layers DS0.
  • the first branch network BN1 includes x2 convolution modules CM1, where x2 is a positive integer and x2>x1.
  • the first branch network BN1 includes (x2−1) up-sampling layers US1; and the second branch network BN2 includes x1 convolution modules CM2 and (x1−1) up-sampling layers US2.
  • the values of x1 and x2 are illustrative, and the present disclosure does not limit this.
  • the down-sampling factors of the (x1−1) down-sampling layers DS0 in the backbone network MN correspond respectively to the up-sampling factors of (x1−1) of the (x2−1) up-sampling layers US1 in the first branch network BN1; the down-sampling factors of the (x1−1) down-sampling layers DS0 in the backbone network MN also correspond respectively to the up-sampling factors of the (x1−1) up-sampling layers US2 in the second branch network BN2.
  • the downsampling factor of a downsampling layer corresponds to the upsampling factor of an upsampling layer means that when the downsampling factor of the downsampling layer is 1/y, the upsampling factor of the upsampling layer is y, where y is a positive integer, and y is usually greater than 2.
  • For example, if the down-sampling factors of the three down-sampling layers DS0 of the backbone network MN are 1/q1, 1/q2, and 1/q3 respectively, then the up-sampling factors of any three of the five up-sampling layers US1 of the first branch network BN1 are q1, q2, and q3 (in no particular order), and the up-sampling factors of the three up-sampling layers US2 of the second branch network BN2 are q1, q2, and q3 respectively (in no particular order).
  • the numbers of down-sampling layers DS0, up-sampling layers US1, and up-sampling layers US2, as well as the down-sampling factors of the down-sampling layers DS0 and the up-sampling factors of the up-sampling layers US1 and US2, can also be set to other values, as long as the requirements on the resolutions of the first training output image HR1 and the second training output image LR2 in step S201 are satisfied, which is not limited in the present disclosure.
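  • A hedged structural sketch of the backbone/branch layout described above, assuming x1 = 3 and x2 = 5; the channel counts, pooling choice, and up-sampling mode are illustrative assumptions rather than the disclosed configuration.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch):
    # One convolution module: convolution + instance normalization + activation.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.InstanceNorm2d(out_ch), nn.ReLU())

class GeneratorSketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(            # x1=3 modules CM0, 2 down-sampling layers DS0
            conv_module(3, ch), nn.MaxPool2d(2),
            conv_module(ch, ch), nn.MaxPool2d(2),
            conv_module(ch, ch))
        self.branch1 = nn.Sequential(             # x2=5 modules CM1, 4 up-sampling layers US1
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, 3))
        self.branch2 = nn.Sequential(             # x1=3 modules CM2, 2 up-sampling layers US2
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, ch), nn.Upsample(scale_factor=2),
            conv_module(ch, 3))

    def forward(self, x):
        feat = self.backbone(x)
        return self.branch1(feat), self.branch2(feat)   # (HR1, LR2)

hr1, lr2 = GeneratorSketch()(torch.rand(1, 3, 64, 64))
print(hr1.shape, lr2.shape)   # HR1 is larger than the input, LR2 matches the input resolution
```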
  • the down-sampling layer DS0 can use various down-sampling methods to down-sample the feature images. Down-sampling methods include, but are not limited to: max pooling, average pooling, strided convolution, decimation (for example, selecting fixed pixels), demultiplexing output (demuxout, splitting the input image into multiple smaller images), and so on.
  • the up-sampling layers US1 and US2 may adopt up-sampling methods such as strided transposed convolution and interpolation algorithms to achieve up-sampling.
  • the interpolation algorithm may include, for example, bicubic interpolation and the like.
  • FIG. 7A is a schematic diagram of an upsampling layer provided by at least one embodiment of the present disclosure
  • FIG. 7B is a schematic diagram of another upsampling layer provided by at least one embodiment of the present disclosure.
  • the up-sampling layer uses pixel interpolation to implement up-sampling.
  • the up-sampling layer can also be called a composite layer.
  • the composite layer uses an up-sampling factor of 2×2, so that 4 input feature images (i.e., INPUT 4n, INPUT 4n+1, INPUT 4n+2, INPUT 4n+3 in FIG. 7A) can be combined in a fixed pixel order to obtain 1 output feature image (i.e., OUTPUT n in FIG. 7A).
  • the up-sampling layer obtains a first number of input feature images and interleaves and rearranges their pixel values to produce the same first number of output feature images. Compared with the input feature images, the number of output feature images is unchanged, but the size of each output feature image is increased by a corresponding multiple. Thus, the composite layer adds more data information through different permutations and combinations, and these combinations can give all possible up-sampling combinations; a selection among the up-sampling combinations can finally be made through an activation layer.
  • the up-sampling layer adopts the pixel value interleaving rearrangement method to achieve up-sampling.
  • the up-sampling layer may also be called a composite layer.
  • the composite layer also uses an up-sampling factor of 2×2, that is, every 4 input feature images (i.e., INPUT 4n, INPUT 4n+1, INPUT 4n+2, INPUT 4n+3 in FIG. 7B) are taken as a group, and their pixel values are interleaved to generate 4 output feature images (i.e., OUTPUT 4n, OUTPUT 4n+1, OUTPUT 4n+2, OUTPUT 4n+3 in FIG. 7B).
  • the number of input feature images is the same as the number of output feature images obtained after composite layer processing, and the size of each output feature image is increased by 4 times of the input feature image, that is, it has 4 times the number of pixels of the input feature image.
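  • The combination of 2×2 groups of feature images into one larger feature image is closely related to sub-pixel rearrangement; the sketch below uses torch.nn.PixelShuffle as an analogy, whose fixed pixel order is not necessarily the exact order used by the composite layers of FIG. 7A and FIG. 7B.

```python
import torch
import torch.nn as nn

# Four 4x4 feature images are rearranged into one 8x8 feature image
# (up-sampling factor 2x2); the total amount of data is unchanged.
x = torch.arange(4 * 4 * 4, dtype=torch.float32).reshape(1, 4, 4, 4)
y = nn.PixelShuffle(upscale_factor=2)(x)
print(x.shape, "->", y.shape)   # torch.Size([1, 4, 4, 4]) -> torch.Size([1, 1, 8, 8])
```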
  • FIG. 8 is a schematic structural diagram of a discrimination network provided by an embodiment of the disclosure.
  • the discriminant network D includes multiple convolution modules CM3, multiple down-sampling layers DS3, and a fully connected layer FCN.
  • the structure and function of the convolution module CM3, the down-sampling layer DS3, and the fully connected layer FCN can be referred to the aforementioned descriptions related to the convolution module (CM0, CM1, CM2), the down-sampling layer DS0, and the fully connected layer, respectively. No restrictions.
  • in the discriminant network D, multiple convolution modules CM3 are connected in sequence, and there is a down-sampling layer DS3 between some adjacent convolution modules CM3.
  • For example, the discriminant network D includes six convolution modules CM3 connected in sequence, with a down-sampling layer between the second convolution module and the third convolution module, and a down-sampling layer between the fourth convolution module and the fifth convolution module.
  • the fully connected layer FCN is connected to the last convolution module CM3.
  • each convolution module CM3 may include a convolution layer; for example, as required, at least part of the convolution module CM3 may also include an instance normalization layer.
  • the discrimination network D further includes an activation layer, which is connected to the fully connected layer FCN.
  • the activation function of the activation layer may adopt a Sigmoid function, so that the output of the activation layer (that is, the output of the discriminant network D) is a value in the range of [0, 1].
  • the discriminant network D can determine the similarity between the style of the first training output image HR1 and the target style.
  • the discriminant network D processes the first training output image HR1 to obtain the output of the discriminant network D, and the value of this output indicates the similarity between the style of the first training output image HR1 and the target style.
  • the target style may be the style of the second training style image that will be introduced later, that is, the style that the user hopes the target network of the generating network can generate.
  • the discrimination network shown in FIG. 8 is schematic.
  • the discriminant network shown in FIG. 8 may include more or fewer convolution modules or downsampling layers.
  • the discrimination network shown in FIG. 8 may also include other modules or layer structures, for example, a flattening module is also provided before the fully connected layer.
  • some modules or layer structures in the discriminant network shown in Figure 8 can be replaced with other modules or layer structures, for example, the fully connected layer is replaced with a convolutional layer for averaging (AVG) (refer to Figure 3 and the aforementioned related description), for example, the activation layer is replaced with a two-class softmax module.
  • the embodiment of the present disclosure does not limit the structure of the discrimination network, which includes but is not limited to the discrimination network structure shown in FIG. 8.
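  • A hedged sketch of the discriminant network layout described above (six convolution modules, down-sampling after the second and fourth modules, a fully connected layer, and a Sigmoid activation); the channel counts, pooling choice, and 64×64 input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch):
    # Convolution module CM3: convolution + activation (instance normalization optional).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class DiscriminatorSketch(nn.Module):
    def __init__(self, ch=32, in_size=64):
        super().__init__()
        self.features = nn.Sequential(
            conv_module(3, ch), conv_module(ch, ch), nn.AvgPool2d(2),   # DS3
            conv_module(ch, ch), conv_module(ch, ch), nn.AvgPool2d(2),  # DS3
            conv_module(ch, ch), conv_module(ch, ch))
        self.fc = nn.Linear(ch * (in_size // 4) ** 2, 1)

    def forward(self, x):
        feat = self.features(x).flatten(1)   # flattening before the fully connected layer
        return torch.sigmoid(self.fc(feat))  # output value in [0, 1]

print(DiscriminatorSketch()(torch.rand(1, 3, 64, 64)))  # similarity to the target style
```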
  • the system loss function may include a generation network adversarial loss function, and accordingly, the system loss value includes a generation network adversarial loss value.
  • the generation network adversarial loss function calculates the generation network adversarial loss value according to the output of the discriminant network D.
  • the generation network adversarial loss function can be expressed as: $L_G = -\mathbb{E}_{z1 \sim P_{z1}(z1)}\left[\log D(G(z1))\right]$
  • where $L_G$ represents the generation network adversarial loss function, $z1$ represents the first training input image, $P_{z1}(z1)$ represents the set of first training input images (for example, a batch including multiple first training input images), $G(z1)$ represents the first training output image HR1, and $D(G(z1))$ represents the output of the discriminant network D for the first training output image HR1, that is, the output obtained by the discriminant network D processing the first training output image HR1.
  • the training goal of the generation network G is to minimize the system loss value. Therefore, in the training process of the generation network G, minimizing the system loss value includes reducing the generation network adversarial loss value.
  • the label of the first training output image HR1 is set to 1, that is, it is hoped that the discriminant network D identifies the first training output image HR1 as having the target style.
  • the parameters of the generation network G are continuously revised, so that the output of the discriminant network D corresponding to the first training output image HR1 generated by the revised generation network G keeps approaching 1, thereby continuously reducing the generation network adversarial loss value.
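  • A minimal sketch consistent with the adversarial loss form reconstructed above; the tensor name is a hypothetical placeholder.

```python
import torch

def generator_adversarial_loss(d_fake, eps=1e-8):
    # d_fake: output of the discriminant network D for the first training
    # output image HR1, i.e. D(G(z1)), a value in [0, 1].
    # L_G = -E[log D(G(z1))]; the loss shrinks as D(G(z1)) approaches 1.
    return -torch.log(d_fake + eps).mean()
```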
  • FIG. 9 is a schematic structural diagram of an analysis network provided by at least one embodiment of the present disclosure.
  • the analysis network A includes a plurality of first convolution modules CM01 connected in sequence and a plurality of first down-sampling layers DS01 interposed between adjacent first convolution modules CM01.
  • each first convolution module CM01 includes a first convolution layer
  • each first convolution layer includes a plurality of first convolution kernels
  • the first convolution kernels can be used to extract the content features and style features of the input image of the analysis network A.
  • the input of the analysis network A shown in FIG. 9 may include a first training input image, a second training output image LR2, and a first training style image.
  • the first convolution module CM01 may further include an instance standardization layer.
  • the analysis network A may adopt a deep neural network capable of classifying images.
  • the input is processed by several first convolution modules CM01 and the first down-sampling layer DS01 to extract features.
  • the output of each first convolution module CM01 is a feature image of its input.
  • the first down-sampling layer DS01 can reduce the resolution of the feature image and pass it to the first convolution module CM01 of the next level.
  • the plurality of first convolution modules CM01 may output a plurality of feature images, and the plurality of feature images may characterize the input features of different levels (for example, texture, edge, object, etc.).
  • the feature image is input to the flattening layer, which converts the feature image into a vector and then passes it to the fully connected layer and the classifier.
  • the classifier layer can include a softmax classifier.
  • the softmax classifier can output the probability that the input belongs to each category identifier, and the identifier with the highest probability will be the final output of the analysis network A.
  • the analysis network A realizes image classification.
  • the analysis network A can use a trained convolutional neural network model. Therefore, during the training process of generating network G, it is not necessary to modify the parameters of analysis network A (for example, the first convolution kernel, etc.).
  • the analysis network A can use neural network models such as AlexNet, GoogleNet, VGG, Deep Residual Learning, etc. to extract input content features and style features.
  • the VGG network is a type of deep convolutional neural network, which was developed by the Visual Geometry Group of Oxford University and has been widely used in the field of visual recognition.
  • a VGG network can include 19 layers, and some of them can be standardized.
  • For example, in FIG. 9, multiple first convolution modules CM01 and multiple first down-sampling layers DS01 are shown.
  • the analysis network A provided by the embodiment of the present disclosure, as shown in FIG. 9, at least two first convolution modules CM01 are used to extract style features, and at least one first convolution module CM01 is used to extract content features.
  • the analysis network shown in FIG. 9 is schematic. The embodiments of the present disclosure do not limit the specific details of analyzing the structure of the network, extracting style features and content features (for example, the number and level of first convolution modules used to extract style features and content features, etc.).
  • the analysis network A is used to receive the first training input image, the first training style image, and the second training output image LR2, and to respectively generate and output the first content feature image of the first training input image, the first style feature image of the first training style image, and the second content feature image and second style feature image of the second training output image LR2.
  • the first training style image may be famous paintings of various art masters (such as Monet, Van Gogh, Picasso, etc.), but is not limited thereto.
  • the first training style image may also be ink paintings, sketch paintings, and the like.
  • the first training style image has a target style consistent with the second training style image that will be introduced later.
  • the first training style image is a low-resolution version of the second training style image; that is, the first training style image and the second training style image can be images of the same style, but the resolution of the first training style image is less than the resolution of the second training style image. However, the present disclosure is not limited to this.
  • the sizes of the first training input image, the first training style image, and the second training output image LR2 are all the same.
  • the content feature represents the distribution of objects in the image in the entire image
  • the style feature represents the relationship between different feature images in different layers of the convolutional neural network.
  • the content feature includes the content information of the image
  • the style feature may include the texture information, color information, etc. of the image.
  • the texture information represents, for example, the correlation between feature images, which is independent of position.
  • the feature image in the convolutional neural network can be a one-dimensional matrix, and the Gram matrix can be used to measure the correlation degree of each vector in the one-dimensional matrix. Therefore, the convolutional neural network can introduce the Gram matrix to calculate the style of the image feature.
  • the Gram matrix can be expressed as follows: $G_{ij} = \sum_{k} F_{ik} F_{jk}$, where $F_{ik}$ is the value of the k-th position in the vector obtained by flattening the i-th feature image, so that $G_{ij}$ measures the correlation between the i-th and j-th feature images.
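  • A minimal sketch of computing a Gram matrix from the feature images output by one convolution module; normalization of the matrix is omitted here and is a design choice.

```python
import torch

def gram_matrix(features):
    # features: (C, H, W) feature images; each is flattened to a vector, and
    # G[i, j] is the inner product of the i-th and j-th vectors, a
    # position-independent measure of their correlation.
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.t()
```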
  • the system loss function may also include a content loss function and a style loss function, so that the system loss value may also include a content loss value and a style loss value.
  • the content loss function is used to describe the difference in content between the first training input image and the second training output image LR2
  • the style loss function is used to describe the difference in style between the first training style image and the second training output image LR2.
  • the content loss function is used to calculate and generate the content loss value of the parameter of the network G according to the first content feature image of the first training input image and the second content feature image of the second training output image LR2.
  • the style loss function is used to calculate the style loss value of the parameters of the generated network G according to the first style feature image of the first training style image and the second style feature image of the second training output image LR2.
  • the single-layer content loss function is expressed as: $C_m = \frac{1}{2 S_1} \sum_{ij} \left(F^m_{ij} - P^m_{ij}\right)^2$
  • where $S_1$ is a constant, $F^m_{ij}$ represents the value of the j-th position in the first content feature image of the first training input image extracted by the i-th first convolution kernel in the m-th first convolution module in the analysis network A, and $P^m_{ij}$ represents the value of the j-th position in the second content feature image of the second training output image LR2 extracted by the i-th first convolution kernel in the m-th first convolution module in the analysis network A.
  • the content features of the input image (for example, here the input image includes the first training input image and the second training output image LR2) can be extracted through at least one first convolution module CM01. In this case, the content loss function is expressed as: $L_{content} = \sum_{m} w_{1m} \cdot C_m$
  • where $L_{content}$ represents the content loss function, $C_m$ represents the single-layer content loss function of the m-th first convolution module among the at least one first convolution module used to extract content features, and $w_{1m}$ represents the weight of $C_m$.
  • minimizing the system loss value includes reducing the content loss value.
  • since the generation network G is used for image style transfer processing, it is desirable that the output and the input of the generation network G have the same content features, that is, that the second training output image LR2 preserves the content of the first training input image.
  • during training, the parameters of the generation network G are continually corrected, so that the content features of the second training output image LR2 generated by the corrected generation network G keep approaching the content features of the first training input image, thereby continuously reducing the content loss value (a small numerical sketch of this loss follows below).
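  • the per-layer content loss and its weighted sum described above can be sketched as follows; the squared-difference form with S_1 as a normalization constant and all names are illustrative assumptions:

```python
import numpy as np

def single_layer_content_loss(f, p, s1=2.0):
    """Per-layer content loss between two feature maps of shape (C, H, W).

    f: content feature image of the first training input image
    p: content feature image of the second training output image LR2
    s1: normalization constant (S_1 in the text)
    """
    return np.sum((f - p) ** 2) / (2.0 * s1)

def content_loss(features_in, features_out, weights):
    """Weighted sum over the convolution modules used to extract content features."""
    return sum(w * single_layer_content_loss(f, p)
               for w, (f, p) in zip(weights, zip(features_in, features_out)))
```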
  • the single-layer style loss function is expressed as:
      E_n = (1/(4·S_2·N_n²·M_n²)) · Σ_ij (G^n_ij − A^n_ij)²
  • where S_2 is a constant, N_n represents the number of first convolution kernels in the n-th first convolution module of the analysis network A, and M_n represents the size of the style feature image extracted by the first convolution kernels in the n-th first convolution module.
  • G^n_ij represents the value at the j-th position in the Gram matrix of the first style feature image of the first training style image extracted by the i-th first convolution kernel in the n-th first convolution module of the analysis network A, and A^n_ij represents the value at the j-th position in the Gram matrix of the second style feature image of the second training output image LR2 extracted by the same convolution kernel.
  • the style features of the input image (here, the input image includes the first training style image and the second training output image LR2) can be extracted through at least two first convolution modules CM01, and the style loss function is expressed as:
      L_style = Σ_n w_2n · E_n
  • L_style represents the style loss function
  • E_n represents the single-layer style loss function of the n-th first convolution module among the at least two first convolution modules used to extract style features
  • w_2n represents the weight of E_n.
  • minimizing the system loss value includes reducing the style loss value.
  • it is desirable that the output of the generation network G has the target style, that is, that the second training output image LR2 has the same style features as the first training style image.
  • during training, the parameters of the generation network G are continually corrected, so that the style features of the second training output image LR2 generated by the corrected generation network G keep approaching the style features of the first training style image, thereby continuously reducing the style loss value (see the sketch below).
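  • correspondingly, the Gram-matrix-based style loss can be sketched as follows, reusing the gram_matrix helper from the sketch above; the constant S_2 and the variable names are illustrative assumptions:

```python
import numpy as np

def single_layer_style_loss(style_feat, output_feat, s2=1.0):
    """Per-layer style loss between two feature maps of shape (C, H, W)."""
    c, h, w = style_feat.shape
    n_n, m_n = c, h * w                   # number of kernels, size of the feature image
    g_style = gram_matrix(style_feat)     # Gram matrix of the style image features
    g_output = gram_matrix(output_feat)   # Gram matrix of the output image features
    return np.sum((g_style - g_output) ** 2) / (4.0 * s2 * (n_n ** 2) * (m_n ** 2))

def style_loss(style_feats, output_feats, weights):
    """Weighted sum over the convolution modules used to extract style features."""
    return sum(w * single_layer_style_loss(a, b)
               for w, (a, b) in zip(weights, zip(style_feats, output_feats)))
```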
  • the system loss function may also include a weight-bias ratio loss function (L1 loss function), and accordingly the system loss value also includes a weight-bias ratio loss value.
  • by adding a weight-bias ratio loss function to the system loss function, the generation network provided by the embodiments of the present disclosure can make full use of the activation functions in the generation network G, obtain better-optimized parameters of the generation network G, and improve the effects of image style transfer and image fusion while balancing processing quality and processing speed, which gives it better and wider application prospects.
  • the generation network G shown in FIG. 6 includes multiple convolution kernels and multiple biases, namely the convolution kernels and biases included in the convolution layers of all the convolution modules (CM0, CM1, CM2) of the generation network G.
  • the parameters for generating the network G may include the multiple convolution kernels and the multiple biases.
  • the convolution kernel is used to determine how to process the input image, and the bias is used to determine whether the output of the convolution kernel is input to the next level. Therefore, in the activation layer of the convolutional neural network, the bias can be likened to a "switch", which is used to decide whether to "open” or "close” the convolution kernel. For different input images, different convolution kernels can be “opened” or “closed” to achieve multiple effects.
  • compared with the convolution kernel, the bias needs to have a relatively large absolute value in order to play the role of a "switch" more effectively.
  • the weight bias ratio loss function is used to adjust the ratio between multiple convolution kernels and multiple biases in the generation network G to enhance the effect of the activation function in the generation network G.
  • the weight bias ratio loss function calculates the weight bias ratio loss value.
  • the weight-bias ratio loss function can be expressed as:
      L_L1 = W / (B + ε)
  • where L_L1 represents the weight-bias ratio loss function
  • W is the average of the absolute values of the multiple convolution kernels of the generation network G
  • B is the average of the absolute values of the multiple biases of the generation network G
  • ε is a small positive number.
  • W can be expressed as W = (Σ |w|) / C_w
  • where C_w is the number of first convolution kernels of the first convolution layer and w represents the value of each convolution kernel of the first convolution layer.
  • for example, if a convolution kernel is a 2×2 matrix, w represents the sum of the elements of that matrix.
  • similarly, B can be expressed as B = (Σ |b|) / C_b
  • where C_b is the number of biases of the first convolution layer, and b represents the value of each bias of the first convolution layer (a small sketch combining W, B, and L_L1 follows below).
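  • a small sketch of L_L1 = W / (B + ε), with the kernel "value" taken as the sum of the kernel elements as in the text; the function names and toy data are assumptions:

```python
import numpy as np

def weight_bias_ratio_loss(conv_kernels, biases, eps=1e-6):
    """Ratio of mean absolute kernel value to mean absolute bias value.

    conv_kernels: list of kernel matrices; the "value" of each kernel is taken
        as the sum of its elements, and W is the average of the absolute values
        of these kernel values.
    biases: flat array of bias values; B is the average of their absolute values.
    Minimizing L_L1 encourages biases that are large relative to the kernels,
    so that they act more effectively as "switches" for the activations.
    """
    w = np.mean([abs(np.sum(k)) for k in conv_kernels])
    b = np.mean(np.abs(np.asarray(biases, dtype=float)))
    return w / (b + eps)

# toy usage
kernels = [np.random.randn(2, 2) for _ in range(8)]
bias_values = np.random.randn(8)
print(weight_bias_ratio_loss(kernels, bias_values))
```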
  • the activation function in the generating network G may adopt the ReLU function. But it is not limited to this, the activation function in the generating network G can also adopt a sigmoid function, a tanh function, and so on.
  • the ReLU function can be expressed as out = max(0, in_1)
  • where out represents the output of the ReLU function and in_1 represents the input of the ReLU function.
  • in the convolutional neural network, in_1 can be expressed as in_1 = w·in_0 + b
  • where in_0 represents the pixel matrix of the first training input image input to, for example, the first-level convolutional layer CM0, in_1 represents the pixel matrix of the feature image output after the first-level convolutional layer CM0 processes in_0, w represents the value of the convolution kernel in the first-level convolutional layer CM0, and b represents the value of the bias in the first-level convolutional layer CM0.
  • when b is sufficiently large, the activation function can play its activation role more effectively, that is, the output of the activation function can better represent the feature information of the first training input image; a tiny numeric illustration follows below.
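  • a tiny numeric illustration (with made-up values) of the "switch" behavior described above: with in_1 = w·in_0 + b, a large negative bias keeps the ReLU output at 0, while a large positive bias keeps the convolution kernel "open":

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

in0 = np.array([0.2, 0.5, 0.9])   # toy input values
w = 1.0                            # toy kernel value

print(relu(w * in0 + (-2.0)))      # large negative bias: all zeros, kernel "closed"
print(relu(w * in0 + (+2.0)))      # large positive bias: values pass through, kernel "open"
```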
  • the system loss function of the generation network G can be expressed as:
      L_total = α·L_content + β·L_style + χ·L_G + δ·L_L1
  • where L_total represents the system loss function, and α, β, χ, and δ are the weights of the content loss function, the style loss function, the generation-network confrontation loss function, and the weight-bias ratio loss function, respectively.
  • in step S202, the system loss value is calculated by the system loss function expressed by the above formula; step S203 is then executed to correct all the parameters of the generation network G (including the parameters of the backbone network MN, the parameters of the first branch network BN1, and the parameters of the second branch network BN2), and step S20 can thus be realized (a minimal sketch of this training step follows below).
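  • a minimal sketch of one training step of the generation network G (steps S201 to S203); the non-saturating adversarial term, the default weights, and all function and variable names are illustrative assumptions rather than the disclosure's exact implementation:

```python
import torch

def train_generator_step(generator, discriminator, loss_terms, x_lr, optimizer,
                         a=1.0, beta=1e3, chi=1.0, delta=1e-2):
    """One generator update: L_total = a*L_content + beta*L_style + chi*L_G + delta*L_L1.

    loss_terms(x_lr, hr1, lr2) is assumed to return differentiable tensors
    (l_content, l_style, l_l1) computed with the fixed analysis network, as in
    the formulas above; the discriminator's parameters stay fixed here.
    """
    hr1, lr2 = generator(x_lr)                            # step S201: two outputs of G
    l_content, l_style, l_l1 = loss_terms(x_lr, hr1, lr2)
    l_g = -torch.log(discriminator(hr1) + 1e-8).mean()    # D(HR1) should approach 1

    total = a * l_content + beta * l_style + chi * l_g + delta * l_l1   # step S202
    optimizer.zero_grad()
    total.backward()                                      # step S203: correct G's parameters
    optimizer.step()
    return float(total.detach())
```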
  • FIG. 10A is a schematic architecture block diagram of training the discriminant network corresponding to the training method shown in FIG. 4, provided by at least one embodiment of the present disclosure.
  • FIG. 10B is a schematic flowchart of a process of training the discriminant network provided by at least one embodiment of the present disclosure.
  • step S10 includes steps S101 to S103, as follows:
  • Step S101: Perform style transfer processing on the second training input image by using the generation network to generate a third training output image, wherein the resolution of the third training output image is greater than the resolution of the second training input image;
  • Step S102: Input the second training style image and the third training output image into the discriminant network, where the resolution of the second training style image is equal to the resolution of the third training output image; according to the label of the second training style image and the output of the discriminant network corresponding to the second training style image, as well as the label of the third training output image and the output of the discriminant network corresponding to the third training output image, calculate the discriminant network confrontation loss value through the discriminant network confrontation loss function;
  • Step S103: Correct the parameters of the discriminant network according to the discriminant network confrontation loss value.
  • training the discriminant network based on the generation network may also include: judging whether the training of the discriminant network D satisfies a predetermined condition; if the predetermined condition is not met, the training process of the discriminant network D is repeated; if the predetermined condition is met, the training process of the discriminant network D in this stage is stopped, and the discriminant network D trained in this stage is obtained.
  • the foregoing predetermined condition is that the discriminant network confrontation loss value corresponding to two consecutive (or more) second training style images and the third training output image HR3 no longer significantly decreases.
  • the above-mentioned predetermined condition is that the number of training times or training periods of the discriminating network D reaches a predetermined number. This disclosure does not limit this.
  • each sample image training process can include multiple iterations to correct the parameters of the discriminant network.
  • the training phase also includes fine-tuning the parameters of the discriminant network to obtain more optimized parameters.
  • the initial parameter of the discrimination network D may be a random number, for example, the random number conforms to a Gaussian distribution, which is not limited in the embodiment of the present disclosure.
  • the training process of the discriminant network D can also include an optimization function (not shown in FIG. 10A).
  • the optimization function can calculate the error value of the parameters of the discriminant network D based on the discriminant network confrontation loss value calculated by the discriminant network confrontation loss function, and correct the parameters of the discriminant network D according to that error value.
  • the optimization function can use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (batch gradient descent, BGD) algorithm, etc., to calculate the error value of the parameters of the discriminating network D.
  • the second training input image may be the same as the first training input image.
  • the second training input image set and the first training input image set are the same image set.
  • the second training input image may be various types of images.
  • the second training input image may be an image taken by a digital camera or a mobile phone, which may be an image of a person, an image of animals and plants, or a landscape image.
  • the second training style image has a target style consistent with the first training style image.
  • the second training style image may be a high-resolution version of the first training style image, but is not limited to this.
  • the second training style image may be famous paintings of various art masters (such as Monet, Van Gogh, Picasso, etc.), but is not limited to this.
  • the second training style image may also be ink painting, sketch painting, and the like.
  • the discrimination network D may be the discrimination network shown in FIG. 8, but is not limited to this.
  • the discriminant network confrontation loss function shown in FIG. 10A can be expressed, for example in the standard adversarial (cross-entropy) form, as:
      L_D = −E_{x∼Pdata(x)}[log D(x)] − E_{z2∼Pz2(z2)}[log(1 − D(G(z2)))]
  • L D represents the discriminative network confrontation loss function
  • x represents the second training style image
  • P data (x) represents the set of second training style images (for example, includes a batch of multiple second training style images)
  • D (x) represents the output of the discrimination network D for the second training style image x, that is, the output obtained by the discrimination network D processing the second training style image x
  • z2 represents the second training input image
  • P z2 (z2) represents the set of second training input images (for example, including a batch of multiple second training input images)
  • G(z2) represents the third training output image HR3
  • D(G(z2)) represents the output of the discriminant network D for the third training output image HR3, that is, the output obtained by the discriminant network D processing the third training output image HR3, and E_{z2∼Pz2(z2)} indicates taking the expectation over the set of second training input images.
  • the batch gradient descent algorithm is used to optimize the parameters of the discriminant network D.
  • the discriminant network confrontation loss function expressed by the above formula is exemplary; the present disclosure includes but is not limited to this.
  • the training goal of discriminant network D is to minimize the value of discriminant network confrontation loss.
  • the label of the second training style image is set to 1, which means that the discriminant network D is expected to identify the second training style image as having the target style; at the same time, the label of the third training output image HR3 is set to 0, which means that the discriminant network D is expected to identify the third training output image HR3 as not having the target style.
  • it is hoped that the discrimination network D determines that the style of the third training output image HR3 is different from the style of the second training style image.
  • during training, the parameters of the discriminant network D are continuously corrected, so that the corrected discriminant network D can accurately distinguish the second training style image from the third training output image HR3 generated by the generation network G; that is, the output of the discriminant network D corresponding to the second training style image keeps approaching 1 and the output of the discriminant network D corresponding to the third training output image HR3 keeps approaching 0, thereby continuously decreasing the discriminant network confrontation loss value (a minimal sketch of one such training step follows below).
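  • a minimal sketch of one training step of the discriminant network D (steps S101 to S103), assuming a binary cross-entropy adversarial objective with label 1 for the second training style image and label 0 for the generated image; the network interfaces and names are assumptions:

```python
import torch
import torch.nn.functional as F

def train_discriminator_step(generator, discriminator, x_lr2, style_hr, optimizer):
    """One update of D; the generator's parameters are kept fixed here."""
    with torch.no_grad():
        hr3 = generator(x_lr2)[0]          # step S101: third training output image HR3

    d_real = discriminator(style_hr)       # second training style image, label 1
    d_fake = discriminator(hr3)            # generated image, label 0

    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))   # step S102

    optimizer.zero_grad()
    loss_d.backward()                      # step S103: correct D's parameters
    optimizer.step()
    return float(loss_d.detach())
```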
  • the training of the generating network G and the training of the discriminant network D are performed alternately and iteratively.
  • for an untrained generation network G and discriminant network D, a first stage of training is generally performed on the discriminant network D first, to improve the discriminative ability of the discriminant network D (that is, the ability to identify whether the input of the discriminant network D has the target style) and obtain the discriminant network D trained in the first stage.
  • then, based on the discriminant network D trained in the first stage, the generation network G is trained in the first stage to improve the image style transfer ability of the generation network G (that is, the ability of the high-resolution image generated by the generation network G to have the target style), and the generation network G trained in the first stage is obtained.
  • the second-stage training is similar to the first-stage training: based on the generation network G trained in the first stage, the discriminant network D trained in the first stage undergoes second-stage training to improve its discriminative ability, yielding the discriminant network D trained in the second stage; then, based on the discriminant network D trained in the second stage, the generation network G trained in the first stage undergoes second-stage training to improve its image style transfer ability, yielding the generation network G trained in the second stage. The discriminant network D and the generation network G are then trained in a third stage, a fourth stage, and so on, until the high-resolution image generated by the generation network G fully has the target style (this alternating schedule is sketched below).
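  • the alternating, stage-wise schedule described above can be summarized as follows; the number of steps per stage, the data interface, and the stopping rule are illustrative assumptions (in practice training continues until the generated high-resolution image has the target style):

```python
def alternating_training(generator, discriminator, data, loss_terms,
                         g_optimizer, d_optimizer,
                         num_stages=10, d_steps=100, g_steps=100):
    """Alternate stage-wise training of D and G (sketch only).

    Each stage first trains D with G fixed, then trains G with D fixed,
    mirroring the first-stage / second-stage / ... schedule in the text.
    """
    for stage in range(num_stages):
        for _ in range(d_steps):          # step S10: train the discriminant network
            x_lr2, style_hr = data.next_discriminator_batch()
            train_discriminator_step(generator, discriminator, x_lr2, style_hr, d_optimizer)
        for _ in range(g_steps):          # step S20: train the generation network
            x_lr1 = data.next_generator_batch()
            train_generator_step(generator, discriminator, loss_terms, x_lr1, g_optimizer)
```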
  • during the alternating training of the generation network G and the discriminant network D, the adversarial relationship between them is reflected in the fact that the output of the generation network G (the high-resolution image generated by the generation network G) has different labels in the two separate training processes
  • the label is 1 in the training process of the generation network G
  • the label is 0 in the training process of the discriminant network D
  • it is also reflected in the fact that the second part of the discriminant network confrontation loss function, that is, the part related to the high-resolution image generated by the generation network G, is opposite to the generation network confrontation loss function in the system loss function.
  • ideally, the high-resolution image output by the trained generation network G has the target style (that is, the style of the second training style image), and the output of the discriminant network D is 0.5 for both the second training style image and the high-resolution image generated by the generation network G; that is, the generation network G and the discriminant network D reach a Nash equilibrium through the adversarial game.
  • the target style refers to the style of the second training style image.
  • the styles of the first training style image and the second training style image are the same, so that the high-resolution image and the low-resolution image generated by the generation network G obtained after training both have the target style.
  • in other examples, the first training style image and the second training style image have different styles, so that the high-resolution image generated by the trained generation network G has the target style while incorporating the style of the first training style image, and the low-resolution image generated by the trained generation network G has the style of the first training style image while incorporating the target style. The present disclosure does not limit this.
  • the high-resolution images and low-resolution images generated by the generation network G obtained after training retain the content characteristics of the input of the generation network G (for example, the first training input image and the second training input image).
  • the generation network and the target network may have different structures.
  • the target network obtained by the above-mentioned training method may include only the backbone network MN and the first branch network BN1 of the trained generation network G shown in FIG. 6, so that when the input image is subjected to style transfer processing only a high-resolution output image (with a resolution higher than that of the input image) is obtained.
  • the structure of the backbone network in the target network is the same as that of the backbone network of the generation network G, and the structure of the first branch network in the target network is the same as that of the first branch network of the generation network G; however, the parameters of the backbone network in the target network differ from those of the backbone network of the generation network G, and the parameters of the first branch network in the target network differ from those of the first branch network of the generation network G.
  • the generating network and the target network may also have the same structure, but the parameters of the generating network and the target network are different.
  • the target network obtained by the above-mentioned training method may also include the complete trained generation network G shown in FIG. 6, so that when the input image is subjected to style transfer processing, both a high-resolution output image (whose resolution is higher than that of the input image) and a low-resolution output image (whose resolution is equal to that of the input image) can be obtained (a structural sketch follows below).
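  • a structural sketch of the relationship between the generation network and the target network (module names follow FIG. 6; the class layout is an illustrative assumption, not the disclosure's implementation):

```python
import torch.nn as nn

class GeneratorG(nn.Module):
    """Backbone MN plus two branches: BN1 (high-res output) and BN2 (low-res output)."""
    def __init__(self, backbone: nn.Module, branch1: nn.Module, branch2: nn.Module):
        super().__init__()
        self.backbone, self.branch1, self.branch2 = backbone, branch1, branch2

    def forward(self, x):
        feat = self.backbone(x)
        return self.branch1(feat), self.branch2(feat)   # (HR output, LR output)

class TargetNetwork(nn.Module):
    """Keeps only MN + BN1 of the trained generator, so inference yields only the
    high-resolution, style-transferred output."""
    def __init__(self, trained_generator: GeneratorG):
        super().__init__()
        self.backbone = trained_generator.backbone
        self.branch1 = trained_generator.branch1

    def forward(self, x):
        return self.branch1(self.backbone(x))
```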
  • "same structure" may mean that the number of convolutional layers, the number of upsampling layers, the number of downsampling layers, and so on are the same, and that the connection relationships among the convolutional layers, the upsampling layers, and/or the downsampling layers are also the same.
  • it should be noted that, before training, the generation network may not have the style transfer function at all, or may have it but with a poor style transfer effect.
  • the target network obtained after training the generation network has the style transfer function and can generate high-quality, high-resolution images with the target style.
  • the training method provided by at least one embodiment of the present disclosure combines a generative confrontation network, super-resolution technology, and style transfer technology.
  • the target network trained by this training method can generate, from an input image, high-quality, high-resolution images with the target style, which improves the effects of image style transfer and image fusion, enhances the user's visual experience, and gives the method better and wider application prospects.
  • FIG. 11 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in Figure 11, the image processing method includes the following steps:
  • Step S301 Obtain an input image
  • Step S302 Perform style transfer processing on the input image using a neural network to generate an output image, where the resolution of the output image is higher than the resolution of the input image.
  • the input image may be various types of images.
  • it may be an image of a person, an image of animals and plants, or a landscape image.
  • the input image can be acquired by an image acquisition device.
  • the image acquisition device may be, for example, a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, or even a webcam.
  • the neural network in step S302 may include a target network obtained by training according to the training method provided in any of the foregoing embodiments.
  • the output image is an image formed after the input image undergoes style transfer processing on the target network.
  • the output image includes the content feature of the input image and the target style feature.
  • once the target network has been obtained through training, the target style feature is determined and fixed.
  • for example, if a painting by Picasso (for example, "Dream") is used as the style image during training, the style of the output image obtained by processing an input image with the trained target network is the style of that painting.
  • it should be noted that the style image may be the second training style image in the embodiments of the above training method, and that the first training style image in those embodiments may be a low-resolution version of the second training style image whose resolution is the same as that of the input image. A minimal usage sketch follows below.
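  • a minimal usage sketch of the image processing method (steps S301 and S302), assuming a target network like the one sketched above; the preprocessing, file handling, and names are illustrative assumptions:

```python
import numpy as np
import torch
from PIL import Image

def stylize(target_network, input_path, output_path):
    """Step S301: obtain the input image; step S302: style transfer with the target network."""
    img = np.asarray(Image.open(input_path).convert("RGB"), dtype=np.float32) / 255.0
    x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)     # (1, 3, H, W)

    with torch.no_grad():
        y = target_network(x).clamp(0.0, 1.0)                   # higher resolution than x

    out = (y.squeeze(0).permute(1, 2, 0).numpy() * 255.0).astype(np.uint8)
    Image.fromarray(out).save(output_path)
```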
  • the image processing method provided by the embodiments of the present disclosure can perform style transfer processing on an input image through the target network and generate a high-quality, high-resolution image with the target style, which improves the effects of image style transfer and image fusion, enhances the user's visual experience, and has better and wider application prospects.
  • FIG. 12A is a schematic block diagram of an image processing apparatus according to an embodiment of the present disclosure.
  • the image processing device 400 includes an image acquisition module 410 and an image processing module 420.
  • the image acquisition module 410 is used to acquire an input image
  • the image processing module 420 is used to perform style transfer processing on the input image to generate an output image.
  • the image acquisition module 410 may include a memory, and the memory stores an input image.
  • the image acquisition module 410 may also include one or more cameras to acquire input images.
  • the image acquisition module 410 may be hardware, software, firmware, and any feasible combination thereof.
  • the image processing module 420 may include the target network trained according to the training method described in any of the above embodiments.
  • the target network may include the backbone network MN and the first branch network BN1 that have been trained to generate the network G as shown in FIG. 6.
  • the resolution of the output image is higher than the resolution of the input image.
  • FIG. 12B is a schematic block diagram of another image processing apparatus provided by at least one embodiment of the present disclosure.
  • the image processing apparatus 500 includes a memory 510 and a processor 520.
  • the memory 510 is used for non-transitory storage of computer-readable instructions
  • the processor 520 is used for running the computer-readable instructions.
  • when the computer-readable instructions are run by the processor 520, the neural network training method provided by the embodiments of the present disclosure is executed.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc.
  • the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the image processing apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processor (TPU), or a graphics processor GPU, and other devices with data processing capabilities and/or program execution capabilities.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • the computer-readable storage medium may also store various application programs and various data, such as the first training style image and the second training style image, and various data used and/or generated by the application program.
  • the image processing device provided by the embodiments of the present disclosure is exemplary rather than restrictive; according to actual application requirements, the image processing device may also include other conventional components or structures. For example, to realize the necessary functions of the image processing device, those skilled in the art may set other conventional components or structures according to the specific application scenario, which is not limited by the embodiments of the present disclosure.
  • At least one embodiment of the present disclosure also provides a storage medium.
  • one or more computer instructions can be stored on the storage medium.
  • Some computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned image processing method.
  • the other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned neural network training method.
  • the storage medium may include the storage components of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other suitable storage media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

一种神经网络的训练方法、图像处理方法、图像处理装置。该训练方法包括:基于生成网络,对判别网络进行训练;基于所述判别网络,对所述生成网络进行训练;以及交替地执行上述训练过程,以得到基于训练后的所述生成网络的目标网络;目标网络用于对输入图像进行风格迁移处理以得到输出图像,所述输出图像的分辨率高于所述输入图像的分辨率。该训练方法结合了生成式对抗网络、超分辨率技术和风格迁移技术,经过该训练方法训练得到的目标网络可以基于输入图像生成高质量的具有目标风格的高分辨率图像,提高了图像风格迁移和图像融合的效果,具有更好、更广泛的应用前景。

Description

神经网络的训练方法、图像处理方法、图像处理装置和存储介质
本申请要求于2019年4月2日递交的中国专利申请第201910262329.8号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开的实施例涉及一种神经网络的训练方法、图像处理方法、图像处理装置和存储介质。
背景技术
当前,基于人工神经网络的深度学习技术已经在诸如物体分类、文本处理、推荐引擎、图像搜索、面部识别、年龄和语音识别、人机对话以及情感计算等领域取得了巨大进展。随着人工神经网络结构的加深和算法的提升,深度学习技术在类人类数据感知领域取得了突破性的进展,深度学习技术可以用于描述图像内容、识别图像中的复杂环境下的物体以及在嘈杂环境中进行语音识别等。同时,深度学习技术还可以解决图像生成和融合的问题。
发明内容
本公开至少一个实施例提供一种神经网络的训练方法,包括:基于生成网络,对判别网络进行训练;基于所述判别网络,对所述生成网络进行训练;以及,交替地执行上述训练过程,以得到基于训练后的所述生成网络的目标网络;
其中,所述目标网络用于对输入图像进行风格迁移处理以得到输出图像,所述输出图像的分辨率高于所述输入图像的分辨率;
基于所述判别网络,对所述生成网络进行训练,包括:利用所述生成网络对第一训练输入图像进行风格迁移处理,以分别生成第一训练输出图像和第二训练输出图像,其中,所述第一训练输出图像的分辨率高于所述第一训练输入图像的分辨率,所述第二训练输出图像的分辨率等于所述第一训练输入图像的分辨率;通过所述判别网络对所述第一训练输出图像进行处理,通过分析网络对所述第二训练输出图像进行处理,根据所述判别网络的输出和所述分析网络的输出,通过系统损失函数计算所述生成网络的系统损失值;以及根据所述系统损失值对所述生成网络的参数进行修正。
例如,在本公开一些实施例提供的训练方法中,所述生成网络包括主干网络、第一分支网络和第二分支网络,所述第一分支网络的输入和所述第二分支网络的输入均为所述主干网络的输出;
利用所述生成网络对所述第一训练输入图像进行风格迁移处理,以分别生成所述第一训练输出图像和所述第二训练输出图像,包括:根据所述第一训练输入图像,通过所述主干网络和所述第一分支网络生成所述第一训练输出图像,以及通过所述主干网络和所述第 二分支网络生成所述第二训练输出图像。
例如,在本公开一些实施例提供的训练方法中,所述主干网络包括依次连接的多个卷积模块和间插于相邻卷积模块的多个下采样层;所述第一分支网络包括依次连接的多个卷积模块和间插于相邻卷积模块的多个上采样层;所述第二分支网络包括依次连接的多个卷积模块和间插于相邻卷积模块的多个上采样层;其中,所述第一分支网络中的卷积模块的个数和上采样层的个数分别多于所述主干网络中的卷积模块的个数和下采样层的个数,所述第二分支网络中的卷积模块的个数和上采样层的个数分别等于所述主干网络中的卷积模块的个数和下采样层的个数。
例如,在本公开一些实施例提供的训练方法中,所述目标网络包括所述生成网络的所述主干网络和所述第一分支网络。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数包括生成网络对抗损失函数,所述系统损失值包括生成网络对抗损失值;
所述生成网络对抗损失函数表示为:
Figure PCTCN2020081375-appb-000001
其中,L G表示所述生成网络对抗损失函数,z表示所述第一训练输入图像,P z(z)表示所述第一训练输入图像的集合,G(z)表示所述第一训练输出图像,D(G(z))表示所述判别网络针对所述第一训练输出图像的输出,
Figure PCTCN2020081375-appb-000002
表示针对所述第一训练输入图像的集合求期望以得到所述生成网络对抗损失值。
例如,在本公开一些实施例提供的训练方法中,所述分析网络包括依次连接的多个第一卷积模块和间插于相邻第一卷积模块的多个第一下采样层,至少两个所述第一卷积模块用于提取风格特征,至少一个所述第一卷积模块用于提取内容特征。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数还包括内容损失函数,所述系统损失值还包括内容损失值;
所述内容损失函数表示为:
Figure PCTCN2020081375-appb-000003
其中,L content表示所述内容损失函数,C m表示用于提取所述内容特征的所述至少一个第一卷积模块中的第m个第一卷积模块的单层内容损失函数,w 1m表示C m的权重;
所述单层内容损失函数表示为:
Figure PCTCN2020081375-appb-000004
其中,S 1为常数,
Figure PCTCN2020081375-appb-000005
表示在所述第m个第一卷积模块中第i个第一卷积核提取的所述第一训练输入图像的第一内容特征图像中第j个位置的值,
Figure PCTCN2020081375-appb-000006
表示在所述第m个第一卷积模块中第i个第一卷积核提取的所述第二训练输出图像的第二内容特征图像中第j个位置的值。
例如,在本公开一些实施例提供的训练方法中,所述系统损失函数还包括风格损失函 数,所述系统损失值还包括风格损失值;
所述风格损失函数表示为:
Figure PCTCN2020081375-appb-000007
其中,L style表示所述风格损失函数,E n表示用于提取所述风格特征的所述至少两个第一卷积模块中的第n个第一卷积模块的单层风格损失函数,w 2m表示E n的权重;
所述单层风格损失函数表示为:
Figure PCTCN2020081375-appb-000008
其中,S 2为常数,N n表示所述第n个第一卷积模块中的第一卷积核的数目,M n表示所述第n个第一卷积模块中的第一卷积核提取的风格特征图像的尺寸,所述
Figure PCTCN2020081375-appb-000009
表示在所述第n个第一卷积模块中第i个第一卷积核提取的第一训练风格图像的第一风格特征图像的格拉姆矩阵中第j个位置的值,
Figure PCTCN2020081375-appb-000010
表示在所述第n个第一卷积模块中第i个第一卷积核提取的所述第二训练输出图像的第二风格特征图像的格拉姆矩阵中第j个位置的值。
例如,在本公开一些实施例提供的训练方法中,所述生成网络的参数包括多个卷积核和多个偏置,所述系统损失函数还包括权重偏置比损失函数,所述系统损失值还包括权重偏置比损失值;
基于所述判别网络,对所述生成网络进行训练,还包括:根据所述多个卷积核和所述多个偏置,通过所述权重偏置比损失函数计算所述权重偏置比损失值。
例如,在本公开一些实施例提供的训练方法中,所述权重偏置比损失函数表示为:
Figure PCTCN2020081375-appb-000011
其中,L L1表示所述权重偏置比损失函数,W为所述多个卷积核的绝对值的平均值,B为所述多个偏置的绝对值的平均值,ε为正数。
例如,在本公开一些实施例提供的训练方法中,基于生成网络,对判别网络进行训练,包括:利用所述生成网络对第二训练输入图像进行风格迁移处理,以生成第三训练输出图像,其中,所述第三训练输出图像的分辨率大于所述第二训练输入图像的分辨率;将第二训练风格图像和所述第三训练输出图像输入所述判别网络,其中,所述第二训练风格图像的分辨率等于所述第三训练输出图像的分辨率,根据所述第二训练风格图像的标签和所述第二训练风格图像对应的所述判别网络的输出以及所述第三训练输出图像的标签和所述第三训练输出图像对应的所述判别网络的输出,通过判别网络对抗损失函数计算判别网络对抗损失值;以及根据所述判别网络对抗损失值对所述判别网络的参数进行修正。
例如,在本公开一些实施例提供的训练方法中,所述判别网络对抗损失函数表示为:
Figure PCTCN2020081375-appb-000012
其中,L D表示所述判别网络对抗损失函数,x表示所述第二训练风格图像,P data(x)表示所述第二训练风格图像的集合,D(x)表示所述判别网络针对所述第二训练风格图像的输出,
Figure PCTCN2020081375-appb-000013
表示针对所述第二训练风格图像的集合求期望,z表示所述第二训练输入图像,P z(z)表示所述第二训练输入图像的集合,G(z)表示所述第三训练输出图像,D(G(z))表示所述判别网络针对所述第三训练输出图像的输出,
Figure PCTCN2020081375-appb-000014
表示针对所述第三训练输入图像的集合求期望。
本公开至少一个实施例还提供一种图像处理方法,包括:获取输入图像;以及利用神经网络对所述输入图像进行风格迁移处理,以生成输出图像;其中,所述神经网络包括根据本公开任一实施例提供的训练方法得到的所述目标网络,所述输出图像的分辨率高于所述输入图像的分辨率。
本公开至少一个实施例还提供一种图像处理装置,包括:图像获取模块,用于获取输入图像;以及图像处理模块,包括根据本公开任一实施例提供的训练方法得到的所述目标网络,所述图像处理模块配置为利用所述目标网络对所述输入图像进行风格迁移处理,生成所述输出图像。
本公开至少一个实施例还提供一种图像处理装置,包括:存储器,用于非暂时性存储计算机可读指令;以及处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行本公开任一实施例提供的训练方法或本公开任一实施例提供的图像处理方法。
本公开至少一个实施例还提供一种存储介质,非暂时性地存储计算机可读指令,当所述计算机可读指令由计算机执行时能够执行本公开任一实施例提供的训练方法的指令或本公开任一实施例提供的图像处理方法的指令。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1为一种卷积神经网络的示意图;
图2A为一种卷积神经网络的结构示意图;
图2B为一种卷积神经网络的工作过程示意图;
图3为另一种卷积神经网络的结构示意图;
图4为本公开至少一实施例提供的一种神经网络的训练方法的流程图;
图5A为本公开至少一实施例提供的一种对应于图4中所示的训练方法训练生成网络的示意性架构框图;
图5B为本公开至少一实施例提供的一种训练生成网络的过程的示意性流程图;
图6为本公开至少一实施例提供的一种生成网络的结构示意图;
图7A为本公开至少一实施例提供的一种上采样层的示意图;
图7B为本公开至少一实施例提供的另一种上采样层的示意图;
图8为本公开至少一实施例提供的一种判别网络的结构示意图;
图9为本公开至少一实施例提供的一种分析网络的结构示意图;
图10A为本公开至少一实施例提供的一种对应于图4中所示的训练方法训练判别网络的示意性架构框图;
图10B为本公开至少一实施例提供的一种训练判别网络的过程的示意性流程图;
图11为本公开至少一实施例提供的一种图像处理方法的示意性流程图;
图12A为本公开至少一实施例提供的一种图像处理装置的示意性框图;以及
图12B为本公开至少一实施例提供的另一种图像处理装置的示意性框图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
下面通过几个具体的实施例对本公开进行说明。为了保持本公开实施例的以下说明清楚且简明,本公开省略了已知功能和已知部件的详细说明。当本公开实施例的任一部件在一个以上的附图中出现时,该部件在每个附图中由相同或类似的参考标号表示。
当今,随着图像处理技术的发展,对图像进行风格化迁移处理的方法逐渐成为图像处理领域的研究热点。将一幅图像转换成具备某种艺术风格的图片,可以用于基于云计算的图像处理、风格渲染以及数字画廊等产品和服务。
利用深度神经网络进行图片的艺术风格转移是随着深度学习技术的发展而新兴起来的技术。例如,基于参考图像所给出的风格(style),对输入图像进行处理以获得至少部分体现出该风格的输出图像。例如,参考图像为某个艺术家的作品,输出图像则可以在保留原有内容的同时还呈现出该艺术家作画的某些风格特征,甚至使人将其误认为是该艺术家的作品。传统的图像处理效果(例如,instagram公司提供的各种滤镜等)无法获得这样的风格化迁移处理效果。
本公开至少一个实施例提供一种神经网络的训练方法、图像处理方法、图像处理装置,结合了生成式对抗网络、超分辨率技术和风格迁移技术,经过训练的神经网络可以基于输 入图像生成高质量的具有目标风格的高分辨率图像,提高了图像风格迁移和图像融合的效果,提升了用户的视觉体验;具有更好、更广泛的应用前景。
最初,卷积神经网络(Convolutional Neural Network,CNN)主要用于识别二维形状,其对图像的平移、比例缩放、倾斜或其他形式的变形具有高度不变性。CNN主要通过局部感知野和权值共享来简化神经网络模型的复杂性、减少权重的数量。随着深度学习技术的发展,CNN的应用范围已经不仅仅限于图像识别领域,其也可以应用在人脸识别、文字识别、动物分类、图像处理等领域。
图1示出了一种卷积神经网络的示意图。例如,该卷积神经网络可以用于图像处理,其使用图像作为输入和输出,并通过卷积核替代标量的权重。图1中仅示出了具有3层结构的卷积神经网络,本公开的实施例对此不作限制。如图1所示,卷积神经网络包括输入层101、隐藏层102和输出层103。输入层101具有4个输入,隐藏层102具有3个输出,输出层103具有2个输出,最终该卷积神经网络最终输出2幅图像。
例如,输入层101的4个输入可以为4幅图像,或者1幅图像的四种特征图像。隐藏层102的3个输出可以为经过输入层101输入的图像的特征图像。
例如,如图1所示,卷积层具有权重
Figure PCTCN2020081375-appb-000015
和偏置
Figure PCTCN2020081375-appb-000016
权重
Figure PCTCN2020081375-appb-000017
表示卷积核,偏置
Figure PCTCN2020081375-appb-000018
是叠加到卷积层的输出的标量,其中,k是表示输入层101的标签,i和j分别是输入层101的单元和隐藏层102的单元的标签。例如,第一卷积层201包括第一组卷积核(图1中的
Figure PCTCN2020081375-appb-000019
)和第一组偏置(图1中的
Figure PCTCN2020081375-appb-000020
)。第二卷积层202包括第二组卷积核(图1中的
Figure PCTCN2020081375-appb-000021
)和第二组偏置(图1中的
Figure PCTCN2020081375-appb-000022
)。通常,每个卷积层包括数十个或数百个卷积核,若卷积神经网络为深度卷积神经网络,则其可以包括至少五层卷积层。
例如,如图1所示,该卷积神经网络还包括第一激活层203和第二激活层204。第一激活层203位于第一卷积层201之后,第二激活层204位于第二卷积层202之后。激活层(例如,第一激活层203和第二激活层204)包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。ReLU函数为非饱和非线性函数,Sigmoid函数和tanh函数为饱和非线性函数。例如,激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层(例如,第一卷积层201可以包括第一激活层203,第二卷积层202可以包括第二激活层204)中。
例如,在第一卷积层201中,首先,对每个输入应用第一组卷积核中的若干卷积核
Figure PCTCN2020081375-appb-000023
和第一组偏置中的若干偏置
Figure PCTCN2020081375-appb-000024
以得到第一卷积层201的输出;然后,第一卷积层201的输出可以通过第一激活层203进行处理,以得到第一激活层203的输出。在第二卷积层202中,首先,对输入的第一激活层203的输出应用第二组卷积核中的若干卷积核
Figure PCTCN2020081375-appb-000025
和第二组偏置中的若干偏置
Figure PCTCN2020081375-appb-000026
以得到第二卷积层202的输出;然后,第二卷积层202的输出可以通过第二激活层204进行处理,以得到第二激活层204的输出。例如,第一卷积层201的输出可以为对其输入应用卷积核
Figure PCTCN2020081375-appb-000027
后再与偏置
Figure PCTCN2020081375-appb-000028
相加的结果,第二卷积层202的输出可 以为对第一激活层203的输出应用卷积核
Figure PCTCN2020081375-appb-000029
后再与偏置
Figure PCTCN2020081375-appb-000030
相加的结果。
在利用卷积神经网络进行图像处理前,需要对卷积神经网络进行训练。经过训练之后,卷积神经网络的卷积核和偏置在图像处理期间保持不变。在训练过程中,各卷积核和偏置通过多组输入/输出示例图像以及优化算法进行调整,以获取优化后的卷积神经网络模型。
图2A示出了一种卷积神经网络的结构示意图,图2B示出了一种卷积神经网络的工作过程示意图。例如,如图2A和2B所示,输入图像通过输入层输入到卷积神经网络后,依次经过若干个处理过程(如图2A中的每个层级)后输出类别标识。卷积神经网络的主要组成部分可以包括多个卷积层、多个下采样层和全连接层。例如,一个完整的卷积神经网络可以由这三种层叠加组成。例如,图2A仅示出了一种卷积神经网络的三个层级,即第一层级、第二层级和第三层级。例如,每个层级可以包括一个卷积模块和一个下采样层。例如,每个卷积模块可以包括卷积层。由此,每个层级的处理过程可以包括:对输入图像进行卷积(convolution)以及下采样(sub-sampling/down-sampling)。例如,根据实际需要,每个卷积模块还可以包括实例标准化(instance normalization)层,从而每个层级的处理过程还可以包括标准化处理。
例如,实例标准化层用于对卷积层输出的特征图像进行标准化处理,以使特征图像的像素的灰度值在预定范围内变化,从而简化图像生成过程,改善风格迁移的质量。例如,预定范围可以为[-1,1]。实例标准化层根据每个特征图像自身的均值和方差,对该特征图像进行标准化处理。例如,实例标准化层还可用于对单幅图像进行标准化处理。
例如,假设小批梯度下降法(mini-batch gradient decent)的尺寸为T,某一卷积层输出的特征图像的数量为C,且每个特征图像均为H行W列的矩阵,则特征图像的模型表示为(T,C,W,H)。从而,实例标准化层的标准化公式可以表示如下:
Figure PCTCN2020081375-appb-000031
其中,x tijk为该第一卷积层输出的特征图像集合中的第t个特征块(patch)、第i个特征图像、第j列、第k行的值。y tijk表示经过实例归一化层处理x tijk后得到的结果。ε为一个很小的整数,以避免分母为0。
卷积层是卷积神经网络的核心层。在卷积神经网络的卷积层中,一个神经元只与部分相邻层的神经元连接。卷积层可以对输入图像应用若干个卷积核(也称为滤波器),以提取输入图像的多种类型的特征。每个卷积核可以提取一种类型的特征。卷积核一般以随机小数矩阵的形式初始化,在卷积神经网络的训练过程中卷积核将通过学习以得到合理的权值。对输入图像应用一个卷积核之后得到的结果被称为特征图像(feature map),特征图像的数目与卷积核的数目相等。每个特征图像由一些矩形排列的神经元组成,同一特征图像的神经元共享权值,这里共享的权值就是卷积核。一个层级的卷积层输出的特征图像可以被输入到相邻的下一个层级的卷积层并再次处理以得到新的特征图像。例如,如图2A所示,第一层级的卷积层可以输出第一特征图像,该第一特征图像被输入到第二层级的卷积层再 次处理以得到第二特征图像。
例如,如图2B所示,卷积层可以使用不同的卷积核对输入图像的某一个局部感受域的数据进行卷积,卷积结果被输入激活层,该激活层根据相应的激活函数进行计算以得到输入图像的特征信息。
例如,如图2A和2B所示,下采样层设置在相邻的卷积层之间,下采样层是下采样的一种形式。一方面,下采样层可以用于缩减输入图像的规模,简化计算的复杂度,在一定程度上减小过拟合的现象;另一方面,下采样层也可以进行特征压缩,提取输入图像的主要特征。下采样层能够减少特征图像的尺寸,但不改变特征图像的数量。例如,一个尺寸为12×12的输入图像,通过6×6的卷积核对其进行采样,那么可以得到2×2的输出图像,这意味着输入图像上的36个像素合并为输出图像中的1个像素。最后一个下采样层或卷积层可以连接到一个或多个全连接层,全连接层用于连接提取的所有特征。全连接层的输出为一个一维矩阵,也就是向量。
图3示出了另一种卷积神经网络的结构示意图。例如,参见图3所示的示例,最后一个卷积层(即第t个卷积层)的输出被输入到平坦化层以进行平坦化操作(Flatten)。平坦化层可以将特征图像(2D图像)转换为向量(1D)。该平坦化操作可以按照如下的方式进行:
v k=f k/j,k%j
其中,v是包含k个元素的向量,f是具有i行j列的矩阵。
然后,平坦化层的输出(即1D向量)被输入到一个全连接层(FCN)。全连接层可以具有与卷积神经网络相同的结构,但不同之处在于,全连接层使用不同的标量值以替代卷积核。
例如,最后一个卷积层的输出也可以被输入到均化层(AVG)。均化层用于对输出进行平均操作,即利用特征图像的均值表示输出图像,因此,一个2D的特征图像转换成为一个标量。例如,如果卷积神经网络包括均化层,则其可以不包括平坦化层。
例如,根据实际需要,均化层或全连接层可以连接到分类器,分类器可以根据提取的特征进行分类,分类器的输出可以作为卷积神经网络的最终输出,即表征图像类别的类别标识(label)。
例如,分类器可以为支持向量机(Support Vector Machine,SVM)分类器、softmax分类器以及最邻近规则(KNN)分类器等。如图3所示,在一个示例中,卷积神经网络包括softmax分类器,softmax分类器是一种逻辑函数的生成器,可以把一个包含任意实数的K维向量z压缩成K维向量σ(z)。softmax分类器的公式如下:
Figure PCTCN2020081375-appb-000032
其中,Z j表示K维向量z中第j个元素,σ(z)表示每个类别标识(label)的预测概率,σ(z)为实数,且其范围为(0,1),K维向量σ(z)的和为1。根据以上公式,K维向量z中的 每个类别标识均被赋予一定的预测概率,而具有最大预测概率的类别标识被选择作为输入图像的标识或类别。
下面结合附图对本公开的一些实施例及其示例进行详细说明。
图4为本公开至少一实施例提供的一种神经网络的训练方法的流程图。例如,如图4所示,该训练方法包括:
步骤S10:基于生成网络,对判别网络进行训练;
步骤S20:基于判别网络,对生成网络进行训练;以及,
交替地执行上述训练过程,以得到基于训练后的所述生成网络的目标网络。
例如,在本公开的至少一个实施例中,利用该训练方法得到的目标网络可以用于对输入图像进行风格迁移处理以得到输出图像,该输出图像的分辨率高于该输入图像的分辨率。
图5A为本公开至少一实施例提供的一种对应于图4中所示的训练方法训练生成网络的示意性架构框图,图5B为本公开至少一实施例提供的一种训练生成网络的过程的示意性流程图。
例如,结合图5A和图5B所示,基于判别网络,对生成网络进行训练,即步骤S20,包括步骤S201至步骤S203,如下所示:
步骤S201:利用生成网络对第一训练输入图像进行风格迁移处理,以分别生成第一训练输出图像和第二训练输出图像,其中,第一训练输出图像的分辨率高于第一训练输入图像的分辨率,第二训练输出图像的分辨率等于第一训练输入图像的分辨率;
步骤S202:通过判别网络对第一训练输出图像进行处理,通过分析网络对第二训练输出图像进行处理,根据判别网络的输出和分析网络的输出,通过系统损失函数计算生成网络的系统损失值;
步骤S203:根据系统损失值对生成网络的参数进行修正。
例如,基于判别网络,对生成网络进行训练,即步骤S20还可以包括:判断生成网络G的训练是否满足预定条件,若不满足预定条件,则重复执行上述生成网络G的训练过程;若满足预定条件,则停止本阶段的生成网络G的训练过程,得到本阶段训练好的生成网络G。例如,在一个示例中,上述预定条件为连续两幅(或更多幅)第一训练输入图像对应的系统损失值不再显著减小。例如,在另一个示例中,上述预定条件为生成网络G的训练次数或训练周期达到预定数目。本公开对此不作限制。
例如,如图5A所示,在生成网络G的训练过程中,需要联合判别网络D和分析网络A进行训练。需要说明的是,在生成网络G的训练过程中,判别网络D的参数保持不变。
需要说明的是,上述以及后续的示例性描述中,例如,判别网络、生成网络以及判别网络以及这些神经网络包括的各种层(例如卷积层、上采样层、下采样层等)等每个分别对应执行相应处理过程的程序/方法,例如通过相应的软件、固件、硬件等方式实现,以下与此相同,不再赘述;并且,上述示例仅是示意性说明生成网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本图像对神经网络进行训练;同时,在每一 幅样本图像训练过程中,都可以包括多次反复迭代以对生成网络的参数进行修正。又例如,训练阶段还包括对生成网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,生成网络G的初始参数可以为随机数,例如随机数符合高斯分布。例如,生成网络G的初始参数也可以采用ImageNet等图像数据库的已训练好的参数。本公开的实施例对此不作限制。
例如,生成网络G的训练过程中还可以包括优化函数(图5A中未示出),优化函数可以根据系统损失函数计算得到的系统损失值计算生成网络G的参数的误差值,并根据该误差值对生成网络G的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算生成网络G的参数的误差值。
例如,第一训练输入图像可以为各种类型的图像。例如,第一训练输入图像可以为通过数码相机或手机拍摄的图像,其可以为人物图像、动植物图像或风景图像等。
图6为本公开至少一实施例提供的一种生成网络的结构示意图。例如,如图6所示,该生成网络G包括主干网络MN、第一分支网络BN1和第二分支网络。第一分支网络BN1和第二分支网络BN2分别与主干网络MN连接,即第一分支网络BN1与主干网络MN处于同一的处理流程中,主干网络MN的输出被输入到第一分支网络BN1中,同样,第二分支网络BN2与主干网络MN处于同一的处理流程中,主干网络MN的输出被输入到第二分支网络BN2中,从而,第一分支网络BN1的输入和第二分支网络BN2的输入均为主干网络MN的输出,也就是说,第一分支网络BN1的输入和第二分支网络BN2的输入相同。
从而,在上述步骤S201中,利用生成网络G对第一训练输入图像进行风格迁移处理,以分别生成第一训练输出图像HR1和第二训练输出图像LR2,可以包括:根据该第一训练输入图像,通过主干网络MN和第一分支网络BN1生成第一训练输出图像HR1,以及通过主干网络MN和第二分支网络BN2生成第二训练输出图像LR2。主干网络MN和第一分支网络BN1对第一训练输入图像进行风格迁移处理以得到第一训练输出图像HR1,主干网络MN和第二分支网络BN2对第一训练输入图像进行风格迁移处理以得到第二训练输出图像LR2。
例如,如图6所示的实施例中,主干网络MN包括依次连接的多个卷积模块CM0和间插于相邻卷积模块CM0的多个下采样层DS0;第一分支网络BN1包括依次连接的多个卷积模块CM1和间插于相邻卷积模块的多个上采样层US1;第二分支网络BN2包括依次连接的多个卷积模块CM2和间插于相邻卷积模块的多个上采样层US2。
例如,在如图6所示的生成网络G中,每个卷积模块(CM0、CM1、CM2)可以包括卷积层,用于提取特征图像。低层级卷积模块的卷积层用于提取第一训练输入图像的低阶特征(例如,点、边等);随着层次的增加,高层级的卷积层可以提取第一训练输入图像的高阶特征(例如,直线、拐弯、三角形等);高阶特征可以由低阶特征组合得到。例如,根据需要,卷积层可以包括激活层。例如,至少部分卷积模块还可以包括实例标准化层, 用于对该至少部分卷积模块中的卷积层输出的特征图像进行标准化处理。例如,下采样层DS0用于减小输入的第一训练输入图像的特征图像的数据量,以提高图像处理的速度;例如,下采样层DS0用于减小特征图像的各个维度的值,从而减少特征图像的数据量。例如,上采样层(US1、US2)用于增加特征图像的各个维度的值,从而增加特征图像的数据量。
例如,在一些示例中,为了满足上述步骤S201中对第一训练输出图像HR1和第二训练输出图像LR2的分辨率的要求(即,第一训练输出图像HR1的分辨率高于第一训练输入图像的分辨率,第二训练输出图像LR2的分辨率等于第一训练输入图像的分辨率),第一分支网络BN1中的卷积模块CM1和上采样层US1的个数分别多于主干网络MN中的卷积模块CM0和下采样层DS0的个数,也就是说,第一分支网络BN1中的卷积模块CM1的个数多于主干网络MN中的卷积模块CM0的个数,第一分支网络BN1中的上采样层US1的个数多于主干网络MN中的下采样层DS0的个数。第二分支网络BN2中的卷积模块CM2和上采样层US2的个数分别等于主干网络MN中的卷积模块CM0和下采样层DS0的个数,也就是说,第二分支网络BN2中的卷积模块CM2的个数等于主干网络MN中的卷积模块CM0的个数,第二分支网络BN2中的上采样层US2的个数等于主干网络MN中的下采样层DS0的个数。由此可知,第一分支网络BN1中的卷积模块CM1的个数多于第二分支网络BN2中的卷积模块CM2的个数,第一分支网络BN1中的上采样层US1的个数多于第二分支网络BN2中的上采样层US2的个数。
例如,在一些示例中,在生成网络G中,主干网络MN包括x1个卷积模块CM0,其中x1为正整数,且通常大于2,同时,主干网络MN可以包括例如(x1–1)个下采样层DS0。相应地,第一分支网络BN1包括x2个卷积模块CM1,其中x2为正整数,且x2>x1,同时,第一分支网络BN1包括(x2–1)个上采样层US1;以及,第二分支网络BN2包括x1个卷积模块CM2和(x1–1)个上采样层US2。例如,在如图6所示的生成网络G中,x1=3,x2=5。需要说明的是,图6所示的示例中,x1、x2的取值是示意性的,本公开对此不作限制。
例如,在一些示例中,主干网络MN中的(x1–1)个下采样层DS0的下采样因子分别与第一分支网络BN1的(x2–1)个上采样层US1中的(x1–1)个的上采样因子对应,主干网络MN中的(x1–1)个下采样层DS0的下采样因子还分别与第二分支网络BN2中的(x1–1)个上采样层US2的上采样因子对应。其中,一个下采样层的下采样因子与一个上采样层的上采样因子对应是指:当该下采样层的下采样因子为1/y,则该上采样层的上采样因子为y,其中y为正整数,且y通常大于2。例如,在图6所示的生成网络G中,主干网络MN的3个下采样层DS0的下采样因子分别为1/q1、1/q2、1/q3,第一分支网络BN1的5个上采样层US1中任意3个上采样层US1的上采样因子分别为q1、q2、q3(具体顺序不作要求),第二分支网络BN2的3个上采样层US2的上采样因子分别为q1、q2、q3(具体顺序不作要求)。
需要说明的是,上述下采样层DS0、上采样层US1、上采样层US2的数目以及下采样层DS0的下采样因子、上采样层US1的上采样因子、上采样层US2的上采样因子也可以设 置为其他数值,只要能满足上述步骤S201中对第一训练输出图像HR1和第二训练输出图像LR2的分辨率的要求即可,本公开对此不作限制。
例如,下采样层DS0可以采用各种下采样方法对特征图像进行下采样。下采样方法包括但不限于:最大值合并(max pooling)、平均值合并(average pooling)、跨度卷积(strided convolution)、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等等。
例如,上采样层US1、US2可以采用跨度转置卷积(strided transposed convolution)、插值算法等上采样方法实现上采样。插值算法例如可以包括内插值、两次立方插值算法(Bicubic Interprolation)等。
图7A为本公开至少一实施例提供的一种上采样层的示意图,图7B为本公开至少一实施例提供的另一种上采样层的示意图。
例如,在一些示例中,如图7A所示,上采样层采用像素插值法实现上采样。此时,该上采样层还可以称为复合层。复合层采用2×2的上采样因子,从而可以将4个输入特征图像(即,图7A中的INPUT 4n,INPUT 4n+1,INPUT 4n+2,INPUT 4n+3)结合以得到1个具有固定像素顺序的输出特征图像(即,图7A中的OUTPUT n)。
例如,在一些示例中,对于二维的特征图像,上采样层获取输入的第一数量的输入特征图像,将这些输入特征图像的像素值交织(interleave)重排以产生相同的第一数量的输出特征图像。相比于输入特征图像,输出特征图像的数量没有改变,但是每个输出特征图像的大小增加相应倍数。由此,该复合层通过不同的排列组合增加更多的数据信息,这些组合可给出所有可能的上采样组合。最后,可通过激活层从上采样组合进行选择。
例如,在图7B所示的示例中,上采样层采用像素值交织重排方法实现上采样。此时,该上采样层也可以称为复合层。复合层同样采用2×2的上采样因子,即以每4个输入特征图像(即,图7B中的INPUT 4n,INPUT 4n+1,INPUT 4n+2,INPUT 4n+3)为一组,将它们的像素值交织生成4个输出特征图像(即,图7B中的OUTPUT 4n,OUTPUT 4n+1,OUTPUT 4n+2,OUTPUT 4n+3)。输入特征图像的数量和经过复合层处理后得到的输出特征图像的数量相同,而各输出特征图像的大小增加为输入特征图像的4倍,即具有输入特征图像的4倍的像素数量。
如图5A所示,在生成网络G的训练过程中,通过判别网络D对第一训练输出图像HR1进行处理。图8为本公开一实施例提供的一种判别网络的结构示意图。例如,如图8所示,该判别网络D包括多个卷积模块CM3、多个下采样层DS3和全连接层FCN。卷积模块CM3、下采样层DS3和全连接层FCN的结构和作用可以分别参考前述与卷积模块(CM0,CM1,CM2)、下采样层DS0、全连接层相关的描述,本公开对此不作限制。
例如,如图8所示,在该判别网络D中,多个卷积模块CM3依次连接,在一些相邻的卷积模块CM3之间具有下采样层DS3,例如,如图8所示,判别网络D包括依次连接的六个卷积模块CM3,在第二个卷积模块和第三卷积模块之间具有一个下采样层,在第四个卷 积模块和第五卷积模块之间具有一个下采样层。全连接层FCN与最后一个卷积模块CM3连接。例如,每个卷积模块CM3可以包括卷积层;例如,根据需要,至少部分卷积模块CM3还可以包括实例标准化层。
例如,如图8所示,该判别网络D还包括激活层,该激活层连接到全连接层FCN。例如,如图8所示,该激活层的激活函数可以采用Sigmoid函数,从而,该激活层的输出(即判别网络D的输出)为一个取值范围为[0,1]的数值。例如,判别网络D可以判断第一训练输出图像HR1的风格和目标风格之间的相似程度,以第一训练输出图像HR1作为判别网络D的输入为例,判别网络D对第一训练输出图像HR1进行处理,以得到判别网络D输出,判别网络D输出的数值表示第一训练输出图像HR1的风格与目标风格的相似程度。例如,该判别网络D输出的数值越大,例如趋近于1,表示判别网络D认定第一训练输出图像HR1的风格与目标风格越相似;例如,该判别网络D输出的数值越小,例如趋近于0,则表示判别网络D认定第一训练输出图像HR1的风格与目标风格越不相似。例如,该目标风格可以为后续将介绍的第二训练风格图像的风格,即用户希望该生成网络的目标网络可以生成的风格。
需要说明的是,图8所示的判别网络是示意性的。例如,在一些示例中,图8所示的判别网络可以包括更多或更少的卷积模块或下采样层。例如,在一些示例中,图8所示的判别网络还可以包括其他模块或层结构,例如在全连接层之前还具有一个平坦化模块。例如,在一些示例中,图8所示的判别网络中的部分模块或层结构可以替换为其他模块或层结构,例如将全连接层替换为进行平均操作(AVG)的卷积层(参考图3及前述相关描述),又例如将激活层替换为二分类的softmax模块。进一步地,本公开的实施例对判别网络的结构不作限制,即包括但不限于图8所示的判别网络结构。
如图5A所示,系统损失函数可以包括生成网络对抗损失函数,相应地,系统损失值包括生成网络对抗损失值。生成网络对抗损失函数根据判别网络D的输出计算生成网络对抗损失值。例如,在一些示例中,生成网络对抗损失函数可以表示为:
Figure PCTCN2020081375-appb-000033
其中,L G表示生成网络对抗损失函数,z1表示第一训练输入图像,P z1(z1)表示第一训练输入图像的集合(例如,包括一个批次的多幅第一训练输入图像),G(z1)表示第一训练输出图像HR1,D(G(z1))表示判别网络D针对第一训练输出图像HR1的输出,即判别网络D对第一训练输出图像HR1进行处理得到的输出,
Figure PCTCN2020081375-appb-000034
表示针对第一训练输入图像的集合求平均以得到生成网络对抗损失值,即相应采用批量梯度下降算法对生成网络G进行参数优化。
需要说明的是,上述公式表示的生成网络对抗损失函数是示例性的,本公开的实施例包括但不限于此。
生成网络G的训练目标是最小化系统损失值,因此,在生成网络G的训练过程中,最小化系统损失值包括减小生成网络对抗损失值。例如,在生成网络G的训练过程中,第一 训练输出图像HR1的标签设置为1,即希望判别网络D鉴别认定第一训练输出图像HR1具有目标风格。例如,在生成网络G的训练过程中,生成网络G的参数被不断地修正,以使经过参数修正后的生成网络G生成的第一训练输出图像HR1对应的判别网络D的输出不断趋近于1,从而不断地减小生成网络对抗损失值。如图5A所示,在生成网络G的训练过程中,还通过分析网络A对第二训练输出图像LR2进行处理。图9为本公开至少一实施例提供的一种分析网络的结构示意图。例如,如图9所示,该分析网络G包括依次连接的多个第一卷积模块CM01和间插于相邻第一卷积模块CM01的多个第一下采样层DS01。例如,每个第一卷积模块CM01包括第一卷积层,每个第一卷积层包括多个第一卷积核,第一卷积核可以用于提取分析网络A的输入图像的内容特征和风格特征。例如,参考图5A,图9所示的分析网络A的输入可以包括第一训练输入图像、第二训练输出图像LR2和第一训练风格图像。例如,根据需要,至少部分第一卷积模块CM01还可以包括实例标准化层。
例如,分析网络A可以采用能够对图像进行分类的深度神经网络。如图9所示,输入经过若干个第一卷积模块CM01和第一下采样层DS01处理,以提取特征。每个第一卷积模块CM01的输出都是输入的特征图像。第一下采样层DS01可以降低特征图像的分辨率并传递给下一层级的第一卷积模块CM01。多个第一卷积模块CM01可以输出多个特征图像,该多个特征图像可以表征输入的不同级别的特征(例如,纹理、边缘、物体等)。经过若干个第一卷积模块CM01和第一下采样层DS01处理之后,特征图像被输入至平坦化层,平坦化层将特征图像转换成向量然后传递给全连接层以及分类器。分类器层可以包括softmax分类器,softmax分类器可以输出输入属于每一个类别标识的概率,其中概率最大的标识将作为分析网络A最终的输出。由此,分析网络A实现图像分类。
例如,分析网络A可以采用已经训练好的卷积神经网络模型。从而,在生成网络G的训练过程中,不需对分析网络A的参数(例如,第一卷积核等)进行修正。例如,分析网络A可以采用AlexNet、GoogleNet、VGG、Deep Residual Learning等神经网络模型实现提取输入的内容特征和风格特征。VGG网络为深度卷积神经网络的一种,其是由牛津大学视觉几何组(Visual Geometry Group)开发,已经在视觉识别领域得到广泛应用。例如,VGG网络可以包括19层,并且可以对其中的一些层进行标准化处理。
需要说明的是,在本公开的至少一些实施例中,在生成网络G的训练过程中,仅需要用到上述分析网络G中用于提取输入的特征的部分,例如,如图9中虚线框所示的多个第一卷积模块CM01和多个第一下采样层DS01。例如,在本公开的实施例提供的分析网络A中,如图9所示,至少两个第一卷积模块CM01用于提取风格特征,至少一个第一卷积模块CM01用于提取内容特征。需要说明的是,图9所示的分析网络是示意性的。本公开的实施例对分析网络的结构、提取风格特征和内容特征的具体细节(例如,用于提取风格特征和内容特征的第一卷积模块的数量和层级等)等均不作限制。
例如,如图5A所示,在生成网络G的训练过程中,分析网络A用于接收第一训练输入图像、第一训练风格图像和第二训练输出图像LR2,且分别产生并输出第一训练输入图 像的第一内容特征图像、第一训练风格图像的第一风格特征图像、以及第二训练输出图像LR2的第二内容特征图像和第二训练输出图像LR2的第二风格特征图像。
例如,第一训练风格图像可以为各种艺术大师(例如莫奈、梵高、毕加索等)的名画等,但不限于此,例如第一训练风格图像还可以为水墨画、素描画等。例如,第一训练风格图像具有与后续将介绍的第二训练风格图像一致的目标风格。例如,第一训练风格图像是第二训练风格图像的低分辨率版本,也就是说,第一训练风格图像和第二训练风格图像可以为同一幅风格图像,但第一训练风格图像的分辨率小于第二训练风格图像的分辨率。但本公开不限于此。
例如,第一训练输入图像、第一训练风格图像和第二训练输出图像LR2三者的尺寸均相同。
例如,在本公开的至少一些实施例中,内容特征表示图像中物体在整幅图像的分布,风格特征则表示在卷积神经网络的不同层中不同特征图像之间的关系。例如,内容特征包括图像的内容信息,风格特征可以包括图像的纹理信息、颜色信息等。纹理信息例如表示特征图像之间的相关性,其与位置无关。卷积神经网络中的特征图像可以是一维矩阵,格拉姆矩阵(Gram matrix)可以用于衡量该一维矩阵中各向量的相关程度,因此,卷积神经网络可以引入Gram矩阵计算图像的风格特征。例如,Gram矩阵可以表示如下:
Figure PCTCN2020081375-appb-000035
其中,
Figure PCTCN2020081375-appb-000036
为第l层中向量特征图像F i和F j之间的内积(inner product)。根据多层特征图像之间的相关性,可以获得第一训练风格图像或第二训练输出图像LR2的静态的多尺度(scale)表达,由此提取了第一训练风格图像或第二训练输出图像LR2的纹理信息而非全局布局,进而获得风格特征。
例如,相应地,如图5A所示,系统损失函数还可以包括内容损失函数和风格损失函数,从而系统损失值还可以包括内容损失值和风格损失值。内容损失函数用于描述第一训练输入图像和第二训练输出图像LR2的内容的差异,风格损失函数用于描述第一训练风格图像和第二训练输出图像LR2的风格的差异。例如,内容损失函数用于根据第一训练输入图像的第一内容特征图像和第二训练输出图像LR2的第二内容特征图像计算生成网络G的参数的内容损失值。风格损失函数用于根据第一训练风格图像的第一风格特征图像和第二训练输出图像LR2的第二风格特征图像计算生成网络G的参数的风格损失值。
例如,对于如图9所示的分析网络A,单层内容损失函数表示为:
Figure PCTCN2020081375-appb-000037
其中,S 1为常数,
Figure PCTCN2020081375-appb-000038
表示在分析网络A中第m个第一卷积模块中第i个第一卷积核提取的第一训练输入图像的第一内容特征图像中第j个位置的值,
Figure PCTCN2020081375-appb-000039
表示在分析网络A中第m个第一卷积模块中第i个第一卷积核提取的第二训练输出图像LR2的第二内容特征图像中第j个位置的值。
例如,在如图9所示的分析网络A中,可以通过至少一个第一卷积模块CM01提取输 入图像(例如,此处的输入图像包括第一训练输入图像和第二训练输出图像LR2)的内容特征,则内容损失函数表示为:
Figure PCTCN2020081375-appb-000040
其中,L content表示内容损失函数,C m表示用于提取内容特征的至少一个第一卷积模块中的第m个第一卷积模块的单层内容损失函数,w 1m表示C m的权重。
在生成网络G的训练过程中,最小化系统损失值包括减小内容损失值。例如,在使用生成网络G进行图像风格迁移处理时,希望保持生成网络G的输出和输入具有相同的内容特征,即第二训练输出图像LR2保存了第一训练输入图像的内容。例如,在生成网络G的训练过程中,生成网络G的参数被不断地修正,以使经过参数修正后的生成网络G生成的第二训练输出图像LR2的内容特征不断趋近于第一训练输入图像的内容特征,从而不断地减小内容损失值。
例如,对于如图9所示的分析网络A,单层风格损失函数表示为:
Figure PCTCN2020081375-appb-000041
其中,S 2为常数,N n表示分析网络A的第n个第一卷积模块中的第一卷积核的数目,M n表示第n个第一卷积模块中的第一卷积核提取的风格特征图像的尺寸,
Figure PCTCN2020081375-appb-000042
表示在分析网络A中的第n个第一卷积模块中第i个第一卷积核提取的第一训练风格图像的第一风格特征图像的格拉姆矩阵中第j个位置的值,
Figure PCTCN2020081375-appb-000043
表示在分析网络A中的第n个第一卷积模块中第i个第一卷积核提取的第二训练输出图像的第二风格特征图像的格拉姆矩阵中第j个位置的值。
例如,在如图9所示的分析网络A中,可以通过至少两个第一卷积模块CM01提取输入图像(例如,此处的输入图像包括第一训练风格图像和第二训练输出图像LR2)的风格特征,则风格损失函数表示为:
Figure PCTCN2020081375-appb-000044
其中,L style表示风格损失函数,En表示用于提取风格特征的至少两个第一卷积模块中的第n个第一卷积模块的单层风格损失函数,w2m表示En的权重。
在生成网络G的训练过程中,最小化系统损失值包括减小风格损失值。例如,在使用生成网络G进行图像风格迁移处理时,希望生成网络G的输出具有目标风格,即第二训练输出图像LR2具有与第一训练风格图像相同的风格特征。例如,在生成网络G的训练过程中,生成网络G的参数被不断地修正,以使经过参数修正后的生成网络G生成的第二训练输出图像LR2的风格特征不断趋近于第一训练风格图像的风格特征,从而不断地减小风格损失值。
例如,在本公开的至少一些实施例中,如图5A所示,系统损失函数还可以包括权重偏置比损失函数(L1损失函数),相应地,系统损失值还包括权重偏置比损失值。本公开实施例提供的生成网络通过在系统损失函数中增加权重偏置比损失函数,从而可以充分发挥 生成网络G中的激活函数的作用,获取更优化的生成网络G的参数,提高图像风格迁移和图像融合效果,在处理效果和处理速度等方面得以兼顾,具有更好、更广泛的应用前景。
例如,如图6所示的生成网络G包括多个卷积核和多个偏置,该多个卷积核和该多个偏置为生成网络G中的所有卷积模块(CM0、CM1、CM2)的卷积层所包括的卷积核和偏置。生成网络G的参数可以包括该多个卷积核和该多个偏置。
例如,在卷积神经网络中,卷积核用于决定对输入图像进行怎样的处理,偏置用于决定该卷积核的输出是否输入到下一个层级。因此,在卷积神经网络的激活层中,偏置可形象地比喻为“开关”,用于决定“打开”或“关闭”卷积核。针对不同的输入图像,不同的卷积核可以被“打开”或“关闭”以实现多种效果。
例如,在本公开的一些实施例中,与卷积核相比,偏置需要具有比较大的绝对值,从而更有效地发挥“开关”的作用。权重偏置比损失函数则用于调整生成网络G中的多个卷积核和多个偏置之间的比值,以增强生成网络G中的激活函数的作用。
例如,相应地,在本公开的一些实施例中,在生成网络G的训练过程中,例如上述步骤S202中,还可以包括:根据生成网络G的多个卷积核和多个偏置,通过权重偏置比损失函数计算权重偏置比损失值。
例如,权重偏置比损失函数可以表示为:
Figure PCTCN2020081375-appb-000045
其中,L L1表示权重偏置比损失函数,W为生成网络G的多个卷积核的绝对值的平均值,B为生成网络G的多个偏置的绝对值的平均值,ε为正数。
例如,W可以表示为:
Figure PCTCN2020081375-appb-000046
其中,C w为第一卷积层具有的第一卷积核的数量,w表示第一卷积层的各卷积核的值。例如,卷积核为2×2的矩阵,w表示矩阵各元素之和。
例如,B可以表示为:
Figure PCTCN2020081375-appb-000047
其中,C b为第一卷积层具有的偏置的数量,b表示第一卷积层的各偏置的值。
例如,在本公开的一些实施例中,生成网络G中的激活函数可以采用ReLU函数。但不限于此,生成网络G中的激活函数还可以采用sigmoid函数、tanh函数等。
例如,ReLU函数可以表示为:
Figure PCTCN2020081375-appb-000048
其中,out表示ReLU函数的输出,in 1表示ReLU函数的输入。在卷积神经网络中,in 1可以表示为:
in 1=w·in 0+b
其中,in 0表示输入到例如第一层级的卷积层CM0中的第一训练输入图像的像素矩阵,in 1表示经过该第一层级的卷积层CM0对in 0进行处理后输出的特征图像的像素矩阵,w表示该第一层级的卷积层CM0中的卷积核的值,b表示该第一层级的卷积层CM0中的偏置的值。当b足够大时,则激活函数更能有效发挥激活作用,即激活函数的输出可以更好地表示第一训练输入图像的特征信息。
例如,在本公开的实施例中,生成网络G的系统损失函数可以表示为:
L total=aL content+βL style+χL G+δL L1
其中,L total表示系统损失函数,a、β、χ和δ分别为系统损失函数中内容损失函数、风格损失函数、生成网络对抗损失函数和权重偏置比损失函数的权重。例如,在步骤S202中,通过上述公式表示的系统损失函数计算系统损失值,再执行步骤S203,对生成网络G的所有参数(包括主干网络MN的参数、第一分支网络BN1的参数和第二分支网络BN2的参数)进行修正,由此可以实现步骤S20。
图10A为本公开至少一实施例提供的一种对应于图4中所示的训练方法训练判别网络的示意性架构框图,图10B为本公开至少一实施例提供的一种训练判别网络的过程的示意性流程图。
例如,结合图10A和图10B所示,基于生成网络,对判别网络进行训练,即步骤S10,包括步骤S101至步骤S103,如下所示:
步骤S101:利用生成网络对第二训练输入图像进行风格迁移处理,以生成第三训练输出图像,其中,第三训练输出图像的分辨率大于第二训练输入图像的分辨率;
步骤S102:将第二训练风格图像和第三训练输出图像输入判别网络,其中,第二训练风格图像的分辨率等于第三训练输出图像的分辨率,根据第二训练风格图像的标签和第二训练风格图像对应的判别网络的输出以及第三训练输出图像的标签和第三训练输出图像对应的判别网络的输出,通过判别网络对抗损失函数计算判别网络对抗损失值;
步骤S103:根据所判别网络对抗损失值对判别网络的参数进行修正。
例如,基于生成网络,对判别网络进行训练,即步骤S10还可以包括:判断判别网络D的训练是否满足预定条件,若不满足预定条件,则重复执行上述判别网络D的训练过程;若满足预定条件,则停止本阶段的判别网络D的训练过程,得到本阶段训练好的判别网络D。例如,在一个示例中,上述预定条件为连续两幅(或更多幅)第二训练风格图像和第三训练输出图像HR3对应的判别网络对抗损失值不再显著减小。例如,在另一个示例中,上述预定条件为判别网络D的训练次数或训练周期达到预定数目。本公开对此不作限制。
例如,如图10A所示,在判别网络D的训练过程中,需要联合生成网络G进行训练。需要说明的是,在判别网络D的训练过程中,生成网络G的参数保持不变。
需要说明的是,上述示例仅是示意性说明判别网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本图像对神经网络进行训练;同时,在每一幅样本图 像训练过程中,都可以包括多次反复迭代以对判别网络的参数进行修正。又例如,训练阶段还包括对判别网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,判别网络D的初始参数可以为随机数,例如随机数符合高斯分布,本公开的实施例对此不作限制。
例如,判别网络D的训练过程中还可以包括优化函数(图10A中未示出),优化函数可以根据判别网络对抗损失函数计算得到的判别网络对抗损失值计算判别网络D的参数的误差值,并根据该误差值对判别网络D的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算判别网络D的参数的误差值。
例如,第二训练输入图像可以与第一训练输入图像相同,例如,第二训练输入图像的集合与第一训练输入图像的集合是同一个图像集合。例如,第二训练输入图像可以为各种类型的图像。例如,第二训练输入图像可以为通过数码相机或手机拍摄的图像,其可以为人物图像、动植物图像或风景图像等。
例如,第二训练风格图像具有与第一训练风格图像一致的目标风格。例如,第二训练风格图像可以是第一训练风格图像的高分辨率版本,但不限于此。
例如,第二训练风格图像可以为各种艺术大师(例如莫奈、梵高、毕加索等)的名画等,但不限于此,例如第二训练风格图像还可以为水墨画、素描画等。
例如,判别网络D可以为如图8所示的判别网络,但不限于此。
例如,在一些示例中,如图10A所示的判别网络对抗损失函数可以表示为:
Figure PCTCN2020081375-appb-000049
其中,L D表示判别网络对抗损失函数,x表示第二训练风格图像,P data(x)表示第二训练风格图像的集合(例如,包括一个批次的多幅第二训练风格图像),D(x)表示判别网络D针对第二训练风格图像x的输出,即判别网络D对第二训练风格图像x进行处理得到的输出,
Figure PCTCN2020081375-appb-000050
表示针对第二训练风格图像的集合求期望,z2表示第二训练输入图像,P z2(z2)表示第二训练输入图像的集合(例如,包括一个批次的多幅第二训练输入图像),G(z2)表示第三训练输出图像HR3,D(G(z2))表示判别网络D针对第三训练输出图像HR3的输出,即判别网络D对第三训练输出图像HR3进行处理得到的输出,
Figure PCTCN2020081375-appb-000051
表示针对第三训练输入图像的集合求期望。相应地,采用批量梯度下降算法对判别网络D进行参数优化。
需要说明的是,上述公式表示的判别网络对抗损失函数是示例性的,本公开包括但不限于此。
判别网络D的训练目标是最小化判别网络对抗损失值。例如,在判别网络D的训练过程中,第二训练风格图像的标签设置为1,即希望判别网络D鉴别认定第二训练风格图像具有目标风格;同时,第三训练输出图像HR3的标签设置为0,即希望判别网络D鉴别认定第三训练输出图像HR3不具有目标风格。也就是说,希望判别网络D认定第三训练输出图像HR3的风格和第二训练风格图像的风格不相同。
例如,在判别网络D的训练过程中,判别网络D的参数被不断地修正,以使经过参数修正后的判别网络D能够准确鉴别第二训练风格图像和生成网络G生成的第三训练输出图像HR3,也就是,使第二训练风格图像对应的判别网络D的输出不断趋近于1,以及使第三训练输出图像HR3对应的判别网络D的输出不断趋近于0,从而不断地减小生成网络对抗损失值。
例如,在本公开的实施例中,生成网络G的训练和判别网络D的训练是交替迭代进行的。例如,对于未经训练的生成网络G和判别网络D,一般先对判别网络D进行第一阶段训练,提高判别网络D的鉴别能力(即,鉴别判别网络D的输入是否具有目标风格的能力),得到经过第一阶段训练的判别网络D;然后,基于经过第一阶段训练的判别网络D对生成网络G进行第一阶段训练,提高生成网络G的图像风格迁移能力(即,使生成网络G生成的高分辨率图像具有目标风格的能力),得到经过第一阶段训练的生成网络G。与第一阶段训练类似,在第二阶段训练中,基于经过第一阶段训练的生成网络G,对经过第一阶段训练的判别网络D进行第二阶段训练,提高判别网络D的鉴别能力,得到经过第二阶段训练的判别网络D;然后,基于经过第二阶段训练的判别网络D对经过第一阶段训练的生成网络G进行第二阶段训练,提高生成网络G的图像风格迁移能力,得到经过第二阶段训练的生成网络G,依次类推,接下来对判别网络D和生成网络G进行第三阶段训练、第四阶段训练、……,直到得到的生成网络G生成的高分辨率图像完全具有了目标风格。
需要说明的是,在生成网络G和判别网络D的交替训练过程中,生成网络G和判别网络D的对抗体现在生成网络G的输出(生成网络G生成的高分辨率图像)在各自单独的训练过程中具有不同的标签(在生成网络G的训练过程中标签为1,在判别网络D的训练过程中标签为0),也体现在判别网络对抗损失函数的第二部分(即与生成网络G生成的高分辨率图像有关的部分)与系统损失函数中的生成网络对抗损失函数相反。还需要说明的是,理想情况下,经过训练得到的生成网络G输出的高分辨率图像具有目标风格(即第二训练风格图像的风格),判别网络D针对第二训练风格图像和该生成网络G生成的高分辨率图像的输出均为0.5,即生成网络G和判别网络D经过对抗博弈达到纳什均衡。
需要说明的是,在本公开的实施例中,目标风格是指第二训练风格图像的风格。例如,在一些示例中,第一训练风格图像与第二训练风格图像的风格相同,从而,经过训练得到的生成网络G生成的高分辨率图像和低分辨率图像均具有目标风格。例如,在另一些示例中,第一训练风格图像与第二训练风格图像的风格不同,从而,经过训练得到的生成网络G生成的高分辨率图像具有目标风格且融合了第一训练风格图像的风格,经过训练得到的生成网络G生成的低分辨率图像具有第一训练风格图像的风格且融合了目标风格。本公开对此不作限制。还需要说明的是,经过训练得到的生成网络G生成的高分辨率图像和低分辨率图像均保留了生成网络G的输入(例如第一训练输入图像、第二训练输入图像)的内容特征。
例如,在一些示例中,生成网络和目标网络可以具有不同的结构。例如,利用上述训 练方法得到的目标网络可以仅包括例如图6所示的已经训练好的生成网络G的主干网络MN和第一分支网络BN1,从而在对输入图像进行风格迁移处理时仅得到高分辨率的输出图像(高于输入图像的分辨率)。例如,目标网络中的主干网络的结构和生成网络G的主干网络的结构相同,目标网络中的第一分支网络的结构和生成网络G的第一分支网络的结构也相同,但目标网络中的主干网络的参数和生成网络G的主干网络的参数不相同,目标网络中的第一分支网络的参数和生成网络G的第一分支网络的参数也不相同。
例如,在另一些示例中,生成网络和目标网络也可以具有相同的结构,但是生成网络的参数和目标网络的参数不相同。利用上述训练方法得到的目标网络可以包括完整的如图6所示的已经训练好的生成网络G,从而在对输入图像进行风格迁移处理时,既可以得到高分辨率的输出图像(其分辨率高于输入图像的分辨率),又可以得到低分辨率的输出图像(其分辨率等于输入图像的分辨率)。
在本公开中,“结构相同”可以表示卷积层的数量、上采样层的数量、下采样层的数量等相同,且各卷积层、各上采样层和/或各下采样层的连接关系也相同。
需要说明的是,在对生成网络进行训练之前,生成网络可能完全不具有风格迁移的功能,或者也可以能具有风格迁移的功能,但是风格迁移的效果不好。对生成网络训练后得到的目标网络具有风格迁移的功能,且能够生成高质量的具有目标风格的高分辨率图像。
本公开的至少一实施例提供的训练方法,结合了生成式对抗网络、超分辨率技术和风格迁移技术,经过该训练方法训练得到的目标网络可以基于输入图像生成高质量的具有目标风格的高分辨率图像,提高了图像风格迁移和图像融合的效果,提升了用户的视觉体验;具有更好、更广泛的应用前景。
本公开至少一实施例还提供一种图像处理方法。图11为本公开一实施例提供的一种图像处理方法的示意性流程图。如图11所示,该图像处理方法包括以下步骤:
步骤S301:获取输入图像;
步骤S302:利用神经网络对输入图像进行风格迁移处理,以生成输出图像,其中,输出图像的分辨率高于输入图像的分辨率。
例如,在步骤S301中,输入图像可以为各种类型的图像。例如,可以为人物图像、动植物图像或风景图像等。例如,输入图像可以通过图像采集设备获取。图像采集设备例如可以是智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、或者甚至可以是网络摄像头。
例如,步骤S302中的神经网络可以包括根据上述任一实施例提供的训练方法训练得到的目标网络。
例如,输出图像为输入图像经过目标网络进行风格迁移处理后形成的图像。输出图像包括输入图像的内容特征和目标风格特征。当经过训练得到目标网络之后,则目标风格特征即确定且不变。例如,在训练神经网络的过程中,采用毕加索的一幅画(例如,《梦》)作为风格图像进行训练,则当利用该训练好的目标网络对输入图像进行处理之后得到的输 出图像的风格即为该毕加索的画(《梦》)的风格。需要说明的是,风格图像可以为上述训练方法的实施例中的第二训练风格图像,还需要说明的是,上述训练方法的实施例中的第一训练风格图像可以为第二训练风格图像的低分辨率版本,且第一训练风格图像和输入图像的分辨率相同。
The image processing method provided by the embodiments of the present disclosure can perform style transfer processing on the input image through the target network to generate a high-quality, high-resolution image with the target style, which improves the effects of image style transfer and image fusion, improves the user's visual experience, and has better and broader application prospects.
At least one embodiment of the present disclosure further provides an image processing apparatus. FIG. 12A is a schematic block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
For example, as shown in FIG. 12A, the image processing apparatus 400 includes an image acquisition module 410 and an image processing module 420. The image acquisition module 410 is configured to acquire an input image, and the image processing module 420 is configured to perform style transfer processing on the input image to generate an output image.
For example, the image acquisition module 410 may include a memory in which the input image is stored. Alternatively, the image acquisition module 410 may include one or more cameras to acquire the input image. For example, the image acquisition module 410 may be hardware, software, firmware, or any feasible combination thereof.
For example, the image processing module 420 may include the target network trained by the training method according to any one of the above embodiments. For example, the target network may include the backbone network MN and the first branch network BN1 of the trained generative network G shown in FIG. 6. For example, the resolution of the output image is higher than the resolution of the input image.
FIG. 12B is a schematic block diagram of another image processing apparatus provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 12B, the image processing apparatus 500 includes a memory 510 and a processor 520. For example, the memory 510 is configured to non-transitorily store computer-readable instructions, and the processor 520 is configured to execute the computer-readable instructions; when the computer-readable instructions are executed by the processor 520, the neural network training method provided by the embodiments of the present disclosure is performed.
For example, the memory 510 and the processor 520 may communicate with each other directly or indirectly. For example, components such as the memory 510 and the processor 520 may communicate through a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The network may include a local area network, the Internet, a telecommunication network, an Internet of Things based on the Internet and/or a telecommunication network, and/or any combination of the above networks, etc. The wired network may, for example, communicate via twisted pair, coaxial cable or optical fiber transmission, and the wireless network may, for example, use a 3G/4G/5G mobile communication network, Bluetooth, Zigbee or WiFi. The present disclosure does not limit the type and function of the network.
For example, the processor 520 may control other components in the image processing apparatus to perform desired functions. The processor 520 may be a device with data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU) or a graphics processing unit (GPU). The central processing unit (CPU) may be of an X86 or ARM architecture, etc. The GPU may be integrated directly onto the motherboard alone or built into the north bridge chip of the motherboard. The GPU may also be built into the central processing unit (CPU).
For example, the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, etc.
For example, one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions. The computer-readable storage medium may also store various application programs and various data, such as the first training style image and the second training style image, and various data used and/or generated by the application programs.
For example, some computer instructions stored in the memory 510, when executed by the processor 520, may perform one or more steps of the image processing method described above.
For example, for a detailed description of the processing procedure of the image processing method, reference may be made to the relevant descriptions in the above embodiments of the image processing method; for a detailed description of the processing procedure of the neural network training method, reference may be made to the relevant descriptions in the above embodiments of the neural network training method; repeated parts are not described again.
It should be noted that the image processing apparatus provided by the embodiments of the present disclosure is exemplary rather than limiting; according to actual application needs, the image processing apparatus may further include other conventional components or structures. For example, in order to realize the necessary functions of the image processing apparatus, those skilled in the art may provide other conventional components or structures according to specific application scenarios, and the embodiments of the present disclosure are not limited in this respect.
For the technical effects of the image processing apparatus provided by at least one embodiment of the present disclosure, reference may be made to the corresponding descriptions of the image processing method and the neural network training method in the above embodiments, and details are not repeated here.
At least one embodiment of the present disclosure further provides a storage medium. For example, one or more computer instructions may be stored on the storage medium. Some computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps of the above image processing method. Other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps of the above neural network training method.
For example, the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, or any combination of the above storage media, or other suitable storage media.
For the technical effects of the storage medium provided by the embodiments of the present disclosure, reference may be made to the corresponding descriptions of the image processing method and the neural network training method in the above embodiments, and details are not repeated here.
The following points need to be noted with respect to the present disclosure:
(1) The accompanying drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure, and other structures may refer to common designs.
(2) Without conflict, the features in the same embodiment and in different embodiments of the present disclosure may be combined with each other.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art who is familiar with this technical field can easily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, and these should all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims (16)

  1. A training method of a neural network, comprising:
    training a discriminative network based on a generative network;
    training the generative network based on the discriminative network; and
    alternately performing the above training processes to obtain a target network based on the trained generative network;
    wherein the target network is used to perform style transfer processing on an input image to obtain an output image, and a resolution of the output image is higher than a resolution of the input image;
    the training the generative network based on the discriminative network comprises:
    performing style transfer processing on a first training input image by using the generative network, to generate a first training output image and a second training output image respectively, wherein a resolution of the first training output image is higher than a resolution of the first training input image, and a resolution of the second training output image is equal to the resolution of the first training input image;
    processing the first training output image by the discriminative network, processing the second training output image by an analysis network, and calculating a system loss value of the generative network through a system loss function according to an output of the discriminative network and an output of the analysis network; and
    correcting parameters of the generative network according to the system loss value.
  2. The training method according to claim 1, wherein the generative network comprises a backbone network, a first branch network and a second branch network, and an input of the first branch network and an input of the second branch network are both an output of the backbone network;
    the performing the style transfer processing on the first training input image by using the generative network, to generate the first training output image and the second training output image respectively, comprises:
    generating, according to the first training input image, the first training output image through the backbone network and the first branch network, and generating the second training output image through the backbone network and the second branch network.
  3. The training method according to claim 2, wherein the backbone network comprises a plurality of convolution modules connected in sequence and a plurality of down-sampling layers interposed between adjacent convolution modules;
    the first branch network comprises a plurality of convolution modules connected in sequence and a plurality of up-sampling layers interposed between adjacent convolution modules;
    the second branch network comprises a plurality of convolution modules connected in sequence and a plurality of up-sampling layers interposed between adjacent convolution modules;
    wherein the number of convolution modules and the number of up-sampling layers in the first branch network are respectively greater than the number of convolution modules and the number of down-sampling layers in the backbone network, and the number of convolution modules and the number of up-sampling layers in the second branch network are respectively equal to the number of convolution modules and the number of down-sampling layers in the backbone network.
  4. The training method according to claim 2 or 3, wherein the target network comprises the backbone network and the first branch network of the generative network.
  5. The training method according to any one of claims 1-4, wherein the system loss function comprises a generative-network adversarial loss function, and the system loss value comprises a generative-network adversarial loss value;
    the generative-network adversarial loss function is expressed as:
    L_G = E_{z~P_z(z)}[log(1 − D(G(z)))]
    wherein L_G denotes the generative-network adversarial loss function, z denotes the first training input image, P_z(z) denotes the set of first training input images, G(z) denotes the first training output image, D(G(z)) denotes the output of the discriminative network for the first training output image, and E_{z~P_z(z)} denotes taking the expectation over the set of first training input images to obtain the generative-network adversarial loss value.
  6. The training method according to claim 5, wherein the analysis network comprises a plurality of first convolution modules connected in sequence and a plurality of first down-sampling layers interposed between adjacent first convolution modules, at least two of the first convolution modules are used to extract style features, and at least one of the first convolution modules is used to extract content features.
  7. The training method according to claim 6, wherein the system loss function further comprises a content loss function, and the system loss value further comprises a content loss value;
    the content loss function is expressed as:
    L_content = Σ_m w_{1m}·C_m
    wherein L_content denotes the content loss function, C_m denotes a single-layer content loss function of the m-th first convolution module among the at least one first convolution module used to extract the content features, and w_{1m} denotes a weight of C_m;
    the single-layer content loss function is expressed as:
    C_m = (1/(2·S_1))·Σ_{ij}(F^m_{ij} − P^m_{ij})²
    wherein S_1 is a constant, F^m_{ij} denotes a value at the j-th position in a first content feature image of the first training input image extracted by the i-th first convolution kernel in the m-th first convolution module, and P^m_{ij} denotes a value at the j-th position in a second content feature image of the second training output image extracted by the i-th first convolution kernel in the m-th first convolution module.
  8. The training method according to claim 6 or 7, wherein the system loss function further comprises a style loss function, and the system loss value further comprises a style loss value;
    the style loss function is expressed as:
    L_style = Σ_n w_{2n}·E_n
    wherein L_style denotes the style loss function, E_n denotes a single-layer style loss function of the n-th first convolution module among the at least two first convolution modules used to extract the style features, and w_{2n} denotes a weight of E_n;
    the single-layer style loss function is expressed as:
    E_n = (1/(S_2·N_n²·M_n²))·Σ_{ij}(A^n_{ij} − G^n_{ij})²
    wherein S_2 is a constant, N_n denotes the number of first convolution kernels in the n-th first convolution module, M_n denotes the size of the style feature image extracted by the first convolution kernels in the n-th first convolution module, A^n_{ij} denotes a value at the j-th position in a Gram matrix of a first style feature image of the first training style image extracted by the i-th first convolution kernel in the n-th first convolution module, and G^n_{ij} denotes a value at the j-th position in a Gram matrix of a second style feature image of the second training output image extracted by the i-th first convolution kernel in the n-th first convolution module.
  9. The training method according to any one of claims 5-8, wherein the parameters of the generative network comprise a plurality of convolution kernels and a plurality of biases, the system loss function further comprises a weight-bias-ratio loss function, and the system loss value further comprises a weight-bias-ratio loss value;
    the training the generative network based on the discriminative network further comprises:
    calculating the weight-bias-ratio loss value through the weight-bias-ratio loss function according to the plurality of convolution kernels and the plurality of biases.
  10. The training method according to claim 9, wherein the weight-bias-ratio loss function is expressed as:
    L_L1 = W / (B + e)
    wherein L_L1 denotes the weight-bias-ratio loss function, W is a mean value of the absolute values of the plurality of convolution kernels, B is a mean value of the absolute values of the plurality of biases, and e is a positive number.
  11. The training method according to any one of claims 1-10, wherein the training the discriminative network based on the generative network comprises:
    performing style transfer processing on a second training input image by using the generative network to generate a third training output image, wherein a resolution of the third training output image is greater than a resolution of the second training input image;
    inputting a second training style image and the third training output image into the discriminative network, wherein a resolution of the second training style image is equal to the resolution of the third training output image, and calculating a discriminative-network adversarial loss value through a discriminative-network adversarial loss function according to a label of the second training style image and the output of the discriminative network corresponding to the second training style image, as well as a label of the third training output image and the output of the discriminative network corresponding to the third training output image; and
    correcting parameters of the discriminative network according to the discriminative-network adversarial loss value.
  12. The training method according to claim 11, wherein the discriminative-network adversarial loss function is expressed as:
    L_D = −E_{x~P_data(x)}[log D(x)] − E_{z~P_z(z)}[log(1 − D(G(z)))]
    wherein L_D denotes the discriminative-network adversarial loss function, x denotes the second training style image, P_data(x) denotes the set of second training style images, D(x) denotes the output of the discriminative network for the second training style image, E_{x~P_data(x)} denotes taking the expectation over the set of second training style images, z denotes the second training input image, P_z(z) denotes the set of second training input images, G(z) denotes the third training output image, D(G(z)) denotes the output of the discriminative network for the third training output image, and E_{z~P_z(z)} denotes taking the expectation over the set of second training input images.
  13. An image processing method, comprising:
    acquiring an input image; and
    performing style transfer processing on the input image by using a neural network to generate an output image;
    wherein the neural network comprises the target network obtained by the training method according to any one of claims 1-12, and a resolution of the output image is higher than a resolution of the input image.
  14. An image processing apparatus, comprising:
    an image acquisition module configured to acquire an input image; and
    an image processing module comprising the target network obtained by the training method according to any one of claims 1-12, the image processing module being configured to perform style transfer processing on the input image by using the target network to generate an output image.
  15. An image processing apparatus, comprising:
    a memory configured to non-transitorily store computer-readable instructions; and
    a processor configured to execute the computer-readable instructions, wherein when the computer-readable instructions are executed by the processor, the training method according to any one of claims 1-12 or the image processing method according to claim 13 is performed.
  16. A storage medium, non-transitorily storing computer-readable instructions, wherein when the computer-readable instructions are executed by a computer, instructions of the training method according to any one of claims 1-12 or instructions of the image processing method according to claim 13 can be executed.
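(The following sketch is provided solely as a reader's aid and forms no part of the claims. It illustrates, under assumed tensor shapes, the single-layer content loss of claim 7, the Gram-matrix-based single-layer style loss of claim 8, and the weight-bias-ratio loss of claim 10; the function names, the feature-map layout of shape (N_n, M_n) and the default constants are assumptions.)

```python
import torch

def gram_matrix(features):
    """Gram matrix of a feature map laid out as (N_n kernels, M_n spatial positions)."""
    return features @ features.t()

def single_layer_content_loss(f_input, f_output, s1=1.0):
    """C_m = 1/(2*S1) * sum_ij (F_ij - P_ij)^2 for the m-th first convolution module."""
    return ((f_input - f_output) ** 2).sum() / (2.0 * s1)

def single_layer_style_loss(f_style, f_output, s2=4.0):
    """E_n = 1/(S2 * N_n^2 * M_n^2) * sum_ij (A_ij - G_ij)^2 using Gram matrices."""
    n_kernels, m_size = f_style.shape
    a = gram_matrix(f_style)    # Gram matrix of the first training style image's features
    g = gram_matrix(f_output)   # Gram matrix of the second training output image's features
    return ((a - g) ** 2).sum() / (s2 * n_kernels ** 2 * m_size ** 2)

def weight_bias_ratio_loss(conv_kernels, biases, eps=1e-8):
    """L_L1 = W / (B + e): mean |kernel weight| over mean |bias| plus a small positive e."""
    w = torch.cat([k.abs().flatten() for k in conv_kernels]).mean()
    b = torch.cat([v.abs().flatten() for v in biases]).mean()
    return w / (b + eps)
```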
PCT/CN2020/081375 2019-04-02 2020-03-26 神经网络的训练方法、图像处理方法、图像处理装置和存储介质 WO2020200030A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910262329.8 2019-04-02
CN201910262329.8A CN111767979B (zh) 2019-04-02 2019-04-02 神经网络的训练方法、图像处理方法、图像处理装置

Publications (1)

Publication Number Publication Date
WO2020200030A1 true WO2020200030A1 (zh) 2020-10-08

Family

ID=72664994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081375 WO2020200030A1 (zh) 2019-04-02 2020-03-26 神经网络的训练方法、图像处理方法、图像处理装置和存储介质

Country Status (2)

Country Link
CN (1) CN111767979B (zh)
WO (1) WO2020200030A1 (zh)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11521011B2 (en) 2019-06-06 2022-12-06 Samsung Electronics Co., Ltd. Method and apparatus for training neural network model for enhancing image detail
CN114641792A (zh) * 2020-10-16 2022-06-17 京东方科技集团股份有限公司 图像处理方法、图像处理设备和可读存储介质
CN112465007B (zh) * 2020-11-24 2023-10-13 深圳市优必选科技股份有限公司 目标识别模型的训练方法、目标识别方法及终端设备
CN112529159B (zh) * 2020-12-09 2023-08-04 北京百度网讯科技有限公司 网络训练方法、装置及电子设备
CN112862669B (zh) * 2021-02-02 2024-02-09 百果园技术(新加坡)有限公司 图像生成模型的训练方法、生成方法、装置及设备
CN113516582B (zh) * 2021-04-12 2023-08-18 浙江大学 用于图像风格迁移的网络模型训练方法、装置、计算机设备和存储介质
CN114049254B (zh) * 2021-10-29 2022-11-29 华南农业大学 低像素牛头图像重建识别方法、系统、设备及存储介质


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074215A (zh) * 2016-11-09 2018-05-25 京东方科技集团股份有限公司 图像升频系统及其训练方法、以及图像升频方法
US20180174054A1 (en) * 2016-12-20 2018-06-21 Andreas Wild Rapid competitive learning techniques for neural networks
US20180247156A1 (en) * 2017-02-24 2018-08-30 Xtract Technologies Inc. Machine learning systems and methods for document matching
CN107122826A (zh) * 2017-05-08 2017-09-01 京东方科技集团股份有限公司 用于卷积神经网络的处理方法和系统、和存储介质
CN107767343A (zh) * 2017-11-09 2018-03-06 京东方科技集团股份有限公司 图像处理方法、处理装置和处理设备
CN108805808A (zh) * 2018-04-04 2018-11-13 东南大学 一种利用卷积神经网络提高视频分辨率的方法
CN108710881A (zh) * 2018-05-23 2018-10-26 中国民用航空总局第二研究所 神经网络模型、候选目标区域生成方法、模型训练方法

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434552A (zh) * 2020-10-13 2021-03-02 广州视源电子科技股份有限公司 神经网络模型调整方法、装置、设备及存储介质
CN112329912A (zh) * 2020-10-21 2021-02-05 广州工程技术职业学院 卷积神经网络训练方法、图像重建方法、装置和介质
CN112216273A (zh) * 2020-10-30 2021-01-12 东南数字经济发展研究院 一种针对语音关键词分类网络的对抗样本攻击方法
CN112216273B (zh) * 2020-10-30 2024-04-16 东南数字经济发展研究院 一种针对语音关键词分类网络的对抗样本攻击方法
CN112529058A (zh) * 2020-12-03 2021-03-19 北京百度网讯科技有限公司 图像生成模型训练方法和装置、图像生成方法和装置
CN112561864A (zh) * 2020-12-04 2021-03-26 深圳格瑞健康管理有限公司 龋齿图像分类模型的训练方法、系统和存储介质
CN112561864B (zh) * 2020-12-04 2024-03-29 深圳格瑞健康科技有限公司 龋齿图像分类模型的训练方法、系统和存储介质
CN113326725B (zh) * 2021-02-18 2024-03-12 陕西师范大学 基于骨架引导传输网络的汉字字体自动生成方法
CN113326725A (zh) * 2021-02-18 2021-08-31 陕西师范大学 基于骨架引导传输网络的汉字字体自动生成方法
CN112967260A (zh) * 2021-03-17 2021-06-15 中国科学院苏州生物医学工程技术研究所 基于弱监督学习的眼底荧光造影图像渗漏点检测方法
CN112967260B (zh) * 2021-03-17 2024-01-26 中国科学院苏州生物医学工程技术研究所 基于弱监督学习的眼底荧光造影图像渗漏点检测方法
CN113139653A (zh) * 2021-03-18 2021-07-20 有米科技股份有限公司 用于图像哈希求解的神经网络训练方法及装置
CN112966685A (zh) * 2021-03-23 2021-06-15 平安国际智慧城市科技股份有限公司 用于场景文本识别的攻击网络训练方法、装置及相关设备
CN112966685B (zh) * 2021-03-23 2024-04-19 深圳赛安特技术服务有限公司 用于场景文本识别的攻击网络训练方法、装置及相关设备
CN112991220A (zh) * 2021-03-29 2021-06-18 深圳高性能医疗器械国家研究院有限公司 一种基于多重约束的卷积神经网络校正图像伪影的方法
CN113221645B (zh) * 2021-04-07 2023-12-12 深圳数联天下智能科技有限公司 目标模型训练方法、人脸图像生成方法以及相关装置
CN113221645A (zh) * 2021-04-07 2021-08-06 深圳数联天下智能科技有限公司 目标模型训练方法、人脸图像生成方法以及相关装置
CN113420665B (zh) * 2021-06-23 2024-05-07 平安国际智慧城市科技股份有限公司 对抗人脸图像生成、人脸识别模型训练方法、装置及设备
CN113420665A (zh) * 2021-06-23 2021-09-21 平安国际智慧城市科技股份有限公司 对抗人脸图像生成、人脸识别模型训练方法、装置及设备
CN113657486A (zh) * 2021-08-16 2021-11-16 浙江新再灵科技股份有限公司 基于电梯图片数据的多标签多属性分类模型建立方法
CN113657486B (zh) * 2021-08-16 2023-11-07 浙江新再灵科技股份有限公司 基于电梯图片数据的多标签多属性分类模型建立方法
CN113989092A (zh) * 2021-10-21 2022-01-28 河北师范大学 基于分层对抗性学习的图像隐写方法
CN113989092B (zh) * 2021-10-21 2024-03-26 河北师范大学 基于分层对抗性学习的图像隐写方法
CN114267036A (zh) * 2021-12-25 2022-04-01 福州大学 基于生成对抗网络的车牌生成方法
CN115357218A (zh) * 2022-08-02 2022-11-18 北京航空航天大学 一种基于混沌预测对抗学习的高熵随机数生成方法
CN116721306B (zh) * 2023-05-24 2024-02-02 北京思想天下教育科技有限公司 基于大数据云平台的线上学习内容推荐系统
CN116721306A (zh) * 2023-05-24 2023-09-08 北京思想天下教育科技有限公司 基于大数据云平台的线上学习内容推荐系统
CN117177006A (zh) * 2023-09-01 2023-12-05 湖南广播影视集团有限公司 一种基于cnn算法的短视频智能制作方法

Also Published As

Publication number Publication date
CN111767979A (zh) 2020-10-13
CN111767979B (zh) 2024-04-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20785286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20785286

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 02/02/2022)
